## Analysis of Following Preference in Sina Weibo and Twitter

1  Introduction

Sina Weibo, launched in 2009, is the most popular Chinese micro-blogging service. It was reported to have more than 500 million registered users by the end of 2012. Sina Weibo and Twitter have a lot in common, such as similar functionality and website design. One of their main differences is the user population: most Sina Weibo users are Chinese. Following preference is the preference users show when they decide whom to "follow", which is a basic behavior on micro-blogging services. Following someone means receiving that user's micro-blogs (called weibos on Sina Weibo and tweets on Twitter) on your homepage.

This work provides a case study of the online behavior of different user populations and an opportunity for sociological study of personality and regional culture. In terms of following preference, we found that the users of Sina Weibo are more concentrated and hierarchical: they are more likely to follow people at higher or the same social levels, while Twitter users are open to following people from various levels. We believe cultural background produces this result. This work builds on our previous work [1] on the edge balance ratio, which was proposed to study the general properties of directed complex networks.

2  What is behind following preference

Unlike other online social networks such as Facebook and MySpace, micro-blogging services build bridges between ordinary people, celebrities, and organizations through "follow". Following preference reveals why people use micro-blogging services and what their online habits are. Micro-blogging users not only keep in contact with their friends and make new friends, but also follow celebrities to find out what their idols are doing, and follow media and organizations to obtain news and messages. The former is the social role and the latter is the media role of micro-blogging services. In addition, online following preference indicates users' preference in making friends in reality and their attitude toward celebrities, media, companies, governments, and other organizations. From the following preference one can infer users' social customs or habits. Studying the following preference of different user populations together provides an opportunity to study human behavior and culture across regions.

3  Data sets

The Sina Weibo data set contains 80.8 million user profiles and 7.2 billion relations, crawled from July 2011 to February 2012. It covers about 16% of all users. The Twitter data set, taken from [2], contains 41 million users and 1.5 billion relations.

4  Analysis

4.1  Reciprocal following preference

Reciprocal networks, or friends' networks, are subgraphs of the relation networks in which the relations between any connected users are reciprocal. Reciprocal networks are very close to real social networks. Reciprocal following preference can be summarized as homophily. Homophily was first reported in sociological research, which found that people prefer to associate with others who are similar to themselves.

There are 621 million pairs of friends in the Sina Weibo data set, and 47.3% of them are in the same province. Twitter doesn't have a standard format for geographic information, so time zone is used to represent location. It was concluded that Twitter users with fewer than 2,000 friends are likely to be geographically close.
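The two steps described above, extracting friend pairs from the relation network and measuring how many of them share a location, can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names `reciprocal_pairs` and `same_region_share`, the edge format `(follower, followee)`, and the toy data are all hypothetical and are not taken from the paper.

```python
def reciprocal_pairs(edges):
    """Return unordered pairs of users who follow each other."""
    edge_set = set(edges)
    pairs = set()
    for src, dst in edge_set:
        if src != dst and (dst, src) in edge_set:
            # Store each mutual pair once, in sorted order.
            pairs.add((min(src, dst), max(src, dst)))
    return pairs


def same_region_share(pairs, region):
    """Fraction of friend pairs whose two users carry the same region label."""
    if not pairs:
        return 0.0
    same = sum(1 for u, v in pairs if region[u] == region[v])
    return same / len(pairs)


# Toy "follow" relations as (follower, followee); all values are made up.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c"), ("b", "c")]
region = {"a": "Beijing", "b": "Beijing", "c": "Guangdong", "d": "Shanghai"}

friends = reciprocal_pairs(edges)
share = same_region_share(friends, region)
print(sorted(friends), share)
```

For Twitter, the `region` mapping would hold time zones instead of provinces, in line with the substitution described in the text.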

Figure 1: Number of followers of a user's friends versus that of the user. All users are divided into groups according to their number of followers. Each "+" represents the median number of followers of the friends of the users in one group.

The median number of followers of a user's friends is plotted against that of the user in Fig. 1. The dashed line stands for the mean in log scale. In both networks there is a significant positive correlation between the number of followers of a user's friends and that of the user.
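The grouping behind such a plot can be sketched as below. The sketch makes assumptions that the text does not specify: users are grouped by decade of follower count (1-9, 10-99, and so on), and the follower counts of all friends of all users in a group are pooled before taking the median. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def decade(n):
    """Decade bin of a positive integer: 0 for 1-9, 1 for 10-99, and so on."""
    return len(str(n)) - 1


def friends_median_by_group(followers, friend_pairs):
    """Map each decade group to the median follower count of its users' friends."""
    pooled = defaultdict(list)
    for u, v in friend_pairs:
        # A friendship is mutual, so it contributes to both users' groups.
        for user, friend in ((u, v), (v, u)):
            if followers[user] > 0:
                pooled[decade(followers[user])].append(followers[friend])
    return {group: median(values) for group, values in sorted(pooled.items())}


# Toy follower counts and friend pairs; all values are made up.
followers = {"a": 5, "b": 8, "c": 120, "d": 300, "e": 15000}
friend_pairs = [("a", "b"), ("a", "c"), ("c", "d"), ("d", "e")]

medians = friends_median_by_group(followers, friend_pairs)
print(medians)
```

A positive correlation appears as medians that grow with the group index, as they do in this toy example.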

4.2  Following distribution

The following distribution shows statistically how relations are distributed between user groups, where users are grouped according to their number of followers.

Figure 2: Following distribution of Sina Weibo and Twitter. X-axis: the number of followers of the "source" user of the relation. Y-axis: the number of followers of the "target" user. The area of each circle is proportional to the number of relations.

Fig. 2 shows that both Sina Weibo and Twitter users prefer to follow users who have a similar or larger number of followers, because the circles above the diagonal are larger than those below it. Moreover, this following preference is more pronounced for Sina Weibo users.
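The following distribution and the comparison across the diagonal can be sketched as follows. This is a minimal illustration that assumes decade-sized follower groups and places users with no followers in the lowest group; the bin width, the function names, and the toy data are hypothetical rather than taken from the paper.

```python
from collections import Counter


def decade(n):
    """Decade bin of a follower count; users with zero followers share bin 0."""
    return len(str(n)) - 1 if n > 0 else 0


def following_distribution(edges, followers):
    """Count relations per (source group, target group) cell."""
    cells = Counter()
    for src, dst in edges:
        cells[(decade(followers[src]), decade(followers[dst]))] += 1
    return cells


def diagonal_split(cells):
    """Number of relations above, on, and below the diagonal."""
    above = sum(c for (x, y), c in cells.items() if y > x)
    on = sum(c for (x, y), c in cells.items() if y == x)
    below = sum(c for (x, y), c in cells.items() if y < x)
    return above, on, below


# Toy data: follower counts and (follower, followee) relations, all made up.
followers = {"a": 3, "b": 40, "c": 500, "d": 90000, "e": 7}
edges = [("a", "d"), ("b", "d"), ("c", "d"), ("a", "b"),
         ("b", "a"), ("d", "c"), ("a", "e")]

cells = following_distribution(edges, followers)
print(diagonal_split(cells))
```

Each cell count corresponds to the area of one circle in a plot like Fig. 2; a larger "above" total than "below" total reflects a preference for following users with more followers.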

However, the following distribution fails to show the following preference of celebrities. Although the number of celebrities is small, their following preference influences the whole structure.

4.3  Assortative mixing

Assortative mixing [3], or assortativity, is a global measure of the preference of nodes to connect to similar nodes. In assortative networks, nodes tend to connect to nodes with similar properties. In contrast, nodes in disassortative networks tend to connect to nodes unlike them. For instance, high-degree nodes tend to connect to high-degree nodes in assortative networks and to low-degree nodes in disassortative networks.

For directed networks, an approach based on a set of four assortativity measures is introduced in [4]. Let $\alpha, \beta \in \{in,out\}$ index the degree type, and let $s_i^{\alpha}$ and $t_i^{\beta}$ denote the $\alpha$-degree of the source node and the $\beta$-degree of the target node of edge $i$. The assortativity is defined as

$$r(\alpha, \beta) = \frac{\sum_{i}[(s_i^{\alpha}-\overline{s^{\alpha}})(t_i^{\beta}-\overline{t^{\beta}})]}{M\sigma^{\alpha}\sigma^{\beta}},$$

where $M$ is the number of edges, $\overline{s^{\alpha}}$ is the average $\alpha$-degree (in-degree or out-degree) of the source nodes taken over all edges, and $\sigma^{\alpha}=\sqrt{M^{-1}\sum_{i}(s_i^{\alpha}-\overline{s^{\alpha}})^2}$; $\overline{t^{\beta}}$ and $\sigma^{\beta}$ are defined similarly for the target nodes. The network is assortative if $r$ is positive and disassortative if $r$ is negative. If $r$ is close to $0$, there is no significant correlation between the degrees of source and target nodes.
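The definition of $r(\alpha, \beta)$ can be evaluated directly from an edge list, as in the sketch below. It is a plain illustration rather than the code used for the paper: the function name `directed_assortativity`, the edge format `(source, target)`, and the toy graph are assumptions, and returning NaN when a standard deviation is zero is a choice made here because the formula is undefined in that case.

```python
import math
from collections import Counter


def directed_assortativity(edges, alpha, beta):
    """Pearson correlation over edges between the alpha-degree of the source
    node and the beta-degree of the target node; alpha, beta in {"in", "out"}."""
    degree = {
        "out": Counter(src for src, _ in edges),
        "in": Counter(dst for _, dst in edges),
    }
    s = [degree[alpha][src] for src, _ in edges]
    t = [degree[beta][dst] for _, dst in edges]
    m = len(edges)
    s_mean = sum(s) / m
    t_mean = sum(t) / m
    s_sigma = math.sqrt(sum((x - s_mean) ** 2 for x in s) / m)
    t_sigma = math.sqrt(sum((y - t_mean) ** 2 for y in t) / m)
    if s_sigma == 0 or t_sigma == 0:
        return float("nan")  # r is undefined when one side has no variance
    covariance_sum = sum((x - s_mean) * (y - t_mean) for x, y in zip(s, t))
    return covariance_sum / (m * s_sigma * t_sigma)


# Toy graph: three users follow a hub, the hub follows one back (made up).
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a"), ("a", "b")]

r_out_in = directed_assortativity(edges, "out", "in")
profile = {(a, b): directed_assortativity(edges, a, b)
           for a in ("in", "out") for b in ("in", "out")}
print(r_out_in)
```

The dictionary `profile` holds the four values that make up an assortativity profile of the kind shown for each network.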

Figure 3: Assortativity profile of Sina Weibo and Twitter.

In Fig. 3, Twitter shows a slightly disassortative property, as the four $r(\alpha, \beta)$ are all negative and close to zero. For Sina Weibo, $r(in,out)$ and $r(out,out)$ are positive and the rest are negative. The disassortative property is due to the abundance of "unbalanced" relations linking small-degree users to large-degree ones, which distinguishes Sina Weibo and Twitter from traditional social networks. The remarkable difference between Sina Weibo and Twitter in Fig. 3 is that $r(out,in)$ of Sina Weibo is smaller and $r(out,out)$ is larger. This indicates that users who follow few accounts tend to follow users who have many followers and who themselves follow few accounts. Therefore, normal users on Sina Weibo have a stronger preference for following people with a very large number of followers.

4.4  Edge balance ratio

We recently proposed the edge balance ratio [1] to describe the balance property of edges in directed networks. In directed networks, the two nodes connected by a directed edge generally have different centralities, which means the edge is unbalanced. The edge balance ratio $R$ of a directed edge from node A to node B is defined as

$$R=\left\{ \begin{array}{ll} \displaystyle{\frac{d({\rm B})}{d({\rm A})}}, &d({\rm A})\neq0; \\ \infty, &d({\rm A})=0, \end{array} \right.$$

where $d({\rm B})$ and $d({\rm A})$ are the centralities of nodes B and A. Centralities include in-degree and PageRank, which intuitively reflect the importance of nodes. The distribution of the edge balance ratio profiles the balance property of a network. The balance property of a social network statistically reflects the following preference of its users.
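As an illustration of the definition, the sketch below computes $R$ for every edge using in-degree (the number of followers) as the centrality. The function name `edge_balance_ratios`, the edge format `(A, B)` meaning "A follows B", and the toy graph are assumptions made for the example; any other centrality, such as PageRank, could be passed in as the `centrality` mapping instead.

```python
import math
from collections import Counter


def edge_balance_ratios(edges, centrality):
    """Edge balance ratio R = d(B) / d(A) for each directed edge (A, B);
    R is infinite when d(A) is zero."""
    ratios = []
    for a, b in edges:
        d_a = centrality.get(a, 0)
        d_b = centrality.get(b, 0)
        ratios.append(d_b / d_a if d_a != 0 else math.inf)
    return ratios


# Toy relations as (A, B), meaning A follows B; all values are made up.
edges = [("a", "hub"), ("b", "hub"), ("hub", "a"), ("a", "b"), ("c", "a")]

# In-degree centrality: the number of followers of each user.
in_degree = Counter(dst for _, dst in edges)

ratios = edge_balance_ratios(edges, in_degree)
print(ratios)
```

User `c` has no followers in this toy graph, so the edge from `c` to `a` falls into the $d({\rm A})=0$ branch of the definition.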

Figure 4: Edge balance ratio distribution of relations in Sina Weibo and Twitter. X-axis is the edge balance ratio calculated by the number of followers or PageRank. Y-axis is the probability density function.

Fig. 4 displays the distributions of the edge balance ratio over all relations for Sina Weibo and Twitter. The edge balance ratio determines the type of a relation in the network.

(1)   The relations with edge balance ratio far larger than one reflect users' wish to obtain news, gossip, or other types of messages from users with large influence and high reputation.

(2)   The relations with edge balance ratio close to one reflect users' needs to keep connections with their friends.

(3)   The relations with edge balance ratio much less than one contain rich hidden information and reveal the unique following preference.

Twitter has more edges of the third type, which indicates that the following preference of Twitter users might be less hierarchical than that of Sina Weibo users: highly ranked Sina Weibo users seldom follow lowly ranked users, whereas there are more relations from highly ranked users to lowly ranked users on Twitter.
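The share of each relation type can be estimated from a list of edge balance ratios, as sketched below. The text does not give numeric boundaries for "far larger than one" or "much less than one", so the factor of 10 used here is purely an assumption for illustration, as are the function names and the toy ratios.

```python
import math
from collections import Counter


def relation_type(ratio, factor=10.0):
    """Classify a relation by its edge balance ratio.
    The boundary `factor` is a hypothetical choice, not taken from the paper."""
    if ratio > factor:
        return 1  # ratio far larger than one
    if ratio < 1.0 / factor:
        return 3  # ratio much less than one
    return 2      # ratio close to one


def type_shares(ratios, factor=10.0):
    """Fraction of relations that fall into each of the three types."""
    counts = Counter(relation_type(r, factor) for r in ratios)
    return {kind: counts[kind] / len(ratios) for kind in (1, 2, 3)}


# Toy edge balance ratios; all values are made up.
ratios = [250.0, 40.0, 1.2, 0.8, 0.02, math.inf, 3.0, 0.5]

shares = type_shares(ratios)
print(shares)
```

Comparing the type 3 share between two networks is one simple way to express the difference described above, although the result depends on the chosen boundary.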

5  Culture background and following preference

The following preference in social networks might derive from underlying factors such as regional social custom and personality, which are worthy of future interdisciplinary study. Different cultural backgrounds might be one of the reasons. Sina Weibo users are predominantly Chinese and influenced by Chinese culture, while Twitter users are globally distributed. In Chinese culture, relationships are central to society, an idea associated with traditional Confucian doctrine. Chinese people rely heavily on their relationships for many things, such as job promotion and doing business. They are therefore imperceptibly taught from a young age to cultivate an intricate web of relationships and to treat those relationships as resources. For utilitarian purposes, it is more useful to follow people who have more followers, because they are more likely to have stronger relationships.

6  Conclusions

We found that the following preference of Sina Weibo users is more concentrated and hierarchical: they are more likely to follow people at higher or the same social levels, and less likely to follow people at lower levels than themselves. In contrast, this kind of following preference is much weaker on Twitter, whose users seem to be open, as they follow people from various levels.

This work provides a case study of the online behavior of different user populations and an opportunity for sociological study of personality and regional culture. Further research will include more parameters and will draw on psychology and sociology to understand users' behavior in online social networks.

7  Download the paper

[1] Z. Chen, X. Wang, P. Liu, and Y. Gu. Follow Whom? Chinese Users Have Different Choice.

8  Contact the authors

For any questions or comments about the paper and results, please contact Yuantao Gu.

References

[1] X. Wang, Z. Chen, P. Liu, and Y. Gu. Edge balance ratio: Power law from vertices to edges in directed complex network. IEEE Journal of Selected Topics in Signal Processing, 7(2):184-194, 2013.

[2] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? Proceedings of the 19th International Conference on World Wide Web, 591-600. ACM, 2010.

[3] M. E. J. Newman. Assortative mixing in networks. Physical Review Letters, 89(20):208701/1–4, 2002.

[4] J. G. Foster, D. V. Foster, et al. Edge direction and the structure of networks. Proceedings of the National Academy of Sciences of the United States of America, 107(24):10815–10820, 2010.