User activity patterns

An IRC channel is always active and enables the real-time exchange of posts among users about a specific topic. User interaction is instantaneous: the post written by user u_1 is immediately visible to all other users logged into this channel, and user u_2 may reply right away. Fig. 1 illustrates the dynamics in such a channel. As time evolves, new users may enter, while others may leave or stay quiet until they write follow-up posts at a later time.

Figure 1 Communication activity over an IRC channel. A) Schema of the evolution of a conversation in an IRC channel. At every time step, a user enters a post expressing a positive, negative, or neutral emotion. B) Probability distribution of the user activity over all the IRC channels. The activity is expressed as the time interval τ between two consecutive posts of the same user. Inset: Probability distribution of the user activity for individual IRC channels. The time is measured in minutes. C) Scaled probability distribution of the time interval ω_ch between consecutive posts entered in all the 20 IRC channels. The solid line represents a stretched exponential fit to the data. Inset: Probability distribution of the time interval ω_ch between consecutive posts entered in all the 20 IRC channels without rescaling. The time is measured in minutes.

To characterize these activity patterns, we analyzed the waiting-time, or inter-activity time, distribution P(τ), where τ refers to the time interval between two consecutive posts of the same user in the same channel, and asked about the average response time. We find that τ is power-law distributed, P(τ) ~ τ^(−α), with some cut-off (Fig. 1B) and an exponent α = 1.53 ± 0.02. The fit is based on the maximum likelihood approach proposed by Clauset et al.9, and the power-law nature of the distribution could not be rejected (p = 0.375).
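The exponent estimate can be illustrated with a minimal sketch of the continuous maximum-likelihood estimator for a power-law tail. It is a sketch under stated assumptions and not the authors' code: the lower cut-off `tau_min` is fixed by hand (the full procedure of Clauset et al. also selects it and computes a goodness-of-fit p-value), the data are synthetic, and the function names are hypothetical.

```python
import math
import random

def sample_power_law(n, alpha, tau_min, rng):
    """Draw n waiting times from P(tau) ~ tau^(-alpha), tau >= tau_min (inverse transform)."""
    return [tau_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def fit_alpha_mle(taus, tau_min):
    """Continuous maximum-likelihood estimate of alpha and its standard error."""
    tail = [t for t in taus if t >= tau_min]
    n = len(tail)
    log_sum = sum(math.log(t / tau_min) for t in tail)
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err

# Synthetic data only: the exponent 1.53 is taken from the text, the rest is assumed.
rng = random.Random(42)
taus = sample_power_law(20000, alpha=1.53, tau_min=1.0, rng=rng)
alpha_hat, std_err = fit_alpha_mle(taus, tau_min=1.0)
print(f"alpha = {alpha_hat:.3f} +/- {std_err:.3f}")
```

On synthetic data of this size the estimate should land close to the exponent used for sampling; on real data the choice of `tau_min` and the cut-off at long times both affect the result.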

This finding (a) is in line with the power-law distributions already found for diverse human activities1,2,3,5,6,7 and (b) classifies the communication process as belonging to the regime where posts arrive faster than they can be processed. We note that for α < 2 no average response time is defined (an average would exist, however, in the highly attentive regime). Further, we observe in the plot of Fig. 1B a slight deviation from the power-law at a time interval of about one day, which shows that some users have an additional regularity in their behavior with respect to the time of the day they enter the online discussion. Such deviations have usually been treated as power-laws with an exponential cut-off and can even be explained based on simple entropic arguments10,11. However, because of the “bump” around the one-day time interval, our distribution also seems to provide further evidence for the bimodality proposed by Wu et al.12. We should note, however, that the tail is better fitted by a log-normal distribution (KS = 0.136) than by an exponential (KS = 0.190) or a Weibull (KS = 0.188) one (again using the maximum likelihood methodology described by Clauset et al.9), as shown in Fig. 1B. Here, KS denotes the Kolmogorov-Smirnov statistic; the smaller this number, the better the fit.
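Why no average is defined for α < 2 can be seen from a short calculation, sketched here for an idealized pure power law. The lower bound τ_min and the upper integration limit T are symbols introduced only for this illustration; the empirical distribution has a cut-off, so this is a statement about the idealized form.

```latex
P(\tau) = (\alpha - 1)\,\tau_{\min}^{\alpha-1}\,\tau^{-\alpha},
\qquad \tau \ge \tau_{\min},\quad \alpha > 1

\int_{\tau_{\min}}^{T} \tau\,P(\tau)\,d\tau
  = \frac{\alpha-1}{2-\alpha}\,\tau_{\min}^{\alpha-1}
    \left(T^{\,2-\alpha} - \tau_{\min}^{\,2-\alpha}\right)
  \;\xrightarrow{\;T\to\infty\;}\; \infty
  \qquad \text{for } 1 < \alpha < 2
```

For α = 1.53 the partial mean grows as T^0.47, so it is dominated by the longest waiting times that are observed rather than converging to a characteristic value.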

We now focus on an important difference between online chats and previously studied forms of communication, such as mail or email exchange, which mostly involve two participants. Due to the collective nature of chats, a chatroom automatically aggregates the posts of a much larger number of users, which allows us to study their collective temporal behavior. If ω denotes the time interval between two consecutive posts in the same channel independent of any user (also denoted as inter-event time and to be distinguished from the inter-activity time characterizing a single user), we find that the distribution P(ω) is still fat-tailed, but does not follow a power-law. Interestingly, the time interval between posts significantly depends on the topic discussed in the channel (Inset of Fig. 1C). Some “hot” topics receive posts at a higher rate than others, which can be traced back to the different number of users involved in these discussions. Specifically, we find that the average inter-event time 〈ω〉_ch depends on the number of users in the conversation and becomes smaller for more popular channels, as one would expect.
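The distinction between the per-user inter-activity time τ and the per-channel inter-event time ω can be made concrete with a small sketch. The post log format (a list of `(time, user)` pairs, times in minutes) and the function names are assumptions made for this illustration.

```python
from collections import defaultdict

def inter_activity_times(posts):
    """tau: gaps between consecutive posts of the same user."""
    last_seen = {}
    taus = defaultdict(list)
    for time, user in sorted(posts):
        if user in last_seen:
            taus[user].append(time - last_seen[user])
        last_seen[user] = time
    return dict(taus)

def inter_event_times(posts):
    """omega: gaps between consecutive posts in the channel, regardless of user."""
    times = sorted(time for time, _ in posts)
    return [later - earlier for earlier, later in zip(times, times[1:])]

# Toy channel log, purely illustrative.
posts = [(0, "u1"), (2, "u2"), (3, "u1"), (7, "u3"), (8, "u2")]
print(inter_activity_times(posts))  # {'u1': [3], 'u2': [6]}
print(inter_event_times(posts))     # [2, 1, 4, 1]
```

The same log therefore yields two different series: τ is sparse and user-specific, while ω aggregates all users and shrinks as more of them take part.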

If we rescale the channel-dependent inter-event distribution P_ch(ω) using the average inter-event time 〈ω〉_ch per channel and plot 〈ω_ch〉 P_ch(ω_ch) versus ω_ch/〈ω_ch〉, we find that all the curves collapse onto one master curve (Fig. 1C). The general scaling form that we used is P(ω) = (1/〈ω〉) F(ω/〈ω〉), where F(x) is independent of the average activity level of the component and represents a universal characteristic of the particular system. Such scaling behavior was reported previously in the literature describing universal patterns in human activity13. We fit this master curve by a stretched exponential14,15,16

$$F(x) = a_\gamma \exp\left[-(\beta_\gamma x)^{\gamma}\right], \qquad x = \omega/\langle\omega\rangle, \tag{1}$$

where the stretched exponent γ is the only fit parameter, while the other two factors a_γ and β_γ are dependent on γ14. A histogram of the γ values across the 20 channels is shown in Supplementary Figure S2. Using only the regression results with p < 0.001, we find that the mean value of the stretched exponents is 〈γ〉 = 0.21 ± 0.05.
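How a single exponent can fix the other two factors is sketched below. The sketch assumes the parameterization F(x) = a_γ exp[−(β_γ x)^γ] and assumes that a_γ and β_γ follow from requiring unit area and unit mean of the rescaled variable; both are assumptions of this illustration, and the function names are hypothetical.

```python
import math

def stretched_exp_constants(gamma):
    """Return (a, beta) so that a * exp(-(beta * x)**gamma) has unit area and unit mean."""
    beta = math.gamma(2.0 / gamma) / math.gamma(1.0 / gamma)
    a = beta * gamma / math.gamma(1.0 / gamma)
    return a, beta

def stretched_exp(x, gamma):
    """Evaluate the assumed master curve F(x) for a given stretched exponent."""
    a, beta = stretched_exp_constants(gamma)
    return a * math.exp(-((beta * x) ** gamma))

def rescale(omegas):
    """Divide the inter-event times of one channel by their mean."""
    mean = sum(omegas) / len(omegas)
    return [w / mean for w in omegas]

a, beta = stretched_exp_constants(0.21)   # 0.21 is the mean exponent quoted in the text
print(f"a = {a:.4g}, beta = {beta:.4g}")
print(rescale([1.0, 2.0, 3.0]))           # [0.5, 1.0, 1.5]
```

For γ = 1 these constants reduce to a = β = 1, i.e. a plain exponential with unit mean, which is a convenient sanity check of the normalization.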

We note that stretched exponentials have been reported to describe the inter-event time distribution in systems as diverse as earthquakes15 and stock markets16. These systems commonly exhibit long-range correlations, which seem to be the origin of the stretched exponential inter-event time distributions14. Long-range correlations have also been reported in human interaction activity5,17, and we tested their presence in the temporal activity over IRC communication. As shown in Supplementary Figure S3, we verified the existence of long-range correlations in the conversation activity. We found that the decay of the autocorrelation function of the inter-event time interval between consecutive posts within a channel is described by a power-law

$$C(\Delta t) \sim (\Delta t)^{-\nu_\omega}, \tag{2}$$

with exponent ν_ω, where Δt denotes the lag. In addition, we applied the Detrended Fluctuation Analysis (DFA) technique18, described in detail in the Methods section, and found a Hurst exponent value, H_ω, which is well in agreement with the scaling relation ν_ω = 2 − 2H_ω. For a more detailed discussion of scaling relations and memory in time series, please refer to ref. 19.
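For readers unfamiliar with DFA, the following is a minimal first-order sketch in plain Python. It is a generic textbook variant written for illustration and not necessarily the implementation described in the Methods section; the window sizes in `scales` and the test series are arbitrary assumptions.

```python
import math
import random

def detrended_variance(window):
    """Mean squared residual of a least-squares line fitted to one window."""
    n = len(window)
    x_mean = (n - 1) / 2.0
    y_mean = sum(window) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(window))
    slope = sxy / sxx
    return sum((y - y_mean - slope * (i - x_mean)) ** 2
               for i, y in enumerate(window)) / n

def dfa_exponent(series, scales):
    """First-order DFA: slope of log F(s) versus log s."""
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for value in series:              # integrate the mean-subtracted series
        running += value - mean
        profile.append(running)
    log_s, log_f = [], []
    for s in scales:
        n_windows = len(profile) // s
        variances = [detrended_variance(profile[k * s:(k + 1) * s])
                     for k in range(n_windows)]
        log_s.append(math.log(s))
        log_f.append(math.log(math.sqrt(sum(variances) / n_windows)))
    s_mean = sum(log_s) / len(log_s)
    f_mean = sum(log_f) / len(log_f)
    num = sum((a - s_mean) * (b - f_mean) for a, b in zip(log_s, log_f))
    den = sum((a - s_mean) ** 2 for a in log_s)
    return num / den

rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h = dfa_exponent(white_noise, scales=[16, 32, 64, 128, 256])
print(f"H for an uncorrelated series: {h:.2f}")
```

An uncorrelated series should give a value near 0.5, whereas long-range correlated series give 0.5 < H < 1, the range in which the relation ν = 2 − 2H links the DFA exponent to the decay of the autocorrelation function.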

In conclusion, our analysis of user activities has revealed universal dynamics in online chatting communities which are, moreover, similar to other human activities. This regards (a) the temporal activity of individual users (characterized by a power-law distribution with exponent α ≈ 3/2) and (b) the inter-event dynamics across different channels, if rescaled by the average inter-event time (characterized by a stretched exponential distribution with just one fit parameter). We will use these findings as a point of departure for a more in-depth analysis, because the essence of online communication in chatrooms, as compared to other human activities, is obviously not yet covered. From the perspective of activity patterns, there is not much new here, which leads us to ask for other dimensions of human communication that could reveal a difference.

Emotional expression patterns

Human communication, in addition to the mere transmission of information, also serves purposes such as the reinforcement of social bonds. This could be one of the reasons why human languages are found to be biased towards using words with positive emotional charge20. Humans, from the early stages of our lives, develop an affective communication system that enables us to express and regulate emotions21. But emotions are also the mediators of our consumer responses to advertising22, and many scientists acknowledge their importance in motivating our cognition and action23. However, despite the increasing time we spend online, the way we express our emotions in online communities and its impact on possibly large numbers of people are still to be explored.

Consequently, we are interested in the role of expressed emotions in online chatting communities. Users, by posting text in chatrooms, also reveal their emotions, which in turn can influence the emotional response of other users, as illustrated in Fig. 1A. To understand this emotional interaction, we carry out a sentiment analysis of each post, which is described in detail in the Methods section. This automatic classification returns the valence v for each post, i.e. a discrete value in {−1, 0, +1} that characterizes the emotional charge as either negative, neutral, or positive.

Instead of using the real time stamp of each post as in the analysis of the user activity, we now use an artificial time scale in which at each (discrete) time step one post enters the discussion, so the number of time steps equals the total number of posts. We then monitor how the total emotion expressed in a given channel evolves over time. We use a moving average approach that calculates the mean emotional polarity over different time windows. In Fig. 2A we plot the fraction of neutral, negative and positive posts as a function of time, for different sizes of the time window. While it is obvious that the emotional content fluctuates strongly when using a very small time window, we find that for decreasing time resolution (i.e. increasing time window) the fractions of emotional posts settle down to an almost constant value around which they fluctuate. From this, we can make two interesting observations: (i) The emotional content in the online chats does not really change in the long run (one should notice that times of the order 10^3 are still large compared to the time window DT = 50 used), i.e. we observe fluctuations that depend on the time resolution, but no “evolution” towards more positive or negative sentiments. (ii) For the low resolution, the fraction of neutral posts dominates the positive and negative posts at all times. In fact, there is a clear ranking where the fraction of negative posts is always the smallest. Both observations become even more pronounced when averaging over the 20 IRC channels, as Fig. 2B shows.
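The windowed fractions can be sketched as follows. The sketch assumes a sliding window that advances one post at a time; whether the original analysis used overlapping or disjoint windows is not stated here, and the function name and toy sequence are hypothetical.

```python
from collections import Counter, deque

def moving_fractions(valences, window):
    """Fractions (negative, neutral, positive) over a sliding window of `window` posts."""
    counts = Counter()
    recent = deque()
    fractions = []
    for v in valences:
        recent.append(v)
        counts[v] += 1
        if len(recent) > window:          # drop the oldest post from the window
            counts[recent.popleft()] -= 1
        if len(recent) == window:
            fractions.append((counts[-1] / window,
                              counts[0] / window,
                              counts[1] / window))
    return fractions

# Toy sequence of post valences on the artificial (post-count) time scale.
posts = [1, 0, 0, -1, 1, 0, 1, 1]
for triple in moving_fractions(posts, window=4):
    print(triple)
```

Increasing `window` lowers the time resolution, which is why the curves smooth out towards nearly constant fractions for large windows.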

Figure 2 Emotional expressions over different time scales. A) Fraction of expressions with negative, neutral and positive emotion values under different time scales for one channel. B) Fraction of expressions with negative, neutral and positive emotion values for the 20 IRC channels.

Our findings differ from previous observations of emotional communication in blog posts and forum comments, which identified a clear tendency toward negative contributions over time, in particular for periods of intensive user activity24,25. Such findings suggest that an increased number of negative emotional posts could boost the activity and extend the lifetime of a forum discussion. However, blog communication in general evolves more slowly than, e.g., online chats. Hence, we need to better understand the role of emotions in real-time Internet communication, which obviously differs from the persistent and delayed interaction in blogs and fora.

To further approach this goal, we analyse to what extent the rather constant fraction of emotional posts in IRC channels is due to a persistence in the emotional expressions of users. For this, we apply the DFA technique18 to the time series of positive, negative and neutral posts. Since our focus is now on the user, we reconstruct for every user a time series that consists of all posts communicated in any channel, where the time stamp is given by the consecutive number at which the post enters the user's record. In order to have reliable statistics, only those users with more than 100 posts (nearly 3000 users) are considered in the further analysis. As the examples in Supplementary Figure S4 show, some users are very persistent in their (positive) emotional expressions (even though they occasionally switch to neutral or negative posts), whereas others are antipersistent in the sense that their expressed emotionality rapidly changes through all three states. The persistence of these users can be characterized by a scalar value, the Hurst exponent H (see the Methods section for details), which is 0.5 if users switch randomly between the emotional states, larger than 0.5 if users are rather persistent in their emotional expressions, or smaller than 0.5 if users have a strong tendency to switch between opposite states, as the antipersistent time series of Fig. S4 shows.

If we analyse the distribution of the Hurst exponents of all users, shown in the histogram of Fig. 3A, we find (a) that the emotional expression of users is far from being random and (b) that it is clearly skewed towards H > 0.5, which means that the majority of users are quite persistent regarding their positive, negative or neutral emotions. This persistence can also be seen as a kind of memory (or inertia) in changing the emotional expression, i.e. the following post from the same user is more likely to have the same emotional value.

Figure 3 Hurst exponents and emotional persistence. A) Hurst exponents (H) of the emotional expression of individual users, obtained using the DFA method. Only users who contributed more than 100 posts were considered, and we used the exponents obtained with fitting quality R^2 > 0.98. B) Hurst exponent (H) versus the mean emotion polarity expressed by individual users, again only from users who contributed more than 100 posts. C) Hurst exponents (H) of the emotions expressed in the 20 IRC channels. The values are averages of the Hurst exponents obtained from 10 different segments of the same channel, and the error bars show the standard deviation. The horizontal dashed line shows the expected value for random time series (H = 0.5), and the gray squares show the value obtained from shuffling the real time series to destroy any correlations. The difference in exponents of the real and the shuffled time series is statistically significant with p < 0.001.

The question whether persistent users express more positive or negative emotions is answered in Fig. 3B, where we show a scatter plot of H versus the mean value of the emotions expressed by each user. Again, we verify that the majority of users has H > 0.5, but we also see that the mean value of emotions expressed by the persistent users is largely positive. This corresponds to the general bias towards positive emotional expression detected in written expression20. The lower left quadrant of the scatter plot is almost empty, which means that users expressing on average negative emotions tend to be persistent as well. A possible interpretation for this could be the relation between negative personal experiences and rumination as discussed in psychology26. Antipersistent users, on the other hand, mostly switch between positive and neutral emotions.

Are the more active users also the emotionally persistent ones? In Supplementary Figure S6 we show a scatter plot of the Hurst exponent against the total activity of each user. Even though the mean value of H does not show any such dependence, we observe large heterogeneity in the values of H for users with low activity. Furthermore, in Supplementary Figure S7 we show that the Hurst exponent of a very active user varies only slightly if we divide the user's time series into various segments and apply the DFA method to these segments. Thus we can conclude that active users tend to be emotionally persistent and, as most persistent users express positive emotions, they tend to provide some kind of positive bias to the IRC, whereas users occasionally entering the chat may just try to get rid of some negative emotions.

This leads us to the question of how persistent the emotional bias of a whole discussion is. While Fig. 3A has shown the persistence with respect to the different users, Fig. 3C plots the persistence for the different channels, each of which features a very different topic. This persistence holds even if we analyse only certain segments of the channel, as shown in Supplementary Figure S8. So, we conclude that the persistence of the discussion per se (which is different from the persistence of the users, who can leave or enter at arbitrary times) reflects a certain narrative memory. Precisely, for each chat, we observe the emergence of a certain (emotional) “tone” in the narration which can be positive, negative or neutral, depending on the emotional expressions of the (majority of) persistent users. If we reshuffle these time series such that the same total number of positive, negative and neutral posts is kept, but temporal correlations are destroyed, then the persistence is lost as well, as Fig. 3C shows. We note that we could not find evidence of correlations using the autocorrelation function of the emotion time series, while the observed persistence in the fluctuations of user emotional expression, as captured by the Hurst exponent, is very robust. This indicates that the chat community assumes an emotional memory locally encoded in the current messages (from the user perspective), while the size of the conversation is too large to detect it through averaging techniques.
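The reshuffling test can be sketched in a few lines. The surrogate keeps the totals of negative, neutral and positive posts and removes their ordering. The `repeat_fraction` helper is only a crude stand-in used for this illustration (the analysis in the text relies on the Hurst exponent), and the toy series is deliberately artificial.

```python
import random
from collections import Counter

def shuffled_surrogate(series, rng):
    """Randomly permuted copy: same counts of -1, 0, +1, temporal order destroyed."""
    surrogate = list(series)
    rng.shuffle(surrogate)
    return surrogate

def repeat_fraction(series):
    """Fraction of consecutive posts with the same valence (crude persistence proxy)."""
    repeats = sum(1 for a, b in zip(series, series[1:]) if a == b)
    return repeats / (len(series) - 1)

rng = random.Random(7)
original = [1] * 200 + [-1] * 200            # artificially persistent toy series
surrogate = shuffled_surrogate(original, rng)
print(Counter(original) == Counter(surrogate))   # True: totals are preserved
print(repeat_fraction(original), repeat_fraction(surrogate))
```

Comparing any persistence measure on the original and on many such surrogates gives a baseline for what the counts alone, without temporal structure, would produce.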

An agent-based model for chatroom users

After identifying both the activity patterns and the emotional expression patterns of users in online chats, we set up an agent-based model that is able to reproduce these stylized facts. We start from a general framework27, designed to model and explain the emergence of collective emotions in online communities through the evolution of psychological variables that can be measured in experimental setups and psychological studies28,29. This framework provides a unified approach to create models that capture collective properties of different online communities and allows us to compare the different emotional microdynamics present in various types of communication. The case of IRC channel communication is of particular interest because of its fast and ephemeral nature. Thus, we have designed a model for IRC chatrooms, as shown in Fig. 4A. The agents in our model are characterized by two variables: their emotionality, or valence, v, which is either positive or negative, and their activity, or arousal, which is represented by the time interval τ between two posts s in the chatroom. The valence of an agent i, represented by the internal variable v_i, changes in time due to a superposition of stochastic and deterministic influences27,30:

$$\frac{dv_i}{dt} = -\gamma_v v_i + b\,(h_+ - h_-)\,v_i + A_v \xi_i \tag{3}$$

The stochastic influences are modeled as a random factor A_v ξ_i, normally distributed with zero mean and amplitude A_v, and represent all changes of the individual emotional state apart from chat communication. The deterministic influences are composed of an internal decay with parameter γ_v and an external influence of the conversation. The change in the valence caused by the emotionality of the field (h_+ − h_−) is measured in valence change per time unit through the parameter b. Previous models under the same framework27,31 had an additional saturation term in the equation of the valence dynamics. This way, the positive feedback between v and h was limited when the field was very large. But, as we show in Fig. 2, chatrooms do not show the extreme cases of emotional polarization observed in other communities. Thus, we simplify the dynamics of the valence without using any saturation terms, since a large imbalance between h_+ and h_− is unrealistic given our analysis of real IRC data.
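One way to integrate such valence dynamics numerically is an Euler-Maruyama step, sketched below. The sketch assumes the form dv/dt = −γ_v v + b (h_+ − h_−) v + A_v ξ with Gaussian white noise ξ; the noise scaling by the square root of the time step, the parameter values and the function name are assumptions of this illustration, not values taken from the paper.

```python
import math
import random

def valence_step(v, h_plus, h_minus, gamma_v, b, a_v, dt, rng):
    """One Euler-Maruyama step of the assumed valence equation."""
    drift = (-gamma_v + b * (h_plus - h_minus)) * v     # decay plus field coupling
    noise = a_v * math.sqrt(dt) * rng.gauss(0.0, 1.0)   # stochastic influence
    return v + drift * dt + noise

# Isolated agent (empty field) with hypothetical parameters.
rng = random.Random(3)
v = 0.1
for _ in range(1000):
    v = valence_step(v, h_plus=0.0, h_minus=0.0,
                     gamma_v=0.5, b=0.1, a_v=0.3, dt=0.01, rng=rng)
print(f"valence after 10 time units: {v:.3f}")
```

With an empty field the valence simply relaxes towards zero and fluctuates around it; a positive field imbalance weakens the decay for any agent, which is the feedback that saturation terms would otherwise limit.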

Figure 4 Modeling schema and simulation results. A) Schematic representation of the model: The horizontal layer represents the agent, the vertical layer the communication in the chatroom where posts are aggregated. After a time lapse τ, which follows the power-law distribution of Fig. 1B, the agent writes a post s which implicitly expresses its emotions, v. Posts read in the chatroom feed back on the emotional state v of the agent. B) Hurst exponents for the individual behavior of agents in isolation with A_v ∈ [0.2, 0.5] and γ_v ∈ [0.2, 0.5]. Only the exponents derived with fitting quality R^2 > 0.9 are considered. C) Scaled probability distribution of the time interval ω′ between consecutive posts in 10 simulations of the model. The stretched exponential fit shows similar behavior to real IRC channel data.

In general, the level of activity associated with the emotion, known as arousal, can be explicitly modeled by stochastic dynamics as well31. Here, the activity of an agent is estimated by the time-delay distribution that triggers the expression of the agent, i.e. by the power-law distribution P(τ) ~ τ^(−1.53) shown in Fig. 1B. Assuming that an agent becomes active and expresses its emotion at time t, it will become active again after a period τ. The agent then writes a post in the online chat, the emotional content of which is determined by its valence (see below). This information is stored in an external field common to all agents, which is composed of two components, h_− and h_+, for negative and positive information; their difference measures the emotional charge of the communication activity. Since we are interested in emotional communication, we assume that all neutral posts entered, or already present, in a chatroom do not influence the emotions of the agents participating in the conversation. Thus, the dynamics of the field is influenced only by the number of agents expressing a particular emotion at a given time: N_+(t) = Σ_i (1 − Θ(−s_i)) and N_−(t) = Σ_i (1 − Θ(s_i)), where Θ is the Heaviside step function (with the convention Θ(0) = 1, so that neutral posts with s_i = 0 are not counted). Therefore, the time dynamics of the fields can be described as:

$$\frac{dh_\pm}{dt} = -\gamma_h h_\pm + c\,N_\pm(t) \tag{4}$$

These two field components, h_+ and h_−, decay exponentially with a constant rate γ_h, i.e. their importance decays very fast as they move further down the screen (posts never disappear, but become less influential). Each field increases by a fixed amount c for every post stored in it. The values of the valence of the agents are changed by the field components, as described by Eq. 3. In contrast with traditional means of communication, online social media can aggregate much larger volumes of user-generated information. This is why h is defined without explicit bounds. Chatrooms are a special case of this kind of communication, as they can contain a large number of posts but a limited number of users. Most IRC channels have technical limitations on the number of users that can be connected at once, which in turn is reflected in the total number of posts present in the general discussion. In our model, h might take any value, but the empirical activity pattern combined with the fixed size of the community dynamically constrains it to limited values.
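The statement that the field stays bounded without explicit bounds can be illustrated with a small sketch. It treats each post as an instantaneous increment of size c and uses the exact exponential decay between posts. The regular posting interval and all parameter values are hypothetical and chosen only to make the steady state easy to check.

```python
import math

def decay(h, gamma_h, elapsed):
    """Exact solution of dh/dt = -gamma_h * h over a post-free interval."""
    return h * math.exp(-gamma_h * elapsed)

def deposit(h, c, n_posts=1):
    """Each post stored in the field raises it by a fixed amount c."""
    return h + c * n_posts

gamma_h, c, gap = 0.5, 1.0, 1.0    # hypothetical values
h_plus = 0.0
for _ in range(200):                # one positive post every `gap` time units
    h_plus = deposit(decay(h_plus, gamma_h, gap), c)

steady_state = c / (1.0 - math.exp(-gamma_h * gap))
print(h_plus, steady_state)
```

For a fixed posting rate the field converges to c / (1 − exp(−γ_h Δ)), with Δ the interval between posts, so a limited number of users posting at a limited rate keeps h finite.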

Whenever an agent creates a new post in an ongoing conversation, the variable s_i obtains its value in the following way:

$$s_i = \begin{cases} -1 & \text{if } v_i < V_- \\ +1 & \text{if } v_i > V_+ \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

The thresholds V_− and V_+ represent limit values of the valence that determine the emotional content of each post and in general can be asymmetric, as humans tend to have different thresholds for the triggering of positive and negative emotional expression. Each action contributes to the amount of information stored in the information field of the conversation, increasing h_− if s = −1 or h_+ if s = +1.
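The threshold rule and the resulting post counts can be sketched as follows. The strict inequalities at the thresholds, the threshold values and the function names are assumptions of this illustration.

```python
def post_sign(v, v_minus, v_plus):
    """Map a valence to the emotional sign of a post: -1, 0 or +1."""
    if v < v_minus:
        return -1
    if v > v_plus:
        return 1
    return 0

def count_emotional_posts(signs):
    """Return (N_plus, N_minus); neutral posts contribute to neither field."""
    n_plus = sum(1 for s in signs if s == 1)
    n_minus = sum(1 for s in signs if s == -1)
    return n_plus, n_minus

# Hypothetical asymmetric thresholds and valences of five posting agents.
valences = [-0.8, -0.2, 0.0, 0.4, 0.9]
signs = [post_sign(v, v_minus=-0.5, v_plus=0.3) for v in valences]
print(signs)                         # [-1, 0, 0, 1, 1]
print(count_emotional_posts(signs))  # (2, 1)
```

Asymmetric thresholds shift how readily positive versus negative posts are triggered for the same distribution of valences.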

We emphasize that the way we model the agent behavior is very much in line with psychological research, where emotional states are represented by valence and arousal, following the dimensional representation of core affect32. The valence, v, represents the level of pleasure experienced in the emotional state, while the arousal represents the degree of activity induced by the emotional state and determines the moment when posts are created. The agent's valence continuously relaxes to a neutral state and is subject to stochastic influences, as shown empirically in ref. 33. The effect of chatroom communication on an agent's emotionality is modeled as an empathy-driven process34 that influences the valence. In the valence dynamics we propose in Eq. 3, agents perceive a positive influence when their emotional state matches that of the community and a negative one in the opposite case. When a post is created, its emotional polarity is determined by the valence, as suggested by experimental studies on social sharing of emotions26,35.

All the assumptions of our model are supported by psychological theories. Parameter values and dynamical equations can be tested against experiments in psychology, providing empirical validation for the emotional microdynamics28,29. Furthermore, our model provides a consistent view of the emotional behavior in chatrooms leading to testable hypotheses that can drive future psychology research.

We performed extensive computer simulations using different parameter sets (see supplementary material for details). By exploring the parameter space, we identified which parameter sets lead to conversation patterns similar to those observed in the real data. We used such a set to simulate chats in 10 channels and analysed the agents' activity and their emotional persistence. The results are shown in Fig. 4B, C. Specifically, we find that (a) the distribution of Hurst exponents for individual agents is shifted towards positive values, similar to the one observed in real data, thus reproducing the emotional persistence of the conversation without assuming any time dependence between user expressions. Further, we reproduce (b) the empirically observed stretched exponential distribution for the rescaled time delays ω′ between consecutive posts, without any further assumptions.

We do note, however, that the stretched exponent of the simulated distribution, γ = 0.59 (p < 0.001), is different from that of real IRC channels, where it was γ = 0.21, i.e. there is a faster decay in the simulations. This could be explained by the fact that in the real chat users usually write after they have read the previous post, i.e. there are additional correlations in the times users enter a chat. These, however, are not considered in the simulations, because agents post in the chat at random after a given time interval τ, i.e. there is no additional coupling in posting times. Following the same approach as for the real data, we calculated the Hurst exponent of the simulated inter-event time series of the discussions. We found H_ω′ = 0.75; however, we did not observe a power-law decay of the autocorrelation function (see Supplementary Figure S12). This suggests that the observed correlations are due to the power-law distributed inter-event times used as input to our model, and it is in line with the above discussion about the absence of coupling, which also explains the difference in the stretched exponents.
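Checking for a power-law decay of the autocorrelation function requires the sample autocorrelation at increasing lags. A minimal sketch with a standard (biased) estimator is given below; the estimator choice, the function name and the toy series are assumptions of this illustration.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation C(lag) for lag = 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    acf = []
    for lag in range(max_lag + 1):
        cov = sum((series[i] - mean) * (series[i + lag] - mean)
                  for i in range(n - lag)) / n
        acf.append(cov / variance)
    return acf

# Toy series that flips sign at every step, i.e. strongly anti-correlated at lag 1.
alternating = [1.0, -1.0] * 50
acf = autocorrelation(alternating, max_lag=3)
print(acf)
```

Plotting such values against the lag on doubly logarithmic axes shows whether the decay is compatible with a power law; for heavy-tailed inter-event times the estimate can be noisy, which is one reason to complement it with DFA.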