Two important considerations emerge from the analysis of Fig. 2: (i) positive tweets spread more broadly than neutral ones and collect more favorites but, interestingly, negative posts do not spread any more or less than neutral ones, nor are they favorited any more or less. This suggests a positivity bias (Garcia, Garas & Schweitzer, 2012) (or Pollyanna hypothesis (Boucher & Osgood, 1969)), that is, the tendency of individuals to favor positive rather than neutral or negative items, and to choose what information to favor or rebroadcast according to this bias. (ii) Negative content spreads much faster than positive content, albeit not significantly faster than neutral content. This suggests that positive tweets require more time to be rebroadcast, while negative or neutral posts generally achieve their first retweet twice as fast. Interestingly, previous studies on information cascades showed that all retweets after the first take increasingly less time, which means that popular content benefits from a feedback loop that speeds up the diffusion more and more as a consequence of the increasing popularity (Kwak et al., 2010).

Figure 2: (A) The average number of retweets, (B) the average number of favorites, and (C) the average number of seconds passed before the first retweet, as a function of the polarity score of the given tweet. The numbers on the points represent the number of tweets with that polarity score in our sample. Bars represent standard errors.

Here we are concerned with studying the relation between content sentiment and information diffusion. Figure 2 shows the effect of content sentiment on the information diffusion dynamics and on content popularity. We measure three aspects of information diffusion as a function of the tweets' polarity scores: Fig. 2A shows the average number of retweets collected by the original posts as a function of the polarity expressed therein; similarly, Fig. 2B shows the average number of times the original tweet has been favorited; Fig. 2C illustrates the speed of information diffusion, as reflected by the average number of seconds that elapse between the original tweet and the first retweet. Both Figs. 2A and 2C focus only on tweets that have been retweeted at least once. Figure 2B considers only tweets that have been favorited at least once. Note that a large fraction of tweets are never retweeted (79.01% in our dataset) or favorited (87.68%): Fig. 2A is based on the 4,147,519 tweets that have been retweeted at least once (RT ≥ 1), Fig. 2B reports on the 2,434,523 tweets that have been favorited at least once, and Fig. 2C comprises the 1,619,195 tweets for which we have observed the first retweet in our dataset (so that we can compute the time between the original tweet and the first retweet). Note that the retweet count is extracted from the tweet metadata, instead of being calculated as the number of times we observe a retweet of each tweet in our dataset, in order to avoid the bias due to the sampling rate of the Twitter gardenhose. For this reason, the average number of retweets reported in Fig. 2A appears rather high (above 100 for all classes of polarity scores): by capturing the “true” number of retweets we reflect the known broad distributions of content popularity on social media, which skew the means toward larger values. The very same reasoning applies to the number of favorites.
Due to the high skewness of the distributions of the number of retweets, the number of favorites, and the time before the first retweet, we performed the same analysis as above on median values rather than averages. The same trends hold true: notably, the average and median numbers of seconds before the first retweet are substantially identical. The results for the average and median numbers of retweets and favorites are also comparable, apart from some small fluctuations.
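The per-polarity statistics described above (mean with standard error, plus the median used as a robustness check) can be sketched as follows. This is a minimal illustration and not the code used in the study: the function name `summarize_by_polarity`, the `(polarity_score, retweet_count)` input layout, and the toy data are all assumptions made for the example.

```python
import math
import statistics
from collections import defaultdict


def summarize_by_polarity(tweets):
    """Return {polarity: {"n", "mean", "se", "median"}} for retweeted tweets.

    `tweets` is an iterable of (polarity_score, retweet_count) pairs
    (a hypothetical layout). Tweets with zero retweets are dropped,
    mirroring the RT >= 1 filter described for Fig. 2A.
    """
    groups = defaultdict(list)
    for polarity, retweets in tweets:
        if retweets >= 1:
            groups[polarity].append(retweets)

    summary = {}
    for polarity, counts in sorted(groups.items()):
        n = len(counts)
        # Standard error of the mean: sample standard deviation / sqrt(n).
        # It is undefined for a single observation, so NaN is reported.
        se = statistics.stdev(counts) / math.sqrt(n) if n > 1 else float("nan")
        summary[polarity] = {
            "n": n,
            "mean": statistics.fmean(counts),
            "se": se,
            "median": statistics.median(counts),
        }
    return summary


# Toy data, invented for illustration only.
toy = [(1, 2), (1, 4), (1, 300), (0, 1), (0, 3), (-1, 5), (-1, 0)]
result = summarize_by_polarity(toy)
```

In the toy group with polarity 1, a single highly retweeted post pulls the mean far above the median, which is the kind of skew that motivates repeating the analysis on medians.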

Conversations’ dynamics and sentiment evolution

To investigate how sentiment correlates with content popularity, we now consider only active and exclusive discussions that occurred on Twitter in September 2014. Each topic of discussion is here identified by its most common hashtag. Active discussions are defined as those with more than 200 tweets (in our dataset, which is roughly a 10% sample of the public tweets), and exclusive ones are defined as those whose hashtag never appeared in the previous (August 2014) or the following (October 2014) month.
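The two selection criteria can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name `active_exclusive_hashtags`, the representation of each month as a flat list with one hashtag entry per tweet, and the toy data are hypothetical and not taken from the study.

```python
from collections import Counter


def active_exclusive_hashtags(current, previous, following, min_tweets=200):
    """Return the set of hashtags that are both active and exclusive.

    Each argument is an iterable of hashtags, one entry per tweet
    (hypothetical layout). A hashtag is active if it occurs in MORE than
    `min_tweets` tweets of the current month, and exclusive if it never
    occurs in the previous or the following month.
    """
    counts = Counter(current)
    seen_elsewhere = set(previous) | set(following)
    return {
        tag
        for tag, n in counts.items()
        if n > min_tweets and tag not in seen_elsewhere
    }


# Toy data, invented for illustration only.
september = ["#a"] * 201 + ["#b"] * 200 + ["#c"] * 500
august = ["#c"]
october = []
selected = active_exclusive_hashtags(september, august, october)
```

Here `#b` fails the activity threshold (exactly 200 tweets is not "more than 200") and `#c` fails exclusivity because it already appeared in the previous month.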

Inspired by previous studies that aimed at finding how many different types of conversations occur on Twitter (Kwak et al., 2010; Lehmann et al., 2012), we characterize our discussions according to three features: the proportion p_b of tweets produced within the conversation before its peak, the proportion p_d of tweets produced during the peak, and the proportion p_a of tweets produced after the peak. The peak of popularity of the conversation is simply the day that exhibits the maximum number of tweets with the given hashtag. We use the Expectation Maximization (EM) algorithm to learn an optimal Gaussian Mixture Model (GMM) in the (p_b, p_a) space. To determine the appropriate number of components (i.e., the number of types of conversations), we adopt three GMM models (spherical, diagonal, and full) and perform a 5-fold cross-validation using the Bayesian Information Criterion (BIC) as the quality measure. We vary the number of components from 1 to 6. Figure 3B shows the BIC scores for different numbers of mixture components: the lower the BIC score, the better. The outcome of this process determines that the optimal number of components is four, in agreement with previous studies (Lehmann et al., 2012), as best captured by the full GMM model. In Fig. 3A we show the optimal GMM that identifies the four classes of conversation: the two dimensions represent the proportion p_b of tweets occurring before (y axis) and p_a after (x axis) the peak of popularity of each conversation.
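As an illustration of the three features, the following sketch computes the proportions of tweets before, during, and after the peak day from a list of daily tweet counts. The function name `peak_proportions` and the input layout are assumptions made for the example; ties between equally busy days are resolved by taking the earliest one, a choice the text does not specify.

```python
def peak_proportions(daily_counts):
    """Return (p_b, p_d, p_a) for one conversation.

    `daily_counts` lists the number of tweets per day in chronological
    order (hypothetical layout). The peak is the day with the maximum
    count; on ties the earliest such day is used (an assumption).
    """
    total = sum(daily_counts)
    if total == 0:
        raise ValueError("conversation has no tweets")
    peak_day = max(range(len(daily_counts)), key=daily_counts.__getitem__)
    before = sum(daily_counts[:peak_day])
    during = daily_counts[peak_day]
    after = sum(daily_counts[peak_day + 1:])
    return before / total, during / total, after / total


# Toy conversation, invented for illustration only.
p_b, p_d, p_a = peak_proportions([10, 20, 50, 15, 5])
```

Because the three proportions sum to one, the pair (p_b, p_a) is enough to place a conversation in the two-dimensional space in which the mixture model is fitted.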

Figure 3: Dynamical classes of popularity capturing four different types of Twitter conversations. (A) shows the Gaussian Mixture Model employed to discover the four classes. The y and x axes represent, respectively, the proportion of tweets occurring before and after the peak of popularity of a given discussion. Different colors represent different classes: anticipatory discussions (blue dots), unexpected events (green), symmetric discussions (red), transient events (black). (B) shows the BIC scores for different numbers of mixture components for the GMM (the lower the BIC, the better the GMM captures the data). The star identifies the optimal number of mixture components, four, best captured by the full model.
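The BIC-based comparison of mixture models can be sketched as follows, using the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of data points, and L the maximized likelihood. The sketch assumes the log-likelihoods have already been obtained from fitted models; it does not perform the EM fitting or the cross-validation described in the text. All function names are hypothetical, and the log-likelihood values in the usage example are invented and are not results from the study.

```python
import math


def gmm_free_parameters(n_components, n_dims, covariance):
    """Count the free parameters of a Gaussian mixture model."""
    weights = n_components - 1          # mixing weights sum to one
    means = n_components * n_dims
    if covariance == "spherical":       # one variance per component
        cov = n_components
    elif covariance == "diagonal":      # one variance per dimension
        cov = n_components * n_dims
    elif covariance == "full":          # symmetric matrix per component
        cov = n_components * n_dims * (n_dims + 1) // 2
    else:
        raise ValueError(f"unknown covariance type: {covariance}")
    return weights + means + cov


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion; lower values are better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def select_model(candidates, n_samples, n_dims=2):
    """Pick the (covariance, n_components) pair with the lowest BIC.

    `candidates` maps (covariance, n_components) to the total maximized
    log-likelihood of the corresponding fitted model.
    """
    scores = {
        key: bic(ll, gmm_free_parameters(key[1], n_dims, key[0]), n_samples)
        for key, ll in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Invented log-likelihoods, for illustration only (not the paper's values).
toy_candidates = {
    ("full", 3): 2900.0,
    ("full", 4): 3100.0,
    ("full", 5): 3110.0,
    ("diagonal", 4): 3000.0,
}
best, scores = select_model(toy_candidates, n_samples=1000)
```

In the toy example, moving from four to five full-covariance components raises the log-likelihood only slightly, so the penalty for the six extra parameters outweighs the gain and the four-component model obtains the lowest score.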

The four classes correspond to: (i) anticipatory discussions (blue dots), (ii) unexpected events (green), (iii) symmetric discussions (red), and (iv) transient events (black). Anticipatory conversations (blue) exhibit most of the activity before and during the peak. These discussions build up over time, registering an anticipatory behavior of the audience, and quickly fade out after the peak. The complementary behavior is exhibited by discussions around unexpected events (green dots): the peak is reached suddenly as a reaction to some exogenous event, and the discussion quickly decays afterwards. Symmetric discussions (red dots) are characterized by a balanced number of tweets produced before, during, and after the peak time. Finally, transient discussions (black dots) are typically bursty but short events that gather a lot of attention, yet immediately fade away afterwards. According to this classification, out of 1,522 active and exclusive conversations (hashtags) observed in September 2014, we obtained 64 hashtags of class A (anticipatory), 156 of class B (unexpected), 56 of class C (symmetric), and 1,246 of class D (transient). Figure 4 shows examples representing the four dynamical classes of conversations registered in our dataset. The conversation lengths are all set to 7 days, centered at the peak day (time window 0).

Figure 4: Examples of the four types of Twitter conversations reflecting the respective dynamical classes in our dataset. (A) shows one example of an anticipatory discussion (#TENNvsOU); (B) an unexpected event (#MileyPor40Principales); (C) a symmetric discussion (#PrayForRise); and (D) a transient event (#KDWBmeetEd).

Figure 4A represents an example of an anticipatory discussion: the event captured (#TENNvsOU) is the football game Tennessee Volunteers vs. Oklahoma Sooners of Sept. 13, 2014. The anticipatory nature of the discussion is captured by the increasing number of tweets generated before the peak (time window 0) and by the drastic drop afterwards. Figure 4B shows an example (#MileyPor40Principales) of a discussion around an unexpected event, namely the release by Los 40 Principales of an exclusive interview with Miley Cyrus, on Sept. 10, 2014. There is no activity before the peak, which is reached immediately on the day of the news release, and after that the volume of discussion decreases rapidly. Figure 4C represents the discussion of a symmetric event: #PrayForRise was a hashtag adopted to support RiSe, the singer of the K-pop band Ladies’ Code, who was involved in a car accident that eventually caused her death. The symmetric activity of the discussion perfectly reflects the events: the discussion starts on the day of the accident, September 3, 2014, and peaks on the day of RiSe’s death (four days after the accident, on September 7, 2014), but the fans’ conversation stays alive to commemorate her for several days afterwards. Lastly, Fig. 4D shows one example (#KDWBmeetEd) of a transient event, namely the radio station KDWB announcing a lottery drawing for tickets to Ed Sheeran’s concert, on Sept. 15, 2014. The hype is momentary and the discussion fades away immediately after the lottery is concluded.

Figure 5 shows the evolution of sentiment for the four classes of Twitter conversations: it is useful to recall the average proportions of neutral (42.46%), positive (35.95%), and negative (21.59%) sentiments in our dataset, to compare them against the distributions for popular discussions. It is also worth noting that, although each discussion is hard-assigned to a class (anticipatory, unexpected, symmetric, or transient), spurious content might sometimes appear before or after the peak, causing the presence of a small number of tweets where ideally we would not expect any (for example, some tweets appear after the peak of an anticipatory discussion). We grayed out the bars in Figs. 5A, 5B and 5D to represent non-significant numbers of tweets that are present only as a byproduct of averaging across all conversations belonging to each specific class. These intervals therefore do not convey any statistically significant information and are disregarded. (A) For anticipatory events, the amount of positive sentiment grows steadily until the peak time, while the negative sentiment is roughly constant throughout the entire anticipatory phase. Notably, the amount of negative content is much below the dataset average, fluctuating between 9% and 12% (about half of the dataset average), while the positive content is well above average, ranging between 40% and 44%. This suggests that, in general, anticipatory popular conversations are emotionally positive. (B) The class of unexpected events intuitively carries more negative sentiment, which stays constant throughout the entire discussion period at the level of the dataset average. (C) Symmetric popular discussions are characterized by steadily decreasing negative emotions, which go from about 23% (above the dataset’s average) at the inception of the discussions to around 12% toward the end of the conversations.
The complementary behavior is observed for positive emotions, which start around 35% (about equal to the dataset average) and steadily grow up to 45% toward the end. This suggests that in symmetric conversations there is a general shift of emotions toward positiveness over time. (D) Finally, transient events, due to their short-lived nature, are closer to the average discussion, although they exhibit lower levels of negative sentiment (around 15%) and higher levels of positive sentiment (around 40%) with respect to the dataset’s averages.
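The per-window sentiment proportions discussed above can be sketched as follows for a 7-day window centered at the peak day. This is an illustration under stated assumptions rather than the procedure used in the study: the function name, the `(day_offset, label)` input layout, and the decision to pool all tweets of a class (instead of averaging per-conversation proportions) are choices made for the example, and the toy data are invented.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")


def sentiment_shares_by_window(tweets, half_width=3):
    """Return {day_offset: {label: share}} around the peak day.

    `tweets` is an iterable of (day_offset, label) pairs (hypothetical
    layout), where day_offset is the tweet's day relative to the peak of
    its conversation (0 = peak day) and label is one of LABELS. Only
    offsets in [-half_width, half_width] are kept; half_width=3 gives the
    7-day window used in the text. Offsets with no tweets are omitted.
    """
    counts = defaultdict(Counter)
    for offset, label in tweets:
        if -half_width <= offset <= half_width:
            counts[offset][label] += 1

    shares = {}
    for offset in sorted(counts):
        total = sum(counts[offset].values())
        shares[offset] = {lab: counts[offset][lab] / total for lab in LABELS}
    return shares


# Toy data, invented for illustration only.
toy_tweets = [
    (-1, "positive"), (-1, "neutral"),
    (0, "positive"), (0, "positive"), (0, "negative"), (0, "neutral"),
    (5, "negative"),  # outside the 7-day window, ignored
]
shares = sentiment_shares_by_window(toy_tweets)
```

Each window's shares can then be compared with the dataset-wide proportions quoted in the text (42.46% neutral, 35.95% positive, 21.59% negative).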