Abstract Happiness and other emotions spread between people in direct contact, but it is unclear whether massive online social networks also contribute to this spread. Here, we elaborate a novel method for measuring the contagion of emotional expression. With data from millions of Facebook users, we show that rainfall directly influences the emotional content of their status messages, and it also affects the status messages of friends in other cities who are not experiencing rainfall. For every one person affected directly, rainfall alters the emotional expression of about one to two other people, suggesting that online social networks may magnify the intensity of global emotional synchrony.

Citation: Coviello L, Sohn Y, Kramer ADI, Marlow C, Franceschetti M, Christakis NA, et al. (2014) Detecting Emotional Contagion in Massive Social Networks. PLoS ONE 9(3): e90315. https://doi.org/10.1371/journal.pone.0090315
Editor: Renaud Lambiotte, University of Namur, Belgium
Received: September 20, 2013; Accepted: January 29, 2014; Published: March 12, 2014
Copyright: © 2014 Coviello et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was partially supported by Army Research Office Grant W911NF-11-1-0363, and a grant from the James S. McDonnell Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: Two of the authors, Adam D. I. Kramer and Cameron Marlow, were employees of Facebook Inc. at the time this research was conducted. This does not alter the authors' adherence to all PLOS ONE policies on sharing data and materials. The official data sharing policy at Facebook is that they will work with researchers who want to replicate published findings.

Introduction Happiness and other emotions have recently been an important focus of attention in a wide range of disciplines, including psychology, economics, and neuroscience [1], [2], [3], [4]. Some of this work suggests that emotional states can be transferred directly from one individual to another via mimicry and the copying of emotionally-relevant bodily actions like facial expressions [5]. Experiments have demonstrated that people can “catch” emotional states they observe in others over time frames ranging from seconds to months [6], [7], and the possibility of emotional contagion between strangers, even those in ephemeral contact, has been documented by the effects of “service with a smile” on customer satisfaction and tipping [8]. Longitudinal data from face-to-face social networks has established that emotions as diverse as happiness [9], loneliness [10], and depression [11] are correlated between socially-connected individuals, and related work suggests that these correlations also exist online [4], [12], [13], [14], [15]. However, it is difficult to ascertain whether correlations in observational studies result from influencing the emotions of social contacts (contagion) or from choosing social contacts with similar emotions (homophily) [16]. This problem has been addressed by using experimental methods to estimate network effects [17], [18], [19], [20], [21], [22], but these methods are either limited in scale and external validity, or they require very close collaboration with private companies, which means there are limited opportunities to conduct such experiments. Moreover, even when companies are willing to conduct a large-scale experiment, they may have other goals that constrain its design. For example, they may wish to provide a uniform online experience to all users, which reduces their willingness to create experimental treatment groups of sufficient size to take advantage of their massive scale. 
Here, we propose an alternative method for detecting emotional contagion in massive social networks that is based on instrumental variables regression, a technique pioneered in economics [23]. In an experiment, we would directly control each user's emotional expression to see what impact it has on their friends' emotional expression. However, since this is infeasible in our massive-scale setting, we identify a source of variation that directly affects the users' emotional expression (this variable is called an “instrument”). For this instrument, we use rainfall. Importantly, rainfall is unlikely to be causally affected by human emotional states, so if we find a relationship it suggests that rainfall influences emotional expression and not vice versa. We then measure whether or not the changes induced by the instrument predict changes in the friends' emotional expression. Instead of changing the user's emotion directly with an experimental treatment, we let rainfall do the work for us by measuring how much the rain-induced change in a user's expression predicts changes in the user's friends' expression. More formally, suppose we can represent one person's emotional expression mathematically as follows:

$$ y_{jt} = \theta_t + f_j + \beta x_{jt} + \frac{\gamma}{d_{jt}} \sum_i a_{ijt}\, y_{it} + \varepsilon_{jt} \tag{1} $$

This model assumes that emotional expression $y_{jt}$ by person $j$ at time $t$ is an additive linear function of other factors measured in the same time period, including a time-specific factor $\theta_t$ (perhaps it is a holiday); an individual-specific factor $f_j$ (some people are always happier than others); the effect $\beta$ of an exogenous factor $x_{jt}$ (like rainfall); the effect $\gamma$ of an endogenous factor $y_{it}$ (the emotional expression of each friend $i$ at time $t$), which is moderated by the strength of relationship $a_{ijt}$ between each friend $i$ and person $j$ at time $t$ (for simplicity, we assume this is binary: a relationship either exists or it does not) and by the degree $d_{jt}$ of person $j$ (a person with more friends is assumed to be less influenced by each); and an error term $\varepsilon_{jt}$.
The key variable allowing us to estimate contagion in emotional expression is $\gamma$, and our inclusion of the individual-fixed effect $f_j$ means that we are controlling for all possible characteristics of the person, which further reduces the likelihood that correlation in emotions is driven by choice of social connections (homophily). Although this model seems straightforward to estimate, it is not. The reciprocal influence of a user on her friend and vice versa makes it difficult to interpret a simple association in their emotional states. Moreover, in large populations, it would be computationally expensive to apply the model to longitudinal content generated by millions of users with billions of friends over thousands of days. We address the problem of computational cost by aggregating individuals into groups. In the supporting information (Text S1), we show that when a subpopulation of individuals experiences the same exogenous factor (such as rainfall affecting individuals who are in the same city), equation (1) is equivalent to

$$ y_{gt} = \theta_t + c_g + \beta x_{gt} + \gamma Y_{gt} + \varepsilon_{gt} \tag{2} $$

where for time $t$, $y_{gt}$ is the average emotion of all people in subpopulation (city) $g$; $\theta_t$ and $c_g$ are time and subpopulation fixed effects; $x_{gt}$ is the average exogenous factor (rainfall) for people in subpopulation $g$; $Y_{gt}$ is a weighted average emotional expression of friends of people in subpopulation $g$; and $\varepsilon_{gt}$ is an error term (see Text S1 for derivation). Notice that we can use equation (2) to estimate the social contagion effect $\gamma$ that appears in equation (1). However, $y_{gt}$ and $Y_{gt}$ are still endogenous, so prior to estimation we use an instrument $X_{gt}$, the aggregated rainfall of the friends of the people in subpopulation $g$, to predict exogenous variation in the friends' emotional expression $Y_{gt}$:

$$ Y_{gt} = \theta'_t + c'_g + \beta' x_{gt} + \delta X_{gt} + \varepsilon'_{gt} \tag{3} $$

where primes mark the first-stage counterparts of the terms in equation (2) and $\delta$ is the coefficient on the instrument. Consistent with standard recommendations regarding instrumental variable regression [23], we include in the “first stage” equation (3) all other exogenous explanatory variables in the “second stage” equation (2).
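The two-stage procedure built from equations (2) and (3) can be sketched on synthetic data. This is a minimal illustration, not the authors' code: all function and variable names are hypothetical, the panel is simulated rather than drawn from Facebook, and the sketch assumes the dry-day restriction (own-city rainfall equal to zero), so the own-rainfall term drops out. City and day fixed effects are removed by two-way demeaning, which is valid only because the simulated panel is balanced.

```python
import numpy as np

def two_way_demean(a):
    """Remove city (row) and day (column) fixed effects from a balanced G x T panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

def slope(x, y):
    """OLS slope of y on x for 1-D arrays that are already demeaned."""
    return float(x @ y / (x @ x))

def two_stage_least_squares(y, Y, X):
    """Estimate gamma in y = FE + gamma * Y + eps, using X as the instrument for Y."""
    y_d, Y_d, X_d = (two_way_demean(a).ravel() for a in (y, Y, X))
    delta_hat = slope(X_d, Y_d)      # first stage: friends' emotion on friends' rainfall
    Y_hat = delta_hat * X_d          # predicted (exogenous) part of friends' emotion
    gamma_iv = slope(Y_hat, y_d)     # second stage: own emotion on predicted friends' emotion
    gamma_ols = slope(Y_d, y_d)      # naive estimate, for comparison only
    return gamma_iv, gamma_ols, delta_hat

# Synthetic panel (assumed values): 50 cities, 400 days.
rng = np.random.default_rng(0)
G, T = 50, 400
gamma_true, delta_true = 0.5, 1.0
theta = rng.normal(size=(1, T))      # day fixed effects
c = rng.normal(size=(G, 1))          # city fixed effects
X = rng.normal(size=(G, T))          # instrument: friends' rainfall
u = rng.normal(size=(G, T))          # shock shared by Y and y (source of endogeneity)
eps = 0.8 * u + 0.5 * rng.normal(size=(G, T))
Y = theta + c + delta_true * X + u   # friends' emotional expression
y = theta + c + gamma_true * Y + eps # own emotional expression

gamma_iv, gamma_ols, delta_hat = two_stage_least_squares(y, Y, X)
print(f"IV estimate of gamma:  {gamma_iv:.3f}")
print(f"OLS estimate of gamma: {gamma_ols:.3f}")
```

In this simulation the naive regression absorbs the shared shock and overstates the contagion effect, while the instrumented estimate stays near the value used to generate the data. The simulated endogeneity is a simple shared shock, which is a simplification of the reciprocal influence between friends described in the text.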
Thus, we are estimating the effect of rainfall on average emotion while controlling for time and city fixed effects and for rainfall in all other cities. This mitigates problems that may arise from autocorrelation in weather over time and between nearby cities. We then use predicted values from equation (3) to substitute for the value of $Y_{gt}$ in equation (2) to estimate the social contagion effect $\gamma$. This instrumental variables approach effectively addresses the problem of endogeneity (in our case, the fact that two friends influence one another) [23].

One worry in a model like this is that friends' rainfall is correlated, so the instrument might actually just be a proxy for the direct effect of rainfall on a person's emotional expression (a violation of the “exclusion restriction” [23]). Therefore, to break any possible correlation between rainfall $x_{gt}$ in city $g$ and the rainfall $X_{gt}$ of their friends, we only consider how emotional expression is transmitted on days when it is not raining directly on city $g$ (that is, we only include observations for which $x_{gt} = 0$, see Text S1). Then, in a separate model, we consider only days when it is raining in city $g$.

Another worry is that there is an “ecological fallacy” in this model since we are using city-level measures to estimate individual-level effects. In Text S1 we mathematically formalize the relationship between the individual and aggregate level models to show there is no problem in our case, but here we explain in words. The ecological fallacy occurs when there are opposing effects of individual-level and aggregate-level variation. For example, Robinson showed that U.S. states with more immigrants had higher literacy rates (perhaps because literate state populations were more tolerant of immigration), even though immigrants were less likely to be literate (perhaps because they had not yet learned English) [24].
However, a key factor that reduces the likelihood this is a problem in our model is that people in a city usually all experience the same weather on the same day, so city-level variation is a good predictor of individual-level variation (if you were in New York on a given day you probably experienced the same weather as everybody else in New York that day). Compare this to the Robinson example, where state-level immigration rates are a very poor predictor of individual-level immigrant status, which allows for the possibility of opposite correlations with literacy at the aggregate and individual level.

We apply our method to data collected for a set of 1180 days on Facebook from January 2009 to March 2012. The study was approved by and carried out under the guidelines of the Institutional Review Board at the University of California, San Diego, which waived the need for participant consent. To protect participant confidentiality, researchers did not personally view any names of users or words posted by users, and all analysis of identified data took place in the same secure location on servers where Facebook currently keeps users' data.

Users of Facebook interact with each other in many ways, mostly textual. To measure emotional expression, we use “status updates” (also called “posts”), which are undirected text-based messages that a user's social contacts (Facebook friends) may view on their own News Feed. Relying on the Linguistic Inquiry and Word Count (LIWC), a widely used and validated word classification system [25], [26], we determine whether a post uses words that express positive or negative emotions. Although this is not the only way to measure sentiment [27], this method has previously been used to measure the emotional content of online messages [28].
We then use two different metrics to quantify the average emotional state of a user during a day (see Text S1): the fraction of posts expressing positive emotions (“positive rate”); and the fraction expressing negative emotions (“negative rate”). Note that the positive and negative emotions are not two ends of the same scale. Some messages will express both positive and negative emotions just as individuals experience mixed emotions on occasion, so it is possible to score high on both measures. We then aggregate individual observations by city and day, restricting our attention to all English-speaking Facebook users residing in the 100 most populous US cities.
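The measurement pipeline described above (classify each post, compute per-user daily rates, aggregate by city and day) can be sketched as follows. This is an illustration under stated assumptions: the two word lists are tiny hypothetical stand-ins for the LIWC dictionaries, the tokenization is deliberately crude, and averaging user-day rates within a city is an assumed aggregation rule (Text S1 gives the authors' actual definition).

```python
from collections import defaultdict

# Hypothetical toy word lists; the real LIWC dictionaries are much larger.
POSITIVE_WORDS = {"happy", "great", "love"}
NEGATIVE_WORDS = {"sad", "awful", "hate"}

def classify(post):
    """Return (has_positive, has_negative); a post can be both at once."""
    words = {w.strip(".,!?").lower() for w in post.split()}
    return bool(words & POSITIVE_WORDS), bool(words & NEGATIVE_WORDS)

def user_day_rates(posts):
    """posts: iterable of (user, city, day, text) -> {(user, city, day): (pos_rate, neg_rate)}."""
    counts = defaultdict(lambda: [0, 0, 0])  # [n_posts, n_positive, n_negative]
    for user, city, day, text in posts:
        pos, neg = classify(text)
        entry = counts[(user, city, day)]
        entry[0] += 1
        entry[1] += pos
        entry[2] += neg
    return {key: (p / n, q / n) for key, (n, p, q) in counts.items()}

def city_day_rates(user_rates):
    """Average the user-day rates within each (city, day) cell (assumed aggregation rule)."""
    sums = defaultdict(lambda: [0, 0.0, 0.0])  # [n_users, sum_pos_rate, sum_neg_rate]
    for (_user, city, day), (p, q) in user_rates.items():
        entry = sums[(city, day)]
        entry[0] += 1
        entry[1] += p
        entry[2] += q
    return {key: (p / n, q / n) for key, (n, p, q) in sums.items()}

posts = [
    ("u1", "NYC", "2010-01-01", "So happy today!"),
    ("u1", "NYC", "2010-01-01", "Love the snow, hate the cold."),  # both positive and negative
    ("u2", "NYC", "2010-01-01", "Awful commute."),
    ("u2", "NYC", "2010-01-01", "Going to lunch"),                 # neither
]
rates = city_day_rates(user_day_rates(posts))
print(rates[("NYC", "2010-01-01")])  # → (0.5, 0.5)
```

The second post counts toward both rates, which illustrates why the positive and negative rates are not two ends of one scale.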

Discussion Our estimates of the social contagion of emotional expression suggest that there may be large-scale spillovers in online networks. What people feel and say in one place may spread to many parts of the globe on the very same day. Yet the 1.5∶1 estimated ratio of the indirect to the direct effect is actually somewhat lower than other kinds of network effects measured experimentally. For the spread of giving behavior in a public goods experiment, for example, it is estimated that each dollar given yielded two dollars in giving by others [32]. For voting behavior, a recent large-scale experiment suggested the ratio is about four to one [17].

While the method we elaborate here is potentially applicable to a wide variety of emotions and behaviors online, an important limitation is that we cannot use this method to estimate contagion effects within subpopulations. It is plausible that these effects might be even stronger when subpopulations are geographically defined (as in cities), since many studies suggest that physical proximity increases social influence between connected individuals [16]. Another limitation is that instruments are not always readily available, and in some cases it may be unclear whether they are causally and directly related to the outcome variable of interest. However, when such instruments are indeed available, this approach may prove to be a useful alternative to costly large-scale experiments with limited external validity, since it requires neither experimental control nor alteration of the user environment.

Although there are many factors that affect human emotions [33], [34], [35], we have confirmed here that individual expression of emotions depends on what others in an individual's social network are expressing. These results imply that emotions themselves might ripple through social networks to generate large-scale synchrony that gives rise to clusters of happy and unhappy individuals.
And new technologies online may be increasing this synchrony by giving people more avenues to express themselves to a wider range of social contacts. As a result, we may see greater spikes in global emotion that could generate increased volatility in everything from political systems to financial markets [36]. Our results are also consistent with prior work on the evolutionary basis of human emotions and with prior work focusing on the fleeting, direct spread of emotions. In addition to their internal and psychological relevance [37], emotions play a specifically social role: when humans experience emotions, they do not generally keep them to themselves, but rather, they tend to show them. Like laughter and smiling [38], emotions expressed online may serve the evolutionarily adaptive purpose of enhancing social bonds. Human laughter, for example, is believed to have evolved from the “play face” expression seen in other primates in relaxed, social situations [39]. Such facial expressions and positive emotions enhance social relations by producing analogous pleasurable feelings in others [16], by rewarding the efforts of others, and by encouraging ongoing social contact. Given the organization of people (and early hominids) into social groups larger than pairs [40], such spread in emotions probably served evolutionarily adaptive purposes. In this regard, it is noteworthy that, during our study period, users were increasingly expressing emotions as they adapted to Facebook as a new communication environment. Our findings also have significance for public wellbeing. To the extent that clinical or policy maneuvers increase the happiness of one person, they may have cascade effects on others in their social networks, thereby enhancing the efficacy and cost-effectiveness of the intervention, and these results suggest that such cascade effects may be promoted online. 
For example, providing better care for those who are suffering might not only improve their happiness, but also the happiness of numerous others, thereby further vindicating the benefits of medical care or public policy.

Supporting Information Text S1. This document contains the detailed derivation of the model, description of estimation techniques, exposition of results and robustness tests. https://doi.org/10.1371/journal.pone.0090315.s001 (PDF)

Author Contributions Conceived and designed the experiments: LC YS ADIK CM MF NAC JHF. Performed the experiments: LC YS ADIK CM MF NAC JHF. Analyzed the data: LC YS ADIK CM MF NAC JHF. Wrote the paper: LC YS ADIK CM MF NAC JHF.