Abstract Massive amounts of fake news and conspiratorial content have spread over social media before and after the 2016 US Presidential Elections despite intense fact-checking efforts. How do the spread of misinformation and fact-checking compete? What are the structural and dynamic characteristics of the core of the misinformation diffusion network, and who are its main purveyors? How to reduce the overall amount of misinformation? To explore these questions we built Hoaxy, an open platform that enables large-scale, systematic studies of how misinformation and fact-checking spread and compete on Twitter. Hoaxy captures public tweets that include links to articles from low-credibility and fact-checking sources. We perform k-core decomposition on a diffusion network obtained from two million retweets produced by several hundred thousand accounts over the six months before the election. As we move from the periphery to the core of the network, fact-checking nearly disappears, while social bots proliferate. The number of users in the main core reaches equilibrium around the time of the election, with limited churn and increasingly dense connections. We conclude by quantifying how effectively the network can be disrupted by penalizing the most central nodes. These findings provide a first look at the anatomy of a massive online misinformation diffusion network.

Citation: Shao C, Hui P-M, Wang L, Jiang X, Flammini A, Menczer F, et al. (2018) Anatomy of an online misinformation network. PLoS ONE 13(4): e0196087. https://doi.org/10.1371/journal.pone.0196087
Editor: Alain Barrat, Centre de physique theorique, FRANCE
Received: January 20, 2018; Accepted: April 5, 2018; Published: April 27, 2018
Copyright: © 2018 Shao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All the analyses presented in this paper can be replicated by collecting data through the Hoaxy API (https://market.mashape.com/truthy/hoaxy) or downloading the network dataset at doi.org/10.5281/zenodo.1154840.
Funding: C.S. was supported by the China Scholarship Council. X.J. was supported in part by the National Natural Science Foundation of China (No. 61272010). G.L.C. was supported by Indiana University Network Science Institute. The development of the Botometer platform was supported in part by DARPA (grant W911NF-12-1-0037) and Democracy Fund. A.F. and F.M. were supported in part by the James S. McDonnell Foundation (grant 220020274) and the National Science Foundation (award CCF-1101743). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.

Introduction The viral spread of online misinformation is emerging as a major threat to the free exchange of opinions, and consequently to democracy. Recent Pew Research Center surveys found that 63% of Americans do not trust the news coming from social media, even though an increasing majority of respondents uses social media to get the news on a regular basis (67% in 2017, up from 62% in 2016). Even more disturbing, 64% of Americans say that fake news has left them with a great deal of confusion about current events, and 23% also admit to passing on fake news stories to their social media contacts, either intentionally or unintentionally [1, 2, 3]. Misinformation is an instance of the broader issue of abuse of social media platforms, which has received a lot of attention in the recent literature [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. The traditional method to cope with misinformation is to fact-check claims. Even though some are pessimistic about the effectiveness of fact-checking, the evidence is still conflicting on the issue [16, 17]. In experimental settings, perceived social presence reduces the propensity to fact-check [18]. An open question is whether this finding translates to the online setting, which would affect the competition between low- and high-quality information. This question is especially pressing. Even though algorithmic recommendation may promote quality under certain conditions [19], models and empirical data show that high-quality information does not have a significant advantage over low-quality information in online social networks [20, 15]. Technology platforms, journalists, fact checkers, and policymakers are debating how to combat the threat of misinformation [21]. A number of systems, tools, and datasets have been proposed to support research efforts about misinformation. Mitra and Gilbert, for example, proposed CREDBANK, a dataset of tweets with associated credibility annotations [22]. Hassan et al.
[23] built a corpus of political statements worthy of fact-checking using a machine learning approach. Some systems let users visualize the spread of rumors online. The most notable are TwitterTrails [24] and RumorLens [25]. These systems, however, lack monitoring capabilities. The Emergent site [26] detected unverified claims on the Web, tracking whether they were subsequently verified, and how much they were shared. The approach was based on manual curation, and thus did not scale. The development of effective countermeasures requires an accurate understanding of the problem, as well as an assessment of its magnitude [27, 28]. To date, the debate on these issues has been informed by limited evidence. Online social network data provides a way to investigate how human behaviors, and in particular patterns of social interaction, are influenced by newsworthy events [29]. Studies of news consumption on Facebook reveal that users tend to confine their attention on a limited set of pages [30, 31]. Starbird demonstrates how alternative news sites propagate and shape narratives around mass-shooting events [32]. Articles in the press have been among the earliest reports to raise the issue of fake news [33]. Many of these analyses, however, are hampered by the quality of available data—subjective, anecdotal, or narrow in scope. In comparison, the internal investigations conducted by the platforms themselves appear to be based on comprehensive disaggregated datasets [34, 35], but lack transparency, owing to the two-fold risk of jeopardizing the privacy of users and of disclosing internal information that could be potentially exploited for malicious purposes [36]. Motivated by these limitations, in previous work we presented a prototype of Hoaxy, an open platform for the study of the diffusion of misinformation and its competition with fact-checking [37]. 
Here we build upon this prior effort, contributing to the debate on how to combat digital misinformation in two ways: We describe the implementation and deployment of the Hoaxy system, which was first introduced in a 2016 demo [37]. The system has been collecting data on the spread of misinformation and fact-checking from the public Twitter stream since June of 2016. It is now publicly available (hoaxy.iuni.iu.edu). Users can query the tool to search for instances of claims and related fact-checking about any topic and visualize how these two types of content spread on Twitter.

We leverage the data collected by Hoaxy to analyze the diffusion of articles from low-credibility sources and fact-checks on Twitter in the run up to and wake of the 2016 US Presidential Election. This analysis provides a first characterization of the anatomy of a large-scale online misinformation diffusion network. When studying misinformation, the first challenge is to assess the truthfulness of a claim. This presents several difficulties. The most important is scalability: it is impossible to manually evaluate a very large number of claims, even for professional fact-checking organizations with dedicated staff. Here we mitigate these issues by relying on a list of low-credibility sources compiled by trusted third-party organizations. In the run-up to and wake of the 2016 US Presidential Elections, several reputable media and fact checking organizations have compiled lists of popular sources that routinely publish unverified content such as hoaxes, conspiracy theories, fabricated news, click bait, and biased, misleading content. We manually assess that the great majority of the articles published by these sources, considered here, contain some form of misinformation or cannot be verified (see Methods). For brevity, in the remainder of the paper we refer to articles from low-credibility sources simply as “articles.” Hoaxy retrieves the full and comprehensive set of tweets that share (i.e., include a link to) articles and fact-checks. These tweets are important because, by tracking them, we can observe how a particular piece of content spreads over the social network. It is important to note that Hoaxy collects 100% of these tweets, not a sample. This lets us obtain, for any given piece of misinformation in our corpus, the full picture of how it spreads and competes with subsequent fact-checking, if any. In this paper we address three research questions: RQ1: How do the spread of misinformation and fact-checking compete?

RQ2: What are the structural and dynamic characteristics of the core of the misinformation diffusion network, and who are its main purveyors?

RQ3: How to reduce the overall amount of misinformation?

We pose our first question (RQ1) to investigate whether those who are responsible for spreading articles are also exposed to corrections of those articles. Regrettably, only 5.8% of the tweets in our dataset share links to fact-checking content, a 1:17 ratio with misinformation tweets. We analyze the diffusion network in the run-up to the election, and find a strong core-periphery structure. Fact-checking almost disappears as we move closer to the inner core of the network, but surprisingly we find that some fact-checking content is being shared even inside the main core. Unfortunately, we discover that these instances are not associated with interest in accurate information. Rather, links to Snopes or Politifact are shared either to mock said publications, or to mislead other users (e.g., by falsely claiming that the fact-checkers found a claim to be true). This finding is consistent with surveys on the trust of fact-checking organizations, which find strong polarization of opinions [38]. Our second question (RQ2) is about characterizing the core of the article diffusion network. We find that the main core grows in size initially and then becomes stable in both size and membership, while its density continues to increase. We analyze the accounts in the core of the network to identify those users who play an important role in the diffusion of misinformation. The use of Botometer, a state-of-the-art social bot detection tool [12], reveals a higher presence of social bots in the main core. We also consider a host of centrality measures (in-strength, out-strength, betweenness, and PageRank) to characterize and rank the accounts that belong to the main core.
Each metric emphasizes different subsets of core users, but interestingly the most central nodes according to different metrics are found to be similar in their partisan slant. Our last question (RQ3) addresses possible countermeasures. Specifically, we ask what actions platforms could take to reduce the overall exposure to misinformation. Platforms have already taken some steps in this direction, by prioritizing high-quality over low-quality content [35, 39]. Here we consider a further step by investigating whether penalizing the main purveyors of misinformation, as identified by RQ2, yields an effective mitigation strategy. We find that a simple greedy solution would reduce the overall amount of misinformation significantly.
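To make the ideas behind RQ2 and RQ3 concrete, the following is a minimal sketch of k-core decomposition and of a greedy disruption strategy on a toy retweet network. It is an illustration under stated assumptions, not the code used in this study: the edge list, the account names, the undirected view used for the core numbers, and the choice of out-strength as the ranking metric are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy network: (source, target, weight) means that content
# posted by `source` was retweeted by `target` `weight` times.
edges = [
    ("hub", "u1", 5), ("hub", "u2", 4), ("hub", "u3", 3),
    ("u1", "u2", 2), ("u2", "u3", 2), ("u3", "u1", 1),
    ("u3", "p1", 1), ("p1", "p2", 1),
]


def core_numbers(edge_list):
    """Core number of each node, using an undirected view (an assumption).

    Peeling procedure: repeatedly remove the node with the smallest
    remaining degree; its core number is the largest minimum degree
    observed up to the moment it is removed.
    """
    neighbors = defaultdict(set)
    for source, target, _weight in edge_list:
        if source != target:
            neighbors[source].add(target)
            neighbors[target].add(source)
    degree = {node: len(adj) for node, adj in neighbors.items()}
    remaining = set(degree)
    core, current_k = {}, 0
    while remaining:
        node = min(remaining, key=lambda n: (degree[n], n))
        current_k = max(current_k, degree[node])
        core[node] = current_k
        remaining.remove(node)
        for other in neighbors[node]:
            if other in remaining:
                degree[other] -= 1
    return core


def greedy_disruption(edge_list, steps):
    """Remove the account with the highest out-strength, `steps` times.

    Out-strength is recomputed after every removal. Returns the fraction
    of the original retweet volume that survives after each removal.
    """
    total = sum(weight for _s, _t, weight in edge_list)
    remaining = list(edge_list)
    surviving = []
    for _ in range(steps):
        out_strength = defaultdict(int)
        for source, _target, weight in remaining:
            out_strength[source] += weight
        if not out_strength:
            break  # no edges left to remove
        top = max(out_strength, key=lambda n: (out_strength[n], n))
        remaining = [e for e in remaining if top not in (e[0], e[1])]
        surviving.append(sum(w for _s, _t, w in remaining) / total)
    return surviving


core = core_numbers(edges)
main_core = {n for n, k in core.items() if k == max(core.values())}
surviving = greedy_disruption(edges, steps=2)
print("main core:", sorted(main_core))
print("surviving retweet fraction per removal:", surviving)
```

The peeling loop is quadratic in the number of nodes, which is fine for a toy example but not for a network of several hundred thousand accounts, where a bucket-based implementation from a graph library would be the natural choice. Other centrality measures named above, such as in-strength, betweenness, or PageRank, could be swapped in as the ranking key inside `greedy_disruption`.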

Discussion The rise of digital misinformation is calling into question the integrity of our information ecosystem. Here we made two contributions to the ongoing debate on how to best combat this threat. First, we presented Hoaxy, an open platform that enables large-scale, systematic studies of how misinformation and fact-checking spread and compete on Twitter. We described key aspects of its design and implementation. All Hoaxy data is available through an open API. Second, using data from Hoaxy, we presented an in-depth analysis of the misinformation diffusion network in the run-up to and wake of the 2016 US Presidential Election. We found that the network is strongly segregated along the two types of information circulating in it, and that a dense, stable core emerged after the election. We characterized the main core in terms of multiple centrality measures and proposed an efficient strategy to reduce the circulation of information by penalizing key nodes in this network. The networks used in the present analysis are available on an institutional repository (see Methods). Recall that Hoaxy collects 100% of the tweets carrying each piece of misinformation in our collection, not a sample. As a result, our analysis provides a complete picture of the anatomy of the misinformation network. Of course, our methodology has some unavoidable limitations. First of all, Hoaxy only tracks a fixed, limited set of sources, due to data volume restrictions in the public Twitter API. Of these sources, it only tracks how their content spreads on Twitter, ignoring other social media platforms. Facebook, by far the largest social media platform, does not provide access to data on shares, ostensibly for privacy reasons, even though a significant fraction of misinformation spreads via its pages [30], which are understood to be public. Thus we acknowledge that coverage of our corpus of misinformation is incomplete.
Nonetheless, by focusing on low-credibility sources that have come to the attention of large media and fact-checking organizations, and that have been flagged as the most popular purveyors of unverified claims, Hoaxy captures a broad snapshot of misinformation circulating online. Second, Hoaxy does not track the spread of unsubstantiated claims in the professional mainstream press. News websites do report unverified claims, in a manner and with a frequency dictated by their own editorial standards. For example, hedging language is often used to express degrees of uncertainty [59]. While most claims reported in the mainstream media are eventually verified, many remain unverified, and some even turn out to be false. Some instances of misinformation may see their spread boosted as a result of additional exposure on mainstream news outlets. Understanding the dynamics of the broader media and information ecosystem is therefore needed to fully comprehend the phenomenon of digital misinformation, but it is outside the scope of the present work. Third, we consider only US-based sources publishing English content. This is an unavoidable consequence of our reliance on lists produced by US-based media organizations. Different sources will be of course active in different countries. Worrisome amounts of misinformation, for example, have been observed in the run-up to the general elections in France [14]. To foster the study of misinformation in non-US contexts, we have released the code of Hoaxy under an open-source license, so that other groups can build upon our work [60, 61]. Last but not least, it is important to reiterate that the articles collected by Hoaxy are in general not verified. Inspection of our corpus confirms that not all articles collected by Hoaxy are completely inaccurate. As far as the present analysis is concerned, we provide an assessment of the rate of confirmed articles in our dataset (see Methods). 
When used as a search engine for misinformation, Hoaxy addresses this limitation by showing the most relevant fact-checking articles matching the input query, thereby facilitating claim verification. We hope that the data, software, and visualizations offered by the Hoaxy platform will be useful to researchers, reporters, policymakers, and, last but not least, ordinary Internet users as they learn to cope with online misinformation.

Acknowledgments Thanks to Ben Serrette and Valentin Pentchev of the Indiana University Network Science Institute (iuni.iu.edu) for supporting the development of the Hoaxy platform. Clayton A. Davis developed the Botometer API. Kaicheng Yang provided bot scores for the robustness analysis. Nic Dias was instrumental in developing the rubric for article verification. We are grateful to Onur Varol, Mihai Avram, Zackary Dunivin, Gregory Maus, and Vincent Wong who assisted in verification of articles and annotation of central accounts. We are also indebted to Twitter for providing data through their API. C.S. thanks the Center for Complex Networks and Systems Research (cnets.indiana.edu) for the hospitality during his visit at the Indiana University School of Informatics, Computing, and Engineering.