Fake news is an important issue on social media. This article provides an overview of fake news characterization and detection in Data Science and Machine Learning research.

By Kai Shu and Huan Liu, Arizona State University

Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information allow users to consume and share news easily. On the other hand, it enables the viral spread of “fake news”, i.e., low-quality news with intentionally false information. The quick spread of fake news has the potential for calamitous impacts on individuals and society. For example, during the 2016 U.S. presidential election, the most popular fake news was more widely spread on Facebook than the most popular authentic mainstream news. Therefore, fake news detection on social media has attracted increasing attention from both researchers and politicians.

Fake news detection on social media has unique characteristics and presents new challenges. First, fake news is intentionally written to mislead readers into believing false information, which makes it difficult to detect based on news content alone. Thus, we need to include auxiliary information, such as user social engagements on social media, to help differentiate it from true news. Second, exploiting this auxiliary information is nontrivial in and of itself, as users’ social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. This quick guide is based on a recent survey [1] that presents issues of fake news detection on social media, state-of-the-art research findings, datasets, and future directions. Next, we highlight the major perspectives of this survey.

Characterization and Detection

Figure 1 gives an overview of fake news detection on social media, which comprises two phases: characterization and detection. Fake news itself is not a new problem, and the media ecology has been changing over time from newsprint to radio/television, and recently to online news and social media. The impact of fake news on traditional media can be described from the perspective of psychological and social theories. For example, two major psychological factors make consumers naturally vulnerable to fake news: (i) Naïve Realism: consumers tend to believe that their perceptions of reality are the only accurate views. (ii) Confirmation Bias: consumers prefer to receive information that confirms their existing views. As another example, social identity theory and normative influence theory hold that a preference for social acceptance is essential to a person’s identity, which leads people to choose the “socially safe” option when consuming news, even if the news being shared is fake.

Fake news on social media has its own unique characteristics. For example, malicious accounts, such as social bots, cyborg users, or trolls, can be easily and quickly created to boost the spread of fake news. In addition, users are selectively exposed to certain types of news because of the way news feeds appear on the homepage in social media. Therefore, users on social media tend to form groups of like-minded people in which their opinions are likely to polarize, resulting in an echo chamber effect.

Figure 1. Fake news detection on social media: from characterization to detection

The aforementioned theories are valuable in guiding research on fake news detection. Existing algorithms for fake news detection can be generally categorized as (i) News Content Based and (ii) Social Context Based.

News content based approaches focus on extracting various features from fake news content, and include knowledge-based and style-based methods. Since fake news attempts to spread false claims, knowledge-based approaches aim to use external sources to fact-check the truthfulness of the claims in news content. In addition, fake news publishers often have malicious intent to spread distorted and misleading information, which requires particular writing styles, not seen in true news articles, to appeal to and persuade a wide scope of consumers. Style-based approaches try to detect fake news by capturing these manipulation cues in the writing style.
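
As a rough illustration of the style-based idea, the sketch below computes a few surface-level writing-style signals from a piece of text. The function name `style_features` and the three features chosen here (exclamation marks per word, share of all-caps words, average word length) are assumptions made for this example; they are not the feature set used in the survey [1], and in practice such features would be fed to a trained classifier rather than read directly.

```python
import re


def style_features(text):
    """Return a few illustrative writing-style signals for a news text.

    The feature choice is hypothetical and only meant to show the shape
    of a style-based feature extractor.
    """
    # Treat runs of letters (and apostrophes) as words.
    words = re.findall(r"[A-Za-z']+", text)
    # Avoid division by zero on empty input.
    n_words = max(len(words), 1)
    return {
        # Heavy use of "!" is one possible sensationalism cue.
        "exclamations_per_word": text.count("!") / n_words,
        # Share of words written entirely in capitals (ignoring "I", "A").
        "all_caps_ratio": sum(
            1 for w in words if len(w) > 1 and w.isupper()
        ) / n_words,
        # Average word length as a crude proxy for vocabulary complexity.
        "avg_word_length": sum(len(w) for w in words) / n_words,
    }


if __name__ == "__main__":
    print(style_features("SHOCKING news!! Read now"))
```

Hand-picked features like these are easy to inspect, but whether they separate fake from true news on a given dataset would have to be validated against labeled examples.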

Social context based approaches aim to utilize user social engagements as auxiliary information to help detect fake news. Stance-based approaches utilize users’ viewpoints from relevant post contents to infer the veracity of original news articles. In addition, propagation-based approaches reason about the relations among relevant social media posts to guide the learning of credibility scores by propagating credibility values between users, posts, and news. The veracity of a news piece is then obtained by aggregating the credibility values of relevant social media posts.
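
To make the propagation-based idea more concrete, here is a minimal sketch under simplifying assumptions: credibility is propagated only between related posts (not users), each post starts from a given initial score in [0, 1], and a news piece's veracity is the plain average of its posts' scores. The function names, the `damping` parameter, and the averaging rule are hypothetical choices for illustration, not the algorithms covered in the survey [1].

```python
def propagate_credibility(initial, edges, damping=0.5, iterations=20):
    """Smooth post credibility scores over an undirected post-post graph.

    initial: dict mapping post id -> initial credibility in [0, 1]
    edges:   iterable of (post_a, post_b) pairs marking related posts
    damping: weight given to the neighbours' mean score at each step
    """
    neighbors = {post: [] for post in initial}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for post, nbrs in neighbors.items():
            if nbrs:
                nbr_mean = sum(scores[n] for n in nbrs) / len(nbrs)
                # Blend the post's own prior with its neighbours' current mean.
                updated[post] = (1 - damping) * initial[post] + damping * nbr_mean
            else:
                # Isolated posts keep their initial score.
                updated[post] = initial[post]
        scores = updated
    return scores


def news_veracity(post_scores, news_to_posts):
    """Aggregate post credibility into one score per news piece (simple mean)."""
    return {
        news: sum(post_scores[p] for p in posts) / len(posts)
        for news, posts in news_to_posts.items()
        if posts  # skip news pieces with no related posts
    }


if __name__ == "__main__":
    initial = {"p1": 0.0, "p2": 1.0, "p3": 0.8}
    edges = [("p1", "p2")]
    scores = propagate_credibility(initial, edges)
    print(news_veracity(scores, {"news_a": ["p1", "p2"], "news_b": ["p3"]}))
```

In this toy setup, a low-credibility post pulls down the score of the posts it is linked to, and vice versa, before the per-news average is taken.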

Datasets

Even though online news can be collected from different sources, manually determining the veracity of news is a challenging task, usually requiring annotators with domain expertise who perform a careful analysis of claims and additional evidence, context, and reports from authoritative sources. Existing public datasets of fake news are rather limited due to these challenges. To facilitate research on fake news detection, the survey [1] provides a usable dataset, named FakeNewsNet, which includes news content and social context features with reliable ground truth fake news labels.

Promising Future Research

Fake news detection on social media is a newly emerging research area. The survey [1] discusses related research areas, open problems, and future research directions from a data mining perspective. As shown in Figure 2, research directions are outlined from four perspectives: Data-oriented, Feature-oriented, Model-oriented, and Application-oriented.

Figure 2. Future directions and open issues for fake news detection on social media

Data-oriented: it focuses on different aspects of fake news data, such as benchmark data collection, psychological validation of fake news, and early fake news detection.

Feature-oriented: it aims to explore effective features for detecting fake news from multiple data sources, such as news content and social context.

Model-oriented: it opens the door to building more practical and effective models for fake news detection, including supervised, semi-supervised, and unsupervised models.

Application-oriented: it encompasses research that goes beyond fake news detection, such as fake news diffusion and intervention.

[1] Shu, K., Sliva, A., Wang, S., Tang, J. and Liu, H., 2017. Fake News Detection on Social Media: A Data Mining Perspective. ACM SIGKDD Explorations Newsletter, 19(1), pp.22-36.

Bio: Kai Shu is a Graduate Research Assistant at Arizona State University. His research interests include social media mining, especially information credibility, fake news, and machine learning. Huan Liu is a professor in the School of Computing, Informatics, and Decision Systems Engineering, Ira A. Fulton Schools of Engineering, Arizona State University.
