ABSTRACT

Social media has become increasingly central to the way billions of people experience news and events, often bypassing journalists, the traditional gatekeepers of breaking news. Naturally, this casts doubt on the credibility of information found on social media. Here we ask: Can the language captured in unfolding Twitter events provide information about an event's credibility? We examine the first large-scale, systematically tracked credibility corpus of public Twitter messages: 66M messages corresponding to 1,377 real-world events over a span of three months. From it we identify 15 theoretically grounded linguistic dimensions and present a parsimonious model that maps language cues to perceived levels of credibility. The model is not yet deployable as a standalone credibility-assessment tool. Even so, our results show that certain linguistic categories and their associated phrases are strong predictors of perceived credibility across disparate social media events. In other words, the language used by millions of people on Twitter carries considerable information about an event's credibility. For example, hedge words and positive emotion words are associated with lower credibility.