The rise of social media has turned it into a source of basic news for many people; even within Twitter's character limit, there's room for a brief description of a newsworthy event along with a link to more details. However, the ease of creating and sharing information through social networks has also raised concerns about how easily they can be used to spread misinformation, either accidentally or with intent. Some researchers at Yahoo have tracked the spread of news (reliable and otherwise) through Twitter, and found that it's possible to build an automated system that identifies newsworthy events and judges their reliability with an accuracy of nearly 90 percent.

The authors (who are based in Barcelona and Chile—working for Yahoo might not be all bad news) note that assessing credibility is not simply an academic exercise, as a hacked Twitter account produced a fake tsunami warning last year. A lot of people aren't very good at it, and the lack of easy indications of credibility online leads readers to focus on irrelevant items, like the visual design of a source's webpage. Tweets, which often contain little more than an icon associated with a source, would seem to make matters even more challenging.

Their approach to assaying tweets is a pretty nice demonstration of the use of the Internet. They picked out trending topics using a Web service called Twitter Monitor (NB: the site was down when we checked it), which detects bursts of activity that contain specific sets of keywords. That enabled them to identify about 2,500 "topics" that appeared to experience a sudden burst of activity; they then collected all the tweets on each topic within a two-day window surrounding its peak.

To identify which of these topics were news and assess their credibility, the team turned to real, live humans. To keep from boring anyone they knew to death, they used Mechanical Turk to get people to go through the tweets and identify whether a given story was news, and whether the tweet conveyed credible information. Each topic was given to multiple Turk users, and classifications were only accepted if they were generally agreed upon.
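That agreement rule amounts to a majority vote over the workers' answers. A minimal sketch, assuming a 60-percent threshold (a hypothetical stand-in; the article doesn't give the paper's exact cutoff):

```python
from collections import Counter

def aggregate_labels(labels, threshold=0.6):
    """Accept a crowd label only if a clear majority agrees.

    `labels` holds the answers from individual Mechanical Turk workers
    for one topic. The winning label is kept only if it received at
    least `threshold` of the votes; otherwise the topic is discarded
    (returns None). The threshold value is illustrative.
    """
    winner, count = Counter(labels).most_common(1)[0]
    return winner if count / len(labels) >= threshold else None

print(aggregate_labels(["news", "news", "chat"]))          # clear majority -> "news"
print(aggregate_labels(["news", "chat", "news", "chat"]))  # split vote -> None
```

Topics that failed to reach agreement would simply drop out of the dataset, which is one way to keep noisy crowd labels from contaminating the training set.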

In all, they ended up with 747 news topics, conveyed through tweets of varying credibility. They then parsed the tweets and obtained user information in order to generate metadata for each tweet: a user's history with the service including followers and tweet count, the information linked to in the tweet, how it was propagated through RTs, and the emotional content of the tweet text, including any use of emoticons or exclamation points. All of this was turned over to a set of machine learning algorithms, which were first set loose on a training set, and then had their accuracy tested with the rest of the tweets.
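The feature-extraction step described above might look something like the following sketch. The field names are illustrative only—the paper's actual feature set is larger and defined differently:

```python
def tweet_features(tweet, user):
    """Turn one tweet plus its author's profile into a flat feature
    dict for a machine learning algorithm. Field names are hypothetical
    stand-ins for the kinds of metadata described in the article:
    user history, linked content, propagation, and emotional signals."""
    text = tweet["text"]
    return {
        "followers": user["followers"],          # user's reach on the service
        "tweet_count": user["tweet_count"],      # past activity
        "has_url": "http" in text,               # links to more information?
        "retweets": tweet["retweets"],           # how the tweet propagated
        "exclamations": text.count("!"),         # emotional punctuation
        "has_emoticon": any(e in text for e in (":)", ":(", ":D")),
    }

example = tweet_features(
    {"text": "Earthquake reported! details: http://example.com", "retweets": 12},
    {"followers": 3400, "tweet_count": 5100},
)
```

Each labelled topic then becomes a list of such feature vectors paired with the crowd's news/credibility verdicts.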

The results generated by the different methods, which included support vector machines, Bayesian networks, and decision trees, were "comparable," according to the authors, but the best results were produced by a J48 decision tree. When it comes to identifying which topics were news, this algorithm displayed an 89 percent accuracy, meaning that about nine times out of ten, its diagnosis matched the one generated by a panel of humans.
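Measuring that accuracy is just the fraction of held-out topics where the classifier's verdict matches the human panel's. In the toy sketch below, the hand-written `toy_tree` stands in for the learned J48 tree (a single made-up split, not the paper's model), and the test set is fabricated for illustration:

```python
def accuracy(classify, labelled):
    """Fraction of held-out examples where the classifier's label
    matches the label assigned by the human panel."""
    hits = sum(1 for features, label in labelled if classify(features) == label)
    return hits / len(labelled)

def toy_tree(f):
    """A single hypothetical decision split standing in for the
    learned tree: credible if the tweet links out and its author
    has a substantial posting history."""
    return "credible" if f["has_url"] and f["tweet_count"] > 100 else "not_credible"

test_set = [  # fabricated held-out examples: (features, human label)
    ({"has_url": True, "tweet_count": 500}, "credible"),
    ({"has_url": False, "tweet_count": 900}, "not_credible"),
    ({"has_url": True, "tweet_count": 50}, "not_credible"),
    ({"has_url": True, "tweet_count": 2000}, "credible"),
]
print(accuracy(toy_tree, test_set))  # 1.0 on this toy data
```

The real evaluation works the same way, just with the learned tree in place of the hand-written rule and the held-out portion of the 747 labelled topics as the test set.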

When it comes to credibility, the same decision tree achieved an accuracy of about 86 percent. Looking at the decision tree allowed the authors to make some inferences about what factors let the algorithm successfully identify credible tweets. It appears that those with the most invested in the service—in terms of past activity, followers, and friends—tend to convey more accurate information; "low credible news are mostly propagated by users who have not written many messages in the past," according to the authors.

As for the tweet itself, the absence of a URL often signaled that something was amiss with its contents. Negative sentiments in the tweet, in contrast, tended to be associated with credibility. And the more retweets a message received, the better.

Although these were the most significant individual factors, it's important to emphasize that the accuracy of the algorithm was increased by considering all of them at once (along with a few other, less significant factors). Simply relying on any one of them probably won't help all that much.

The authors indicate that they're hoping to work next on trying to get their system to work with smaller datasets, specifically the first few tweets on a given topic. If that works, it should ultimately allow near real-time detection of the credibility of news stories as they break on Twitter.