A few years back, a certain late night host popularized a word that may or may not be in your dictionary: truthiness. Stephen Colbert was referring to what has become a very effective political tool, namely, statements that many people will accept as true because they feel right, even if they have no basis in reality. A small cottage industry has since sprung up to evaluate how much truth there is in a potentially truthy statement (PolitiFact is probably the best example). Now, a group of researchers at Indiana University is attempting to use a combination of crowd-sourcing, Twitter, and automated text and network analysis to bring instances of political truthiness out into the open.

The team has set up a site, Truthy.indiana.edu, to show off and explain the system. Their focus is on Twitter, due to some recent election results; apparently, an organization called the American Future Fund set up a bunch of Twitter accounts to spread some truthy statements on election day, and managed to spam about 60,000 people before Twitter shut the accounts down. Other political controversies, like Governor Schwarzenegger's ability to see Russia from Anchorage, have played out on the service.

Twitter also offers APIs for access to the content flowing through its system, and the Truthy site will be using this feed to obtain raw material for its analysis. As a first pass to winnow down the flood of tweets, the system will focus on what its creators define as memes. These include @-mentions, hashtags, and URLs that are either experiencing significant growth or account for a substantial proportion of the total traffic on Twitter. A filter will then classify these using a set of keywords to determine whether they're likely to be political discussions.
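
To make that first pass concrete, here is a minimal Python sketch of pulling candidate memes out of tweet text and applying a keyword filter. The regular expressions, the keyword list, and the function names are all assumptions made for illustration; the article does not describe the rules or thresholds the Truthy team actually uses.

```python
import re
from collections import Counter

# Hypothetical patterns; the real system's matching rules are not documented here.
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")

# Illustrative keyword list, assumed for this sketch only.
POLITICAL_KEYWORDS = {"election", "senate", "congress", "vote", "gop", "democrat"}


def extract_memes(text):
    """Return the @-mentions, hashtags, and URLs found in one tweet."""
    urls = URL.findall(text)
    # Strip URLs first so an '@' or '#' inside a link is not counted twice.
    stripped = URL.sub(" ", text)
    tokens = MENTION.findall(stripped) + HASHTAG.findall(stripped)
    # Mentions and hashtags are case-insensitive; URLs are left untouched.
    return [t.lower() for t in tokens] + urls


def meme_shares(tweets):
    """Map each meme to the fraction of tweets in which it appears."""
    counts = Counter()
    for text in tweets:
        counts.update(set(extract_memes(text)))
    total = len(tweets)
    return {meme: n / total for meme, n in counts.items()} if total else {}


def looks_political(text):
    """Crude keyword test: does the tweet contain any listed word?"""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return bool(words & POLITICAL_KEYWORDS)
```

A real filter would also need some notion of growth over time, which this sketch leaves out; it only covers the "share of total traffic" half of the definition.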

The system will track basic features that are accessible either through the API or by mining the data. These include the number of retweets, the rate at which a meme spreads, and the number of unique users involved.
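
As a rough illustration of those features, the sketch below computes them for the tweets attached to a single meme. The `Tweet` record and its field names are hypothetical stand-ins, not the shape of Twitter's API responses, and "rate of spreading" is approximated here as tweets per hour over the observed time span.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    # Hypothetical record; field names are assumptions for this sketch.
    user: str
    posted_at: datetime
    is_retweet: bool


def meme_features(tweets):
    """Summarize the tweets that carry one meme."""
    if not tweets:
        return {"tweets": 0, "retweets": 0, "unique_users": 0, "tweets_per_hour": None}
    times = [t.posted_at for t in tweets]
    span_hours = (max(times) - min(times)).total_seconds() / 3600
    return {
        "tweets": len(tweets),
        "retweets": sum(t.is_retweet for t in tweets),
        "unique_users": len({t.user for t in tweets}),
        # Undefined when every tweet shares one timestamp.
        "tweets_per_hour": len(tweets) / span_hours if span_hours > 0 else None,
    }
```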

So far, however, there's probably no system that can identify the actual accuracy of a tweet, so that's where the crowd-sourcing comes in. Once the system is up and operational, users will have the chance to flag a meme as truthy, and the system will keep track of this. Pretty obviously, that will be open to abuse by the politically motivated, so it's probably just as well that there will also be a couple of automated analysis tools.

One of these tools will attempt to identify the emotional content of the meme. There's actually an automated form of the Profile of Mood States test, which scans word use to look for indications of mood, including signs of Tension-Anxiety, Anger-Hostility, and Depression-Dejection. So, every meme that makes its way into the database will end up with an evaluation of its mood.
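
The general idea of scoring mood from word use can be sketched in a few lines of Python. The word lists below are invented placeholders, not the actual Profile of Mood States vocabulary, and the scoring (matching words as a fraction of all words) is an assumed simplification rather than the method the Truthy team uses.

```python
import re
from collections import Counter

# Placeholder word lists, made up for this sketch; the real test uses its own terms.
MOOD_LEXICON = {
    "Tension-Anxiety": {"tense", "nervous", "anxious", "worried"},
    "Anger-Hostility": {"angry", "furious", "outraged", "hate"},
    "Depression-Dejection": {"sad", "hopeless", "miserable", "gloomy"},
}


def mood_scores(text):
    """Return, per mood, the share of words in the text that match its list."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {
        mood: (sum(counts[w] for w in vocab) / total if total else 0.0)
        for mood, vocab in MOOD_LEXICON.items()
    }
```

A meme's mood could then be taken as the average of these scores over all the tweets that carry it, though that aggregation is again an assumption.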

The final test is perhaps the most interesting. An integrated analysis and visualization software package will perform an analysis of the networks involved in spreading memes. This creates a map of what's called the diffusion network, and it produces graphs like the one shown above. That one resulted when fans of Lady Gaga and supporters of John McCain squared off over his position on the Don't Ask, Don't Tell policy, with the two different networks clearly visible.
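
For a sense of what a diffusion network looks like as data, the sketch below builds one from (source, spreader) pairs, such as "user B retweeted user A", and then splits it into groups of connected users. Both the input format and the use of connected components are assumptions for illustration; the article does not name the software or algorithms involved, and two camps that retweet each other at all would need a proper community-detection method to pull apart.

```python
from collections import defaultdict, deque


def build_diffusion_network(events):
    """Count how often each spreader passed along a meme from each source.

    `events` is an iterable of (source_user, spreading_user) pairs.
    """
    edges = defaultdict(int)
    for source, spreader in events:
        edges[(source, spreader)] += 1
    return dict(edges)


def clusters(edges):
    """Group users into connected components, ignoring edge direction."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    groups = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = set()
        # Breadth-first walk over everything reachable from `start`.
        while queue:
            node = queue.popleft()
            group.add(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        groups.append(group)
    return groups
```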

Overall, this sort of network analysis is likely to be more informative than subjective, crowdsourced ratings of truthiness, since it will give a clearer picture of how political networks operate within the new social media landscape.
