Weasel words (Image: Chris Batson/Alamy)

IT’S getting harder to trust what you read on Wikipedia. An army of shadowy fake accounts is manipulating the online encyclopedia’s entries for money and damaging the site’s credibility.

Last month, Wikipedia announced that it had blocked some 250 “sock puppet” accounts – fake accounts set up by users who are often paid by companies to edit articles in their favour. Now, Ragib Hasan at the University of Alabama at Birmingham and his colleagues have developed a tool that analyses how edits are written and spots whether different accounts belong to the same person.

One of the big problems for Wikipedia editors trying to uncover such accounts is that the IP addresses of users can only be accessed by a few administrators because of the need for privacy, says Hasan. So editors have to rely on their own experience to determine whether multiple accounts are actually the work of a single individual.


Hasan’s team wanted to know if they could use algorithms to unmask the sock puppets by analysing the language they use. The challenge in spotting similarities in writing styles is that, in Wikipedia editing, as in much of social media writing, the articles are so short that there is little material to work with, says team member Thamar Solorio.

They looked at the editing notes for more than 600 of Wikipedia’s sock-puppet investigations. These were used as the training material for an algorithm that scanned some 230 features of the writing, such as grammatical quirks. The team showed the algorithm could predict which accounts were puppet accounts with a 75 per cent accuracy rate – defined as agreeing with the decision of Wikipedia’s investigators (arxiv.org/abs/1310.6772).
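To make the idea concrete, here is a minimal Python sketch of the two ingredients described above: turning a piece of text into numerical writing-style features, and scoring predictions by how often they agree with the investigators. The function names and the three features are illustrative assumptions, not the roughly 230 features or the algorithm used in the study, and the classifier that would sit between the two steps is left out.

```python
import string


def style_features(text):
    """Return a few illustrative writing-style measurements for one text.

    These three features are assumptions chosen for the example; they are
    not taken from the study.
    """
    words = text.split()
    n_chars = max(len(text), 1)   # guard against empty input
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,                  # mean word length
        sum(c in string.punctuation for c in text) / n_chars,  # punctuation rate
        sum(c.isupper() for c in text) / n_chars,              # capital-letter rate
    ]


def agreement_accuracy(predictions, investigator_decisions):
    """Fraction of cases where a prediction matches the investigators' ruling."""
    matches = sum(p == d for p, d in zip(predictions, investigator_decisions))
    return matches / len(investigator_decisions)


# Hypothetical example: four cases, three of which match the investigators.
predicted = [True, False, True, True]
ruled = [True, False, False, True]
print(agreement_accuracy(predicted, ruled))
```

In a fuller version, feature vectors like these would be fed to a classifier trained on past investigations, and the agreement score would be computed on cases held back from training.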

“Sock-puppet investigations are incredibly time consuming for Wikipedia editors, so anything that can help reduce the workload should be welcome,” says Hasan.

Mor Naaman at Cornell Tech in New York likes the team’s work, but says the algorithm needs to become more accurate: “The authors mostly relied on syntactic features, and used only a few other linguistic markers, so there is definitely room for improvement.”

The fake accounts problem is just the latest issue to plague Wikipedia. It has been criticised because its editors are predominantly white, Western and 90 per cent male, which skews both the topics it covers and the content of its articles.

This article appeared in print under the headline “Unmask the Wiki sock puppets by the way they write”