Hoshi Ludwig


Chart by Nithum Thain/Hlud

We've all heard anecdotes about trolling on Wikipedia and other social platforms, but rarely has anyone been able to quantify the levels and origins of online abuse. That's about to change. Researchers with Alphabet tech incubator Jigsaw worked with the Wikimedia Foundation to analyze 100,000 comments left on English-language Wikipedia. They found predictable patterns behind who will launch personal attacks and when.

The goal of the research team was to lay the groundwork for an automated system to "reduce toxic discussions" on Wikipedia. The team's work could one day lead to the creation of a warning system for moderators. The researchers caution that this system would require more research to implement, but they have released a paper with some fascinating early findings.

To keep the supervised machine-learning task simple, the researchers focused exclusively on ad hominem or personal attacks, which are relatively easy to identify. They defined personal attacks as comments directed at a commenter (e.g., "you suck"), directed at a third party ("Bill sucks"), quoting an attack ("Bill says Henri sucks"), or just "another kind of attack or harassment." They used CrowdFlower to crowdsource the job of reviewing 100,000 Wikipedia comments made between 2004 and 2015. Ultimately, more than 4,000 CrowdFlower workers completed the task, and each comment was annotated by 10 different people as an attack or not.
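As a rough illustration of how 10 judgments per comment might be collapsed into a single label, here is a minimal sketch using a simple majority vote. The function name, the sample data, and the 0.5 threshold are assumptions made for illustration; the summary above does not say how the researchers combined the annotations.

```python
def aggregate_labels(annotations, threshold=0.5):
    """Collapse per-annotator votes into one label per comment.

    annotations maps a comment id to a list of booleans (True = "attack").
    Returns a dict of comment id -> (fraction_attack, is_attack).
    The majority-vote threshold is an assumption, not the study's rule.
    """
    results = {}
    for comment_id, votes in annotations.items():
        if not votes:
            continue  # skip comments nobody reviewed
        fraction = sum(votes) / len(votes)
        results[comment_id] = (fraction, fraction > threshold)
    return results


# Hypothetical data: two comments, ten annotators each.
sample = {
    "c1": [True] * 8 + [False] * 2,
    "c2": [True] * 1 + [False] * 9,
}
labels = aggregate_labels(sample)
```

Keeping the fraction alongside the yes/no label preserves how strongly the annotators agreed, which a classifier could also be trained on.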

Once the researchers had their dataset, they trained a logistic regression model to recognize whether a comment was a personal attack. "With testing, we found that a fully trained model achieves better performance in predicting whether an edit is a personal attack than the combined average of three human crowd-workers," they write in a summary of their paper on Medium.
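For readers curious what a logistic regression classifier for text looks like, the sketch below trains one from scratch on whole-word counts. It is not the researchers' model: the word-count features, the training settings, and the six made-up comments are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a comment and split it on whitespace."""
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, text):
    """Probability that a comment is an attack under the current weights."""
    counts = Counter(tokenize(text))
    z = bias + sum(weights.get(word, 0.0) * n for word, n in counts.items())
    return sigmoid(z)


def train(examples, epochs=200, learning_rate=0.5):
    """Fit one weight per word with plain stochastic gradient descent.

    examples is a list of (text, label) pairs; label 1 = attack, 0 = not.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            # Gradient of the log loss is (prediction - label) * feature.
            error = predict_proba(weights, bias, text) - label
            for word, n in Counter(tokenize(text)).items():
                weights[word] = weights.get(word, 0.0) - learning_rate * error * n
            bias -= learning_rate * error
    return weights, bias


# Tiny made-up training set, for illustration only.
training = [
    ("you suck", 1),
    ("bill sucks and so do you", 1),
    ("you are an idiot", 1),
    ("thanks for fixing the citation", 0),
    ("i added a source for this claim", 0),
    ("please see the talk page", 0),
]
weights, bias = train(training)
```

Calling `predict_proba(weights, bias, "you suck")` returns a score between 0 and 1; a moderation tool would compare that score against a cutoff of its own choosing.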

Who is launching personal attacks?

The researchers unleashed their algorithm on Wikipedia comments made during 2015, constantly checking results for accuracy. Almost immediately, they found that they could debunk the time-worn idea that anonymity* leads to abuse. Although anonymous comments are "six times more likely to be an attack," they represent less than half of all attacks on Wikipedia. "Similarly, less than half of attacks come from users with little prior participation," the researchers write in their paper. "Perhaps surprisingly, approximately 30% of attacks come from registered users with over a 100 contributions." In other words, nearly a third of all personal attacks come from regular Wikipedia editors who contribute several edits per month. Personal attacks seem to be baked into Wikipedia culture.
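The two figures about anonymous users can both be true because of base rates: a group that attacks more often per comment can still produce a minority of attacks if it writes few comments overall. The sketch below works through that arithmetic. The 10% share of anonymous comments is a hypothetical input, not a number from the study, and "six times more likely" is read here as six times the attack rate of registered comments.

```python
def anonymous_share_of_attacks(anon_comment_share, relative_risk=6.0):
    """Fraction of all attacks that come from anonymous comments.

    anon_comment_share: fraction of all comments posted anonymously
        (hypothetical input, not a figure from the study).
    relative_risk: how many times as likely an anonymous comment is
        to be an attack as a registered one (6 in the article).
    """
    # Attack volumes in units of the registered per-comment attack rate.
    anon_attacks = relative_risk * anon_comment_share
    registered_attacks = 1.0 - anon_comment_share
    return anon_attacks / (anon_attacks + registered_attacks)


# If 10% of comments were anonymous, they would account for
# 0.6 / (0.6 + 0.9) = 40% of attacks.
share = anonymous_share_of_attacks(0.10)
```

Under these assumptions, anonymous comments account for fewer than half of all attacks whenever they make up less than one seventh of all comments.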

The researchers also found that an outsized percentage of attacks come from a very small number of "highly toxic" Wikipedia contributors. A whopping 9% of attacks in 2015 came from just 34 users who had made 20 or more personal attacks during the year. "Significant progress could be made by moderating a relatively small number of frequent attackers," the researchers note. This finding bolsters the idea that problems in online communities often come from a small minority of highly vocal users.
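Measuring that kind of concentration is straightforward once each flagged comment is tied to an author. Here is a minimal sketch, assuming a hypothetical list with one user name per attacking comment; the 20-attack cutoff comes from the article, and everything else is made up for illustration.

```python
from collections import Counter


def share_from_frequent_attackers(attack_authors, min_attacks=20):
    """Return (number of frequent attackers, their share of all attacks).

    attack_authors: one user name per comment flagged as an attack.
    min_attacks: attacks needed to count as a frequent attacker
        (20 in the article).
    """
    per_user = Counter(attack_authors)
    frequent = {user: n for user, n in per_user.items() if n >= min_attacks}
    total = sum(per_user.values())
    share = sum(frequent.values()) / total if total else 0.0
    return len(frequent), share


# Hypothetical log: one heavy attacker and 80 one-off attackers.
log = ["heavy_user"] * 20 + [f"user{i}" for i in range(80)]
count, share = share_from_frequent_attackers(log)
```

In this toy log a single account is responsible for a fifth of all attacks, the same shape of result the researchers report at a much larger scale.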

The algorithm was also able to identify a phenomenon often called the "pile-on." The researchers found that attacking comments are 22 times more likely to occur close to another attacking comment. "Personal attacks cluster together in time," the researchers write. "Perhaps because one personal attack triggers another." Though this shouldn't be surprising to anyone who has ever taken a peek at Twitter, being able to quantify this behavior is a boon for machine learning. It means that an algorithm might be able to identify a pile-on before it really blows up, and moderators could come in to de-escalate before things get really ugly.
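One simple way to put a number on clustering like this is to compare how often comments are attacks when another attack is nearby in time versus when none is. The sketch below does that for a single discussion. The one-hour window, the data layout, and the function name are assumptions; the article does not describe how the researchers arrived at their figure of 22.

```python
def pile_on_ratio(comments, window=3600):
    """Compare attack rates near and far from other attacks.

    comments: list of (timestamp_seconds, is_attack) for one discussion.
    window: how close in time counts as "near" (hypothetical: one hour).
    Returns rate_near / rate_far, or None if the ratio is undefined.
    """
    attack_times = [t for t, is_attack in comments if is_attack]
    near = [0, 0]  # [attacks, total] for comments near another attack
    far = [0, 0]   # [attacks, total] for all other comments
    for t, is_attack in comments:
        nearby = sum(1 for a in attack_times if abs(a - t) <= window)
        if is_attack:
            nearby -= 1  # do not count the comment as its own neighbor
        bucket = near if nearby > 0 else far
        bucket[0] += int(is_attack)
        bucket[1] += 1
    if near[1] == 0 or far[1] == 0 or far[0] == 0:
        return None
    return (near[0] / near[1]) / (far[0] / far[1])


# Hypothetical thread: two attacks close together, one isolated attack.
thread = [
    (0, False), (100, True), (200, True), (300, False),
    (10000, False), (20000, False), (30000, True), (40000, False),
]
ratio = pile_on_ratio(thread)
```

A ratio well above 1 suggests attacks are bunching together, which is the signal an early-warning tool for moderators would watch for.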

Depressingly, the study also found that very few personal attacks are moderated. Only 17.9% of personal attacks lead to a warning or ban. Attackers are more likely to be moderated if they have launched a number of attacks or have been moderated before. But still, this is an abysmal rate of moderation for the most obvious and blatant form of abuse that can happen in a community.

The researchers conclude their paper by calling for more research. Wikipedia has released a dump of all talk-page comments made on the site from 2004 to 2015 via Figshare, so other researchers will have access to the same dataset that the Jigsaw and Wikimedia Foundation team did. Understanding how attacks affect other users is urgent, say the researchers. Do repeated attacks lead to user abandonment? Are some groups attacked more often than others? The more we know, the closer we get to having good tools to aid moderators. Such tools, the researchers write, "might be used to help moderators build dashboards that better visualize the health of Wikipedia conversations or to develop systems to better triage comments for review."

* UPDATE: To clarify, the researchers are describing non-registered users as "anonymous," as opposed to registered users who have linked pseudonyms (and occasionally real names). So the distinction here is between anonymous/unregistered and pseudonymous/registered. One of the researchers, Lucas Dixon, points out that the team discussed this distinction intensively with Wikipedians on the Wikimedia Meta-Wiki.
