Mapping hate speech to predict ethnic violence

In the months leading up to the Rwandan genocide of 1994, the radio station Radio Télévision Libre des Mille Collines blanketed the country with anti-Tutsi propaganda, inciting its Hutu listeners to "exterminate the cockroaches." During the genocide, the station took on an even more active role, reading out lists of people to be killed and their locations.

The role played by the station only became widely understood outside of Rwanda after the violence was over. Three of its former executives were eventually indicted by a U.N. tribunal for their part in the genocide. But what if the world had been monitoring Mille Collines before the killing started?

That’s the idea behind Hatebase, a new initiative from the Sentinel Project, a Canadian group that aims to use social media and other technology to identify early warning signals for ethnic conflict.

Hatebase has two main features. The first is a Wikipedia-like interface that allows users to identify hate speech terms by region and by the group they target. This could have some value for researchers on its own, but Hatebase's developers are especially excited by the second feature, which allows users to report sightings: instances in which they've heard or seen these terms used.

"The real value is the sightings," says Hatebase's developer Timothy Quinn. "As soon as you have logged incidents of hate speech you can start mapping that stuff, looking at frequency, severity, the migration of terms geographically. There's a whole lot of value when people start mapping it against the real world."
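The kind of aggregation Quinn describes can be sketched in a few lines. The sighting records and field names below are hypothetical, invented for illustration, not Hatebase's actual data model:

```python
from collections import Counter, defaultdict

# Hypothetical sighting records; Hatebase's real schema may differ.
sightings = [
    {"term": "cockroach", "region": "Kigali", "month": "1994-01"},
    {"term": "cockroach", "region": "Kigali", "month": "1994-02"},
    {"term": "cockroach", "region": "Butare", "month": "1994-02"},
]

# Frequency: how often each term is sighted in each region.
freq = Counter((s["term"], s["region"]) for s in sightings)

# Geographic migration: which regions report a term, month by month.
migration = defaultdict(set)
for s in sightings:
    migration[(s["term"], s["month"])].add(s["region"])

print(freq[("cockroach", "Kigali")])                # 2
print(sorted(migration[("cockroach", "1994-02")]))  # ['Butare', 'Kigali']
```

Once sightings are keyed by term, place, and time like this, the analyses Quinn lists fall out of simple grouping: frequency is a count, and migration is the changing set of regions reporting a term over time.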

He points to a real-world example: "If you were following events in Rwanda in the early '90s, you'd want to start looking for the word 'cockroach' as a metric. That's exactly the sort of thing we're gunning for."

Of course, every society sadly has some ambient level of hate speech. Spend a day reading comment threads on YouTube and you might get the impression that the United States is on the verge of a full-blown race war. The developers are therefore focusing their efforts primarily on regions where they believe there is already significant potential for ethnic conflict. Quinn notes that the tensions leading up to Kenya's recent presidential election provided many of the terms used to seed Hatebase's initial list of hate speech.

Sentinel Project Executive Director Christopher Tuckwood suggests another potential flashpoint. "Something that we've seen in Iran is an apparent correlation between when certain public officials make statements against the Baha'i religious minority and upticks in attacks against them – arson against their homes, vandalism, government raids, that sort of thing. Once we start to see early correlations between language and physical actions taken against a particular minority, we can do more accurate forecasting."
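The correlation Tuckwood describes could be checked with something as simple as a Pearson coefficient over weekly counts. The numbers below are invented purely for illustration; real forecasting would need far more data and care:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly counts (invented data): anti-minority statements
# by officials, and recorded attacks in the following week.
statements = [0, 2, 1, 5, 0, 3]
attacks = [1, 4, 2, 9, 1, 5]

print(round(pearson(statements, attacks), 2))  # 0.99
```

A strong coefficient like this only signals that the two series move together; establishing that rhetoric actually drives the attacks, as the Sentinel Project hopes to, requires lagged analysis against many other warning factors.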

Then there's what might be called the Human Stain problem. In Philip Roth's novel, a college professor loses his position for unthinkingly using the term "spooks," a slur for African-Americans but also, as he intended it, a word for ghosts. How can software identify terms that are only hateful in certain contexts?

"The problem is never going to be entirely solved, but we're looking for community-identified vocabulary and then mapping that against regionality," Quinn says. "If you were looking at Rwanda in the '90s and you started seeing a lot of sightings of 'cockroach' in a specific region, it could be that there's a cockroach outbreak, but in that context it's pretty likely it's not, and in any case it's a significant data point."
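Quinn's cockroach-outbreak caveat amounts to an anomaly check: a term only becomes interesting when its sighting rate in a region jumps well above its own baseline. A toy version of that check, with invented counts and an arbitrary threshold:

```python
def is_spike(history, current, threshold=3.0):
    """Flag a region/term pair when this month's sightings exceed the
    historical monthly average by more than `threshold` times."""
    baseline = sum(history) / len(history)
    # Floor the baseline at 1 so never-before-seen terms aren't
    # flagged by a single stray sighting.
    return current > threshold * max(baseline, 1)

# Invented monthly sighting counts for one term in one region.
past_months = [2, 1, 0, 3, 2]     # baseline of ~1.6 sightings/month
print(is_spike(past_months, 4))   # False: within normal variation
print(is_spike(past_months, 25))  # True: sudden surge worth flagging
```

Even then, as Quinn concedes, a flagged spike is a data point to investigate, not a verdict; context decides whether it means insects or incitement.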

The developers are quick to say that the data collected by Hatebase alone can’t be used to predict ethnic violence, but used in conjunction with other warning factors, it could provide some helpful clues for when words are about to spill over into actions.