
The internet has a personality problem. Abusive behaviour is a scourge in most corners of the internet, from website comments sections to social media to chat sessions in online games. Attempts to clamp down have yielded frustrating results – the concept is just too slippery. But a new way of automatically identifying the subtle linguistic fingerprints of true hate speech – and separating it from more benign uses of similar words – could finally help crack down on the worst offenders.

“Hate speech is notoriously difficult to detect,” says Dana Warmsley at Cornell University in New York. Simply using offensive language does not make someone abusive. People swear for all sorts of reasons. Friends call each other names for fun.

Most platforms rely on their users to report abuse. But human moderators cannot keep up with the torrent of objectionable content.


Another option is to detect hate speech automatically. Earlier this year, Google tried to assign comments a “toxic” score based on how similar they were to other phrases people had previously deemed offensive. However, the system’s shortcomings soon became clear: “you’re pretty smart for a girl” was rated 18 per cent similar to comments people had deemed toxic, whereas “i love Fuhrer” scored just 2 per cent.
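Google’s production system is a neural model, but the underlying similarity idea can be illustrated with a minimal sketch: score a new comment by its highest cosine similarity, over bag-of-words vectors, to a handful of phrases previously rated toxic. The phrases below are invented placeholders, not real training data.

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Lowercase bag-of-words vector as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter word vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def toxicity_score(comment, known_toxic):
    """Score a comment by its closest match among known-toxic phrases."""
    vec = bow(comment)
    return max(cosine(vec, bow(t)) for t in known_toxic)

# Invented stand-ins for phrases people previously flagged as toxic.
toxic_phrases = ["you are so dumb", "girls are dumb"]
print(round(toxicity_score("you are pretty dumb", toxic_phrases), 2))  # → 0.75
```

The sketch also shows why such systems misfire: a polite-sounding insult can share few words with any flagged phrase and score low, while an innocuous comment that happens to reuse flagged words scores high.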

So Haji Mohammad Saleem at McGill University in Montreal, Canada, and his colleagues went for a different approach. Instead of focusing on isolated words and phrases, they taught machine learning software to spot hate speech by learning how members of hateful communities speak. They trained their system on a data dump that contains most of the posts made to Reddit between 2006 and 2016. They focused on three groups who are often the target of abuse: African Americans, overweight people and women. For each of these, they chose the most active support and abuse groups on Reddit to train their software. They also took comments from Voat – a forum site similar to Reddit – as well as individual websites dedicated to hate speech.
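The team’s actual pipeline is trained on years of Reddit data, but the core idea — let community membership stand in for hand-labelling — can be sketched with a tiny Naive Bayes text classifier. The posts below are invented placeholders, not quotes from any real community.

```python
from collections import Counter
from math import log

# Invented stand-ins for posts scraped from a support community and a hate
# community; the community a post came from supplies its label.
support_posts = ["you deserve respect and kindness",
                 "stay strong we support you"]
hate_posts = ["they are animals and deserve nothing",
              "those people ruin everything"]

def train(posts_by_label):
    """Per-label word counts and the shared vocabulary."""
    counts = {lbl: Counter(w for p in posts for w in p.split())
              for lbl, posts in posts_by_label.items()}
    vocab = {w for c in counts.values() for w in c}
    return counts, vocab

def classify(text, counts, vocab):
    """Pick the label with the highest Laplace-smoothed log-likelihood."""
    scores = {}
    for lbl, c in counts.items():
        total = sum(c.values())
        scores[lbl] = sum(log((c[w] + 1) / (total + len(vocab)))
                          for w in text.split() if w in vocab)
    return max(scores, key=scores.get)

counts, vocab = train({"support": support_posts, "hate": hate_posts})
print(classify("those animals ruin everything", counts, vocab))  # → hate
```

Because the model learns whole patterns of usage rather than a fixed word list, it can pick up on terms like “animals” that are only abusive in context.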

Offensive speech

The team found that their approach produced fewer false positives than a keyword-based detector, and it also caught abuse that keyword filters miss entirely. For example, it was able to flag comments that contained no offensive keyword, such as “I don’t see the problem here. Animals attack other animals all the time,” in which the term “animals” was being used as a racist slur.

“Comparing hateful and non-hateful communities to find the language that distinguishes them is a clever solution,” says Thomas Davidson at Cornell University, who – together with Warmsley – has trained machine learning software to tell the difference between offensive and inoffensive speech by feeding it hand-picked examples of both.

But he is not convinced the solution is as widely applicable as Saleem and his colleagues suggest. The team tested their system on Reddit comments but they have not shown that it will catch targeted abuse on Twitter or Facebook, for example.

“It’s a sensible approach but it won’t catch everything,” says Joanna Bryson at the University of Bath, UK. The system still missed clearly offensive speech, such as “Black people are terrible”, and other comments that are overtly racist or fat-shaming. Bryson points out that a keyword-based approach would have caught these comments.
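The keyword check Bryson describes is trivial to build, which is part of why it remains a useful backstop. A minimal sketch, with an invented and deliberately tiny blocklist:

```python
# Invented, abbreviated blocklist; real moderation lists are far larger
# and curated by hand.
BLOCKLIST = {"terrible", "disgusting"}

def keyword_flag(comment):
    """Flag a comment if any word, stripped of punctuation, is blocklisted."""
    return any(word.strip(".,!?").lower() in BLOCKLIST
               for word in comment.split())

print(keyword_flag("Black people are terrible"))                  # → True
print(keyword_flag("Animals attack other animals all the time"))  # → False
```

The two example calls show the trade-off from both sides: the blocklist catches the explicit insult the learned system missed, but sails past the coded one the learned system caught.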

But it could be yet another tool in the hands of competent moderators. Despite the difficulties, there may be no way to avoid keeping humans in the loop. “Ultimately, hate speech is a subjective phenomenon that requires human judgment to identify,” says Davidson.

Journal reference: arXiv.org/abs/1709.10159