Conversation AI is an offshoot of Google Brain, one of the most successful of Google’s “moonshot” projects. Google Brain has helped revolutionize the field of machine learning through large-scale neural networks, and it has given Google advantages such as software that is more skillful than humans at recognizing images. But Conversation AI won’t be able to defeat online abuse. Though Jigsaw’s stated goal is to “fight the rise of online mobs,” the program itself is a far more modest, and therefore more plausible, project: it will primarily streamline the community moderation that humans perform today. So even if it cannot neutralize the worst behavior online, it might foster more and better discourse on some sites.

Allusion detection

Jigsaw is launching Conversation AI at the New York Times, where it will be rolled out in a few months to help the company manage its online comments. Human moderators currently review nearly every comment published on the site. Right now, Conversation AI is reading 18 million of those comments, learning to detect each category of comment that gets rejected: insubstantial, off-topic, spam, incoherent, inflammatory, obscene, attack on commenter, attack on author, and attack on publisher.

The Times’s goal is not necessarily to reduce abuse in its comments, a problem it already considers under control. Instead, it hopes to reduce the human moderators’ workload. “We don’t ever expect to have a system that’s fully automated,” Erica Greene, engineering manager of the New York Times community team, told me. Times community editor Bassey Etim estimates that somewhere between 50 and 80 percent of comments could eventually be auto-moderated, freeing up employees to devote their efforts to creating more compelling content from the paper’s comment sections.

The New York Times site poses very different problems from the real-time free-for-all of Twitter and Reddit. And given the limitations of machine learning as it exists today, Conversation AI cannot possibly fight abuse in the Internet’s wide-open spaces. For all its dazzling achievements, machine learning still hasn’t cracked human language, where patterns like the ones it can find in the game of Go or in images prove diabolically elusive.

Illustrations by Erik Carter

The linguistic problem in abuse detection is context. Conversation AI’s comment analysis doesn’t model the entire flow of a discussion; it matches each comment, in isolation, against learned models of what constitutes a good or bad comment. For example, a comment on the New York Times site might be deemed acceptable if it shares words, phrases, and other features with comments that moderators have previously approved. But Greene says Google’s system frequently flagged comments on articles about Donald Trump as abusive because they quoted him using words that would get a comment rejected if they came from a reader. For these sorts of articles, the Times will simply turn off automatic moderation.
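To make the context problem concrete, here is a deliberately tiny Python sketch of a classifier that, like the system Greene describes, judges each comment in isolation. It is a toy multinomial naive Bayes model with add-one smoothing, not Conversation AI’s actual model, and it makes no claim about how Google’s system is built; the `NaiveBayes` class, the `approve`/`reject` labels, and the training sentences are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy training data: (comment, moderator decision).
# Invented for illustration; not drawn from the Times's 18 million comments.
TRAINING = [
    ("great reporting", "approve"),
    ("fair point about taxes", "approve"),
    ("i disagree with the senator", "approve"),
    ("you are an idiot", "reject"),
    ("what an idiot", "reject"),
    ("only an idiot believes this", "reject"),
]
LABELS = ("approve", "reject")


def tokenize(text: str) -> list:
    # Lowercase and keep only runs of letters. Quotation marks are discarded,
    # so a quoted word looks exactly like one the commenter used directly.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())

    def log_score(self, text, label):
        # log P(label) + sum of log P(word | label) over the comment's words.
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        # Each comment is scored on its own words only; the thread it appears
        # in, and who is being quoted, never enter the calculation.
        return max(LABELS, key=lambda label: self.log_score(text, label))


model = NaiveBayes()
model.fit(TRAINING)

print(model.predict("you are an idiot"))                          # reject
print(model.predict('The senator called his rival an "idiot".'))  # reject
print(model.predict("great reporting on taxes"))                  # approve
```

Because the toy model sees only individual words, a reader quoting an insult is scored much like a reader hurling one, which mirrors the problem Greene describes with comments quoting Trump. Knowing who is being quoted, or how the surrounding discussion has unfolded, requires information this kind of per-comment approach never receives.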

It’s impossible, then, to see Conversation AI faring well on a wide-open site like Twitter. How would it detect the Holocaust allusions in abusive tweets sent to the Jewish journalist Marc Daalder: “This is you if Trump wins,” with a picture of a lampshade, and “You belong here,” with a picture of a toaster oven? Recognizing the abuse requires historical knowledge and cultural context that a machine-learning algorithm could pick up only if it had been trained on very similar examples. Even then, how would it distinguish between abuse and the same picture captioned “This is what I’m buying if Trump wins”? The semantic and practical knowledge required is beyond anything current machine learning even aims at.

Consequently, a dedicated Twitter troll will no doubt find a novel way of expressing abuse that evades a system like Conversation AI. By blocking some comments, machine learning could do a decent job of getting commenters to stop casually calling each other “fags” and “homos,” if that’s the goal. But machine learning will not be able to foil a person hell-bent on insinuating that someone is queer.

In other words, Conversation AI will make moderation more efficient in communities that already tend to be fairly well behaved. It cannot root out the worst of the abuse we hear about, which frequently shows up on sites with minimal moderation standards. Policing abuse on Twitter and Reddit is impossible without fundamentally altering the nature of those platforms.

Gated communities


Facebook’s success is a reminder that most people, and certainly most companies, prefer a relatively sheltered and controlled environment to one where strangers can intrude into others’ business and start fights. So if Conversation AI or similar tools make such control easier and more efficient to exercise, that too is a reminder that “solving” the abuse problem, whether through human or automated means, requires moving away from maximal inclusivity as the highest ideal online. Even seemingly “open” communities such as Stack Exchange and MetaFilter require constant moderator intervention and community policing. Truly anarchic communities, such as Twitter, 4chan, and some subreddits, are the exceptions online, not the rule. Nor are anarchic communities moneymakers: Twitter has had trouble attracting a buyer, partly because of its reputation for abusive content, while Reddit has suffered high staff turnover and has struggled to make money. The Wild West nature of those sites will only become more apparent if tools like Conversation AI make moderated sites function even better.

It’s worth noting one big potential downside. Because Conversation AI is being trained to approve content that hews to certain lexical, grammatical, and stylistic guidelines, it won’t just filter out abusive content. It could also tend to strike diverse content that simply departs from those norms. That raises questions about what censorship-minded governments could do with it. Just as the Times curates its communities, so too can the governments of Turkey and China curate theirs. While Jigsaw efforts like Project Shield aim to protect politically sensitive websites from distributed denial-of-service attacks, Conversation AI makes it easier to filter out unwanted speech. The question is, unwanted by whom? There is no label on the box that says, “Use only to prevent abuse.”

David Auerbach is writing a book on human and computer languages and their convergence, to be published by Pantheon. He worked for 11 years as a software engineer at Google and Microsoft, primarily in server infrastructure.