Google’s Jigsaw research division and the Wikimedia Foundation collaborated on an automated “harassment detection” tool called Detox to help identify hostile comments on Wikipedia. Users recently found the tool would rate comments directed at women as more hostile than the same comments directed at men. Foundation staff shut down the tool after it was also found to treat statements by people identifying themselves as gay as hostile.

The tool is related to Google’s Perspective API, which has been involved in similar controversies over its reliability. Detox was used in 2017 to analyze the “toxicity” of the Wikipedia community, and that analysis received significant media coverage.

Detox was part of a collaboration between the Wikimedia Foundation and Google’s Jigsaw division. Its contributors included the Wikimedia Foundation’s then-Head of Research, Dario Taraborelli, and Jigsaw’s then-Chief Scientist, Lucas Dixon. Taraborelli has since joined the science division of the Chan Zuckerberg Initiative, established by Facebook founder Mark Zuckerberg and his wife Priscilla Chan.

Members of Wikipedia criticism site Wikipediocracy recently began experimenting with the public demo version of the tool to test its reliability, as did Wikipedia editors. The tool rates how “attacking” and how “aggressive” any submitted comment is on a scale from zero to one, with one denoting a highly hostile comment. On both sites, members found the tool easy to trick by using synonyms for curse words or by making more passive-aggressive comments. It also regularly misidentified certain phrases as attacks.

One member of Wikipediocracy, Mendaliv, found that while the term “ban” was marked as 30 percent hostile, this percentage was cut in half when the comment was phrased as a suggestion to ban a man. If the comment was instead phrased as a suggestion to ban a woman, it was rated roughly twice as hostile as a bland ban suggestion and about four times as hostile as a suggestion to ban a man. Additional rephrasing bore this out: in every iteration, comments directed at women were treated as more hostile.

In relation to the #WikiMassacre, Wikipedia editors recently noticed a "Detox" tool created by researchers from the Wikimedia Foundation and Google's Jigsaw research division. Various issues were identified after editors and members of Wikipediocracy began experimenting with it. pic.twitter.com/DgaB7ArYmv — T. D. Adler (@tdadler) June 25, 2019

Criticism of the Detox tool emerged after it was brought up in connection with the recent banning of a veteran administrator by the Wikimedia Foundation. Users suggested some form of automated system may have been used to examine the conduct of the administrator, though the Foundation has denied the allegation.

Following complaints that some results treated statements in which the speaker identifies as gay as hostile while rating antisemitic comments as not hostile, a Foundation staffer had the tool shut down. These issues replicated known problems with the related Perspective API, which journalist and former Google programmer David Auerbach highlighted early on in MIT Technology Review. Breitbart’s own investigation into Perspective found it gauged comments about Muslims as more hostile than comments about Christians.

Before the Detox tool and the related Perspective API were publicized and scrutinized, they were already used as the basis for research into the “toxicity” of Wikipedia’s community. This included various claims about who engages in attacks and how often attacks lead to accounts being sanctioned. Media outlets subsequently covered these claims and the underlying research uncritically. It was suggested the research and tool could form the basis of future automated tools to intervene in harassment cases.

The search for new means to police Wikipedia’s community is a response to frequent claims that “toxicity” is inhibiting inclusivity. While Wikipedia has long exhibited a left-wing political bias, progressive media still attack the online encyclopedia over its “gender gap” in contributors and other diversity issues, prompting a push for more action on harassment and incivility on the site. Recent events suggest the Foundation is losing trust in the ability of Wikipedia’s community to manage the issue itself.

T. D. Adler edited Wikipedia as The Devil’s Advocate. He was banned after privately reporting conflict of interest editing by one of the site’s administrators. Due to previous witch-hunts led by mainstream Wikipedians against their critics, Adler writes under an alias.