Machine learning systems are already being widely used to flag potentially problematic content to a human workforce. For example, Facebook and YouTube are using machine learning-powered software to scan and flag content to human moderators. Meanwhile, Perspective API, developed by Google Jigsaw, has been used to flag potentially inappropriate content for review on both Wikipedia and the New York Times comments section.

In a letter to Amnesty International, Twitter has called machine learning “one of the areas of greatest potential for tackling abusive users”. Twitter CEO Jack Dorsey has similarly said that “We think that we can reduce the amount of abuse and create technology to recognize it before a report has to be made.” Twitter has also said that it is focused on machine learning in an effort to combat spam and automated accounts, and that it has begun acting against abusive accounts that have not yet been reported.

However, the trend towards using machine learning to automate content moderation online also poses risks to human rights. For example, David Kaye, the UN Special Rapporteur on Freedom of Expression, has noted (paras 32-35) that “automation may provide value for companies assessing huge volumes of user-generated content.” He cautions, however, that such tools can be less useful, or even problematic, in subject areas that require an analysis of context.

We have already seen that there can be serious human rights consequences when algorithms mistakenly censor content. In June 2017, Google announced “four steps intended to fight terrorism online”, among them more rigorous detection and faster removal of content related to ‘violent extremism’ and ‘terrorism’. The automated flagging and removal of content resulted in the accidental removal of hundreds of thousands of YouTube videos uploaded by journalists, investigators, and human rights organizations.

The simple reality is that using machine learning necessarily means accepting a margin of error. For example, weighting an algorithm towards greater precision increases the certainty that detected tweets are genuinely abusive, at the risk of missing abusive content which is more subtle (equivalent to casting the net too narrow). On the other hand, weighting an algorithm towards greater recall would capture a wider range of abusive content, at the risk of also capturing false positives, that is to say, content that should be protected as legitimate speech (equivalent to casting the net too wide). These trade-offs are value-based judgements with serious implications for freedom of expression and other human rights online.
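The precision/recall trade-off described above can be made concrete with a small sketch. The snippet below uses entirely hypothetical classifier scores and labels (not data from any real abuse-detection system) to show how raising or lowering a decision threshold narrows or widens the net:

```python
# Illustrative sketch with hypothetical data: how a classifier's decision
# threshold trades precision (certainty that flagged content is abusive)
# against recall (share of abusive content actually caught).
# labels: 1 = genuinely abusive, 0 = legitimate speech
labels = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.80, 0.62, 0.58, 0.45, 0.40, 0.30, 0.22, 0.15, 0.05]

def precision_recall(threshold):
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)  # abuse correctly flagged
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)  # legitimate speech wrongly flagged
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)   # abuse missed
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for t in (0.6, 0.3):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# A high threshold (net cast narrow) flags only the clearest cases:
# precision is high but subtler abuse is missed (lower recall).
# A low threshold (net cast wide) catches more abuse (higher recall)
# but sweeps in legitimate speech as false positives (lower precision).
```

On this toy data, the high threshold yields perfect precision but misses two of five abusive items, while the low threshold recovers more abuse at the cost of three wrongly flagged legitimate posts; which point on that curve to choose is exactly the value judgement the text describes.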

Amnesty International and Element AI’s experience using machine learning to detect online abuse against women highlights the risks of leaving it to algorithms to determine what constitutes abuse. As it stands, automation may have a useful role to play in assessing trends or flagging content for human review, but it should, at best, be used to assist trained moderators, and certainly should not replace them. Human judgement by trained moderators remains crucial for contextual interpretation, such as examining the intent, content and form of a post, as well as assessing compliance with policies. It is vital that companies are transparent about how exactly they are using automated systems within their content moderation processes and that they publish information about the algorithms they develop.