Machine learning is one of the biggest drivers of artificial intelligence technology at present. Machine learning algorithms have written code and played poker, and are being used in attempts to tackle cancer. Yet there is a bias problem.

Using the popular GloVe algorithm, trained on around 840 billion words from the internet, three Princeton University academics have shown that AI applications replicate the stereotypes present in human-generated data.

These prejudices relate to both race and gender. Machine learning, the computer scientists write in a paper published in Science, "absorbs stereotyped biases" when learning language.

"The reason we've been accelerating so quickly recently is because we've got very good at mining human intelligence and putting that into AI," Joanna Bryson, a computer scientist from Princeton University and the University of Bath, tells WIRED. "We're getting better than humans now – in some areas like lipreading."

She continues: "We have learned something about how we are passing on prejudices that we didn't even know we were doing".


In the research, Bryson and colleagues Aylin Caliskan and Arvind Narayanan used the Implicit Association Test (IAT) to determine where biases exist. Since it was developed in the 1990s, this type of test has been used in psychological studies to assess human traits.

In the test, volunteers are asked to pair word concepts displayed on a computer screen, and the time it takes them to make each match is measured. The paper says 'rose' is often matched with pleasant ideas, such as 'love', whereas 'moth' is associated with unpleasant ones.

The Princeton scientists adapted the test to work with the GloVe algorithm and used text from the internet to experiment with word pairing. Bryson says that in every IAT completed, the machine learning system had learned biases from human language.


"A bundle of names associated with being European American was found to be significantly more easily associated with pleasant than unpleasant terms, compared to a bundle of African American names," the authors write in the research paper.

In another instance, female names were found to be more associated with family words than career words when compared to male names. The IATs completed by the machine learning system replicated the results seen when humans take the test.
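To picture how a test built on human reaction times might be run on word vectors instead, here is a minimal Python sketch. It rests on assumptions the article does not spell out: that each word is represented as a vector, and that the strength of an association is measured with cosine similarity. The two-dimensional vectors, the extra attribute words and the function names are all invented for illustration. They are not real GloVe values and not the researchers' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(word_vec, pleasant, unpleasant):
    """Mean similarity to the pleasant words minus mean similarity
    to the unpleasant words. Positive means 'leans pleasant'."""
    mean_pleasant = sum(cosine(word_vec, p) for p in pleasant) / len(pleasant)
    mean_unpleasant = sum(cosine(word_vec, u) for u in unpleasant) / len(unpleasant)
    return mean_pleasant - mean_unpleasant

# Hypothetical toy vectors (assumption): real GloVe vectors have
# hundreds of dimensions and are learned from text.
vectors = {
    "rose": [0.9, 0.1],
    "moth": [0.1, 0.9],
    "love": [1.0, 0.0],
    "pleasure": [0.8, 0.2],
    "filth": [0.0, 1.0],
    "ugly": [0.2, 0.8],
}

pleasant = [vectors["love"], vectors["pleasure"]]
unpleasant = [vectors["filth"], vectors["ugly"]]

rose_score = association(vectors["rose"], pleasant, unpleasant)
moth_score = association(vectors["moth"], pleasant, unpleasant)

print(f"rose: {rose_score:+.2f}")
print(f"moth: {moth_score:+.2f}")
```

With these made-up numbers, 'rose' scores positive and 'moth' scores negative. Swapping in sets of names and sets of career or family words would give the same kind of comparison the article describes, though the statistical tests used in the paper are not reproduced here.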

The findings of the study do not come as a massive surprise, as machine learning systems can only interpret the data they are trained on. In May 2016, ProPublica reported that software used across the US to predict future crimes and criminals was biased against African Americans, as the data used within it was not accurate.

Bryson says language learning of this kind is not yet being used in many technologies, but the work does highlight how easily prejudices and biases can be transferred.


"The main thing about the machines is that we don't want to have an AI system that we train up on present culture or two-year-old culture and then freeze that," Bryson says. She adds that as AI systems develop alongside culture, they should be continually updated and retrained on new data.

"Our findings suggest that if we build an intelligent system that learns enough about the properties of language to be able to understand and produce it, in the process it will also acquire historic cultural associations, some of which can be objectionable," the paper concludes.

The paper, "Semantics derived automatically from language corpora contain human-like biases", was published in Science.