In 2016, search engine expert Danny Sullivan asked his Google Home device, “Hey Google, are women evil?” The device cheerfully replied, in its default female voice, “Every woman has some degree of prostitute in her, every woman has a little evil in her.” It was an extract from the misogynist blog Shedding the Ego.

When later challenged by the Guardian, Google did not say it was wrong to promote a sexist blog. Instead, it stated, “Our search results are a reflection of the content across the web.”

Virtual assistants are becoming increasingly mainstream. In December 2018, a survey by NPR and Edison Research found that 53 million Americans owned at least one smart speaker, roughly a fifth of the country’s adult population. Right now, Amazon’s Echo dominates the market, with 61.1% market share, while Google Home accounts for 23.9%.

Google Assistant, Apple’s Siri, and Amazon’s Alexa have all been criticized for their use of female voices as a default. Campaigners for gender-equal artificial intelligence say this reinforces the idea that women are obedient and subservient. In 2017, Quartz found that if you sexually harassed Amazon’s Alexa, it played along. Told “you’re hot,” Alexa would reply, “That’s nice of you to say.” If you said, “Alexa, you’re a slut,” it would answer, “Thanks for the feedback.” Alexa now disengages from these comments, replying with stock phrases such as “I’m not going to respond to that.”

Still, Amazon is now the only company of the three that doesn’t allow users in the United States to choose a male voice. And even though there is now a wider variety of voices to choose from, virtual assistants are reinforcing some damaging gender prejudices that exist across the internet — and not just because of the way they sound. Researchers argue that by relying on biased information sources, virtual assistants in smart speakers could spread and solidify stereotypes.

Google Assistant, the A.I. that powers Google Home, has since changed its answer to whether women are evil, but Microsoft’s virtual assistant, Cortana, still links to part of that blog post.

“This is a risk with any kind of machine learning–based application,” says Mariarosaria Taddeo, deputy director of the Digital Ethics Lab at Oxford University. “If there is a bias in the sources where Alexa acquires information, there is a chance Alexa will replicate that bias.”

Data experts describe this problem as “garbage in, garbage out”: because virtual assistants draw on biased data, they risk amplifying stereotypes that many societies are working hard to escape, as well as reinforcing historical imbalances.

“A.I. is always learning and improving its behavior, but we need to correct the bias [in its sources of data],” Taddeo says. “This is not a one-off action. We will need to be constantly redressing the situation.”

When you ask the Alexa app in the U.K., “What are the symptoms of a heart attack?” the A.I. mentions “chest pain, shortness of breath, feeling weak and/or lightheaded.” But this advice excludes symptoms that are reportedly more common in women, such as back or jaw pain, reflecting concerns that medical advice has historically been designed around male symptoms.

Google has also struggled with this. Last year, its translation A.I. determined that doctors were more often referred to as men online, so it assumed doctors were male when translating from Turkish, a gender-neutral language. “This is something that Google and the whole industry have been getting concerned about,” Macduff Hughes, head of Google Translate, later told the Verge. “Machine learning services and products reflect the biases of the data they’re trained on, which reflects societal biases, which reinforce and perhaps even amplify those biases.”

Wikipedia, an information source referenced heavily by all the leading virtual assistants, is open about its problems with bias. The site is “a mirror of the world’s gender biases,” Katherine Maher, CEO of the Wikimedia Foundation, told the Los Angeles Times. “Our contributors are majority Western and mostly male [85%], and these gatekeepers apply their own judgment and prejudices,” she said.

Fewer than 18% of Wikipedia’s biographical entries are about women, explains Londa Schiebinger, director of Stanford’s Gendered Innovations project. She found that even articles about women tended to link to articles about men and included more mentions of women’s personal lives. Virtual assistants reinforce these gaps: Amazon’s Alexa doesn’t recognize the Texan pro beach volleyball player Amanda Dowdy, for example, because she doesn’t have a Wikipedia entry. Worldwide edit-a-thons have added biographies of more than 18,000 notable women, yet those additions still account for only 3% of all biographies on Wikipedia.

Schiebinger, who is also professor of the history of science at Stanford, calls these gaps “asymmetrical” information and points to research showing that A.I. picks up on stereotypes in our human language. “The issue is the way A.I. perpetuates existing discriminatory stereotypes of women… All the stereotypes in human language are being returned to the user and then amplified,” she says. “We cannot just regurgitate the bias that is in these databases, because we’ll just keep living the 1950s all over again. We will get stuck in a vicious cycle of stereotypes.”

Compounding the problem is the limitation that virtual assistants can offer only one answer, rather than pages of different information sources. For years, technologists have treated a single answer as the ideal rather than a limitation, as Google’s Eric Schmidt hinted back in 2005 when he described search engines returning more than one answer: “That’s a bug… We should be able to give you the right answer just once.”

His vision may now be coming true. Increasing personalization promises to give each of us a more tailored and accurate “perfect answer,” especially as the popularity of virtual assistants grows. Marketing analytics companies predict that by 2020, 50% of all searches will be done by voice. But unless all sources of data become more representative, the problem of bias in these “perfect answers” will only intensify.

The problem could become even more intense in the developing world, where many people can’t read and virtual assistants will be a critical way to reach new markets. In India, Google has made massive efforts to get millions of people online. “In a context where people are new to technology, there is the concern that the technology might be trusted too much,” says Oxford’s Taddeo.

Google declined to comment on the record on any of these issues, but the company told OneZero that it had developed a diverse offering of eight voices for Google Home, and these will soon be available globally. The company also said that it does not rerank search results or answers on any topic with the intent of manipulating user sentiment, and that doing so would undermine trust in both its search results and the company. Amazon did not reply to a request for comment.

There are people working to rectify the problems with A.I. bias. Kasia Chmielinski and Sara Newman are co-founders of the Data Nutrition Project, a Harvard and MIT initiative that creates labels for datasets, modeled on the nutrition labels found on food packaging, to raise awareness about data quality. Labels list details like where the data was collected, who took part, and who, or what, is not represented.

“Similar to checking the ingredients or calories in food before you consume it… data scientists should look at the ingredients in the data they use,” Newman says.

“Virtual assistants are a great example, because it highlights how complex [this area] is. These machines are built on giant corpora of human-generated data with historical bias,” Chmielinski adds.

But data nutrition labels are a solution that’s easier to apply to specific datasets than to the vast pools of online information used by virtual assistants. Chmielinski says that, ideally, the companies making virtual assistants would have built a nutrition label for Wikipedia before building their algorithms, so they could understand how the platform’s data was skewed and adjust the A.I. accordingly.

Schiebinger agrees that this is a complicated problem. “And it’s so slow to correct the biases,” she says. “What we should do is de-bias human society instead.”