Photo by Adrian Tombu

Artificial intelligence is infiltrating our daily lives, with applications that curate your phone pics, manage your email, and translate text from any language into another. Google, Facebook, Apple, and Microsoft are all heavily researching how to integrate AI into their major services. Soon you’ll likely interact with an AI (or its output) every time you pick up your phone. Should you trust it? Not always.




AI can analyze data more quickly and accurately than humans, but it can also inherit our biases. To learn, it needs massive quantities of data, and the easiest way to find that data is to feed it text from the internet. But the internet contains some extremely biased language. A Stanford study found that an internet-trained AI associated stereotypically white names with positive words like “love,” and black names with negative words like “failure” and “cancer.”

Luminoso Chief Science Officer Rob Speer oversees the open-source data set ConceptNet Numberbatch, which is used as a knowledge base for AI systems. He tested one of Numberbatch’s data sources and found obvious problems with its word associations. When fed the analogy question “Man is to woman as shopkeeper is to...” the system filled in “housewife.” It similarly associated women with sewing and cosmetics.


While these associations might be appropriate for certain applications, they would cause problems in common AI tasks like evaluating job applicants. An AI doesn’t know which associations are problematic, so it would have no problem ranking a woman’s résumé lower than an identical résumé from a man. Similarly, when Speer tried building a restaurant review algorithm, it rated Mexican food lower because it had learned to associate “Mexican” with negative words like “illegal.”

So Speer went in and de-biased ConceptNet. He identified inappropriate associations and adjusted them to zero, while maintaining appropriate associations like “man/uncle” and “woman/aunt.” He did the same with words related to race, ethnicity, and religion. To fight human bias, it took a human.
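For readers curious what "adjusting an association to zero" can look like in practice, here is a minimal Python sketch. It is not Speer's actual method, which this article doesn't detail. It assumes a common projection-based approach: treat the difference between the vectors for "man" and "woman" as a gender direction, then subtract that component from words that shouldn't carry it. The tiny three-number vectors, the `keep_gendered` set, and the helper names are all made up for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def remove_component(v, direction):
    """Return v with its component along the unit vector `direction` removed."""
    scale = dot(v, direction)
    return [x - scale * d for x, d in zip(v, direction)]

# Hypothetical toy vectors; real word embeddings have hundreds of dimensions.
vectors = {
    "man": [1.0, 0.0, 0.2],
    "woman": [-1.0, 0.0, 0.2],
    "uncle": [0.9, 0.1, 0.5],
    "aunt": [-0.9, 0.1, 0.5],
    "shopkeeper": [0.4, 0.8, 0.1],
}

# Assumed bias direction: the normalized difference between "man" and "woman".
gender_direction = normalize(
    [m - w for m, w in zip(vectors["man"], vectors["woman"])]
)

# Words whose gender association is appropriate and should be left alone.
keep_gendered = {"man", "woman", "uncle", "aunt"}

debiased = {
    word: vec if word in keep_gendered else remove_component(vec, gender_direction)
    for word, vec in vectors.items()
}

# How strongly "shopkeeper" leans along the gender direction, before and after.
before = dot(vectors["shopkeeper"], gender_direction)
after = dot(debiased["shopkeeper"], gender_direction)
print(f"shopkeeper gender association before: {before:.2f}, after: {after:.2f}")
```

In this toy setup, "shopkeeper" loses its lean toward "man" while "uncle" and "aunt" keep theirs. Deciding which words belong in the keep list is the part that, as the article says, takes a human.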

Numberbatch is the only semantic database with built-in de-biasing, Speer says in an email. He’s happy for this competitive advantage, but he hopes other knowledge bases will follow suit:

“This is the threat of AI in the near term. It’s not some sci-fi scenario where robots take over the world. It’s AI-powered services making decisions we don’t understand, where the decisions turn out to hurt certain groups of people.”


The scariest thing about this bias is how invisibly it can take over. According to Speer, “some people [will] go through life not knowing why they get fewer opportunities, fewer job offers, more interactions with the police or the TSA...” Of course, he points out, racism and sexism are baked into society, and promising technological advances, even when explicitly meant to counteract them, often amplify them. There’s no such thing as an objective tool built on subjective data. So AI developers bear a huge responsibility to find the flaws in their AI and address them.

“There should be more understanding of what’s real and what’s hype,” Speer says. “It’s easy to overhype AI because most people don’t have the right metaphors to understand it yet, and that stops people from being appropriately skeptical.


“There’s no AI that works like the human brain,” he says. “To counter the hype, I hope we can stop talking about brains and start talking about what’s actually going on: it’s mostly statistics, databases, and pattern recognition. Which shouldn’t make it any less interesting.”