by Priyanka Boghani

Artificial intelligence has already started to shape our lives in ubiquitous and occasionally invisible ways. In its new documentary, In the Age of AI, FRONTLINE examines the promise and peril of this technology. AI systems are being deployed by hiring managers, courts, law enforcement, and hospitals — sometimes without the knowledge of the people being screened. And while these systems were initially lauded for being more objective than humans, it’s fast becoming clear that the algorithms harbor bias, too.

It’s an issue Joy Buolamwini, a graduate researcher at the Massachusetts Institute of Technology, knows about firsthand. She founded the Algorithmic Justice League to draw attention to the issue, and earlier this year she testified at a congressional hearing on the impact of facial recognition technology on civil rights.

“One of the major issues with algorithmic bias is you may not know it’s happening,” Buolamwini told FRONTLINE. We spoke to her about how she encountered algorithmic bias, about her research, and what she thinks the public needs to know.

This interview has been edited for length and clarity.

On her first encounter with algorithmic bias.

The first time I had issues with facial detection technology was actually when I was an undergraduate at Georgia Tech, and I was working on a robot. The idea with this robot was to see if I could get it to play peek-a-boo with me. And peek-a-boo doesn’t really work if your robot can’t see you, and my robot couldn’t see me. To get my project done, I borrowed my roommate’s face. She was lighter skinned than I was. …That was my first time really using facial analysis technology and seeing that it didn’t work for me the same way it worked for other people. …

I went on to do many things and became a graduate student at MIT and I started working on projects that used facial analysis technology, face detection. So one project I did was something called the Aspire Mirror. You look into a mirror, a camera detects your face and then a lion can appear on you, or you can be somebody you’re inspired by…

[I]t wasn’t detecting my face consistently, so I got frustrated. So what do you do when you get frustrated with your program? You debug. I started trying to figure out ways to make it work. I actually drew a face on my hand, and the system detected the face on my palm. And I was like, “Wait, wait, wait, if it’s detecting the face I just drew on my palm, then anything’s a possibility now.” So I looked around my office and the white mask was there. So I was like, “There’s no way! But why not?”

I pick up the white mask, and I put it on, and it’s instantaneous when I put on that white mask — I mean, the symbolism of it was not lost on me. This is ridiculous that the system can detect this white mask that is not a real person, but cannot necessarily detect my face. So this is really when I started thinking, “Okay, let’s dig a bit deeper with what’s going on with these systems.” …

On digging a bit deeper into facial analysis technology.

Here was a question: Do these systems perform differently on various faces? There was already a 2012 report that actually came out from an FBI facial analysis expert showing that facial recognition systems in particular worked better on white faces than black faces. They didn’t work as well on youthful faces. And they didn’t work as well on women as compared to men. This was 2012, and why I keep bringing that up is this was before the deep learning revolution…

Now we had a different approach that was supposed to be working much better. My question was, given these new approaches to facial analysis and facial recognition, are there still biases? Because what I’m experiencing, what my friends are experiencing — and what I’m reading about with reports that say, “Oh, we’ve solved face recognition,” or “We’re 97% accurate from benchmarks” — those reports were not lining up to my reality. …

What I focused on specifically was gender classification. …I wanted to choose something that I thought would be straightforward to explain, not that gender is straightforward — it’s highly complex. But insomuch as we were seeing binary gender classification, I thought that would be a place to start. By this time my weekend hobby was literally running my face through facial analysis and seeing what would happen. So some wouldn’t detect my face and others would label me male. And I do not identify as male. This is what led me down that corridor.

On finding the “gold standard benchmarks” were not representative.

When I ran this test, the first issue that I ran into, which gave me some more insight into the issue we’re talking about — algorithmic bias — was that our measures for how well these systems perform were not representative of the world. …We’ve supposedly done well on gold standard benchmarks. So I started looking at the benchmarks. These are essentially the data sets we use to analyze how well we’re doing as a research community or as an industry on specific AI tasks. So facial recognition is one of these tasks that people are benchmarked on all the time.

“What I started to see was something I call ‘power shadows’ — when either the inequalities or imbalances that we have in the world become embedded in our data.”

The thing is, we oftentimes don’t question the status quo or the benchmark. This is the benchmark, why would I question it? But sometimes the gold standard turns out to be pyrite. And that is what was happening in this case. When I went to look at the research on the breakdown of various facial analysis systems, what I found was one of the leading gold standards, Labeled Faces in the Wild, was over 70% male and 80% white. This is when I started looking into more and more data sets and seeing that you had massive skews. Sometimes you had massive skews because you were using celebrities. I mean, celebrities don’t necessarily look like the rest of the world. What I started to see was something I call “power shadows” — when either the inequalities or imbalances that we have in the world become embedded in our data. …
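A first-pass audit for this kind of benchmark skew is simple to script. The sketch below assumes you already have per-image demographic labels; the dataset and the counts are hypothetical, invented only to echo the kind of imbalance described above, not figures from any real benchmark:

```python
from collections import Counter

# Hypothetical per-image metadata for a face benchmark: (gender, skin type).
# These counts are made up for illustration; they are not real dataset figures.
metadata = (
    [("male", "lighter")] * 600 + [("male", "darker")] * 120 +
    [("female", "lighter")] * 230 + [("female", "darker")] * 50
)

# Tally each demographic axis and report its share of the benchmark.
total = len(metadata)
for axis, idx in (("gender", 0), ("skin type", 1)):
    counts = Counter(row[idx] for row in metadata)
    for label, n in counts.most_common():
        print(f"{axis:>9} {label:<8} {n / total:.0%}")
```

Running a check like this against a benchmark's metadata before trusting its headline accuracy is exactly the kind of question the passage argues researchers skip.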

All this to say, the measures that we had for determining progress with facial analysis technology were misleading because they weren’t representative of people — at least of the U.S. in that case. …We didn’t have data sets that were actually reflective of the world, so for my thesis at MIT, I created what I call the Pilot Parliaments Benchmark. I went to the UN Women website, I got a list of the top 10 nations in the world by their representation of women in parliament. …So I chose European countries and African nations to try to get a spread on opposite ends of skin types, lighter skin and darker skin. After I ran into this issue that the benchmarks were misleading, I needed to make the benchmark.

On what her research found.

Then finally, I could get to the research question. …So I wanted to know how accurate are they at this reduced task of binary gender classification — which is not at all inclusive — when it comes to guessing the gender of the face? And it turned out that there were major gaps. This was surprising because these were commercially sold products. … You know how the story goes. It turns out, the systems work better on male-labeled faces than female-labeled faces, they work better on lighter faces than darker-skinned faces.

But one thing we did for this study, which I would stress for anybody who’s thinking about doing research in algorithmic bias or concerned with algorithmic bias and AI harms, is we did an intersectional analysis. We didn’t just look at skin type. We didn’t just look at gender. We looked at the intersection. And the inspiration for this was from Kimberlé Crenshaw, a legal scholar who in 1989 introduced the term intersectionality. …What would happen with the analysis is if you did it in aggregate just based on race, or if you did it in aggregate based on just gender, you might find based on those axes that there isn’t substantial evidence of discrimination. But if you did it at the intersection, you would find there was a difference.

And so I started looking at the research studies around facial analysis technologies and facial recognition technologies and I saw that usually we just have aggregate results — just one number for accuracy. People are just optimizing for that overall accuracy, which means we don’t get a sense of the various ways in which the system performs for different types of people. It’s the differences in the performance, the accuracy disparities, that I was fascinated by, not just on a single axis but also at the intersection. So when we did the intersectional breakdown — oooh, it was crazy. …
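The contrast between aggregate and intersectional evaluation can be sketched in a few lines. The evaluation records below are invented to show how one overall accuracy number can hide a large subgroup gap; they are not the actual Gender Shades results:

```python
from collections import defaultdict

# Hypothetical evaluation records: (skin type, gender label, prediction correct?).
# Counts are fabricated for illustration only.
records = (
    [("lighter", "male", True)] * 99 + [("lighter", "male", False)] * 1 +
    [("lighter", "female", True)] * 93 + [("lighter", "female", False)] * 7 +
    [("darker", "male", True)] * 88 + [("darker", "male", False)] * 12 +
    [("darker", "female", True)] * 53 + [("darker", "female", False)] * 47
)

def accuracy(rows):
    """Fraction of rows where the classifier's prediction was correct."""
    return sum(ok for _, _, ok in rows) / len(rows)

# One aggregate number can look acceptable...
print(f"overall: {accuracy(records):.1%}")

# ...until you slice by the intersection of skin type and gender.
groups = defaultdict(list)
for row in records:
    groups[(row[0], row[1])].append(row)
for (skin, gender), rows in sorted(groups.items()):
    print(f"{skin} {gender}: {accuracy(rows):.1%}")
```

With these invented counts the overall accuracy exceeds 80% while the darkest-skinned female subgroup sits near 50% — the single optimized-for number never reveals the disparity.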

We weren’t doing anything to try to trick the system. It was an optimistic test. This is why I was very surprised, because even with this optimistic test, in the worst-case scenario for the darkest-skinned women, you actually had error rates as high as 47% on a binary classification task. …

I shared the results with the companies and I got a variety of responses. But I think the overall response, at least with the first study, was there was an acknowledgement of an issue with algorithmic bias.

On how AI is already affecting people’s lives.

There’s a paper that just came out in Science which is devastating, showing risk assessment algorithms used in health care… actually have racial bias against black patients. We’re talking about health care, where the whole point is to try to optimize the benefit. What they were seeing was that because the algorithm used how much money is spent on an individual as a proxy for how sick they were, it turned out to be a poor proxy: black patients who were actually sick were being rated as not as sick as they were. …

“When these systems fail, they fail most the people who are already marginalized, the people who are already vulnerable.”

You also have AIs that are determining the kind of ads people see. And so there have been studies that show you can have discriminatory ad targeting. Or you can have a situation where you have an ad for CEO and the system over time learns to present that CEO ad to mainly men. You were saying, how do you know if you’ve encountered bias — the thing is you might never know if you’ve encountered the bias. …Something that might happen to other people — you see phenotypic fails with passport renewals. So you have a report from a New Zealand man of Asian descent being told that his eyes are closed and he needs to upload another photo. Meanwhile, his eyes are not closed. You have, in the UK, a black man being told his mouth is open. His mouth was not open. You have these systems that are seeping into everyday life.

You have AI systems that are meant to verify if you’re who you say you are. And so one way that can happen is with ride share apps. Uber, for example, will ping drivers to have them verify their ID. There’s actually a report from trans drivers who were saying that they were being repeatedly [asked] to submit their IDs because they were not matching. They were either being kicked out of the system or having to stop the car and try again, which means you’re not getting the same level of economic opportunity. …

When these systems fail, they fail most the people who are already marginalized, the people who are already vulnerable. And so when we think about algorithmic bias, we really have to be thinking about algorithmic harm. That’s not to say we don’t also have the risk of mass surveillance, which impacts everybody. We also have to think about who’s going to be encountering the criminal justice system more often because of racial policing practices and injustices.

On what the public needs to know about algorithmic bias.

There’s no requirement for meaningful transparency, so these systems can easily be rolled out without our ever knowing. So one thing I wish people would do more of and something that companies also ought to do more of is having transparency so that you even know that an AI system was used in the first place. You just might never get the callback. You just might pay the higher price. You would never actually know. What I want the public to know is AI systems are being used in hidden ways that we should demand are made public.

The other thing I want the public to have is actually choice — affirmative consent. Not only should I know if an AI system is being used, but let’s say it makes the wrong decision or something that I contest. There’s no path to due process that’s mandated right now. So if something goes wrong, what do you do?

Sometimes I’ll hear, at least in the research community, efforts to “de-bias” AI or eradicate algorithmic bias. And it’s a tempting notion: let’s just get rid of the bias and make the systems more fair, more inclusive, some ideal. And I always ask: but have we gotten rid of the humans? Because even if you create some system you believe is somehow more objective, it’s being used by humans at the end of the day. …I don’t think we can ever reach a true state of something being unbiased, because there are always priorities. This is something I call the “coded gaze.” The “coded gaze” is a reflection of the priorities, the preferences and also the prejudices of those who are shaping technology. This is not to say we can’t do our best to try to create systems that don’t produce harmful outcomes. I’m not saying that at all. What I am saying is we also have to accept the fact that, being human, we’re going to miss something. We’re not going to get it all right. …

“What I want the public to know is AI systems are being used in hidden ways that we should demand are made public.”

Instead of thinking about “Oh, we’re going to get rid of bias,” what we can think about is bias mitigation — knowing that we have flaws, knowing that our data has flaws, understanding that even systems we try to perfect to the best of our abilities are going to be used in the real world with all of its problems. …

Before we get to the point where it’s having major harms with real world consequences, there need to be processes in place to check through different types of bias that could happen. So, for example, there are now algorithmic risk assessments for AI [systems] — a process of really thinking through what the societal impacts of the system are in its design and development stages, before you get to deployment. Those kinds of approaches, I believe, are extremely helpful, because then we can be proactive instead of reacting to the latest headline and playing bias whack-a-mole. …

On proposals for oversight and regulation.

You have a proposal for an Algorithmic Accountability Act, a nationwide push that would actually require assessing systems for their social impact. And I think that’s really important. We have something with the Algorithmic Justice League that’s called the Safe Face Pledge, which outlines actionable steps companies can take to mitigate harms of AI systems. …

I absolutely think regulation needs to be the first and foremost tool, but alongside regulation providing not just the critique of what’s wrong with the system, but also steps that people can take to do better. Sometimes the step to take to do better is to commit to not developing a particular kind of technology or particular use case for technology. So with facial analysis systems, one of our banned uses is any situation where lethal force can be used. So it would mean we’re not supporting facial recognition on police body cameras. Or facial recognition on lethal autonomous weapons. …

And I think the most important thing about the Safe Face Pledge that I’ve seen is, one, the conversations that I’ve had with different vendors: whether or not they adopt it, actually going through those steps, thinking about their process and the changes they can make has, I believe, produced internal shifts that likely would not hit the headlines. Because people would rather quietly make certain kinds of changes. The other thing is making it so the commitments have to be part of your business processes, not a scout’s-honor, just-trust-us pledge. If you are committed to actually making this agreement, it means you have to change your terms of service and your business contracts to reflect what these commitments are. …

On what should be done to fix the problem.

One, I think, demand transparency and ask questions. Ask questions if you’re using a platform, if you’re going to a job interview. Is AI being used? The other thing I do think is supporting legislative moves. …

When I started talking about this, I think in 2016, it was such a foreign concept in the conversations that I would have. And now, today, I can’t go online without seeing some kind of news article or story about a biased AI system of some shape or form. I absolutely think there has been an increase in public awareness, whether through books like Cathy O’Neil’s Weapons of Math Destruction. There’s a great new book out by Dr. Ruha Benjamin — Race After Technology.

People know it’s an issue and so I’m excited about that. Has there been enough done? Absolutely not. Because people are just now waking up to the fact that there’s a problem. Awareness is good, and then that awareness needs to lead to action. That is the phase we’re in. Companies have a role to play, governments have a role to play and individuals have a role to play.

When you see the bans in San Francisco [of facial recognition technology by the city’s agencies]… what you saw was a very powerful counter-narrative. What we were hearing was that this technology is inevitable, there’s nothing you can do. …When you hear there’s nothing you can do, you stop trying. But what was extremely encouraging to me with the San Francisco ban — and then you have Somerville, which came from folks in the Boston area — people have a voice and people have a choice. This technology is not inherently inevitable. We have to look at it and say: What are the benefits and what are the harms? If the harms are too great, we can put restrictions and we can put limitations. And this is necessary. I do look to those examples and they give me hope.