Human bias can seep into AI systems. Amazon abandoned a recruiting algorithm after it was shown to favor men’s resumes over women’s; researchers concluded an algorithm used in courtroom sentencing was more lenient to white people than to black people; a study found that mortgage algorithms discriminate against Latino and African American borrowers.

The tech industry knows this, and some companies, like IBM, are releasing “debiasing toolkits” to tackle the problem. These offer ways to scan for bias in AI systems — say, by examining the data they’re trained on — and adjust them so that they’re fairer.

But this kind of technical debiasing is not enough, and can even result in more harm, according to a new report from the AI Now Institute.

The three authors say we need to pay attention to how the AI systems are used in the real world even after they’ve been technically debiased. And we need to accept that some AI systems should not be designed at all.

The facial recognition systems that may be beyond “fixing”

Facial recognition technology is pretty good at identifying white people, but it’s notoriously bad at recognizing black faces. That can produce very offensive consequences — like when Google’s image-recognition system labeled African Americans as “gorillas” in 2015. But given that this tech is now used in police surveillance, which disproportionately targets people of color, maybe we don’t exactly want it to get great at identifying black people. As Zoé Samudzi recently wrote in the Daily Beast:

In a country where crime prevention already associates blackness with inherent criminality, why would we fight to make our faces more legible to a system designed to police us? … It is not social progress to make black people equally visible to software that will inevitably be further weaponized against us.

In other words, ensuring that an AI system works just as well on everyone does not mean it works just as well for everyone. Although the report doesn’t explicitly say we should scrap the facial recognition systems used for police surveillance, it does emphasize that we can’t assume diversifying their datasets will solve the problem — it might just exacerbate it.

Facial recognition tech has also caused problems for transgender people. For example, some trans Uber drivers have had their accounts suspended because the company uses a facial recognition system as a built-in security feature, and the system is bad at identifying the faces of people who are transitioning. Getting kicked off the app cost the trans drivers fares and effectively cost them a job.

Is the solution here to correct the bias in the AI system by ensuring that plenty of trans people are included in its training data? Again, debiasing might sound nice, until you realize that it would entail collecting tons of data on a community that has reason to feel extremely uncomfortable with data collection.

A few years ago, a computer science professor who wanted to train software to recognize people undergoing hormone replacement therapy collected videos from trans YouTubers without their consent. He got a lot of pushback, as The Verge reported:

Danielle, who is featured in the dataset and whose transition pictures appear in scientific papers because of it, says she was never contacted about her inclusion. “I by no means ‘hide’ my identity … But this feels like a violation of privacy … Someone who works in ‘identity sciences’ should understand the implications of identifying people, particularly those whose identity may make them a target (i.e., trans people in the military who may not be out).”

Rather than engage in invasive, nonconsensual mass data collection in the name of “fixing” an AI system, companies like Uber may do better to just allow a different means of account verification for trans drivers, the new report argues. Even if a company insists on using a facial ID login system for its workers, there’s no reason that should be the sole option.

“Algorithmic gaydar” systems should not be built. Period.

There have also been repeated attempts to create facial recognition algorithms that can tell if someone is gay. In 2017, a Stanford University study claimed an algorithm could accurately distinguish between gay and straight men 81 percent of the time based on headshots. It claimed 74 percent accuracy for women. The study made use of people’s online dating photos (the authors wouldn’t say from which site) and only tested the algorithm on white users, claiming not enough people of color could be found.

This is problematic on so many levels: It assumes that sexuality is binary and that it’s clearly legible in our facial features. And even if it were possible to detect queer sexuality this way, who would benefit from an “algorithmic gaydar” becoming widely available? Definitely not queer people, who could be outed against their will, including by governments in countries where sex with same-gender partners is criminalized. As Ashland Johnson, the Human Rights Campaign’s director of public education and research, put it:

Imagine for a moment the potential consequences if this flawed research were used to support a brutal regime’s efforts to identify and/or persecute people they believed to be gay. Stanford should distance itself from such junk science rather than lending its name and credibility to research that is dangerously flawed and leaves the world — and this case, millions of people’s lives — worse and less safe than before.

One of the authors on the AI Now report, Sarah Myers West, said in a press call that such “algorithmic gaydar” systems should not be built, both because they’re based on pseudoscience and because they put LGBTQ people at risk. “The researchers say, ‘We’re just doing this because we want to show how scary these systems can be,’ but then they explain in explicit detail how you would create such a system,” she said.

Co-author Kate Crawford listed other problematic examples, like attempts to predict “criminality” via facial features and to assess worker competence on the basis of “micro-expressions.” Studying physical appearance as a proxy for character is reminiscent of the dark history of “race science,” she said, in particular the debunked field of phrenology, which sought to derive character traits from skull shape and was invoked by white supremacists in 19th-century America.

“We see these systems replicating patterns of race and gender bias in ways that may deepen and actually justify injustice,” Crawford warned, noting that facial recognition services have been shown to ascribe more negative emotions (like anger) to black people than to white people because human bias creeps into the training data.

For all these reasons, there’s a growing recognition among scholars and advocates that some biased AI systems should not be “fixed,” but abandoned. As co-author Meredith Whittaker said, “We need to look beyond technical fixes for social problems. We need to ask: Who has power? Who is harmed? Who benefits? And ultimately, who gets to decide how these tools are built and which purposes they serve?”
