Everything we know about the face recognition systems the FBI and police use suggests the software has a built-in racial bias. That isn’t on purpose; it’s an artifact of how the systems are designed and the data they are trained on. But it is problematic. Law enforcement agencies are relying more and more on such tools to aid in criminal investigations, increasing the risk that biased results will lead to mistakes.

Law enforcement agencies haven’t provided many details on how they use facial recognition systems, but in June the Government Accountability Office issued a report saying that the FBI has not properly tested the accuracy of its face matching system, nor that of the massive network of state-level face matching databases it can access.

And while state-of-the-art face matching systems can be nearly 95 percent accurate on mugshot databases, those photos are taken under controlled conditions with generally coöperative subjects. Images taken under less-than-ideal circumstances, like bad lighting, or that capture unusual poses and facial expressions, can lead to errors.

Illustration by Sophia Foster-Dimino

The algorithms can also be biased due to the way they are trained, says Anil Jain, head of the biometrics research group at Michigan State University. To work, face matching software must first learn to recognize faces using training data, a set of images that gives the software information about how faces differ. If a gender, age group, or race is underrepresented in the training data, that will be reflected in the algorithm’s performance, says Jain.

In 2012, Jain and several colleagues used a set of mugshots from the Pinellas County Sheriff’s Office in Florida to examine the performance of several commercially available face recognition systems, including ones from vendors that supply law enforcement agencies. The algorithms were consistently less accurate on women, African-Americans, and younger people. Apparently they were trained on data that was not representative enough of those groups, says Jain.

“If your training set is strongly biased toward a particular race, your algorithm will do better recognizing that race,” says Alice O’Toole, head of the face perception research lab at the University of Texas at Dallas. O’Toole and several colleagues found in 2011 that an algorithm developed in Western countries was better at recognizing Caucasian faces than it was at recognizing East Asian faces. Likewise, East Asian algorithms performed better on East Asian faces than on Caucasian ones.

In the several years since these studies, the accuracy of commercial algorithms has improved significantly in many areas, and Jain says the performance gaps between different genders and races may have narrowed. But so little testing information is available that it is hard to know. Newer approaches to face recognition, such as the deep learning systems Google and Facebook have developed, can make the same sort of mistakes if the training data is imbalanced, he says.

Jonathon Phillips, an electronic engineer at the National Institute of Standards and Technology, conducts performance tests of commercial algorithms. He says that it’s possible to design a test to measure racial bias in face matching systems. In fact, privacy experts have called for making such tests a requirement.
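
In rough terms, such a test could label each matching attempt with the subject's demographic group, then compare accuracy across groups. The Python sketch below is only an illustration of that idea, not the method NIST or any vendor uses; the function names, group labels, and trial data are all made up for the example.

```python
from collections import defaultdict


def accuracy_by_group(trials):
    """Return the fraction of correct matches for each group.

    `trials` is an iterable of (group, correct) pairs, where `correct`
    is True when the system matched the face to the right identity.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in trials:
        totals[group] += 1
        hits[group] += int(correct)
    return {group: hits[group] / totals[group] for group in totals}


def largest_gap(rates):
    """Difference between the best and worst per-group accuracy."""
    return max(rates.values()) - min(rates.values())


# Hypothetical trial outcomes, invented for illustration only.
trials = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]

rates = accuracy_by_group(trials)
print(rates)
print(largest_gap(rates))
```

A large gap between groups would suggest the kind of bias the researchers describe, though a real evaluation would need far more trials per group and a careful definition of what counts as a correct match.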

The FBI and MorphoTrust, the vendor that supplies the bureau’s face recognition software, did not answer e-mailed questions from MIT Technology Review regarding whether they test their algorithms’ performance by race, gender, or age.

The arrangements between vendors and the many state law enforcement agencies using face recognition are also not clear. But Pete Langenfeld, manager of digital analysis and identification for the Michigan State Police, says his organization does not test for group-specific accuracy. He says he does not know whether the vendor that supplied the technology performs such tests either, but adds that such information is proprietary and the company isn’t required to release it.