Facial recognition is increasingly used to police public spaces, board flights and secure laptops, but the engineers behind it have encountered a problem. The datasets on which algorithms are trained have been found to contain horrendous biases, inherited from their human originators, which come to light when computer vision systems label black people as “negroids” and criminals. AI systems that have error rates of no more than 1 per cent for lighter-skinned men have error rates of up to 35 per cent for darker-skinned women. When computers are taught to “see”, their world view depends on what humans show them.

The most obvious solution is to make training data more diverse, but this creates its own problems. Earlier this year, the technology company IBM launched its “diversity in faces” database, a training set of one million images of faces taken from Flickr that ostensibly solves the problem of bias with greater diversity. Buried in an article published earlier this year by IBM researchers, however, was a small but revealing detail. To ensure “fairness” and “accuracy”, the researchers found, classifications such as gender, age and skin colour are insufficient. They suggested using “craniofacial distances, areas and ratios” – facial symmetry and skull shapes – to categorise faces within the training data.

The measurement of people’s skulls and facial features – known as craniometry – is a subject with a disturbing history. During the 19th and 20th centuries, biological determinists and pseudoscientists used skull shapes and face measurements to categorise what they believed were genetically inherited traits belonging to particular groups. Poverty, low educational attainment and criminality, they thought, could be predicted and explained by the supposedly heritable variations between people of different social classes and skin colours.

“The IBM team asked whether age, gender and skin colour were truly sufficient to generate a diverse dataset, and concluded that to have even more vectors of diversity, they would move into this truly strange territory of facial symmetry and skull shapes”, says Kate Crawford, a distinguished research professor at New York University and co-founder of the AI Now Institute. “In some ways, this is reinscribing the old ‘biology is destiny’ belief, and particularly a kind of biological determinism in relation to race, which social scientists have profoundly critiqued for over 50 years.”

Although using facial symmetry and skull shapes to classify people according to their social groups flies in the face of decades of research, Crawford says “we’re seeing a rerun, because it’s an easy way to create a metric – to simplify the world so that it can be quantified and computationally processed.”

“But of course, whether knowing it or not, this harks back to the leading methodological approach used by biological determinists in the 19th century, where skull size was being used, effectively by pseudo-scientists, as a spurious way to claim the inherent superiority of white people over black people.”

We might assume that this kind of thinking belongs firmly in the past. But the idea that skull shapes and facial features are indicative of people’s character or social identity is enjoying a worrying comeback, thanks to facial recognition software. Last month, the consumer goods company Unilever announced it would use a facial analysis tool developed by the US company HireVue to screen job candidates. The technology is designed to “analyse the language, tone and facial expressions of candidates when they are asked a set of identical job questions” – marking this against 25,000 pieces of facial and linguistic data from other successful hires. Are facial measurements indicative of a person’s employability? The description of this technology – and the images of it that circulated online – suggest that Unilever thinks so. As many commentators noticed on Twitter, HireVue appeared to have reinvented phrenology for the digital age.

Why is craniometry enjoying a renaissance? An idea running through the IBM paper is the notion that more information always produces better outcomes. By drilling down into the tiny differences in the shapes of people’s heads and faces, researchers hope to create a database that is genuinely more reflective of human diversity. But collecting and classifying more information about people’s visual differences brings us no closer to understanding what group they identify with, or whether they would be good at a job. The answers to those questions are social, rather than biological.

“The issue isn’t just that the types of classification aren’t granular enough,” Crawford says. “That’s what is producing this type of race to have ever-more precise and granular identifications. I actually don’t think that’s going to increase the justice of these systems – I think we’re just going to see more and more of these bespoke, strange categorisations of difference between the skull size and skin tone of various people.”

The attempt to classify and categorise different groups of people using physical measurements was also a goal of the discredited practices of race science during the 19th and 20th centuries. As the journalist and author Angela Saini has noted, the genetic variations we typically associate with “race” – hair, skin and eye colour – are only surface deep, and fail to capture the complex reality of genetic diversity. Cosmetic differences are an easy heuristic for putting people into social and political boxes, but the idea that appearance dictates deeper biological differences has long been discredited. In a groundbreaking paper published in 1972, the geneticist Richard Lewontin found that genetic variance is higher within populations – or supposed “races” – than between them. Around 85 per cent of genetic diversity occurs within local groups or populations.

The 20th century is littered with examples of what happens when genetic determinism fuses with advances in technology. IBM has played a role in a number of these dark episodes. The company famously developed a system of punch cards that it leased to the Nazi regime, allowing the Nazis to identify and track the Jewish population. In South Africa, IBM was accused of creating a population tracking and classification system that placed every citizen into one of four racial categories and allowed the apartheid government to monitor the movements, employment opportunities and healthcare of black South Africans.

IBM's Diversity in Faces database is a well-intentioned attempt at fixing the problems of racially biased technologies. But history has shown that when physical information is collected to separate and classify different groups of people, it can amplify the artificial distinctions that others create. The result is not less bias, but more.

Questions directed to IBM about the practical and ethical implications of using craniofacial information in such systems received no response.