As artificial intelligence is used to make more decisions about our lives, engineers have sought out ways to make it more emotionally intelligent. That means automating some of the emotional tasks that come naturally to humans — most notably, looking at a person’s face and knowing how they feel.

To achieve this, tech companies like Microsoft, IBM, and Amazon all sell what they call “emotion recognition” algorithms, which infer how people feel based on facial analysis. For example, if someone has a furrowed brow and pursed lips, the software concludes they’re angry. If their eyes are wide, their eyebrows are raised, and their mouth is stretched, it concludes they’re afraid, and so on.

Clients can put this tech to use in a variety of ways, building everything from automated surveillance systems that look for “angry” threats to job interview software that promises to weed out bored and uninterested candidates.

But the belief that we can easily infer how people feel based on how they look is controversial, and a significant new review of the research suggests there’s no firm scientific justification for it.

“Companies can say whatever they want, but the data are clear,” Lisa Feldman Barrett, a professor of psychology at Northeastern University and one of the review’s five authors, tells The Verge. “They can detect a scowl, but that’s not the same thing as detecting anger.”

The review was commissioned by the Association for Psychological Science, and five distinguished scientists from the field were asked to scrutinize the evidence. The reviewers represented different theoretical camps in the world of emotion science. “We weren’t sure if we would be able to come to a consensus over the data, but we did,” Barrett says. They spent two years examining the data, and the review covers more than 1,000 studies.

Their findings are detailed and can be read in full in the published review, but the basic summary is that emotions are expressed in a huge variety of ways, which makes it hard to reliably infer how someone feels from a simple set of facial movements.

“People, on average, the data show, scowl less than 30 percent of the time when they’re angry,” says Barrett. “So scowls are not the expression of anger; they’re an expression of anger — one among many. That means that more than 70 percent of the time, people do not scowl when they’re angry. And on top of that, they scowl often when they’re not angry.”
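
To see why both halves of that statement matter for an automated system, here is a rough back-of-the-envelope sketch in Python. Only the 30 percent figure (an upper bound, per Barrett) comes from the review. The share of people who are actually angry and the rate at which non-angry people scowl are made-up values chosen purely for illustration, and the function name is hypothetical.

```python
def scowl_detector_stats(p_angry, p_scowl_given_angry, p_scowl_given_not_angry):
    """Return (recall, precision) for a rule that labels every scowl as anger."""
    # Share of all people who are angry AND scowling (correctly flagged).
    true_positives = p_angry * p_scowl_given_angry
    # Share of all people who are NOT angry but scowl anyway (wrongly flagged).
    false_positives = (1.0 - p_angry) * p_scowl_given_not_angry
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("nobody scowls under these inputs, so precision is undefined")
    recall = p_scowl_given_angry          # fraction of angry people the rule catches
    precision = true_positives / flagged  # fraction of flagged people who are angry
    return recall, precision


# 0.30 is the review's upper bound; 0.10 and 0.10 are illustrative assumptions.
recall, precision = scowl_detector_stats(
    p_angry=0.10,
    p_scowl_given_angry=0.30,
    p_scowl_given_not_angry=0.10,
)
print(f"angry people caught: {recall:.0%}, flagged people who are angry: {precision:.0%}")
```

Under these assumed inputs, the rule catches 30 percent of angry people, and only about a quarter of the people it flags are actually angry. Different assumptions would give different numbers, but the pattern holds whenever scowls are both uncommon among angry people and not rare among everyone else.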

This, in turn, means companies that use AI to evaluate people’s emotions in this way are misleading consumers. “Would you really want outcomes being determined on this basis?” says Barrett. “Would you want that in a court of law, or a hiring situation, or a medical diagnosis, or at the airport ... where an algorithm is accurate only 30 percent of the time?”

The review doesn’t deny that common or “prototypical” facial expressions might exist, of course, nor that our belief in the communicative power of facial expressions plays a huge role in society. (Don’t forget that when we see people in person, we have so much more information about the context of their emotions than simplistic facial analysis can capture.)

The review recognizes that there’s a huge variety of beliefs in the field of emotion studies. What it rebuts, specifically, is this idea of reliably “fingerprinting” emotion through expression, which is a theory that has its roots in the work of psychologist Paul Ekman from the 1960s (and which Ekman has developed since).

Studies that seem to show a strong correlation between certain facial expressions and emotions are often methodologically flawed, says the review. For example, they use actors pulling exaggerated faces as their starting point for what emotions “look” like. And when test subjects are asked to label these expressions, they’re often asked to choose from a limited selection of emotions, which pushes them toward a certain consensus.

People intuitively understand that emotions are more complex than this, says Barrett. “When I say to people, ‘Sometimes you shout in anger, sometimes you cry in anger, sometimes you laugh, and sometimes you sit silently and plan the demise of your enemies,’ that convinces them,” she says. “I say, ‘Listen, what’s the last time someone won an Academy Award for scowling when they’re angry?’ No one considers that great acting.”

These subtleties, though, are rarely acknowledged by companies selling emotion analysis tools. In marketing for Microsoft’s algorithms, for example, the company says advances in AI allow its software to “recognize eight core emotional states ... based on universal facial expressions that reflect those feelings,” which is the exact claim that this review disproves.

This is not a new criticism, of course. Barrett and others have been warning for years that our model of emotion recognition is too simple. In response, companies selling these tools often say their analysis is based on more signals than just facial expression. The difficulty is knowing how these signals are balanced, if at all.

One of the leading companies in the $20 billion emotion recognition market, Affectiva, says it’s experimenting with collecting additional metrics. Last year, for example, it launched a tool that measures the emotions of drivers by combining face and speech analyses. Other researchers are looking into metrics like gait analysis and eye tracking.

In a statement, Affectiva CEO and co-founder Rana el Kaliouby said this review was “much in alignment” with the company’s work. “Like the authors of this paper, we do not like the naivete of the industry, which is fixated on the 6 basic emotions and a prototypic one-to-one mapping of facial expressions to emotional states,” said el Kaliouby. “The relationship of expressions to emotion is very nuanced, complex and not prototypical.”

Barrett is confident that we will be able to more accurately measure emotions in the future with more sophisticated metrics. “I absolutely believe it’s possible,” she says. But that won’t necessarily stop the current limited technology from proliferating.

AI is perfect for finding spurious connections in data

With machine learning, in particular, we often see metrics being used to make decisions — not because they’re reliable, but simply because they can be measured. This is a technology that excels at finding connections, and this can lead to all sorts of spurious analyses: from scanning babysitters’ social media posts to detect their “attitude” to analyzing corporate transcripts of earnings calls to try to predict stock prices. Often, the very mention of AI gives an undeserved veneer of credibility.
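
A small simulation illustrates how easily such connections turn up. The Python sketch below is a toy, not a model of any real product: it generates a purely random "outcome" and 1,000 purely random "metrics" for 20 hypothetical people, then reports the strongest correlation it can find. The sample sizes, seed, and function names are all assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_random_correlation(n_samples=20, n_features=1000, seed=0):
    """Search many random metrics for the one that best 'predicts' random noise."""
    rng = random.Random(seed)
    # The outcome is pure noise, so no metric can genuinely predict it.
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


best = strongest_random_correlation()
print(f"strongest |correlation| among 1,000 random metrics: {best:.2f}")
```

None of the metrics has any real relationship to the outcome, yet with enough of them to search through and only a handful of samples, the best one shows a correlation strong enough to look meaningful if seen in isolation.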

If emotion recognition becomes common, there’s a danger that we will simply accept it and change our behavior to accommodate its failings. In the same way that people now act in the knowledge that what they do online will be interpreted by various algorithms (e.g., choosing not to like certain pictures on Instagram because doing so affects the ads they see), we might end up performing exaggerated facial expressions because we know how they’ll be interpreted by machines. That wouldn’t be too different from signaling to other humans.

Barrett says that perhaps the most important takeaway from the review is that we need to think about emotions in a more complex fashion. The expressions of emotions are varied, complex, and situational. She compares the needed change in thinking to Charles Darwin’s work on the nature of species and how his research overturned a simplistic view of the animal kingdom.

“Darwin recognized that the biological category of a species does not have an essence, it’s a category of highly variable individuals,” says Barrett. “Exactly the same thing is true of emotional categories.”