We learn to lie as children, between the ages of two and five. By adulthood, we are prolific. We lie to our employers, our partners and, most of all, one study has found, to our mothers. The average person hears up to 200 lies a day, according to research by Jerry Jellison, a psychologist at the University of Southern California. The majority of the lies we tell are “white”, the inconsequential niceties – “I love your dress!” – that grease the wheels of human interaction. But most people tell one or two “big” lies a day, says Richard Wiseman, a psychologist at the University of Hertfordshire. We lie to promote ourselves, protect ourselves and to hurt or avoid hurting others.

The mystery is how we keep getting away with it. Our bodies expose us in every way. Hearts race, sweat drips and micro-expressions leak from small muscles in the face. We stutter, stall and make Freudian slips. “No mortal can keep a secret,” wrote the psychoanalyst Sigmund Freud in 1905. “If his lips are silent, he chatters with his fingertips. Betrayal oozes out of him at every pore.”

Even so, we are hopeless at spotting deception. On average, across 206 scientific studies, people can separate truth from lies just 54% of the time – only marginally better than tossing a coin. “People are bad at it because the differences between truth-tellers and liars are typically small and unreliable,” said Aldert Vrij, a psychologist at the University of Portsmouth who has spent years studying ways to detect deception. Some people stiffen and freeze when put on the spot, others become more animated. Liars can spin yarns packed with colour and detail, and truth-tellers can seem vague and evasive.

Humans have been trying to overcome this problem for millennia. The search for a perfect lie detector has involved torture, trials by ordeal and, in ancient India, an encounter with a donkey in a dark room. Three thousand years ago in China, the accused were forced to chew and spit out rice; the grains were thought to stick in the dry, nervous mouths of the guilty. In 1730, the English writer Daniel Defoe suggested taking the pulse of suspected pickpockets. “Guilt carries fear always about with it,” he wrote. “There is a tremor in the blood of a thief.” More recently, lie detection has largely been equated with the juddering styluses of the polygraph machine – the quintessential lie detector beloved by daytime television hosts and police procedurals. But none of these methods has yielded a reliable way to separate fiction from fact.

That could soon change. In the past couple of decades, the rise of cheap computing power, brain-scanning technologies and artificial intelligence has given birth to what many claim is a powerful new generation of lie-detection tools. Startups, racing to commercialise these developments, want us to believe that a virtually infallible lie detector is just around the corner.

Their inventions are being snapped up by police forces, state agencies and nations desperate to secure themselves against foreign threats. They are also being used by employers, insurance companies and welfare officers. “We’ve seen an increase in interest from both the private sector and within government,” said Todd Mickelsen, the CEO of Converus, which makes a lie detector based on eye movements and subtle changes in pupil size.

Converus’s technology, EyeDetect, has been used by FedEx in Panama and Uber in Mexico to screen out drivers with criminal histories, and by the credit ratings agency Experian, which tests its staff in Colombia to make sure they aren’t manipulating the company’s database to secure loans for family members. In the UK, Northumbria police are carrying out a pilot scheme that uses EyeDetect to measure the rehabilitation of sex offenders. Other EyeDetect customers include the government of Afghanistan, McDonald’s and dozens of local police departments in the US. Soon, large-scale lie-detection programmes could be coming to the borders of the US and the European Union, where they would flag potentially deceptive travellers for further questioning.

But as tools such as EyeDetect infiltrate more and more areas of public and private life, there are urgent questions to be answered about their scientific validity and ethical use. In our age of high surveillance and anxieties about all-powerful AIs, the idea that a machine could read our most personal thoughts feels more plausible than ever to us as individuals, and to the governments and corporations funding the new wave of lie-detection research. But what if states and employers come to believe in the power of a lie-detection technology that proves to be deeply biased – or that doesn’t actually work?

And what do we do with these technologies if they do succeed? A machine that reliably sorts truth from falsehood could have profound implications for human conduct. The creators of these tools argue that by weeding out deception they can create a fairer, safer world. But the ways lie detectors have been used in the past suggest such claims may be far too optimistic.

For most of us, most of the time, lying is more taxing and more stressful than honesty. To calculate another person’s view, suppress emotions and hold back from blurting out the truth requires more thought and more energy than simply being honest. It demands that we bear what psychologists call a cognitive load. Carrying that burden, most lie-detection theories assume, leaves evidence in our bodies and actions.

Lie-detection technologies tend to examine five different types of evidence. The first two are verbal: the things we say and the way we say them. Jeff Hancock, an expert on digital communication at Stanford, has found that people who are lying in their online dating profiles tend to use the words “I”, “me” and “my” more often, for instance. Voice-stress analysis, which aims to detect deception based on changes in tone of voice, was used during the interrogation of George Zimmerman, who shot the teenager Trayvon Martin in 2012, and by UK councils between 2007 and 2010 in a pilot scheme that tried to catch benefit cheats over the phone. Only five of the 23 local authorities where voice analysis was trialled judged it a success, but in 2014, it was still in use in 20 councils, according to freedom of information requests by the campaign group False Economy.

The third source of evidence – body language – can also reveal hidden feelings. Some liars display so-called “duper’s delight”, a fleeting expression of glee that crosses the face when they think they have got away with it. Cognitive load makes people move differently, and liars trying to “act natural” can end up doing the opposite. In an experiment in 2015, researchers at the University of Cambridge were able to detect deception more than 70% of the time by using a skintight suit to measure how much subjects fidgeted and froze under questioning.

The fourth type of evidence is physiological. The polygraph measures blood pressure, breathing rate and sweat. Penile plethysmography tests arousal levels in sex offenders by measuring the engorgement of the penis using a special cuff. Infrared cameras analyse facial temperature. Unlike Pinocchio, our noses may actually shrink slightly when we lie as warm blood flows towards the brain.

In the 1990s, new technologies opened up a fifth, ostensibly more direct avenue of investigation: the brain. In the second season of the Netflix documentary Making a Murderer, Steven Avery, who is serving a life sentence for a brutal killing he says he did not commit, undergoes a “brain fingerprinting” exam, which uses an electrode-studded headset called an electroencephalogram, or EEG, to read his neural activity and translate it into waves rising and falling on a graph. The test’s inventor, Dr Larry Farwell, claims it can detect knowledge of a crime hidden in a suspect’s brain by picking up a neural response to phrases or pictures relating to the crime that only the perpetrator and investigators would recognise. Another EEG-based test was used in 2008 to convict a 24-year-old Indian woman named Aditi Sharma of murdering her fiance by lacing his food with arsenic, but Sharma’s sentence was eventually overturned on appeal when the Indian supreme court held that the test could violate the subject’s rights against self-incrimination.

After 9/11, the US government – long an enthusiastic sponsor of deception science – started funding other kinds of brain-based lie-detection work through Darpa, the Defence Advanced Research Projects Agency. By 2006, two companies – Cephos and No Lie MRI – were offering lie detection based on functional magnetic resonance imaging, or fMRI. Using powerful magnets, these tools track the flow of blood to areas of the brain involved in social calculation, memory recall and impulse control.

But just because a lie-detection tool seems technologically sophisticated doesn’t mean it works. “It’s quite simple to beat these tests in ways that are very difficult to detect by a potential investigator,” said Dr Giorgio Ganis, who studies EEG and fMRI-based lie detection at the University of Plymouth. In 2007, a research group set up by the MacArthur Foundation examined fMRI-based deception tests. “After looking at the literature, we concluded that we have no idea whether fMRI can or cannot detect lies,” said Anthony Wagner, a Stanford psychologist and a member of the MacArthur group, who has testified against the admissibility of fMRI lie detection in court.

A new frontier in lie detection is now emerging. An increasing number of projects are using AI to combine multiple sources of evidence into a single measure for deception. Machine learning is accelerating deception research by spotting previously unseen patterns in reams of data. Scientists at the University of Maryland, for example, have developed software that they claim can detect deception from courtroom footage with 88% accuracy.

The algorithms behind such tools are designed to improve continuously over time, and may ultimately end up basing their determinations of guilt and innocence on factors that even the humans who have programmed them don’t understand. These tests are being trialled in job interviews, at border crossings and in police interviews, but as they become increasingly widespread, civil rights groups and scientists are growing more and more concerned about the dangers they could unleash on society.

Nothing provides a clearer warning about the threats of the new generation of lie detectors than the history of the polygraph, the world’s best-known and most widely used deception test. Although almost a century old, the machine still dominates both the public perception of lie detection and the testing market, with millions of polygraph tests conducted every year. Ever since its creation, it has been attacked for its questionable accuracy, and for the way it has been used as a tool of coercion. But the polygraph’s flawed science continues to cast a shadow over lie-detection technologies today.

Even John Larson, the inventor of the polygraph, came to hate his creation. In 1921, Larson was a 29-year-old rookie police officer working the downtown beat in Berkeley, California. But he had also studied physiology and criminology and, when not on patrol, he was in a lab at the University of California, developing ways to bring science to bear in the fight against crime.

In the spring of 1921, Larson built an ugly device that took continuous measurements of blood pressure and breathing rate, and scratched the results on to a rolling paper cylinder. He then devised an interview-based exam that compared a subject’s physiological response when answering yes or no questions relating to a crime with the subject’s answers to control questions such as “Is your name Jane Doe?” As a proof of concept, he used the test to solve a theft at a women’s dormitory.

John Larson (right), the inventor of the polygraph lie detector. Photograph: Pictorial Parade/Getty Images

Larson refined his invention over several years with the help of an enterprising young man named Leonarde Keeler, who envisioned applications for the polygraph well beyond law enforcement. After the Wall Street crash of 1929, Keeler offered a version of the machine that was concealed inside an elegant walnut box to large organisations so they could screen employees suspected of theft.

Not long after, the US government became the world’s largest user of the exam. During the “red scare” of the 1950s, thousands of federal employees were subjected to polygraphs designed to root out communists. The US Army, which set up its first polygraph school in 1951, still trains examiners for all the intelligence agencies at the National Center for Credibility Assessment at Fort Jackson in South Carolina.

Companies also embraced the technology. Throughout much of the last century, about a quarter of US corporations ran polygraph exams on employees to test for issues including histories of drug use and theft. McDonald’s used to use the machine on its workers. By the 1980s, there were up to 10,000 trained polygraph examiners in the US, conducting 2m tests a year.

The only problem was that the polygraph did not work. In 2003, the US National Academy of Sciences published a damning report that found evidence on the polygraph’s accuracy across 57 studies was “far from satisfactory”. History is littered with examples of known criminals who evaded detection by cheating the test. Aldrich Ames, a KGB double agent, passed two polygraphs while working for the CIA in the late 1980s and early 90s. With a little training, it is relatively easy to beat the machine. Floyd “Buzz” Fay, who was falsely convicted of murder in 1979 after a failed polygraph exam, became an expert in the test during his two and a half years in prison, and started coaching other inmates on how to defeat it. After 15 minutes of instruction, 23 of 27 were able to pass. Common “countermeasures”, which work by exaggerating the body’s response to control questions, include thinking about a frightening experience, stepping on a pin hidden in the shoe, or simply clenching the anus.

The upshot is that the polygraph is not and never was an effective lie detector. There is no way for an examiner to know whether a rise in blood pressure is due to fear of getting caught in a lie, or anxiety about being wrongly accused. Different examiners rating the same charts can get contradictory results and there are huge discrepancies in outcome depending on location, race and gender. In one extreme example, an examiner in Washington state failed one in 20 law enforcement job applicants for having sex with animals; he “uncovered” 10 times more bestiality than his colleagues, and twice as much child pornography.

As long ago as 1965, the year Larson died, the US Committee on Government Operations issued a damning verdict on the polygraph. “People have been deceived by a myth that a metal box in the hands of an investigator can detect truth or falsehood,” it concluded. By then, civil rights groups were arguing that the polygraph violated constitutional protections against self-incrimination. In fact, despite the polygraph’s cultural status, in the US, its results are inadmissible in most courts. And in 1988, citing concerns that the polygraph was open to “misuse and abuse”, the US Congress banned its use by employers. Other lie detectors from the second half of the 20th century fared no better: abandoned Department of Defense projects included the “wiggle chair”, which covertly tracked movement and body temperature during interrogation, and an elaborate system for measuring breathing rate by aiming an infrared laser at the lip through a hole in the wall.

The polygraph remained popular though – not because it was effective, but because people thought it was. “The people who developed the polygraph machine knew that the real power of it was in convincing people that it works,” said Dr Andy Balmer, a sociologist at the University of Manchester who wrote a book called Lie Detection and the Law.

The threat of being outed by the machine was enough to coerce some people into confessions. One examiner in Cincinnati in 1975 left the interrogation room and reportedly watched, bemused, through a two-way mirror as the accused tore 1.8 metres of paper charts off the machine and ate them. (You didn’t even have to have the right machine: in the 1980s, police officers in Detroit extracted confessions by placing a suspect’s hand on a photocopier that spat out sheets of paper with the phrase “He’s Lying!” pre-printed on them.) This was particularly attractive to law enforcement in the US, where it is vastly cheaper to use a machine to get a confession out of someone than it is to take them to trial.

But other people were pushed to admit to crimes they did not commit after the machine wrongly labelled them as lying. The polygraph became a form of psychological torture that wrung false confessions from the vulnerable. Many of these people were then charged, prosecuted and sent to jail – whether by unscrupulous police and prosecutors, or by those who wrongly believed in the polygraph’s power.

Perhaps no one came to understand the coercive potential of his machine better than Larson. Shortly before his death in 1965, he wrote: “Beyond my expectation, through uncontrollable factors, this scientific investigation became for practical purposes a Frankenstein’s monster.”

The search for a truly effective lie detector gained new urgency after the terrorist attacks of 11 September 2001. Several of the hijackers had managed to enter the US after successfully deceiving border agents. Suddenly, intelligence and border services wanted tools that actually worked. A flood of new government funding made lie detection big business again. “Everything changed after 9/11,” writes psychologist Paul Ekman in Telling Lies.

Ekman was one of the beneficiaries of this surge. In the 1970s, he had been filming interviews with psychiatric patients when he noticed a brief flash of despair cross the features of Mary, a 42-year-old suicidal woman, when she lied about feeling better. He spent the next few decades cataloguing how these tiny movements of the face, which he termed “micro-expressions”, can reveal hidden truths.

Ekman’s work was hugely influential with psychologists, and even served as the basis for Lie to Me, a primetime television show that debuted in 2009 with an Ekman-inspired lead played by Tim Roth. But it got its first real-world test in 2006, as part of a raft of new security measures introduced to combat terrorism. That year, Ekman spent a month teaching US immigration officers how to detect deception at passport control by looking for certain micro-expressions. The results are instructive: at least 16 terrorists were permitted to enter the US in the following six years.

Investment in lie-detection technology “goes in waves”, said Dr John Kircher, a University of Utah psychologist who developed a digital scoring system for the polygraph. There were spikes in the early 1980s, the mid-90s and the early 2000s, neatly tracking with Republican administrations and foreign wars. In 2008, under President George W Bush, the US Army spent $700,000 on 94 handheld lie detectors for use in Iraq and Afghanistan. The Preliminary Credibility Assessment Screening System had three sensors that attached to the hand, connected to an off-the-shelf pager which flashed green for truth, red for lies and yellow if it couldn’t decide. It was about as good as a photocopier at detecting deception – and at eliciting the truth.

Some people believe an accurate lie detector would have allowed border patrol to stop the 9/11 hijackers. “These people were already on watch lists,” Larry Farwell, the inventor of brain fingerprinting, told me. “Brain fingerprinting could have provided the evidence we needed to bring the perpetrators to justice before they actually committed the crime.” A similar logic has been applied in the case of European terrorists who returned from receiving training abroad.

As a result, the frontline for much of the new government-funded lie detection technology has been the borders of the US and Europe. In 2014, travellers flying into Bucharest were interrogated by a virtual border agent called Avatar, an on-screen figure in a white shirt with blue eyes, which introduced itself as “the future of passport control”. As well as an e-passport scanner and fingerprint reader, the Avatar unit has a microphone, an infra-red eye-tracking camera and an Xbox Kinect sensor to measure body movement. It is one of the first “multi-modal” lie detectors – one that incorporates a number of different sources of evidence – since the polygraph.

But the “secret sauce”, according to David Mackstaller, who is taking the technology in Avatar to market via a company called Discern Science, is in the software, which uses an algorithm to combine all of these types of data. The machine aims to send a verdict to a human border guard within 45 seconds, who can either wave the traveller through or pull them aside for additional screening. Mackstaller said he is in talks with governments – he wouldn’t say which ones – about installing Avatar permanently after further tests at Nogales in Arizona on the US-Mexico border, and with federal employees at Reagan Airport near Washington DC. Discern Science claims accuracy rates in their preliminary studies – including the one in Bucharest – have been between 83% and 85%.

The Bucharest trials were supported by Frontex, the EU border agency, which is now funding a competing system called iBorderCtrl, with its own virtual border guard. One aspect of iBorderCtrl is based on Silent Talker, a technology that has been in development at Manchester Metropolitan University since the early 2000s. Silent Talker uses an AI model to analyse more than 40 types of microgestures in the face and head; it only needs a camera and an internet connection to function. On a recent visit to the company’s office in central Manchester, I watched video footage of a young man lying about taking money from a box during a mock crime experiment, while in the corner of the screen a dial swung from green, to yellow, to red. In theory, it could be run on a smartphone or used on live television footage, perhaps even during political debates, although co-founder James O’Shea said the company doesn’t want to go down that route – it is targeting law enforcement and insurance.

O’Shea and his colleague Zuhair Bandar claim Silent Talker has an accuracy rate of 75% in studies so far. “We don’t know how it works,” O’Shea said. They stressed the importance of keeping a “human in the loop” when it comes to making decisions based on Silent Talker’s results.

Mackstaller said Avatar’s results will improve as its algorithm learns. He also expects it to perform better in the real world because the penalties for getting caught are much higher, so liars are under more stress. But research shows that the opposite may be true: lab studies tend to overestimate real-world success.

Before these tools are rolled out at scale, clearer evidence is required that they work across different cultures, or with groups of people such as psychopaths, whose non-verbal behaviour may differ from the norm. Much of the research so far has been conducted on white Europeans and Americans. Evidence from other domains, including bail and prison sentencing, suggests that algorithms tend to encode the biases of the societies in which they are created. These effects could be heightened at the border, where some of society’s greatest fears and prejudices play out. What’s more, the black box of an AI model is not conducive to transparent decision making since it cannot explain its reasoning. “We don’t know how it works,” O’Shea said. “The AI system learned how to do it by itself.”

Andy Balmer, the University of Manchester sociologist, fears that technology will be used to reinforce existing biases with a veneer of questionable science – making it harder for individuals from vulnerable groups to challenge decisions. “Most reputable science is clear that lie detection doesn’t work, and yet it persists as a field of study where other things probably would have been abandoned by now,” he said. “That tells us something about what we want from it.”

The truth has only one face, wrote the 16th-century French philosopher Michel de Montaigne, but a lie “has a hundred thousand shapes and no defined limits”. Deception is not a singular phenomenon and, as of yet, we know of no telltale sign of deception that holds true for everyone, in every situation. There is no Pinocchio’s nose. “That’s seen as the holy grail of lie detection,” said Dr Sophie van der Zee, a legal psychologist at Erasmus University in Rotterdam. “So far no one has found it.”

The accuracy rates of 80-90% claimed by the likes of EyeDetect and Avatar sound impressive, but applied at the scale of a border crossing, they would lead to thousands of innocent people being wrongly flagged for every genuine threat identified. It might also mean that two out of every 10 terrorists easily slip through.
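
The arithmetic behind that claim can be sketched in a few lines of Python. The figures are illustrative assumptions, not data from any of the companies mentioned: a hypothetical one million travellers, 10 genuine threats among them, and a detector whose quoted “accuracy” of 80% is taken to mean both that it catches 80% of liars and that it clears 80% of truth-tellers. The function name and its parameters are likewise invented for this example.

```python
def screening_outcomes(travellers, threats, sensitivity, specificity):
    """Expected counts when a lie detector screens a crowd.

    All inputs are hypothetical. `sensitivity` is the share of genuine
    threats the detector flags; `specificity` is the share of innocent
    travellers it correctly clears.
    """
    innocents = travellers - threats
    true_positives = threats * sensitivity           # threats correctly flagged
    missed_threats = threats - true_positives        # threats that slip through
    false_positives = innocents * (1 - specificity)  # innocents wrongly flagged
    if true_positives > 0:
        per_hit = false_positives / true_positives
    else:
        per_hit = float("inf")  # no threat caught at all
    return {
        "true_positives": true_positives,
        "missed_threats": missed_threats,
        "false_positives": false_positives,
        "false_positives_per_true_positive": per_hit,
    }


if __name__ == "__main__":
    # Assumed scenario: 1,000,000 travellers, 10 threats, 80% on both measures.
    result = screening_outcomes(1_000_000, 10, 0.8, 0.8)
    for name, value in result.items():
        print(f"{name}: {value:,.2f}")
```

Under those assumptions, the detector would flag about 200,000 innocent travellers while catching eight of the 10 threats, or roughly 25,000 people wrongly pulled aside for each genuine threat caught. The exact ratio depends entirely on how rare real threats are, which is why a headline accuracy figure says little on its own.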

History suggests that such shortcomings will not stop these new tools from being used. After all, the polygraph has been widely debunked, but an estimated 2.5m polygraph exams are still conducted in the US every year. It is a $2.5bn industry. In the UK, the polygraph has been used on sex offenders since 2014, and in January 2019, the government announced plans to use it on domestic abusers on parole. The test “cannot be killed by science because it was not born of science”, writes the historian Ken Alder in his book The Lie Detectors.

New technologies may be harder than the polygraph for unscrupulous examiners to deliberately manipulate, but that does not mean they will be fair. AI-powered lie detectors prey on the tendency of both individuals and governments to put faith in science’s supposedly all-seeing eye. And the closer they get to perfect reliability, or at least the closer they appear to get, the more dangerous they will become, because lie detectors often get aimed at society’s most vulnerable: women in the 1920s, suspected dissidents and homosexuals in the 60s, benefit claimants in the 2000s, asylum seekers and migrants today. “Scientists don’t think much about who is going to use these methods,” said Giorgio Ganis. “I always feel that people should be aware of the implications.”

In an era of fake news and falsehoods, it can be tempting to look for certainty in science. But lie detectors tend to surface at “pressure-cooker points” in politics, when governments lower their requirements for scientific rigour, said Balmer. In this environment, dubious new techniques could “slip neatly into the role the polygraph once played”, Alder predicts.

One day, improvements in artificial intelligence could find a reliable pattern for deception by scouring multiple sources of evidence, or more detailed scanning technologies could discover an unambiguous sign lurking in the brain. In the real world, however, practised falsehoods – the stories we tell ourselves about ourselves, the lies that form the core of our identity – complicate matters. “We have this tremendous capacity to believe our own lies,” Dan Ariely, a renowned behavioural psychologist at Duke University, said. “And once we believe our own lies, of course we don’t provide any signal of wrongdoing.”

In his 1995 science-fiction novel The Truth Machine, James Halperin imagined a world in which someone succeeds in building a perfect lie detector. The invention helps unite the warring nations of the globe into a world government, and accelerates the search for a cancer cure. But evidence from the last hundred years suggests that it probably wouldn’t play out like that in real life. Politicians are hardly queueing up to use new technology on themselves. Terry Mullins, a long-time private polygraph examiner – one of about 30 in the UK – has been trying in vain to get police forces and government departments interested in the EyeDetect technology. “You can’t get the government on board,” he said. “I think they’re all terrified.”

Daniel Langleben, the scientist behind No Lie MRI, told me one of the government agencies he was approached by was not really interested in the accuracy rates of his brain-based lie detector. An fMRI machine cannot be packed into a suitcase or brought into a police interrogation room. The investigator cannot manipulate the test results to apply pressure to an uncooperative suspect. The agency just wanted to know whether it could be used to train agents to beat the polygraph.

“Truth is not really a commodity,” Langleben reflected. “Nobody wants it.”
