Science! Forget subjective screening, which too often slides into racial and ethnic profiling; instead, evaluate travelers entering an airport using a rigorous set of objective measurements that could spot deception. This was the admirable principle behind the Transportation Security Administration (TSA) program known as Screening of Passengers by Observation Techniques (SPOT), which has been operating at airports around the country since 2007 at a total cost of $900 million, with annual costs now running at about $200 million.

Unfortunately, according to the US government's internal watchdog agency, little real science stands behind the program. In a new report released today, the Government Accountability Office (GAO) concluded that "the human ability to accurately identify deceptive behavior based on behavioral indicators is the same as or slightly better than chance." And it dryly noted that programs like SPOT should be "demonstrated to work reliably in their intended environment prior to program deployment."

94 problems

SPOT relies on a network of 3,000 behavior detection officers (BDOs) deployed at 176 airports around the country. BDOs observe passengers waiting to cross security checkpoints into the "sterile" section of an airport. They are trained to observe 94 different signs of stress, fear, and deception, with the goal of calculating a "point total" for an observed individual in less than 30 seconds. The 94 signs remain a secret, but we do know that anyone displaying enough of them is referred for a pat-down and secondary screening, during which officers will engage in "casual conversation" to determine whether the traveler poses a potential threat. (The secondary screenings take an average of 13 minutes.) If the officers decide that the traveler does, law enforcement officers such as police or FBI agents are brought in to deal with the situation and potentially make an arrest.

In 2008, the official TSA blog explained the program:

The program was designed by Paul Ekman (PhD), a psychology professor at the University of California Medical School, San Francisco. He’s been studying behavioral analysis for the past 40 years and has taught the TSA, Customs and Border Protection, CIA, FBI and other federal agencies to watch for suspicious facial expressions of tension, fear or deception. He has even taught animators at Disney-Pixar to create convincing faces for film characters. After passing along his skills to US Customs, their "hit rate" for finding drugs during passenger searches rose to 22.5 percent from 4.2 percent in 1998. Behavior analysis is based on the fear of being discovered. People who are trying to get away with something display signs of stress through involuntary physical and physiological behaviors. Whether someone’s trying to sneak through that excellent stone ground mustard they bought on vacation, a knife, or a bomb, behavior detection officers like me are trained to spot certain suspicious behaviors out of the crowd. Once we make our determination, we refer these passengers for additional screening or directly to law enforcement.

It sounds pretty science-y, but it turns out that, in practice, BDOs across the country are referring passengers for secondary screenings at very different rates. For a program based on "objective" biometric measurements of deception, this is not the result one would hope to see. (Even the TSA admitted to GAO auditors that some of the observations were "subjective"; it is trying to rein these in.) And Ekman, who helped set up the program, told GAO three years ago that no one knew "how many BDOs are required to observe a given number of passengers moving at a given rate per day in an airport environment, or the length of time that such observation can be conducted before observation fatigue affects the effectiveness of the personnel."

For the report, GAO auditors looked at the outside scientific literature, speaking to behavioral researchers and examining meta-analyses of 400 separate academic studies on unmasking liars. That literature suggests that "the ability of human observers to accurately identify deceptive behavior based on behavioral cues or indicators is the same as or slightly better than chance (54 percent)." That result holds whether or not the observer is a member of law enforcement.

It turns out that all of those signs you instinctively "know" to indicate deception usually don't. Lack of eye contact, for instance, simply does not correlate with deception when examined in empirical studies. Nor do increases in body movements such as tapping fingers or toes; the literature shows that people's movements actually decrease when lying. A 2008 study for the Department of Defense found that "no compelling evidence exists to support remote observation of physiological signals that may indicate fear or nervousness in an operational scenario by human observers."

Despite the academic literature, the TSA began testing the SPOT program in 2003, not to find out whether it worked but to see whether it was practical to run in a major airport. The program went live in 2007, and travelers began to undergo screening. Once it was up and running, the TSA did hire an outside consultant to evaluate the system's effectiveness. The resulting study, published in 2011, found some effectiveness in using the SPOT criteria. Because of various weaknesses in the study's design and implementation, however, GAO does not consider it a reliable guide to evaluating SPOT.

But even if it works, in practice SPOT isn't stopping terrorists—it has largely led to arrests for drug crime, immigration violations, and outstanding warrants. Because the signs of deception and stress tell you nothing about the underlying activity meant to be concealed, an effective SPOT program would simply become a general dragnet focused on air travelers. In 2008, the TSA argued that this wasn't a problem:

Some will say that it shouldn’t be TSA’s job to look for drugs, or money - our job is airport security. But when we spot someone behaving suspiciously, we don’t know what they have; all we know is they’re behaving in a way that says they might pose a threat. In many cases, we find things that might have otherwise gotten through security (money, drugs) and that’s a good sign because it could just as easily been plastic or liquid explosives. The behaviors these drug and currency smugglers exhibit are the same behaviors we expect a terrorist to exhibit.

But of course, according to the GAO, SPOT may not even work—we simply don't know.

The GAO's conclusion from all this is damning. "Ten years after the development of the SPOT program, TSA cannot demonstrate the effectiveness of its behavior detection activities," it wrote. "Until TSA can provide scientifically validated evidence demonstrating that behavioral indicators can be used to identify passengers who may pose a threat to aviation security, the agency risks funding activities that have not been determined to be effective." The title of the report sums up the GAO recommendation: "TSA Should Limit Future Funding for Behavior Detection Activities."

For its part, the TSA insists the program works, and it is currently running more studies to evaluate effectiveness.