In 2009, 22-year-old student Nicholas George was going through a checkpoint at Philadelphia International Airport when Transportation Security Administration agents pulled him aside. A search of his luggage turned up flashcards with English and Arabic words. George was handcuffed, detained for hours, and questioned by the FBI.

George had been singled out by behavior-detection officers—people trained to pick out gestures and facial expressions that supposedly betray malicious intentions—as part of a US program called Screening of Passengers by Observation Techniques, or SPOT. But the officers were wrong to single him out, and George was released without charge the same day.

As the incident suggests, SPOT produced very little useful information throughout its decade-long history. And in light of the technique’s failures, some computer scientists have recently concluded that a machine could do this job better than humans. But the machine techniques they intend to use share a surprising history with SPOT’s training procedures. In fact, both can be traced back to the same man—Paul Ekman, now an emeritus professor of psychology at the University of California, San Francisco.

The rise of FACS

Ekman and his fellow psychologist Wallace V. Friesen were originally interested in whether nonverbal clues could betray a liar. In 1969, they published a paper titled “Nonverbal Leakage and Clues to Deception,” which considered whether liars involuntarily communicate their deception. The face, they concluded, was equipped to “lie the most and betray the most.”

Facial cues haven’t been the only window into deception over the years. Alternative ideas for lie detection include the polygraph, a device that measures things like heart rate, skin conductivity, and capillary dilation; EEG and fMRI brain scans; eye-tracking; and voice analysis. But analyzing facial expressions seemed the least invasive and required no expensive equipment. So the two psychologists decided to give the human face a much closer examination.

Back then, most facial-expression researchers simply showed pictures to subjects and asked for their interpretation. Ekman suspected this measured what people thought about the face, not what the face itself actually conveyed. Instead, he and Friesen set about building the Facial Action Coding System (FACS), which could distinguish all possible facial movements without the bias of human observers. They derived FACS from the anatomy of the face itself.

They spent a year learning to fire their facial muscles individually, using needles and electrical currents to stimulate a muscle when necessary. They photographed their faces during each muscle action and used those photographs to describe a set of 28 original action units—basic building blocks of human facial expressions. (AU 1 stood for inner brow raiser, AU 2 for outer brow raiser, and so on.)

While Ekman and Friesen meant their system for use by humans, FACS eventually proved to be a dream come true for the artificial intelligence community. With pictures linked to concise labels and explanations, it made a perfect training database for facial-recognition algorithms. As far as making sense of human faces in computer science was concerned, FACS, along with its 500-page manual that Ekman co-wrote, became the bible.

Making sense of the face

Generally, all facial- and affect-recognition algorithms follow three basic steps. They begin by finding a face in an input picture or video. Perhaps the most popular face-detection method today is one proposed by Paul Viola and Michael Jones in 2001. Their algorithm searches the input picture for groups of pixels arranged in patterns indicative of a human face—eyes are usually darker than the cheeks, the nose bridge is usually brighter than the eyes, and so on. If the patterns appear in the right arrangement, the Viola-Jones algorithm labels the region a face.
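
That detector now ships ready to use in common libraries. Here’s a minimal sketch using OpenCV’s implementation of Viola-Jones-style cascade detection; the input file name is a placeholder, and the cascade path assumes a standard opencv-python install.

```python
# A minimal sketch of Viola-Jones face detection via OpenCV's
# pre-trained Haar cascade (input/output file names are placeholders).
import cv2

# Load the frontal-face cascade bundled with opencv-python.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("passenger.jpg")  # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Scan at multiple scales for Haar-feature patterns arranged like a
# face: darker eye regions, a brighter nose bridge, and so on.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a box around each region the cascade labeled a face.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", image)
```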

Facial-recognition software next moves on to feature extraction. One approach relies on discerning geometric features, like the eyes and the line of the mouth, and calculating the relative distances and angles between them. Another approach is appearance-based feature extraction, which relies on a large training set; whenever a new picture is thrown at the algorithm, it calculates the distance between the newcomer and each of the pictures it has been trained on. Both techniques end up with a feature vector: a list of numbers corresponding to the way we look, along with pretty much every silly face we can come up with.
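
To make the geometric approach concrete, here’s a toy sketch that turns a handful of landmark coordinates into a feature vector of pairwise distances. The coordinates are made up for illustration; real systems locate the landmarks automatically.

```python
# A toy sketch of geometric feature extraction: hand-picked landmark
# positions (assumed values) become a vector of pairwise distances.
import numpy as np

# (x, y) pixel positions of a few facial landmarks.
landmarks = {
    "left_eye":    np.array([110.0, 120.0]),
    "right_eye":   np.array([190.0, 118.0]),
    "nose_tip":    np.array([150.0, 170.0]),
    "mouth_left":  np.array([125.0, 210.0]),
    "mouth_right": np.array([175.0, 212.0]),
}

def feature_vector(points):
    """Flatten the pairwise distances between landmarks into one vector."""
    names = sorted(points)
    return np.array([
        np.linalg.norm(points[a] - points[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ])

vec = feature_vector(landmarks)
print(vec)

# Two faces can then be compared by the distance between their
# vectors, which is essentially what appearance-based matching does.
```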

That vector can be used for identification, where the software decides whether a face under consideration belongs to a given person. But it can also be used to recognize expressions, based on which combination of FACS action units is present. Software is now quite good at this job: FACET, one of the best commercially available affect-recognition systems, scores well above 80 percent at recognizing emotional expressions.
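
As a rough sketch of that last step, the snippet below maps detected action units to basic emotions. The prototype combinations are commonly cited FACS-based examples (happiness as AU 6 plus AU 12, for instance) chosen for illustration; a system like FACET is far more sophisticated.

```python
# An illustrative mapping from FACS action units to basic emotions.
# The prototypes are simplified assumptions, not Ekman's full system.
AU_NAMES = {
    1: "inner brow raiser", 4: "brow lowerer", 6: "cheek raiser",
    9: "nose wrinkler", 12: "lip corner puller", 15: "lip corner depressor",
}

EMOTION_PROTOTYPES = {
    "happiness": {6, 12},
    "sadness":   {1, 4, 15},
    "disgust":   {9, 15},
}

def classify(detected_aus):
    """Return every emotion whose prototype AUs are all present."""
    return [emotion for emotion, proto in EMOTION_PROTOTYPES.items()
            if proto <= set(detected_aus)]

detected = {6, 12}
print([AU_NAMES[au] for au in sorted(detected)])  # the raw FACS reading
print(classify(detected))                         # ['happiness']
```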

Still, some real-world scenarios can prove challenging for AI. A team of researchers at the University of Notre Dame tried to use FACET to identify children’s boredom, confusion, and delight in classrooms, but the software could hardly identify a face, much less emotions. The software’s performance depends heavily on the material it has been trained on, and most training databases consist of posed expressions for which subjects were asked to stand motionless for a while. The kids simply failed to sit still.

If AI struggles to pick out emotions that are there for everyone to see, recognizing emotions we deliberately try to hide seems like an impossible challenge. So why do experts still believe? How could a machine catch a sudden expression of surprise, disgust, fear, or anger that, for a fraction of a second, betrays a terrorist?

Signs of deception

There’s a world of apps for affect recognition nowadays. Affectiva, an MIT spin-off founded by Rosalind Picard, offers emotion-recognition AI to the entertainment and advertising industries. Research on emotion is underway at Facebook, and Apple bought Emotient, an affect-recognition startup, a year or so ago.

But nearly all of this software is designed to catch basic facial expressions that usually indicate one of the six basic emotions: anger, disgust, fear, happiness, sadness, and surprise. There’s no expression for deception.

Ekman and Friesen, however, long ago claimed to have identified the secret to picking this up: micro-expressions.

In 1969, the two psychologists were struggling with a particularly tricky patient. At first glance, she appeared normal, sometimes even cheerful, yet she had made numerous suicide attempts. After long hours of watching video footage of her counseling sessions frame by frame, they finally saw what they’d expected: somewhere in between her smiles, Friesen and Ekman caught a fleeting look of anguish. It lasted two frames.

After researching this phenomenon more closely, they concluded that micro-expressions were involuntary expressions of a person’s internal state. They lasted just 1/25 to 1/5 of a second before being suppressed.
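
Some back-of-the-envelope arithmetic shows why such flashes are so easy to miss on ordinary video. The frame rates below are generic assumptions, not figures from Ekman’s work.

```python
# How many video frames does a 1/25- to 1/5-second micro-expression
# span at common frame rates? (Frame rates are assumed examples.)
for fps in (24, 30, 60):
    frame_ms = 1000 / fps
    shortest = (1000 / 25) / frame_ms  # frames covered by a 40 ms flash
    longest = (1000 / 5) / frame_ms    # frames covered by a 200 ms one
    print(f"{fps} fps: {shortest:.1f} to {longest:.1f} frames")

# At 24 fps the briefest micro-expressions last less than one frame,
# which is why frame-by-frame review and high frame rates matter.
```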

Ekman found that untrained observers scored only slightly better than chance at recognizing micro-expressions. So he developed the Micro Expression Training Tool, a program intended to train law enforcement officers to catch micro-expressions during interrogations. Trained interrogators could achieve 70 percent accuracy in spotting micro-expressions and 80 percent in interpreting them. Some in the computer vision community began to believe a carefully designed artificial intelligence algorithm would do better.
