By scanning your face, computers can decode your unspoken reaction to a movie, a political debate, even a video call with a friend. Illustration by Bryan Christie

Three years ago, archivists at A.T. & T. stumbled upon a rare fragment of computer history: a short film that Jim Henson produced for Ma Bell, in 1963. Henson had been hired to make the film for a conference that the company was convening to showcase its strengths in machine-to-machine communication. Told to devise a faux robot that believed it functioned better than a person, he came up with a cocky, boxy, jittery, bleeping Muppet on wheels. “This is computer H14,” it proclaims as the film begins. “Data program readout: number fourteen ninety-two per cent H2SOSO.” (Robots of that era always seemed obligated to initiate speech with senseless jargon.) “Begin subject: Man and the Machine,” it continues. “The machine possesses supreme intelligence, a faultless memory, and a beautiful soul.” A blast of exhaust from one of its ports vaporizes a passing bird. “Correction,” it says. “The machine does not have a soul. It has no bothersome emotions. While mere mortals wallow in a sea of emotionalism, the machine is busy digesting vast oceans of information in a single all-encompassing gulp.” H14 then takes such a gulp, which proves overwhelming. Ticking and whirring, it begs for a human mechanic; seconds later, it explodes.

The film, titled “Robot,” captures the aspirations that computer scientists held half a century ago (to build boxes of flawless logic), as well as the social anxieties that people felt about those aspirations (that such machines, by design or by accident, posed a threat). Henson’s film offered something else, too: a critique—echoed on television and in novels but dismissed by computer engineers—that, no matter a system’s capacity for errorless calculation, it will remain inflexible and fundamentally unintelligent until the people who design it consider emotions less bothersome. H14, like all computers in the real world, was an imbecile.

Today, machines seem to get better every day at digesting vast gulps of information—and they remain as emotionally inert as ever. But since the nineteen-nineties a small number of researchers have been working to give computers the capacity to read our feelings and react, in ways that have come to seem startlingly human. Experts on the voice have trained computers to identify deep patterns in vocal pitch, rhythm, and intensity; their software can scan a conversation between a woman and a child and determine if the woman is a mother, whether she is looking the child in the eye, whether she is angry or frustrated or joyful. Other machines can measure sentiment by assessing the arrangement of our words, or by reading our gestures. Still others can do so from facial expressions.

Our faces are organs of emotional communication; by some estimates, we transmit more data with our expressions than with what we say, and a few pioneers dedicated to decoding this information have made tremendous progress. Perhaps the most successful is an Egyptian scientist living near Boston, Rana el Kaliouby. Her company, Affectiva, formed in 2009, has been ranked by the business press as one of the country’s fastest-growing startups, and Kaliouby, thirty-six, has been called a “rock star.” There is good money in emotionally responsive machines, it turns out. For Kaliouby, this is no surprise: soon, she is certain, they will be ubiquitous.

Affectiva is situated in an office park behind a strip mall on a two-lane road in Waltham, Massachusetts, part of a corridor that serves as Boston’s answer to Silicon Valley. The headquarters have the trappings of a West Coast startup—pool table, beanbag chairs—but the sensibility is New England; many of the employees are from M.I.T. From a conference room, the Amtrak line to Boston is visible beyond a large parking lot.

When I visited in September, Kaliouby walked me past charts of facial expressions, some of them scientific diagrams, some borrowed from comics. Kaliouby has a Ph.D. in computer science, and, like many accomplished coders, she has no trouble with mathematical concepts like Bayesian probability and hidden Markov models. But she is also at ease among people: emotive, warm, even flirtatious. She is a practicing Muslim, and until two years ago she wore a head scarf, which had the effect of drawing the eye to her rounded, expressive features. Frank Moss, a former director of M.I.T.’s Media Lab, where she held a postdoctoral position, told me that she has a high “emotional intelligence.” As a mother of two, she worries about technology’s effects on her children.

Affectiva is the most visible among a host of competing boutique startups: Emotient, Realeyes, Sension. After Kaliouby and I sat down, she told me, “I think that, ten years down the line, we won’t remember what it was like when we couldn’t just frown at our device, and our device would say, ‘Oh, you didn’t like that, did you?’ ” She took out an iPad containing a version of Affdex, her company’s signature software, which was simplified to track just four emotional “classifiers”: happy, confused, surprised, and disgusted. The software scans for a face; if there are multiple faces, it isolates each one. It then identifies the face’s main regions—mouth, nose, eyes, eyebrows—and it ascribes points to each, rendering the features in simple geometries. When I looked at myself in the live feed on her iPad, my face was covered in green dots. “We call them deformable and non-deformable points,” she said. “Your lip corners will move all over the place—you can smile, you can smirk—so these points are not very helpful in stabilizing the face. Whereas these points, like this at the tip of your nose, don’t go anywhere.” Serving as anchors, the non-deformable points help judge how far other points move.

Affdex also scans for the shifting texture of skin—the distribution of wrinkles around an eye, or the furrow of a brow—and combines that information with the deformable points to build detailed models of the face as it reacts. The algorithm identifies an emotional expression by comparing it with countless others that it has previously analyzed. “If you smile, for example, it recognizes that you are smiling in real time,” Kaliouby told me. I smiled, and a green bar at the bottom of the screen shot up, indicating the program’s increasing confidence that it had identified the correct expression. “Try looking confused,” she said, and I did. The bar for confusion spiked. “There you go,” she said.

“Have you been reading a lot of exhaustive biographies lately?” Facebook

Twitter

Email

Shopping

Like every company in this field, Affectiva relies on the work of Paul Ekman, a research psychologist who, beginning in the sixties, built a convincing body of evidence that there are at least six universal human emotions, expressed by everyone’s face identically, regardless of gender, age, or cultural upbringing. Ekman worked to decode these expressions, breaking them down into combinations of forty-six individual movements, called “action units.” From this work, he compiled the Facial Action Coding System, or facs—a five-hundred-page taxonomy of facial movements. It has been in use for decades by academics and professionals, from computer animators to police officers interested in the subtleties of deception.

Ekman has had critics, among them social scientists who argue that context plays a far greater role in reading emotions than his theory allows. But context-blind computers appear to support his conclusions. By scanning facial action units, computers can now outperform most people in distinguishing social smiles from those triggered by spontaneous joy, and in differentiating between faked pain and genuine pain. They can determine if a patient is depressed. Operating with unflagging attention, they can register expressions so fleeting that they are unknown even to the person making them. Marian Bartlett, a researcher at the University of California, San Diego, and the lead scientist at Emotient, once ran footage of her family watching TV through her software. During a moment of slapstick violence, her daughter, for a single frame, exhibited ferocious anger, which faded into surprise, then laughter. Her daughter was unaware of the moment of displeasure—but the computer had noticed. Recently, in a peer-reviewed study, Bartlett’s colleagues demonstrated that computers scanning for “micro-expressions” could predict when people would turn down a financial offer: a flash of disgust indicated that the offer was considered unfair, and a flash of anger prefigured the rejection.