A couple of years ago, Apple went on a shopping spree. It snatched up PrimeSense, maker of some of the best 3-D sensors on the market, as well as Perceptio, Metaio, and Faceshift, companies that developed image recognition, augmented reality, and motion capture technology, respectively.

It’s not unusual for Cupertino to buy other companies’ technology in order to bolster its own. But at the time, it was hard to know exactly what Apple planned to do with its haul. It wasn’t until last month, at the company’s annual talent show, that the culmination of years of acquisitions and research began to make sense: Apple was building the iPhone X.

Perhaps the most important feature in the new flagship phone is its face-tracking technology, which allows you to unlock the phone with your face or to lend your expressions to a dozen or so emoji with Animoji. Apple thinks the iPhone X represents the future of mobile tech, and for many, that’s true. But if you trace most of consumer technology’s most impressive accomplishments back to their origins, more often than not, it’ll lead you to a drab research lab full of graduate students. In the case of Animoji, that research happened to have taken place nearly a decade ago at a pair of Europe’s most prestigious technical schools.

Set in Motion

In the mid-2000s, motion capture was still a laborious process. Creating the nuanced expressions for the characters in Avatar, for example, required the actors to wear painted dots on their faces and attach plastic balls to their bodies. These dots, called markers, allow optical systems to track and measure face and body movements in order to construct approximations of how they changed. “Markers help because they simplify the computation of correspondences,” says Mark Pauly, a co-founder of Faceshift and head of the Computer Graphics and Geometry Laboratory at EPFL, a school in Lausanne, Switzerland.

Marker technology worked well, but it required significant overhead—a studio, motion capture suits, and of course actors willing to wear all those dots. “Whatever you wanted to create took a lot of money and time,” says Hao Li, director of USC’s Vision and Graphics Lab, who was getting his PhD in Pauly's lab at the time. “We wanted to make it easier.” So Pauly and Li, along with fellow researchers including Thibaut Weise, Brian Amberg, and Sofien Bouaziz (all now at Apple), began exploring how to replace markers and mo-cap suits with algorithms that could track facial expressions using footage captured by a depth-sensing camera. Their goal? To create dynamic digital avatars that could mimic human expression in real time.
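To give a rough sense of how marker-free expression tracking can work, here is a toy sketch of one common approach: represent the face as a neutral mesh plus a weighted sum of expression offsets ("blendshapes"), then solve for the weights that best explain each captured scan. This is a simplified illustration with synthetic data, not Faceshift's or Apple's actual pipeline; all names and dimensions here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, n_shapes = 300, 4

# Toy face model: a resting mesh and a handful of expression offsets.
neutral = rng.normal(size=(n_vertices, 3))
blendshapes = rng.normal(size=(n_shapes, n_vertices, 3))

# Synthesize a "captured" scan from known expression weights,
# e.g. a strong smile mixed with a slight brow raise.
true_weights = np.array([0.7, 0.0, 0.3, 0.1])
observed = neutral + np.tensordot(true_weights, blendshapes, axes=1)

# Recover the weights by least squares: flatten each blendshape into a
# column of A and minimize ||A w - (observed - neutral)||.
A = blendshapes.reshape(n_shapes, -1).T
b = (observed - neutral).ravel()
weights, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.round(weights, 3))
```

A real system has to fit these weights to noisy depth-camera data at 30-plus frames per second, while also solving for head pose, which is where the hard research lies.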

There was an issue, though: Algorithmic facial tracking is notoriously difficult to pull off. Li calls the human face "one of the holy grails in computer graphics." Unlike a static object, the face is constantly deforming; there are no simple rules for a computer to follow.

For a machine to understand facial movement, it needs to understand the many ways a face can look. “The algorithms have to be robust to various lighting changes, occlusions, different extreme head rotations, and standard variations in face appearance across races and different ages,” says Dino Paic, director of sales and marketing at Visage Technologies, a company whose face-tracking software is used by auto and financial clients.