In the last decade, the US government has made a big investment in facial recognition technology. The Department of Homeland Security paid out hundreds of millions of dollars in grants to state and local governments to build facial recognition databases—pulling photos from drivers' licenses and other identification to create a massive library of residents, all in the name of anti-terrorism. In New York, the Port Authority is installing a "defense grade" computer-driven surveillance system around the World Trade Center site to automatically catch potential terrorists through a network of hundreds of digital eyes.

Then an act of terror struck Boston on April 15. Alleged perpetrators Dzhokhar and Tamerlan Tsarnaev were both in those databases. Despite having an array of photos of the suspects, the system couldn't come up with a match. Or at least it didn't come up with one before the Tsarnaev brothers had been identified by other means.

For people who understand how facial recognition works, this comes as no surprise. Despite advances in the technology, systems are only as good as the data they're given to work with. Real life isn't like anything you may have seen on NCIS or Hawaii Five-0. Simply put, facial recognition isn't an instantaneous, magical process. Video from a gas station surveillance camera or a police CCTV camera on some lamppost cannot suddenly be turned into a high-resolution image of a suspect's face that can then be thrown against a drivers' license photo database to spit out an instant match.

Not yet. Facial recognition technology has gotten a lot better in the past decade, and the addition of other biometric technologies to facial recognition is making it increasingly accurate. Facial recognition and other biometric and image processing technologies, such as gait recognition, helped law enforcement find the suspects in the rush of people around Copley Place that day with the help of retailers' own computerized surveillance systems.

The fact is that it's much more likely for a bank or department store to know who you are when you walk past a camera than for law enforcement to make an ID based on video footage. That's because you give retailers a lot more information to work with—and the systems they use are arguably better suited to keeping track of you than most police surveillance systems.

Three steps to (sometimes) finding the perfect match

Under ideal conditions, facial recognition can be extremely accurate, returning the right person as a potential match more than 99 percent of the time. But that level of accuracy almost always requires some skilled guidance from humans, plus some up-front work to get a good image. Depending on the type of facial recognition system, finding the right match usually requires three stages of processing.

Face detection and enhancement

The software looks for patterns in the image that match its algorithmic models of a face. A simpler form of this technology is used in consumer cameras, in photo apps for mobile devices, and in applications such as iPhoto and Facebook.
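The pattern-matching idea can be sketched in a toy form: slide a small "face model" over an image and flag windows whose light/dark pattern agrees with the model. This is a deliberately simplified illustration; real detectors use learned models (cascades of features or neural networks), not a single hand-made template.

```python
# Toy pattern-based face detection: slide a tiny template across an
# image of brightness values and report windows whose light/dark
# pattern matches the template. Purely illustrative.

def detect(image, template, threshold=0.9):
    """Return (row, col) positions of windows resembling the template."""
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            # Score = fraction of pixels whose light/dark state matches
            matches = sum(
                1
                for i in range(th)
                for j in range(tw)
                if (image[r + i][c + j] > 128) == (template[i][j] > 128)
            )
            if matches / (th * tw) >= threshold:
                hits.append((r, c))
    return hits

# A 3x3 "face" model: two dark eyes, a bright nose row, a dark mouth
FACE = [
    [0, 255, 0],
    [255, 255, 255],
    [255, 0, 255],
]

# A bright 5x5 scene with one face-like patch placed at row 1, col 1
scene = [[255] * 5 for _ in range(5)]
for i in range(3):
    for j in range(3):
        scene[1 + i][1 + j] = FACE[i][j]

print(detect(scene, FACE))  # the face-like window is found at (1, 1)
```

A real system faces the same fundamental problem shown here: the score degrades quickly as lighting, angle, or expression push pixels away from the model's expectations.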

In some circumstances, even detecting a face within an image can be difficult for software without human guidance. Lighting, camera angle, and facial expression can all muddle the process. A photo will often be taken from an angle that requires investigators to do preprocessing. "Typically, you'll do some preprocessing of the image," said Brian Martin, director of Biometric Research for facial recognition system provider MorphoTrust USA. "You can try to get rid of blur or the interlacing artifacts from older cameras. Some people use Photoshop to clean up the image; our company has what we call ABIS Face Examiner Workstation, which is face-specific tools to clean up an image. You can take a non-frontal looking face and physically model it as a three-dimensional image, then rotate it toward the camera and re-render a new face. So you do this sort of cleanup of the image and then submit it to the database."

If an image is too low-resolution, sometimes multiple images can be combined to create a higher-resolution composite. Lower resolution images may still work, but the results are more likely to misidentify the person—or miss him or her completely.
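Why combining frames helps can be shown with a one-pixel sketch: averaging several noisy captures of the same scene cancels random sensor noise, pulling the composite value closer to the truth. (Real multi-frame super-resolution also aligns frames and can raise pixel count; this shows only the noise-reduction part, with invented brightness and noise values.)

```python
# Averaging noisy frames: random sensor noise cancels out, so a
# composite of many captures lands closer to the true pixel value
# than any single frame is likely to.

import random

random.seed(42)

TRUE_PIXEL = 100   # the "real" brightness of one face pixel (invented)
NOISE = 40         # per-frame sensor noise amplitude (invented)

def capture():
    """One noisy observation of the pixel."""
    return TRUE_PIXEL + random.uniform(-NOISE, NOISE)

one_frame_error = abs(capture() - TRUE_PIXEL)

frames = [capture() for _ in range(50)]
composite = sum(frames) / len(frames)
composite_error = abs(composite - TRUE_PIXEL)

print(f"single frame off by {one_frame_error:.1f}")
print(f"50-frame composite off by {composite_error:.1f}")
```

Note that this only recovers information that was captured across the frames, consistent with the point Karahashi makes below: nothing is created from nothing.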

"Hollywood does a pretty good job of creating a myth that you could extract a better image by enhancing and zooming where information wasn't captured," said Masayuki Karahashi, senior vice president of engineering for surveillance and video analysis technology firm 3VR. "You're not going to create more information out of nothing."

Feature registration and extraction

Next, the software tries to identify common facial features to use as reference points to extract a "faceprint": the centers of the eyes, the tip of the nose, and the corners of the mouth are common choices. Again, depending on the quality of the image, a human may have to help the software with this, marking the locations of reference points to help the software along.

With the reference points set, the software then adjusts the image to "normalize" it against the images in its database—making sure the face is scaled to the same size and removing other elements of the photo that might reduce the likelihood of a match. Then it runs calculations on the image to generate a faceprint. This is a binary value based on a mathematical representation of the patterns in the face.

There are several approaches to creating a faceprint. Some systems use algorithms that measure the distance between sets of features in the normalized image, while others detect contours and "facial boundaries."
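The distance-based approach, together with the normalization step above, can be sketched in a few lines. Here landmarks are first scaled so the distance between the eyes is always 1.0, which makes prints comparable across photos taken at different distances; the faceprint is then the list of pairwise distances between landmarks. The landmark coordinates are invented, and real systems use far richer representations.

```python
# Sketch of a distance-based faceprint with scale normalization.
# Landmark positions below are invented for illustration.

from itertools import combinations
from math import dist

def normalize(landmarks):
    """Scale landmarks so the left-eye/right-eye distance is 1.0."""
    eye_span = dist(landmarks["left_eye"], landmarks["right_eye"])
    return {name: (x / eye_span, y / eye_span)
            for name, (x, y) in landmarks.items()}

def faceprint(landmarks):
    """All pairwise distances between normalized landmarks."""
    norm = normalize(landmarks)
    names = sorted(norm)
    return [round(dist(norm[a], norm[b]), 4)
            for a, b in combinations(names, 2)]

# The same (hypothetical) face photographed close up and far away:
close_up = {"left_eye": (100, 100), "right_eye": (200, 100),
            "nose": (150, 160), "mouth": (150, 220)}
far_away = {k: (x / 4, y / 4) for k, (x, y) in close_up.items()}

# Normalization makes the two prints identical despite the scale change.
print(faceprint(close_up) == faceprint(far_away))  # True
```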

Feature extraction is "the classic way" to gather data for facial recognition, according to Parham Aarabi, a professor of computer science at the University of Toronto and CEO of facial software firm ModiFace. "Another way is to do a direct match," he noted. This technique involves using the facial image itself as the basis of comparison rather than using an algorithmic representation. "A lot of the more recent work in facial recognition has been in direct face-to-face matching," Aarabi said. Other systems use multiple images of an individual to "learn" their facial characteristics to build a model, much like the Faces feature in Apple's iPhoto.

But in all of these approaches, the more detailed a source image is, the better. More data to base the faceprint on means a higher likelihood of success in the next steps—matching and classification.

Matching and classification

The feature-based faceprint of a subject can be used in a number of ways, depending on the facial recognition application. Some systems perform additional indexing based on the images to classify the subject for narrowing searches, processing the faceprint with algorithms that can estimate the age and gender of the subject. Other characteristics, such as skin tone and facial features, can be used to help index the image as well, allowing for searches to be narrowed by race, estimated weight, or hair color.
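How classification narrows a search can be shown with a simple filter: estimated attributes (the age, gender, and hair fields below are invented) prune the candidate pool before any expensive faceprint comparison runs.

```python
# Narrowing a candidate pool by classified attributes before matching.
# All records are fabricated for illustration.

candidates = [
    ("A", 24, "M", "brown"),
    ("B", 61, "F", "gray"),
    ("C", 27, "M", "brown"),
    ("D", 33, "F", "black"),
]

def narrow(candidates, age, age_tolerance, gender):
    """Keep only candidates whose estimated attributes fit the subject."""
    return [c for c in candidates
            if abs(c[1] - age) <= age_tolerance and c[2] == gender]

pool = narrow(candidates, age=25, age_tolerance=5, gender="M")
print([name for name, *_ in pool])  # ['A', 'C']
```

Cutting the pool this way pays off directly in the matching stage, where search time grows with the number of candidates.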

Classification can also be used with what Martin called "short-term biometrics"—things such as gait, clothing, or other identifying features (such as a black backpack). These all can help locate a subject within a set of images or video streams. This approach was used to find the Tsarnaev brothers in surveillance video and other images collected from multiple sources by law enforcement. Video analysis showed Dzhokhar walking quickly and calmly away from the site of the second bomb as the first exploded, and retailers used characteristics such as the brothers' ball caps and backpacks to quickly pinpoint the suspects in their own footage. These businesses had surveillance systems from vendors such as 3VR that could automatically flag relevant footage to provide to law enforcement.

"The fact that they were able to start looking for a person with a white baseball cap, a black bag—they were able to use those as variables to pull up videos," said 3VR's Karahashi. Several 3VR customers were able to automatically pull results from terabytes of video footage recorded that day and provide them to law enforcement.

Finding the actual identity of someone in an image still requires a match against a facial database. In a facial recognition search, the binary faceprint of the subject is checked against those of a collection of "candidate" images. The bigger the candidate pool, the longer the search takes, and the larger the set of possible matches is likely to be.
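At its core, the search is a ranking problem: compare the subject's print against every candidate print and sort by similarity. The sketch below uses small numeric vectors (invented values) standing in for real faceprints, with Euclidean distance as the similarity measure.

```python
# Ranking a candidate gallery by faceprint similarity.
# All prints below are invented numbers, not real templates.

from math import dist

gallery = {
    "candidate_1": (0.52, 1.10, 0.88),
    "candidate_2": (0.47, 1.02, 0.91),
    "candidate_3": (0.80, 1.45, 0.60),
}

subject = (0.48, 1.03, 0.90)

# Sort the gallery by Euclidean distance to the subject's print;
# the top-ranked hit still needs human verification.
ranked = sorted(gallery, key=lambda name: dist(gallery[name], subject))
print(ranked[0])  # candidate_2 is the closest match
```

Scanning every candidate this way is why pool size drives both search time and the number of plausible matches.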

Performing matching, like everything else in facial recognition, requires significant computational resources. "Given how fast computers have become, it's not that much of an issue," said Aarabi. "If you narrow down a database to 10 million potential matches, that can be done in a reasonably short amount of time, so matching is not really a bottleneck anymore."

According to some National Institute of Standards and Technology benchmarks performed in 2010, "Using the most accurate face recognition algorithm, the chance of identifying the unknown subject (at rank 1) in a database of 1.6 million criminal records is about 92 percent." But the study found that for larger data sets, such as the FBI's 12 million image database, the accuracy of searches rapidly degrades. "For other population sizes, this accuracy rate decreases linearly with the logarithm of the population size. In all cases a secondary (human) adjudication process will be necessary to verify that the top-rank hit is indeed that hypothesized by the system," the authors of the study wrote.

Under ideal conditions, a facial recognition scan can at least come close to how such things play out in the movies. And even though facial recognition requires significant computing power to pull off, cloud computing and improved graphics processing are making it a lot easier to deploy—even to consumer devices. In testimony before the Senate Judiciary Committee last July, MorphoTrust's Martin told senators, "The technology is currently at a state where these face recognition algorithms can be deployed in anything from cell phones to large multiserver search engines capable of searching over 100 million faces in just a few seconds with operational accuracy."