The National Security Agency has collected a vast number of digital photos from Internet traffic and the internal networks of foreign governments in order to identify and track persons of interest, according to a report by The New York Times. The images, reportedly extracted from Internet traffic such as e-mail messages and from video conferencing streams, have been used as part of the NSA’s “Identity Intelligence” (I2) program to “track, exploit, and identify targets of interest,” according to a 2011 NSA presentation slide.

According to the documents cited by the Times, the agency began performing facial recognition searches using captured images in 2010, matching photos in Pinwale (the NSA’s long-term store of captured content from external sources) with those in a terrorist watch list database called Tide. By 2011, the NSA was capturing millions of images daily, of which about 55,000 were “facial recognition quality.” The NSA also expanded its collection and cross-referencing of facial images, pulling from CIA and State Department data from the border crossing stations of a number of countries, as well as from airline passenger data and foreign national identity card databases. According to the NSA documents, in 2011 the agency was trying to gain access to the national identity card databases of Saudi Arabia, Pakistan, and Iran.

All of these sources can be used to help identify images mined with Wellspring, the NSA’s program that extracts images from Internet communications and calls out those that appear to be passport images or other ID photos. Indexes for images have been built using a combination of internally developed facial recognition software and technology from Pittsburgh Pattern Recognition (PittPatt)—a firm acquired by Google in 2011. Other software allows the NSA to match the details of outdoor photos with satellite and aerial imagery to pinpoint where the photos were taken.

The fishing’s getting (a little) harder

The NSA’s collection efforts here are focused on individuals and organizations that have been specifically “tasked” or on images collected from databases that have a high value as facial recognition targets. Collecting those images requires a lot more than just skimming Internet traffic.

When the NSA began its facial recognition efforts, the technology was still experiencing growing pains. Capturing images from Internet traffic, however, was fairly easy: many Internet services offered no encryption at all, and the NSA was actively working to gather data from within the networks of those services that did offer encryption for Web clients.

Of the major Web mail providers, only Google was providing SSL encryption at the beginning of 2010. Microsoft added SSL encryption to Hotmail in November of 2010. But SSL wasn’t even an option for Yahoo mail until early in 2013—and Yahoo didn’t turn it on by default until October of 2013.
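To illustrate why unencrypted traffic made image capture so easy, here is a minimal Python sketch that scans a plaintext payload for the well-known file signatures of JPEG and PNG images. It is only an illustration of the general idea, not a description of Wellspring or any NSA tool; the function name `find_image_offsets` and the sample payload are hypothetical.

```python
from __future__ import annotations

# Well-known file signatures ("magic bytes") for two common image formats.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def find_image_offsets(payload: bytes) -> list[tuple[int, str]]:
    """Return sorted (offset, format) pairs for every image signature found."""
    hits = []
    for fmt, magic in SIGNATURES.items():
        start = payload.find(magic)
        while start != -1:
            hits.append((start, fmt))
            start = payload.find(magic, start + 1)
    return sorted(hits)


# Hypothetical plaintext HTTP response carrying the start of a PNG body.
header = b"HTTP/1.1 200 OK\r\nContent-Type: image/png\r\n\r\n"
plaintext = header + SIGNATURES["png"] + b"\x00" * 16
print(find_image_offsets(plaintext))
```

When the same response travels over SSL, an eavesdropper sees only ciphertext, so a signature scan like this finds nothing meaningful without access to the keys or to the service's internal network.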

As part of a collaborative project with National Public Radio and penetration testing provider Pwnie Express—the results of which we’ll publish later this week—we took a look at what sort of content could be fished out of Internet traffic from major Web and mobile services today. Yahoo, Google, Microsoft, Apple, and Facebook now all encrypt images and other content from servers to Web browsers—though there are some exceptions in the mobile realm.

Facebook, for example, encrypts all the images transmitted from its content delivery network to users’ browsers, though the images can still be reached through an unencrypted interface. During our testing, Pwnie Express founder and CTO Dave Porcello found that on an Android 4.1.1 “Jelly Bean” device—admittedly an older phone, but still in wide use—Facebook profile pictures and images were transmitted unencrypted to the Facebook app. Our tests on newer platforms found that the images were encrypted.

However, the NSA doesn’t necessarily have to pull images from raw Internet traffic to build its database, particularly for individuals outside the US who are the subject of a Foreign Intelligence Surveillance Act (FISA) Court warrant. The NSA, through the FBI, could simply order service providers to hand over images associated with specific accounts. And a few overseas Web mail providers don’t use encryption by default yet, leaving their services exposed to the passive capture of contents.

On the other side of the problem is facial recognition technology itself. The Times’ report indicates that, while the NSA had some early successes with facial recognition, the technology suffered from a high “false positive” rate. A 2011 presentation viewed by the Times showed that for a query using an image of Osama bin Laden, the NSA’s system returned photos of four men who shared only one obvious facial characteristic with the Al Qaeda leader: a beard.
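A rough way to see where false positives come from: many face matchers reduce each photo to a numeric feature vector and report a match when two vectors are similar enough. The Python sketch below assumes that model and uses cosine similarity with a cutoff. The vectors, names, and thresholds are made up for illustration, and nothing here reflects how the NSA’s or PittPatt’s software actually works.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def matches(query, gallery, threshold):
    """Names in the gallery whose similarity to the query meets the threshold."""
    return [
        name
        for name, vector in gallery.items()
        if cosine_similarity(query, vector) >= threshold
    ]


# Hypothetical 3-number "face features"; real systems use far more dimensions.
query = [0.9, 0.1, 0.4]
gallery = {
    "same_person": [0.88, 0.12, 0.42],
    "bearded_stranger": [0.9, 0.6, 0.1],  # shares one strong feature
    "unrelated": [0.1, 0.9, 0.2],
}

print(matches(query, gallery, threshold=0.80))  # loose cutoff
print(matches(query, gallery, threshold=0.95))  # strict cutoff
```

With the loose cutoff, the stranger who shares one dominant feature is returned alongside the true match; the strict cutoff drops him, at the cost of also missing genuine matches whenever lighting or angle pushes their scores down.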

The problems with facial recognition technology based on ID card databases were demonstrated by the manhunt for the Boston Marathon bombers. While the technology has advanced recently, much of its recognition capability depends on the angle and quality of the two images being compared. On many current systems, even matching photos taken from the same angle can be made more difficult by variations in lighting and resolution.

But faces aren’t the only piece of data that the NSA works with in correlating images with individuals. There’s a great deal of contextual data that can be collected along with images—especially those pulled from the databases of foreign governments and from e-mail—that can be used to identify individuals in them and help to build a library of related images that could generate more accurate results.