Watch enough spy thrillers, and you'll undoubtedly see someone setting up a bit of equipment that points a laser at a distant window, letting the snoop listen to conversations on the other side of the glass. This isn't something Hollywood made up; high-tech snooping devices of this sort do exist, and they use the extremely precise measurements that lasers make possible to pick up the subtle vibrations caused by sound waves.

A team of researchers has now shown, however, that you can skip the lasers. All you really need is a consumer-level digital camera and a conveniently located bag of Doritos. A glass of water or a plant would also do.

Good vibrations

Despite the differences in the technology involved, both approaches rely on the same principle: sound travels as waves of alternating higher and lower pressure in the air. When these waves reach a flexible object, they set off small vibrations in the object. If you can detect these vibrations, it's possible to reconstruct the sound. Laser-based systems detect the vibrations by watching for changes in the reflections of the laser light, but the researchers wondered whether you could simply observe the object directly, using the ambient light it reflects. (The team involved researchers at MIT, Adobe Research, and Microsoft Research.)

The research team started with a simple test system made from a loudspeaker playing a rising tone, a high-speed camera, and a variety of objects: water, cardboard, a candy wrapper, some metallic foil, and (as a control) a brick. Every object, even the brick, showed some response at the lowest end of the tonal range, but the others, particularly the cardboard and foil, kept responding at much higher frequencies. To observe the changes in ambient light, the camera didn't have to capture the object at high resolution (it was used at 700 x 700 pixels or less), but it did have to be high-speed, capturing as many as 20,000 frames a second.
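One reason the frame rate matters: a sequence of frames can only represent tones up to half the frame rate (the Nyquist limit), and anything higher folds down to a false, lower frequency. The sketch below is illustrative arithmetic rather than anything taken from the paper; the function names are invented for this example, and the 60 frames per second figure used in the note afterward is just a stand-in for an ordinary video camera.

```python
def nyquist_limit(frames_per_second):
    """Highest tone frequency (Hz) that a given frame rate can represent."""
    return frames_per_second / 2


def alias_frequency(tone_hz, frames_per_second):
    """Frequency (Hz) at which a tone appears after sampling once per frame.

    Tones below the Nyquist limit come back unchanged; higher tones
    fold back into the range from 0 to frames_per_second / 2.
    """
    nearest_multiple = round(tone_hz / frames_per_second) * frames_per_second
    return abs(tone_hz - nearest_multiple)
```

At 20,000 frames a second the limit is 10,000 Hz, which covers most of what matters in speech. At a hypothetical 60 frames a second, a 1,000 Hz tone would fold down to 20 Hz, which is why whole-frame sampling on a slow camera isn't enough.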

Processing the images wasn't simple, however. A computer had to perform a weighted average over all the pixels captured, and even a twin 3.5GHz machine with 32GB of RAM took more than two hours to process one capture. Nevertheless, the results were impressive, as the algorithm was able to detect motion on the order of a thousandth of a pixel. This enabled the system to recreate the audio waves emitted by the loudspeaker.
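To get a feel for why averaging over many pixels can reveal motion far smaller than any single pixel can resolve, here is a minimal simulation. It is a sketch under stated assumptions, not the researchers' algorithm: every pixel is assumed to see the same tiny displacement plus its own independent noise, each pixel's "contrast" is a made-up number, and the weights simply favor high-contrast (less noisy) pixels.

```python
import math
import random


def simulate_pixel_signals(n_pixels, n_frames, amplitude, seed=0):
    """Return (true_motion, per_pixel_signals, weights) for a toy scene.

    All values are hypothetical: the motion is a sine wave measured in
    pixels, and each pixel's noise shrinks as its local contrast grows.
    """
    rng = random.Random(seed)
    true_motion = [
        amplitude * math.sin(2 * math.pi * 5 * t / n_frames)
        for t in range(n_frames)
    ]
    signals, weights = [], []
    for _ in range(n_pixels):
        contrast = rng.uniform(0.1, 1.0)   # assumed local edge strength
        noise_sd = 0.01 / contrast         # weaker edges give noisier estimates
        signals.append([m + rng.gauss(0, noise_sd) for m in true_motion])
        weights.append(contrast ** 2)      # trust high-contrast pixels more
    return true_motion, signals, weights


def weighted_average(signals, weights):
    """Combine per-pixel signals into one signal, frame by frame."""
    total = sum(weights)
    n_frames = len(signals[0])
    return [
        sum(w * sig[t] for sig, w in zip(signals, weights)) / total
        for t in range(n_frames)
    ]


def rms_error(estimate, truth):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(
        sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)
    )


if __name__ == "__main__":
    truth, signals, weights = simulate_pixel_signals(5000, 200, amplitude=0.001)
    combined = weighted_average(signals, weights)
    print("single pixel error:", rms_error(signals[0], truth))
    print("weighted average error:", rms_error(combined, truth))
```

In this toy setup, a single pixel's noise is at least ten times larger than the thousandth-of-a-pixel motion, yet the weighted average of 5,000 pixels tracks it closely. The real processing is far more involved, which is consistent with the long run times reported.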

Most of the rest of the paper describes the researchers making things harder for the system by switching to human voices and moving the camera outside the room. They also showed that pre-testing the vibrating object's response to a tone scale could help them improve their processing.

But perhaps the biggest surprise came when they showed that they didn't actually need a specialized, high-speed camera. It turns out that most consumer-grade equipment doesn't expose its entire sensor at once and instead scans an image across the sensor grid in a line-by-line fashion. Using a consumer video camera, the researchers were able to determine that there's a 16-microsecond delay between successive lines, with a five-millisecond delay between frames. Using this information, they treated each line as a separate exposure and were able to reproduce sound that way.
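The timing figures above are enough to work out roughly when each line was captured. The sketch below does that arithmetic; it is not the researchers' code. The 720 lines per frame is a hypothetical value chosen for illustration, and the five-millisecond delay is assumed to be idle time after a frame's last line slot.

```python
LINE_DELAY_S = 16e-6   # delay between successive lines, from the article
FRAME_GAP_S = 5e-3     # delay between frames, from the article


def in_frame_sample_rate(line_delay=LINE_DELAY_S):
    """Samples per second while the sensor is scanning lines within a frame."""
    return 1.0 / line_delay


def row_timestamps(n_frames, rows_per_frame,
                   line_delay=LINE_DELAY_S, frame_gap=FRAME_GAP_S):
    """Capture time (seconds) of every line, treating each as its own exposure.

    Assumes each frame takes rows_per_frame line slots followed by an
    idle gap, so sampling is uneven: dense within a frame, then a hole.
    """
    frame_period = rows_per_frame * line_delay + frame_gap
    return [
        frame * frame_period + row * line_delay
        for frame in range(n_frames)
        for row in range(rows_per_frame)
    ]


if __name__ == "__main__":
    times = row_timestamps(n_frames=2, rows_per_frame=720)  # 720 is hypothetical
    print("in-frame sample rate (Hz):", in_frame_sample_rate())
    print("gap across the frame boundary (s):", times[720] - times[719])
```

Within a frame, a line every 16 microseconds works out to about 62,500 samples a second, well above the 20,000 frames a second of the high-speed camera. The catch is the hole between frames, where nothing is recorded, so any reconstruction has to cope with missing stretches of signal.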

Overall, it's an impressive bit of computer science, but the authors are up-front about its potential use in surveillance. The biggest limitation right now is that the camera has to be fairly close to the object; the team didn't test anything beyond four meters. But they also suggest that a powerful zoom lens might allow the system to work at much greater distances.

The researchers have posted a paper describing their work.