The contents of the user’s screen can be gleaned through video or VoIP calls, or voice-operated virtual assistants, like Amazon Alexa.

A stealthy side-channel tactic for digital surveillance has been uncovered, which allows an attacker to “hear” on-screen images.

According to a team of academic researchers from Columbia University, the University of Michigan, the University of Pennsylvania and Tel Aviv University, inaudible acoustic noises emanating from within computer screens can be used to detect the content displayed on those screens. This includes the text on the screen of a computer, or website content that a user may have opened on their desktop. It can also be used to monitor users’ input into on-screen virtual keyboards. This can all be detected and recorded by the microphones built into laptops and webcams; the subtle acoustic signals can also be recorded by a smartphone or speaker placed on a desk next to the screen, or from as far as 10 meters away using a parabolic microphone.

All of this can be transmitted to remote parties.

“Users commonly share audio recorded by these microphones, e.g., during voice over IP and videoconference calls,” the researchers explained in their paper, published last week, adding, “their screen content is being conveyed to anyone who receives the audio stream, or even a retroactive recording.”

In addition to the usual motivations for surveillance, employers could also use the tactic to keep tabs on employees. “An attacker can analyze the audio received during [a] video call to infer whether the other side is browsing the web in lieu of watching the video call, and which web site is displayed on their screen,” according to the research.

The Side Channel

According to the paper, dubbed “Synesthesia” after the neurological phenomenon in which senses or perceptions are blended, the channel exists because content-dependent acoustic leakage from LCD screens can be picked up by adjacent microphones.

“The sounds are so faint and high-pitched that they are well-nigh inaudible to the human ear, and thus (unlike with mechanical peripherals) users have no reason to suspect that these emanations exist,” according to the paper. “In fact, users often make an effort to place their webcam (and thus, microphone) in close proximity to the screen, in order to maintain eye contact during videoconferences, thereby offering high-quality measurements to would-be attackers.”

The leakage has existed in screens manufactured and sold for at least the past 16 years, old and new models alike, the researchers said.

The paper explains that computer screens display a rectangular matrix of pixels. Each pixel is typically further divided into red, green, and blue sub-pixels; the color of a pixel can be uniquely represented using 24-bit integers.
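As a concrete illustration of that representation, the three 8-bit sub-pixel values can be packed into a single 24-bit integer (a generic sketch; the helper names are ours, not from the paper):

```python
def pack_rgb(r, g, b):
    """Pack 8-bit red, green and blue sub-pixel values into one 24-bit integer."""
    return (r << 16) | (g << 8) | b

def unpack_rgb(color):
    """Recover the three 8-bit sub-pixel values from a 24-bit color."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF

# pack_rgb(255, 255, 255) == 0xFFFFFF (white); pack_rgb(0, 0, 0) == 0x000000 (black)
```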

Meanwhile, the screen’s refresh rate determines how many times per second an image to be displayed is sent to the screen by the computer’s graphics card. The screen then renders the received image.

“Typically, screens are refreshed approximately 30, 60 or 120 times per second, with a refresh rate of approximately 60 Hz being the most common,” the paper explained.
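The refresh rate translates directly into a per-frame time budget; simple arithmetic, shown here for the common rates the paper mentions:

```python
# Time budget per frame at common screen refresh rates.
for hz in (30, 60, 120):
    print(f"{hz} Hz -> {1000 / hz:.2f} ms per frame")
# 60 Hz -> 16.67 ms per frame
```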

The analysis of acoustic leakage emitted from LCD computer monitors started with distinguishing simple repetitive images displayed on the target monitor.

“For this purpose, we created a simple program that displays patterns of alternating horizontal black and white stripes of equal thickness (in pixels), which we shall refer to as zebras,” the paper noted. “The period of a zebra is the distance, in pixels, between two adjacent black stripes. Finally, we recorded the sound emitted by a Soyo DYLM2086 screen while displaying different such zebras.”

The team found that the change between the displayed zebras causes clear changes in the monitor’s acoustic signature.
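The experiment can be sketched in a few lines: generate a zebra pattern as the paper describes, then estimate the frequency at which the raster-order redraw would modulate the power draw. The panel geometry, refresh rate, and the assumption that power draw tracks each pixel row are illustrative, not the paper's measured numbers:

```python
import numpy as np

REFRESH_HZ = 60   # assumed refresh rate
LINES = 1050      # assumed number of pixel rows on the panel
WIDTH = 1680      # assumed number of pixel columns
PERIOD = 32       # zebra period: pixels between adjacent black stripes

def zebra(height, width, period):
    """Alternating horizontal black (0) / white (255) stripes of equal thickness."""
    rows = (np.arange(height) % period) < (period // 2)
    col = np.where(rows, 0, 255).astype(np.uint8)
    return np.tile(col[:, None], (1, width))

img = zebra(LINES, WIDTH, PERIOD)

# The screen is redrawn in raster order, row by row, so under this crude
# model the power draw (and hence the acoustic emission) is modulated at:
line_rate = REFRESH_HZ * LINES      # pixel rows drawn per second
modulation_hz = line_rate / PERIOD  # one full zebra period per cycle
```

Changing `PERIOD` changes `modulation_hz`, which is one plausible reading of why different zebras produce distinguishable acoustic signatures.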

“The momentary power draw [caused by refreshing the pixels], induced by the monitor’s digital circuits, varies as a function of the screen content being processed in raster order,” according to the paper. “This in turn affects the electrical load on the power supply components that provide power to the monitor’s digital board, causing them to vibrate and emit sound.”

By analyzing the acoustic changes, it’s possible to recreate what’s being displayed on a screen.

Remote Attack Methods

The researchers found that a remote adversary conversing with the target over the internet, using a VoIP or videoconferencing service, would be able to carry out the attack.

“We obtained the screen’s acoustic emanations by recording the audio sent to a remote party during a Hangouts call, captured using the victim’s own microphone (built into a commodity webcam),” the team said.

Recordings were taken in an office environment with some environmental noise and human speech; the camera was placed naturally by the screen. To simulate the attacker, the team used a second PC running Ubuntu 16.04 and set up a Hangouts connection between attacker and victim, both running Hangouts in a Firefox browser. At the attacker’s end, all sound output was directed from the soundcard into a loopback device, making it possible to capture sound originating at the victim’s end of the call.

The team discovered that first, commodity webcams and microphones can capture the leakage; second, natural and expected positioning of cameras can be sufficient; and third, the leakage is present (and prominent) even when the audio is transmitted through a Hangouts call.

The paper also explained that the contents of the user’s screen can be gleaned by voice-operated virtual assistants, like Amazon Alexa devices or Google Home devices.

“Once configured, such devices capture audio at all times, including acoustic leakage from nearby screens,” the paper explained. “The [recording] is archived in cloud servers.”

To ascertain if the screen’s visual content is indeed acoustically captured and subsequently uploaded to Google’s and Amazon’s cloud servers, the team placed a Google Home device next to a Dell 2208WFPt screen displaying alternating zebra patterns.

“We then woke the Google Home device up by using its wake phrase (‘Hey Google’) and kept the recording running during the zebra alternations,” it explained. “Finally, we retrieved the recorded audio from Google’s servers.”

A spectrogram representation of the retrieved audio clearly showed the alternation of zebra patterns, which can be used to deduce the content on the screens. The same effort with Amazon Echo devices showed similar results.
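In the same spirit, the alternation shows up as a shift in the dominant spectral peak over time. A toy reconstruction with NumPy, where two synthetic tones stand in for the signatures of two zebra patterns (the tone frequencies and sample rate are stand-ins, not measured values):

```python
import numpy as np

FS = 44_100  # assumed sample rate of the retrieved recording
t = np.arange(FS * 2) / FS
# Stand-in for the retrieved audio: the first second mimics one zebra
# pattern, the second second mimics another.
audio = np.where(t < 1.0,
                 np.sin(2 * np.pi * 15_000 * t),
                 np.sin(2 * np.pi * 17_000 * t))

def dominant_freq(x, fs):
    """Frequency of the strongest spectral peak in a windowed segment."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return freqs[np.argmax(spec)]

first = dominant_freq(audio[:8192], FS)   # while pattern A is on screen
last = dominant_freq(audio[-8192:], FS)   # while pattern B is on screen
```

Tracking the dominant peak across successive segments is exactly what a spectrogram makes visible at a glance.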

The researchers also found that machine learning can be used to carry out an attack at scale, and to overcome one of the main obstacles facing a remote attacker with no physical access to the screen: inter-screen generalization.

“To attain high generalization, the attacker can use multiple screens from the same model or even similar models from the same vendor,” the paper said. “Note that this training phase can be done once, offline, and utilized for multiple attacks. It can also be done retroactively, after recording the victim and using this recording to fingerprint their screen model.”
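To sketch what that offline training phase could look like, here is a toy nearest-centroid classifier over spectral fingerprints. Synthetic tones stand in for recordings of different on-screen content; the labels, frequencies, and feature choice are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
FS = 44_100

def features(recording):
    """Log-magnitude spectrum as a crude acoustic fingerprint."""
    return np.log1p(np.abs(np.fft.rfft(recording)))

def synth(freq, n=8192):
    """Hypothetical recording: a tone plus a little noise, standing in for
    the leakage of one screen showing one piece of content."""
    t = np.arange(n) / FS
    return np.sin(2 * np.pi * freq * t) + 0.01 * rng.standard_normal(n)

# Offline training: fingerprint each known content class using several
# recordings (in practice, taken from multiple screens of the same model).
train = {"zebra A": [features(synth(15_000)) for _ in range(3)],
         "zebra B": [features(synth(17_000)) for _ in range(3)]}
centroids = {label: np.mean(v, axis=0) for label, v in train.items()}

def classify(recording):
    """Label a new recording by its nearest fingerprint centroid."""
    f = features(recording)
    return min(centroids, key=lambda k: np.linalg.norm(f - centroids[k]))

print(classify(synth(15_000)))  # expected: zebra A
```

A real attack would use far richer features and models, but the structure is the same: train once, offline, then classify any later recording of the victim.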

It should be noted that this is not the first time an audio side channel has been used to extract information. However, the researchers pointed out that while “some physical side channels have been thoroughly explored, the only previous work extracting information from involuntary acoustic leakage of electronic components is … acoustic cryptanalysis [GST] work, which exploits coil whine from laptop computers, and does not consider acoustic leakage from displays.”