Attackers could theoretically leverage Skype to steal a user’s passwords by collecting and analyzing the sound of their keystrokes.

The method of attack developed by researchers at the Sapienza University of Rome, the University of Padua, and UC Irvine isn’t like most sound-based keylogging efforts, which generally require that the attacker make use of other connected devices, stay physically close to the target computer, or first infect it with malware.

Instead, this attack relies on an attacker remotely eavesdropping on the acoustic emanations of a user’s keystrokes and analyzing them to reconstruct what the target typed.

It sounds complicated, but as the researchers note in their paper – entitled “Don’t Skype & Type (S&T)! Acoustic Eavesdropping in Voice-Over-IP” – it’s easy enough to do during a Voice over Internet Protocol (VoIP) call, such as one made over Skype.

“S&T attack transpires as follows: during a VoIP call between the victim and the attacker, the former types something on target-device, e.g., a password, that we refer to as target-text. Typing target-text causes acoustic emanations from target-device’s keyboard, which are then picked up by the target-device’s microphone and transmitted to the attacker by VoIP. The goal of the attacker is to learn the target-text by taking advantage of these emanations.”

For the attack to work, the attacker and the victim simply need to be connected to each other on a Skype call. Neither device has to be compromised.

From there, the attack can play out in one of several ways. The attacker could have a complete profile of their victim, that is, recordings of keystrokes the user has typed as well as the plaintext of that input. In that scenario, it would be relatively easy for an attacker to extract the acoustic emanations, segment the recording into individual keystrokes, identify the waveforms, and classify each keystroke as a particular key based on that data.
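To make that pipeline a little more concrete, here is a toy Python sketch of the generic idea: find keystrokes by looking for bursts of energy in the audio, turn each burst into a small feature vector, then match it against per-key averages built from a labeled profile. This is not the researchers’ code or their exact method. The function names, window sizes, and the crude “energy per slice” features are hypothetical simplifications; real attacks use far richer audio features and trained classifiers.

```python
import math
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[float], frame_len: int) -> List[float]:
    """Sum of squared amplitudes for each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_keystrokes(samples: Sequence[float], frame_len: int = 64,
                      threshold: float = 1.0, min_gap: int = 4) -> List[int]:
    """Segmentation: return the start sample of each frame whose energy
    reaches `threshold`, skipping frames within `min_gap` frames of the
    previous detection so one keystroke is not counted twice."""
    onsets: List[int] = []
    last = -(min_gap + 1)  # lets a keystroke be detected in frame 0
    for idx, energy in enumerate(frame_energy(samples, frame_len)):
        if energy >= threshold and idx - last > min_gap:
            onsets.append(idx * frame_len)
            last = idx
    return onsets


def features(samples: Sequence[float], start: int, win: int = 256,
             bands: int = 4) -> List[float]:
    """Toy feature vector: energy in `bands` equal slices of the window
    following a detected onset (a stand-in for real spectral features)."""
    seg = samples[start:start + win]
    step = len(seg) // bands
    return [sum(s * s for s in seg[b * step:(b + 1) * step]) for b in range(bands)]


def train_centroids(labeled: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Profiling: average the feature vectors recorded for each known key."""
    return {key: [sum(col) / len(vecs) for col in zip(*vecs)]
            for key, vecs in labeled.items()}


def classify(vec: List[float], centroids: Dict[str, List[float]]) -> str:
    """Classification: pick the key whose average vector is nearest (Euclidean)."""
    return min(centroids, key=lambda k: math.dist(vec, centroids[k]))
```

In the “complete profile” case, the labeled vectors passed to train_centroids would come from the victim’s own recorded typing; with less matching data, those averages would be a poorer fit for the victim’s keystrokes.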

Indeed, with a complete profile, the researchers found that attackers could guess a key correctly 91.7 percent of the time.

But not all attacks are that easy. Sometimes the attacker might not have any information about the user and might need to collect acoustic emanations from them with the help of an accomplice. Other times, they might need to rely on a database of other users typing on the same type of device as the target.

In that latter case, accuracy drops to 41.87 percent. Not bad for an attacker with no data about the user at all.

It’s important to note that there are a few limitations that raise questions about this attack method’s real-world applicability. These are as follows:

- The attacker and victim must be connected on a Skype call. (Let’s hope users only join VoIP sessions with people they trust.)
- The attacker must identify what type of device the target is using, because the same key sounds different on different laptop models.
- If the attacker doesn’t have information about their target, they must rely on a database of other users typing on the same type of device, a resource which might be difficult to procure.
- The attack assumes the user isn’t talking loudly enough to drown out the sound of their keystrokes.

But where there’s a will, there’s a way. That’s why users should never type sensitive information like passwords while they’re on a Skype call.

Simple and easy prevention at its best!

Found this article interesting? Follow Graham Cluley on Twitter to read more of the exclusive content we post.