Inaudible Voice Commands Can Control Siri, Alexa, Other Digital Assistants

Voice-capture system properties enable attackers to silently control them, say researchers at China's Zhejiang University.

Researchers at China's Zhejiang University have demonstrated how attackers can remotely control digital assistants such as Apple's Siri, Amazon's Alexa, and Google Now using inaudible voice commands and roughly $3 worth of hardware.

In a paper released recently, the researchers described how the technique, which they christened DolphinAttack, can theoretically be used to get devices running always-on voice assistants to spy on users, disconnect from a network, make phone calls, or log into malicious websites.

The researchers tested the effectiveness of their technique on a total of 16 voice-controllable systems including Apple's iPhone, Amazon Echo, Google Nexus, a Lenovo ThinkPad with Microsoft Cortana, and a couple of automobiles with embedded speech-recognition systems.

In each case, they tried to see if they could activate the devices using inaudible wake-up commands such as "Hey Siri," "Ok Google," and "Alexa," without physically touching the devices. They also tried to get the devices to execute commands such as "Call 1234567890," "Facetime 1234567890," "Open dolphinattack.com," and "Turn on airplane mode." With the Amazon Echo, the researchers tried to see if they could get Alexa to respond to an "open the back door" command issued inaudibly.

In almost all instances the tests proved successful, the researchers said in their paper. "By injecting a sequence of inaudible voice commands, we show a few proof-of-concept attacks, which include activating Siri to initiate a FaceTime call on iPhone, activating Google Now to switch the phone to the airplane mode, and even manipulating the navigation system in an Audi automobile," the researchers claimed.

DolphinAttack builds on previous research showing how voice-controllable systems can be compromised using hidden voice commands that, while incomprehensible to humans, are still audible.

The new method uses ultrasonic frequencies higher than 20 kHz to relay voice commands to speech-recognition systems. Frequencies greater than 20 kHz are completely inaudible to the human ear. Most audio-capable devices, such as smartphones, are also designed to automatically filter out audio signals above 20 kHz.

In fact, all of the components in a voice-capture system are designed to filter out signals outside the range of audible sound, which is typically between 20 Hz and 20 kHz, the researchers said. As a result, it was generally considered almost impossible until now to get a speech-recognition system to make sense of sounds that are inaudible to humans, they noted.

DolphinAttack is a demonstration of how such systems can indeed be made to recognize and act upon inaudible and supposedly out-of-range sounds.

It takes advantage of certain properties of the audio circuits in the voice-capturing subsystems used by most state-of-the-art speech-recognition systems. According to the security researchers, those properties make it possible for someone to transmit commands ultrasonically and have them recovered and then properly interpreted by speech-recognition technologies such as Siri, Alexa, and Google Now.
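The article does not say which circuit properties are involved. One widely discussed candidate is that microphone circuits are slightly nonlinear, so an amplitude-modulated ultrasonic signal gets partly demodulated back into the audible band inside the device. The toy simulation below is a minimal sketch of that idea, not the researchers' implementation: the sample rate, the 1 kHz tone standing in for a voice command, the 30 kHz carrier, the modulation depth, and the square-law coefficient are all assumed values chosen for illustration.

```python
import cmath
import math

FS = 192_000        # sample rate in Hz (assumed; high enough to represent ultrasound)
F_VOICE = 1_000     # single tone standing in for a voice command (assumed)
F_CARRIER = 30_000  # ultrasonic carrier above 20 kHz (assumed)
DEPTH = 0.5         # amplitude-modulation depth (assumed)
NONLIN = 0.5        # hypothetical square-law coefficient of the microphone circuit
N = 19_200          # 0.1 s of samples, so every tone completes whole cycles


def tone_amplitude(samples, freq):
    """Return the amplitude of the `freq` component using a single-bin DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, x in enumerate(samples))
    return 2 * abs(acc) / len(samples)


def modulate():
    """Amplitude-modulate the baseband tone onto the ultrasonic carrier."""
    return [(1 + DEPTH * math.cos(2 * math.pi * F_VOICE * n / FS))
            * math.cos(2 * math.pi * F_CARRIER * n / FS)
            for n in range(N)]


def nonlinear_mic(samples):
    """Toy microphone model: a linear term plus a square-law term."""
    return [x + NONLIN * x * x for x in samples]


transmitted = modulate()                 # what travels through the air
captured = nonlinear_mic(transmitted)    # what the toy microphone outputs

in_air = tone_amplitude(transmitted, F_VOICE)
after_mic = tone_amplitude(captured, F_VOICE)
print(f"1 kHz content in the air:    {in_air:.4f}")
print(f"1 kHz content after the mic: {after_mic:.4f}")
```

In this model the transmitted signal carries essentially no energy at 1 kHz, because everything sits around the 30 kHz carrier, while the captured signal contains a 1 kHz component of amplitude NONLIN × DEPTH. A real voice command occupies a band of frequencies rather than a single tone, and real hardware behaves less cleanly than a pure square-law term.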

PoC

For the proof-of-concept attack, the researchers used a Samsung Galaxy S6 Edge smartphone and, for $3 extra, a low-cost amplifier, a transducer for modulating voice commands, and a battery. In each case, the attack kit was placed relatively close to the target device. In fact, the maximum distance over which the researchers were able to demonstrate their attack was 175 centimeters, or less than two meters from the target device.

Beyond the need for an attacker to be very close to a victim in order to execute a DolphinAttack, there are a few other mitigating circumstances. When a speech-recognition system is activated, it typically responds with audible feedback and blinking lights or some other visual indicator, thereby alerting a potential victim.

Similarly, the attack would not work on many devices if the speech recognition feature were muted.

Google, Apple, and Microsoft did not respond to a request for comment on the DolphinAttack. In a statement, an Amazon spokesman said the company has taken note of the research. "We take privacy and security very seriously at Amazon and are reviewing the paper issued by the researchers."


Jai Vijayan is a seasoned technology reporter with over 20 years of experience in IT trade journalism. He was most recently a Senior Editor at Computerworld, where he covered information security and data privacy issues for the publication.
