By now, the privacy threats posed by Amazon Alexa and Google Home are common knowledge. Workers for both companies routinely listen to audio of users—recordings of which can be kept forever—and the sounds the devices capture can be used in criminal trials.

Now, there's a new concern: malicious apps developed by third parties and hosted by Amazon or Google. The threat isn't just theoretical. Whitehat hackers at Germany's Security Research Labs developed eight apps—four Alexa "skills" and four Google Home "actions"—that all passed Amazon or Google security-vetting processes. The skills or actions posed as simple apps for checking horoscopes, with the exception of one, which masqueraded as a random-number generator. Behind the scenes, these "smart spies," as the researchers call them, surreptitiously eavesdropped on users and phished for their passwords.

"It was always clear that those voice assistants have privacy implications—with Google and Amazon receiving your speech, and this possibly being triggered on accident sometimes," Fabian Bräunlein, senior security consultant at SRLabs, told me. "We now show that, not only the manufacturers, but... also hackers can abuse those voice assistants to intrude on someone's privacy."

The malicious apps had different names and slightly different ways of working, but they all followed similar flows. A user would say a phrase such as: "Hey Alexa, ask My Lucky Horoscope to give me the horoscope for Taurus" or "OK Google, ask My Lucky Horoscope to give me the horoscope for Taurus." The eavesdropping apps responded with the requested information while the phishing apps gave a fake error message. Then the apps gave the impression they were no longer running when they, in fact, silently waited for the next phase of the attack.

As the following two videos show, the eavesdropping apps gave the expected responses and then went silent. In one case, an app went silent because the task was completed, and, in another instance, an app went silent because the user gave the command "stop," which Alexa uses to terminate apps. But the apps quietly logged all conversations within earshot of the device and sent a copy to a developer-designated server.

The phishing apps follow a slightly different path by responding with an error message that claims the skill or action isn't available in that user's country. They then go silent to give the impression the app is no longer running. After about a minute, the apps use a voice that mimics the ones used by Alexa and Google home to falsely claim a device update is available and prompts the user for a password for it to be installed.

SRLabs eventually took down all four apps demoed. More recently, the researchers developed four German-language apps that worked similarly. All eight of them passed inspection by Amazon and Google. The four newer ones were taken down only after the researchers privately reported their results to Amazon and Google. As with most skills and actions, users didn't need to download anything. Simply saying the proper phrases into a device was enough for the apps to run.

All of the malicious apps used common building blocks to mask their malicious behaviors. The first was exploiting a flaw in both Alexa and Google Home when their text-to-speech engines received instructions to speak the character "�." (U+D801, dot, space). The unpronounceable sequence caused both devices to remain silent even while the apps were still running. The silence gave the impression the apps had terminated, even when they remained running.

The apps used other tricks to deceive users. In the parlance of voice apps, "Hey Alexa" and "OK Google" are known as "wake" words that activate the devices; "My Lucky Horoscope" is an "invocation" phrase used to start a particular skill or action; "give me the horoscope" is an "intent" that tells the app which function to call; and "taurus" is a "slot" value that acts like a variable. After the apps received initial approval, the SRLabs developers manipulated intents such as "stop" and "start" to give them new functions that caused the apps to listen and log conversations.

Others at SRLabs who worked on the project include security researcher Luise Frerichs and Karsten Nohl, the firm's chief scientist. In a post documenting the apps, the researchers explained how they developed the Alexa phishing skills:

1. Create a seemingly innocent skill that already contains two intents:

– an intent that is started by "stop" and copies the stop intent

– an intent that is started by a certain, commonly used word and saves the following words as slot values. This intent behaves like the fallback intent. 2. After Amazon's review, change the first intent to say goodbye, but then keep the session open and extend the eavesdrop time by adding the character sequence "(U+D801, dot, space)" multiple times to the speech prompt. 3. Change the second intent to not react at all When the user now tries to end the skill, they hear a goodbye message, but the skill keeps running for several more seconds. If the user starts a sentence beginning with the selected word in this time, the intent will save the sentence as slot values and send them to the attacker.

To develop the Google Home eavesdropping actions:

1. Create an Action and submit it for review. 2. After review, change the main intent to end with the Bye earcon sound (by playing a recording using the Speech Synthesis Markup Language (SSML)) and set expectUserResponse to true. This sound is usually understood as signaling that a voice app has finished. After that, add several noInputPrompts consisting only of a short silence, using the SSML element or the unpronounceable Unicode character sequence "�." 3. Create a second intent that is called whenever an actions.intent.TEXT request is received. This intent outputs a short silence and defines several silent noInputPrompts. After outputting the requested information and playing the earcon, the Google Home device waits for approximately 9 seconds for speech input. If none is detected, the device "outputs" a short silence and waits again for user input. If no speech is detected within 3 iterations, the Action stops. When speech input is detected, a second intent is called. This intent only consists of one silent output, again with multiple silent reprompt texts. Every time speech is detected, this Intent is called and the reprompt count is reset. The hacker receives a full transcript of the user's subsequent conversations, until there is at least a 30-second break of detected speech. (This can be extended by extending the silence duration, during which the eavesdropping is paused.) In this state, the Google Home Device will also forward all commands prefixed by "OK Google" (except "stop") to the hacker. Therefore, the hacker could also use this hack to imitate other applications, man-in-the-middle the user's interaction with the spoofed Actions, and start believable phishing attacks.

SRLabs privately reported the results of its research to Amazon and Google. In response, both companies removed the apps and said they are changing their approval processes to prevent skills and actions from having similar capabilities in the future. In a statement, Amazon representatives provided the following statement and FAQ (emphasis added for clarity):

Customer trust is important to us, and we conduct security reviews as part of the skill certification process. We quickly blocked the skill in question and put mitigations in place to prevent and detect this type of skill behavior and reject or take them down when identified. On the record Q&A: 1) Why is it possible for the skill created by the researchers to get a rough transcript of what a customer says after they said "stop" to the skill? This is no longer possible for skills being submitted for certification. We have put mitigations in place to prevent and detect this type of skill behavior and reject or take them down when identified. 2) Why is it possible for SR Labs to prompt skill users to install a fake security update and then ask them to enter a password? We have put mitigations in place to prevent and detect this type of skill behavior and reject or take them down when identified. This includes preventing skills from asking customers for their Amazon passwords. It's also important that customers know we provide automatic security updates for our devices, and will never ask them to share their password.

Google representatives, meanwhile, wrote:

All Actions on Google are required to follow our developer policies, and we prohibit and remove any Action that violates these policies. We have review processes to detect the type of behavior described in this report, and we removed the Actions that we found from these researchers. We are putting additional mechanisms in place to prevent these issues from occurring in the future.

Google didn't say what these additional mechanisms are. On background, a representative said company employees are conducting a review of all third-party actions available from Google, and during that time, some may be paused temporarily. Once the review is completed, actions that passed will once again become available.

It's encouraging that Amazon and Google have removed the apps and are strengthening their review processes to prevent similar apps from becoming available. But the SRLabs' success raises serious concerns. Google Play has a long history of hosting malicious apps that push sophisticated surveillance malware—in at least one case, researchers said, so that Egypt's government could spy on its own citizens. Other malicious Google Play apps have stolen users' cryptocurrency and executed secret payloads. These kinds of apps have routinely slipped through Google's vetting process for years.

There's little or no evidence third-party apps are actively threatening Alexa and Google Home users now, but the SRLabs research suggests that possibility is by no means farfetched. I've long remained convinced that the risks posed by Alexa, Google Home, and other always-listening apps outweigh their benefits. SRLabs' Smart Spies research only adds to my belief that these devices shouldn't be trusted by most people.