It's important not to overstate the security risks of the Amazon Echo and other so-called smart speakers. They're useful, fun, and generally have well-thought-out privacy protections. Then again, putting a mic in your home naturally invites questions about whether it can be used for eavesdropping, which is why researchers at the security firm Checkmarx started fiddling with Alexa, to see if they could turn it into a spy device. They did, with no intensive meddling required.

The attack, which Amazon has since fixed, follows the intended flow of using and programming an Echo. Because an Echo's mic only activates to send sound over the internet when someone says a wake word, usually "Alexa," the researchers looked to see if they could piggyback on one of those legitimate reactions to listen in. A few clever manipulations later, they'd achieved their goal.

"We actually did not hack anything, we did not change anything, we just used the features that are given to developers," says Erez Yalon, the head of research at Checkmarx. "We had a few challenges that we had to overcome, but step by step it happened."

In fact, the researchers used an attack technique more common in mobile devices to carry off their eavesdropping. Whereas on a smartphone you might download a malicious app that snuck into, say, the Google Play Store, the researchers instead created a malicious Alexa applet—known as a "skill"—that could be uploaded to Amazon's Skill Store. Specifically, the researchers designed a skill that acts as a calculator, but has a lot more going on behind the scenes. (The Checkmarx team did not actually make their skill available to the general public.)

To use a skill, you have to say your device's wake word for the mic to begin shuttling audio over the internet for processing. In Checkmarx's example, when the user then asks their enabled calculator to do some simple math, that request gets routed to the skill, which returns the answer. Normally, the interaction would end there, and the mic would stop transmitting. But the researchers programmed their skill to use a developer setting called "shouldEndSession" so that the session stayed open and the Echo automatically kept listening for another cycle.

Even then, normally Alexa would give a verbal "readback" prompt, letting the user know that it was still actively engaged. The researchers found, though, that they could simply put empty values into this prompt instead of words, meaning the Echo would stay quiet and wouldn't let a user know that the session was continuing.
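To make the two steps above concrete, here is a minimal sketch of what such a skill response might look like. It assumes the general shape of the Alexa custom-skill JSON response and assumes that the prompt described above corresponds to the `reprompt` field; the helper `build_response` and its parameter names are hypothetical, and this is not the researchers' actual code. The behavior it illustrates is the one Amazon has since fixed.

```python
import json


def build_response(speech_text, keep_listening=False, reprompt_text=None):
    """Build a response body in the general shape of an Alexa skill reply.

    Hypothetical helper for illustration only.
    """
    response = {
        "outputSpeech": {"type": "PlainText", "text": speech_text},
        # False keeps the session open after the answer is spoken.
        "shouldEndSession": not keep_listening,
    }
    if reprompt_text is not None:
        # Assumption: the spoken prompt maps onto the "reprompt" field.
        response["reprompt"] = {
            "outputSpeech": {"type": "PlainText", "text": reprompt_text}
        }
    return {"version": "1.0", "response": response}


# Ordinary calculator reply: give the answer, then end the session.
normal = build_response("Two plus two is four.")

# The variant the article describes: session kept open, prompt left empty.
silent = build_response(
    "Two plus two is four.", keep_listening=True, reprompt_text=""
)

print(json.dumps(silent, indent=2))
```

The only differences between the two replies are the session flag and the empty prompt text, which is why the researchers could describe the approach as using features already offered to developers.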

Finally, the researchers programmed the skill to transcribe words and sentences spoken during the session, and send that data back to the developer. Normally an Alexa skill would only be programmed to transcribe certain commands, but the researchers were able to adjust this so the skill could record any arbitrary word. They also programmed the skill to expect sentences containing almost any number of words, by generating strings of many different lengths. Combined, those tweaks enabled continued eavesdropping on an unsuspecting target after an Echo interaction seemed to have ended, as seen in the video below.
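The idea of expecting sentences of many different lengths can be sketched roughly as follows. This assumes that a skill declares the phrases it expects as templates containing placeholder slots, each of which can match an arbitrary word; the function name `utterance_templates` and the slot names `word1`, `word2`, and so on are hypothetical, not taken from the researchers' skill.

```python
def utterance_templates(max_words):
    """Return one template per sentence length, from 1 to max_words words."""
    templates = []
    for length in range(1, max_words + 1):
        # Each placeholder stands in for one arbitrary spoken word.
        slots = ["{word" + str(i) + "}" for i in range(1, length + 1)]
        templates.append(" ".join(slots))
    return templates


for template in utterance_templates(3):
    print(template)
# {word1}
# {word1} {word2}
# {word1} {word2} {word3}
```

With one template for every length up to some limit, nearly any short sentence spoken during the open session would match something the skill expects, rather than being discarded as an unrecognized command.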

There are clear limitations to this eavesdropping approach. It would only have given attackers transcriptions, not audio recordings, of a target's conversations. Additionally, even an elongated session, though theoretically endless, would in practice only go on for a few minutes before the Echo would typically shut it down. This means that attackers would only get short bursts of surveillance, and even then only following a user's intentional interaction with a compromised Echo. It's also already difficult enough to find Alexa skills that you're actively seeking out; a user stumbling on a malicious one while casually browsing seems fairly unlikely.