In the spring of 2018, a couple in Portland, OR, reported to a local news station that their Amazon Echo had recorded a conversation without their knowledge, and then sent that recording to someone in their contacts list. As it turned out, the commands Alexa followed were issued by television dialogue. The whole thing took a sitcom-sized string of coincidences to happen, but it happened. Good thing the conversation was only about hardwood floors.

But of course these smart speakers are listening all the time, at least locally. How else are they going to know that someone uttered one of their wake words, or something close enough? It would sure help a lot if we could change the wake word to something like ‘rutabaga’ or ‘supercalifragilistic’, but they probably have ASICs that are made to listen for a few specific words. On the Echo for example, your only choices are “Alexa”, “Amazon”, “Echo”, or “Computer”.

So how often are smart speakers listening when they shouldn’t? A team of researchers at Boston’s Northeastern University is conducting an ongoing study to determine just how bad the problem really is. They’ve set up an experiment to generate unexpected activation triggers and study them inside and out.

Sequestered Smart Speakers

The team corralled a group of mainstream smart speakers into a box. The lineup represents all the major players: four Alexas and one each of her cohorts. We’d love to see them maximize the test subjects by including enough devices of each type to cover all the possible assigned wake words, but that would be pretty expensive.

Then they piped in 125 hours’ worth of audio from TV shows with rapid-fire dialogue using Netflix. The shows they chose are a healthy cross-section of televised entertainment: mostly newer stuff, but some going back a decade or more. Everything from comedy to drama. A video camera trained on the speakers records any lights that indicate a successful activation. There’s also a microphone to pick up anything the devices say in response to the dialogue stream, and a wireless access point (WAP) to capture network traffic in and out of the box.

While the results indicate that these devices aren’t constantly recording (phew!), they do tend to wake up quite frequently for short periods of time, up to 19 times in a 24-hour period. The worst offenders were the Apple and Microsoft speakers, both of which activated more often than the others. Not all of the activations were short and sweet, though: both the Microsoft Invoke and the Echo Dot had accidental activations lasting up to 43 seconds. That’s plenty of time to record and/or distribute your late-night 16-digit utterances to the QVC operators, or the secret ingredient in your mother-in-law’s Quiche Lorraine.
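For a sense of how figures like "activations per 24 hours" and "longest activation" fall out of the raw observations, here is a minimal Python sketch. It is not the researchers' tooling: the `Activation` record, the `summarize` function, and the idea of a log of start/end timestamps taken from the camera footage are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Activation:
    """One observed wake-up (hypothetical record; times in seconds)."""
    start_s: float
    end_s: float


def summarize(activations, hours_of_audio):
    """Return (activations per 24 hours of audio, longest activation in seconds)."""
    if hours_of_audio <= 0:
        raise ValueError("hours_of_audio must be positive")
    # Normalize the raw count to a 24-hour period of played audio.
    per_day = len(activations) / hours_of_audio * 24
    # Duration of the longest single activation; 0.0 if none were seen.
    longest = max((a.end_s - a.start_s for a in activations), default=0.0)
    return per_day, longest
```

Feeding it the list of activations seen for one speaker, together with the number of hours of audio played, yields the two numbers in the same form they are quoted above.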

Are You Talkin’ To Me?

The researchers saw patterns emerge in the dialogue that caused activations lasting five seconds or longer, but the patterns aren’t terribly surprising. Basically, any phrase starting with a word that contains the ‘ey/ay’ sound (e.g. they/may/pay/sleigh) followed by a hard ‘g’ sound (or anything close to it) will wake up a Google Home Mini set to listen for ‘hey Google’. The other speakers acted the same way when they heard strings that rhyme with their wake word(s).
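The ‘hey Google’ pattern can be roughly approximated in a few lines of Python. This is only a spelling-based sketch, not how any speaker or the study actually detects wake words: real devices work on audio, and the names `AY_ENDINGS`, `HARD_G`, and `might_wake_hey_google` are made up for illustration. It also checks every adjacent pair of words rather than only the start of a phrase.

```python
import re

# Assumption: spellings that usually end in an "ay" sound (they, may, sleigh).
AY_ENDINGS = ("ay", "ey", "eigh")

# Assumption: a 'g' not followed by e, i, or y is usually hard (go, great).
HARD_G = re.compile(r"^g(?![eiy])")


def might_wake_hey_google(phrase: str) -> bool:
    """Flag phrases where an 'ay'-sounding word precedes a hard-'g' word."""
    words = re.findall(r"[a-z']+", phrase.lower())
    # Walk through each pair of neighboring words.
    for first, second in zip(words, words[1:]):
        if first.endswith(AY_ENDINGS) and HARD_G.match(second):
            return True
    return False
```

Because it looks at letters rather than sounds, the heuristic gets things wrong in both directions: ‘key’ is spelled like a match but doesn’t rhyme with ‘hey’, and the hard ‘g’ in ‘give’ is missed. A phonetic dictionary would do better, but that is beyond this sketch.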

At present, the group is still studying activations that lead to recordings being uploaded to the cloud. They’re also trying to determine whether human modifiers such as gender, ethnicity, and accent have any impact on the probability of accidental activation.

Just like the humans who designed them, smart speakers occasionally mishear things, including music lyrics and their own names. Whether they are learning from their mistakes remains to be seen.