While we’ve all grown accustomed to chatting with Siri, talking to our cars, and soon maybe even asking our glasses for directions, talking to our computers still feels weird. But now Google is putting its full weight behind changing this. There’s no clearer evidence of this than visiting Google.com and seeing a speech recognition button right there inside Google’s most sacred real estate: the search box.

Yet all this effort may now be compromised by a new exploit which lets malicious sites turn Google Chrome into a listening device, one that can record anything said in your office or your home, as long as Chrome is still running.

Check out the video to see the exploit in action.

Google’s Response

I discovered this exploit while working on annyang, a popular JavaScript Speech Recognition library. That work gave me the insight to find multiple bugs in Chrome, and to come up with this exploit, which combines all of them.

Wanting speech recognition to succeed, I of course decided to do the right thing…

I reported this exploit to Google’s security team in private on September 13. By September 19, their engineers had identified the bugs and suggested fixes. On September 24, a patch that fixes the exploit was ready, and three days later my find was nominated for Chromium’s Reward Panel (where prizes can go as high as $30,000).

Google’s engineers, who’ve proven themselves to be just as talented as I imagined, were able to identify the problem and fix it in less than 2 weeks from my initial report.

I was ecstatic. The system works.

But then time passed, and the fix didn’t make it to users’ desktops. A month and a half later, I asked the team why the fix hadn’t been released. Their answer was that there was an ongoing discussion within the Standards group to agree on the correct behaviour - “Nothing is decided yet.”

As of today, almost four months after learning about this issue, Google is still waiting for the Standards group to agree on the best course of action, and your browser is still vulnerable.

By the way, the web’s standards organization, the W3C, has already defined the correct behaviour which would’ve prevented this… This was done in their specification for the Web Speech API, back in October 2012.

How Does it Work?

A user visits a site that uses speech recognition to offer some cool new functionality. The site asks the user for permission to use the mic, the user accepts, and can now control the site by voice. Chrome shows a clear indication in the browser that speech recognition is on, and once the user turns it off, or leaves that site, Chrome stops listening. So far, so good.
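
For readers who haven’t used the Web Speech API, here is a minimal sketch of what the well-behaved flow above might look like in a site’s code. It is an illustration, not the code of any particular site: the function name startListening and the onPhrase callback are hypothetical, and I’m assuming Chrome’s prefixed webkitSpeechRecognition constructor, which is passed in as a parameter so the function can also be tried outside a browser.

```javascript
// Minimal sketch of the permission-gated flow described above.
// `Recognition` is the browser's speech recognition constructor (assumed to be
// Chrome's window.webkitSpeechRecognition); `startListening` and `onPhrase`
// are hypothetical names used only for this illustration.
function startListening(Recognition, onPhrase) {
  const recognition = new Recognition();
  recognition.continuous = true;      // keep listening after each phrase
  recognition.interimResults = false; // only report final results
  recognition.lang = 'en-US';

  recognition.onresult = (event) => {
    // Each result holds one or more alternatives; take the top
    // alternative of the most recent result.
    const latest = event.results[event.results.length - 1];
    onPhrase(latest[0].transcript.trim());
  };

  // The browser asks for mic permission here if it hasn't been granted yet.
  recognition.start();

  // The caller keeps this object and calls recognition.stop() to turn the mic off.
  return recognition;
}

// In Chrome, usage might look like this (stopButton is a hypothetical element):
// const rec = startListening(window.webkitSpeechRecognition, (text) => console.log(text));
// stopButton.onclick = () => rec.stop();
```

The point to notice is that everything here happens in the page the user is looking at, which is where Chrome shows its listening indicator.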

But what if that site is run by someone with malicious intentions?

Most sites using speech recognition choose to use secure HTTPS connections. This doesn’t mean the site is safe, just that the owner bought a $5 security certificate. When you grant an HTTPS site permission to use your mic, Chrome will remember your choice, and allow the site to start listening in the future, without asking for permission again. This is perfectly fine, as long as Chrome gives you a clear indication that you are being listened to, and as long as the site can’t start listening to you in background windows that are hidden from you.

When you click the button to start or stop the speech recognition on the site, what you won’t notice is that the site may have also opened another hidden popunder window. This window can wait until the main site is closed, and then start listening in without asking for permission. This can be done in a window that you never saw, never interacted with, and probably didn’t even know was there.

To make matters worse, even if you do notice that window (which can be disguised as a common banner), Chrome does not show any visual indication that Speech Recognition is turned on in such windows - only in regular Chrome tabs.

You can see the full source code for this exploit on GitHub.

Speech Recognition's Future

Speech recognition has huge potential for launching the web forward. Developers are creating amazing things, making sites better, easier to use, friendlier for people with disabilities, and just plain cool…

As the maintainer of a popular speech recognition library, it may seem that I shot myself in the foot by exposing this. But I have no doubt that by doing so, we can ensure that these issues will be resolved soon, and we can all go back to feeling very silly talking to our computers… A year from now, it will feel as natural as any of the other wonders of this age.