A Note on Google Assistant’s Fluidity and Speech Recognition

I have a bit of a love-hate relationship with Google Now’s voice commands (or rather Google Assistant as it is now called). I love the idea of what it will be able to do one day, and I love what it can do when it works. If it works.

And therein lies the hate. That delay. The unnatural half-second pause between saying “Ok Google” and continuing your command. The half-second pause that made most of my favorite custom activation ideas on the Moto X Play almost unbearable to actually use (I was a fan of “Would You Kindly”, “Hello Moto”, and “Bridge to Engineering” personally, but the pause made them all fall flat).

That half-second pause is often just enough to break your speech pattern, and just enough to prevent you from speaking the sentence normally. It changes the command from “Would you kindly play music by The Weeknd” into “Would you kindly … play music by The Weeknd”. It forces you to hesitate, sometimes even throwing you off for the rest of the sentence (resulting in the occasional flubbed command).

And being thrown off by a pause is a major issue for Google Assistant. Despite all the talk about natural language processing and being able to have a conversation with Google Assistant, it gives you almost no room to correct yourself. If you don’t have a perfectly canned and practiced line ready in your head, if you say the wrong word because you were thrown off by the pause, if you misspeak, if you make a common conversational error, if you do anything wrong, it can and will give you inaccurate results.

My first instinct when I know it has made a mistake is to try to correct it (“I mean …”), but you can’t just talk back to it. You need to use the activation command first if you’re making a correction (which even then is only sometimes successful), and I can honestly say that I have never remembered to do that. Now, that goes beyond just the delay, and Google claims that they are working on it, but Google Assistant still has a long way to go.

The pause wouldn’t be as bad if it were consistent. One of the biggest issues isn’t the pause itself, but rather that it is sometimes quick and sometimes slow. If you’re on a rock-solid network connection it can be fast enough for you to just keep talking through it right after your activation phrase, but if you’re on a slow network… oh boy. On a slow connection, you could be waiting a couple of seconds before it starts recognizing anything, an issue that is only exacerbated by the lack of a beep now. I fully understand why the beep was removed (to make speaking regular sentences possible, with the goal of real conversations with Google Assistant), but it isn’t quite there yet. This could potentially be solved in the future by handling more of the transcription locally, but right now it is extremely frustrating.

Using “OK Google” over Bluetooth is still a pain. Not only is the aforementioned lag still there, but you also run into additional lag from the Bluetooth connection itself, which varies from device to device. On some devices the lag is very low and OK Google can be used easily. On others, like my car, the lag is so long that if you attempt to use it while playing music, it will stop listening before the music cuts out and you hear the “Google Now is ready” beep (which is still around for Bluetooth connections). I’m actually rather surprised that they don’t use the beep on Bluetooth connections to help mark where the recording might begin (and to make sure it doesn’t time out too early). It would be a relatively simple addition, and would go a long way towards making Bluetooth use of OK Google easier (especially since it would allow the phone to guarantee that the speakers had stopped playing music, reducing the amount of background noise). I also have some Bluetooth speakers that OK Google doesn’t seem to work with at all (beyond the activation phrase), but I haven’t yet had the opportunity to test what is causing that issue, so I can’t really blame Google for it.

Google has been trying to fix this issue for a while, and Google Home is part of their latest attempt. With Google Home, they are trying to compete with the Amazon Echo and its Alexa assistant (which is extremely fluid compared to Google’s current implementation), and it was honestly looking pretty solid in the demo. Yes, it was a quiet room, the commands were relatively simple, and they probably had a fantastic internet connection, but Rishi Chandra sounded relatively natural when interacting with Google Home. The pauses seemed like a part of his regular speech pattern, rather than something extra that he needed to account for. It honestly got me excited that maybe, just maybe, Google had fixed the hesitation issue. Unfortunately, the ad spot was then shown, and the illusion came crashing down.

The actors didn’t have Rishi’s calm cadence. They were speaking in their normal voices, and with that, the pauses rang out like a bell. It may have just been because of the contrast against Rishi, but the pauses were noticeable. They were enough to stop it from being a smooth sentence. They made it feel disjointed (and that was just from listening to the sentence, let alone speaking it).

And that’s before even getting into the fact that there was a big disclaimer on the bottom of the screen during the commercial: “Sequences simulated and shortened”, implying that the response time is actually even slower than what was shown. Google was speeding up the response time for the commercial (which is fine; that’s standard practice), and it still felt too slow.

I think it really needs to be stressed at this point that Google Home’s main competitor, Amazon Echo, is extremely fluid in operation. The pause between the activation phrase and it starting to listen isn’t just short, it’s almost unnoticeable. Amazon Echo is truly at the point where, at least from a pacing perspective, you can speak a natural sentence to it. It’s not perfect, and the Amazon Echo can definitely continue to see substantial improvement, but in this particular area Amazon has a monumental lead over Google at the moment.

I’m still excited for Google Assistant, and I can’t wait to see how the Internet of Things evolves. There is certainly impressive polish present in the way Assistant (and Google’s iconic voice in general) sounds and how humanly it carries itself by keeping track of conversations. But when it comes to actual operation, users can interact with it neither as humanly nor as fluidly as they do with other people. That is, I think, a key point Google needs to address to really sell artificial intelligence as more than an input-output Assistant.

Do you have a home automation hub? What do you think of Google Home? Do you plan on buying one? Let us know in the comments below!