Conclusions

The democratization of voice technology has long been dreamed about, and it’s finally (slowly) arriving. The landscape is still quite fragmented, though, and some commercial SDKs may still get deprecated on short notice, or no notice at all. But at least some solutions are emerging to bring speech detection to all devices.

I’ve built integrations in platypush for all of these services because I believe that it’s up to users, not businesses, to decide how people should use and benefit from voice technology. Moreover, having so many voice integrations in the same product, especially integrations that all expose the same API and generate the same events, makes it very easy to write assistant-agnostic logic, and to truly decouple speech recognition from the business logic that runs on voice commands.

Check out my previous article to learn how to write your own custom hooks in platypush on speech detection, hotword detection and speech start/stop events.
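To give an idea of what such an assistant-agnostic hook might look like, here is a minimal sketch of a YAML hook on SpeechRecognizedEvent; the light.hue action and the phrase are just illustrative placeholders, and the event fires regardless of which assistant integration recognized the speech:

```yaml
# config.yaml: turn on the lights when any configured assistant
# recognizes the phrase "turn on the lights".
event.hook.OnLightsOnCommand:
    if:
        type: platypush.message.event.assistant.SpeechRecognizedEvent
        phrase: "turn on the lights"
    then:
        action: light.hue.on
```

Swap the Google push-to-talk integration for, say, DeepSpeech or PicoVoice, and the hook stays untouched, since they all generate the same SpeechRecognizedEvent.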

To summarize my findings so far:

- Use the native Google Assistant integration if you want the full Google experience, and you’re ok with Google servers processing your audio and with the possibility that, at some point in the future, the deprecated Google Assistant library may stop working.

- Use the Google push-to-talk integration if you only want the assistant, without hotword detection, or you want your assistant to be triggered by alternative hotwords.

- Use the Alexa integration if you already have an Amazon-powered ecosystem and you’re ok with less flexibility when it comes to custom hooks, since the AVS doesn’t expose speech transcripts.

- Use Snowboy if you want a flexible, open-source and crowd-powered hotword detection engine that runs on-device, and/or you want to run multiple assistants side by side through different hotword models (see the configuration sketch after this list), even if the models may not be that accurate.

- Use Mozilla DeepSpeech if you want a fully on-device, open-source engine powered by a robust TensorFlow model, even if it comes with more CPU load and a bit more latency.

- Use PicoVoice solutions if you want a full voice stack that runs on-device and is both accurate and performant, even though you’ll need a commercial license to use it on some devices or to extend/change the models.
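As a sketch of the multi-assistant Snowboy setup mentioned in the list above, the configuration might look something like this; the model files are placeholders, and the exact option names may vary across platypush versions, so check the plugin documentation for your release:

```yaml
# Hotword backend: each Snowboy model can trigger a different assistant.
backend.assistant.snowboy:
    audio_gain: 1.0
    models:
        # "OK Google" starts a Google push-to-talk conversation
        ok_google:
            voice_model_file: ~/models/OK Google.pmdl
            assistant_plugin: assistant.google.pushtotalk
            assistant_language: en-US
        # "Alexa" starts an Alexa (AVS) conversation
        alexa:
            voice_model_file: ~/models/Alexa.pmdl
            assistant_plugin: assistant.echo
            assistant_language: en-US
```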

Let me know your thoughts on these solutions and your experience with these integrations!