At such a low price, the hardware would effectively become disposable, opening up previously unimaginable uses. The devices could be used to build cheap dolls that respond to your kids, for instance, or simple voice-activated home electronics like lamps. But Warden also says they could find a use in industrial settings, listening for noises rather than voices—hundreds of sensors spotting the tell-tale audio signatures of squeaking wheels in factory equipment, or chirping crickets in a farm field.

Warden, who leads the team at Google that's developing mobile and embedded applications for the firm's machine-learning framework, TensorFlow, realizes that he's set himself a challenge. Squeezing the AI that powers, say, Amazon's Alexa assistant down to run on simple battery-powered chips clocked at just hundreds of megahertz isn't feasible. That's partly because Alexa has to interpret a huge range of sounds, but also because most voice-recognition systems rely on large, resource-hungry neural networks, which is why Alexa offloads its processing to the cloud.

So he's constrained the problem, seeking to identify just a handful of useful commands—such as "on," "off," "start," and "stop." He's also abandoned conventional speech-recognition algorithms. Instead, he takes an audio clip, slices it into short snippets, and calculates the frequency content of each one. Lining up the frequency plots one after another creates a 2-D image of frequency content versus time (in effect, a spectrogram), to which he applies visual-recognition algorithms to pick out the distinctive signature of someone saying a single word.
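To make that pipeline concrete, here is a minimal sketch of the spectrogram-as-image idea in TensorFlow. The 16 kHz sample rate, the 25 ms/10 ms windowing, the five-word command list, and the tiny convolutional network are all illustrative assumptions, not the parameters of Warden's actual system.

```python
import tensorflow as tf

SAMPLE_RATE = 16000                # assumed 16 kHz mono audio
CLIP_SAMPLES = SAMPLE_RATE         # one-second clip, one word per clip
WORDS = ["on", "off", "start", "stop", "unknown"]  # illustrative command set

def to_spectrogram(waveform):
    """Slice the clip into short snippets, take each snippet's frequency
    content, and line the plots up into a 2-D frequency-vs-time image."""
    stft = tf.signal.stft(waveform,
                          frame_length=400,   # 25 ms snippets at 16 kHz
                          frame_step=160,     # 10 ms hop between snippets
                          fft_length=512)
    spectrogram = tf.abs(stft)               # keep magnitudes only
    return spectrogram[..., tf.newaxis]      # add a channel axis for the CNN

# Spectrogram dimensions for a one-second clip:
num_frames = (CLIP_SAMPLES - 400) // 160 + 1   # 98 time steps
num_bins = 512 // 2 + 1                        # 257 frequency bins

# A deliberately tiny image-recognition network: the spectrogram is treated
# like a picture, and the convolutions learn each word's visual signature.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(num_frames, num_bins, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(len(WORDS), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Wiring check on a silent clip; a real system would first train on
# labeled recordings of the command words.
clip = tf.zeros([CLIP_SAMPLES])
probs = model(to_spectrogram(clip)[tf.newaxis, ...])
print(WORDS[int(tf.argmax(probs[0]))])
```

The design choice mirrors the article's constraint: with only a handful of words to tell apart and a small fixed-size image as input, the network can stay far smaller than a full speech-recognition model, which is what makes running it on modest, battery-powered hardware plausible.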