The smart speaker is the product of decades of experimentation with voice recognition and domestic networking that has been made possible, as have so many recent innovations, by massive companies wielding incredible amounts of computing power. Alexa, Siri, and the other artificially intelligent, voice-recognizing (and always female) domestic robo-agents have roots in Bell Labs’ fledgling 1950s experiments with “Audrey,” but their capacity to recognize conversational speech patterns and interact with their owners in a naturalistic way situates them within the ongoing evolution of interactive AI, which once terrified us but now turns us on. These devices’ role in organizing the mundane duties of domestic life is part of a much broader campaign to network the entire home into a smoothly operating, data-rich whole: Echo can adjust your home’s thermostat and lock your doors, just as Google Home fits into its Nest system and Apple’s HomePod dialogues with its HomeKit. Freely accessible digital music has been compared to a household utility—like water out of the tap, always available—for years, and with smart speakers, it’s now controllable by the same device that dims your lights.

Digital music files themselves have been remade as “smart” objects for the past several years—“smart” being the latest unavoidable tech buzzword describing technologies that promise to improve experience through mild surveillance. By corralling files into platforms, Spotify, Apple Music, Tidal and their ilk have transformed the simple act of clicking play into a value-generating activity. Streaming songs aren’t exchangeable commodities like they are on CD, vinyl, or even MP3; instead, they’re pleasurable spyware, reporting back copious amounts of proprietary data on listeners (which, the companies promise, is then routed back into an ever-more-personalized and enjoyable user experience). When Spotify CEO Daniel Ek told The New Yorker that his company isn’t in the music space, but the moment space, he was implying that the experience is the commodity—not music, but everyday activities tuned to Spotify’s algorithms and curated playlists. Smart speakers nestle perfectly into a digital music landscape colonized by streaming platforms, the better to curate each activity as a meaningfully soundtracked moment.

Tech designers and engineers look at the world as a set of problems to efficiently, if not artfully, solve. Within certain corners of the digital music space, those problems manifest as barriers to a seamless listening experience—to experiencing streaming music as an atmospheric hum capable of instantaneously accommodating any mood, activity, or nostalgic pang. This is what Amazon Music director Ryan Redington is getting at when he tells me that “voice almost completely removes friction for getting the music quickly.” As an example, Redington describes how he uses music to shift into domestic mode after work. “I used to get home, take out my phone, unlock it, find Amazon Music, find a playlist that I want to listen to, connect to Bluetooth or a receiver in my house, then start playing music,” he explains. With a smart speaker, he claims, all that technological friction disappears. “Now I can just walk in my house, say, ‘Alexa, play’ whatever I want to listen to, and it just works.”

The Echo was not designed explicitly for music, but it was no coincidence that Amazon launched Prime Music, its free service for Amazon Prime members, a few months before the Echo was introduced to the world. (Amazon Music Unlimited, which features millions more tracks and was launched as a direct competitor to Spotify and Apple Music, debuted in 2016.) “I wouldn’t go as far as to say that [the Echo and Amazon Prime Music] were developed together,” Redington tells me, “but certainly, we knew that this device was being worked on, [and built] our music service to make sure it was very voice-forward.” While Spotify distinguishes itself with personally curated playlists, and Tidal and Apple Music offer artist exclusives on their platforms, Amazon Music hopes to separate itself with voice.

Though its competitors will no doubt catch up quickly, to date Amazon has done far more to integrate streaming music with voice commands. This is a realm that, to put it lightly, can differ starkly from the more familiar process of typing a question into a visual interface. “We are very much down in the weeds on understanding exactly what words customers are using when they ask for something,” explains Alex Luke, Amazon’s global head of programming and content strategy. “What does Alexa say back in response to that utterance, and then what music do we deliver after Alexa says her response?”

Indeed, one of the most significant issues for smart speaker engineers to address is what might be called the single-response problem. “In voice,” Redington explains, “you don’t have the luxury to give customers a lot of results—you have to start playing something.” Unlike a visual interface that can provide a screen full of sorted responses to a question for the user to select from, Alexa can only provide one answer at a time—otherwise there’s friction. In the smart speaker world, getting the right answer first is key. As Redington puts it, “When you ask for something and it works, that’s truly where the magic happens.”
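The contrast Redington describes can be sketched in a few lines of code. This is purely illustrative—the catalog, the popularity scores, and the function names are all hypothetical, not any platform’s actual system—but it shows the structural difference: a screen can return every ranked match, while a voice interface must commit to one.

```python
# Hypothetical mini-catalog; titles and scores are invented for illustration.
CATALOG = [
    {"title": "Happy", "artist": "Pharrell Williams", "popularity": 0.95},
    {"title": "Happy Song", "artist": "Bring Me the Horizon", "popularity": 0.61},
    {"title": "Happy Together", "artist": "The Turtles", "popularity": 0.88},
]

def visual_search(query):
    """A visual interface can show every match, ranked, for the user to pick from."""
    matches = [t for t in CATALOG if query.lower() in t["title"].lower()]
    return sorted(matches, key=lambda t: t["popularity"], reverse=True)

def voice_search(query):
    """A smart speaker must commit to a single answer and start playing it."""
    matches = visual_search(query)
    return matches[0] if matches else None
```

In this sketch, a typed search for “happy” yields three candidates on screen, but the speaker simply starts playing the top-ranked one—which is why getting that first answer right matters so much.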

As with all streaming music, the “magic” emerges from the metadata. In a platformed music environment, each individual track is appended with copious digital information that determines where and how it should circulate, from codes that track sales and streams to musical and activity information. Though any streaming platform user is deeply familiar with mood- and activity-geared playlists, the frictionless domestic landscape of voice-commanded speakers has led to a surge in such requests. “When people say, ‘Alexa, play me happy music,’ that’s something we never saw typed into our app, but we start to see happening a lot through the voice environment,” Redington explains.
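A single track’s metadata record might look something like the sketch below. The field names and values here are invented for illustration—no streaming platform’s actual schema is public in this form—but they suggest how a spoken request like “play me happy music” could be resolved against mood tags rather than a title search.

```python
# Hypothetical metadata record for one streaming track.
# Field names and the identifier are illustrative, not a real schema.
track = {
    "title": "Example Track",
    "artist": "Example Artist",
    "rights_code": "XX-000-00-00000",  # placeholder for the codes that track sales and streams
    "moods": ["happy", "upbeat"],
    "activities": ["cooking", "dinner party"],
}

def matches_request(track, tag):
    """Resolve a voice request like 'play me happy music' against mood/activity tags."""
    return tag in track["moods"] or tag in track["activities"]
```

Under this toy model, the same record answers both “happy music” and “music for dinner”—one reason labels are eager to supply the tags themselves.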

While all platforms have teams that, through machine learning techniques and human curation, create reams of metadata capable of determining whether a song is “happy,” record labels understandably want to have a say as well. Will Slattery is the global digital sales manager for Ninja Tune, an electronic label that, translated into streaming language, features a lot of lyric-less music that lends itself to specific moods and activities. “When people start interacting with smart speakers, they’re going to want to say, ‘Alexa, play some chill music,’ or ‘play music for dinner,’” Slattery predicts. “And that’s where a label could jump in and provide the [streaming] companies with that metadata, like, ‘This would be a good song for these specific moods.’” Ninja Tune artist Bonobo, Slattery notes, is very popular on study and concentration playlists—something the producer doesn’t take into account when composing his music, but which he can’t deny once it’s in circulation. “It is strange to imagine an artist hoping they someday get their music on fitness playlists,” as opposed to getting a rave review or a plum Coachella slot, one indie label owner tells me. “But this will change fast. What seems like a slightly absurd way to approach music today will be commonplace tomorrow.”