If you believe the hype, homes in the US are already lousy with speakers containing embodied AI, the most high-profile being Amazon’s Echo, in its various incarnations, whose Alexa voice assistant can be commanded to do things like play music, answer trivia questions or tap into third-party apps (via an ever-expanding set of so-called ‘skills’).

But the truth is these virtual personal assistant (VPA) speakers remain very much the domestic exception, not the rule. To give an idea of current and near-term market size, a snapshot forecast put out by Gartner yesterday suggests these VPA-enabled wireless speakers will generate just $3.52BN in global revenue by 2021, up from $0.72BN in 2016.

Gartner tells TechCrunch the forecast is based on estimates of unit sales of between 10 million and 12 million AI speaker devices shipping this year. (Amazon and other makers playing in this space, such as Alphabet with its Google Home speaker, don’t disclose official sales figures.)

For some comparative context, Gartner is expecting the wearables space to generate $30.5BN in revenue this year, from 310M devices shipped. The global smartphone market, by contrast, ships more units per quarter (~380M) than the wearables market does in a year, and more than a billion handsets per year (~1.5BN in 2016).

Still, AI-enabled speakers are definitely having their moment in the sun, as even Apple has decided it’s time to get involved — announcing earlier this summer its own premium offering is on the way. Aka, the HomePod.

“We want to reinvent home music,” was how Apple CEO Tim Cook introduced the gadget at the company’s WWDC keynote, showing how Cupertino is thinking about the emergent device within its own business context. Music first, AI last.

Given the evident disconnect between the level of noise being generated by Echo et al, and still very low-level consumer appetite for buying and living with yet another high-tech gizmo — not to mention one that can suck up so much personal data — Apple’s positioning of HomePod as first and foremost a high end audio device makes perfect mainstream market sense.

Apple’s Siri-based voice control — which, in any case, has lagged behind competing voice assistant techs — is being presented as a secondary extra. And that’s not going to be a problem for the company’s target consumer.

The existing market for Bluetooth wireless speakers, i.e. speakers that don’t include AI assistants, is clearly orders of magnitude larger, though it’s also at a different stage of development: one is a proven market; the other is still experimental.

Experimenting with new touchpoints to embed their AIs is certainly the name of the game for Amazon and Google, whose business models focus on driving services revenue streams, rather than trying to make a fat profit on the hardware itself.

“If you’re looking at just music, and you’re looking at a dedicated device for that, that’s where the market is at the moment — there’s no doubt,” agrees Gartner analyst Ranjit Atwal. “This is music plus,” he says of VPA wireless speakers. “And then you get music plus screen, and you get music plus camera… We’re still seeing them evolve, as to what use-cases they can start fitting into.”

“At the moment [these devices are not that] convenient [for consumers to use]. It’s a little bit hard because it maybe doesn’t understand you… They’re putting them out there so they can get more data, learn more about what people are doing to make the experience better,” he adds. “And that’s the tough part for consumers; because what they’re getting at the moment from a hardware perspective is fine… From an experience perspective it learns. And I think that’s something that users will have to get used to or understand over time. That the device, actually, is getting cleverer as they hold onto it.”

For Amazon, the family of Echos it’s busily building out (Echo, Echo Dot, Echo Look, Echo Show…) means all the more virtual, voice-enabled ecommerce touchpoints, and so more opportunities to reach new users and get existing users to buy and subscribe to more of its services.

It’s clear that measuring success in this category will not be so tightly tied to hardware volumes shipped. The bigger story is whether these devices can significantly drive the kinds of digital services Amazon, Google et al want within perpetual reach across consumers’ homes, keeping their users transacting even when they’re not actively using a smartphone or laptop.

“Whilst the volumes aren’t going to be necessarily big, from a hardware perspective, it’s interesting to see the approaches of the vendors in this space,” agrees Atwal. “It’s not like the smartphone market where everybody needs to sell hardware. It’s different approaches that end up with a speaker but there’s a different reason and business model behind it.”

There are also going to be more types of hardware bodies housing AIs in future. Gartner is projecting a temporary slowdown in sales of VPA wireless speakers next year because it expects other types of devices to have virtual assistants baked in — especially in “connected home scenarios” — citing devices like lighting systems, hubs and wi-fi mesh devices.

So the long view here is that voice control gets embedded into everything in the home. Or at least every device or thing where it makes sense.

“Do you want to be talking to your fridge? You probably do, if you open it up and your milk’s about to finish you might actually want to say to your fridge — order me some more milk or put it on my list,” he suggests. “It’s that one to one interaction — I know you’re talking to a piece of metal or a machine — but it’s that convenience level that will slowly but surely people will get comfortable.”

Atwal predicts the smart home may end up comprising a mix of voice-driven interactions plus button-presses to give some rudimentary online abilities to other things — citing another Amazon device, its Dash buttons, which can be stuck next to things like your washing machine or underwear drawer and used as a low friction route to reorder a particular item the moment you realize you’ve run out of soap powder or clean pants.

“It’s going to be a combination of those voice-driven, automated capabilities that will drive how this comes together,” argues Atwal of the smart home. Though it’s voice that has the really big potential here, if AI can deliver on its promise.

“As voice becomes more of a natural interaction, and conversational, then as it understands the conversation that you’re having with it — I think that’s really where the voice is going to really provide a difference,” he says. “That’s where it will start to hold its own over time, where it can hold a conversation where it remembers what you said previously and puts that in the context of whatever you say next so you’re not repeating the subject matter… so it becomes like you’re having a conversation with a human.”

Of course, the current crop of voice-driven devices is nowhere near being able to sustain such sophisticated, contextual human-esque conversations. So while that’s the clear ambition, it’s less evident when (or if) such a breakthrough interface might show its voice.

“That’s somewhere in the future,” says Atwal, adding: “It’s not clear how quickly or otherwise that will come around.”

Another aspect of the smart speaker craze that Gartner discusses in its forecast is privacy considerations — which it recognizes may be preventing some consumers from feeling comfortable installing a listening device linked to a data-harvesting commercial entity inside their homes.

And while its forecast takes the view that consumers’ concerns about privacy will have been “largely” mitigated by 2020 — on account of what it describes as “educational efforts, adoption by peers and regulatory approvals of the device category” — Atwal rows back on this line somewhat.

He says that while various technical and regulatory aspects can be deployed to give the user more “capabilities around privacy” — such as authenticating and identifying who’s talking to a device; and regulating how you register devices in terms of who’s using them and what data flows — in his view there’s still likely to be a “battle” against perceptions of the tech being seen as inherently creepy.

“There’s various ways from a technical or regulatory perspective [device makers] can give the user more capabilities around privacy. But whether the perception for the user around privacy changes, given it’s voice, and sits in your home and listens, that’s still [a question mark],” he tells TechCrunch. “Whether the user perception changes over that time is going to be a battle.”

Another Gartner prediction for the space is that beginning in 2019, third-gen VPA speaker products will start shipping with some AI functions running locally on the device, rather than in the cloud — and it says this is, in part, also a consequence of privacy considerations.

Other drivers here include latency and resilience against network downtime.

“Many customers, especially future enterprise clients, require on-premises solutions, in some cases mandated by confidentiality and regulatory requirements,” notes Gartner research director Werner Goertz.

“Privacy, latency, getting things done quicker,” adds Atwal on this. “You don’t need to go back to the cloud to consult up there; there’s a lot of things that can be done locally because, actually, it’s just to do with you. You don’t need to go off and find out what everybody else is doing to come back and tell you what you want to do.

“So over time… that latency and speed and getting that done has to start happening at a more local level. And again I use two words: one is convenience, and secondly experience. As you get more of a local level technological input that ultimately would have to make that experience better.”