On 21 November 2015, James Bates had three friends over to watch the Arkansas Razorbacks play the Mississippi State Bulldogs. Bates, who lived in Bentonville, Arkansas, and his friends drank beer and did vodka shots as a tight football game unfolded. After the Razorbacks lost 51–50, one of the men went home; the others went out to Bates’s hot tub and continued to drink. Bates would later say that he went to bed around 1am and that the other two men – one of whom was named Victor Collins – planned to crash at his house for the night. When Bates got up the next morning, he didn’t see either of his friends. But when he opened his back door, he saw a body floating face-down in the hot tub. It was Collins.

A grim local affair, the death of Victor Collins would never have attracted international attention if it were not for a facet of the investigation that pitted the Bentonville authorities against one of the world’s most powerful companies – Amazon. Collins’ death triggered a broad debate about privacy in the voice-computing era, a discussion that makes the big tech companies squirm.

The police, summoned by Bates the morning after the football game, became suspicious when they found signs of a struggle. Headrests and knobs from the hot tub, as well as two broken bottles, lay on the ground. Collins had a black eye and swollen lips, and the water was darkened with blood. Bates said that he didn’t know what had happened, but the police officers were dubious. On 22 February 2016 they arrested him for murder.

Searching the crime scene, investigators noticed an Amazon Echo. Since the police believed that Bates might not be telling the truth, officers wondered if the Echo might have inadvertently recorded anything revealing. In December 2015, investigators served Amazon with a search warrant that requested “electronic data in the form of audio recordings, transcribed records or other text records”.

Amazon turned over a record of transactions made via the Echo but not any audio data. “Given the important first amendment and privacy implications at stake,” an Amazon court filing stated, “the warrant should be quashed.” Bates’s attorney, Kimberly Weber, framed the argument in more colloquial terms. “I have a problem that a Christmas gift that is supposed to better your life can be used against you,” she told a reporter. “It’s almost like a police state.”

With microphone arrays that hear voices from across the room, Amazon’s devices would have been coveted by the Stasi in East Germany. The same can be said of smarthome products from Apple, Google and Microsoft, as well as the microphone-equipped AIs in all of our phones. As the writer Adam Clark Estes put it: “By buying a smart speaker, you’re effectively paying money to let a huge tech company surveil you.”

Amazon, pushing back, complains that its products are unfairly maligned. True, the devices are always listening, but by no means do they transmit everything they hear. Only when a device hears the wake word “Alexa” does it beam speech to the cloud for analysis. It is unlikely that Bates would have said something blatantly incriminating, such as: “Alexa, how do I hide a body?” But it is conceivable that the device could have captured something of interest to investigators. For instance, if anyone intentionally used the wake word to activate the Echo – for a benign request such as asking for a song to be played, say – the device might have picked up pertinent background audio, like people arguing. If Bates had activated his Echo for any request after 1am, that would undercut his account of being in bed asleep.
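
The gating behaviour described above can be illustrated with a toy model. This is a minimal sketch, not Amazon's implementation: the class name, the fixed four-chunk streaming window and the use of words in place of raw audio are all assumptions made for illustration. It shows how background sound can ride along with a benign request once the wake word has opened the microphone to the cloud.

```python
from dataclasses import dataclass, field

WAKE_WORD = "alexa"   # the wake word named in the article
REQUEST_LEN = 4       # hypothetical: chunks streamed after the wake word


@dataclass
class WakeWordGate:
    """Toy model of a speaker that listens locally but uploads selectively."""
    uploaded: list = field(default_factory=list)
    remaining: int = 0  # chunks still to stream for the current request

    def hear(self, chunk: str) -> None:
        if self.remaining > 0:
            # A request is open: everything picked up now is sent along,
            # including background speech from other people.
            self.uploaded.append(chunk)
            self.remaining -= 1
        elif chunk.lower() == WAKE_WORD:
            # Wake word detected on the device: open a streaming window.
            self.remaining = REQUEST_LEN
        # Otherwise the chunk is discarded without leaving the device.


gate = WakeWordGate()
for chunk in ["nice", "game", "Alexa", "play", "some", "music",
              "(shouting)", "goodnight"]:
    gate.hear(chunk)
print(gate.uploaded)
```

In this sketch nothing said before the wake word is uploaded, but the shouting that happens to fall inside the request window is.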

In August 2016, a judge, apparently receptive to the notion that Amazon might have access to useful evidence, approved a second search warrant for police to obtain the information the company had withheld before. At this point in the standoff, an unlikely party blinked – Bates, who had pleaded not guilty. He and his attorney said they didn’t object to police getting the information they desired. Amazon complied, and if the Echo captured anything incriminating, police never revealed what it was. Instead, in December 2017, prosecutors filed a motion to dismiss the case, saying there was more than one reasonable explanation for the death of Collins. But the surveillance issue raised so dramatically by the case is unlikely to go away.

Tech companies insist they are not spying on their customers via virtual assistants and home gadgets, and that they only ever listen when expressly commanded to do so. These claims, at least as far as they can be externally verified, appear to be true. But this doesn’t mean no listening is happening, or couldn’t happen, in ways that challenge traditional notions of privacy.

There are a number of ways in which home devices could be used that challenge our ideas of privacy. One is eavesdropping to improve quality. Hello Barbie’s digital ears perk up when you press her glittering belt buckle. Saying the phrase “OK, Google” wakes up that company’s devices. Amazon’s Alexa likes to hear her name. But once listening is initiated, what happens next?

Sources at Apple, which prides itself on safeguarding privacy, say that Siri tries to satisfy as many requests as possible directly on the user’s iPhone or HomePod. If an utterance needs to be shipped off to the cloud for additional analysis, it is tagged with a coded identifier rather than a user’s actual name. Utterances are saved for six months so the speech recognition system can learn to better understand the person’s voice. After that, another copy is saved, now stripped of its identifier, for help with improving Siri for up to two years.
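
The retention scheme described above can be sketched as a simple rule applied to each stored utterance. This is an illustration under stated assumptions, not Apple's code: the record layout, the function names and the tag format are hypothetical, six months is approximated as 182 days, and the two-year limit is assumed to run from the date of recording, which the description leaves open.

```python
import secrets
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

LINKED_PERIOD = timedelta(days=182)      # "six months", approximated
MAX_RETENTION = timedelta(days=2 * 365)  # "up to two years", assumed from recording date


@dataclass
class StoredUtterance:
    text: str
    recorded_on: date
    device_tag: Optional[str]  # random identifier, never the user's name


def new_device_tag() -> str:
    """Random tag standing in for the user (hypothetical format)."""
    return secrets.token_hex(8)


def apply_retention(u: StoredUtterance, today: date) -> Optional[StoredUtterance]:
    """Return the record as it should be stored on `today`, or None if expired."""
    age = today - u.recorded_on
    if age > MAX_RETENTION:
        return None  # dropped entirely
    if age > LINKED_PERIOD:
        # Keep the words for system improvement, but cut the link to the tag.
        return StoredUtterance(u.text, u.recorded_on, None)
    return u


tag = new_device_tag()
utterance = StoredUtterance("set a timer", date(2018, 1, 1), tag)
for day in (date(2018, 3, 1), date(2018, 9, 1), date(2020, 6, 1)):
    print(day, apply_retention(utterance, day))
```

The point of the design is that the stored words outlive the identifier, so older recordings cannot be traced back to a particular device tag.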

Most other companies do not emphasise local processing and instead always stream audio to the cloud, where more powerful computational resources await. Computers then attempt to divine the user’s intent and fulfil it. Once that is done, the companies could erase the request and the system’s response, but they typically don’t. The reason is data. In conversational AI, the more data you have, the better.

Virtually all other botmakers, from hobbyists to the AI wizards at big tech companies, review at least some of the transcripts of people’s interactions with their creations. The goal is to see what went well, what needs to be improved and what users are interested in discussing or accomplishing. The review process takes many forms.

The chat logs may be anonymised so the reviewer doesn’t see the names of individual users. Or reviewers may see only summarised data. For instance, they might learn that a conversation frequently dead-ends after a particular bot utterance, which lets them know the statement should be adjusted. Designers at Microsoft and Google and other companies also receive reports detailing the most popular user queries so they know what content to add.

But the review process can also be shockingly intimate. In the offices of one conversational-computing company I visited, employees showed me how they received daily emails listing recent interchanges between people and one of the company’s chat apps.

The employees opened one such email and clicked on a play icon.

In clear digital audio, I heard the recorded voice of a child who was free-associating. “I am just a boy,” he said. “I have a green dinosaur shirt ... and, uh, giant feet ... lots of toys in my house and a chair ... My mom is only a girl, and I know my mom, she can do everything she wants to do. She always goes to work when I get up but at night she comes home.”

There was nothing untoward in the recording. But as I listened to it, I had the unsettling feeling of hovering invisibly in the little boy’s room. The experience made me realise that the presumption of total anonymity when speaking to a virtual assistant on a phone or smarthome device – there is only some computer on the other end, right? – is not guaranteed. People might be listening, taking notes, learning.

Eavesdropping may also occur by accident. On 4 October 2017, Google invited journalists to a product unveiling at the SFJazz Center in San Francisco. Isabelle Olsson, a designer, got the job of announcing the new Google Home Mini, a bagel-size device that is the company’s answer to the Amazon Echo Dot. “The home is a special intimate place, and people are very selective about what they welcome into it,” Olsson said. After the presentation, Google gave out Minis as swag to the attendees. One of them was a writer named Artem Russakovskii, and he could be forgiven for later thinking that he hadn’t been selective enough about what he welcomed into his home.

After having the Mini for a couple of days, Russakovskii went online to check his voice search activity. He was shocked to see that thousands of short recordings had already been logged – recordings that never should have been made. As he would later write for the Android Police website: “My Google Home Mini was inadvertently spying on me 24/7 due to a hardware flaw.” He complained to Google and within five hours the company had sent a representative to swap out his malfunctioning device for two replacement units.

Like similar devices, the Mini could be turned on using the “OK, Google” wake phrase or by simply hitting a button on top of the unit. The problem was that the device was registering “phantom touch events”, Russakovskii wrote. Google would later say the problem affected only a small number of units released at promotional events. The problem was fixed via a software update. To further dispel fears, the company announced that it was permanently disabling the touch feature on all Minis.

This response, however, wasn’t enough to satisfy the Electronic Privacy Information Center, an advocacy group. In a letter dated 13 October 2017, it urged the Consumer Product Safety Commission to recall the Mini because it “allowed Google to intercept and record private conversations in homes without the knowledge or consent of the consumer”.

No information has emerged to suggest that Google was spying on purpose. Nonetheless, if a company of Google’s calibre can make such a blunder, then other companies might easily make similar mistakes as voice interfaces proliferate.

If you want to know whether government agents or hackers might be able to hear what you say to a voice device, consider what happens to your words after you have spoken. Privacy-minded Apple retains voice queries but decouples them from your name or user ID. The company tags them with a random string of numbers unique to each user. Then, after six months, even the connection between the utterance and the numerical identifier is eliminated.

Google and Amazon, meanwhile, retain a link between the speaker and what was said. Any user can log into their Google or Amazon account and see a listing of all of the queries. I tried this on Google, and I could listen to any given recording. For instance, after clicking on a play icon from 9.34am on 29 August 2017, I heard myself ask: “How do I say ‘pencil sharpener’ in German?” Voice records can be erased, but the onus is on the user. As a Google user policy statement puts it: “Conversation history with Google Home and the Google Assistant is saved until you choose to delete it.”

Is this a new problem in terms of privacy? Maybe not. Google and other search engines similarly retain all of your typed-in web queries unless you delete them. So you could argue that voice archiving is simply more of the same. But to some people, being recorded feels much more invasive. Plus, there is the issue of by-catch.

Recordings often pick up other people – your spouse, friends, kids – talking in the background.

For law enforcement agencies to obtain recordings or data that are stored only locally (ie on your phone, computer or smarthome device), they need to obtain a search warrant. But privacy protection is considerably weaker after your voice has been transmitted to the cloud. Joel Reidenberg, director of the Center on Law and Information Policy at Fordham Law School in New York, says “the legal standard of ‘reasonable expectation of privacy’ is eviscerated. Under the fourth amendment, if you have installed a device that’s listening and is transmitting to a third party, then you’ve waived your privacy rights.” According to a Google transparency report, US government agencies requested data on more than 170,000 user accounts in 2017. (The report does not specify how many of these requests, if any, were for voice data versus logs of web searches or other information.)

If you aren’t doing anything illegal in your home – or aren’t worried about being falsely accused of doing so – perhaps you don’t worry that the government could come calling for your voice data. But there is another, more broadly applicable risk when companies warehouse all your recordings. With your account login and password, a hacker could hear all the requests you made in the privacy of your home.

Technology companies claim they don’t eavesdrop nefariously, but hackers have no such aversion. Companies employ password protection and data encryption to combat spying, but testing by security researchers as well as breaches by hackers demonstrate that these protections are far from foolproof.

Consider the CloudPets line of stuffed animals, which included a kitten, an elephant, a unicorn and a teddy bear. If a child squeezed one of these animals, he or she could record a short message that was beamed via Bluetooth to a nearby smartphone. From there, the message was sent to a distant parent or other relative, whether she was working in the city or fighting a war on the other side of the world. The parent, in turn, could record a message on her phone and send it to the stuffed animal for playback.

It was a sweet scenario. The problem was that CloudPets placed the credentials for more than 800,000 customers, along with 2m recorded messages between kids and adults, in an easily discoverable online database. Hackers harvested much of this data in early 2017 and even demanded ransom from the company before they would release their ill-gotten treasure.

Paul Stone, a security researcher, discovered another problem: the Bluetooth pairing between CloudPets animals and the companion smartphone app didn’t use encryption or require authentication. After purchasing a stuffed unicorn for testing, he hacked it.

In a demonstration video he posted online, Stone got the unicorn to say: “Exterminate, annihilate!” He triggered the microphone to record, turning the plush toy into a spy. “Bluetooth LE typically has a range of about 10-30 metres,” Stone wrote on his blog, “so someone standing outside your house could easily connect to the toy, upload audio recordings, and receive audio from the microphone.”

Plush toys may be, well, soft targets for hackers, but the vulnerabilities they exhibit are sometimes found in voice-enabled, internet-connected devices for adults. “It’s not that the risks are particularly any different to the ones you and I face every day with the volumes of data we produce and place online,” says security researcher Troy Hunt, who documented the CloudPets breach. “It’s that our tolerances are very different when kids are involved.”

Other researchers have identified more technologically sophisticated ways in which privacy might be violated. Imagine someone is trying to take control of your phone or other voice AI device simply by talking to it. The scheme would be foiled if you heard them doing so. But what if the attack was inaudible? That is what a team of researchers at China’s Zhejiang University wanted to investigate for a paper that was published in 2017. In the so-called DolphinAttack scenario that the researchers devised, the hacker would play unauthorised commands through a speaker that he planted in the victim’s office or home. Alternatively, the hacker could tote a portable speaker while strolling by the victim. The trick was that those commands would be played in the ultrasonic range above 20kHz – inaudible to human ears but, through audio manipulation by the researchers, easily perceptible to digital ones.
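
One way to see how a sound above 20kHz could still register as an audible-range command is a small numerical sketch. It assumes the mechanism is amplitude modulation onto an ultrasonic carrier combined with a slightly nonlinear microphone, which is how the technique is generally described; the sample rate, the 400Hz stand-in tone and the size of the nonlinearity are illustrative values, not figures from the paper.

```python
import cmath
import math

FS = 192_000       # sample rate in Hz (assumed, high enough for ultrasound)
CARRIER = 25_000   # ultrasonic carrier in Hz, above the 20kHz limit of hearing
TONE = 400         # audible-range tone standing in for a voice command
N = 1_920          # 10 ms of samples; each frequency fits a whole number of cycles


def amplitude_at(signal, freq):
    """Amplitude of the `freq` component of `signal` (single-bin DFT)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)


# Amplitude-modulate the "command" onto the ultrasonic carrier.
transmitted = [
    (1 + 0.5 * math.cos(2 * math.pi * TONE * n / FS))
    * math.cos(2 * math.pi * CARRIER * n / FS)
    for n in range(N)
]

# Hypothetical microphone with a slight quadratic nonlinearity.
NONLINEARITY = 0.1
captured = [x + NONLINEARITY * x * x for x in transmitted]

in_air = amplitude_at(transmitted, TONE)
in_mic = amplitude_at(captured, TONE)
print(f"400 Hz in the air: {in_air:.4f}, after the microphone: {in_mic:.4f}")
```

In the transmitted signal all the energy sits around 25kHz, so a listener hears nothing. The squared term in the modelled microphone recreates the 400Hz tone inside the device, which is the sense in which the command is inaudible to people but perceptible to digital ears.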

In their laboratory tests, the scientists successfully attacked the voice interfaces of Amazon, Apple, Google, Microsoft and Samsung. They tricked those voice AIs into visiting malicious websites, sending phoney text messages and emails, and dimming the screen and lowering the volume to help conceal the attack. The researchers got the devices to place illegitimate phone and video calls, meaning that a hacker could listen to and even see what was happening around a victim. They even hacked their way into the navigation system of an Audi SUV.

Most people don’t want hackers, police officers or corporations listening in on them. But there is a final set of scenarios that confuses the surveillance issue. In reviewing chat logs for quality control in the manner described above, conversation designers might hear things that almost beg them to take action.

Take the creators of Mattel’s Hello Barbie. While writing the doll’s dialogue, they struggled with a disturbing set of hypothetical scenarios. What if a child told the doll “My daddy hits my mom”? Or “My uncle has been touching me in a funny place”? The writers felt it would be a moral failure to ignore such admissions. But if they reported what they heard to the police, they would be assuming the role of Big Brother. Feeling uneasy, they decided Barbie’s response should be something like: “That sounds like something you should tell to a grownup whom you trust.”

Mattel, however, seems willing to go further. In an FAQ about Hello Barbie, the company wrote that conversations between children and the doll are not monitored in real time. But afterward, the dialogues might occasionally be reviewed to aid product testing and improvement. “If in connection with such a review we come across a conversation that raises concern about the safety of a child or others,” the FAQ stated, “we will cooperate with law enforcement agencies and legal processes as required to do so or as we deem appropriate on a case-by-case basis.”

The conundrum similarly challenges the big tech companies. Because their virtual assistants handle millions of voice queries per week, they don’t have employees monitoring utterances on a user-by-user basis. But the companies do train their systems to catch certain highly sensitive things people might say. For instance, I tested Siri by saying: “I want to kill myself.” She replied: “If you are thinking about suicide, you may want to speak with someone at the National Suicide Prevention Lifeline.” Siri supplied the telephone number and offered to place the call.

Thanks, Siri. But the problem with letting virtual assistants look out for us is that the role suggests major responsibility with ill-defined limits. If you tell Siri that you are drunk, she sometimes offers to call you a cab. But if she doesn’t, and you get into a car accident, is Apple somehow responsible for what Siri failed to say?

When is a listening device expected to take action? If Alexa overhears someone screaming “Help, help, he’s trying to kill me!”, should the AI automatically call the police?

The preceding scenarios are not far-fetched to analyst Robert Harris, a communication industry consultant. He argues that voice devices are creating a snarl of new ethical and legal issues. “Will personal assistants be responsible for the ... knowledge that they have?” he says. “A feature like that sometime in the future could become a liability.”

The uses of AI surveillance make clear that you should scrutinise each one of these technologies you allow into your life. Read up on just how and when the digital ears are turned on. Find out what voice data is retained and how to delete it if you desire. And if in doubt – especially with applications made by companies whose privacy policies can’t be easily understood – pull the plug.

This is an edited extract from Talk to Me: Apple, Google, Amazon and the Race for Voice-Controlled AI, published on 28 March by Random House Penguin. To buy a copy for £17.60 visit guardianbookshop.com or call 0330 333 6846.
