Kitten videos are harmless, right? Except when they take over your phone.

Researchers have found something new to worry about on the internet. It turns out that a muffled voice hidden in an innocuous YouTube video could issue commands to a nearby smartphone without you even knowing it.

The researchers describe the threat in a research paper to be presented next month at the USENIX Security Symposium in Austin, Texas. They also demonstrate it in this video.

Voice recognition has taken off quickly on phones, thanks to services like Google Now and Apple’s Siri, but voice software can also make it easier to hack devices, warned Micah Sherr, a Georgetown University professor and one of the paper’s authors.

The team found that they could mangle voice commands so that humans can barely recognize the words but software still can. The result condenses the words into a demonic growl.

“Ok Google, Open XKCD.com,” the voice says, and a nearby phone opens that URL.

It’s easy to imagine how a hacker could direct a phone to a website containing malware, or instruct the phone to take a photo.

It might not work every time, but it’s a numbers game. If a million people watch a kitten video with a secret message embedded, 10,000 of them might have their phone nearby. If 5,000 of those load a URL with malware on it, “you have 5,000 smartphones under an attacker’s control,” Sherr said in a statement.
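
The arithmetic behind that estimate can be sketched in a few lines. The 1 percent and 50 percent rates below are simply the ratios implied by the figures Sherr quotes, not measured values, and the function name is illustrative.

```python
def expected_compromised(viewers: int, p_phone_nearby: float, p_loads_url: float) -> int:
    """Estimate how many phones end up loading the malicious URL.

    Both probabilities are assumptions derived from the quoted example,
    not figures reported in the research paper.
    """
    return round(viewers * p_phone_nearby * p_loads_url)


viewers = 1_000_000       # people who watch the video
p_phone_nearby = 0.01     # 10,000 of 1,000,000 have a phone within earshot
p_loads_url = 0.5         # 5,000 of those 10,000 phones obey the command

phones_nearby = round(viewers * p_phone_nearby)
compromised = expected_compromised(viewers, p_phone_nearby, p_loads_url)

print(f"Phones within earshot: {phones_nearby}")
print(f"Phones loading the URL: {compromised}")
```

Even if either rate were far lower, a large enough audience would still leave an attacker with a meaningful number of devices.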

If the hackers know the internal workings of the voice recognition software itself, they can create voice commands that are even harder for humans to decipher.

The researchers have uploaded samples of a scrambled voice command. In our tests with an Android phone, the commands sometimes went undetected or were misheard. When an audio sample asked “What is my current location,” Google Now heard it as “procrastination.”

But other attempts worked fine. Another audio sample told the phone to turn on airplane mode, which it did.

To defend against the threat, developers of voice recognition software could incorporate filters to differentiate between human and computer-generated sounds, the paper said.