A company that specializes in detecting voice fraud is sounding the alarm over an emerging threat. With the help of AI-powered software, cybercriminals are starting to clone people’s voices to commit scams, according to Vijay Balasubramaniyan, CEO of Pindrop.

“We’ve seen only a handful of cases, but the amount of money stolen can reach as high as $17 million,” he told PCMag.

During a presentation at RSA, Balasubramaniyan said Pindrop has over the past year also investigated about a dozen similar cases involving fraudsters using AI-powered software to “deepfake” someone’s voice to perpetrate their scams.

“We’re starting to see deepfake audios emerge as a way to target particular speakers, especially if you’re the CEO of a company, and you have a lot of YouTube content out there,” he said. “What these fraudsters are starting to do is use that to start synthesizing your audio.”

The scheme builds upon a classic attack known as business email compromise, in which the fraudster uses fake emails to pose as a senior officer at a company. The goal is to trick a lower-level employee into sending a large sum of money to the fraudster’s bank account.

Deepfaking someone’s voice can take the scheme to the next level. Just hearing your CEO’s voice on a phone can convince you to follow orders and comply with a large money request, even though it may not be legit, Balasubramaniyan said.

“All you need is five minutes of someone’s audio and you can create a fairly realistic clone,” he added. “If you have five hours or more of their audio, then you can create something that’s not perceptible by humans.”

In one of the investigated cases, Balasubramaniyan said the victim CEO actually had little public content revealing his voice. However, the CEO did hold monthly all-hands meetings at his company, which were recorded and later exposed in a breach. “Then they (the scammers) started to use this audio content to synthesize his voice,” Balasubramaniyan added.

The good news is that the deepfaking threat is still small relative to other phone call-related scams involving identity theft. That said, the technology to authentically clone voices is already here (but fortunately not widespread). During his presentation, Balasubramaniyan demoed an internal system his company created to synthesize the voices of public figures. He showed it off by deepfaking President Donald Trump’s voice to say the US should give North Korea a “bloody nose.”

The technology works by searching for Trump’s previous audio recordings on the internet to simulate his voice, which takes less than a minute. “You can generate any kind of audio content, and create this on demand,” Balasubramaniyan added. (Last year, AI researchers showed a similar concept that authentically cloned podcast host Joe Rogan’s voice.)

Clearly, the threat is disturbing. In addition to perpetrating scams, audio deepfakes also risk spreading misinformation that can dupe the public. Fortunately, Pindrop and other computer scientists are working on solutions to detect deepfakes. In Pindrop’s case, the company has created an AI-powered algorithm that can discern human speech from deepfake audio tracks. It does this by checking how the spoken words are actually pronounced and whether they match with human speech patterns.

“We start looking for the deformities,” he added. “Is the pace at which he is saying (the words) even humanly possible?”
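The pace question lends itself to a simple illustration. The Python sketch below is not Pindrop's algorithm, whose details the company did not share; it only shows the general idea of flagging speech that is delivered faster than a person plausibly could. The `Word` type, the timestamps, and the `MAX_WORDS_PER_SECOND` ceiling are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the clip
    end: float    # seconds from the beginning of the clip


# Hypothetical ceiling chosen for illustration only; not a figure from Pindrop.
MAX_WORDS_PER_SECOND = 5.0


def speaking_rate(words):
    """Return words per second between the first word's start and the last word's end."""
    if not words:
        return 0.0
    duration = words[-1].end - words[0].start
    if duration <= 0:
        # Zero or negative duration cannot come from real speech.
        return float("inf")
    return len(words) / duration


def is_pace_plausible(words, limit=MAX_WORDS_PER_SECOND):
    """Return True when the clip's speaking rate stays at or below the assumed limit."""
    return speaking_rate(words) <= limit
```

In practice the word timestamps would come from a speech recognizer, and a real detector would weigh many more signals than pace alone.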

Nevertheless, the looming threat of audio deepfaking may force users to be more careful when it comes to uploading their voice to the internet. Balasubramaniyan predicted there may one day be a market for anti-voice cloning security services, like there is for data security. “You’re going to have companies that create mechanisms to detect these attacks,” he said. “These systems need to start protecting your audio content if ever a version of you shows up that’s not you.”
