Once again, artificially intelligent software has been shown automatically editing videos of talking heads to make them say things they never actually uttered. And it's getting better at it. Today, it's altering footage of boffins, Mark Zuckerberg, and Kim Kardashian, but next it could be you. Probably not.

But maybe.

Deepfakes, content doctored by deep-learning algorithms to seemingly change reality, are all the rage at the moment. Open-source code and massive amounts of data scraped from the internet, whether it’s clips of adult-movie actresses or the voice of popular podcaster Joe Rogan, have made it easier to craft deepfakes.

Check out these ones, below, made by a pair of artists, going by the names Bill Posters and Daniel Howe, who collaborated with CannyAI, a tech company based in Israel. They produced fake videos of US President Donald Trump, Facebook CEO Mark Zuckerberg, and celebrity socialite Kim Kardashian saying stuff they never said, as part of a preview of Spectre, an art installation in the UK, and posted them on Instagram.

Here’s a video of what appears to be Zuckerberg talking about controlling billions of people’s stolen data. It's not perfect but you get the idea of where this technology is gradually progressing. Facebook-owned Instagram declined to remove the video, by the way, as it would be rather hypocritical: Facebook refused to take down maliciously altered videos of US politician Nancy Pelosi, after all. Posters and Howe have Zuck over a barrel, here.

It’s not bad, though the voice is, to our ear, dubbed in from an actor: the machine-learning part is matching the footage of the chief exec to the impersonator, it seems. The Kim Kardashian example is better, and her eyeroll and subtle movement of her hands are spot on.

Details of the technology used by CannyAI aren't public, so take the AI part with a pinch of salt. If it truly is machine-learning based, it perhaps works in a similar way to a method revealed this month in a paper by eggheads at Stanford University, the Max Planck Institute for Informatics, Princeton University, and Adobe.

Text-based editing of talking heads

To use this particular AI system, all you have to do is obtain a video clip and a transcript of someone talking, edit that transcript, run it all through the code, and lip-synch the result with edited audio, to produce a video of the person saying the doctored script. You can use it to subtly alter interviews – removing single words to reverse the meaning of sentences, or changing one or two words at a time – and invent a new reality.

“We presented the first approach that enables text-based editing of talking-head video by modifying the corresponding transcript,” the paper stated. "As demonstrated, our approach enables a large variety of edits, such as addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis."

Here’s how it works:

YouTube video

It requires a clear video of a talking head, and a transcript of what is being said in the original video. The team's machine-learning model, a recurrent neural network, carefully analyzes the audio and video to link the person's mouth movements to their speech.

Next, the model takes an edited version of the script, and searches for the person's mouth movements that match the required sounds, in order to get the talking head to visually pronounce the new words. The selected lip movements are blended into the source video at the correct moments to produce footage that appears to show the face saying words not previously spoken. Now the audio needs to be edited: this can be done by cutting words from the original recording as required, or getting an actor to impersonate the target, or using a voice synthesizer to generate a new audio track. When the new audio and doctored video are synchronized, hey presto, you’ve got yourself a deepfake.
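The search-and-blend step above can be sketched in miniature. This is a toy illustration, not the researchers' actual code: the real system uses a recurrent neural network to align speech with mouth shapes (visemes) and photorealistically blends frames, whereas this sketch just indexes which source frames cover each phoneme and reuses them for an edited script. The function names and the phoneme-level forced alignment are assumptions for the sake of the example.

```python
# Toy sketch of the "find matching mouth movements and reuse them" idea.
# Input: forced-alignment tuples (phoneme, start_frame, end_frame) from the
# original clip. Output: the frame spans to stitch together for a new script.

def build_viseme_index(aligned_phonemes):
    """Link each phoneme to every frame span where the speaker mouths it."""
    index = {}
    for phoneme, start, end in aligned_phonemes:
        # Keep all occurrences; a real system would score each candidate
        # for how well it blends visually at the insertion point.
        index.setdefault(phoneme, []).append((start, end))
    return index

def assemble_edit(index, edited_phonemes):
    """Pick a source mouth-movement span for each phoneme in the edited
    script. Raises if the source footage never mouths a required sound."""
    spans = []
    for phoneme in edited_phonemes:
        if phoneme not in index:
            raise ValueError(f"no source footage mouths phoneme {phoneme!r}")
        spans.append(index[phoneme][0])  # naive choice: first occurrence
    return spans

# Example: a clip where the speaker says "BA-NA-NA" (toy phoneme labels).
aligned = [("B", 0, 4), ("AA", 5, 9), ("N", 10, 13), ("AA", 14, 18)]
index = build_viseme_index(aligned)
# Edited script "NA-BA-AA": reuse the source mouth movements in a new order.
print(assemble_edit(index, ["N", "AA", "B", "AA"]))
# [(10, 13), (5, 9), (0, 4), (5, 9)]
```

In practice, this naive first-occurrence lookup would produce jarring cuts; the paper's contribution is precisely the part this sketch waves away, namely choosing and blending spans so the seams are invisible.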

Diagram summarizing the process ... Image credit: Fried et al.

For best results, the system requires about an hour of video of a specific person talking, and the neural network has to be retrained for every new person.

Generating a synthetic composite mask and adjusting to fit the talking head's face ... Image credit: Fried et al.

AI technology isn’t explicitly needed to make these sorts of deepfakes: someone with tight video-editing skills and software can pull off the same caper, too, with enough time. However, this machine-learning approach aims to be fast and automatic, so anyone can use it whenever they need it. And eventually, with improvements, its output may be harder to detect as fake, due to the smooth blending and subtle tweaks, compared to a fake produced by hand using something like Final Cut Pro.

When the researchers asked 138 people to judge whether a collection of videos had been doctored, the edited videos were rated as real 59.6 per cent of the time, on average (see page 12 of the paper). So, yeah, they're not convincing enough right now to dupe everyone, though they fooled most viewers.

And as the technology continues to improve, the threat of deepfakes spreading believable false information, made-up interviews and confessions, and lies increases.

The boffins discussed the ethical quandary. “We acknowledge that bad actors might use such technologies to falsify personal statements and slander prominent individuals. We are concerned about such deception and misuse,” they wrote in their paper.

Although they haven’t provided any solutions to counter deepfakes, they hope that releasing the details of their research will help others develop new “fingerprinting and verification techniques,” such as digital watermarks and signatures, to identify faked or doctored footage.

“We hope that publication of the technical details of such systems can spread awareness and knowledge regarding their inner workings, sparking and enabling associated research into the aforementioned forgery detection, watermarking and verification systems. Finally, we believe that a robust public conversation is necessary to create a set of appropriate regulations and laws that would balance the risks of misuse of these tools against the importance of creative, consensual use cases,” they concluded. ®