Fake Malcolm Turnbull: How hard is it to make a convincing deepfake video?


We set out to see if we could fake Malcolm Turnbull.

Researchers are warning that a new wave of artificial intelligence technology could make it so easy to create fake videos that it will undermine the public's ability to trust what they see.

Take this footage, for example. A team of academics from leading US and European universities created a fake version of Barack Obama from publicly available footage, driven by video of an actor filmed against a green screen:

[Take our deepfakes quiz to see if you can tell the difference between real and fake videos.]

But how far away is the technology for a member of the public, and what even is a deepfake?

To find out, we set out to try to recreate our then Australian prime minister, Malcolm Turnbull.

(And yes, don't worry, we did hear about Scott Morrison taking charge. But, you know, Australian prime ministers change faster than our meagre ABC AI skills can cope with.)

To see what was possible we turned to a program called Deepfakes, machine learning software that made its way into the public domain via the dark corners of the internet, where it was unsurprisingly being used to make fake celebrity porn.

Compared to what's available to researchers, it's pretty limited — it can't recreate a whole scene, but what it can do is recreate a person's face, and put that onto another person.

To use the program you don't really need to know how to code — all you need is a relatively fast computer. It does all the work, though you need to give it the right ingredients. Here's how it works.

First you need to find a lot of images of Malcolm Turnbull's face. We gathered these from ABC photos and videos. But in today's era of social media saturation, finding pictures of almost anyone's face is a pretty simple task.

You give the Deepfake program all those photos, and it analyses them. It's using pre-trained facial recognition algorithms to work out what part of the photo is a face, and then mapping the facial features.

This kind of map can easily be generated for any face — but that's not yet enough information for the computer to understand how to rebuild Mr Turnbull's face.

Next, the program needs to break down each face map into its component parts.

It does that by analysing every pixel in the photo: each one has a colour value.

If you take the colour values of all those pixels, you get a long sequence of numbers.

It then gives each pixel another value based on where it is in relation to every other pixel in the face.

It essentially maps the relationships between all those pixels — and repeats that process for all the pictures we provided.

So each picture of Mr Turnbull is now, in the eyes of the computer, a series of numbers made up of colour values and pixel positions.
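That "picture as a series of numbers" idea can be sketched in a few lines of Python. This is a toy illustration with a made-up 4x4 image, not the actual Deepfakes code:

```python
import numpy as np

# A tiny stand-in for a cropped face photo: 4x4 pixels, each with RGB colour values 0-255.
face = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Flatten the colour values into one long sequence of numbers...
colours = face.reshape(-1)  # 4 * 4 pixels * 3 channels = 48 numbers

# ...and record where each pixel sits relative to the others.
ys, xs = np.mgrid[0:4, 0:4]
positions = np.stack([ys.ravel(), xs.ravel()], axis=1)  # 16 (row, column) pairs

print(colours.shape, positions.shape)  # (48,) (16, 2)
```

A real face crop is far larger, but the principle is the same: to the computer, the picture is just these numbers.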

This is where the machine learning (or AI) kicks in, identifying the pattern in all those numbers to figure out the essence of Mr Turnbull's face.

When we say machine learning, what is actually happening is a little less futuristic — it's maths.

Essentially computers are really good at doing maths, and machine learning takes advantage of this to get a computer to complete a series of calculations that would take a person way too long.

So, the Deepfakes program runs a series of different algorithms, each one attempting to recreate the face in the photos. Each time it runs an algorithm, it looks at what works and what doesn't — and then refines the process.

The way this process works is modelled on what we know about the human brain, so it's called an artificial neural network.

On the first pass through, the neural network tries to create an image of Mr Turnbull using its formula and the facial recognition points it generated earlier.

It doesn't do a very good job.

But this provides a baseline from which the neural network can tweak its mathematical formulas and try again. It tests how accurate the formula is by seeing how close its creation is to the sequence of numbers from a real photo.

It can use the difference between those figures to work out where it went wrong, change its maths, and try again … and again … and again.

This is the machine learning. It's refining its maths every time it tries to create Mr Turnbull's face, and eventually it gets much better.

For our model of Malcolm Turnbull, the neural network did just under 100,000 iterations — or about 8,000,000,000,000 calculations.

Anyone with a good understanding of maths could do one of those calculations, but to do all of them you'd be working for more than 12 million years. And yet it took our computer just over 10 hours.
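Those back-of-envelope numbers check out, assuming a person managed one calculation per minute, around the clock:

```python
total_calcs = 8_000_000_000_000  # roughly 8 trillion, as estimated above

# One calculation per minute, 24 hours a day, every day of the year:
minutes_per_year = 60 * 24 * 365
years = total_calcs / minutes_per_year

print(f"{years / 1_000_000:.1f} million years")  # prints "15.2 million years"
```

Even at that generous pace, the total is comfortably above the 12 million years quoted, while the computer needed about 10 hours.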

OK, so now we have a good idea of what Malcolm Turnbull's face looks like, what next?

One thing researchers in this field do to test their progress is put the fake face back onto the original person, to see how accurate it is.

So we gave Deepfakes some video of Malcolm Turnbull, and it used facial recognition to map out the points in each frame and placed the fake face over his real one.
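Conceptually, that final swap just pastes the generated face pixels over the matching region of each frame. Here is a toy sketch with numpy arrays standing in for images; the real program also warps and colour-blends the patch:

```python
import numpy as np

frame = np.zeros((8, 8, 3), dtype=np.uint8)          # a video frame (all black)
fake_face = np.full((4, 4, 3), 255, dtype=np.uint8)  # the generated face (all white)

# Facial recognition tells us where the real face sits in this frame...
top, left = 2, 2
# ...and a mask marks which pixels of the patch belong to the face.
mask = np.ones((4, 4), dtype=bool)  # toy mask: here, the whole patch

region = frame[top:top + 4, left:left + 4]
region[mask] = fake_face[mask]  # the fake face now covers the real one
```

Repeat that for every frame and you have a face swap.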

It's hard to tell the difference, so to make it easier to see what's real and what's fake we've darkened the real footage:

Remember, that's not Malcolm Turnbull's face. His head and the background are real, but the face is an entirely artificial creation, almost like something you'd see in a Pixar movie.

So how does that compare to the original video? Take a look (the real Mr Turnbull is on the right).

So obviously this bit of free software isn't going to be the undoing of modern society, all we've managed to do is seamlessly replace Malcolm Turnbull's face … with Malcolm Turnbull's face.

But could you place Turnbull into a different scene?

It's trickier to do this convincingly, but it's certainly possible. The problem is, we don't want to create anything that might be even a slightly convincing fake.

So, here's a video where we've placed a re-creation of Malcolm Turnbull's current face onto a man who was standing on the steps of Parliament House in 1975 after The Dismissal of Gough Whitlam's government.

Again, it's far from perfect (Malcolm Turnbull was 21 at the time, for starters) — but it provides a sense of what's possible. Imagine putting a politician's face into a compromising scene; with a bit of assistance from video editing software, we could go a long way towards making it believable.

And what about sound?

By now you've probably noticed that none of our videos have any sound. Similar research and technology exists to create a digital version of a person's voice in much the same way, but there currently aren't any publicly available tools that offer the same ability to learn a person's voice from any audio clip.

A company called Lyrebird AI uses machine learning to offer a way to create your own digital voice, but has made the decision not to release a version that would allow you to copy anyone's voice.

It released audio of Barack Obama and Donald Trump to show what is possible, but in a statement explained why it is keeping this capability back.

"Imagine that we had decided not to release this technology at all. Others would develop it and who knows if their intentions would be as sincere as ours: they could, for example, only sell the technology to a specific company or an ill-intentioned organisation. By contrast, we are making the technology available to anyone and we are introducing it incrementally so that society can adapt to it, leverage its positive aspects for good, while preventing potentially negative applications."

And these concerns are not irrational — in 2016 Adobe previewed a program called VoCo, which offered this capability, but never released it.

And after rumours emerged that a tape recording of Donald Trump saying the 'N-word' existed, alt-right conspiracy theorist Alex Jones started suggesting that such a tape could be made using VoCo.

All anyone knows about VoCo comes from one YouTube video, and Adobe was upfront in saying it would make sure you could test whether audio had been manipulated this way — but that didn't stop Alex Jones from using the possibility of its existence to spread misinformation.

So is deepfake news something I should be worried about now?

What can we learn from all this? Firstly, it confirms that this technology is not yet at a point where we could realistically fake anything. Between the program's limitations — it can only swap faces — and the time we had to spend learning how to use it, making deepfakes is far more difficult than spreading falsehoods through traditional means.

But that doesn't mean you should rest easy. Claire Wardle is an expert in verifying the vast swathes of information the digital era creates every day.

For her, the perception of deepfakes currently has far more potential for damage than any actual effort to create misinformation with fake videos.

"It's deepfakes in the context of our information environment that is the most terrifying," she said.

"The fear I have is we can't stop the march of technology, we can't stop deepfakes, but the reason deepfakes are so terrifying is because it will allow Donald Trump or anyone else to deny what they said.

"I'm not saying that we don't need to think about the implications of artificial intelligence in the creation of these videos, but my worry is that the way that we're talking about it. Part of the problem is that the more we talk about it, we talk about it being scary, it allows politicians to undermine journalism, to say 'If you do hear a recording, you can't trust it anyway because it's a deepfake'."

But she said the visual nature of a deepfake is problematic, because people are less critical of visual misinformation.

"As a society we've always been much more trusting of photographs, but you don't have to manipulate a photo to frame it in a certain way," she said.

"That has always been an issue about imagery: we don't even have to manipulate it, and then Photoshop came along, and this feels to me like an extension of that.

"But I think in the ecosystem we're in now it's sitting in a pretty dangerous space, because around a whole host of different types of information and visuals, people are less trusting."

Avoiding the 'info-pocalypse'

So at the moment the existence of this technology is more dangerous than any actual fake video, but there is no reason to think this will always be the case.

Yes, the free software we found on the internet can't recreate a whole scene, but that's a limitation of how it was built, not of our ability to make a believable fake with it.

And the machine learning software that powers it is the same as what is used by cutting edge researchers around the world.

It's called TensorFlow, a product of Google's Brain team. It's used to make computers run faster and search results more accurate, and it has powered a range of scientific breakthroughs.

It's also just one of a range of publicly available AI platforms — Microsoft, Amazon and Baidu have all released their own versions.

This democratisation of machine learning is already having profound impacts on the world we live in, but Centre for Social Media Responsibility chief technologist Aviv Ovadya fears we haven't seen its full impact in the news arena just yet.

He says that time will come if this sort of technology becomes more user friendly.

"The goal here is not to stop all fakery," he says. "It's to prevent our institutions from crossing over the threshold into 'so much failure that they can't function'. To avoid tipping into 'info-pocalypse'."

So if we can't stop the rise of deepfakes, what can we do?

According to Mr Ovadya, we need to address the problem at every stage: creation, distribution and consumption.

"We can't stop this technology from existing — but we can make it harder to create fakes for malicious ends. What if investors, engineers, and even app stores decided they didn't want to enable deepfake technology?" he asks.

"Making it just a little harder to create a fake won't stop the Kremlin. It won't stop anyone with a computer who really wants to manipulate video. But it might help in Myanmar or any other country where most people just have phones. And even small obstacles to usage can have significant impacts on what people do.

"We also need to make it easier for a witness to film a real video of corruption — and show that it's real. Camera and phone companies can help here. We may be able to make authenticity infrastructure that helps people prove where and when a recording was captured.

"Once a video has been created — either fake or real — what determines who sees it? Distribution platforms — this means social networks, video uploading sites, messaging systems, recommendation engines — and all of the ways these sorts of thing can be manipulated.

"This is all about the incentives and right now, in our world, sensationalism is rewarded because it attracts attention — it makes money, manipulates beliefs and drives people to act.

"A convincing fake video can be naturally viral. Wouldn't you click on video of Trump or Clinton taking a bribe? A platform like YouTube shouldn't stop people from being curious but it can try to ensure that it doesn't recommend sensationalism and forgery more than reality. And it can invest in systems to recognise and reward good actors over bad.

"This is where that forgery detection and authenticity infrastructure that I mentioned earlier can help. And also more traditional journalism and fact checking.

"After a video is distributed, then the audience either believes it — and acts on it — or they don't. Here we need to involve almost every aspect of society from actors to diplomats — and especially teachers."

For Claire Wardle, it comes down to building trust, something in short supply in the media at the moment.

"In order to build trust with audiences it requires really, really listening to audiences, really being part of communities, telling their stories, and being transparent in the storytelling process, really showing how the sausage is made," she said.

And she wants media organisations to help equip audiences to do their own verification — but in an era when there are more demands on our attention than ever, do audiences really want that?

Credits

Reporter: Tim Leslie

Developer: Nathan Hoad

Designer: Ben Spraggon

Editor: Matthew Liddy

Technical advice: Professor Daniel Angus


Topics: science-and-technology, internet-technology, media, social-media, internet-culture, information-and-communication, government-and-politics, australia
