Deepfakes and synthetic media are among the most feared developments in journalism today, and the worry is concrete: reporters could unknowingly base their work on false information created by AI. In this article, we'll explain some of the methods used to create synthetic media and ask whether there are ways it could be used for good. This guide is intended for readers without a background in artificial intelligence.

What is synthetic media?

AI-based models can now produce and manipulate audio and video with extremely realistic results. The output of this process is a new category of images, text, audio, video, and data generated by algorithms, known as synthetic media. It is possible to generate faces and places that don't exist, and even to create a digital voice avatar that mimics human speech.

“Do as I do” motion transfer, which transfers the body movements of a person in a source video onto a person in a target video, is one type of synthetic media:

Image: The Wall Street Journal

How is synthetic media created?

Synthetic media is created through generative artificial intelligence. The three most common approaches are Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Recurrent Neural Networks (RNNs).

Generative Adversarial Networks pit two neural networks (a neural network is a computing system that can model and predict complex relationships and patterns) against each other. The first network, the generator, creates new content based on a dataset. The second network, the discriminator, assesses whether that content is real or fake. Each time the discriminator flags the generator's output as fake, the generator refines its next attempt. Over time, the generator becomes better at producing content (typically images) that looks real.

Image credit: freeCodeCamp
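
To make the adversarial loop concrete, here is a minimal sketch in Python, assuming the PyTorch library is available. The "real" data is just numbers drawn from a simple bell curve rather than images, so both networks can stay tiny, but the back-and-forth training is the same one image GANs use at much larger scale.

```python
# Minimal GAN sketch (illustrative, not the method of any specific tool).
import torch
import torch.nn as nn

latent_dim = 8

# Generator: turns random noise into a candidate "real-looking" sample.
generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))

# Discriminator: outputs the probability that a sample is real.
discriminator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0          # "real" data: centered on 3.0
    fake = generator(torch.randn(64, latent_dim))  # the generator's attempt

    # 1) Train the discriminator to label real samples 1 and fakes 0.
    d_opt.zero_grad()
    d_loss = (loss_fn(discriminator(real), torch.ones(64, 1))
              + loss_fn(discriminator(fake.detach()), torch.zeros(64, 1)))
    d_loss.backward()
    d_opt.step()

    # 2) Train the generator to make the discriminator answer "real" (1).
    g_opt.zero_grad()
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_loss.backward()
    g_opt.step()
```

After training, the generator's samples cluster around 3.0, the center of the "real" data. The same tug-of-war, run with convolutional networks and large image datasets, is what produces photorealistic fake faces.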

Variational autoencoders, by contrast, are most commonly used for making digital artwork or video. In this method, an encoder (a neural network) takes an input and converts it into a compressed representation, and a decoder (another neural network) then reconstructs the content from that representation. Rather than a single fixed code, the encoder produces a probability distribution over possible representations, and the decoder reconstructs the content from a sample of that distribution. This probabilistic step helps the model fill in details that would otherwise be lost in the encoding-decoding process, and it is also what lets the model generate new content by sampling.

Diagram for variational autoencoders. Image credit: keras.io
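
The sketch below shows this encode-sample-decode loop in Python with PyTorch (our illustrative choice; the layer sizes and names are made up for the example). The encoder outputs a mean and a variance rather than a single code, and the decoder reconstructs from a random sample of that distribution.

```python
# Minimal variational autoencoder sketch (illustrative sizes and names).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(input_dim, 128)
        self.mu = nn.Linear(128, latent_dim)      # mean of the latent code
        self.logvar = nn.Linear(128, latent_dim)  # log-variance of the code
        self.dec1 = nn.Linear(latent_dim, 128)
        self.dec2 = nn.Linear(128, input_dim)

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Sample a latent code z from the distribution the encoder
        # predicted (the "reparameterization trick" keeps this trainable).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        recon = torch.sigmoid(self.dec2(F.relu(self.dec1(z))))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error, plus a term that keeps the latent distribution
    # close to a standard bell curve so that sampling from it stays useful.
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kl

# Demo on random "images" (values scaled 0..1, e.g. flattened 28x28 pixels).
model = VAE()
x = torch.rand(32, 784)
recon, mu, logvar = model(x)
print(vae_loss(recon, x, mu, logvar).item())
```

Because the latent code is a distribution rather than a fixed point, decoding fresh samples from it yields new, plausible content, which is what makes the method generative.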

A third common approach, recurrent neural networks, is designed to recognize characteristics and patterns in sequential data in order to predict the most likely next item. By learning the structure of a large body of text, the algorithm can predict the next word in a sentence. This is how autocomplete features work, and it is the methodology typically used for text generation.
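
As a toy illustration of next-word prediction (the corpus and names below are invented for the example), here is a small recurrent network in Python with PyTorch. It is trained so that its output at each position scores the word that actually follows in the text.

```python
# Toy next-word predictor with a recurrent neural network (illustrative).
import torch
import torch.nn as nn

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([idx[w] for w in corpus])

class NextWordRNN(nn.Module):
    def __init__(self, vocab_size, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 16)
        self.rnn = nn.RNN(16, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.out(h)  # one score per vocabulary word, at each position

model = NextWordRNN(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# At every position, the training target is simply the next word in the corpus.
inputs, targets = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)
for _ in range(300):
    opt.zero_grad()
    loss = loss_fn(model(inputs).reshape(-1, len(vocab)), targets.reshape(-1))
    loss.backward()
    opt.step()

# Ask for the most likely word after "on the" (should be "mat" or "rug").
probe = torch.tensor([[idx["on"], idx["the"]]])
print(vocab[model(probe)[0, -1].argmax().item()])
```

Trained on a vastly larger body of text, this same predict-the-next-word objective is what drives autocomplete and automated text generation.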