Researchers from UC San Diego released a paper empirically demonstrating that deepfake videos can be adversarially modified, causing certain deepfake detection systems to classify fake videos as real.

What is the premise of the research?

The research explores whether deepfake detection techniques can be fooled into classifying fake videos as real. Many of these deepfake detection methods are based on deep neural networks (DNNs) and function with a fairly high rate of success. However, once these detection systems are deployed "in the wild," bad actors will inevitably try to subvert them, likely using adversarial examples. In the case of a deepfake video, this involves adding tiny pixel modifications that, while imperceptible to the human eye, can dramatically change how a DNN-based detection system processes the video. These adversarial deepfakes look indistinguishable from a normal deepfake, but may trick a detector into verifying a fake video as real.
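As a rough illustration of the idea (not the authors' exact attack), a minimal sketch of a one-step gradient-based perturbation in PyTorch might look like the following; the detector model, the epsilon budget, and the tensor shapes are all assumptions for illustration.

```python
import torch

def fgsm_perturb(detector, frame, label, epsilon=2/255):
    """One-step FGSM-style perturbation (illustrative sketch, not the paper's exact method).

    detector: a DNN that outputs fake/real logits for a face crop
    frame:    input tensor of shape (1, 3, H, W), values in [0, 1]
    label:    the true label ("fake"), which the attack tries to move away from
    epsilon:  per-pixel budget kept small enough to be visually imperceptible
    """
    frame = frame.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(detector(frame), label)
    loss.backward()
    # Step in the direction that increases the detector's loss on the true label,
    # nudging the fake frame toward being classified as real.
    adv = frame + epsilon * frame.grad.sign()
    return adv.clamp(0.0, 1.0).detach()
```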

What were the research findings?

The researchers created adversarial examples for each face in a set of deepfake videos by applying a standard off-the-shelf algorithm and then placing the faces back in their original frames. Across a variety of tests, these adversarial deepfakes fooled two published detection models with a high success rate. The researchers also demonstrated that it is possible to generate adversarial deepfakes that are robust to the kind of video compression widely applied by social media platforms when a video is uploaded. To combat these kinds of critical attacks, the authors recommend that future research into deepfake detection focus on training techniques that are known to build robustness against adversarial examples.
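To make the recommendation concrete, the sketch below shows one generic adversarial-training step in PyTorch, in which the detector is updated on adversarially perturbed face crops; this is a standard robust-training pattern under assumed names and label conventions, not the authors' specific recipe.

```python
import torch

def adversarial_training_step(detector, optimizer, frames, labels, epsilon=2/255):
    """Single adversarial-training step (generic sketch of robust training,
    not the specific procedure proposed in the paper).

    frames: batch of face crops, shape (N, 3, H, W), values in [0, 1]
    labels: assumed convention of 0 = real, 1 = fake
    """
    # 1) Craft adversarial versions of the batch with a one-step FGSM attack.
    frames_adv = frames.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(detector(frames_adv), labels)
    loss.backward()
    frames_adv = (frames_adv + epsilon * frames_adv.grad.sign()).clamp(0, 1).detach()

    # 2) Update the detector on the adversarial examples so it learns to resist them.
    optimizer.zero_grad()
    adv_loss = torch.nn.functional.cross_entropy(detector(frames_adv), labels)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```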