Overview

In September 2019, we partnered with other industry leaders and academic experts to create the Deepfake Detection Challenge (DFDC) to accelerate the development of new ways to detect deepfake videos. As part of this effort, we created and shared a new dataset for the challenge consisting of more than 100,000 videos. The DFDC has enabled experts from around the world to come together, benchmark their deepfake detection models, try new approaches, and learn from each other’s work.

The DFDC dataset consists of two versions:

Preview dataset: 5k videos, featuring two facial modification algorithms (associated research paper)

Full dataset: 124k videos, featuring eight facial modification algorithms (associated research paper)



Participants in a Kaggle competition used the full dataset to create new and better models for detecting manipulated media. Facebook created the dataset with paid actors, who agreed to the use and manipulation of their likenesses in its creation.

We hope that by making this dataset available outside the challenge, the research community will continue to accelerate progress on detecting harmful manipulated media.

For more information on Facebook AI’s work in this space, see this blog post.

If you use this dataset, please cite the paper associated with the version you used (preview or full):

@misc{DFDC2019Preview,
  title={The Deepfake Detection Challenge (DFDC) Preview Dataset},
  author={Brian Dolhansky and Russ Howes and Ben Pflaum and Nicole Baram and Cristian Canton Ferrer},
  year={2019},
  eprint={1910.08854},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{DFDC2020,
  title={The DeepFake Detection Challenge Dataset},
  author={Brian Dolhansky and Joanna Bitton and Ben Pflaum and Jikuo Lu and Russ Howes and Menglin Wang and Cristian Canton Ferrer},
  year={2020},
  eprint={2006.07397},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}