Digital Domain Senior Director of Software R&D Doug Roble in CG form, created as part of a test of the studio’s facial-capture and CG humans pipeline.

MACHINE LEARNING IN ACTION

Pinscreen’s own implementation of these kinds of networks is covered below, but first, a look at one of the most prominent recent examples of machine learning in VFX: Digital Domain’s Thanos in Avengers: Infinity War. Here, the visual effects studio used a type of machine learning to transform Josh Brolin’s face, captured with head-mounted cameras pointed at facial tracking markers, into the film’s lead character. The approach relied on facial-capture training data.

“We already knew that we could build a system that will take motion-capture data and produce a result,” states Digital Domain’s Head of Digital Humans, Darren Hendler. “With machine learning, we can take the original system we built and now feed in corrections. All future results will then be corrected in the desired manner. This is a more rudimentary version of machine learning, but really shows great promise in speeding up the work and improving the quality.”
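The idea Hendler describes, keeping an existing solver and learning from corrections fed into it, can be illustrated with a toy sketch. This is not Digital Domain’s system: the one-dimensional `base_solver`, the `fit_correction` and `corrected_solver` helpers, and the sample values are all hypothetical, and a simple least-squares line stands in for whatever model the studio actually uses.

```python
# Toy illustration only: a fixed "base solver" plus a learned correction term.
# Every name and number here is hypothetical, not taken from any studio pipeline.

def base_solver(marker_offset):
    """Stand-in for an existing capture system: marker offset -> face-shape weight."""
    return 0.5 * marker_offset


def fit_correction(samples):
    """Fit residual = a * x + b by least squares.

    samples: list of (marker_offset, artist_corrected_weight) pairs.
    The residual is the gap between the artist's fix and the base solver's output.
    """
    xs = [x for x, _ in samples]
    residuals = [target - base_solver(x) for x, target in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(residuals) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All corrections at the same input: fall back to a constant offset.
        return 0.0, mean_r
    cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
    a = cov / var_x
    b = mean_r - a * mean_x
    return a, b


def corrected_solver(marker_offset, correction):
    """Base result plus the learned correction, applied to any future input."""
    a, b = correction
    return base_solver(marker_offset) + a * marker_offset + b


# Three hypothetical artist corrections, then a prediction for an unseen input.
artist_fixes = [(0.0, 0.1), (1.0, 0.8), (2.0, 1.5)]
correction = fit_correction(artist_fixes)
print(corrected_solver(3.0, correction))
```

The point of the sketch is the structure rather than the math: the original solver is left untouched, and only the gap between its output and the corrected result is learned, so every later input picks up the same kind of fix.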

Since its work on Infinity War, Digital Domain has advanced its deep learning techniques to create an entirely new facial-capture system. “Now,” says Senior Director of Software R&D, Doug Roble, “we can take a single image and in real time re-create a high-resolution version of any actor to a similar quality as our final result in the film. How this works sometimes seems like pure magic, but like all machine learning, it can be very temperamental and unpredictable, which makes the solution finding particularly rewarding.”

One of Pinscreen’s goals has been to generate photorealistic, animatable 3D avatars, complete with accurate facial and hair detail, from single mobile phone images of users. The company has been relying on deep learning approaches to make that possible, training on extensive data in a ‘semi-supervised’ or ‘unsupervised’ fashion. The data, coupled with deep neural networks, is used to help predict what a 3D avatar should look like, for example, which facial expression it should display.

Pinscreen’s results are sometimes compared to ‘deep fake’ face-swapping videos, which have gained popularity by using deep learning techniques to re-animate a famous person’s face to make them say things they never actually said or appear where they never actually appeared. Li notes that “while the deep fakes code still requires a large amount of training data, i.e. video footage of a person, to create a convincing model for face swapping, we have shown recently at Pinscreen that the paGAN (photorealistic avatar GAN) technology only needs a single input picture.”

Li is also part of research at the University of Southern California that is looking at ways of generating photorealistic, fully clothed, production-level 3D avatars without any human intervention, and at how to model general 3D objects using deep models. “In the long term,” he says, “I believe that we will fully democratize the ability to create complex 3D content, and anyone can capture and share their stories immersively, just like we do with video nowadays.”