The team used two neural networks, called Pose2Pose and Pose2Frame. First, a video is fed into a Pose2Pose network trained for a specific type of action, such as dancing, tennis or fencing. The system then works out where the person is relative to the background and isolates them and their poses. Next, Pose2Frame takes the person, along with their shadow and any objects they're holding, and inserts them into a new scene with minimal artifacts. You can then steer the character with a joystick or keyboard, and it moves using poses learned from the source video.
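To make the two-stage control loop easier to picture, here is a toy Python sketch. It is not Facebook's code: `pose2pose_stub` and `pose2frame_stub` are hypothetical stand-ins that replace the trained networks with trivial rules, and the assumption that the rendered character is blended onto the background through a per-pixel mask is a simplification for illustration.

```python
from typing import List, Tuple

Pose = List[Tuple[float, float]]  # hypothetical: list of (x, y) joint positions
Image = List[List[float]]  # toy grayscale image, values in [0, 1]


def pose2pose_stub(pose: Pose, control: Tuple[float, float]) -> Pose:
    """Hypothetical stand-in for Pose2Pose: shift every joint by the control offset."""
    dx, dy = control
    return [(x + dx, y + dy) for x, y in pose]


def pose2frame_stub(pose: Pose, height: int, width: int) -> Tuple[Image, Image]:
    """Hypothetical stand-in for Pose2Frame: render the person plus a blending mask."""
    person = [[0.0] * width for _ in range(height)]
    mask = [[0.0] * width for _ in range(height)]
    for x, y in pose:
        r, c = int(round(y)), int(round(x))
        if 0 <= r < height and 0 <= c < width:
            person[r][c] = 1.0  # draw each joint as a bright pixel
            mask[r][c] = 1.0  # fully opaque where the person is
    return person, mask


def composite(person: Image, mask: Image, background: Image) -> Image:
    """Per-pixel blend: out = mask * person + (1 - mask) * background."""
    return [
        [m * p + (1.0 - m) * b for p, m, b in zip(prow, mrow, brow)]
        for prow, mrow, brow in zip(person, mask, background)
    ]


def step(pose: Pose, control: Tuple[float, float], background: Image) -> Tuple[Pose, Image]:
    """One frame of the loop: control input -> next pose -> rendered, composited frame."""
    next_pose = pose2pose_stub(pose, control)
    person, mask = pose2frame_stub(next_pose, len(background), len(background[0]))
    return next_pose, composite(person, mask, background)


if __name__ == "__main__":
    background = [[0.5] * 4 for _ in range(3)]  # flat gray 3x4 "scene"
    pose = [(0.0, 0.0), (1.0, 1.0)]
    pose, frame = step(pose, (1.0, 0.0), background)  # "joystick" pushes right
    print(pose)
    for row in frame:
        print(row)
```

Each joystick or keyboard input becomes a control offset, the pose stage predicts the next pose, and the frame stage renders the character and pastes it onto the new background. In the real system both stubs are learned networks.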

It took only a few short videos of each activity (fencing, dancing and tennis) to train the system, which was able to filter out other people and compensate for different camera angles. The research resembles Adobe's "content-aware fill," which also uses AI to remove elements from video, such as tourists or garbage cans. Other companies, like NVIDIA, have also built AI that can transform real-life video into virtual landscapes suitable for games.

The motion is a bit screwy, with the characters looking like they're playing on ice, a problem in 3D animation known as "foot slide." On top of that, the range of motion is a bit limited. However, they do appear fairly realistic against the backgrounds compared to previous efforts at character extraction. It's still early days for the research, so hopefully the team can solve the motion issues.

Facebook's Vid2Game synthesis could make gaming more personal, letting you insert your own character or your favorite YouTube personality into games. "[It] addresses a computational problem not previously fully met, together paving the way for the generation of video games with realistic graphics," the team wrote. "In addition, controllable characters extracted from YouTube-like videos can find their place in the virtual worlds and augmented realities."