Artificial intelligence is getting better and better at generating images, and now scientists are working on generative video. The idea is that simply by typing out a phrase, an A.I. could create a video of that scene. Scientists at Duke and Princeton have created a working model.

‘Sailing on snow,’ enlarged for greater visibility. (Image: University of Toronto)

"Video generation is intimately related to video prediction," the authors say in their new paper. Video prediction, in which an A.I. attempts to predict what actions come next in a video, has long been a goal for researchers. Visual representations, especially moving ones, capture a far wider spectrum of human emotions and actions than static tasks like identifying the correct type of truck.

The researchers used a variety of easily defined activities for their videos, as opposed to nuanced interactions like comedy. These activities mainly focused on sports: ‘biking in snow’, ‘playing hockey’, ‘jogging’, ‘playing soccer’, ‘playing football’, ‘kite surfing’, ‘playing golf’, ‘swimming’, ‘sailing’ and ‘water skiing’. With videos taken from Google's Kinetics Human Action Video Dataset, they had a wide array of options.

The A.I. then studies these clips and learns to identify each motion, constantly refining itself across millions of network connections.

With a dataset in place, the researchers used a two-step process to create the generative video. The first was to "generate the 'gist' of the video from the input text, where the gist is an image that gives the background color and object layout of the desired video." The gist of the video is, for all intents and purposes, a fuzzy blur resembling the actions the user desires to see.
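The idea of a gist can be illustrated with a toy example. The sketch below is not the authors' model (which uses a learned neural network); it simply box-blurs a tiny grayscale "frame" to show how blurring preserves rough layout and background shade while washing out detail, which is what the gist provides to the second stage.

```python
# Toy illustration of a "gist": a blurry image that keeps coarse layout.
# This is a hypothetical stand-in, not the paper's learned gist generator.

def box_blur(img, radius=1):
    """Average each pixel with its neighbors, washing out fine detail."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[ny][nx]
                    for ny in range(max(0, y - radius), min(h, y + radius + 1))
                    for nx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

# A sharp 4x4 "frame": a bright object (1.0) on a dark background (0.0).
frame = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]

gist = box_blur(frame, radius=1)
# The blurred gist keeps the object's rough position (the center is still
# brightest) but the sharp edges and full contrast are gone.
```

The object's location and the overall brightness survive the blur, but nothing sharp does; in the same spirit, the paper's gist fixes the background and layout before a second network fills in the motion.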



Then comes the second stage, the "discriminator." For a video of "biking in snow," the discriminator judges the system's work: it compares the video generated from the gist against real footage of someone biking in the snow, and demands better and better work. Eventually, the model is able to create something passable.

The work is still in its earliest stages: the system can only create videos 32 frames long, lasting about one second, at a postage-stamp size of 64 by 64 pixels. Human bodies, with their unpredictable nature, give the system the most problems, says Duke scientist and lead author Yitong Li in the paper's conclusion. To get a better grasp on humans, the team is looking at human skeletal models.

Beyond the obvious nightmare of fake news generation, there could be actual uses for generative video. These mainly lie in training other A.I. systems, such as helping self-driving cars avoid crashes by showing them exactly what each car model on the road looks like.

It's a long way to either the positives or the negatives, and even longer until a computer model can accurately predict, let alone generate, an episode of The Office.

Source: Science
