Human behaviour is a perennial challenge for tech (Karen Bleier/AFP/Getty Images)

We are difficult for computers to understand. Our actions are sufficiently unpredictable that computer vision systems, such as those used in driverless cars, can’t readily make sense of what we’re doing and predict our next moves.

Now fake people are helping computers to understand real human behaviour. The idea is that videos and images of computer-generated bodies walking, dancing and doing cartwheels could help vision systems learn what to look for.

“Recognising what’s going on in images is natural for humans. Getting computers to do the same requires a lot more effort,” says Javier Romero at the Max Planck Institute for Intelligent Systems in Tübingen, Germany. This, he says, is one of the biggest things holding back progress with driverless cars. Using synthetic images to train computers could give them more meaningful information about the human world.

At the moment, the best computer vision algorithms are trained using hundreds or thousands of images that have been painstakingly labelled to highlight important features. This is how they learn to distinguish an eye from an arm, for example, or a table from a chair. But there is a limit to how much data can be realistically labelled this way.

Ideally, every pixel in every frame of a video would be labelled. “But this would mean instead of creating thousands of annotations, people would have to label millions of things, and that’s just not possible,” says Gül Varol at École Normale Supérieure in Paris, France.
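A back-of-envelope calculation shows why per-pixel annotation by hand is hopeless. The resolution and clip length below are illustrative assumptions, not figures from the study:

```python
# Labelling every pixel of even one short clip dwarfs what human
# annotators can realistically do. Numbers here are assumptions
# chosen only to illustrate the scale.
width, height = 640, 480   # assumed frame resolution
frames = 100               # a few seconds of video

labels_per_clip = width * height * frames
# Over 30 million per-pixel labels for a single clip -- and a useful
# training set needs thousands of clips.
```

Even at a generous one label per second of annotator time, a single clip would take close to a year of full-time work, which is why automatically labelled synthetic data is so attractive.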

Automatic labelling

So Varol, Romero and their colleagues have generated thousands of videos of “synthetic humans” with realistic body shapes and movement. They walk, they run, they crouch, they dance. They can also move in less expected ways, but they’re always recognisably human – and because the videos are computer-generated, every frame is automatically labelled with all the important information.

The team created their synthetic humans using the 3D rendering software Blender, basing their work on existing human figure templates and motion data collected from real people to keep the results realistic.

The team then generated animations by randomly selecting a body shape and clothing, and setting the figure in different poses. The background, lighting and viewpoint were also randomly selected. In total, they generated more than 65,000 clips and 6.5 million frames.
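The random-sampling step described above can be sketched in a few lines. The parameter names and value ranges below are hypothetical stand-ins; the team's actual body templates, clothing textures and scene assets are not detailed in the article:

```python
import random

# Hypothetical asset lists -- placeholders for the real templates and
# textures used in the study.
BODY_SHAPES = list(range(10))            # indices into body-shape templates
CLOTHING = ["t-shirt", "suit", "dress", "coat"]
BACKGROUNDS = ["street", "office", "park"]

def sample_clip_config():
    """Randomly choose the settings for one synthetic clip, mirroring
    the random selection of body shape, clothing, background, lighting
    and camera viewpoint described in the article."""
    return {
        "body_shape": random.choice(BODY_SHAPES),
        "clothing": random.choice(CLOTHING),
        "background": random.choice(BACKGROUNDS),
        "light_intensity": random.uniform(0.5, 1.5),   # assumed range
        "camera_azimuth_deg": random.uniform(0.0, 360.0),
    }

# One configuration per clip, matching the 65,000 clips reported.
configs = [sample_clip_config() for _ in range(65_000)]
```

Randomising every factor independently is what gives the dataset its variety: the network never sees the same combination of body, clothes, lighting and viewpoint twice.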

With all this information, computer systems could learn to recognise patterns in how pixels change from one frame to the next, indicating how people are likely to move. This could help a driverless car tell if a person is walking close by or about to step into the road.
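The advantage of synthetic data here is that the generator knows exactly how far every pixel moves between frames, so the motion field used as a training label comes for free. A toy NumPy illustration (not the paper's pipeline) of such free ground-truth motion labels:

```python
import numpy as np

# An 8x8 frame with a crude square stand-in for a person.
H, W = 8, 8
frame1 = np.zeros((H, W))
frame1[2:5, 2:5] = 1.0

# The generator applies a known motion, so the displacement is a
# ground-truth quantity rather than something to be estimated.
dx, dy = 1, 2
frame2 = np.roll(frame1, shift=(dy, dx), axis=(0, 1))

# Per-pixel motion labels: every "person" pixel carries its known
# (dx, dy) displacement; background pixels stay at zero.
flow = np.zeros((H, W, 2))
flow[frame1 > 0] = (dx, dy)
```

A real system would learn to predict `flow` from the pixel values of `frame1` and `frame2` alone; with rendered data, the label side of that training pair never has to be annotated by a person.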

As the animations are in 3D, they could also teach systems to recognise depth – which could help a robot learn how to smoothly hand someone an object without accidentally punching them in the stomach.

The work will be presented at the Conference on Computer Vision and Pattern Recognition in July.

“With synthetic images you can create more unusual body shapes and actions, and you don’t have to label the data, so it’s very appealing,” says Mykhaylo Andriluka at the Max Planck Institute for Informatics in Saarbrücken, Germany. He points out that other groups are using graphics from video games such as Grand Theft Auto to improve computer vision systems, as these can offer a relatively lifelike simulation of the real world.

“There have been huge advances in the realism of virtual images. We can use this to teach computers to see things,” says Romero.

Reference: arxiv.org/abs/1701.01370