Berlin startup Spil.ly had a problem last spring. The company was developing an augmented-reality app akin to a full-body version of Snapchat’s selfie filters—hold up your phone and see your friends’ bodies transformed with special effects like fur or flames. To make it work, Spil.ly needed to train machine-learning algorithms to closely track human bodies in video. But the scrappy startup didn’t have the resources to collect the tens or hundreds of thousands of hand-labeled images typically needed to teach algorithms in such projects.

“It’s really hard being a startup in AI, we couldn’t afford to pay for that much data,” says CTO Max Schneider.

His solution? Fabricate the data.

Spil.ly’s engineers began creating their own labeled images to train the algorithms, by adapting techniques used to make movie and videogame graphics. Roughly a year later, the company has about 10 million images made by pasting digital humans it calls simulants into photos of real-world scenes. They look weird, but they work. Think of it as putting the artificial in artificial intelligence.
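The appeal of pasting rendered figures into real photos is that the label comes for free: the software that places the figure already knows which pixels belong to it. A minimal sketch of that idea follows. It is not Spil.ly's pipeline, which the company has not published here; the function names `composite` and `make_dataset`, the grayscale list-of-lists image format, and the 0/1 `alpha` cutout are all assumptions made for illustration.

```python
import random


def composite(background, sprite, alpha, top, left):
    """Paste `sprite` onto a copy of `background`; return (image, mask).

    background: 2D list of grayscale pixel values (hypothetical format)
    sprite:     2D list of pixel values for the rendered figure
    alpha:      2D list of 0/1 flags, 1 where the figure is opaque
    The returned mask is the training label: 1 wherever the figure landed.
    """
    image = [row[:] for row in background]          # leave the original photo untouched
    mask = [[0] * len(row) for row in background]   # label starts as "no person anywhere"
    for r, sprite_row in enumerate(sprite):
        for c, value in enumerate(sprite_row):
            y, x = top + r, left + c
            inside = 0 <= y < len(image) and 0 <= x < len(image[y])
            if inside and alpha[r][c]:
                image[y][x] = value                 # overwrite the photo pixel
                mask[y][x] = 1                      # and record the label at the same spot
    return image, mask


def make_dataset(backgrounds, sprite, alpha, count, seed=0):
    """Generate `count` labeled samples with the figure at random positions.

    Assumes the sprite is no larger than any background.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        bg = rng.choice(backgrounds)
        top = rng.randint(0, len(bg) - len(sprite))
        left = rng.randint(0, len(bg[0]) - len(sprite[0]))
        samples.append(composite(bg, sprite, alpha, top, left))
    return samples
```

A real system would render 3D figures with varied poses, lighting, and clothing rather than reuse a single flat cutout, but the bookkeeping is the same: every generated image arrives with a pixel-accurate label that nobody had to draw by hand.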

“The models we train on purely synthetic data are pretty much equivalent to models we train on actual data,” says Adam Schuster, an engineer at Spil.ly. In a demo, a virtual monkey appears on a table viewed through an iPhone’s camera, jumps to the ground, and squirts paint onto the clothes of a real person standing nearby.

Berlin startup Spil.ly used images like this to create augmented reality software that recognizes people in video. Figure by Viorama GmbH; Cat by Mike Estes

Fake it ‘til you make it has long been a motto of startups trying to survive in markets stalked by larger competitors. It has led some companies, like blood-test “innovator” Theranos, into trouble. In the world of machine learning, however, spoofing training data is becoming a legitimate strategy to jumpstart projects when cash or real training data is short. If data is the new oil, this is like brewing biodiesel in your backyard.

The phony data movement could accelerate the use of artificial intelligence in new areas of life and business. Machine-learning algorithms are inflexible compared to human intelligence, and applying them to a new problem generally requires new training data specific to that situation. Neuromation, a startup based in Tallinn, Estonia, is churning out images containing simulated pigs as part of work for a client that wants to use cameras to track the growth of livestock. Apple, Google, and Microsoft have all published research papers noting the convenience of using synthetic training data.