Mark Johnson wants to beat the United States Department of Agriculture at its own game: predicting yields of America's crops. The USDA puts boots on the ground, deploying hundreds of workers to survey thousands of farms a month ahead of the October harvest of corn, America's biggest crop. Johnson's startup, Descartes Labs, has just 20 employees, and they never leave the office in Los Alamos, New Mexico. Instead, Descartes relies on 4 petabytes of satellite imaging data and a machine learning algorithm to figure out how healthy the corn crop is from space.

Corn yield prediction is big business in the US. Billions of dollars are at stake along the ag supply chain each year as corn starts to come out of the ground in August. Grain elevator operators, ethanol producers, commodities traders, hedge funds, insurance companies, and even the farmers growing the corn will all look to the USDA's August crop report, due out August 12th, to try to understand how the supply side of the corn market will behave.

Descartes says it can consistently out-predict the USDA's corn estimates

Descartes, which launched in 2014, began releasing corn yield estimates ahead of the USDA's August crop report last year. Johnson says its model has consistently out-predicted the USDA's estimates at the national level at every point in the growing season. It beat the accuracy of the USDA's 2015 August predictions by a percentage point, according to numbers provided by Descartes. Now, Johnson says their algorithms have gotten even more precise, with a 2.5 percent average margin of error when run through historical backtests.
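To make the "average margin of error" figure concrete, here is a minimal sketch of one common way to score a backtest: the mean absolute percentage error between predicted and actual yields. The article does not say which metric Descartes uses, so the metric is an assumption, and the function name and the yield numbers below are hypothetical placeholders, not real USDA or Descartes data.

```python
def mean_abs_pct_error(predicted, actual):
    """Average of |predicted - actual| / actual, expressed in percent."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(p - a) / a * 100.0 for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# Hypothetical backtest: two past seasons, yields in arbitrary units.
# These values are made up purely to show the arithmetic.
predicted_yields = [102.0, 97.0]
actual_yields = [100.0, 100.0]

error = mean_abs_pct_error(predicted_yields, actual_yields)
print(f"average error: {error:.1f} percent")
```

A season that misses by 2 percent and another that misses by 3 percent average out to 2.5 percent, which is the kind of number a historical backtest would report.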

Johnson argues that the depth and frequency of data his company is able to analyze is a game-changer in crop prediction. "What's great about our techniques is that traditionally you have to talk to tons of farmers in the US to get a USDA-style number," Johnson tells The Verge. "With machine learning techniques, with us, we look at tons of pixels from satellites, and that tells us what's growing."

Shrinking sensors and cheap cloud computing

It's a familiar narrative at the moment — companies across industries are tapping into large data sets accumulated by the proliferation of shrinking sensors and processing them using cheap cloud computing services from companies like Google and Amazon. The Weather Company, for example, just announced its hyper-local weather forecaster called Deep Thunder, which uses machine learning to crunch through historical weather reports to predict future conditions.

But big data and machine learning are just one side of the equation with Descartes.

The rise in popularity of nanosatellites — small satellites roughly the size of a shoebox — over the past five years has opened up a broad realm of possibilities for Descartes and startups like it.

In the past, if a company wanted satellite imagery data, it would turn to US government-run satellite programs like Landsat or MODIS, which image the entire globe at 20- to 30-meter resolutions roughly once a week. Now new nanosatellite constellations, like the one run by satellite imaging startup Planet, are taking snapshots of the entire globe at 3- to 5-meter resolutions every day.
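A rough back-of-the-envelope sketch shows why that jump matters. It uses only the figures quoted above (30-meter pixels about once a week versus 3-meter pixels every day) and assumes square pixels and exactly seven days between the older satellites' passes; the function names are illustrative.

```python
def pixel_density_gain(coarse_m, fine_m):
    """How many fine pixels cover the ground area of one coarse pixel."""
    return (coarse_m / fine_m) ** 2


def sampling_gain(coarse_m, fine_m, coarse_revisit_days, fine_revisit_days):
    """Combined gain in observations per unit area per unit time."""
    spatial = pixel_density_gain(coarse_m, fine_m)
    temporal = coarse_revisit_days / fine_revisit_days
    return spatial * temporal


# Figures from the text: 30 m weekly versus 3 m daily.
spatial = pixel_density_gain(30, 3)
combined = sampling_gain(30, 3, coarse_revisit_days=7, fine_revisit_days=1)
print(f"pixels per unit area: about {spatial:.0f} times more")
print(f"observations per unit area per week: about {combined:.0f} times more")
```

Because pixel count scales with the square of resolution, a tenfold improvement in resolution means roughly a hundredfold increase in pixels, before counting the more frequent passes.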

The amount of imaging data being collected right now is enormous. "One way to think about it," says Johnson, "is that it took Landsat over 40 years to collect under a petabyte of data with 7 satellites. Planet will easily produce over a petabyte a year."

Companies have been increasingly using this data to analyze global trends. Satellite imagery provider DigitalGlobe helped Facebook create a map of 2 billion disconnected people across the world earlier this year. Orbital Insights tracks industrial development in China and monitors the parking lots of over 50 US retailers from space to gain insight into store traffic.

Johnson and his partners chose to track agriculture for a few reasons. First, food scarcity and global climate change are pressing issues. Second, years' worth of data sets already existed from images taken by Landsat and MODIS that could be used to train their machine learning models. Third, corn grows slowly, and farmers can benefit from observing it with extra-spectral bands like infrared, which both older and newer generations of imaging satellites record.

Measuring chlorophyll from space

Finally, and most importantly, Johnson says, it was a hard problem. "It's not like a satellite looking at all the Walmart parking lots and picking out the cars," says Johnson. "That's a problem of automation; it's something a human could do. What we do is an automation of what humans can't do."

Descartes uses spectral information not visible to the human eye to measure chlorophyll levels. "There are well-established methods for getting a proxy for crop health from chlorophyll levels," says Josh Alban, Planet's vice president of business development. Planet, which is a formal partner of Descartes, provides it with data to calculate yield estimates and helps it build custom products for corporate clients. Johnson wouldn't provide the names of any of his company's clients, saying that people on the supply side of the ag business are very tight-lipped about where they get their data. Sources familiar with the industry weren't surprised.
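One widely used proxy of this kind is the normalized difference vegetation index (NDVI), which compares near-infrared and red reflectance: healthy, chlorophyll-rich leaves reflect a lot of near-infrared light and absorb most red light. The article does not say which index Descartes or Planet actually relies on, so the sketch below is an illustration of the general idea, and the reflectance values in it are invented for the example.

```python
def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red), in the range -1 to 1."""
    total = nir + red
    if total == 0:
        return 0.0  # no signal in either band; avoid dividing by zero
    return (nir - red) / total


def mean_ndvi(pixels):
    """Average NDVI over a list of (nir, red) reflectance pairs."""
    values = [ndvi(nir, red) for nir, red in pixels]
    return sum(values) / len(values)


# Hypothetical reflectance pairs (nir, red) for two fields; not real data.
healthy_field = [(0.50, 0.08), (0.45, 0.05)]
stressed_field = [(0.30, 0.20), (0.25, 0.15)]

print(f"healthy field NDVI:  {mean_ndvi(healthy_field):.2f}")
print(f"stressed field NDVI: {mean_ndvi(stressed_field):.2f}")
```

A higher average suggests denser, greener vegetation. Turning an index like this into a yield forecast is a separate modeling step that the sketch does not attempt.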

Descartes says it analyzes satellite data of every single farm in the US on a daily basis (provided there is no cloud cover) and updates its corn yield prediction every two days. The USDA only updates its forecasts once a month. While the USDA provides national and statewide predictions, Descartes delivers both, plus county-level predictions, which the USDA only provides at the end of the season, when those numbers matter less.

Damien Lepoutre, founder of Geosys, a global crop and analytics company that has operated for nearly three decades, says the ability to deliver local estimates is paramount. "One thing I've seen over the past 35 years is the complexities of agriculture," Lepoutre said. "Agriculture is always local. You don't have the two same soils in different places. Every time is a bit different."

The hard part is building your neural network

Geosys also analyzes spectral satellite data, as do other large players in the industry. But Lepoutre thinks recent advances in technology, including the specific focus on machine learning, give startups a better chance at surviving than they had in the past.

It all comes down to a company's model, says Chris Curran, chief technologist at PricewaterhouseCoopers. "The hard part is the training of your algorithm, the building of your neural network to actually create value out of your data."

"With better satellites, better access to data, and more powerful algorithms, our models will continue to get better," Johnson says. And as Descartes accomplishes that, Johnson says they'll begin moving on to other crops (Descartes has already begun tracking soybeans) and other regions such as Brazil and Argentina, the Black Sea region, China, and the EU.

After that, Johnson says his company has ambitions to better understand the planet as a living organism. Descartes, he says, aims to "understand our natural resources, understanding how those resources move around, and then how we as humans change the planet."