Big Data’s Fatal Flaw, and How to Fix It

Algorithms are what they eat.

Big data is the buzzword du jour. Sophisticated analytics yield more and deeper insights across countless industries, and proselytizers promise unparalleled, software-enabled acumen. But they’re missing one crucial detail. Algorithms are what they eat.

Netflix isn’t just another movie studio. They’re planning to spend nearly $5B on original programming this year, and they’re aggressively taking on media incumbents. But Netflix has more than a war chest; they have a secret weapon. Because the entire customer experience happens on their platform, they track every click, view, rating, and action that users take. They correlate those data against every factor imaginable: genre, actors, release date, format, social validation, time, location, and more. With this cheat sheet in their back pockets, they can outspend and outsmart the competition at the same time.

Peter Thiel cofounded Palantir to give similar insights to a very different type of customer. In-Q-Tel, the venture capital arm of the Central Intelligence Agency, was an early investor. Now, Palantir’s client list includes many US and international government agencies and Fortune 500 companies. Palantir aggregates customers’ structured and unstructured datasets, and then applies data science and statistical techniques to try to find signal in the noise. Netflix might appreciate that the company’s name was inspired by the “seeing stones” in J.R.R. Tolkien’s The Lord of the Rings.

But the big data crystal ball has a potentially fatal flaw. Netflix can rest easy because their data is digital by definition: all the activity they’re tracking happens within a contained online system. But the numbers that Palantir crunches often come from a different source: the real world. And that means they’re intermediated by something, usually a person or a sensor. If the reporting is inaccurate or incomplete, no amount of data science magic can restore it. Garbage in, garbage out. Even worse, it’s not always apparent when there’s a problem, so false findings get taken at face value.
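To make the point concrete, here is a minimal, entirely hypothetical sketch of garbage in, garbage out. A miscalibrated field sensor adds a small systematic bias to every reading; averaging more data washes out the random noise, but the bias survives intact, and nothing in the dataset itself reveals it. (All values and the bias here are invented for illustration.)

```python
import random

random.seed(0)

# Hypothetical ground truth: a quantity in the physical world (say, meters).
true_values = [100.0 + random.gauss(0, 2) for _ in range(1000)]

# A miscalibrated sensor systematically over-reads by a fixed amount
# on top of its ordinary random noise.
BIAS = 5.0
readings = [v + BIAS + random.gauss(0, 2) for v in true_values]

# Downstream analytics can average away the random noise...
estimate = sum(readings) / len(readings)
truth = sum(true_values) / len(true_values)

# ...but the systematic error passes straight through, and without
# independent ground truth there is no way to detect it in the data.
print(f"true mean:      {truth:.1f}")
print(f"estimated mean: {estimate:.1f}")
print(f"residual error: {estimate - truth:.1f}")
```

However sophisticated the statistics applied to `readings`, the residual error stays near the sensor bias. Only access to the source, the ground truth itself, exposes the problem.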

But a new kind of company is trying to bridge the gap. GroundMetrics, a survey and monitoring company based in San Diego, is a great example. GroundMetrics has developed an electromagnetic sensor system based on fundamentally new physics. Almost like an MRI for geology, GroundMetrics images the subsurface to identify oil, water, minerals, and other deposits for a variety of clients, including the world’s largest energy companies. Because GroundMetrics vertically integrates everything from building sensors to writing the complex geophysical code required for backend analytics, they can optimize for what their clients actually care about: ground truth. If something goes wrong, they can work from the digital universe all the way back to the reservoir. By making sensors and software, they control data capture as well as analytics. That’s why I invested in them.

Machine learning and other techniques have empowered companies to pursue big data as a kind of Holy Grail. But even the most sophisticated statistical tools rely on raw material. If we want to make good on the promise of analytics, we must go to the source. Only companies that understand the real world as well as the software they use to analyze it can give us all what we really want: results.