Dan Zigmond | Vice president of data | Hampton Creek
Louise Pomeroy

Modern data science started at places like Google, Amazon, LinkedIn, and Facebook. Yes, mathematicians have practiced statistical analysis since the early 20th century, looking for new truths in collections of data. But in recent years, the Googles and the Amazons have taken things to a level no one previously imagined. In running global search engines, shopping sites, social networks, and video services, these companies collect unprecedented amounts of data—almost inadvertently—and over the past several years they’ve developed new software, algorithms, and techniques capable of rapidly analyzing all this digital information.

When I was at Google, that’s what I did. I analyzed data. At YouTube, this helped us discover that the site would generate the most money if we showed ads that users could skip after a few seconds—a basic formula still used to make billions of dollars. Now, at a San Francisco startup called Hampton Creek, I’m applying the same techniques in an effort to create new kinds of food. Yes, food.

Our project is part of a new push in the world of data science. Drawing on the basic ideas that helped bootstrap fundamental internet services like YouTube and Facebook, Uber is using data to optimize transportation. Airbnb is using it to streamline lodging. The big pharmaceutical companies are using it to find new drugs. And others believe the latest techniques can help diagnose disease. I believe it can change food.

At Hampton Creek, we’ve already built a reasonable facsimile of the chicken egg, using proteins from the Canadian yellow pea and an American variety of sorghum, and with this egg we’ve made a better mayo and a better cookie. The idea is to create new food sources for an expanding global population—sources that are cheaper, safer, and healthier than what we have today. That may appear to have nothing in common with YouTube and Google Maps, but the same data science applies.

Together with a small team of other scientists, I’m building a massive database of all known plant proteins—one day, it could span 18 billion of them. Thanks to many of the same software tools and techniques I had access to at Google, we can model the creation of new foods. Our biologists have already cataloged and analyzed about 4,000 plant proteins, running about 30 biological tests on each of them.

By expanding this catalog of proteins and folding in data describing how some of them interact, we can predict how others will interact, identify combinations likely to produce enjoyable foods, and pinpoint what will produce the right tastes, textures, and colors. Then we can focus our lab efforts accordingly.

Others have worked for years to create new foods from plants. But with help from the data, we aim to do this in a far more exhaustive way, examining every viable protein combination on Earth. Eighteen billion proteins is an enormous number to go through, but we may not have to examine each one. In analyzing the data, we can learn what types of combinations work and what types don’t. Data science can help hone our data science.

Our big data project is still in the early stages, but it’s already paying dividends. It’s leading us to new species of plants and new combinations of proteins. Data science may have started with Google and Amazon. But it’s moving everywhere.
