Inside a squat building on San Francisco's 10th Street, packed into a space that looks a lot like a high school chem lab, Hampton Creek is redesigning the food you eat. Mixing and matching proteins found in the world's plants, the tiny startup already has created a reasonable facsimile of the chicken egg—an imitation of the morning staple that's significantly cheaper, safer, and possibly healthier than the real thing—and now it's working to overhaul other foods in much the same way.

At the back of the room, spread across the long stainless steel science desks, among the centrifuges, scales, bottles, and beakers, biochemists systematically extract proteins from plants like the Canadian yellow pea to analyze their makeup and behavior. Beside them, food scientists combine these proteins in new ways, mixing them with other natural substances to create something that looks, feels, and tastes like the foods we know today. In the next row over, chefs—including Chris Jones and Ben Roche, recruited from Moto, Chicago's celebrated molecular-gastronomy eatery—strive to turn these creations into something you could serve to your family: an omelet or some french toast or a chocolate chip cookie.

But if you walk up a set of stairs at the front of the building, ducking under a sign displaying a high-minded quote from Buckminster Fuller on the nature of change, you'll find a different kind of scientist. There, seated at a row of desktop computers with flat-panel displays, a team of recently hired mathematicians is building an online database that one day could catalog the behavior of practically every plant protein on earth—a collection of digital information that could allow Hampton Creek to model the creation of new foods using computer software.

Led by Dan Zigmond—who previously served as chief data scientist for YouTube, then Google Maps—this ambitious project aims to accelerate the work of all the biochemists, food scientists, and chefs on the first floor, providing a computer-generated shortcut to what Hampton Creek sees as the future of food. "We're looking at the whole process," Zigmond says of his data team, "trying to figure out what it all means and make better predictions about what is going to happen next."

Dan Zigmond. Josh Valcarcel/WIRED

The project highlights a movement, spreading through many industries, that seeks to supercharge research and development using the kind of data analysis and manipulation pioneered in the world of computer science, particularly at places like Google and Facebook. Several projects already are using such techniques to feed the development of new industrial materials and medicines. Others hope the latest data analytics and machine learning techniques can help diagnose disease. "This kind of approach is going to allow a whole new type of scientific experimentation," says Jeremy Howard, who as the president of Kaggle once oversaw the leading online community of data scientists and is now applying tricks of the data trade to healthcare as the founder of Enlitic.

Zigmond's project is the first major effort to apply "big data" to the development of food, and though it's only just getting started—with some experts questioning how effective it will be—it could spur additional research in the field. The company may license its database to others, and Hampton Creek founder and CEO Josh Tetrick says it may even open source the data, so to speak, freely sharing it with everyone. "We'll see," says Tetrick, a former college football linebacker who founded Hampton Creek after working on economic and social campaigns in Liberia and Kenya. "That would be in line with who we are as a company."

The 18 Billion Protein Problem

Backed by funding from Microsoft founder Bill Gates and Li Ka-Shing, perhaps the richest man in Asia, Hampton Creek isn't out to genetically modify your food. Instead, the 63-person startup wants to reconstruct it using what nature already has given us. "There are other companies using synthetic biology and genetic engineering to create whole new food ingredients," Zigmond says. "We are exploring the vast world of plants to discover natural compounds that can revolutionize food."

Like Zigmond, Tetrick believes this kind of work can reinvent our food supply chain and ultimately make us healthier. He was inspired to found the company in part because his father ate so poorly. "Eggs are just one place to start," he says. "There's nothing wrong with the chicken egg necessarily. It's the system that surrounds most of them. They use a lot of land, a lot of water, and they promote problems like avian flu." The aim is to replace such a system with something that not only promotes good health, but is also less complicated and less expensive.

That begins by examining the behavior of plant proteins at the molecular level and how they interact to create not only certain tastes but textures and behaviors—whether they can duplicate, say, how an egg behaves when you whip it or how it browns when cooked in a pan. As Gregory Ziegler, a professor of food science at Penn State University, notes, others have worked on somewhat similar efforts for years. But Hampton Creek is taking a far more expansive approach. "We're trying to be more comprehensive, more rigorous, more systematic," Zigmond says. "No one has used data in this way before."

Inside the Hampton Creek lab, a scientist screens plant proteins. Josh Valcarcel/WIRED

In creating its egg recipe—already used in mayonnaise and cookie dough the company sells through major outlets like Whole Foods—Hampton Creek scientists have cataloged and closely analyzed about 4,000 plant proteins, running about 30 assays (a kind of biochemical test) to measure things like molecular weight, pH, and how they dissolve in water. They've also recorded what happens when many of these proteins are combined, mixed together "like you were baking a cake." That groundwork is what allowed the company to settle on its egg recipe. But now, Zigmond and his team can use this data to explore ways of reproducing other foods. Because they've already recorded how certain proteins behave and interact, they can model, with software, what would happen with new combinations of proteins.

"We can make predictions," Zigmond explains. "These predictions might not be perfect, but they can lead us in the right direction." They could provide, say, a shortlist of 100 compounds that seem suited to redesigning how we make cakes. "It may not be that all 100 will work, but it's a lot easier to go back and look at those 100 rather than all 4,000." Then, as Zigmond and team expand their database, they can expand the scope of these models. As more and more proteins are added to the database, their analysis can become more accurate.

The team could potentially expand the database to all known plant proteins (there are about 18 billion). But as Jason Ernst, who runs a computational biology lab at UCLA, explains, that's an enormously expensive proposition, and Zigmond agrees. So his data scientists will look for ways of homing in on subsets of this vast molecular universe. "Our hope is that we can guide our search, so that we don't have to look at every single protein," Zigmond says. "That's really the job of my team in all this: to make the laboratory more efficient by focusing our attention where it's most likely to yield results."

Artificial Intelligence Does Food

Initially, Zigmond and his team will model protein interactions on individual machines, using tools like the R programming language (a common means of crunching data) and machine learning algorithms much like those that recommend products on Amazon.com. As the database expands, they plan to build much larger and more complex models that run across enormous clusters of computer servers, using the sort of sweeping data-analysis software employed by the likes of Google. "Even as we start to get into the tens and hundreds of thousands and millions of proteins," Zigmond says, "it starts to be more than you can handle with traditional database techniques."
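The Amazon comparison can be sketched directly, again with invented data. An item-to-item recommender suggests products whose purchase patterns resemble ones you already bought; the same nearest-neighbor logic, here a simple cosine similarity over hypothetical assay vectors, can suggest proteins that resemble one already known to work.

```python
import math

# Invented assay vectors for a few proteins; each number stands in for a
# normalized test result (solubility, gelling, and so on).
vectors = {
    "known_good":  [0.9, 0.2, 0.7, 0.1],
    "candidate_a": [0.8, 0.3, 0.6, 0.2],
    "candidate_b": [0.1, 0.9, 0.2, 0.8],
}

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar(name, k=1):
    """'Recommend' the k proteins whose vectors are closest to the named one."""
    others = [n for n in vectors if n != name]
    return sorted(others, key=lambda n: cosine(vectors[name], vectors[n]),
                  reverse=True)[:k]

print(most_similar("known_good"))
```

This is the single-machine version; the point of the cluster-scale systems Zigmond mentions is to run comparisons like this across millions of proteins rather than three.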

In particular, Zigmond is exploring the use of deep learning, a form of artificial intelligence that goes beyond ordinary machine learning. Google is using deep learning to drive the speech recognition system in Android phones. Microsoft is using it to translate Skype calls from one language to another. Zigmond believes it can help model the creation of new foods.
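At its core, "deep" learning means stacking layers of simple weighted sums with nonlinearities between them. A toy forward pass in plain Python, with invented weights and an invented prediction target, shows the structure that, at vastly larger scale, underlies the systems described above.

```python
# Toy two-layer network mapping three assay values to one predicted score
# (say, "how egg-like does it whip?"). Weights are invented for the example;
# a real model would learn them from data.

def relu(xs):
    """Nonlinearity: pass positive values through, clamp negatives to zero."""
    return [max(0.0, v) for v in xs]

def dense(x, weights, bias):
    """One fully connected layer: weighted sums of the inputs plus a bias."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.2]

def predict(assays):
    hidden = relu(dense(assays, W1, b1))
    return dense(hidden, W2, b2)[0]

print(round(predict([0.9, 0.2, 0.7]), 3))
```

Deep networks used in production stack many such layers and learn the weights automatically; this sketch only illustrates the layered structure.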

Hampton Creek's first product, Just Mayo, is now available at Whole Foods. Josh Valcarcel/WIRED

With his startup Enlitic, Jeremy Howard is doing something similar, using deep learning as a way of diagnosing disease. The promise of this technology is that it could be applied to a wide range of other tasks, both on the internet and off. Howard, as steeped in the ways of modern data science as anyone, calls the Hampton Creek project "a very big deal," seeing it as another step in the continued evolution of the big data movement.

But Ziegler, the Penn State food scientist, is quick to say the difficulties facing this project shouldn't be underestimated. Trying to physically redesign food is hard enough—when Roche cooked us an omelet at Hampton Creek, it came close to the feel and taste of a real egg without actually matching it—and modeling this kind of thing with software may be even harder. "The functionality of proteins depends not just on their chemical composition but also their physical structure, and I'm not sure that we know enough about what the desired compositions and structures are," says Ziegler. "I don't know that we're quite at the stage of being able to do the same level of computational prediction you can do for electronic materials or other simpler materials." It may even be easier, he says, to model medicines and predict their behavior.

Zigmond agrees, up to a point. "It's certainly harder in some ways, but it's certainly easier in others," he says. "With pharmaceuticals, you have to worry about interaction with all these different systems in the body and side effects. But with food, you're using this stuff in small enough doses that you're not expecting it to have effects on the body and, in general, it doesn't. We don't have to simulate the heart and the brain and all different kinds of cells."

In the end, he acknowledges that the challenges are enormous. But that's why he's doing it. It's an opportunity to significantly change not only the way we use data, but how we manage the world's food supply and what we ultimately put into our bodies. As that Fuller quote says, at the bottom of the stairs: "You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete." What goes unsaid is that building a new model is almost as difficult.