The 20th century broke open both the atom and the human genome. Physics deftly imposed mathematical order on the upwelling of particles. Now, in the 21st century, systems biology aims to fit equations to living matter, creating mathematical models that promise new insight into disease and cures. But, after a decade of effort and growth in computing power, models of cells and organs remain crude. Researchers are retreating from complexity towards simpler systems. And, perversely, ever-expanding data are making models more complicated rather than more accurate. To an extent, systems biology, rather than climbing upwards to sparkling mathematical vistas, is stuck in a mire of its own deepening details.

Synthetic biology tries to sidestep systems biology's untidiness by focusing on individual parts. It aims to build a tool set for engineering organisms unconstrained by biology as we know it, making the discipline more like software programming. But instead of modularity, synthetic biology often encounters messiness: what a particular “part” actually does depends on the rest of the system, so synthetic biology rediscovers the complexity it hoped to escape.

A dream deferred

Systems biology takes over where the sequencing of the human genome left off. Reading DNA, we once believed, would disclose the genes underlying disease. While hundreds of genes have been implicated in cancer, for instance, the idea that genes directly cause complex diseases like cancer, diabetes, or atherosclerosis is too simple. Systems biology seeks to understand how biology really works by looking beyond genes, aspiring to a universal understanding of biological organization and to models that precisely predict biological events—like disease.

Focused use of systems biology has been very successful, drawing information out of the huge bodies of data produced by DNA chips and proteomic gels. But our growing genomic knowledge and powerful computers have tempted a number of researchers to go beyond analyzing experimental data and attempt to build a realistic cell entirely in silico.

These efforts began with a cell related to those that make up humans, but much simpler: yeast. The first yeast model appeared in 2001, created by the Institute for Systems Biology. In 2002, the Alpha Project began its trek toward omega, with yeast also as the first step. 2004 saw the unveiling of another model, YeastNet, which researchers expected would become more and more accurate "simply by continued addition of functional genomics data..."

Genomics data cooperated, propagating with exponential fervor. Data on proteins obligingly followed a similar path, according to a Moore’s law for proteomics. Consequently, a 2007 update to YeastNet encompassed 82 percent of yeast genes and more than 95 percent of the yeast proteome. But it showed no real gain in accuracy.

Edward Marcotte, who leads the effort at the University of Texas, said in March that his team had updated YeastNet. He reported seeing "a nice increase in predictive power over v. 2," but he has "yet to write it up or release it," which suggests that, even with improvements, the results might not be so prepossessing. YeastNet’s publication arc has tailed down, beginning from the commanding heights of Science, descending in version two to the egalitarian plain of PLoS ONE, and now apparently ending in a file drawer at UT Austin. Similar difficulties beset the Alpha Project, which repeatedly scaled back its ambitions until it winked out of existence in 2008.

Premonitions of yeast’s unruliness came sooner in some quarters. In 2004, the Institute for Systems Biology turned to a much simpler species it believed to be "better suited for the initial phase" of systems biology. The new organism, an archaeon called H. salinarum, has just 2,400 genes, compared with over 6,000 in yeast. In 2009, a group at the European Molecular Biology Laboratory switched to the even smaller M. pneumoniae, which has a mere 687 genes. As "-omics" data grew, the complexity of organisms modeled by systems biology dropped.

Retreat from complexity: data up, model organism complexity down. Adapted from He X, Zhang J., "On the Growth of Scientific Knowledge: Yeast Biology as a Case Study", DOI: 10.1371/journal.pcbi.1000320

Running ≠ Hiding

But retreat from complexity is not escape. Looking at M. pneumoniae, scientists concluded that "there is no such a thing as a 'simple' bacterium." The tiny organism's modest genetic machinery proved baffling, producing a mismatch between what the genomic model predicted and the proteomic reality. Nor did M. pneumoniae, mum about its own inner workings, disclose any fundamental principles of biology, which researchers said "remain elusive."

The outpouring of data has failed to coalesce into a solid theoretical foundation from which to build. As Anne-Claude Gavin, senior author on the M. pneumoniae paper, said: "I believe what we still miss in the majority of the cases is a structuring frame on which to integrate or superimpose the large datasets gathered..."

Of course the search continues, but each iteration adds to the difficulties. For example, to better understand the swirling protein complexes seen through the window of proteomics, we could zoom in and supply any number of biophysical details, such as how quickly interactions take place. But this adds layers to the model. And more complicated models are less likely to work or to reveal simple, elegant principles of biology.

Yet the cycle almost necessarily continues. As two researchers put it in the pages of Nature: "The inescapable reality in systems biology is that models will continue to grow in size, complexity, and scope."

A growth industry

There’s not really an upper limit on model size, either. Because living organisms progress through time, which can be sliced arbitrarily thin, data space is effectively infinite. Time presents serious difficulties for systems biology. To create a virtual physiologic human, the discipline would need to span 17 or more orders of magnitude, from the nanoseconds of molecular motion all the way up to the years and decades of human life spans.

The range of spatial scales is also daunting: from nanometers to meters, at least nine orders of magnitude. And as we’ve studied this enormous biological time-space in more detail, it has yielded a profusion of discoveries, such as the many new kinds of RNA. Estimates for the total number of molecular species in a human cell range as high as one million.

"There are so many unknowns that it seems we are condemned to spend many years collecting data before we can even start to think about modelling what is going on," as Mike Williamson at the University of Sheffield put it. In the meantime, concluded Williamson, "it is only reasonable to expect that the model can predict something that it was designed to predict."

That’s a rather large concession. Lee Hood, whose lab at Caltech developed the automated DNA sequencer, once envisioned predicting the behavior of a system "given any perturbation. Not just the ones you’ve seen before, which we’re really good at, right? But any perturbation." That was in 2003, shortly after Hood founded the Institute for Systems Biology.