A collection of new studies on the genomes of two model organisms has moved the frontiers of biology forward, and hints at methods that may someday make real the long-promised, as-yet-unfulfilled genomic revolution.

Published in Nature and Science, the studies go far beyond the level of genes that code for proteins, which represent just a small fraction of all genes and an even smaller fraction of all DNA in the genome.

Once thought to contain the blueprint of life, protein-coding genes were just the most visible ink in a parts list. The new studies both expand that list and begin to show how the parts are arranged – and how they interact.

"It's become very clear that DNA sequences are just a building block. They don't explain higher-order complexity," said Peter Park, a Harvard University bioinformaticist and co-author of one of the Nature studies. "People are sequencing all these genomes, but it doesn't actually tell us about the activities of the cell."

Park is a contributor to modENCODE, short for the model organism ENCyclopedia of DNA Elements, a massive international collaboration of dozens of institutions and hundreds of researchers. They study an alphabet soup of transcription factors, messengers, regulators and other types of DNA that interact with protein-coding genes to sustain the processes of life.

It's an effort that few people thought necessary a decade ago, when the Human Genome Project's near-completion was marked by a White House ceremony where President Clinton announced that "it will revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases," and that "humankind is on the verge of gaining immense, new power to heal."

That may very well still happen, but not on the timetable that many scientists expected. With some exceptions, such as breast-cancer-susceptibility mutations and single-gene conditions like Huntington's disease and Marfan syndrome, identifiable genetic variation has done relatively little to explain disease or development. Promising pathways and mechanisms have been flagged and are now being explored, but understanding is slow in coming.

Over the past two years, a series of high-profile genome-wide association studies (comparisons of variation at genomic "hotspots" in thousands of people) suggested new pathways. But they didn't provide anticipated explanations for disease, and the limitations of standard genomics entered the scientific mainstream. Discussions of "missing heritability," or the roughly 95 percent of disease risk that's heritable to the naked eye but can't be tagged in a sequencer, appeared in the New England Journal of Medicine and Nature.

All this represented not a failure, but a dawning realization of just how extraordinarily complicated each genome is. As the process of learning builds on the Human Genome Project's early steps, researchers are taking fine-grained looks at each genome's full DNA and chemical components, then trying to understand how all these work together at different scales, from molecules to cells to whole organisms.

"The goal of modENCODE is to identify all the functional elements in the genome, and to understand what the genome is doing, which is the next step beyond knowing the sequence," said Brenton Graveley, a University of Connecticut development biologist.

In one of the Nature papers, Graveley and dozens of other researchers used new DNA-sequencing techniques to take a base-by-base look at the fruit fly genome, hoping to identify pieces missed in earlier studies. (He compared earlier examinations to "going into a grocery store and not thinking bananas were a fruit, because the only fruit you know are apples.")

They identified 2,000 previously unknown genes, which now account for one-eighth of the fruit fly's genome. Beyond that, they identified more than 100,000 new elements, or molecules that aren't genes but may still have function in the genome. In fruit flies, about 40 percent of the genome fits this description. In humans, it's closer to two-thirds.

The second Nature study looked at non-DNA chemical "information" on the genome, which is made from chromatin: DNA wrapped around proteins called histones, and combined with still more proteins, all of which affect how the DNA works.

This approach is known from epigenetics (epi means outside) but the new examination was unprecedentedly thorough, looking at dozens of epigenetic factors, at every single DNA base. The resulting "chromatin landscape" revealed regions that once seemed dead, but now appear involved in gene regulation. It's also just a beginning.

"At each location on the sequence, we can measure all these different attributes of chromatin. There are hundreds of attributes, and we only now know what a couple of dozen do," said Park. "How these marks translate into gene regulation is important. Right now we just see correlations. We don't necessarily understand the mechanisms behind this."

Such mechanisms, and how genetic elements and regulatory layers interact as cells function and organisms develop, are the province of the two Science papers. These provide network-level analyses, or "wiring diagrams," of the fruit fly and roundworm, said Yale University bioinformaticist Mark Gerstein, co-author of the roundworm paper.

Gerstein's specialty is network structure. In other research he's compared the characteristics of gene networks between organisms, and even between bacteria and computer operating systems. That work has hinted at the importance of network structure to producing wildly different organisms from common genetic components. (Humans and mice famously share almost the same set of genes.)

"Previously, people had looked at transcription factor networks in E. coli and yeast, but nobody had ever looked at this scale of network in an animal," said Gerstein. "You can start to see patterns: a microRNA that regulates a transcription factor, transcription factor that regulates microRNA, a feedback loop. We observe many of these."

In a commentary accompanying the Science papers, University of Edinburgh geneticist Mark Blaxter likened modENCODE to the Large Hadron Collider, investigating the nature of the genome's "dark matter."

"It is not currently possible to compute an organism from its genome," he wrote, but the modENCODE work will "bring this goal closer."

Despite the volume of the studies, joined by 17 more studies released in tandem in the journal Genome Research, the modENCODE work is just beginning. "We're looking at a vast amount of data. We're just scratching the surface," said Park. Future studies will look in greater detail at different tissue types and stages of development.

The modENCODE work is also considered a warm-up for a similar project in humans, called ENCODE. It should generate comparable findings in the next two years.

"There remains much to be discovered to be discovered even about organisms that are as exhaustively studied as the fruit fly," said Graveley. "In organisms like humans, there are undoubtedly many, many more mysteries to be uncovered."

Top image: A visualization of physical chromosome arrangement (left) and histone modification readings (right) at a given DNA base location (the green dot).

Peter Park/Harvard University.

Citations: "Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project." By Mark B. Gerstein, Robert H. Waterston et al. Science, Vol. 330 No. 6012, Dec. 24, 2010.

"Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE." The modENCODE Consortium, Sushmita Roy, Manolis Kelis et al. Science, Vol. 330 No. 6012, Dec. 24, 2010.

"Revealing the Dark Matter of the Genome." By Mark Blaxter. Science, Vol. 330 No. 6012, Dec. 24, 2010.

"Comprehensive analysis of the chromatin landscape in Drosophila melanogaster." By Peter Kharchenko, Peter Park et al. Nature, Vol. 468 No. 7327, Dec. 23, 2010.

"The developmental transcriptome of Drosophila melanogaster." By Brenton Gravely, Susan Celniker et al. Nature, Vol. 468 No. 7327, Dec. 23, 2010.