



Is there treasure in the DNA’s so-called “junk” pile? Well, as the first half of a popular saying goes, money talks. The National Institutes of Health (NIH) just funded five centers to explore what the “dark matter genome” (the non-protein-coding part) is doing. Two of the centers will be at the University of California, San Francisco, which describes the new project:

“The Human Genome Project mapped the letters of the human genome, but it didn’t tell us anything about the grammar: where the punctuation is, where the starts and ends are,” said NIH Program Director Elise Feingold, PhD. “That’s what ENCODE is trying to do.” [Emphasis added.]

Grammar — there’s an ID-friendly analogy for you. Language students and their teachers don’t look for grammar and punctuation in gibberish. The statement implies purpose: functional information that has a beginning and end. Rules that organize information for communication. Genes without grammar are like words without sentences.

Launched in 2003 after the Human Genome Project found that only 2 percent of DNA codes for proteins, ENCODE was tasked “to find all the functional regions of the human genome, whether they form genes or not.” Initial results were spectacular, showing that at least 80 percent of DNA is transcribed. This made the #1 spot in our top ten evolution-related stories for 2012 an “easy pick,” as Casey Luskin wrote at the time, since it “buries” the “junk DNA” dogma — the idea that evolution left our genome littered with useless leftovers of mutation and natural selection.

Darwinians don’t give up easily, though, as we have often noted. Transcription is not proof of function, they argue. But why use costly resources to transcribe junk for no purpose? In the intervening years, more and more functions have come to light.

The initiative revealed that millions of these noncoding letter sequences perform essential regulatory actions, like turning genes on or off in different types of cells. However, while scientists have established that these regulatory sequences have important functions, they do not know what function each sequence performs, nor do they know which gene each one affects. That is because the sequences are often located far from their target genes — in some cases millions of letters away. What’s more, many of the sequences have different effects in different types of cells. The new grants from NHGRI [National Human Genome Research Institute] will allow the five new centers to work to define the functions and gene targets of these regulatory sequences.

We anticipate future spectacular discoveries will continue to come from ENCODE. And now researchers have new lights to shine: including faster DNA barcoding and the CRISPR-Cas9 gene-editing tool.

The project’s aim is for scientists to use the latest technology, such as genome editing, to gain insights into human biology that could one day lead to treatments for complex genetic diseases.

In addition to the two centers at UCSF, others will be set up at labs including Cornell, Stanford, and Lawrence Berkeley. The National Human Genome Research Institute, which will invest an initial outlay of $31.5 million for 2017, explains the goals:

At its core, ENCODE is about enabling the scientific community to make discoveries by using basic science approaches to understand genomes at the most fundamental level. Its catalog of genomic information can be used for a variety of research projects — for example, generating hypotheses about what goes wrong in specific diseases or understanding the processes that determine how the same genome sequence is used in different parts of the body to make cells with specialized functions. More than 1,600 scientific publications by the research community have used ENCODE data or tools.

Other Junk-Busting Research

Meanwhile, labs all over are finding treasure in the formerly dismissed junk. It has become something of a scientific sport these days to get the function ball downfield ahead of other labs.

Enhancer RNAs. Last month, Penn Medicine News threw this touchdown, “‘Mysterious’ Non-protein-coding RNAs Play Important Roles in Gene Expression.” Realizing that transcribing junk didn’t make sense, researchers at the University of Pennsylvania suspected that there must be more going on. They asked, Why do body cells turn out so different when they all have the same genome? Seeking function, they learned about the role of enhancer RNAs that regulate which genes get expressed in different types of cells.

DNA repeats. It looks so boring, repetitive DNA. It must be unimportant, right? Not so, found two researchers from Rockefeller University. Writing in PNAS, they report that three proteins carefully protect those repeats around centromeres, the locations on chromosomes where the spindle attaches during cell division. “Our study reveals the existence of a centromere-specific mechanism to organize the repetitive structure and prevent human centromeres from suffering illegitimate rearrangements.” Some such rearrangements could lead to cancer and aging. Doesn’t the converse, legitimate arrangements, imply complex specified information?

Disordered proteins. Most proteins fold into compact shapes. What are disordered proteins doing, flailing like air dancers in the wind? Canadian researchers publishing in PNAS found one that has a signaling function. It’s not alone; intrinsically disordered regions (IDRs) are “widespread” and have “diverse functions,” they say. Since they are maintained by “stabilizing selection,” they must be doing something important. Oddly, the function remains the same even when the underlying amino acid sequence changes. In one instance in yeast, they found evidence for “selection maintaining this quantitative molecular trait despite underlying genotypic divergence.” This could be a major paradigm change, since 40 percent of proteins are predicted to contain “disordered” regions. Now, the hunt is on to find other functions in “disorder” (synonymous with junk).

Accordion genomes. Protein-making is not the only function of DNA. Some of it, we know, provides structural support or anchor points. Researchers at the University of Utah are exploring another mystery: why genomes grow and shrink. By studying the genomes of birds and mammals (including flying mammals, the bats), they speculate that shedding DNA can streamline a bird or bat for flight, while other creatures are free to grow their supply. They liken the stretching and squeezing of genomes to an accordion mechanism. It would seem that extra scaffolding could be jettisoned without harm. Whatever is going on, it doesn’t match the old dogmas of neo-Darwinism. “Evolution is often thought of as a gradual remodeling of the genome, the genetic blueprints for building an organism,” this article begins. “In some instances it might be more appropriate to call it an overhaul.” Since overhauling a genome non-gradually would likely be catastrophic, we suspect scientists will find this process is under careful regulation. “I didn’t expect this at all,” the lead author remarked. “The dynamic nature of these genomes had remained hidden because of the remarkable balance between gain and loss.” Watch this space.

The research strategy of looking for function continues to prove fruitful. It’s an attitude that says, If it’s there, it’s probably doing something important. True, just because some things are designed doesn’t imply that everything is designed. But science was hindered for decades by the junk-DNA myth and the vestigial-organs myth, both of which are now being discarded. Science is playing catch-up after years of lazy thinking that reasoned, If it’s not doing something I understand right now, it must be junk. It’s time now to assume function, until the case is shown to be otherwise. As Paul Nelson says, “If something works, it’s not happening by accident.”

Photo credit: Metro St. Louis [CC BY 2.0], via Wikimedia Commons.