You’re an RNA polymerase enzyme floating in the nucleus of a cell. Your job is to transcribe a gene, but you are blind and it’s dark. Other machines guide you to a promoter, where your work begins, but how do you know which direction to read?

Two recent papers add more insight to the wondrous design of DNA transcription. Both papers recognize that protein-coding genes represent only a tiny part, about 3%, of the DNA in a cell. The looming question that the ENCODE project began to answer last year is how much of the remaining, intergenic DNA is functional. Since most of it is transcribed (a process that requires the expenditure of energy), the cell presumably performs all that work for a reason.

The first paper, published in Nature, examined how RNA polymerase (RNAP) knows which way to transcribe. The start of a gene is marked by a “promoter” region, but from that point RNAP can read in either direction, one along each strand, once the double helix is unwound. The authors found that two kinds of sequence signal, working against each other, regulate the reading of genes and non-genes.

One, the polyadenylation signal (PAS), controls whether a polyadenylation tail (a series of adenines, or “A” letters in the code) is added to the growing messenger RNA (mRNA). For genes, that tail prepares the mRNA for export from the nucleus. For intergenic transcripts, though, polyadenylation signals other enzymes to cleave the RNA into short transcripts.

The other, a recognition site for U1 snRNP (a small nuclear ribonucleoprotein), controls whether the RNA is cleaved after transcription, because bound U1 snRNP suppresses polyadenylation. When such sites are present, RNAP can proceed uninterrupted.

Gene regions are rich in U1 snRNP sites but low in PAS. The reverse is true for intergenic regions. The authors believe this is how RNAP avoids excessive transcription of non-coding DNA. The shortened, cleaved transcripts, like lincRNAs, stay in the nucleus to perform other functions. A report from MIT explains how these sequences offset each other:

The work demonstrates the important role of U1 snRNP in protecting mRNA as it is transcribed from genes and in preventing the cell from unnecessary copying of non-protein-coding DNA, says Gideon Dreyfuss, a professor of biochemistry and biophysics at the University of Pennsylvania School of Medicine. “They’ve identified a very likely mechanism for early termination of these upstream RNAs by depriving them of U1 snRNP suppression of polyadenylation and cleavage,” says Dreyfuss, who was not part of the research team.
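To make the tug-of-war between PAS and U1 snRNP concrete, here is a toy sketch in Python. It is not the authors’ analysis, only an illustration under stated assumptions: the PAS is represented by the canonical hexamer AATAAA, the U1 snRNP sites by a few 5′ splice-site-like hexamers chosen for illustration, and a U1 site is assumed to “protect” any PAS within a hypothetical window of 30 bases downstream. The function names and the window size are invented for this example.

```python
# Toy model of the "U1-PAS axis": a transcript is cut at the first PAS
# that is not protected by a nearby upstream U1 snRNP site.
# All motifs, names, and the window size are illustrative assumptions.

PAS_MOTIF = "AATAAA"                         # canonical polyadenylation hexamer
U1_MOTIFS = ("GGTAAG", "GGTGAG", "GTGAGT")   # assumed 5' splice-site-like U1 sites
PROTECTION_WINDOW = 30                       # hypothetical protective range, in bases


def find_sites(seq, motifs):
    """Return the sorted start positions of every occurrence of any motif."""
    positions = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
    return sorted(positions)


def transcript_end(seq, window=PROTECTION_WINDOW):
    """Return the position where transcription stops in this toy model.

    Transcription stops just after the first PAS that has no U1 site
    within `window` bases upstream; otherwise it runs to the end of seq.
    """
    u1_sites = find_sites(seq, U1_MOTIFS)
    for pas in find_sites(seq, (PAS_MOTIF,)):
        protected = any(0 <= pas - u1 <= window for u1 in u1_sites)
        if not protected:
            return pas + len(PAS_MOTIF)
    return len(seq)


if __name__ == "__main__":
    # A "gene-like" stretch: a U1 site sits shortly upstream of the PAS.
    gene_like = "CC" + "GGTAAG" + "C" * 10 + "AATAAA" + "C" * 10
    # An "intergenic-like" stretch: a PAS with no U1 site to suppress it.
    intergenic_like = "C" * 10 + "AATAAA" + "C" * 10
    print(transcript_end(gene_like), "of", len(gene_like))
    print(transcript_end(intergenic_like), "of", len(intergenic_like))
```

In this sketch the gene-like sequence is read to its end, while the intergenic-like sequence is cut short at its unprotected PAS, which is the general behavior the report describes.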

The authors of the Nature paper, though, remained undecided about the roles of these upstream, intergenic transcripts:

The function of all of this upstream noncoding RNA is still a subject of much investigation. “That transcriptional process could produce an RNA that has some function, or it could be a product of the nature of the biochemical reaction. This will be debated for a long time,” Sharp says. His lab is now exploring the relationship between this transcription process and the observation of large numbers of so-called long noncoding RNAs (lncRNAs). He plans to investigate the mechanisms that control the synthesis of such RNAs and try to determine their functions. (Emphasis added.)

In their paper, the authors toss in a Darwinian speculation. They propose that upstream antisense RNAs (uaRNAs), or RNAs transcribed in the “wrong” direction, might represent ancestors of protein-coding genes, and that lncRNAs are intermediate forms that gained or lost U1 snRNP and polyadenylation sequences. As support for the idea, they cite some differences in U1 snRNP site counts between orthologous regions in the human and mouse genomes.

This hypothesis, though, seems absurd for several reasons. For one, how or why would a non-functional transcript acquire a function? Before it had a function, why would it be transcribed and conserved? Natural selection cannot act to “store up” variations in hopes of finding a future function. Functional protein sequences, as William Dembski and Robert Marks have shown, represent a tiny fraction of sequence space. Imagining that a blind, unguided process would find one of them seems optimistic to the point of being ridiculous. The authors did not pursue their wishful thinking in detail, but rather dropped the subject after a brief mention, focusing primarily on the “U1-PAS axis” as having “wide use as a general mechanism to regulate transcription elongation in mammals.” Regulation by a mechanism is the language of design.

A second paper, in PLoS Genetics, is more confident that the intergenic transcripts are functional. Confirming what ENCODE found last year (that at least 85% of intergenic regions are transcribed and regulated), these authors believe functions are soon to be discovered in the forest of intergenic DNA. The Abstract says:

Known protein coding gene exons compose less than 3% of the human genome. The remaining 97% is largely uncharted territory, with only a small fraction characterized. The recent observation of transcription in this intergenic territory has stimulated debate about the extent of intergenic transcription and whether these intergenic RNAs are functional. Here we directly observed with a large set of RNA-seq data covering a wide array of human tissue types that the majority of the genome is indeed transcribed, corroborating recent observations by the ENCODE project. Furthermore, using de novo transcriptome assembly of this RNA-seq data, we found that intergenic regions encode far more long intergenic noncoding RNAs (lincRNAs) than previously described, helping to resolve the discrepancy between the vast amount of observed intergenic transcription and the limited number of previously known lincRNAs. In total, we identified tens of thousands of putative lincRNAs expressed at a minimum of one copy per cell, significantly expanding upon prior lincRNA annotation sets. These lincRNAs are specifically regulated and conserved rather than being the product of transcriptional noise. In addition, lincRNAs are strongly enriched for trait-associated SNPs suggesting a new mechanism by which intergenic trait-associated regions may function. These findings will enable the discovery and interrogation of novel intergenic functional elements.

The clear implication is that lincRNAs are functional, else why would the cell regulate them and ensure their conservation? The authors’ optimism continues in their Introduction:

A large fraction of the human genome consists of intergenic sequence. Once referred to as “junk DNA”, it is now clear that functional elements exist in intergenic regions. In fact, genome wide association studies have revealed that approximately half of all disease and trait-associated genomic regions are intergenic. While some of these regions may function solely as DNA elements, it is now known that intergenic regions can be transcribed, and a growing list of functional noncoding RNA genes within intergenic regions has emerged.

What do we know about lincRNA functions at this time?

Long intergenic noncoding RNAs (lincRNAs) are defined as intergenic (relative to current gene annotations) transcripts longer than 200 nucleotides in length that lack protein coding capacity. LincRNAs are known to perform myriad functions through diverse mechanisms ranging from the regulation of epigenetic modifications and gene expression to acting as scaffolds for protein signaling complexes.

Since these authors found significantly more lincRNAs in their survey than previously known, the implication is that more of those “myriad functions” are waiting to be found. (For more functions already discovered, see the lncRNA blog.) Here’s their concluding statement:

Owing to the extended breadth of tissues sampled and relaxed constraints on transcript structure, we find significantly more lincRNAs than all previous lincRNA annotation sets combined. Our analyses revealed that these lincRNAs display many features consistent with functionality, contrasting prior claims that intergenic transcription is primarily the product of transcriptional noise. In sum, our findings corroborate recent reports of pervasive transcription across the human genome and demonstrate that intergenic transcription results in the production of a large number of previously unknown lincRNAs. We provide this vastly expanded lincRNA annotation set as an important resource for the study of intergenic functional elements in human health and disease.

It’s clear that the search for function is driving this cutting-edge research. The search for function is exactly what intelligent-design science would recommend. Darwinians describe natural selection as a tinkerer, generating useless parts as well as structures cobbled together that might do something by chance, since there is no supervising designer to guide the process in a particular way. By contrast, intelligent design expects that what exists, as the product of mind, is there for a reason.

Remember how Darwinists call ID a “science stopper,” since it supposedly counsels just giving up and saying, “God did it”? The real science stopper is Darwinism. It focused only on protein-coding genes and dismissed everything else as “transcriptional noise” or “junk DNA” left behind by the blind tinkerer. Why waste time studying junk? Were it not for that attitude, our understanding of intergenic DNA function might have been much further along by now.