Dan Graur and colleagues have just published a damning criticism of a remarkable paper on the human genome that summarized a vast research project (perhaps costing more than $250 million) and was co-authored by about 476 scientists. The September 2012 paper, from the ENCODE (ENCyclopedia Of DNA Elements) project, has a succinct abstract claiming function for 80% of the 3.2 billion nucleotides in the human genome. (Another 29 publications from subsets of the authors appeared at that time, and others continue to be published.)

I was sceptical of the nature of the analysis, although at the time too ill to do anything about it. My lab focuses on repetitive DNA evolution, and we have the advantage of working with several species and, for each species, its near and more distant relatives. I could not understand how the ENCODE paper could make no mention of tandem repeats or tandemly arrayed DNA sequences, of transposons, LINEs, SINEs, Alu elements or rDNA repeats, make only a single mention of retrotransposons (in a section comparing humans and other primates), and include no discussion of other key and abundant DNA features of the genome, let alone the repetitive DNA associated with structural chromosome components, centromeres or telomeres. Together, these classes of repetitive element represent something like half or more of the DNA in the genomes of all organisms with genomes larger than the tiniest (c. <200 million bp, Mbp), so it is hard to understand how they can go unmentioned in the main ENCODE project report. Furthermore, many of these elements do not follow the evolutionary pathways that are conventional for most genes and regulatory sequences, but change in copy number by unequal crossing over, slippage replication and other mechanisms, while also showing extensive within- and between-chromosome homogenization of sequence.

The ENCODE Project Consortium 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. http://www.nature.com/nature/journal/v489/n7414/pdf/nature11247.pdf

Now, Dan Graur, four colleagues from the University of Houston (http://nsm.uh.edu/~dgraur/) and Eran Elhaik from Johns Hopkins University (http://eelhaik.aravindachakravartilab.org/) have published a report pointing out substantial fallacies in the arguments and definitions used in the ENCODE paper: redefining words (“using the wrong definition wrongly”), logical fallacies including “affirming the consequent”, treating A being a subset of B as meaning that B has the properties of A, misuse of population genetics and evolutionary concepts, and other methodological errors. They also note the numerous press conferences and public relations activities associated with the publication of the papers, and the fact that ENCODE says they “assign biochemical functions for 80% of the genome”, while investigators from the project quote many other figures ranging from 20%, through 40% and upwards. (Graur et al. note: “Unfortunately, neither 80% nor 20% are based on actual evidence”!)

Graur, D., Zheng, Y., Price, N., Azevedo, R. B. R., Zufall, R. A., & Elhaik, E. (2013). On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution. Preprint: http://dx.doi.org/10.1093/gbe/evt028

Abstract: A recent slew of ENCODE Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is under 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 − 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions, or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly (1) by employing the seldom used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as “affirming the consequent,” (3) by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.

The ENCODE Project Consortium 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. http://www.nature.com/nature/journal/v489/n7414/pdf/nature11247.pdf

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
