Data Filters

There is a wealth of data that can help one understand the ancient earth and primitive biology. As noted above, numerous OOL models are available to which one can attempt to fit the data. However, it is important that data come before models; affection for a model should not cause data to be disregarded or cherry-picked. Using phylogenetic and biochemical reasoning, Gogarten and others have concluded that LUCA was a prokaryote-like terrestrial life form that used DNA as genetic material, RNA as message, and ribosomes for coded synthesis of proteins from 20 amino acids, and that it possessed membranes and chemiosmotic coupling (Zhaxybayeva and Gogarten 2004; Peretó et al. 2004; Gogarten and Deamer 2016). By contrast, Martin uses phylogenetic reasoning to conclude that LUCA depended on the geochemistry of hydrothermal vents and was only “half-alive” (Weiss et al. 2016). The contradictions between these conclusions suggest that approximations, computational shortcuts, model-dependent inferences, and biases interfere with the interpretation of data. Our goal here is to enumerate and define types of data that we believe are useful for understanding and evaluating OOL models. We have developed several filters by which we weigh different types of data.

The most significant data are derived from authentic measurements made on actual biological or abiotic systems. The most important data have authenticity that does not depend on speculative models and significance that does not depend on indirect inference. Useful data are found in the rich abiotic chemical inventory of carbonaceous chondrite meteorites (Martins et al. 2008; Schmitt-Kopplin et al. 2014) and in the comparison of that inventory with the comparatively uninteresting inventory of hydrothermal vents (McCollom and Seewald 2007; Proskurowski et al. 2008; Lang et al. 2010; McDermott et al. 2015). In fact, the seminal hydrothermal vent model of the OOL (Corliss et al. 1981) has been directly falsified by demonstrations that vents do not produce the molecules proposed in the model. Other examples of useful data are the sequences and structures of ribosomal proteins and ribosomal RNAs (Woese and Fox 1977; Ramakrishnan 2002; Steitz 2008), and the statistics of their similarity and distribution over phylogeny (Fournier and Gogarten 2010).

Data are less useful when their relevance and significance are model dependent and/or when they are derived from arbitrary experimental conditions. A specific example of data with low utility for understanding the OOL is the observation and properties of an in vitro selected ribozyme that catalyzes a Diels–Alder reaction (Seelig and Jaschke 1999). The authors make the reasonable claim that their Diels–Alder ribozyme merely reveals the potential of small ribozymes to catalyze organic transformations. In our view, this ribozyme is not relevant to the OOL for the following reasons:

1. No evidence for Diels–Alder ribozymes has been found among extant biological structures or sequences.
2. No evidence for Diels–Alder ribozymes has been found in ancestral biological systems.
3. The processes and reagents used to obtain the Diels–Alder ribozyme are inconsistent with plausible early biological or abiotic environments; the in vitro selection process employed modern protein enzymes such as polymerases and reverse transcriptases and is not representative of pre-protein RNA World environments.
4. Thus far, a null hypothesis has not been evaluated, in which alternative polymers such as polysaccharides would be tested for the ability to catalyze Diels–Alder reactions.

Factors that constrain the significance of a Diels–Alder ribozyme for understanding the OOL apply equally to other in vitro selected ribozymes, including RNA polymerase ribozymes (Mutschler et al. 2015; Horning and Joyce 2016). The in vitro selection of RNAs, using modern methods of molecular biology, does not in our view provide important information relevant to the OOL.

LUCA and the Universal Gene Set

We consider the contents and properties of the Universal Gene Set of life, which is the set of genes shared as orthologs throughout the tree of life and found in essentially every living system, to be important and useful data. The size and composition of the Universal Gene Set are generally agreed upon (Koonin 2003; Harris et al. 2003; Charlebois and Doolittle 2004). In some highly dependent symbionts, individual components of the Universal Gene Set may be absent.
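The definition above amounts to a set intersection over genomes, relaxed slightly to tolerate reduced symbiont genomes. The sketch below is a minimal illustration of that idea, not the procedure used by Koonin (2003), Harris et al. (2003), or Charlebois and Doolittle (2004). It assumes that ortholog families have already been assigned to each genome, and the function name `universal_gene_set`, the `min_fraction` tolerance parameter, and the toy genomes are all hypothetical.

```python
from collections import Counter


def universal_gene_set(genomes: dict[str, set[str]], min_fraction: float = 1.0) -> set[str]:
    """Return ortholog families present in at least `min_fraction` of the genomes.

    min_fraction=1.0 gives the strict intersection; a lower value tolerates
    families missing from a few reduced genomes (hypothetical tolerance rule).
    """
    if not genomes:
        return set()
    counts: Counter[str] = Counter()
    for families in genomes.values():
        # Each genome is a set, so a family is counted at most once per genome
        counts.update(families)
    threshold = min_fraction * len(genomes)
    return {family for family, n in counts.items() if n >= threshold}


# Hypothetical toy data: ortholog-family labels per genome (illustrative only)
genomes = {
    "bacterium": {"SSU_rRNA", "uL2", "EF-Tu_family", "pathway_A"},
    "archaeon": {"SSU_rRNA", "uL2", "EF-Tu_family", "pathway_B"},
    "eukaryote": {"SSU_rRNA", "uL2", "EF-Tu_family", "pathway_C"},
    "symbiont": {"SSU_rRNA", "uL2"},  # reduced genome lacking one family
}

print(sorted(universal_gene_set(genomes)))        # strict intersection
print(sorted(universal_gene_set(genomes, 0.75)))  # tolerates one missing genome
```

With `min_fraction=1.0` the function returns only `SSU_rRNA` and `uL2`; lowering it to 0.75 readmits `EF-Tu_family`, which is missing only from the hypothetical symbiont. This mirrors the "essentially every living system" qualification in the text, while lineage-specific families never qualify.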

The Universal Gene Set is small and distinctly non-random. Koonin’s version of the Universal Gene Set, for example, contains around 65 genes. Fifty-three universal genes are directly involved in translation. These include genes for ribosomal RNAs, ribosomal proteins, aminoacyl tRNA synthetases, and translation factors (Fig. 2). A few members of the Universal Gene Set are involved in transcription and even fewer in replication. The Pace and Doolittle versions are very similar to the Koonin Universal Gene Set.

Fig. 2 The central dogma of molecular biology, emphasizing the contents of the Universal Gene Set of life, which includes genes for ribosomal RNAs, ribosomal proteins, aminoacyl tRNA synthetases, and translation factors

The Universal Gene Set is the most robust and unchanging subset of the gene set of LUCA. As noted above, there is evidence to suggest that LUCA contained genes beyond the Universal Gene Set (Zhaxybayeva and Gogarten 2004; Peretó et al. 2004; Gogarten and Deamer 2016). Specifically, for several proteins involved in DNA replication, ancestry at LUCA is indicated by conservation of three-dimensional structures, even though sequences are not conserved (Edgell and Doolittle 1997). In addition, LUCA may not have been a single entity. Genes that are ancestral to the Universal Gene Set may have been hosted in a variety of types of organisms (Zhaxybayeva and Gogarten 2004).

LUCA and the Molecular Toolbox of Life

We consider the components and properties of the Molecular Toolbox of Life (Fig. 3) (Jacob 1977) to be another source of important and useful data. Biological systems, regardless of domain or environment, use a common set of molecular components that is fixed over time and is surprisingly restricted in composition. The universal molecules of life include twenty amino acids, eight nucleotides, glucose, S-adenosylmethionine, coenzyme A, nicotinamide adenine dinucleotide, and several other components. Also universal to life are several polymer backbone types, including polypeptide, polyribonucleotide, and polydeoxyribonucleotide. The diverse morphology of eukaryotes, from algae to whales, and the diverse metabolism of prokaryotes, from methanogenic archaea to sulfur-oxidizing bacteria, are all built with the same small toolbox of organic molecules (Jacob 1977). Diverse organisms are distinguished not by differences in the composition of their Molecular Toolboxes but by differences in the organization of components of a common Molecular Toolbox.

Fig. 3 Schematic of the molecular toolbox of life, which contains the small molecules and macromolecular backbones and motifs that are universal to all living systems. This image was inspired by Jacob (1977)

Persistence and Robustness

A small set of organic molecules and genes is found in everything alive: in all bacteria, archaea, and eukaryotes. If LUCA was prokaryote-like (Zhaxybayeva and Gogarten 2004; Peretó et al. 2004; Gogarten and Deamer 2016), then the Molecular Toolbox and the Universal Gene Set have been fixed since LUCA, which is thought to have existed more than 3.7 billion years ago (Doolittle 2000; Nutman et al. 2016). There are no direct data to support the hypothesis that Darwinian evolution permits alterations of the Molecular Toolbox or the Universal Gene Set, even over billions of years.

The robustness and stability of the Molecular Toolbox are demonstrated by the history of guanine. Guanine is one of the four bases of DNA and of RNA. Guanine not only encodes genetic information in DNA but is also a primary component of ribosomes and other RNAs. Guanine is used in energy transduction and signaling. Guanine, which is endowed with remarkable capabilities for molecular recognition, is a component of the Molecular Toolbox of Life.

Around 3.7 billion years ago, when the Molecular Toolbox was established, guanine was chemically suitable as a component of genetic material and was a reasonable evolutionary choice for the Molecular Toolbox.

However, 1.5–2.0 billion years after LUCA, with the Great Oxidation Event (GOE) (Anbar et al. 2007; Hazen et al. 2008), the oxidative potential of the biosphere changed and the chemical stability of guanine declined. Rates of guanine degradation in biological systems increased markedly. In the oxidizing environment of the extant post-GOE earth, around 100,000 guanines per mammalian cell are degraded to 8-oxoguanine each day (Fig. 4) (Grollman and Moriya 1993; Hirano 2008). A steady-state level of around one 8-oxoguanine per 10⁶ guanines is observed in mammalian cells (Delaney et al. 2012). Oxidation also causes other damage, including hyperoxidized guanine. Oxidative damage to guanine in humans leads to mutagenesis, genetic instability, aging, and cancer. The inclusion of guanine in the Molecular Toolbox is a demonstration of evolution’s lack of foresight.
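The two figures above can be related with simple steady-state arithmetic. The sketch below takes the formation rate of around 100,000 lesions per cell per day and the steady-state ratio of one 8-oxoguanine per 10⁶ guanines from the text. The guanine count per cell (`guanines_per_cell`) is a hypothetical round number chosen only for illustration, and the calculation further assumes, which the cited figures do not establish, that both numbers refer to the same guanine pool. At steady state, lesions must be removed as fast as they form, so the mean residence time of a lesion equals the steady-state lesion count divided by the formation rate.

```python
# Figures quoted in the text
formation_per_day = 100_000   # guanines oxidized to 8-oxoguanine per mammalian cell per day
steady_state_ratio = 1e-6     # one 8-oxoguanine per 10^6 guanines at steady state

# Hypothetical value, for illustration only (not from the text)
guanines_per_cell = 3e9

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

# Average formation rate expressed per second
per_second = formation_per_day / SECONDS_PER_DAY

# Lesions present at any moment, given the assumed guanine count
steady_state_lesions = steady_state_ratio * guanines_per_cell

# Steady state: lesions present = formation rate x mean residence time,
# so residence time = lesions present / formation rate (days converted to minutes)
mean_residence_minutes = steady_state_lesions / formation_per_day * MINUTES_PER_DAY

print(f"formation rate: {per_second:.2f} per second")
print(f"lesions present: {steady_state_lesions:.0f}")
print(f"mean residence time: {mean_residence_minutes:.1f} minutes")
```

Under these assumptions the script prints about 1.16 lesions formed per second, 3000 lesions present, and a mean residence time of 43.2 minutes. The last figure scales linearly with the hypothetical guanine count and should not be read as a measured repair time; the point is only that a low steady-state level despite continuous formation implies continuous removal.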

Fig. 4 The oxidation of guanine to form 8-oxoguanine. Guanine spontaneously degrades in the oxidative environment of the post-GOE earth

For the last 2 billion years, biological systems have been under intense pressure brought on by chemical instability of guanine. The permanence and integrity of genetic information and of critical energy transduction and signaling molecules are under relentless assault by oxidative processes. The pressure to change the components of the Molecular Toolbox must be intense.

What is the evolutionary response to the continuous degradation of guanine? Has evolution, over the last 2 billion years, altered the contents of the Molecular Toolbox to accommodate fundamental chemical change in the biosphere? Has evolution swapped guanine for a more appropriate substitute? No. Since the GOE, evolution has tinkered. Biology has produced elaborate and multilayered systems to repair 8-oxoguanine, and to chemically push it uphill, back to guanine (Grollman and Moriya 1993; Hirano 2008). In addition, evolution has sequestered iron and other mediators of oxidative damage (Theil and Goss 2009). As stated by Jacob (1977), “It is always a matter of using the same elements, of adjusting them, of altering here or there, of arranging various combinations…. It is always a matter of tinkering.” Evolution changes the distributions and spatial arrangements of toolbox components, but never the essential identity of the components. The persistence of guanine demonstrates that the Molecular Toolbox is fixed; alterations of the toolbox are effectively prohibited.

Dependencies and the Limits of Evolution

A useful OOL model should account for and predict the contents and the robustness of the Universal Gene Set and the Molecular Toolbox. Why are the Universal Gene Set and Molecular Toolbox so robust over time and environment? Why is the Universal Gene Set focused on translation and not metabolism? The answers appear to be found in dependencies, which are relationships in which change in one element induces change in another element. Systems with the most extensive and far-reaching dependencies are most resistant to evolutionary change. For the Molecular Toolbox and the Universal Gene Set, biology appears to be at a limit of total dependency. The dependency of biological systems on polypeptide or polyribonucleotide, for example, is obviously complete and total. Converting polypeptide to polyester would impact every system of every cell. Converting phosphorus to arsenic in polynucleotides would also impact every system of every cell. These types of changes are not observed. The universal conservation of translation (Woese 2000; Hsiao et al. 2009) confirms the expectation that the dependencies on translation are at the same limit (Hinegardner and Engelberg 1963). Translation directly impacts all cellular functions and processes. Translation controls the sequence of amino acids of every protein. Translation is regulated by molecular interaction networks that dwarf other networks in size, integration, and evolutionary conservation (Bu et al. 2003; Butland et al. 2005). Translation consumes vast cellular resources (Warner 1999; Caton et al. 2000). Translation components are embedded in processes that appear unrelated to translation (Park et al. 2008).
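The notion of dependency used here can be made concrete as reachability in a directed graph, where an edge from A to B means that a change in A induces a change in B. The sketch below counts how many elements a change in one node eventually perturbs, using breadth-first search. The node names and edges are a hypothetical toy network, not data from the cited interaction studies, and "reach" is only an illustrative proxy for the resistance to evolutionary change described in the text.

```python
from collections import deque


def downstream(graph: dict[str, list[str]], start: str) -> set[str]:
    """Return every other element that a change in `start` eventually perturbs."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            # Skip the starting node itself (feedback loops lead back to it)
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical toy network: an edge A -> B means "a change in A induces a change in B"
graph = {
    "translation": ["all_proteins"],
    "all_proteins": ["metabolism", "replication", "transcription", "membranes", "translation"],
    "metabolism": ["membranes"],
    "replication": [],
    "transcription": ["translation"],
    "membranes": [],
    "peripheral_enzyme": ["metabolic_branch"],
}

for node in ("translation", "peripheral_enzyme", "replication"):
    print(node, len(downstream(graph, node)))
```

In this toy network a change in translation perturbs every other core node (five elements), whereas a change in the hypothetical peripheral enzyme perturbs only one downstream branch. Under the text's argument, the former would be the far more constrained target for evolutionary change.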