The RNA World is challenged by three problems: a self-replicating RNA polymerase ribozyme has yet to be demonstrated, there is little evidence for its existence in life today, and it is hard to explain how the transition from an RNA-only world to an RNA–protein world could have occurred.

Recent work on RNA polymerase ribozymes supports the view that the molecular replicator was made of RNA, and no other molecule was required (the RNA World).

The first ancestor of all life was a self-replicating entity capable of evolving, but it must have been much simpler than a cell; that is, a molecular replicator.

Evolution requires self-replication. But what was the very first self-replicator directly ancestral to all life? The currently favoured RNA World theory assigns this role to RNA alone but suffers from a number of seemingly intractable problems. Instead, we suggest that the self-replicator consisted of both peptides and nucleic acid strands. Such a nucleopeptide replicator is more feasible in the light of both the replication machinery currently found in cells and the complexity of the evolutionary path required to reach it. Recent theoretical and mathematical work supports this idea and provides a blueprint for future investigations.

A number of suggestions for the IDA have been made over the years. These include protein or peptide alone, such as thiol-rich peptides (see Glossary), inspired in part by our understanding of […]; nucleic acid alone (mainly the RNA World []); and a mixture of both []. The most widely accepted is the RNA World, which posits that the IDA was an RNA strand capable of folding into an active replicase. This is the main alternative to a nucleopeptide IDA considered in this work. The RNA World is immediately attractive when judged against our three tests (current biological systems, simplicity, and dynamical feasibility). (i) RNA is intricately involved in current self-replicating systems (Figure 1). It carries the transcribed genetic information (and there is general agreement that the original substrate for the genetic code was RNA, not DNA), and it constitutes the majority of the ribosome, the heart of cellular self-replication, including the catalytic site itself []. In addition, cells contain RNA sequences (ribozymes) that are able to catalyse reactions, a function more typically associated with proteinaceous enzymes. (ii) Considerations of simplicity make the RNA World attractive because RNA can house both sequence information and catalytic activity in a single molecule, which neither proteins (poor at information storage) nor DNA (poor at catalysis) can do. (iii) In principle, single-component self-replicating systems (RNA or otherwise) are not dynamically or chemically forbidden, and indeed rudimentary systems have been demonstrated on the macroscopic scale []. However, a closer look at the RNA-only IDA in the context of these three considerations reveals serious problems, some of which appear insurmountable. Comparison with a nucleopeptide IDA suggests that the latter is a more likely candidate.

(A) Minimal self-replication in extant life occurs at the cellular level and requires nucleic acids (beige) and proteins (purple). Underlined molecules act as stores of sequence information. Molecules acting as or carrying molecular building blocks are shown in red. Black arrows indicate the flow of information and materials. Blue arrows indicate replication by polymerases. (B) A typical RNA World scenario begins with a milieu containing various components, including short RNA strands (beige) and amino acids (purple). RNA strands spontaneously assemble into a longer strand with self-replicating ribozyme (SRR) functionality. The process repeats, forming a second, functionally identical molecule able to copy the first (and vice versa). (C) A possible nucleopeptide IDA. The milieu is the same but additionally contains short RNA strands bound to amino acids. An RNA strand is able to form a structure with primordial mRNA (p-mRNA) and primordial ribosome (p-Rib) functionality (+ strand). Some of the amino acid-bound RNA strands contain sequences complementary to sequences on the p-mRNA and so are able to act as primordial tRNA (p-tRNA). Alignment of such charged p-tRNAs on the p-Rib/p-mRNA in an entropy trap results in the formation of a peptide chain. For some sequences, dissociation of the components releases a peptide with primordial polymerase (p-Pol) activity, together with the + strand. The p-Pol acts on the + strand, producing a complementary copy (- strand, black). The - strand is able to act as a template for the polymerase, producing the + strand. Dissociation releases all components, and the + strand is again able to act as a p-Rib.

Three approaches will be useful to us in this endeavour. (i) Consideration of current self-replicating biological systems. By looking at how cells currently achieve self-replication, we may be able to extrapolate into the past and deduce the composition of the starting point. (ii) Consideration of the simplest system. Which IDA is most likely to achieve self-replication and has the simplest path to current self-replicating systems? (iii) Dynamical feasibility. It is not always obvious that a given self-replication scheme is actually dynamically feasible. For example, expected rates of substrate breakdown will determine whether a mathematical model of the scheme allows self-sustained self-replication. Where substrates break down more quickly than they can be synthesised, the model will rightly predict self-sustained self-replication to be infeasible. A dynamically infeasible scheme need not be considered further.
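A minimal two-variable model makes the substrate-breakdown criterion in (iii) concrete. It is a sketch under stated assumptions, not a model taken from the work discussed here. A substrate S is synthesised at a constant rate `sigma` and breaks down with rate constant `delta`. A generic replicator X copies itself by consuming S (rate constant `k`) and decays with rate constant `mu`. All parameter names and values are hypothetical. In this toy model, a rare replicator can take hold only if `sigma * k / (delta * mu) > 1`. In other words, the substrate level maintained without the replicator (`sigma / delta`) must exceed the level the replicator needs to outpace its own decay (`mu / k`).

```python
from dataclasses import dataclass


@dataclass
class Params:
    sigma: float  # substrate synthesis rate (hypothetical units)
    delta: float  # substrate breakdown rate constant
    k: float      # replication rate constant (per replicator per substrate)
    mu: float     # replicator decay rate constant


def feasibility_ratio(p: Params) -> float:
    """Values > 1 mean a rare replicator can grow in this toy model."""
    return (p.sigma * p.k) / (p.delta * p.mu)


def simulate(p: Params, x0: float = 1e-3, t_end: float = 200.0, dt: float = 0.01):
    """Forward-Euler integration of
        dS/dt = sigma - delta*S - k*X*S
        dX/dt = k*X*S - mu*X
    starting from the replicator-free substrate level S = sigma/delta."""
    s, x = p.sigma / p.delta, x0
    for _ in range(int(t_end / dt)):
        ds = p.sigma - p.delta * s - p.k * x * s
        dx = p.k * x * s - p.mu * x
        s, x = s + ds * dt, x + dx * dt
    return s, x


slow_breakdown = Params(sigma=1.0, delta=0.1, k=0.5, mu=1.0)  # hypothetical values
fast_breakdown = Params(sigma=1.0, delta=5.0, k=0.5, mu=1.0)  # hypothetical values

for name, p in [("slow", slow_breakdown), ("fast", fast_breakdown)]:
    s, x = simulate(p)
    print(f"{name}: ratio={feasibility_ratio(p):.2f}, S={s:.3f}, X={x:.3g}")
```

With slow breakdown (ratio 5), the replicator settles near the analytical steady state S* = mu/k = 2, X* = sigma/mu - delta/k = 0.8. With fast breakdown (ratio 0.1), it decays towards zero, so in this model the scheme would be dynamically infeasible.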

The identity of the IDA has been the subject of much speculation over the years. Nonbiological replicators such as clay crystals [] have been invoked. However, they are unconvincing because they require a complete takeover of one substrate by another and lack a persuasive argument for how this could have occurred. An IDA built from biological molecules is more convincing. It necessitates physicochemical conditions compatible with the formation of those molecules and with the IDA's own self-replicating reaction cycle. The latter likely required relatively mild conditions approaching those of analogous biochemical reactions today. There is now ample evidence that biological building blocks such as amino acids, ribose, and deoxyribose, among others, were present on the early Earth []. Potential chemistries for building block synthesis have been demonstrated, with cyanosulfidic chemistry [] showing great promise. Furthermore, recent work has shown convincing peptide ligation in prebiotic conditions []. Given these findings, it now seems reasonable to assume that the IDA was constructed of components highly similar or identical to those found in life today. Indeed, such a replicator must have occurred at some stage very early in evolution, even if not at the very beginning. With the scene set for an IDA constructed of biological molecules to arise, our focus turns to the main issue of this work: deciding on its identity.

Life as we understand it is cellular. The last universal common ancestor (LUCA) of all cells (not a single cell, of course, but a population) is understood in some detail. It possessed a cell membrane, DNA, the basic molecular machines for copying DNA (e.g., DNA polymerase), and a functional ribosome, among much else []. From this highly truncated list alone, it is clear that LUCA was far too complex to have assembled spontaneously. It must have evolved from simpler systems that were themselves able to self-replicate with some tolerance for error (otherwise they would not have been able to evolve). Indeed, it is difficult to imagine that anything recognisably a cell could have had spontaneous origins. The earliest cell-like systems must therefore in turn have evolved from even simpler self-replicators; that is, molecular self-replicators. The first such replicator is referred to as the initial Darwinian ancestor (IDA) [].

For an RNA-only replicator, the story is less convincing. A self-replicating RNA molecule seems feasible at first glance, and if such an RNA-only IDA existed, it would most likely have been a self-replicating ribozyme polymerase (Figure 1) []. However, this is not what is seen in extant cells. It is also difficult to understand the steps by which such a ribozyme gained mRNA functionality, transferred polymerase functionality to peptides, and transferred information storage functionality to DNA. Of these, the first two present the most difficulty. Being a self-sufficient self-replicating entity, the ribozyme simply has no need to interact with peptides or amino acids. This makes a gradual transfer of these functions to peptides difficult to envision. How could a ribozyme whose sequence is optimised for self-copying gain mRNA functionality, that is, come to encode amino acid sequences? Any change to encode a helpful peptide would presumably decrease ribozyme effectiveness. It seems unlikely that, by chance, the sequence giving useful ribozyme functionality would also encode a functional peptide that increased the replication and survival of the ribozyme itself. If this did take place, the ribozymal polymerase sequence would also have had to serve as mRNA, encoding an amino acid sequence that could be translated into a functional peptide. We then come close to the nucleopeptide replicator outlined in Figure 1. The difference is that the system in Figure 1 has the advantage of not requiring the mRNA to also have polymerase functionality. Similar arguments pertain to the transfer of polymerase functionality from RNA to peptides. Recent work shows that such a transfer is highly unlikely, suggesting that this RNA functionality never existed [].

What advantage does such a system have over the main alternative, the RNA World? Perhaps the most convincing answer comes from considering how each could have evolved into the current replication machinery. For the nucleopeptide IDA, the path is straightforward. Each component simply maintains the same role, evolving greater specificity and efficiency over time through the errors that would inevitably arise during replication. The part with mRNA functionality expands its code to encode an increased number of amino acids. This leads to the production of more efficient polymerases, thus ensuring better survival. The part with ribosome functionality separates from the mRNA portion and gains increasing catalytic efficiency, again increasing the likelihood of survival.

As a thought experiment, we can further simplify the extant nucleopeptide replicator system so that only indispensable components remain. We can also assume that, instead of multiple examples of each class, there was a single one able to carry out all the functions currently undertaken by that class. That is, there existed (i) a single nucleic acid sequence that encoded a single amino acid sequence and acted as its own mRNA and its own ribosome; and (ii) a peptide that copied the nucleic acid encoding its own amino acid sequence (Figure 1). This is the most fundamental ancestral self-replicator that maintains the existing functional split between nucleic acids and peptides.
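A short symbolic sketch can trace the information flow of this minimal system (Figure 1C). It is illustrative only. The two-nucleotide "codons", the toy code table `TOY_CODE`, and the set `ACTIVE_PEPTIDES` of sequences assumed to have p-Pol activity are all hypothetical. Strand copying is modelled as error-free Watson–Crick reverse complementation.

```python
from typing import Optional

# Hypothetical toy code: 2-nt "codons" -> amino acid labels (illustration only).
TOY_CODE = {"GC": "Gly", "CG": "Ala", "GG": "Asp", "CC": "Val"}
# Hypothetical: peptide sequences assumed to have p-Pol activity.
ACTIVE_PEPTIDES = {("Gly", "Ala", "Asp", "Val")}
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Antiparallel complementary copy, as made by a template-directed polymerase."""
    return "".join(PAIR[base] for base in reversed(strand))


def translate(p_mrna: str, codon_len: int = 2) -> tuple:
    """+ strand acting as p-mRNA/p-Rib: read codons left to right ('X' = unassigned)."""
    usable = len(p_mrna) - len(p_mrna) % codon_len
    return tuple(TOY_CODE.get(p_mrna[i:i + codon_len], "X")
                 for i in range(0, usable, codon_len))


def replication_cycle(plus: str) -> Optional[tuple]:
    """One turn of the cycle: translate, then copy + -> - -> +.
    Returns None if the encoded peptide is not an (assumed) polymerase."""
    peptide = translate(plus)
    if peptide not in ACTIVE_PEPTIDES:
        return None
    minus = reverse_complement(plus)       # p-Pol copies the + strand
    new_plus = reverse_complement(minus)   # the - strand templates a new + strand
    return peptide, minus, new_plus


print(replication_cycle("GCCGGGCC"))
print(replication_cycle("CCCCGGGG"))
```

The first sequence encodes the assumed polymerase and regenerates itself via the - strand. The second encodes an inactive peptide and is not copied, which shows why only some sequences close the loop.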

The concept of using current cells and molecules as a guide to the features of earlier (now extinct) ones, and even to resurrect extinct proteins, is well established []. Bioinformatics techniques [] allow phylogenetic trees to be reconstructed [] and predictions to be made of the identity of ancient ancestor molecules. The extreme chronological distance between the IDA and the present day would make this challenging, but perhaps possible if a simple self-replicator were present in cells today. In fact, no such single-molecule replicator exists. Instead, replication is split between nucleic acids and proteins. As a general rule, nucleic acids encode and transfer information (DNA and mRNA), and proteins (enzymes) carry out catalytic functions, including synthesising new copies of the nucleic acids (Figure 1). A notable exception is the ribosome, whose catalytic centre is a ribozyme []. A basic cross-catalytic symmetry is therefore observed: RNA makes protein, and protein makes RNA. Thus, in the spirit of Spiegelman's Monster (Box 1), we can conceive of a self-replicating system built from components of current cells. Given the correct conditions, a supply of energy, and suitable chemical building blocks, such a system could function and would include DNA, DNA polymerase, ribosomes, RNA polymerase, tRNA, and […], among others. Indeed, recent progress has been made towards this in experiments using liposome-based synthetic cells. These contained DNA replication machinery from bacteriophage Φ29 and were capable of self-sustained DNA amplification []. However, such systems would not be indefinitely self-sustaining, because the components responsible for synthesising the protein machinery would eventually degrade. Unsurprisingly, these preliminary functional self-replicating systems retain the nucleic acid–protein division of labour.

Qβ bacteriophage is an RNA virus whose genome is replicated by Qβ replicase, an RNA-dependent RNA polymerase. In 1965, Spiegelman carried out an interesting experiment: he mixed Qβ replicase and the phage RNA with RNA nucleotides. As a result, he was able to observe in vitro replication of the genome [], a breakthrough at the time. Furthermore, the system was able to evolve [], freed from the constraints and requirements of functioning in cells and encoding other virus proteins. Over the course of 75 serial transfers, the RNA evolved to become shorter, eventually decreasing in length from an original 3600 nt to 550 nt. Subsequent experiments were able to isolate a 218 nt RNA, essentially the minimum required to function. The system did not encode or synthesise the Qβ replicase, which was provided. Nevertheless, it showed the potential for simple self-replicating systems to function and, importantly, to evolve outside the cellular environment. Crucially, it showed that this could be achieved with only a few constituents: building blocks, an RNA strand, and a protein. The spirit of Spiegelman's experiment lives on today as efforts continue to produce fully self-contained in vitro self-replicating systems [].
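A deliberately crude deterministic model can illustrate why serial transfer favours shorter RNAs. It rests on two assumptions that are not taken from the experiment itself. First, each variant grows by a factor of `exp(k / length)` per incubation, taking copying time as proportional to length so that shorter templates multiply faster. Second, the shorter variants are present from the start at tiny hypothetical frequencies rather than arising by mutation. The rate constant `k` is arbitrary; only the lengths 3600, 550, and 218 nt come from the text.

```python
import math


def normalise(amounts: dict) -> dict:
    total = sum(amounts.values())
    return {length: a / total for length, a in amounts.items()}


def serial_transfers(initial: dict, n_transfers: int, k: float = 360.0) -> dict:
    """Grow each length variant for one incubation, then transfer a sample.
    Renormalising after each round stands in for moving a fixed-size aliquot
    into fresh medium (replicase and nucleotides)."""
    fractions = normalise(initial)
    for _ in range(n_transfers):
        # Assumed growth law: shorter templates are copied faster.
        grown = {length: f * math.exp(k / length) for length, f in fractions.items()}
        fractions = normalise(grown)
    return fractions


def mean_length(fractions: dict) -> float:
    return sum(length * f for length, f in fractions.items())


# Hypothetical starting mixture: mostly full-length RNA, rare shorter variants.
start = {3600: 1.0, 550: 1e-6, 218: 1e-9}
for n in (0, 10, 25, 75):
    print(n, round(mean_length(serial_transfers(start, n))))
```

Under these assumptions the shortest variant eventually dominates, so the mean length falls towards 218 nt. The real experiment differed in that shorter variants had to arise by mutation during replication.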

The Simplest Self-Replicating System

By asking which possible IDA is simplest, we are considering which functional replicator is simple enough to be realistically feasible as the first, spontaneously occurring IDA. The simplest system does not have to be the real one, but arguments for parsimony in biology are powerful [27].

The simplest RNA World theory requires only a self-replicating ribozyme. This could be an RNA strand with ligase activity; that is, self-templating using pre-existing large fragments of complementary sequence. Ligase (but not self-replicating ligase) ribozymes do exist in nature [28], and ribozyme ligases have been designed or evolved in vitro, beginning with the work of Bartel and Szostak [29]. Efforts have been made to produce minimal ligases. For example, Kurihara et al. [30] made an ~50 nt functional version of the R3C ligase [31], similar in length to the small L1 ligase [32]. These ligases, however, do not self-replicate. Efforts to produce self-replicating ligases have borne fruit. Paul and Joyce, for example, modified the R3C ligase ribozyme so that it could template two half copies of itself that it then ligated [33]. Substrate inhibition caused difficulties, which were overcome by a cross-catalytic approach [34,35] in which two template strands each catalyse ligation of two halves of the other template strand. However, these systems still required at least one of the strands to be 50 nt or greater in length. In the first round, they also required spontaneous assembly of the full-length ribozyme (typically well over 100 nt). This may seem unlikely, but recent work suggests that functional ribozyme ligases can be produced spontaneously (i.e., nonenzymatically) from short building blocks more likely to have been present on the early Earth [36,37]. However, again, these ligases do not replicate themselves. This may be an insurmountable problem. As Wachowius et al. stated: ‘Fundamentally, emergence of new functions when assembling long sequences is confounded by the nature of such activities: ligases use less information to choose substrates than is required to define the ligase activity itself, so cannot copy themselves (or other components) from sequences lacking that information, i.e. random sequence’ [7].
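A discrete counting model, loosely following the cross-catalytic design described above, can sketch the scheme. Ribozyme E joins the halves A′ and B′ into its partner E′, while E′ joins A and B into E. Everything here is an assumption for illustration: one ligation per enzyme per round, no errors, no product inhibition, and arbitrary pool sizes. The sketch also makes Wachowius et al.'s point visible: copying is possible only because the information is already present in the supplied halves.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pool:
    E: int     # ribozyme E (ligates A' + B' into E')
    E_p: int   # partner ribozyme E' (ligates A + B into E)
    A: int     # pre-made half-strands that form E
    B: int
    A_p: int   # pre-made half-strands that form E'
    B_p: int


def ligation_round(p: Pool) -> Pool:
    """Each enzyme performs at most one ligation per round, limited by substrate."""
    made_E_p = min(p.E, p.A_p, p.B_p)  # E templates and ligates A' + B'
    made_E = min(p.E_p, p.A, p.B)      # E' templates and ligates A + B
    return replace(p, E=p.E + made_E, E_p=p.E_p + made_E_p,
                   A=p.A - made_E, B=p.B - made_E,
                   A_p=p.A_p - made_E_p, B_p=p.B_p - made_E_p)


def run(p: Pool, rounds: int) -> Pool:
    for _ in range(rounds):
        p = ligation_round(p)
    return p


seeded = Pool(E=1, E_p=0, A=100, B=100, A_p=100, B_p=100)
p = seeded
for r in range(1, 10):
    p = ligation_round(p)
    print(r, p.E, p.E_p)

# Without pre-made halves, nothing can be copied however many rounds pass.
print(run(Pool(E=1, E_p=0, A=0, B=0, A_p=0, B_p=0), 10))
```

Once both partners are present, each species doubles every round until the supplied halves run out (here during round 8). A single seeded E is enough to start the loop, but growth is bounded entirely by the fragments provided.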

A ribozyme acting as a polymerase therefore seems more promising. It could copy any template strand using only short nucleotide building blocks (Figure 1). The first designed ribozyme able to do this convincingly was R18 [38]. The tC9Y ribozyme made a breakthrough, being able to polymerise products slightly longer than itself [39,40]. The most recent advance is the 24-3 ribozyme, which can copy RNA sequences with secondary structure, although only for short sequences; that is, it cannot copy itself. These recent ribozymes, at close to 200 nt in length, are likely too big to have arisen spontaneously. However, other recent work [41] suggests that, in some cases, ribozymes may have been able to function as fragments working together. In summary, even if a self-replicating RNA-only ribozyme polymerase proves possible, it may be too long and complex to have arisen as the IDA.
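Simple arithmetic makes the size argument concrete: an RNA of length L drawn from four nucleotides has 4^L possible sequences. The sketch below computes this for lengths of the order mentioned in the text and compares it with a hypothetical pool of strands. `pool_size` is an arbitrary illustrative figure, not an estimate of prebiotic conditions. The sketch treats one specific sequence as the target. Because many different sequences may share a given activity, it overstates the rarity of function and should be read only as a rough upper bound on the difficulty.

```python
import math


def log10_sequence_space(length_nt: int, alphabet_size: int = 4) -> float:
    """log10 of the number of distinct sequences of a given length."""
    return length_nt * math.log10(alphabet_size)


def log10_expected_copies(length_nt: int, pool_size: float) -> float:
    """log10 of the expected number of copies of ONE specific sequence in a
    pool of random strands of that length (uniform base composition assumed)."""
    return math.log10(pool_size) - log10_sequence_space(length_nt)


POOL_SIZE = 1e20  # hypothetical number of strands, for illustration only
for length in (50, 100, 200):
    print(f"{length} nt: 10^{log10_sequence_space(length):.1f} sequences, "
          f"expected copies of a given one: 10^{log10_expected_copies(length, POOL_SIZE):.1f}")
```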

An alternative to an RNA-only IDA is one consisting of both nucleic acid and peptide components, each able to catalyse polymerisation of the other. An early example of this idea was the Carter and Kraut model [18]. It proposed that a short double-stranded (ds)RNA sequence could catalyse formation of a short β hairpin structure that would in turn catalyse dsRNA polymerisation. With a potential for partial coding [42], sustained self-replication could be possible.

Box 2 How Did the Link between Codons and Amino Acids Arise?

In all life, triplet codons in mRNA code for specific amino acids. These are brought to the ribosome as activated amino acids attached to a tRNA that contains the relevant, specific anticodon. Enzymes called aminoacyl tRNA synthetases (aaRSs) ensure that the correct amino acid is connected to the correct tRNA. They attach the amino acids to the 3′ end of the tRNA, distal from the anticodon. This is a form of symbolic coding; that is, there is no direct interaction between the anticodon and the amino acid. However, when the earliest IDA arose, there would have been no aaRS, so how was the link between the code and the amino acid made? Without an aaRS, it is reasonable to think that the connection between tRNA and amino acid could originally have been chemically different, and not necessarily an ester bond. An obvious first answer is that the codon (or anticodon) binds specifically to the amino acid for which it codes via a direct physicochemical interaction, doing away with the need for an intermediary such as an aaRS. This is known as the stereochemical hypothesis. It was first put forward several decades ago and remains one of the main theories for how the genetic code first came about [52]. However, there has been little compelling evidence to support this theory. It also raises practical questions (e.g., if the amino acid is bound to the anticodon, how does the anticodon recognise the codon?). One possible answer is that there was a physicochemical interaction between p-tRNA and amino acids, but that the interacting RNA sequence was not the anticodon and was located distal from it (as in current tRNAs). In this scenario, the code came about by simple chance and was ‘frozen’ in place, in line with the ‘frozen accident’ origin of the genetic code first put forward by Crick [67]. This could have been achieved if a particular combination of p-tRNA and amino acid allowed a self-replicating system to come into being that included a polymerase able to replicate the p-tRNAs themselves. Such an idea was recently discussed [53]. One challenge to confirming this hypothesis is that, once aaRS charging of tRNA replaced such a system, the original amino acid recognition sequence may well have degraded to the point of being untraceable.

Our favoured nucleopeptide IDA model is a conceptual relative of the Carter and Kraut model (Figure 1) [43]. It relies only on the random production of amino acids and short RNAs (or possibly a mix of RNA, DNA, and other monomers since lost, known as XNA). Here, a short stretch of single-stranded (ss)RNA could act as both a primordial mRNA (p-mRNA) and a primordial ribosome (p-Rib), encoding a peptide sequence and catalysing its polymerisation. If the resulting peptide could act as a primordial polymerase (p-Pol) and copy the ssRNA, an IDA would result. The first and most fundamental problem with this concept is how specific amino acids (or classes of amino acids) can be encoded by, and located at, specific mRNA sequences (the coding problem; Box 2). This is troublesome because tRNA, and certainly tRNA synthetases, did not then exist. The simplest answer to this problem is hard stereochemical selection, in which amino acids interact directly with their codons or anticodons [44,45]. Such an interaction could bring specific amino acids into close proximity on the p-mRNA, increasing the probability of polymerisation via peptide bond formation. This is a so-called entropy trap. Entropy traps have been suggested in other models of early biological replicators, in which RNA-based carriers of amino acids align on mRNA sequences via codon–anticodon interactions [46,47]. The produced peptide could be released stochastically or through periodic environmental changes (e.g., changes in temperature). There is some experimental support for entropy traps [48], and they may even play a role in peptide bond formation in the extant ribosome [49].

The key assumption of hard stereochemical theories, that codons can bind to the amino acids they encode, has gained some impetus in recent years. One example is the demonstration that amino acid-binding aptamers can be evolved in vitro [50]. In addition, analysis of ribosome structures has shown that some amino acids are located proximal to their codons or anticodons with statistical significance [51]. This may reflect the remnants of an ancient stereochemical interaction. Recent computational analyses have lent some support to this idea. However, doubts remain as to whether such apparently weak interactions are robust enough to carry out the proposed function, particularly as they involve peptides rather than single amino acids [52]. Hard stereochemistry also makes tRNA unnecessary, requiring it to be invented at a later date.

A more appealing, soft stereochemical solution was recently elaborated. In it, a particular amino acid is linked to its anticodon not directly but via binding to specific sequences on primordial tRNA (p-tRNA) that are distal from the anticodon [43,53] (Figure 1). Such a p-tRNA, combined with the dual-function p-Rib/p-mRNA and the encoded p-Pol, would comprise a functional IDA with a clear path connecting it to present-day replication in cells [43].
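In this soft stereochemical picture, the codon-to-amino-acid code is the composition of two independent mappings. One is codon–anticodon pairing; the other is the binding of an amino acid to a recognition site elsewhere on the p-tRNA. The sketch below makes that composition explicit. The recognition-site sequences and their amino acid affinities (`SITE_BINDS`) are invented for illustration. Pairing is taken as strict Watson–Crick reverse complementation, ignoring wobble. The codon assignments were simply chosen to agree with the standard code (GGC for Gly, GCC for Ala).

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Hypothetical p-tRNAs: an anticodon plus a separate, distal recognition site.
P_TRNAS = [
    {"anticodon": "GCC", "recognition_site": "UUAG"},
    {"anticodon": "GGC", "recognition_site": "CAGG"},
]
# Hypothetical affinities of amino acids for recognition sites.
SITE_BINDS = {"UUAG": "Gly", "CAGG": "Ala"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq))


def emergent_code(p_trnas, site_binds) -> dict:
    """codon -> amino acid, via anticodon pairing and the distal binding site.
    The anticodon itself never contacts the amino acid."""
    return {reverse_complement(t["anticodon"]): site_binds[t["recognition_site"]]
            for t in p_trnas}


def translate(p_mrna: str, code: dict) -> list:
    return [code[p_mrna[i:i + 3]] for i in range(0, len(p_mrna) - 2, 3)]


code = emergent_code(P_TRNAS, SITE_BINDS)
print(code)                          # {'GGC': 'Gly', 'GCC': 'Ala'}
print(translate("GGCGCCGGC", code))  # ['Gly', 'Ala', 'Gly']

# In this toy model, swapping only the binding affinities reassigns the code
# without altering any codon-anticodon pairing.
swapped = emergent_code(P_TRNAS, {"UUAG": "Ala", "CAGG": "Gly"})
print(swapped)                       # {'GGC': 'Ala', 'GCC': 'Gly'}
```

Because the two mappings are independent here, nothing in the pairing step fixes which amino acid a codon ends up encoding. This is compatible with the chance-based ‘frozen accident’ view discussed in Box 2.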