Selective crystallization of an organic salt

The chemical scenario that leads to a continuous synthesis of RNA building blocks solely through fluctuations of physical parameters is shown in Figs. 1 and 2a. The scenario starts with an aqueous solution of malononitrile 1 and different amidinium salts 2a-d (HCl or H2SO4 salts, 400 mM), both recognized prebiotic compounds18. In addition, sodium nitrite and acetic acid are present to establish a slightly acidic pH of around 4. Under these conditions the amidine molecules (2a-d) are protonated, which leads to their chemical deactivation. This allows selective nitrosation of malononitrile 1 to give (hydroxyimino)malononitrile 3 in situ. Slow evaporation of water under ambient conditions, followed by gentle cooling to 8–10 °C, resulted in crystallization of a salt from the ca. 1 M amidinium solution. This crystallization is very robust and resembles naturally occurring concentration processes. The resultant crystals were of excellent quality for X-ray analysis, which showed that the salts are formed from the amidinium cations 2a-d and the (hydroxyimino)malononitrile anion of 3 (Fig. 2b). Of particular interest is the distance between the negatively charged oxygen in 3 and the positively charged H-bond donor centre of the amidinium units 2a-d. We determined distances of 1.85–1.95 Å, which is long for a salt bridge but right in the regime for a typical hydrogen bond. This is important because it is presumably the reason for the comparably low melting temperatures of the salts, which we determined to be between 110 and 160 °C. The robustness and ease of crystallization establish a first physical enrichment step that finishes the initial wet–dry phase with the deposition of these salt materials (Fig. 2b).
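The degree of evaporation implied by going from the 400 mM starting solution to the ca. 1 M solution from which the salt crystallizes can be estimated with a short calculation. The sketch below is illustrative only: it assumes that the amount of amidinium salt is conserved (nothing is lost or consumed) and that volume scales inversely with concentration. The function name is hypothetical and not part of the original work.

```python
def fraction_of_water_removed(c_start_molar: float, c_end_molar: float) -> float:
    """Fraction of the initial solution volume that must evaporate.

    Assumption (not from the source): the solute amount is conserved,
    n = c * V, so V_end / V_start = c_start / c_end.
    """
    if not 0 < c_start_molar <= c_end_molar:
        raise ValueError("expected 0 < c_start <= c_end")
    return 1.0 - c_start_molar / c_end_molar


# Values quoted in the text: 400 mM at the start, ca. 1 M at crystallization.
concentration_factor = 1.0 / 0.4
removed = fraction_of_water_removed(0.4, 1.0)

print(f"concentration factor: {concentration_factor:.1f}x")
print(f"volume removed: {removed:.0%}")
```

Under these assumptions the solution is concentrated about 2.5-fold, which corresponds to evaporating roughly 60% of the initial volume.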

Fig. 1 RNA nucleoside formation pathway. A geothermal environment provides the right set up for the depicted transformations by establishing wet–dry cycles. The prebiotic starting materials are produced from a prebiotic atmosphere and washed into an aqueous environment (e.g. by rain). Major atmospheric components are written in larger letters, whereas minor components are written in smaller letters. Transformations are taking place in different environments, illustrated by various rivers (in light blue). Each environment provides the right setup for different chemistries, leading to several different chemical transformations. This geochemical setup leads to a set of canonical and non-canonical RNA building blocks by continuous synthesis (6a, m1G: R1 = O, R2 = Me, R3 = NH2; 6b, ms2A: R1 = NH, R2 = H, R3 = SMe; 6c, A: R1 = NH, R2 = H, R3 = H; 6d, m2G: R1 = O, R2 = H, R3 = NHMe; 6e, m22G: R1 = O, R2 = H, R3 = N(Me)2; 6f, G: R1 = O, R2 = H, R3 = NH2; 6g, DA: R1 = NH, R2 = H, R3 = NH2; 6h, m2A: R1 = NH, R2 = H, R3 = Me).

Fig. 2 Chemical complexity created by physical fluctuations. a Relative changes of temperature (in blue) and pH (in red) are shown for each synthetic step for the continuous synthesis of purine RNA building blocks from small organic and inorganic molecules. Several wet–dry cycles establish fluctuations of the depicted physical parameters that enable the physical enrichment of intermediates. Gray backgrounds denote compounds that are enriched by crystallization from an aqueous solution. b Formation of an organic salt consisting of amidine derivatives 2a-d and (hydroxyimino)malononitrile 3. The salt is selectively crystallized by concentrating a dilute mixture of organic and inorganic compounds by slow evaporation. The crystal structures of the four crystallized organic salts are depicted (Supplementary Tables 1–4).

Nitroso-pyrimidine formation

When the obtained salts containing 2a-d and 3 are subsequently heated to their respective melting temperatures, transformation into the corresponding nitroso-pyrimidines (4a-d, Fig. 3a) occurs. The required temperatures between 110 and 160 °C could have been readily accessible under early Earth conditions, due to, for example, volcanic activity in geothermal fields or sunlight shining on dark surfaces. In order to investigate whether the nitroso-compounds 4a-d would form in parallel despite their varying structures and different melting points, the different salts were combined in a reaction flask and a temperature gradient (1 °C/5 min, from 100–160 °C) was applied to simulate soil that would slowly heat up. Subsequent 1H-NMR analysis indicated successful formation of the anticipated nitroso-pyrimidines 4a-d (Supplementary Fig. 1).
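For orientation, the timescale of the heating experiment follows directly from the quoted ramp rate. The following minimal sketch assumes that the gradient is applied as discrete 1 °C steps, each held for 5 min; the source only states the rate (1 °C/5 min) and the range, so the stepwise interpretation and the function name are assumptions made for illustration.

```python
def ramp_schedule(start_c=100, end_c=160, step_c=1, hold_min=5):
    """Return (elapsed_minutes, temperature_C) pairs for a stepwise ramp.

    Assumption: the ramp is modelled as discrete steps of `step_c` degrees,
    each held for `hold_min` minutes.
    """
    n_steps = (end_c - start_c) // step_c
    return [(i * hold_min, start_c + i * step_c) for i in range(n_steps + 1)]


schedule = ramp_schedule()
total_minutes = schedule[-1][0]

# Time at which the lower end of the reported melting range (110 °C) is reached.
minutes_to_110 = next(t for t, temp in schedule if temp >= 110)

print(f"total ramp time: {total_minutes} min ({total_minutes / 60:.1f} h)")
print(f"110 °C reached after {minutes_to_110} min")
```

With these numbers the full ramp takes 300 min (5 h), and the lowest reported melting temperature is reached after 50 min, so the salt mixture spends most of the experiment within the 110–160 °C melting window.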

Fig. 3 Reaction scheme and physical enrichment of intermediates. a Dry-state reactions of salts containing 2a-d and 3 provide nitroso-pyrimidines 4a-d, which can be further diversified by hydrolysis (red arrows) or aminolysis (blue arrows) to give a set of nitroso-pyrimidines (nitrosoPys) 4a-i. In the presence of elemental Fe and Ni and dilute formic acid, formation of the formamidopyrimidines (FaPys) 5a-h as direct purine base precursors takes place. In square brackets: non-isolated reaction intermediates. b Second physical enrichment of the nitroso-pyrimidines isolated in high purity and yield. c Third physical enrichment of the formed FaPys 5a-h as nucleoside precursors from nitroso-pyrimidines.

The resultant nitroso-pyrimidines are stable compounds with melting points typically >250 °C without decomposition. In addition, we noted that the nitroso-pyrimidines are rather insoluble in water, which offers the possibility for a second physical enrichment step. Addition of water to the reaction mixture dissolves unreacted starting materials, leaving the nitroso-pyrimidines behind in essentially NMR-pure form (Supplementary Fig. 2). In this model, one wet–dry cycle and two physical enrichment steps with a final rain shower or flooding would be sufficient to deposit a mixture of stable nitroso-pyrimidines (4a-d) in excellent purities and good chemical yields between 60 and 85% (Fig. 3a).

Diversification by hydrolysis and aminolysis

Depending on the composition and pH of the aqueous environment, which may or may not contain different amines, the nitroso-pyrimidines could undergo further hydrolysis and aminolysis reactions (Fig. 3a). Because these reactions are very slow under neutral conditions, we used dilute HCl to accelerate the processes for investigation. Importantly, we noted a high regioselectivity. Upon treatment overnight at room temperature with 0.5 M HCl, compounds 4a and 4c, for example, are hydrolyzed to afford the oxo-nitroso-pyrimidines 4f and 4i in near-quantitative yields. Hydrolysis of 4b to product 4e was comparatively slower, and under our accelerated conditions a mixture of 4b and 4e was obtained. This inefficient conversion would be advantageous in a prebiotic context, given that the canonical nucleoside adenosine (A) and its 2-thiomethyl derivative (ms2A) are later derived from 4b, whereas 4e gives rise to guanosine derivatives (G, m2G, m22G, Fig. 3a). This allows for the simultaneous formation of canonical and non-canonical bases from the same precursor. In contrast to the 2-amino (4a,c) or 2-methyl (4d) substituted nitroso-pyrimidines, we noted that the 2-thiomethyl functionality in 4b and 4e was prone to undergo selective nucleophilic substitution. Reaction of 4e with different amines leads to efficient formation of the nitroso-pyrimidines 4g-i with the concomitant release of methanethiol. Because 4b is insoluble under basic conditions, its nucleophilic substitutions are very inefficient. To confirm this, we partially hydrolyzed 4b to 4e in the presence of methylamine (300 mM) and dimethylamine (100 mM). The pH was carefully adjusted with Na2CO3 to about pH 10. Compound 4b precipitated, while 4e stayed in solution, consequently protecting 4b from further reactions. In this context it is interesting that nucleosides that would form via aminolysis of 4b have not yet been found in nature. In contrast, 4e reacts efficiently, and after 3–4 days at room temperature it is almost completely converted into 4g and 4h, which are direct precursors to the ubiquitous non-canonical RNA bases m2G and m22G (Fig. 3a, Supplementary Fig. 3).

Thus, a few simple chemoselective and regioselective hydrolysis and aminolysis reactions afford a diverse mixture of differently substituted nitroso-pyrimidines (4b-d, f-i), all of which possess the right substitution pattern for the synthesis of naturally occurring canonical and non-canonical RNA nucleosides. Because all the formed nitroso-pyrimidines are poorly soluble in water at neutral pH, neutralizing the solutions leads to their efficient precipitation, providing a naturally occurring purification step (Fig. 3b). Importantly, all nitroso-pyrimidines that later give adenosine-derived nucleosides (4b-d) are soluble in water under acidic conditions, while the nitroso-compounds that are converted into guanosine-derived nucleosides (4g-i; 4f is the exception) are soluble under basic pH conditions. These properties allow for potentially divergent chemical pathways leading to A-derived and G-derived nucleosides (Fig. 3b, Supplementary Fig. 4).
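The pH-dependent separation described in this paragraph can be summarized as a simple lookup. The sketch below is a qualitative illustration only: the conditions are treated as three categories ("acidic", "neutral", "basic") because the text gives no numeric pH thresholds, and compound 4f is assumed to stay in the precipitate under all three conditions because the text excludes it from the base-soluble group and does not state its behaviour in acid. All names in the code are hypothetical.

```python
# Solubility groups as stated in the text.
ACID_SOLUBLE = {"4b", "4c", "4d"}  # later give A-derived nucleosides
BASE_SOLUBLE = {"4g", "4h", "4i"}  # later give G-derived nucleosides
# 4f is deliberately in neither set (assumption: behaviour not specified).


def partition(mixture, condition):
    """Split a mixture of nitroso-pyrimidines into (dissolved, precipitate).

    `condition` is one of "acidic", "neutral" or "basic"; at neutral pH all
    compounds are treated as poorly soluble, as described in the text.
    """
    soluble_sets = {"acidic": ACID_SOLUBLE, "neutral": set(), "basic": BASE_SOLUBLE}
    if condition not in soluble_sets:
        raise ValueError(f"unknown condition: {condition!r}")
    soluble = soluble_sets[condition]
    dissolved = sorted(c for c in mixture if c in soluble)
    precipitate = sorted(c for c in mixture if c not in soluble)
    return dissolved, precipitate


mixture = ["4b", "4c", "4d", "4f", "4g", "4h", "4i"]
for condition in ("acidic", "neutral", "basic"):
    dissolved, precipitate = partition(mixture, condition)
    print(condition, "| dissolved:", dissolved, "| precipitate:", precipitate)
```

In this simplified picture an acidic wash carries away the A-directed precursors, a basic wash carries away most of the G-directed precursors, and neutralization returns everything to the solid phase.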

Formamidopyrimidine formation as nucleobase precursor

The next wet–dry cycles allow for the formation and isolation of formamidopyrimidines (FaPys) 5a-h from nitroso-pyrimidines 4, which after their precipitation are exposed to acidic conditions such as dilute formic acid in the presence of elemental Fe or Ni, both components of the Earth’s crust. This leads to reduction of the nitroso-pyrimidines 4 to aminopyrimidines as non-isolated reaction intermediates (Fig. 3a, in square brackets), which are immediately formylated to give the water-soluble FaPys 5a-h in a one-pot reaction. During the wet phase, Ni0 and Fe0 are converted into the biologically relevant Ni2+/Fe2+ ions, while formic acid decomposes into CO2 and H2 (Fig. 3c). Formic acid has a dual function in the reaction: it provides the H-atoms needed for the reduction, and it subsequently reacts with the formed aminopyrimidines to give FaPy compounds that were already shown to be prebiotically valid precursors to purine nucleosides18. The Ni/Fe/formic acid environment quantitatively converts all nitroso-compounds 4b-d,f-i into the corresponding FaPy compounds 5a-h (Fig. 3a). The FaPy compounds, which are water soluble under dilute basic conditions, can now be separated from unreacted Ni0/Fe0 and from the formed Ni2+/Fe2+ byproducts. Under slightly basic conditions (pH ≈ 9–10) the latter precipitate as insoluble carbonate or hydroxide salts. The FaPys 5a-h are thus washed away, while the transition metal compounds sediment out. Final evaporation of water concentrates the reaction mixture, leading to the crystallization of the FaPy molecules. This third physical enrichment step, involving a wet–dry cycle, leads to the NMR-clean formation of FaPy-derivatives 5a-h (Fig. 3c).
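The transformations described in this paragraph can be written schematically as balanced equations. These are generic textbook stoichiometries given here for orientation and are not taken from the original work: M stands for Fe or Ni, R for the pyrimidine core, and [H] for reducing equivalents, whose precise origin and mechanism are not specified beyond the statement that formic acid supplies them.

```latex
\begin{align*}
\mathrm{M^{0} + 2\,HCOOH} &\longrightarrow \mathrm{M^{2+} + 2\,HCOO^{-} + H_{2}}
  && \text{(metal oxidation, M = Fe, Ni)}\\
\mathrm{HCOOH} &\longrightarrow \mathrm{CO_{2} + H_{2}}
  && \text{(formic acid decomposition)}\\
\mathrm{R{-}N{=}O + 4\,[H]} &\longrightarrow \mathrm{R{-}NH_{2} + H_{2}O}
  && \text{(nitroso reduction)}\\
\mathrm{R{-}NH_{2} + HCOOH} &\longrightarrow \mathrm{R{-}NH{-}CHO + H_{2}O}
  && \text{(formylation to the FaPy)}
\end{align*}
```

Read together, the last two lines reflect the dual role of formic acid noted above: it is the source of the hydrogen used in the reduction and the formyl donor in the subsequent step.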

After treatment with formic acid and elemental Ni, 2-(methylthio)-5-nitrosopyrimidine-4,6-diamine (4b) gives two different FaPy products depending on the reaction conditions. One of the products (5b) contains a thiomethyl group, while the other (5c) is desulfurized. The desulfurization reaction is controlled simply by time and can be promoted when H2 is bubbled through the solution prior to reaction. Compound 5c is always generated in a stepwise reaction cascade via compound 5b, which was confirmed by reacting 4b for 2 h and isolating the only product formed (5b, Fig. 3c). The isolated product was immediately subjected to the same conditions, which provided 5c in pure form after 7 days. This pathway via nitroso-pyrimidines thus affords 5c, the precursor for the canonical base A, under plausible prebiotic conditions20. These conditions also lead to the parallel formation of the precursor to the ubiquitous 2-thiomethyl modification (ms2A), which is today found in all three domains of life.

Formation of canonical and non-canonical nucleosides

All of the prepared FaPy compounds undergo rapid and regioselective condensations with ribose when they are present in the same dry-state environment (Fig. 4). We do not assume that ribose was formed at the same location together with the FaPy compounds, since the required carbohydrate chemistry may be incompatible. Several models are available, however, that show ribose formation in different physical environments21,22,23,24. Even though ribose and FaPys might have formed separately, the water solubility of the FaPys and of ribose allows them to be washed into the same environment by rain or flooding. Evaporation of water in the last wet–dry cycle would enable a condensation reaction under dry-state conditions. Indeed, the physically enriched FaPy compounds (5a-h) engage in a rapid reaction with ribose to give the corresponding FaPy-ribosides. Upon dissolution in water and subsequent heating under basic conditions, all four expected purine α/β-ribofuranosides (α/β−f) and α/β-pyranosides (α/β−p) are obtained (6a-h, Fig. 4a), completing the last wet–dry cycle. The LC-MS traces of the reactions using both UV- and MS-detection are shown in Fig. 4b. To ensure correct structural assignment we chemically synthesized some of the expected products and performed co-injection studies (Supplementary Methods). These experiments show that the major isomers are the naturally occurring β-configured pyranosides and furanosides. Pyranosides are building blocks for pyranosyl-RNA, which was suggested to be a potential RNA predecessor12. Therefore, our scenario delivers the building blocks for this pre-RNA and for RNA. As such it provides the basis for the chemical transition from one genetic polymer to the other directed by selection pressure. Importantly, our continuous synthetic pathway provides, next to the canonical bases A and G, also the non-canonical β-furanosyl-nucleosides (β−f) m2G, m22G, m1G, ms2A and m2A (in red, Fig. 4b), arguing that the early RNA polymer was already structurally more complex with regard to the nucleobases. The ribosylation of the FaPys leading to non-canonical nucleosides is as efficient as the formation of A and G, with yields between 15 and 60%. Interestingly, we noted that for some A derivatives (m2A, DA and A) other regioisomers were found as well. These isomers were not formed when pure FaPy starting materials were used that were not derived from our continuous synthesis. We believe that these isomers might be the N3-connected nucleosides, previously proposed by Wächtershäuser for homo-purine RNA25. Despite the presence of these side products, we observe efficient N9-nucleoside formation with remarkable yields of up to 60% for the canonical and the non-canonical nucleosides. This work demonstrates that the non-canonical compounds could plausibly have formed as companion and potential competitor compounds in parallel to the canonical nucleosides.