Phosphate is essential for all living systems, serving as a building block of genetic and metabolic machinery. However, it is unclear how phosphate could have assumed these central roles on primordial Earth, given its poor geochemical accessibility. We used systems biology approaches to explore the alternative hypothesis that a protometabolism could have emerged prior to the incorporation of phosphate. Surprisingly, we identified a cryptic phosphate-independent core metabolism producible from simple prebiotic compounds. This network is predicted to support the biosynthesis of a broad category of key biomolecules. Its enrichment for enzymes utilizing iron-sulfur clusters, and the fact that thermodynamic bottlenecks are more readily overcome by thioester rather than phosphate couplings, suggest that this network may constitute a “metabolic fossil” of an early phosphate-free nonenzymatic biochemistry. Our results corroborate and expand previous proposals that a putative thioester-based metabolism could have predated the incorporation of phosphate and an RNA-based genetic system.

The major finding we report below is the discovery of a phosphate-independent core metabolism hidden within this biosphere-level network. This core protometabolism is capable of supporting the synthesis of a broad set of biomolecules, including several amino acids and carboxylic acids. Statistical analysis of the physicochemical properties of enzymes within this network shows an enrichment for iron-sulfur and transition metal coenzymes. By broadening our analysis of protometabolism with the inclusion of different types of coenzyme precursor couplings, we further show that thioesters, rather than phosphate, could have enabled this core metabolism to overcome energetic bottlenecks, supporting the feasibility of a metabolically rich thioester-based early biochemistry.

Here we address these questions using computational systems biology approaches originally developed for performing large-scale analyses of complex metabolic networks (). Similar approaches have been previously used to describe the biosphere-level metabolic changes that accompanied the transition to an oxic atmosphere, about 2.2 billion years ago (). Specifically, we use these and other computational methods to systematically study the size, architecture and physicochemical properties of phosphate-independent biochemical networks. Given that our goal is to shed light on processes that predate the estimated last universal common ancestor (LUCA) (), and given the long-term reshuffling of genes among organisms through horizontal-gene transfer, we focused our analysis on a global, biosphere-level biochemical network, which encompasses all known metabolic reactions across all organisms. In exploring the prebiotic relevance of metabolic reactions that in extant life are catalyzed by highly evolved, efficient, and specific protein-based enzymes, we implicitly formulate the hypothesis that many of such reactions could have been initially catalyzed to a much weaker and less specific extent by a number of small molecules. Such a hypothesis in itself is not new to origin of life research () and is supported by a large body of literature, both pertaining to individual small-molecule catalysts and reactions (), as well as to whole networks ().


The thioester world hypothesis, and other phosphate-independent protometabolism models, are typically invoked to explain the prebiotic plausibility of general biochemical mechanisms, and are illustrated through specific reactions or pathways. Could systems biology approaches help achieve a more systematic and quantitative understanding of the biosynthetic potential of a putative pre-phosphate metabolic network? Is it at all possible that a phosphate-independent geochemical setting could support the emergence of a rich and complex organized biochemistry?

The alternative solution to this dilemma is that primitive forms of life could have initially emerged and endured without a major dependence on phosphate. Multiple scenarios for early metabolic pathways that do not rely on phosphate have been proposed (). In many of these scenarios, sulfur and iron are conjectured to have fulfilled major catalytic and energetic functions prior to the appearance of phosphate. Most notably, in the thioester world scenario (), thioesters are hypothesized to have played a role similar to the one played today by ATP. Thioesters are widespread in modern metabolism, primarily as Coenzyme A (CoA) derivatives (e.g., Acetyl-CoA), and are used as condensing agents, enabling the synthesis of heterogeneous biopolymers.

The ensuing dilemma of phosphate’s high importance in spite of its poor bioavailability is particularly challenging for early life, as primordial protocells would have needed both a readily available phosphate source and a simple mechanism for early phosphate acquisition. Currently, there is no consensus on a phosphate source in early life, with theories ranging from acid-mediated ion solubilization and high concentrations of reduced phosphorus species in early oceans to accumulation during the late heavy bombardment. Even provided a phosphate source, the mechanisms of phosphate utilization and polymerization in early life remain debated.

Among the many unanswered questions on life’s origin, the enigma of how phosphate ended up playing a prominent role in cellular biochemistry has been puzzling scientists for decades (), resurfacing in recent years in light of novel discoveries (). Phosphate is present in a large proportion of known biomolecules. It is an essential component of biochemical energy transduction (most notably through ATP), cofactors such as NADH, and information storage (in DNA and RNA polymers) (). However, phosphate is geochemically scarce and difficult to access, often serving as the limiting nutrient in a variety of modern ecosystems (). Phosphate is found in terrestrial and marine ecosystems, tightly complexed with rocks and minerals, requiring mechanisms for environmental extraction and transport ().

While most research on the evolution of living systems has been focused on sequences and genomes, some answers to fundamental questions about the emergence of life may be hidden in the architecture of the complex biochemical reaction networks that sustain the cell. The field of metabolic network modeling and analysis is expanding as a major research area of relevance to multiple applications. However, the use of such techniques to address fundamental questions on the emergence of living systems is still largely unexplored.

A larger metabolic network may have been reachable if phosphate-free versions of modern day coenzymes drove several primordial reactions. Like CoA, many modern day coenzymes contain nucleotide phosphate groups that are important for enzyme-binding but not directly involved in catalysis. For example, the redox coenzyme NAD contains adenine and phosphate, but facilitates electron transfer at the nicotinamide moiety (Figure S4A). By substituting CoA with pantetheine and implicitly assuming that oxidoreductase reactions could be coupled to primitive electron donors/acceptors instead of NAD(P)/FAD (Figures S4B and S4C), we found that the core network expands to nearly three times more metabolites (814), incorporating five more amino acids (K, R, L, V, and P), uracil and ribose (Figure 4; Tables S4A and S4B). Addition of these five amino acids to the repertoire would have enabled broader catalytic capabilities, and paved the way for increased richness of peptides, once suitable and energetically supportable mechanisms for peptide synthesis became available (perhaps initially driven by thioesters themselves). Further, the formation of pyrimidines, pentoses and vitamins could have set the foundation for the assembly of nucleotide triphosphates and modern coenzymes upon the addition of phosphate (Table S4C).

We removed all phosphate-dependent reactions from biosphere-level metabolism, and identified the largest connected subnetwork. The gray lines represent reactions that are unreachable without phosphate, meaning there is no seed set capable of recovering this portion of the network without phosphate. The red lines are the reactions belonging to the phosphate-independent core network (identified in Figure 1 A) and the brown lines are reactions that are not included in the core, but still accessible from a phosphate-free seed. The blue lines correspond to phosphate-free reactions coupled to Coenzyme A, while the purple lines are the phosphate-free reactions coupled to nicotinamide or flavin coenzymes.
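The filtering step described in this caption can be illustrated with a minimal sketch. It assumes a toy reaction list with placeholder metabolite names (not real KEGG entries) and assumes that two metabolites count as connected when they take part in the same reaction; the function name and the connectivity rule are illustrative choices, not the procedure used in the study.

```python
from collections import defaultdict, deque


def largest_phosphate_free_component(reactions, has_phosphorus):
    """Drop reactions touching phosphorus compounds; return the largest component.

    reactions: list of (substrates, products) pairs of metabolite names.
    has_phosphorus: set of metabolite names that contain phosphorus.
    """
    kept = [
        (subs, prods) for subs, prods in reactions
        if not (set(subs) | set(prods)) & has_phosphorus
    ]
    # Assumption: metabolites sharing a reaction are neighbors.
    neighbors = defaultdict(set)
    for subs, prods in kept:
        members = set(subs) | set(prods)
        for m in members:
            neighbors[m] |= members - {m}

    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search over the metabolite graph
            for nxt in neighbors[queue.popleft()] - component:
                component.add(nxt)
                queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy network; "ATP" and "ADP" stand in for phosphorus compounds.
toy_reactions = [
    ({"A"}, {"B"}),
    ({"B"}, {"C"}),
    ({"C", "ATP"}, {"D", "ADP"}),  # removed: involves phosphorus
    ({"X"}, {"Y"}),
]
core = largest_phosphate_free_component(toy_reactions, {"ATP", "ADP"})
print(sorted(core))
```

In this toy case the phosphorylation step is discarded, so D is never linked to the A-B-C component, and the smaller X-Y component is ignored.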

(C) For different maximum allowable concentration ratios between oxidized and reduced species (x axis) we counted the number of half reactions that overlapped with the reduction potential of NADH (y axis). The colored lines represent different temperatures. For example, if a 100-fold concentration ratio was permitted, approximately 200 non-phosphate half-reactions could have sufficed as plausible substitutes for NAD(P).

(B) Plausible substitutes for NADH. All half reactions representing a two-electron reduction via hydrogen transfer (i.e., Oxidized + 2H⁺ + 2e⁻ → Reduced) were computed using the KEGG reaction pairs database. For approximately 3/4 of all suitable half reactions (712/947), a standard reduction potential could be estimated using group contribution estimates of free energies of formation for KEGG metabolites. The red line marks the standard reduction potential of NAD(P)/NAD(P)H.

(A) Phosphates in modern day coenzymes. The phosphates in Coenzyme A and NADH play no role in catalysis, while the phosphates in ATP are pivotal in catalysis.
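The overlap count described for panel (C) can be sketched with the Nernst equation, E = E° + (RT/nF) ln([ox]/[red]). The sketch below is an illustration under stated assumptions, not the authors' exact criterion: it assumes that both the candidate couple and NAD may vary their oxidized/reduced ratio between 1/r and r, so that their reachable potential ranges overlap when the standard potentials differ by at most twice the Nernst shift. The value −0.320 V is the commonly tabulated NAD⁺/NADH potential at pH 7, and the list of candidate potentials is hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol
E0_NAD = -0.320   # V, NAD+/NADH at pH 7 (commonly tabulated value)


def nernst_window(ratio, temperature_k=298.15, n_electrons=2):
    """Largest potential shift (V) for an ox/red ratio within [1/ratio, ratio]."""
    return (R * temperature_k) / (n_electrons * F) * math.log(ratio)


def count_overlapping(potentials, ratio, temperature_k=298.15):
    """Count half reactions whose reachable potential range overlaps NAD's."""
    window = nernst_window(ratio, temperature_k)
    # Assumption: each couple may shift by +/- window, so two ranges
    # overlap when the standard potentials differ by at most 2 * window.
    return sum(1 for e0 in potentials if abs(e0 - E0_NAD) <= 2 * window)


# Hypothetical standard potentials (V) for candidate half reactions.
candidates = [-0.45, -0.40, -0.33, -0.25, 0.10]
for r in (10, 100):
    print(r, count_overlapping(candidates, r))
```

For a two-electron couple at 298.15 K, a 100-fold ratio corresponds to a shift of about 59 mV, which is why permitting larger ratios admits more candidate couples.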

Could thioester chemistry () serve as a solution to this energetic conundrum? Thioesters, proposed to have served as ancient condensing agents (), are widespread throughout central metabolic processes (e.g., Coenzyme A [CoA] derivatives in TCA cycle and lipid biosynthesis) and can facilitate energy-rich group transfer. While CoA contains phosphate, this serves mainly as a structural component with no catalytic role, motivating the hypothesis that ancient reactions may have relied on pantetheine, the simpler, phosphate-free variant of CoA () thought to be available in prebiotic environments (). We explored the energetic consequences of a primitive thioester-based reaction coupling scheme by substituting pantetheine for CoA in modern CoA-coupled reactions, followed by adding pantetheine into the seed set ( Figure 3 A, see STAR Methods ). These changes caused a 33 kJ/mol reduction in the bottlenecks that limited network expansion, enabling the viability of alternative metabolic pathways under physiologically realistic conditions () ( Figure 3 B, red line). Interestingly, these bottlenecks could not be easily overcome through an alternative phosphate-based coupling scheme, in which NTP-coupled reactions are substituted with either pyrophosphate or acetyl-phosphate ( Figure 3 B, blue line, Figure S3 ). The uniqueness of this behavior is also emphasized by the fact that removal of elements other than phosphate (e.g., sulfur or nitrogen) would dramatically limit the possibility of expansion ( Figure S3 ).

(A and B) This analysis aims at testing the uniqueness of the feasibility of a phosphorus-free network, in comparison to other hypothetical scenarios in which other atoms are missing from the initial seed set. Specifically, we compare the size of the expanded network under elimination of phosphorus, sulfur or nitrogen, with and without the thermodynamic feasibility constraints. Network expansion was performed for all KEGG reactions using a seed set without sulfur (no H2S and pantetheine, green bars), phosphate (no pyrophosphate, blue bars), or nitrogen (no ammonia, nitrogen gas, and pantetheine, pink bars) (see also Venn diagram for specific seeds and atomic compositions). The left set of bars represents the size of expanded networks without imposition of thermodynamic constraints, while the right plot shows the network sizes when thermodynamic feasibility is imposed. It can be seen that removal of sulfur, without thermodynamic constraints, gives rise to a larger network relative to the P-free core network, due to the appearance of sugars and phosphosugars. However, when taking into account thermodynamic feasibility, the phosphate-independent network is the only one that can reach a large size. Thus, a thermodynamically feasible network expansion is conditional on the presence of sulfur and nitrogen, but not phosphate. This observation is in line with broad consensus regarding the prebiotic availability of sulfur, as compared to the uncertain and debated prebiotic availability of phosphorus.

So far, our analysis was focused on the core network structure, ignoring possible energetic constraints. In extant metabolism, phosphate-mediated group transfer plays a key role by driving unfavorable or energetically uphill reactions. To investigate the energetic consequences of phosphate unavailability, we implemented a thermodynamically constrained network expansion algorithm, which blocks endergonic reactions with standard molar free energies above a cutoff value, τ (Figure 3B, black line). The network becomes dramatically limited to <12% of the core network when τ remains below 55 kJ/mol, preventing the condensation of oxalate and acetate to yield oxaloacetate. Energetic constraints of this magnitude would have prohibited the expansion of an early metabolism, given plausible ranges of intracellular metabolite concentrations (see STAR Methods). Consequently, a mechanism to overcome these thermodynamic bottlenecks would be essential for a phosphate-independent metabolism.
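A minimal sketch of a thermodynamically constrained expansion follows. It assumes each reaction direction is listed separately with a standard free energy in kJ/mol, and that reactions without an estimate (`None`) are treated as available; the metabolite names, free energies, and function name are hypothetical and serve only to illustrate the role of the cutoff τ.

```python
def constrained_expansion(reactions, seed, tau):
    """Expand a seed set using only reactions with dG <= tau (kJ/mol).

    reactions: list of (substrates, products, dG) triples; dG may be None
    when no estimate exists, in which case the reaction is kept.
    """
    allowed = [
        (subs, prods) for subs, prods, dg in reactions
        if dg is None or dg <= tau
    ]
    scope = set(seed)
    changed = True
    while changed:
        changed = False
        for subs, prods in allowed:
            # Fire a reaction when all substrates exist and it adds something new.
            if set(subs) <= scope and not set(prods) <= scope:
                scope |= set(prods)
                changed = True
    return scope


# Hypothetical toy network with an uphill step B -> C.
toy = [
    ({"A"}, {"B"}, -10.0),
    ({"B"}, {"C"}, 60.0),   # blocked unless tau >= 60
    ({"C"}, {"D"}, None),   # no free energy estimate: assumed available
]
print(sorted(constrained_expansion(toy, {"A"}, tau=30.0)))
print(sorted(constrained_expansion(toy, {"A"}, tau=70.0)))
```

A single blocked uphill step is enough to cut off everything downstream of it, which is the sense in which such reactions act as bottlenecks.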

(B) Network expansion from the core seed set was performed after constraining reversibility of reactions exceeding a thermodynamic threshold. For each value of this threshold (x axis) we plot the size (black line) of the final expanded network, in terms of the number of metabolites (y axis). The effect of thioester coupling was simulated by adding a Coenzyme A substitute (pantetheine) to the seed set (red line). For comparison, a phosphate-coupled network was simulated by substituting nucleotide triphosphate-coupled phosphoryl-transfer reactions with pyrophosphate (or acetyl-phosphate) (blue line), followed by adding pyrophosphate (or acetyl-phosphate) to the seed set. Although significantly more metabolites are observed in the phosphate-coupled network with no thermodynamic barrier (due to the addition of sugars and phosphorylated intermediates), the network expansion process would not be thermodynamically feasible under physiologically realistic conditions (non-shaded region). Note that more than one third of reactions in KEGG lack a free energy estimate. In the main plot, all reactions with unknown free energies are assumed to be available (equivalent to assuming that they have a free energy barrier lower than the predefined threshold). Results are qualitatively very similar if all such reactions lacking free energy estimates are removed from the network (top left inset).

(A) Models of ancient coenzymes (bottom reactions), based on present-day versions (top reactions), were constructed to simulate the roles of thioesters and phosphates in models of ancient biochemistry (see STAR Methods ).

In addition to a biased coenzyme usage, we investigated other features that could be associated with an ancient protometabolic network. First, motivated by the notion that early catalysts may have been composed of smaller polypeptides relative to present day enzymes, we tested whether the enzymes in the core network are on average smaller than all genome-encoded enzymes. We found that sequences are considerably shorter for catalysts in the core network (median = 309) compared to all known metabolic enzymes (median = 354) (Figure 2C; one-tailed Kolmogorov-Smirnov test). Second, we checked whether enzymes in the core network are enriched, in their composition, for amino acids producible by the core network itself. Such enrichment would be consistent with the expectation of self-sustainability and homeostasis in a protometabolic network, whereby the network would be capable of producing the building blocks necessary for replenishment and accumulation of its catalysts. We found indeed that core network enzymes are more highly composed of the 10 amino acids found within the core network relative to all known metabolic enzymes (Figure 2D; one-tailed Kolmogorov-Smirnov test). One simple potential reason for this enrichment could be the known sequence bias in FeS-proteins for cysteine, both of which are present in the core phosphate-free network. However, we found no detectable enrichment for cysteine in our core network enzymes (one-tailed Kolmogorov-Smirnov test).

One fundamental property we focused on is the reliance of these enzymes on iron-sulfur or metal coenzymes, reflecting the notion that modern biochemistry emerged from mineral geochemistry and that metal-based cofactors in modern day enzymes represent a living relic of this contingency. Using a manually curated list of known protein-coenzyme pairs, we found that enzymes within the core network were enriched for both zinc and iron-sulfur-dependent coenzymes relative to the full network (Figure 2A, Table S3, Fisher’s exact test). For comparison, amino acid-derived coenzymes were observed with comparable frequencies in the core and full KEGG networks, while nucleotide-derived coenzymes (e.g., enzyme-bound FAD, TPP, molybdopterin) were slightly depleted among reactions in the core network, highlighting the coordination between nucleotide and phosphate biochemistry. The occurrence of metal-associated enzymes within the core network was independently corroborated by identifying protein structures with verifiable metal ligands in a separate database, allowing for the identification of KEGG reactions that rely on enzymes bound to metal ions. Out of the 47% (148/315) of the core network reactions with crystal structures available, 86% (127/148) relied on enzymes with a metal ligand, which constituted a significant enrichment relative to the full KEGG network (Figure 2B, Fisher’s exact test).
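The enrichment tests mentioned above can be illustrated with a one-sided Fisher's exact test written directly from the hypergeometric distribution. This is a minimal sketch: the core counts (127 with a metal ligand, 21 without) come from the text, whereas the background counts are hypothetical placeholders, since the corresponding numbers for the rest of the KEGG network are not given here.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: core reactions with the feature,   b: core reactions without it,
    c: other reactions with the feature,  d: other reactions without it.
    Returns P(X >= a) under the hypergeometric null.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, row1)


# Core counts from the text; background counts (1000, 600) are hypothetical.
p_value = fisher_enrichment(127, 21, 1000, 600)
print(p_value)

# Small sanity check that can be verified by hand: 17/70.
print(fisher_enrichment(3, 1, 1, 3))
```

The second call is a textbook-sized table included only so the arithmetic can be checked manually.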

For (B–D), we tested for enrichment within the phosphate-free core network enzymes compared to both the aerobic and anaerobic networks. Significance values are reported for the aerobic network, followed by the anaerobic network in parenthesis.

(C and D) Core network reactions relied more heavily on enzymes with metal cofactors relative to both the aerobic and anaerobic KEGG reactions. The enzymes in the core network are shorter (C) and biased in their amino acid composition (D) relative to either the aerobic or anaerobic KEGG network.

(B) Structural data support metal-protein enrichment in the core network. The number of reactions catalyzed by enzymes with available structural data was determined for all KEGG reactions using the MIPS database (). For all reactions with crystal structures available, we classified each reaction as either not having (−) a metal cofactor, or having (+) a metal cofactor. We tested for the enrichment of enzymes containing metal cofactors within the core network relative to both the aerobic KEGG network (green text), or the anaerobic network (parenthesis, blue text).

(A) The fraction of coenzyme-coupled KEGG reactions in the core network (red bars), the anaerobic KEGG network (blue bars) and the aerobic KEGG network (green bars) are compared. Each set of reactions is composed of a manually curated list of coenzyme-coupled reactions in KEGG (). We found that a significant number of reactions require iron-sulfur coenzymes (Fisher’s exact test, p < 0.05) and zinc (Fisher’s exact test, p < 0.05) within the core network relative to the aerobic KEGG network.

Taking a taxonomic approach, we found that enzymes in the core network are overrepresented within genomes (Monte Carlo permutation test). The core network is also enriched with enzymes (E.C. numbers) and protein folds (SCOP) previously identified as likely components of the proteome of the last universal common ancestor (LUCA) (Figure 1C, Fisher’s exact test), suggesting that a significant fraction of the reactions in this core network appeared in the earliest organisms. One limitation of using comparative phylogenetic analysis is that it only provides information as far back as LUCA. Furthermore, evolutionary processes like horizontal gene transfer and cataclysmic extinction events hamper the elucidation of LUCA’s metabolism with certainty. In order to investigate the pre-LUCA features of the phosphate-free core network, we examined the corresponding enzymes in terms of their basic physicochemical properties, with special attention given to properties proposed to be associated with ancient metabolism.

Is there independent evidence that this core phosphate-free network may indeed resemble the very early stages of biochemical processes? The plausibility of this early metabolism relies on the notion that catalysts for these reactions would initially have been much different than they are today, including assortments of short prebiotically-formed peptides (), metal-ion cofactors (), mineral catalysts (), or iron-rich clays (). Such initial catalysts would have been gradually replaced by longer and more complex genome-encoded protein-enzymes, potentially still retaining properties or components of the early catalysts (). Thus, we performed multiple analyses to test whether current enzymes within this network contain taxonomic, sequence, and biochemical signals pointing to potential associations with early modes of catalysis.

This core, phosphate-independent network is significantly enriched with reactions within primary metabolic pathways such as amino acid biosynthesis, pyruvate metabolism, glyoxylate/dicarboxylate metabolism, and the TCA cycle, as well as intermediary metabolic pathways such as C5-branched dibasic acid metabolism (Figure 1B, Fisher's exact test, Benjamini-Hochberg procedure, FDR < 0.05; Table S2C). Further analysis showed significant enrichment for metabolites/reactions involved in various carbon fixation pathways, including the dicarboxylate-hydroxybutyrate cycle, the hydroxypropionate bi-cycle, and the reductive TCA cycle (Table S2D), which has been previously proposed as a primitive carbon fixation pathway in ancient autotrophs. Enrichment for reactions involved in heterotrophic carbon utilization was also observed within pathways for one-carbon (serine pathway) and two-carbon assimilation (Krebs cycle, methylaspartate, and glyoxylate cycle) (Table S2D). In addition to a diverse central carbon metabolism, half of the proteinogenic amino acids (G, A, D, N, E, Q, S, T, C, and H) are producible, representing six of the ten amino acids observed in the Miller-Urey experiment. In this network, building upon a core carbon, energy and nitrogen metabolism, hydrogen sulfide enables the production of sulfur-containing heterogeneous peptides like glutathione, as well as thioester derivatives like S-formyl and S-succinyl glutathione. Intermediates in the degradation and biosynthesis of more complex biomolecules are also observed; 5,6-dihydrouracil is a reduced catabolic product of uracil and pyrrole is the basic building block for complex heterocyclic aromatic rings like heme (Figure 1A). Thus, we report the existence of a phosphate-independent core metabolic network reachable from simple putative prebiotic compounds.
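The Benjamini-Hochberg procedure named above controls the false discovery rate across many pathway-level tests. A minimal sketch of the adjustment follows; the four p-values are hypothetical and stand in for per-pathway Fisher's exact test results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-pathway p-values.
pathway_p = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(pathway_p)
significant = [q < 0.05 for q in q_values]
print(q_values, significant)
```

A pathway is then reported as enriched when its adjusted value falls below the chosen FDR level (0.05 in the text).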

We started by searching for regions of global metabolism that could be accessible starting from simple molecules likely to have been geochemically abundant on early Earth ( Figure 1 A). To this end we adopted the network expansion algorithm, which simulates the emergence of metabolic networks from a predefined set of compounds (). The algorithm adds metabolites and reactions to an initial seed set, iteratively asking whether any new reaction could take place given the available substrates, until convergence to a final set of reactions and metabolites (or “scope”) (see STAR Methods ). This algorithm is seed-set dependent, typically resulting in the recovery of a subset of reactions/metabolites within a defined metabolic network ( Figure S1 ). Network expansion was performed with a seed set of eight compounds thought to have been available in prebiotic environments, notably lacking phosphate ( Figure 1 A, STAR Methods ) (). Importantly, the set of seed molecules we defined contains simple carboxylic acids in the form of acetate and formate, which could be provided by either an abiotic mechanism or a primitive pathway for carbon fixation (e.g., a primitive variant of the Wood-Ljungdahl pathway [] or the reductive TCA cycle [], see also Discussion ). The resulting scope of this seed set consists of a fully connected network of 315 reactions and 260 metabolites ( Figure 1 A; Tables S2 A and S2B), the composition of which is robust to variations of the seed set compounds ( Figures S1 and S2 ). Although this network requires the addition of catalytically accessible carbon, nitrogen and sulfur sources ( Figure S1 ), acetate and formate were substitutable by several alternative carboxylic acids like pyruvate ( Figure S2 ).
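The iterative procedure described above can be sketched in a few lines. This is a minimal illustration, assuming each reaction direction is listed as a (substrates, products) pair; the metabolite labels are placeholders rather than real KEGG compounds, and the function name is chosen here for illustration.

```python
def network_expansion(reactions, seed):
    """Return (scope, fired) for a list of (substrates, products) pairs.

    A reaction fires once all of its substrates are in the scope; its
    products then join the scope. Iteration stops when nothing new fires.
    """
    scope = set(seed)
    fired = set()
    changed = True
    while changed:
        changed = False
        for i, (substrates, products) in enumerate(reactions):
            if i not in fired and set(substrates) <= scope:
                fired.add(i)
                scope |= set(products)
                changed = True
    return scope, fired


# Toy network with placeholder metabolites; "P" plays the role of a
# compound that is absent from the seed and never produced.
toy_reactions = [
    ({"A", "B"}, {"C"}),    # fires directly from the seed
    ({"C", "N"}, {"D"}),    # fires once C exists
    ({"D"}, {"E", "A"}),    # fires once D exists
    ({"P", "C"}, {"F"}),    # never fires: P is unavailable
]
toy_seed = {"A", "B", "N"}

scope, fired = network_expansion(toy_reactions, toy_seed)
print(sorted(scope))   # metabolites reachable from the seed
print(sorted(fired))   # indices of reactions that could fire
```

Because reactions are only ever added, the final scope does not depend on the order in which reactions are visited, only on the seed set and the reaction list.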

(D) Scatter plot of the number of reactions (x axis) in each network expansion versus the fraction of the core network embedded in the final network (y axis). All large networks are greater than the 315 reactions obtained using acetate (black line), indicating that expansion from acetate represents a suitable lower bound for a phosphate-independent core metabolism. It should be noted that larger networks (> 350 reactions) resulted from carbohydrate sources in the seed (glucose), while slightly smaller networks were generated from carboxylic acids (acetate, oxaloacetate).

(C) Empirical CDFs for the number of carbons in the seed set that give rise to small (black dashed line) and large (red continuous line) networks. Large networks were generated more frequently from smaller carbon substrates (two-tailed Kolmogorov-Smirnov test, p < 10⁻⁴¹).

(B) Empirical CDFs for the average degree of reduction per carbon atom (y/x for substrate C, where xCO2 + yH2 → C + zH2O, see main text) for small (black dashed line) and large (red continuous line) networks. It can be seen that more highly oxidized carbon substrates led, with increased frequency, to larger networks (two-tailed Kolmogorov-Smirnov test, p < 10).

Network expansion was repeated with acetate and formate (see Figure 1 , main text) replaced by a single organic compound. This process was repeated for each KEGG molecule composed exclusively of C, H and O.

10⁴ random samples of size k = 8 metabolites were chosen as seeds for network expansion. For each sample, at least one molecular species was required to contain each of the following elements: C, H, O, N and S. Network expansion was performed using each randomly assembled seed set. For each simulation, the final number of reactions was recorded (x axis). Next, the fraction of the core network recovered after network expansion was computed for each seed set (y axis). The color of each point represents the fractional abundance of carbon atoms in the scope of the simulation, highlighting the molecular heterogeneity between simulations. The positive correlation between the network size and the fraction of the core phosphate-free network recovered suggests that large (> 250 reactions) networks without phosphate contain a substantial fraction of core network reactions. Note that networks between 100 and 200 metabolites were typically composed of only CHO molecules, while networks > 250 metabolites contained a substantial number of molecules with nitrogen and sulfur.
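The random seed sampling described in this caption can be sketched with simple rejection sampling. The sketch assumes the element requirement means that each of C, H, O, N and S must occur in at least one of the k sampled compounds; the small compound pool and the function name are illustrative, and the real analysis would draw from the full KEGG compound list.

```python
import random

REQUIRED = {"C", "H", "O", "N", "S"}

# Small illustrative pool: compound name -> set of elements it contains.
POOL = {
    "water": {"H", "O"},
    "carbon dioxide": {"C", "O"},
    "ammonia": {"N", "H"},
    "hydrogen sulfide": {"H", "S"},
    "acetate": {"C", "H", "O"},
    "formate": {"C", "H", "O"},
    "pyruvate": {"C", "H", "O"},
    "methane": {"C", "H"},
    "hydrogen": {"H"},
    "urea": {"C", "H", "O", "N"},
    "glycine": {"C", "H", "O", "N"},
    "cysteine": {"C", "H", "O", "N", "S"},
}


def sample_seed_sets(pool, n_samples, k=8, rng=None):
    """Draw n_samples seed sets of size k that jointly cover REQUIRED."""
    rng = rng or random.Random(0)
    names = sorted(pool)
    samples = []
    while len(samples) < n_samples:
        candidate = rng.sample(names, k)
        covered = set().union(*(pool[name] for name in candidate))
        if REQUIRED <= covered:  # reject sets missing any required element
            samples.append(candidate)
    return samples


seed_sets = sample_seed_sets(POOL, n_samples=100)
print(seed_sets[0])
```

Each accepted seed set would then be passed to the network expansion step, and the resulting number of reactions and core-network overlap recorded.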

(C) The core network reactions are enriched with enzyme functions (E.C.), protein folds (SCOP) and orthologous genes (COGs) proposed to be present in LUCA, relative to all known metabolic reactions (aerobic network) or to the oxygen-independent (anaerobic) portion of the complete network (Fisher’s exact test).


(B) Pathway enrichment analysis of KEGG pathways within the core network. The fractional abundance of pathway reactions within the core network are plotted for pathways with an FDR < 0.05.

(A) A network expansion algorithm was implemented using a simple set of seed compounds (bottom left box) and all balanced reactions in the KEGG database. The figure displays a simplified view of the resulting network, in which reactions are not explicitly shown, and metabolites are linked if they are interconverted through reactions that are responsible for the expansion. Node color indicates the time (iteration) at which the metabolite appears during the network expansion algorithm, while node size indicates the degree of that node, also indicative of the number of reactions added in the subsequent iteration. Note that major hub metabolites (including pyruvate, glutamate, and glycine—center of the network) are reachable after a few iterations from the seed (blue nodes). Catalytically important amino acids (e.g., His, Ser []) are producible in this network as well.

The first goal of our analysis was to evaluate the impact of removing all reactions and metabolites involving phosphates (or, more broadly, phosphorus) from metabolism. Rather than analyzing the metabolic networks of individual organisms, we aimed at uncovering effects at the level of the complete collection of all known biochemical reactions (see STAR Methods; Tables S1A and S1B). This “biosphere-level” metabolism (which we inferred from the KEGG database) allowed us to explore the properties of putative early biochemical networks, beyond the organismal boundaries.

Discussion

Specific hypotheses generated by our analysis could be tested in future work. For example, it would be interesting to extend currently available evidence of non-enzymatic catalysis of metabolic reactions to a larger set of reactions and potential catalysts, with and without the specific constraint of phosphate availability. In particular, one could test the possible role of previously identified small-molecule catalysts (e.g., amino acids, short peptides, and metal sulfides) in enabling reactions within the core network. While our calculations suggest that thioester chemistry had an initial thermodynamic advantage toward generating a surprisingly large and connected metabolism, this set of metabolites constitutes less than 20% of the complete, phosphate-dependent set of metabolites known today. In future work it will be interesting to search for more evidence that the network dependent on thioesters may have been self-sustaining (i.e., capable of producing its own small-molecule catalysts), and for signatures of a putative thioester-to-phosphodiester transition.

While we cannot rule out possible alternative interpretations of our findings, such as a gradually evolved reliance on metabolic routes that make minimal use of phosphate, it is interesting to ask whether our result could help bridge a fundamental gap between geochemistry and biochemistry. Through the systematic inclusion of broader classes of geochemically plausible reactions, future versions of our analysis could provide more detailed and comprehensive models of early metabolism. The properties of the core phosphate-free network suggest that a thioester-based protometabolism may have started from a few simple, geochemically abundant molecules, and expanded to a surprisingly rich and diverse biochemistry, potentially a network-level “fossil” of biosphere metabolism, even prior to the appearance of the phosphate-based genetic coding system. This network could have enabled the synthesis of a diverse set of (bio)chemical compounds, providing precursors for the subsequent rise of informational nucleic acid polymers. Whether and how such a primordial system could have been endowed with features essential for cellular life as we know it, such as collective autocatalysis and information processing, remains an open question.