Here, we report an in‐depth evolutionary analysis of Type II reaction center proteins including Bayesian relaxed molecular clocks under various scenarios for the origin of photosynthesis. The data presented here indicate that a photosystem with the structural and functional requirements to support the oxidation of water to oxygen could have arisen in the early Archean and long before the most recent common ancestor of Cyanobacteria.

Based on the phylogeny of reaction center proteins, several stages in the evolution of oxygenic photosynthesis can be envisaged: The earliest of these stages is the divergence of Type I and Type II reaction center proteins (1, Figure 1 b); this is then followed by the divergence of the anoxygenic family (L/M) and the oxygenic family (D1/D2) of Type II reaction center proteins (2), then by the duplication event that led to the divergence of D1 and D2 (3), and the subsequent (7) gene duplication events and specializations that created the known diversity of D1 forms, which ultimately resulted in the emergence of the standard form of D1. Because a photosystem made of a D0 had already acquired some of the fundamental features required to oxidize water such as highly oxidizing chlorophyll cofactors and the capacity to generate the neutral tyrosyl radical at each side of the reaction center: Then, it can be suggested that some of the earliest stages specific to the evolution of PSII and oxygenic photosynthesis had occurred between stages 2 and 3 as depicted in Figure 1 b. Therefore, the span of time between D0 and the ancestral standard form of D1 (marked 8 in Figure 1 c) represents the duration of the evolutionary trajectory of PSII from a simpler homodimeric highly oxidizing reaction center to the more complex enzyme inherited by all organisms capable of oxygenic photosynthesis. We denote this span of time by ΔT. If ΔT is small, such as a few million years or less for example, then the evolution of oxygenic photosynthesis may be better described as a sudden and fast process only getting started shortly before the GOE as suggested by some recent analyses (Shih, Hemp et al., 2017 ; Ward, Kirschvink, & Fischer, 2016 ). On the other hand, if ΔT is large: Several hundred million years or more for example, then the earliest stages in the evolution of oxygenic photosynthesis could significantly predate the GOE as suggested by some geochemical (Mukhopadhyay et al., 2014 ; Planavsky et al., 2014 ; Satkoski, Beukes, Li, Beard, & Johnson, 2015 ) and phylogenetic data (Blank & Sanchez‐Baracaldo, 2010 ; Schirrmeister, de Vos, Antonelli, & Bagheri, 2013 ).

Several types of D1 can be distinguished phylogenetically (Cardona, Murray, & Rutherford, 2015 ) and their evolution is schematized in Figure 1 c. The early evolving forms, referred to as atypical D1 forms (G0, G1, G2 in Figure 1 ), are characterized by the absence of some, but not all, of the ligands to the Mn 4 CaO 5 cluster and have been recently found to be involved in the synthesis of chlorophyll f , which supports oxygenic photosynthesis using low energy far‐red light (Ho, Shen, Canniffe, Zhao, & Bryant, 2016 ; Nurnberg et al., 2018 ); or the inactivation of PSII when anaerobic processes are being carried out (Murray, 2012 ; Wegener, Nagarajan, & Pakrasi, 2015 ). The late evolving forms, referred to as the standard D1 forms, are characterized by a complete set of ligands to the Mn 4 CaO 5 cluster and are the main D1 used for water oxidation. Among the standard forms, there are also several types, which have been roughly subdivided into two groups: the microaerobic forms of D1 (G3) and the dominant form of D1 (G4). The microaerobic forms are suspected to be expressed only under low‐oxygen conditions. The dominant form, G4, is the main D1 used for water oxidation by all Cyanobacteria and photosynthetic eukaryotes. Most Cyanobacteria carry in their genomes an array of different D1 types, yet every strain has at least one dominant form of D1 (G4). Therefore, all Cyanobacteria descended from a common ancestor that already had evolved efficient oxygenic photosynthesis, had a dominant form of D1, and was able to assemble a standard PSII virtually indistinguishable from that of later evolving strains. Furthermore, because the atypical D1 forms support or regulate oxygenic photosynthesis under specific environmental conditions it can be argued that when these branched out water oxidation to oxygen had already evolved.

As a result of the monophyletic relationship of D1 and D2 and the conserved structural and functional characteristics between these two proteins, it is possible to reconstruct traits of the ancestral photosystem. Some of the conserved traits, present in both D1 and D2, but absent in L and M, suggest that the ancestral homodimeric photosystem, made of a D0 dimer, was already unlike any of the known anoxygenic Type II reaction centers and had acquired characteristics associated with the highly oxidizing potential required for water oxidation (Cardona, 2016 ; Cardona, Sedoud, Cox, & Rutherford, 2012 ; Rutherford & Faller, 2003 ; Rutherford & Nitschke, 1996 ). One of these conserved traits is a redox tyrosine–histidine pair strictly conserved in both D1 and D2, Y Z ‐H190 and Y D ‐H189, respectively. The presence of these tyrosine–histidine pairs indicates that the midpoint potential (E m ) of the photochemical chlorophylls at the heart of the reaction center was oxidizing enough to generate the neutral tyrosyl radical on either side of the homodimeric reaction center (Rutherford & Faller, 2003 ; Rutherford & Nitschke, 1996 ). That is an E m of at least 1 V (DeFelippis, Murthy, Faraggi, & Klapper, 1989 ; DeFelippis et al., 1991 ), sufficient to drive the oxidation of water to oxygen, which has an E m of 0.82 V at pH 7 (Dau & Zaharieva, 2009 ; Tachibana, Vayssieres, & Durrant, 2012 ). Based on this and other arguments, Rutherford and Nitschke ( 1996 ) suggested that before the gene duplication that led to D1 and D2, this ancestral photosystem was well on its way toward the evolution of water oxidation, and may have been able to oxidize water, even if only inefficiently.

There is no doubt that D1, D2, L, and M share a common origin: Beanland ( 1990 ) was the first to record this but has been followed by many others (Cardona, 2015 ; Nitschke & Rutherford, 1991 ; Sadekar et al., 2006 ). That is to say that D1, D2, L and M, all descended from a single protein (denoted II in Figure 1 ). The earliest event in the evolution of Type II reaction centers can be described as the divergence of this ancestral protein into two new forms, one ancestral to D1 and D2, the oxygenic branch; and a second one ancestral to L and M, the anoxygenic branch (Figure 1 ). Hence, D1 and D2 originated from a gene duplication event and together make a monophyletic clade of Type II reaction center proteins, distinct from that which gave rise to L and M (Cardona, 2015 , 2016 ). The ancestral protein to D1 and D2 will be referred to as D0 and the ancestral protein to L and M will be referred to as K.

Evolution of Type II reaction center proteins. (a) A maximum likelihood phylogeny of Type II reaction center proteins. (b) A schematic representation of the phylogeny shown in (a). All reaction centers have common ancestry and descended from a homodimeric reaction center, marked A. From A, two new reaction centers emerged, one ancestral to all Type II reaction centers (II) and a second ancestral to all Type I reaction centers. This is the earliest diversification event of reaction center proteins that can be inferred from sequence and structural data and it is marked 1. The ancestral Type II reaction center protein (II) gave rise to two new proteins, one ancestral to D1 and D2, named here D0 and a second ancestral to L and M named K. The ancestral L and M subunits further diversify into Chloroflexi‐type and Proteobacteria‐type L and M subunits (5). Step 6 indicates that Type I reaction center proteins also diversified in parallel to Type II reaction center proteins. (c) Evolution of cyanobacterial D1 and D2, modified from Cardona et al. (). G0, G1, and G2 represent atypical D1 forms, and G3 and G4 standard D1 forms. ΔT marks the span of time between D0 and the appearance of the ancestral standard form of D1, which characterizes PSII and predates the most recent common ancestor of all known Cyanobacteria capable of oxygenic photosynthesis [Colour figure can be viewed at wileyonlinelibrary.com

The evolution of Type II reaction center proteins has been described and discussed in some detail before (Beanland, 1990 ; Blankenship, 1992 ; Cardona, 2015 , 2016 ; Nitschke & Rutherford, 1991 ; Rutherford & Nitschke, 1996 ; Sadekar, Raymond, & Blankenship, 2006 ) and it is presented and schematized in Figure 1 . Type II reaction centers can be divided into two major families: the oxygenic and the anoxygenic Type II reaction centers. The oxygenic Type II reaction center is also known as PSII, and its electron transfer core is made of two homologous reaction center proteins, D1 and D2, exclusively found in Cyanobacteria and photosynthetic eukaryotes. The Mn 4 CaO 5 cluster is bound by D1 and the core antenna protein CP43 (Ferreira, Iverson, Maghlaoui, Barber, & Iwata, 2004 ). On the other hand, anoxygenic Type II reaction centers are found in phototrophic members of the phyla Proteobacteria, Chloroflexi, and Gemmatimonadetes, with the latter obtaining the reaction center via horizontal gene transfer (HGT) from a gammaproteobacterium (Zeng, Feng, Medova, Dean, & Koblizek, 2014 ). The core subunits of the anoxygenic Type II reaction centers are known as L and M and lack an oxygen‐evolving cluster.

The transition from anoxygenic to oxygenic photosynthesis initiated when an ancestral photochemical reaction center evolved the capacity to oxidize water to oxygen (Rutherford, 1989 ). Today, water oxidation is catalyzed in the Mn 4 CaO 5 oxygen‐evolving cluster of Photosystem II (PSII) of Cyanobacteria and photosynthetic eukaryotes. How and when Type II reaction centers diversified, and how and when one of these reaction centers evolved the capacity to oxidize water are questions that still remain to be answered. While there is agreement that by 3.5 Ga (billion years before the present) a form of anoxygenic photoautotrophy had already evolved (Butterfield, 2015 ; Nisbet & Fowler, 2014 ; Tice & Lowe, 2004 ), the sedimentological and isotopic evidence for the origin of oxygenic photosynthesis has been interpreted to range from 3.7 Ga (Frei et al., 2016 ; Rosing & Frei, 2004 ) to the Great Oxidation Event (GOE) at ~2.4 Ga (Johnson et al., 2013 ). Molecular clock studies have generated a wider range of age estimates for the origin of Cyanobacteria spanning between 3.5 Ga (Falcon, Magallon, & Castillo, 2010 ) and <2.0 Ga (Betts et al., 2018 ; Shih, Hemp, Ward, Matzke, & Fischer, 2017 ). There is thus great uncertainty and no consensus. For this reason, determining when PSII evolved the capacity to oxidize water should greatly advance our understanding of the origin of oxygenic photosynthesis.

2 RESULTS

2.1 Change in sequence identity as a function of time A first approximation to the evolution of Type II reaction centers as a function of time can be derived from the level of sequence identity between D1 and D2 of different species with known or approximated divergence times as shown in Figure 2. For example, the D1 protein of the dicotyledon Arabidopsis thaliana shares 99.7% amino acid sequence identity with that of the dicotyledon Populus trichocarpa, and these are estimated to have diverged between 127.2 and 82.8 Ma (Clarke, Warnock, & Donoghue, 2011), see Figure 2 and Supporting information Table S1. On the other hand, A. thaliana's D1 shares 87.7% sequence identity with that of a unicellular red alga Cyanidioschyzon merolae. Complex multicellular red algae are known to have diverged at least 1.0 Ga ago (Butterfield, 2000; Gibson et al., 2017) and recently described fossils could push this date back to 1.6 Ga (Bengtson, Sallstedt, Belivanova, & Whitehouse, 2017; Sallstedt, Bengtson, Broman, Crill, & Canfield, 2018). At the other end of this evolutionary line, the three dominant forms of D1 from Gloeobacter violaceous (G4) share on average 79.2% sequence identity with that of C. merolae or 78.5% with that of A. thaliana. If the percentage of sequence identity between pairs of species is plotted as a function of their divergence time, a linear decrease of identity is observed among reaction center proteins at a rate of less than 1% per 100 million years (Supporting information Table S2). The trend in Figure 2 indicates that the rate of evolution of D1 and D2 since the GOE and since the emergence of photosynthetic eukaryotes has remained very slow and stable until the present time, if considered over a large geological time scale, with less than 20% change in sequence identity in the past 2.0 Ga. Figure 2 Open in figure viewer PowerPoint Gloeobacter violaceous in comparison with that of Cyanidioschyzon merolae. The light orange bar marks the GOE. The dashed line is fitted from a linear function and shows that over a period of at least 2.0 Ga, no dramatic changes in the rates of evolution of D1 and D2 are observed. The red dashed lines show an extrapolation of current rates of evolution throughout Earth's history. This line highlights that the rate is too slow for the divergence of D1 and D2 to have started right before the GOE. The gray dots around 3.5–3.8 Ga mark a speculative timing for the earliest events in the history of photosynthesis: the divergence of D1 and D2 (~29% sequence identity), the divergence of anoxygenic (L/M) and oxygenic (D1/D2) reaction center proteins (~17%), and the divergence of Type I and Type II reaction center proteins (≤10%). The curved blue line highlights that any scenario for the diversification of reaction centers after the origin of life requires faster rates of evolution at the earliest stages in the evolution of photosynthesis [Colour figure can be viewed at Decrease of sequence identity of D1 and D2 proteins as a function of divergence time. D1 subunits are shown in gray and D2 in orange. The divergence time between pairs of species is plotted against the level of sequence identity as tabulated in Supplementary Table S1 . The red circle, placed at 79.2%, corresponds to the average sequence identity of the three distinct Group 4 D1 sequences ofin comparison with that of. The light orange bar marks the GOE. The dashed line is fitted from a linear function and shows that over a period of at least 2.0 Ga, no dramatic changes in the rates of evolution of D1 and D2 are observed. The red dashed lines show an extrapolation of current rates of evolution throughout Earth's history. This line highlights that the rate is too slow for the divergence of D1 and D2 to have started right before the GOE. The gray dots around 3.5–3.8 Ga mark a speculative timing for the earliest events in the history of photosynthesis: the divergence of D1 and D2 (~29% sequence identity), the divergence of anoxygenic (L/M) and oxygenic (D1/D2) reaction center proteins (~17%), and the divergence of Type I and Type II reaction center proteins (≤10%). The curved blue line highlights that any scenario for the diversification of reaction centers after the origin of life requires faster rates of evolution at the earliest stages in the evolution of photosynthesis [Colour figure can be viewed at wileyonlinelibrary.com Now, if the most recent common ancestor (MRCA) of Cyanobacteria capable of oxygenic photosynthesis, defined as the MRCA of the genus Gloeobacter and all other extant photosynthetic strains, existed hundreds of millions of years before the GOE, this would presuppose an even slower rate of evolution of the core subunits of PSII. In contrast, if the rate of evolution of D1 and D2 are taken at face value, following the roughly uniform rate observed in photosynthetic eukaryotes, this would locate the divergence of Gloeobacter after the GOE (Figure 2, red spot): In consequence, the older the MRCA of Cyanobacteria, the slower the rate of evolution of the dominant form of D1 and D2. Therefore, large uncertainties in the fossil record of photosynthetic eukaryotes would result in only small changes to this trend. For example, if the divergence of red algae occurred as late as 1.0 Ga or as early as 2.0 Ga, this will only cause a small shift in the overall rate. Or for example, if the MRCA of angiosperms is actually 100 million years older than currently understood, this would result in almost a negligible change in the rate of evolution of the dominant form of D1 and D2 compared over the large time scale of the planet. Let us reiterate that all the evidence suggests that all reaction center proteins originated from a single ancestral protein that diversified as the multiple groups of photosynthetic bacteria arose. As a result of this common ancestry, any standard D1 shares on average about 29% sequence identity with any D2 across their entire sequence. Any standard D1 or D2 shares on average 17% sequence identity with any L or M. The level of sequence identity falls well below 10% if any Type II reaction center protein is compared with any Type I reaction center protein (Cardona, 2015). As a result of this, the rate of evolution of D1 and D2 since the GOE, as estimated from the decrease of sequence identity (<1% per 0.1 Ga), is too slow to account for the evolution of photochemical reaction centers within a reasonable amount of time (Figure 2, dashed line). In other words, the rate of evolution of reaction center proteins since the origin of life could not have been constant, and any scenario for the origin of photochemical reaction centers at any point in the Archean requires initially faster rates of evolution than any rate observed since the Proterozoic (Figure 2, light blue line). Taking into consideration that D1 and D2 share only about 29% sequence identity, two other observations can be made, as illustrated in Figure 2: (a) that the duplication that led to the divergence of D1 and D2 is more likely to have occurred closer to the origin of the primordial reaction center proteins at the origin of photosynthesis in the early Archean, than closer to or after the GOE; and (b) that the MRCA of Cyanobacteria is more likely to have existed closer to the GOE than closer to the origin of photosynthesis.

2.2 Bayesian relaxed molecular clock analysis The simple approach used above indicates that the divergence of D1 and D2 is likely placed well before the GOE, to confirm this observation we applied a molecular clock to the phylogeny of Type II reaction center proteins. Figure 3 shows a Bayesian relaxed log‐normal autocorrelated molecular clock built using the CAT + GTR + Γ model allowing for flexible boundaries on the calibration points (Lartillot, Lepage, & Blanquart, 2009). As an informed starting point, we first specified the age of the root (root prior) at 3.5 Ga with a standard deviation of 0.05 Ga. That is to say, that the most ancestral form of a Type II reaction center protein is assumed to have already evolved by 3.5 Ga. Under these conditions, the last common ancestral protein to the standard form of D1 prior to the divergence of the G3 and G4 types (Figure 3, green dot) is timed at 2.19 ± 0.22 Ga. On the other hand, D0 (Figure 3, orange dot) is timed at 3.22 ± 0.19 Ga. It follows then that the difference in time between D0 and the first standard form of D1, ΔT, is 1.02 Ga, with the level of uncertainty on the estimated ages resulting in a range for ΔT between 1.44 and 0.60 Ga (see Table 1 and Figure 4a and b). This large ΔT agrees with the predictions made from the comparisons of sequence identity plotted in Figure 2. Figure 3 Open in figure viewer PowerPoint Relaxed molecular clock of Type II reaction center proteins. A log‐normal autocorrelated relaxed clock is shown implementing a CAT + GTR + Γ model with flexible boundaries on the calibration points. Red dots are calibration points as described in Materials and Methods . The gray dot denoted II represents the ancestral Type II reaction center protein, as schematized in Figure 1 . The orange dot (D0) marks the initial divergence of D1 and D2. The violet dot marks the divergence point between G2 atypical D1 sequences and standard D1. The green dot marks the divergence point between the microaerobic D1 forms (G3) and the dominant form of D1 (G4). This point represents the last common ancestral protein to all standard D1 forms predating crown group Cyanobacteria. The blue dot represents the origin of the dominant form of D1 inherited by all extant Cyanobacteria and photosynthetic eukaryotes. The gray bars represent the standard error of the estimated divergence times at the nodes. The orange bar shows the GOE [Colour figure can be viewed at wileyonlinelibrary.com Table 1. Effect on ΔT assuming different ages for the most ancestral Type II reaction center proteins Root prior (Ga) II D0 Ancestral standard D1 ΔT (range) CAT + GTR + Γ 3.2 3.25 ± 0.05 2.80 ± 0.16 1.99 ± 0.19 0.80 (1.17–0.44) 3.5 3.54 ± 0.05 3.22 ± 0.19 2.19 ± 0.24 1.02 (1.44–0.60) 3.8 3.83 ± 0.05 3.44 ± 0.21 2.27 ± 0.24 1.17 (1.62–0.71) 4.1 4.12 ± 0.05 3.71 ± 0.23 2.38 ± 0.25 1.32 (1.81–0.84) CAT + GTR + Γ and removing calibration point 11 3.5 3.52 ± 0.05 3.00 ± 0.29 1.78 ± 0.25 1.22 (1.77–0.66) 3.8 3.81 ± 0.05 3.15 ± 0.29 1.77 ± 0.25 1.37 (1.93–0.81) LG + Γ 3.2 3.27 ± 0.05 3.19 ± 0.08 2.51 ± 0.13 0.68 (0.89–0.46) 3.5 3.53 ± 0.05 3.40 ± 0.09 2.64 ± 0.15 0.77 (1.00–0.53) 3.8 3.81 ± 0.05 3.64 ± 0.12 2.77 ± 0.17 0.88 (1.16–0.58) 4.1 4.10 ± 0.05 3.91 ± 0.14 2.90 ± 0.19 1.01 (1.34–0.68) LG + Γ and removing calibration point 11 3.5 3.49 ± 0.05 3.18 ± 0.19 2.30 ± 0.20 0.87 (1.26–0.48) 3.8 3.79 ± 0.05 3.52 ± 0.19 2.55 ± 0.22 1.38 (0.97–0.55) Figure 4 Open in figure viewer PowerPoint Effect of model selection on estimated divergence times. (a) Divergence times of key nodes in the evolution of Type II reaction centers as a function of the root prior. The root prior was varied from 3.2 to 4.1 ± 0.05 Ga under a CAT + GTR + Γ model. The colored dots match selected nodes of interest in Figure 3 . The thick gray bar marks the GOE and the narrow bar marks the minimum accepted age for the origin of photosynthesis. (b) Identical to (a) but using a LG + Γ model of amino acid substitutions. (c) Divergence times of key nodes assuming a root prior of 3.5 Ga as a function of the standard deviation on the root. The standard deviation was varied from 0.025 to 0.4 Ga under a CAT + GTR + Γ model. (d) Identical to (c) but using a LG + Γ model. In every case, D0 is the oldest node after the root and the magnitude of ΔT is always in the range of a billion years [Colour figure can be viewed at wileyonlinelibrary.com To test the effect of different root priors on our results, we varied the age of the root and the standard deviation over a broad range. Table 1 lists estimates of divergence times of key ancestral Type II reaction center proteins and the respective ΔT value using different root priors. For example, under the assumption that Type II reaction centers had already evolved by 3.8 Ga ago (Czaja et al., 2013; Nisbet & Fowler, 2014; Rosing, 1999), ΔT is found to be centered at 1.17 Ga. Similarly, if it is assumed to be a late event occurring at 3.2 Ga, though unlikely, ΔT is still 0.80 Ga. Furthermore, increasing the standard deviation on the root prior pushes the timing of the earliest events in the evolution of Type II reaction centers to even older ages rather than younger ages, see Table 2 and Figure 4c and d. For example, a root prior of 3.5 Ga with a standard deviation of 0.1 Ga pushes the estimated time for the root to 3.65 Ga, making D0 3.30 ± 0.27 and generating a ΔT of over a billion years. Table 2. Effect of varying the standard deviation (SD) on the root prior at 3.5 Ga SD (Ga) II D0 Ancestral standard D1 ΔT (range) CAT + GTR + Γ 0.025 3.50 ± 0.03 2.99 ± 0.22 2.07 ± 0.22 0.91 (1.31–0.46) 0.05 3.54 ± 0.05 3.22 ± 0.19 2.19 ± 0.24 1.02 (1.44–0.60) 0.10 3.65 ± 0.10 3.29 ± 0.26 2.22 ± 0.23 1.07 (1.56–0.57) 0.20 4.00 ± 0.19 3.61 ± 0.32 2.32 ± 0.25 1.32 (1.87–0.70) 0.40 4.82 ± 0.38 4.55 ± 0.44 2.50 ± 0.28 1.74 (2.48 – 1.01) LG + Γ 0.025 3.51 ± 0.02 3.36 ± 0.09 2.61 ± 0.15 0.75 (0.98–0.51) 0.05 3.53 ± 0.05 3.37 ± 0.11 2.62 ± 0.15 0.75 (1.00–0.48) 0.10 3.60 ± 0.09 3.45 ± 0.13 2.67 ± 0.16 0.79 (1.07–0.50) 0.20 3.75 ± 0.14 3.62 ± 0.16 2.74 ± 0.18 0.88 (1.21–0.53) 0.40 3.92 ± 0.21 3.79 ± 0.21 2.83 ± 0.21 0.96 (1.37–0.53) The Bayesian clock using flexible boundaries on the calibration points consistently produced ages for the divergence of the D2 subunit of G. violaceous and the dominant form of D1 (G4) after the GOE, similar to the ages reported by Shih, Hemp et al. (2017). Yet, previous molecular clocks have suggested that the MRCA of Cyanobacteria might predate the GOE (Schirrmeister, Gugger, & Donoghue, 2015), so we also performed a similar analysis that allowed us to explore this scenario. This was achieved using an empirical amino acid substitution model (LG + Γ) instead of the non‐parametric approach described above. We found this to be the only way to locate the D2 of G. violaceous and the dominant form of D1 (G4) before the GOE. The effect of less flexible boundaries on the estimated divergence times is shown in Tables 1 and 2, and Figure 4b and d. For example, assuming a root at 3.5 ± 0.05 Ga, the estimated divergence time for the standard form of D1 becomes 2.64 ± 0.15 Ga and pushes D0 back to 3.40 ± 0.09 Ga, making ΔT 0.77 Ga. On the other hand, if we allowed flexibility on the root prior by increasing the standard deviation to 0.4 Ga (Table 2), the estimated divergence time for the standard form of D1 becomes 2.83 ± 0.21, but the estimated age of the root is pushed back to 3.92 ± 0.21 Ga with D0 at 3.79 ± 0.21, making ΔT about a billion years. Overall, placing the MRCA of Cyanobacteria before the GOE pushes the gene duplication event that led to the divergence of D1 and D2 even closer to the origin of Type II reaction centers and to the origin of photosynthesis, just as predicted by the comparison of the level of sequence identity.

2.3 Rates of evolution The inferences derived from Figure 2 revealed that the rates of evolution had to be faster in the initial stages during the Archean compared with the Proterozoic, even when ΔT is as large as one billion years. To gain a better understanding of the changes of the rate of evolution of Type II reaction center proteins, we plotted the rates as a function of divergence time. In Figure 5a, the rate of evolution (ν) of each node in the tree, expressed as amino acid substitutions per site per unit of time, is plotted against the estimated divergence time for each respective node. It can be seen that the rate at the earliest stage is much faster than the rates observed since the Proterozoic. Thus, faster rates are necessary to explain the origin and evolution of Type II reaction centers at any point in the Archean and regardless of when exactly photosynthesis originated, as seen in Figure 5b. The decrease in the rate of evolution is consistent with the observations derived from Figure 2 and can be roughly fitted with a first‐order exponential decay curve (fitting parameters are presented in Supporting information Table S3). Figure 5a additionally shows that L and M have been evolving at a faster rate than D1 and D2. From this slow‐down of the rates, it can be calculated that since each respective duplication event (stages 2 and 4 in Figure 1) it took about 168 million years for D1 and D2 to fall to 50% sequence identity and about 115 million years for the same to occur for L and M. Figure 5 Open in figure viewer PowerPoint Rates of evolution as a function of time. (a) Change in the rate of evolution of oxygenic (gray) and anoxygenic (orange) Type II reaction center proteins. The rates correspond to the tree in Figure 3 , assuming an origin of photosynthesis at about 3.5 Ga. The dashed lines represent a fit of a single‐component exponential decay and the rates are given as amino acid substitutions per site per million years. (b) Changes in the rate of evolution constraining the root to younger and younger ages. The red curve farthest to the right was calculated using a root prior of 4.2 ± 0.05 Ga, while the green curve farthest to the left was calculated using a root prior of 0.8 ± 0.05 Ga. Younger divergence times imply initial faster rates of evolution. (c) Change in the rate of evolution as a function of ΔT, the dashed lines represent a fit to a power law function. The curve in orange was calculated using ΔT values subtracting the mean average of the divergence times of D0 and the ancestral standard D1. The curve in gray was calculated using ΔT values subtracting the minimum age of D0 and the maximum age for ancestral standard D1 [Colour figure can be viewed at wileyonlinelibrary.com The maximum rate of evolution in the D1 and D2 family of reaction center proteins is placed at the node that represents D0. We will refer to this rate as ν max . Figure 5a shows that the rate of evolution flattens out to comparatively slow rates during the Proterozoic. These rates correspond to the rates of Group 4 D1 and D2. We will refer to the average rate of evolution during this zone of slow change as ν min and it is calculated as the average rate from each node in Group 4 D1 and D2. In Figure 5a, ν max is 5.03 ± 1.42 amino acid substitutions per site per Ga, while ν min is 0.12 ± 0.04 substitutions per site per Ga. Therefore, if Type II reaction centers had evolved by 3.5 Ga, to account for the divergence of D1 and D2 in one billion years, the initial rate of evolution had to be about 40 times larger than that observed since the MRCA of Cyanobacteria. Table 3 lists the rates of evolution of a diverse number of proteins reported in independent studies in a broad range of organisms. It is found that the rate of evolution of the core subunits of PSII (ν min ) is similar to the rate of other proteins that are billions of years old and highly conserved such as subunits of the ATP synthase, the cytochrome b 6 f complex, or the ribosome. Our estimated rates fall well within the expected range of other cyanobacterial proteins, thus validating our calibration choices and consistent with the expected level of sequence identity as derived from Figure 2. Furthermore, even ν max is found to be within plausible levels when ΔT is about a billion years: slower than known fast evolving proteins such as peptide toxins (Duda & Palumbi, 1999), the influenza virus (Carrat & Flahault, 2007), or proteins of the immune system (Hughes, 1997; Table 3). Table 3. Comparison of rates of protein evolution Rate (Amino acid substitutions per site per Ga) References D0 (ν max ) 5.03 This worka Group 4 D1 and D2 (ν min ) 0.12 This worka K 6.11 This worka L and M 0.61 This worka PsaA, Photosystem I core subunit (Cyanobacteria) 0.09 Sanchez‐Baracaldo ( 2015 AtpA, ATP Synthase CF1 Alpha Chain (Cyanobacteria) 0.08 Sanchez‐Baracaldo ( 2015 PetB, Cytochrome b6 (Cyanobacteria) 0.05 Sanchez‐Baracaldo ( 2015 RpsM, S13 Ribosomal Protein Translation (Cyanobacteria) 0.13 Sanchez‐Baracaldo ( 2015 L1 Ribosomal Protein (Cyanobacteria) 0.11 Sanchez‐Baracaldo ( 2015 ADP‐glucose pyrophosphorylase large subunit (plants) 1.2 Georgelis, Braun, and Hannah ( 2008 PRTB Proteinb (humans) 0.13 Matsunami, Yoshioka, Minoura, Okano, and Muto ( 2011 Alcohol dehydrogenase (ascidians) 0.27 Canestro et al. ( 2002 Protein‐L‐isoaspartyl (D‐aspartyl) O‐methyltransferase (bacteria to humansc ) 0.39 Kagan, McFadden, McFadden, O'Connor, and Clarke ( 1997 Peptide neurotoxins (gastropods) 17 Duda and Palumbi ( 1999 Hepatitis C Virus 3,700 Bukh et al. ( 2002 Influenza virus type A (H1) 5,800 Carrat and Flahault ( 2007 A rate of evolution of 0.12 substitutions per site per Ga, as seen for standard D1 and D2, means that it would take about 8 billion years for each position of the sequence to have changed at least once, assuming—just for the sake of simplicity—that each position has a similar chance of mutating. This very slow rate of evolution is the reason why standard D1 and D2 have changed little in more than 2.0 Ga, as seen in Figure 2. In contrast, the fast evolving peptide neurotoxins of the venomous gastropod Conus have been estimated to evolve at a rate of about 17 substitutions per site per Ga, which is about 140 times faster than D1 and D2. It means that each position in the sequence is expected to have changed at least once after only 60 Ma. In other words, it would take about 60 Ma for two identical neurotoxin peptides to lose all sequence identity. Unlike the slow‐evolving and highly conserved proteins of bioenergetics (including D1 and D2), which are under strong purifying selection, neurotoxins, viruses, or the immune system has evolved to generate change at the amino acid level within a few generations or within the lifetime of the organism. From these comparisons in the rates, it can be concluded that the homodimeric stage (D0) was likely very short‐lived, even when ΔT is in the order of a billion years. Our Bayesian analyses show that the evolution of PSII is better described by a long span of time since the appearance of a homodimeric photosystem (with sufficient power to oxidize water) until the emergence of standard PSII (inherited by all known Cyanobacteria capable of photosynthesis). Notwithstanding, a fast rate of evolution at the earliest stage implies that ν max would increase if ΔT is considered to be smaller, as would be the case for an evolutionary scenario in which PSII evolves rapidly before the GOE after an event of gene transfer of a bacteriochlorophyll a‐based anoxygenic Type II reaction center with low oxidizing power (i.e., before the evolution of tyrosine oxidation) like those found in phototrophic Proteobacteria and Chloroflexi (Shih, Hemp et al., 2017; Soo, Hemp, Parks, Fischer, & Hugenholtz, 2017). We illustrate this concept in Figure 5b and c. These figures depict the change of the rate of evolution as a function of ΔT. This manipulation of the molecular clock can only by accomplished computationally by changing the root prior to younger and younger ages. The increase of ν max with decreasing ΔT can be fitted using a power law function (see Supporting information Table S4 for fitting parameters). This function can then be used to calculate ν max under varying ΔT. For example, Ward et al. (2016) calculated that the planet could have become oxygenated within just one hundred thousand years from the origin of oxygenic photosynthesis. Thus, in a hypothetical scenario in which a non‐photosynthetic ancestor of Cyanobacteria obtained photosynthesis via HGT from an anoxygenic phototroph, and this transferred reaction center evolved standard levels of oxygen evolution within one hundred thousand years (ΔT = 0.1 Ma), then ν max would need to be more than 400 thousand amino acid changes per site per Ga, which is orders of magnitude greater than the rate of evolution of any known protein (Table 3). If ΔT is hypothesized to be 100 Ma instead, this would require a ν max of about 33 amino acid substitutions per site per Ga, while less extreme, it is still twice the rate of short peptide neurotoxins. Unlike neurotoxins, photosynthetic reaction centers are highly regulated large multisubunit membrane protein complexes binding dozens of cofactors at precise orientations and distances to allow efficient photochemistry to occur. Even the “simplest” known homodimeric reaction center is made of at least four protein subunits binding 62 chlorophyll‐derived pigments, two carotenoids, two lipids, four Ca2+ ions, and a Fe 4 S 4 cluster (Gisriel et al., 2017). It is therefore likely that reaction centers have always been under strong purifying selection (Shi, Bibby, Jiang, Irwin, & Falkowski, 2005). In fact, even the scenario in which ΔT is a billion years (ν max = 5.03 ± 1.42) may be an overestimation and could potentially indicate that the age of the duplication event that led to D1 and D2 occurred immediately after the origin of the earliest reaction center. In consequence, the evolution of the core subunits of PSII is more consistent with a scenario in which oxygenic photosynthesis originated long before the GOE as supported by the geochemical record of inorganic redox proxies (Crowe et al., 2013; Havig, Hamilton, Bachan, & Kump, 2017; Planavsky et al., 2014; Wang et al., 2018).

2.4 The D1/D2 duplication is older than the L/M duplication In Figure 3, it can also be seen that the divergence of the L and M subunits occurs after the divergence of D1 and D2. The estimated time for the divergence of L and M is 2.87 ± 0.16 Ga, while the time for the divergence of D1 and D2, as we saw above, is 3.22 ± 0.19 Ga. Because no calibration points were set on L and M, greater levels of uncertainty are observed in this part of the tree: Hence, we refrain from making strong inferences on the timing of specific diversification events within phototrophic Proteobacteria or Chloroflexi and only focus on the general trends. Still, we found the above result puzzling as it would place the roots of PSII before the roots of anoxygenic Type II reaction centers. After a closer inspection, we noted that this effect is caused by faster rates of evolution computed for L and M, relative to D1 and D2, across all time points (Figure 5a and Table 3). In consequence, at a faster rate of evolution, it would take less time for L and M to converge to node K than D1 and D2 to node D0. What is more, the phylogeny of Type II reaction centers, as seen in Figure 1, also shows that L and M branches are overall longer than D1 and D2 branches, which is suggestive of accelerated rates. Longer branches can be caused by a relatively early diversification due to a slow rate of evolution, or alternatively by a late diversification indicating comparatively faster rates. One question remains: Is this result an artifact of phylogenetic reconstruction given the lack of constraints on L and M, or does it have biological significance? That anoxygenic phototrophs are displaying higher rates of evolution than Cyanobacteria is supported by other independent molecular clock studies (Magnabosco, Moore, Wolfe, & Fournier, 2018; Shih, Ward, & Fischer, 2017). For example, the level of sequence identity between L in Roseiflexus castenholzii and L in Chloroflexus sp. Y‐400‐fl, two relatively distant phototrophs of the phylum Chloroflexi, is about 45%. In comparison, the level of sequence identity between standard D1 in Gloeobacter and D1 in Arabidopsis is just under 80%, as we saw above. Shih, Ward et al. (2017) calculated that the MRCA of phototrophic Chloroflexi occurred about 1.0 Ga ago. That date would imply that the L in Roseiflexus and Chloroflexus lost 55% sequence identity since their most recent common ancestor about 1.0 Ga ago (1% loss for every ~18 Ma). If the estimated age reported by Shih, Ward et al. (2017) is correct, that would make the rate of evolution of L in the Chloroflexi about 5.5 times faster than D1 or D2 (1% loss for every ~100 Ma assuming Gloeobacter branched out 2.0 Ga ago). Magnabosco et al. (2018) computed an age for the MRCA of phototrophic Chloroflexi of 2.1 Ga (obtained with their Model D), which would make the rates of evolution of L 2.6 times faster than D1 and D2. The average rate of evolution for L and M calculated by our molecular clock is 0.61 ± 0.19 substitutions per site per Ga, while that for D1 and D2 is 0.12 ± 0.04 (see Table 3). Therefore, according to our clock L and M are evolving on average 4.7 times faster than D1 and D2. This result is nicely within the range suggested by the two independent studies referenced above and confirms that our approach using a single protein produced similar rates of evolution as those computed using a large set of highly conserved concatenated sequences (Magnabosco et al., 2018; Sanchez‐Baracaldo, 2015; Shih, Hemp et al., 2017). It is worth noting here that under every scenario tested in this study, the duplication leading to D1 and D2 was always found to be the oldest node after the root. The late divergence of L and M relative to D1 and D2 does not seem to be artifactual but a consequence of the apparent faster rates of evolution measured in anoxygenic phototrophs. A ramification of this is that the hypothesis that Cyanobacteria obtained a Type II reaction center via HGT from an anoxygenic phototroph, right before the GOE, becomes untenable because D1 and D2 would predate L and M.