Protein function is generally constrained by selective parameters that can inhibit evolutionary potential. It has thus been difficult to determine how novelties arise. Zheng et al. allowed bacterial populations to accumulate mutations and then used directed evolution to evolve green fluorescent protein function from a gene that expressed yellow fluorescent protein (see the Perspective by Lee and Marx). Protein alternatives could evolve in cases where cryptic alleles—selectively neutral or mildly deleterious genetic variants with no apparent phenotypic differences—were present in the population. Thus, cryptic alleles provide an evolutionary bridge between diversity and selection and facilitate the generation of novel adaptations.

Cryptic genetic variation can facilitate adaptation in evolving populations. To elucidate the underlying genetic mechanisms, we used directed evolution in Escherichia coli to accumulate variation in populations of yellow fluorescent proteins and then evolved these proteins toward the new phenotype of green fluorescence. Populations with cryptic variation evolved adaptive genotypes with greater diversity and higher fitness than populations without cryptic variation, which converged on similar genotypes. Populations with cryptic variation accumulated neutral or deleterious mutations that break the constraints on the order in which adaptive mutations arise. In doing so, cryptic variation opens paths to adaptive genotypes, creates historical contingency, and reduces the predictability of evolution by allowing different replicate populations to climb different adaptive peaks and explore otherwise-inaccessible regions of an adaptive landscape.

Cryptic genetic variation is standing genetic variation that does not normally contribute to heritable phenotypic variation in a population but that can bring forth phenotypic variation after environmental change or genetic perturbation (1, 2). Cryptic variation exists because phenotypes are to some extent robust to genetic change (3–6). Because of its potential role in adaptive evolution, cryptic variation has attracted widespread interest (7–17), but supporting experimental evidence is limited (1, 17–19). One distinguishing feature of cryptic variation is that the conditions inducing its phenotypic effects are rare or absent in a population’s history. In consequence, it can be protected from selection until a new environment arises in which cryptic variation may give rise to new and potentially beneficial phenotypes (1, 2). The molecular mechanisms of adaptation under cryptic variation are difficult to study for complex phenotypes of whole organisms, because their genetic basis often remains elusive (17, 20). Such mechanisms are better studied with simple and tractable systems, such as evolving proteins. Many mutations in proteins interact epistatically (i.e., nonadditively), which can render adaptive landscapes rugged and multipeaked (21–26). An evolving population’s location on a rugged adaptive landscape influences which of these peaks are accessible (26–28). These observations hint that cryptic variation may help populations of evolving proteins enter regions of an adaptive landscape that would otherwise remain inaccessible.

Results

To create cryptic genetic variation, we subjected each of four replicate populations of yellow fluorescent protein (YFP; populations VC, with C for cryptic) to four rounds (“generations”) of directed evolution subject to stringent stabilizing selection to maintain yellow fluorescence (phase I; Fig. 1 and fig. S1). Specifically, we allowed ~5 × 10^6 YFP variants in each generation and in each replicate population to evolve, and we subjected these variants to PCR mutagenesis (0.84 amino acid–changing mutations per YFP molecule per generation; tables S1 and S2). In every generation of phase I, we allowed only those 20% of cells of evolving populations to survive whose yellow fluorescence intensity lay in a narrow interval around the median of ancestral YFP (Fig. 1) (29). Such stringent stabilizing selection allows the accumulation of cryptic variation, because only the mutations (or their combinations) that have little effect on yellow fluorescence can persist. We then initiated phase II, in which we subjected the same populations to four generations of stringent directed evolution toward green fluorescence (Fig. 1). As controls, we also subjected four populations (called V0, for zero initial cryptic variation) that started from identical ancestral YFP molecules to four generations of evolution toward green fluorescence. We then compared the change in green fluorescence intensity during phase II in populations VC with that of the control populations V0. Populations VC reached significantly higher green fluorescence during three of the four generations of evolution in phase II (Fig. 2A), and they adapted approximately three times faster during the first generation of phase II (fig. S2A). In addition, populations VC more rapidly evolved a green (512-nm) emission peak than populations V0 (Fig. 2B).
At the evolutionary end point, three of four VC populations showed significantly greater green fluorescence than the four V0 populations (two-way analysis of variance [F(7,16) = 46.5, P = 1.99 × 10^−9], post hoc Tukey’s test [P < 0.05 for VC replicates 1, 2, and 4 relative to the four V0 populations]) (Fig. 2C and table S3). In sum, the genetic variation accumulated in phase I facilitated the evolution of green fluorescence during phase II.
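The two selection regimes described above can be illustrated with a minimal sketch. It is not the sorting procedure used in the experiments (those details are in the supplementary methods, ref. 29). The function names, the ranking by absolute deviation from the ancestral median, and the toy intensity values are all assumptions made for illustration.

```python
def stabilizing_gate(intensities, ancestral_median, keep_fraction=0.20):
    """Keep the cells whose intensity lies closest to the ancestral median.

    Assumption: the "narrow interval around the median" is modeled by ranking
    cells by absolute deviation and keeping the closest keep_fraction.
    """
    n_keep = max(1, round(len(intensities) * keep_fraction))
    ranked = sorted(intensities, key=lambda x: abs(x - ancestral_median))
    return ranked[:n_keep]


def directional_gate(intensities, keep_fraction=0.0001):
    """Keep only the brightest cells (0.01% survive, as in the Fig. 1 legend)."""
    n_keep = max(1, round(len(intensities) * keep_fraction))
    return sorted(intensities, reverse=True)[:n_keep]


# Toy data (hypothetical intensities, arbitrary units).
yellow = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
survivors_phase_one = stabilizing_gate(yellow, ancestral_median=5.2)
survivors_phase_two = directional_gate(yellow, keep_fraction=0.2)
```

Under the stabilizing gate, variants whose mutations barely change yellow fluorescence survive regardless of their effect on green fluorescence, which is the sense in which such variation stays cryptic.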

Fig. 1 Experimental evolution of YFP. In phase I, we subjected four replicate populations of YFP to four generations of directed evolution under stabilizing selection for the native yellow fluorescence, allowing only those ~20% of cells closest to the median (dashed vertical line) of ancestral yellow fluorescence [VC, λex = 488 nm and λem = 530 ± 15 nm (29)]. In phase II, we subjected these populations to four further generations of strong directional selection for green fluorescence, allowing only 0.01% of cells to survive [λex = 405 nm and λem = 525 ± 25 nm (29)]. As controls, we subjected four populations (V0) consisting of initially identical YFP molecules to the same stringent directed evolution for green fluorescence (29).

Fig. 2 Cryptic variation leads to faster color change and higher fluorescence. (A) Fold change of yellow and green fluorescence intensities relative to the ancestral YFP during phase II evolution (29). Error bars represent 1 SEM, from four replicate populations (thin lines). Note the logarithmic vertical scale. *P < 0.05; **P < 0.01 (one-sided t tests with Holm adjustments). (B) Emission spectra (shown as mean values of four replicate populations) of evolving populations V0 and VC at the new excitation wavelength (405 nm) in phase II (29). The vertical axes indicate the relative fluorescence intensity at a given emission wavelength (horizontal axis) relative to the maximal fluorescence intensity at the emission peak 512 nm (green vertical dashed line). (C) Fold change of green fluorescence intensity relative to the ancestral YFP for each replicate population at the evolutionary end point. Error bars denote SD (n = 3) (29).

To study why this genetic variation facilitated adaptive evolution, we used single-molecule real-time sequencing (SMRT) to genotype ~500 to 1000 evolved variants for each replicate population and for each generation (table S4). We first noticed that VC populations were more diverse than V0 populations throughout phase II. Specifically, they harbored on average more mutations per individual molecule (Fig. 3A). They also showed a broader distribution of mutations per individual molecule (fig. S2B), as well as greater overall genetic diversity (fig. S2C) (29). Additionally, the four VC populations diverged to a much greater extent from each other (Fig. 3B and fig. S2D).

Fig. 3 Cryptic variation helps explore diverse high-fluorescence genotypes. (A) Number of amino acid changes per protein sequence based on genotyping hundreds of evolved variants in each population using SMRT sequencing. *P < 0.05; **P < 0.01; ****P < 0.0001 (one-sided t tests). Thick lines indicate means for populations V0 or VC over four replicate populations (thin lines), and error bars denote SEM. (B) Average number of amino acid differences (at the evolutionary end point) between all protein sequences in the labeled populations (29). (C) Cryptic variation helps explore diverse genotypes. Each circle (node) represents a genotype that has been observed during evolution. An edge connects two genotypes if they differ in a single amino acid. Colored circles represent genotypes that exclusively occur in a single replicate population, where circle area (logarithmic scale) corresponds to genotype frequency. White and gray circles indicate genotypes that were not observed in populations at the end point or that were observed in at least two replicate populations at the end point, respectively. Sizes of gray circles correspond to the highest frequencies of the corresponding genotypes in those replicate populations. Dashed ovals circumscribe each labeled high-fluorescence genotype, together with the genotypes composed of subsets of its constituent mutations. (D) The frequency of constituent mutations of typical and alternative genotypes in each replicate population at the evolutionary end point. The alternative genotypes A1, A2, A3, and A4 comprise the unique mutation combinations F65S/K102R/N145S/V164A, F72I/K167E/I172V, I129T/K141R, and F72C/I168V, respectively. In addition, each of these genotypes also harbors the mutations of T (G66S+Y204C), and genotype A3 also harbors the mutation F47L. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; E, Glu; F, Phe; G, Gly; I, Ile; K, Lys; L, Leu; N, Asn; R, Arg; S, Ser; T, Thr; and V, Val. 
Mutations (e.g., G66S) indicate the original residue (G66) and the residue created by the mutation (S). (E) Fold change in green fluorescence intensity of each typical (blue) or alternative (red) genotype relative to the ancestral YFP. [Note that here A1 does not contain the mutation K102R, because K102R does not significantly improve green fluorescence (table S5).] Error bars denote SD (n = 3 or 6). *P < 0.05; ***P < 0.001 (one-sided t tests with Holm adjustments).
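The genotype network in panel (C) connects two genotypes when they differ in a single amino acid. The sketch below shows one way to build such edges from mutation lists written in the notation of this legend. The slash-separated genotype strings, the function names, and the small example set are assumptions made for illustration, not the authors' analysis code.

```python
import re
from itertools import combinations


def parse(genotype):
    """Map 'G66S/Y204C' to {66: 'S', 204: 'C'}; '' denotes the ancestor."""
    sites = {}
    for mutation in filter(None, genotype.split("/")):
        _, position, new = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation).groups()
        sites[int(position)] = new
    return sites


def aa_distance(g1, g2):
    """Number of positions at which two genotypes carry different residues."""
    a, b = parse(g1), parse(g2)
    return sum(a.get(p) != b.get(p) for p in set(a) | set(b))


def network_edges(genotypes):
    """Pairs of genotypes that differ in exactly one amino acid."""
    return [(g, h) for g, h in combinations(genotypes, 2) if aa_distance(g, h) == 1]


# Hypothetical subset: ancestor, G66S, genotype T, T1, and T plus F65S.
observed = ["", "G66S", "G66S/Y204C", "G66S/Y204C/F65L", "G66S/Y204C/F65S"]
edges = network_edges(observed)
```

Comparing residues per position, rather than comparing mutation labels, means that two genotypes carrying different substitutions at the same site (here F65L and F65S) count as one amino acid apart.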

We then studied the dynamics of polymorphisms in each replicate population during phase II (fig. S3A) (29) and observed that two mutations (G66S and Y204C; see the legend for Fig. 3D for a full list of amino acid abbreviations and an explanation of mutation descriptions) swept through all replicate V0 and VC populations, with two other mutations (F65L or F47L) achieving high or medium frequency (>10%) in two or more V0 populations and in one or two VC populations. Because of their ubiquity, we refer to these four mutations as typical mutations (fig. S3B). At the evolutionary end point, most of these mutations co-occurred in three similar and high-frequency genotypes that share the two mutations G66S and Y204C and that harbor one additional mutation each, i.e., F65L, F47L, or L43M. We refer to these genotypes as T1, T2, and T3, or typical genotypes (Fig. 3, C and D), and to the combination of the G66S and Y204C mutations as genotype T.

Populations VC evolved differently from populations V0. First, 17 alternative mutations attained a frequency of more than 10% in VC populations but in none of the V0 populations [except the mutation V164A, which reached a frequency of 10.9% in V0 replicate population 4 (fig. S3A)]. Also, typical genotypes dominated only one replicate VC population (number 2), in contrast to their importance in V0 populations. The remaining populations were dominated by one or two of four other alternative genotypes (A1 to A4), which contained some combination of 11 alternative mutations in the genetic background T (Fig. 3, C and D). We measured the green fluorescence intensity of the three typical genotypes as well as of the four alternative genotypes (29). Three of the alternative genotypes exhibited greater green fluorescence than all typical genotypes (Fig. 3E).

In sum, during directional selection for green fluorescence: (i) More diverse genotypes attain high frequency in populations VC than in populations V0; (ii) different alternative genotypes dominate each of three replicate populations VC, and (iii) three of the four alternative genotypes show significantly higher green fluorescence than all three typical genotypes.

Because VC populations evolved faster than V0 populations in phase II (Fig. 2A), we suspected that some of their adaptive mutations or genotypes had already accumulated in phase I. We thus studied the phase I evolutionary dynamics of the 4 typical mutations and 11 alternative mutations (fig. S4A). All 15 mutations were already present above our phase I detection limit of 0.064 to 0.16% (29), and 11 of the 15 mutations reached frequencies between 0.5 and 2.5% in at least one of the VC populations. This demonstrates that the variants accumulated in phase I are relevant to the exploration of different high-fitness genotypes in phase II. We performed additional directed evolution experiments starting from the YFP ancestor but in the complete absence of selection, which allowed us to determine how fast individual variants would increase in frequency through mutation alone. High-throughput sequencing showed that the frequency of all but one (F47L) of the mutations had not increased significantly more than expected with mutation pressure alone during phase I (two-way analysis of covariance with Holm adjustment [P = 9.11 × 10^−5]) (fig. S4B). Specifically, 93.3% (14 of 15) of the genetic variants that were involved in adaptive evolution during phase II were not subject to positive selection in phase I. These observations demonstrated that most genetic variation that was adaptive in phase II accumulated cryptically during phase I.

Because the typical and alternative genotypes were also the genotypes with the highest green fluorescence in each replicate population at the evolutionary end point (Fig. 3E, figs. S5 and S6, and table S5), we wanted to identify the accessible evolutionary paths to these genotypes (fig. S5) (29). Each step on such a path involves a single point mutation, and we distinguish two kinds of steps, an accessible mutational step that increases green fluorescence significantly and an inaccessible step that does not. We call a path inaccessible if it contains at least one inaccessible step. We first engineered all mutations leading to each of the typical genotypes (T1 to T3) into the ancestor and measured their green fluorescence to determine path accessibility. No less than one-third of paths to the typical genotypes are accessible (Fig. 4, A and B).

Fig. 4 Cryptic variation enables the exploration of alternative high-fluorescence genotypes. (A) Accessibility of mutational paths to two representative genotypes, the typical genotype T1 and the alternative genotype A1. [Note that the mutation K102R is not shown because it does not significantly improve A1’s green fluorescence (table S5).] Blue solid lines indicate an accessible mutational step, which increases green fluorescence significantly, and dashed lines indicate an inaccessible step, which does not increase green fluorescence significantly. Solid red lines indicate a conditionally accessible step that significantly increases green fluorescence in the genetic background where it occurs but where the ancestral YFP must first experience one or more inaccessible steps to create this kind of genetic background. We call a path inaccessible if it contains at least one inaccessible step, and we consider a difference in green fluorescence between genotypes significant if P < 0.05 (two-sided t test with Holm adjustment). (B) Percentages of accessible mutational paths to typical genotypes and to alternative genotypes, as well as accessibility inferred from mutation rates and genotype frequencies (29). The right-most entry indicates which populations harbored the genotype. (C) Evolutionary trajectories as indicated by frequency changes of mutants G66S and Y204C, genotype T, of all high-fitness genotypes that had significantly higher green fluorescence than genotype T (Fig. 3E and fig. S5), as well as those of all intermediate genotypes (averaged) leading to these high-fitness genotypes that are inaccessible through selection for green fluorescence alone. Error bars denote SEM (n = 4). Each circle indicates data from one replicate population. (D) An illustration of how cryptic genetic variation can accelerate adaptation and provide access to diverse adaptive peaks (see text for details).

We then engineered and analyzed the mutations leading to the alternative genotypes A1 to A4 and found that these genotypes are much less accessible (Fig. 4, A and B, and figs. S5 and S6) (29). For example, genotype A2, which had the highest green fluorescence among all typical and alternative genotypes, can be accessed by only 3.3% of all mutational paths (Fig. 4B and fig. S6) (29). The reason is that two mutations in this genotype (F72I and I172V) enhance green fluorescence only after the arrival of two other mutations (G66S and Y204C), and the remaining constituent mutation (K167E) only becomes beneficial once the four other mutations have arrived. An even more extreme example is the alternative genotype A1, because no path to it is accessible. Four of its six constituent mutations do not increase green fluorescence in the wild-type background or in the presence of the remaining two mutations, which suffices to block each path (Fig. 4A and fig. S5) (29).
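The path-counting logic behind these percentages can be sketched by enumerating every order in which the constituent mutations of a genotype could arise and discarding any order that contains an inaccessible step. The example below encodes genotype A2 using only the dependencies described in the text. It additionally assumes that G66S and Y204C each increase green fluorescence in any background, which the text does not state explicitly. In the study, accessibility was determined from measured fluorescence of engineered genotypes, not from a rule table like this one, and all names here are hypothetical.

```python
from itertools import permutations

# Mutation -> mutations that must already be present for the step to
# increase green fluorescence (a simplification of the measured data).
A2_REQUIRES = {
    "G66S": set(),                       # assumption: beneficial in any background
    "Y204C": set(),                      # assumption: beneficial in any background
    "F72I": {"G66S", "Y204C"},
    "I172V": {"G66S", "Y204C"},
    "K167E": {"G66S", "Y204C", "F72I", "I172V"},
}


def is_accessible(path, requires):
    """A path is accessible only if every single step is accessible."""
    present = set()
    for mutation in path:
        if not requires[mutation] <= present:
            return False                 # one inaccessible step blocks the path
        present.add(mutation)
    return True


def accessible_fraction(requires):
    """Fraction of all mutation orders that contain no inaccessible step."""
    paths = list(permutations(requires))
    return sum(is_accessible(p, requires) for p in paths) / len(paths)


fraction_a2 = accessible_fraction(A2_REQUIRES)  # 4 of 120 orders
```

Under these assumptions 4 of the 5! = 120 possible orders are accessible, about 3.3%, which agrees with the value reported for A2. If even one constituent mutation is never beneficial in any background reachable by accessible steps, the same enumeration returns zero, as described for A1.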

We next examined our sequence data to study the order of mutations by which evolving populations approach those high-fitness genotypes that have the highest frequency in any one generation and population (Fig. 4C). All four V0 populations followed similar mutational paths to each of the three typical genotypes, T1 to T3 (Fig. 4C). They first acquired either mutation G66S or Y204C, which rose to an average frequency of 20.1% after generation 1 of phase II (II-1). Genotype T (G66S+Y204C) evolved next and reached a frequency of 9.2% one generation later. Genotypes T1, T2, and T3, which incorporate the additional mutations F65L, F47L, and L43M, respectively, arose after that. They show even higher green fluorescence (fig. S5) and reached a frequency of 18.2% in generation 3 (Fig. 4C). Inaccessible genotypes play no major role in these evolutionary dynamics, because their frequency remains low in populations V0 (Fig. 4C).

These evolutionary dynamics differ from those observed in populations VC with cryptic variation. Here, intermediate genotypes that would be inaccessible during selection for green fluorescence steadily increased in frequency before such selection started. At the end of phase I, the collection of all such genotypes had already reached a frequency of 16.9% in VC populations (Fig. 4C). These otherwise-inaccessible intermediate genotypes served as stepping-stones toward high green fluorescence in phase II, as shown by a transition from inaccessible intermediate genotypes to high-fitness genotypes early in phase II (Fig. 4C). Specifically, inaccessible intermediate genotypes reached a frequency of 26.5% in VC populations in the first generation of phase II, which enabled a rapid increase in the frequency of high-fitness genotypes to 28.7% only one generation later.