Understanding the genetic changes that underlie phenotypic functional innovations is a fundamental goal in evolutionary biology, giving insight into species’ past, present, and future evolutionary trajectories. One important unresolved question is whether such genetic changes typically affect protein expression or protein structure. Here we use large-scale laboratory evolution with bacteria to quantify the types of genetic changes that occur during functional innovation. We show that whether these changes affect protein expression or protein structure depends on which cellular functions are being selected upon. We then show that changes affecting protein expression occur in qualitatively different sets of genes from changes affecting protein structure. These results show that using functional knowledge it is possible to predict the course of evolution.

Determining the molecular changes that give rise to functional innovations is a major unresolved problem in biology. The paucity of examples has served as a significant hindrance in furthering our understanding of this process. Here we used experimental evolution with the bacterium Escherichia coli to quantify the molecular changes underlying functional innovation in 68 independent instances ranging over 22 different metabolic functions. Using whole-genome sequencing, we show that the relative contribution of regulatory and structural mutations depends on the cellular context of the metabolic function. In addition, we find that regulatory mutations affect genes that act in pathways relevant to the novel function, whereas structural mutations affect genes that act in unrelated pathways. Finally, we use population genetic modeling to show that the relative contributions of regulatory and structural mutations during functional innovation may be affected by population size. These results provide a predictive framework for the molecular basis of evolutionary innovation, which is essential for anticipating future evolutionary trajectories in the face of rapid environmental change.

One of the most important questions in evolutionary biology concerns the molecular mechanisms that underlie functional innovations. These changes are often polarized into two classes: those that affect protein structure and those that affect protein expression level. Both of these classes have been shown to play important roles across a wide range of taxa, from vertebrates (1, 2) to bacteria (3, 4), and their relative importance has been the topic of considerable discussion (5–12). Significantly, many previous studies have addressed these questions by focusing on single instances of functional innovation (13–16) or selective regimes (17–22). However, to identify general principles, it is necessary to study evolutionary innovation for a large number of different functions in parallel. Indeed, the fact that only a small number of examples exist has resulted in few hypotheses being put forth that identify general characteristics of the molecular changes underlying functional innovation. One prominent hypothesis states that if the development of a novel trait is spatially or temporally limited, then innovation frequently occurs through changes in regulation (23, 24). Whether there are general patterns beyond this is not well-established.

Here we used an experimental system that allows the analysis of a large number of independent cases of evolutionary innovation and investigation of the underlying genetic changes. We worked with a collection of 87 strains of Escherichia coli that each had a deletion of one gene encoding a different metabolic function (SI Appendix, Table S1). Each of these deletions resulted in an inability to grow in minimal glucose media. Then, for each of these 87 deleted metabolic functions, we used experimental evolution to select for novel functionality that could replace the functionality that was lost through gene deletion. As an example, one of the deleted genes was serB, a phosphoserine phosphatase that catalyzes the final step in serine biosynthesis. Regaining the ability to grow in the absence of this gene requires the evolution of a new function that allows sufficient amounts of serine to be made to support cell growth. Although this experimental system does not necessarily recapitulate natural evolutionary scenarios, many aspects of this design are reflective of ecological and evolutionary features found in more natural circumstances. For example, the loss of specific genes or functions may occur through drift in small populations or through selection in the face of antagonistic pleiotropy (25). One well-established example of this in E. coli is that many natural isolates have null alleles at the locus for the stress response sigma factor rpoS; this is thought to be due to a tradeoff between stress resistance and growth rate (26). Similarly, the experimental design here provides an evolutionary scenario very similar to that experienced by microbes during the evolution of abilities that allow the degradation of nonnatural compounds, such as organic chlorides (27) (SI Appendix).

Our experimental design provides two significant advantages that cannot easily be realized in settings outside of the laboratory. First, we can study how a large number of different types of metabolic functions arise, and thus gain general insights into the process of evolutionary innovation. Second, we have information on the cellular context of the evolved novel metabolic functionality, and can thus analyze whether the mechanisms of evolutionary innovation depend in a predictable manner on the specific characteristics of the pathway or interaction networks in which the function acts (12).

Results and Discussion

We began the experimental evolution by establishing five replicate cultures of each of the 87 deletion genotypes, yielding a total of 435 independent cultures. We grew large populations of cells in rich media, and used serial transfer in glucose minimal media to evolve these populations for 28 transfers, or ∼145 generations (Materials and Methods). To ensure the transfer of cells that evolved even low levels of novel metabolic activity, we severely limited the rate of serial dilution, imposing only 5-fold dilutions for the first 10 transfers and 100-fold dilutions for the following 18 transfers. We allowed 48 h of growth in-between transfers.
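The generation count follows from the dilution schedule. As a minimal sketch, assuming that each culture regrows to its pre-dilution density between transfers (so that a D-fold dilution permits log2(D) doublings), the schedule above can be converted to generations as follows. The function name and the regrowth assumption are ours, not part of the original methods.

```python
import math

def generations(dilution_schedule):
    """Sum log2(dilution factor) over all transfers.

    Assumes each culture regrows to its pre-dilution density before the
    next transfer, so one D-fold dilution allows log2(D) doublings.
    """
    return sum(n * math.log2(fold) for n, fold in dilution_schedule)

# (number of transfers, fold dilution), as described in the text
schedule = [(10, 5), (18, 100)]
total = generations(schedule)
n_transfers = sum(n for n, _ in schedule)
print(f"{total:.1f} generations over {n_transfers} transfers")
```

Under this assumption the schedule gives roughly 143 generations, close to the ∼145 quoted above; the small difference presumably reflects details of the estimate that are not stated here.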

At the end of this period, 68 out of the 435 populations recovered the ability to grow in glucose minimal media. These 68 populations encompassed 22 out of the 87 different deletion genotypes, and the functions encoded by these 22 deleted genes were distributed throughout the E. coli metabolic network (SI Appendix, Fig. S1). For a small number of deletion genotypes, all five replicate populations regained the ability to grow. However, for the majority of deletion genotypes in which growth recovered, between one and four replicate populations regained growth ability (Fig. 1A). The pattern of recovery that we observed was consistent with a scenario in which a small number of functions are easy to recover, whereas a much larger number of functions are difficult or perhaps impossible to evolve. Under such a scenario, it is possible that increases in the population size or mutation rate might result in more functions being recovered.

Fig. 1. Sixty-eight out of 435 populations evolved the ability to compensate for the function of the deleted gene. (A) Growth recovery was not deterministic. For some deletion genotypes, five out of five replicates recovered growth; for the majority, between one and four replicates recovered growth. Three hundred sixty-seven populations went extinct during the evolutionary process; these are not shown. (B) Novel functions that were related to building block biosynthesis were more difficult to evolve. The white bars indicate those deletion genotypes in which novel functionality evolved; in gray are those for which no novel function was evolved. For all categories except building block biosynthesis, novel functions evolved that compensated for the majority of deleted functions.

We found, furthermore, that the probability of growth recovery depended on the metabolic function that had been deleted. We classified the 87 deleted genes as acting in one or more of four metabolic functional categories (28): carbon compound utilization, energy metabolism, central intermediary metabolism, and building block biosynthesis. The functions of proteins that acted in building block biosynthesis were much less likely to be replaced than the other types of metabolic functions (Fisher’s exact test, P = 0.0002, odds ratio 0.09; Fig. 1B), indicating that new functionality in building block biosynthesis was more difficult to evolve than other types of new functionality.
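For readers unfamiliar with the test used here, the following is a minimal sketch of a two-sided Fisher's exact test on a 2×2 table. The contingency table behind the reported P value is not given in this section, so the example uses a standard textbook table rather than data from this study. The function returns the sample odds ratio; some statistical packages report a conditional maximum-likelihood estimate instead, and we do not know which was used for the value reported above.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the
    hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small tolerance guards against floating-point ties
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p_value

# Textbook example table, NOT data from this study
odds, p = fisher_exact_two_sided(3, 1, 1, 3)
print(odds, round(p, 4))
```

In the present setting, the rows would correspond to building block biosynthesis versus all other categories, and the columns to deletion genotypes that did or did not recover growth.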

We selected one clone from each of the 68 populations in which growth recovered and determined its maximum growth rate. Fifty-seven of these clones exhibited detectable growth as assayed by changes in optical density (Fig. 2 and SI Appendix, Materials and Methods); in the remaining 11 populations, growth could be detected only as changes in colony-forming units over time. Although many clones exhibited growth rates similar to that of the wild type, lag times were considerably longer (SI Appendix, Fig. S2), suggesting that in most cases the functionality of the deleted gene had not been fully replaced. Notably, we found a striking level of parallelism in growth rates, with clones from replicate populations evolving similar growth rates (Fig. 2). This set of 68 clones comprises a large set of independent instances in which functional innovation has evolved to confer novel growth abilities. We propose that this provides a model system for investigating the molecular mechanisms underlying the evolution of new functionality and for identifying whether the genetic changes are predictably regulatory or structural in their nature.

Fig. 2. Growth rates of recovered clones were more similar for lineages derived from the same deletion genotypes. Each point shows the mean estimated doublings per hour for each clone (±SEM); clones are grouped by genotype (x axis) and colored gray and white to emphasize the groupings. The black solid line indicates the growth rate of the ancestral BW25113 strain. Six recovered clones did not exhibit detectable growth as assayed by OD600 and are indicated as having growth rates of 0. For four deletion genotypes, no clones exhibit detectable growth as assayed by OD600; these deletion genotypes are not shown here. The number below each genotype indicates the probability of observing a set of clones with growth rates at least as clustered as those that we observe (SI Appendix). Cases in which only one lineage recovered for a genotype are indicated with NA, as no clustering probability could be calculated. Each point is based on three biological replicates, except for 11 cases in which one replicate was excluded due to no growth being observed (SI Appendix).

We determined the genetic changes that occurred during experimental evolution using whole-genome sequencing. Using the same set of 68 clones used in the quantification of growth rates, we identified 238 genomic changes in total (SI Appendix and Dataset S1). We focus here on those mutations that are most likely to provide phenotype-specific novel functionality. For this reason, we excluded from all subsequent analyses those mutations that have been observed in other laboratory evolution studies or in the ancestral deletion genotypes (these are likely to be general laboratory adaptations; SI Appendix, Tables S2 and S3). Excluding this class yielded a total of 210 mutations (Fig. 3 A and B) that may have specifically been responsible for novel functionality, compensating for the role of the deleted genes. We term the genes affected by these mutations the “recruited” genes (Dataset S1), and propose that by changing protein expression or structure they confer crucial functional innovations that are necessary for growth recovery on minimal media. Notably, we found that clones isolated from populations in which a large number of replicate lineages recovered contained fewer mutations (Spearman’s rho = −0.38, P = 0.001; Fig. 3C). One interpretation of this observation is that novel functions that required fewer mutational steps evolved with higher probability. Alternatively, it is possible that certain types of deletions lead to increased mutation rates, and as a consequence to a higher number of fixed mutations.

Fig. 3. Mutational events that occurred during the evolution of novel functionality affected both protein structure and expression level. (A) Mutations classified by type. The overwhelming majority of changes were point mutations (Left), followed by insertion sequence (IS) element-mediated changes, small indels, large amplifications (larger than 200 bp), and large deletions (larger than 200 bp). (B) Mutations classified by functional effect (SI Appendix). We inferred that the majority of changes affected protein structure, although more than a fifth of these resulted in altered reading frames or the incorporation of premature stop codons. Almost 40% of changes were inferred to be regulatory, that is, as directly affecting protein expression (in contrast to indirect effects, which may occur via structural changes in transcription factors or other mechanisms). (C) Deletion genotypes in which few replicates recovered tended to contain clones with more mutations. Each box shows the numbers of mutations found within clones, classified by the number of replicate lineages that recovered (e.g., for three deletion genotypes, four replicate lineages recovered; the number of mutations in each of the 12 sequenced clones is shown). The boxplots indicate the median, first and third quartiles, and the extreme values within the category. (D) Mutations in evolved clones often increased predicted transcriptional output due to changes in σ70 binding. For each unique intergenic mutation, we predicted the transcriptional output for the ancestral sequence and the evolved sequence (SI Appendix, Materials and Methods). The dotted black line indicates unchanged transcriptional output. The annotated black points are the promoters shown in E. (E) Random mutations that result in increased transcriptional output are rare. We predicted transcription (σ70 binding) for all point mutations and 1-bp indels in the promoter region surrounding the intergenic mutations plotted in D. Four examples are shown here. The predicted transcriptional output of the ancestor is shown as a red line; that of the same promoter region with the evolved mutation is shown as a green line. Most random mutations have little effect on transcription; however, in several of the evolved clones, the observed mutation was among those mutations with the largest possible predicted effect on transcription (SI Appendix, Fig. S4). Clockwise from the top left, the deletion genotypes and recruited genes are ∆argC and proB; ∆glyA and cycA; ∆pabA and pabB; and ∆ptsI and glk. The numbers in the top left of each panel indicate the fraction of all one-mutant neighboring promoters that have a predicted transcriptional output that is equal to or lower than the observed mutant. Note that both the x and y axes are on a log scale. (F) Mutations that affect translation both increase and decrease the predicted translation initiation rate. Translation initiation rates were predicted using a biophysical model (34) for the ancestral and derived alleles for all intergenic mutations, with the black dotted line indicating no change.

The relative numbers of mutational types that we observed suggested that the vast majority were positively selected. Within coding regions, the ratio of nonsynonymous mutations per nonsynonymous site to synonymous mutations per synonymous site (Ka/Ks) was 6.0 (Fig. 3A and SI Appendix, Materials and Methods); the analogous ratio for intergenic sites (Ki/Ks) was 11.8 (Fig. 3A). In both cases, if the two classes of sites had evolved neutrally, the expected ratio would be 1. This suggested that there was pervasive strong positive selection for point mutations that caused amino acid changes, which likely lead to structural changes in proteins (SI Appendix, Fig. S6). Furthermore, this provides evidence that positive selection for point mutations at intergenic sites was even more prevalent; such mutations are likely to lead to regulatory changes. We also found evidence for positive selection on other types of mutations that likely affect protein levels through regulatory changes, including a 3.5-fold enrichment of indels in intergenic regions compared with coding regions, four mutations in RNA molecules (two small RNAs, and two in the tRNA-processing rnpB), amplifications, and transposon insertions (29) (Fig. 3 A and B and SI Appendix, Fig. S7, and Table S4). Finally, we found that parallel changes were common. Certain amplifications occurred in all clones that recovered for certain deletion genotypes (SI Appendix, Fig. S3). In some cases, these amplifications increased the genome size by more than 10^6 bp (∼25% of the genome), a change that is likely to be deleterious unless mitigated by considerable beneficial effects. Such extensive parallelism was also observed for point mutations. For example, all four lineages that recovered functionality for a deletion of carA contained a mutation 16 bp upstream of the start codon of carB, a mutation that is likely to increase protein expression level by affecting translation initiation (Fig. 4).
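The per-site ratios used above have a simple form: the density of mutations in a focal class of sites divided by the density of synonymous mutations. The sketch below illustrates the arithmetic only. The function name and all counts are hypothetical and are not taken from Dataset S1; the actual site counts are described in SI Appendix.

```python
def rate_ratio(focal_mutations, focal_sites, syn_mutations, syn_sites):
    """Mutations per site in a focal class divided by synonymous
    mutations per synonymous site (e.g., Ka/Ks or Ki/Ks)."""
    if syn_mutations == 0:
        raise ValueError("ratio is undefined without synonymous mutations")
    return (focal_mutations / focal_sites) / (syn_mutations / syn_sites)

# Hypothetical counts for illustration only (not from this study)
ka_ks = rate_ratio(focal_mutations=60, focal_sites=3000,
                   syn_mutations=5, syn_sites=1000)
print(round(ka_ks, 2))
```

A value near 1 is expected if both classes of sites evolve neutrally; values well above 1 indicate an excess of mutations in the focal class relative to the synonymous baseline.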

Fig. 4. Intergenic mutations confer only moderate changes in protein expression. Mean fluorescence levels (±SEM) conferred by chromosomal copies of intergenic regions containing the ancestral (gray points) and evolved alleles (white points). Each pair of alleles is annotated with the mutational change that occurred, with the number indicating the position, in base pairs, from the first base pair of the downstream ORF. The x axis is annotated with the recruited genes whose promoters were affected by the mutation (first row) and the deletion genotype in which the mutation arose (second row). The ORFs of both metJ and metB are downstream of a single intergenic region (in opposite directions). Thus, the sequence contained in these constructs is identical, but GFP expression is driven by promoters on opposite strands. The arrows emphasize the direction of expression change. We predicted significant expression changes in carB, panD A-12G, avtA, and glnL of 12.4-, 2.7-, 2.4-, and 0.64-fold, respectively. For all other genotypes, we predicted no significant changes based on changes in σ70 binding or changes in ribosome binding. Note that the sensitivity of the assay means that very low expression levels (i.e., the avtA and glnL alleles) cannot be accurately measured. Thus, the fold change in expression, particularly for these strains, is likely larger than what we measured.

These genomic analyses yielded the following insights into the relative contribution of structural and regulatory changes during the early stages of functional innovation in bacteria: We found that during the early stages of functional innovation, structural mutations are more common. Sixty-one percent of all observed point mutations led to amino acid changes, and thus likely to structural change (Fig. 3B). Although regulatory mutations were less common, they were strongly overrepresented. Whereas only 12.2% of the genome of E. coli is intergenic, 25% of the point mutations and 43% of all indels were located in intergenic sites. This means that mutations that occurred in intergenic regions, and which potentially change protein expression levels, were approximately three times more likely to increase in frequency compared with mutations in coding regions. To further test the adaptive nature of these potential regulatory mutations and to understand their molecular consequences, we examined how they affected transcriptional and posttranscriptional processes.
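The degree of overrepresentation can be made explicit with a short calculation. The sketch below is one way to express it, as the odds that a mutation is intergenic relative to the odds expected if mutations were distributed uniformly across the genome. The function name is ours, and this is not necessarily the exact calculation behind the approximately threefold figure quoted above, which may pool mutation classes differently.

```python
def per_site_enrichment(frac_mutations_intergenic, frac_genome_intergenic):
    """Odds of a mutation being intergenic, relative to the odds expected
    if mutations were spread uniformly over the genome."""
    observed_odds = frac_mutations_intergenic / (1 - frac_mutations_intergenic)
    expected_odds = frac_genome_intergenic / (1 - frac_genome_intergenic)
    return observed_odds / expected_odds

INTERGENIC_FRACTION = 0.122  # fraction of the E. coli genome, from the text

point = per_site_enrichment(0.25, INTERGENIC_FRACTION)  # point mutations
indel = per_site_enrichment(0.43, INTERGENIC_FRACTION)  # indels
print(f"point mutations: {point:.1f}x, indels: {indel:.1f}x")
```

With the percentages given in the text, point mutations come out at about 2.4-fold and indels at about 5.4-fold, which bracket the approximately threefold overall enrichment.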

Computational analyses showed that many of these potential regulatory mutations increased transcriptional output or translation initiation rates. First, we analyzed changes in transcription using an approach based on information theory (30–32) to predict how the regulatory mutations (point mutations and indels) affected binding of the housekeeping sigma factor σ70 (SI Appendix, Materials and Methods). In 11 out of 35 cases (31.4%), we found that the mutation increased σ70 binding strength, by 1.1- to 6.5-fold (Fig. 3 D and E and Materials and Methods). This fraction is much higher than what is expected by chance: Only 3.6% of all random mutations are predicted to increase binding by more than 10% (Fig. 3E and SI Appendix, Fig. S4 and Table S6). As binding is directly proportional to transcriptional output (33), this supports the hypothesis that many of the intergenic mutations resulted in increased protein expression levels. We next used a biophysical model to predict changes in translation initiation rates. Of the eight mutations that occurred within 35 bp upstream of the start codon, and which may have affected translation (34), four were predicted to increase translation initiation, whereas three were predicted to decrease it (Fig. 3F). This suggests that mutations affecting protein expression often incurred their beneficial effects through increasing transcriptional output, and less often through increasing the rate of translation initiation.
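The comparison between an observed mutation and all of its one-mutant neighbors can be illustrated with a toy position weight matrix. The sketch below is not the σ70 model used here (that model is described in SI Appendix): the base frequencies are hypothetical values for a 6-bp element resembling the −10 hexamer, the sequences are invented, and only substitutions are enumerated, whereas the analysis above also included 1-bp indels.

```python
import math

BASES = "ACGT"
# Hypothetical base frequencies for a 6-bp element resembling the -10
# hexamer (consensus TATAAT). Illustrative only, not a fitted sigma70 model.
FREQS = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.20, "C": 0.15, "G": 0.15, "T": 0.50},
    {"A": 0.60, "C": 0.15, "G": 0.10, "T": 0.15},
    {"A": 0.55, "C": 0.20, "G": 0.10, "T": 0.15},
    {"A": 0.05, "C": 0.03, "G": 0.02, "T": 0.90},
]

def site_score(site):
    """Log2-odds score (bits) of a site against a uniform background."""
    return sum(math.log2(col[b] / 0.25) for col, b in zip(FREQS, site))

def best_score(seq):
    """Highest-scoring window in seq, a crude proxy for binding strength."""
    w = len(FREQS)
    return max(site_score(seq[i:i + w]) for i in range(len(seq) - w + 1))

def point_neighbors(seq):
    """Yield every sequence that differs from seq by one substitution."""
    for i, old in enumerate(seq):
        for new in BASES:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]

def rank_among_neighbors(ancestor, evolved):
    """Fraction of one-substitution neighbors scoring <= the evolved allele."""
    target = best_score(evolved)
    scores = [best_score(n) for n in point_neighbors(ancestor)]
    return sum(s <= target for s in scores) / len(scores)

ancestor = "GGCTACAATGC"  # invented promoter fragment
evolved = "GGCTATAATGC"   # C-to-T change creates the consensus hexamer
print(best_score(ancestor), best_score(evolved))
print(rank_among_neighbors(ancestor, evolved))
```

A rank close to 1 means that few or no alternative single mutations would be predicted to bind more strongly than the observed one, which is the quantity annotated in each panel of Fig. 3E.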

We then experimentally quantified the effects of these mutations on protein expression. We placed single copies of the intergenic region containing the ancestral and evolved alleles as translational fusions with GFP in a neutral location in the chromosome (35) (Materials and Methods). In the four cases for which we also computationally predicted a change in protein level, expression changed in the direction anticipated. Surprisingly, we found that in many cases, expression increased only moderately (Fig. 4). Despite these weak effects on protein expression level, as pointed out above, there was strong evidence that many of these mutations were adaptive. Together, these data suggested that the regulatory mutations that occurred during the evolution of novel functionality often conferred adaptive benefits through the creation of novel regulatory elements, but that these elements frequently effected only moderate changes in protein expression level.

To test whether changes in protein expression alone could provide novel functionality that rescues the lethal phenotypes, and to understand the extent of expression change needed, we selected eight deletion genotypes (SI Appendix, Materials and Methods) in which several evolved clones appeared to have single large-effect regulatory mutations, suggesting that the lethal phenotypes could be rescued by increased expression of a single ORF. This criterion largely excluded evolved clones with structural changes that were highly paralleled in other lineages and evolved clones that contained two or more regulatory changes. We then transformed the eight ancestral deletion genotypes with plasmids containing the ORFs of the recruited gene under an isopropyl β-D-1-thiogalactopyranoside (IPTG)-regulated promoter (36) (Materials and Methods) and observed growth for 72 h. In four out of the eight cases, high growth rates were observed when expression of the ORF was induced using 100 μM IPTG (SI Appendix, Fig. S5), showing that increased expression of this single ORF was sufficient to provide the new functionality. In three cases, growth rates displayed threshold behavior, with high growth rates occurring at one concentration of IPTG and small decreases in IPTG level resulting in little or no growth at all. This suggests that subtle changes in protein expression level can have profound effects on growth (3), a result that provides an interesting corollary to previous data showing that small changes in protein structure can exert large effects on cell growth (37).

Our results indicate that structural and regulatory changes are both important for the evolution of new functions, but show that regulatory mutations are consistently more likely to be positively selected. We next sought to develop a more refined predictive framework to understand how the genetic changes that occur during evolutionary innovation depend on functional or demographic parameters. First, we asked whether the relative importance of structural and regulatory mutations depends on cellular contexts or pathways. Using the same categorical metabolic functional classifications outlined above (carbon compound utilization, energy metabolism, central intermediary metabolism, and building block biosynthesis), we found that deletion genotypes involved in building block biosynthesis were more likely to be compensated for by regulatory mutations (Fisher’s exact test, P = 0.006, odds ratio 2.5; Fig. 5A). Thus, we found that the enrichment of regulatory mutations described above was dependent on cellular context, revealing our first predictive pattern in the path of molecular evolution during functional innovation.

Fig. 5. Mechanisms promoting novel functionality are dependent on cellular context. (A) The relative enrichment of regulatory and structural mutations is dependent on cellular function. Mutations that contribute toward novel functionality related to building block biosynthesis are more enriched for regulatory mutations, with 48% of all mutations being regulatory. In contrast, in other pathways, only 17–23% are regulatory. Green bars indicate regulatory mutations; gray bars indicate structural mutations. The numbers above the bars indicate the number of deletion genotypes within the category. (B) Regulatory mutations recruit proteins that act in functions related to the missing function. We calculated the shortest network distance between pairs of genes from high-confidence links in the STRING database (38). Green points indicate the network distances between the deleted gene and the genes recruited for functional compensation. Black points indicate the expected network distance between the set of deleted genes and a randomly selected recruited gene based on 5,000 randomizations of protein pairs. The last bin includes all gene pairs with a distance of nine or more, or which are not connected in the network. Genes recruited via regulatory mutations are on average more than three network links closer than expected by chance (Wilcoxon rank-sum test between observed and randomized network distances; n = 34; P = 2.5e-15). (C) Structural mutations recruit proteins that act in functions unrelated to the missing function. Gray points indicate the network distances between the deleted gene and the genes recruited for functional compensation. Black points indicate the expected network distance between the set of deleted genes and a randomly selected recruited gene based on 5,000 randomizations of protein pairs; genes recruited via structural change mutations are on average only 0.6 network links closer than expected by chance (Wilcoxon rank-sum test; n = 85; P = 4.0e-5).

Having established that the types of mutations that occur are affected by the nature of the novel metabolic function that is required, we asked whether these mutations themselves affect predictable cellular functions. We measured the shortest physical and functional proximity [network distances (38)] between a deleted gene and the genes recruited to compensate for its function (Materials and Methods), and found that genes recruited via regulatory changes were, on average, more than 3 network links closer to the deleted genes than would be expected in a randomized network (Fig. 5B; P = 2.5e-15, n = 34, Wilcoxon rank-sum test). In contrast, genes recruited through structural changes were, on average, only 0.6 links closer to the deleted genes than in a randomized network (Fig. 5C; P = 4.0e-5, n = 85, Wilcoxon rank-sum test). Thus, regulatory changes that conferred novel functionality tended to affect proteins that function within the pathways most relevant to the missing (deleted) function, whereas structural mutations tended to affect proteins located in unrelated pathways.
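The network-distance comparison can be sketched as a breadth-first search combined with a randomization of gene pairs. The example below is a minimal illustration under our own assumptions: the network is a toy graph with placeholder node names rather than STRING data, the function names are hypothetical, and distances of nine or more (including unconnected pairs) are pooled into a single value, following the binning described for Fig. 5.

```python
from collections import deque
import random

def shortest_distance(graph, source, target, cap=9):
    """Breadth-first search distance in links. Unconnected or distant
    pairs are assigned `cap`, mirroring the pooled last bin of Fig. 5."""
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return min(dist + 1, cap)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return cap

def mean_random_distance(graph, deleted, candidates, n_draws=5000, seed=0):
    """Mean distance between randomly paired deleted and candidate genes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        total += shortest_distance(graph, rng.choice(deleted),
                                   rng.choice(candidates))
    return total / n_draws

# Toy undirected network with placeholder node names (not STRING data)
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5"), ("g6", "g7")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

observed = shortest_distance(graph, "g1", "g3")
expected = mean_random_distance(graph, ["g1"], sorted(graph))
print(observed, round(expected, 2))
```

Comparing the observed distances for recruited genes with the distribution obtained from random pairings then indicates whether recruited genes lie closer to the deleted gene than expected by chance.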

Finally, we asked how demographic parameters might influence the relative contribution of regulatory and structural mutations. We have shown that structural mutations were more frequently selected for during functional innovation but regulatory mutations were more strongly enriched than expected on the basis that intergenic regions provide a relatively small mutational target. This differential enrichment yields a general insight into the nature of regulatory and structural mutations: Either regulatory mutations have a higher probability of being beneficial than structural mutations, or regulatory mutations have larger beneficial effects. A simple population genetic model suggests that it should be possible to disentangle these two hypotheses by testing how changes in population size affect the fraction of regulatory mutations that are observed (Fig. 6). If regulatory mutations have a higher probability of being beneficial, then their relative numbers will be enriched relative to structural mutations (compare Fig. 6A and Fig. 6B), and this enrichment will be independent of population size (Fig. 6B). In contrast, if regulatory mutations have larger beneficial effects than structural mutations, the level of enrichment will be dependent on population size: Larger populations will fix a greater fraction of regulatory mutations (Fig. 6C). These results show that population size can impact the relative contribution of regulatory and structural mutations, and emphasize that a predictive framework of the molecular basis of evolutionary innovation should take into account demographic parameters.
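The intuition behind the population-size dependence can be illustrated with a deliberately simplified simulation. This is not the population genetic model of Fig. 6 (described in SI Appendix) but a toy stand-in for competition among beneficial mutations: the number of simultaneously competing mutations serves as a proxy for population size, each mutation is regulatory with probability 0.15, fitness effects are assumed to be exponentially distributed, and the mutation with the largest effect is taken to win. All names and parameter values are our assumptions.

```python
import random

def regulatory_fraction(n_competing, mean_reg, mean_struct,
                        p_reg=0.15, trials=5000, seed=1):
    """Fraction of trials in which the winning mutation is regulatory.

    Each trial draws `n_competing` beneficial mutations. Each one is
    regulatory with probability `p_reg`, and its fitness effect is drawn
    from an exponential distribution with the mean of its class. The
    mutation with the largest effect wins.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        best_effect, best_is_reg = -1.0, False
        for _ in range(n_competing):
            is_reg = rng.random() < p_reg
            mean = mean_reg if is_reg else mean_struct
            effect = rng.expovariate(1.0 / mean)
            if effect > best_effect:
                best_effect, best_is_reg = effect, is_reg
        wins += best_is_reg
    return wins / trials

for k in (1, 10, 100):
    same = regulatory_fraction(k, mean_reg=0.02, mean_struct=0.02)
    larger = regulatory_fraction(k, mean_reg=0.02, mean_struct=0.01)
    print(k, round(same, 2), round(larger, 2))
```

When the two effect distributions are identical, the regulatory fraction stays near the mutational input of 0.15 regardless of how many mutations compete. When regulatory mutations have larger effects, the fraction rises as competition intensifies, which is qualitatively the behavior described for Fig. 6C.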

Fig. 6. Population genetic modeling (SI Appendix, Materials and Methods) shows that the relative numbers of regulatory (green points) and structural mutations (gray points) that contribute to novel function can depend on demographic parameters. (A) When the proportion of structural and regulatory mutations is 0.85 and 0.15, respectively (similar to the ratio of nonsynonymous to intergenic sites in the E. coli genome), and the distribution of selective effect sizes is identical, the ratio of the average number of structural and regulatory mutations within an individual is approximately independent of population size (white points; Lower). (B) If the number of structural sites at which structural mutations are beneficial is halved, the ratio again remains independent of population size, but the fraction of regulatory mutations approximately doubles. (C) If the mean and variance of the effects of structural mutations on fitness are half that of regulatory mutations, the ratio is dependent on population size, with individuals in larger populations containing larger relative numbers of regulatory mutations. (Insets) The shape of the distribution of mutational effects for structural (black) and regulatory (green) mutations. In A, the two distributions are identical. The results shown here correspond to 150 generations of evolution. All points shown are the means of at least 50 independent simulations.

By using a large number of independent instances of functional innovation and comprehensively characterizing their molecular effects, we have been able to develop a predictive framework for the genetic basis of evolutionary innovation. This framework provides insight into both when and via what mechanisms functional innovation will occur. We have shown that coding mutations tend to be numerically dominant overall, but regulatory mutations are much more common than expected based on the small fraction of the genome that does not encode proteins. One possible explanation for this observation is that coding mutations are more likely to incur antagonistic pleiotropic effects. For example, several of the regulatory mutations we observed affected genes that are essential for growth in minimal glucose media [e.g., proB (a glutamyl kinase), pabB (an aminodeoxychorismate synthase), and metE (a homocysteine transmethylase)]. Changes in the coding regions of these proteins may have detrimental effects on these genes’ native functions, whereas regulatory mutations may tend to have less of a deleterious effect. The regulatory mutations we observed frequently resulted in the creation of new transcriptional control elements. They were particularly frequent in novel functions related to building block biosynthesis—functions which were hard to evolve—and generally affected genes that act in cellular contexts closely related to the novel function that was evolved.

It has previously been proposed that gene duplications play a critical role in functional innovation (39, 40); more recently, this phenomenon has been observed in a laboratory setting (41). However, we found that gene amplifications did not dominate during the early stages of the evolution of functional innovation. This is perhaps due to deleterious pleiotropic effects that manifest when large parts of the genome are amplified, and the rarity with which smaller and less deleterious duplications occur (e.g., 17). At the same time, our results suggest that during the early stages of functional innovation, some genes may take on dual roles in the cell (42), and these dual roles may be facilitated by overexpression. Later, rare duplication of these loci may allow their specific enzymatic activities to diverge (41).

The predictive framework we present here contributes to our fundamental understanding of the evolutionary process, and at the same time provides insights into how populations can respond to rapid environmental change.