Microbial ecology in the chemostat with two substrates

Microbial evolution is often studied in chemostats, which maintain long-term cellular growth at a rate fixed by the experimenter, and we focus on a model of microbial ecology suitable for chemostats19, 20. The chemostat’s environment is controlled through the choice of metabolic substrates, and to investigate microbial metabolic adaptation, we extend the standard model21 to include two growth-limiting substrates, which we name u and v (Fig. 1a). We follow the typical assumption that all other necessary nutrients are provided in excess and do not limit growth. Both substrates are added continuously to the chemostat at influx rates that the experimentalist determines and microbes grow in proportion to the substrate concentrations. The two substrates are individually sufficient for growth—i.e., they are perfectly substitutable22. The chemostat is continuously diluted at a constant rate, which is the same for both microbes and substrates. Therefore, microbial populations must grow at least as fast as the dilution rate to survive being washed out (to extinction), and population growth exactly equals the dilution rate at steady-state. The chemostat is well-mixed through stirring, which eliminates the spatial component of the environment and facilitates the development of simpler mathematical models (Methods section).

Fig. 1 Modeling microbial ecology in the chemostat with a metabolic trade-off between two substrates. a The chemostat is a homogeneous environment. The two substrates (u and v) are added continuously, and cells and substrates are diluted at a constant rate. b Cells specializing in using one substrate cannot also specialize in the other. The metabolic specialization of each cell is parameterized by a number, s, between zero and one: values near zero indicate a v-specialist; values near one indicate u-specialist; intermediate values are generalists. c To replicate, cells must progress through a series of discrete states of growth by metabolizing substrates. The yield of a substrate is the amount by which a cell’s state of growth increases by metabolizing the substrate. Cells replicate when their growth state exceeds a critical value. When a cell encounters a u substrate molecule, the molecule is imported with probability of s; when a cell encounters a v substrate molecule, the molecule is imported with a probability of (1 − s). d A rare mutation leads to competition between the (high abundance) resident phenotype and the (low abundance) invading phenotype and has three possible qualitative outcomes Full size image

In total, the ecological model is described by nine environmental parameters: the chemostat’s dilution rate; the influx rates of the two substrates into the chemostat; the two maximal rates of cellular import for the substrates; the metabolic rates for each substrate once imported into a cell; and the yield of each substrate (an integer number of incremented growth states per unit of substrate metabolized).

A cellular trade-off to constrain metabolic specialization

Cells in the chemostat encounter both u and v substrates but cannot simultaneously specialize to using both. To investigate how a metabolic constraint affects adaptation23, we consider a simple phenomenological trade-off between the ability of a microbe to import and consume u substrate molecules compared to its ability to import and consume v substrate molecules from the environment. We parameterize the metabolic specialization of each cell—its phenotype—by a number, s, between zero and one: values near zero indicate that the cell specializes to v; values near one indicate u specialization (Fig. 1). Cells inherit their phenotype (their value of s) from their parent nearly always without variation through clonal replication and we assume that cells cannot spontaneously change their phenotype (Fig. 1b).

We can interpret the metabolic specialization, s, in two ways. In the first interpretation, cells can always import both substrates, but increase the rate of import for u at the expense of decreasing the import rate for v. The maximal rate of import for u is multiplied by s; the maximal rate of import for substrate v is multiplied by (1 − s) (Supplementary Note 1). Such a constraint may arise if, for instance, a finite intracellular resource is shared between the production of the u and v permeases24,25,26 or through antagonistic pleiotropy6, 7, 27. In the second interpretation, cells adopt a mixed strategy (following evolutionary game theory28) of randomly switching between two metabolic programmes, each exclusive to only one substrate. Then, the probability of adopting the u metabolic program is s and the probability of adopting the v program is (1 − s) (Supplementary Note 2).

We model the microbial life cycle through a sequence of import and metabolism of substrates and a corresponding growth in biomass. Cells import the u and v substrates at rates proportional to the concentrations of these substrates in the chemostat, and the imported substrates are metabolized to give biomass to the cell, which divides once a threshold of biomass is crossed. We therefore structure the microbial population into discrete states of growth through which the cell progresses before replicating. Accordingly, cells that metabolize u and v molecules transition to a higher state of growth by an integer amount of states, which we call the substrate’s yield, and replicate when they exceed a maximum growth state (Fig. 1c).

Mutation-limited adaptation as a Markov process

In our model, microbes will almost always produce offspring that inherit their parents’ metabolic specialization, but, rarely, a mutation may result in a phenotypically distinct population that will compete with the (resident) parent population or community. We follow the theory of adaptive dynamics to include phenotypic mutations only on evolutionary timescales11, 12; i.e., mutations are sufficiently rare that mutants emerge only after the chemostat’s ecology has reached steady-state. This assumption is typically referred to as the ‘weak mutation’ limit29 and separates the ecological and evolutionary timescales. If, for example, the chemostat contains a single phenotypic population then invasion by a mutant has three possible outcomes: the mutant becomes extinct, or the resident becomes extinct, or the two phenotypes co-exist in a community (Fig. 1d). In our spatially homogeneous model at most two populations can co-exist—following the competitive exclusion principle30, which states that at most k species may co-exist on k growth-limiting substrates.

Simulating the map of invasion events

When adaptation is limited by the availability of mutations, mutational paths emerge according to the sequence and outcome of mutation and invasion events. Successful invasion of a resident community by a new mutant modifies the chemostat’s environment through changing the steady-state levels of the available substrates. Therefore the context in which future mutation and invasion events occur is modified through the construction and destruction of ecological niches2, 8, 18, 23, 31.

To generate the mutational paths, we first create an invasion map for the outcome of competitions between all resident communities and all mutants. To do so efficiently, we developed a dynamic programming algorithm to simulate invasion events assuming rare mutations on a discretized phenotype space (Fig. 2a). Briefly, the algorithm treats the invasion map as a tree: nodes are communities of phenotypes that can co-exist at steady-state and are connected by edges representing single mutation and invasion events. The algorithm iteratively constructs the tree by perturbing the steady-state of the community at each parent node to introduce a small population of mutants. The resulting competition is simulated (Methods section), and the outcome at the new steady-state is analyzed and recorded as a connected child node. We terminate a path of mutation and invasion events if the path arrives at a node that has already been placed in the tree. By doing so, we avoid redundant simulations because the sub-trees below two identical nodes are always the same. The algorithm completes when all paths have terminated (Supplementary Note 3).

Fig. 2 Generating a Markov process for mutation-limited adaptation. The procedure has two stages: first, simulating all possible ecological invasions; second, calculating the probabilities to transition between steady-states as a result of these invasions. We demonstrate a simple example that has only three possible phenotypes (labeled A, B, and C). a The invasion map contains the outcome of all possible competitions between residents and mutants and all steady-states. An algorithm based on dynamic programming constructs the invasion map as a tree: nodes represent steady-states connected via edges that represent mutations and competitions. Starting with the empty chemostat (root node), each state is perturbed with a small population of mutants that have each phenotypic value in turn. The steady-state after every competition is recorded in a child node and connected to the parent node. The algorithm dynamically truncates the tree to avoid any recursion (repetition). b To calculate transition probabilities between resident states, we parse the invasion map in a with a mutational process. We assume that mutations are uniformly distributed in the space of phenotypes, but centered on the resident’s phenotype and with a maximum size of mutation ΔS max . The elements of the matrix show the conditional probability that phenotype s x generates mutant phenotype s y . c The Markov process in b visualized through a graph of its directed network. States are classified as either transient (circles) or recurrent (squares). d, e As in b, c, but with a larger maximum size of mutation ΔS max . The recurrent states, as well as the mutational paths of the process, depend on the distribution of mutations Full size image

We found that invasion fitness—the ability of a phenotype to invade another when initially rare—was dependent on the frequency and type of other phenotypes in the environment. For example, in the example shown in Fig. 2a, phenotype C can always invade and drive A to extinction in pairwise competition; the converse is not possible. Phenotypes B and C are mutually invasible and establish a community of co-residents. This community can be invaded by phenotype A, driving B to extinction, thereby establishing a co-existence with phenotype C. The A,C community would not have been possible without the intermediate environmental modification effected by phenotype B. Frequency-dependent effects through niche creation and destruction are stronger when mutations in metabolic specialization are large because mutants can exploit metabolically disparate niches.

The invasion map can be thought of as providing two extensions to the pairwise invasibility plots used in adaptive dynamics theory11, 12, 32. Like adaptive dynamics, our modeling approach recovers the existence of evolutionary branching points where an initially monomorphic population branches into two co-existing populations (Supplementary Fig. 1), but unlike adaptive dynamics such points require no special treatment. In addition, we can investigate non-local evolutionary branching that occurs after a mutation of large effect and analyze mutational paths that contain this type of diversification (Supplementary Fig. 2).

Mutations connect community states

To model adaptive evolution as a Markov process, we define a state of the Markov chain as a viable population in the chemostat, which is either monomorphic with all cells having the same phenotype or dimorphic (i.e., a community) with two distinct phenotypes that co-exist, and require the probability of a transition from one state to another through a single mutation and invasion event (Methods section). The probability of such a transition is proportional to the abundance of cells in the source state and to the propensity with which each cell generates the mutation that effects the transition to the target state. We will only be concerned with the sequence of transitions between one state to another and not with the waiting time between such events. As a consequence, our approach draws no conclusions about the length of time a microbial community spends in a particular state and hence does not quantify the evolutionary timescale. It should be noted, however, that in an experimental setting we may be able to observe (or reconstruct) transitions between states without necessarily knowing the precise time at which the transitions occurred. Under these circumstances, our methods, which work without a temporal component, retain their descriptive and predictive utility.

The availability of phenotypes to mutants partly determines which of the possible states will follow a current state. To investigate how the supply of mutations affects adaptation we use a uniform distribution of mutations in phenotype space centered on a resident phenotype, s x , with a maximum mutation of size ΔS max , which is the largest possible difference between the metabolic specialization of a parent and that of its offspring. Mathematically, a mutant with phenotype s y can be generated by a cell with phenotype s x if |s y − s x | ≤ ΔS max . For example, consider the s = 0.75 phenotype, which uses more u substrate than v substrate. When ΔS max = 0.25, cells with this phenotype can generate pure u specialists (s = 1) but not pure v specialists (s = 0) because a larger phenotypic change is required to achieve the latter. The probabilities of mutation are chosen so that all of the discrete s y values that can be generated from s x are equally likely after correcting for boundary effects (Fig. 2b, c). The distribution of mutations affects both the mutational paths and the stationary behavior of the adaptation process, and we parse the invasion map with different sizes of maximum mutation to investigate this dependency. For simplicity, we assume that the rate of mutation does not depend on the phenotype and that it is constant over time (Methods section).

Finally, to complete the description of mutation-limited adaptation as a Markov process, we must choose an initial distribution of ancestral states. Most laboratory evolution experiments start with a single isogenic population, and in our model adaptation begins with equal probability from any viable population comprising a single phenotype (rather than a community with two phenotypes).

To determine the long-term evolutionary outcome of an adaptation process, we classify the states of the Markov process as either transient or recurrent, following the theory of Markov processes. Transient states represent microbial communities that may only be visited once on a mutational path. A recurrent state, however, will always re-emerge once established, and the endpoints of mutational paths are always recurrent states (Fig. 2d, e). Recurrence occurs either when a microbial community cannot be invaded by any mutant that can be generated from that community (an evolutionarily stable state28) or if there is a sequence of mutation and invasion events that returns to the community.

Hierarchy of evolutionary outcomes

To investigate the effect of environmental and mutational parameters on the adaptation process, we randomly sampled 10,000 sets of our model’s nine environmental parameters. For each set of parameters, we calculated the invasion map using 11 discrete phenotypic values ranging from a pure v-specialist to a pure u-specialist: s ∈ {0.0, 0.1,…, 0.9, 1.0}. We parsed each invasion map with distributions of mutations that are uniform but with an increasing size of maximum mutation, ΔS max ∈ {0.1, 0.2,…, 0.9, 1.0}, to generate 10 adaptation processes. We then analyzed the 100,000 resulting adaptation processes to determine their long-term evolutionary behavior.

We classify the evolutionary outcome of an adaptation process by the number and type of its recurrent states (Fig. 3a). At a high level, we differentiate between processes with either a single recurrent state or multiple recurrent states. We further classify phenotypes as either specialists (when s = 0, a v-specialist, or s = 1, a u-specialist) or generalists (when 0 < s < 1) and determine whether recurrent communities are either all specialists, all generalists, or a mixture of both. We observed two qualitatively different behaviors in processes with multiple recurrent states, which we named ‘multi-stationary’ and ‘cycling’.

Fig. 3 Our model generates multiple evolutionary outcomes and complex networks of mutational paths. Qualitative properties of the dynamics of adaptation can be visualized and conveyed through graphing the networks of mutational paths. a We hierarchically classify adaptation processes according to their evolutionary outcome using the number and type of recurrent states. Labels show each outcome’s frequency obtained by sampling of environmental parameters. b–e The networks of mutational paths from four sets of environmental parameters demonstrate the scope of the model’s dynamic and stationary behavior. The maximum size of mutation (ΔS max ) affects the mutational paths and long-term outcomes of adaptation (compare the recurrent states between the left and right networks). Circles are transient states; squares are recurrent states; colors denote the number, and type of metabolic phenotypes in each state following a. Red and blue traces are examples of mutational paths, which start from an ancestral initial state (always a monomorphic state) and end at a recurrent state. We indicate the phenotype values of residents in some vertices to aid interpretation. b The invasibility relationship between a pair of phenotypes may reverse in the presence of co-residents as a consequence of frequency-dependent modification of the environment. For example, an s = = 0.8 metabolic generalist can invade an s = 1.0 specialist but the converse is not possible in pairwise competition. When an s = 0.6 generalist is a co-resident, however, the s = 1.0 specialist can invade the dimorphic community of s = 0.8 and s = 0.6 generalists——and drive s = 0.8 to extinction. c A process with multiple, non-connected recurrent states has more than one evolutionarily stable state, each of which is reached with some probability. d A process where multiple recurrent states are connected exhibits quasi-periodic evolutionary cycling. e An example of a potentially bottle-necked process. The network consists of two highly-connected (top and bottom) subgraphs, which are themselves connected via only a few mutation and invasion transitions Full size image

Multi-stationary outcomes have more than one exclusive recurrent states and each recurrent state is reached with some probability. In these adaptation processes we expect experimental replicates to eventually show divergent phenotypic variability8, 10, 33. We found that the number of recurrent states and their stationary probability can depend on the maximum size of mutation (Fig. 3b, c). Alternative microbial community states have been detected in the human gastrointestinal tract34, although whether these states are caused by variations in the host environment or are true alternative states is unclear.

Cycling outcomes are processes where no microbial community is completely resistant to invasion, and the adaptation process continually cycles between a set of states through mutation and invasion events. This cycling can be either periodic (Fig. 3d, left) or aperiodic and unpredictable—albeit confined to a closed class of recurrent states (Fig. 3d, right). These changes in the community occur at the evolutionary time-scale35 and, though reminiscent of, should be distinguished from quasi-periodic and chaotic dynamics on ecological time scales36, which are not driven by mutations. For example, in a dimorphic community of specialists and generalists, mutation and invasion events can drive the generalists to extinction. This extinction leaves an unexploited metabolic niche that a newly-emergent population of generalists can fill, and the cycle repeats (Fig. 3d). Both of these evolutionary outcomes are qualitatively reproduced in stochastic simulations with continuous phenotypes (Supplementary Figs. 3–5).

Visualizing adaptation processes as directed networks not only conveys information about the evolutionary outcomes but also about the dynamics of adaptation. For example, we found that some adaptation processes are bottle-necked: most mutational paths have to transition through a few common states before reaching a recurrent state (Fig. 3e). The network examples we show in Fig. 3 are only representative, and do not portray the entire scope of networks we have observed, but instead illustrate a few properties of adaptation that can be conveyed graphically. We will later quantify such properties of the networks to more comprehensively characterize the dynamics of adaptation and to construct a predictive model.

Sensitivity of adaptation to environmental conditions

In light of the multiplicity and complexity of evolutionary outcomes, we investigated how particular outcomes are associated with the environmental parameters of the model. These parameters describe the chemostat’s set-up (rate of influx of substrates and dilution rate) and the properties of the two substrates (maximal rate of import, metabolic rate, and yield).

Although some evolutionary outcomes were more likely to be associated with certain types of environments, no clear pattern emerged (Fig. 4a). In particular, we did not detect regions in the space of environmental parameters that robustly associated with a single evolutionary outcome. We first used a nearest-neighbors algorithm37 to estimate the density of evolutionary outcomes given one environmental parameter with the others varying (i.e., we estimated the probability of each of the eight evolutionary outcomes given the environmental parameter). In general, these relationships were not robust, with the exception of high ratios of the influx rates of the substrates (>4 in log space): no parameter region associated with an evolutionary outcome with probability near 1. Instead, we observed that metabolically diverse (dimorphic) communities are most likely to emerge when the rates of influx for the two substrates are similar and when the dilution rate is low. As the disparity in supply of the substrates increases, we find that communities undergo evolutionary cycling, perhaps signaling the onset of ecological collapse38. At even greater ratios of the substrate’s rates of influx, communities disappear, leaving a population of a single specialist as the only evolutionary outcome. We also found that equality of the substrate’s yields was a necessary condition for the emergence of multi-stationary evolutionary outcomes, which suggests that parity between some properties of the substrates, as for the rates of influx, is important to promote metabolic diversity. However, increasing the difference between either the two metabolic rates or the two maximal rates of import did not lead to notable changes in the probabilities, except for a general trend towards a loss of diversity.

Fig. 4 A complex association exists between the environment and the evolutionary outcomes. Overlapping clusters of evolutionary outcomes in environmental parameter space make prediction difficult and explain sensitivity of microbial communities to environmental change. a Some evolutionary outcomes are more likely to be associated with certain parameter values, but without a simple dependence. Using a nearest-neighbors algorithm, we empirically estimated the density of each outcome from sampling parameters. For all combinations of parameters except the dilution rate, we show densities at absolute values because of the symmetry about zero. Colors for the outcomes follow Fig. 3a. b Evolutionary outcomes form clusters in the space of parameters but these clusters are not distinct. We used Euclidean distance in a standardized space of parameters to perform within-outcome hierarchical clustering and then aligned clusters with samples from other outcomes to assess their overlap. We only plot a representative sample (1%) of the data set, which preserves the observed frequencies of each outcome. Distinct clusters would appear as bright regions in the diagonal combined with dark regions in the off-diagonal rows and columns indicating that samples with the same outcome are closer to each other than they are to samples with different outcomes. Instead, we found that clusters for one outcome are almost always proximal to samples from other outcomes (bright off-diagonal regions). c Environmental perturbations often cause community collapse and only rarely create diverse communities. We calculated the distance in the standardized space of parameters from each process to the closest process in all other outcome classes and so formed distributions of shortest distances. The left column shows per panel the shortest distances from the monomorphic specialists to themselves and to one other outcome. The monomorphic specialists are generally closer to other monomorphic specialists than they are to other outcomes. The right column shows per panel the shortest distances from one other outcome to itself and to the monomorphic specialists Full size image

Even by using combinations of environmental parameters to characterize outcomes, substantial unpredictability remained. We developed a hierarchical classification, which combines six statistical estimators to make predictions (Methods section). Its overall performance (a mean recall of 0.78 over the eight evolutionary outcomes), however, did not suggest that a reliable predictive map from environmental parameters to evolutionary outcomes could easily be formed (Supplementary Fig. 6).

Evolutionary outcomes are interspersed in parameter space

We can partly understand this unpredictability by considering how the different evolutionary outcomes are distributed in parameter space, where distinct clusters do not appear to exist. For each outcome, we first performed hierarchical clustering using the pairwise Euclidean distance in a standardized parameter space (Methods section) and then aligned these clusters with samples from other outcomes to assess their overlap (Fig. 4b). In general, while parameter sets that produced the same outcomes did form clusters (bright diagonal elements), these clusters were not distinct: a sample or cluster from one outcome is typically proximal to samples or clusters from other outcomes (bright off-diagonal elements). Environmental perturbations are therefore often likely to cause a qualitative shift in the long-term community (Supplementary Fig. 7).

In particular, monomorphic specialists permeate parameter space and are typically close to all other outcomes. By calculating the distribution of shortest distances from one type of outcome to either the same outcome or one of the other seven possibles outcomes, we can estimate the magnitude of the environmental change required to both maintain the type of outcome and to alter one outcome to another (Methods section). Assuming that the probability of environmental perturbation is inversely proportional to its magnitude, a change in the environment is approximately as likely to preserve a diverse community as it is to lead to its collapse to a monomorphic specialist (Fig. 4c, right). The converse, however, is not true: a population of monomorphic specialists is far less likely to transition to a diverse community (Fig. 4c, left). This asymmetry arises from differences in the cluster shapes in parameter space. As a two-dimensional analogy, consider a circular cluster of outcome X surrounded by a ring of outcome Y of equal area. On average, Y points in the ring will be closer to the X–Y boundary than X points in the circle because of the larger interior of the circle. Therefore, small perturbations of X points are less likely than equivalent perturbations of Y points to cause a crossing of the X-Y boundary.

We conclude that prediction of evolutionary outcomes from environmental conditions, even in simple chemostats, is likely to be challenging. Evolutionary outcomes are only the endpoints of the adaptive process, however, and we next investigate if the dynamics of these processes (the mutational paths) are characteristic of their endpoints.

Adaptation dynamics on the network of mutational paths

The mutational paths of the Markov process describe the dynamics of adaptation, and the processes we construct have tens to millions of mutational paths despite having only eleven discrete phenotypes and a maximum of two co-existing populations in each community.

Measuring evolutionary repeatability

We analyze the dynamics of adaptation by enumerating a process’s mutational paths and calculating the probability with which each path may be observed to construct the distribution of paths (Methods section). In addition, we calculate several properties of the paths during enumeration (Fig. 5a). For example, we define and calculate the length of a path as the number of state-to-state transitions in one realization of the adaptation process (a single run of an evolution experiment) from an ancestral to a recurrent state.

Fig. 5 Different evolutionary outcomes have characteristic adaptation dynamics. Processes with different evolutionary outcomes have characteristic dynamics of adaptation, which can be quantified and compared through the properties of their mutational paths. a We enumerate mutational paths to generate the distribution of paths. Paths begin at a monomorphic state (with equal probability) and end at a recurrent state. A path’s probability is inversely proportional to its length. Most mutational paths are of intermediate length, and only a few are either short or long. b The repeatability of the dynamics of adaptation depends on the long-term evolutionary outcome and does not typically decrease with larger maximum sizes of mutation. The entropy of the distribution of paths quantifies repeatability, with high entropy implying low repeatability. Points show the mean path entropy; shaded regions are ±s.d. c Statistics from the mutational paths vary with both the maximum size of mutation and the evolutionary outcome. We plot the distributions for six properties of the paths, grouped by the number of recurrent states and the two extremes of the maximum size of mutation. The distributions are multi-modal, and the peaks correspond to outcomes lower in the classification of Fig. 3a. d Processes with different evolutionary outcomes have characteristic properties of their mutational paths. Through linear discriminant analysis, we identified combinations of the six statistics in c that separate the adaptation processes by their evolutionary outcome. A two-dimensional projection of the result is shown with lines and ellipses drawn manually Full size image

The degree to which adaptive evolution is repeatable is of long-standing interest39, 40, but analyses are typically restricted to models with static fitness landscapes41. To quantify the repeatability of adaptation in our model, where fitness landscapes change dynamically through construction and destruction of ecological niches, we mathematically define repeatability as the entropy of the distribution of mutational paths. If replicate experiments are likely to follow only a few mutational paths, the entropy is small and repeatability is high; if each replicate experiment follows a different mutational path, the entropy is large and repeatability is low.

In our model, the repeatability of the dynamics of adaptation is associated with the long-term evolutionary outcome of the adaptation process and varies with the maximum size of mutation (Fig. 5b). With one exception, path entropy does not decrease with the maximum size of mutation as more mutational paths become possible. We find that processes where monomorphic specialists are evolutionarily stable have the most repeatable dynamics, and that this repeatability plateaus early as the maximum size of mutation increases suggesting that the adapting system can follow only a few mutational paths even as larger mutations become available. In contrast, processes with metabolically diverse communities have the least repeatable dynamics. Notably, dimorphic specialists (and, to a lesser extent, cycling outcomes) have a path entropy that increases to a maximum at intermediate maximum sizes of mutation and decreases thereafter because a few mutational paths with large mutations emerge to dominate the process (Fig. 5b, blue line). The rate at which path entropies plateau varies between the different evolutionary outcomes. A non-increasing path entropy suggests that mutational paths with larger mutations are not realized. Frequency-dependent fitness effects may limit large mutations, which lead to phenotypes different from the current resident, because the corresponding mutants are ill-suited to the current environment in the chemostat. Smaller mutations, in contrast, are more likely to gradually alter the environment.

Mutational path properties identify evolutionary outcomes

To compare the dynamics of adaptation between processes, we constructed a condensed dynamical profile for each adaptation process through calculating six properties of its mutational paths. These properties were: the number of paths, the mean and variance of the length of the geodesic (shortest) paths, the mean and variance of the length of all paths, and the mean minimum cut size (the smallest number of edges that must be removed from the graph to disconnect an initial state from a recurrent state and a measure of the extent of bottle-necks in the process).

We analyzed the six properties of the mutational paths for processes with different evolutionary outcomes and at different maximum sizes of mutation (Fig. 5c). The effect of increasing the maximum size of mutation depends on the evolutionary outcome of the process, but the qualitative trends are consistent. Specifically, larger mutations decrease both the mean and the variance of the length of the geodesic path because recurrent states can be reached via fewer mutation and invasion events of larger effect. Similarly, larger mutations have a decreasing, albeit smaller, effect on the mean and variance of the overall lengths of the paths. Finally, we found that a larger maximum size of mutation generally leads to processes with fewer bottle-necks (higher minimum cut size), particularly for processes with a single recurrent state. Increasing the maximum size of mutation by increasing the availability of mutations can increase the number of outgoing edges from vertices in the network. Mutational paths that pass through such these vertices can often still reach the recurrent states even as edges are removed from the network, and so the degree of bottle-necking is reduced.

The distributions of properties of the mutational paths are multi-modal (Fig. 5c), and the peaks correspond to evolutionary outcomes lower in the hierarchy of evolutionary outcomes (Fig. 3a), suggesting that processes with different outcomes have characteristic dynamics of adaptation. We used discriminant analysis to find linear combinations of the six properties of the paths that maximize the separation of the viable evolutionary outcomes according to their dynamics. We found that different outcomes, particularly those higher in the hierarchy of outcomes (Fig. 3a), occupy different regions in a two-dimensional projection of the discriminant space (Fig. 5d). We can therefore conclude that mutational paths characteristically identify long-term outcomes, and we contrast this positive finding with the difficulty of obtaining an association between environmental parameters and evolutionary outcomes (Fig. 4).

The properties of mutational paths that we calculate quantify statistical aspects of the adaptation process, but enumeration of paths becomes computationally prohibitive for larger networks and relies on the correct classification of states, which itself requires knowledge of the complete network of paths. We next present an alternative graph-theoretic approach that relaxes these constraints and forms the basis of a model for predicting evolutionary outcomes from observations of the dynamics of adaptation.

Predicting outcomes from networks of mutational paths

To motivate our predictive model, we consider an evolution experiment as the progressive construction of a network of mutational paths. In such a setting, the resident microbial populations in a chemostat are periodically assayed42, and the transitions between resident communities are used to progressively construct the network. To simulate the results of this procedure, we re-sampled the 100,000 adaptation processes in our data set to obtain networks at varying stages of completion. Incomplete networks may contain spurious recurrent states, which obfuscate the process’s evolutionary outcome (Fig. 6a, top). These states appear recurrent because the mutation and invasion events that lead away from these states have not yet been observed in the experiment. The objective, then, is to forecast the true recurrent states of the complete network from an incomplete sample. While a single adaptive trajectory is unlikely to contain enough information for reliable prediction, we show how incomplete trajectories from replicate evolution experiments can be combined and used in a predictive manner.

Fig. 6 Predicting long-term outcomes from the topology of the network of mutational paths. We calculated six centrality measures, which characterize different aspects of the networks’ topology, and used these as learning features in a predictive model trained on complete and incomplete networks. a The state of a network’s completion affects its topology, which is reflected in the statistics of its centralities. As an example, we show a network in various states of completion and the corresponding progression of four statistics (dark blue shows the mean and light blue shows the s.d. calculated for 100 different networks). b We characterized the statistics of the centralities by how they converge to their value when the network is complete (the feature error) and by how well they can discriminate between evolutionary outcomes (mutual information). The plots show the mean of the feature error and the mean of the mutual information between the statistic and the evolutionary outcomes. c Evolutionary outcomes predicted with a classifier trained on the statistics of the centralities from partial and complete networks. We assessed performance via the unweighted mean of the F1 score taken over the seven outcomes. The mean test-set performance is reported for 10-fold cross-validation. Top and right line plots show the mean performance, with the s.d.s as shaded regions, taken over either the state of completion of the network or the maximum size of mutation. The red line is the performance of a naive classifier, which predicts following the empirical frequencies of the outcomes Full size image

Informative and invariant topological properties

Our predictive model relies on quantifying how networks change with accumulating observations. As we have shown earlier, evolutionary outcomes have characteristic mutational paths. To use the information encoded in the entire network of mutational paths at once, we measure graph-theoretic properties that quantify abstract features of the network. Specifically, for each network, we calculated six measures of centrality43 that characterize different aspects of the topology of the network: the in-degree, out-degree, closeness, betweenness, HITS-hub, and HITS-authority centralities (Methods section, Supplementary Fig. 8, and Supplementary Table 1). Though less interpretable from an ecological or evolutionary standpoint compared to the properties of the mutational paths (such as the length of the path), centralities nevertheless contain valuable predictive information. For each network we calculate the mean and variance of the six centralities (taken over the network’s vertices). As mutation and invasion events are observed, new vertices and edges are added to the network, and the resulting changes in the network’s topology are reflected in changes in the statistics of the centralities (Fig. 6a, bottom).

To quantify how informative is a statistic for a centrality, and therefore how useful the statistic will be in a predictive model, we calculated the mutual information between each of the two statistics for the six centralities and the evolutionary outcomes (Methods section). A high mutual information suggests that processes with different evolutionary outcomes have sufficiently different distributions of centralities to allow us to unambiguously associate a particular statistic of a centrality with an evolutionary outcome (Fig. 6b, feature information). By calculating the difference in the statistics of the centralities between the network at varying stages of completion and when the network is fully complete, we found that some statistics will converge quickly: in the sense that, after a small fraction of the network is discovered, they will remain invariant as more vertices and edges are added (Methods section). In other words, these centralities are abstract topological features of the network whose statistics can be reliably identified with only a few observations (Fig. 6b, feature error). Interestingly, we found that the most informative statistics of the centralities (those with the highest mutual information) were also the slowest in terms of converging to their value when the network is complete (Supplementary Fig. 9).

Topology predicts outcomes from incomplete observations

Differences between the topologies of the networks, even for incomplete networks, were sufficient to enable reliable prediction of evolutionary outcomes using a statistical-learning approach. The twelve statistics (the mean and variance of the six centralities) provide a projection of the network of mutational paths, through its topology, to a low-dimensional space. We trained a classifier to predict evolutionary outcomes from these features of the centralities using data from incomplete and complete networks, the degree of the network’s completion, and the maximum size of mutation (Methods section). To avoid biasing against rare outcomes, we used the unweighted mean F1 score (over the outcomes) as the optimization metric. Neither the degree of the network’s completion or the maximum size of mutation will typically be known during an experiment, and we marginalized over both these variables during testing (Supplementary Fig. 10). The performance of the classifier improves with increasing completion of the network and is best at intermediate sizes of maximum mutation (Fig. 6c). At 50% completion of the network, the classifier predicts the correct long-term outcome approximately 85% of the time (average F1 score >0.7), and at 60% completion predicts correctly 98% of the time (F1 = 0.8).

We have shown here that our predictive approach can be useful in forecasting changes in the community through incomplete observations of the dynamics of adaptation, including the loss of (metabolic) diversity. Detecting the onset of transitions in a community is an enduring problem, but most approaches focus on catastrophic transitions38, 44. Nevertheless, the loss of biodiversity following environmental disturbance can, as we have shown, involve a series of ecological transitions mediated by multiple mutation and invasion events, and our network-based approach can address this challenge.

With increasing complexity of ecological models, the computational burden of simulating an exponentially increasing number of combinations of parameters imposes a limit on our understanding of complex and realistic models. When the goal is to forecast long-term outcomes, however, our predictions suggest that it is possible to safely forego simulating the entire model once a sufficient training sample has been obtained. Instead, only a fraction of the mutational path network is necessary to reliably predict the long-term community outcome.