We lack a comprehensive understanding of evolutionary pattern and process because short-term and long-term data have rarely been combined into a single analytical framework. Here we test alternative models of phenotypic evolution using a dataset of unprecedented size and temporal span (over 8,000 data points). The data are body-size measurements taken from historical studies, the fossil record, and among-species comparative data representing mammals, squamates, and birds. By analyzing this large dataset, we identify stochastic models that can explain evolutionary patterns on both short and long timescales and reveal a remarkably consistent pattern in the timing of divergence across taxonomic groups. Even though rapid, short-term evolution often occurs in intervals shorter than 1 Myr, the changes are constrained and do not accumulate over time. Over longer intervals (1–360 Myr), this pattern of bounded evolution yields to a pattern of increasing divergence with time. The best-fitting model to explain this pattern is a model that combines rare but substantial bursts of phenotypic change with bounded fluctuations on shorter timescales. We suggest that these rare bursts reflect permanent changes in adaptive zones, whereas the short-term fluctuations represent local variations in niche optima due to restricted environmental variation within a stable adaptive zone.

Evolutionary biologists working at different timescales often adopt dramatically different perspectives on the pace and process of phenotypic evolution. For example, the norm for microevolutionary studies is to observe high levels of heritable genetic variation (1, 2), strong selective pressures (3, 4), and the frequent occurrence of substantial phenotypic change on a timescale of a few to a few dozen generations (5–8). However, paleontologists working on much longer timescales have recognized an overwhelming prevalence of evolutionary stasis (9–11), although other patterns are known (12–14). On even longer timescales, comparative studies of extant species routinely record substantial divergence (15, 16). These different perspectives have generated controversy about the ability of microevolutionary process to explain macroevolutionary patterns (4, 9–10, 17–21). To resolve this debate we need models that can simultaneously account for evolutionary change across a range of timescales. Here, we make a step toward that goal by assembling and analyzing a dataset of body-size evolution across an unprecedented temporal span.

Gingerich (7) compiled a large dataset representing a collection of evolutionary rates measured over 100–107 generations in the fossil record. Analyzing this dataset, Estes and Arnold (21) found a striking pattern of bounded fluctuations in phenotype, which implies that the expected magnitude of phenotypic change is about the same regardless of whether two samples are separated by 10 generations or 1 million generations (see also ref. 22). In other words, short-term, fluctuating evolution occurs, but the changes fail to accumulate with time. This pattern predicts that closely related species should be as different as less related species, a conclusion seriously at odds with comparative studies, which often detect strong phylogenetic signal for traits related to organism size (23–26). This incongruence suggests that we need a more extensive compilation of data to get a full picture of how evolutionary patterns scale up over time.

Many studies have examined large-scale patterns of body-size evolution using either fossil (7, 21, 27, 28) or comparative data (15, 16, 29). Such studies have yielded many important insights, but no studies have yet combined paleontological and comparative data in the same modeling framework, and the puzzling inconsistencies that seem to exist between these types of studies remain. Resolving these inconsistencies is made more challenging by the different timescales, traits, and taxa analyzed in different studies. To achieve a much needed synthesis, we combine microevolutionary and fossil time series data with phylogeny-based comparative data and analyze these data with a set of evolutionary models that extends the analysis of Estes and Arnold (21). We base our analysis on a dataset that spans the broadest range of timescales yet examined (0.2 y to 357 Myr) by combining historical and contemporary field studies, fossil time series, and comparative data.

Results and Discussion

Consistent with previous studies (21, 22) we find a complete absence of a time-span effect up to ∼1 Myr, but this pattern then yields to a pattern of increasing divergence on timescales above 1–10 Myr (Fig. 1). Whereas the dominant pattern of bounded, but fluctuating changes on shorter timescales is consistent with a common paleobiological concept of stasis (9, 14), the long-term evolutionary pattern is consistent with observations of phylogenetic correlation and cumulative evolutionary change. The union of these seemingly contradictory patterns generates a remarkably continuous visual pattern reminiscent of the flared barrel of a blunderbuss firearm. We show that this “blunderbuss pattern” is consistent regardless of the method of trait standardization and whether timescale is expressed in generations or years (SI Text and Figs. S1 and S2). Because biases introduced by including different taxa and traits at different timescales may cloud our interpretation of the pattern, we examined subsets of data on divergence in the body size of mammals, squamates, and birds, as well as molar dimensions in primates (Fig. 2). The general pattern and timing are strikingly consistent across traits and taxa. Although the continuity of pattern is clear, it is less obvious what evolutionary processes can account for divergence across all timescales.

Fig. 1. The “blunderbuss pattern”, showing the relationship between evolutionary divergence and elapsed time. Divergence is measured as the difference between the means of log-transformed size in two populations ( and ) standardized by the dimensionality, k. Intervals represent the total elapsed evolutionary time between samples. Microevolutionary data include longitudinal (allochronic) and cross-sectional (synchronic) field studies from extant populations. Paleontological divergence is measured from time series, including both stratigraphically adjacent (autonomous) populations and averaged longer-term trends (nonautonomous). We supplement these data with node-averaged divergence between species with intervals obtained from time-calibrated phylogenies. Pairwise comparisons between species (small points) are also presented to give a visual sense of the range of divergence values across taxonomic groups. Dotted lines indicate the expected 95% confidence interval for the multiple-burst model fitted to the microevolutionary, fossil, and node-averaged phylogenetic data.

Fig. 2. Divergence patterns are similar across major groups of vertebrates. Taxonomic levels are denoted by color. For comparative data, smaller points indicate pairwise divergence measures and larger points are node-averaged divergence. Data for primates are first molar lengths and widths only.

To describe these patterns more precisely, we fitted four stochastic models to the data. The first model describes bounded evolution (BE) and was designed to fit the pattern of short-term fluctuations observed in the data. This model assumes that the amount of trait divergence over any time interval is an independent draw from a normal distribution with mean zero and variance, . Because this model does not allow for increasing divergence on longer timescales, we combined it with three other time-dependent stochastic models that capture this aspect of divergent evolution. The first of these unbounded model components is the well-known Brownian-motion (BM) model, which is often used to describe phylogenetic correlations. In this model the variance in trait means among replicate lineages increases linearly with time. In the single-burst (SB) model, we assume that the optimum can undergo a single large normally distributed change after a random (exponentially distributed) time interval. A special case of this model, in which the change was immediate, had the best fit to the data in Estes and Arnold (21). Finally, in the multiple-burst (MB) model, we allow multiple normally distributed changes in the optimum to occur over time according to a Poisson process.

The multiple-burst model was the best-fitting model, although all unbounded models capture the expansion of variance that occurs after 1 Myr of stationary fluctuations (Figs. 1 and 3 and Table 1). The multiple-burst model was preferred over the Brownian-motion model primarily because it better explained the overdispersion of divergence values found in the data at intermediate time intervals (106–108 y) in both the fossil data and the phylogenetic data. The three models including unbounded components (BM, SB, and MB) all estimate which means that 1 SD in the central band of data corresponds to an ∼10% difference in linear size traits. Some of this change reflects measurement error, but much of it probably reflects evolutionary change (SI Text). The time-dependent component of the models leads to noticeable departures from the central band only after ∼1 Myr of evolution (Table 1 and Fig. 3). For example, the expected waiting time to a displacement of the optimum in the multiple-burst model is >10 Myr (107.3976). This result implies that the distribution of divergence changes very little over the first million years, with only a modest 10% increase in the width of the 95% prediction interval for the central band occurring over this interval. By contrast, the prediction interval doubles after 5 Myr. For each displacement of the optimum, the estimated burst-size distribution predicts a size ratio between ancestor and descendant population means of 1.28.

Fig. 3. Best-fitting time-dependent stochastic models determined by fitting by maximum likelihood. Each model was fitted to the complete dataset, as well as to a subset including only microevolutionary and fossil data and a subset including only node-averaged phylogenetic data. Lines show the 95% confidence intervals for the model fits using the estimated parameters presented in Table 1. (A) Multiple-burst (MB) model. (B) Single-burst (SB) model. (C) Brownian-motion (BM) model. Also included is the expected 95% confidence interval for a Brownian-motion model of body mass evolution fitted to the mammalian phylogenetic data, accounting for simulated measurement error (SI Text). Although the confidence intervals for both the MB and the BM model appear very similar, the MB is a better fit to the overdispersed distribution of divergence values that occurs at intervals >1 Myr (SI Text).

Table 1. Parameter estimates and AIC scores for four model fits to the complete dataset and subsets

The patterns we observe are not simple consequences of sampling bias introduced by using different data sources. Whereas fossil data could be biased by the hesitancy of paleontologists to assign ancestor–descendant relationships to highly divergent samples, we observe the opposite pattern in the dataset. When we fitted models to microevolutionary and fossil data alone and compare these with models fitted to phylogenetic data, we find that the fossil series begin to diverge more rapidly than the phylogenetic data, with otherwise remarkably similar parameter estimates (Table 1, Table S1, and Fig. 3). This pattern is not expected if fossil series are biased against evolutionary change, but may be explained, for example, by increasing probability of extinction in more rapidly evolving lineages. Alternatively, this pattern could be a consequence of the known tendency of molecular phylogenies to estimate older divergence times than the fossil record (30,, 31).

The transition from bounded evolution to steadily increasing divergence is illuminated by using linear regressions to model the relationship between absolute divergence and time. We compared a segmented regression with a single breakpoint to a model with separate regressions for each of the three subsets of data (Fig. 4 and SI Text). A segmented regression with a breakpoint at 66,000 y has a lower Akaike's information criterion (AIC) than a model in which independent regressions are fitted to each of the three major subsets of the data. This success indicates that the change in slope is not an artifact arising from differences in data sources, but instead indicates a pattern common to both phylogenetic and paleontological data (Fig. 4, Fig. S3, and SI Text).

Fig. 4. Linear regressions of interval on the logarithm of absolute divergence, ln(|divergence| + 0.001). A positive linear relationship is predicted from Brownian-motion models of evolution. (A) AIC scores for the fit of the microevolutionary, fossil, and node-averaged phylogenetic data to linear regressions. The fit of a single linear regression to the entire dataset is the most poorly fitting model (green dashed line, AIC = 79,520). We then performed a segmented regression to find the optimal breakpoint by fitting the model iteratively while increasing the breakpoint from 0 to 8.5 log 10 y with a step size of 0.01 (solid black line). The AIC for this model reaches a minimum with a breakpoint of 104.82 years (AIC = 79,059). Subsetting by datasets and fitting a regression independently to each did not improve the fit relative to the segmented regression (red dashed line, AIC = 79,062). (B) Best-fit segmented regression model provides a better fit to the data (solid line) than a model where independent linear regressions are fitted to each dataset (dashed lines). Datasets and the corresponding regression lines are indicated by color (microevolutionary, yellow; fossil, green; and phylogenetic, blue).

One potential explanation for a central, bounded band of divergence that lasts 1 Myr is that the tendency to study rapid evolution on microevolutionary timescales might mask a pattern of ever-increasing divergence in the data. In particular, evolutionary responses to disturbances such as introductions, anthropogenic disturbance, or island isolation may be especially rapid and bias microevolutionary data toward higher divergence values. To test for such bias, we examined the evolution of body size according to whether a particular study represented disturbance-mediated evolution or in situ divergence (defined in ref. 8). Disturbance-mediated evolution is clearly more prevalent in field and historical studies (Fig. 5) and also appears elevated in two fossil time series that are arguably influenced by novel selective factors: the transition of Bison antiquus to Bison bison, which is hypothesized to have been driven by community change and anthropogenic selection (31), and the evolution of a dwarf form of Cervus elaphus on the Isle of Jersey (32). Clearly, we can often identify the many causal processes that generate divergence in body size, and these are likely to be overrepresented in our microevolutionary dataset. Although these types of causal phenomena may be overrepresented in intervals spanning 1–100 y, these same phenomena are likely to occur naturally over timescales of 1,000–100,000 y, and yet we see little accumulation of divergence across these timescales.

Fig. 5. Divergence identifiable as natural in situ variation vs. disturbance-mediated community change demonstrates that the majority of cases of rapid evolution over microevolutionary timescales are from identifiable causes such as introductions, anthropogenic disturbances, and island isolation. Highlighted examples include (clockwise, starting from left) in situ evolution of Geospiza fortis in response to natural climate variation, divergence in Nucella lapillus in response to an introduced predator, introduction of Gambusia affinis to Nevada and Hawaii, island–mainland divergence in Crotalus mitchelli, Holocene dwarfing of Bison antiquus to B. bison, dwarfing of Cervus elaphus on Jersey Island, and the introduction of Passer domesticus to North America and New Zealand. Most fossil time series cannot be assigned to either group, but are expected to record primarily in situ evolution of wide-ranging species. Dotted lines indicate the 95% confidence interval for simulated measurement error given equal means, average within-population variance, and a distribution of sample sizes matching those found in the data (SI Text).

We have taken a phenomenological approach to modeling the stable, central band of data apparent in Fig. 1 and have not attempted to account for that band with alternative models. Nevertheless, because that band is similar to a stable band analyzed by Estes and Arnold (21), we can conclude that none of the several alternative models considered by Estes and Arnold (21) can fully account for the band in Fig. 1. Those models include Brownian motion of the trait mean, Brownian and white-noise motion of an intermediate optimum (including a stationary optimum as a special case), steady directional movement of an intermediate optimum, and shifts of the trait mean between alternative adaptive peaks. All of these models fail either because they require unrealistic values for microevolutionary processes or because bounded evolution is difficult to achieve under any set of parameter values. We conclude that some unspecified process of bounded evolution is responsible for the band we observe in our data (Fig. 1). This process may include short-term movements of adaptive peaks corresponding to more or less regular, recurring fluctuations in the local environment (cf. Fig. 5 and ref. 33) as well as bounded displacements of adaptive peaks within adaptive zones. The process resembles Simpson's depiction of the bounded evolution of lineages as they radiate within an adaptive zone (34, 35). Under this interpretation, the slower time-dependent component of divergence commonly estimated from phylogenetic comparative data (14, 15) could then be due to the accumulation of rare, dramatic changes in the niche (or “primary”) optimum (18, 34, 36, 37). However, this interpretation does not address the issue of what controls the macroevolutionary dynamics of niches and results in departures from bounded evolution over longer timescales (19, 21, 37, 38).

It is tempting to turn to climate for an explanation of evolutionary bursts and suppose that the 1-Myr boundary on bounded evolution reflects limits on long-term rates of climatic and environmental change. However, substantial changes in global climate over timescales of <<1 Myr appear to result in bounded divergence characteristic of microevolutionary timescales, rather than dramatically hastening phenotypic evolution (39). Although habitat tracking by migration may mitigate the effects of such climate shifts, the data nonetheless do not strongly support a primary role for climate in driving phenotypic change.

What then, allows divergence to accumulate above 1 Myr, but not below? A particularly elegant explanation is provided by Futuyma's ephemeral-divergence model. Futuyma argues that although continuous, rapid evolution often occurs in local populations, the mosaic of niches and diverse adaptive optima of wide-ranging species prevents local evolutionary changes from spreading across the entire range (9, 40–44). Consequently, the variance of the stationary fluctuations, , that we estimate in our models could be interpreted as measuring the among-population geographic variation that results from ephemeral divergence in phenotypes responding to local selective pressures. Selective pressures that cause displacement of the optimum at the species level could reflect the same kind of disturbances that we observe occurring at the population level, but spread across a species’ entire range. Such significant, range-wide changes in selective optima may be sufficiently rare to explain the observed pattern of bounded evolution on timescales <1 Myr. Note that range-wide changes in adaptive optima can be accomplished by two means: either by the global spread of a selective factor across a species’ range or by the contraction of the species’ range itself (9, 41). Consequently, any process that reduces the range of a species, including speciation and taxon cycles, can potentially result in a displacement in phenotype by subsampling from the set of previously occupied adaptive niches (e.g., sampling from the distribution estimated by ). Species life spans are typically identified as spanning 1–5 Myr (28, 45–47) and taxon cycles have been described as spanning from 10,000 y to 10 Myr (48, 49), providing a suggestive correspondence to the rate of accumulation in phenotypic evolution that we observe in the data.

Our results are qualitatively consistent with recent analyses of body-size evolution that have fitted phenomenological models to data on a more limited timescale and with more taxonomic constraint than the data that we examine here. Gillman (27) identified the striking regularity with which body-size range increases with time on macroevolutionary timescales, but did not observe this phenomenon's relationship to bounded microevolutionary divergence due to limited sampling on short timescales. Other studies used various large comparative datasets to compare the fit of Brownian motion, an Ornstein–Uhlenbeck process, and an early-burst model in which a Brownian motion rate parameter declined exponentially with time (15, 16). The early-burst model fitted best on large groups such as mammals or birds taken as a whole (16). This success indicates that the rate of evolution may be faster early in a radiation, but within subclades the early-burst model was usually outcompeted by the other models. Although Ornstein–Uhlenbeck models often are the best-fitting models in these studies in some clades and subclades, the restraining force is generally very weak (15, 16). Consequently, the qualitative picture of linearly increasing variance we observe over most macroevolutionary timescales is consistent with these results (15, 16, 27). Our analysis and treatment of the data do not allow us to make inferences about the possible role of speciation and extinction events. Those processes may or may not be important components in generating the patterns we observe. In other words, our models and conclusions allow for the possible existence of a correlation between evolutionary change and speciation and extinction events (29).

We believe the path forward will involve explicit microevolutionary interpretations of the phenomenological stochastic models used in this and similar studies (15, 16, 29). In addition, combining data across micro- and macroevolutionary timescales should facilitate interpretation of model results in terms of adaptive landscapes and ecological processes. Finally, understanding how landscape dynamics scale across species ranges is a neglected step that needs to be taken. We may need more sophisticated models that bridge between population- and range-wide evolution to understand the striking patterns of divergence that we have documented.