Significance The success of teleost fishes, which represent roughly half of all vertebrate species, has attracted attention since Darwin. Numerous scenarios invoke elevated diversification in teleosts facilitated by supposed key innovations, yet claims of teleost exceptionalism are profoundly biased by the evolutionary “snapshot” of living fishes. Analysis of 160 million y (Permian–Early Cretaceous) of evolution in neopterygian fishes reveals that anatomical diversification in Mesozoic teleosts as a whole differed little from their “living fossil” holostean sister group. There is some evidence for evolutionary heterogeneity within teleosts, with early evolving lineages showing the greatest capacity for evolutionary innovation in body shape among Mesozoic neopterygians, whereas members of the modern teleost radiation show higher rates of shape evolution.

Abstract Since Darwin, biologists have been struck by the extraordinary diversity of teleost fishes, particularly in contrast to their closest “living fossil” holostean relatives. Hypothesized drivers of teleost success include innovations in jaw mechanics, reproductive biology and, particularly at present, genomic architecture, yet all scenarios presuppose enhanced phenotypic diversification in teleosts. We test this key assumption by quantifying evolutionary rate and capacity for innovation in size and shape for the first 160 million y (Permian–Early Cretaceous) of evolution in neopterygian fishes (the more extensive clade containing teleosts and holosteans). We find that early teleosts do not show enhanced phenotypic evolution relative to holosteans. Instead, holostean rates and innovation often match or can even exceed those of stem-, crown-, and total-group teleosts, belying the living fossil reputation of their extant representatives. In addition, we find some evidence for heterogeneity within the teleost lineage. Although stem teleosts excel at discovering new body shapes, early crown-group taxa commonly display higher rates of shape evolution. However, the latter reflects low rates of shape evolution in stem teleosts relative to all other neopterygian taxa, rather than an exceptional feature of early crown teleosts. These results complement those emerging from studies of both extant teleosts as a whole and their sublineages, which generally fail to detect an association between genome duplication and significant shifts in rates of lineage diversification.

Numbering ∼29,000 species, teleost fishes account for half of modern vertebrate richness. In contrast, their holostean sister group, consisting of gars and the bowfin, represents a mere eight species restricted to the freshwaters of eastern North America (1). This stark contrast between teleosts and Darwin's original “living fossils” (2) provides the basis for assertions of teleost evolutionary superiority that are central to textbook scenarios (3, 4). Classic explanations for teleost success include key innovations in feeding (3, 5) (e.g., protrusible jaws and pharyngeal jaws) and reproduction (6, 7). More recent work implicates the duplicate genomes of teleosts (8⇓–10) as the driver of their prolific phenotypic diversification (8, 11⇓–13), concordant with the more general hypothesis that increased morphological complexity and innovation is an expected consequence of genome duplication (14, 15).

Most arguments for enhanced phenotypic evolution in teleosts have been asserted rather than demonstrated (8, 11, 12, 15, 16; but see ref. 17), and draw heavily on the snapshot of taxonomic and phenotypic imbalance apparent between living holosteans and teleosts. The fossil record challenges this neontological narrative by revealing the remarkable taxonomic richness and morphological diversity of extinct holosteans (Fig. 1) (18, 19) and highlights geological intervals when holostean taxonomic richness exceeded that of teleosts (20). This paleontological view has an extensive pedigree. Darwin (2) invoked a long interval of cryptic teleost evolution preceding the late Mesozoic diversification of the modern radiation, a view subsequently supported by the implicit (18) or explicit (19) association of Triassic–Jurassic species previously recognized as “holostean ganoids” with the base of teleost phylogeny. This perspective became enshrined in mid-20th century treatments of actinopterygian evolution, which recognized an early-mid Mesozoic phase dominated by holosteans sensu lato and a later interval, extending to the modern day, dominated by teleosts (4, 20, 21). Contemporary paleontological accounts echo the classic interpretation of modest teleost origins (22⇓–24), despite a systematic framework that substantially revises the classifications upon which older scenarios were based (22⇓⇓–25). Identification of explosive lineage diversification in nested teleost subclades like otophysans and percomorphs, rather than across the group as a whole, provides some circumstantial neontological support for this narrative (26).

Fig. 1. Phenotypic variation in early crown neopterygians. (A) Total-group holosteans. (B) Stem-group teleosts. (C) Crown-group teleosts. Taxa illustrated to scale.

In contrast to quantified taxonomic patterns (20, 23, 24, 27), phenotypic evolution in early neopterygians has only been discussed in qualitative terms. The implicit paleontological model of morphological conservatism among early teleosts contrasts with the observation that clades aligned with the teleost stem lineage include some of the most divergent early neopterygians in terms of both size and shape (Fig. 1) (see, for example, refs. 28 and 29). These discrepancies point to considerable ambiguity in initial patterns of phenotypic diversification that lead to a striking contrast in the vertebrate tree of life, and underpins one of the most successful radiations of backboned animals.

Here we tackle this uncertainty by quantifying rates of phenotypic evolution and capacity for evolutionary innovation for the first 160 million y of the crown neopterygian radiation. This late Permian (Wuchiapingian, ca. 260 Ma) to Cretaceous (Albian, ca. 100 Ma) sampling interval permits incorporation of diverse fossil holosteans and stem teleosts alongside early diverging crown teleost taxa (Figs. 1 and 2A and Figs. S1 and S2), resulting in a dataset of 483 nominal species-level lineages roughly divided between the holostean and teleost total groups (Fig. 2B and Fig. S2). Although genera are widely used as the currency in paleobiological studies of fossil fishes (30; but see ref. 31), we sampled at the species level to circumvent problems associated with representing geological age and morphology for multiple congeneric lineages. We gathered size [both log-transformed standard length (SL) and centroid size (CS); results from both are highly comparable (Figs. S3 and S4); SL results are reported in the main text] and shape data (the first three morphospace axes arising from a geometric morphometric analysis) (Fig. 2A and Figs. S1) from species where possible. To place these data within a phylogenetic context, we assembled a supertree based on published hypotheses of relationships. We assigned branch durations to a collection of trees under two scenarios for the timescale of neopterygian diversification based on molecular clock and paleontological estimates. Together, these scenarios bracket a range of plausible evolutionary timelines for this radiation (Fig. 2B). We used the samples of trees in conjunction with our morphological datasets to test for contrasts in rates of, and capacity for, phenotypic change between different partitions of the neopterygian Tree of Life (crown-, total-, and stem-group teleosts, total-group holosteans, and neopterygians minus crown-group teleosts), and the sensitivity of these conclusions to uncertainty in both relationships and evolutionary timescale. Critically, these include comparisons of phenotypic evolution in early crown-group teleosts—those species that are known with certainty to possess duplicate genomes—with rates in taxa characterized largely (neopterygians minus crown teleosts) or exclusively (holosteans) by unduplicated genomes. By restricting our scope to early diverging crown teleost lineages, we avoid potentially confounding signals from highly nested radiations that substantially postdate both genome duplication and the origin of crown teleosts (26, 32). This approach provides a test of widely held assumptions about the nature of morphological evolution in teleosts and their holostean sister lineage.

Fig. 2. (A) Morphospace of Permian–Early Cretaceous crown Neopterygii. (B) One supertree subjected to our paleontological (Upper) and molecular (Lower) timescaling procedures to illustrate contrasts in the range of evolutionary timescales considered. Colors of points (A) and branches (B) indicate membership in major partitions of neopterygian phylogeny. Topologies are given in Datasets S4 and S5. See Dataset S6 for source trees.

Fig. S1. Morphospace of 398 Permian–Early Cretaceous Neopterygii. Three major axes of shape variation are presented. Silhouettes and accompanying arrows illustrate the main anatomical correlates of these principal axes, as described in Table S3. Actual fossil specimens represented in the dataset are also displayed to visualize the extremities of morphospace axes. Fossil taxa as follows: (a) Bavarichthys incognitus; (b) Ellimmichthys longicostatus; (c) Turbomesodon relegans; (d) Allothrissops salmoneus; (e) Pholidophorus germanicus; (f) Macrosemius fourneti.

Fig. S2. Morphospace of 398 Permian–Early Cretaceous Neopterygii, illustrating the major clades of (A) teleosts and (B) holosteans.

Fig. S3. Comparisons of size rates between (A) holosteans and teleosts, (B) crown teleosts and all other neopterygians, (C) crown teleosts and stem teleosts, (D) crown teleosts and holosteans, and (E) stem teleosts and holosteans. Comparisons were made using the full-size SL dataset, a CS dataset, and a smaller SL dataset pruned to exactly match the taxon sampling of the CS dataset. Identical taxon sampling leads the CS and pruned SL datasets to yield near identical results. Although the larger SL dataset results often differ slightly, the overall conclusion from each pairwise comparison (i.e., which outcome is the most likely in an overall majority of trees) is identical in all but one comparison (E, under molecular timescales).

Fig. S4. Comparisons of size innovation between (A) holosteans and teleosts, (B) crown teleosts and all other neopterygians, (C) crown teleosts and stem teleosts, (D) crown teleosts and holosteans, and (E) stem teleosts and holosteans. Comparisons were made using the full-size SL dataset, a CS dataset, and a smaller SL dataset pruned to exactly match the taxon sampling of the CS dataset. Comparisons of size innovation are presented for K value distributions of the three datasets resemble each other closely.

Methods Phenotypic Datasets. Phenotypic data were collected from photographs of museum specimens of neopterygians ranging in age from Wuchiapingian (late Permian, ∼260 Ma) to Albian (Early Cretaceous, ∼100 Ma), supplemented by high-quality images in the primary literature. The phenotypic datasets represent a combined total of 1,170 unique specimen images assigned to 483 species. Our phenotypic datasets are divided into those describing variation in size and those capturing differences in shape. Because of varying degrees of completeness between fossil specimens, these datasets do not contain identical sets of taxa, although the degree of overlap between any two datasets is high. We obtained SL for 949 specimens assigned to 468 species, and CS (based on our constellation of landmarks for geometric morphometic analyses; see below) for 626 individuals assigned to 382 species. Size values within species were averaged, and all resultant species sizes were log-transformed before analysis (SL in Dataset S1; CS in Dataset S2). We used a 2D geometric morphometric approach using a constellation of 25 landmarks to quantify shape variation (Fig. S6) using the software package tpsDig2 (47). The shape dataset consisted of 774 specimen images assigned to 398 species (Fig. 2A and Figs. S1 and S2). Both fixed landmarks and semilandmarks were used to capture overall body shape and fin position, based on schemes applied previously to living (48) and fossil (31) fishes. Landmarked specimen data were aligned using orthogonal generalized Procrustes superimposition analysis (GPA), permitting shape values to be averaged within species. The averaged species data were then aligned with GPA and subject to a relative warp (RW) analysis in tpsRelw v1.54 (49). Of the four axes that described >5% of overall variation, the first three (RW1 to 3) captured clear biological features (rather than differences potentially related to preservation) and formed the basis of all shape analyses in Dataset S3. RW1 to 3 explained 42.53%, 21.43%, and 13.52% of the variation respectively. The Supporting Information details anatomical correlates of these axes (Table S3). Fig. S6. Landmark scheme used to quantify shape. Fixed landmarks are marked by red circles; semilandmarks are marked by black circles with white fill. Red circles containing a white asterisk act as anchor points for the semilandmarks in-between. Fixed landmarks document: 1) anterior tip of the upper jaw (premaxilla); 2) the central, ventral surface of the orbit; 3) the center of the orbit; 4) the central, dorsal surface of the orbit; 5) the dorsal surface of the skull immediately above the eye; 6) postero-dorsal tip of braincase; 7) anterior insertion of dorsal fin; 8) posterior insertion of dorsal fin; 9) dorsal surface representation of the last vertebral centra; 10) ventral surface representation of the last vertebral centra; 11) posterior insertion of anal fin; 12, anterior insertion of anal fin; 13) anterior insertion of the pelvic fin; 14) anterior insertion of the pectoral fin; 15) lower jaw joint. Landmarked specimen is Amiopsis lepidota (SMNS 80251) from the Staatliches Museum für Naturkunde, Stuttgart. Table S3. The first four axes derived from a relative warp analysis and their anatomical correlates Tree Construction. Summarizing existing topologies. Because there is no densely sampled phylogenetic hypothesis available for early fossil neopterygians, we adopted a “supertree” approach to produce a sample of trees for comparative analyses. Topologies were constructed using matrix representation with parsimony (MRP), drawing upon 120 source topologies (Dataset S6) to summarize relationships and capture phylogenetic uncertainty among 671 (mostly Mesozoic, but some living) neopterygian species. We adopted MRP because many trees lacked the data matrix used to create them (e.g., ref. 50) or were hand-constructed (e.g., ref. 51). All junior synonyms present in the source trees were replaced by their correct senior synonyms to ensure all taxa were correctly represented in the source topologies. We included a “seed” (i.e., backbone) tree built with reliable taxonomic information that contained every species (52). This process permitted inclusion of large numbers of taxonomically assigned species that have not otherwise been included in a formal phylogenetic analysis. Use of taxonomic information is further vindicated given that paleontological trees derived from taxonomies can deliver comparable results using comparative methods to those derived from cladistic phylogenies (53). The taxonomy seed tree was also treated as a constraint on the supertree analysis to ensure that strongly corroborated placements could not be overruled by the source trees (e.g., holostean monophyly, which is well-supported by modern molecular and morphological analyses, but not recovered by older studies). We purposely left the seed tree poorly resolved to allow source trees to dictate relationships where there is genuine uncertainty. A second constraint was applied to ensure that the relationships between major living teleost clades matched those arising from recent molecular phylogenetic studies (54). To implement both constraints, the nodes (expressed as characters in the MRP matrix) defining the relationships of the taxonomic and molecular trees were upweighted to 1,000 (the maximum) in our MRP data matrix. Safe taxonomic reduction (55) was performed upon the MRP matrix using Claddis v0.1 (56) before phylogenetic analysis in TNT v1.1 (57). Twenty replicates of new technology searches were performed, saving 1,000 trees each time, with each replicate starting from a random tree. Ten-thousand MPTs were then obtained from these saved replicates, followed by a final search for remaining MPTs with tree bisection and reconnection, delivering a total of 10,500 MPTs. Taxa removed by safe taxonomic reduction were reinserted into every MPT, either into their sole possible position or, if multiple positions were equally likely, one was chosen at random. One-hundred trees were then selected at random from this pool for downstream comparative analyses, with any remaining polytomies randomly resolved using the “multi2di” function in APE (58). Timescaling topologies. Living species were pruned from our 100 supertrees before timescaling using the timePaleoPhy function of the paleotree package in R (59). As illustrated in Fig. 2B, we adopted two end-member timescaling procedures: (i) a paleontological timescale to reflect divergence times based solely upon fossils, and (ii) a molecular timescale to reflect some of the oldest neopterygian divergence estimates in recent clock studies. The tip age of every species was randomized (with a uniform distribution) between its oldest potential age (i.e., the oldest lower boundary age of all of the deposits where the species is found) and its oldest reliable minimum age (i.e., the oldest upper boundary age of all of the deposits where the species is found). This randomization procedure was carried out for each tree individually. We used the node-dating procedure of Hedman (60) to provide an estimate for the neopterygian crown node (i.e., the root of the supertree) as the first step in timescaling topologies under our paleontological approach. This approach delivered a mean estimate of 280 Ma for the neopterygian crown, which we set as the root age. For our molecular timescaling procedure, we constrained the age of three nodes based upon the clock estimates of Near et al. (54) (crown Neopterygii: 361.2 Ma; crown Holostei: 271.9 Ma; crown Teleostei: 307.1 Ma). For both paleontolgical and molecular timescaling, we used the “equal” method implemented in timePaleoPhy. Quantifying Phenotypic Rates. Size rates were quantified using the Bayesian approach of Eastman et al. (61) implemented over 1,000,000 generations, discarding the first 250,000 generations as burn-in. Randomization tests (implemented via the “compare.rates” function of the auteur package in R v2.15.3) provided a two-way P value (α = 0.05) to test for differences between neopterygian partitions. The Adams (62) method permits estimation of evolutionary rate on multivariate data, and was applied to our shape dataset. Simulation of each supertree topology under a null model of equal rates was used to generate a null distribution of rate ratios for each of our five comparisons. The observed rate ratio for a given comparison can then be compared with the simulated distribution of rate ratios to derive a two-way P value to test for differences between two sets of taxa (α = 0.05). Quantifying Phenotypic Innovation. Blomberg’s K quantifies whether closely related taxa in a clade of interest are either more (K > 1) or less similar (K < 1) with respect to a trait value than expected under a Brownian motion model of evolution (K = 1). Therefore, a more innovative clade (i.e., one that is efficient at exploring new regions of trait space) should have a larger K value than a less innovative clade. This is because the lineages of an innovative clade should spread apart from one another, occupying different regions of trait space so that more closely related taxa appear more similar in trait value than more distantly related taxa. A less-innovative clade should express lower K values, as multiple lineages overlap and re-explore similar phenotypes, eroding phylogenetic signal. Variation in rate of phenotypic change between focal groups can, however, distort this simple relationship. For example, in a clade that shows high rates of phenotypic change relative to its boundaries in phenotypic space, K can be degraded (33, 34). K values are interpreted to reflect innovation in the main text with appropriate caveats given potential differences in rate, with all comparisons further contextualized in Table S1.

SL vs. CS Comparisons CS offers a measure of size independent of shape (63), whereas SL does not. Therefore, we may expect to observe some differences in the analytical results derived from these two types of dataset. When the SL dataset is pruned to match the CS dataset for taxon sampling, it yields near identical results regarding evolutionary rate regardless of timescale choice (Fig. S3). The same is also true of innovation, where K distributions in taxonomic comparisons are very similar for both the pruned SL and CS datasets (Fig. S4). The larger SL dataset produces highly similar rate and innovation results compared with the CS and pruned SL dataset (Figs. S3 and S4). The only minor differences observed occur exclusively in rate, where the larger SL dataset recovers significantly higher rates in crown and total group teleosts in a higher proportion of trees than the CS and pruned SL datasets, regardless of timescale choice (Fig. S3). However, these small differences have almost no influence on our interpretations, because there is only a single instance where the increased frequency of high rate results changes the overall conclusion for a comparison: the crown teleost vs. stem teleost comparison on molecular timescales. Here, the larger SL dataset delivers a majority of trees where crown teleosts possess significantly higher rates than stem teleosts, whereas the CS and pruned SL datasets find this result in a large minority (Fig. S3). Overall, these results suggest that choice of size metric is relatively unimportant for our dataset, and that the overall size and taxonomic samplings of the dataset are more likely to influence subsequent results, despite those factors having a relatively small influence here. Nevertheless, choice of metric may be important for other datasets (e.g., different groups of organisms or datasets of other biological/nonbiological structures), because it is possible to envisage scenarios where the choice of size metric would matter. For example, it is possible that two fishes, identical in SL, could differ greatly in CS if their shapes differed greatly, such as between an extremely deep-bodied taxon (e.g., Opah) and a highly shallow-bodied taxon (e.g., needlefish). This discrepancy could present downstream effects on analyses if a hypothetical “clade A” contained deep-bodied taxa that regularly evolved into shallow-bodied taxa of similar SL (and vice versa), whereas “clade B” consisted solely of similar sized deep-bodied taxa. As a result, these two clades may show similar rates of evolution using SL but different rates using CS (because clade A would be expected to possess much higher rates of CS change). In our own data, the fact that SL and CS datasets deliver near identical results suggests such transitions are rare enough not to create such issues. This theory matches what we expect given the topology of our tree, where organisms that contrast the most in shape, such as highly deep-bodied (e.g., Pycnodontiformes) and shallow-bodied taxa (e.g., Aspidorhychiformes), form distinct, distantly related clades, meaning rates will rarely be calculated upon transitions between these distinct shape clusters.

The Importance of Timescale and Topology Examination of Fig. 3 and Figs. S4 and S5 clearly demonstrates the ability for evolutionary timescale and topology to alter specific rate and innovation analyses. For example, timescale can dictate whether crown teleosts show higher size rates than stem teleosts in a minority or a majority of trees (Fig. 3C), whereas topology can dictate whether holosteans or teleosts are more likely to possess higher rates of shape evolution (Fig. 3A). Furthermore, the observation that molecular and paleontological timescales can deliver differing results has considerable implications for paleontology because the vast majority of analyses in the field are, often by necessity, conducted solely upon paleontological timescales. Therefore, where possible (when crown nodes are present in paleontological trees to which various molecular estimates could be applied) it should prove enlightening to incorporate the full range of known timescale uncertainty, especially in cases where large mismatches still persist between clock and rock estimates, such as for arthropods (e.g., ref. 64) or land plants (e.g., ref. 65). Similarly, paleontological analyses that do not contain crown nodes may benefit from the introduction of additional timescale uncertainty, to discover the temporal limits of their findings. Our analyses also provide insight into the relative importance of timescale versus topology for understanding Mesozoic neopterygian evolution. Although neopterygian timescales have estimated the origin of crown teleosts to range anywhere from the Carboniferous to the Lower Jurassic (54, 66), we demonstrate that even the most dramatic timescale differences under our current methodology have little bearing upon our main findings. Instead, the pool of sampled topologies typically set the range of outcomes for a given analysis (Fig. 3 and Fig. S5), while changes to the timescale subtly adjust the proportions of each potential outcome, rather than bringing about conclusive outcomes in favor of one or another hypothesis (Fig. 3 and Fig. S5). These observations suggest refinement of the fossil neopterygian phylogentic framework could be more important to Mesozoic-specific questions than additional revisions to the neontological neopterygian timescale. However, this does not undermine the importance of timescale (indeed, changes to the topology themselves alter the timescale by altering branch durations), but simply highlights that reductions in uncertainty will be primarily delivered by improvements to paleontological topology and timescale, obtainable via more systematic studies of fishes and the creation of paleontological databases that will permit use of probabilistic timescaling approaches (67).

Estimating the Position of the Genome Duplication upon the Teleost Stem Hurley et al. (10) performed a molecular clock analysis on the basis of four paralogous genes. This approach allowed them to date not only the major divergences between taxa, but the divergence between the paralogs themselves, providing an estimate for the timing of the genome duplication event. Hurley et al. (10) performed clock analyses upon both a halecostome and a holostean topology, and although the holostean topology is preferable in the light of recent analyses (e.g., refs. 1 and 54); see also figure 2 in ref. 10), both topologies still provide an estimate for the origin of the teleost stem lineage. Different paralogue groups (a and b) also provide two independent estimates for the age of the teleost crown (an estimate from “a” paralogues and a separate estimate derived only from “b” paralogues); we averaged these to provide a single crown estimate. Finally, a clock estimate between the pairs of paralogues (a vs b) delivered an absolute age estimate for the genome duplication event. Altogether, these analyses delivered both the total duration of the teleost stem, and an absolute estimate for the genome duplication event. For three analyses (tables 3–5 in the supplement of ref. 10) it was therefore possible to calculate the relative timing of the genome duplication on the teleost stem, providing estimates of 62%, 54%, and 64%, respectively, yielding an average estimate of 60% (Table S2). Therefore, current estimates suggest that the duplication occurred just over half way up the teleost stem lineage.

Acknowledgments We thank L. Sallan, R. Benson, R. Close, and L. Soul for useful discussion; and the collections managers, curators, and research scientists at numerous institutions for their assistance and access to fossil specimens. This work was supported by a Palaeontological Association Whittington Award and a Natural Environment Research Council Cohort grant (to J.T.C.); Australian Research Council Grant DE140101879 (to G.T.L.); Philip Leverhulme Prize PLP 2012-130 (to M.F.); Natural Environment Research Council Award NE ∕ I005536 ∕1 (to M.F.); and the John Fell Fund (M.F.).

Footnotes Author contributions: J.T.C. and M.F. designed research; J.T.C. performed research; G.T.L. contributed new reagents/analytic tools; J.T.C. and G.T.L. analyzed data; and J.T.C., G.T.L., and M.F. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1607237113/-/DCSupplemental.