Bathymetry layers and tropical limit across time

Plate tectonics drive the evolution of oceanic basins and mountain chains through the motions of continental plates across the surface of the Earth6. Plate kinematic models provide insights on whether and when plates are located in favourable latitudes for the establishment of tropical marine biodiversity associated with shallow water reef systems. We used two different absolute plate motion models extending back to 140 Ma, constructed using relative plate motions derived from current marine magnetic anomalies and oceanic fracture zones33. Different choices of reference frames determine the absolute positions of plates back in time for each model. The first model (Model 1), is based on a combined moving Indian/Atlantic Hotspot reference frame34 back until 100 Ma, and a fixed hotspot reference frame35 back to 140 Ma. For the Pacific Plate, we used the absolute reference frame from Wessel et al.36 The second model (Model 2) uses a plate kinematic framework published by Seton et al.37 This second model uses a hybrid reference frame, which merges a moving Indian/Atlantic hotspot reference frame back to 100 Ma (ref. 34) with a paleomagnetically derived true polar wander corrected reference frame38 back to 140 Ma.

For subducted ocean crust, the marine magnetic anomalies of in situ oceanic crustal remnants are utilized in combination with geological information and paleomagnetic data from accreted terranes and microcontinents along with the rules of plate tectonics to generate sets of synthetic paleo-age grids for oceanic lithosphere for the past 140 Myr. Due to the age-depth relationship related to the cooling of oceanic crust, one can compute the paleobathymetry of oceanic crust, assuming either no sedimentation or relatively simple sedimentation patterns based on present-day observations6. We used the GDH-1 cooling model39 to compute the paleobathymetry grids for each time step. Estimates for continental topography, including continental margins, however, do not follow such relatively simple rules due to the effects of tectonic deformation, sediment loading and erosion. We used a set of newly digitized paleo-shorelines9 from the compilation of Smith et al.40 to generate simple continental shelf topography fringing continental plates at each reconstruction time step. For both models, the depth at the continent–ocean boundary is determined by age of the adjacent oceanic crust. We combine both data sets to generate synthetic paleobathymetry models with a 1° spatial resolution for the past 140 Myr in 1 million year time steps. Shelfal area are assigned a constant bathymetry of −200 m in the grids, but this represents a coarse depth approximation of the entire shelfal area which in reality displays a much more complex bathymetry. The bathymetric reconstructions only partially considers volcanic activity leading to shelf emergence and we did not model the biodiversity of the Tropical Eastern Pacific that contains many islands of volcanic origin.

Since the study focuses on the history of tropical shallow reef fauna and because climate strongly fluctuated in the last 140 millions of years41, we also reconstructed the paleo-latitudes of tropical ocean limits from reef-forming coral fossil records. Reef-forming coral species can only develop at warm water temperature >25 °C (ref. 42) and thus the paleo-distribution of this group constitutes a good indicator of the past latitudinal tropical borders. We collected fossil occurrences of scleractinian corals from paleoDB ( www.paleodb.org) corresponding to 31,392 occurrences43 from −140 Ma to the present (Supplementary Fig. 17). From the reconstructed paleo-latitude for each million year time slice, we computed the 95th percentile of the paleo-latitude at which corals were living to inform on the latitudinal border of tropical oceans across time. By combining reconstructed shelfal areas with tropical limits, we generated 1 map per million years of tropical shallow marine habitats for the last 140 Myr (Supplementary Fig. 1).

Mapping current and past biodiversity patterns

Current biodiversity patterns were extracted from a global-scale distribution database3,44 for tropical reef fishes and from the IUCN (http://www.iucnredlist.org/technical-documents/spatial-data) for Scleractinian corals. By examining almost 500 references and extracting information from published works, regional checklists, monographs on specific families or genera, and reports, we obtained information on presence/absence for 6,316 reef fishes in grid cells of 5° × 5°, corresponding to ∼555 × 555 km at the Equator. To compare model simulations and observed biodiversity gradients, we focused on one coral clade and one fish clade based on two criteria: the clade should show a global distribution and have a stem age within the time frame of the simulations. For the fishes, we selected the Labridae family (n=559 species) with a median age of 76±10 Myr obtained from the dated phylogeny. For the corals, we considered the Acroporidae family (n=274 species) also with a global distribution and with the earliest fossil in the data set dating to 131±5 Ma.

To compare simulated past biodiversity patterns to empirical data, we used fossil occurrences of corals (Supplementary Fig. 17) obtained from PaleoDB (www.paleodb.org). We mapped ancient coral species richness for three epochs with sufficient fossil occurrences, the Miocene (20 Ma), Eocene (40 Ma) and late Cretaceous (70 Ma). We retained only occurrences of fossils that were identified at the species level (that is, 25,077 species occurrences). Occurrences were converted into diversity maps by computing the sum of species present in the surrounding 40° × 40° window of each 5° × 5° cell when the fossil date range overlapped with the date of interest. Diversity maps were rescaled between 0 and 1.

Landscape-based model of biodiversity dynamics

The use of correlative approaches in macroecological studies does not provide a mechanistic understanding of the way historical processes have shaped the distribution of alpha- and beta-diversity. To disentangle causalities for emergent patterns, the application of dynamic simulation models provide more understanding on the processes than statistical approaches8,19. In addition, phylogeography and diversification models are generally calibrated without information on the distribution of paleo-habitats and biogeographic history45,46. Ideally, diversification analyses should consider the historical sequence of habitat distribution, which implies to explicitly integrate landscape dynamics in the model. Accounting for shifting habitat through time within a dynamic model of speciation and dispersal was proposed by Gotelli et al.8 under the term general simulation model. General simulation model should track the presence (1) or absence (0) of a given species in a particular grid cell, the phylogenetic history of lineages, and historical biodiversity information which can be compared with palaeontological or phylogenetic evidences8. Nevertheless, this idea was so far never applied to real data, possibly because of the lack of habitat reconstruction for deep time periods47.

Here, we develop a diversification model constrained by shallow tropical reef maps. The simulation starts with a unique ancestral species and new species originate either within (that is, sympatric) or adjacent to (that is, parapatric) the geographical range of its ancestor. In early Cretaceous times (∼140 Ma), a continuous passive continental margin along the northern coast of Gondwana provided less opportunity for geographical separations with limited faunal beta-diversity48. We selected the simulation start that should be the most representative of all the possible lineages arising from the breakup of Gondwana. For younger clades, for example, those arising in the Miocene, it would be more appropriate to start in a subregion of an earlier time step and assess the sensitivity to the starting point. The model starts 140 Ma with one species occupying the entire available suitable cells along the coast of Gondwana which represented a well-connected habitat. Yet, other distant regions not connected to Gondwana likely harboured allopatric lineages that also contributed to extant biodiversity especially in the IAA (Supplementary Fig. 18).

The model records at any single point in time within a data frame the distribution of each species in each cell, together with its genealogy. During each million year time slice the simulation performs two phases:

The first phase is the speciation. We considered two modes of speciation, parapatric and sympatric implemented within two alternative models as well as a model combining both modes of speciation (the combined model hereafter). We refer to parapatric speciation because it is mainly dispersal limitation and not geographic isolation which drives divergence and speciation. Parapatric speciation arises when a species range is split into two or more distinct areas separated by a sea distance larger than d s (Supplementary Figs 1 and 2). This mode of speciation was modelled as a cluster splits using cluster optimization algorithm based on the ‘dbscan’ function in the fbc R library49. If the range of a species i is fragmented into n patches separated by a given distance threshold of d s , the species will separate into n species with smaller ranges (Supplementary Fig. 2). The model assumes that discontinuous range after 1 million years of separation generates full speciation with no possibility of hybridization or introgression. The main process generating species with the parapatric model is the successive phases of habitat patch disconnection and reconnection (Supplementary Fig. 2). For the sympatric speciation mode, sympatric speciation arises stochastically in a cell occupied by a species i with a probability P s (Supplementary Fig. 3). Consistent with the speciation-area- hypothesis12, species occupying a large area, A, will show higher sympatric speciation frequency because S=P s × A (Supplementary Fig. 19).

The second phase is the dispersal. The species in the time step t are allowed to disperse to all habitat cells at the time step t+1 that are distant by a threshold value lower than d from a Weibull dispersal kernel. This threshold is randomly sampled for each species at each time step. If a given species does not find a habitat cell within the dispersal distance, it will go extinct only if all the habitat cells inhabited by the species at time t disappear at time t+1.

The model can account for the parapatric mode (two parameters), the sympatric mode (two parameters) or both (three parameters) and can make inferences about assemblage properties (alpha-, beta-diversity), diversification rates (that is, speciation and extinction rates) and phylogenetic branching (that is, phylogeny). We run the simulation for a range of dispersal (d∈{1°:40°}) and speciation (P s ∈{1e−5:1e−4}, d s ∈{1°:40°}) rates. Values beyond those ranges were unrealistic as they predict no species (full extinction), too many species (for example, >20,000, especially when d>d s ) or predict all species everywhere. The model was run until 1 Ma (and not to the current period) since we were interested in highlighting the geological contingencies on tropical reef assemblages. The effects of glaciations during the Quaternary (2.6 Ma to Present) have been studied elsewhere3. Therefore, we did not consider Quaternary climatic fluctuations in the current study. Parameter combinations (d, d s , P s ) were run 20 times, the stochastic component in the dispersal kernel of the parapatric model and a larger stochastic component linked to cell speciation probability in the sympatric model. We ran a total of 16,000 simulations.

We highlighted the best model (BM) among all simulations across a set of empirical validations including (1) current alpha and beta-diversity of corals and fishes globally and for the Central Indo-Pacific region, (2) past species richness 70, 40 and 20 Ma based on fossil coral data (Fig. 2; Supplementary Figs 4 and 20). For fossil coral data, we focused on those three time periods because they were the most differentiated and contained the most available fossil information. We compared observed and predicted current beta diversity patterns using the pairwise Jaccard’s dissimilarity index. We compared observed (for corals and fishes) and predicted species richness using a simple regression model from which we extracted the coefficient of determination (R2). Because models differed in the number of estimated parameters, we also computed the Bayesian information criterion (BIC) of each simulation as follow:

Where RSS represent the difference between scaled observed and predicted cell values, n the number of samples and k the number of model parameters. For each of the comparisons (fossil diversity gradients, current alpha and beta diversities), we summed the BIC to rank the simulations (Supplementary Fig. 21). We obtained general models of marine faunal dynamics explaining shared properties in the historical biodiversity dynamics of both corals and fishes.

Nestedness and species turnover patterns

The IAA also called the coral triangle (that is, the Indonesian archipelago, the Malaysian peninsula, the Philippines, New Guinea and northern Australia) is widely recognized as the global centre of modern marine biodiversity50. At the scale of the Central Indo-Pacific region, the number of reef fishes and corals was found to decrease with increasing distance from the coral triangle, with the tendency that distant assemblages being nested within those of the coral triangle19,21,51. We therefore evaluated if the parapatric and sympatric models faithfully reproduced this biogeographical pattern. To do so, we used the NODF index based on paired overlap and decreasing fill52 to measure the extent to which species present in species-poor cells constitute proper subsets of those ones present at species-rich cells. More specifically, we calculated the NODF index among sites (hereafter NODF sites ) that ranges from 0 (cells are not nested) to 100 (cells are perfectly nested). To determine the probability that nestedness could be obtained by chance, we compared the observed NODF sites value to those obtained by a null model where species-presence matrices are randomly drawn (n=9,999) under the sequential swap algorithm53. Last, we extracted the rank order of sites in the maximally packed matrix (that is, maximally ordered) for simulated occurrence data under the parapatric and sympatric models and observed fish occurrence data. We assessed the level of congruency between observed and predicted rank orders using a Spearman’s rank correlation test.

Because differences in species composition between places may result from both nestedness and species turnover (that is, the replacement of some species by others between assemblages), we also quantified the levels of species turnover across the Central Indo-Pacific region using a pairwise turnover metric (β jtu ) that is independent of species richness differences54. We quantified the level of congruency between observed and predicted distance matrices using a Mantel test (999 permutations). To visualize observed and predicted patterns of species turnover, we plotted the corresponding distance matrices along reduced axes using the non-metric multidimensional scaling neighbour-joining algorithm. The ordination results were then plotted and mapped by assigning a colour to each grid cell according to its position in the ordination space.

Phylogenetic reconstruction from molecular sequences

We reconstructed the phylogenetic relationships of species occurring in tropical reefs of the Labridae family together with outgroups according to previous phylogenies55. Molecular sequences from published sources were downloaded on GenBank focusing on the most frequent loci (Supplementary Data 1). Those loci were then concatenated to perform a supermatrix approach for phylogenetic reconstruction with sufficient taxon overlap for each loci. We used four mitochondrial gene regions (non-coding: 12S and 16S, coding: Cytb and CO1) and three nuclear genes (RAG2, S7 II and tmo4c4), getting data for 55% of known species. Each marker data set was independently aligned using the algorithm MUSCLE with standard settings and alignments were then manually edited using MEGA 6 (ref. 56). We tested substitution models using a maximum likelihood approach with a random starting tree for each gene alignment and we selected the best models using the BIC criterion on Jmodeltest 2.1.6 (refs 57, 58; Supplementary Table 1). Alignments were afterwards concatenated in a supermatrix using SequenceMatrix59. We then analysed the supermatrix using a Maximum Likelihood approach via the algorithm RAxML (ref. 60) implemented in RAxMLGUI (ref. 61). We performed 10 parallel runs to fully explore tree space and to avoid being retained on a local optimum. We then selected the best tree using likelihood scores. To estimate node ages, we used the penalized likelihood method on the best likelihood tree. We then used the ultrametric tree as a starting tree for Bayesian inference of tree topologies and node ages using Markov chain Monte Carlo (MCMC) in BEAST v1.8.1 (ref. 62). We ran six independent MCMC, 40 × 106 steps long under individual gene models previously selected for each reconstruction. We used the same fossil and biogeography based calibrations of Cowman and Bellwood63 as we had no update in fossil calibrations since then, and there is no fossil data available to our knowledge for the additional families (Supplementary Table 2). We checked for the convergence and stationarity of each independent run using Tracer v1.6 (ref. 64). We combined the six independent runs for each data set (after removing an appropriate burn-in) using LogCombiner v1.8.1 (ref. 62) to reach an effective sample size above 200 for all our estimates and we extracted the maximum clade credibility tree for combined tree sets using TreeAnnotator v1.8.1 (ref. 62). The resulting time-calibrated phylogeny (Supplementary Fig. 14) is consistent with previously published phylogenies for the Labridae, with a few differences. Overall, the node age estimates appear to be 5–10 million years older than previously estimated40 (Supplementary Fig. 22). In addition, few nodes showed low posterior probabilities (Supplementary Fig. 23).

Biogeographic reconstructions using the phylogeny

Due to the lack of knowledge of the sister family of the Labridae, we confined our biogeographic analyses to within the Labridae. Based on a global-scale distribution database3,44, we classified the distribution of the species belonging to the Labridae into three main areas, namely Atlantic, Western Indian Ocean and Central Indo-Pacific. We performed ancestral biogeographic estimation on the reconstructed empirical phylogeny using the R package BioGeoBEARS65. BioGeoBEARS is designed to perform inference of biogeographic history on time-calibrated phylogenies using models of dispersal, extinction and cladogenesis (DEC)66 with the addition of several other biogeographic parameters that can be combined to form other biogeographic models (for example, DIVA, BayAREA) in a likelihood framework. In addition, it allows the inclusion of founder-event speciation scenarios through the inclusion of the ‘Jump’ (+J) parameter. Alternate models can be compared using Akaike information criterion (AIC). We estimated the biogeographic range evolution under a DEC model which allowed many alternative biogeographic scenarios. In addition we compared the inclusion of the ‘+J’ parameter for the empirical data using AIC. Species ranges were categorized into three separate biogeographic regions – Atlantic, West Indian Ocean and Central Indo-Pacific. To reflect the tectonic history of marine habitat we allowed a fourth area to only exist in the past – the Tethys. Lineages were only permitted to occupy, or include the Tethys region as part of its range before 12 Ma, a time that signifies the final closure of the Tethys seaway67. To reflect the fossil and paleogeographic record, we constrain the deeper nodes in the tree (before 70 Ma) to include the Tethys region in the node range estimation (but not exclude other adjacent area combinations). Dispersal was only allowed between adjacent areas and a dispersal matrix was constructed to reflect the formation of land bridges at different time period (closure of the Tethys Seaway – 12 Ma; Isthmus of Panama – 3.1 Ma). Recent geologic evidence points to earlier closures of the Isthmus of Panama68 and its extended evolutionary influence on marine and terrestrial biotas69. Here, we model our dispersal matrix on the final closure of the Isthmus, but earlier vicariance events are reported in the Labridae70. Under the DEC+J model, we identified the age of colonization of the central Indo-Pacific among the different lineages of Labridae.

Diversification rates

To compare to the simulated diversification rate to empirical patterns, we estimated extinction and speciation rates shaping diversification for the coral family Acroporidae using fossils occurrences following the method (PyRate) proposed by Silvestro et al.71 This method includes the probability of preservation and sampling of fossils to estimate the lifespan of each lineage. We first removed all undetermined taxa from the data set. Due to the lack of fossil occurrences before 60 Ma, we only estimated diversification rates during the last 60 Myr. We first ran 10,000,000 BDMCMC generations both assuming constant and Gamma distributed preservation rate to test the heterogeneity of preservation rate. We then generated 10 randomization of the age of fossils occurrences and performed 10 independent BDMCMC analyses to estimate the diversification rates through time. We examined the posterior samples using Tracer v1.6 (ref. 64) and combined it after removing an appropriate burn-in to get a good estimation of speciation and extinction rates.

Because missing extinct lineages in the most ancient section of the phylogeny of Labridae preclude any diversification estimate in the Cretaceous from phylogenies, we computed the rate of diversification from a phylogeny containing 18 tropical reef fish families (that is, Labridae including Scaridae, Pomacentridae, Chaetodontidae, Acanthuridae, Haemulidae, Balistidae, Carangidae, Serranidae, Lutjanidae, Sparidae, Caesionidae, Holocentridae, Mullidae, Muraenidae, Tetraodontidae, Lethrinidae and Siganidae). We obtained this phylogeny by pruning a time-calibrated phylogeny for 7,822 extant fish species55. These families were selected as the most representative reef fish families, that is, they are abundant and speciose on tropical reefs72. These families are well sampled in the phylogeny of Rabosky et al.55 with more than 80% of known genus. As missing species may influence the estimation of diversification rates, we grafted those species on the pruned phylogenetic tree (1,291 missing species out of 2,224) based on published phylogenies for these families, supplemented by taxonomic information from fish identification guides and FishBase (www.fishbase.org). Specifically, new tips representing unsampled species were added to direct sister species when information allowed or to the base of the clade representing its genus. Using this empirical phylogeny, we estimated diversification rates based on an evolutionary model under a birth-death-shift process46. The birth-death-shift model uses a likelihood approach to estimate speciation and extinction rates like in a simple birth-death process but it also allows changes of those parameters through time if we force diversification shifts to happen at given times. For a given number of shifts, we simultaneously estimated time at which shifts occur, and speciation and extinction rates within each time interval between diversification shifts. We calculated those parameters for several numbers of shifts and we compared the resulting models using the AIC. The method proposed by Stadler46 is robust to phylogenetic uncertainty due to polytomies. Finally, we extracted the simulated values of diversification rate expected under the parapatric and sympatric models by computing for each time step the speciation rate minus the extinction rate and we compared the observed diversification rates with those obtained from the observed phylogeny for fishes and fossils for corals (Supplementary Fig. 8).

Data availability

The R script to run the species diversification model together with the bathymetry reconstructions are available online on FigShare (DOI: 10.6084/m9.figshare.3081226). Species diversity data and fossils are made available through the IUCN http://www.iucnredlist.org/ and http://fossilworks.org/.