Taxon and character sampling: Our phylogenetic hypothesis is informed by nuclear and plastid DNA sequences of c. 60 % of all Erica species, represented by 606 accessions of 488 species and 28 sub-specific taxa from across the geographic range of the clade (17 of 19 Palearctic species [89 %], 414 of 690 CFR [60 %]; 13 of 23 Tropical Africa [57 %]; 27 of 51 Drakensberg [53 %]; and 17 of c. 41 Madagascar/Mascarenes [42 %]), plus six outgroups (Additional file 1: Table S1). Specimens were collected in the field and determined by EGHO. Vouchers were lodged in herbaria (Additional file 1: Table S1), and leaf samples dried in silica gel and archived at -20 °C to preserve the DNA. Most sequences were obtained newly for this study, with some from previous work [10–12]. We obtained DNA sequences mostly using a direct PCR amplification protocol [13] with universal angiosperm primers [14, 15] as described in [12]. We employed a targeted supermatrix sampling strategy [16]: ITS and chloroplast trnT-trnL and trnL-trnF-ndhJ spacer sequences were obtained for all samples, and other plastid markers (trnL intron, atpI-atpH spacer, trnK-matK intron and matK gene, psbM-trnH spacer, rbcL gene, rpl16 intron, trnL-rpl32 spacer) were added for taxa selected, on the basis of preliminary analyses, as representative of early diverging lineages within each of the major subclades, in order to improve resolution of deeper nodes in the plastid tree. Sequences in general, and particularly ITS, were inspected to confirm the absence of polymorphism and (other) evidence of paralogy (e.g. indels in coding regions). An accessions table including Genbank accessions numbers is presented in Additional file 1: Table S1.

Phylogenetic inference: Individual matrices including all sequences for each marker were aligned in Mesquite [17] and imported into SequenceMatrix [18] to export concatenated matrices (excluding taxa causing topological conflict between gene trees; see below) for further analyses. A matrix of 63 phylogenetically representative taxa for which a minimum of 14 of the 20 data partitions were available was analysed using PartitionFinder [19] to infer best fitting data partitioning strategies and substitution models (heuristic search strategy ‘greedy’; comparison of fit using the Bayesian information criterion). Individual markers, coding and non-coding regions within those markers, and codon positions within protein coding genes were all specified as potential data partitions. Maximum likelihood (ML) analyses were performed using RAxML on CIPRES [20, 21] incorporating the data partitions inferred using PartitionFinder. Clade support was estimated using bootstrapping halted automatically by RAxML following the majority-rule ‘autoMRE’ criterion. To test for experimental error, confirm congruence of individual plastid markers, and to infer and compare gene trees we performed preliminary phylogenetic analyses of individual markers separately. These were followed by final analyses of ITS, combined plastid data and combined ITS and plastid data. Fifteen taxa causing topological conflict subject to ≥70 % bootstrap support (BS) between ITS and combined plastid gene trees were excluded from analyses of the concatenated data (leaving 597) under the assumption that such conflicts reflect (apparently uncommon) incidences of reticulation or incomplete lineage sorting that violate the assumption of a bifurcating tree [22]. Further phylogenetic analyses were performed using BEAST 1.8 [23] (as below).

Molecular dating: Two dating methods were employed on the Ericeae matrix: BEAST [23], using the 63 taxa matrix but excluding the most distant outgroup, Empetrum; and RELTIME [24], using the 597 taxa ML tree from the RAxML concatenated data analysis, removing Empetrum and Corema album. We used the 63 taxa matrix with BEAST because of the failure of multiple runs to converge with the full supermatrix, a not unexpected phenomenon in the presence of large proportions of missing data [16]. The targeted sampling strategy meant that the same internal focal nodes are represented in both trees. For BEAST, the root age (most recent common ancestor of Erica and Daboecia) was constrained based on the results of [25] in Ericaceae-wide analyses employing multiple fossil calibrations (producing results consistent with those presented in [26]). We used a normal distribution with mean 62 Mya and SD 10, giving a 95 % prior probability distribution of 42-82 Mya reflecting uncertainty in the original analyses [25]. In a further analysis an additional prior was implemented to reflect the age of Ericaceae pollen in sediments offshore of Southern Africa [27] and thereby test the impact on age estimates assuming that this pollen record represents Erica. For this, we used an exponential distribution with offset of 10 Mya (a hard minimum) and mean of 2.0, giving a 95 % prior probability distribution of 10-16 Mya (i.e. a soft maximum) for the stem node of Cape Erica. This is to assume that the Cape Erica clade is at least as old as the age of the pollen record and may be older to a limited degree. Following preliminary partitioned analyses that failed to converge, the data were not partitioned; we applied a GTR + G substitution model, lognormal relaxed clock, Yule process speciation model, and otherwise default priors, and performed two runs of 10 million generations sampling every 1000 in each case. Convergence was assessed using TRACER 1.6 [28] and Are We There Yet [29], and the results summarised using programs of the BEAST package. For RELTIME we assumed local clocks and imposed age constraints by means of a point estimate for the root node (the minimum, mean and maximum ages as above).

Diversification rates analyses: To infer the net diversification rate of the Erica Cape clade and compare it to those of other Cape and rapidly radiating clades, we used the method of Magallón & Sanderson ([30], as implemented in GEIGER; [31]). For Cape Erica, we used species richness and full range of crown node ages (minimum and maximum under RELTIME and highest posterior density intervals under BEAST) as inferred here. For comparison, we performed the same calculations based on data from the literature for the recent rapid radiations of lupins [2], Inga [3], carnations [4], and ice plants [5]; as well as the Cape clades Muraltia [32], Pentameris [33] Protea [34] and Restionoideae (“African Restionaceae”) [32]. The latter are examples for which detailed time-calibrated phylogenies of ancestrally CFR species – not those that also diversified in other areas – are available. We did not account for the impact on crown node age estimates of unsampled species during the calculation, and used relative extinction rates of 0.9 and zero across the board.

To test whether diversification rate heterogeneity reflects different speciation and extinction rates between geographic areas, we used MuSSE (Multiple State Speciation and Extinction) as implemented in diversitree 0.93 [35]. MuSSE uses maximum likelihood to estimate the values of different parameters under a constant birth death model: speciation (λ) and extinction (μ) rates under each of the discrete states of the character (in this case, geographic distribution), and rates of transition (q) from one state (area) to another. We compared the rates between Palearctic, Tropical African, Madagascan, Drakensberg and Cape species of Erica, Calluna and Daboecia. The areas are indicated in Fig. 1 and were so defined because they are often compared in the literature, are largely geographically isolated and <1 % of Erica species are widespread between any two of them (these limited to two species in both the Cape and Drakensberg – E. caffra and E. cerinthoides – and one in the Palearctic and Tropical Africa – E. arborea). We used the discrete multistate model, instead of GeoSSE, that models widespread geographic distributions, to represent multiple areas (rather than just two in GeoSSE) under the assumption that widespread distributions were rare throughout the evolutionary history of the group. We coded the three samples of widespread species according to the region in which they were collected under the assumption that effectively failing to sample such species across their wider distribution would have little impact on the results. We used the rate-smoothed 597-taxa RAxML tree, having removed multiple accessions of species and outgroups (leaving 487 terminals), and corrected for incomplete sampling by assigning region-specific sampling fractions. We did not consider phylogenetic uncertainty, as the major clades are well supported and largely restricted to single regions and thus the uncertainty regarding our question remains low. We compared maximum likelihood estimates given models considering different regions (either 5 distinct regions or combinations of Palearctic, Cape and the rest of Africa or of Europe versus Africa and or Cape versus other regions) and considering single versus multiple rates for speciation and/or extinction. For all but the unconstrained model, we constrained the transition rates to one parameter. Thereafter, for the best model, we tested whether constraining the transition rate reduces the likelihood. We compared the fit of the models to the data using the anova function in diversitree and using the AIC to compare the fit of the models. The parameters for the best fitting model were then calculated using a Bayesian MCMC approach run for 10,000 steps using an exponential probability distribution as prior for the underlying rates in the model. We assessed convergence by comparing the probability values of the sampling after excluding a burnin of 25 %.

Fig. 1 The diversification of Erica in space and time. a Time-calibrated phylogenetic tree of 478 extant lineages that populated the radiation of Erica with branches coloured according to mean net diversification rates (scale indicates species per million years) inferred using BAMM, with regions of samples indicated by the coloured bar at the terminals and clades/species referred to in the text indicated with numbers: 1 = Cape clade; 2 = E. pauciovulata; 3 = E. trimera; 4 = Afrotemperate clade; 5 = E. arborea. b Geographic distribution of Erica, based on collections databased by GBIF, showing Palearctic, Tropical Africa, Madagascar, Drakensberg and Cape regions. c Region specific speciation rates (λ) and the single extinction rate (μ). d-g Examples of the spectacular floral diversity of Cape Erica: d) E. macowanii, e) E. pulvinata, f) E. coarctata, and g) E. jasminiflora Full size image

To further determine whether there is diversification rate heterogeneity in the Erica dataset, we used BAMM 2.5 and Bammtools 2.1 [36, 37]. The method compares the fit of different models (a series of diversification processes) assuming different numbers of shifts based on a reversible jump MCMC to explore parameter space. We used the pruned, rate-smoothed RAxML tree, as above, and corrected for non-random species sampling by assigning regional specific proportions to the few, largely endemic, clades. We used “setBAMMpriors” to adjust the priors according to the scaling of the tree. The initial speciation rate was set to 0.18 and extinction rate to 0.111 according to inferred rates for Ericaceae [25]. Preliminary results showed that different initial speciation and extinction rate did not have a large effect on our results. The MCMC was run for 10,000,000 generations, with every 1000 generation saved. To assess convergence, the likelihood of all sampled generations was plotted in R (burnin = 10 %) and ESS values for the likelihood and the inferred numbers of shifts were calculated using the coda package [38]. It was not possible to compare Bayes Factors for zero rate shifts with those for given numbers of shifts (see BAMM Documentation part 7.6), but we compared the prior probability of a given number of shifts to the posterior probability to confirm that these differed. We then computed the set of credible shifts and reconstructed the mean of the marginal posterior density of speciation, extinction and net diversification rates across the tree. We sought to assess whether the BAMM results are dependent on the particular topology and branch lengths of the phylogenetic tree used above by repeating the analyses with 25 randomly selected, rate smoothed and pruned RAxML bootstrap trees.