DNA extraction, Sanger mitochondrial and nuclear DNA sequence data collection

Genomic DNA was extracted from muscle or liver tissue samples on loan form La Sierra University, Villanova University, the California Academy of Sciences, the Zoologisches Forschungsmuseum Alexander Koenig, and the Chicago Field Museum. Extractions were preformed using a DNeasy tissue kit (Qiagen, Inc.) and sequenced for the mitochondrial and nuclear genes, ND2 (primers from [21]) and RAG-1 (primers from [23]), respectively, using standard PCR and Sanger sequencing protocols. We edited the sequences and aligned them within Geneious Pro 5.0.4 (http://www.geneious.com, [24]) and these new sequence data were combined with existing data from [21] and [23] (Additional file 1: Table S1). In total, the dataset included 17 of the 26 draconine genera, including all but two of the Indian genera (Psammophilus and Coryphophylax). Hyrdosaurus and Physignathus were not included as their phylogenetic affinities are with other agamid lineages outside of the Draconinae [21]. At least three species (or individuals if the genus was monotypic) per genus were sampled, for a total of 44 individuals. ND2 and RAG-1 were selected as they are the most frequently sequenced markers across acrodont lizards and therefore provide maximum taxonomic coverage. We used these markers to preliminarily place new genera in a phylogenetic context, and as a guide tree in our selection of genera for UCE development to resolve problematic relationships. No experimental research was carried out on these animals in this study.

Ultraconserved elements (UCE) data collection

To resolve the problematic areas in the phylogeny from the Sanger data (pink nodes: Fig. 2a), we selected 24 individuals representing 12 genera (underlined taxon names in Fig. 2) from across four species groups (brown nodes: Fig. 2a) for ultaconserved element (UCE) enrichment. Sequence-capture data collection followed a modification of the approach outlined by Faircloth et al. [25]. Briefly, we fragmented genomic DNA with a Covaris S220 ultrasonicator (Covaris, Inc.), and prepared Illumina libraries using KAPA library preparation kits (Kapa Biosystems) and custom sequence tags unique to each sample [26]. Libraries were pooled into groups of 8 taxa and enriched for 5060 UCE loci (5472 probes). We amplified enriched pools with a limited-cycle PCR (18 cycles) and sequenced final libraries on a partial Ilumina HiSeq 2000 lane. Reads were quality filtered using the Illumiprocessor [27] wrapper for Trimmomatric [28], and assembled into contigs using Trinity [29]. Where alternate alleles differing by less than 5 % sequence divergence (or two nucleotide positions, whatever was greater) were present in a sample for any given UCE locus, Trinity retained the allele supported by the largest number of reads. We used PHYLUCE v. 1.4 (Faircloth et al. [25, 30]) to match contigs to UCE loci and generated two alignments in MAFFT [31]: one containing no missing loci across all individuals (complete) and another containing data for at least 75 % of taxa per locus (75 % complete), which returned alignments of 1114 loci and 4536 loci, respectively.

Fig. 2 a Bayesian analysis (in MrBayes) of ND2 and RAG-1 data, with black dots denoting nodes with posterior probabilities above 0.95. Brown nodes indicate four well-supported species groups (1–4; see text for details) and pink nodes identify poorly supported relationships among these species groups. Underlined taxon names are genera selected for UCE enrichment. b Multi-species coalescent (“species tree”) from the species tree estimation using average coalescence times STEAC analysis, using the complete matrix of 1114 UCE loci. Black dots denote nodes with 100 bootstrap support. Brown nodes indicate the four species groups (Group 2 = brown circle; see text for discussion). Blue nodes identify problematic nodes recovered in Likelihood analysis of the Sanger dataset, resolved with sequence-capture data Full size image

Phylogenetic and biogeographic analyses

Sanger data

We first used Bayesian analyses with MrBayes 3.2.2 [32] of the ND2 and RAG-1 datasets independently in the context of the entire Agamidae to ensure that Draconinae was monophyletic. Once monophyly and lack of conflict between loci was established, we concatenated the two gene partitions for subsequent analyses. We used uniform priors in MrBayes 3.2.2 and partitioned the dataset by locus and codon within each locus for just the members Draconinae sub-family. We then assigned the GTR+ Γ substitution model for each partition and used three chains (two hot and one cold), and carried out 100 million generations, sampled every 10,000 generations. Due to the risk of substitution saturation, we performed analyses including and excluding the third codon position for the ND2 alignment. Convergence between chains, likelihood scores, and estimate sample size (ESS) values were evaluated using Tracer 1.6 [33] In order to obtain a reliable root age for divergence-time estimates within Draconinae, we expanded our ND2 and RAG-1 datasets to include data from all acrodont lineages. We analyzed this expanded dataset using eight acrodont fossils (Additional file 2: Table S2) within a Bayesian framework in BEAST 2.3 [34] using the fossilized-birth-death model [35, 36]. The fossilized-birth-death process provides a model for the distribution of speciation times, tree topology, and distribution of lineages sampled before the present, and treats the fossil observations as part of the prior on node time estimates. We used the root age for the Draconinae resulting from this analysis (85 MYA) as a minimum-age calibration for the root of the Draconinae for subsequent time of divergence estimates within the Draconinae clade.

Sequence-capture data

We performed likelihood analyses in RAxML v.8.1.20 [37] on concatenated datasets for the incomplete (4536 loci) and complete (1114 loci) matrices, using the GTR+ Γ substitution model, and ran 100 fast bootstrap replicates. In addition to the concatenated analysis, maximum likelihood gene trees were constructed for each of the UCE loci included in the complete matrix using Phyluce with RAxML v.8.1.20 [37], under default settings. Phyluce and RAxML were also used to generate gene trees for 500 multi-locus bootstraps [38]. Custom R-scripts (R v3.2.0; R Core Team 2015) and the R library Phybase [39] were then used to infer the STEAC [40] summary species tree for the original and bootstrapped data.

Grafted phylogeny and divergence dating

Using 85 MYA as a minimum age limit for the ancestor of the Draconinae, divergence dates for subclades were estimated in BEAST 2.3 using the ND2 and RAG-1 datasets with linked clock and tree models. We applied Birth-Death tree priors and constrained the relationships to match the results from the analyses of the UCE loci (blue nodes: Fig. 2b) and let the relationships within each species group be estimated by the BEAST analyses. We used a relaxed uncorrelated lognormal clock model and an exponential prior for the mean rate of each partition. Default values were used for all other priors, and the analysis was run for 150 million generations sampling every 12,000 generations, with chain stationarity, and ESS values were evaluated in Tracer 1.6. The first 25 % of trees were discarded as burn-in and the maximum clade credibility tree with median node heights was summarized using TreeAnnotator 2.3 [34]. We converted our alignments to fasta format using seqmagick (http://seqmagick.readthedocs.org/en/latest/). Then, with the estimate for divergence between Mantheyus and other draconine species of 85MYA, we estimated the TMRCA of subclades based on pairwise Hamming distances [41] between UCE loci (with a sequence saturation correction of 0.95) calculated through fastphylo [42], assuming a naïve strict clock. We carried out the calculations using a custom R-script [43]. Any loci where subgroup divergence times exceeded those of the calibration time were discarded due to the likelihood of incomplete lineage sorting and/or excessive rate variation. Using the same methods, we then estimated the time to most recent common ancestor (TMRCA) of the Draco + Ptyctolaemus and species group 1–4 clades using the estimated age of the Non-Mantheyus clade. The estimate of the TMRCA of species group 1–4 was then used to age the split between Acanthosaura and Pseudocalates (species group 1), and the ancestor of species groups 2/3/4. The species group 2/3/4 TMRCA estimate was then used to age the split between Salea and Calotes (species group 2 and 3), and the ancestor of species group 4. Finally, the estimate for the TMRCA of species group 4 was used to obtain an estimate of the TMRCA of Certaophora/Lyriocephalus/Cophotis.

Ancestral area reconstructions were performed using likelihood and Bayesian methods in LAGRANGE within the program RASP 3.0 [44], and in RevBayes 10.10 [45] respectively. Taxa were assigned to their biogeographic zone (Fig. 1) based on their modern day distributions and RevBayes reconstructions were visualized using the online resource Phylowood [46]. Traditionally, the Philippines is not classified as part of Sundaland however, we included taxa from this archipelago in the Sundaland biogeographic area because the entire Philippine agamid fauna is Sundaic in origin.