Taxonomic coverage

We collected new genome-wide sequence data from four bat species, selected from the two suborders and encompassing the paraphyly of echolocating bat lineages (see ref. 13). From the suborder Yinpterochiroptera we studied the non-echolocating Old World fruit bat Eidolon helvum (family Pteropodidae) and two laryngeal echolocating species, Megaderma lyra (Megadermatidae) and Rhinolophus ferrumequinum (Rhinolophidae). From the suborder Yangochiroptera we studied the laryngeal echolocating species Pteronotus parnellii (Mormoopidae) that has independently evolved constant frequency echolocation.

From Ensembl (http://www.ensembl.org/), we also obtained sequence data from two additional bats—the laryngeal echolocating species Myotis lucifugus (Yangochiroptera; Broad Institute) and the non-echolocating Old World fruit bat Pteropus vampyrus (Yinpterochiroptera; Baylor College of Medicine Human Genome Sequencing Center)—as well as the echolocating bottlenose dolphin Tursiops truncatus. Genomic sequences from 15 additional mammal species were downloaded from Ensembl giving a total of 22 mammals (listed in Supplementary Table 1).

Phylogenetic hypotheses tested

To investigate the prevalence of convergent evolution at a genome-wide level associated with the independent evolution of echolocation in bats and cetaceans, we used a method that builds on maximum-likelihood phylogenetic reconstruction. This method compares, for a given sequence alignment of orthologous coding sequences (CDS), the goodness-of-fit of the accepted phylogenetic tree with that of an alternative convergent hypothesis (in this case, in which echolocating taxa were forced into a spurious monophyletic clade). From our data set, we identified and tested three hypotheses: (1) H 0 , the commonly accepted species phylogeny (for example, refs 13, 23, 24, 25) in which cetaceans (represented in our data set by the common bottlenose dolphin Tursiops truncatus) are nested within the even-toed ungulates in the order Cetartiodactyla, and the order Chiroptera is split into the suborders Yangochiroptera and Yinpterochiroptera, with paraphyly of bat laryngeal echolocation13; (2) H 1 , or ‘bat–bat echolocation convergence’ (monophyly of all echolocating bats in the data set); and (3) H 2 , or ‘bat–dolphin convergence’ (monophyly of all echolocating mammals in the data set). All three phylogenetic hypotheses are shown in Fig. 1. The scale bar (in amino acid substitutions) is provided for approximate reference only, as branch lengths were optimized at runtime.

Because the H 2 (bat–dolphin) hypothesis is necessarily a radical rearrangement of the commonly accepted species topology, and the concept of an ‘exact branching order’ or the ‘true’ topology does not apply in this case, we proposed a number of separate but related versions of this hypothesis, all of which were evaluated equally in the analysis. In each case the rest of the mammalian species phylogeny was fixed, as in the H 1 hypothesis. In the first case we constrained all five echolocating taxa to a single ancestral node (‘hard polytomy’); second we enumerated the seven bifurcating trees that are possible where the position of T. truncatus is free to vary, but the suborders of echolocating bats—Yangochiroptera (P. parnellii and M. lucifugus) and Yinpterochiroptera (R. ferrumequinum and M. lyra)—were preserved. A final topology was specified as a soft polytomy, with the resolution of the clade of echolocators being resolved by RAxML at runtime, with the rest of the phylogeny remaining constrained. A majority clade-consensus (MCC) summary phylogeny was constructed from these 2,326 inferred soft-polytomy H 2 trees using TreeAnnotator v1.7.4 (in the BEAST v1.7.4 distribution31). This phylogeny recovered the Yangochiroptera and Yinpterochiroptera clades of echolocating bats with good (>50%) node support. When we compared the goodness-of-fit of all phylogenies (as opposed to pairwise comparison relative to the species phylogeny H o ) we found the species phylogeny was preferred at 1,170 loci (55%), with the bat–bat phylogeny H 1 preferred next most often (548 loci; 26%). The soft-polytomy version of H 2 (resolved by RAxML) was the preferred phylogeny among 50% of the remaining loci, with remaining support equally split between the other H 2 versions. We therefore adopted the soft polytomy, RAxML resolved version of H 2 as our main bat–dolphin hypothesis.

Sequencing and data set assembly

Novel sequence data from the four bat species listed above were generated by BGI on an Illumina Genome Analyzer platform (Illumina), based on genomic libraries of 500-bp insert sizes. Using this method we obtained approximately 33–41 Gb of short read sequence data per species.

The CLC de novo algorithm (CLC bio) was used for assembling raw reads into contigs using different k-mer size values ranging from 32 to 50. The assembled contigs from the CLC output were then processed using the module Prepare of the SOAP package to do scaffold assembly using the scaff command of SOAPdenovo. Finally, gaps were filled using the GapCloser32 tool. The resulting assemblies consisted of between 210,080 and 315,526 genomic sequences (depending on species), with an average depth of coverage of 17× to 18×. Estimated genome size was approximately ∼2 Gb in all four bats, whereas contiguity (as assessed by the N50 statistic) ranged from 16,292 bp (M. lyra genomic sequences) to 27,140 bp (E. helvum). Homology-based gene prediction analyses using the genBlastG33 tool recovered 20,424 gene models for R. ferrumequinum, 20,043 for M. lyra, 20,455 for E. helvum and 20,357 for P. parnellii, in line with published gene content values for other mammals34. The completeness/contiguity of the gene representation was evaluated using the CEGMA (Core Eukaryotic Genes Mapping Approach) pipeline35,36 and found ranging across species between 61.29% to 77.02% and 90.32% to 96.77% for complete and partial genes, respectively. These compared well to the published M. lucifugus genome; when we analysed that genome using CEGMA the comparable completeness/contiguity scores for complete and partial genes were in the middle of this range (62.9% and 91.5%, respectively).

To identify genes adequate for systematic phylogenetic-based analyses of convergent sequence evolution, we next filtered the above predictions for single-copy orthologous protein-coding genes conserved across the Eutheria. This was achieved by performing reciprocal blast searches against a database consisting of the gene models for the four bats, and using as queries the human sequences of 11,185 genes reported as 1-to-1 or apparent 1-to-1 orthologues between the human and Myotis genomes in Ensembl databases (http://www.ensembl.org/, release 63). In total we determined 7,612 1-to-1 orthologous genes, from which the longest coding sequences (CDSs) were then retrieved from Ensembl for the 18 additional mammalian genomes (Supplementary Table 1).

CDS alignment

Coding gene sequences (CDS) of individual loci were built and aligned as codons using a modified version of transAlign37 incorporating MAFFT38, such that all sequences remained in the correct reading frame. Any ambiguously aligned sites, and codons with excessive numbers of gaps, were removed from each gene alignment using Gblocks39 under the following options: −t = c −b1 = “$b1” −b2 = “$b1” −b3 = 1 −b4 = 6 −b5 = h, where b1 = 70% of the sequences sampled in the data set.

In order to avoid potential biases due to either sequencing or assembly errors, for all phylogenetic and molecular evolution analyses, we chose to focus on only a subset of the identified genes. Specifically, we restricted our downstream analyses on data sets, which after filtering out of ambiguous sites showed no missing data in any of the sampled bats. The exception to this rule was P. vampyrus, which, because of its comparatively lower genome coverage, was missing in around 2% of CDS alignments. All final CDS alignments used in our analyses were characterized by a minimum length of 450 bp (or 150 codons/amino acids) and included a minimum of six bat species, the dolphin Tursiops truncatus and the additional following mammals as outgroups: Canis familiaris, Equus caballus, Bos taurus, Mus musculus and Homo sapiens. Of the 2,326 loci examined, 642 were also included in the analysis of ref. 20.

Sets of genes associated with hearing and vision

We interrogated our full CDS data set for loci that have been implicated in aspects of sensory perception of sound, including those that have been linked to deafness and/or ear development. For this we searched annotation databases hosted on the gene ontology (GO) site DAVID (http://david.abcc.ncifcrf.gov; ref. 40) and cross-referenced all of our CDSs with the terms ‘hearing’ or ‘deafness’. Using this approach, 23 ‘hearing loci’ were identified: TYR, SLC4A11, NECAP1, COCH, JAG1, HOXA1, GTF3C2, PROX1, GGA3, MKKS, SLC44A2, ITM2B, EDNRB, FBXO11, PI4KB, DISP1, ERCC3, HESX1, FZD6, BDNF, NF2, OPA1 and DFNB59. We refer to this set of genes as ‘hearing’.

We repeated annotation searches of our full set of CDSs for associations with vision, this time using the keywords ‘vision’ or ‘blindness’. In total, 75 loci were classified: PAX6, MED24, JMJD6, ATP6AP1, MCM2, TRAF4, GPC4, CIC, RDH8, IMPG2, CAD, RPGRIP1, HPS4, RABGGTA, RP2, CLN5, RPGRIP1L, VPS18, RP1, FZR1, GLI3, UNC119, GLRB, MYF5, COPZ1, MCM3, TTK, HMGCR, NPHP3, PDC, RPE65, PRPF3, ELOVL4, TGIF2, TCTN3, TGFBI, OPN4, ECD, NEUROD4, BBS2, PAICS, APC, LAMC1, SKIL, PDCL, RABGGTB, IFT172, BBS4, INTS7, LGSN, ZEB1, PELO, SMARCA5, KIT, CNNM4, GJD2, CCT3, RHO, RFC4, SLC45A2, VPS39, CTNNB1, STAT3, ADRA1B, GPRC5C, SP3, PRPF8, PVRL3, RRH, BCOR, SIX6, DRD1, NHS, TOPORS and LCAT. We refer to this set of genes as ‘vision’.

In addition, several loci associated with the sensory perception of sound have previously been reported as convergent for bat–bat or bat–whale echolocation in the literature: prestin19, KCNQ4 (KQT-4)17, Pcdh15, Cdh23 and otoferlin18, and Pjvk and Tmc1 (ref. 16). We downloaded the sequences published in these studies (including focal echolocating taxa and background sequences from other mammalian species as in Supplementary Table 1) from NCBI using the published accession numbers. We aligned each locus by MUSCLE41, and manually inspected them for quality. This included an initial phylogenetic step to check for long-branch effects where a de novo topology was estimated under ML (maximum likelihood) using RAxML (build 7.6.2; SSE3.MPI, compiled from source https://github.com/stamatak/standard-RAxML (refs 42, 43)) and the PROTCATDAYHOFF model of amino acid substitution. All seven loci were retained following this step and analysed alongside the automatically assembled CDS data sets. We refer to these manually assembled alignments as ‘published’.

Analysis pipeline

To detect signatures of molecular convergence in genomic data, we compiled an analysis pipeline consisting of previously released software for phylogenetic tree manipulation, phylogenetic reconstruction and codon model analyses in a Maximum Likelihood (ML) framework, as well as of a set of utility classes (available on request) for data handling, parsing and model/hypothesis testing. Our phylogenetic approach differs from genome-wide SNP comparisons for the detection of parallelism within intraspecific populations44, in that codons’ phylogenetic histories are evaluated and compared separately, but aggregated over each locus; we also use simulation to establish a reference null distribution for each locus, and compare observed convergence values for a given test phylogenetic hypothesis to an expected distribution derived from 100 additional phylogenies. This framework incorporates RAxML (build 7.6.2; SSE.MPI, compiled from source (https://github.com/stamatak/standard-RAxML (refs 42,43)) and a modified build of version 4.4b of PAML45; available on request) where input/output (I/O) functions were adjusted to facilitate parallel cluster implementation. The main algorithms remained unchanged and in testing gave identical output to that produced using executables distributed by the authors; available on request. All analyses were conducted on a mixture of 32- and 64-bit processors at the 3,000-node Queen Mary GridPP High Throughput Cluster hosted by the Physics Department at Queen Mary, University of London. Supplementary Fig. 1 gives a schematic representation of the pipeline workflow ((1) to (9) below).

Input (1): for each locus, the multiple sequence alignment was first filtered to remove ambiguous or incomplete codons, as well as premature stop codons. Gaps were retained. The alignment was also checked to determine how many of the 22 possible species were present, and any absent taxa were pruned from the species tree H 0 and the convergence trees H 1 and H 2 using NewickUtilities46 (see Supplementary Fig. 1, panel (i)).

De novo phylogeny estimation (2): we generated a separate de novo phylogeny for each locus using RAxML 7.6.2 (refs 42,43) under the model PROTCATDAYHOFF in rapid-search mode and 10 separate random start trees. The de novo tree was used as an independent estimate of the tree length. ΔSSLS was subsequently calculated under this phylogeny as described below for H 1 and H 2 (see Supplementary Fig. 1, panel (ii)). The soft polytomy present in the H 2 hypothesis (the four echolocating bats plus T. truncatus) was also resolved in a separate RAxML search using a constrained subtree, also under PROTCATDAYHOFF, in this step.

Model fitting (3): we fitted the checked alignment data to the H 0 , H 1 , H 2 and de novo topologies using our modified build of the aaml program in PAML 4.4 (ref. 45) under the WAG + γ model with estimated amino acid frequencies. We also implemented the JONES and DAYHOFF models of amino acid substitution. However, topology comparison requires marginalization of the site likelihoods with respect to substitution model, so that competing phylogenies’ goodness-of-fit may be directly compared. Pilot studies of available alignments of orthologous coding sequences47 showed congruence in relative ΔSSLS estimates for each locus under different models of substitution (available on request). We also repeated our complete ΔSSLS analyses separately under JONES and DAYHOFF; the Pearson’s correlation coefficients between ΔSSLS values for WAG-JONES, WAG-DAYHOFF and JONES-DAYHOFF were 0.746, 0.979 and 0.742, respectively, for H 0 −H 1 ; and 0.917, 0.892 and 0.981, respectively, for H 0 −H 2.

We therefore determined to use the WAG model of substitution for all loci, optimizing model parameters separately for each locus; see Supplementary Fig. 1, panel (iii).

Convergence hypotheses fitting (4): the two hypothesized convergent phylogenies H 1 and H 2 were then fitted to the data as for the species tree H 0 (Supplementary Fig. 1, panel (iv)). Because the data and the substitution models were the same, the difference in likelihood between two phylogenies reflects the strength of support for each in the data (see below).

Comparison of site-wise log-likelihood support (ΔSSLS) (5): we used the mean ΔSSLS of all sites in a locus as the primary measure of strength of support for convergence in this study. This statistic compares the goodness-of-fit of a pair of phylogenetic trees under a given model of evolution at every site in a DNA or amino acid alignment (see Supplementary Fig. 2). First the log-likelihood of the phylogeny (Supplementary Fig. 2a) and substitution model, given the data (Supplementary Fig. 2b), is calculated for every site in the alignment using ML (see above). Site-specific likelihood support, ΔSSLS, was then calculated:

where ΔSSLS for the ith site is given by the difference in log-likelihood units between the log-likelihood of the ith site under H 0 (the species tree) and H a (the alternative tree; one of H 1 or H 2 ). By this definition, sites with a better model fit to H 0 (the species tree) will have a positive ΔSSLS, whereas sites with a better fit to the convergent topologies H 1 or H 2 will have a negative ΔSSLS (Supplementary Fig. 2c). The distribution of overall signal in the locus indicates the strength of support for convergence. In particular, loci with a negative mean ΔSSLS show net site-wise signal for convergence (Supplementary Fig. 2d). Boxplots showing the distribution of mean ΔSSLS by hypothesis are shown in Supplementary Fig. 3; the top 5% of loci by mean ΔSSLS for H 1 and H 2 are shown in Supplementary Tables 2 and 3, respectively (n = 805,053). As expected, mean ΔSSLS for the H 1 and H 2 hypotheses are positive, indicating that most sites in the data set are not convergent. Equally, mean ΔSSLS for the hypotheses defined by the de novo tree comparison is negative, indicating that the topology that was directly fitted to the data has slightly better goodness-of-fit in many cases.

To investigate the nature of the convergence signal at all sites across this genomic data set, we plotted the empirical cumulative distribution function of observed ΔSSLS for H 1 and H 2 for the site-wise data set (shown in Supplementary Figs 4 and 5, respectively). For both H 1 and H 2 , 95% of the site-wise ΔSSLS observations were within ± 0.5 lnL units of the mean. However, some sites displayed large ΔSSLS observations: absolute log-likelihood ΔSSLS of >1 lnL unit were observed for 1,828 and 26,342 amino acids in H 1 and H 2 , respectively. Furthermore, mean variance in site-wise convergence measured by locus was larger than the mean site-wise convergence measured across the whole data set. Together, these indicate that the distribution of ΔSSLS within a gene is aggregated; as the mean locus length in our data set is relatively short but with a large variance (mean number of amino acids 346.8; s.d. 242.4) this suggests that small numbers of large site-wise ΔSSLS may influence mean locus ΔSSLS disproportionately in short loci.

Simulation of expected site-wise ΔSSLS distribution (6): homoplasious amino acid replacements may arise by neutral processes and, therefore, we used simulation to determine whether site-wise convergence was more significant than expected by chance. For each locus, we generated an expected distribution of likelihood differences as follows: using Phylobayes 3.3f (ref. 48), we first used the ‘pb’ MCMC sampler to obtain the posterior distribution of substitution model parameters under the CAT GTR model, constraining the topology to the species (H 0 ) tree. Each locus used one chain of 6,000 steps, sampling every 10 and discarding the first 1,000 as burn-in. We then used the ‘ppred’ function to simulate alignments using the model parameters from the samples in the stationary posterior distribution. Each simulated data set therefore contained identical numbers of sites and taxa to the observed alignment; and because we fitted a mixture model, sites’ heterogeneity parameters should reflect heterogeneity in the observed sequences. To generate an expected distribution, these replicates were analysed identically in the pipeline.

To determine how many replicates to use to form the expected distribution for each locus, we first simulated 50 alignments as described above from six loci that were representative of our data set in terms of alignment length, heterogeneity, ΔSSLS and tree length: ENSG00000008515, ENSG00000095906, ENSG00000121900, ENSG00000167671, ENSG00000170476 and ENSG00000173627. We calculated their site-wise ΔSSLS values in the pipeline and then compared a single randomlyselected replicate’s ΔSSLS distribution to that of a variable number of replicates’ pooled ΔSSLS values in a Kolmogorov–Smirnov test. Replicates (1 <n < 50) were sampled without replacement. Because the test’s D statistic measures the largest difference between the two distributions, it will correlate with the smoothness of their step functions. We determined that smoothness monotonically increased as more replicates were used to construct the reference distribution, with 20–30 replicates providing a reference distribution that was comparably smooth to that derived from 50 replicates (available on request); we repeated these analyses to calculate means and variances. A purely numerical simulation in R using more replicates (10,000) but ΔSSLS values simulated from a normal distribution parametized on the alignments’ ΔSSLS distribution’s means and variances gave a similar result (not shown).

Determining the significance of ΔSSLS signals (7): to ascribe confidence to our observed site-wise ΔSSLS measurements for the trees of interest, we followed a two-stage process. First, we measured the site-wise ΔSSLS for the tree comparison of interest H 0 − H a , described above. Next, we performed the same comparison on the simulated data sets, collated their ΔSSLS values and calculated their stepwise empirical cumulative density function (cdf), with linear interpolation. This allowed us to calculate the cumulative probability of the observed ΔSSLS under the null distribution. We define U, the unexpectedness of an observed site j’s ΔSSLS comparison of the species topology, H 0 and an alternative topology H a as:

Correction for expected U across random (control) phylogenies (8): the unexpectedness measure U quantifies the significance of a given site’s convergence signal, or more explicitly, the cumulative probability of the observed log-likelihood difference between the two topologies, given the species topology is correct. However, in cases where the species topology itself is distant from the maximally likely topology, or where the molecular signal is weak, ΔSSLS scores <0, indicating preference for the alternative topology, may arise spuriously. In this scenario, differential support for any alternative topology may be possible where the signal for the species topology is weak. To control for these cases, we generated 100 additional control phylogenies by resampling the H 1 phylogeny taxon labels without replacement, obtained the mean site-wise U across this random set U r , and calculated the controlled site-wise U, U c as:

for each site j. These were summed across the locus and their arithmetic mean calculated. The random-tree controlled unexpectedness, U c , therefore takes values on [−1, 1], where values greater than zero indicate that the observed ΔSSLS signal for the tree of interest, H a , is both stronger than expected by chance assuming H 0 , and that cumulative probability is itself greater than that seen in random ΔSSLS comparisons.

We filtered loci with corrected unexpectedness U c ≤ 0 from our principal gene lists (Figs 1 and 2, and Supplementary Tables 2 and 3); from our randomizations of hearing, vision and curated genes; from Metacore analysis; and from functional enrichment analyses.

Site-wise selection pressure

The ratio dN/dS or ω denotes the ratio of the rate of nucleotide substitutions that lead to a codon replacement (non-synonymous substitutions, dN) to the rate of nucleotide substitutions that do not change the coding sequence (synonymous substitutions, dS). Under the neutral model, dN and dS are expected to occur at approximately the same rate (ω ≈ 1), whereas sites constrained by purifying (negative) selection are expected to evolve with ω < 1, and those under diversifying (positive) selection with ω > 1. For each locus, we estimated ω across the phylogeny, and also in the clade of taxa hypothesized to be convergent, for every site in the data set using ML in our modified build of version 4.4b of the codeml program within PAML45.

The codon models M7 (the null model, with F3x4 frequencies, beta-distributed ω and 10 site categories) and M8 (site-wise selection, also beta-distributed ω with an additional ω category representing positive selection49) were fitted in our pipeline implementation of PAML 4.4b. The hypothesized phylogenies (H 0 , H 1 , H 2 ) were fixed as the user tree and the models compared by likelihood ratio test (LRT). The individual LRTs’ P values were then corrected post-hoc in the complete data set for multiple tests using the method of Benjamini and Hochberg50. We then considered the site-wise ω estimate only from loci where the M8 model was favoured. In H 1 and H 2 comparisons, 2,235 and 2,234 loci (from the total set of 2,326) passed the LRT, respectively. The site-wise ω estimates (and their mean for each locus) derived from this method were compared to clade-specific ω estimates (see below, values available on request) and incorporated into the principal component analysis for site data (see below).

Clade-specific ω estimation and derived site-wise ω

Large ΔSSLS values might reflect true phylogenetic signal due to evolutionary divergence, or could alternatively arise from sequencing and/or alignment errors (such processing errors are addressed below).

Where ΔSSLS values represent phylogenetic signal this could be due to neutral drift or diversifying (adaptive) selection, in which case ΔSSLS should be proportional to site-wise ω. To test this we fitted the clade-specific Clade Model C (and null model M1a)51,52 and estimated ω on the H 1 and H 2 topologies for the hypothesized clade of echolocating taxa in each case. In this model, three separate ω ratios were estimated in the given convergent clade; ‘category 0’, denoted ω 0 , for sites under purifying selection where 0 < ω < 1; ‘category 1’, denoted ω 1 , for sites evolving neutrally, where ω≡ 1; and ‘category 2’, denoted ω 2 , for sites under diversifying selection, in which ω is free to take values >1. In this model, the three ω ratios are estimated by ML and the Bayes empirical Bayes (BEB) posterior probabilities that each site falls into category ω 0 , ω 1 or ω 2 calculated. For each site, the ω estimated in each category is then weighted by the BEB posterior probabilities and summed to give an estimate of site-wise ω (also see refs 16 and 19). In subsequent analyses we treated sites with a BEB posterior for the divergent site category 2 (ω 2 ) of >0.5 as being under diversifying selection. As with the M8/M7 comparison above, we tested clade model C over the null (M1a) model by LRT, and the resulting complete set of P values were treated with a post-hoc correction following Benjamini and Hochberg50. We considered the site-wise clade-specific ω estimate only from loci where Model C was favoured.

We also examined the tree lengths for each CDS alignment under H 0 , H 1 and H 2 ; all were within the recommended limits for codon-based analyses, indicating that no large, long branches had biased the ML fitting/optimization.

Comparison of ancestral sequences with convergent taxa

Theoretically, signatures of sequence convergence might also arise under conditions of stronger purifying selection in echolocating taxa than in non-echolocating outgroups (for example, Old World fruit bats). However, ω in the fruit bats was not estimated directly, but instead was averaged over all non-echolocating taxa in the phylogeny. As a result, similar ω estimates in both the background clade and the foreground (convergent) clade could arise from unrelated divergence elsewhere in the phylogeny, instead of in the non-echolocating taxa. In such cases, the likelihood support for the monophyly of lineages of echolocating taxa could arise due to the presence of derived amino acid substitutions in the non-convergent outgroup with retention of shared ancestral states in taxa comprising the putative clade of echolocators; misinterpreted as convergence.

Therefore we reconstructed the amino acid sequences at the internal nodes of the H 1 and H 2 phylogenies for each locus using unweighted parsimony53. We then examined every position in the alignment, comparing the sequence at the most recent common ancestor (MRCA) of echolocating taxa with the sequences of extant echolocating taxa sampled at the tips themselves. Where two or more echolocating taxa shared a divergent substitution from the MRCA at the same position, we termed this a ‘parallel’ substitution. Comparing the counts of parallel substitutions with the ΔSSLS evidence for convergence at each locus, we found extremely good correspondence between the two measures; although 551 loci lacked any parallel substitutions for H 1 (441 under H 2 ) overall, only six of our 5% most convergent loci (n = 118) lacked parallel changes for H 1 (seven in the most corresponding H 2 comparison).

Spatial distribution of selected sites

Because it is known that estimated numbers of non-synonymous substitutions, and thus dN/dS ratios (ω), can be inflated by potential sequencing, annotation and alignment errors54, we performed analyses to assess the reliability of our selection results. Sequencing errors and misalignments (that is, continuous blocks of poorly aligned sites) will most commonly be characterized by spatially aggregated distributions of divergent amino acids (and hence correlations in the ω estimates of adjacent sites). We therefore diagnosed the spatial distribution of sites in each locus estimated to have undergone diversifying selection to look for this signal. Within each locus that had passed the LRT M8/M7 (site-wise selection), we identified those sites with a Bayes empirical Bayes posterior ≥ 0.5 (likely to belong to the divergent site class). We then calculated the intragenic distance as the interval in alignment position between each codon under diversifying selection. For this calculation, sequences were treated as circular such that the distribution of intragenic distances would be identical regardless of the location on the gene of any putative group of misaligned codons.

On the basis of this definition of intragenic distances, we developed four measurements of the aggregation of codons of interest along a locus. They were: the minimum distance between two codons; the range (max − min distance); the k distance (defined as the largest integer k such that no more than k + 1 intragenic distances were of length k); and the ‘exponent coefficient.’ The exponent coefficient assumes distances are exponentially distributed; the ranked distances are log-transformed and a linear regression fitted; this coefficient is the ‘exponent coefficient’. We fitted linear, generalized linear and generalized linear mixed models to see if these measures of intragenic aggregation correlated (with or without number of taxa or total CDS length) with ΔSSLS. Results of these analyses (available on request) showed that no CDSs were significant.

Principal component analysis

To explore broader patterns of association among signatures of convergence, selection and putative gene function, for both CDS (locus) and site-wise data, we undertook principal component analyses (PCA). Data were transformed to approximate a standard normal distribution. The PCA was repeated for H 1 and for H 2 with the mean and site-wise ΔSSLS (in CDS and site-wise analyses, respectively) measured with respect to H 1 and H 2 . In the CDS data set we analysed the principal component weightings of mean ΔSSLS, number of taxa present, total amino acid count, count of amino acids with significant support (see ‘Simulation of expected site-wise ΔSSLS distribution’, above), mean ω (dN/dS) for sites in the divergent site class in the convergent clade (see clade-specific ω estimation and derived site-wise ω, above) and log-linear exponent coefficient (a measure of the spatial aggregation of positively selected sites, see ‘Spatial distribution of selected sites’, above). In the data set of all amino acids, we analysed the principal component weightings of site-specific likelihood support (ΔSSLS), number of taxa present, estimated ω (dN/dS; see site-wise selection pressure, above) and estimated ω in the convergent clade (see clade-specific ω estimation and derived site-wise ω, above). For the locus and site-wise analyses, variance was explained approximately equally across all components; example component loadings for the loci-based PCA of H 1 and H 2 are shown in Supplementary Fig. 7a, b, respectively. Example component loadings for the PCA of site-wise data in H 1 and H 2 are shown in Supplementary Fig. 8a, b, respectively.

Locus-wise adaptation/convergence regressions

High-magnitude ΔSSLS sites were positively correlated with estimated site-wise ω in the clade of echolocating mammals in both H 1 and H 2. A positive correlation was previously found between site-wise ω and ΔSSLS for the prestin gene19, that is, sites under stronger positive selection also showed larger ΔSSLS for the alternative (convergent) topology. To test whether site-wise support was driven by selection under each of the given convergent hypotheses, we fitted the following generalized linear mixed model, treating locus as a random effect:

where |ΔSSLS| is the absolute value of the site-specific log-likelihood support; α the line constant; β the regression coefficient; and ω 2 is the clade-specific estimated site-wise ω (dN/dS rate ratio) for the divergent site class (site category 2) in the hypothetical clade of echolocating taxa (that is, the foreground clade of Clade Model C). For details, see ‘Clade-specific ω estimation and derived site-wise ω’, above. Owing to computational constraints arising from the size of the data set, we fitted the model above to replicate data sets directly subsampled without replacement from the original site-wise data set. For this, we jack-knifed 1,000 replicates, each containing 4,000 sites. Mean jack-knife estimates of the model parameters are given in Supplementary Table 5.

Modelling convergence

Several loci displayed significant regressions with a negative correlation between ω and ΔSSLS, and also showed support for convergence based on a negative mean ΔSSLS, but were not categorized as convergent in Fig. 2 because their ΔSSLS did not exceed the threshold we established above; that is, the value of the empirical cumulative density function (eCDF) of mean locus-wise ΔSSLS at the 5% significance level (thresholds were −0.01035 and −0.205 for H 1 and H 2, respectively). However, given their regression suggested a convergent evolutionary trajectory, we predicted the impact of continued selection on these loci by modelling the convergent signal following continued diversifying selection by estimating the ΔSSLS value at ω = 2 using the fitted linear relationship in each locus. We predicted two trajectories for each locus: an upper estimate of ΔSSLS incorporating the uncertainty in the regression parameter estimates (95% confidence intervals of the slope and the intercept); and a central estimate predicted from the mean slope and intercept. Loci were modelled as ‘convergent’ if the upper ΔSSLS estimate passed the ΔSSLS threshold and ‘possibly convergent’ if the upper estimate failed the ΔSSLS threshold, but the central estimate passed it.

‘Convergent’ (Supplementary Tables 6 and 7 for H 1 and H 2 , respectively) or ‘possibly convergent’ predictions (top 50 loci are shown in Tables 8 and 9 for H 1 and H 2 , respectively) were more common in the data sets of CDSs with a priori indicators for convergence (hearing, vision and published loci) than in randomly chosen loci. Predicted convergence signals were significantly stronger for vision loci compared to the background set in H 1 (one-tailed T = −2.11; P ≤ 0.019 with ∼159 d.f.) and significantly stronger for hearing loci compared to the background set in H 2 (one-tailed T = −2.23; P ≤ 0.016 with ∼41.7 d.f.).

Randomization analysis of convergence signal in hearing and vision genes

We analysed the strength of the signal for convergence, measured by mean ΔSSLS and by the number of sites per locus with significant site-wise ΔSSLS, under the H 1 (bat–bat) and H 2 (bat–dolphin) hypotheses among our manually curated subsets (see above) of loci that were associated with hearing (n = 23) or vision (n = 75), or previously reported (published) as convergent in the literature (n = 7). We measured the mean ΔSSLS and number of significantly convergent sites in each of these three subsets. To assess the significance of each result, we performed a randomization separately for each subset of loci as follows: 1,000 replicate data sets were simulated by reshuffling the observed values without replacement, and the expected mean ΔSSLS or number of significantly convergent sites for the subset of loci recomputed. The set of 10,000 expected mean values were taken together to form a null distribution; the position of the observed value in the empirical cumulative density function (eCDF) of the expected values is given as the significance, P. See Supplementary Tables 10 (H 1 ) and 11 (H 2 ).

Functional enrichment analysis

Having screened these genomes for convergence, we compiled lists of the most convergent (strongest ΔSSLS plus credible U c , and alignment heterogeneity/signal: noise scores) loci in our data set. These genes represent prime candidates for molecular convergence, but direct experimental validation of their function has only been conducted in a handful of cases (for example, prestin, Tmc-1, Pcdh15). Because generating tissue-specific expression data for the relevant organs is fraught with difficulties (tropical bat cochleae are tiny, highly mineralized, and their storage in the field presents significant logistical challenges) we elected to analyse these target lists for functional enrichment in silico as an initial step towards inferring their function.

Functional enrichment analysis for all gene lists under investigation (top 5% of loci by signals for convergence in H 1 and H 2 ) was carried out using Fisher’s exact test and Benjamini–Hochberg50 correction. All enrichments were computed with respect to the background defined by the 2,204 loci under study, so as to avoid biases due to possible functional enrichments of the latter gene list. We thus investigated enrichment of our lists for Gene Ontology annotation terms55, KEGG pathways56, common interactors according to the HPRD database57, microRNA targets according to Targetscan58 and genes associated to human diseases according to the Genetic Association Database59.

At a Benjamini–Hochberg50 FDR of 30%, and retaining only functional categories represented by at least five genes in our H1 and H2 lists, only Gene Ontology annotations and microRNA targets showed significant enrichment. However, when a control gene list generated from ranked convergence signals of random phylogenies was used, approximately equivalent levels of enrichment were seen. The functional enrichment we found and present here should therefore be considered putative, and further work carried out to validate the functional roles of the loci with strongest evidence for convergence.

Overall, one significant enrichment was found for the top 5% most convergent H 1 genes, pertaining to alcohol metabolic processes. For H 2 lists, on the other hand, significant enrichments were found for several Gene Ontology categories and microRNA target sets. The comprehensive results obtained for both the functional enrichment analyses are collated in Supplementary Table 12 (117 loci representing the top 5% of loci by signal for H 1 and H 2 ).

Several loci with the most strongest signals for H 2 convergence were found to be associated to the GO annotation ‘sensitivity to light stimuli’ (GO:0009416). Other categories enriched in the most convergent genes are related to vesicle-mediated transport, regulation of cell cycle and cell death.

We also compared our lists to sets of genes highly expressed (defined as reads per kilobase of transcript per million mapped reads (RPKM) >10) in human tissues according to the RNA-seq Atlas, and found the genes from the most convergent H 2 list were enriched in genes highly expressed in the hypothalamus (P = 0.029, Fisher’s exact test; Supplementary Table 13).