Sequencing the large genomes of sharks

We focused on the brownbanded bamboo shark Chiloscyllium punctatum, for which we recently tabled embryonic stages8, and the cloudy catshark Scyliorhinus torazame. Their whole genomes, measured to be approximately 4.7 and 6.7 Gbp, respectively, were sequenced de novo to obtain assemblies including megabase-long scaffolds (Supplementary Note 1.1). We also assembled the genome of the whale shark Rhincodon typus using short sequence reads previously generated3 (Supplementary Note 1.2). Using these genome assemblies, we performed genome-wide gene prediction, assisted by transcript evidence and protein-level homology to other vertebrates. The obtained genome assemblies and gene models exhibit high coverage (Supplementary Fig. 1), and of these, the bamboo shark genome assembly achieved the highest continuity (N50 scaffold length, 1.9 Mbp) and completeness (97% of reference orthologues identified at least partially). Using the novel gene models, we constructed orthologue groups encompassing a diverse array of vertebrate species (see below). Our products outperform existing resources for elasmobranchs and provide the tools for genome-wide characterization of molecular evolution at the origin of jawed vertebrates and later in the chondrichthyan lineages.

Genome-wide trends in molecular evolution

We first examined genome-wide trends of molecular evolution, utilizing one-to-one orthologues in the constructed orthologue groups (Supplementary Note 7). Our comparisons of coding sequences detected a higher similarity in nucleotide and amino acid compositions of sharks to tetrapods and coelacanth than to actinopterygian fishes (Supplementary Fig. 2a,c). We performed a phylogenomic analysis using conserved protein-coding genes, which confirmed the phylogenetic positions of elasmobranchs and the reduced rate of molecular evolution in the entire chondrichthyan lineage (Fig. 1d). The reduced evolutionary rate was further scrutinized by comparing the numbers of synonymous substitutions per site (K S ) between chondrichthyan and osteichthyan lineages (Fig. 1e). The result revealed that synonymous substitution rates for the chondrichthyan lineages were significantly smaller than those for almost all the osteichthyan lineages analysed (Supplementary Note 12), suggesting a reduced intrinsic mutation rate in the chondrichthyan lineages. Our cross-species comparison revealed a remarkable increase in the intron lengths of shark genomes and its correlation with genome size (Fig. 1f and Supplementary Note 10). Our analysis on the composition of orthologue groups did not detect massive gene duplications in the chondrichthyan lineage (Fig. 1g), which was supported by the inference of age distribution of paralogues (Supplementary Note 11). Thus, the increase of genome size in sharks is not attributable to additional whole-genome duplication.

Characterizing noncoding landscape

To characterize noncoding regions, we first scanned elasmobranch genomes for repetitive sequences including those unique to the species analysed (Supplementary Note 4). This identified long interspersed nuclear elements as the most abundant class of repetitive elements, exceeding the proportions of long terminal repeats and those unclassified into any existing repetitive element class, both of which were particularly expanded in the catshark (Fig. 1h). Overall, the genomic regions identified as repetitive elements, including simple repeats, amounted to half of the individual elasmobranch genome assemblies, and their abundance contributed to the observed variation in genome size (Fig. 1h).

Next, we surveyed the elasmobranch genomes for homologues of human conserved noncoding elements (CNEs), which yielded a much larger number of matches than in teleost genomes (Fig. 2a and Supplementary Note 13). Our analysis revealed some CNEs retained by elasmobranchs but missing in teleost fish and C. milii, which included a CNE in an intron of the Tbx4 gene (Fig. 2b) previously reported as the core lung mesenchyme-specific enhancer9. Its presence in a cartilaginous fish that lacks a lung homologue prompts a reexamination of its evolutionary significance. This finding also highlights the problem of using only a single holocephalan species as a representative of chondrichthyans, whether the CNEs missing in this species are lost during evolution or masked in gaps in the genome assembly.

Fig. 2: Noncoding landscape of elasmobranch genomes. a, Number of the retained CNEs identified by the human–chicken genome comparison. b, Elasmobranch homologue of the lung mesenchyme-specific enhancer (LME) in an intron of the mammalian Tbx4 gene. c, Expression profiles of the putative bamboo shark homologues of human lncRNAs expressed in at least one analysed tissue. See Supplementary Notes 13 and 14 for details. Full size image

We also searched for elasmobranch homologues of human long noncoding RNAs (lncRNAs), which again revealed more candidate homologues than in teleost fishes (Supplementary Note 14). These were screened for transcript evidence in bamboo shark RNA-seq data and absence of homology to coding sequences. This screening resulted in the identification of 38 transcript contigs with variable degrees of spatial expression biases (Fig. 2c). These putative lncRNAs included a possible homologue of the Malat1 gene10[,11, whose presence in chondrichthyans was recently suggested only by a sequence similarity to a C. milii genomic region12. The inclusion of the putative bamboo shark Malat1 homologue in our result validates our screening procedure and more importantly, ascertains its noncoding transcription in a chondrichthyan species.

Overall, these findings indicate that despite the variable genome sizes and repetitive element compositions, elasmobranch genomes have undergone less modification in noncoding regions involved in gene regulation since the jawed vertebrate ancestor than is inferred by their evolutionary distance.

Evolution of Hox genes and clusters

Hox genes play crucial roles in embryogenesis and are organized into four clusters (Hox A–D) in osteichthyans (bony vertebrates) except for teleost fishes13. In the shark genomes, we found well-conserved Hox A, B and D clusters, which have identical gene repertories to their C. milii counterparts (Fig. 3a and Supplementary Fig. 5a) expressed in a temporally collinear manner (Fig. 3d). For a comparison of conformational regulation of Hox gene expression by CCCTC-binding factor (CTCF)14, we analysed the distribution of its binding sites with ChIP-seq (Methods and Supplementary Note 15). This comparison on the Hox A, B and D clusters revealed a high similarity between elasmobranchs and amniotes (Fig. 3b), which makes the whole gnathostomes including elasmobranchs distinct from the lamprey that has more CTCF binding sites within Hox clusters15 (Supplementary Fig. 4e). It is thus suggested that the jawed vertebrate ancestor already possessed the mechanism of CTCF-dependent conformational regulation of Hox genes documented for mammals16,17. We identified antisense transcripts in the genomic region of elasmobranchs containing Hoxa11 and -a13 as putative homologues of lncRNAs previously known only in tetrapods, Hoxa11-AS and Hottip (Fig. 3c and Supplementary Note 14). Although the acquisition of Hoxa11-AS is proposed to be linked with the fin-to-limb evolution18, our discovery of the elasmobranch counterparts indicates their early origins in the common ancestor of jawed vertebrates.

Fig. 3: Elasmobranch Hox genes and clusters. a, Structure of the bamboo shark Hox clusters. Coloured boxes denote coding exons of the Hox genes (see Supplementary Fig. 5m for scaffold IDs for Hox C). b, CTCF ChIP-seq peaks in the Hox A cluster for the bamboo shark and the chicken (for details, see Supplementary Note 16.1). c, Putative bamboo shark homologues of mammalian lncRNA, Hottip and HoxA11-AS revealed by RNA-seq read mapping for a stage 28 bamboo shark embryo. d, Temporal expression patterns of Hox A genes in whole catshark embryos. e, Molecular phylogeny of Hox11 inferred using 147 residues. f–h, Embryonic expression of catshark Hox11 genes at stage 27. Scale bars, 500 μm. i, Expression timings of the Hox11 genes in whole catshark embryos. j, Repetitiveness of the whole genomes and the Hox-containing genomic regions (see Methods and Supplementary Note 16.4 for details). The Hoxb10-b13 region was excluded. k, Putative Hox C gene repertories of diverse elasmobranchs. The ocellate spot skate Okamejei kenojei and zebra bullhead shark Heterodontus zebra genes were identified in transcriptome data23,24. Closed pink boxes, presence of intact genes; grey boxes, pseudogenization indicated by a stop codon inside the homeobox (see Supplementary Fig. 5c). The genes connected by a horizontal line form a tandem cluster on a genome scaffold. Full size image

Although the entire Hox C cluster was reportedly missing from elasmobranch genomes19,20, we identified putative Hox C genes in the genome and transcript sequences of the analysed shark species (Fig. 3a,e and Supplementary Fig. 5a–c). While our phylogenetic analyses supported their affiliations to Hox C, those genes showed extremely elevated evolutionary rates. Remarkably, none of the identified, putative shark Hox C genes comprised such a compact cluster as in other jawed vertebrates, spanning within a 100-kbp-long genomic region13—for example, catshark Hoxc11 was flanked by a 50-kbp-long stretch containing no other Hox gene (Supplementary Fig. 5a). In addition, although typical jawed vertebrate Hox clusters are almost free from repetitive elements21 (at most 2.9% in length for elasmobranch Hox A, B and D clusters; Fig. 3a,j), the elasmobranch genome scaffolds containing the putative Hox C genes have accumulated repetitive elements (at least 36.8%; Fig. 3j).

Our analysis on embryonic expression patterns indicated that the identified elasmobranch Hox C genes are still under spatiotemporal transcriptional regulation, which is typically exerting on clustered Hox genes13,22―Hoxc11 (Fig. 3f–i and Supplementary Fig. 5j–l) as well as Hoxc8 (Supplementary Fig. 5g–i) are expressed in the posterior regions concomitantly with their sister paralogues in the Hox A, B and D clusters. We further surveyed the transcriptome data of other elasmobranch species23,24, which uncovered more Hox C genes of the zebra bullhead shark (Fig. 3k). These findings demonstrate that Hox C genes were not lost in a cluster-wide deletion event in the elasmobranch ancestor as proposed previously19, but have eroded intermittently during elasmobranch evolution. Together, while the Hox A, B and D clusters exhibit the canonical, conservative nature even in elongated elasmobranch genomes, their Hox C cluster underwent remarkable lineage-specific, sequence-level modifications.

Encompassing genes secondarily lost in osteichthyan lineages

The constructed orthologue groups contained 304 genes that seemingly existed in the vertebrate ancestor are retained by elasmobranchs, but have disappeared in osteichthyan evolution (Supplementary Table 8). They included a member of the Fox gene family (designated as FoxG3) whose orthologues are retained only by non-tetrapod vertebrates (Fig. 4a). One of its close paralogues, FoxG1, functions as a key regulator of forebrain development in diverse animals25,26. The third paralogue, designated as FoxG2, was identified in only non-tetrapod vertebrates and some reptiles (Fig. 4a). Molecular phylogenetic analysis showed the triplication between FoxG1, -G2 and -G3 early in vertebrate evolution and among-lineage differential gene loss (Fig. 4a and Supplementary Note 17).

Fig. 4: Asymmetric evolution of FoxG genes. a, Molecular phylogeny of the FoxG genes using 192 residues. b, Comparison of characteristics in protein-coding and flanking sequences between the three FoxG genes (see Supplementary Note 17). Two left panels: the numbers of non-synonymous and synonymous substitutions per site (K A and K S , respectively) of bamboo shark and catshark orthologue pairs that were calculated with codeml in PAML v4.9c88. Three right panels: the values of the bamboo shark genes. c, Variable conservation levels of coding (grey) and flanking noncoding (orange) genomic regions of FoxG1, -G2 and -G3 between catshark (used as a reference) and bamboo shark. d–f, Embryonic expression of the three catshark FoxG genes at stage 24 (see Supplementary Fig. 6c–p for expression patterns at stage 27). VII+VIII, acoustico-facial ganglionic complex; X, vagal ganglion; tel, telencephalon; ret, retina; ot, otic vesicle; GC4, GC-content at four-fold degenerate sites. Scale bars, 500 µm. Full size image

While the determinant of loss or retention of gene duplicates is often imputed to their functions27, the effect of intrinsic genomic characteristics, independent of gene functions, has also been proposed as a cause28. As the less-derived shark genomes are expected to better reconstruct the differentiation process of ancient gene duplicates, we performed a multi-faceted comparison focusing on the shark FoxG paralogues. A highly conserved nature of FoxG1, retained in all the species analysed to date, was observed in not only its amino acid sequence but also in synonymous nucleotide and flanking noncoding genome sequences (Fig. 4b,c). This coding/noncoding association was detected for the divergent nature of FoxG2, while the level of sequence conservation of FoxG3 was intermediate (Fig. 4b,c). The among-paralogue variation was also observed in the GC-content of fourfold degenerate sites (GC4; Fig. 4b and Supplementary Table 15). More remarkably, the flanking sequences of the most divergent paralogue FoxG2 contain the most abundant repetitive elements and the highest GC-content (Fig. 4b). These local genomic characteristics may have facilitated the secondary loss of the coding genes embedded in the divergent genomic regions.

Taking advantage of the access to embryonic samples, we analysed the spatial distribution of catshark FoxG gene expression during development (Fig. 4d–f and Supplementary Fig. 6c–p). FoxG2 was expressed in the acoustico-facial ganglionic complex (VII+VIII) and the vagal ganglion (X) (Fig. 4e). FoxG3 was expressed in an anterodorsal part of the retina, in addition to the FoxG2-positive domains (Fig. 4f), while FoxG1 expression was observed in the forebrain in addition to the FoxG3-positive domains (Fig. 4d). Together, the more prone a FoxG paralogue is to secondary loss, the more restricted is its expression domain in shark embryos. This among-paralogue comparison, enabled by the genomic resource of an egg-laying shark, confirms the association of the fates of gene duplicates with the variable natures of genomic regions containing those duplicates.

Early invention of homoeostatic machinery for gut–brain axis

To further characterize phenotypic traits refined in jawed vertebrates, we focused on gene repertories encoding endocrine hormones and their receptors that control growth, reproduction and homoeostasis. Our phylogenetic census in shark genomes and transcriptomes revealed potential orthologues of hormone and receptor genes previously unidentified in this taxon (Fig. 5a). These included prolactin (PRL1), orexin, kisspeptin, spexin, motilin and prolactin receptor implicated in fertility, appetite, digestion and sleep in mammals29,30,31, as well as osmolarity and gastrointestinal control in teleost fishes (Supplementary Note 18). For leptin whose putative orthologue was previously identified in a genomic sequence of C. milii32, we confirmed the orthology of the C. milii and elasmobranch genes to osteichthyan leptin genes, by means of molecular phylogeny and conserved synteny (Supplementary Note 18.10). Of these hormone and receptor genes, all but leptin were suggested to have existed in the vertebrate ancestor by the presence of possible cyclostome orthologues or gene duplication that probably occurred in genome expansion before the divergence of all extant vertebrates33 (Fig. 5a). This inference marks leptin, a key metabolic and neuroendocrine regulator in mammals34, and the signalling cascade through its receptor (LepR) as an invention in the jawed vertebrate lineage (Supplementary Note 18.10). In mammals, leptin is mainly expressed in adipose tissues35, and we could not identify any tissue with intensive expression of its orthologue in sharks that generally lack overt adipose tissues36 (Fig. 5b).

Fig. 5: Elasmobranch orthologues of peptide hormones. a, Cross-species comparison of peptide hormone gene repertories. The numbers of paralogues are indicated in the boxes. Green boxes, the genes whose orthology was inferred; beige boxes, those with ambiguous orthology; red rectangles, the members of the same gene families (see Supplementary Note 18). b, Expression profiles of hormone genes in an embryo and adult tissues of the catshark (see Supplementary Fig. 7a for a full heatmap). PAI (phasitocin) and ASP (aspargtocin) are two forms of oxytocin that are encoded by distinct loci38. PRL, prolactin; GALP, galanin-like peptide. Full size image

Overall, we identified the orthologues of almost all hormones and their receptors involved in the hypothalamo–pituitary and gastrointestinal systems documented mainly in mammals (Fig. 5a and Supplementary Note 18). Among these, the genes encoding oxytocin homologues have undergone a unique gene duplication in the elasmobranch lineage37,38, and our genomic and phylogenetic analysis indicated its intricate evolutionary history through intermittent gene conversions (Supplementary Note 18.3). The similarity in transcript localization of the identified shark hormone and receptor genes to mammalian counterparts, such as PRL1 in the pituitary and motilin in the intestine (Fig. 5b and Supplementary Fig. 7a), suggests the establishment of genetic components of the gut–brain axis39 before the last common ancestor of extant jawed vertebrates.

Sensory and neuronal gene repertories

Visual opsin gene repertories are often altered on adaptation to new habitats with dim light40,41. Previously, two short wavelength-sensitive opsin genes, SWS1 and SWS2, were found to be missing in the C. milii genome42. Our search in the elasmobranch genome assemblies ascertained the absence of not only these two but also the green/blue-sensitive opsin gene Rh2 (Fig. 6 and Supplementary Fig. 8). Moreover, long wavelength-sensitive opsin gene (LWS) is absent from the present cloudy catshark genome assembly, and thus rhodopsin (RHO) is the only visual opsin gene identified in it (Fig. 6 and Supplementary Note 19). Previously, the retention of only RHO was reported in some animals that adapted to fossorial, nocturnal or aquatic life43,44,45. In fact, the cloudy catshark inhabits not only inshore but also the deep sea46 (~300 m) and is a close relative of typical deep-sea dwellers47. Thus, the absence of LWS might be due to an evolutionary gene loss that was permitted in the catshark ancestor by its possible exclusive deep-sea habitat (Fig. 6). Adaptation to the deep sea, into which only blue light penetrates48, was previously corroborated for ray-finned fishes by the blueshifted absorption spectra of RHO pigments49. Our spectroscopic analysis of the RHO pigments revealed blueshifted spectra for not only the cloudy catshark (λ max , 484 nm) but also the whale shark (478 nm) that occasionally migrates down to the bathypelagic zone (~2,000 m) besides daytime surface feeding habits50 (Fig. 6). This study portrays the diversity of visual opsin gene repertories among elasmobranchs and illustrates the potential of in vitro molecular experiments supported by genomic sequence analysis, in understanding underwater ecology of inaccessible species.

Fig. 6: Evolution of visual opsin repertories. Evolutionary history of visual opsin gene loss and duplication, including the duplicated C. milii LWS42, was suggested by phylogenetic analysis (Supplementary Fig. 8a). Dashed blank boxes indicate the absences of orthologues in the currently available genome assemblies. See Supplementary Note 19.1, for details of the habitat depths and the absorption spectra obtained by our spectroscopic analysis. Full size image

Our interest extended to olfactory receptor gene repertories that are often linked to adaptation to new lifestyles51. Although our present study does not include carnivorous epipelagic sharks that might have enhanced olfactory sensing, each of the shark species examined in the present study had only three olfactory receptor family genes (Supplementary Note 21), concordantly with the retention of few olfactory receptor family members by C. milii52. This finding indicates that at least the analysed shark species rely on a distinct molecular mechanism for olfaction from the conventional olfactory receptors.

Neuronal cell identities in mammalian brains are defined by the combinatorial expression of clustered protocadherin (Pcdh) genes53,54. Previously, the C. milii genome was shown to also contain a cluster of Pcdh genes55, but their expression profiles have remained unknown. Our study showed that elasmobranchs contain slightly higher numbers of Pcdh genes in markedly longer clusters than osteichthyans (Supplementary Fig. 9a,b and Supplementary Note 20). The bamboo shark transcriptome data demonstrated that most of the clustered Pcdh genes consist of both variable and constant exons and exhibit a biased expression pattern towards neural tissues, as previously shown for mammalian and teleost counterparts (Supplementary Fig. 9c). These findings, which are expected to be reinforced by single-cell analysis, suggest the early establishment of the mechanism for generating neuronal cell diversity through a Pcdh cluster in the last common ancestor of all extant jawed vertebrates.