Sequencing and annotation

The sequenced cavefish individual was the first-generation offspring of two wild-caught parents, which originated from Pachón cave, Tamaulipas, Mexico. While there are at least 29 caves that contain Astyanax cavefish, the Pachón cavefish are the most studied and exhibit the most extreme troglomorphic phenotypes relative to the other caves2. This genome draft was assembled to a size of 964 Mb, which is similar in size to a congeneric in Brazil17. The draft genome contains 10,735 scaffolds (N50 contig=14.7 kb; N50 scaffold=1.775 Mb), and the longest scaffold size was 9.823 Mb (Supplementary Data 1). Using the Ensembl annotation pipeline18 and RNAseq transcript evidence (eight unique tissues; Supplementary Data 2), we predicted a total of 23,042 protein-coding genes, similar to other sequenced teleost fishes. Zebrafish is the closest sequenced relative to cavefish (diverged approximately 250 Myr)19, and we annotated 16,480 one-to-one orthologs with zebrafish.

To estimate gene representation in the draft genome, we used assembled cavefish transcripts and evolutionarily conserved gene models. Alignment of the Astyanax best open reading frames (Supplementary Data 2) to the genome scaffolds found that across tissue-specific transcriptomes, a median of 81% of transcripts align over at least 75% of their length with at least 90% identity. Further, CEGMA analyses20 indicated that 95% of the 248 ultra-conserved core eukaryotic genes are present in the genome assembly, and 69% of the 248 ultra-conserved core eukaryotic genes were considered complete genes. Collectively, this suggests that the assembly has captured much of the protein-coding sequence in the cavefish genome.

Transposable element annotation

One-third of the cavefish genome is composed of transposable elements (TEs) (Supplementary Data 3 and 4). This repetitive content is comparable to most published fish genomes (Supplementary Data 3), with the exceptions of zebrafish (52.2% TEs)21 and Fugu (2.8%) (ref. 22). In the cavefish, DNA transposons are more abundant and diversified than retrotransposons, as there are at least 12 different superfamilies of DNA transposons representing 12.7% of the genome. In contrast, retrotransposons comprise only 2.3% of the genome (Supplementary Data 4).

It appears that a recent wave of transposition occurred in the cavefish genome (Fig. 1) and was composed mostly of Tc-Mariner and hAT superfamilies, which currently comprise approximately 9.5% of the cavefish genome. Similarly, zebrafish experienced a recent large expansion of repeat families, including Tc-Mariner and hAT superfamilies, whereas another common model, stickleback, has not (RepeatMasker Genomic Datasets). We estimated the potential age of the different of copies for each TE-related superfamilies by calculating Kimura distances assuming that most of the mutations in TE copies are neutral. The rates of transversions (q) and transitions (p) were transformed in Kimura distances using [K=−½ ln(1−2p−q)−¼ ln(1−2q)]. The cavefish genome differs in comparison with zebrafish in that it appears to lack very young elements (as indicated by the Kimura distance from the consensus, Fig. 1, RepeatMasker Genomic Datasets). Given the caveats of possible assembly artefacts, lack of very young elements indicates that it is unlikely that many copies of Tc-Mariner and hAT superfamilies are still active in the cavefish genome (confirmed by analyses of transcriptomes, Supplementary Data 5–9).

Figure 1: TE superfamilies’ history in the cavefish genome. Only superfamilies that show content higher than 0.1% in Supplementary Data 4 were used. Kimura distances are ranged from value 0 representing recent TE copies to 50 for the old TE insertions. Full size image

Identification of candidate genes under QTL

Perhaps the most distinct trait in cavefish is the reduced, nearly absent eye (Fig. 2), which is independently derived in multiple, independent cave invasions2,3,23. In cavefish, early eye development is largely similar to the eye development in surface fish in that lens vesicles and optic cups form, albeit, they are smaller in cavefish even at very early stages (14 h.p.f., hours post fertilization24). The lens apoptosis begins after 25 h.p.f. (refs 24, 25), and the retina undergoes significant apoptosis at about 35 h.p.f. (ref. 24). This apoptosis continues for days to weeks, and leads to an arrest of eye development25,26. We examine the genome for genes under QTL for eye size from Pachón cavefish × surface fish crosses from various studies4,6,7,8,15,16.

Figure 2: Photographs of surface and cavefish. (a,b) Surface fish (line 152) (c,d) Pachón cavefish (line 45). Scale bar for a,c is 1 cm. Scale bar for b,d is 0.25 cm. Photos by B.A.S. Full size image

Across studies, we count a total of 15 non-overlapping QTL for eye-related phenotypes discovered in the Pachón population4,7,8,9,10,15,16 (Supplementary Fig. 1). Scaffolds often did not span the entire critical region comprising a QTL; thus, each QTL critical region may be distributed across several scaffolds. All genes on a scaffold containing a marker linked to the QTL were included. In total, 2,408 genes out of the 23,042 genes annotated in this draft of the genome were associated with these genomic regions. It is likely that a significant portion of these genes that are physically linked to the causal variant are not responsible for the phenotype.

To narrow the list of candidate genes, we examined the gene expression in surface and cave populations with a developmental time course taken at 10 h.p.f., 24 h.p.f., 1.5 days post fertilization (d.p.f.), and 3 d.p.f. RNA from each time period was extracted from 50 whole, pooled individuals and Illumina reads were generated for cavefish and surface fish pools separately. An important caveat to interpreting the gene expression data is that even early in development, cavefish eyes are smaller than surface fish eyes, and lower numbers of transcripts may reflect smaller eyes and not necessarily downregulation. The transcript sequences were also used for obtaining coding variant differences between surface fish and cavefish. Due to the enormity of defining gene to QTL associations for many troglomorphic traits, we primarily focused on the eye phenotype.

Here we used expression data and integrated pathway analysis27 to predict likely phenotypes and the genes potentially underlying those phenotypes. Utilizing prior knowledge of predicted outcome between transcriptional regulators and their target genes27, we implicate 30 genes under the QTL to result in congenital eye anomalies. The direction of gene expression change between surface fish and cavefish supports an increased likelihood of eye anomalies in cavefish relative to surface fish at 10 h.p.f., 24 h.p.f., and 1.5 d.p.f. (for example, 12/27, 12/19, 11/30 genes have expression direction consistent with increased congenital anomaly of the eye, respectively; biased-corrected z score ≥2.266, P<0.0001 in all cases, see ref. 27 for details of z score calculation). At the last sampled time point (3 d.p.f.), the expression data are consistent with increased cavefish eye anomalies, but interestingly, the z scores become smaller with the progression of development (Supplementary Data 10–16) and are not significant at 3 d.p.f.

We performed an enrichment test with data combined across time points and found that the QTL were enriched for genes involved in congenital anomaly of the eye, (30/1,560 relative to 159/12,040 in the total expression data set; χ2-test with Yates correction P value <0.034, χ2=4.48, odds ratio=1.57, 95% confidence interval of odds ratio=1.05–2.35). Additional genes involved in eye development, function and disease were enriched in the QTL set, though not significantly so (129/1,560 relative to 921/12,040 in total data set; χ2-test with Yates correction P value=0.35, χ2=0.88, odds ratio=1.10, 95% confidence interval of odds ratio=0.91–1.34). Therefore, we contend that the eye-related QTL are qualitatively enriched for eye-related genes relative to the rest of the genome, but the eye-related QTL are quantitatively more likely to contain genes associated with congenital eye defects.

Specific candidate genes under eye-related QTL

Several genes found under the QTL are classic candidates for eye development, and we highlight several, which may be particularly promising. We narrowed down the list of candidate genes under the QTL by focusing on those with expression differences between cavefish and surface fish. Statistical comparisons of gene expression levels were performed using the measure of log fold change performed in Cuffdiff 2.1.0 (ref. 28) (see Cuffdiff 2.1.0 documentation for additional details of test). Unless otherwise noted, all P values given below for differential expression between cave and surface fish were generated by this test. Linkage group (LG) names are inconsistent across studies; thus, the LGs given below correspond to the naming scheme in the original study in which the QTL was found and those studies are cited after the LG name.

One of these candidate genes identified by this method is cryaa, an antiapoptotic chaperone protein whose absence of gene expression was hypothesized to play a role in cavefish eye degeneration26. Cryaa falls under a QTL for eye size on LG 27 (scaffold containing marker Am229b) from Protas et al.4 Next, pitx3 is essential for lens development in zebrafish29,30 and knockdown experiments result in zebrafish with degenerate lens and retinas and misshapen lower jaws29. Cavefish exhibit significantly lower expression of pitx3 at 24 h.p.f. and 3 d.p.f. (P<0.002 at both time points, qualitatively lower at all times), but there are only two synonymous differences between surface and cavefish pitx3. Pitx3 is located under the QTL for lens length on LG14 (ref. 4) and for eye size on LG4 (ref. 7). Similarly, rx3 is located under a QTL for eye size on LG4 (refs 4, 8) and underlies a loss of eyes in zebrafish (chokh) and medaka (eyeless) mutants31,32. Rx3 exhibits significantly less expression in cavefish than in surface fish at 10 h.p.f. and 3 d.p.f. (P<0.0003 at both time points, qualitatively lower at all times) and no coding variants. Likewise, under the QTL for eye size on LG4 (refs 4, 8) are the genes olfm2a and olfml2a. Zebrafish knockdowns of olfm2 result in abnormalities in the olfactory pits, eyes and optic tectum as well as reduced and less-defined Pax6 expression in the eye33, and olfm2a exhibited significantly lower expression in cavefish at 3 d.p.f. (P<0.001). We did not detect coding differences in olfm2a, and data were unavailable for olfml2a. Lastly, BCoR is found on LG19 (refs 4, 8). BCoR is linked with ocular colobomas in human and zebrafish34 and its binding partner, BCL6, has been shown to control optic cup morphogenesis through regulation of p53 in zebrafish35. Cavefish exhibit significantly lower expression of BCoR at 10 h.p.f. (P<0.013), and four nonsynonymous coding differences exist between surface and cavefish, though all appear to be in evolutionary labile sites. Importantly, these genes represent only a subset of the interesting candidates under QTL.

Candidate genes in QTL with potentially pleiotropic effects

For several QTL, multiple troglomorphic phenotypes co-localize with eye size, and this co-localization has been suggested as an evidence that selection for some cave-adapted traits resulted in pleiotropic degeneration of eyes7. One of these QTL is involved in vibration attraction behaviour, eye size and superficial neuromast number at the orbit on LG2 (ref. 7). This same QTL for eye size has been identified in multiple studies (LG7 (ref. 4), LG8 (refs 4, 8) and LG1 (ref. 16)) and a QTL for the thickness of the inner nuclear layer of the retina on LG2 (ref. 15). These LGs from various studies all correspond to the same genomic region, and here we count this region as a single QTL. We mainly concentrate on genes that are expressed in both cave and surface fish and appear to have not been pseudogenized in cavefish, as these are genes most likely to have pleiotropic effects and to be good candidates for driving multiple phenotypes that co-localize to the same QTL.

One of the more interesting candidate genes in this region is shisa2, which inhibits Wnt and fibroblast growth factor signalling by retaining their respective receptors, Frizzled and fibroblast growth factor receptor, in the endoplasmic reticulum36. Cavefish expression of shisa2 is qualitatively higher than surface fish at all time points (significantly so at 10 h.p.f., 24 h.p.f. and 1.5 d.p.f., P<0.005), but shisa2 contains only a single synonymous change between cave and surface fish. A duplicate copy of shisa2 is also under an eye QTL found on LG6 (refs 4, 8), and this paralog exhibits no coding differences and elevated, but mostly non-significant, expression in cavefish (24 h.p.f., P<0.0003, not significant at 10 h.p.f. and 1.5 d.p.f.) and lower expression in cavefish at 3 d.p.f. (P<0.0002).

Because shisa2 interacts with major drivers of development, we further assessed quantitative and spatial differences of expression for the two shisa2 genes (LG2 and LG6) by quantitative PCR (qPCR) and in situ hybridization on Astyanax embryos. For both genes, qPCR experiments did not detect significant differences at 36 h.p.f. between the two morphs, suggesting an expression difference of less than twofold (Fig. 3a; Mann–Whitney U-test, P>0.05). At this stage, shisa2-LG2 was expressed throughout the epidermis as well as in the olfactory epithelium and the lens in surface fish, but lens expression was notably missing in cavefish (Fig. 3b). Shisa2-LG6 had a more complex expression pattern, reminiscent of what was described in Xenopus37 and zebrafish38, and included expression in the branchial arches, cranial ganglia, epidermis, olfactory epithelium (like shisa2-LG2), retina and lens (Fig. 3c). No obvious difference was observed between surface fish and cavefish embryos concerning shisa2-LG6 expression pattern. In sum, anatomical analysis detected a lack of shisa2 (LG2) expression in the cavefish lens, which suggests changes in the regulatory region of this gene may contribute to the loss of eyes in cavefish.

Figure 3: Expression patterns of shisa2. (a) Quantitative PCR for shisa2 genes on 36 h.p.f. whole larvae of surface fish (blue) and cavefish (red). No significant difference was found between cavefish and surface fish expression (Mann–Whitney U-test, P>0.05). The error bar is the s.e. of the mean, and the sample size is three in each case (each triplicate is a pool of 10–15 36 h.p.f. larvae). Photographs of in situ hybridization for the indicated shisa2 mRNA at 36 h.p.f. on surface fish (b–d,h–j) and cavefish (e–g,k–m) embryos, focusing on head and eye expression. The bottom pictures (c,d,f,g,i,j,l,m) are centered on the eye region, with focus either on the lens/retina or on the overlying skin. In all panels, anterior is left and dorsal is up. In b,e,h,k, the photographs were taken from lightly labelled embryos (the epidermis is barely labelled) and in c,d,f,g,i,j,l,m, the photographs were taken from more strongly labelled embryos (epidermis expression is visible). Epid, epidermis; olf epit, olfactory epithelium (nose). The scale bars are 25 μm for panels b,e,h,k and 10 μm for the other panels. Full size image

In Xenopus embryos, shisa2 morpholino knockdown or mRNA injection elicit the expression changes for otx2, a key homeobox gene for head and eye development36. We, therefore, also compared otx2 expression patterns and levels in Astyanax cavefish and surface fish embryos, though we cannot localize this gene onto a specific LG. While otx2 pattern is similar in the two morphs during head and brain development (Fig. 4b), lens expression is much weaker in cavefish at 48 h.p.f. (Fig. 4c), as well as when assessed by whole-organism semi-quantitative reverse transcriptase-PCR (Fig. 4a, Supplementary Fig. 2). We have, therefore, identified a potential developmental regulatory cascade that may lead to the cavefish eye loss and that would involve shisa2 and otx2 in the developing lens.

Figure 4: Expression patterns of otx2. (a) Semi-quantitative reverse transcriptase-PCR for the oxt2 genes on 40 h.p.f. whole embryos. Cavefish (CF) otx2 transcripts are slightly less abundant than those of surface fish (SF) compared with an 18S rRNA standard. (b) Photographs of in situ hybridizations for otx2 mRNA at 10, 12.5, 40 and 48 h.p.f. on surface fish (SF) and CF embryos, focusing on head and eye expression. In all panels, anterior is on the left. In lower panels, dorsal is up. Scale bars are 100 μm for panels labelled 40 and 48 h.p.f. in (b). Scale bars are 250 μm for panels labelled 10 and 12.5 h.p.f. in b. (c) Sections of in situ-hybridized SF and CF larvae at 48 h.p.f. show strong otx2 downregulation in the cavefish lens. Scale bars are 100 μm for panels in c. Full size image

In addition to shisa2, we identified candidate genes under this potentially pleiotropic QTL. Several genes meeting our criteria under this particular QTL include prox1 and AIFM1. Two additional genes found in the QTL analysis of O’Quin et al.15, crxa and Tbx2a, are also present under this QTL in our analysis. Prox1 regulates many processes in development including lens fibre elongation and differentiation and the exit of retinal progenitor cells from the cell cycle reviewed in ref. 39. The knockdown of prox1 results in the disruption of the lens-specific γ-crystallin expression and subsequent lens apoptosis40. Cavefish expression of prox1 exhibits a similar spatial pattern to surface fish in the developing lens, and for this reason prox1 was previously considered unlikely to play a role in the cave-specific eye degeneration41. Prox1 is expressed in sensory hair cells of the neuromast and taste receptor cells of taste buds, both of which are more numerous in cavefish relative to surface, but prox1 expression in these structures does not occur until 96 h.p.f. (ref. 41). We detected no sequence differences between cave and surface fish for prox1. However, in our whole-organism RNAseq data, significantly lower expression in cavefish was observed at 24 h.p.f., 1.5 d.p.f. and 3 d.p.f. (P<0.022 in all cases), while marginally non-significant higher expression in cavefish was observed at the earliest sampled time point (10 h.p.f., P=0.083). Significantly lower expression of prox1 during these developmental time points is consistent with increased lens apoptosis in cavefish. Therefore, a re-examination of the contribution of prox1, in light of its location under this QTL for suborbital neuromast cell number, VAB and eye size7 and its quantitative expression differences, may be warranted.

AIFM1 is implicated in significant and progressive optic atrophy in mutant Harlequin mice, and this mutant phenotype can be rescued by injection of an expression vector containing AIFM1 (ref. 42). Cavefish exhibit significantly lower expression of AIFM1 at 24 h.p.f. (P<0.003) than surface fish, and this gene exhibits an intronic splice region variant and five nonsynonymous variants, two of which appear derived in cavefish. These variants were all predicted to be tolerated by a computational method that attempts to determine if an amino acid substitution is detrimental to protein function (SIFT43). Interestingly, the paralog of this gene, AIFM2, is also located under the QTL for eye size found on LG14 (ref. 4) and LG4 (ref. 7). AIFM2 has significantly reduced expression in cavefish relative to surface fish at most time points (10 h.p.f., 24 h.p.f., 3 d.p.f.; P<0.022 in all cases) with qualitatively lower expression at 1.5 d.p.f. (P=0.095). Further, the two splice region variants are fixed between the surface and cavefish, one of which also results in a nonsynonymous change that is putatively derived in cavefish, though this change is predicted by SIFT to be tolerated.

Crxa induces retinal stem cells to differentiate into functional photoreceptors44. When knocked down in zebrafish, crxa prompts the downregulation of genes in the phototransduction cascade45, and is implicated in eye reduction experienced by another troglomorphic fish, Sinocyclocheilus anophtalmus46. This gene exhibits significantly reduced expression in cavefish at 1.5 and 3 d.p.f. (P<0.001; at the other time points, expression could not be tested). Crxa contained no sequence differences between cave and surface fish.

Tbx2a exhibits localized expression in zebrafish mainly in the otic placode, optic vesicle, otic vesicle and retina (also in ventral mesoderm and pectoral fin bud)38. Tbx2 results in smaller optic cups when mutated in mice47, and two copies exist in zebrafish. Tbx2a is involved in craniofacial and pharyngeal arch development48, and its paralog Tbx2b is required for proper retinal neuronal formation in zebrafish49. Tbx2a exhibits lower expression in cavefish at all time points (P<0.07 at 1.5 d.p.f., P<0.001 at all other time points) and three nonsynonymous differences between cave and surface. Only one nonsynonymous difference is putatively derived in cavefish (D401E), and such an amino acid replacement is predicted by SIFT to be tolerated.

Under the second co-localizing QTL for the traits’ vibration attraction behaviour, superficial neuromast number at orbit and eye size located on LG17 in Yoshizawa et al.7, we lacked scaffold coverage for several markers in the center of the QTL (208e, 205d and 221a; Supplementary Data 17). There are several interesting genes in this region (Supplementary Data 11), but few are as compelling as genes found on the co-localizing QTL on LG2 of Yoshizawa et al.7 We expect future drafts of the genome to uncover additional candidate genes in this region.

Candidate genes for additional cave phenotypes

Lastly, we sought to briefly investigate other distinctive traits for cavefish, including reduced pigmentation5,6. First, we found that one of the most famous pigmentation genes, mc1r, known to be mutated in Pachón cavefish5 (the population from which the QTL were mapped), is located under the critical region of the QTL for number of melanocytes in four regions of the body (LG9 (refs 4, 8)). Second, cavefish have an increased number of taste buds and increased number of maxillary teeth4,8. A QTL for number of taste buds contains the serotonin receptor htr2a (LG5 (refs 4, 8)), and taste cell development and signal transduction involves serotonin signalling50,51. Third, a QTL for the number of maxillary teeth in cavefish (LG13 (refs 4, 8)) contains dact2, which significantly inhibits dlx2 during tooth formation in mouse52. When knocked out in mice, dlx2 produces a decrease in the number of molars53, supporting the notion that this gene may have a conserved role in the regulation of tooth formation.

Analyses of putative gene losses

We investigated genes that were putatively lost in the cavefish lineage since the divergence of cavefish and zebrafish, by examining genes that were present in zebrafish and eight additional actinopterygian teleosts available in Ensembl (Supplementary Data 18). These genes were not enriched for 305 gene ontology accessions related to eye development or function, and similar results were obtained for ZFIN anatomical expression data and ZFIN-predicted phenotype.

Transcriptome data from the eight tissues used for gene annotation and the developmental surface fish and cavefish time series were assembled using Trinity54 for a total of 10 separate transcriptomes. Open reading frames were predicted from these assemblies using Transdecoder in the Trinity package. We constructed a BLAST database from the coding regions of zebrafish from Ensembl Genes 74 and queried this database using each of the transcripts in the longest_orfs.cds files with BLASTn. We used a strong e-value cutoff (cutoff<1E-100), and results were robust for all values we examined from 1E-20 to 1E-100. In this way, we identified whether the putatively missing gene in the cavefish genome (but present in the zebrafish genome) was potentially present in the surface or cave-derived RNAseq data.

For several genes that were potential candidates for loss, we could not find a representative transcript for cavefish but could find a transcript copy among the surface fish transcriptome data (Supplementary Data 19). We attempted to confirm the lack of a transcript in cavefish using reverse transcription. However, for all cases that we tried to confirm a putatively missing cavefish transcript, a cavefish transcript was detected.

Although not adding evidence for cavefish-specific loss, for several large gene families, one or several members were not annotated in the genome sequence and were not detected in surface or cavefish transcriptome data. While these results are very preliminary, potential candidates for gene loss include members of gene families involved in vision such as retinol dehydrogenases, crystallins, sine oculis homeoboxes, opsins/rhodopsins (including melanopsin whose truncating mutation is implicated in the loss of a light-entrainable clock in Somalian cavefish55), development, regulation of sleep and circadian clocks (including fibroblast growth factors, gamma-aminobutyric acid A receptors, and dopamine receptors). Likewise, cavefish exhibit excessive locomotor activity compared with surface fish56, and several genes that induce hyperlocomotion when knocked out or blocked in mice or zebrafish do not appear to be present in the current cavefish genome annotation or transcriptome data (Supplementary Data 19). Interestingly, the naked mole rat, a species that also lives in darkness and has reduced eyes, has also experienced losses in similar gene families57. Assembly and annotation errors of large gene families are common in draft genomes; thus, a more extensive and definitive exploration of these complex gene families awaits future studies. We provide a list from the initial, preliminary analysis (Supplementary Data 19) to facilitate future studies.