Genome assembly and annotation

The genome of a female Antarctic blackfin icefish from the Antarctic Peninsula (Supplementary Figs. 1 and 2) was sequenced by single-molecule real-time technology with a PacBio Sequel instrument, yielding ~90× genome coverage and a 13-kilobase average read length (Supplementary Table 1). The genome size was estimated, by k-mer analysis using Jellyfish software, to be 1.1 gigabase pairs (Supplementary Fig. 3). The FALCON-Unzip assembled genome contained 3,852 contigs totalling 1.06 gigabase pairs with a contig N50 size of 1.5 megabase pairs (Mb) (Table 1). Evaluation of the genome for completeness based on BUSCO21 identified 89.9% complete and 3.6% fragmented genes from the 4,584-gene Actinopterygii dataset (Supplementary Table 2). The icefish genome contains 30,773 inferred protein-coding genes based on combined ab initio gene prediction, homology searching and transcript mapping (Table 1 and Supplementary Tables 3 and 4). Small-RNA transcriptomics from 5 tissues facilitated annotation of microRNAs (miRNAs), identifying 290 miRNA genes that produced 334 unique mature miRNAs (Table 1 and Supplementary Tables 5 and 6). The icefish genome contains 50.4% repetitive sequences, most of which (47.4% of the total genome) are transposable elements (Supplementary Table 7 and Supplementary Note 1). Inferring the history of repeat elements by calculating the relative age of transposable element copies through Kimura distance analyses and comparisons with other teleosts (Supplementary Note 1) revealed a recent burst of DNA transposons and long and short interspersed elements. This result is consistent with the hypothesis22 that exposure to strong environmental changes, such as cooling to sub-zero temperatures and a series of glaciation and deglaciation cycles, led to massive mobilization of transposable elements.
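The contig N50 reported above is the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not values from the icefish assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly, not the icefish contigs.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))  # prints 400
```

Applied to the lengths of all 3,852 contigs, a calculation of this kind would be expected to return a value near the reported 1.5 Mb.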

Table 1 Icefish assembly and annotation statistics

Genetic linkage map and genome assembly integration

To make a chromonome (a chromosome-length genome assembly)23, we constructed a genetic map for blackfin icefish. RAD-tag sequencing24 produced 20 million reads each for male and female parents, and an average of 2.4 million reads for each of 83 individual progeny. Stacks software25 identified 60,038 RAD-tags, of which 56,256 (93.7%) were present in at least 10 progeny. Of 7,215 polymorphic RAD-tags, 4,952 (68.6%) were present in at least 60 of 83 progeny and 4,023 localized to the male map, the female map or both at a minimum logarithm of the odds (LOD) of 12. JoinMap 4.1 assigned markers to 24 linkage groups (Supplementary Fig. 4, accession SRP118539), one for each cytogenetic chromosome26. Because the map showed that each icefish chromosome was an orthologue of a single medaka chromosome, we numbered icefish linkage groups to match their medaka counterparts. C. aceratus linkage group 6 (Cac6) had the most markers (202) and Cac2 had the fewest (131). Cac21 was the longest (65.8 cM) and Cac12 was the shortest (46.7 cM). Chromonomer software (http://catchenlab.life.illinois.edu/chromonomer) aligned contigs to the genetic map. Of 3,852 contigs in the assembly, 1,063 (27%) aligned on the genetic map by at least one marker, for a total length of 820 Mb (77%) of the 1,065 Mb assembly. Only one contig was chimeric (Ice_000013): one end mapped to Cac7 and the other to Cac14 (Supplementary Fig. 4).
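The idea of a chimeric contig can be illustrated with a short sketch: a contig is flagged when its markers fall on more than one linkage group. This is only an illustration of the concept and is not a description of how Chromonomer works internally. The function names and the marker placements are hypothetical, apart from the Ice_000013 pattern (Cac7 and Cac14) taken from the text.

```python
from collections import defaultdict


def linkage_groups_per_contig(marker_hits):
    """Map each contig to the set of linkage groups its markers fall on.

    marker_hits: iterable of (contig_id, linkage_group) pairs, one per marker.
    """
    groups = defaultdict(set)
    for contig_id, linkage_group in marker_hits:
        groups[contig_id].add(linkage_group)
    return dict(groups)


def chimeric_contigs(marker_hits):
    """Return sorted IDs of contigs whose markers span more than one linkage group."""
    groups = linkage_groups_per_contig(marker_hits)
    return sorted(c for c, lgs in groups.items() if len(lgs) > 1)


# Hypothetical marker placements; only the Ice_000013 pattern mirrors the text.
hits = [
    ("Ice_000001", "Cac1"),
    ("Ice_000001", "Cac1"),
    ("Ice_000013", "Cac7"),
    ("Ice_000013", "Cac14"),
    ("Ice_000020", "Cac12"),
]
print(chimeric_contigs(hits))  # prints ['Ice_000013']
```

A real pipeline would also weigh marker order and map position before splitting a contig; this sketch checks linkage-group membership only.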

Synolog27 displayed conserved syntenies, revealing that each icefish chromosome is orthologous to a single chromosome in both medaka (Oryzias latipes, Ola; Fig. 1a) and European sea bass (Dicentrarchus labrax, Dla; Supplementary Fig. 5a). We detected a single small internal translocation (Supplementary Fig. 5d,e), but no reciprocal chromosomal translocations were found in the lineages of icefishes, sea bass and medaka since their lineages diverged ~113 Ma28. Chromosome stability in teleost fish is remarkable compared with mammals, where, for example, different deer species have between 3 and 40 haploid chromosomes and different rodents have between 5 and 51 (ref. 29). Although the blackfin icefish retains the ancestral chromosome number, many Antarctic notothenioids do not; for example, different species in the genus Notothenia have 13, 12 or 11 chromosomes rather than the ancestral 24 due to centromeric fusions of entire ancestral chromosomes30.

Fig. 1: Chromosome stability of blackfin icefish with respect to teleost outgroups. a, Gene content in icefish chromosomes supports a one-to-one correspondence between icefish and medaka chromosomes. Each line represents orthologous genes in icefish and medaka, colour-coded by icefish chromosome. The few lines that cross linkage groups (LGs) probably represent paralogues. b, A comparison of orthologous gene orders in icefish LG12 (Cac12) and medaka LG12 (Ola12) illustrates icefish-specific chromosome inversions and transpositions (see text). Each line represents orthologous genes in the icefish and medaka chromosome, colour-coded by icefish genomic scaffold. Conserved syntenic blocks are labelled 1–8. c, Comparison of orthologous gene order in Cac12 and European sea bass LG19 (Dla19). Conserved syntenic blocks are labelled 1–8. d, Comparison of orthologous gene order between sea bass Dla19 and medaka Ola12 reveals that most chromosome rearrangements occurred after the divergence of the icefish lineage from the sea bass lineage. M, megabase position along the chromosome.

Although orthologous chromosomes in icefish, sea bass and medaka share gene content, gene order was often not well conserved. For example, Cac12 and Ola12 contain multiple conserved syntenic blocks (Fig. 1b, blocks 1–8) rearranged by inversions and transpositions. Most of those blocks have the same order in Ola12 and Dla19, showing that rearrangements occurred in the icefish lineage after it separated from the sea bass lineage. Conserved blocks ‘4’ and ‘5’ appear in the opposite order in sea bass and stickleback chromosomes (Fig. 1c and Supplementary Fig. 5b), and comparisons between sea bass and medaka (Fig. 1d) or stickleback (Supplementary Fig. 5c) suggest an inversion in the medaka lineage. Other lineage-specific rearrangements in Cac12 are evident, and analysis of other chromosomes confirms that, despite the paucity of translocations over more than 100 Myr (Fig. 2a), multiple rearrangements within chromosomes occurred in the icefish lineage after it separated from the sea bass lineage (Supplementary Fig. 6).

Fig. 2: Comparative analysis of the C. aceratus genome assembly. a, Phylogenetic tree and gene family gain-and-loss analysis, including the number of gained gene families (+) and lost gene families (−). Blue numbers specify divergence times between lineages. The red dotted line indicates the appearance of Antarctic ice sheets (35 Ma), which allowed the circum-Antarctic current to form after the opening of the Drake Passage. Subsequent cooling of the Southern Ocean drove local extinction of most fish taxa and adaptive radiation of the Antarctic notothenioid suborder. E, Eocene; M, Miocene; O, Oligocene; P, Palaeocene. b, Inferring icefish population history by PSMC analysis. The left y axis represents the demographic history of C. aceratus (red line). During the Plio-Pleistocene (3–0.9 Ma), which is shaded blue, Antarctic sea-surface temperatures dropped by around 2.5 °C, judged by a proxy for marine palaeo-temperature changes based on oxygen isotope ratios91,92 (right y axis). Concomitant decreases in marine temperatures (black line) probably allowed the cold-adapted C. aceratus populations to increase in size. The green shading represents the mid-Pleistocene transition, during which temperature fluctuations were large. g, generation time; μ, mutation rate.

Phylogenomics and genome expansion

A comparison of genome sequences by OrthoMCL showed that the blackfin icefish has 18,636 of 24,159 orthologous gene clusters identified in 13 teleosts. A genome-wide set of 3,718 one-to-one orthologues provided a phylogenetic tree of 13 teleosts using maximum likelihood (Supplementary Tables 8 and 9). According to the time-calibrated phylogeny, the common ancestor of the three Antarctic fishes with genome sequences (C. aceratus, Parachaenichthys charcoti (Charcot’s dragonfish) and Notothenia coriiceps (bullhead notothen)) diverged from the stickleback lineage ~77 Ma, and icefishes diverged from the dragonfish lineage ~7 Ma (Fig. 2a and Supplementary Fig. 7). Gene family analysis identified a core set of 9,647 gene families that were shared among 6 represented fishes (three Antarctic species, stickleback, medaka and zebrafish) and 445 blackfin icefish-specific gene families (Supplementary Fig. 8).

The icefish has 373 significantly expanded and 346 significantly contracted gene families based on the z score of gene count differences among 13 teleosts (Supplementary Tables 10 and 11). The blackfin icefish lineage experienced the largest gene family turnover among the 13 species after it diverged from the dragonfish (significant gains: 280 genes; significant losses: 6 genes) (Fig. 2a and Supplementary Table 12). Gene families with a significant number of genes gained were enriched for sensory perception (Supplementary Note 2), oxidoreductase activity and ion binding (Supplementary Table 13). Forty genes appeared to be positively selected specifically in the icefish lineage after divergence from the dragonfish lineage (Supplementary Table 14). Positively selected genes were enriched in two functional categories: oxidoreductase activity (presumably related to life without haemoglobin) and lipid binding (presumably related to buoyancy increase connected to changes from a strictly benthic lifestyle) (Supplementary Table 15).
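As a rough illustration of a z score on gene counts, the sketch below standardizes the size of one gene family across species. The text does not specify the exact procedure, so the use of the population standard deviation, the function name and the example counts are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def gene_count_z_scores(counts_by_species):
    """Standardize per-species gene counts for one gene family.

    Assumption: z = (count - mean) / population standard deviation,
    computed across all species supplied.
    """
    values = list(counts_by_species.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Identical counts everywhere: no species stands out.
        return {species: 0.0 for species in counts_by_species}
    return {s: (c - mu) / sigma for s, c in counts_by_species.items()}


# Hypothetical counts for a single gene family in four species.
example = {"icefish": 33, "species_b": 4, "species_c": 5, "species_d": 6}
for species, z in gene_count_z_scores(example).items():
    print(f"{species}: z = {z:.2f}")
```

A family would then be called expanded or contracted in a species when its z score exceeds a chosen cut-off; the cut-off is not given in the text.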

Genomic variation and population history

We identified 9,365,677 heterozygous single nucleotide polymorphisms in the genome of the sequenced icefish female, giving a frequency of heterozygous sites of 8.79 × 10⁻³, which is greater than that of other individual marine fish genomes such as the Atlantic cod (2.09 × 10⁻³)31 and ocean sunfish (0.78 × 10⁻³)32. Analysis using the pairwise sequentially Markovian coalescent (PSMC) model33 suggested two epochs that shaped icefish demographic history. First, icefish populations appeared to reach maximum size ~1 Ma at the end of the Plio-Pleistocene cooling event (3.0–0.9 Ma), after Antarctic ocean surface temperatures had dropped by 2.5 °C34. Adaptations made during the slow cooling of the Plio-Pleistocene may have allowed the icefish lineage time to achieve its maximum effective population size. Second, icefish populations appeared to decline during temperature fluctuations in the mid-Pleistocene transition (~1.2–0.55 Ma)35 (Fig. 2b), which probably presented a physiological burden for the thermally sensitive icefish35.
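The heterozygosity figure can be checked with simple arithmetic: the number of heterozygous sites divided by the number of bases considered. The sketch below assumes that the denominator is the 1,065 Mb assembly length; the text does not state which denominator was used, and the function name is illustrative.

```python
def heterozygosity(het_snps, bases_considered):
    """Frequency of heterozygous sites: SNP count divided by bases considered."""
    if bases_considered <= 0:
        raise ValueError("bases_considered must be positive")
    return het_snps / bases_considered


# Assumption: the denominator is the 1,065 Mb assembly length.
rate = heterozygosity(9_365_677, 1_065_000_000)
print(f"{rate:.2e}")  # prints 8.79e-03
```

Under that assumption the result agrees with the reported frequency, which is roughly four times the value cited for Atlantic cod.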

Expansion of AFGP and zona pellucida gene families

AFGP genes, which evolved from trypsinogen genes36, were tandemly duplicated in icefish as they are in Antarctic toothfish (Dissostichus mawsoni)37 (Fig. 3a). Our results show that the Antarctic fish AFGP–trypsinogen locus is situated between mitochondrial ribosomal protein L (mrpl) and E3 ubiquitin-protein ligase CBL (cbl), consistent with the location of the trypsinogen gene in several percomorph teleosts (Fig. 3a). A low-coverage Illumina-based draft assembly of the C. aceratus genome annotated 4 copies of AFGP genes38, whereas our results revealed 11 copies of AFGP genes adjacent to 10 tandem copies of trypsinogen genes and 2 copies of trypsinogen-like protease genes.

Fig. 3: Conserved syntenies for expanded gene clusters identified in the blackfin icefish genome. a, AFGP and trypsinogen gene loci. The pink-shaded area indicates the trypsinogen gene locus. kbp, kilobase pair. b, Zona pellucida c5 (zpc5) locus. c, sod3 gene cluster. Genomic neighbourhoods are shown within representative sequenced teleost genomes. Each arrow indicates a complete gene orientated in the (5′ → 3′) direction. d, Phylogenetic analysis of vertebrate sod3 genes. Divergence times were calculated by applying the mutation rate formula μ = D/2t = 3.28 × 10⁻⁹. Ca, C. aceratus (icefish); Dr, D. rerio (zebrafish); Ga, G. aculeatus (stickleback); Nc, N. coriiceps (bullhead notothen); Ol, O. latipes (medaka); Pf, P. formosa (Amazon molly); Tr, T. rubripes (fugu); Xm, X. maculatus (platyfish).

Antarctic fish embryos do not appear to express AFGP genes12,39, which raises the question of how these embryos resist freezing12. Zona pellucida egg-coat proteins play roles in fertilization and preventing polyspermy in mammals40, and provide thickness and hardness to fish eggshells41. Zona pellucida proteins from Antarctic toothfish depress the melting point of ice12. The zona pellucida protein family expanded extensively in the C. aceratus genome: 131 zona pellucida genes, including 109 tandemly duplicated genes on 20 contigs (Supplementary Table 16), fell into 11 subfamilies based on sequence similarities (Supplementary Table 17). In contrast with our icefish sequence, 2 other Antarctic species—the bullhead notothen N. coriiceps42 and dragonfish P. charcoti43—had only 18 or 30 zona pellucida genes, respectively, and only 16–35 zona pellucida genes appeared in 7 non-Antarctic teleosts based on Ensembl queries. Furthermore, transcripts of only 18 zona pellucida genes were found in the Antarctic toothfish12. Blackfin icefish had many more copies of the zpax1, zpc1, zpc2 and zpc5 genes than other fish (Supplementary Table 16). For instance, 18 zpc5 genes were tandemly duplicated in a single contig (Ice_000281) that also contained two zpc3 paralogues (Fig. 3b). Another contig (Ice_000114) had five zpc3 and three zpc5 genes, which suggests conserved synteny of these paralogues, in agreement with their location in the genomes of stickleback and medaka (Supplementary Fig. 9). The ovaries and liver express zona pellucida genes in vertebrates44, and most C. aceratus zona pellucida genes were strongly expressed in the ovaries, similar to three other Antarctic fish12 (Supplementary Fig. 10), although transcription of several zona pellucida genes was detected in other C. aceratus organs. It is possible that the extra-ovarian expression observed for some zona pellucida paralogues represents adaptive, C. aceratus-specific neofunctionalization45 of this expanded gene family.

Genes for oxygen-binding proteins

Most teleost genomes have two globin gene clusters that arose during the teleost genome duplication: the LA cluster (with lcmt1 and aqp8 on one side and rhbdf1b on the other) and the MN cluster (flanked by mpg and nprl3 on one side and kank2 on the other)46. Within each teleost hb cluster, α- and β-chain genes generally alternate, in contrast with mammals, in which α- and β-gene clusters reside on different chromosomes46. The loss of hbb genes and pseudogenization of hba genes is an icefish synapomorphy; 15 of 16 icefish species retained only a 3′ fragment of an α-globin gene2. The sixteenth species retained an intact but unexpressed hba gene fused to two β-pseudogenes, which was probably inherited from ancestors due to incomplete lineage sorting2. Our results show that the residual α-globin fragment in blackfin icefish mapped to the LA cluster, and the blackfin icefish genome possesses no trace of the MN cluster, although surrounding genes were preserved intact (Supplementary Fig. 11). In contrast, intact genes for myoglobin47, cytoglobin48 and neuroglobin49 appear in the C. aceratus genome in the context of conserved synteny among teleosts (Supplementary Fig. 12). Our sequence provides the substrate for investigating the molecular genetic mechanisms that inhibit myoglobin expression, which remain to be elucidated.

Oxidative stress

Some icefishes, including C. aceratus, are more sensitive to oxidative stress than red-blooded notothenioids are50,51,52. The volume of polyunsaturated-fatty-acid-rich mitochondria per volume of skeletal or cardiac muscle cell in icefishes is approximately twice as large as in red-blooded Antarctic fishes, which may make icefishes more susceptible to reactive oxygen species (ROS) formation and lipid peroxidation at current environmental temperatures52. Furthermore, the lower thermal tolerance of icefishes relative to red-blooded notothenioids may be due to increased protein and lipid damage in icefish cardiac muscle50. If disrupted, the respiratory chain in icefish mitochondria generates more ROS than that of red-blooded Antarctic fish51. Finally, levels of antioxidants in icefishes are low relative to those in red-blooded notothenioids52. These data suggest that some icefishes are probably under selective pressure to enhance their antioxidant defence systems53.

Gene families associated with ROS homeostasis (Supplementary Table 18), including those encoding superoxide dismutase (SOD) and NAD(P)H:quinone acceptor oxidoreductase (NQO), were expanded in the C. aceratus genome. We found that blackfin icefish has five sod genes (sod1, sod2 and three tandemly repeated copies of sod3) compared with just three sod genes typical for other percomorph teleosts (Fig. 3c and Supplementary Table 19). Mutation rate analysis suggested that the sod3 gene duplicates arose as recently as ~2.3 Ma (Fig. 3d). Because the multiple sod3 genes appear to encode extracellular SOD3 enzymes (each protein possesses an apparent secretory signal peptide), an understanding of their roles in extracellular versus intracellular ROS homeostasis will require further study.
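The Fig. 3 legend gives the relation μ = D/2t with μ = 3.28 × 10⁻⁹, which rearranges to t = D/(2μ) for the time since two gene copies diverged. The sketch below works through that arithmetic. The pairwise distance D = 0.015 is a hypothetical value chosen only for illustration, and the unit of μ (substitutions per site per year) is an assumption, since the legend does not state it.

```python
def divergence_time_years(distance, mutation_rate):
    """Solve mu = D / (2 t) for t, the time since two sequences diverged.

    distance: pairwise distance D (substitutions per site).
    mutation_rate: mu, assumed here to be per site per year.
    """
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    return distance / (2 * mutation_rate)


MU = 3.28e-9            # rate quoted in the Fig. 3 legend
D_HYPOTHETICAL = 0.015  # illustrative distance, not a measured value

t = divergence_time_years(D_HYPOTHETICAL, MU)
print(f"{t / 1e6:.2f} Ma")  # prints 2.29 Ma
```

A distance of about 1.5% between sod3 copies would therefore correspond to a duplication a little over 2 Ma, the same order as the estimate in the text.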

The expansion of nqo1 genes in the icefish genome was striking: we found a total of 33 genes, in contrast with the 2–10 nqo1 genes annotated in most fish genomes (Supplementary Fig. 13). Teleosts generally have two loci containing nqo1 genes in the genomic contexts vang-nqo1s-ackr3 and il17-nqo1-gabarapl; both regions appear to have been ancestrally linked, as they are today in pufferfish (Takifugu rubripes) and medaka, and the icefish nqo1 genes conformed to this pattern. However, C. aceratus possessed 26 additional nqo genes on 3 contigs that did not appear to be part of the 2 conserved nqo1 loci (Supplementary Fig. 13). In addition, the icefish is the only sequenced teleost to have two tandem copies of 8-oxoguanine DNA glycosylase (ogg1), which encodes a protein that excises from DNA a modified base that arises from reactive oxygen damage, whereas other sequenced teleost genomes have just one ogg1 copy (Supplementary Fig. 14).

Circadian adaptation to extremely fluctuating photoperiods

Polar species inhabit an environment with extreme annual fluctuations of day length, raising questions regarding the role of circadian rhythm genes in these organisms. The cry and per genes regulate a phylogenetically conserved circadian feedback loop by reciprocal transcriptional controls54. Teleost genomes have various numbers of cry genes (for example, seven in zebrafish and five in stickleback)55. Although icefish maintained the genomic structure around cry and per genes that is strongly conserved in teleost genomes, cry1, cry2, per2a and per3 sequences appeared to be specifically deleted in icefish evolution (Fig. 4a and Supplementary Table 20). The icefish genome possesses only three cry genes, the smallest number identified in any teleost. Although the bullhead notothen and dragonfish genome assemblies are incomplete, they possess and lack the same circadian rhythm genes as blackfin icefish; thus, available evidence supports the hypothesis that the extremes of winter darkness and summer light may have reduced the utility of, and hence decreased the pressure to retain, some circadian rhythm regulators in Antarctic fish. Behavioural studies on Antarctic icefishes and other notothenioid species will be necessary to validate this hypothesis.