As the species with a sequenced genome closest to our most recent aquatic ancestor, the coelacanth provides a unique opportunity to identify genomic changes that were associated with the successful adaptation of vertebrates to the land environment.

Over the 400 Myr that vertebrates have lived on land, some genes that are unnecessary for existence in their new environment have been eliminated. To understand this aspect of the water-to-land transition, we surveyed the Latimeria genome annotations to identify genes that were present in the last common ancestor of all bony fish (including the coelacanth) but that are missing from tetrapod genomes. More than 50 such genes, including components of fibroblast growth factor (FGF) signalling, TGF-β and bone morphogenic protein (BMP) signalling, and WNT signalling pathways, as well as many transcription factor genes, were inferred to be lost based on the coelacanth data (Supplementary Data 7 and Supplementary Fig. 9). Previous studies of genes that were lost in this transition could only compare teleost fish to tetrapods, meaning that differences in gene content could have been due to loss in the tetrapod or in the lobe-finned fish lineages. We were able to confirm that four genes that were shown previously to be absent in tetrapods (And1 and And2 (ref. 29), Fgf24 (ref. 30) and Asip2 (ref. 31)), were indeed present and intact in Latimeria, supporting the idea that they were lost in the tetrapod lineage.
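
The inference logic described above can be illustrated with a minimal sketch. It assumes that a gene counts as "lost in tetrapods" when it is annotated in the coelacanth and in at least one ray-finned fish but in no tetrapod; this rule, the species groupings and the presence calls for `toyA` and `toyB` are illustrative assumptions, not the annotation pipeline used for Supplementary Data 7.

```python
from typing import Dict, List, Set

# Hypothetical species groupings used only for this illustration.
RAY_FINNED = {"zebrafish", "medaka"}
COELACANTH = {"coelacanth"}
TETRAPODS = {"human", "chicken", "frog"}


def lost_in_tetrapods(presence: Dict[str, Set[str]]) -> List[str]:
    """Return genes seen in ray-finned fish and coelacanth but in no tetrapod."""
    lost = []
    for gene, species in presence.items():
        in_ray_finned = bool(species & RAY_FINNED)
        in_coelacanth = bool(species & COELACANTH)
        in_tetrapod = bool(species & TETRAPODS)
        # Presence in both fish lineages places the gene in the bony fish
        # ancestor; absence from every tetrapod then points to a tetrapod loss.
        if in_ray_finned and in_coelacanth and not in_tetrapod:
            lost.append(gene)
    return sorted(lost)


# Toy presence/absence table (assumed values, for illustration only).
toy_presence = {
    "And1": {"zebrafish", "medaka", "coelacanth"},
    "Fgf24": {"zebrafish", "coelacanth"},
    "toyA": {"zebrafish", "medaka"},               # also absent in coelacanth
    "toyB": {"zebrafish", "coelacanth", "human"},  # retained in a tetrapod
}

LOST = lost_in_tetrapods(toy_presence)
print(LOST)
```

The `toyA` case shows why the coelacanth matters: without a lobe-finned fish genome, a gene missing from tetrapods could equally have been lost earlier in the lobe-finned lineage.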

We functionally annotated more than 50 genes lost in tetrapods using zebrafish data (gene expression, knock-downs and knockouts). Many genes were classified in important developmental categories (Supplementary Data 7): fin development (13 genes); otolith and ear development (8 genes); kidney development (7 genes); trunk, somite and tail development (11 genes); eye (13 genes); and brain development (23 genes). This implies that critical characters in the morphological transition from water to land (for example, fin-to-limb transition and remodelling of the ear) are reflected in the loss of specific genes along the phylogenetic branch leading to tetrapods. However, homeobox genes, which are responsible for the development of an organism’s basic body plan, show only slight differences between Latimeria, ray-finned fish and tetrapods; it would seem that the protein-coding portion of this gene family, along with those of several others (Supplementary Note 9, Supplementary Tables 12–16 and Supplementary Fig. 10), has remained largely conserved during the vertebrate land transition (Supplementary Fig. 11).

As vertebrates transitioned to a new land environment, changes occurred not only in gene content but also in the regulation of existing genes. Conserved non-coding elements (CNEs) are strong candidates for gene regulatory elements. They can act as promoters, enhancers, repressors and insulators32,33, and have been implicated as major facilitators of evolutionary change34. To identify CNEs that originated in the most recent common ancestor of tetrapods, we predicted CNEs that evolved in various bony vertebrate (that is, ray-finned fish, coelacanth and tetrapod) lineages and assigned them to their likely branch points of origin. To detect CNEs, conserved sequences in the human genome were identified using MULTIZ alignments of bony vertebrate genomes, and then known protein-coding sequences, untranslated regions (UTRs) and known RNA genes were excluded. Our analysis identified 44,200 ancestral tetrapod CNEs that originated after the divergence of the coelacanth lineage. They represent 6% of the 739,597 CNEs that are under constraint in the bony vertebrate lineage. We compared the ancestral tetrapod CNEs to mouse embryo ChIP-seq (chromatin immunoprecipitation followed by sequencing) data obtained using antibodies against p300, a transcriptional coactivator. The candidate CNEs showed a sevenfold enrichment for p300 binding sites, confirming that they are indeed enriched for gene regulatory elements.
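
The exclusion step can be sketched as an interval filter. This sketch assumes that a conserved element is dropped entirely if it overlaps any annotated coding, UTR or RNA-gene interval (the text does not say whether overlapping elements were dropped or trimmed), and all coordinates are invented for illustration. The last lines reproduce the reported proportion from the two counts given above.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; values are hypothetical.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def candidate_cnes(conserved: List[Interval],
                   annotated: List[Interval]) -> List[Interval]:
    """Keep conserved elements that overlap no annotated interval.

    Assumption: any overlap removes the whole element.
    """
    return [c for c in conserved
            if not any(overlaps(c, a) for a in annotated)]


conserved = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 120)]
annotated = [("chr1", 150, 300), ("chr2", 500, 900)]  # exons, UTRs, RNA genes

CNES = candidate_cnes(conserved, annotated)
print(CNES)

# Share of bony vertebrate CNEs that arose on the ancestral tetrapod branch,
# using the counts reported in the text.
TETRAPOD_SHARE = 44_200 / 739_597
print(f"{TETRAPOD_SHARE:.1%}")
```

The quadratic overlap scan is adequate for a toy example; a genome-scale analysis would normally use sorted intervals or an interval tree.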

Each tetrapod CNE was assigned to the gene whose transcription start site was closest, and gene-ontology category enrichment was calculated for those genes. The most enriched categories were involved with smell perception (for example, sensory perception of smell, detection of chemical stimulus and olfactory receptor activity). This is consistent with the notable expansion of olfactory receptor family genes in tetrapods compared with teleosts, and may reflect the necessity of a more tightly regulated, larger and more diverse repertoire of olfactory receptors for detecting airborne odorants as part of the terrestrial lifestyle. Other significant categories include morphogenesis (radial pattern formation, hind limb morphogenesis, kidney morphogenesis) and cell differentiation (endothelial cell fate commitment, epithelial cell fate commitment), which is consistent with the body-plan changes required for land transition, as well as immunoglobulin VDJ recombination, which reflects the presumed response differences required to address the novel pathogens that vertebrates would encounter on land (Supplementary Note 10 and Supplementary Tables 17–24).
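
The nearest-gene assignment can be sketched as follows. The sketch assumes a single chromosome, uses the CNE midpoint as its position and resolves ties in favour of the lower-coordinate gene; the gene names and coordinates are hypothetical, and none of these choices is stated in the text.

```python
from bisect import bisect_left
from typing import List, Tuple

# Transcription start sites on one chromosome, sorted by position.
# Names and coordinates are hypothetical.
TSS = [(1_000, "geneA"), (5_000, "geneB"), (9_000, "geneC")]


def nearest_gene(cne_midpoint: int, tss_sorted: List[Tuple[int, str]]) -> str:
    """Return the gene whose TSS lies closest to the CNE midpoint."""
    positions = [pos for pos, _ in tss_sorted]
    i = bisect_left(positions, cne_midpoint)
    # Only the TSSs immediately left and right of the midpoint can be nearest.
    neighbours = tss_sorted[max(i - 1, 0): i + 1]
    return min(neighbours, key=lambda t: abs(t[0] - cne_midpoint))[1]


ASSIGNED = [nearest_gene(m, TSS) for m in (500, 2_500, 8_000, 10_000)]
print(ASSIGNED)
```

The resulting gene list would then be the input to a gene-ontology enrichment test against a suitable background set.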

A major innovation of tetrapods is the evolution of limbs characterized by digits. The limb skeleton consists of a stylopod (humerus or femur), a zeugopod (radius and ulna, or tibia and fibula), and an autopod (wrist or ankle, and digits). There are two major hypotheses about the origins of the autopod: that it was a novel feature of tetrapods, and that it has antecedents in the fins of fish35 (Supplementary Note 11 and Supplementary Fig. 12). We examine here the Hox regulation of limb development in ray-finned fish, coelacanth and tetrapods to address these hypotheses.

In mouse, late-phase digit enhancers are located in a gene desert that is proximal to the HOX-D cluster36. Here we provide an alignment of the HOX-D centromeric gene desert of coelacanth with those of tetrapods and ray-finned fishes (Fig. 2a). Among the six cis-regulatory sequences previously identified in this gene desert36, three sequences show sequence conservation restricted to tetrapods (Supplementary Fig. 13). However, one regulatory sequence (island 1) is shared by tetrapods and coelacanth, but not by ray-finned fish (Fig. 2b and Supplementary Fig. 14). When tested in a transient transgenic assay in mouse, the coelacanth sequence of island 1 was able to drive reporter expression in a limb-specific pattern (Fig. 2c). This suggests that island 1 was a lobe-fin developmental enhancer in the fish ancestor of tetrapods that was then coopted into the autopod enhancer of modern tetrapods. In this case, the autopod developmental regulation was derived from an ancestral lobe-finned fish regulatory element.

Figure 2: Alignment of the HOX-D locus and an upstream gene desert identifies conserved limb enhancers. a, Organization of the mouse HOX-D locus and centromeric gene desert, flanked by the Atf2 and Mtx2 genes. Limb regulatory sequences (I1, I2, I3, I4, CsB and CsC) are noted. Using the mouse locus as a reference (NCBI and mouse genome sequencing consortium NCBI37/mm9 assembly), corresponding sequences from human, chicken, frog, coelacanth, pufferfish, medaka, stickleback, zebrafish and elephant shark were aligned. Alignment shows regions of homology between tetrapod, coelacanth and ray-finned fishes. b, Alignment of vertebrate cis-regulatory elements I1, I2, I3, I4, CsB and CsC. c, Expression patterns of coelacanth island 1 (I1) in a transgenic mouse. Limb buds are indicated by arrowheads in the first two panels. The third panel shows a close-up of a limb bud.

Changes in the urea cycle provide an illuminating example of the adaptations associated with transition to land. Excretion of nitrogen is a major physiological challenge for terrestrial vertebrates. In aquatic environments, the primary nitrogenous waste product is ammonia, which is readily diluted by surrounding water before it reaches toxic levels, but on land, less toxic substances such as urea or uric acid must be produced instead (Supplementary Fig. 15). The widespread and almost exclusive occurrence of urea excretion in amphibians, some turtles and mammals has led to the hypothesis that the use of urea as the main nitrogenous waste product was a key innovation in the vertebrate transition from water to land37.

With the availability of gene sequences from coelacanth and lungfish, it became possible to test this hypothesis. We used a branch-site model in the HYPHY package38, which estimates the ratio of non-synonymous (dN) to synonymous (dS) substitutions (ω values) among different branches and among different sites (codons) across a multiple-species sequence alignment. For the rate-limiting enzyme of the hepatic urea cycle, carbamoyl phosphate synthase I (CPS1), only one branch of the tree shows a strong signature of selection (P = 0.02), namely the branch leading to tetrapods and the branch leading to amniotes (Fig. 3); no other enzymes in this cycle showed a signature of selection. Conversely, mitochondrial arginase (ARG2), which produces extrahepatic urea as a byproduct of arginine metabolism but is not involved in the production of urea for nitrogenous waste disposal, did not show any evidence of selection in vertebrates (Supplementary Fig. 16). This leads us to conclude that adaptive evolution occurred in the hepatic urea cycle during the vertebrate land transition. In addition, it is interesting to note that of the five amino acids of CPS1 that changed between coelacanth and tetrapods, three are in important domains (the two ATP-binding sites and the subunit interaction domain) and a fourth is known to cause a malfunctioning enzyme in human patients if mutated39.
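
For orientation, ω is conventionally defined as dN/dS, as in the legend of Figure 3, with ω below 1 read as purifying selection, ω near 1 as neutral evolution and ω above 1 as positive selection. The sketch below only illustrates that interpretation; the tolerance around 1 and the input rates are arbitrary assumptions, and it is not the branch-site likelihood model implemented in HYPHY.

```python
def omega(dn: float, ds: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates (dN/dS)."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute omega")
    return dn / ds


def selection_regime(w: float, tol: float = 0.05) -> str:
    """Coarse label for an omega value; `tol` is an arbitrary illustrative band."""
    if abs(w - 1.0) <= tol:
        return "neutral"
    return "positive" if w > 1.0 else "purifying"


# Hypothetical per-branch rates, for illustration only.
REGIMES = [selection_regime(omega(dn, ds))
           for dn, ds in [(0.02, 0.10), (0.10, 0.10), (0.30, 0.10)]]
print(REGIMES)
```

A branch-site model goes further than this point estimate by allowing ω to vary across both branches and codons and by testing whether a class of sites with ω above 1 is statistically supported on a given branch.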

Figure 3: Phylogeny of Cps1 coding sequences is used to determine positive selection within the urea cycle. Branch lengths are scaled to the expected number of substitutions per nucleotide, and branch colours indicate the strength of selection (dN/dS or ω). Red, positive or diversifying selection (ω > 5); blue, purifying selection (ω = 0); yellow, neutral evolution (ω = 1). Thick branches indicate statistical support for evolution under episodic diversifying selection. The proportion of each colour represents the fraction of the sequence undergoing the corresponding class of selection.

The adaptation to a terrestrial lifestyle necessitated major changes in the physiological environment of the developing embryo and fetus, resulting in the evolution and specialization of extra-embryonic membranes of the amniote mammals40. In particular, the placenta is a complex structure that is critical for providing gas and nutrient exchange between mother and fetus, and is also a major site of haematopoiesis41.

We have identified a region of the coelacanth HOX-A cluster that may have been involved in the evolution of extra-embryonic structures in tetrapods, including the eutherian placenta. Global alignment of the coelacanth Hoxa14–Hoxa13 region with the homologous regions of the horn shark, chicken, human and mouse revealed a CNE just upstream of the coelacanth Hoxa14 gene (Supplementary Fig. 17a). This conserved stretch is not found in teleost fishes but is highly conserved among horn shark, chicken, human and mouse despite the fact that the chicken, human and mouse have no Hoxa14 orthologues, and that the horn shark Hoxa14 gene has become a pseudogene. This CNE, HA14E1, corresponds to the proximal promoter-enhancer region of the Hoxa14 gene in Latimeria. HA14E1 is more than 99% identical between mouse, human and all other sequenced mammals, and would therefore be considered to be an ultra-conserved element42. The high level of conservation suggests that this element, which already possessed promoter activity, may have been coopted for other functions despite the loss of the Hoxa14 gene in amniotes (Supplementary Fig. 17b, c). Expression of human HA14E1 in a mouse transient transgenic assay did not give notable expression in the embryo proper at day 11.5 (information is available online at the VISTA enhancer browser website; http://enhancer.lbl.gov/cgi-bin/imagedb3.pl?form=presentation&show=1&experiment_id=501&organism_id=1), which was unexpected as its location would predict that it would regulate axial structures caudally43. A similar experiment in chick embryos using the chicken HA14E1 also showed no activity in the anteroposterior axis. However, strong expression was observed in the extraembryonic area vasculosa of the chick embryo (Fig. 4a). Examination of a Latimeria BAC Hoxa14-reporter transgene in mouse embryos showed that the Hoxa14 gene is specifically expressed in a subset of cells in an extra-embryonic region at embryonic day 8.5 (Fig. 4b).

Figure 4: Transgenic analysis implicates involvement of Hox CNE HA14E1 in extraembryonic activities in the chick and mouse. a, Chicken HA14E1 drives reporter expression in blood islands in chick embryos. A construct containing chicken HA14E1 upstream of a minimal (thymidine kinase) promoter driving enhanced green fluorescent protein (eGFP) was electroporated in HH4-stage chick embryos together with a nuclear mCherry construct. GFP expression was analysed at approximately stage HH11. The green aggregations and punctate staining are observed in the blood islands and developing vasculature. b, Expression of Latimeria Hoxa14-reporter transgene in the developing placental labyrinth of a mouse embryo. A field of cells from the labyrinth region of an embryo at embryonic day 8.5 from a BAC transgenic line containing coelacanth Hoxa9–Hoxa14 (ref. 49) in which the Hoxa14 gene had been replaced with the gene for red fluorescent protein (RFP). Immunohistochemistry was used to detect RFP (brown staining in a small number of cells).

These findings suggest that the HA14E1 region may have been evolutionarily recruited to coordinate regulation of posterior HOX-A genes (Hoxa13, Hoxa11 and Hoxa10), which are known to be expressed in the mouse allantois and are critical for early formation of the mammalian placenta44. Latimeria does not possess a placenta, but it gives birth to live young and has very large, vascularised eggs; the relationship between Hoxa14, the HA14E1 enhancer and blood island formation in the coelacanth remains unknown.