Genome sequencing and de novo assembly

The whole-genome sequences of two Masai giraffes (Giraffa camelopardalis tippelskirchi), one from the Masai Mara (MA1) in Kenya and one from the Nashville Zoo (NZOO), and one fetal okapi (O. johnstoni) from the White Oak Conservatory were determined by constructing paired-end libraries followed by sequencing on an Illumina HiSeq, yielding ca. 30× coverage. Mate-paired libraries were also prepared from the MA1 Masai giraffe and the okapi, and sequenced to increase coverage and to span repetitive sequence elements. The initial sequence reads from giraffe and okapi were aligned to the 19,030 cattle (Bos taurus) reference transcripts17 to predict homologous genes (Supplementary Table 1), which yielded 17,210 giraffe and 17,048 okapi genes. The giraffe and okapi sequence data were also used to generate draft genome assemblies with total lengths of 2.9 and 3.3 Gb for giraffe and okapi, respectively (Supplementary Table 2). To verify gene predictions and gene structure in cases where the original gene annotations for giraffe and okapi were incomplete or ambiguous, the draft assemblies were aligned to dog or human gene sequences. To determine whether substitutions unique to Masai giraffe were conserved in other giraffe subspecies, we performed targeted sequencing of several genes in Rothschild (G.c. rothschildi) and Reticulated (G.c. reticulata) giraffes, which diverged from Masai giraffe ∼1–2 mya (refs 15, 18).

Comparative genome analysis

To identify changes that potentially underlie these unique morphological and physiological adaptations, we analysed the coding sequences of orthologous genes in giraffe, okapi and cattle. Giraffe and okapi genes are highly similar overall with 19.4% of proteins being identical (Fig. 1). Giraffe and okapi genes are equally distantly related to cattle, suggesting that giraffe’s unique characteristics are not due to an overall faster rate of evolution. The divergence of giraffe and okapi, based on the relative rates of synonymous substitutions, from a common ancestor is estimated to be 11.5 mya (Fig. 1), substantially less than the previous estimate of 16 mya (refs 19, 20), which was based on mitochondrial DNA sequence comparisons.
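
The molecular-clock arithmetic behind such an estimate can be sketched as follows. This is only an illustration that assumes a strict clock, in which pairwise dS grows linearly with time since divergence. The function name and the two dS values are hypothetical placeholders and are not the values measured in this study. Only the 27.6 mya pecoran calibration is taken from Fig. 1.

```python
def divergence_time(ds_pair, ds_calibration, t_calibration_mya):
    """Scale a calibration date by the ratio of two pairwise dS values.

    Assumes a strict molecular clock: pairwise dS grows linearly with
    the time since the two lineages split.
    """
    if ds_calibration <= 0:
        raise ValueError("calibration dS must be positive")
    return t_calibration_mya * ds_pair / ds_calibration


# Hypothetical dS values for illustration only, NOT taken from the paper.
DS_GIRAFFE_OKAPI = 0.02
DS_GIRAFFID_CATTLE = 0.05    # e.g. mean of giraffe-cattle and okapi-cattle
PECORAN_ANCESTOR_MYA = 27.6  # calibration point given in Fig. 1

estimate = divergence_time(DS_GIRAFFE_OKAPI, DS_GIRAFFID_CATTLE, PECORAN_ANCESTOR_MYA)
print(f"{estimate:.2f} mya")
```

Under this simplification, the estimated date depends only on the ratio of the two dS values, so any rate difference between lineages would bias it directly.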

Figure 1: Divergence of giraffe and okapi from a common ancestor. Using the average pairwise synonymous substitution divergence (dS) estimates between giraffe, okapi and cattle as calibrated by the pecoran common ancestor (27.6 mya), the divergence of giraffe and okapi from a common ancestor is estimated to be 11.5 mya. Okapi image adapted from a photograph by Raul654.

Adaptive evolution of giraffe

Adaptive divergence was evaluated by pairwise analysis of 13,581 giraffe, okapi and cattle genes that showed at least 90% coverage, comparing nonsynonymous (dN) changes in protein-coding sequences both directly and normalized to synonymous (dS) changes (dN/dS, ω). Enrichment analysis based on gene function (gene ontology (GO) biological processes) and pathway relationships (Kyoto Encyclopedia of Genes and Genomes (KEGG)) revealed elevation of dN or ω for giraffe in genes related to metabolism (tricarboxylic acid cycle, oxidative phosphorylation and butyrate), growth and development (cell proliferation, skeletal development and differentiation), the nervous system and cardiac muscle contraction (Supplementary Table 2). In parallel, we employed Polyphen2 analysis21 to identify genes containing amino acid substitutions predicted to cause a significant alteration in function, and screened for genes that exhibited evidence of positive selection. Genes exhibiting positive selection in giraffe were enriched in lysosomal transport, natural killer cell activation, immune response, angiogenesis, protein ADP ribosylation, blood circulation and response to pheromones (Supplementary Table 3). Over 400 genes identified from the giraffe–okapi–cattle analysis exhibited some degree of genetic differentiation in giraffe by the aforementioned analyses. These selected genes were further compared with orthologues across a large set of mammals, including 14 other cetartiodactyls, to assess more fully the evidence of positive selection and relative amino acid sequence divergence, and to identify amino acid substitutions unique to giraffe among eutherians. Seventy genes displayed multiple signs of adaptation (MSA) in giraffe by these criteria (Supplementary Table 4 and Supplementary Fig. 1). The unique amino acid substitutions identified in these genes were confirmed in the two unrelated individual Masai giraffes and, in some cases, confirmed in Reticulated and Rothschild giraffes by targeted sequencing.
Network analyses based on GO biological process revealed eight functional clusters among the 70 MSA genes including development, cell proliferation, metabolism, blood pressure and circulation, nervous system, double-strand DNA break repair, immunity and centrosome function (Fig. 2). Remarkably, nearly half of these genes are involved in controlling developmental pattern formation and differentiation including homeobox, Notch, Wnt and fibroblast growth factor (FGF) pathway genes, major regulators of growth and cell proliferation including the transcription factors MYC, E2F4, E2F5, ETS2, TGFB1 and CREBBP, and the folate receptor 1 (FOLR1).
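
The coverage filter and ω ranking used in the pairwise screen can be illustrated with a minimal sketch. It assumes that dN and dS have already been estimated per gene by a codon-model tool, which is not implemented here. The class name, field names, gene labels and numbers are all invented for illustration. Only the 90% coverage cutoff comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairwiseEstimate:
    gene: str
    coverage: float  # fraction of the reference coding sequence aligned
    dn: float        # nonsynonymous substitutions per nonsynonymous site
    ds: float        # synonymous substitutions per synonymous site

    @property
    def omega(self) -> Optional[float]:
        # dN/dS is undefined when no synonymous change was observed.
        return self.dn / self.ds if self.ds > 0 else None


def rank_by_omega(estimates, min_coverage=0.9):
    """Keep genes meeting the coverage cutoff and rank them by dN/dS."""
    kept = [e for e in estimates
            if e.coverage >= min_coverage and e.omega is not None]
    return sorted(kept, key=lambda e: e.omega, reverse=True)


# Invented example records; gene names and numbers are placeholders.
records = [
    PairwiseEstimate("geneA", 0.98, dn=0.012, ds=0.040),
    PairwiseEstimate("geneB", 0.85, dn=0.030, ds=0.035),  # fails coverage cutoff
    PairwiseEstimate("geneC", 0.95, dn=0.020, ds=0.025),
    PairwiseEstimate("geneD", 0.99, dn=0.004, ds=0.0),    # omega undefined
]

ranked = rank_by_omega(records)
for e in ranked:
    print(f"{e.gene}\tomega={e.omega:.2f}")
```

Genes with no observed synonymous change are dropped rather than assigned an infinite ω, which is one of several possible conventions.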

Figure 2: Network analysis of GO biological process of giraffe MSA genes. Seventy genes were identified that exhibited MSAs based on amino acid sequence divergence as evaluated by neighbour-joining phylogenetic analysis of mammalian orthologous proteins, enrichment of nonsynonymous substitutions, unique amino acid substitutions at sites otherwise fixed in mammals, substitutions predicted to cause functional changes by Polyphen2 analysis and substitutions under positive selection. Cluster analysis was performed on the set of 70 giraffe MSA genes based on GO Biological Process using Cytoscape 3.0 (ref. 68).

Evolution of regulators of skeletal growth and differentiation

The extraordinarily long neck of giraffe is not due to the addition of cervical vertebrae, as is the case for long-necked birds, but rather to the vertical extension of each of the seven prototypical cervical vertebrae present in mammals13,22. The elongation of the cervical vertebrae in giraffe is probably due to the extension of somites, which give rise to the cervical vertebrae during early embryogenesis22, and is restricted to the cervical region by the combinatorial action of homeobox genes. The major genes and developmental pathways that specify vertebrae differentiation of the axial and appendicular skeleton in giraffe and okapi were compared with those of other mammals to determine whether unique patterns of amino acid substitutions were found in giraffe (Supplementary Table 5). The homeobox genes HOXB3, CDX4 and NOTO exhibit enhanced divergence in giraffe among eutherians and have unique amino acid substitutions predicted to alter protein function. In addition, HOXB13, which regulates angiogenic and posterior axial skeletal development, shows high amino acid sequence divergence in giraffe and okapi compared with other mammals (Supplementary Table 4). Modulating the posterior to anterior gradient of fibroblast growth factor signalling or changing the cyclical expression of genes in the NOTCH or WNT signalling pathways could potentially modulate somite size. We found that FGFRL1, a decoy FGF receptor, AXIN2, a negative regulator of the WNT pathway, and three genes in the NOTCH pathway (NOTCH4, JAG1 and DLL3) exhibit amino acid sequence divergence in giraffe and multiple unique amino acid substitutions compared with other eutherians. The divergence of giraffe FGFRL1 is particularly striking, with a cluster of seven unique substitutions (Fig. 3a) in the domain that interacts with FGF ligands. FGFRL1 is among nine genes in giraffe that exhibit a significantly higher number of unique amino acid substitutions at fixed sites in mammals (Supplementary Table 4).
FGFRL1 in mammals lacks a tyrosine kinase domain essential for downstream FGF signalling and acts as a competitive inhibitor of the nascent FGF receptors23. Interestingly, Badlangana et al.22 speculated that an inhibitor of FGF signalling might be responsible for modulating the size of giraffe cervical vertebrae, based on the discovery that chemical inhibition of FGF signalling increased somite size in the chick embryo24. Consistent with its hypothesized role in regulating unique features of giraffe, FGFRL1 mutations in mice and humans cause severe defects in skeletal and cardiovascular development25,26,27.
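
The notion of a unique substitution at a fixed site can be made concrete with a short sketch. It assumes a pre-computed multiple protein alignment, and the function name and toy sequences are invented for illustration (they are not real FGFRL1 data). A site counts as a hit when every non-focal species shares one residue and the focal species differs.

```python
def unique_substitutions(alignment, focal):
    """Return (position, background_residue, focal_residue) tuples where every
    non-focal sequence shares one residue and the focal sequence differs.

    `alignment` maps species name -> aligned protein sequence (equal lengths,
    '-' for gaps). Positions are 1-based alignment columns.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    focal_seq = alignment[focal]
    others = [seq for name, seq in alignment.items() if name != focal]
    hits = []
    for i, focal_res in enumerate(focal_seq):
        background = {seq[i] for seq in others}
        if len(background) != 1:
            continue  # site is not fixed among the other species
        (bg_res,) = background
        if "-" in (bg_res, focal_res) or bg_res == focal_res:
            continue  # skip gaps and sites where the focal species agrees
        hits.append((i + 1, bg_res, focal_res))
    return hits


# Toy alignment with invented sequences (not real FGFRL1 data).
toy = {
    "giraffe": "MKTSLQAV",
    "okapi":   "MKTPLQAV",
    "cattle":  "MKTPLEAV",
    "human":   "MKTPLQAV",
}
print(unique_substitutions(toy, "giraffe"))
```

In the toy alignment, column 6 is not reported for giraffe because cattle differs from the other species there, so the site is not fixed in the background set.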

Figure 3: Giraffe genes and pathways exhibiting extraordinary divergence and patterns of amino acid substitutions. (a) Giraffe FGFRL1 contains seven amino acid substitutions that are unique at fixed sites in other mammals and/or are predicted by Polyphen2 analysis to alter function (upper panel). Human reference is shown, which is identical to cattle and okapi in this segment. The unique giraffe substitutions occur in the FGF-binding domain region flanking the N-terminal cysteine (asterisk) of the Ig-III loop (lower panel). Red bracket in lower panel corresponds to the sequence in the upper panel. The extracellular structure of FGFRL1 (left) is the same as a prototypical FGF receptor (FGFR, right) but lacks the cytoplasmic C-terminal tyrosine kinase domains seen in FGFR and instead contains a zinc-binding domain. (b) Giraffe FOLR1 contains seven substitutions that each show evidence of positive selection (P<0.05) by the branch-site model. Two of the positively selected sites (PSG), P48S and E222K, are also unique substitutions at fixed sites and Polyphen2 (PP2) analysis predicts them to alter function. P48S is within β-sheet-1 that forms part of the folic acid-binding pocket. The FOLR1 protein forms a globular structure maintained by overlapping disulfide bridges between 16 cysteine residues (red) and tethered to the plasma membrane at S233 by a GPI anchor. The unique substitution in giraffe, G234Q, immediately adjacent to the GPI anchor site may alter the anchor site or the rate of its formation. (c) Genes encoding key enzymes in butyrate metabolism and downstream mitochondrial oxidative phosphorylation pathways have diverged in giraffe including the monocarboxylate transporter (MCT1), acyl-coenzyme A synthetase-3 (ACSM3), short-chain specific acyl-CoA dehydrogenase (ACADS), NADH dehydrogenase (ubiquinone) 1β subcomplex subunit 2 (NDUFB2) and succinate dehydrogenase [ubiquinone] iron-sulfur subunit (SDHB).
ACSM3 and ACADS are located in the mitochondrial matrix whereas NDUFA2, NDUFB2 and SDHB are located in the mitochondrial inner membrane. In addition to being present in the rumen epithelial cells, MCT1 is highly expressed in the heart, skeletal muscle and the nervous system where it acts to transport volatile fatty acids (VFAs) and lactate. (d) Double-strand break repair genes exhibit divergence in giraffe and/or okapi. The mediator of DNA damage checkpoint 1 (MDC1) binds phosphorylated H2AX, which marks DNA double-strand breaks, and serves as a scaffold to recruit the MRN DNA repair complex composed of NBS1, MRE11 and RAD50 (upper panel). The giraffe and okapi MDC1 gene exhibits a 264-amino acid deletion that removes part of the SDT region that harbours two critical CK2 phosphorylation sites (lower panel). These two phosphorylation sites are among multiple sites that regulate the interaction of MDC1 and NBS1 essential for the recruitment of the MRN complex to double-strand breaks.

Giraffe FOLR1 shows exceptionally strong evidence for adaptive evolution, including six positively selected amino acid substitutions, of which two are predicted to cause a significant change in function (Fig. 3b). FOLR1 mutations are embryonically lethal in mice28 and produce hypomyelination and neurological defects in humans29. In addition to its role in cellular folate transport, FOLR1 is internalized, processed and transported to the nucleus, where it regulates components of the FGF and NOTCH pathways30. These changes in giraffe FOLR1 may act in concert with similar changes in FGFRL1 and JAG1, components of the FGF and NOTCH pathways, respectively, to forge major developmental adaptations.

Cardiovascular and metabolic gene evolution

The giraffe cardiovascular system is adapted to regulate blood pressure over a height of 6 m and to maintain cardiovascular homeostasis associated with rapid changes in the relative position of the brain to the heart. The blood pressure of giraffe is 2.5-fold higher than that of humans, the left ventricle of the heart is enlarged and the blood vessel walls of the lower extremities are greatly thickened1,31. Giraffe exhibits evidence for adaptive evolution of eight genes that regulate blood pressure or cardiovascular function, including two of the major adrenergic receptors (α1 and β2), urotensin-2b and angiotensin-converting enzyme (Supplementary Table 4). BORG1 and RCAN3, which are highly expressed in the heart and purported to have important functions related to cell shape and cardiac muscle contraction, respectively, are also significantly diverged in giraffe32,33. The observed distinctive changes in these genes may provide clues as to the evolutionary origins of giraffe’s high blood pressure, increased cardiac output and modified vasculature.

Giraffe’s elevated stature enables it to feed on acacia leaves and seedpods that are highly nutritious but also contain toxic alkaloids. As in other ruminants, giraffe gut microbes ferment plants to generate volatile fatty acids that are transported through the gut epithelium and serve as the main energy source34,35. Included among the MSA genes in giraffe are those involved in the catabolism of volatile fatty acids such as butyrate (MCT1, ACSM3 and ACADS) or in downstream oxidative phosphorylation that generates ATP (NDUFB2 and SDHB) (Fig. 3c). In addition, these proteins are essential for lactate transport and metabolism, which are particularly important for cardiovascular functions36.

Evolutionary changes in DNA and chromosome repair genes

The mediator of DNA damage checkpoint 1 (MDC1) acts as a key scaffold for proteins participating in double-strand DNA break repair, homologous recombination, nonhomologous end-joining and telomere maintenance37,38,39,40,41,42,43, and its sequence exhibits the most radical evolutionary change in giraffe and okapi compared with all other vertebrates. The giraffe and okapi MDC1 gene contains an in-frame termination substitution in exon 5, suggesting either premature termination or alternative splicing that removes the offending termination codon. The complementary DNAs from both giraffe and okapi liver tissue were truncated in exon 5, indicating the use of a cryptic 5′-splice site resulting in a 264-amino acid internal deletion not seen in any other vertebrate. The deleted region corresponds to the ST/Q domain that contains numerous phosphorylation sites that have an impact on important regulatory protein–protein interactions44. Perhaps not surprisingly, the amino acid sequences of NIBRIN, MRE11, SOSB2 and BAZB1, which interact with MDC1 (ref. 45), are diverged in giraffe and/or okapi (Fig. 3d). We speculate that the divergence of these genes and those involved in centromeric functions may underlie the unusual degree of chromosomal fusions that occurred in the giraffe lineage46,47. The pecoran ancestor that gave rise to the horned, even-toed ungulates is purported to have had a karyotype of 2n=58–60, as exemplified by cattle46. However, giraffe and okapi have unusual karyotypes among pecorans, exhibiting reduced chromosome numbers of 2n=30 and 2n=44–46, respectively, due to Robertsonian centric fusions of acrocentric chromosomes.
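
The scale of karyotype reduction implied by these numbers can be worked through in a few lines. The sketch below rests on a simplifying assumption that is not stated in the text: that fixed Robertsonian fusions are the only events changing chromosome number, each one lowering 2n by 2. The function names are illustrative, and the 2n values are those quoted above.

```python
def fusions_needed(ancestral_2n, derived_2n):
    """Number of fixed Robertsonian fusions separating two diploid counts.

    Assumes fusions are the only rearrangement changing chromosome number;
    each fixed fusion joins two acrocentric chromosome pairs into one pair,
    lowering 2n by 2.
    """
    diff = ancestral_2n - derived_2n
    if diff < 0 or diff % 2:
        raise ValueError("expected a non-negative, even reduction in 2n")
    return diff // 2


def fusion_range(ancestral, derived):
    """Min and max fusions for inclusive (low, high) ranges of 2n."""
    a_lo, a_hi = ancestral
    d_lo, d_hi = derived
    return fusions_needed(a_lo, d_hi), fusions_needed(a_hi, d_lo)


ANCESTOR = (58, 60)  # pecoran ancestor, as stated in the text
GIRAFFE = (30, 30)
OKAPI = (44, 46)

print("giraffe:", fusion_range(ANCESTOR, GIRAFFE))
print("okapi:  ", fusion_range(ANCESTOR, OKAPI))
```

Under that assumption, the giraffe karyotype would require roughly twice as many fusions as the okapi karyotype. Fissions or other rearrangements would change these counts.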