Genome sequencing, assembly, and annotation

We prepared 11 libraries with several insert sizes from Apteryx mantelli genomic DNA and sequenced 83 billion base pairs (Gb) from small insert-size libraries and 120 Gb from large-insert mate-pair Illumina libraries (Additional file 1: Table S1). After read correction [5] we assembled contigs and scaffolds using SOAPdenovo [6] (Additional file 1: Note: Filtering and read correction; Genome assembly) to generate a draft assembly, which spanned 1.595 Gb (Additional file 1: Tables S2 and S3). The N50s of contigs and scaffolds were 16.48 kb and 3.95 Mb, respectively (Additional file 1: Table S3). Since the size of the kiwi genome is unknown, we estimated average coverage using a 19-mer frequency distribution (Additional file 1: Figure S1) which yielded a genome size estimate of 1.65 Gb, placing the kiwi among the largest bird genomes sequenced to date [7] (Table 1; Additional file 1: Table S4). The assembled contigs and scaffolds cover approximately 96 % of the complete genome with an average sequence coverage of 35.85-fold after correction (Additional file 1: Note: Filtering and read correction). Assembly quality was assessed by chaining the kiwi scaffolds to two Sanger-sequenced bird genomes: chicken [8] and zebra finch [9]. A total of 50.09 % (0.8 Gb) of the kiwi genome is alignable in syntenic chains to 79.67 % of the much smaller chicken genome (1.07 Gb). A similar fraction, 57.61 % (0.9 Gb), of the kiwi sequence was alignable to 76.92 % of the zebra finch genome (1.2 Gb) (Additional file 1: Table S5). For comparison, 69.86 % (0.84 Gb) of the zebra finch genome is syntenically alignable to 83.51 % of the chicken genome. However, 91.96 % of the zebra finch sequences that are syntenic-chain-alignable to chicken showed conserved synteny in kiwi, suggesting that the kiwi genome assembly includes the majority of conserved regions between birds.

Table 1 Kiwi genome assembly characteristics and genomic features compared with other avian genomes (see Additional file 1: Table S4) Full size table

We identified a set of 27,876 genes following de novo gene prediction on the assembled genome (Additional file 1: Note: De novo gene prediction and gene annotation). To refine these gene annotations we used 47.5 Gb of transcript sequence data from kiwi embryonic tissue together with the de novo gene predictions and protein evidence from three well-annotated bird species (G. gallus, T. guttata, M. gallopavo) as input to the MAKER genome annotation pipeline [10]. A validated set of 18,033 genes was selected based on their alignment to orthologous genes in other birds and on supporting evidence provided by kiwi transcript sequences. In total, the gene models spanned 306.62 Mb of the assembly, with exons accounting for 23.96 Mb (approximately 1.6 %) of the total kiwi genome.

Evolution of gene families

Gene family expansion and/or contraction have been proposed as important mechanisms underlying adaptation [11]. We explored patterns of protein family expansions and contractions in kiwi and used TreeFam [12] to define gene families in the kiwi and all bird and reptile genomes in Ensembl 73, as well as two nocturnal birds (barn owl, chuck-will’s-widow), two other ratites (ostrich, tinamou) [7] (GigaDB [13]), two mammals (human, mouse), and one fish (stickleback) (Ensembl 73 [14]). In total we identified 10,096 gene families shared between the inferred ancestral state and the 16 species considered, of which 623 represent single-gene families. For these single-gene families we constructed a maximum-likelihood phylogeny [15] (Fig. 1) and tested for changes in ortholog cluster sizes. In accordance with previous estimates, our results indicate a net gene loss on the avian branch [16].

Fig. 1 Phylogenetic tree of 16 species built on 623 TreeFam [12] single-gene families. Branch lengths are scaled to estimate divergence times. All branches are supported by 100 bootstraps. The song bird clade is depicted in blue, Galliformes jn purple, Anseriformes in green, and nocturnal birds in red. Ratites (Struthio camelus and Apteryx mantelli) and Tinamus guttatus are highlighted in light green. The number of genes gained (+ red) and lost (− blue) is given underneath each branch. The rate of gene gain and loss for the clades derived from the most common recent ancestor was estimated [77] to 0.0007 per gene per million years Full size image

Changes of gene-family sizes have been inferred for multiple de novo assembled genomes [17, 18]. However, many of these genomes have rather fragmented assemblies (Table 1); thus, results should be interpreted cautiously, only after manual inspection and ideally independent experimental confirmation.

We therefore manually examined the 130 gene families that had either significant expansion or contraction specifically to the kiwi branch. After excluding expansions that were caused by fragmentation of the assembly [19] only 85 gene families remained significant (Additional file 1: Table S6). Of these, 63 gene families are expanded in the kiwi. An analysis of gene family functions [20] showing expansion in kiwi identified enrichment in categories including signal transduction, calcium homeostasis, and motor activity (FDR <0.0001, Additional file 1: Figure S2A). Among the gene families that show contraction on the kiwi branch we found an enrichment of development-related Gene Ontology (GO) categories (FDR <0.0001, Additional file 1: Figure S2B).

Diversification of tetrapods and the colonization of terrestrial habitats are often accompanied by changes of physiological systems specifically in cellular signal transduction [21]. Membrane proteins are involved in cellular signaling, hence we aimed to determine more specifically which classes of membrane-expressed proteins have undergone changes in the number of coding genes. To this end we annotated the membrane proteome in kiwi, human, all birds, and reptiles present in Ensembl 74, two additional ratites (ostrich and tinamou) and two nocturnal birds (chuck-will’s-widow and barn owl) (Additional file 1: Note: Detection and classification of the membrane proteome; Additional file 1: Table S7). We manually inspected the classes which showed expansion in kiwi, to ensure that the higher number of predicted genes is not a result of assembly fragmentation. We found a significant expansion in kiwi of genes coding for adhesion and immune-related proteins (Additional file 1: Table S7). Additionally, we found a significant expansion of the Ephrin kinases class, which are functionally involved in the development of the sensory-motor innervation of the limb [22] and later on in tendons condensation and developing feather buds [23].

Patterns of natural selection

To determine whether any branch-specific selection is present in kiwi we estimated branch ω-values (Ka/Ks substitution ratios) for 4,152 orthologous genes in eight bird species: kiwi, ostrich, tinamou, chuck-will’s-widow, barn owl, chicken, zebra finch, and turkey using CODEML [24]. Ortholog assignment was based on the orthology relation among chicken, zebra finch, and turkey defined in Ensembl 73 (Additional file 1: Note: Orthologs and Ka/Ks calculation). The kiwi average ω across all the orthologs is comparable to that in ostrich, and higher than in tinamou and night birds (0.291, 0.313, 0.145, 0.202, and 0.200 for kiwi, ostrich, tinamou, chuck-will’s-widow, and barn owl, respectively). This implies a relatively faster overall rate of functional evolution in kiwi and ostrich.

In addition to gene-family expansions/contractions, we used evidence of branch-specific selection to identify genes and functional pathways that may underlie kiwi-specific adaptations. For the 4,152 orthologous genes in the eight bird species we used the branch models from CODEML to perform likelihood ratio tests [24], comparing a simple model of one ω for all sites and branches versus a model where kiwi is defined as the foreground branch and the other birds as background. We first considered genes with a significantly higher ω on the kiwi branch than that in all other birds (LRT >3.84, significance at 5 %, 1 degree of freedom). Functional enrichment using GO [20] categories was tested using a hypergeometric test (Additional file 1: Note: Gene ontology and rapidly evolving genes). The same test was performed on genes evolving significantly slower in kiwi. To assign functional categories as either kiwi-specific, or shared with other ratites or nocturnal birds, a similar procedure was performed for each species of Palaeognathae (ostrich, tinamou) and night birds (chuck-will’s-widow, barn owl) by assigning each in turn as the foreground branch in CODEML.

After multiple testing correction using family-wise error rate none of the categories remained significant. For further analysis we considered only GO categories that had (1) a P value <0.05; (2) at least three significantly changed genes; and (3) the number of significant genes was at least 5 % of the total genes annotated in the GO category. GO categories that were over-represented (P value <0.05) on the kiwi branch, but not present in any of the other considered species, were identified as potentially kiwi-specific changes (Additional file 1: Note: Gene ontology and rapidly evolving genes). Notably, faster-evolving categories present in kiwi, but absent in any of the other species, are related to mitochondrion, feeding behavior and energy reserve metabolic process, visual perception, and eye photoreceptor cell differentiation (Additional file 1: Table S8A). Sensory perception of light stimulus is a faster evolving category shared, surprisingly, with the ostrich (Additional file 1: Table S8B). Among slower evolving categories, the mitochondrial outer membrane was one of the kiwi-specific categories (Additional file 1: Table S9A), while anion channel activity was a shared category with chuck-will’s-widow (Additional file 1: Table S9B). For the potentially biological meaningful categories which could explain kiwi-specific physiology we extracted the genes clustering in the node. GO categories have a high potential to deliver false-positive enrichment, which could be considered biologically meaningful a posteriori [25]. Therefore, future studies need to verify the adaptive functionality of genes belonging to the respective category (Additional file 1: Tables S8C and S9C).

It has been proposed that, in a nocturnal environment, genes involved in circadian rhythm have been under selective pressure [4]. Our species-specific selection screens did not identify circadian rhythm-related categories to be enriched for changed genes in either kiwi or the other nocturnal birds. However, since mutations in even a single gene may be relevant, we analyzed more closely biorhythm regulators from the neuropsin gene family. Encephalopsin (OPN3), melanopsin (OPN4-1), and neuropsin (OPN5) showed a similar ω in kiwi and the other branches and no obvious alterations could be detected in the sequence (Table 2). Similar to chicken [26], kiwi and the other tested birds have a duplication of the melanopsin gene (OPN4-2), which displayed significant signals of positive selection in kiwi but not in the other nocturnal birds. However, a branch-site selection analysis of this gene did not show any significant positively selected sites (Additional file 1: Note: Vision analysis).

Table 2 Annotated opsins in the Apteryx mantelli genome Full size table

Kiwi sensory adaptations – vision

Nocturnality is accompanied by a number of specific changes, including adaptations in visual processing [4]. In contrast to most nocturnal animals, that have large eyes relative to their body size, kiwi have small eyes and reduced optic lobes in the brain [27]. However, the kiwi retina has a higher proportion of rods than cones which is consistent with adaptation to nocturnality [3]. Besides black/white vision mediated via rhodopsin (RHO), most birds have trichromatic or tetrachromatic vision, for which various additional opsins are responsible: OPN1LW (red), OPN1MW (green, RH2), OPN1SW (blue, subtypes SWS1, SWS2) [28]. We identified these genes in the kiwi assembly. The RHO gene in kiwi shows no interruption and no obvious function-impairing amino acid changes compared to other vertebrates. We were able to assemble only a partial sequence of the red opsin OPN1LW (transmembrane (TM) helix 7) and found no previously described deleterious amino acid changes within this region [29].

In the green opsin, OPN1MW, we identified a Glu134 to Lys substitution (relative position 3.49 in the Ballesteros and Weinstein nomenclature) in the highly conserved D/ERY motif of this rhodopsin-like GPCR. We confirmed this mutation in a second Apteryx mantelli individual, as well as in other kiwi species (Fig. 2). To determine whether the change is kiwi-specific we sequenced this domain of OPN1MW in other ratites, including the extinct moa. We found that Glu3.49 is 100 % conserved in all birds for which sequence was available and also in over 250 other vertebrate orthologs. Previous experimental analysis showed that mutation of Glu3.49 to Arg – another basic amino acid – results in a non-functional receptor protein [30]. Furthermore, the Asp or Glu in the D/ERY motif is also highly conserved in most other rhodopsin-like GPCRs and the identical mutation of Glu3.49 to Lys in the thromboxane A2 receptor, for example, prevents the receptor from being functionally expressed on the plasma membrane [31].

Fig. 2 Protein sequence comparison revealed substitutions of Glu3.49 to Lys (E/DRY motif) and Glu6.30 to Gly in kiwi OPN1MW (RH2) and kiwi OPN1SW, respectively. Both residues are 100 % conserved in all birds sequenced so far and over 100 publicly available sequences of other vertebrate OPN1MW and OPN1SW orthologs. To assure the OPN1MW-change is kiwi-specific additional ratites were sequenced, including different kiwi species and the extinct moa. Glu3.49 of the E/DRY motif and Glu6.30 at the N-terminal end of helix 6 are parts of an ‘ionic lock’ interhelical hydrogen-bond network which is highly conserved in many rhodopsin-like GPCRs. Nb – North Island brown kiwi, Ob – Okarito brown kiwi, Gs – Great spotted kiwi, Ec – Emeus crassus (Eastern moa), Pg – Pachyornis geranoides (Mappin’s moa), Chuck-will – Chuck-will’s-widow Full size image

Similarly, at the N-terminal end of TM6 in OPN1SW we identified a highly conserved Glu6.30 which is present in all bird orthologs sequenced so far, except for kiwi OPN1SW where Glu6.30 is substituted by Gly. Previous functional characterization has shown that mutation of Glu6.30 destabilizes the H-bond network resulting in constitutively active opsins and other rhodopsin-like GPCRs [32, 33]. A constitutively active opsin is functionally incapable of light signal transmission [34] and is therefore non-functional.

Besides these two functionally well-characterized positions, we identified several other amino acids substitutions in kiwi OPN1MW and OPN1SW. Further, tests for branch and branch-site specific ω values for OPN1MW and OPN1SW on the kiwi branch showed no evidence for positively selected sites in kiwi (Additional file 1: Note: Vision analysis), suggesting that the greater ω values for kiwi are likely due to loss of constraint on these genes. Hence these genes are likely to be drifting and, considering the fact that only 8 % of all inactivating mutations in GPCRs are stop codons while almost 65 % are missense mutations [35–37], the described loss-of-function mutations in OPN1MW and OPN1SW render color vision of kiwi, unlike for other sequenced ratites (Fig. 2), absent – at least for the green and blue spectral ranges.

We tentatively dated the opsin-loss-of-function event as an indicator of the timing of adaptation to the nocturnal niche. Assuming that the loss of constraint happened on the kiwi branch in a short period of time and changed the rate of selection, measured by the ω value, from the average over bird lineages (0.021 for OPN1MW and 0.014 for OPN1SW, Table 2) to the neutral ω value of 1, the loss of function was dated to 30–38 million years ago (Additional file 1: Note: Vision analysis), which places the event shortly after the arrival of kiwi in New Zealand [38].

Kiwi sensory adaptations – olfaction

Kiwi are unique among birds in having nostrils present at the end of their prominent beaks and have been reported to depend largely on tactile and olfactory senses for foraging [39]. To investigate whether the genome shows signs of olfactory adaptation in kiwi we assessed the numbers of olfactory receptor (OR) genes [40] and the diversity in the OR sequence [41].

The only previous approach to molecular characterization of the olfactory system in kiwi was based on PCR amplification of ORs with degenerate primers [42]. This allowed only a rough estimation of the number of ORs of 478 genes (95 % confidence interval 156–1,708 genes). PCR with degenerate primers only produces incomplete fragments of the genes and hence the accurate quantification of gene families with highly similar sequences, as in the case of ORs, is prone to over-estimation [43]. In contrast, de novo genome assembly facilitates a global assessment of the gene repertoire [44] and can therefore be used to provide a more accurate estimate of the OR repertoire. We thus annotated the OR genes in kiwi, as part of the entire membrane proteome, on the basis of putative functionality and seven transmembrane helices (7TM) (Additional file 1: Note: Olfactory receptor genes identification and annotation). The number of non-OR receptor families was comparable to other avian species, suggesting that the membrane proteome is well annotated in kiwi (Additional file 1: Table S7). This analysis revealed an initial set of 82 OR genes in the kiwi genome. However, ORs are highly duplicated across the genome and such regions could be prone to being overcollapsed during the assembly process. We therefore estimated the copy number of each annotated OR using a correction based on coverage. To obtain the correction factor for each OR, read-coverage in the OR region was divided by the genome-wide average coverage corresponding to its GC bin. Following this correction we estimated that up to 141 OR genes are present in the kiwi genome, of which 86 encode for full-length receptors while the rest are most likely pseudogenes due to frameshifts, premature stop codons, or truncations (Additional file 1: Note: Olfactory receptor genes identification and annotation). The estimated proportion of intact ORs among all OR genes in kiwi (61 %) is lower than previously reported for Apteryx australis [42] (78.6 %), but much higher than in zebra finch (38 %) [45].

Comparative analysis of the OR repertoire shows that the kiwi genome has both the α and the γ subgroups of type 1 OR genes, as reported for other bird genomes sequenced so far [45]. Unlike the majority of other birds analyzed so far, kiwi has a higher number of γ subgroup ORs. Gene family size estimates are highly dependent on genome quality [46] and continuous curation is ongoing even for well-annotated genomes: for example, in the chicken olfactory repertoire the number of annotated ORs changed by a factor of eight in two consecutive Ensembl releases (release 73 – 251 ORs and release 74 – 30 ORs). Further improvement of genome qualities, including kiwi, are therefore required for the identification of a complete set of ORs. Thus, a correlation between olfactory acuity and the number of ORs in different birds could be subject to error.

Phylogenetic comparison of OR repertoires suggest that γ ORs within bird and reptile genomes exhibit contrasting evolutionary rates. Tree topology suggests that γ ORs in a few birds and reptiles show species-specific clustering pattern (Fig. 3). This pattern was previously described in birds and it was suggested that these receptors have undergone adaptive evolution with respect to the occupied environmental niche [45]. However, a few γ ORs belonging to kiwi cluster with their reptilian counterparts, while some cluster basal to the clade containing most bird γ ORs (Fig. 3).

Fig. 3 Maximum likelihood (ML) tree constructed using full-length intact α and γ group olfactory receptors from 10 birds (chicken, zebra finch, flycatcher, duck, turkey, chuck-will’s-widow, barn owl, ostrich, tinamou, and kiwi) and two reptile genomes (anole lizard and Chinese soft-shell turtle). The ML topology shown above was cross-verified using the neighbor joining (NJ) method. Three Class A (Rhodopsin) family GPCRs from chicken genome, dopamine receptor D1 (DRD1), dopamine receptor D2 (DRD2), and histamine receptor H1 (HRH1) were used as the out-group (shown as non-olfactory receptors). The red dot indicates confidence estimates (% bootstrap from 500 resamplings, >90 % bootstrap support from both ML and NJ methods) for the nodes that distinguish α and γ ORs. The scale bar represents the number of amino-acid substitutions per site. The topology supports lineage specific expansions of γ group olfactory genes in the bird and the reptile species. Note, a few of the γ group ORs in kiwi cluster with reptilian ORs (highlighted by orange arrowhead), while some cluster basal to the clade containing bird ORs (highlighted by green arrowhead). The topology supports contrasting evolutionary rates within the analyzed γ ORs, as indicated by short (blue arc with arrowheads) and long branch lengths (pale orange arc with arrowheads). The inset shows the number of intact olfactory receptors in each species that are analyzed using the ML tree topology Full size image

Phenotypic diversity in olfaction is, in part, attributable to genetic variation with a wider range of odors thought to be detectable given more genetic variation [41]. Since the absolute number of ORs might be a poor predictor of olfactory abilities, we investigated the variation in the γ ORs sequence as a measure of the range of possible detectable odors. The average protein sequence entropy was calculated to check for variation within the γ-c clade in each species (Additional file 1: Note: γ-c clade OR within-species protein sequence entropy).

Previous studies have shown that Shannon entropy (H) analysis is a sensitive tool for estimating the diversity of a system [47, 48]. For protein sequence, H ranges from 0 (only one residue is present at that position in the multiple sequence alignment) to 4.322 (all 20 residues are equally represented in that position). Typically H ≤2 is attributed to high conservation [49]. H values in birds were in the range of 0.34±0.05 (zebra finch) to 1.11±0.12 (chicken). The average entropy in kiwi sequences was 1.23±0.15, significantly higher than all other bird species investigated (P value = 0.003 Wilcoxon Signed-Rank test, Additional file 1: Note: γ-c clade OR within-species protein sequence entropy). We conclude that overall the γ-c clade of ORs are highly similar in sequence, in accordance with previously published data [45]. However, since detection of a wider range of odors is correlated to genetic variation of ORs [41], the significantly higher H in kiwi ORs is suggestive for a broad odor acuity in this species in comparison to other birds.

Kiwi morphology

The most prominent phenotype of kiwi, lack of wings, has been linked to energy conservation [50] and to the limited resources in New Zealand in late Oligocene [51]. Like most ratites, kiwi are flightless, but the phylogenetic tree of Palaeognathae implies that this phenotype evolved several times independently in this order [38]. Unlike ostriches and rheas, that possess prominent wings, kiwi show only vestigial invisible wings, while moa lack even vestiges [52].

To determine whether we can identify the genetic basis for the extremely regressed wings in kiwi we annotated genes in the highly conserved signaling pathways related to limb development (Additional file 1: Note: Kiwi morphology analysis; Additional file 1: Figure S3). These include genes belonging to the FGFs, TBX cluster, HOX cluster (Additional file 1: Figure S4; Additional file 1: Table S11), WNT, SALL, and FIBIN genes, known to be responsible for limb and wing development [53] (Additional file 1: Table S12). Growth and transcription factors typically influence the development of both upper and lower limbs, while FIBIN is currently the only gene described to be exclusively involved in the development of the upper limb [53].

For these clusters of genes, we aligned corresponding orthologs and translated multiple alignments, which were then manually inspected. No insertions, deletions, and/or stop codons that would clearly disrupt the open reading frame could be identified in the inspected genes. Additionally, we found all 39 HOX genes expected for the Sauropsid ancestor [54] and investigation of regulatory sequences within the HOX clusters by phylogenetic footprinting showed no preferential loss of conserved DNA elements in Apteryx mantelli compared to Galliformes (Additional file 1: Figure S4; Additional file 1: Table S11).

To detect signs of different evolution in kiwi wing and tail developmental genes we performed a selective constraint analysis using the CODEML branch test (Additional file 1: Note: Selection analysis on limb development genes; Additional file 1: Table S12). Of these genes FIBIN was the only gene that showed signals of positive selection on the avian tree including chicken, turkey, and zebra finch (Additional file 1: Figure S5). Three sites with signs of positive selection that were 100 % conserved in the other species show a different amino acid in kiwi: exchanges of Ser136Ala, Gln148Arg, and Phe162Cys (positions are relative to the mouse Fibin coding sequence). The functional relevance of these substitutions is unclear and needs to be studied when experimental tests of FIBIN function become available.

Since no obvious alterations could be found in the coding sequences of genes involved in developmental processes, which could explain the regressed-wing morphology of kiwi, we further analyzed ultra-conserved non-coding elements (UCNEs) (Additional file 1: Note: Ultra-conserved non-coding elements analysis). UCNEs are defined as DNA non-coding regions of ≥95 % sequence identity between human and chicken, longer than 200 bp [55]. The majority of UCNEs cluster in genomic regions containing genes coding for transcription factors and developmental regulators [56] and experimental studies in transgenic animals have shown that some of these sequences can act as tissue-specific enhancers during developmental processes [57]. Of the 4,351 UCNEs annotated in UCNEbase [55], 19 showed more than the expected 5 % sequence variation as defined in the database [55] (Additional file 1: Table S13). Among these, four were related to HOXA, TBX2, Sp8, and TFAP2A genes which have been previously described in limb development pathways [53, 58, 59], suggesting that changes in non-coding elements could be involved in kiwi’s loss of wings.