Genome sequencing of the African wild dog

Based on alignment with the domestic dog genome, we have sequenced ~90% (5.8× mean read depth) of the Kenyan individual’s genome and ~93% (5.7×) of the South African Lycaon individual’s genome. We identified 16,967,383 autosomal sequence variants (including 14,360,480 single nucleotide polymorphisms [SNPs] and 2,606,903 indels) separating our Lycaon genomes from the domestic dog (Canis familiaris) genome [GenBank: CanFam3.1] (Additional files 1 and 2) [15]. Of these, 1,092,450 (781,329 SNPs and 311,121 indels) were polymorphic in the African wild dog. The remaining 15,874,933 autosomal sequence variants (13,579,151 SNPs and 2,295,782 indels) were monomorphic in the two Lycaon individuals. We identified 717,870 X-chromosomal variants (619,606 SNPs and 98,264 indels), of which 32,801 (23,001 SNPs and 9,800 indels) were polymorphic and 685,069 (596,605 SNPs and 88,464 indels) were monomorphic in the African wild dog. Additionally, we sequenced the Kenyan and South African wild dog mitochondrial genomes to depths of 943× and 1021×, respectively. We annotated the Lycaon autosomal and X-chromosomal sequences using the domestic dog genome annotations [GenBank:CanFam3.1.81] and the mitochondrial genome [GenBank:NC_002008.4] and MSY sequences [GenBank:KP081776.1] using their reference sequence annotations [15–17].

Lycaon demographic history

We analyzed the two wild dogs’ autosomal population histories using PSMC (Fig. 2) [18]. Both genomes exhibited a strong reduction in N e starting 700,000 years ago from maximum N e s of ~28,000 (Kenyan) and ~35,000 (South African) individuals and leveling off 200,000 years ago at N e s of ~7,000 individuals each. Analysis of the Kenyan individual’s X - chromosomal history using PSMC showed a similar pattern (Fig. 2). N e fell from a maximum of ~40,000 individuals 600,000 years ago to ~7000 individuals 200,000 years ago. This N e reduction may represent a past population bottleneck e.g. [19] or lineage splitting e.g. [20]. Further genomic analysis of individuals from across the Lycaon range, especially those from larger founder populations, would help clarify the cause of this pattern.

Fig. 2 Reconstruction of the Lycaon individuals’ autosomal and X-chromosomal demographic history using the pairwise sequentially Markovian coalescent. Initial results are plotted using dark-colored curves, with the bootstrap replicates plotted in lighter hues of the corresponding colors Full size image

PSMC analysis of the Kenyan X chromosome revealed a secondary reduction in N e started 70,000 years ago to a final N e of ~2000 individuals 10,000 years ago. We are unable to infer more recent population history accurately using this method due to the limited numbers of available mutations in the short time frame [18]. Further research using historical museum Lycaon specimens would fill in this temporal gap e.g. [21]. While our reconstructions were robust to coverage variation (see ‘Demographic history reconstruction’ below), higher coverage genomes would also increase the resolution of population history reconstructions [20].

Candidate selected processes and genomic regions

Based on the domestic dog genome annotations, SnpEff 4.1 L predicted that the identified Lycaon autosomal and X-chromosomal sequence variants would cause 34,001,288 and 36,362,161 genic effects in the Kenyan and South African individuals, respectively (Additional files 1 and 2) [22]. The majority of sequence variant effects fell within introns (Kenyan: 62.604%, South African: 63.383%) and intergenic regions (Kenyan: 23.081%, South African: 23.162%). To discover candidate genes that have diverged since Lycaon’s divergence with Canis, we identified Lycaon genes that contained missense mutations and stop codon gains using SnpSift 4.1 L [23]. We identified 15,611 (15,565 genes with missense mutations and 799 with stop codon gains) Kenyan and 9793 (9440 genes containing missense mutations and 741 with stop codon gains) South African wild dog candidate divergent genes. 9506 divergent genes (9159 with missense mutations and 653 with stop codon gains) were found in both wild dogs. These divergent genes were determined by 47,059 Kenyan sequence variants (sequenced at 8.6× mean coverage) of and 27,893 South African variants (6.0× mean coverage). 25,149 variants were shared between the two Lycaon individuals. These variants were very homozygous (Kenyan: 95%, South African: 95%), which suggests that they are the result of divergence between the Lycaon and Canis clades, rather than more recent variants arising within Lycaon.

We annotated the candidate divergent genes’ functions using DAVID 6.7 with domestic dog (option “Canis lupus”) as the genomic background [24]. We found 76 and 29 enriched processes in the Kenyan and South African individuals, respectively (Additional files 3 and 4). We filtered these terms with a Benjamini-Hochberg false discovery rate of 0.05 [25]. After filtration, seven terms (‘Complement and coagulation cascades’, ‘ECM-receptor interaction’, ‘Neuroactive ligand-receptor interaction’, ‘Hematopoietic cell lineage’, ‘Lysosome’, ‘ABC transporters’, ‘Aminoacyl-tRNA biosynthesis’) were significantly enriched in the Kenyan individual. ‘Olfactory transduction’ was significant in the South African individual. The differences in significant terms may reflect population-specific selection pressures on wild dogs. Further population-level investigation is needed to determine the roles these pathways play in Lycaon evolution.

In order to identify regions of low and high diversity, we calculated the numbers of segregating SNP sites across the Lycaon autosomes in 100,000 bp non-overlapping windows using VCFtools 0.1.15 (Fig. 3) [26, 27]. By averaging over large genomic windows, we limited the effects of sequencing errors and allelic drop-out due to low sequencing coverage. We identified 768,416 segregating Lycaon SNPs (Kenyan: 398,891 SNPs, South African: 434,911 SNPs). We observed individual-specific regions of low diversity (<10 segregating SNPs/100,000 bp). The Kenyan individual had runs of low diversity (contiguous regions of low diversity at least 5 million bp long) on chromosomes 4, 6, 7, 12, 15, 21, 27, and 30, while the South African individual had runs of low diversity on chromosomes 1, 5, 8, 12, 14, 19, 27, 29, 30, 34, 36 and 38. These low-diversity regions may be the result of inbreeding and/or population-specific natural selection. Due to the long lengths of these low-diversity runs, encompassing numerous genes, we are not currently able to link low diversity levels to selection on individual genes. These results are not surprising since both populations are recently re-established, either by natural recolonization (Laikipia, Kenya) or deliberate reintroduction (KwaZulu-Natal, South Africa). Previous genetic investigations using microsatellites and mitochondrial DNA found some evidence of rare inbreeding in wild dogs from the Greater Limpopo Transfrontier Conservation Area and KwaZulu-Natal [10, 28]. However, free-ranging wild dogs strongly avoid incestuous matings [10]. For instance, at KwaZulu-Natal, Becker et al. [10] observed only one of six breeding pairs being more closely related than third-order kin. While our chromosomal diversity data do not permit us to discern between inbreeding and/or population-specific natural selection, the possibility of inbreeding is, therefore, concerning from a conservation standpoint. Additional population-level data are required to determine the causes and effects of these low-diversity regions.

Fig. 3 Segregating autosomal SNP sites across the Lycaon genomes. Chromosomes are distinguished by color and separated by black lines. The number of segregating SNPs per 100,000 bp window is plotted on the y-axis in logarithmic scale. We identified population-specific regions of low diversity in both the Kenyan (chromosomes 4, 6, 7, 12, 15, 21, 27, and 30) and South African (chromosomes 1, 5, 8, 12, 14, 19, 27, 29, 30, 34, 36 and 38) individuals. There are also highly variable regions on chromosomes 3 and 16 in both individuals, chromosome 26 in the Kenyan individual, and chromosome 19 in the South African individual Full size image

We also identified regions of high diversity (>200 segregating SNPs per 100,000 bp) on chromosomes 3 and 16 in both individuals and chromosome 19 in the South African individual (Fig. 3). The high-diversity regions comprised 0.44% of the total segregating SNPs in both individuals and included 1138 Kenyan and 970 South African segregating SNPs on chromosome 3,646 Kenyan and 704 South African segregating SNPs on chromosome 16, and 249 South African segregating SNPs on chromosome 19. Using the UCSC Genome Browser [29], we scanned the genome within 100,000 bp upstream and downstream of these high-diversity regions for known genes mapped to CanFam3.1. These included FER, FKBP3, SNRPF, and PJA2 on chromosome 3, HUS1 and PKDIL1 on chromosome 16, and RNF150 and TBC1D9 on chromosome 19. These genic regions may have undergone positive or relaxed selection since the Lycaon/Canis divergence or may represent chromosomal duplications. Future functional analyses will determine which roles (if any) these genes play in Lycaon evolution.

Positive selection on the mitochondrial genome

Each of the 13 protein-encoding mitochondrial genes from the two novel Lycaon mitochondrial genomes were aligned against the corresponding genic sequences from the domestic dog mitochondrial reference sequence and the one publically available near-complete Lycaon pictus mitochondrial genome sequence [GenBank:KT448283.1] [16, 30]. We calculated ratios of non-synonymous substitutions per non-synonymous site to synonymous substitutions per synonymous site (dN/dS) and tested for positive selection using the codon-based Z-test in MEGA6 [31]. We found evidence for positive selection on 12 of 13 genes (COX1, COX2, COX3, ATP6, ATP8, ND1, ND2, ND3, ND4, ND4L, ND5, CYTB; Additional file 5). To determine whether selection occurred primarily on the Lycaon or Canis branches, we reran these analyses excluding the domestic dog sequence. We found evidence for positive selection on all 13 Lycaon mitochondrial genes (Additional file 5).

To confirm branch specific signatures of selection, we aligned the Lycaon CYTB sequences against published canid taxa [GenBank: AY656746, EU442884, GU063864, JF342908, KF646248, KT448273–4, KU696390, KU696404, KU696408, NC_008093, NC_013445] using MAFFT 7.222 [32] as implemented in Geneious 10.0.2 (Biomatters, Ltd., Auckland, New Zealand). We then generated a phylogenetic tree with FastTree 1.0 [33]. Using the ‘codonml’ algorithm in PAML 4.8 [34], we performed pairwise comparisons on CYTB codon data and compared the likelihood of the alternate (non-fixed omega values) or null hypotheses (fixed omega values). A likelihood ratio test was calculated from ln values obtained from these comparisons to determine where evidence of selection was occurring in canids. We found significant evidence (p ≤ 0.001) of positive selection between African wild dogs and coyotes (Canis latrans) (statistic: 128.88) and between Lycaon and Cuon (statistic: 81.48). We also found branch-specific selection between the Kenyan and South African Lycaon individuals (statistic: 92.54).

The 13 candidate selected Lycaon mitochondrial genes are involved in the electron transport chain and adenosine triphosphate synthesis. This suggests natural selection on Lycaon metabolic processes e.g. [35, 36], which is likely given their unique antelope hunting strategies and diet. Moreover, these results are consistent with African wild dogs’ very high metabolic rate and hunting energy expenditure in comparison to domestic dogs [37].

Pelage genes

We extracted CDS corresponding to 11 genes involved in canid coat color (agouti signaling peptide [ASIP], β-defensin 103 [DEFB103A], melanocortin 1 receptor [MC1R], melanophilin [MLPH], microphthalmia-associated transcription factor [MITF], premelanosome protein [PMEL], tyrosinase-related protein 1 [TYRP1]) and type (fibroblast growth factor 5 [FGF5], keratin 71 [KRT71], R-spondin 2 [RSPO2]) [38–46] using Geneious 9.0.4. In cases where there were multiple isoforms or CDS annotations (DEFB103A, FGF5, MITF, PMEL), we chose the longest variant alignable to the domestic dog CDS reference sequence to maximize detection power. To detect positively selected genes, we identified non-synonymous and synonymous SNPs compared to the domestic dog sequence and calculated the non-synonymous/synonymous (N/S) ratio (Additional file 6).

ASIP and PMEL had elevated N/S ratios suggestive of positive selection (5.00 and 9.00 respectively). Lycaon PMEL also had a stop codon gain at amino acid 341, suggesting selection at this locus. Additionally, we found a threonine insertion at amino acid 371 in the Lycaon MLPH gene and a six amino acid deletion corresponding to domestic dog amino acids 186–191 in the Lycaon MITF gene. To further characterize these four candidate genes, we compared the Lycaon coding sequences against all publically available canid sequences using BLAST+ 2.5.0 [47]. None of the Lycaon ASIP, PMEL, MITF, and MLPH CDS haplotypes have been identified in other canids previously. Lycaon ASIP shares 99% nucleotide identity, but only 96% amino acid identity, with domestic dogs. Wild dog PMEL shares both 99% nucleotide identity and amino acid identity with domestic dogs. Excluding the six amino acid deletion, Lycaon MITF haplotypes had >99% nucleotide and amino acid identity with domestic dogs. Lycaon MLPH haplotypes had 98% nucleotide identity and 97–98% amino acid identity (excluding the amino acid insertion) to domestic dogs.

These four genes are strong candidates to explain the characteristic Lycaon pelage. Mutations in ASIP alter relative production of eumelanin and pheomelanin, resulting in lighter and darker hair colors, in numerous species including domestic dogs [41]. Variants in PMEL cause merle patterning in domestic dogs [40]. MITF is associated with white-spotting phenotypes in domestic dogs and causes coat color variants in laboratory mice (Mus musculus). MITF variants are also associated with deafness, small eye size, and poor bone resorption in mice [46]. Nevertheless, further laboratory assays (such as transgenic experiments) are needed to confirm that these identified variants are functional and to determine their phenotypic effects. Furthermore, our data do not permit us to distinguish between positive selection on the Lycaon and Canis branches (e.g. coat variation associated with dog domestication and breed development).