Evolutionary history

To provide an accurate evolutionary framework for the comparative genomic analyses of AWDs relative to other wolf-like canids, we first reconstructed the phylogenetic relationships among species of Canis, Cuon, and Lycaon. Species tree analysis using ASTRAL-III19 produced 102 distinct gene trees from 8,117 25 kb alignments sampled from 38 autosomes (~203 Mb), with the final species tree being the consensus topology of the 100 replicates from the analysis (Fig. 1). The resulting topology shows the Ethiopian wolf basal relative to the rest of Canis, with dhole and AWDs as successive sister lineages having diverged earlier (Fig. 1). Within Canis, the golden jackal and African wolf are independent lineages sister to the clade comprised of the coyote, gray wolf and domestic dog. These patterns of relationship are consistent with previous analyses based on nuclear DNA sequences12,15,20.

We next estimated the age of divergence of the AWD lineage using the inferred tree and two methods. First, we computed average genomic divergence times using MCMCTree21, in which two fossil priors were used to calibrate nodes of the phylogeny (see Methods for details). Our estimates suggest an average genomic divergence of 3.91 mya (95% HPD = 3.30–4.50 mya) between Lycaon and the clade containing Cuon and Canis, approximately two million years earlier than the earliest fossil evidence recorded for the lineage1. This more ancient divergence is feasible because average genomic divergence captures not only the time since species divergence, but also the time for lineages to coalesce in ancestral populations, which are known to be very large in canids18. To better address this discrepancy, we thus jointly inferred a complete demographic model using G-PhoCS22, taking into consideration gene flow between 44 possible pairs of branches in the phylogeny (see Methods for details). Indeed, the species divergence time for Lycaon is then estimated at 1.72 mya (95% HPD = 1.70–1.74 mya; Table S2 and Fig. 1), which is much closer to estimates from both the fossil record and recent analyses of whole-genome data1,12,17. Importantly, while our inferred model suggests prevalent gene flow between divergent canid species, Lycaon is inferred to be largely isolated from genetic exchange with other canid lineages. This isolation provided more time for unique genomic adaptations to evolve.

African wild dogs are uniquely enriched in positively-selected genes related to primary cilia

To identify positive selection events that occurred on protein-coding genes during the evolution of the AWD lineage, the sequencing reads for four AWDs and eight other canid species were mapped to the domestic dog reference assembly (CanFam3.1) to take advantage of the high-quality annotation of the dog reference genome (Table S1). The mapping process was based on the GATK Best Practices pipeline (Methods). For almost all canids, we found that more than 97% of reads successfully mapped to the dog genome. The only exception was a low coverage (12.1x) AWD that had ~93% of the reads mapped to the dog. To avoid potential reference bias from aligning reads to a different species, we further confirmed our results on three recently published de novo AWD reference genomes13.

After calling genotypes with SAMtools and filtering with GATK 3.723 as well custom python scripts, we identified ~19,000 orthologous protein-coding genes. Among these genes, 18,327 passed our quality filters (no internal stop codon, permissible length, and longest transcript) and were used to identify genes under positive selection using the branch-site model21. This test was conducted on each multi-species gene alignment generated with PRANK v.15080324 and using the topology in Fig. 1 as the guide tree. AWD, dhole, and gray wolf were specified as different foreground branches. A gene was considered positively selected if the value obtained from the likelihood-ratio test comparing a model where the ratio of nonsynonymous substitutions (dN) to synonymous substitutions (dS) was greater than 1 (dN/dS > 1) against a null model where dN/dS = 1. Significant differences were determinated with a chi-square distribution with 1 degree of freedom25.

One issue with the branch-site model is that it is highly sensitive to alignment errors. Therefore, we conducted an extensive filtering process on our data, first using SWAMP26 and then visually inspecting the alignments of genes with p < 0.05. After masking regions with unusual enrichment of amino acid changes, we conducted three independent runs for each foreground branch and gene family, and retained the one with the best likelihood-ratio score of each run27. This guaranteed that large log-likelihood ratios depicted from the branch-site model were not the result of convergence problems of the test27. Another concern in the exploration for genes under positive selection is the role of multiple nucleotide changes28. Although these changes may occur simultaneously, the branch-site model assumes that they occur in a successive manner. The result will be unrealistically high likelihood-ratio scores at a codon were nucleotide changes occurred at the same time28. Among our 12 candidate genes (Fig. 1), we identified three genes with multiple nucleotide changes (CC2d2a, TMEM67, PAH) and thus support for the positive selection on these genes should be interpreted with caution. Although it is challenging to elucidate the order of multiple nucleotide changes, our main conclusions are not affected even if we take a conservative approach and do not include such genes in the analysis.

We found 43 genes (Table S3) that were significant at a false discovery rate (FDR) of 20%, after conducting multiple hypothesis testing of 18,327 genes along the three foreground branches (AWD, dhole, and gray wolf). Since only a few genes passed the genome-wide significance threshold, we used all genes with a p-value ≤ 0.01 to test for enrichment of gene functions with G-profiler29. Ensembl identification of genes with a p-value ≤ 0.01 were input as query lists and the 18,327 total gene set was used as the background list. We allowed a minimum of two genes to overlap between query genes and genes belonging to a gene ontology (GO) term. This resulted in genes with specific signals of selection in AWDs overrepresented in terms related to primary cilia (Fig. 1), which are significantly involved in coordinating signaling pathways during mammalian development30.

The disadvantage of common tests for gene ontology enrichment like G-profiler29 is that an arbitrary significance cutoff must be specified, and data below that cutoff is expected to be lost. Therefore, instead of just focusing on some outlier genes with high likelihood ratios, we used polysel31 to conduct analyses of polygenic selection across the full set of tested genes. We looked for pathways that were overrepresented with genes having low or moderate likelihood ratios more than would be expected by chance. This model takes likelihood ratio test statistics estimated from the branch-site test and finds weak to moderate polygenic selection within biological pathways. Then, p-values are generated from an empirical null distribution obtained by randomly sampling gene sets in specified pathways. Using this approach, we also found that the “primary cilia” GO category was significantly enriched with a variety of levels of positive selection (Fig. 1b). To rule out the possibility that this GO category could be enriched just by chance due to the large number of genes, we tested for overrepresentation of significant genes in primary cilia in the gray wolf and dhole and found no evidence of enrichment. Finally, to account for possible errors generated from mapping short reads of the AWD to a different species (domestic dog), we verified every mutation reported in this study (e.g., nucleotide and amino acid deletions and substitutions) with a consensus sequence of the three recently published de novo AWD reference genomes (NCBI Bioproject PRJNA488046; Table S1)13.

Digit reduction through apoptosis

Two developmental mechanisms of digit reduction from the ancestral five-digit morphology have been characterized in mammals. One is related to a complete absence of a digit during development through regulation of the transduction of sonic hedgehog (SHH) signaling and the other involves apoptosis of digits during early development32. The loss of the first digit, as found in AWDs, has been shown to be independent of SHH signaling33. Therefore, we focused our analyses on genes associated with apoptosis pathways, particularly those related to digit development.

We used the Variant Effect Predictor annotation tool34 to identify amino acid-changing substitutions unique to the AWD that could have a significant impact on the associated proteins but will be ignored by the branch-site model test. We identified 403 genes with both high and moderate impact. High impact indicates a disruptive substitution that could cause truncation, loss of function, or nonsense-mediated decay of a protein whereas moderate impact indicating a non-disruptive substitution that might change protein functional efficiency. The substitutions we identified were categorized as in-frame indels, frameshift variations, and stop codon gains (see Methods for details).

Strikingly, we found 596 genes with in-frame-deletions, with moderate impact, unique to AWDs. These amino acid deletions were tested for enrichment of GO categories using G-profiler29. We found that this type of mutation was overrepresented in digit-loss categories with a false discovery rate of 5%. Specifically, the term, “abnormality of the thumb,” was over-represented by the genes FANCC, FANCD2, and FANCM, which are associated with the Fanconi anemia (FA) pathway35. We also observed an overrepresentation of 33 genes with frameshift variation mutations in the olfactory receptor (OR) GO category. Twenty eight of these genes were also enriched in the olfactory transduction KEGG category, in accordance with the dynamic evolution of OR gene families36.

Our results implicate amino acid deletions in genes associated with the FA pathway in the loss of the first digit in AWDs through an apoptosis pathway that typically directs interdigital cell death (Fig. 2). Specifically, in the development of the ancestral five-digit foot, the primary function of apoptosis is to eliminate excessive cells on the interdigital webs which trims the dimension of the digit32,37. When this digit individualization occurs during development, apoptosis has only a small effect on digit dimension38. Studies have shown that FA proteins form a complex with the CtBP1 protein that results in the repression of the DKK1 gene. As expression of this gene is restricted to the interdigital area during the early development of digits, its repression prevents apoptosis from extending into the digits37,39. When the FA-CtBP1 complex fails to properly form, inhibition of DKK1 expression is removed, thereby permitting apoptosis to extend to the digits. We suggest that the amino acid deletions in FANCC, FANCD2, and FANCM found exclusively in AWDs may reduce the binding affinity of the FA protein complex to CtBP1, thus allowing the loss of the first digit through apoptosis (Fig. 2). Our findings are strongly supported by the fact that the mutations we have identified in the FA genes in AWDs are responsible for a condition commonly associated with the absence of the first digit in humans35,40 and expression of DKK1 is related with absence of the first digit in mice41. Moreover, apoptosis as a mode of evolutionary digit has also has been shown in horses, jerboas, and camels32.

Figure 2 Apoptosis of the first digit in the African wild dog. (a) Primitive condition of five digits in the Canidae; note the small first digit in gray wolf called the “dewclaw”. The absence of the first digit is shown in the African wild dog (AWD). (b) Schematic representation of digit reduction and separation of digits. In the normal five-digit pattern scenario shown at the top, apoptosis (blue circles) is restricted to interdigital regions. The first digit, enclosed by a rectangle, is protected from apoptosis by FANCC and FANCM that form a complex with CtBP1 and repress DKK139. The stability of this complex is regulated by the FANCD2-FANCI association100. In the scenario shown below, amino acid deletions (red stars) observed in FA genes may reduce both the affinity of FA genes to CtBP1 and the stability of the protein complex. Consequently, the FANCC-FNACM-CtBP1 complex is not formed and DKK1 is not suppressed (indicated by empty arrow). Deficiency of the FA complex activity increases DKK1 expression. As a result, apoptosis expands to the first digit; note blue circles (apoptosis) on the region of the thumb. (c) Effect of the mutations in the FA and DKK1 genes in humans (Reprinted from ref.35© 2009 with permission from Elsevier) and mice (Reprinted from ref.41 © 2004 with permission from Elsevier). (d) Multiple sequence alignments of mammalian FANCD2, FANCM, and FANCC amino acid sequences showing deletions specific to AWDs. Five AWDs are shown; “RWK481” and “SAMN04312208” are individuals from Kruger National Park, South Africa; “CN3669” and “SAMN04312209” are individuals from Kenya and “Dnv Lycaon pictus” is the consensus sequence of three de novo reference AWD genomes13. The top panel also shows a 3D protein-model of FANCD2 with the location of the observed amino acid deletion, which is important for the association with FANCI. Full size image

Hypercarnivory through sonic hedgehog (SHH) signaling

The primary cilium is a hair-like structure that projects from the surface of cells and serves as a sensory organelle, transmitting signals from the extracellular space into the nucleus42 (Fig. 3a). This structure is located at the dental epithelium and mesenchyme during the formation of dental cusps43,44. As the primary cilium regulates numerous signaling pathways necessary for odontogenesis, ciliary defects can alter the process of cusp patterning45.

Figure 3 Possible mode of evolutionary molar cusp modification in the African wild dog through SHH signaling. (a) The figure at the top right shows genes found to be enriched in primary cilium with polysel; genes above dashed line are those found to be enriched with G-profiler as well (see Methods for details). Only gene names with known function in tooth development are shown. The figure at the top left shows a schematic of the primary cilium and the role of candidate genes in SHH transduction. SHH, represented by red circles, reaches the ciliary membrane. Then, transport of transcriptional factors GLIs such as GLI1, through the axoneme, is promoted by WDR35/IFT121 as well as the efficient docking of the axoneme in the plasma membrane, which is conducted by CC2d2a and TMEM67. Together, these components of primary cilium cause rapid accumulation of GLI into the basal body. Ultimately, GLIs will enter the nucleus and promote the expression of SHH-dependent genes. In the case of CREBBP, the observed amino acid deletion (red star) may increase the affinity of this cofactor with smad genes and increase the expression of TGF-β and BMP dependent genes involved in molar cusp development. (b) Left-bottom figure showing the single cuspid talonid of the lower first molar (carnassial) in the AWD as opposed to a bi-cusped talonid carnassial in the gray wolf. Right-bottom figures show the effect that mutations in CREBBP have on humans53 (reprinted by permission from Wiley-Liss, Inc.; American Journal of Medical Genetics53 © 2007); talon cusp condition is observed; cusps that protrude from the anterior region of incisors on the left and extra cusps on molars on the right. (c) Amino acid alignment of the glutamine-rich region of CREBBP for 51 species of mammals showing the deletions specific to AWDs and the dhole. Five AWDs are shown; “RWK481” and “SAMN04312208” are individuals from South Africa; “CN3669” and “SAMN04312209” are individuals from East Africa; “Dnv Lycaon pictus” is the consensus sequence of three de novo reference AWD genomes13. The ancestral condition in canids is inferred as 15 glutamine residues. Full size image

We found AWD-specific amino acid substitutions in components of the primary cilium, specifically the genes WDR35, TMEM67, CC2d2a and GLI1. These substitutions suggest a combined regulatory effect, through GLI transcription, on SHH-dependent genes (Fig. 3a). Specifically, the role of primary cilia depends on the retrograde (to the basal body region) and antegrade (to the tip of the ciliary membrane) intraflagellar transport (IFT) of the transcriptional factor GLI143,46 (Fig. 3a). This transportation occurs through a microtubule structure called the axoneme. The efficacy of GLI mobilization through the axoneme is dependent upon protein-mediated transport by WDR35 (also known as IFT121) as well as the proper docking of the axoneme into the basal body region, which is mediated by TMEM67 and CC2d2a proteins43,47,48 (Fig. 3a). Mutations within genes encoding primary cilium components alter mobilization of GLI to the basal body, and hence result in gain or loss of GLI function in this region. An increase in GLI will result in the gain of SHH phenotypes such as growth of molar cusps. In contrast, a decrease in GLI will cause loss of SHH phenotypes49 such as inhibition of molar cusp development44. We suggest that the amino acid changes observed in AWDs in WDR35, TMEM67, CC2d2a, and GLI1 may cause rapid transportation of GLI to the basal body, and consequently overexpression of SHH target genes43,50. Variants in the candidate genes reported in this study have been associated with abnormal tooth shape and may thus be related to the exaggerated hypercarnivorous dentition of AWDs, which includes the development of a lower trenchant carnassial43,50,51. Most of the amino acid sites from these genes were unique to AWD even when compared to other placental mammals and marsupials, including two South American canids, Speothos venaticus and Chrysocyon brachyurus (Chavez et al., unpublished data, Fig. S2). Also, specific sites in CC2d2a had a relatively high probability of being affected by positive selection as suggested by a Bayes empirical Bayes test (BEB > 0.90). Although nucleotide changes were not recorded for GLI1 with BEB, mutations in this gene were located within significant windows as suggested by the \(\frac{\theta }{D}\) estimate (p < 0.01; see Methods section) as estimated from the HKA-like test52, after verifying that the amount of information within windows of 25 kb in length was not driving higher differences between diversity and divergence (Fig. S3). Our results suggest that the transduction of SHH through primary cilium may have promoted the modification of a primitive molar with a posterior crushing basin into a trenchant sectorial single cusp in AWDs (Fig. 3b).

Another gene associated with spatial patterning of the tooth, CREBBP, was found to have an amino acid deletion in AWDs. Interestingly, we also observed a two amino acid deletion in the same region in the hypercarnivorous dhole, a canid that also possesses a lower trenchant carnassial heel (Fig. 3b)4. CREBBP is a strong candidate for the modified carnassial observed in these two hypercarnivorous canids. Notably, this gene is associated with abnormal numbers or features (talon cusps) of molar cusps in humans53 (Fig. 3b). Even though CREBBP is ubiquitously expressed, the shared amino acid deletions were in the glutamine-rich region of the protein (Fig. 3c). This region is highly conserved in eukaryotes and serves as the binding site for Smad proteins54,55. These proteins are transcriptional factors that regulate the expression of genes located at the dental lamina and mesenchyme, and play important roles in regulating differentiation and proliferation of cells during tooth development56. Specifically, Smad transcription factors enter the nucleus and bind to the coactivator CREBBP and regulate the expression of target genes (Fig. 3a). We suggest that the observed amino acid deletions observed in CREBBP in AWDs and dholes may alter the formation of the Smad-CREBBP complex. This ultimately will have a regulatory effect on the expression of TGF-β and BMP dependent genes56. In Smad knockout mice, dental cusp formation is affected56. Our results suggest that the formation of the trenchant heel of AWD, initially guided by primary cilium components, may be reinforced by a regulatory effect of Smad transcriptional factors on genes involved in tooth development56. Also, our findings suggest that this may be the regulatory pathway that also determines the blade-like cusps in the dhole.

The shared amino acid deletions in the CREBBP gene observed in AWDs and dholes (Fig. 3c) could have arisen through different routes. First, the changes could have evolved independently in each species. Alternatively, the changes could have resulted from past adaptive introgression between species, as has been found among species within the Panthera lineage57. To account for the latter possibility, we conducted tests of admixture in the context of the evolutionary and demographic history of the sampled canids22. Results from models with and without gene flow among different lineages suggested a history of extensive admixture among species of Canis (Supplementary Discussion; Table S2 and Fig. 1), consistent with recent findings12. However, the genomic data for AWDs and dholes suggest little or no gene flow between these lineages and those leading to Canis species. Although the lack of post-speciation gene flow between the dhole and AWD could suggest that the amino acid deleting changes in CREBBP evolved independently, we do not entirely rule out the possibility of shared post-divergence ancestry. Particularly, we found a low proportion of divergent sites between the AWD and dhole (only 9 divergent sites out of 3,000 flanking sites analyzed) around CREBBP that suggest a plausible shared ancestry. Regardless of the mode of evolution of CREBBP, the amino acid deletions shared between the AWD and the dhole may reflect similar selective forces favoring hypercarnivory4.

Positively-selected genes associated with AWD pelage coloration

Assuming that positive selection could have occurred in genes associated with the unique pelage coloration and patterning seen in AWDs, we tested a set of 151 genes that have been shown to be involved in mammalian pigmentation58,59,60,61,62 using the branch-site test21. We found six genes with AWD-specific signals of positive selection, three of which are known to have relevant function in coat coloration: MYO5A, HPS6, and PAH (Fig. 4a). The resulting amino acid changes were confirmed in a consensus sequence (“Dnv Lycaon pictus” in Fig. 4b) of three high-coverage de novo AWD genomes. Moreover, most amino acid changes in coat color were unique to AWDs when compared to other species of placental mammals, marsupials, and monotremes (Fig. 4b).

Figure 4 Candidate genes and possible mechanisms associated with coat pigmentation and patterning in African wild dogs. (a) Plot showing Q-values depicted from the branch-site test after a multiple hypothesis correction of 151 different coat color genes with the AWD as the foreground branch (see methods). Significant genes (Q-value < 0.20) are shown above the horizontal blue line; for illustration purposes, Q-values shown on the Y-axis were transformed to -log10. Only names of genes with relevant functions are shown. Illustration at the bottom showing coat color in the AWD; genes are shown in red circles and are placed on the type of color they regulate (e.g., PAH regulates brown and yellow colors). (b) Amino acid alignment of coat color candidate genes for an average of 49 species of mammals showing changes specific to AWDs. Five AWDs are shown: “RWK481” and “SAMN04312208” are individuals from Kruger National Park, South Africa; “CN3669” and “SAMN04312209” are individuals from Kenya and “Dnv Lycaon pictus” is the consensus sequence of three de novo reference AWD genomes13. (c) A schematic showing the role of candidate genes in color pattern. (I) shows that black coat color will need both proper cargo of melanin by HPS6 to melanosomes and its transport by MYO5A to the bulk of the hair. (II) In the case of white blotches or spots, they could be the result of either proper transport of melanosomes but containing no melanin or melanosomes containing melanin that fail to reach the bulk of the hair. The figure on the bottom-right shows the effect of MYO5A mutations on the pelage of a domestic mouse69. (d) 3D model of the PAH protein depicted with SWISS-model (see methods) that shows a mutation on the cofactor of the enzyme (B\({{\rm{H}}}_{4}\)). The scheme at the bottom illustrates the role of PAH and its cofactor. The right figure shows a gradual recovery of the black color of yellow mice (PKU) with deficiency of PAH65 (Reprinted by permission from Springer Nature: Springer Nature, Gene Therapy ref.65 © 2006). Full size image

Considering that positively-selected substitutions related to the pigmentary system might fall outside protein-coding regions and could have also occurred during recent evolutionary history, we used the Hudson-Kreitman-Aguadé (HKA) test to examine regions with high divergence and low diversity for signals of selective sweeps. To conduct this test, we first called genotypes using Haplotype Caller in GATK 3.823. We then calculated polymorphism within 100 kb windows among the four AWDs genomes that were mapped to the domestic dog reference genome. At the same time, per-site divergence was calculated between the AWD with the highest coverage (LPI_RKW 4881) and the domestic dog. Considering that demographic effects are expected to have an effect across the entire genome, we used empirical p-values to identify loci with extreme values of high divergence and low diversity as evidence for positive selection.

After verifying that the amount of information within 100 kb windows was not driving higher differences between diversity and divergence (Fig. S3), a total of 159 genes were located within windows with a magnitude of differentiation greater than expected by chance (empirical p-value < 0.01). Among the genes identified in the HKA test, seven were also observed in the branch-site test (Fig. 1). From this set of genes, we identified HPS6 as a candidate gene that may have been recently selected. Among the set of genes with AWD-specific signals of positive selection, we did not observe four genes (ASIP, MITF, MLPH, PMEL) that were previously shown to have elevated ratios of non-synonymous/synonymous substitutions in two lower coverage AWD genomes11. For example, these authors reported a stop codon-gain in PMEL, but we found this was due to a misorientation in the codon translation frame (Fig. S4).

The positively-selected genes associated with pigmentation we detected have notable functions that strongly suggest regulation of the variegated pelage of AWDs (Fig. 4a). Although phenylalanine hydroxylase (PAH) has several pleiotropic effects, phenylalanine levels are closely related to melanin deposition63,64,65 (Fig. 4d). Levels of phenylalanine are determined by the PAH gene, whose protein catalyzes hydroxylation of phenylalanine to tyrosine64,66. Using available information to construct a 3D structure of the PAH protein, we located the AWD-specific mutation in the biopterin H domain (Fig. 4d), which contains a binding domain for the PAH cofactor67. This finding suggests that the mutation observed in the protein domain of PAH could be in part responsible for the AWD coat color pattern, as this gene and its cofactor are known to regulate the proportion of yellow and black fur64 through the conversion of phenylalanine to tyrosine66 (Fig. 4d). The observed amino acid change in MYO5A (myosin 5 A) is located at the myosin head region (Fig. S5), which is relevant to the motor function of the protein and the transportation of melanosomes68. Mutations in this gene result in patchy color patterns in mice69 (see Fig. 4c) and dilute pigment in dogs61. The AWD-specific mutations observed in MYO5A could be associated with white and black patches by regulating melanosome transport to the bulk of the hair70,71 (Fig. 4c). Similarly, sites under selection in HPS6 may be responsible for the white and black patches by regulating melanin deposition in melanosomes59,72 (Fig. 4c). Mutations in this gene are associated with dilution of melanin in mice72 and Hermansky-Pudlak syndrome in humans (oculocutaneous albinism), a group of autosomal recessive disorders that cause abnormally light coloring of the hair and skin73. As all AWDs are born black2 and develop pigmentation patterns as puppies, longitudinal studies of gene expression will help corroborate the function of these genes in AWD pigmentation.