Sample collection and whole-genome sequencing

Ten dingoes and two NGSDs were sequenced for the current study. The samples of dingoes have a wide distribution across Australia (Fig. 1a), and the two NGSDs are from the NGSD Conservation Society stud book. After DNA extraction, individual genomes were sequenced to an average of 14.7×. We also retrieved 97 canine whole-genome sequences from published articles3,21,30,31,32,33, which involved 1 dingo, 1 Taiwan village dog, 43 indigenous dogs from China and Vietnam, 19 individuals from various breeds, 4 village dogs from Africa, 6 Indian village dogs, 3 village dogs from Indonesia, 3 village dogs from Papua New Guinea, and 21 wolves from across Eurasia (Supplementary Data 1). Downloaded data have a high quality and an average sequencing depth of 14.6×. Overall, the dataset covers all major dog and wolf groups34 that are putative ancestors of dingoes. Raw sequence reads were mapped to the dog reference genome (Canfam3) using the Burrows-Wheeler Aligner (BWA)35. DNA sequence analysis was done using the Genome Analysis Toolkit36. After strict filtering, we identified ~24.7 million autosomal SNPs for further analysis (see details in the Methods).

Fig. 1: Population structure and genetic diversity of 109 canids. a Geographic locations of the 109 canids analyzed in this study. The map was drawn by the R Packages (maps: https://CRAN.R-project.org/package = maps). b Principle component analysis of the 109 canids. c Principle component analysis of only the dingo and NGSD, red = dingo from Southeast Australia; blue = dingo from West/central Australia; brown = dingo from Northeast Australia; green = NGSD. d Structure analysis of all the 109 individuals. e A phylogenetic tree for all the 109 individuals. W, wolves (orange in tree); BV, breeds and village dogs outside China/SE Asia (sky blue in tree); NI, indigenous dogs from north China (green brown in tree); TW, Taiwan indigenous dog (gray in tree); SI, indigenous dogs from southern China (purple in tree); D, dingoes (red in tree); NGSD, New Guinea Singing Dogs (green in tree); IN, Indonesian village dogs (deep blue in tree);B, breeds; IB, intermediate breeds; EB, European breeds; PG, Papua New Guinea village dogs; ID, Indian village dogs; AI, African village dogs. Full size image

Population structure and phylogenetic analysis

Principal component analysis (PCA) of the 109 individuals was performed to explore the relationships among dingoes, NGSDs, and other canids. In a two-dimensional plot of the genotypes, there is a clear separation in three groups: wolves, dogs and dingoes/NGSDs. Dogs can be divided into two basic groups: dogs from Europe and indigenous dogs from Asia. All dingoes and NGSDs cluster together tightly, on a relatively large distance from the dogs. Thus, the dingo and NGSD populations are genetically clearly distinct from domestic dogs. Among the dogs, Indonesian village dogs cluster closest to the dingoes/NGSDs, followed by indigenous dogs from southern East Asia (South China) (Fig. 1b). We then analysed the 10 dingoes and 2 NGSDs separately, to explore their detailed structure (Fig. 1c). The two-dimensional plot separates NGSDs from the dingoes. The dingoes cluster together, but are distributed in three sub-clusters in accordance with geographical origin: Southeast, West/central and Northeast Australia. This suggests that there are subpopulations within the dingo population.

To explore the genetic relationships among the 109 individuals, we also performed a structure analysis using the expectation maximization (EM) algorithm in ADMIXTURE to cluster the individuals into different numbers of groupings37. Partitioning the individuals into four groups gave least cross-validation error (0.31132), and separated the samples into: (i) wolves, (ii) dingoes and NGSDs, (iii) indigenous dogs from southern East Asia and Indonesia, and (iv) breeds and village dogs from other regions (Fig. 1d, Supplementary Figs. 1, 2). We find an admixture of these four components with varying proportions among indigenous dogs from northern China, some dog breeds and the village dogs from India and New Guinea, consistent with the results of the PCA. Notably, two dingoes, D05 and D06, show a mixture indicating hybridization with European breed dogs, and D05 originate from the region in Australia with highest incident of dingo-dog hybridization (the Southeast)26,27,28,29. Therefore, we performed D-statistics analysis using qpDstat in the Admixtools software package38 to test events of gene flow between the dingoes and European breeds in the form of D (Dhole, European breed; Pop1, Pop2), where Pop1 was all dog groups tested in turn and Pop2 was each individual dingo. Interestingly, the results varied depending on the tested Pop 1 (Supplementary Fig. 3). When Pop1 was NGSDs, there were seven dingoes showing significantly positive D (Z>3): (D00 (Z = 8.457), D01 (Z = 9.394), D03 (Z = 5.338), D05 (Z = 14.551), D06 (Z = 10.221), D07 (Z = 8.888), and D08 (Z = 5.423)). However, when Pop1 was any of the other populations (Indonesian village dogs, indigenous dogs from southern China, Taiwan indigenous dog, indigenous dogs from north China, Indian village dogs, African village dogs, respectively), there was significantly negative D (Z < −3) in all cases (Supplementary Fig. 3). We then used NGSDs as Pop2, testing D (Dhole, European breed; Pop1, NGSDs), where Pop1 was all dog groups and the dingoes tested in turn. This gave significantly negative D in all cases (Indonesian village dogs: −23.2, indigenous dogs from southern China: −27.8, Taiwan indigenous dog: −34.1, indigenous dogs from north China: −42.0, Indian village dogs: −34.7, African village dogs: −28.0 and dingoes: −8.8), but it is notable that dingoes had considerably less negative value than all other populations. This indicates that all dog populations show higher affinity to European breed dogs than the dingoes, except the NGSDs which have the lowest affinity to European breeds. Therefore, we made another D-statistics analysis, where Pop1 was all dingo individuals tested in turn and Pop2 was all other dingo individuals tested in turn. This showed that when three of the dingoes (D01, D05, and D06) were Pop2, they had significantly positive D (Z > 3) compared to most other dingoes and no significantly negative D (Z < −3) in any comparison (Supplementary Data 2), suggesting that these three dingoes could have a gene flow with European breeds. We also performed an additional D-statistics using the red fox33 as outgroup and obtained very similar results (Supplementary Data 2, Supplementary Fig. 4).

We further performed phylogenetic analysis by the Neighbor-Joining (NJ) approach (Fig. 1e, Supplementary Fig. 5). The result matches the observations from the PCA and structure analysis. First, dogs and dingoes/NGSDs separate from the wolves, and then they further split into two clades, one including dingoes and NGSDs together with indigenous dogs from Indonesia, southern East Asia and Taiwan, and the other including village dogs and breeds from all other regions. Indonesian village dogs are closest to the dingoes and NGSDs, and dogs from southern East Asia are basal to the clade. This suggests that indigenous dogs from southern East Asia may be the ancestors of dingoes. Notably, India has been suggested as a possible origin for the dingo but, similarly to the PCA, the dogs from India cluster in the second clade, far from the dingoes39,40. We also performed phylogenetic analysis by the Maximum-Likelihood approach (Supplementary Fig. 6), obtaining consistent results. We further used the qp3pop program38 to perform outgroup f3-statistics analysis in the form of f3(Dingoes, Pop2; Dhole)38,41, to assess the relative genetic similarity of the dingo population to the other populations (Supplementary Table 1). The highest value of the f3-statistics (indicating highest degree of shared genetic history) was obtained for the NGSD population, followed by Indonesian dogs, and indigenous dogs from southern China. We also performed TreeMix analysis (Supplementary Figs 7, 8). The topology is consistent with the aforesaid phylogeny constructed by the NJ and ML approaches, and indicates a single admixture event: from the dingo/NGSD clade to the Papua New Guinea village dog lineage. All these results agree in suggesting that indigenous dogs from southern East Asia were the ancestors of dingoes.

Notably, the branch to dingoes and NGSDs is relatively long. We therefore estimated nuclear diversity using the genetic diversity θ π , grouping individuals into six dog populations. The result shows that dingoes had the lowest diversity of the six dog populations (Supplementary Fig. 9). This suggests a severe bottleneck event in the evolutionary history of dingoes, or long periods of isolation with low effective population size, which may explain the long phylogenetic branch39,42.

Demographic and migration histories

Based on the results from the TreeMix, phylogeny, and Outgroup f3 analyses, the indigenous dogs from southern East Asia are plausible ancestors of dingoes and NGSDs, with the Indonesian village dogs as the most closely related population. To study the migration and demographic history of the dingoes, we performed a demographic analysis using G-PhoCS43. We repeated the computations three times by randomly picking 1000 neutral loci and randomly selecting three samples among the dingoes (Supplementary Table 2) and set the gene flow between southern East Asia dogs and Indonesian dogs. Based on a mutation rate of 1.3 × 10−9 per site per year44 and a generation time of 3 years40,41,42, our analysis indicates that the split between dingoes and Indonesian village dogs occurred around 8300 (CI: 5400–11,200) years ago and that, before that, Indonesian village dogs diverged from the indigenous dogs from southern East Asia around 9900 (CI: 6500–12,700) years ago (Fig. 2a, Supplementary Data 3). Furthermore, we performed a second round of G-PhoCS analysis replacing the two dingoes indicated to be admixed with dogs (D05 and D06) with D01 and D08, respectively (Supplementary Table 3), giving results consistent with the first analysis (Supplementary Data 4). The G-PhoCS analysis also shows that the dingo population has a very small effective population size compared to the dog populations. We also used smc++ employing unphased whole genomes to infer population history45. This analysis approximated the split between Indonesian village dogs and dingoes at around 9100 years ago, in consistence with the result of G-PhoCS. We also used smc++ to estimate dates for the population history of dingoes/NGSDs and dogs. The result shows that the dog population experienced a slight growth after the population split, while the dingo and NGSD populations suffered a decrease (Fig. 2b), followed by an increase possibly reflecting the expansion into the new ecological niches in Australia and New Guinea. Notably, the NGSDs show a severe decrease in more recent times followed by a sharp increase. This is consistent with the history of the western population of NGSDs (bred outside New Guinea the last 60 years), which originates from very few individuals.

Fig. 2: Demographic history of dingoes and dogs from southern East Asia. a Demographic history inferred for indigenous dogs from southern China (SV), Indonesian village dog (IN) and dingo using G-phocs. b Inferred effective population sizes over time for indigenous dogs from southern China (SV), Indonesian village dog (IN), dingo and NGSD using SMC++. Full size image

Mitochondrial genome analysis

We also performed phylogenetic analysis based on mitochondrial genomes, analyzing totally 35 dingoes and 3 NGSDs, the 10 dingoes and 2 NGSDs sequenced in this study and 25 dingoes and 1 NGSD from Cairns et al.46, in the context of 169 dogs and 8 wolves from across the Old World from Pang et al.47. We constructed a phylogenetic tree showing all dingoes and NGSDs to group into a single branch, separated from all domestic dogs except one, a dog (A103 10002) originating from Hunan in South China (Fig. 3a). The dingo/NGSD branch is part of the major domestic dog haplogroup A, to which approximately 75% of domestic dogs worldwide belong. Haplogroup A has six sub-haplogroups, and the dingo/NGSD branch is part of sub-haplogroup a2, which is frequent in dogs originating from across East Asia but absent in western Eurasia47. Notably, of the eight dogs clustering closest to the dingo/NGSD branch, seven were from Mainland or Island Southeast Asia and one from East Siberia. These results indicate that dingoes and NGSDs originate from domestic dogs in Southeast Asia, via Island Southeast Asia, and that dingoes and NGSDs are closely related, as earlier suggested24,42,46,48.

Fig. 3: Phylogenetic and demographic history analysis of mtDNA. a Neighbor-joining tree based on mitochondrial genomes from 35 dingoes and 3 NGSDs, and from 169 domestic dogs and 8 wolves from across the Old World, with 4 coyotes as outgroup. The yellow box and inset figure indicate the branch in which all dingoes and NGSDs cluster together with a single domestic dog from South China, and the 3 most closely related dogs outside this branch. b Map depicting geographic sampling of dingoes across Australia. Circles represent the 10 individuals sequenced in this study and triangles 25 additional samples from Cairns et al. The red line indicates the genetic subdivision between the southeastern/eastern part (S/E) and the rest of the continent. c Bayesian analysis of mitochondrial genomes for the sub-dataset identified in Fig. 3. The dingo/NGSD branch including all dingoes and NGSDs and a single South Chinese domestic dog and, as outgroup, the three most closely related dogs outside that branch. The scale axis indicates time estimates using the mutation rate of 7.7 × 10−8 per site per year with SD 5.48 × 10−9 from Thalmann et al.49. The colored branches indicate geographical origin of dingo samples, see Fig. 2. The star highlights the single dingo sample from southeast Australia that does not cluster in the S/E-related branch. Full size image

To study the detailed phylogeny among dingoes and NGSDs we created a sub-dataset including all individuals in the dingo/NGSD branch and the three most closely related dogs (yellow box in Fig. 3a), and constructed new phylogenetic trees (Fig. 3c, Supplementary Figs. 10, 11). These trees show a division of dingoes into two main branches, following a geographical distribution earlier reported by Cairns et al.46; all dingoes from Southeast and East Australia (we denote this region S/E), except one, group in one branch, and all dingoes from all other parts of Australia group in the other (Fig. 3b, c). The S/E-related branch also includes two of three NGSD samples, while the third NGSD and the domestic dog from South China (A103 10002) have an intermediate position. Notably, outside the S/E region there is only limited geographical structure among the dingoes. Thus, there is a genetic subdivision of dingoes between the southeastern/eastern part of Australia and the rest of the continent.

Molecular clock analysis (based on a mutation rate of 7.7 × 10−8 per site per year)49 suggests a most recent common ancestor (MRCA) for all dingoes and NGSDs (the division into the two main branches) ~8300 years ago, in agreement with the nuclear genome estimate for the split between dingoes and Indonesian dogs. Notably, the two main branches both have MRCAs ~4600 years ago, indicating population expansions.

Natural selection in feralization

Our analyses of population structure and demography confirms that the dingoes originated by feralization of domestic dogs around 8300 years ago and have remained virtually isolated from both the wild and the domestic ancestor until recent time. This affirms that the dingo is an excellent model for studies of the genomic effects of feralization. We used analysis of population branch statistics (PBS)50 and iHS51 to identify positive selection in the dingoes. Firstly, the PBS was calculated by the formula (Eq. 1, see Methods for details), with non-overlapping 20 kb genomic windows. By comparing the three pairwise Fst val \({\mathrm{PBS}}1 = \frac{{{{T}}_{{\rm{ds}}} \,+ \,{{T}}_{{\rm{db}}} \, - \, {{T}}_{{\rm{bs}}}}}{2}\) ues, we can estimate the frequency change that occurred in the dingoes50. We retrieved genomic regions with the top 5% PBS1 windows by the value of PBS1 >0.14476. Furthermore, we performed a windowed iHS test52, dividing the genome into the same non-overlapping 20 kb windows, and identified candidate regions for selection as those in which more than 30% of the sites had an iHS absolute value above the threshold (2.4217, top1% of iHS). In summary, the overlap of the two approaches indicated 87 candidate windows under positive selection, containing 50 genes (Supplementary Data 5) considered as candidates associated with feralization of dingoes.

We performed the GO enrichment evaluation using the parent-child model53 in the topGO R package54. To control for biasing factors, such as gene size and clustering of related gene families, we used the same approach as Pendleton et al.22. We calculated permutation-based p values (p perm ) for each GO term, and the parent-child significance scores observed for each GO term were compared with the distribution of identified gene sets by applying the parent-child test by 1000 randomly permuted genome intervals. Hereby, we identified 67 GO terms that were significantly overrepresented (p perm < 0.05) and represented by more than one gene (Supplementary Data 6). Notably, there were four GO terms related to metabolism: fatty acid derivative biosynthetic process (GO: 1901570, p perm = 0.001), fatty acid derivative metabolic process (GO: 1901568, p perm = 0.001), regulation of carbohydrate metabolic process (GO: 0006109, p perm = 0.001), and regulation of carbohydrate catabolic process (GO: 0043470, p perm = 0.01). These functions may be related to diet change of dingoes. The candidate genes include also genes related to, e.g., reproduction and neuronal function which may have played roles in the feralization adaptation of dingoes: Prss37 (Protease, Serine 37), shown to be required for male fertility in mice55, ARHGEF7 (Rho Guanine Nucleotide Exchange Factor 7) which promotes the formation of neural spine and synapses in hippocampal neurons56, and TAS2R5 (Taste 2 Receptor Member 5) which plays a role in the perception of bitterness57. Interestingly, four of the genes have previously been indicated to be related to the domestication of dogs: SLC5A1 and ZNF516 were identified by Axelsson et al.20, TAS2R5 and ZNF516 by Cagan et al.58, and a novel gene (ENSCAFG00000023577) was found by Pendleton et al.22.

Change in the candidate regions in two processes

To compare the candidate regions in the domestication and feralization steps we also performed the PBS analysis using the formula \({\mathrm{PBS}}2 = \frac{{{{T}}_{{\rm{ds}}} \, + \, {{T}}_{{\rm{ws}}} - {{T}}_{{\rm{dw}}}}}{2}\) (Eq. 2, see Methods for details), comparing dingoes, dogs from Southern East Asia and wolves, to identify genomic regions in dingoes that were more similar to wolf than to dog. High PBS2 values (the first percentile rank was used as threshold, 0.0766) indicate large difference between dog and dingo and between dog and wolf, but low difference between wolf and dingo. This identifies regions with large difference between dingo/wolf and dog, while regions with smaller difference may be ignored. Based on this, we identified 1100 windows with high PBS2 values, and compared these windows with the 87 windows identified as candidates associated with feralization by PBS1 and iHS. This identified 17 overlapping windows, containing 13 genes, with high values for PBS1 and iHS as well as for PBS2 (Table 1, Fig. 4). This suggests that these 13 genes were under positive selection in dingoes and also more similar to gray wolves than to domestic dogs. Functional annotation showed that four of these 13 genes are associated with neurodevelopment, metabolism and reproduction (Table 1, Supplementary Table 4). Specifically, ARHGEF7 (Rho Guanine Nucleotide Exchange Factor 7) may promote the formation of neural spine and synapses in hippocampal neurons56, SLC5A1 plays an important role in the absorption of glucose and sodium59, TAS2R5 (Taste 2 Receptor Member 5) may play a role in the perception of bitterness57 and Prss37 (Protease, Serine 37) is related to reproduction55. We visualized the genotypes in 70 canine samples for these four genes. This showed that the genotypes for dingo and NGSD were almost identical for all four genes, and that dingo/NGSD had low diversity with most positions being homozygote for the non-reference variants (Supplementary Fig. 12). It also showed that dingo/NGSD were more similar to the wolves than to dogs, dingo/NGSD and wolves sharing the non-reference homozygote type in many positions. The dogs were more heterogenous, with a large proportion of sites that were heterozygous, or homozygous for the reference variants. However, there was a large difference between the dogs from Southern East Asia and Europe. For all four genes, diversity was largest for the dogs from Southern East Asia, which had all three genotype variants in most positions. In contrast, for three of the genes, SLC5A1, TAS2R5, and Prss37, the European breed dogs were homozygous for the reference variant across almost the whole region. This suggests that selection has occurred in European breeds but not in the dogs from Southern East Asia, which would imply that these genes were not under selection during the domestication of the dog but during the later development of the European breeds. It also suggests that selection in these genes had not occurred for the ancestors of dingoes but occurred during the feralization of dingoes. Given the strong bottleneck in dingoes, we also performed simulations based on the inferred demographic history as null expectation for selection, with consistent results (Supplementary Figs. 13–15).

Table 1 The 13 genes under selection in both domestication and feralization. Full size table

Fig. 4: The result of iHS and PBS2 for three candidate regions. Blue curves indicate PBS2 value, blue dots indicate iHS value. Thick red horizontal line indicates range of gene, and thin black horizontal line gives the threshold. Full size image

Functional assay revealed that a mutation in dingo gives decreased enhancer activity for ARHGEF7

ARHGEF7 is related to neural function56 and may therefore be involved in behavior changes in the development from dog to dingo. We found an A-to-G mutation (chr 22: 59234593) within the ARHGEF7 gene, which had a very high allele frequency in dingoes (100%) and wolves (93.3%) compared to indigenous dogs from southern East Asia (32.5%). Furthermore, detailed bioinformatics analysis showed that the A-to-G mutation may influence the expression of ARHGEF7 since it is located in a transcription factor-binding site60. To test whether the SNP variants can actually affect expression of ARHGEF7, we performed dual-luciferase reporter experiments (enhancer assay using pGL3-promoter vectors) on two human cell lines (Daoy, human medullablastoma and HEK293, human embryonic kidney) and one canine cell line (MDCK, Madin-Darby Canine kidney). The analyses showed that all three cell lines displayed significantly lower enhancer activities for SNP-G than for SNP-A (Fig. 5), suggesting that SNP-G may confer decreased endogenous ARHGEF7 production.