Sample collection and sequencing

Four grey wolves from locations across Eurasia and three Chinese indigenous dogs from Southwest China were collected for this work (Fig. 1a). We also sequenced one dog from each of three breeds: a German Shepherd, a Belgian Malinois and a Tibetan Mastiff (Table 1). For the four grey wolves and six dogs we sequenced, the effective throughput per individual ranged from 8.92X to 13.56X (Supplementary Table S1). Sanger sequence data for the reference Boxer genome were also downloaded from the NCBI trace archive for subsequent analysis [7].

Figure 1: Sampling and diversity information of the dog and wolf individuals. (a) Geographic locations of the four grey wolves (GW1–4), three Chinese indigenous dogs (dogCI1–3), two European dog breeds (dogGS: German Shepherd, and dogBM: Belgian Malinois) and one Tibetan Mastiff (dogTM) used in this study. (b) Overlap of SNPs and small indels among the three populations (wolves, Chinese indigenous dogs and dog breeds). (c) Low-diversity regions (LDRs) plotted across the genome for grey wolf 1. The cutoff value for LDRs is 0.00005. (d) LDRs plotted across the genome for Chinese indigenous dog 1. (e) LDRs plotted across the genome for the German Shepherd. LDR plots for the other individuals are shown in Supplementary Fig. S3.

Table 1 Sample and sequencing throughput for all 11 individuals.

After aligning the short reads to the reference genome, we identified single-nucleotide polymorphisms (SNPs) and small insertions and deletions (indels; length <50 bp) for all the individuals (details of the data flow are presented in Supplementary Fig. S1). Across the 11 individual genomes, a total of 13,923,223 SNPs were identified, of which 10,740,377 were found within the 4 wolves, 7,164,136 within the 3 Chinese indigenous dogs and 6,958,268 within the 4 breed dogs (Fig. 1b). A parallel analysis of small indels yielded a similar pattern, with the greatest number found in wolves and the fewest in the breed dogs (Fig. 1b). Experimental verification showed that our variant-calling scheme maintains high sensitivity with very few false positives: the overall false-positive rate is less than 5% and, for non-singleton polymorphisms, the genome-wide false-negative rate is less than 10% (Supplementary Note 1).

Genetic diversity and population structure

Using the heterozygous sites called within each diploid individual, we performed a sliding-window analysis of the genetic diversity θ (θ = 4Nμ) along the genome. Interestingly, genetic diversity decreases from wild wolves to Chinese indigenous dogs and then to modern breeds (Table 1). This trend is most evident when we partition the genome into segments of very low diversity and plot this pattern across the genome (Fig. 1c–e). The decreasing order matches the expectation from a two-stage history in which Chinese indigenous dogs represent the groups following the first domestication event.
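
The window-based diversity scan can be illustrated with a minimal sketch. It assumes non-overlapping windows and 0-based heterozygous-site coordinates; the window size, function names and toy data are hypothetical, and only the LDR cutoff of 0.00005 is taken from the Figure 1 legend. For a single diploid genome (two chromosomes), Watterson's estimator of θ reduces to the number of heterozygous sites divided by the window length.

```python
from typing import List, Tuple


def window_theta(het_positions: List[int], chrom_length: int,
                 window: int = 100_000) -> List[Tuple[int, float]]:
    """Return (window_start, theta) pairs for one diploid individual.

    theta is estimated as heterozygous sites per base pair in each window.
    Windows are non-overlapping here, which is a simplifying assumption.
    """
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in het_positions:  # 0-based coordinates, pos < chrom_length
        counts[pos // window] += 1
    result = []
    for i, count in enumerate(counts):
        start = i * window
        length = min(window, chrom_length - start)  # last window may be short
        result.append((start, count / length))
    return result


def low_diversity_windows(thetas: List[Tuple[int, float]],
                          cutoff: float = 0.00005) -> List[int]:
    """Return start coordinates of windows whose theta is below the cutoff."""
    return [start for start, theta in thetas if theta < cutoff]


# Toy example: a 3 kb "chromosome" scanned in 1 kb windows.
thetas = window_theta([10, 20, 1500], chrom_length=3000, window=1000)
ldr_starts = low_diversity_windows(thetas)
```

In this toy example only the third window, which contains no heterozygous site, falls below the cutoff.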

Using the phased genotypes, linkage disequilibrium, measured as the correlation coefficient (r²), was calculated for the wolf and Chinese indigenous dog populations. As seen in Fig. 2a, linkage disequilibrium decays rapidly in both wolves and Chinese indigenous dogs. Within distances as short as 5 kb, levels of correlation fall to below 0.2, with this trend being slightly stronger in the wolves than in the Chinese indigenous dogs. The similarity in linkage disequilibrium observed here suggests that a relatively weak population bottleneck might have occurred during dog domestication.
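
For reference, the standard r² statistic between two biallelic loci can be computed from phased haplotypes as sketched below. The 0/1 allele coding, the function name and the toy haplotypes are illustrative assumptions, not the pipeline used in this study.

```python
from typing import Sequence


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic loci.

    hap_a[i] and hap_b[i] are the alleles (0 or 1) carried by haplotype i
    at locus A and locus B, respectively.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal in length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                   # undefined if a locus is monomorphic
    d = p_ab - p_a * p_b                      # coefficient of disequilibrium D
    return d * d / denom


# Four haplotypes (two diploid individuals) at two pairs of loci.
perfect_ld = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
no_ld = r_squared([0, 0, 1, 1], [0, 1, 0, 1])
```

Averaging such values over pairs of SNPs binned by physical distance gives a decay curve of the kind shown in Fig. 2a.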

Figure 2: Population structure and principal component analysis. (a) Correlation coefficients (r²) were calculated for the wolf/dog populations over 50 kb windows. (b) Structure analysis on all the individuals with K=2. (c) Principal component plot of the first two PCs for all 11 individuals. The inset is a zoomed-in view of the dog group. (d) Principal component plot for 1,203 canids, including our data and individuals from a previous study [4]. Group 1 is the cluster of dogs that are closest to grey wolves.

Given the genotypes across the genomes, we performed Bayesian clustering by partitioning the individuals into K=2 and K=3 groups. As seen in Fig. 2b, when the individuals are clustered into two groups, the first cluster separates all of the grey wolves from the dogs. Interestingly, the Chinese indigenous dogs and the Tibetan Mastiff showed a closer relationship with the wolves. When we partitioned the sample into three clusters, the analysis began to split the wolves into further groups, likely owing to the greater genetic distances among the wolves (Supplementary Fig. S4).

To further explore the relationships between these individuals, a principal component analysis of all the individuals was carried out. When the first two principal components were plotted, dogs and wolves separated into two distinct groups (Fig. 2c). Interestingly, all of the dogs clustered tightly together and far from the wolves; however, the Chinese dogs, including the Tibetan Mastiff, were located slightly closer to the wolves (Fig. 2c inset).

Previous studies using SNP genotyping arrays have surveyed the global distribution of genetic diversity across a large number of dogs and wolf-like canids. When we combined the sequenced individuals with the 1,191 canids surveyed previously [5], we found that the Chinese native dogs, together with several dog breeds that originated in China/Southeast Asia, are among the first tier of individuals closest to the grey wolves (Fig. 2d). In addition, when we compared the Chinese indigenous groups with native dogs from other geographic regions (for example, African village dogs [15]), Chinese indigenous dogs were also found to be much closer to wolves than native dogs from the other places surveyed to date (Supplementary Note 2). The close proximity of the Chinese indigenous dogs and of breeds that originated in Southeast Asia to grey wolves, together with the high genetic diversity observed in the Chinese native dogs, supports a Southeast Asian origin for dogs [9,10].

Demographic history

Using joint site frequency spectra generated after polarizing the polymorphisms with an outgroup species (a red wolf), we inferred the population demographic history under an isolation-migration model [16]. As presented in Fig. 3, the effective population size of the wolf was found to have been relatively stable. The inferred effective population size of the extant wolf population is very similar to that inferred for the ancestral population, with the extant population being 94% of the size of the ancestral population. Interestingly, during domestication, the Chinese indigenous dog population experienced a mild bottleneck in which the effective population size was reduced to 16% of the ancestral population size. Following the bottleneck, the population size has steadily increased to about 32% of that of the ancestral wolf population, which is largely consistent with the mild reduction in genetic diversity and the slight increase in linkage disequilibrium observed in the Chinese native dogs relative to the wolves.
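
To make the input to this inference concrete, the sketch below builds an unfolded joint site frequency spectrum by treating the outgroup allele as ancestral. It is an illustration only: the site encoding, the function name and the toy data are assumptions, and the model fitting itself is not shown.

```python
from typing import Iterable, List, Tuple

BASES = {"A", "C", "G", "T"}


def joint_sfs(sites: Iterable[Tuple[str, str, str]],
              n_wolf: int, n_dog: int) -> List[List[int]]:
    """Count sites by derived-allele count in wolves (rows) and dogs (columns).

    Each site is (outgroup_allele, wolf_alleles, dog_alleles), where the
    allele strings hold one base per sampled chromosome.
    """
    sfs = [[0] * (n_dog + 1) for _ in range(n_wolf + 1)]
    for outgroup, wolf, dog in sites:
        if len(wolf) != n_wolf or len(dog) != n_dog:
            continue  # skip sites with missing chromosomes
        observed = set(wolf) | set(dog) | {outgroup}
        if not observed <= BASES or len(observed) > 2:
            continue  # skip ambiguous outgroup calls and multi-allelic sites
        derived_wolf = sum(1 for allele in wolf if allele != outgroup)
        derived_dog = sum(1 for allele in dog if allele != outgroup)
        sfs[derived_wolf][derived_dog] += 1
    return sfs


# Toy data: 4 wolf chromosomes and 2 dog chromosomes per site.
toy_sites = [
    ("A", "AAGG", "AG"),  # derived G: 2 copies in wolves, 1 in dogs
    ("C", "CCCC", "CC"),  # monomorphic for the ancestral allele
    ("A", "AAGT", "AA"),  # three alleles, skipped
    ("N", "AAGG", "AG"),  # outgroup call missing, skipped
]
spectrum = joint_sfs(toy_sites, n_wolf=4, n_dog=2)
```

Entry `spectrum[i][j]` counts the sites at which the derived allele is carried by `i` wolf chromosomes and `j` dog chromosomes.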

Figure 3: Inferred demographic history for the wild wolves and the Chinese indigenous dogs. The extant and ancestral population sizes of the two species are labelled. The migration rates between the two populations are also labelled. As the current wolf’s average diversity θ is equal to 0.00141 (θ = 4N_eμ) per kb and current wolves have an effective size that is 94% of the ancestral population, we estimated the effective population size of the ancestral wolf population to be around 53,000.

With an assumed mutation rate of 2.2 × 10⁻⁹ per year [17] and a generation time of 3 years, the effective population size of dogs at the beginning of the bottleneck is estimated to be around 8,500 and the effective size of the extant Chinese indigenous dog population to be around 17,000. Compared with other domesticated species, which typically experienced a population shrinkage of several orders of magnitude [18,19], this level of population size reduction is rather weak.
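
As a rough check of the arithmetic behind these estimates, the sketch below converts a diversity value into an effective size using θ = 4N_eμ. It assumes that the wolf diversity of 0.00141 quoted in the Figure 3 legend can be treated as a per-site value, and that the per-year mutation rate is multiplied by the generation time to give a per-generation rate. The function and constant names are illustrative only.

```python
def effective_size(theta_per_site: float, mu_per_year: float,
                   generation_years: float) -> float:
    """Solve theta = 4 * Ne * mu for Ne, with mu expressed per generation."""
    mu_per_generation = mu_per_year * generation_years
    return theta_per_site / (4.0 * mu_per_generation)


THETA_WOLF = 0.00141        # wolf diversity from the Figure 3 legend (assumed per site)
MU_PER_YEAR = 2.2e-9        # assumed mutation rate per year, as stated in the text
GENERATION_YEARS = 3        # generation time in years, as stated in the text

ne_from_theta = effective_size(THETA_WOLF, MU_PER_YEAR, GENERATION_YEARS)

# Reported ancestral wolf size and the inferred relative dog population sizes.
ANCESTRAL_NE = 53_000
ne_dog_bottleneck = 0.16 * ANCESTRAL_NE   # 16% of the ancestral size
ne_dog_extant = 0.32 * ANCESTRAL_NE       # 32% of the ancestral size
```

Under these assumptions θ/(4μ) comes to roughly 53,400, and 16% and 32% of 53,000 give 8,480 and 16,960, in line with the rounded values of 8,500 and 17,000 quoted above.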

The population divergence time is estimated to be around 32,000 years ago, which is much older than previous estimates based on mtDNA data [9,10] (see Discussion). The estimated migration rates are also low. The migration rate from wolves to dogs (M_dw) is slightly higher than that estimated for the other direction. The estimated migration rates are compatible with our observation that dogs and wolves form two rather disjoint clusters in the PCA and structure analysis, and also agree with previous observations that introgressive hybridization between dogs and wild wolves is rare [20]. Behavioural or selective constraints imposed on these two groups might be the limiting factor contributing to the low level of gene flow [20,21].

To assess the statistical confidence in the estimated parameter values, we performed a non-parametric bootstrap test of the demographic history by resampling the SNPs with replacement to generate data sets of the same size. Under a variety of parameter settings, we found that the estimated values show a profile similar to that presented in Fig. 3 (see Methods and Supplementary Note 3). Thus, the inferred demographic history shown here is supported with strong statistical confidence.
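
The resampling step can be sketched as follows. This is a generic non-parametric bootstrap under stated assumptions: in the actual analysis each replicate would be passed to the demographic model fit, whereas here a simple mean over toy per-SNP values stands in for that fit, and all names and data are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def bootstrap_estimates(snps: Sequence[float],
                        statistic: Callable[[Sequence[float]], float],
                        n_replicates: int = 200,
                        seed: int = 1) -> List[float]:
    """Resample SNPs with replacement and recompute the statistic each time."""
    if not snps:
        raise ValueError("need at least one SNP to resample")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_replicates):
        resampled = rng.choices(snps, k=len(snps))  # same size as the original
        estimates.append(statistic(resampled))
    return estimates


def percentile_interval(estimates: Sequence[float], lower: float = 0.025,
                        upper: float = 0.975) -> Tuple[float, float]:
    """Simple percentile interval from the bootstrap distribution."""
    ordered = sorted(estimates)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Toy per-SNP values (for example, derived-allele frequencies).
toy_snps = [0.1, 0.2, 0.2, 0.5, 0.9, 0.3, 0.4, 0.1]
estimates = bootstrap_estimates(toy_snps, mean)
low, high = percentile_interval(estimates)
```

The spread of the replicate estimates indicates how sensitive the fitted value is to the particular set of SNPs sampled.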

Putatively selected genes during dog domestication

Because selection acting during the first stage of domestication should be shared among all dogs, we screened for candidate positively selected genes by looking for regions that show low diversity in all seven dogs and high divergence between dogs and wolves. To avoid the possibility that a low-diversity segment was inherited from the wolf population, we filtered out regions that showed relatively low diversity in wolves.
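
The screening logic described above can be sketched as a simple filter-and-rank step over genomic windows. The thresholds, the divergence measure, the `Window` record and the toy values are all hypothetical placeholders rather than the criteria actually used (those are given in Supplementary Note 4).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Window:
    chrom: str
    start: int
    dog_diversity: List[float]  # one diversity value per dog
    wolf_diversity: float       # diversity in the wolf sample
    divergence: float           # dog-wolf divergence for this window


def candidate_windows(windows: List[Window], dog_cutoff: float,
                      wolf_min: float, top_fraction: float) -> List[Window]:
    """Keep windows with low diversity in every dog and adequate wolf
    diversity, then return those with the highest dog-wolf divergence."""
    eligible = [
        w for w in windows
        if max(w.dog_diversity) < dog_cutoff   # low diversity in all dogs
        and w.wolf_diversity >= wolf_min       # not already low in wolves
    ]
    eligible.sort(key=lambda w: w.divergence, reverse=True)
    n_keep = max(1, int(len(windows) * top_fraction)) if eligible else 0
    return eligible[:n_keep]


toy_windows = [
    Window("chr1", 0, [1e-5, 2e-5], 1e-3, 0.80),       # passes both filters
    Window("chr1", 1000, [1e-5, 5e-4], 1e-3, 0.90),    # one dog is too diverse
    Window("chr1", 2000, [1e-5, 1e-5], 1e-5, 0.95),    # low diversity in wolves too
    Window("chr1", 3000, [2e-5, 2e-5], 2e-3, 0.50),    # passes, lower divergence
]
hits = candidate_windows(toy_windows, dog_cutoff=5e-5, wolf_min=5e-4,
                         top_fraction=0.25)
```

Requiring low diversity in every dog, rather than on average, reflects the expectation that a sweep from the first stage of domestication is shared by all dogs.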

Using a set of stringent conditions for positive selection, we identified the top 1% of the genome that is expected to be enriched for genes bearing the signature of positive selection. This portion of the genome is distributed across 198 segments carrying a total of 311 genes (Supplementary Note 4, Table S6 and Fig. S11). It is worth pointing out that demographic factors can also generate genetic patterns that mimic traces of positive selection [22]. This candidate list is therefore expected to be enriched for, rather than to consist solely of, genes responsible for the domestication of the dog. When the genes were analysed by their broad classification in the Gene Ontology (GO), three major categories, namely reproduction, digestion and metabolism, and neurological processes, stood out strongly (Table 2).

Table 2 Gene ontology analysis of the candidate selected genes.

Genes related to digestion and metabolism are particularly interesting. Multiple GO terms, ranging from nutrient transport (for example, lipid) to the regulation of the digestion process (for example, cholesterol), are over-represented. One gene that shows evidence of positive selection is MGAM, which encodes maltase-glucoamylase, an important enzyme in the final steps of starch digestion [23]. Given the recent shared history of dogs and humans, in particular the adoption of an agriculture-based lifestyle, large changes in the food sources of dogs during the transition from carnivore to omnivore might have been the driving force for positive selection on these types of genes [24].

The other interesting GO category is neurological processes. Genes associated with nerve cells themselves (for example, axon) and their connectivity (for example, neuron projection) are among the set of positively selected genes. Strong selection on behaviour (for example, reducing aggression) and neurological traits (for example, complex interactions with human beings) is often involved in the first steps of animal domestication [25]. Genes of this class might thus underlie the processes that led to the successful domestication of the dog (see later sections). In addition, quite a few genes involved in sensing local environmental stimuli, for example, sound (MYO3A) and smell (NCAM2 and OR2F1), are also on the list of selected genes. Large changes in the environment of dogs during domestication might have driven positive selection on these genes, some of which might reflect relaxed selective constraints on these proteins where loss of the activities of these genes is often adaptive (for example, 'less is more' [26]).

Parallel selection in both human and dog

Humans and dogs both experienced a suite of similar environments in the recent past. Natural selection, driven by convergent environmental pressures, might thus have acted on a similar set of genes in the two genomes. Genome-wide scans for positive selection in humans have been conducted using a wide variety of methods and data sets [27,28]. For example, Akey [22] compiled a collection of human genome regions that had been identified in at least two of nine different genome scans for positive selection. To identify genes that may have been positively selected in parallel, we compared our list of positively selected genes in dogs with the human list compiled by Akey [22].

Among the orthologous gene pairs between human and dog (a total of 17,661 gene pairs), 1,708 positively selected genes were identified for humans and 233 for dogs. Comparing these two data sets, 32 genes are shared between the two species (a 1.4-fold enrichment, with marginal significance, P = 0.03). Table 3 highlights genes of particular interest, with the full list presented in Supplementary Note 5 and Table S8.
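
The fold enrichment quoted above follows directly from the counts: under independence the expected overlap is 1,708 × 233 / 17,661 ≈ 22.5 genes, and 32 / 22.5 ≈ 1.4. The sketch below reproduces this and computes a one-sided hypergeometric tail probability as one common way to attach a P-value to such an overlap. The choice of test is an assumption here, as the test used in the study is not stated in this section, and the function name is illustrative.

```python
from math import comb


def hypergeom_upper_tail(population: int, successes: int,
                         draws: int, observed: int) -> float:
    """P(X >= observed) when drawing `draws` genes without replacement from
    `population` genes, of which `successes` are in the reference set."""
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, min(successes, draws) + 1)
    )
    return tail / total  # exact integer arithmetic until the final division


N_ORTHOLOGS = 17_661   # orthologous human-dog gene pairs
N_HUMAN = 1_708        # positively selected genes in humans
N_DOG = 233            # positively selected genes in dogs
N_SHARED = 32          # genes on both lists

expected_overlap = N_HUMAN * N_DOG / N_ORTHOLOGS
fold_enrichment = N_SHARED / expected_overlap
p_value = hypergeom_upper_tail(N_ORTHOLOGS, N_HUMAN, N_DOG, N_SHARED)
```

Because the calculation uses exact binomial coefficients, it needs nothing beyond the Python standard library.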

Table 3 Positively selected genes found in both humans and dogs.

One group of genes that appears to be under positive selection in both humans and dogs comprises those involved in digestion and metabolism. For example, two members of the ATP-binding cassette transporter superfamily, ABCG5 and ABCG8, which have pivotal roles in the selective transport of dietary cholesterol [29], were found on both lists. As domestication has led to drastic changes in the proportion of plant food relative to animal food, natural selection on these genes in both species is expected given this shared evolutionary history.

A second group of genes selected in both species comprises those involved in neurological processes. An interesting example is SLC6A4, which encodes an integral membrane protein that transports the neurotransmitter serotonin [30] and is a target of many psychomotor stimulants such as amphetamines and cocaine. Variation in this gene is responsible for a wide range of neurological pathogenic conditions such as aggressive behaviour [31], obsessive-compulsive disorder [32], depression and autism [33,34]. The most striking aspect is compulsive disorders, for which the two species share many similar phenotypes. Most interestingly, dogs respond similarly to the drugs that are used to treat humans (for example, clomipramine hydrochloride, a serotonin-reuptake inhibitor that is also often used as an anti-depressant), suggesting possible common genetic components for these behaviours in humans and dogs. Association studies have found that both the receptor and the downstream metabolite of SLC6A4 are correlated with aggressive behaviour in dogs [35,36]. The protein encoded by SLC6A4 might underlie the genetic component of many neurological traits in both dogs and humans.

Aside from genes involved in metabolism and neurological processes, the other most prevalent class of genes that overlap between the two species is cancer-related genes. A good example is MET, the mesenchymal epithelial transition factor, which is an important proto-oncogene. Abnormal activation of the MET pathway leads to a variety of tumours. Many other cancer-related genes, including those involved in the cell cycle and apoptotic pathways, are present in our shared list and are further discussed in Supplementary Note 5.