To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gathers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary.

The process of dog domestication is still poorly understood, largely because no studies thus far have leveraged deeply sequenced whole genomes from wolves and dogs to simultaneously evaluate support for the proposed source regions: East Asia, the Middle East, and Europe. To investigate dog origins, we sequence three wolf genomes from the putative centers of origin, two basal dog breeds (Basenji and Dingo), and a golden jackal as an outgroup. We find that none of the wolf lineages from the hypothesized domestication centers is supported as the source lineage for dogs, and that dogs and wolves diverged 11,000–16,000 years ago in a process involving extensive admixture and that was followed by a bottleneck in wolves. In addition, we investigate the amylase (AMY2B) gene family expansion in dogs, which has recently been suggested as being critical to domestication in response to increased dietary starch. We find standing variation in AMY2B copy number in wolves and show that some breeds, such as Dingo and Husky, lack the AMY2B expansion. This suggests that, at the beginning of the domestication process, dogs may have been characterized by a more carnivorous diet than their modern day counterparts, a diet held in common with early hunter-gatherers.

Funding: This work was supported by an NSF Postdoctoral Fellowship DBI-0905784 (AHF), NSF Graduate Research Fellowship (RMS), Searle Foundation Scholar Award (JN), NSF grant EF-1021397 (JN RKW AHF RMS EH), NIH T32 HG002536 (EH), NIH (NIGMS) grant GM102192 (IG AS), UC MEXUS-CONACYT doctoral fellowship 213627 (DODV), PhD grant from Fundação para a Ciência e a Tecnologia (Portugal) SFRH/BD/60549/2009 (PMS), PhD Grant from University of Bologna (Italy), XXIV cicle, Biodiversity and Evolution (MG), National Science and Technology Support Project of China 2012BAC01B06 (ZF), Rosztoczy Foundation (PM), USHHS Ruth L. Kirschstein Institutional National Research Service Award #T32 CA009056 (KS), National Institute of Bioinformatics (BLG), Intramural Program of the National Human Genome Research Institute (HB HGP EAO), JAEDOC-CSIC (EFS) (OR), Marie Curie CIG PCIG10-GA-2011-303772 (CA) Programa de Captación del Conocimiento para Andalucía­a for sequencing of Chinese wolf at BGI, ERC Starting Grant 260372 and MICINN (Spain) BFU2011-28549 (TMB), and NSF grant DEB-0948510 (ARB CDB). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

We chose to sequence a small number of individual genomes to high coverage, rather than larger numbers of (pooled) individuals at low coverage, to take advantage of recently developed demography inference methods based on small numbers of high quality genomes [18] – [20] . These methods allowed us to disentangle the effects of incomplete lineage sorting (ILS)–the discordance from the population phylogeny at individual loci resulting from deep coalescence–and post-divergence gene flow, which pose a particular challenge in analysis of such recently diverged species as dogs and wolves [21] . Combining the results of multiple complementary methods provided us with an integrated, robust view of the shared history of dogs and wolves, including population divergence times, ancestral population sizes, and rates of gene flow. Using polymorphism data from 10 million single-nucleotide variant sites, we investigated: 1) the size of the ancestral wolf population at the time of wolf/dog divergence; 2) the geographic origins and timing of dog domestication; 3) post-divergence admixture between dogs and wolves; and 4) lineage-specific characteristics of the recently discovered dog-specific AMY2B expansion [15] .

The three wolves were chosen to represent the broad regions of Eurasia where domestication is hypothesized to have taken place (Europe, the Middle East, and East/Southeast Asia) [6] , and specifically, were sampled from Croatia, Israel, and China ( Figure 1 ). The Dingo and Basenji represent divergent lineages relative to the reference Boxer genome [10] and maximize the opportunity to capture distinct alleles present in the earliest dogs. These lineages are also geographically distinct, with modern Basenjis tracing their history to hunting dogs of western Africa, while Dingoes are free-living semi-feral dogs of Australia that arrived there at least 3,500 years ago ( Figure 1 ) [17] . As a result of their geographic isolation, the natural range of wolves has never extended as far south as the geographic sources for these two dog lineages [6] , thus they are less likely to have overlapped and admixed with wolves in the recent past. Sequencing the golden jackal in principle allows us to infer the ancestral state of variants arising in dogs and wolves ( Text S1 , S2 ), though in practice this was complicated by the observation of wolf-jackal admixture (see below). For some analyses, we also leverage data from a companion study of 12 additional dog breeds ( Text S1 ).

Gray wolves have been dominant predators across Eurasia and North America, often exerting top-down impacts on the ecological communities they inhabit [1] , [2] . As humans expanded out of Africa into Eurasia, they came into contact with gray wolves and, through a complex and poorly understood process, dogs emerged as the first human companion species and the only large carnivore to ever be domesticated. Archaeological evidence provides partial clues about dog origins. For example, dog-like canids first appear in the fossil record as early as 33,000 years ago in Siberia [3] . However, it is not clear if these proto-dog fossils are ancestral to living dogs or instead represent failed domestication attempts or simply morphologically distinct wolves [3] . Similarly, the geographic origin of dogs is uncertain, with distinct lines of evidence supporting Southeast Asia, the Middle East, and Europe as potential domestication centers, and ruling out Africa, Australia, and North America [4] – [10] . Nonetheless, several recent studies have begun to illuminate the genetic basis of traits that changed during dog domestication and breed formation, advancing the general understanding of how genetic mechanisms shape phenotypic trait diversity [11] – [14] . For example, a recent study found an increase in copy number of the amylase gene (AMY2B) during dog domestication suggesting adaptation to starch-rich diets [15] . Given the unique behavioral adaptations of dogs, including docility and the ability to form social bonds with humans [16] , comparative genomics analyses of dogs and wolves holds great promise for identifying genetic loci involved in complex behavioral traits [14] . However, the demographic context of selection must first be understood to determine how it may have affected patterns of genetic divergence between dogs and wolves.

Results

Ancestral population sizes of dogs and wolves Genome-wide patterns of heterozygosity provide useful information on long-term effective population sizes. The mean heterozygosity rates (per nucleotide position) observed in the genome sequences of the Basenji and Dingo were 9×10−4 and 6×10−4, respectively (Figure 3A, Table S6), consistent with a rate of 6×10−4 previously observed in modern dog breeds [22], and considerably smaller than the rates observed in the three wolf genomes (1.2×10−3–1.6×10−3). This twofold reduction in heterozygosity observed in dogs relative to wolves can be superficially interpreted to reflect a relatively weak two-fold reduction in effective population size of dogs relative to their ancestors, assuming that genetic variation in modern wolves is representative of the ancestral population. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 3. Heterozygosity and historical changes in effective population size. (A) Box plots of heterozygosity measured in 5000 100 kb windows for each sample. (B) Reconstruction of historical patterns of effective population size (N e ) for individual genome sequences. Based upon the genomic distribution of heterozygous sites using the pairwise sequential Markovian coalescent (PSMC) method of Li and Durbin 2011 [20]. Time scale on the x-axis is calculated assuming a mutation rate of 1×10−8 per generation (see Text S8); estimates from the full data and 50 bootstraps are depicted by darker and lighter lines, respectively. https://doi.org/10.1371/journal.pgen.1004016.g003 To better understand the changes in ancestral population sizes that influenced dogs and wolves, we employed the pairwise sequential Markovian coalescent (PSMC) method [20]. This method infers ancestral effective population sizes (N e ) over time, based on a probabilistic model of coalescence with recombination and changes in heterozygosity rates along a single diploid genome. We applied PSMC to each of the five genomes (Figure 3B, Text S8) and converted the mutation-scaled estimates of time (to years) and population size (to numbers of individuals) by assuming an average mutation rate per generation of μ = 1×10−8 and an average generation time of three years [22],[25] (see Discussion). The inferred tracks of ancestral N e in dogs show a population decline of at least 16-fold over the past 50 thousand years, from greater than 32,000 individuals (ancestral N e for Basenji lineage: 32,100–35,500; for Dingo lineage: 32,500–37,400 95% bootstrap CI) to less than 2,000 individuals (Basenji lineage: 1640–1980 at 4,000 years ago; Dingo lineage: 704–1042 at 3,000 years ago). Interestingly, wolves also show a considerable, yet milder, 3-fold reduction in effective population size to present estimates between 10,000 and 17,000 for the three wolf samples. For clarity, we note that with PSMC the population size trajectories are effective sizes for the lineages that eventually lead to the canid samples as they are known today (e.g. as Basenji or as Dingo) and that looking backwards in time eventually trace back to the common ancestral lineage of dogs and wolves. Our observations do not appear to be biased by very recent inbreeding in dogs and wolves, as we found that runs of homozygosity do not affect our inferences of ancestral N e (Text S8). These results indicate the ancestral wolf population from which dogs were domesticated was considerably larger than estimated from current levels of diversity in wolves and suggest that simple comparisons of nucleotide diversity in present-day dogs and wolves lead to substantial underestimates of the severity of the bottleneck in dogs.

Phylogenetic relationships and admixture between dogs and wolves Individual genome sequences include valuable information about phylogenetic relationships between our samples. However, interpretation of these phylogenetic signals is challenging due to the possibility of post-divergence gene flow between dogs and wolves, as well as ILS, which is an expected consequence of the large ancestral population sizes inferred by PSMC. Indeed, we observed predominant ancestral polymorphism in our data: for variant sites with no missing data, and where variants were observed in dogs or wolves, 32.0% of variant sites were shared across dogs and wolves, 47.3% were private to wolves, 20.2% were private to dogs, and only 0.5% were fixed between dogs and wolves (Table S3). Pairwise sequence divergence captures mean coalescent times that are robust to both ILS and moderate levels of gene flow (see below). Thus, to provide accurate estimates of phylogeny given these demographic processes, we constructed a neighbor-joining (NJ) tree from a conservative estimator of genome-wide pairwise sequence divergence for all pairs in our seven genomes, including the Boxer reference and using the golden jackal as an outgroup (Figure 4A, Text S8, Table S7). Bootstrap support for all nodes was 100%, with dogs and wolves recovered as monophyletic sister clades. Surprisingly, the Boxer reference is only slightly more divergent from the three wolf genomes than it is from the two dog genomes. To evaluate the robustness of our phylogenetic inference, we also constructed a NJ tree using an estimator of sequence divergence for which all possible mismatches between alleles from a pair of individuals are counted (Table S8). The consensus tree based on this metric places the Chinese wolf at a position sister to a clade of our other wolf and dog samples (Figure S1), but the bootstrap support for this relationship is low (54%), suggesting poorer resolution with this estimator. Importantly, both approaches and additional phylogenetic analyses strongly support the hypothesis of dogs forming a distinct clade (Text S8, Tables S9, S10). PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 4. Neighbor-joining tree and admixture signatures from ABBA/BABA tests. (A) NJ tree constructed from genome-wide pairwise divergence, calculated using equation E8.1 in . All nodes have 100% bootstrap support. Dashed lines indicate admixture edges that were statistically significant in ABBA/BABA tests. (B) ABBA/BABA tests with significant Z-scores (values ≥3 are significant). All comparisons made are shown in Table S11. For each row, boldfaced labels indicate admixing lineages. https://doi.org/10.1371/journal.pgen.1004016.g004 One important factor that could complicate inference of divergence between dogs and wolves is post-divergence gene flow. To examine admixture in our sampled genomes, we employed the nonparametric ‘ABBA-BABA’ test for gene flow between two divergent populations, such as humans and Neandertals [26], from individual genome sequences. This method tallies site patterns for four taxa, compares them to those expected under an assumed phylogeny and then uses this comparison to identify significant pattern asymmetries that cannot be explained by ILS or sequencing errors. We applied this test to all dog-wolf sample pairs, using the golden jackal as an outgroup and one of the other four samples as an additional ingroup (Text S8). We found significant evidence of admixture for three population pairs: Israeli wolf and Basenji, Chinese wolf and Dingo, and Israeli wolf and Boxer (Figure 4B, see also Table S11). Care should be taken in interpreting these results, as the detected admixture signals may reflect gene flow between lineages ancestral to our contemporary samples. The signal for Chinese wolf and Dingo likely represents ancient admixture in Eastern Eurasia, and the signal observed for Israeli wolf, Basenji, and Boxer likely represents ancestral admixture that occurred in western Eurasia. The resulting phylogeny with admixture edges (Figure 4A) is used as the starting point for a more comprehensive examination of joint demographic model for dogs and wolves.

A complete demographic model for dogs and wolves We next inferred a complete demographic model for dogs and wolves, including population divergence times, ancestral population sizes, and rates of post-divergence gene flow by jointly analyzing all seven genomes using the Generalized Phylogenetic Coalescent Sampler (G-PhoCS) [19], a recently developed Bayesian demographic inference method. The method is based on a full coalescent-based probabilistic model that considers both ILS (by modeling ancestral population size) and post-divergence gene flow (by allowing lineages to migrate between populations through designated migration bands). G-PhoCS conditions its inference on a given population phylogeny, and uses information on local genealogies at a large number of short, unlinked, neutrally evolving loci to generate samples of demographic parameters from an approximate posterior distribution. We applied G-PhoCS to a multiple sequence alignment of the six genomes and Boxer reference in 16,434 carefully filtered putative neutral autosomal loci using the NJ tree to indicate the topology of the population phylogeny (Text S9, see discussion on alternative topologies below). Initially, we considered various migration bands with significant signatures of gene flow (Text S9). We found evidence of bi-directional gene flow between Israeli wolf and Basenji, as well as Chinese wolf and Dingo, consistent with our findings from the non-parametric ABBA-BABA test. Interestingly, the joint analysis of all genomes indicated that admixture inferred by the ABBA-BABA test for the Israeli wolf and the Boxer is likely a result of gene flow from a population ancestral to Basenji into a population ancestral to Israeli wolves. We base this conclusion on the observation that there is no significant signature of admixture between Boxer and Israeli wolf in the ABBA-BABA test or the G-PhoCS inference when Basenji is also included in the analysis. Using G-PhoCS we were also able to examine signatures of admixture in the jackal outgroup, which cannot be detected using the ABBA-BABA test, and found significant gene flow between the golden jackal and the Israeli wolf, as well as the population ancestral to all dog and wolf samples. Our divergence time estimates imply that dogs and wolves diverged 14.9 thousand years ago (kya) with 13.9–15.9 kya Bayesian 95% credible interval (CI), assuming an average mutation rate per generation of μ = 1×10−8 and three years per generation (Figure 5A). Divergence times between wolf populations were tightly clustered at 13.4 kya (11.7–15.1 kya), and divergence between dogs was estimated to have occurred slightly more recently, at 12.8 kya (11.8–13.7 kya; divergence of Dingo) and 12.1 kya (10.9–13.1 kya; divergence between Boxer and Basenji). Interestingly, we inferred a divergence time of 398 kya (382–415 kya) between the golden jackal and the population ancestral to dogs and wolves, which is considerably more recent than previously reported [27]. To validate this finding, we ensured that our estimates appropriately account for ancestral gene flow into the golden jackal population (Text S9) and validated the position of our sample within the golden jackal lineage by comparing polymorphism data from that genome to a larger panel of wolves and jackals (Text S5, S11). PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 5. Demographic model of domestication. Divergence times, effective population sizes (N e ), and post-divergence gene flow inferred by G-PhoCS in joint analysis of the Boxer reference genome, and the sequenced genomes of two basal dog breeds, three wolves, and a golden jackal. The width of each population branch is proportional to inferred population size, and stated ranges of parameter estimates indicate 95% Bayesian credible intervals. Horizontal gray dashed lines indicate timing of lineage divergences, with associated means in bold, and 95% credible intervals in parentheses. Migration bands are shown in green with associated values indicating estimates of total migration rates, which equal the probability that a lineage will migrate through the band during the time period when the two populations co-occur. Panels show parameter estimates for (A) the population tree best supported by genome-wide sequence divergence (Fig. 4A) (B) a regional domestication model, and (C) a single wolf lineage origin model in which dogs diverged most recently from the Israeli wolf lineage (similar star-like divergences are found assuming alternative choices for the single wolf ancestor. Estimated divergence times and effective population sizes are calibrated assuming an average mutation rate of 1×10−8 substitutions per generation and an average generation time of three years. See Text S9 and Table S12 for details. https://doi.org/10.1371/journal.pgen.1004016.g005 G-PhoCS produced estimates of ancestral effective population sizes compatible with the ones inferred by PSMC, with a large effective population size of 45,000 individuals (44,200–44,800) for the population ancestral to dogs and wolves, followed by a 22-fold reduction to 2,000 individuals (700–3,200) in the population ancestral to all dogs, and a more moderate 3.6-fold reduction to 12,600 individuals (1,000–25,000) in the population ancestral to all wolves. As with our inferences based on PSMC, we estimate a far more severe domestication bottleneck than previously reported [22], [23]. The main discrepancy between PSMC and G-PhoCS concerns the timing of these changes. G-PhoCS associates this reduction in N e with the divergence between dogs and wolves at around 15 kya, whereas PSMC infers a gradual population decline starting as early as 50 kya (Figure 3B). As PSMC is based upon the density of heterozygous sites within the genome sequence of an individual, it does not directly infer divergence times. However, one can informally estimate them as the points when N e trajectories that are overlapping diverge moving forward in time towards the present. The discrepancy between G-PhoCS and PSMC reflects the distinct models used by these methods: G-PhoCS assumes a constant population size for every branch of the phylogeny, which prevents it from characterizing gradual changes in population size, whereas PSMC tends to produce smoothed traces of ancestral N e , which may limit its ability to capture rapid population bottlenecks. To test which of the inferred models has a better fit to the data, we simulated data under both models, and then used each method to analyze the data simulated under the model inferred by the other method (Text S8, S9). These two reciprocal tests indicated that the early and gradual population decline inferred by PSMC is compatible with a more recent dramatic reduction (Text S8, Figure S2), and that divergence time estimates of G-PhoCS were not compromised by its inability to model gradual changes in N e (Figure S3). Both results support the demographic model inferred by G-PhoCS, which has a relatively recent divergence between dogs and wolves followed by a dramatic reduction in population size. We additionally validated the robustness of our demographic parameter estimates under the set of loci chosen for the analysis as well as assumptions made on intra-locus recombination (Text S9).

Alternative models for dog domestication The demographic model we inferred using G-PhoCS reflects the population phylogeny estimated in the NJ analysis. To validate the robustness of our inference to this assumption, we considered a series of alternative topologies that correspond to plausible scenarios of the shared histories dogs and wolves. When we assume a model in which each dog population originated from the wolf population corresponding to its geographic origin (a model of regional domestication, e.g. Figure 5B), G-PhoCS infers extremely large rates of post-divergence gene flow between dogs and between wolves. For instance, the total rate of gene flow from Basenji to Boxer is inferred to be mtot = 1.24 (0.93–1.59, 95% Bayesian CI), implying a probability near 100% for any Boxer lineage to have migrated from a population ancestral to Basenji. Total rates above 30% were inferred for additional migration bands, such as Basenji-to-Dingo (0.47), Croatian-to-Israeli wolf (0.33), and Croatian-to-Chinese wolf (0.33) (Figure S4). In terms of the number of migrants per generation (4N e m), these estimates translate into 0.26 (CI: 0.15–0.38), 4.48 (CI: 2.52–6.36), and 0.89 (CI: 0.56–1.23), reflecting large amounts of gene flow, which is unlikely given historical separation of these geographically distinct populations. In contrast, the migration rates estimated in our original inference were considerably lower, with nearly all total rates falling below 10% (Figure 5, Text S9, Table S12), indicating a better fit of that topology to the data. Another set of alternative topologies we examined is one in which the dog clade originates from one of the four branches in the wolf sub-phylogeny (e.g. Figure 5C). Assuming such topologies, G-PhoCS infers that dogs diverged from wolves less than 200 years after wolves diverged from each other (Figure S5), whereas in the original inference conditioned on the NJ tree, the divergence between dogs and wolves was estimated to have occurred 1,400 years before the divergence between wolf populations. All other parameter estimates were not significantly affected by the choice of origin population for the dog clade. Thus regardless of our assumptions on the identity of the wolf population from which dogs originated, we infer that dogs diverged from the sampled wolf populations at about the same time these wolf populations diverged from each other. Additionally, the greater difference between estimated divergence times in our original analysis provides some support for our initial assumption that dogs and wolves form sister clades.

Assessment of models in lights of site configuration statistics Because G-PhoCS does not yet support statistical tests for model selection, we assessed relative support for the alternative models by performing simulations under each model, and comparing the simulated and real data with respect to a series of site configuration statistics informative about the topologies of local genealogies. For every quartet in our sample set that contains the jackal outgroup, we computed the relative frequencies of bi-allelic sites in which each of the two alleles (denoted A and B) is present in exactly two of the four individuals. Similar statistics are used in the ABBA-BABA test for admixture, but in our case we were also interested in the frequency of the BBAA configuration, which is the one compatible with the topology of the assumed phylogeny (see Text S8 for more information). We compared frequencies of the three configurations in 20 quartets observed in our data with those observed in data simulated under the three demographic models shown in Figure 5, denoted as “dog/wolf reciprocal monophyly” (Figure 5A), “regional domestication” (Figure 5B), and “ISW-source” (Figure 5C). This comparison allowed us to draw conclusions regarding the fit of each of these models to the data with respect to the distribution of local genealogies it implies (Table S10). The three models appeared to be fairly compatible with the data overall, with the reciprocal monophyly model showing the lowest discrepancy (absolute error = 0.43), followed closely by the ISW-source model (absolute error = 0.47) and then trailed by the regional domestication model (absolute error = 0.82). The regional domestication model showed the largest discrepancy in quartets including Dingo and at least one other dog, indicating considerably weaker support for the dog clade and its internal structure than present in the data. This implies that the patterns of sequence similarity between dogs are more compatible with a distinct dog clade than they are with similarity solely generated by gene flow between the different dog lineages. The ISW-source model showed high discrepancy in quartets containing the Croatian and Israeli wolves, indicating that the model has problems capturing the phylogenetic relationships between those wolves and the dogs. The reciprocal monophyly model provided the best fit to the data, but it did show some discrepancy in quartets containing both the Dingo and the Chinese wolf. This is perhaps related to the large credible intervals for the rates of gene flow between these populations in the G-PhoCS inference (CHW→DNG, 0–6%; DNG→CHW, 2–6%). In conclusion, these tests show that topological signatures in the data provide strong support for a monophyletic dog clade and somewhat weaker support for a monophyletic wolf clade.