Significance Dogs were the first domesticated species, but the precise timing and location of domestication are hotly debated. Using genomic data from 5,392 dogs, including a global set of 549 village dogs, we find strong evidence that dogs were domesticated in Central Asia, perhaps near present-day Nepal and Mongolia. Dogs in nearby regions (e.g., East Asia, India, and Southwest Asia) contain high levels of genetic diversity due to their proximity to Central Asia and large population sizes. Indigenous dog populations in the Neotropics and South Pacific have been largely replaced by European dogs, whereas those in Africa show varying degrees of European vs. indigenous African ancestry.

Abstract Dogs were the first domesticated species, originating at least 15,000 y ago from Eurasian gray wolves. Dogs today consist primarily of two specialized groups: a diverse set of nearly 400 pure breeds and a far more populous group of free-ranging animals adapted to a human commensal lifestyle (village dogs). Village dogs are more genetically diverse and geographically widespread than purebred dogs, making them vital for unraveling dog population history. Using a semicustom 185,805-marker genotyping array, we conducted a large-scale survey of autosomal, mitochondrial, and Y chromosome diversity in 4,676 purebred dogs from 161 breeds and 549 village dogs from 38 countries. Geographic structure shows that both isolation and gene flow have shaped genetic diversity in village dog populations. Some populations (notably those in the Neotropics and the South Pacific) are almost completely derived from European stock, whereas others are clearly admixed between indigenous and European dogs. Importantly, many populations, including those of Vietnam, India, and Egypt, show minimal evidence of European admixture. These populations exhibit a clear gradient of short-range linkage disequilibrium consistent with a Central Asian domestication origin.

The domestic dog, Canis lupus familiaris, is found living with and around humans throughout the globe. Selective breeding of dogs has been practiced for thousands of years, but the majority of modern breeds are less than 200 y old and of European ancestry (1, 2). Most dogs in the world are not purebred or even mixed-breed dogs, but rather belong to free-breeding human-commensal populations (“village dogs”) (1, 3, 4). The history and lineage of most modern breeds are well established (5, 6), but the genetic relationships among village dog populations and between village dogs and breeds are less well understood.

Global surveys of mitochondrial and Y chromosome diversity in dogs have concluded that domestication occurred in southern China less than 16,500 yBP (7–10). In contrast, the earliest archeological evidence for dog-like canids occurs in Europe and Siberia, and Mt haplotypes found in ancient and modern gray wolves appear to be consistent with an origin of dogs from European wolves (11). These conflicting observations could be due to demographic processes after domestication (bottlenecks, migration, and admixture) altering patterns of genetic diversity, or simply to a sparse archeological record in East Asia during this period. Archeologists and geneticists agree that dogs evolved from Eurasian gray wolves at least 15,000 yBP (2), but precise determination of the domestication origin(s) is elusive.

Whereas the Y and Mt chromosomes are just two inherited loci, autosomal markers offer a vastly richer picture of the patterning of genetic variation genome-wide and better resolution for demographic inference. Efforts to identify the basis of phenotypic diversity and genetic diseases in dogs have yielded large genomic datasets of purebred dogs readily available for demographic inference (6, 12, 13). Genomic comparisons of purebred dogs and wolves show Middle Eastern wolves have more haplotype sharing with dogs than other wolf populations (6), but this is likely due to dog-wolf introgression in the Middle East (14) rather than an indication of Middle Eastern origins.

Inference of early population history using purebred dogs is hampered by the confounding effects of artificial selection and bottlenecks and by the relative dearth of breeds without European ancestry. Genetic analyses have identified fewer than 20 “basal” breeds that have remained isolated enough from modern admixture to retain genetic signatures reflecting their geographic origins, and most of these lineages were severely depleted by genetic bottlenecks in the modern era (1, 2, 6). Whereas patterns of linkage disequilibrium (LD) in people can be used to trace human origins to Africa (15), similar analyses in purebred dogs show LD patterns dominated by breed-specific bottlenecks without any spatial trends to suggest a domestication origin, even in basal breeds (12, 16–18).

Because village dogs are geographically widespread and genetically diverse, they can be highly informative of dog population history if recent admixture with foreign dogs is minimal (4, 19, 20). Bottlenecks and artificial selection have drastically skewed genetic diversity within breeds, but the larger effective population size (N_e) of village dogs makes them a better reflection of the genetic structure present in dogs before the modern era (21). Village dog populations that are relatively free of admixture should show genetic signatures reflecting the origins and movement of early dogs [and humans (22)], including the spread of pastoralism into Europe, the Bantu expansion in Africa, the peopling of the Americas, the settlement of the Pacific, and, most recently, European colonialism throughout the Americas and elsewhere.

Village dogs and local breeds represent an important but underused resource for disentangling the complicated evolutionary history of dogs. To this end, we genotyped a diverse panel of 549 village dogs from 38 countries and 4,676 purebred dogs from 161 breeds on a semicustom Illumina CanineHD array consisting of 185,805 markers, including 582 and 336 Mt and Y markers, respectively (13). We combined this with existing Mt and array data (6, 8, 11, 13, 23–27) to amass the largest canine diversity panel assembled to date, allowing efficient comparison of Y, Mt, and autosomal loci to evaluate the forces patterning genetic variation in diverse dog populations.

Discussion This study represents the largest-ever survey of worldwide canine genetic diversity using nuclear, Y, and Mt markers. We confirm high diversity and low LD in village compared with purebred dogs (4, 12, 19) and show how village dog populations improve inference of dog evolutionary history. This increased geographic and genetic resolution reveals the effects of bottlenecks and admixture in extant populations, as well as evidence for an origin of dogs in Central Asia. Like previous studies, we find high levels of Mt and Y haplotype diversity in East Asia (8–10, 29, 34), but we also find high levels of Mt and Y diversity in India and Southwest Asia, respectively (Table 1). Whereas previous studies have used the high levels of uniparentally inherited haplotype diversity as evidence for an East Asian, specifically Southern Chinese, origin for dogs, genome-wide LD patterns among populations suggest a different process. Namely, domestication occurred in Central Asia, from which early dogs carrying nearly the full complement of Mt and Y haplotypes spread to nearby Asian regions, including Afghanistan, India, and Vietnam. The substantial N_e in these regions, particularly East Asia, allowed these haplogroups to survive and diversify to a greater extent than in Central Asia. Higher N_e in East vs. Central Asia is supported both by census estimates (35) and by the more negative slope of the LD decay curve in East vs. Central Asia (Fig. 5), because recent population history has a greater impact on long-range vs. short-range LD (30, 33). Gray wolves were clearly present in Central Asia during the Mesolithic, and both wolves and human hunter-gatherers were exploiting large mammals during this time (36). Increasing human population density, blade and hunting technology, and/or climate change during the Late Paleolithic in Central Asia (28) may have altered prey densities and made scavenging crucial to the survival of some wolf populations.
Adaptations to scavenging such as tameness, small body size, and a decreased age of reproduction would reduce hunting efficiency further, eventually leading to obligate scavenging (37). Whether these earliest dogs were simply human-commensal scavengers or played some role as companions or hunters that hastened their spread is uncertain, but clearly adaptation to conditions outside this initial domestication origin [e.g., efficient starch digestion (38) and aseasonal breeding (39, 40)] has also been important in dog evolution. Although SNP array data are poorly suited for estimating the timing of ancient population events, they do shed light on the conflicting estimates of dog origins in previous genetic studies. Because there is incomplete lineage sorting between dogs and wolves, estimates based on Mt or Y haplotype diversity are sensitive to assumptions regarding the number of founder haplotypes in early dogs (8, 41). Nuclear datasets offer better resolution for parameterizing demographic models, but two such studies have yielded widely varying results [14 vs. 32 kya (14, 42)]. Our LD data support a relatively strong domestication bottleneck in dogs followed by substantial population expansion, particularly in East Asia. An ancient origin with a weak domestication bottleneck and small current N_e in Asian village dogs is also consistent with the allele frequency data in ref. 42, but a more recent, stronger domestication bottleneck and large current N_e could be consistent with both allele frequency data and LD decay rates, bringing the inferred timing of dog origins more in line with archeological estimates. Central Asia has been considered a likely domestication origin for dogs by some archeologists (43), but it has been poorly represented in previous genetic studies of dog origins.
The pattern of reduced short-range LD in populations near Central Asia is most parsimoniously explained by an origin of dogs somewhere in this region, but we cannot rule out the possibility that dogs were domesticated elsewhere and subsequently, either through migration or a separate domestication event, arrived and diversified in Central Asia. For example, European dog populations have undergone extensive turnover over the last 15,000 y (44), erasing the genomic signatures of early European population history. Although it is difficult to explain the clear gradient of short-range LD out of Central Asia if dogs were domesticated in a far-flung region, studies of extant dogs cannot exclude the possibility of earlier domestication events that subsequently died out or were overwhelmed by more modern populations. Further analysis of diverse dogs throughout Central Asia and surrounding regions is crucial for precisely resolving the origin and history of early dogs. Refining the timing of dog domestication could yield substantial insights into the process by which dogs became domesticated and the wolf and human population(s) involved. Ancient DNA analysis will surely contribute to our understanding of early dog populations, but where ancient specimens are unavailable, village dogs are often the best proxy we have for ancient populations. Many indigenous populations have already succumbed to swamping gene flow from foreign dogs, so further work characterizing remaining indigenous populations genetically, morphologically, and behaviorally is vital for building an improved understanding of dog evolutionary history.
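The LD-based reasoning in this Discussion can be made concrete with a small numerical sketch. It assumes Sved's classical approximation, E[r²] ≈ 1/(1 + 4·N_e·c) for a constant-size population, together with the rule of thumb that LD at recombination distance c (in Morgans) mostly reflects N_e roughly 1/(2c) generations ago. Neither formula is stated in this paper, so both are assumptions here, as are the uniform recombination rate of 1 cM/Mb and the function names, which are purely illustrative.

```python
"""Illustrative only: expected LD under Sved's approximation.

This is an assumed textbook model, not the estimator used in the paper.
"""


def bp_to_morgans(distance_bp: float, cm_per_mb: float = 1.0) -> float:
    """Convert a physical distance to Morgans.

    Assumes a uniform recombination rate (hypothetical default: 1 cM/Mb).
    """
    return distance_bp / 1_000_000 * cm_per_mb / 100.0


def expected_r2(ne: float, c_morgans: float) -> float:
    """Expected r^2 between two loci for a constant-size population."""
    return 1.0 / (1.0 + 4.0 * ne * c_morgans)


def generations_reflected(c_morgans: float) -> float:
    """Rule of thumb: LD at distance c reflects N_e about 1/(2c) generations ago."""
    return 1.0 / (2.0 * c_morgans)


if __name__ == "__main__":
    # Compare a short-range and a long-range inter-SNP distance.
    for distance_bp in (10_000, 200_000):
        c = bp_to_morgans(distance_bp)
        print(
            f"{distance_bp:>7} bp: ~{generations_reflected(c):.0f} generations ago; "
            f"E[r2] = {expected_r2(1_000, c):.3f} (Ne = 1,000) "
            f"vs {expected_r2(10_000, c):.3f} (Ne = 10,000)"
        )
```

Under these assumptions, short inter-SNP distances correspond to older N_e and long distances to more recent N_e, which is the logic behind reading short-range LD as a signal of early population history and long-range LD as a signal of recent demography.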

Methods Sample Collection. The majority of samples used in this study come from blood stored in the Cornell Veterinary Biobank collected in accordance with Cornell animal care protocols 2005-0151 and 2011-0061. These samples include 4,676 purebred dogs from 161 breeds, 167 mixed breed dogs, and 549 village dogs from 38 different countries (SI Appendix, Table S11). Blood was stored in EDTA, and DNA was extracted by salt precipitation. Genotyping. Samples were genotyped on a semicustom Illumina SNP array containing 173,662 SNPs from the CanineHD array (13) and 12,143 markers identified using whole genome sequencing (45). A total of 166,171 markers remained after filtering markers with >5% missing data, discordant genotypes between technical replicates, or extreme divergence from Hardy–Weinberg expectations (HWEs; observed heterozygosity vs. HWE ratio <0.25 or >1.0). Genotype and geographical data have been deposited in Dryad (datadryad.org, doi:10.5061/dryad.v9t5h). PCA. PCA of unrelated village dogs was run using the smartpca program distributed in the Eigenstrat v5.0.1 software package (46). Village dogs were used to define the PCA space, and Basenjis, Carolina Dogs, New Guinea Singing Dogs (NGSDs), and one dog each from other breeds were projected onto it. Haplotype Analysis. The array included 582 Mt markers, of which 367 were polymorphic and passed quality control filtering. Additionally, seven markers that introduced multiple cycles in the haplotype network, suggesting genotyping error, were removed. We added 431 additional dogs with published complete Mt sequences (8, 11, 23–27) based on their genotypes at the marker positions. Haplotypes were named according to published convention (8), with some published haplotypes mapping to multiple haplotypes in this study due to the markers we used outside the control region. These haplotypes were split and are indicated with a letter.
Conversely, some of the sequenced haplotypes are identical across all 360 marker positions on the array and are included as a single combined haplotype (e.g., C1_2). The array included 336 Y chromosome markers, of which 207 were polymorphic and passed quality control filtering. One of these was removed because it introduced several cycles in the haplotype network. Haplotypes were named to correspond with Ding et al. (10), subdividing haplotypes to account for our enriched marker set. Haplotype networks were constructed in R v3.1.0 using the ape and pegas packages (47–49). The distance matrix was calculated based on the count of differences, and then a minimum spanning forest was calculated. Networks were visualized in R using the igraph package (50). We defined haplogroups as groups of haplotypes at least 2 SDs further apart than the average distance between haplotypes, as measured by number of differences at the array marker positions. Regional haplotype diversity (H) was computed with regions defined by geography and PCA. To control for sample size differences, we subsampled N = 20 dogs 100 times within regions and counted the number of observed haplotypes. LD Decay. LD is a reflection of N_e, with LD at proximate SNPs reflecting historic N_e and LD at distant SNPs reflecting N_e in more recent times (30, 33). To ensure estimates of LD were not biased by particular individuals or by the choice of sample size, we used the PLINK 1.0.7 (51) --genome command to remove related (π̂ outliers) and the --het command to remove inbred (F > 0.25) individuals. We then performed two parallel analyses, one retaining 6 individuals per population and one retaining 20, randomly selecting the individuals 100 times to compute means and SEs. LD was calculated using the command --maf 0.3 --r2 --ld-window 999 --ld-window-r2 0 --ld-window-kb 200, and averaging within bins based on inter-SNP distance was performed using a C script (12). Admixture.
Ancestry of individual dogs was determined using ADMIXTURE software (52). For a global view of village dog ancestry, we included all unrelated individuals from NGSDs, Basenjis, Carolina dogs, and village dogs, and a single individual from select dog breeds. For each K, 10 replicates were run using a different random seed. The replicate with the lowest cross-validation score for each K is reported. To estimate historical relationships between populations we used TreeMix (53). We built admixture trees for dog breeds with gray wolves as the root, and for village dogs with coyotes as the root. For the village dog tree, we combined our data with previously published wolf and coyote Affymetrix v2 data (13). Only SNPs genotyped on both arrays and passing quality control were included (36,358 total). Trees were calculated using a range of numbers of migration events (m); we report the trees where further migration edges do not appreciably improve the fit. With the same populations, we calculated pairwise F_ST values using a custom C script. We formally tested for admixture (indigenous vs. European) for African and Pacific Island dogs by computing f3 statistics using the Admixtools package (54) for each population with Europe as one source population and Basenji, Vietnam, or Borneo as the other. Populations were the same as for the village dog TreeMix analysis, and wolves were the outgroup for testing the bounds of the admixture percentage. For American dogs, a suitable unadmixed source population was not available, so we used a PCA-based approach (55) to identify the extent of European ancestry in individual breed and village dogs. Village dogs from Europe, Alaska, and Vietnam were used to define the PCA space, and then dogs from the Americas were projected onto it.
The same approach was used to investigate indigenous versus European ancestry proportions in East Asian breeds using village dogs from Europe, Borneo, Vietnam, and Mongolia to define the principal components.
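The f3 admixture test described above can be illustrated with a minimal sketch. It computes the naive form of the statistic, the average of (c − a)(c − b) across SNPs, where c is the allele frequency in the target population and a and b are the frequencies in the two candidate source populations (for example, Europe and Vietnam). A clearly negative value suggests that the target is admixed between populations related to the two sources. This is not the Admixtools implementation, which additionally corrects for finite sample size and assesses significance with a block jackknife; the function name and the toy frequencies below are hypothetical.

```python
from typing import Sequence


def f3_naive(
    target: Sequence[float],
    source_a: Sequence[float],
    source_b: Sequence[float],
) -> float:
    """Naive f3(target; source_a, source_b) from per-SNP allele frequencies.

    No sample-size correction is applied, so this is only a teaching sketch.
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("all populations must cover the same SNPs")
    if len(target) == 0:
        raise ValueError("at least one SNP is required")
    # Per-SNP product of the target's deviations from each source.
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)


if __name__ == "__main__":
    # Toy allele frequencies at four SNPs (made up for illustration).
    europe = [0.10, 0.90, 0.20, 0.80]
    vietnam = [0.90, 0.10, 0.80, 0.20]
    admixed = [0.50, 0.50, 0.50, 0.50]   # intermediate between the sources
    drifted = [0.95, 0.05, 0.90, 0.10]   # beyond one source, not intermediate

    print("admixed target:", f3_naive(admixed, europe, vietnam))
    print("drifted target:", f3_naive(drifted, europe, vietnam))
```

In this toy case the intermediate population yields a negative value and the drifted population a positive one. A real analysis would also need standard errors, because a negative point estimate alone is not a significant result.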

Acknowledgments We thank the countless dog owners and enthusiasts who facilitated sample collecting, including Carol Beuchat, Laura Colín, Jon Curby, Gautum Das, Ricardo de Matos, Baird Fleming, George Hicks, Gary S. Johnson, Warren Johnson, Janice Koler-Matznick, Leonard Kuwale, Kris Kvam, Judith Liggio, Stephanie Little Wolf, Gaby Matshimba, Mark Neff, Casey Quimby, Sue Ann Sandusky, Myrna Shiboleth, Jo Thompson, Steve Wooten, Asociación de Amigos por los Animales de Sosúa-Judy’s Pet Lodge, Animal Care in Egypt, Animals Fiji, Beirut for the Ethical Treatment of Animals, Liberia Animal Welfare and Conservation Society, Mongolian Bankhar Dog Project, Gump South Pacific Research Station, and the Qatar Animal Welfare Society. We thank Joy Li (Cornell Veterinary Biobank) and the Cornell University Genomics Core Facility for technical help. We thank A. G. Clark, S. Gravel, S. M. Myles, and R. K. Wayne for critical input. Funding for this project came from National Science Foundation Grant 0516310, National Geographic Society Expedition Council Grants EC0492-11 and 1P-14, Zoetis Animal Health, Cornell University Center for Advanced Technology, Cornell University, and dozens of PetriDish donors, including Sandra Coliver, Buck Farmer, Richard Gardner, Kathryn Sikkink, and Elaine and Chris McLeod.