Y-chromosome sequence variation among domestic dogs

Here we present the first study of Y-chromosome DNA sequence diversity among dogs worldwide, thereby obtaining genetic data for a second independently inherited marker with which to evaluate the scenario for the origins of domestic dogs. We analysed 151 dogs sampled from throughout the world and, for reference, 12 wolves and 2 coyotes, for 14 437 bp of Y-chromosome DNA sequence (see Table 1, Materials and methods, and Supplementary Datasets 1, 2 and 3). In total, there were 49 nucleotide positions with binary substitutions (1 substitution/295 bp) and 14 with indels; among dogs there were 30 substitutions (1 substitution/481 bp) and 11 indels. The 49 substitutions define 32 haplotypes: 28 found among dogs (1 shared with wolf), 2 wolf specific and 2 coyote specific. The genetic relations between haplotypes were reconstructed in a most parsimonious phylogenetic tree (Figure 1a), without homoplasy in any nucleotide position.

Figure 1 Phylogenetic and geographical distribution of haplotypes. (a) Most parsimonious phylogenetic tree. Haplotypes (symbolized by circles for dog, squares for wolf and hexagons for coyote; black dots are hypothetical intermediates) are separated by one substitutional step. The area of the circles is proportional to the frequency of the haplotype among dogs. Haplogroups (see text) are indicated by colour; haplotype 2* cannot be assigned to HG1 or HG3 and is therefore white. (b) Geographical distribution of haplogroups. Graphs show the number of individuals carrying each haplogroup, with colours referring to haplogroups according to (a). Populations: a, Scandinavia; b, Britain; c, Central Europe; d, South Europe; e, Fertile Cr; f, SW Asia East; g, Northern Africa; h, Southern Africa; i, Siberia; j, North China; k, Central China; l, South China; m, Southeast Asia (l and m jointly forming ASY); n, Japan; o, America. For definitions of geographical regions, see Note to Table 1, and Materials and methods. (c) Trees (see a) showing representation (blue, shared with other regions; yellow, unique to the region; white, not present) and frequency (proportional to area) of haplotypes among dogs in geographical regions. Europe C/S, Central and South Europe; SE Asia, Southeast Asia.

Single basecalls are expected at every nucleotide position of the haploid Y-chromosome. However, three positions situated in one amplification fragment (fragment G; see Materials and methods and Supplementary Dataset 2) had double basecalls resembling diploid variation (∼50% of each nucleotide) for some individuals. The haplotypes with double basecalls (haplotypes with names including ‘*’) group in three parts (separate for each position) of the phylogeny (Figure 1a); the double basecalls are obviously caused by a duplication of the DNA segment and subsequent substitution in one of the copies at three different positions.

In accordance with dogs originating from wolf (Wayne, 1993; Clutton-Brock, 1995), the wolf haplotypes differed by 0–4 substitutions and the coyote haplotypes by at least 15 substitutions from the closest dog haplotype. Among the 12 wolf samples there were three haplotypes: H23*, which was shared between dog and one Chinese wolf; H26 (found in one American wolf), which was separated from two dog haplotypes by one substitution; and H27 (found in nine Chinese wolves and one Scandinavian wolf), which differed by four substitutions from the closest dog haplotype (Figure 1a).

Five major dog haplogroups but at least 13 male founders

The dog haplotypes clustered in five major groups (dubbed haplogroups HG1, HG3, HG6, HG9 and HG23 after their respective central haplotypes), each consisting of one or two frequently occurring central haplotypes surrounded by less frequent haplotypes (Figure 1a). This pattern is suggestive of an origin of dogs from five wolf founders, carrying five haplotypes from which all other haplotypes subsequently derived through substitutions within the dog population. However, calculations based on the number of substitutions expected to have occurred among the 151 dog lineages since the time of the dog origins indicate a larger number of founders from wolf. We estimated the substitution rate from the mean number of substitutions between dog/wolf and coyote (18.1 substitutions (17.5–18.6, 95% confidence limits) or 1.25 × 10⁻³ substitutions per site (1.22 × 10⁻³–1.29 × 10⁻³)), and the time since the split between the lineages leading to wolf and coyote. There is no exact archaeological calibration point for this separation; 1.8–2.5 million years ago (Nowak, 2003) or around 3.2 million years ago (Tedford et al., 2009) has been suggested, but it may have occurred 1.5–4.5 million years ago (Nowak, 2003; Tedford et al., 2009). The substitution rate was thus calculated as a broad range of 1.39 × 10⁻¹⁰–4.18 × 10⁻¹⁰ substitutions per site per year (1.35 × 10⁻¹⁰–4.31 × 10⁻¹⁰, 95% confidence limits), or one substitution per 165 746–497 238 years (160 782–513 078 years) across the sequenced region. This is less than half the rate estimated for the human Y-chromosome (Xue et al., 2009). The difference is possibly related to the method used for identifying the dog Y-chromosome sequences (see Materials and methods), which involved selection for similarity to human Y-chromosome sequences and may thereby have enriched for conserved regions of the dog Y-chromosome.
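The arithmetic behind the substitution rate can be sketched as follows. This is a minimal illustration, not the authors' own script: it assumes that the dog/wolf-coyote divergence accumulated along two lineages, so the split time is doubled, an assumption that reproduces the quoted central estimates. The function and variable names are illustrative only.

```python
# Values quoted in the text.
SEQ_LENGTH_BP = 14_437            # sequenced Y-chromosome length in bp
DOG_WOLF_VS_COYOTE_SUBS = 18.1    # mean substitutions between dog/wolf and coyote
SPLIT_YEARS = (1.5e6, 4.5e6)      # youngest and oldest wolf-coyote split considered


def rate_per_site_per_year(n_subs, seq_length_bp, split_years):
    """Substitutions per site per year along a single lineage.

    Assumption: the observed divergence built up along two lineages,
    so the elapsed time is 2 * split_years.
    """
    return n_subs / seq_length_bp / (2 * split_years)


def years_per_substitution(n_subs, split_years):
    """Years needed for one substitution anywhere in the sequenced region."""
    return 2 * split_years / n_subs


youngest, oldest = SPLIT_YEARS
rate_low = rate_per_site_per_year(DOG_WOLF_VS_COYOTE_SUBS, SEQ_LENGTH_BP, oldest)
rate_high = rate_per_site_per_year(DOG_WOLF_VS_COYOTE_SUBS, SEQ_LENGTH_BP, youngest)
years_min = years_per_substitution(DOG_WOLF_VS_COYOTE_SUBS, youngest)
years_max = years_per_substitution(DOG_WOLF_VS_COYOTE_SUBS, oldest)

print(f"{rate_low:.2e} to {rate_high:.2e} substitutions per site per year")
print(f"one substitution per {years_min:,.0f} to {years_max:,.0f} years")
```

An older split gives a slower rate, which is why the oldest calibration date produces the lower bound of the rate and the upper bound of the waiting time per substitution.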

Assuming that dogs originated 11 500–16 000 years ago, according to the archaeological record (Dayan, 1994; Chaix, 2000; Raisor, 2005; Wang and Tedford, 2008; Napierala and Uerpmann, 2010), mtDNA data (Pang et al., 2009) and autosomal SNP data (Skoglund et al., 2010), on average 0.023–0.097 substitutions (0.022–0.100, 95% confidence limits) would have occurred per dog lineage since the time of origin from wolf. Further, assuming conservatively that all 151 dogs represent independent lineages leading back to the dog origins, 3.5–14.6 substitutions (3.4–15.0, 95% confidence limits) would have occurred among the 151 dog lineages since domestication. Given a total of 28 haplotypes among the domestic dogs, this implies that 13.4–24.5 (13.0–24.6) of the 28 haplotypes, rounded down to 13–24, are intact from the wolf founders. Thus, our data indicate that the Y-chromosome gene pool of the relatively limited number of dog samples in this study originates from at least 13–24 different wolf Y-chromosome haplotypes. The formation of the dog haplotypes in five star-like clusters must therefore partly stem from the relations between haplotypes in the founder wolf population(s). Notably, an origin of dogs from numerous male wolves is in line with both mtDNA data, indicating that dogs originated from a minimum of 51 female wolf lineages (Pang et al., 2009), and MHC data (from the low-diversity European dog population), indicating an origin from at least 21 wolves (Vilà et al., 2005). Therefore, multiple genetic datasets indicate that dogs originate from a large number of domesticated wolves.
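The founder estimate follows from a short chain of multiplications and one subtraction, sketched below using only the central values quoted in the text (the confidence limits are left out). The names are illustrative, and the calculation assumes, as the text does, that every substitution since domestication created one new haplotype.

```python
from math import floor

# Central values quoted in the text.
N_DOG_LINEAGES = 151
N_DOG_HAPLOTYPES = 28
ORIGIN_YEARS = (11_500, 16_000)              # assumed time since dog origins
YEARS_PER_SUBSTITUTION = (165_746, 497_238)  # fastest and slowest rate estimates


def expected_substitutions(n_lineages, years, years_per_substitution):
    """Expected substitutions summed over independent lineages."""
    return n_lineages * years / years_per_substitution


# Fewest substitutions: youngest origin combined with the slowest rate.
subs_low = expected_substitutions(
    N_DOG_LINEAGES, ORIGIN_YEARS[0], YEARS_PER_SUBSTITUTION[1])
# Most substitutions: oldest origin combined with the fastest rate.
subs_high = expected_substitutions(
    N_DOG_LINEAGES, ORIGIN_YEARS[1], YEARS_PER_SUBSTITUTION[0])

# Haplotypes not explained by new substitutions are taken as founder haplotypes.
founders_min = floor(N_DOG_HAPLOTYPES - subs_high)
founders_max = floor(N_DOG_HAPLOTYPES - subs_low)

print(f"expected substitutions: {subs_low:.1f} to {subs_high:.1f}")
print(f"founder haplotypes: {founders_min} to {founders_max}")
```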

Two of four principal haplogroups are shared universally but two are almost exclusive to East Asia

The dog Y-chromosome gene pool was to a large degree shared among populations across the world (Figures 1b and c). Two of the five haplogroups (HG1 and HG23) were virtually universally represented and carried by 62% of all dogs in the study. The three central haplotypes within these haplogroups, H1, H1* and H23*, were carried by almost half (46%) of the dogs, and by 75%, 44% and 32% of the individuals in Europe, SW Asia and China, respectively.

However, there were also distinct differences in the geographical representation and distribution of haplogroups and haplotypes. The other three haplogroups were also distributed across relatively large distances but were not universally spread. HG3 was found in East Asia (including Siberia) and America, and at lower frequency in SW Asia, Scandinavia and Britain, but not in samples from the European continent and Africa. HG6 was found in East Asia and at low frequency in SW Asia, but was absent elsewhere. Finally, HG9 was found in only four individuals in total, but as far apart as East Siberia (one individual) and Central Africa (three individuals).

As the sample sizes were relatively limited, haplogroups with low frequency, for example HG9, may have remained undetected in some populations. However, the general pattern was that the four main haplogroups were relatively equally represented in the eastern part of the world, whereas west of the Himalayas and the Urals haplogroups HG1 and HG23 were carried by 89% of the individuals, and HG6 and HG3 were rare or absent. Thus, HG1 and HG23 were universally represented, whereas HG3 and HG6 had restricted distributions. Only in East Asia and SW Asia were all four major haplogroups represented.

Highest genetic diversity in Southwestern ASY

Accordingly, except for the practically universal representation of haplotypes H1, H1* and H23*, the representation and frequencies of haplotypes differed considerably among regions, as demonstrated in the phylogenetic trees (Figure 1c). In some regions, for example Europe, frequency was very high for a few haplotypes, mainly H1, H1* and H23*, and other parts of the phylogeny were empty. Other regions, for example ASY, had a larger number of haplotypes at more even frequencies and representation across the phylogeny. These differences in genetic coverage are reflected in differences in genetic diversity, measured as the number of haplotypes per sampled individual and as haplotype diversity (HD) (Table 1). In many cases the samples were too small to yield significant differences, but the general trend was that the highest values for genetic diversity among all regions were found within ASY, with medium values in other parts of East Asia, SW Asia and Africa, and low values in Europe and America.

Comparing the three major regions suggested as potential origins for dogs, ASY had the highest diversity, with 13 haplotypes among 23 samples and an HD of 0.901, compared with SW Asia and Europe, which had 9.58 and 6.50 haplotypes at a resampling size of 23, and an HD of 0.863 and 0.734, respectively (Table 1). Importantly, except for haplogroup HG9, practically the full diversity for dog Y-chromosome DNA was covered in ASY, such that all haplotypes in other regions were at most one step from haplotypes in ASY (Figure 1c). The highest diversity worldwide was found within the Southwestern part of ASY (Southw ASY; Southeast Asia and the adjacent Chinese provinces Yunnan and Guangxi), with 11 haplotypes among 16 individuals, 10.10 haplotypes at resampling size 14, and an HD of 0.950. In contrast, at the other end of Eurasia, Europe had 7 haplotypes among 32 samples and 5.31 haplotypes at resampling size 14, about half the value for Southw ASY. The remarkably low diversity for Europe is related to the high frequency of haplotypes H1 (carried by 47% of the individuals) and H1* (22%), and to the other parts of the phylogeny being largely empty. This pattern was shared across Europe, by the north and south parts of the continent as well as Britain, and must therefore stem from the first origin of the European dog population and not from later intense breeding, as it is unlikely that all haplogroups but HG1 would have been lost independently in several different lineages leading to today's breeds. SW Asia had 10 haplotypes among 25 samples and 7.35 haplotypes at resampling size 14, and had a much higher frequency of haplogroup HG23 (68%) than other regions, whereas only one and two samples carried HG3 and HG6, respectively. Within SW Asia, the Fertile Crescent region (Fertile Cr; West Iran, Israel and East Turkey) had a higher diversity, with a higher HD but a lower number of haplotypes than ASY. Here too, the frequency of HG23 was high (57%), but all four main haplogroups were represented. Among other regions, Siberia had especially high diversity, with marginally lower values than Southw ASY for number of haplotypes and haplotype diversity. Central and N China and Africa had medium diversity values, and the small sample of American dogs had three haplotypes among nine samples.
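The two diversity measures used in this comparison can be sketched as follows, assuming that HD is Nei's unbiased haplotype diversity and that "resampling" means rarefaction, that is, the expected number of haplotypes in a random subsample drawn without replacement. Both are assumptions about the method, and the haplotype counts below are hypothetical: they were chosen only to match 11 haplotypes among 16 individuals, since the per-haplotype counts are not given in this section.

```python
from math import comb


def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq)


def rarefied_haplotypes(counts, subsample_size):
    """Expected number of distinct haplotypes in a random subsample."""
    n = sum(counts)
    if not 1 <= subsample_size <= n:
        raise ValueError("subsample size must be between 1 and the sample size")
    total = comb(n, subsample_size)
    # A haplotype is missed when the whole subsample is drawn from the
    # n - c individuals that do not carry it.
    return sum(1 - comb(n - c, subsample_size) / total for c in counts)


# Hypothetical counts: 11 haplotypes among 16 individuals.
example_counts = [3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1]

hd = haplotype_diversity(example_counts)
expected_at_14 = rarefied_haplotypes(example_counts, 14)
print(f"HD = {hd:.3f}, haplotypes at resampling size 14 = {expected_at_14:.2f}")
```

With these hypothetical counts the sketch yields an HD of 0.950 and 10.10 haplotypes at size 14, the same values reported for Southw ASY, which is consistent with the assumed definitions but does not confirm the actual counts.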

Thus, diversity differences were generally small across the Old World, but Southw ASY had the highest diversity of all regions. The large difference between the opposite sides of the Eurasian continent is striking, and further highlighted by comparing the samples from Europe, having seven haplotypes among 32 samples, and Southeast Asia with six haplotypes (distributed among all four major haplogroups) among only 7 samples (Figure 1c).

With this study, two independently inherited markers have shown genetic diversity among dogs worldwide to be highest within ASY. It is also notable that, as with the mtDNA data, ASY had the most comprehensive coverage of the phylogenetic diversity of all regions. The haplotypes in ASY were distributed across the four major haplogroups such that, except for HG9, all haplotypes across the world were identical to, or differed by a single substitution from, a haplotype found in ASY (Figure 1c), and may potentially have derived from haplotypes present in ASY.

A possible single origin of all haplogroups in ASY, but not in SW Asia or Europe

The haplogroups were geographically distributed in a distinct pattern (Figure 1b). HG1 had a frequency close to 100% in Europe and Africa, and HG23 a high frequency in SW Asia and Central China, but both haplogroups were also represented at lower and relatively even frequency virtually worldwide. In contrast, HG3 and HG6 were almost exclusively restricted to East Asia, at moderate frequency. This pattern may be explained by an origin of all four haplogroups from a single (not necessarily homogeneous) founder population somewhere in East Asia, for example ASY, and genetic bottlenecks reducing diversity in other populations. However, separate origins of the haplogroups in different regions followed by non-symmetrical migrations between populations are also possible.

The high frequency (81%) and large number of haplotypes (four) of HG1 in Europe could possibly be explained by an origin of HG1 in Europe, after which only two of four haplotypes derived from the wolf founders would have spread to other regions. However, because of the high frequency of this haplogroup in Europe, a larger number of derived haplotypes is expected there than in other regions. Among the 26 European lineages carrying HG1, 0.60–2.51 substitutions (0.57–2.60, 95% confidence limits) would be expected to have occurred during the 11 500–16 000 years since the origins of dogs. This indicates that only the universal haplotypes, H1 and H1*, were inherited from wolf and that the others derived from mutations within the European dog population. Because HG1 is virtually universally represented, its geographical origin cannot be definitively identified from this dataset. Similarly, SW Asia had a high frequency (68%) and the largest number of haplotypes (five) of HG23. In this case, 0.39–1.64 (0.37–1.70) substitutions would be expected among the 17 lineages in SW Asia, weakly indicating that HG23 may have originated in SW Asia. It is notable that the Fertile Cr had four haplotypes among eight individuals carrying HG23. However, HG23 was represented almost universally, and ASY also had a high diversity for HG23, with three haplotypes among three samples. For HG3, ASY had six haplotypes among eight lineages. Only 0.18–0.77 (0.18–0.80) substitutions would be expected since the origins of dogs, leaving the majority of haplotypes identical to the haplotypes carried by wolves; the star-like formation of HG3 was evidently inherited from the founder wolf population. The large number of HG3 haplotypes in ASY indicates an origin of this haplogroup in ASY or adjacent regions, but a relatively high diversity (four haplotypes among six individuals) in Siberia is also notable. Finally, HG6, being found almost exclusively in East Asia, most probably originated somewhere in this region.
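The per-haplogroup reasoning compares the observed number of haplotypes in a region with the number of substitutions expected among the lineages carrying that haplogroup. A minimal sketch, using lineage and haplotype counts from the text and the central rate estimates (names are illustrative; results agree with the quoted ranges to within rounding):

```python
ORIGIN_YEARS = (11_500, 16_000)              # assumed time since dog origins
YEARS_PER_SUBSTITUTION = (165_746, 497_238)  # fastest and slowest rate estimates

# (region, haplogroup): (lineages carrying it, haplotypes observed), from the text.
OBSERVED = {
    ("Europe", "HG1"): (26, 4),
    ("SW Asia", "HG23"): (17, 5),
    ("ASY", "HG3"): (8, 6),
}


def expected_range(n_lineages):
    """Lowest and highest expected number of substitutions among the lineages."""
    low = n_lineages * ORIGIN_YEARS[0] / YEARS_PER_SUBSTITUTION[1]
    high = n_lineages * ORIGIN_YEARS[1] / YEARS_PER_SUBSTITUTION[0]
    return low, high


results = {}
for (region, haplogroup), (n_lineages, n_haplotypes) in OBSERVED.items():
    low, high = expected_range(n_lineages)
    results[(region, haplogroup)] = (low, high)
    # Haplotypes beyond the expected new substitutions point to wolf founders.
    print(f"{haplogroup} in {region}: {n_haplotypes} haplotypes observed, "
          f"{low:.2f} to {high:.2f} substitutions expected")
```

For HG3 in ASY, fewer than one substitution is expected against six observed haplotypes, which is the basis for concluding that most of these haplotypes predate domestication.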

Consequently, it is not possible to point out definitively where each haplogroup originated. However, it can be concluded with greater certainty where the haplogroups did not originate. Thus, it seems very unlikely that haplogroups HG3, HG6 and HG23 would have originated in Europe or Africa, or haplogroups HG3 and HG6 in SW Asia. Therefore, three of the four major haplogroups in the dog Y-chromosome gene pool clearly originate from outside Europe, as only HG1 may have originated there. Importantly, the extremely low diversity in Europe cannot be linked to the intense breeding of European dogs in historic times (see Discussion). It also seems clear that a maximum of roughly 50% of the gene pool (HG23 and HG1) may have originated in SW Asia. In contrast, the full dog Y-chromosome gene pool may have originated somewhere in East Asia, including ASY. ASY is especially likely considering that, uniquely, all haplotypes of the four major haplogroups differed by at most one substitution from haplotypes in ASY.

To conclude, the Y-chromosomal DNA data indicate that if the domestic dog originated from a single geographical region, this could have happened in ASY but not in SW Asia or Europe. If the dog originated from several regions, at most 50% of the gene pool may have originated in SW Asia or Europe. Thus, the Y-chromosome data indicate that wolves in ASY were the major source of genetic diversity for dogs.