It has previously been shown that LD extends much further within breeds than it does among breeds or within wolves [17] , [18] . Our analysis reveals that between-breed LD is significantly greater than wolf LD, consistent with a bottleneck in dogs during domestication. LD within the single village dog population decayed at a similar rate as LD between dog breeds, also consistent with a shared domestication bottleneck shaping LD patterns in both breed and village dogs. Perhaps surprisingly, village dogs exhibited fewer long ROHs than wolves, indicating that, at least in historical times, village dogs have likely maintained a higher effective population size or better inbreeding avoidance than their gray wolf progenitors. Similarly, haplotype diversity was also marginally higher in village dogs than in gray wolves across 500 kb windows ( Figure 1C ). Taken together, these observations suggest a radical reshaping of the dog genome on multiple time scales with the recent process of breed formation playing a particularly important role in transforming ancestral genetic variation into differences among breeds that show a high degree of genetic and phenotypic uniformity.

(A) LD decay curves based on mean r 2 , including mean LD decay when dogs are selected from 10 different breeds (“between breed” LD). (B) Distribution of long runs of homozygosity in each group. (C) Number of haplotypes across all autosomal 15-SNP windows and number of autozygous individuals per breed at each genomic position computed using 10 individuals per breed. Each window can contain 1–20 different haplotypes and each genomic position can have 0–10 individuals appearing autozygous.

We find that while LD extends over 1 Mb within every breed surveyed, across all dogs combined it decays extremely rapidly, consistent with previous studies [4] , [15] . This suggests few IBD segments are shared across multiple breeds and those that are shared are quite small ( Figure 1A ). Homozygous runs are also longer and more numerous in breed dogs than village dogs or wolves ( Figure 1B ), with individuals from nearly every breed exhibiting 10–50 ROHs greater than 10 Mb. Interestingly, Jack Russel terriers are an extreme outlier, with fewer such ROHs and, overall, higher levels of diversity than other dog breeds. Autozygosity levels were high in all breeds (lowest mean autozygosity = 7.5% in Jack Russel terriers, highest mean autozygosity = 51% in boxers); however, very few breeds exhibited genomic regions that were autozygous in all individuals at the megabase scale. In contrast to human populations, where patterns of autozygosity in populations are not generally correlated with haplotype diversity [16] , pure bred dogs show a very strong negative correlation between autozygosity and haplotype diversity ( Figure 1C ). One notable exception is basenjis, which show high haplotype diversity and high autozygosity, suggesting a recent population bottleneck post-breed formation has induced higher levels of inbreeding than expected. This is consistent with the breed history of basenjis in the United States, which are believed to descend from a small founder population.

To investigate how human-directed breeding has altered the landscape of the dog genome, we quantified pairwise SNP linkage disequilibrium (LD), haplotype diversity across 15-SNP windows (as inferred by fastPhase [14] ) and runs of homozygosity (ROHs) greater than 1 Mb for each individual (indicative of autozygosity) using the genotype data from the 59 breeds with ≥10 individuals and a population of village dogs and wolves (see Methods ). Long ROHs are a product of recent inbreeding, indicative of contemporary population size and mating system, whereas haplotype diversity and LD across shorter genomic scales (<1 Mb) are informative of more ancient population processes.

Given our finding of little sharing of IBD segments among individuals from different breeds, we expect that when coincident sharing occurs across breeds with a similar phenotypic trait, these genomic segments likely underlie heritable variation for that trait. We searched for the strongest signals of allelic sharing by scanning for extreme values of Wright's population differentiation statistic F ST [19] , [20] across the breeds. The 11 most extreme F ST regions of the dog genome contained SNPs with F ST ≥0.57 and minor allele frequency (MAF)≥0.15 ( Table 1 ). Six of these regions are strongly linked to genetic variants known to affect canine morphology: the 167 bp insertion in RSPO2 associated with the fur growth and texture [8] , an IGF1 haplotype associated with reduced body size [2] , an inserted retrogene (fgf4) associated with short-leggedness [5] , and three genes known to affect coat color in dogs (ASIP, MC1R, and MITF [4] , [21] , [22] ). Two other high F ST regions correspond to CFA10.11465975 and CFA1.97045173, which were associated with body weight and snout proportions, respectively, in previous association studies [23] , [24] . Two known coat phenotypes (fur length and fur curl [8] ) also exhibited extreme F ST values. Only a limited number of high F ST regions were not associated with a known morphological trait ( Figure 2 , black labels). Here, we focus on illuminating the potential targets of selection for these regions as well as identifying genomic regions that associate with skeletal and skull morphology differences among breeds.

Genome-Wide Association Mapping of Morphological Differences among Dog Breeds

We investigated the genetic architecture of morphological variation in dogs using a breed-mapping approach to look for correlations between allele frequency and average phenotypic values across 80 breeds at 60,968 SNPs (see Methods). We computed male breed-average phenotypes for each of 20 different tape measurements, and also computed breed averages from museum specimens for 15 long bone and 20 skull/tooth dimensions (Figure S1). For all 55 measures, we conducted association scans with and without controlling for overall breed body size, and also controlled for breed relatedness by using breed-average relatedness as a random effect in the linear mixed model. We also looked for genomic regions underlying body size variation and ear floppiness.

Body size variation is greater across dog breeds than in any other terrestrial species [25], with smaller stature likely being selected for during domestication and with large and small body sizes being alternatively selected for in different breeds. Our scan for body size (defined as log(body weight)) yielded several significant genomic associations, with the six strongest signals occurring at CFA15.44226659, CFAX.106866624, CFA10.11440860, CFAX.86813164, CFA4.42351982, and CFA7.46842856. The corresponding P-P plot compares the observed distribution of −log 10 p values (i.e., blue and red points in Figure 3A) to the expected distribution under a model of no-association (i.e., dashed line which represents equality of Expected and Observed) and demonstrates an excess of significant signals since the tail of the distribution is well above the diagonal dashed line. When the top six regions (and linked SNPs) are removed, the observed p value distribution (i.e., gray points in Figure 3A) is strongly shifted towards the null expectation, suggesting these six QTLs account for the bulk of the association signal in our data. The first four signals are among the highest F ST regions in the dog genome (Table 1) with the CFA4 signal also exhibiting an elevated F ST (0.46), consistent with diversifying selection among breeds for body size. The signal on CFA15 corresponds to the location of IGF1 which encodes a growth factor previously described to control a significant proportion of size variation across dog breeds [2]. The CFA10 signal corresponds to the location of HMGA2, a gene known to affect body size variation in humans [26] and mice [27]. Both HMGA2 and a locus corresponding to the CFA7 signal, SMAD2, have been previously associated with dog body size [23]. In contrast, the signals on CFA4 and CFAX hits have not previously been associated with body size variation in dogs. Interestingly, the CFA4 signal contains (among other genes) the STC2 locus, a known growth inhibitor in mice [28]. The two signals on the X chromosome lie in separate LD blocks that each contains dozens of genes. Other than IGF1, all the other regions will require fine-mapping in order to confirm a single candidate gene. In all six regions, wolves are not highly polymorphic (MAF<0.1), and except for the CFA10 signal, the derived allele is at highest frequency in small breeds.

PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 3. Genome-wide association scans across the breeds using allele frequencies of the SNPs and breed-average phenotypes for (A) log(body weight), (B) ear erectness (floppy versus erect ears), and (C) allometric snout length. The p values of the SNPs were computed using the linear mixed model method for (A and C) and weighted permutation method for (B). SNPs passing Bonferroni correction are marked with orange circles; SNPs included in best-fit predictive models are marked with gray dashes. P-P plots for the scans are shown in the right-hand column. (D) Matrix showing phenotype identity (upper diagonal) is uncorrelated with genome-wide IBS (lower diagonal) between dog breeds for body weight and ear type. Genome-wide IBS is plotted as a scaled value where 0 corresponds to the lowest amount of IBS between any two breeds (0.62) and 1 corresponds to the highest amount of IBS (0.83). Boxers are not shown since their IBS values are low in comparison to other breeds due to the SNP ascertainment bias on the array. https://doi.org/10.1371/journal.pbio.1000451.g003

Another key trait that varies substantially among breeds is ear type. All adult wild canids have erect ears, but dog breeds are alternately fixed for various ear positions, including floppy ears. This paedomorphic trait is retained by adults of some breeds in many domesticated mammals, including dogs, cattle, goats, and rabbits. We looked for SNPs associated with breeds fixed for floppy or erect ears, and found a single region on CFA10 that is likely responsible for the difference in ear type (Figure 3B). The derived allele at this locus is nearly fixed in floppy-eared breeds, consistent with the floppy ear position being the derived phenotype (Figure S2). This SNP is also within a region associated with body size in this study (near HMGA2), although the strongest signal for ear position occurs nearly 0.5 Mb upstream, near MSRB3 (Figure S3). Floppy-eared breeds show sharply reduced heterozygosity, suggesting this region, the highest F ST region in any autosomal segment of the dog genome (Table 1), has undergone strong selection for floppy ear position or perhaps some correlated trait.

Snout length is another trait that varies considerably among breeds, and like floppy ears, short snouts are associated with neoteny in many domesticated mammals [29]. Association mapping using breed-average values for absolute snout length highlights similar genetic regions as those suggested for body weight, but introducing log(body weight) as a covariate in the association model allows for an allometric correction and reveals QTLs underlying proportional snout length (Figure 3C). The strongest signals for proportional snout length are CFA1.59832965 and CF5.32359028. Both are within the top 5% of high F ST SNPs (F ST = 0.55 and 0.42, respectively) and are only found at high derived-allele frequency in breeds with short snouts (brachycephalic).

Using forward stepwise regression, we combined potential signals into a multi-SNP predictive model for each trait. In the models of body weight, ear type, and the majority of measured traits, most of the variance across breeds could typically be accounted for with three or fewer loci (Figure 4 and Table S1). Correlated traits (e.g., femur length and humerus length) yielded similar SNP associations. For the 55 traits, the mean proportion of variance explained by the top 1-, 2-, and 3-SNP models was R2 = 0.52, 0.63, and 0.67, respectively (see Table S1). After controlling for body size, mean proportion of variance explained by these models was still appreciable—R2 = 0.21, 0.32, and 0.4, respectively. Notably, the most significant genomic regions were similar even using naïve association scans that did not control for population structure (Figure S4). In terms of breed mapping, the level of population structure common to groups of breeds was insufficient to strongly bias association inferences (see Figure S5 and Methods).

PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 4. Summary of associations across genomic regions for multiple traits. Each row corresponds to a trait (either absolute or proportional), and each column corresponds to a genomic region that has been found associated with at least one trait. The shading of each rectangle shows the R2 statistic of the single marker model for the trait for all significant associations (p<5.0e-5 for absolute external traits, p<1.0e-4 for skeletal and proportional traits after correcting for population structure). When multiple SNPs in the region are significant, the largest value of the R2 statistics is reported. https://doi.org/10.1371/journal.pbio.1000451.g004

For most of the traits investigated, we found that a few QTLs of large effect governed the phenotypic differences among breeds. For example, for proportional height at withers, we observe a large-effect QTL on CFA18 that we have previously shown corresponds to an fgf4 retrogene that accounts for chrondrodysplasia or disproportional dwarfism in breeds such as corgis, basset hounds, and dachshunds [5], although we also find a novel QTL for height on CFA20. Likewise, skull shapes were largely dictated by regions on CFA1, CFA5, CFA26, and CFA32. In addition, the CFAX.105274087–106866624 region that was associated with body size is also associated with skull length, even after accounting for breed-average body weight. Nearly all of these regions were also associated with dental traits, in addition to a strong association on CFA16, suggesting a suite of correlated traits that are principally governed by a few genomic regions.