Animals and behavior phenotyping

The genotyped dog sample consisted of 190 laboratory beagles ranging between 6.2 and 0.8 years, with an average of 1.8 years, at the time of testing. These individuals were selected from a total of 437 dogs previously tested as described below. The behavioral test results rendered four principal components, of which the second was associated with human-directed behavior, showing significant heritability10. We therefore selected dogs with the 95 top and 95 tail scores on the second principal component for the present analysis. The dogs were bred, kept and handled under highly standardized conditions at a research dog kennel in the south of Sweden. Although the population was intentionally outbred, many of the dogs were related on different levels and some degree of inbreeding occurred, which has to be taken into consideration in the analysis. The behavioral study of these dogs has been published previously and more detailed information on handling, housing, ethograms and testing can be found there10.

Briefly, dogs were each tested once using an unsolvable task paradigm consisting of three identical problems of which one was unsolvable. The task was to slide a Plexiglas lid to the side in order to obtain a treat in a container underneath. One of the three lids had been fixed and could not be opened, which constituted the unsolvable part of the problem. In the procedure room a female researcher, previously unknown to the dogs, was facing the test setup sitting passively on a stool approximately 1.5 meters from the problem-solving device, which was placed on the floor, while the dogs were left to move freely in the room for 3 minutes. The behavior tests were video recorded and later analyzed using the Noldus Observer X 10 software. Prior to testing, dogs’ willingness to eat the presented treats had been scored and dogs that did not eat the treats were excluded from the study.

The human-directed social behaviors analysed in the current study are presented in the ethogram (Table 4) and includes physical contact and staying in close proximity to the human. In the previous study, behaviors were summarized through a principal component analysis (PCA) resulting in 4 principal components (PCs) named test interactions, social interactions, eye contact and physical contact based on the behaviors with the highest and lowest scores within each PC10. Statistical analysis showed that sex had a significant effect on the PC related to social interactions where females scored higher than males (Univariate GLM; F 1,432 = 17.60; p < 0.01). Effects of sex and age on each of the behaviors used in this study can be found in ref. 10. Additionally, narrow-sense heritability (h2) was estimated to 0.23 (HPD: 0.06–0.42) for the PC related to social interactions using MCMCglmm in R.

Table 4 Ethogram of the human-directed social behavior phenotypes used. Full size table

In the present study, GWAS was performed as described below on the behaviors listed in Table 4 as well as on the scores on the second principal component (PC2) from Persson, et al.10 related to social interactions and the third principal component (PC3) related to eye contact-seeking behaviors.

DNA sampling, extraction and genotyping

All experimental protocols were approved by the Swedish ethics committee in Lund, Sweden (21 June 2012), and the methods were carried out in accordance with the relevant guidelines. DNA was obtained either from buccal cells or from blood samples. Buccal cell DNA was collected with and extracted from buccal swabs using the standard protocol of the Isohelix kit DDK-50. Samples were kept in Lysis Buffer and proteinase K for 48 hours prior to continuing with the protocol. Single 50 μl elutions were used. DNA was extracted from whole blood samples using the standard protocol of the QIAGEN QIAamp DNA mini kit (DNA Purification from Blood or Body Fluids, Spin Protocol). Single 100 μl elusions were used. Subsequently, DNA was quantified using a Nanodrop ND-1000 and stored at −20 °C until further use. DNA samples from 95 dogs with the top PC2 scores and 95 dogs with the lowest PC2 scores (in total 190 samples filling two plates) were genotyped with the Illumina’s Infinium assay and the CanineHD BeadChip at the SNP&SEQ Technology Platform, Uppsala University41,42.

Quality control

Initial quality control was performed at the SNP&SEQ Technology Platform in Uppsala, using the software GenomeStudio 2011.1 from Illumina Inc. This was done in order to confirm both sample and marker quality43. All individuals had call rates >98%, which was above the 90% threshold, hence no samples were removed.

Further quality control was performed on 172 115 SNP markers using the PLINK 1.07 analysis software (http://pngu.mgh.harvard.edu/~purcell/plink/)44. Markers failing the criteria of Hardy-Weinberg expectations p > 0.05, genotyping rate > 90% and minor allele frequency (MAF) of >5% were removed finally leaving 85 172 markers for the association analysis.

Genome-wide association study

Raw phenotype data can be found in Supplementary Data S9 and genotype data in Supplementary Data S10 & S11 publicly available at https://osf.io/m5a3v/?view_only=a6e3a4383f894d7cbda34c0750f26fb0. PLINK v1.07 was used to perform the initial naïve genome-wide association analyses (GWAS)44. Genome-wide corrected empirical p-values were determined by 100.000 max (T) permutations. The genomic inflation factor (λ) was ranging from 1.0–1.4 for the different phenotypes suggesting a minor level of population stratification. To adjust for population stratification and relatedness the rest of the GWAS was performed using the GEMMA software (http://www.xzlab.org/software.html). GEMMA implements the Genome-wide Efficient Mixed Model Association algorithm for GWAS45. First, a centered relatedness matrix was estimated from genotypes for each phenotype. The association tests were then performed with the relatedness matrix, phenotype, genotype and the fixed effects (sex and/or age) fit into a univariate linear mixed model as follows:

where y is a vector of phenotypes, W is a covariate matrix with the vector α of the fixed effects including the intercept, x is a vector of marker genotypes and the effect size of the marker is β, u is a vector of random effects and ε is a vector of residual errors. Additionally, GEMMA estimates the proportion of variance in the phenotypes explained (PVE or “chip heritability”). The Wald frequentist test was chosen to test for significance in GEMMA and to determine the genome-wide significance the Bonferroni threshold for p < 0.05 for 85 172 tests was calculated. In order to make sure that the population stratification had been accounted for, GenABEL in R was used on the Wald p-values to make a new estimate of λ using the regression method46. To generate Manhattan and Q-Q plots the qqman R-package was used. Q-Q plots of GEMMA p-value distributions for each of the studied behaviors can be found in Supplementary Appendix S12. The genomic inflation factor (λ) for each behavioral phenotype, before and after accounting for population structure, is presented in Table 1 together with PVE or “chip heritability” estimates. We studied a laboratory beagle population, which had been intentionally outbred, but there was still some population stratification present in the tested individuals that can be seen in the naïve (raw) genomic inflation factor (λ) in Table 1. Conveniently, the GEMMA software is able to account for population structure as well as inbreeding and covariates, which is demonstrated by the λ-value being much closer to 1.0 in the GEMMA analysis. This is further supported by the Q-Q plots of the GEMMA Wald p-values in Supplementary Appendix S12. To visualize the population structure, a multidimensional scaling (MDS) analysis was performed in PLINK and the first and second dimensions were plotted in R with top and tail dogs indicated (Supplementary Figure S13).

Behavioral means for the different genotypes of the significant and suggestive SNPs were analyzed through univariate general linear models in IBM SPSS Statistics 22. Subsequent linkage disequilibrium (LD) analysis and visualization was done using the Haploview v4.2 software47. LD windows were estimated based upon visual examination of the combined D’ and LOD heat maps. SNPs that were considered to be in LD had D’ of 1 and LOD-score between 0.15 and 1. Additionally, the effect of the haplotype consisting of the significant SNP and its closest neighboring SNP, with the highest association, was tested using univariate general linear models.

Validation and genotyping through pyrosequencing

In order to validate the GWAS results, 70 samples from the original population of 500 beagles were genotyped for the BICF2G630798942 and BICF2S23712114 SNP markers using polymerase chain reaction (PCR) and subsequent pyrosequencing. Additionally, another 30 samples previously genotyped with the CanineHD BeadChip were analyzed to validate the genotyping. Briefly, primers were designed using the software PyroMark Assay Design 2.0.1.15 by QUIAGEN©. The primers used for BICF2G630798942 were: forward biotinylated in 5′ CTGCCAGGGACTCCTGAG, reverse CTCAAGGCAGCCCATCACT and sequencing reverse GGAGGCTTGCTGCCG. For the BICF2S23712114 SNP the primers used were: forward biotinylated in 5′ CATGTCACAGTTGAGGGGATAGGT, reverse TCTTCAGACAGCCCACCCA and sequencing reverse CAGTCCAGGAAGGAATA. The PCR-mixture contained, for each sample, 0.12 μl DreamTaq™ DNA Polymerase 5 u/ μl (Thermo Scientific), 2,5 μl of 10X DreamTaq™ Buffer (Thermo Scientific), 0.5 μl dNTP 10 mM (2.5 mM each, BIOLINE), 0.5 μl of each primer diluted to 5 μM (Invitrogen), 19.9 μl of nuclease free water and approximately 100–200 ng of DNA template. The final PCR volume was 25 μl for each sample and the reaction was run on the Palmcycler PCR by Corbett. The PCR cycle consisted of an initial denaturation at 95° for 3 min, 40 cycles of 30 s denaturation at 95°, 30 s annealing at 63° for the BICF2G630798942 primers and 61° for the BICF2S23712114 primers, 30 s extension at 72° and a 10 min final extension at 72°.

Pyrosequencing was performed on the entire PCR product according to the PyroMark Q24 Vacuum Workstation Quick-Start Guide found at www.qiagen.com. The results were analyzed using the PyroMark Q24 2.0.6 software. For the BICF2G630798942 SNP, one sample had to be excluded due to PCR amplification failure resulting in a total of 69 additional individuals. The genotypes of the 70 additional individuals were analyzed together with the previous 190 individuals using Univariate GLM in SPSS Statistics.