Abstract Consumer genomics enables genetic discovery on an unprecedented scale by linking very large databases of personal genomic data with phenotype information voluntarily submitted via web-based surveys. These databases are having a transformative effect on human genomics research, yielding insights on increasingly complex traits, behaviors, and disease by including many thousands of individuals in genome-wide association studies (GWAS). The promise of consumer genomic data is not limited to human research, however. Genomic tools for dogs are readily available, with hundreds of causal Mendelian variants already characterized, because selection and breeding have led to dramatic phenotypic diversity underlain by a simple genetic structure. Here, we report the results of the first consumer genomics study ever conducted in a non-human model: a GWAS of blue eyes based on more than 3,000 customer dogs with validation panels including nearly 3,000 more, the largest canine GWAS to date. We discovered a novel association with blue eyes on chromosome 18 (P = 1.3x10-68) and used both sequence coverage and microarray probe intensity data to identify the putative causal variant: a 98.6-kb duplication directly upstream of the Homeobox gene ALX4, which plays an important role in mammalian eye development. This duplication is largely restricted to Siberian Huskies, is strongly associated with the blue-eyed phenotype (chi-square P = 5.2x10-290), and is highly, but not completely, penetrant. These results underscore the power of consumer-data-driven discovery in non-human species, especially dogs, where there is intense owner interest in the personal genomic information of their pets, a high level of engagement with web-based surveys, and an underlying genetic architecture ideal for mapping studies.

Author summary The genetic underpinnings of many phenotypic traits in domestic dogs remain undiscovered. Although two genetic loci are known to underlie blue eye color in dogs, these do not explain all cases of blue eyes. By examining > 3,000 dogs from the Embark Veterinary, Inc. customer database, representing the first genome-wide association study (GWAS) driven by consumer genomics in dogs and the largest dog GWAS cohort to-date, we have shown that a region of canine chromosome 18 carrying a tandem duplication near the ALX4 gene is strongly associated with blue eye color variation, primarily in Siberian Huskies. We also provide evidence that this duplication is associated with blue eye color in non-merle Australian Shepherds. While beyond the scope of this work, future studies of the functional mechanism underlying this association may lead to discovery of a novel pathway by which blue-eyes develop in mammals. These results highlight the power and promise of consumer-data-driven discovery in non-human species.

Citation: Deane-Coe PE, Chu ET, Slavney A, Boyko AR, Sams AJ (2018) Direct-to-consumer DNA testing of 6,000 dogs reveals 98.6-kb duplication associated with blue eyes and heterochromia in Siberian Huskies. PLoS Genet 14(10): e1007648. https://doi.org/10.1371/journal.pgen.1007648 Editor: Gregory S. Barsh, Stanford University School of Medicine, UNITED STATES Received: February 12, 2018; Accepted: August 20, 2018; Published: October 4, 2018 Copyright: © 2018 Deane-Coe et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: All data necessary to replicate our results are freely available on Dryad [https://doi.org/10.5061/dryad.h305b15 <https://doi.org/10.5061/dryad.h305b15>]. Funding: This study was funded by Embark Veterinary, Inc. and the participants that provided DNA and phenotypic information via Embark’s web-based platform. ARB, co-founder and part owner of Embark Veterinary, Inc. participated in study design, data collection and analysis, decision to publish, and preparation of the manuscript. Competing interests: PED, ETC, AS, ARB and AJS are employees of Embark Veterinary, a canine DNA testing company which will offer commercial testing for the variant described in this study. ARB is co-founder and part owner of Embark.

Introduction Humans have been exerting multifarious selection on dogs since their domestication from wolves, including strong natural selection during adaptation to a domesticated lifestyle followed by intense artificial selection during modern breed formation [1–3]. One unintended consequence of this selection is that the canine genome now encodes dramatic phenotypic diversity highly amenable for genetic mapping, with moderate genome-wide divergence between breeds except near loci under selection [4–6] and long tracts of linkage disequilibrium that can be effectively scanned with microarrays [7]. Genetic discoveries in dogs benefit breeding efforts and animal welfare, and they are valuable for translational studies in humans because dogs and humans exhibit many analogous physical traits, behaviors, and diseases in a shared environment [5, 8]. In humans, blue eyes first arose in Europeans [9] and may have been favored by sexual selection due to an aesthetic preference for rare phenotypic variants [10], as an informative recessive marker of paternity [11], and/or as a by-product of selection for skin de-pigmentation to increase UVB absorption [12]. Whatever the cause, this selection has acted on the regulatory machinery of OCA2 (Oculocutaneous Albinism II Melanosomal Transmembrane Protein), which controls transport of the melanin precursor tyrosine within the iris [13, 14]. Because blue eyes result from reduced melanin synthesis, other mutations affecting melanocyte and melanosome function in the retinal pigment epithelium (RPE) can also recapitulate the phenotype [15]. In dogs, blue eyes are iconic of the Siberian Husky, a breed of northern latitudes. Prized among breeders, it is not known whether blue eyes confer adaptive benefits for high latitude dogs as has been hypothesized for humans, and the genetic basis has not yet been discovered. According to breeders, blue eyes in Siberian Huskies are a common and dominant trait, including solid blue and complete heterochromatism (one blue and one brown eye), whereas blue eyes appear to be a rare and recessive trait in breeds like the Border Collie, Pembroke Welsh Corgi, and Old English Sheepdog. The only genetic factors known to produce blue eyes are two cases associated with coat coloration: “Merle” and “piebald” dogs have patchy coat colors due to mutations in Premelanosome Protein (PMEL17) and Melanogenesis Associated Transcription Factor (MITF) that can lead to one or two blue eyes, or slices of sectoral heterochromia, when de-pigmented regions extend across the face [16, 17]. PMEL is regulated by MITF, the master regulator of melanocyte development [18]. Rarely, non-merle Australian Shepherds have unexplained cases of solid blue eyes or complete heterochromia, as in huskies, and the genetic basis of this trait is similarly unknown [19]. We employed a novel genomic resource—a panel of 6,070 dogs genetically tested on a high-density 214,661-marker platform, with owners that had contributed phenotype data via web-based surveys and photo uploads—to examine the genetics of blue eyes in a diverse panel of purebred and mixed-breed dogs.

Discussion In this study, we discovered a haplotype containing a 98.6-kb duplication that is strongly predictive of blue eyes and heterochromia in dogs. While we cannot definitively rule out a different typed or untyped variant on this haplotype causing the trait, we feel that the duplication is a plausible causal candidate worthy of further functional investigation. We were able to validate the presence of this duplication with three independent methods: log R intensity from our microarray data, PCR and Sanger sequencing, and read-depth analysis of 17 whole-genome sequences. Further, we showed strong concordance between log R intensity and a consistent haplotype identified from blue-eyed cases in phased haplotype data. Using those phased haplotypes, we found that the duplication-carrying haplotype was more strongly associated with the blue-eyed phenotype than any single marker on our genotyping array (chi-square duplication P = 5.2x10-290; GWAS SNP P = 4.9x10-120; S4 Table). Further investigation of variant calls in resequencing data, a comparison of sequences between carriers and non-carriers of the duplication, interestingly revealed no convincing functional targets (S2 Table). While direct functional validation of the duplication is outside of the scope of this research, we suggest that the proximity of this duplication to ALX4 makes it a prime candidate for functional investigation. To date, the most familiar examples of duplications affecting phenotype are those related to dosage, cases where one or more duplication events increased gene copy number and, therefore, the amount of translated protein product available for cellular function [22, 23]. However, this duplication sits in an intergenic region between the tetraspanin CD82 and Homeobox gene ALX4 (NCBI; UCSC Genome Browser [24]). Two non-coding RNAs (ncRNAs) are annotated on the complementary strand, including an uncharacterized long noncoding RNA (lncRNA) that overlaps the 3’ breakpoint of the duplication (Fig 1; S4 and S6 Figs). We could find no evidence that CD82 is functionally associated with eye color in humans or any other animal, but ALX4 and its paralogs play an important role in both mammalian eye development [25, 26] and pigmentation [27]. Research on the genetics of striping patterns in African striped mouse (Rhabdomys pumilio) and Eastern chipmunk (Tamias striatus) demonstrated that a close paralog of canine ALX4, ALX3, is a repressor of MITF with dorsally striped expression, leading to reduced melanin content and lighter coat color where ALX3 is upregulated [27]. Gene expression studies in humans have additionally demonstrated that ALX4 itself is expressed in the RPE [28], and in zebrafish (Danio rerio), expression of ALX4 orthologs, alx4a and alx4b, are enriched in iridophores, which originate in common with melanocytes from the neural crest [29]. Given the importance of cis-regulatory elements in local gene regulation [30, 31] and the location of the duplication upstream of ALX4, we propose that this large duplication in a candidate regulatory region could cause blue eyes by increasing expression of ALX4 in the RPE, leading to repression of MITF and a reduction in melanin in the iris. The high proportion of blue-eyed heterozygotes in our analyses (53% of blue-eyed dogs) suggests that the duplication, if causal, is dominant in its phenotypic effect. However, the existence of 46 brown-eyed heterozygotes with similarly elevated Δ log R (P = 0.35 comparing blue and brown heterozygote distributions) suggests that one or more additional genetic factors may modify or mask the duplication’s effect on eye color (Fig 3; S8 Fig). This effect is not completely explained by an individual’s genotype at four previously characterized pigmentation genes (A, E, B, and K loci; S6 Table), although carriers of the duplication that also carry at least one copy of the dominant melanistic mask (Em) allele are significantly more likely to have brown eyes than duplication carriers without melanistic mask (P = 0.0018). Functional follow-up studies are needed to explicitly assay regulatory changes in ALX4 caused by this duplication; however, we have shown that this mutation is highly (but not completely) penetrant and most common in Siberian Huskies. The presence of the duplication explains at least 75% of blue-eyed cases in our discovery panel of customer dogs (81 / 108 blue-eyed dogs carried the associated haplotype and have elevated Δ log R values consistent with duplicated markers). In summary, by using consumer genomic data to drive this research, we were able to build the largest canine GWAS dataset to date, determine the prevalence of a putatively causal variant, a duplication upstream of ALX4 highly associated with blue eyes, across a diverse population, and utilize our relationship with owners of specific dogs to learn more about the inheritance of the trait. As more canine genetic testing is done on high-density array platforms, these databases hold particular promise for unlocking the genetic basis of complex phenotypes for which dogs are a particularly useful model, including cancer, behavior, and aging.

Methods Discovery samples We solicited phenotype data from customers whose dogs have been genetically tested by Embark Veterinary, and who have agreed to participate in research, by implementing an online survey about their dog’s morphological traits at http://embarkvet.com and encouraging participation via email. We initiated the survey on February 7, 2017, and, as of November 23, 2017, owners of 3,248 adolescent and adult dogs whose eye color can be assumed to be developmentally complete (6 months or older) had submitted a response to the section of that survey that asks about eye color (S11 Fig). Most were owners of mixed-breed dogs, and 21% were owners of purebred dogs (N = 668 / 3,180). A subset of owners (N = 68) selected "other", indicating that their dog had an eye color not represented by any of the seven options. In total, 156 dogs in this dataset were reported to have either solid blue eyes (N = 73) or heterochromic eyes (partially blue; N = 83), compared to 3,024 with some shade of solid brown. We encoded this trait as a binary phenotype in case-control format (0: brown, 1: blue) and considered both solid blue and heterochromic dogs as cases. Ancestry from 185 different dog breeds, landraces (village dogs) and gray wolves was represented in this dataset. Genotyping & quality control Customer dogs were genotyped on Embark’s custom high-density 214,661-marker platform (213,245 filtered to autosomal, chromosome X, and pseudoautosomal region markers). Total genotyping rate was 99.5%, and all dogs (N = 3,180) had less than 2.5% missing data and passed standard filtering in PLINK [32]. After filtering, 90% of variant sites (192,108 / 213,245) were genotyped across at least 95% of individuals and were included in subsequent analyses. Genome-wide association We constructed a relatedness matrix from centered genotypes and ran a genome-wide association test based on a univariate linear mixed-model in GEMMA [33], using the eigenvalues and eigenvectors of the relatedness matrix to control for confounding effects of shared ancestry, particularly among dogs of the same breed or breed group (groups of closely related or recently derived breeds). We identified significant associations by applying a threshold of P < 5.0 x 10−8 to the Wald test statistic. Whole genome sequence analysis We downloaded whole genome sequence data for 17 dogs from the NCBI Sequence Read Archive [34], conducted an end-to-end alignment using Bowtie2 in—very-sensitive alignment mode [35] calculated read depth coverage across sites using SAMtools [36], and investigated mapped paired-end reads in regions of interest using the Integrative Genomics Viewer (IGV) [37, 38]. For each dog, we calculated the change in read depth between the putative duplicated region, demarcated by discordantly mapping paired-end reads (44791417 to 44890166-bp), and 5Mb of flanking sequence immediately surrounding it (chr18:42999825-44791417 and chr18:44890166-48000173). We called variants across a 1-Mb region surrounding the most associated GWAS SNP using HaplotypeCaller from the Genome Analysis Tool Kit (GATK) [39] for 9 SRA samples (ERR911199, ERR911200, ERR1014362, ERR1990016, SRR1122359, SRR1124049, SRR1124304, SRR1784129, SRR2095539) with robust read depth and compatible alignment formatting (including four huskies with the duplication, and one husky and four other breeds without the duplication). Following batch variant-calling for SNPs and indels independently, we refined the call set by filtering variants that did not meet QC requirements for minimum read depth (SNPs and indels < 2), the phred-scaled p-value from a Fisher’s exact test for strand bias (SNPs > 60, indels > 200), the variant position relative to the end of the read (Mann-Whitney rank sum test, SNPs < -8, indels < -20) and, for SNPs, mapping quality (root mean square < 40, Mann-Whitney rank sum test < -12.5). We then identified 47 SNPs or indels present in the dogs with the duplication, and absent from those without the duplication (S2 Table). Since eye color phenotypes were not available for datasets archived to SRA, this approach was able to identify additional variants that co-segregate with the duplication (and are therefore at least as explanatory for the phenotype), but it was not possible to scan for variants that perform better (i.e. those that might additionally explain brown-eyed duplication carriers). PCR amplification of the duplication We designed primers to amplify the midpoint span of the duplication, as well as the 5’ and 3’ flanking regions of the duplicated region as positive controls. Genomic DNA (gDNA) remaining from microarray analysis (2 uL) was used for PCR reactions using the following primer combinations: ALX4_5Fl_1F + ALX4_5Fl_1R; ALX4_Dup_2F + ALX4_Dup_2R; ALX4_3Fl_1R + ALX4_3Fl_1R. All PCR reactions were performed using Q5 High Fidelity Taq Polymerase (NEB Cat No M0491) in a total volume of 20 uL following the manufacturer’s protocol. The following cycling parameters were used: 98°C 30s, 40X (98°C 10s, 60°C 30s, 72°C 30s), 72°C 5m, 16°C hold. PCR product was visualized on a 1% agarose gel with 1X GelRed (Biotium Cat No 41003); product from 9 dogs were submitted for purification and Sanger sequencing at Genewiz (Genewiz.com). Haplotype analysis Genotype data for all dogs was phased against a proprietary reference panel, with missing data imputed using Eagle2 [40]. We examined phased data around the putative duplication breakpoints in 26 blue-eyed dogs in the GWAS data set that were homozygous for the CFA18 A allele. This revealed a single haplotype bearing this allele that spanned 81 markers between positions 44,336,453 and 45,170,144 that was present in all 26 dogs in at least one copy (S4 Fig). We further defined a 43 marker core haplotype between positions 44,737,897 and 45,170,144 that was present in two copies in 24 of the 26 dogs, and in one copy in the remaining two dogs. We compared the change in intensity between duplication markers and flanking regions (Δ log R) according to both the eye color phenotype and haplotype state of each dog. For all remaining dogs in the discovery panel and all dogs in the validation panel, we calculated the number of copies of the core haplotype present in their phased data (excluding the markers inside the duplication: 44,825,760, 44,838,433, 44,849,276, 44,855,038, 44,858,831, 44,876,627). Validation samples The validation panel included 2,890 dogs of various breeds, none of whom carried the most associated CFA10 marker for merle. Eye color could be scored for these individuals when customers had uploaded high resolution profile photos. Photos were available for 70% of dogs bearing both the associated haplotype and a log R signature of the duplication on that haplotype. Ethics statement Owners of participating dogs were part of the Embark Veterinary, Inc. customer base. Owners provided informed consent to use their dogs’ data in scientific research. Owners provided photographs of their dogs and filled out online survey questions concerning their dog’s eye color; no invasive methods for genotype or phenotype collection were used, nor were dogs ever handled by researchers. Owners were given the opportunity to opt out of the study at any time during data collection. The discovery cohort was selected from data available before August 2017; the fine mapping cohort was selected from data available before Dec 2017. All published data have been deidentified of all Personal Information as detailed in Embark’s privacy policy (embarkvet.com/privacy-policy/).

Acknowledgments We thank customers of Embark who participated in online surveys on behalf of their dogs, as well as all the Embark employees who made this work possible, particularly Ryan Boyko, Matt Barton, Adam Gardner, Tiffany Ho and David Riccardi. We also thank Jackie Bubnell, the Embark scientific advisory board, Cornell University, and the Kevin M. McGovern Family Center for Venture Development in the Life Sciences, for their guidance and encouragement.