UK Biobank

Participants in UK Biobank responded to the question “what is your natural hair colour” with one of six possible answers. We used only self-reported, white British individuals, confirmed by genotype16. In addition, of the individuals who were third-degree relatives (first cousins) or closer, identified by genotyping16, only one of any related group was analysed. This left 343,234 participants with hair colours shown in Table 1. The white non-British individuals and the relatives removed from the primary analysis were subsequently analysed to validate the genetic risk scores that we derive.

Table 1 Number and percentages of hair colours in the UK Biobank cohort, by gender

Generally the frequencies of different hair colours are comparable to other populations with considerable Northern European ancestry, such as the QIMR cohort from Australia15, but as expected the prevalence of red hair is higher and black hair is lower than in Southern European cohorts. Within UK Biobank a higher proportion of females report red or blonde hair than males and a much lower proportion of females report black hair. Whilst there may be some self-reporting bias, we and others have previously shown using colorimetry that females on average have hair that is both more red and lighter3,17.

Genotypes for more than 800,000 SNPs and indels were directly assayed by UK Biobank using custom Affymetrix arrays, and an additional ~40 M variants were imputed using the Haplotype Reference Consortium panel16.

Red hair colour and MC1R

We performed a GWAS comparing individuals with red hair to a combined group of black- and brown-haired individuals. Accounting for genetic structure within the UK Biobank by inclusion of the first 15 genetic principal components adequately controlled the genomic inflation in our analysis (λ GC 1.024). The strongest association with red hair is located around the MC1R gene on human chromosome 16 (Fig. 1a, Supplementary Table 1, Supplementary Figure 1a), which fits with the expectation that this locus is the principal genetic factor determining red hair colour. We find that the strongest signal of association in the region of MC1R (rs34357723; OR 9.59, p < 2.25×10−308) does not originate from any observed amino acid changes, but is an SNP located some 97 kb from the 5′ end of MC1R and remains significant even after adjusting for all coding variants in MC1R. As we know that multiple MC1R alleles affect red hair colour, we performed stepwise conditional association testing and identified 31 additional association signals in this region at genome-wide levels of statistical significance (p ≤ 5×10−8), altering the odds of having red hair compared to brown and black hair (Supplementary Table 1). Only ten association signals can be directly attributed to amino acid changes, nonsense or frameshift mutations within the MC1R coding region. Included in these are two missense variants rs368507952 (R306H) and rs200000734 (R213W) not previously associated with red hair colour.
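The genomic inflation factor quoted above can be illustrated with a minimal sketch. It assumes the usual definition of λ GC as the median observed 1-df chi-square statistic divided by its expected median under the null; the function name and the evenly spaced p-values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(p_values):
    """Return lambda_GC for two-sided p-values in (0, 1]."""
    norm = NormalDist()
    # A two-sided p-value p corresponds to |z| = |Phi^-1(p / 2)|; chi2 = z^2.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Hypothetical, evenly spaced p-values standing in for a well-calibrated scan.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
print(round(lam, 3))
```

A value close to 1, such as the 1.024 reported here, suggests that population structure has been adequately controlled.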

Fig. 1 Manhattan plots of GWAS data. Data plotted for a red hair vs. black plus brown hair, b blonde hair vs. black plus brown hair, c brown hair vs. black hair. Points are truncated at –log10(p) = 50 for clarity

In addition to these 10 coding variants, we find 21 associations beyond the MC1R coding region at distances up to a megabase both 3′ and 5′. These distant associations have been observed in other studies2,12. Although these variants potentially could affect long-range regulatory elements of MC1R, it is likely that they are synthetic associations caused by low linkage disequilibrium (LD) between the associated SNPs and multiple coding variants.

We asked how many cases of red hair can be accounted for by MC1R coding variation. We included rs3212379, located only 120 bp 5′ of the transcription start site of MC1R, as a candidate transcriptional regulatory variant. Including this variant, the two newly associated missense variants described above and 13 previously described coding variants we find that the proportion of red-haired individuals with two MC1R alleles is 92%, whilst only 6.3% carry a single allele. The cases of red hair with only one or no variants (similar to that seen in a study of an Australian cohort6) may be explained by, for instance, (a) rare coding variant alleles not genotyped or imputed in this study, (b) additional extragenic variation affecting MC1R expression, (c) dominant action of specific alleles, (d) variation in other genes in the same or a parallel pathway or (e) misreporting of hair colour.

It is well established that different MC1R coding variants have different penetrance with respect to red hair (termed “R” and “r” for high and low penetrance)6. With this very large cohort we are able to more precisely quantify the degree of penetrance of each allele, whether as homozygotes or in combination with any other allele (Fig. 2, Supplementary Table 2). Similar to others we find that penetrance of missense variants ranges from less than 1% as homozygotes (V60L, V92M) to over 90% (D294H). Given the large odds ratios (OR) we consider the three newly identified variants to be high penetrance alleles. We also calculated the minor allele frequency and the OR for red hair for all analysed variants (Table 2). When analysed without conditioning, the three “r” variants have ORs less than 1, as previously described18. However, when conditioning on multiple high penetrance variants, the OR for V60L is >3 indicating its effect occurs primarily in combination with other alleles.
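The two quantities discussed above, penetrance and OR, can be sketched from simple counts. This is an unadjusted illustration only: the ORs reported in this study come from regression models with covariates and conditioning, whereas the sketch below uses a plain 2×2 table with a Wald confidence interval. All counts and function names are hypothetical.

```python
import math

def penetrance(n_red, n_total):
    """Fraction of carriers of a given genotype who report red hair."""
    return n_red / n_total

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.

    a: carriers with red hair      b: carriers without red hair
    c: non-carriers with red hair  d: non-carriers without red hair
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts, not taken from the study.
pen = penetrance(90, 100)
odds_ratio, lower, upper = odds_ratio_with_ci(90, 10, 500, 9500)
print(pen, odds_ratio, lower, upper)
```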

Fig. 2 Penetrance matrix of MC1R coding variants. Combinations of all coding variants plus the non-coding variant rs3212379, located close to the 5′ end of MC1R. Depth of shading in green indicates the % of the genotype with red hair, also indicated by the rounded numbers. Cells filled in black have no data. Full data are in Supplementary Table 2

Table 2 MC1R variants

Additional red hair colour-associated loci

In addition to the associations around MC1R on chromosome 16, we observe 8 additional associations at genome-wide significance (Supplementary Table 1). Statistical fine-mapping of causal SNPs (PICS)19 in some cases indicated a single likely causal variant, whilst in others one of more than 50 variants could potentially be the causal SNP. We find a previously unreported association at rs276645354, at which the minor allele reduces the probability of red hair. This variant lies less than 2 kb from the transcriptional start site (TSS) of POMC, which encodes α-MSH, the agonist of MC1R. Increased expression of POMC is likely to promote melanogenic signalling and thus dampen the effect of those MC1R variants which have some, albeit reduced, signalling activity. A single variant in an intron of RALY, located 5′ of ASIP, the gene encoding the inverse agonist of MC1R, was associated with red hair. This variant, rs6059655, is also an expression QTL (eQTL) for ASIP expression in skin, with the red-hair-associated allele showing higher mean expression levels20 (www.gtexportal.org) (Fig. 3a). We suggest that variants that increase ASIP expression in the skin or hair follicles lead to greater competition with α-MSH for melanocyte MC1R binding, antagonising melanogenic induction and increasing the pheomelanin in melanocytes.

Fig. 3 Gene expression variation at ASIP and epistatic interactions with MC1R variants. a Gene expression data from GTEx of ASIP in sun-exposed skin, ordered by genotype at rs6059655 and normalised to the homozygous no-risk genotype (GG). Boxplot indicates the median expression of ASIP, and the error bars indicate 95% of the data in 320 individuals, 277 with no-risk allele, 42 with middle risk allele and 2 with the high risk allele. b−d Epistatic interactions between MC1R coding variants and other red-hair-associated loci. b The high penetrant allele D84E shows no trans interactions, c the low penetrant allele V92M shows interactions at ASIP, d the low penetrant allele V60L shows interactions with ASIP, HERC2/OCA2 and PKHD1

We find a variant in HERC2 associated with a decreased probability of red hair. It is well established that variants in HERC2 alter transcription of the neighbouring pigmentation gene OCA2, which is additionally associated with blue eyes and blonde hair colour12,21,22,23,24. Recessive mutations in OCA2 result in albinism. It is possible that varied expression of OCA2 modifies the effect of reduced signalling through variant MC1R, such that red hair colour is not expressed.

An association is also seen in the TSPAN10 gene, also known as oculospanin, which is highly expressed in melanocytes and retinal pigment epithelium. The lead SNP lies in strong LD (r2 = 0.995) with a non-synonymous variant (rs6420484; Y177C) affecting a conserved amino acid. This association signal is also in moderate LD (r2 ~ 0.4, D′ ≥ 0.95) with a previously reported association with increased hue-saturation of eye colour, which corresponds to darker eyes, in a Dutch cohort25. Previous targeted knockdown of the murine Tspan10 mRNA resulted in reduced melanocyte migration in a trans-well migration assay26, indicating this gene may be a good functional candidate as a novel hair colour gene.
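The combination of moderate r2 with high D′ reported above arises when two variants sit on the same haplotypes but differ in allele frequency. A minimal sketch using the standard two-locus definitions shows this; the haplotype and allele frequencies are hypothetical and chosen only to reproduce the pattern, not taken from the study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying both allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical: allele A (20%) only ever occurs on haplotypes carrying allele B (40%).
d, d_prime, r2 = ld_stats(p_ab=0.20, p_a=0.20, p_b=0.40)
print(round(d_prime, 3), round(r2, 3))
```

In this example D′ is 1 because one of the four possible haplotypes is absent, while r2 is only 0.375 because the two alleles differ in frequency.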

Epistasis between alleles at MC1R and other loci

Detecting epistasis in complex traits is challenging. Epistatic effects are believed to be much smaller than main effects, which are typically already very small in the case of polygenic traits. However, due to the large effects of some genetic variants on hair colour this might be a more tractable model to detect epistasis. We tested all associated genetic variants from our analysis of red hair colour against each of the MC1R coding variants, including the known “R” and “r” MC1R red hair colour alleles, by constructing a logistic regression model whilst correcting for relevant covariates (see Methods). At a P value of 3.9×10−7 (i.e. 0.05/128,205, the number of statistical tests performed), we found consistent epistasis signals between MC1R variation and a ±1.5 Mb region surrounding rs6059655, which is the hair colour-associated ASIP eQTL SNP (Fig. 3b, d, Supplementary Figure 2, Supplementary Table 3). We also detect epistasis between both rs1805005 (V60L) and rs1805008 (R160W) and the HERC2/OCA2 region. It has been noted previously that OCA2 variation affects the penetrance of the weaker red hair alleles of MC1R6. Finally, we also find evidence of epistasis between V60L and PKHD1 on chromosome 6. The magnitude of the UK Biobank cohort has allowed the identification of hitherto unknown epistatic interactions contributing to red hair.
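The significance threshold used for the epistasis tests is a Bonferroni correction, which can be reproduced directly from the two numbers given above. The function name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 128_205)
print(f"{threshold:.1e}")  # 3.9e-07
```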

Genome-wide association analysis of blonde hair colour

Whilst red hair is essentially a Mendelian trait modified by additional loci, the genetic architecture of blonde hair colour is concordant with a polygenic trait. We performed a genome-wide association analysis comparing blonde hair to combined brown and black hair-coloured individuals. Following conditional association testing to uncover additional signals of association, we discover 213 lead variants associated with blonde hair colour (Fig. 1b, Supplementary Table 4, Supplementary Figure 1b). In many cases multiple signals of association are found close to the same genes. This could be a result of multiple, independent associations (as is the case for MC1R, for example). Alternatively some or all signals may each be correlated with the same variant that has been neither genotyped nor imputed. Many signals of association are close to, or within, previously known pigmentation genes from both human and model organism studies. These allelic effects span a spectrum of OR and minor allele frequencies consistent with many other phenotypes with an underlying polygenic architecture27,28 (Fig. 4a). Using probabilistic identification of causal SNPs (PICS)19 on these independent variants we are able to find 64 which were likely to be due to a single candidate causal SNP, amongst which are 23 coding variants (7 of which lie within MC1R). Several variants notably stand out, which have been previously associated with a variety of pigmentation related traits in humans (including SLC24A4, HERC2/OCA2, SLC45A2, TYR, TYRP1, EDNRB), some of which have been specifically linked to alterations in transcriptional regulation (IRF4 and KITLG)29,30.

Fig. 4 Odds ratio and minor allele frequency for blonde hair and polygenic phenotype scores. a Plot of minor allele frequency of blonde hair-associated variants vs. log of the odds ratio for blonde hair. Variants are colour-coded for annotation; intergenic (yellow), intronic (purple), 2 kb upstream or 500 bp downstream (cyan), non-synonymous coding (green). Error bars indicate 95% confidence intervals in OR, according to the logistic model calculated with blonde vs. brown and black hair colour (blonde = 39,397, non-blonde = 283,920). b Genetic scores derived from all lead variants from blonde vs. brown plus black hair colour, assuming an additive genetic model. The line in the boxplot indicates the median value and the error bars the 95% of the data (blonde: 39,397 individuals, light brown: 141,414 individuals, dark brown: 127,980 individuals and black: 14,526 individuals). Colours yellow, light brown, dark brown and black match the hair colour analysed

Genes associated with red hair colour, in particular MC1R, are also identified in our blonde hair analysis. Although 93% of individuals with red hair carry two MC1R variants, these make up only 15% of people who carry two MC1R variants. The majority of people with two variants have blonde (15%) or light brown hair (41%). The proportion of individuals with blonde hair decreases with one or no variants whilst the proportion with dark brown and black increases (Supplementary Table 5). We show the incidence of different hair colours on each combination of MC1R variants in Supplementary Figures 3−6.

Given the observed differences in red and blonde hair frequency between males and females, we performed association analyses separately for each gender. Directly comparing the ORs of all significantly associated variants for males and females we find a strong correlation between the sexes (Supplementary Figures 7−8).

We compared our results for both red and blonde hair with those of Hysi et al.15. The variants we identify correspond to 163 distinct genes, of which 93 are also reported by Hysi et al. Conversely, they report 137 significantly associated lead variants, 23 of which we did not analyse because they did not pass our quality control. Of the remaining 114, only 73 show a significant association in our study. However, to better compare the results we looked for significant associations within fixed genomic distances from the 137 of Hysi et al. Within 10 kb of their associated variants we find 93 associations, and within 100 kb we find 100 (Supplementary Table 6). Hence 43% of the genes we identify are novel, and we find 73% of those found by Hysi et al.
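The distance-based comparison can be sketched as a window lookup: for each previously reported variant, ask whether any of our associations lies within a fixed number of base pairs on the same chromosome. The function name and coordinates below are hypothetical and serve only to illustrate the counting.

```python
from bisect import bisect_left

def count_replicated(reported, ours, window):
    """Count reported variants with at least one of our associations within `window` bp.

    reported, ours: lists of (chromosome, position) tuples.
    """
    by_chrom = {}
    for chrom, pos in ours:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = 0
    for chrom, pos in reported:
        positions = by_chrom.get(chrom, [])
        # First of our positions at or beyond the left edge of the window.
        i = bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            hits += 1
    return hits

# Hypothetical coordinates for illustration only.
reported = [("16", 89_900_000), ("20", 32_600_000), ("6", 400_000)]
ours = [("16", 89_905_000), ("20", 32_680_000)]
print(count_replicated(reported, ours, 10_000))   # 1
print(count_replicated(reported, ours, 100_000))  # 2
```

The percentages quoted above follow from the stated counts: 100/137 ≈ 73% of the previously reported variants are recovered within 100 kb, and (163 − 93)/163 ≈ 43% of the genes identified here are novel.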

Genome-wide association analysis of brown hair colour

Based on the large number of associations with blonde hair, we hypothesised that hair colour may lie on a continuous genetic spectrum from black to blonde through brown hair. Thus, we might expect to observe a subset of the blonde-associated variants associated with brown hair. Following both primary and conditional analyses we find 56 lead variants associated with brown vs. black hair (Fig. 1c, Supplementary Figure 1c, Supplementary Table 7), 28 of which are the same associated variant with blonde hair and with the same direction of effect. Of the remaining lead variants 23 identify the same genes seen in our blonde hair analysis, and a further 3 are associated with red hair. Of the two novel genes, KRT31 lies within a large locus encoding multiple keratin genes, in which we also observe associations with blonde hair. Only PIGU does not have a significant association with other hair colours, although we observe an association with another member of the same gene family, PIGV, suggesting that paralogous genes may be associated with hair colour differences. Additional GWAS of light brown and dark brown hair colours, as expected, identify more associations when light brown is compared to dark brown/black than when brown is taken as a single category. Fewer associations are seen with dark brown alone vs. black (Supplementary Figures 9 and 10).

Polygenic phenotype scoring

To test the hypothesis that the genetic basis of hair colour is polygenic and that hair colour falls on a continuum as a genetic trait, we constructed a polygenic score for hair colour. Specifically we constructed a blonde hair colour polygenic phenotype score by taking the variants that reached genome-wide significance in the blonde vs. brown and black hair colour conditional analysis (p ≤ 5×10−8), as a linear combination of the allele-weighted regularised logistic regression coefficients. We found that self-reported black, dark brown, light brown and blonde hair lie on an approximately linear spectrum (Fig. 4b). We confirmed the same pattern across hair colours in two groups of individuals excluded from all previous analyses: related individuals (Supplementary Figure 11a) and individuals with European, but non-British ancestry (Supplementary Figure 11b).
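An additive polygenic score of this kind is a weighted sum of allele counts. The sketch below assumes each variant contributes its effect-allele dosage (0, 1 or 2) multiplied by a log odds ratio weight; the variant identifiers and weights are hypothetical, and the regularisation used to obtain the real coefficients is not reproduced.

```python
def polygenic_score(dosages, weights):
    """Additive score: sum over variants of effect-allele dosage times weight.

    dosages: {variant_id: 0, 1 or 2 copies of the effect allele}
    weights: {variant_id: log odds ratio from the association model}
    Variants missing from `dosages` contribute nothing.
    """
    return sum(w * dosages.get(v, 0) for v, w in weights.items())

# Hypothetical variant identifiers and weights.
weights = {"snpA": 0.5, "snpB": -0.25, "snpC": 1.0}
person = {"snpA": 2, "snpB": 1, "snpC": 0}
score = polygenic_score(person, weights)
print(score)  # 0.75
```

Scores computed this way for each individual can then be compared across the self-reported hair colour groups.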

Additionally we calculated the SNP heritability of the different hair colours in the Biobank cohort (Supplementary Table 8). We estimate the SNP heritability of red hair to be 0.403, blonde as 0.301 and brown as 0.234. Removing the MC1R-associated variants on chromosome 16 results in a residual model for red hair with SNP heritability of just 0.108; MC1R therefore explains 73% of the observed red hair heritability. Removing all variants associated with red hair, we find a heritability estimate of 0.041 which indicates that the identified loci explain ~90% of the heritability of red hair. Performing the equivalent analysis for blonde hair shows that the identified loci account for 73% of the SNP heritability, and for brown hair the identified loci account for 47% of the SNP heritability.
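The proportion of heritability explained follows from the total and residual estimates as one minus their ratio. A short sketch using the red hair figures for all associated loci (total 0.403, residual 0.041); the function name is illustrative.

```python
def fraction_explained(h2_total, h2_residual):
    """Share of SNP heritability accounted for by the variants that were removed."""
    return 1.0 - h2_residual / h2_total

explained = fraction_explained(0.403, 0.041)
print(round(explained, 3))  # 0.898
```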

eQTL

In order to aid the interpretation of our GWAS and identify functional hypotheses, we tested associated variants for statistical colocalisation with eQTL signals from skin biopsies in the GTEx20 (Supplementary Tables 9−14) and TwinsUK cohorts31 (Supplementary Tables 15−17). We were able to link 37 variants with cis eQTLs with high probability (posterior probability > 0.8). Among the variants with the highest probability are those at RALY, upstream of ASIP as noted above, and in the first intron of TSPAN10 for red hair. Most of the eQTLs are associated with gene expression at considerable distance and often with several genes. Among the most significant eQTLs, across all three datasets, are several missense variants in MC1R, which are independently linked to expression changes in multiple genes located up to several hundred kb from MC1R. These may well be synthetic associations reflecting weak LD and the unusual behaviour of this segment of the genome. Whilst the colocalisation of hair colour association signals with skin tissue cis-eQTLs may appear promising, it is at best a strong indication of biological effect and will require extensive further hypothesis testing to establish any role in determining pigmentation.

Hair colour loci are enriched for regulatory features

To understand the transcriptional regulatory mechanisms that might underpin the observed genetic associations with hair colour, we examined the potential for these variants to affect the chromatin landscape in cell types relevant to pigmentation. Specifically, we tested histone tail modifications associated with gene activation or repression and with chromatin accessibility (DNase I hypersensitive sites) in melanocytes, keratinocytes, fibroblasts and other cells. Additionally, the proximity of several association signals to core promoter regions raises the possibility of alterations to TSS and pigmentation cell-specific regulatory factors, i.e. the melanogenesis master regulator MITF.

Using a permutation-based approach (GoShifter)32, we tested each annotation in each cell type where data were available (Table 3). We find statistical evidence of enrichment of pigmentation-associated genetic variation overlapping histone marks of both gene activation (H3K4me3) in melanocytes and repression (H3K9me3, H3K27me3) in melanocytes, fibroblasts and keratinocytes. In addition there is enrichment of MITF binding sites in melanocytes and TSS in iris pigmentation cells. These associations give strong support to the notion that we are able to identify functional elements altered by genetic variation.
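The idea behind a local-shift permutation test can be sketched as follows. This is a heavily simplified illustration and not the GoShifter implementation: it assumes each locus is a fixed interval, counts loci in which any associated SNP falls inside an annotation, and builds a null distribution by shifting positions circularly within each locus. All names and coordinates are hypothetical.

```python
import random

def locus_overlaps(snps, annotations):
    """True if any SNP position falls inside any annotated interval [start, end)."""
    return any(s <= pos < e for pos in snps for s, e in annotations)

def shifted(snps, offset, start, end):
    """Move SNPs by -offset with wrap-around; equivalent to shifting annotations by +offset."""
    length = end - start
    return [start + (pos - start - offset) % length for pos in snps]

def local_shift_p_value(loci, n_perm=1000, seed=0):
    """Return (observed overlapping loci, empirical one-sided p-value)."""
    rng = random.Random(seed)
    observed = sum(locus_overlaps(loc["snps"], loc["annotations"]) for loc in loci)
    at_least = 0
    for _ in range(n_perm):
        count = 0
        for loc in loci:
            offset = rng.randrange(loc["end"] - loc["start"])
            moved = shifted(loc["snps"], offset, loc["start"], loc["end"])
            count += locus_overlaps(moved, loc["annotations"])
        if count >= observed:
            at_least += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)

# Hypothetical loci for illustration only.
loci = [
    {"start": 0, "end": 1000, "snps": [100, 150], "annotations": [(90, 160)]},
    {"start": 0, "end": 1000, "snps": [500], "annotations": [(480, 520)]},
    {"start": 0, "end": 1000, "snps": [700], "annotations": [(10, 50)]},
]
observed, p_value = local_shift_p_value(loci)
print(observed, p_value)
```

Shifting within each locus preserves the local density of annotations, so the null distribution reflects how often overlap would occur by chance in regions of similar structure.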

Table 3 Chromatin enrichment

Enrichment for skin and hair genes

To further aid the interpretation of our GWAS findings, and identify shared biological pathways related to pigmentation determination, we took all of the blonde hair lead variants overlapping genic regions extending 2 kb upstream of the TSS and 500 bp downstream of the 3′ end. If no genic region overlapped the lead SNP, then we used the two closest genes within 500 kb (Supplementary Table 18). These candidate genes were then used as input to test for enrichment in known pigmentation phenotypes, utilising the MouseMine database33. We identified ~200 orthologous mouse genes in the database, which we analysed for site of expression and mutational phenotypes. Of the 172 genes with expression data, 89 were expressed in the skin (P = 1.3×10−9) (Supplementary Table 19). One hundred and thirty-two genes had mouse mutant phenotype data and of these, 50 had an integument phenotype (affecting the skin and skin appendages) (P = 5.2×10−7). Not surprisingly, 18 of these affected pigmentation, but we unexpectedly found that 70% affect primarily skin, hair or other skin appendages rather than pigmentation (Table 4).

Table 4 Phenotype of mouse mutants at candidate genes

Follicular melanocytes, keratinocytes and dermal papilla cells have mutual interactions; the dermal papilla signals to melanocytes with ASIP, the melanocytes transfer melanin granules into the keratinocytes. Perturbations of these interactions could affect the amount and type of melanin delivered to the hair. Furthermore, variation in growth rate could impact the effectiveness of melanin transfer. Recent GWAS have identified 14 loci associated with hair shape variation34. Remarkably, we have identified seven of these, ERRFI1, FRAS1, HOXC13, PADI3, KRTAP, PEX14 and LGR4 as affecting blonde/non-blonde hair colour (P = 1×10−11, Fisher’s exact test). In addition, the refractive and reflective properties of individual hairs may affect perceived colour35 and there is evidence that different coloured hairs have different morphology. Vaughn et al. have demonstrated a strong inverse correlation between the lightness of hair colour and the diameter of the shaft; blonde hair is thinner than dark36.
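An overlap of this kind can be tested with a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The background set used in the study is not stated here, so the sketch below uses small hypothetical numbers purely to show the calculation.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n loci are drawn from N, of which K carry the annotation."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return favourable / total

# Hypothetical: 20 background loci, 5 annotated, 4 selected, 3 of them annotated.
p_overlap = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
print(round(p_overlap, 4))
```

With a realistic genome-wide background, an overlap of seven out of 14 loci would yield a far smaller probability than this toy example.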

In summary, the very large dataset provided by UK Biobank has enabled us to dissect the complex genetic nature of hair colour. This forms the foundation for functional analysis linking genetic variation to phenotype and exploring the cellular interactions between melanocytes and other cells in the hair follicle.