Study subjects

We analyzed data for 6357 individuals from the CANDELA sample, recruited in Brazil, Chile, Colombia, Mexico and Peru (Supplementary Table 1, http://www.ucl.ac.uk/silva/candela16). All volunteers provided written informed consent. Ethics approval was obtained from: Universidad Nacional Autónoma de México (México), Universidad de Antioquia (Colombia), Universidad Perúana Cayetano Heredia (Perú), Universidad de Tarapacá (Chile), Universidade Federal do Rio Grande do Sul (Brazil) and University College London (UK).

Phenotype data

A physical examination of each volunteer was carried out using the same protocol and instruments at all recruitment sites. Eye color was recorded in five categories (1-blue/gray, 2-honey, 3-green, 4-light brown, 5-dark brown/black). Hair color was recorded in four categories (1-red/reddish, 2-blond, 3-dark blond/light brown or 4-brown/black), as described in ref. 18. Individuals with red hair were excluded prior to the analyses, as it is a rare in our sample (frequency of 0.6%) and this phenotype is known to stem from rare variants in MC1R. A quantitative measure of constitutive skin pigmentation (the MI) was obtained using the DermaSpectrometer DSMEII reflectometer (Cortex Technology, Hadsund, Denmark). The MI was recorded from both inner arms and the mean of the two readings used in the analyses. Measurements across the two arms were compared for each individual to assess variability of the MI measurement. The absolute difference between the two measurements was taken as the variability for an individual, and the median variability across all individuals was 1.03 units (Supplementary Figure 17). For comparison, the range of variation of MI in the CANDELA dataset is 20 to 65 units (in the QC-d set of individuals used for GWAS analyses). For visually inspecting the skin color distribution corresponding to variation in MI (Fig. 1a), MI values were converted to approximate RGB (red, green, blue) values (Supplementary Figure 18).

In addition to a direct assessment of eye color into four categories, we obtained quantitative variables related to eye color from digital photographs of the study subjects (taken following a standardized protocol as described in ref. 18). One of the two eyes was selected based on image quality. Photographs were landmarked manually via a graphical interface designed in MATLAB (Supplementary Figure 1). Ten landmarks were used to delimit and extract the visible part of the iris. Additional landmarks were placed to select the whitest part of the sclera. This white reference and the darkest part of the pupil were used to normalize the image, adjusting for variable color casts or illumination levels across images. An adaptive threshold was then used to remove highlights such as reflections on the iris. The resulting images were individually checked for the presence of errors during the digitization steps leading to their exclusion. In total, 5513 iris images were retained for extracting RGB pixel color values.

A set of 195 photos were landmarked independently by two raters to assess inter-rater variability in extracted iris color. The median absolute difference between the RGB color values of the two raters across the whole set was 3.3 units (on a scale of 0–255).

The multivariate median of the RGB values across all pixels was calculated in order to obtain average RGB values for an iris (Fig. 2d, Supplementary Figure 2). Such RGB values, or their principal components (Supplementary Figure 3C), have been used in certain genetic association studies49. However, although the RGB color space is convenient for digital imaging, it is not necessarily the most appropriate in terms of human perception or biological relevance. Several other color spaces have therefore been considered in genetic studies of pigmentation. In particular, the HCL and CIE Lab color spaces have the advantage over RGB of being perception based23,50. Furthermore, it has been shown that melanosome density and the skin MI are strongly correlated brightness (L)51. The main difference between the HCL and CIE Lab color spaces is that HCL, being directly derived from RGB, represents the three primary colors (red, green, blue) in opposing corners, while the CIE Lab represents four colors in different corners (red against green and blue against yellow). Since the HCL values in the CANDELA dataset occupy mainly the opposing red-orange and cyan-blue color hues (Fig. 2d), for this study we considered the HCL color space more informative than the nearly equivalent CIE Lab color space.

H is a circular variable representing color hue (tone) ranging from 0° to 360°, with red at 0°, green at 120° and blue at 240°. C (chroma or saturation) ranges from 0 (no color) to 1 (fully saturated color). L (lightness or brightness) ranges from 0 (black) to 1 (white). It was observed from the bicone color model (Fig. 2d) that the set HCL values lie approximately on a two-dimensional plane passing through the vertical central axis at an angle of ~20° (obtained from the circular median of H). H values were therefore standardized by subtracting 20°. Furthermore, since H is a circular variable, it was converted to cos(H) prior to its use for the analyses performed here. Cos(H) ranged from −1 (blue/gray eyes) to +1 (olive/brown/dark brown eyes). As the distribution of HCL values was nearly planar, sin(H) showed comparatively little variation (equivalent to taking a projection onto the plane) and was ignored.

Genotype data

DNA samples from participants were genotyped on the Illumina HumanOmniExpress chip, which includes 730,525 SNPs. PLINK v1.922 was used to exclude SNPs and individuals with more than 5% missing data, markers with minor allele frequency <1%, related individuals with Identity-By Descent estimate (IBD) >0.1 (i.e., removing third-degree relatives (who have IBD 0.125) and higher) and those who failed the X-chromosome sex concordance check (sex estimated from X-chromosome heterozygosity not matching recorded sex information). After applying these filters, 669,462 SNPs and 6357 individuals were retained for further analysis. Due to the Native American, European and African admixture of the study sample (Supplementary Figure 5), there is inflation in Hardy–Weinberg P values. We therefore did not exclude markers based on Hardy–Weinberg deviation, but performed stringent quality controls at software and biological levels (see also Supplementary Figure 14 from Adhikari et al.52). The SNP quality metrics generated from the GenCall algorithm in GenomeStudio were used for quality control. SNPs with low GenTrain score (<0.7), low Cluster Separation score (<0.3) or high heterozygosity values ((het. excess) > 0.5) were excluded53. The heterozygosity excess filter performs a function similar to a Hardy–Weinberg equilibrium check, but is more direct since it is based on the heterozygosity value, which unlike the P value does not depend on sample size. Only SNPs that satisfy these criteria across all genotyping plates were retained53. The imputation ‘concordance’ score, which is a measure of poor genotyping quality, was also used to exclude some genotyped SNPs (see below). Finally, subsequent to the GWAS analyses (see below), the genotyping cluster plots for the index SNP identified were checked manually to verify genotyping quality.

Genotype imputation

The chip genotype data were phased using SHAPEIT254. IMPUTE255 was then used to impute genotypes at untyped SNPs using variant positions from the 1000 Genomes Phase 3 data. The 1000 Genomes reference dataset included haplotype information for 1092 individuals across the world for 36,820,992 variant positions. Positions that are monomorphic in 1000 Genomes Latin American samples (Colombia, Mexico and Puerto Rico) were excluded, leading to 11,025,002 SNPs being imputed in our dataset. Of these, 48,695 had imputation quality scores <0.4 and were excluded. Median ‘info’ score (imputation certainty score) provided by IMPUTE2 for the remaining imputed SNPs was 0.986. The IMPUTE2 genotype probabilities at each locus were converted into most probable genotypes using PLINK v1.922 (at the default setting of <0.1 uncertainty). Imputed SNPs with >5% uncalled genotypes or minor allele frequency <1% were excluded. IMPUTE2 provides a ‘concordance’ metric for chip genotyped SNPs, obtained by masking the SNP genotypes and imputing it using nearby chip SNPs. Genotyped SNPs having a low concordance value (<0.7) or a large gap between info and concordance values (info_type0 – concord_type0 >0.1), two suggested indicators of poor genotyping quality, were also removed. Median concordance values of the remaining chip SNPs was 0.994. After quality control (QC), the final imputed dataset contained genotypes for 9,143,600 SNPs.

Statistical genetic analyses

Narrow-sense heritability (defined as the additive phenotypic variance explained by a genetic relatedness matrix (GRM) computed from the SNP data) was estimated using the software GCTA56 by fitting an additive linear model with a random effect term whose variance is given by the GRM (with age and sex as covariates). The GRM was obtained using the LDAK software19, which accounts for LD between SNPs. An LD-pruned set of 160,858 autosomal SNPs was used to estimate continental ancestry using the ADMIXTURE program57 (Supplementary Figure 5). The correlation between traits and covariates was examined calculating Pearson's correlation coefficients (using R).

PLINK 1.922 was used to perform the primary association tests on the best-guess imputed genotypes (genotypes with the highest probability, i.e., the most probable genotypes) for each pigmentation phenotype using multiple linear regression. We used an additive genetic model incorporating age, sex and 6 genetic PCs as covariates. PCs were obtained from an LD-pruned dataset of 160,858 SNPs. Individual outliers (including individuals with >20% African or >5% East Asian ancestry, as estimated by ADMIXTURE) were removed and PCs recalculated after the removal of these individuals. The number of PCs to be included in the regression was determined by inspecting the proportion of variance explained and by checking scree and PC scatter plots (Supplementary Figure 6A).

Pigmentation is one of the best-characterized complex human traits (albeit mainly in Europeans), with many variants robustly replicated across tens of association studies. We sought to leverage this prior knowledge in order to empower our GWAS. Statistical theory indicates that incorporating known covariates in a linear regression model increases power to detect association58, and simulation studies show that this applies to GWAS of population samples59. The situation in case–control studies of disease is more complex because in that setting association testing is affected by disease prevalence and effect sizes59,60, so that disease GWASs have only occasionally conditioned on established loci61. However, conditioning on known large-effect SNPs in an unselected population sample (like the CANDELA cohort) for common pigmentation variation is an ideal setting in which to exploit the added power provided by conditional analyses. We thus examined which established pigmentation SNPs had strong effects in our sample and used them to perform a conditioned GWAS. Searching online GWAS catalogs and published studies, we identified 161 SNPs that have been reported in previous association studies of pigmentation traits (Supplementary Table 12). Of these SNPs, 139 SNPs were present in the CANDELA imputed dataset (the rest being lost during QC). We obtained P values and proportions of trait variance explained for each these 139 SNPs. We then selected SNPs that were both genome-wide significant (P value < 5 × 10–8) and that explained a relatively large proportion of trait variance (proportion of R2 > 0.5%, Supplementary Table 13) to define a list of established pigmentation SNPs with strong effects in the CANDELA sample. If several of these SNPs were located in the same gene region (usually a region with strong LD), and in order to avoid collinearity, we retained only the most significant SNP. The following six SNPs met these criteria and were used to perform a conditioned GWAS: rs16891982 (SLC45A2), rs12203592 (IRF4), rs10809826 (TYRP1), rs1800404 (OCA2), rs12913832 (HERC2) and rs1426654 (SLC24A5).

The polygenicity of the pigmentation traits examined in the CANDELA sample was evaluated using the tail strength (TS) statistic25, which measures the overall strength of univariate (single-SNP) associations in a genome-wide test dataset. This statistic is related to other multiple-testing methods calculated on a set of P values, like the false discovery rate and the area under the curve. In a GWAS with n SNPs, if the ordered P values are p (1) ≤ p (2) ≤ … ≤ p (n) , the statistic is

$${\mathrm {TS}}(p_1, \ldots ,p_n) = \frac{1}{n}{\kern 1pt} \mathop {\sum}\limits_{k = 1}^n {{\kern 1pt} \left( {1 - p_k\frac{{n + 1}}{k}} \right)} .$$ (1)

Under the null hypothesis of no association between the trait and all SNPs, TS should equal 0. A positive value of TS indicates the overall extent of association in the entire dataset and is interpreted as polygenicity, with higher values of TS indicating greater polygenicity. The asymptotic variance of TS can be approximated by is 1/n* where n* is the effective number of independent SNPs. As LD pruning on our dataset yielded 160,858 SNPs (see Methods), the SD can be estimated as 1/√160,858 = 0.0025, and a confidence interval would be TS ± 3 × SD = TS ± 0.0075. The estimates of TS statistics obtained in the GWAS analyses performed in CANDELA data are shown in Supplementary Table 4A (and compared with the standard genomic inflation factor, λ). For three previously published GWAS studies on the same CANDELA cohort and using the same genetic PCs, lambda and TS statistic values are very close to zero for some traits that show few or no associations (Supplementary Table 4B), indicating that there is no inherent substructure remaining in the dataset after controlling with the genetic PCs. Results from other published GWAS studies show that lambda and TS values vary considerably within the same study, having highest values for pigmentation traits, height and body mass index, which have the largest number of associated SNPs (Supplementary Table 4C).

To evaluate association with all pigmentation traits simultaneously (excluding categorical eye color), we performed a Wald test62. In this approach, a SNP genotype is taken as the dependent variable and all phenotypes are jointly taken as covariates. Due to this increased complexity the runtime per SNP is considerably longer, so an LD-pruned dataset of 181,139 SNPs was used for this analysis (ensuring that all genome-wide and suggestive SNPs from the primary analysis are included) (Supplementary Table 6). A meta-analysis was carried out for the novel index SNPs identified in the primary analyses (Table 1) by testing for association separately in each country sample16. Forest plots were produced with MATLAB 3.2.5 combining all regression coefficients and standard errors. Histograms of the traits within each country were compared to the Forest plots to examine how trait variability across countries relates to the association signals.

Review of functional annotation and gene expression data

Functional annotation in the genomic regions showing association was reviewed using HaploReg v4.163, National Center for Biotechnology Information (NCBI), University of California Santa Cruz (UCSC) and Ensemble databases. Evolutionary constraint in these regions was assessed with the GERP64 and SiPhy65 scores. To evaluate the potential impact of amino-acid substitutions on protein structure and function, we examined the SIFT66 and PolyPhen267 scores. We also queried transcription levels for candidate genes in newly associated regions across all 53 human tissues included in the GTEx database68.

Selection analyses

We computed three selection statistics: the PBS69, iHS70 and Tajima’s D71. Since we were mainly interested in the convergent evolution of pigmentation in West and East Eurasia, we restricted this analysis to CEU and CHB data from the 1000 Genomes Project. PBS scores for CEU were computed using CHB and YRI as reference and for CHB using CEU and YRI as reference. Pairwise F ST were estimated using Reynolds equation72 using only SNPs that were polymorphic in at least two populations. The total number of SNPs with PBS scores in CHB and CEU was ~8,000,000. We calculated iHS using the software selscan73. Ancestral allele states were retrieved from information present in the 1000 Genomes data VCF files (AA (ancestral allele) field) and SNPs with no ancestral allele state were discarded. Unstandardized iHS scores were only estimated for SNPs when: (1) derived allele frequencies >5% and <95%; and (2) the Extended Haplotype Homozygosity (EHH) does not decay below 0.05 after an interval of 1 Mb. The standardized iHS scores were then computed by binning the SNPs by allele frequencies and subtracting the mean and dividing by the standard deviation to obtain a final standardized statistic with a mean of 0 and variance of 1. The HapMap GRCh37 genetic map was used to obtain genetic distances between SNPs. The final total number of SNPs in CEU and CHB was ~3,000,000. We calculated Tajima’s D using VCFtools74 on non-overlapping windows of 10 kb and discarded windows that contained less than 5 SNPs. The final total number of windows for CEU and CHB was ~266,000. We computed empirical P values using an outlier approach by ranking all the genome-wide scores and dividing by the number of values in the distribution, taking the upper tail for PBS and iHS and the lower tail for Tajima’s D selection scores. Throughout the text we considered SNPs with significant selection scores as those with empirical P values lower than 0.01.

To evaluate an enrichment of selection signals at genomic regions associated to pigmentation traits we first estimated haplotype blocks in the CANDELA sample using the definition of haplotype blocks implemented in PLINK 1.922. When constructing haplotype blocks, only pair of SNPs within 500 Kb of each other were considered. For each haplotype block we then estimated the maximum PBS and iHS scores computed in the CEU and CHB populations, and retained only haplotype blocks with at least 5 SNPs. We then contrasted the distribution of maximum PBS and iHS scores at the haplotype blocks containing associated SNPs (i.e., those including SNPs with P values < 10−5) with the distribution of maximum PBS and iHS scores at haplotype blocks in the rest of the genome. We tested the significance of the difference between distributions using a one-sided Mann–Whitney U-test. We did not use Tajima’s D selection scores to perform this enrichment analysis, as this selection statistic is computed in sliding windows (see above) and the windows would sometimes overlap two consecutive haplotype blocks.

To evaluate the possible correlation of allele frequencies at pigmentation genes with solar radiation levels we examined publicly available data for 64 native population samples without evidence of recent admixture (Supplementary Table 10). All samples included a minimum of 10 individuals. Surface solar radiation data were obtained from the NASA Surface meteorology and Solar Energy (SSE) Web site (https://eosweb.larc.nasa.gov/sse/) in kWh/m2/day units. These data included annual solar radiation averages from July 1983 to June 2005 on a 1-degree resolution grid over the globe. Annual solar radiation values were obtained for each population based on published coordinates for sampling locations. In case of unpublished sampling location, we obtained this information directly from the authors or used approximate coordinates such as the middle of the town/city of the sampling location. We used Bayenv2.075 to estimate Bayes Factors (BFs) relating solar radiation to allele frequencies at index SNP. These BFs provide a measure of the increase in the fit of allele frequencies to a linear regression model including solar radiation levels over a null model including only population structure as predictor. The null model was constructed using a covariance matrix of allele frequencies between populations estimated from 10,000 random SNPs (not in LD) after 100,000 Markov chain Monte Carlo iterations. In addition to BFs we estimated Spearman’s rank correlation coefficient (ρ). We ranked the SNPs based on their BFs, and absolute ρ, to obtain empirical P values. The allele frequency at a SNP was only considered to be significantly associated to solar radiation if both BF and ρ estimates showed significance as recommended in Bayenv2.0. As the effect of pigmentation genes could differ between geographic regions, we also conducted separate analyses for Africans, Western Eurasians (including North Africans) and Eastern Eurasians (Supplementary Table 10 lists the populations included in each region).

To estimate the selection coefficient and the time since the start of selection at SNP (rs2240751), we used an ABC approach. We used msms76 to perform coalescent simulations modeling the demographic history of African, European and East Asian populations (for details of the parameters of the demographic model used, see ref. 77 and Supplementary Note 1). We assumed that the minor allele frequency at the time of selection was 1% in Europeans and East Asians and zero in Africans (comparable to the frequency in CEU, CHB and YRI from the 1000 Genomes Project). We performed 1,000,000 simulations of a 500 kb genome segment with a selected allele in the center, and originating in East Asians. We assumed a uniform distribution U (0–0.05) for the selection coefficient and a uniform distribution U (5000–42,229 years ago (ya)) for the starting time of selection. From the simulations we computed 9 summary statistics in a window of 200 kb centered around the selected site: the nucleotide diversity (π), Tajima’s D, Fu and Li’s D, Fu and Li’s F, H1, H2 and H2/H1 as measures of haplotype diversity, F ST between East Asians and Europeans, F ST between East Asians and Africans and the derived allele frequency of the selected variant. We used partial least squares (PLS) to identify the most informative statistics based on a subset of 10,000 simulations (prior to PLS analysis, summary statistics were Box-Cox transformed so that their minimum values were between 1 and 2). For parameter inference we used the first 7 PLS components, as they carried the most information for each parameter (estimated using the root mean squared error) (Supplementary Figure 14). Estimation of parameters was performed using the abc R package78. We selected the top 0.5% simulations based on the smallest Euclidean distance between the observed and simulated summary statistics. From these quantities, we obtained the posterior probability distributions for the selection coefficient and the time since selection, and recorded the posterior median and the 95% credible intervals. We examined the accuracy of the ABC parameter estimates using the predicted error (i.e., the mean square error divided by the prior variance of the parameter) based on a leave-one-out cross-validation of 100 observations (Supplementary Figure 15).

Plots for the selection analyses were made in R.

Immunohistochemistry of MFSD12

Unshaven, full-thickness normal human adult scalp with terminal hair growth was used snap frozen in liquid nitrogen in cubes of 2 cm3. Cryosections of 6–8 µm were cut using a cryostat onto adhesive glass slides and stained with primary antibody against human C19Orf28/MFSD12 N-terminal region (MFSD1; Aviva System Biology ARP44958_P050) at a dilution of 1:600 using standard double immunofluorescence protocols. To assess the possible localization of MFSD12 in melanocytes of skin and/or hair follicles, we used a second primary antibody against the melanocyte lineage-specific antigen gp100. Quality testing of the antibody’s specificity was assessed using commercially obtained sections of human kidney tissue as a positive control. IgG isotype controls were used at the same concentration as the lowest primary antibody dilution. Co-distribution and co-localization of both antigens in the skin and the growing hair follicle were determined if there was merging of the MFSD12 (green)- and gp100 (red)-positive channels to give yellow/orange color. Human skin tissue used in this study was obtained with informed consent and with ethics committee approval.

URLs

For HaploReg, see https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php. For NCBI, see https://www.ncbi.nlm.nih.gov. For UCSC, see https://genome.ucsc.edu. For Ensemble, see http://www.ensembl.org. For GTEx, see https://gtexportal.org/. For selscan, see https://github.com/szpiech/selscan.