I got curious about pigmentation about ten years ago when reading the coda to Armand Leroi’s Mutants: On Genetic Variety and the Human Body, where he observes curiously that after all these decades geneticists still didn’t understand very well the basis of normal variation in skin color. I read that in the summer of 2005, so Armand had probably written it in 2004 (he can correct me if he has time, he occasionally comments here). Depending on how you view it, it was a fortunate or unfortunate time to write something like this. Over the past ten years geneticists have solved the basis of normal variation in human pigmentation. In fact, most of the major work was completed between 2005 and 2007. In December of 2005 Science published SLC24A5, a Putative Cation Exchanger, Affects Pigmentation in Zebrafish and Humans. The authors reported that rs1426654 was nearly disjoint in distribution between Africans and Europeans, and that it explained on the order of 1/3 of the variance in pigmentation between the two populations (European populations are fixed for the A allele, Africans for the G allele).

There are several facts just within that statement that illustrate why pigmentation genomics has been such a success in comparison to other domains tackled by the new methods. First, pigmentation pathways seem to be somewhat constrained across animals, so model organisms can give us a lot of insight and clues. A lot of the pigmentation genes, such as KITLG, TYR, and SLC24A5, actually increase or decrease melanin production and alter tissue specific expression across vertebrates just as they do in humans. Second, the fact that I just named genes off the top of my head highlights the fact that there are a few conserved loci that explain most of the variance and crop up in study after study. This is in contrast to height, where the variance is distributed across thousands of genes, and the only one I can name off the top of my head is HMGA2. And it explains a princely ~0.3% of the variance of the trait.

This wasn’t entirely a surprise. I happen to have had a copy of The Genetics of Human Populations. In it, L. L. Cavalli-Sforza reported on a classical pedigree analysis of individuals in Britain of varying levels of African ancestry dating to the 1950s. In particular, in genetic jargon the study focused on the variance in trait values between parentals, F1 individuals, and “back-cross” individuals (as well as a few F2 individuals from what I recall). The research concluded that pigmentation was probably controlled by on the order of 10 genes or so. In particular, the authors suggested that the trait was unlikely to be highly polygenic, which for the designs of that period really meant more than a dozen loci or so, beyond which they lacked the power to differentiate the number of independent effects with any precision (i.e., they wouldn’t be able to distinguish between a trait where 25 loci explain 90% of the variance, and a trait where 500 loci explain 90% of the variance). Third, pigmentation loci exhibit a relatively high pairwise Fst. That is, most of the variation on many of these alleles is partitioned between populations, rather than within them. Obviously that is convenient when you are trying to detect associations between genes and phenotypes which are partitioned on an inter-continental scale.
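To make "high pairwise Fst" concrete, here is a minimal sketch of Wright's Fst at a single biallelic SNP for two populations. It assumes equal weighting of the two populations and works directly from allele frequencies, with no sample-size correction; the function name is illustrative, and real analyses use estimators that do correct for sample size.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's Fst at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # The locus is monomorphic overall, so there is no variation to partition.
        return 0.0
    # Mean expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


# Two extremes: alleles fixed for opposite states, and identical frequencies.
print(fst_two_populations(1.0, 0.0))
print(fst_two_populations(0.5, 0.5))
```

An Fst near 1 means nearly all of the variation at that SNP lies between the two populations, which is the situation described for the major pigmentation loci.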

The illustration with SLC24A5 is pretty straightforward; the derived allele is at a frequency of 100% in Europeans, while unadmixed Sub-Saharan Africans are over 99% ancestral. In the 1000 Genomes the frequency of the derived A allele in the Utah white American sample is 100% (out of 99 individuals). In the 91 British individuals it is 100%. In the Tuscan set of 107, there are 213 A alleles, and 1 G allele. In the 107 Spanish individuals, the A allele is at 100%. In contrast, for the Yoruba Nigerian data set, there are 3 A alleles for 213 G variants. For the Esan of Nigeria, it is 5 A for 193 G. For the Chinese samples from Beijing, 6 A alleles, and 200 G. At this point you might think that the A variant at this SNP position is diagnostic of European ancestry, but it is not. I, for example, am homozygous for the A variant, as are both of my parents. In the 1000 Genomes data there are 25 Bengalis who are AA, 42 who are AG, and 19 who are GG. In the Sri Lankan Tamil population A is at 49% frequency.
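The counts above convert to allele frequencies with simple bookkeeping. The sketch below does that for the populations quoted in the text, and compares the Bengali genotype counts with what Hardy-Weinberg proportions would predict. The function names are illustrative, and the Hardy-Weinberg comparison assumes random mating within the sample.

```python
from typing import Dict, Tuple


def derived_allele_frequency(n_aa: int, n_ag: int, n_gg: int) -> float:
    """Frequency of the A allele given counts of AA, AG and GG individuals."""
    n_individuals = n_aa + n_ag + n_gg
    return (2 * n_aa + n_ag) / (2 * n_individuals)


def hardy_weinberg_counts(p: float, n_individuals: int) -> Tuple[float, float, float]:
    """Expected AA, AG and GG counts under Hardy-Weinberg proportions."""
    q = 1.0 - p
    return (p * p * n_individuals, 2 * p * q * n_individuals, q * q * n_individuals)


# Bengali genotype counts quoted in the text: 25 AA, 42 AG, 19 GG.
bengali_p = derived_allele_frequency(25, 42, 19)
bengali_expected = hardy_weinberg_counts(bengali_p, 25 + 42 + 19)

# Allele counts quoted in the text, as (A alleles, G alleles).
allele_counts: Dict[str, Tuple[int, int]] = {
    "Tuscan": (213, 1),
    "Yoruba": (3, 213),
    "Esan": (5, 193),
    "Beijing Chinese": (6, 200),
}
for population, (n_a, n_g) in allele_counts.items():
    print(f"{population}: A frequency = {n_a / (n_a + n_g):.3f}")

print(f"Bengali: A frequency = {bengali_p:.3f}")
print("Bengali expected AA/AG/GG:", [round(x, 1) for x in bengali_expected])
```

With these counts the Bengali A frequency comes out a little above one half, and the observed genotype counts sit close to the Hardy-Weinberg expectation.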

The figure to my left is from Heather Norton’s Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians, and it uses neighbor-joining trees to represent genetic distances at particular loci then known (2007) to be implicated in inter-continental variation in pigmentation. The abbreviations are pretty self-evident, WA = West African, NA = Native American, EA = East Asian, IM = Island Melanesian, SA = South Asian, and EU = European. What you see is that pigmentation genes are not particularly phylogenetically representative. That is, whole genome relationships, whereby all non-Africans form one clade set against Africans, are not reflected here. Looking at these patterns, you would have inferred that Europeans were the outgroup. And the population with the lowest genetic distance from West Africans is the Island Melanesians. What’s going on here is Island Melanesians and West Africans have similar phenotypes in skin color, and that is being reflected in these genes. Roughly, Melanesians and West Africans exhibit a fair amount of functional constraint around pigmentation genes. They haven’t changed much. In contrast, East Asians and Europeans actually are not too different in their pigmentation on a world-wide scale, but that is not reflected in these trees. Why? As is made clear in the title of Norton et al.’s paper, East Asians and Europeans arrived at their phenotypes via different mutational paths. I say different mutational paths because there is a broad overlap in genes, but the alleles are often different (different SNPs or regulatory elements within the gene).

One of the questions that I often get is how to translate genetic variation into realized trait value shifts in individuals, as opposed to simply proportion of variation explained within the population. Luckily, geneticists who study pigmentation have a quantitative unit, a “melanin index” (MI), which naturally utilizes the fact that individuals with darker skin exhibit less reflectance. But there are two problems with giving a simple answer to these sorts of questions. First, a substitution of an allele may have an average effect, but that effect may not be realized for various reasons (e.g., epistasis). And there are still individual differences between people with the exact same genotype. Second, that effect manifests within a population, and different populations have different mixes of alleles.

The table to the left is adapted from The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent. I think we can agree that the results here fit our intuitions. These are averages. Some of the populations in this list, such as the South Asian ones, as well as African Americans, exhibit a lot of variance within population. We now know why; they have a lot of segregating variants. Even within families you can see variation across siblings of quite an extreme nature. The subtle difference between Europeans and East Asians comports with my experience too. The American white population is mostly Northern European, so this is probably a bit on the low side in MI for a typical European population. A paper on Cuban pigmentation genetics gives a median MI for self-identified whites as 34. The ancestry is 86% European, 7% African, and 7% Native American, in this set. Therefore the average Iberian probably is somewhat lighter complected, but not by much. Notice how much darker Bougainville Islanders are than African Americans. Though the latter may be “black” in figurative terms, Bougainville Islanders are black in literal terms. Along with some Sudanic people they are among the darkest skinned in the world. In these data Tamil Brahmins are at 41. These are people whose surnames are often, but not always, Iyer. The stereotype, and my personal experience, is that the modal Tamil Brahmin is light to medium brown. Some are rather dark, while a few may have complexions that veer on brunette white. To be honest in my personal experience I have not met any Tamil Brahmins whose skins are white, though it has not been uncommon for me to meet individuals with such fair skins from Northwest India, in particular Punjabis and Kashmiris (the best way to judge for me is meeting people in real life, as I’ve heard that Indian celebrities often are made up in a way that lightens them up somewhat).

The supplements of the paper have allele frequencies of SLC24A5 for various castes. Kashmiri Pandits are at >95% frequency for the A allele. Other Brahmins are at ~80%, irrespective of whether they are in the North or South. Punjabis, irrespective of caste, are at ~95%. Middle castes in South India, like the Reddy and Naidu, are at ~60 to 65%. Chamars, a Dalit caste in North India, clock in at 68%, while the Toda people of the Nilgiri plateau of the far south of India have a derived allele frequency of 86%. The low caste individuals in Bihar are at 78%. At the other end of the distribution some of the Austro-Asiatic tribes have very low frequencies. The Juang people for example are at 7%. Part of this may just be recent East Asian admixture. But it can’t explain all of it; these groups are mostly of the same component elements as other South Asians, albeit at fractions skewed toward the Ancestral South Indians (ASI). I don’t see any geographic pattern that suggests why selection would happen in certain regions and not in others, though it is suggestive that the Kashmiris and Toda are both living at high elevations, but then so are the Austro-Asiatic groups. I’ll get back to this paper when we talk about selection, but I’ll set it aside for now.



Rather, what are the effects on MI of substitutions of particular alleles at given genes again? The paper on Cuban admixture and pigmentation genetics and another using Cape Verde as the population of interest are particularly useful, because these two data sets have a wide range in ancestral quanta (these are not the only papers with these sorts of results, but this post isn’t a literature review!). The figure to the right is from the second paper, and shows the effect size in standardized units of variants which were statistically significant in their study. Pretty much every study tends to come to the conclusion that SLC24A5 is the biggest effect locus in the genome on this trait if the data set includes substantial West Eurasian ancestry. The main qualification I’d put on that is that East Asians have been understudied for this trait, so the European derived alleles are much more well understood. Be that as it may, each substitution of the SLC24A5 derived allele, A, reduces MI by ~5 units. That is, it’s additive to a first approximation. Some studies do show a mild dominance effect…but of the A allele. That is, light is dominant to dark (e.g., in the Cape Verde study GG is further away from GA than AA is). It’s actually a consistent result. This is curious, because many people believe that dark skin is dominant to light skin. Thanks to genetics we know in a quantitative sense that that’s not true. In fact, perhaps the reverse is true at SLC24A5 and KITLG (concretely, individuals who are heterozygous will be lighter than you would expect going by mid-parent mean).

But, in a qualitative sense it is true, because many people simply “bin” complexion into white and non-white, with the latter encompassing a range all the way from pale olive-brown to black. Really the perception is a function of human culture, and ideas of contagion. I don’t like to make invidious accusations of racism often (I don’t think they’re warranted most of the time), but the perception that dark skin is dominant over white skin seems pretty easily explained by hypodescent within a framework of white racial superiority and exclusivity. Most people who have this impression are not racist at all, but, as per the cliche they’ve internalized some perspectives about the recessive nature of whiteness which derives from a model whereby racial purity is essential and necessary for white identity. And, as I like to say, revealed preferences are telling. The majority of whites rapturously reading Ta-Nehisi Coates’ Between the World and Me have mostly white friends, live in mostly white neighborhoods, and date mostly white people. Yes, some of this is happenstance, but a sequence of events which consistently falls in one direction indicates preferences at variance with avowals of racial neutrality (Seinfeld and Girls operate in core white social worlds in a riotously diverse megalopolis where whites are a minority; believe it or not you can be friends mostly with people who are not the same race and exhibit good mental health, just ask me about my experience).

With that sociological tangent out of the way, what does this mean? What if I was GG, instead of AA, on SLC24A5? You would expect I’d be about 10 MI units darker. Instead of being an average complected South Asian, neither dark nor fair, I’d be a dark skinned one. As the above statistics suggest it is very rare to find someone of unadmixed European background who carries a G allele at this SNP. But some do exist in the above data, so what would they look like? Let’s take a Northern European, with an MI ~30. With one G allele the predicted value is about the same as for a “white Cuban.” In other words, they would be swarthy, notably so in Northern Europe. How about two alleles, so they are a homozygote for the ancestral allele, G? You don’t really see Europeans with this genotype at all today. Assuming all other loci the same (e.g., probably the derived variant on SLC45A2), it looks as if you’d expect this Northern European substituted at that SNP to be about the same complexion as many Northern Indians today. Though some Northern Indians can pass as white, they are not common. Most are visibly brown in some sense.

But wait, there’s more! SLC45A2 is not as strong an effect as SLC24A5, but it’s still significant. In the Cuban study a substitution at its major SNP of interest has an effect of ~3 units. If the genotypes at both these loci were ancestral homozygous in a Northern European, then the expected MI would be > 45. That’s around where the Senoi of Malaysia are. Definitely brown, a touch on the darker shade. Then there are other loci, TYR, TYRP1, ASIP, KITLG, and APBA2. Few enough that I can name, but enough that touching on each would be repetitious and boring. SLC24A5 and SLC45A2 seem relevant to pigmentation anytime you have a West Eurasian population in the mix. The other loci are hit and miss. But one thing that comes out of the studies in admixed populations is that there is still a significant residual that has not been accounted for in this variation. In the Cape Verde study 44% of the variance seems to be due to “genomic ancestry.” That is, African vs. European. The implication here is that the loci we’re catching are at the large effect end of the long tail of distribution of effects, and there are smaller effect loci still segregating which we haven’t picked up. In European populations where a lot of this work began only a few large effect loci may be segregating, with the others being fixed, and so not variable. This doesn’t change the big picture about the genomic architecture. But, it’s more like half a dozen loci can explain half the heritable variation, as opposed to 90%. At least in that study (it seems that the population you are studying matters for the final summary statistic).
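The back-of-the-envelope predictions above can be written down as a purely additive model. This is a sketch under the assumptions stated in the text: a Northern European baseline MI of ~30 for someone homozygous derived at both loci, ~5 MI units per ancestral allele at SLC24A5, and ~3 MI units per ancestral allele at SLC45A2. It ignores dominance, epistasis and every other locus, and the names are illustrative.

```python
from typing import Dict

# Approximate MI increase per copy of the ancestral (darker) allele, as quoted in the text.
MI_PER_ANCESTRAL_ALLELE: Dict[str, float] = {"SLC24A5": 5.0, "SLC45A2": 3.0}


def predicted_mi(baseline_mi: float, ancestral_copies: Dict[str, int]) -> float:
    """Additive prediction of melanin index from ancestral allele counts.

    baseline_mi is the MI of an individual homozygous for the derived allele
    at every locus in MI_PER_ANCESTRAL_ALLELE.
    """
    mi = baseline_mi
    for locus, copies in ancestral_copies.items():
        if copies not in (0, 1, 2):
            raise ValueError(f"{locus}: a diploid carries 0, 1 or 2 copies, got {copies}")
        mi += MI_PER_ANCESTRAL_ALLELE[locus] * copies
    return mi


northern_european_mi = 30.0  # approximate baseline used in the text

# One ancestral G allele at SLC24A5.
print(predicted_mi(northern_european_mi, {"SLC24A5": 1}))
# Homozygous ancestral at SLC24A5.
print(predicted_mi(northern_european_mi, {"SLC24A5": 2}))
# Homozygous ancestral at both SLC24A5 and SLC45A2.
print(predicted_mi(northern_european_mi, {"SLC24A5": 2, "SLC45A2": 2}))
```

The three cases correspond to the heterozygote, the SLC24A5 ancestral homozygote, and the double ancestral homozygote discussed above. The real answer for any individual will scatter around these values.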

I left OCA2 and HERC2 out of the above list for a reason. Looking at them alone gives me a reason to post this beautiful figure of eye color distributions on a two dimensional axis. As most of you may know, SNPs in the OCA2 and HERC2 region of the genome account for most of the blue vs. brown eye color variation in Europeans. Eye color varies less in human populations, and fewer genes likely affect this variation. In the Cape Verde sample the proportion of variation explained by African vs. European ancestry was 44% (the r-squared). For eye color? A mere 8% (note that they used an RGB quantification scale, rather than binning phenotypes). The correlation between skin color and eye color in this data set was 0.38, so 14% of the variation of eye color could be explained by variance of skin color.

The combination of brown skin and light eyes in women such as Vanessa Williams, the first black Miss America, is totally understandable. All black Americans with roots in this country have ancestry that goes back to the 18th century at the latest, and all of them have white American ancestry (I’ve looked at a lot of black American genotypes; they’re mostly African, but all have some European ancestry, and I literally mean all). So the derived variants around OCA2 and HERC2 are segregating at frequencies weighted by European ancestry in African Americans, ~20% × 75% = 0.15, and 0.15² ≈ 0.02, which implies that a few percent of African Americans should have light eyes. While skin color seems mostly additive, eye color does seem to exhibit a recessive expression pattern for the lighter variants. Therefore you need to square the q element of the Hardy-Weinberg equation in this case.
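The arithmetic here is just the q² term of Hardy-Weinberg. The sketch below assumes random mating, a fully recessive light-eye variant, and that the variant enters the population only through its European ancestry. The ~20% ancestry and ~75% allele frequency are the rough figures from the text, and the function name is illustrative.

```python
def expected_recessive_fraction(european_ancestry: float,
                                allele_freq_in_europeans: float) -> float:
    """Expected fraction of homozygotes for a recessive allele in an admixed population.

    Assumes the allele is absent in the non-European ancestors and that
    genotypes are in Hardy-Weinberg proportions.
    """
    # Allele frequency in the admixed population.
    q = european_ancestry * allele_freq_in_europeans
    # Only homozygotes express a recessive trait.
    return q * q


fraction = expected_recessive_fraction(0.20, 0.75)
print(f"Expected light-eyed fraction: {fraction:.2%}")
```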

But are the variants that result in blue eyes only relevant for eye color? Might they not explain skin color as well? That depends. The Cape Verde study did not find any of the blue vs. brown eye color SNPs to correlate with skin color when one controlled for genomic ancestry and the state of a nearby pigmentation gene. In contrast, the Cuba study did find that an OCA2 marker had an effect on skin color, a little over 1 MI unit. This is a smaller effect compared to SLC24A5 obviously, but it is still an effect. As I indicated above, if you follow this literature you notice that a few genes have major effects no matter how you mix and match the data set and population coverage. Others are spottier, and may not reach statistical significance, depending on your mix of populations. It is important to not make one study dispositive of any particular thesis.

What about hair color? While blue eyes are the majority state in much of Northern Europe, blonde hair in adults is rarer. This makes sense when you notice that the derived allele of KITLG, one of the major pigmentation genes associated with blonde hair, only has a frequency of 15% in much of Northern Europe. That means that only a few percent of individuals are homozygous. The above image of mice is from A molecular basis for classic blond hair color in Europeans. The individual in the middle is a heterozygote. The authors claim that they can see a subtle effect. I suppose it’s there if you squint (my son is a heterozygote, and I will report his hair is lighter than his sister’s, who is homozygous for the ancestral variant). The individual to the right in the figure is a pale homozygote for the derived allele. This locus also shows up in cats and horses in generating tissue specific depigmentation, though in humans it has also been implicated in skin color and testicular cancer as well (yes, you read that right!).

But the scientific story about pigmentation isn’t simply one of GWAS after GWAS. There’s a huge evolutionary story here involving classic population genetic parameters, in particular natural selection. Many of these alleles have been implicated in selective sweep events. That is, the allele has increased in frequency very rapidly, often very recently. One major tell is that there are long haplotype blocks around these alleles. This means that there are sequences of variants closely associated with each other, which is suggestive of the fact that they’re co-inherited together as a unit in a region of the genome where the frequency is increasing faster than recombination can break apart the association. The region around OCA2 and HERC2 in Europeans is the third longest haplotype in the Northern European genome. SLC24A5 is a long haplotype that has very little variation in it from which one can infer structure. In the paper above, The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, the authors sequence the region around that locus to smoke out variation. There just hasn’t been that much time for the derived allele to have accrued mutations. They conclude that the SNP in SLC24A5 responsible for lighter skin derives from a common mutation across all the populations in which it is prevalent. That is, the SNP spread through migration or selection from one individual, rather than from the extant variation of a population, in which case there would have been several genetic backgrounds from which selection could draw. A paper from 2013, Molecular phylogeography of a human autosomal skin color locus under natural selection, attempts to look at the haplotype patterns with a bigger population coverage but lower marker density. It comes to the conclusion that “The distributions of C11 and its parental haplotypes make it most likely that these two last steps occurred between the Middle East and the Indian subcontinent.” In other words, the SNP took off from a launching pad in West Asia. If you look at their evidence it is modest at best; they don’t have many variants to generate haplotypes, especially in a genetic region which lacks diversity.

All this talk about the past has been about inference. In the South Asian paper they use Bayesian methods to infer that the derived allele of SLC24A5 arose in a genetic background which coalesces 20-30 thousand years ago, with enormous confidence intervals on the order of tens of thousands of years. You don’t know much more than you already did, as the distribution of the derived variant strongly suggests it arose after East and West Eurasians diverged. Haplotype based methods suggest that the sweep up in frequency occurred only in the last 5-10 thousand years.

So what does the ancient DNA tell us? The figure to the left is from Eight thousand years of natural selection in Europe. You can see that there is a transect in time of alleles in Northern Europe. Blue is the variant in SLC24A5, green is SLC45A2, and red is OCA2. The variation in allele frequencies over time is pretty similar to what you’d expect for a positive selective sweep, which is what the genomics is telling us occurred. The sweep of SLC24A5 is to fixation. This makes sense on an additive trait where selection prefers the homozygote state to the heterozygote state. SLC45A2 is close to fixation, though not as total as SLC24A5. Its trajectory has been more gentle, indicating a lower selection coefficient, at least across its arc up toward fixation. For OCA2 the pattern looks like one of demographic decline, as it was fixed in European hunter-gatherers. And yet at some point the frequency began to increase again. As this region of the genome has a long haplotype it’s suggestive of selection, and not just demographic change. Since blue eyes are recessive one major issue for any selective model that hinges on this trait is how selection would be effective at lower frequencies. E.g., if the allele is at 20% frequency then only 4% of the population has the favored trait.
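The point about recessive traits at low frequency can be illustrated with the standard deterministic one-locus selection recursion. This is a sketch, not a model of any real population: it assumes infinite population size, random mating and constant fitnesses, and the selection coefficient s = 0.05 and starting frequency of 1% are arbitrary illustrative values, not estimates from any paper.

```python
def next_frequency(p: float, s: float, h: float) -> float:
    """One generation of selection on a favored allele at frequency p.

    Genotype fitnesses are 1 + s (favored homozygote), 1 + h*s (heterozygote)
    and 1 (other homozygote). h = 0 means the favored allele is fully
    recessive; h = 0.5 means its effect is additive.
    """
    q = 1.0 - p
    w_hom, w_het, w_other = 1.0 + s, 1.0 + h * s, 1.0
    mean_fitness = p * p * w_hom + 2.0 * p * q * w_het + q * q * w_other
    return p * (p * w_hom + q * w_het) / mean_fitness


def generations_to_reach(p0: float, target: float, s: float, h: float,
                         max_generations: int = 100_000) -> int:
    """Number of generations for the favored allele to climb from p0 to target."""
    p, generations = p0, 0
    while p < target and generations < max_generations:
        p = next_frequency(p, s, h)
        generations += 1
    return generations


S = 0.05  # illustrative selection coefficient (an assumption)
recessive_generations = generations_to_reach(0.01, 0.5, S, h=0.0)
additive_generations = generations_to_reach(0.01, 0.5, S, h=0.5)
print("recessive:", recessive_generations, "generations")
print("additive: ", additive_generations, "generations")
```

Under these assumptions the recessive allele takes far longer to climb out of low frequency than the additive one, because selection only sees the rare homozygotes.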

Of course there is Population Genomics in Bronze Age Eurasia, which has a much larger number of SNPs. But unfortunately as they went with a whole genome methodology, they didn’t target the most important functional markers, but caught a lot of tag SNPs which are associated with the major ones. You can find the list for the populations in the supplements, but there are a lot of other genes. I took the table and filtered it for pigmentation SNPs, and also added the ones from the above paper. There is one overlap, at OCA2. As most of the SNPs are not super critical, I just pared them down to really informative ones. You can access the full spreadsheet here.

                                                     Bronze Age
SNP         gene     Africa  N_Eur   S_Asia  S_Eur   Asia    Eur     Step    HG      Neo     SHG     WHG     EN      BA      Yam
rs12821256  KITLG    0.00    0.17    0.03    0.05    0.13    0.07    0.33    0.00    0.10    -       -       -       -       -
rs1805005   MC1R     0.00    0.08    0.01    0.20    0.00    0.05    0.00    0.00    0.00    -       -       -       -       -
rs1805007   MC1R     0.00    0.10    0.00    0.00    0.00    0.00    0.00    0.00    0.00    -       -       -       -       -
rs1805008   MC1R     0.00    0.07    0.00    0.03    0.00    0.03    0.00    0.00    0.00    -       -       -       -       -
rs1805009   MC1R     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    -       -       -       -       -
rs2228479   MC1R     0.00    0.07    0.09    0.10    0.00    0.13    0.20    0.00    0.00    -       -       -       -       -
rs885479    MC1R     0.00    0.12    0.08    0.03    0.09    0.00    0.00    0.00    0.00    -       -       -       -       -
rs12913832  OCA2     0.01    0.85    0.08    0.30    0.40    0.41    0.00    1.00    0.56    1       1       0.5     0.5     0.1
rs2470102   SLC24A5  0.05    1.00    0.73    1.00    0.94    0.95    1.00    0.33    0.88    -       -       -       -       -
rs28777     SLC45A2  0.12    0.98    0.23    0.95    0.50    0.61    0.33    0.43    0.56    -       -       -       -       -
rs35395     SLC45A2  0.16    0.98    0.23    0.95    0.78    0.56    0.00    0.20    0.33    -       -       -       -       -
rs1426654   SLC24A5  0.00    1.00    0.69    1.00    -       -       -       -       -       0.65    0.18    0.9     1       1
rs16891982  SLC45A2  0.00    0.98    0.06    0.90    -       -       -       -       -       0.65    0.00    0.2     0.75    0.4

I didn’t mention MC1R much above because it doesn’t explain much variance. It’s well known for two things. First, there’s a huge body of research from the era of classical mouse genetics on this locus because of its importance in fur coloration, and coat color across mammals in general. Second, knockouts at this locus seem to be a necessary, but not sufficient, condition for being red-haired or a ginger. The decreased production of eumelanin combined with constitutive production of pheomelanin results in a reddish tinge. Most people have pheomelanin, but it’s masked by eumelanin. When I’ve bleached my hair there are two stages. First, the eumelanin gets stripped out, and my hair is left reddish/copper colored. Then a second bleaching removes the pheomelanin.

Before the “golden age” of pigmentation genetics, basically between December of 2005 and the end of 2007, there was a lot of exploration of MC1R because that’s where the light was. Here’s a paper from 2000, Evidence for Variable Selective Pressures at MC1R:

It is widely assumed that genes that influence variation in skin and hair pigmentation are under selection. To date, the melanocortin 1 receptor (MC1R) is the only gene identified that explains substantial phenotypic variance in human pigmentation. Here we investigate MC1R polymorphism in several populations, for evidence of selection. We conclude that MC1R is under strong functional constraint in Africa, where any diversion from eumelanin production (black pigmentation) appears to be evolutionarily deleterious. Although many of the MC1R amino acid variants observed in non-African populations do affect MC1R function and contribute to high levels of MC1R diversity in Europeans, we found no evidence, in either the magnitude or the patterns of diversity, for its enhancement by selection; rather, our analyses show that levels of MC1R polymorphism simply reflect neutral expectations under relaxation of strong functional constraint outside Africa.

The basic model here is that MC1R started losing function due to relaxation of constraint, and variation started to become dominated by neutral processes. It turns out that Neanderthals too had variation around MC1R. Further investigation suggests that modern Europeans don’t seem to have this variant. More recent evidence suggests that some haplotypes did introgress from Neanderthals at this locus, though perhaps into East Asians far more than Europeans.

So look at the MC1R SNPs in the table above. Neolithic and HG samples are all fixed for the ancestral variant. That is, one reason it seems implausible that the diversity of MC1R in Europe today is due to long term drift in situ is that it didn’t exist in the continent before the arrival of people from the steppe.

Second, rs12821256, in KITLG, associated with blonde hair in Europeans, is also not present in the ancient hunter-gatherers. But it is present in the Neolithic farmers, as well as the people coming from the steppe. In fact the steppe samples have a higher fraction than any modern population (in the 1000 Genomes the frequency is ~20% in the British and Finnish samples). Remember, KITLG has been implicated in skin depigmentation in several studies, though the effect size is more modest than SLC24A5.

For the two solute carrier genes the trends are what we already knew. The frequency for 24A5 is high in the steppe, in fact, fixed, and high among the Neolithic farmers. It is low in Western European hunter-gatherers, and segregating at modest frequencies among the Scandinavian hunter-gatherers. The work above suggests that the genetic background around rs1426654, which is a nonsynonymous change, dates to the Upper Paleolithic. But both ancient DNA and haplotype based selection methods suggest that in places like Europe and India the frequency of this allele and its flanking sequence have been rapidly rising over the past ~10,000 years. The fact that some European hunter-gatherers had the derived variant of rs1426654 seems to confirm the idea that this mutation arose during the Ice Age, and was widely distributed. But we can’t really adduce where the particular variant came from until we get good haplotype data from these ancient samples. Let me quote from Molecular Phylogeography of a Human Autosomal Skin Color Locus Under Natural Selection:



With sufficiently strong positive selection for C11, it is possible that this haplotype could have originated anywhere within its current range and spread via local migration. However, selection acting in concert with major population migrations would have facilitated a much more rapid dispersal. Archeological, mitochondrial, and Y-chromosomal data suggest involvement of multiple dispersals in shaping the current populations of Europe and the Middle East (Soares et al. 2010). Because A111T is far from fixation in most Indian samples (Table S1), the high diversity of B-region haplotypes associated with C11 in the GIH sample may be the result of prolonged recombination rather than early arrival of A111T. In fact, the decrease in frequency of A111T to the east of Pakistan suggests that C11 originated farther to the west and after the initial genetic split between western and eastern Eurasians. On this basis, we hold the view that an origin of C11 in the Middle East, broadly defined, is most likely.

Where does this leave us? First, we understand the genetic architecture of normal variation in pigmentation in humans to a good degree. Depending on how much residual there is in smaller effect QTLs there are publications to come which will probably yield a few more genes, but the remaining variance may simply be distributed across many small-effect loci. Second, the frequency of many pigmentation genes seems to have changed due to natural selection. In South Asia and Ethiopia the methods have been able to detect genomic signatures of positive selection at SLC24A5. It can’t be ancestry alone, just look at table S5 for South Asia. The range across populations is huge, even if you exclude those with enriched East Asian ancestry.

Third, we don’t really know why this selection occurred across these pigmentation genes. This is going to sound strange of course. There are many theories out there. Readers regularly ask me what I think about Peter Frost’s thesis. My standard response is that I’m skeptical, but who knows? Peter has asserted that the selection he speaks of began in a very narrowly delimited area in northeastern Europe. In the next few years we will have ancient DNA and be able to test some of his predictions. A more widely accepted thesis is promoted by Nina Jablonski in Skin: A Natural History. In her model at lower latitudes selection constrains variation due to high UV, while at higher latitudes there is relaxation of that constraint, and selection for vitamin D synthesis. The story is neat, but selection for SLC24A5 also occurs at lower latitudes, and at higher elevations at those latitudes.

The map to the left makes clear that the Sudan has some of the highest radiation levels in the world. It is reasonable then that people in this area would have darker skin than anywhere else. But Ethiopia’s radiation levels are not that much lower. And yet we know that there hasn’t been strong selection against the light skin alleles presumably derived from West Eurasian migrants. Rather, the reverse has occurred! None of the parsimonious models seem to explain very well the complexity on offer here.

Then, as Graham Coop observed in response to an Ewen Callaway piece in Nature where the latter inferred that European hunter-gatherers must have been dark skinned and blue eyed because of what genetics implies, we don’t really know the genetic architectures of pigmentation of ancient individuals. The reason is simple: we have genotype data, but not phenotype data. East Asians and Western Europeans converge upon lighter complexions via diverse genetic mechanisms, so why couldn’t ancient European hunter-gatherers be the same? This is a fair point. And, if true, then selection on pigmentation loci couldn’t, by definition, target pigmentation, since there wouldn’t be much heritable phenotypic variation to select upon.

But in response to the idea we should be phenotype-agnostic, pigmentation is one of the most well characterized traits for mammals in regards to the genetics. The parameter space of possibilities is not infinite. The same genes, and sometimes same mutations, re-occur across different populations. The reason some Melanesians have blonde hair is due to a mutation in TYRP1. Again, this is a locus implicated in pigmentation variation across many populations, and in other mammalian lineages. If we had good high quality whole genome sequences we could actually look for functional mutations across a set of pigmentation loci. If ancient European hunter-gatherers were functionally constrained around the pigmentation genes, or subject to neutral dynamics, that would be informative. A better characterization of all the diverse modern populations will probably give us better expectations of the size of the parameter space of genetic variation and how it maps onto phenotypic variation.

I’ve been giving a lot of thought to this topic for a while. And I have to say that in terms of the evolutionary origin of this trait and its variation, I’m left befuddled. After talking to researchers who are on the cutting edge in this area I’m pretty sure they are confused, too. That’s not dispiriting; that’s the state of science before discoveries push the edge of knowledge further. But, I’d also appreciate it if in response to this very long post readers don’t go Google Pundit on me and start throwing down a list of publications which resolve all these problems. I’m moderately familiar with this literature, and have probably internalized studies which go in both directions. In response to a post into which I put more effort over the last day than I probably should have, I expect the comments to be not-annoying. Or else (I assume you know what’s in that conditional!).