Selection is one of the major parameters which population geneticists investigate. The easiest way to investigate selection is to have omniscience as to the change in allele frequencies over time. If you are a Drosophila geneticist this is feasible, as you control the reproduction of your model organism in the lab. It is obviously much more difficult in natural populations (one reason that I think ecological genetics went into decline for a while is that it is just very hard). And in long-lived species like humans it is really not feasible to “track” change in allele frequencies in real time, as that would take centuries at the least.

So for species like humans researchers have to fall back on inferences from patterns of variation in the genome, as these allow us to look back into the deep past. The inheritance pattern of Mendelian genetics is such that transmission of variants across the generations can be modeled, and processes such as rapid population growth or positive selection leave footprints in the genome long after they’ve done their job. So you can test for selection, or population expansion, or bottlenecks, just by looking for the patterns that you’d expect to be left in their wake. The PSMC method famously infers demographic history of populations by examining variation within a single whole genome!

With regard to selection, which population geneticists are interested in because it is one of the preconditions for the evolutionary process of adaptation, there are many methods of inference from genetic and genomic data. Tajima’s D is an older method which compares two different estimates of diversity across the genome, and is popular with those looking more at inter-specific differences. More recently, haplotype based tests look for long segments of variants within the genome. EHH and iHS are probably two of the more popular versions of this. Haplotype based methods really didn’t become popular until the middle 2000s because they require a density of data which is really only available in the “post-genomic” era. Then you have the methods which look for allele frequency differences between populations, and compare them to the expectation based on patterns across the whole genome (e.g., PBS). Again, these require genome-wide data. More generally, the popularity of site frequency based techniques relies on having enough data to actually produce a site frequency spectrum.
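To make the “two estimates of diversity” concrete, here is a minimal sketch of Tajima’s D using the standard formula from Tajima (1989). It contrasts the mean number of pairwise differences with the number of segregating sites scaled by a harmonic-number constant. The function name `tajimas_d` and the input format (equal-length strings of 0s and 1s, one per sampled chromosome) are my own assumptions for illustration, not anything taken from a particular package.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length haplotype strings such as '0101'.

    Hypothetical helper for illustration; the input format is an assumption.
    """
    n = len(haplotypes)
    if n < 4:
        # For n = 2 or 3 the variance term in the denominator is zero.
        raise ValueError("need at least four haplotypes")
    n_sites = len(haplotypes[0])

    # S: number of segregating sites (columns with more than one allele).
    seg = sum(1 for j in range(n_sites) if len({h[j] for h in haplotypes}) > 1)
    if seg == 0:
        return 0.0  # convention chosen here; D is undefined without variation

    # pi: mean number of differences over all pairs of haplotypes.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D: difference between the two diversity estimates, scaled by its
    # estimated standard deviation.
    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))


# Toy data: an excess of rare variants versus variants at intermediate frequency.
rare_variants = ["1000", "0100", "0010", "0001"]
intermediate = ["11", "11", "00", "00"]
print(tajimas_d(rare_variants), tajimas_d(intermediate))
```

In the toy data, the sample made up only of singletons gives a negative D (an excess of rare variants, as expected after a sweep or expansion), while the sample with variants at intermediate frequency gives a positive D.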

And just as these methods have needs in terms of the raw data necessary to produce viable statistics, they also exhibit different strengths and weaknesses. The haplotype based methods are good at detecting “hard sweeps,” that is, strong positive selection on a novel mutation emerging against the ancestral background. EHH compared across populations (as in XP-EHH) picks up completed sweeps. In contrast, iHS is better at getting traction on incomplete sweeps. Though they have good power to detect events on a human microevolutionary scale, think on the order of 10,000 years, they get fuzzy as one approaches the present. Specifically, when iHS detects an older incomplete sweep it may not tell you if the sweep is still occurring, though it probably is. Additionally, they’re not particularly good at picking up “soft sweeps,” where alleles long segregating within the population are driven up in frequency by selection, or polygenic selection, where the response to selection is spread over many loci of small effect across the genome.
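The quantity underneath EHH and iHS is extended haplotype homozygosity: among chromosomes carrying a given core allele, the probability that two chromosomes picked at random are identical from the core out to some marker. A minimal sketch follows, assuming phased haplotypes written as strings of 0s and 1s. The function name `ehh` and its arguments are hypothetical, and this leaves out the integration and standardization that turn EHH into iHS.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, core_allele, direction=1):
    """EHH at each site moving away from index `core`.

    Only chromosomes carrying `core_allele` at the core site are used.
    direction=1 walks toward higher indices, direction=-1 toward lower ones.
    Hypothetical helper for illustration.
    """
    carriers = [h for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carriers of the core allele")

    values = []
    stop = len(haplotypes[0]) if direction == 1 else -1
    for j in range(core, stop, direction):
        lo, hi = min(core, j), max(core, j)
        # Group carriers by their haplotype between the core and site j.
        counts = Counter(h[lo:hi + 1] for h in carriers)
        # Fraction of carrier pairs that are identical over that stretch.
        values.append(sum(comb(c, 2) for c in counts.values()) / comb(n, 2))
    return values


# Toy data: the '1' core allele sits on a long shared haplotype,
# the '0' core allele sits on many different backgrounds.
haps = ["1000", "1000", "1000", "1001", "0000", "0101", "0011", "0110"]
print(ehh(haps, core=0, core_allele="1"))
print(ehh(haps, core=0, core_allele="0"))
```

In this toy example homozygosity around the “1” allele stays high as you move away from the core, while around the “0” allele it decays almost immediately. That contrast between the two alleles at the same site is what iHS summarizes.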

Finally, there have been attempts to detect selection using ancient DNA. This is a technique which takes a step toward omniscience; rather than inferring from extant variation one can track allele frequency change in “real time” through the record of the DNA. The problem of course is that sample sizes are finite and data quality is often hit and miss.
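To put a rough number on the sample size problem: if you treat the sampled chromosomes as independent draws, the standard error of an estimated allele frequency is the usual binomial one. A small sketch, with a hypothetical function name and made-up sample sizes:

```python
from math import sqrt


def allele_freq_se(p, n_individuals, ploidy=2):
    """Binomial standard error of an allele frequency estimate.

    Assumes independent chromosomes; ploidy=1 can stand in for
    pseudo-haploid genotype calls. Hypothetical helper for illustration.
    """
    chromosomes = ploidy * n_individuals
    return sqrt(p * (1 - p) / chromosomes)


# Made-up sample sizes, purely to show how the error shrinks.
for n in (10, 100, 1000):
    print(n, round(allele_freq_se(0.5, n), 4))
```

With a few dozen ancient individuals per time slice the uncertainty on each frequency estimate can easily be as large as the change you are trying to detect, and that is before you account for damage and low coverage.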

This is why the preprint Detection of human adaptation during the past 2,000 years, out of Jonathan Pritchard’s lab, has me so excited. Using the whole genome sequence data that has come online over the past few years at large sample sizes they manage to infer selection events over the past 2,000 years among the British! Here’s the abstract:

Detection of recent natural selection is a challenging problem in population genetics, as standard methods generally integrate over long timescales. Here we introduce the Singleton Density Score (SDS), a powerful measure to infer very recent changes in allele frequencies from contemporary genome sequences. When applied to data from the UK10K Project, SDS reflects allele frequency changes in the ancestors of modern Britons during the past 2,000 years. We see strong signals of selection at lactase and HLA, and in favor of blond hair and blue eyes. Turning to signals of polygenic adaptation we find, remarkably, that recent selection for increased height has driven allele frequency shifts across most of the genome. Moreover, we report suggestive new evidence for polygenic shifts affecting many other complex traits. Our results suggest that polygenic adaptation has played a pervasive role in shaping genotypic and phenotypic variation in modern humans.

The basic logic is not difficult to grasp. Derived alleles (the novel ones which mutated recently) subject to selection tend to alter their local genomic region in predictable ways. In particular, derived alleles subject to positive selection will exhibit shallower genealogies than ancestral neutral variants. Conventional neutral processes result in the birth of mutations and extinction of ancestral variants at regular intervals as modeled by the coalescent process. Some alleles will increase in frequency rapidly, and some more slowly, but it will be a random affair. In the figure above the dark branches are ancestral and the red ones derived. The right panel shows that the coalescence times of ancestral and derived lineages are regular and approximately the same in a neutral context (i.e., selection is not targeting the derived variant). In contrast, in the left panel you see that the derived variants have a much shallower coalescence, presumably because of rapid expansion in the population of alleles in the recent past back to a common ancestor.
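The link to singletons is that shallow genealogies have short tip branches, and short tip branches have had less time to pick up private mutations. So chromosomes carrying a recently favored allele should show fewer singletons nearby, which means a longer distance to the nearest singleton on either side. The sketch below is a deliberately crude proxy for that idea and is not the authors’ estimator: the actual SDS is a model-based, standardized statistic, and none of that is reproduced here. The function names, the input format, and the toy positions are all my own assumptions.

```python
from math import log
from statistics import mean


def singleton_span(test_pos, singleton_positions, region_start, region_end):
    """Distance between the nearest singletons flanking test_pos.

    If one side has no singleton, the region boundary is used instead,
    which is a simplification (the true distance is censored).
    """
    upstream = [p for p in singleton_positions if p < test_pos]
    downstream = [p for p in singleton_positions if p > test_pos]
    left = max(upstream) if upstream else region_start
    right = min(downstream) if downstream else region_end
    return right - left


def crude_singleton_score(individuals, test_pos, region_start, region_end):
    """log(mean span in derived homozygotes / mean span in ancestral homozygotes).

    `individuals` is a list of (derived_copies, singleton_positions) tuples.
    Heterozygotes are ignored in this toy version. Positive values mean
    derived homozygotes have sparser singletons around the test SNP.
    """
    spans = {0: [], 2: []}
    for derived_copies, singletons in individuals:
        if derived_copies in spans:
            spans[derived_copies].append(
                singleton_span(test_pos, singletons, region_start, region_end)
            )
    if not spans[0] or not spans[2]:
        raise ValueError("need both homozygote classes")
    return log(mean(spans[2]) / mean(spans[0]))


# Made-up positions: derived homozygotes have distant singletons,
# ancestral homozygotes have singletons close to the test SNP.
toy = [
    (2, [100_000, 900_000]),
    (2, [150_000, 850_000]),
    (0, [450_000, 560_000]),
    (0, [480_000, 530_000]),
    (1, [300_000, 700_000]),  # heterozygote, skipped here
]
print(crude_singleton_score(toy, 500_000, 0, 1_000_000))
```

A positive score in this toy setup corresponds to the left panel scenario, where the derived allele sits on the shallow part of the tree.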

The SDS needs genome-wide data, as well as large sample sizes. In 2016 you have both, at least for some regions of the world and populations. Comparing SDS to haplotype-based methods they find that the biggest differences in selection in the latter are continental-scale; that is, between Europe and Africa. In contrast, SDS tends to zoom in on intra-European variation, because a ~2,000 year time scale is likely to be localized.

They found lots and lots of selection. The signals around LCT and MHC were not entirely surprising. LCT is almost a positive control for a test of selection. It’s pervasive in Europe, but it was only recently selected, and so there are still ancestral variants around (unlike SLC24A5 which went nearly to fixation in a literal sense). MHC has to do with immune response, and that’s always evolving.

Perhaps more interesting is that the authors detect continuous selection on height and pigmentation in their sample. Why height? I’ve been skeptical of some of the genetic arguments in Greg Clark’s A Farewell to Alms (and have told Greg so), but recent selection for height does seem to align with his idea that the English were particularly wealthy and healthy over the past ~2,000 years. And it also seems to support the suggestion of elite over-production, as presumably tall men would be better represented among elites for both nutritional and genetic reasons.

The results for pigmentation are intriguing. Some of the older signals don’t show up (e.g., SLC24A5 and SLC45A2). They’re either fixed, or near fixed, so where are the old haplotypes going to be to compare to? But intriguingly the selection around KITLG and OCA2-HERC2 still seems to be occurring! Though the authors associate them with hair and eye color, tissue specific expression, however extreme, does not mean they have no effect on skin color. In the supplements they note that “In all 14 cases the derived allele is associated with either lighter pigmentation (i.e., lighter hair, skin, or eyes) or increased freckling.” Additionally, they state in the main text that “We speculate that recent selection in favor of blond hair and blue eyes may reflect sexual selection for these phenotypes in the ancestors of the British, as opposed to the longer-term trend toward lighter skin pigmentation in non-Africans, generally thought to have been driven by the need for Vitamin D production.”

At this point reader Sean will probably have a meltdown, and have to go to his natural reflex to core-dump everything on sexual selection he has taken in from Peter Frost for the 1000th time. If he doesn’t control his overwhelming sexually selected urge to repeat himself like a robot I’m going to ban him, as I don’t really want to re-read the same comment again. That being said, I don’t really know how seriously the authors take the idea that pigmentation is sexually selected….

I find Geoffrey Miller’s The Mating Mind interesting, but I’m mildly skeptical of the importance of sexual selection in recent human history (as opposed to earlier periods when broad human behaviors became fixed in our lineage). Often sexual selection crops up as a deus ex machina in these sorts of papers (I also don’t see enough variation in reproductive skew to make sexual selection plausible). The reason is simple. Geneticists are good at detecting that selection is occurring, but far less clear on how and why it is occurring. In this way LCT is an exception.

With all that said, this is an incredible paper. Because of the large genomic data sets in the United Kingdom the preprint focused on the British. But this is the sort of analysis that is going to expand to all populations in the near future. Genomics will be ubiquitous, as will the tools to make inferences about population history and dynamics.