Author: Davide Piffer

The aim of this study is to identify signatures of polygenic selection on intelligence across the 26 populations of the 1000 Genomes Project. In the next post, I will expand on this to include more populations (at the expense of the number and reliability of SNPs).

Derived allele frequencies and background calibration

At a theoretical level, an ancestral allele is the allele that was carried by the last common ancestor of humans and other primates, whereas an allele is derived when it arose in the human lineage after the split from other primates. In practice, the ancestral allele is usually ascertained via comparison with chimpanzees. One limitation of this procedure is that if a mutation arose in chimpanzees after the split from humans, then the chimp allele is not the ancestral allele. To reduce this problem, 1000 Genomes infers ancestral alleles via alignment with 6 primate species (Ensembl, 2015).

Frequencies of derived alleles are not the same for all populations. Substantial DAF (derived allele frequency) differences across populations have been found, largely due to random drift and population bottlenecks but in part also shaped by different selection pressures (Henn et al., 2015). Non-African populations tend to have higher frequencies of derived alleles, and DAF is positively correlated with distance from Africa (Henn et al., 2015). There are also potential issues with GWAS. For example, a reviewer of a previous submission (https://topseudoscience.wordpress.com/2016/01/10/the-forbidden-paper-on-the-population-genetics-of-iq/) suggested that the minor alleles picked up by a GWAS (carried out on European subjects) tend to have higher frequencies in the GWAS reference population (i.e. Europeans) than the genome-wide average for minor alleles. Minor alleles are more likely to be derived alleles, hence these derived alleles will have higher frequencies among Europeans than in other populations. If derived alleles tend to have a positive effect, the frequency of alleles with a positive effect may then be higher among Europeans than in other populations for reasons unrelated to selection.

A novel methodology suggested here to deal with this confound is to create a variable that approximates the average frequency of derived alleles picked up by GWA studies. For this purpose, the significant hits (N=693) from the largest GWAS of human stature to date (Wood et al., 2014) were grouped by allele status. The average frequency of derived alleles was computed for alleles with a positive effect and for alleles with a negative effect, and the two were then averaged into a single variable, henceforth the DAF index (Table 1). Negative and positive alleles were given equal weight to avoid positive selection bias on the index.
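The DAF index described above can be sketched in a few lines of Python. This is only an illustration under my reading of the procedure (one mean per effect direction, then an equal-weight average of the two means); the `hits` records, the field names and the frequencies are made up and are not taken from Wood et al., 2014.

```python
from statistics import mean

# Hypothetical input for ONE population: one record per GWAS hit, holding the
# derived allele frequency and the sign of the derived allele's effect.
# The values are invented for illustration only.
hits = [
    {"daf": 0.40, "derived_effect": "+"},
    {"daf": 0.30, "derived_effect": "+"},
    {"daf": 0.50, "derived_effect": "-"},
    {"daf": 0.20, "derived_effect": "-"},
    {"daf": 0.20, "derived_effect": "-"},
]


def daf_index(hits):
    """Average the mean DAF of positive-effect and negative-effect derived
    alleles, giving the two groups equal weight whatever their sizes."""
    pos = [h["daf"] for h in hits if h["derived_effect"] == "+"]
    neg = [h["daf"] for h in hits if h["derived_effect"] == "-"]
    return (mean(pos) + mean(neg)) / 2


index = daf_index(hits)

# For comparison: a pooled mean lets the larger group dominate.
pooled = mean(h["daf"] for h in hits)

print(f"DAF index (equal weight): {index:.3f}")
print(f"Pooled mean DAF:          {pooled:.3f}")
```

With unequal group sizes the equal-weight index and the pooled mean differ, which is the point of weighting the two effect directions equally.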

Table 1. Mean derived allele frequencies and country IQ.

Population            Height DAF    IQ
Afr.Car.Barbados      0.298         83
US Blacks             0.309         85
Bengali Bangladesh    0.363         81
Chinese Dai           0.359
Utah Whites           0.382         99
Chinese, Beijing      0.365         105
Chinese, South        0.362         105
Colombian             0.372         83.5
Esan, Nigeria         0.286         71
Finland               0.385         101
British, GB           0.381         100
Gujarati Indian, Tx   0.365
Gambian               0.291         62
Iberian, Spain        0.378         97
Indian Telugu, UK     0.362
Japan                 0.366         105
Vietnam               0.360         99.4
Luhya, Kenya          0.291         74
Mende, Sierra Leone   0.283         64
Mexican in L.A.       0.376         88
Peruvian, Lima        0.373         85
Punjabi, Pakistan     0.366         84
Puerto Rican          0.369         83.5
Sri Lankan, UK        0.362         79
Toscani, Italy        0.376         99
Yoruba, Nigeria       0.285         71

Using the DAF from the GWAS of human stature, we note that derived alleles (Table 1, column 2) tend to be at lower frequencies in African than in non-African populations, in line with the findings of a recent study (Henn et al., 2015) on differences in mutational load at common variants. The hypothesis that this phenomenon could mediate the association between IQ and polygenic scores is supported by the DAF index's positive correlation with population IQ (r=0.767). Note that the confounding effect would be present only when there are more derived alleles than ancestral alleles with a positive effect. If the two are represented in equal proportions, the overrepresentation of derived alleles in some populations will be perfectly balanced by the underrepresentation of ancestral alleles, and vice versa. However, in cases where there is a dramatic overrepresentation of derived alleles (such as the top significant hits in Rietveld et al., 2013), it is necessary to control for background DAF. Moreover, a larger sample of SNPs (such as the 693 SNPs from the height GWAS) gives a more accurate estimate of the background DAF than a smaller subset of SNPs would.
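As a check on the correlation quoted above, here is a minimal Python sketch that computes Pearson's r from the values in Table 1. It assumes that the three populations without an IQ estimate are simply dropped (leaving 23), which the text does not state explicitly; the function name `pearson_r` is mine. Under that assumption the result should come out close to the reported r=0.767.

```python
from math import sqrt

# (height-GWAS DAF index, country IQ) copied from Table 1.
# Populations without an IQ estimate are omitted (an assumption).
table1 = {
    "Afr.Car.Barbados": (0.298, 83), "US Blacks": (0.309, 85),
    "Bengali Bangladesh": (0.363, 81), "Utah Whites": (0.382, 99),
    "Chinese, Beijing": (0.365, 105), "Chinese, South": (0.362, 105),
    "Colombian": (0.372, 83.5), "Esan, Nigeria": (0.286, 71),
    "Finland": (0.385, 101), "British, GB": (0.381, 100),
    "Gambian": (0.291, 62), "Iberian, Spain": (0.378, 97),
    "Japan": (0.366, 105), "Vietnam": (0.360, 99.4),
    "Luhya, Kenya": (0.291, 74), "Mende, Sierra Leone": (0.283, 64),
    "Mexican in L.A.": (0.376, 88), "Peruvian, Lima": (0.373, 85),
    "Punjabi, Pakistan": (0.366, 84), "Puerto Rican": (0.369, 83.5),
    "Sri Lankan, UK": (0.362, 79), "Toscani, Italy": (0.376, 99),
    "Yoruba, Nigeria": (0.285, 71),
}


def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


daf, iq = zip(*table1.values())
r = pearson_r(daf, iq)
print(f"r(DAF index, IQ) = {r:.3f} over {len(table1)} populations")
```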

A DAF-calibrated polygenic score is then created by subtracting the DAF index from the average frequency of derived alleles with a positive effect among the GWAS SNPs. Table 2 reports the standardized scores, sorted in descending order by the mean of the three scores.
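The calibration step amounts to a subtraction followed by standardization across populations. The sketch below uses three hypothetical populations with made-up frequencies, and it assumes the population standard deviation for the Z scores, since the text does not say which one was used.

```python
from statistics import mean, pstdev

# Hypothetical inputs (illustrative numbers, not taken from the tables):
# mean frequency of derived alleles with a positive effect, and the
# background DAF index, per population.
derived_positive_freq = {"PopA": 0.45, "PopB": 0.40, "PopC": 0.32}
daf_background = {"PopA": 0.38, "PopB": 0.36, "PopC": 0.29}

# Calibrated score = derived-positive frequency minus background DAF.
calibrated = {
    pop: derived_positive_freq[pop] - daf_background[pop]
    for pop in derived_positive_freq
}


def zscores(scores):
    """Standardize a {population: score} mapping across populations.
    Uses the population SD (an assumption; the text does not specify)."""
    values = list(scores.values())
    m, s = mean(values), pstdev(values)
    return {pop: (v - m) / s for pop, v in scores.items()}


z = zscores(calibrated)
ranking = sorted(z, key=z.get, reverse=True)

for pop in ranking:
    print(f"{pop}: raw={calibrated[pop]:.3f}  z={z[pop]:+.3f}")
```

Repeating this for each SNP set and averaging the resulting Z scores per population would give an "Average" column of the kind shown in Table 2.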

Note that we could also apply the reverse procedure and calculate a background frequency of ancestral alleles (1-DAF). Then one could subtract that from the average frequency of ancestral alleles with positive effect. This is perhaps justified for traits such as height which were not subject to a dramatic increase during human evolution. However, since intelligence has been subject to a sharp increase and most intelligence-enhancing mutations are likely to be human-specific and not shared with our primate ancestors, by focusing on derived alleles one likely amplifies the signal of selection.
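For symmetry, the reverse procedure can be written the same way. This is a sketch with hypothetical inputs; the function name `ancestral_excess` and the numbers are mine and only illustrate the arithmetic of taking 1-DAF as the background.

```python
def ancestral_excess(ancestral_positive_freq, daf_index):
    """Mean frequency of ancestral alleles with a positive effect minus the
    background ancestral allele frequency, taken as 1 - DAF index."""
    aaf_background = 1.0 - daf_index
    return ancestral_positive_freq - aaf_background


# Hypothetical population: ancestral-positive alleles at 0.65 on average,
# background DAF index of 0.30, so background AAF is 0.70.
excess = ancestral_excess(0.65, 0.30)
print(f"AP - AAF = {excess:+.3f}")
```

A negative value means the ancestral alleles with a positive effect are rarer in that population than the background would predict.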

Table 2. Background “DAF-free” polygenic scores (P.S.), reported as Z scores and sorted in descending order of their average.

Population            Rietveld et al., 2014   p<5*10^-8 (N=9)   5*10^-8<=p<5*10^-7 (N=49)   Average
Toscani, Italy         1.671                   1.620             1.496                       1.596
Iberian, Spain         1.567                   1.646             1.391                       1.535
Finland                1.358                   1.645             1.113                       1.372
British, GB            0.886                   1.446             1.397                       1.243
Vietnam                0.481                   0.798             1.679                       0.986
Japan                  1.667                  -0.230             1.124                       0.854
Utah Whites            0.239                   1.319             0.908                       0.822
Chinese, Beijing       0.462                   0.536             0.736                       0.578
Chinese, South         0.494                   0.221             0.893                       0.536
Chinese Dai           -0.229                   0.485             0.414                       0.223
Gujarati Indian, Tx   -0.267                  -0.135            -0.159                      -0.187
Mende, Sierra Leone    0.133                  -0.276            -0.453                      -0.199
Colombian             -0.847                   0.672            -0.433                      -0.202
Yoruba, Nigeria        0.309                  -0.456            -0.551                      -0.233
Puerto Rican          -1.178                   0.683            -0.220                      -0.239
US Blacks             -0.245                  -0.353            -0.600                      -0.399
Gambian                0.233                  -0.770            -0.709                      -0.415
Afr.Car.Barbados       0.187                  -0.931            -0.922                      -0.555
Esan, Nigeria         -0.626                  -0.444            -0.746                      -0.605
Punjabi, Pakistan     -0.760                  -0.928            -0.164                      -0.618
Bengali Bangladesh     0.262                  -0.646            -1.532                      -0.639
Luhya, Kenya          -0.947                  -1.044            -0.356                      -0.782
Indian Telugu, UK     -0.389                  -1.558            -0.702                      -0.883
Sri Lankan, UK        -0.230                  -1.177            -1.293                      -0.900
Mexican in L.A.       -2.045                  -0.602            -0.508                      -1.052
Peruvian, Lima        -2.187                  -1.523            -1.804                      -1.838

The correlation between this score and that obtained using the raw frequencies (total polygenic score= derived and ancestral alleles with positive effect) is r=0.889. These are reported in table 3.

The calibrated scores are correlated with population IQ: r=0.462, 0.628 and 0.752 for the Rietveld et al., 2014 hits, the GWAS-significant hits (p<5*10^-8) and the remaining hits (5*10^-8<=p<5*10^-7), respectively.

The correlations of the mean calibrated and the mean uncalibrated score with IQ are r=0.68 and r=0.790, respectively.

Table 3. Total polygenic scores (Ancestral and derived alleles with positive effect), reported in descending order.

Population            Rietveld et al., 2014 (N=67)   p<5*10^-8 (N=10)   5*10^-8<=p<5*10^-7 (N=49)   Average
Iberian, Spain        0.468                          0.566              0.569                       0.534
Toscani, Italy        0.467                          0.562              0.568                       0.532
Finland               0.465                          0.573              0.530                       0.523
British, GB           0.458                          0.548              0.560                       0.522
Utah Whites           0.459                          0.534              0.530                       0.507
Vietnam               0.459                          0.491              0.565                       0.505
Chinese, Beijing      0.471                          0.468              0.555                       0.498
Chinese, South        0.466                          0.448              0.543                       0.485
Puerto Rican          0.449                          0.483              0.520                       0.484
Colombian             0.445                          0.476              0.519                       0.480
Chinese Dai           0.454                          0.463              0.520                       0.479
Japan                 0.474                          0.399              0.554                       0.476
Gujarati Indian, Tx   0.449                          0.403              0.493                       0.448
Mexican in L.A.       0.431                          0.370              0.515                       0.439
Punjabi, Pakistan     0.453                          0.357              0.490                       0.433
US Blacks             0.451                          0.360              0.468                       0.426
Mende, Sierra Leone   0.458                          0.355              0.462                       0.425
Yoruba, Nigeria       0.458                          0.340              0.468                       0.422
Esan, Nigeria         0.455                          0.341              0.461                       0.419
Bengali Bangladesh    0.450                          0.368              0.435                       0.418
Gambian               0.456                          0.325              0.456                       0.412
Afr.Car.Barbados      0.459                          0.317              0.460                       0.412
Sri Lankan, UK        0.458                          0.323              0.445                       0.409
Peruvian, Lima        0.427                          0.288              0.498                       0.404
Luhya, Kenya          0.450                          0.292              0.463                       0.402
Indian Telugu, UK     0.451                          0.293              0.457                       0.400

We can apply the reverse procedure to determine whether ancestral alleles contain signal above and beyond the background AAF (ancestral allele frequency) distribution. We can carry this out using the Rietveld et al., 2014 hits and the Rietveld et al., 2013 hits with 5*10^-8<=p<5*10^-7, but not with the top 10 SNPs, because they contain only 1 ancestral allele with a positive effect. Table 4 reports the difference between AP (the average frequency of ancestral alleles with a positive effect) for Rietveld et al., 2014 and 2013 and the background AAF (AP-AAF), together with population IQ.

Table 4. Ancestral alleles with positive effect minus background AAF (AP-AAF). The Rietveld et al., 2013 column uses the hits with 5*10^-8<=p<5*10^-7.

Population            Rietveld et al., 2014   Rietveld et al., 2013   IQ
Afr.Car.Barbados      -0.003                  -0.079                  83
US Blacks             -0.025                  -0.079                  85
Bengali Bangladesh    -0.074                  -0.105                  81
Chinese Dai           -0.052                  -0.025
Utah Whites           -0.062                  -0.030                  99
Chinese, Beijing      -0.022                   0.030                  105
Chinese, South        -0.035                   0.000                  105
Colombian             -0.075                   0.012                  83.5
Esan, Nigeria          0.007                  -0.087                  71
Finland               -0.067                  -0.039                  101
British, GB           -0.075                   0.008                  100
Gujarati Indian, Tx   -0.069                  -0.051
Gambian               -0.009                  -0.096                  62
Iberian, Spain        -0.058                   0.026                  97
Indian Telugu, UK     -0.061                  -0.096
Japan                 -0.034                   0.012                  105
Vietnam               -0.051                   0.008                  99.4
Luhya, Kenya          -0.005                  -0.099                  74
Mende, Sierra Leone    0.005                  -0.098                  64
Mexican in L.A.       -0.097                   0.011                  88
Peruvian, Lima        -0.104                   0.038                  85
Punjabi, Pakistan     -0.052                  -0.055                  84
Puerto Rican          -0.056                   0.006                  83.5
Sri Lankan, UK        -0.043                  -0.092                  79
Toscani, Italy        -0.061                   0.019                  99
Yoruba, Nigeria        0.000                  -0.079                  71

The correlation between AP-AAF (Rietveld et al., 2014) and IQ is negative: r=-0.472. The correlation between AP-AAF (Rietveld et al., 2013) and IQ is positive: r=0.742.

Conclusions

Controlling for different population DAFs does not substantially alter the overall pattern, although there is a slight reduction in fit (the correlation with population IQ drops from 0.79 to 0.68), and we cannot tell whether this is just a fluke. The far from perfect correlation with population IQ is due to the top place being occupied by Europeans instead of East Asians and to a tendency for Latin Americans and South Asians (Indians, Bangladeshis) to score as low as sub-Saharan Africans. We also notice that ancestral positive alleles (Table 4) do not have as strong a correlation with population IQ (r=-0.472 and 0.742) as derived positive alleles. This is expected on evolutionary grounds, as selection on intelligence should have acted on human-specific mutations rather than on ancestral variants shared with non-human primates.

References:

Ensembl, 2015: http://www.1000genomes.org/faq/where-does-ancestral-allele-information-your-variants-come

Davies, G., Armstrong, N., Bis, J. C., et al. (2015). Genetic contributions to variation in general cognitive function: a meta-analysis of genome-wide association studies in the CHARGE consortium (N=53949).

Henn, B.M., Botigué, L.R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B.K., Martin, A.R., Musharoff, S., Cann, H., Snyder, M.P., Excoffier, L., Kidd, J.M., Bustamante, C.D. (2015). Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. PNAS; published ahead of print December 28, 2015. doi:10.1073/pnas.1510805112

Rietveld, C.A., Medland, S.E., Derringer, J., Yang, J., Esko, T., Martin, N.W., et al. (2013). GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science, 340, 1467-1471. doi: http://doi.org/10.1126/science.1235488

Rietveld, C.A., Esko, T., Davies, G., Pers, T.H., Turley, P., Benyamin, B., et al. (2014). Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proceedings of the National Academy of Sciences, USA, 111, 13790-13794. doi:10.1073/pnas.1404623111

Wood, A.R., Esko, T., Yang, J., et al. (2014). Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics, 46(11), 1173-1186.