Before posting Piffer’s paper, I sent it to a reviewer, someone who works in intelligence research. I explained that many geneticists were dismissive of Piffer’s work on group intelligence, and asked for a critical opinion. Here is that opinion, with Piffer’s replies. Piffer also includes responses to the main themes which came out of the blog commentaries.

Intelligence Researcher: Piffer’s work is one of the most important directions that the molecular genetics of human intelligence is taking right now. Overall, his work is valuable and should be supported. If he uses comparisons between races to get from associations to causal variants, perhaps he can work with some of the leading genomics outfits on this.

The context is somewhat like this: We already have reproducible molecular genetic results about Herrnstein’s meritocracy hypothesis. There are differences in education/IQ polygenic scores between social classes, and within families the children with higher polygenic scores are more likely than those with lower polygenic scores to be upwardly mobile. Herrnstein’s claim that genetics drives social stratification and social mobility is therefore proven definitively, at the molecular level.

In the case of dysgenics, we have some studies already showing that lower polygenic scores lead to higher fertility. This result is not 100% consistent, but we expect the effect only in those populations that actually have substantial dysgenic fertility. We do not yet have studies where, for example, average polygenic scores of representative populations from the 19th century are compared with their present-day descendants to show dysgenics on a time scale of several generations. Woodley of Menie has already produced a very preliminary study showing eugenic fertility on the time scale of the last 3000 years in Europe. This whole field of molecular eugenics and dysgenics in the recent past using ancient DNA needs more work, especially because it is of enormous importance for our understanding of macro-historical trends. Did polygenic scores rise in Europe but decline in the Middle East during the last 2000 years?

What Piffer is doing now is tackling the third major question, which is race differences that have evolved on a time scale of up to 60,000 years (roughly, the time since the African exodus of modern humans). Work on this question is not as far advanced as work on the other questions, and what Piffer has done so far is only the first step in the journey to solving this problem. The reason why progress on this question is so slow is in part political, but there are also scientific obstacles to overcome. One big limitation so far is that all the genome-wide association studies that were done for IQ or education used subjects of European origin.

Piffer: My phylogenetic/spatial analysis is informative regarding the population differences in contemporary populations, and it also sheds light on the selection pressure of the last 60ky. I agree that comparing populations across time would yield a direct test of the directional selection hypothesis and provide a more fine-grained analysis across time. Unfortunately this approach is subject to the issue of LD decay as well, albeit to a much smaller extent, at least within a 10ky period. Woodley and Piffer showed eugenic fertility since the Bronze Age in Europe but ours was a very preliminary study, limited in sample size and in the number of genetic markers. In my next project I would like to combine the much larger knowledge derived from the recent EDU and intelligence GWAS with the huge progress that was made in ancient genomics during the last few years. Reich’s lab has an online database which is constantly updated with new additions to the ancient genome samples.

Intelligence Researcher: The GWAS hits that we have from these studies are genetic variants that are statistically associated with IQ or education, but most of them are not the causes for variations in these outcomes. There are causal variants all over the genome, in hundreds of different genes. Each of these variations originated as a new mutation some time in the past, sometimes fairly recently, say, 10,000 or 20,000 years ago. In other cases it originated much earlier, hundreds of thousands of years ago. Whenever a new mutation pops up, it pops up in a place on one of the chromosomes that already has a collection of variants in variable sites left and right of the mutation. So, we have not only the new causal variant, but also a whole swarm of non-causal, random variants spread over hundreds of thousands of base pairs left and right of the new mutation. All these are said to be genetically linked because they are transmitted together simply because they are physically close together on the chromosome. Over time, genetic linkage slowly decays because there is crossing-over in meiosis, but even after 100,000 years there still is a lot of linkage disequilibrium (non-random association) over about 100,000 base pairs left and right of the causal variant.

And here we have a problem when studying race differences. If a new causal variant originated after the African exodus about 60,000 years ago, chances are it will still be restricted to part of the world population. Its present-day distribution will reflect the place of its origin more than the selective pressures that have acted on it. A gene can have different causal variants that originated in different races as a result of different, independent mutations. Another possibility is that the association between a causal variant and the swarm of non-causal variants nearby can be different in different races as the result of ancient population bottlenecks, even when the races share the causal variant through common descent.

Piffer: It is true that most of the GWAS hits are not causal variants per se, but they are proxies, so to speak (“tag” SNPs in technical jargon). I think this problem is exaggerated, on both theoretical and empirical grounds, and it often leads to sweeping claims that reduced accuracy equals zero accuracy. It would be like dismissing surveys or polls out of hand because they are proxies for entire populations, and because even the most perfectly random sample is not exactly representative of the population it is drawn from, owing to statistical uncertainty.

First of all, one has to prove empirically that a GWAS carried out on Europeans actually would lead to an over-estimate of the European PGS. If one cannot show this, one should at least say why one expects to find this result. I see no reason why this should be the case, and none of the critics so far have suggested why European GWAS should lead to a bias in favour of, rather than against, European populations, and why East Asians should be favoured even more.

They also have to show why GWAS of other traits show a different pattern, with East Asians switching from the top to the bottom of the hierarchy for the height alleles, for example, and Africans moving up to above-average values, in accord with stereotypes and statistics on human stature.

GWAS can also pick up population specific variants that are actually trait-decreasing (“detrimental” if the trait is socially desirable or leads to enhanced fitness). For example, a recent GWAS (Asgari et al., 2019) carried out on Peruvians found an allele that is absent from non-American populations and this allele actually decreases height by almost an inch!

Population specific variants are probably more likely to be detrimental because they are recent mutations and purifying selection has not had the time to eliminate them from the gene pool. This would actually lead to a bias against the reference (in most cases European) population. When we compute PGS we flip the frequency of the alleles so if the trait-decreasing allele has frequency X, the trait-increasing allele has frequency 1-X. So if there is a population specific allele with negative effect and 5% frequency, the positive effect allele will have frequency 95% in the reference population, and 100% in the other populations. So this would actually lead to an inflation of the non-reference population score if the population-specific variants tend on average to be detrimental.
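The arithmetic of the allele flip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the score is taken to be an unweighted mean frequency of trait-increasing alleles, and the function names, the single SNP and its effect size are hypothetical, not taken from the paper.

```python
def trait_increasing_freq(freq, effect):
    """Frequency of the trait-increasing allele at one SNP.

    freq   -- frequency of the GWAS effect allele in a population
    effect -- signed effect of that allele (negative = trait-decreasing)
    """
    # If the effect allele lowers the trait, the other allele raises it,
    # so its frequency is 1 - freq.
    return freq if effect > 0 else 1.0 - freq


def mean_pgs(freqs, effects):
    """Assumed scoring rule: unweighted mean frequency of trait-increasing alleles."""
    flipped = [trait_increasing_freq(f, e) for f, e in zip(freqs, effects)]
    return sum(flipped) / len(flipped)


# Hypothetical population-specific SNP with a negative effect:
# 5% frequency in the reference population, absent everywhere else.
effects = [-0.02]
reference = mean_pgs([0.05], effects)  # trait-increasing allele at about 95%
other = mean_pgs([0.00], effects)      # trait-increasing allele at 100%
print(reference, other)
```

Under these assumptions the non-reference population ends up with the higher score at this SNP, which is the direction of bias described in the paragraph above.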

Ultimately this question can be settled empirically, and a way of doing so would be to calculate PGS using only alleles that are present at, say, 1%-99% frequency in all the studied populations (a simple list-wise deletion command with this threshold in R would work). Conversely, the alleles that are present only among Europeans can be counted to determine whether more of them have a positive or a negative effect, and this index would give an estimate of the GWAS bias. Removing the SNPs that are specific to non-Africans would also create a level playing field, by discarding the possible advantage that non-Africans get from a possible over-representation of beneficial mutations among population-specific variants. Whether this over-representation is higher or lower than what we found for non-Africans is unknown, but for the sake of parsimony we assume it is the same.
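The two checks proposed here (list-wise deletion by frequency threshold, and a sign count among population-specific alleles) could look roughly like the following. The text mentions R; this sketch uses Python instead, and the SNP names, population labels, frequencies and effects are all made up for illustration. It also assumes that each frequency refers to the effect allele and that "present" means a frequency of at least 1%.

```python
def common_snps(freq_table, lo=0.01, hi=0.99):
    """List-wise deletion: keep SNPs with frequency in [lo, hi] in every population."""
    return {
        snp: freqs
        for snp, freqs in freq_table.items()
        if all(lo <= f <= hi for f in freqs.values())
    }


def private_sign_counts(freq_table, effects, pop, lo=0.01):
    """Count positive- and negative-effect alleles found only in `pop`.

    An allele counts as private to `pop` if its frequency is at least `lo`
    there and below `lo` in every other population (an assumed definition).
    """
    positive = negative = 0
    for snp, freqs in freq_table.items():
        others = [f for p, f in freqs.items() if p != pop]
        if freqs[pop] >= lo and all(f < lo for f in others):
            if effects[snp] > 0:
                positive += 1
            else:
                negative += 1
    return positive, negative


# Hypothetical effect-allele frequencies and signed effects.
freq_table = {
    "snp_a": {"EUR": 0.40, "AFR": 0.55, "EAS": 0.30},
    "snp_b": {"EUR": 0.05, "AFR": 0.00, "EAS": 0.00},
    "snp_c": {"EUR": 0.10, "AFR": 0.00, "EAS": 0.002},
    "snp_d": {"EUR": 0.70, "AFR": 0.20, "EAS": 0.995},
}
effects = {"snp_a": 0.01, "snp_b": -0.03, "snp_c": 0.02, "snp_d": 0.01}

kept = common_snps(freq_table)
counts = private_sign_counts(freq_table, effects, "EUR")
print(sorted(kept), counts)
```

A PGS recomputed on `kept` alone would correspond to the "common SNPs" score, while an imbalance in `counts` would be one rough index of the direction of any GWAS bias.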

An example can illustrate the logic behind this methodology: suppose I want to measure the height of two people to determine who is taller, and I cannot use the same tape for measuring both people. Moreover, they live in different countries so I cannot put them next to one another. I know that John is 185 cm tall, but the tape I use for measuring Jack is shorter than the one I used to measure John and only measures Jack up to his shoulder. Instead of declaring that I cannot determine who is taller, I take Jack’s measure up to his shoulder, and I measure John again, this time also up to his shoulder instead of the top of his head. I then compare their shoulder heights, and find that Jack’s is 150 cm, whereas John’s is 165 cm. Since John is 15 cm taller than Jack at shoulder height, it is highly probable that he is also taller overall. This would fail only if Jack had a disproportionately long neck or large head. By this analogy, Africans would have to possess a disproportionate amount of native intelligence-enhancing alleles to offset the polygenic score difference that we found at the variants they share with Europeans.

I have posted a preliminary analysis (https://rpubs.com/Daxide/488754) computing polygenic scores without those putatively Eurasian-specific/non-African origin SNPs, and found that 1) 80% of the SNPs are common and 20% are putatively non-African in origin; 2) the PGS computed using only the common SNPs has the same correlation with population IQ as the full PGS, that is r=0.88; 3) the Black-White gap is slightly reduced, by 0.11 percentage points, from 2.43% to 2.32%, whilst the East-Asian gap is slightly increased.

Figures 1 and 2 show the full and the “common” PGS (after removal of non-African origin SNPs).

Figure 1

Figure 2

Only with trans-ethnic GWAS will we have a full grasp of diverging selective forces and be able to compute more accurate polygenic scores, but this is not going to happen anytime soon because the bulk of GWAS efforts are still focused on Europeans.

Piffer on commentaries: I would like to add a note here to address some criticism from readers on this forum. The reference to Dunkel et al. could mislead some readers. It refers to their estimate of phenotypic IQ (110) and in no way to their method of PGS calculation. Their estimate is drawn from an unsystematic review of other studies and can be considered a best guess. The PGS for Jews was calculated just like for the other populations, in a manner totally independent from Dunkel et al’s work. I am sorry if some readers might have been misled and I wish I had made this clearer in the paper.

Intelligence Researcher: The result of this is that polygenic scores that were constructed for Europeans may have poor prediction in distantly related races. I would especially expect that this applies for comparisons between Africans and non-Africans. This is a main criticism of Piffer’s conclusions, and he has to acknowledge that this is indeed something that needs extensive future work.

For now, Piffer can try to look for data sets where a few thousand non-Europeans have been genotyped with DNA microarrays and their IQ or educational attainment determined. When races are compared, the polygenic scores should be computed only from those GWAS hits that have directionally consistent effects in the races that are being compared.

Today, the strongest evidence we have for genetic race differences is for the Ashkenazim. Curtis Dunkel found a higher polygenic score for Jews than Gentiles in the Wisconsin Longitudinal Study, and in his last paper Piffer finds in one of the genomics databases that the Ashkenazi sample had the highest polygenic score of any group in that database. In this case the results are likely to be meaningful because the Ashkenazim are genetically so similar to other Europeans that problems with different linkage phase etc. are extremely unlikely. It is almost like comparing different social classes in the same country.

The final goal is of course to find the causal variants, separately for each race, and use only these for calculation of the polygenic scores. Piffer tends to focus on the SNPs with the largest effects, and I think he is right to do this because these are most likely to be either causal themselves or to be in very close linkage with the causal variant. In presenting his research, Piffer should emphasize that association studies in non-Europeans are needed not only to compare races, but to find the causal variants based on possibly different linkage phase in different races.

Piffer: I have discussed this issue extensively in my paper so I will be brief here. When I used the set of highly likely causal variants as determined by Lee et al., the PGS’s correlation with IQ was 0.8 and the Black-White gap actually increased.

It has been my plan for a while to compute polygenic scores using only the alleles that have consistent effect in Blacks and Whites and I should do this soon. This constitutes progress towards a universally valid PGS, although it leaves the population-specific variants bias unanswered. That bias can be addressed as I have shown before, or more accurately with trans-ethnic GWAS.

Intelligence Researcher: Another thing Piffer should consider for presentation of his results is that not every GWAS hit has the highest allele frequencies for the high-IQ allele in the populations with higher IQ. Many times it’s the opposite, with the (in Europeans) high-IQ variant being most common in low-IQ populations. This can have many different reasons, and as far as I understand, perhaps 55% of the GWAS hits vary with population IQ as expected, and 45% are opposite. Sometimes, this leads to messy results when one particular association study happens to produce a non-predictive polygenic score or one that predicts in the wrong direction. When presenting his results, Piffer should emphasize this.

There is a huge amount of genetic variation in the species, and only a bit of this is reflected as phenotypic race differences. Of course we are dealing here mainly with non-causal variants that are merely in linkage disequilibrium with causal ones, but still. It gives us the idea that most likely also the causal variants have this very messy pattern of sometimes going with the phenotypic population average and sometimes against it.

Piffer: Empirically, this question can be answered by carrying out a version of the Monte Carlo simulation I presented in my paper, but instead of using the correlation coefficient as a continuous variable (the distribution of the correlation coefficients of each SNP with population IQ), use a dichotomous variable, counting the number of SNPs whose correlation with IQ is positive or negative.
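A minimal sketch of such a dichotomous version is given below. It is not the simulation from the paper: it simply assumes that one already has a correlation with population IQ for each GWAS hit and for a pool of background SNPs, and it compares the share of positive signs among the hits with the shares found in random background sets of the same size. All names and numbers are hypothetical.

```python
import random


def share_positive(correlations):
    """Fraction of SNPs whose correlation with population IQ is positive."""
    return sum(r > 0 for r in correlations) / len(correlations)


def monte_carlo_sign_test(hit_corrs, background_corrs, n_draws=10000, seed=0):
    """Empirical p-value for the share of positive signs among GWAS hits.

    Draws `n_draws` random sets of background SNPs, each the same size as
    the set of hits, and counts how often a random set shows at least as
    large a share of positive correlations as the hits do.
    """
    rng = random.Random(seed)
    observed = share_positive(hit_corrs)
    k = len(hit_corrs)
    at_least = 0
    for _ in range(n_draws):
        sample = rng.sample(background_corrs, k)
        if share_positive(sample) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least + 1) / (n_draws + 1)


# Hypothetical per-SNP correlations with population IQ.
hits = [0.30, 0.10, -0.20, 0.40]
background = [0.25, -0.15, 0.05, -0.30, 0.12, -0.08, 0.33, -0.21, 0.02, -0.11]
observed, p_value = monte_carlo_sign_test(hits, background, n_draws=1000)
print(observed, p_value)
```

Matching the background SNPs to the hits on properties such as minor allele frequency or LD would be a natural refinement, but it is left out of this sketch.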

It is true that there is huge variation in the species and many of the associations we see go against the simple pattern defined by phenotypic differences, but this is expected under a polygenic model. Each SNP only has a small effect on the trait, and often selection for a trait alone is not enough to counter the effects of random drift. Also, there is a great amount of genetic correlation between apparently unrelated traits that is simply due to their physical location on the chromosomes, and opposite selection pressures on genetically related traits decrease the net selection coefficient on the outcome phenotype.

Another important factor is that most of the SNPs are actually proxies and this will further confuse the picture. This is why I selected only the GWAS significant hits to obtain a set closer to the causal SNPs, and with the method of correlated vectors I showed that the SNPs with lower significance also have lower correlation with population IQ. If tag SNPs were driving the correlation between PGS and IQ, the opposite pattern would have emerged (i.e. less significant SNPs would have stronger correlation with population IQ).

Let me also clarify two issues that keep being brought up.

1) Lee et al. report an 80% reduction in accuracy for predicting educational attainment among Blacks. Yet their analysis is shallow because they try this only with a single PGS and they don’t try using different significance cut-offs to see how SNP significance affects the trans-ethnic accuracy. Their sample is also problematic because it comes from a low-status, older cohort of African Americans where the heritability of IQ was very low. Some colleagues are working on another sample of African Americans and found that the reduction is much smaller, with the scores retaining about 50% validity for Blacks, and that validity is even higher for more significant or putatively causal SNPs.

2) Individual prediction is different from group-level prediction. Even if the polygenic score predicts only 10% of the variance in intelligence between individuals, the between-population accuracy can be much higher. This is because natural selection acts homogeneously across the genome, and genes with the same function will be subject to the same selective pressures. Thus, we can safely assume that when future GWAS discover more SNP-IQ associations, they will conform to the same pattern, because they have been subject to the same selective pressures. A similar principle is used in the social sciences, where scales to measure personality traits or even IQ consist of a handful of items. The skeptic would argue that in order to measure someone’s intelligence one needs to administer an infinite number of items to assess every possible cognitive domain under every possible psychological state. Never mind that Alice scored 160 on the WAIS; if she took another test she could turn out to score far below average. The skeptic would also argue that in order to assess whether someone is an introvert or an extravert, a scale comprising only 10 questions is not sufficient, because the 11th or 12th or Nth question might change the score and flip the individual’s psychological profile. If we follow this line of reasoning, we should discard all of psychology.

The analogy can also be extended to science in general. Scientific studies select samples from the population, because for practical reasons it is not possible to study every individual in a population. Inferences from the sample to the population are then drawn. Similarly, GWAS produce a sample of SNPs, and I showed that this sample was large enough to yield very strong predictive power, which survived traditional tests of significance. The call for larger samples in science is a healthy approach, but arguing that a study is not valid because it fails to include some individuals (or some SNPs, in this specific instance) would kill the entire field of statistics on the spot, and bring down with it almost all the studies in the biological and social sciences.

I think that readers who like to posit far-fetched scenarios that supposedly undermine my arguments should be reminded of Newton’s rules of reasoning. All these scenarios remain speculations not based on empirical evidence, and until evidence that disproves my thesis is brought forward, the scenario I depicted in my paper remains the most likely. I would like to conclude with rules 2, 3 and 4, which are especially relevant here:

Rule 2) Therefore to the same natural effects we must, as far as possible, assign the same causes. As to respiration in a man and in a beast; the descent of stones in Europe and in America; the light of our culinary fire and of the sun; the reflection of light in the earth, and in the planets.

Rule 3) The qualities of bodies, which admit neither intension nor remission of degrees, and which are found to belong to all bodies within the reach of our experiments, are to be esteemed the universal qualities of all bodies whatsoever.

Rule 4) In experimental philosophy we are to look upon propositions collected by general induction from phænomena as accurately or very nearly true, notwithstanding any contrary hypotheses that may be imagined, till such time as other phænomena occur, by which they may either be made more accurate, or liable to exceptions.