In 1969, Harvard Educational Review published a long, 122-page article under the title “How Much Can We Boost IQ and Scholastic Achievement?” It was authored by Arthur R. Jensen (1923–2012), a professor of educational psychology at the University of California, Berkeley. The article offered an overview of the measurement and determinants of cognitive ability and its relation to academic achievement, as well as a largely negative assessment of attempts to ameliorate intellectual and educational deficiencies through preschool and compensatory education programs. Jensen also made some suggestions on how to change educational systems to better accommodate students with disparate levels of ability.

While most of the article did not deal with race, Jensen did argue that it was “a not unreasonable hypothesis” that genetic differences between whites and blacks were an important cause of IQ and achievement gaps between the two races. This set off a huge academic controversy—Google Scholar says that the article was cited more than 1,200 times in the decade after its publication and almost 5,400 times by December 2019. The dispute about the article centered on the question of racial differences, which is understandable as Jensen’s thesis came out on the heels of the civil rights movement and its attendant controversies, such as school integration, busing of students, and affirmative action. Jensen questioned whether it is in fact possible to eliminate racial differences in socially valued outcomes through conventional policy measures, striking at the foundational assumption of liberal and radical racial politics. His floating of the racial-genetic hypothesis was what set his argument apart from the general tenor of the era’s scholarly and policy debate.

In this post, I will take a look at Jensen’s arguments and their development over time. The focus will be on the race question, but many related, more general topics will be discussed as well. The post has four parts. The first is a synopsis of Jensen’s argument as it was presented in the 1969 article. The second part offers an updated restatement of Jensen’s model of race and intelligence, while in the third part I argue, using the Bradford Hill criteria, that the model has many virtues as a causal explanation. In the fourth and concluding part I will make some more general remarks about the status and significance of racialist thinking about race and IQ. [Note]





1. Synopsis

Jensen’s (1969a) article is often misrepresented, so I will start with a detailed synopsis of what he actually wrote. So as not to break the flow of the original argument, I have placed my comments, where pertinent, in footnotes, with a view to checking how Jensen’s claims have stood the test of time.

1.1. Failure of compensatory education

Jensen started by noting that the Civil Rights Commission had concluded in its 1967 report to the Johnson administration that compensatory education programs designed to eliminate or reduce achievement gaps in schools had failed to do so. The report, which paid particular attention to high-quality programs in majority-black schools, found that “none of the programs appear to have raised significantly the achievement of participating pupils, as a group, within the period evaluated by the Commission.”

Jensen argued that two theoretical ideas undergird preschool and compensatory education programs: the average child concept and the social deprivation hypothesis. The first of these refers to the idea that, save for a few rare individuals suffering from severe inborn neurological defects, cognitive abilities are pretty much the same in all children. Any variation between children is viewed as arising from unequal exposure to knowledge and skills before and outside of school. If such environmental inequality did not exist, all children would perform at more or less the same, adequate level in school.

The second and related idea is that the reason why racial minorities and children of poor parents tend to be below-average achievers in school is that they lack middle-class experiences that would give them the cognitive and non-cognitive skills needed for success. The purpose of preschool and compensatory education programs is then to provide “socially deprived” children with those crucial experiences.

1.2. Measuring intelligence

Next, Jensen embarked on a long discussion of the definition, measurement, and determinants of intelligence, which makes up the bulk of the article. In effect, the whole article is a critique of the average child concept and the social deprivation hypothesis.

Jensen wrote that it is not possible to unequivocally state what intelligence is. An operational approach is much more fruitful, and the “best we can do is to obtain measurements of certain kinds of behavior and look at their relationships to other phenomena and see if these relationships make any kind of sense and order.” [Note] The article then notes how influential the intelligence tests developed by Binet and Simon were, stressing that the Binet-Simon scales and their successors have their origins in the modern Western educational setting: the abilities needed for success in school shaped the domain of IQ items when it was first defined. The correlation between IQ and scholastic achievement is about .5 to .6, but if data are aggregated longitudinally over many years, the correlation approaches the reliability of the measures.

Jensen then discussed the general factor of intelligence, or g, which was originally proposed by Charles Spearman at the beginning of the 20th century. Performance on all cognitive tests is positively correlated, and the strengths of the test-test correlations cannot be accounted for by superficial (dis)similarities in test content or format. Spearman argued that the core of general intelligence was “the ability to educe relations and correlates.” Similarly, Thomas Aquinas defined intelligence as “the ability to combine and separate,” or “to see the difference between things which seem similar and to see the similarities between things which seem different.” The best measures of g require such abilities.

Jensen noted that attempts to create tests of complex problem solving that do not give rise to a general factor have failed. The g factor accounts for 50% or more of the total individual differences variance in a typical test battery. [Note] Jensen argued that if the term intelligence is used, it should refer to g.

It has been found that a test consisting of tasks designed by Jean Piaget and his collaborators for the study of mental growth in children is loaded on the general factor about as highly as psychometric tests are. Therefore it “seems evident that what we call general intelligence can be manifested in many different forms and thus permits measurement by a wide variety of techniques.” Jensen pointed to cross-modal transfer as a central characteristic of intelligence. This means, for example, recognizing the same stimulus when it is presented in a different sensory modality, such as tactile versus visual representation.

The g loading of a test will vary as a function of the nature of the tests together with which it is factor-analyzed. Jensen argued that it is possible to fractionate g into smaller sources of variance. He analogized g with a general athletic ability factor that might be derived from a test consisting of various athletic performances. [Note]

The article endorsed Raymond Cattell’s dichotomy of fluid and crystallized intelligence. Fluid intelligence is the capacity for new learning and problem solving, whereas crystallized intelligence represents previously acquired skills and knowledge. Among people sharing a common culture, the two are expected to be highly correlated. Fluid intelligence may top out as early as the late teens, whereas crystallized intelligence is expected to accumulate up to old age.

1.3. IQ and occupational status

According to Jensen, IQ is related to occupational status mainly through educational attainment. The average educational and income levels associated with occupations, the prestige of occupations, and the mean IQs of occupations are all correlated with each other in the .80–.90 range. At the individual level, the correlation between occupational status and IQ has been found to be between .42 and .71 in various studies. Within occupations, the correlation between IQ and job proficiency has been found to be around .20–.25. The correlation between IQ and ease of training for various occupational skills appeared to be around .50. [Note]

1.4. Intelligence, fixed or not?

Jensen criticized the frequently used term “fixed intelligence” as a misnomer that confuses genotypes and phenotypes. Genotypic influence on intelligence is fixed in the sense that the genetic factors are fully laid down at conception. Intelligence, however, is properly a phenotype and it is never fixed, because it is “a result of the organism’s internal genetic mechanisms established at conception and all the physical and social influences that impinge on the organism throughout the course of its development.” The interesting question is the correlation between genotypes and phenotypes at various points in development. The square of this correlation is known as heritability.
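To make the "squared correlation" definition concrete, here is a minimal simulation sketch. It assumes the simplest possible model, in which a phenotype is the sum of an independent genotypic value and an environmental deviation, with no covariance or interaction terms. The function names, the SD of 15, and the illustrative heritability of .80 are my assumptions, not anything taken from Jensen's article.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_squared_gp_correlation(h2, n=20000, sd=15.0, seed=0):
    """Simulate P = G + E with independent G and E; return corr(G, P) squared."""
    rng = random.Random(seed)
    sd_g = sd * math.sqrt(h2)        # genotypic SD implied by the chosen h2
    sd_e = sd * math.sqrt(1.0 - h2)  # everything non-genetic lumped into E
    g = [rng.gauss(0.0, sd_g) for _ in range(n)]
    p = [gi + rng.gauss(0.0, sd_e) for gi in g]
    return pearson(g, p) ** 2


# Illustrative value only: the squared correlation should land near the h2 fed in.
estimate = simulate_squared_gp_correlation(h2=0.80)
print(f"squared genotype-phenotype correlation: {estimate:.3f}")
```

Under these assumptions the squared genotype-phenotype correlation recovers the ratio of genotypic to phenotypic variance, which is all the definition says.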

The article looked at the evidence for the stability of IQ, viz., the extent to which individuals retain their standing relative to others when retested over the course of time. IQ, like other developmental characteristics, is rather unstable in small children but becomes increasingly stable throughout childhood. After age 8, the stability correlation, when corrected for measurement error, is between .9 and unity. [Note]

1.5. General versus specific abilities

Jensen argued that the term intelligence should be reserved for the general factor, or g. On the one hand, he emphasized that g or IQ tests do not capture the full extent of mental abilities. On the other hand, he stressed that g reflected a biological reality and captured capabilities that have been “singled out as especially important by the educational and occupational demands prevailing in all industrial societies.”

1.6. Population distribution

Jensen went on to discuss the distribution of IQ. Originally IQ scores were defined in relation to mental age, but, due to certain problems with the mental age concept, modern tests define IQ in relation to scores obtained in a same-aged norming sample. IQ scores are normally distributed by construction, but Jensen argued that there are reasons to believe that normality is not just a convenient assumption. Firstly, he referred to the central limit theorem and the expectation that the sum scores of a large set of test responses would be normally distributed, with the caveat that IQ items do not fully match the premises of the theorem, e.g., they are not uncorrelated. Secondly, Jensen argued that if the normal distribution of IQ scores is warranted, IQ scores should behave like an interval scale. Evidence for this has been obtained from sibling studies that show the regression of sibling IQs on one another to be linear almost throughout the IQ scale, with the same mean number of IQ points separating siblings regardless of IQ level. The exception to this pattern (and the normality of IQ scores) is found at the low end of the scale where there is an excess of scores as a result of genetic and chromosomal abnormalities and other pathological conditions. The IQ distribution therefore resembles that of height which is also normally distributed (within sexes) with a “bulge” at the lower end due to dwarfism.

Low IQ due to pathology, sometimes called organic mental retardation, is usually accompanied by abnormal physical appearance, and must be distinguished from familial mental retardation, the latter referring to low IQ scores that are part of the normal distribution of IQ. Evidence for the validity of this distinction comes from sibling comparisons. It has been found that there is no correlation between the IQs of severely mentally retarded individuals and their siblings, whereas in the mildly retarded range we find the usual sibling correlation. Further evidence for the distinction is provided by the observation that severe mental retardation occurs at similar rates across all social classes, whereas mild retardation is concentrated in the lower classes. There is some indication that the density of IQ scores at the high end of the scale is greater than the normal distribution predicts, too, but the evidence for this is inconclusive. [Note]

1.7. Decomposing the determinants of intelligence

Next, Jensen discussed the inheritance of cognitive ability, with the aim of countering the “belief in the almost infinite plasticity of intellect, the ostrichlike denial of biological factors in individual differences.” He argued that “the slighting of the role of genetics in the study of intelligence can only hinder investigation and understanding of the conditions, processes, and limits through which the social environment influences human behavior.”

The article discussed selective breeding studies in animals, paying specific attention to those that bred rats for greater learning ability. As to the applicability of breeding schemes to humans, Jensen quoted a 1967 position statement by three eminent geneticists (James F. Crow, James V. Neel, and Curt Stern) who had written that a “selection program to increase human intelligence (or whatever is measured by various kinds of ‘intelligence’ tests) would almost certainly be successful in some measure. The same is probably true for other behavioral traits. The rate of increase would be somewhat unpredictable, but there is little doubt that there would be progress.” [Note]

Jensen wrote that the inheritance of continuous or metric traits like human intelligence was polygenic in nature, involving “multiple genes whose effects are small, similar, and cumulative.” He noted that the simplest model would require between 10 and 20 genes for intelligence, but that the actual number was probably much larger.

The article presented the following decomposition of intelligence into various components:

As these components are well-known in behavioral and quantitative genetics and their meaning should be clear from their names, I will skip Jensen’s explication of them, save for a few quotes.

Jensen pointed out that assortative mating can have a substantial effect on the population distribution of IQ: [Note]

[A]ssortative mating increases the genetic variance in the population. By itself this will not affect the mean of the trait in the population, but it will have a great effect on the proportion of the population falling in the upper and lower tails of the distribution. Under present conditions, with an assortative mating coefficient of about .60, the standard deviation of IQs is 15 points. If assortative mating for intelligence were reduced to zero, the standard deviation of IQs would fall to 12.9. The consequences of this reduction in the standard deviation would be most evident at the extremes of the intelligence distribution. For example, assuming a normal distribution of IQs and the present standard deviation of 15, the frequency (per million) of persons above IQ 130 is 22,750. Without assortative mating the frequency of IQs over 130 would fall to 9,900, or only 43.5 percent of the present frequency. For IQs above 145, the frequency (per million) is 1,350 and with no assortative mating would fall to 241, or 17.9 percent of the present frequency. And there are now approximately 20 times as many persons above an IQ of 160 as we would find if there were no assortative mating for intelligence. Thus differences in assortative mating can have a profound effect on a people’s intellectual resources, especially at the levels of intelligence required for complex problem solving, invention, and scientific and technological innovation.
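The tail frequencies in this passage can be checked with a few lines of arithmetic. The sketch below assumes a normal distribution with a mean of 100 and uses the two standard deviations given in the quote (15 and 12.9); the function name is mine. Small discrepancies from the quoted figures are to be expected, since Jensen presumably worked from rounded table values.

```python
import math


def per_million_above(cutoff, mean=100.0, sd=15.0):
    """Expected number of people per million scoring above `cutoff` under normality."""
    z = (cutoff - mean) / sd
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z) for a standard normal
    return 1_000_000 * upper_tail


SD_NOW = 15.0   # SD quoted for the present degree of assortative mating
SD_NONE = 12.9  # SD quoted for zero assortative mating

for cutoff in (130, 145, 160):
    now = per_million_above(cutoff, sd=SD_NOW)
    none = per_million_above(cutoff, sd=SD_NONE)
    print(f"IQ > {cutoff}: {now:9.1f} vs {none:9.1f} per million "
          f"(ratio {now / none:4.1f}x)")
```

The point of the exercise is that a modest change in the standard deviation leaves the middle of the distribution almost untouched while thinning the far tails many times over.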

On genotype-environment correlations Jensen wrote:

A genotype for superior ability may cause the social environment to foster the ability, as when parents perceive unusual responsiveness to music in one of their children and therefore provide more opportunities for listening, music lessons, encouragement to practice, and so on. A bright child may also create a more intellectually stimulating environment for himself in terms of the kinds of activities that engage his interest and energy. And the social rewards that come to the individual who excels in some activity reinforce its further development. Thus the covariance term for any given trait will be affected to a significant degree by the kinds of behavioral propensities the culture rewards or punishes, encourages or discourages. For traits viewed as desirable in our culture, such as intelligence, hereditary and environmental factors will be positively correlated. But for some other traits which are generally viewed as socially undesirable, hereditary and environmental influences may be negatively correlated. […] In making overall estimates of the proportions of variance attributable to hereditary and environmental factors, there is some question as to whether the covariance component should be included on the side of heredity or environment. But there can be no “correct” answer to this question. To the degree that the individual’s genetic propensities cause him to fashion his own environment, given the opportunity, the covariance (or some part of it) can be justifiably regarded as part of the total heritability of the trait. But if one wishes to estimate what the heritability of the trait would be under artificial conditions in which there is absolutely no freedom for variation in individuals’ utilization of their environment, then the covariance term should be included on the side of environment. 
Since most estimates of the heritability of intelligence are intended to reflect the existing state of affairs, they usually include the covariance in the proportion of variance due to heredity.

Regarding gene-environment interactions Jensen noted:

There is considerable confusion concerning the meaning of interaction in much of the literature on heredity and intelligence. It is claimed, for example, that nothing can be said about the relative importance of heredity and environment because intelligence is the result of the “interaction” of these influences and therefore their independent effects cannot be estimated. This is simply false. The proportion of the population variance due to genetic × environment interaction is conceptually and empirically separable from other variance components, and its independent contribution to the total variance can be known.

1.8. Misconceptions about heritability

Next, Jensen discussed certain common conceptual errors in debates about heritability. He noted that while both environments and genes are needed for an organism to exist at all, that does not render the question of the relative importance of nature and nurture meaningless. The relevant question concerns the proportions of population variation that can be attributed to each.

While noting that heritability is a population statistic that cannot be used to partition “a given individual’s IQ into hereditary and environmental components”, Jensen wrote that because heritability is the squared correlation between phenotypes and genotypes, it can be used to make probabilistic statements “concerning the average amount of difference between individuals’ obtained IQs and the ‘genotypic value’ of their intelligence.”

The article also noted that heritability is not a constant but depends on the amount of variability in the causal factors. Increasing causally relevant environmental variation will decrease heritability, while increasing genetic variation (while holding the environment constant) will increase heritability.
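A toy calculation shows why heritability moves with the variances. The sketch assumes the simplest two-component model (genetic plus environmental variance, nothing else), and the variance figures are made up for illustration; they are chosen only so that the baseline total equals 15 squared.

```python
def heritability(genetic_variance, environmental_variance):
    """Genetic share of total variance, ignoring covariance and interaction terms."""
    return genetic_variance / (genetic_variance + environmental_variance)


# Hypothetical variances: 180 + 45 = 225, i.e. an SD of 15.
baseline = heritability(180.0, 45.0)
more_environment = heritability(180.0, 90.0)  # environmental variance doubled
more_genetics = heritability(360.0, 45.0)     # genetic variance doubled

print(f"baseline:            {baseline:.3f}")
print(f"more env. variation: {more_environment:.3f}")
print(f"more gen. variation: {more_genetics:.3f}")
```

Nothing about any individual changes between the three scenarios; only the mix of variation in the population does.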

Responding to the criticism that heritability estimates are meaningless because we cannot “really” measure intelligence, Jensen pointed out that the estimates show that whatever the tests measure, it is heritable. To the extent that the tests are not “culture-free”, heritability estimates will be decreased.

Another common counterargument is that we must be able to “spell out in detail every single link in the chain of causality from genes (or DNA molecules) to test scores if we are to say anything about the heritability of intelligence.” According to Jensen, this is not so because “[s]elective breeding was practiced fruitfully for centuries before anything at all was known of chromosomes and genes, and the science of quantitative genetics upon which the estimation of heritability depends has proven its value independently of advances in biochemical and physiological genetics.”

Still another conceptual confusion is that because something like one’s vocabulary cannot be directly inherited, it cannot be heritable. But people differ markedly in “the amount, rate, and kinds of learning they evince even given equal opportunities.” High heritability indicates that opportunities for learning have been widespread, while low heritability shows the opposite.

High heritability does not necessarily imply immutability. Large changes in environmental conditions may change heritability, or heritability may remain the same while the population mean changes. However, Jensen argued that the degree of heritability says something about “the locus of control of a characteristic”:

The control of highly heritable characteristics is usually in the organism’s internal biochemical mechanisms. Traits of low heritability are usually controlled by external environmental factors. No amount of psychotherapy, tutoring, or other psychological intervention will elicit normal performance from a child who is mentally retarded because of phenylketonuria (PKU), a recessive genetic defect of metabolism which results in brain damage. Yet a child who has inherited the genes for PKU can grow up normally if his diet is controlled to eliminate certain proteins which contain phenylalanine. Knowledge of the genetic and metabolic basis of this condition in recent years has saved many children from mental retardation.

Finally, Jensen pointed out that children share only half of the genetic variants of each parent (or somewhat more under assortative mating). This means that substantial phenotypic differences between parents and children (or between siblings) are compatible with high heritability.

1.9. Empirical estimates of heritability

For empirical estimates of the heritability of IQ, Jensen relied particularly on various publications by Cyril Burt based on British data, [Note] and on Erlenmeyer-Kimling and Jarvik’s (1963) review of kinship correlations for IQ which included data from 58 studies across eight countries.

Jensen summarized Burt’s report of a variance decomposition of Stanford-Binet IQs based on many types of kinship pairs drawn primarily from London schools. This analysis found a heritability of 81% (for unadjusted test scores) or 93% (for test scores adjusted after retesting children whose IQs did not match their teachers’ impressions of their “brightness”). These results must be regarded with caution given that serious concerns about the authenticity of Burt’s data were raised some years after Jensen’s article was published.

Next, Jensen looked at Erlenmeyer-Kimling and Jarvik’s data on IQ correlations among many types of kinship pairs. These data were based on various tests and different testing conditions, and were collected by “numerous investigators with contrasting views regarding the importance of heredity.” Nevertheless, the results were generally compatible with the view that there is a strong polygenic component to IQ differences.

Supplementing Erlenmeyer-Kimling and Jarvik’s data with some of Burt’s, Jensen presented the following table where empirically observed correlations are compared to correlations from two theoretical, purely genetic models:

The table indicates that there are some systematic departures from theoretical expectations, presumably reflecting non-genetic influences. Jensen illustrated these departures in the following graph where the median correlations reported by Erlenmeyer-Kimling and Jarvik are shown for different kinds of kinship pairs reared together and apart:

Jensen’s own calculations, originally reported in a previous paper (Jensen, 1967) and drawing on Erlenmeyer-Kimling and Jarvik’s correlations for monozygotic and dizygotic twins, suggested an average heritability of 80% for IQ, after adjustment for test unreliability. His estimate for the shared environmental effect was 12%, with 8% left for the unshared environment. Adding Burt’s data to the mix, Jensen arrived at an estimate of 77%, which rose to 81% after adjustment for unreliability, which he regarded as the best overall estimate of the heritability of IQ that data allowed at the time.
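For readers unfamiliar with how twin correlations turn into variance components, here is a rough sketch using Falconer's formulas, the common textbook shortcut. This is not necessarily the formula Jensen used, and it ignores assortative mating and non-additive effects. The input correlations are hypothetical round numbers, not the Erlenmeyer-Kimling and Jarvik medians.

```python
def falconer_decomposition(r_mz, r_dz):
    """Split variance into heritability, shared and unshared environment.

    Falconer's shortcut: MZ twins share all additive genetic effects and
    DZ twins half, while both share the family environment equally.
    """
    h2 = 2.0 * (r_mz - r_dz)  # heritability
    c2 = 2.0 * r_dz - r_mz    # shared (between-family) environment
    e2 = 1.0 - r_mz           # unshared environment plus measurement error
    return h2, c2, e2


# Hypothetical correlations, for illustration only.
h2, c2, e2 = falconer_decomposition(r_mz=0.85, r_dz=0.50)
print(f"h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

By construction the three components sum to one, which is also true of the 80/12/8 split reported above.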

The correlation between monozygotic twins reared apart provides the conceptually simplest estimate of heritability, provided that environments are uncorrelated within the pairs. Jensen had access to three studies of MZ twin pairs reared apart. Newman, Freeman, and Holzinger’s (1937) study of 19 pairs found a correlation of .77 (.81 corrected for unreliability), and the correlation in Shields’ study of 44 pairs was .77 (.81 corrected), too. Burt (1966) reported a study of 53 pairs where the correlation was .86 (.91 corrected). Jensen regarded Burt’s study as the best of the bunch because it had the largest and most representative sample, along with very early separation of the twins. [Note]
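The "corrected for unreliability" figures can be reproduced by dividing each observed correlation by the test's reliability. The reliability of .95 used below is my inference from the quoted pairs (.77 to .81, .86 to .91), not a value stated in the text above, and the function name is hypothetical.

```python
def correct_for_unreliability(observed_r, reliability):
    """Disattenuate a correlation between two administrations of the same test."""
    return observed_r / reliability


ASSUMED_RELIABILITY = 0.95  # inferred from the quoted figures, not from the source

studies = [
    ("Newman, Freeman, and Holzinger (1937)", 0.77),
    ("Shields", 0.77),
    ("Burt (1966)", 0.86),
]
for label, observed in studies:
    corrected = correct_for_unreliability(observed, ASSUMED_RELIABILITY)
    print(f"{label}: {observed:.2f} -> {corrected:.2f}")
```

The same division takes the 77% heritability estimate mentioned earlier to about 81%, which suggests that one reliability figure was applied throughout.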

Correlations between children and their foster versus biological parents provide another way to estimate the effects of nature and nurture. Jensen reported that correlations for IQ between adoptive children and their foster parents were between 0 and .20, while correlations between children and their biological parents “gradually increase from zero at 18 months of age to an asymptotic value close to .50 between ages 5 and 6”, and “this is true whether the child is reared by his parents or not.”

Still another way to gauge the effects of heredity and the environment is to correlate adoptive children’s IQs with measures of their rearing environment. On this score, Jensen paid particular attention to the early study by Burks (1928), where the relevant multiple correlation was .42, suggesting that the measured environment accounted for 18% of IQ variance. Burks calculated that the average effect of a one standard deviation change along the environmental scale was six IQ points, which, Jensen noted, was half the magnitude of the average IQ difference between ordinary siblings reared together. Burks also collected data on parents raising their own biological children, along with measures of rearing environments. Sewall Wright later used Burks’ data to arrive at an estimate of 81% for the heritability of IQ (Wright, 1931). [Note]
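The arithmetic behind these figures is easy to retrace. The sketch below assumes an IQ standard deviation of 15 and a sibling correlation of .50, both of which are my assumptions rather than values given by Burks, and it treats sibling scores as bivariate normal. The function names are mine.

```python
import math

SD_IQ = 15.0  # assumed IQ standard deviation


def variance_explained(multiple_r):
    """Share of variance accounted for by a (multiple) correlation."""
    return multiple_r ** 2


def points_per_sd(multiple_r, sd=SD_IQ):
    """Expected IQ change per one-SD shift on the environmental composite."""
    return multiple_r * sd


def mean_abs_sibling_difference(sibling_r, sd=SD_IQ):
    """E|X - Y| for bivariate-normal scores with equal SDs and correlation r."""
    return 2.0 * sd * math.sqrt((1.0 - sibling_r) / math.pi)


print(f"variance explained:      {variance_explained(0.42):.3f}")
print(f"IQ points per SD of env: {points_per_sd(0.42):.1f}")
print(f"mean sibling difference: {mean_abs_sibling_difference(0.50):.1f}")
```

Under these assumptions the environmental effect comes out at roughly half the average sibling difference, which matches the comparison Jensen drew.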

1.10. Effects of inbreeding

The article went on to discuss research on the effect of inbreeding on IQ. A negative effect is expected because inbreeding increases the probability that an individual will have two defective mutant recessives at a given locus. A large study conducted in Japan after the Second World War found that the children of cousins averaged almost eight IQ points lower than the children of unrelated parents after controlling for age and socioeconomic status. Additionally, an American study found a high incidence of mental retardation among children from nuclear incest matings (brother-sister or father-daughter).

1.11. Heritability of special abilities

The article reported that the heritabilities of non-g cognitive abilities have been found to range from near zero to about .75, with most values between .50 and .70. Several broad abilities have genetic variance independently of g. Few studies had investigated the heritability of non-cognitive skills at the time. However, some evidence on motor skill learning suggested that heritability may be even higher in the non-cognitive sphere than for intelligence.

1.12. Heritability of scholastic achievement

Jensen argued that the heritability of scholastic achievement was considerably less than that of intelligence. He calculated an average heritability of 40% for a variety of scholastic measures, while the contributions of shared and unshared environmental components were 54% and 6%, respectively. He found that estimates for the heritability of scholastic achievement varied over a much wider range than those for IQ, with lower estimates in primary school and for simple forms of learning and somewhat higher estimates in high school and for more complex forms of learning. He referred specifically to twin data from the National Merit Scholarship Corporation in which about 60% of the variation in class ranks could be attributed to the shared family environment. This, together with the small within-family environmental variance, pointed to the family environment exerting a strong influence on scholastic performance. Unrelated individuals reared together were also found to be much more similar in school performance than IQ. [Note]

Jensen argued that environmentally malleable non-cognitive skills were important determinants of scholastic achievement. He proposed that efforts to improve school performance by improving non-cognitive skills had a much better chance of success than efforts to improve intelligence: [Note]

Thus it seems likely that if compensatory education programs are to have a beneficial effect on achievement, it will be through their influence on motivation, values, and other environmentally conditioned habits that play an important part in scholastic performance, rather than through any marked direct influence on intelligence per se. The proper evaluation of such programs should therefore be sought in their effects on actual scholastic performance rather than in how much they raise the child’s IQ.

1.13. Environmental effects

Jensen argued that the effect of the environment on IQ is non-linear. Moving children from an extremely deprived environment to a normal environment can boost IQ by up to dozens of points, but children reared in average environments do not get an appreciable boost from being placed in a cognitively enriched environment. Below a certain threshold of environmental quality, deprivation can have a large negative effect on IQ, but above the threshold environmental variations are mostly inconsequential. The vast majority of participants in research on the heritability of IQ are drawn from environments above the threshold, which accounts, in part, for the finding of high heritability for IQ.

Jensen compared the effect of the environment on IQ to that of nutrition on height. Nutritional deficiencies will lead to stunting but above a certain level of nutritional adequacy, even great variations in eating habits have little effect on stature.

Jensen set the threshold of environmental deprivation under which IQ is strongly affected at a low level. He did not think that a mere lack of middle-class amenities would have a great effect on IQ. To have a large effect, the environment must impose severe sensory and motor restrictions on the child. Examples of such deprivation include the mentally retarded orphanage children studied by Skeels and Dye (1939), and the case of “Isabel”, a girl who was confined to an attic and reared by a deaf-mute mother until age 6. Children brought up in such conditions have been found to recover and attain normal intelligence after they are placed in a normal environment.

According to Jensen, children described as “culturally disadvantaged” do not generally encounter severe environmental deprivation. [Note] If severely deprived children are moved to an adequate environment, they usually attain large and permanent IQ gains, while the IQ gains of culturally disadvantaged children placed in an enriched environment are slight and transitory. In contrast to severely deprived children, culturally disadvantaged children do not show early cognitive deficits, and they experience average or, sometimes, precocious perceptual and motor development. As culturally disadvantaged children grow up, their IQs become more strongly correlated with those of their parents, which points to genetic influence. On the other hand, less intelligent parents may also be less able to provide their children with environmental conditions that are conducive to intellectual development.

The article then discussed a longitudinal study by Heber et al. (1968) which investigated a sample of children born to poor black mothers in Milwaukee, WI. The mean IQ of children of mothers with sub-80 IQs declined over time, while the mean IQ of children of mothers with above-80 IQs did not show a temporal trend. [Note] Jensen suggested that this is not consistent with environmental deprivation, and is more consistent with genetic factors exerting greater influence in older children. He also discussed an old study by Wheeler (1942) of “Tennessee mountain children” where two cohorts of children aged between 6 and 16 were tested in 1930 and 1940. The average IQ of the 1940 cohort was 10 points higher, presumably due to environmental improvements, but both cohorts showed a similar decline in norm-referenced IQ from age 6 to 16. [Note]

Next, Jensen discussed the concept of reaction range according to which “similar genotypes may result in quite different phenotypes depending on the favorableness of the environment for the development of the characteristic in question” and “some genotypes may be much more buffered against environmental influences than others.” Different genetic strains may therefore have dissimilar heritabilities for a given trait.

Jensen emphasized that heritability estimates represent average heritabilities, and that heritabilities may therefore differ between subpopulations within the same population. He noted that all major studies of the heritability of IQ are based on samples of white people, with no studies of blacks, for example. Reaction norms may vary between races in that “some genetic strains may be more buffered from environmental influences”, which would suggest that heritabilities may not be the same for different populations even in the very same environment. [Note] The availability of heritability estimates for different populations would, however, be useful for the purposes of testing some hypotheses regarding genetic and environmental causes of group differences in IQ.

1.14. Physical and biological environment

Jensen put the environmental variance of IQ scores at around 20%. He noted that people tend to think that the environmental variance is due to differences in “social and interpersonal environment, child rearing practices, and differences in educational and cultural opportunities afforded by socioeconomic status.” However, Jensen argued, much or even most of the environmental variance may not be associated with social factors but rather with certain physical and biological environmental factors, with the implication that “advances in medicine, nutrition, prenatal care, and obstetrics” are important for the improvement of intelligence.

The first piece of evidence for the importance of the non-social environment that Jensen discussed is the fact that twins scored an average of 4 to 7 points lower in IQ tests than singletons. [Note] These differences were likely due to prenatal factors as it seems unlikely that twins and singletons would receive such disparate postnatal treatment, especially as the twin deficit had been observed across social classes. The fact that “MZ twins have a higher mortality rate and greater disparity in birth weights than DZ twins” suggested that “MZ twins enjoy less equal and less optimal intrauterine conditions than DZ twins or singletons.” Boy twins averaged lower IQs than girl twins, which is consistent with the general observation that male infants are more vulnerable to prenatal impairment. Birthweight is modestly correlated with later IQ independently of sociocultural factors, and in MZ twin pairs the twin who weighs less at birth usually has a lower IQ in school age. This may be because “the unequal sharing of nutrients and space stunts one twin more than its mate.” Much of IQ variation in MZ twins therefore appears to be due to prenatal environmental factors. The significance of this observation is that differences in intrauterine conditions between singletons in the general population may contribute substantially to IQ variation.

Next, Jensen discussed certain medical techniques claimed to improve the intrauterine environment. He noted that more optimal intrauterine and perinatal conditions appear to be associated with precocious perceptual-motor development. This is a conundrum for those who argue that inadequate prenatal care and complications of pregnancy are to blame for the lower mean IQ of blacks, for black infants do not typically exhibit subnormal development.

Disadvantageous reproduction-related factors and conditions, such as pregnancies at early ages and in close succession, low birth weight, prematurity, and infant mortality, are correlated with race and social class, but it is unclear to what extent they can account for IQ gaps. 75–80% of cases of mental retardation cannot be explained by known complications of pregnancy, brain damage, or gene and chromosomal defects, and therefore presumably represent the low end of the normal polygenic distribution of IQ. Research indicates that when common reproductive difficulties occur singly, they have no effect on the child’s intellectual status, suggesting that “the nervous system is sufficiently homeostatic to withstand certain unfavorable conditions if they occur singly.”

Reviewing the literature on the effect of premature birth on IQ, Jensen noted that prematurity has a strong relation to brain dysfunction but that the crucial factor appears to not be prematurity per se but low birth-weight. The latter seems to act as “a threshold variable with respect to intellectual impairment.” The incidence of babies weighing less than 5.5 lb is greater in lower social classes, but socioeconomic variables do not account for more than 1% of the total variance in birth-weight.

Black babies weigh less, on average, than white babies even after controlling for social class, but they also mature at a lower birth-weight than white babies. Prematurity and low birth-weight are more common in blacks than whites, but this is not a full explanation of the black-white IQ gap because black children perform significantly less well in cognitive tests than white children matched for birth-weight.

If race differences are ignored and prematurity is defined as a condition where birth-weight is less than 5.5 lb, the association between prematurity and lower IQ can be statistically explained by the common factor of social class. Social class does not, however, fully explain the association between low IQ and prematurity for birth-weights less than 3 lb. The association between IQ and low birth-weight is mainly due to cases of severe mental retardation among very low-weight infants, and otherwise the association is very weak by school age. It is also possible that there are individual differences in genetic predisposition for prenatal impairment.

Severe undernutrition in early years results in lower IQ. If it occurs after early childhood, it appears to have no permanent effect. For example, severely malnourished prisoners of war have been found to suffer no intellectual deficiencies once returned to normal living conditions. Extreme undernutrition is rare in the United States, but some unknown proportion of the urban population might benefit from nutritional supplementation.

First-borns have been shown to have higher IQs, on average, than later-borns. The reason for this phenomenon is probably biological rather than social-psychological. Jensen asserted that it is “almost certainly not a genetic effect.” The birth-order advantage is slight, and is conspicuously observed only in the extreme right tail of the distribution of achievement. [Note]

1.15. Social class

An extensive literature from many countries shows that children’s IQs are associated with the socioeconomic status (SES) of their parents. [Note] The correlation is typically in the range of .35–.40. Jensen pointed out that this correlation is almost a logical necessity because IQ is heritable and the educational system and the occupational hierarchy act as (imperfect) intellectual screening processes.

Jensen cast doubt on the idea of SES as an important cause of IQ by noting that the IQs of children reared apart from their siblings and parents are correlated with the IQs and educational and occupational levels of such biological relatives, and the correlations are almost as strong as among intact families. Moreover, among siblings raised together, those with IQs above the family average tend to move up the SES scale as adults, while those with IQs below the family average tend to move down.

There is a negative correlation between SES and Developmental Quotient in children under age 2, while there is an increasing positive correlation after age 2. Low-SES children thus get a “head-start” on development, but this trend is reversed at later ages when tests become less motoric and more loaded on the general factor of intelligence.

1.16. Race differences

Moving to a more focused discussion of race differences, Jensen started by outlining his social philosophy that puts the individual at the center of the picture:

The variables of social class, race, and national origin are correlated so imperfectly with any of the valid criteria on which the above decisions should depend, or, for that matter, with any behavioral characteristic, that these background factors are irrelevant as a basis for dealing with individuals—as students, as employees, as neighbors. Furthermore, since, as far as we know, the full range of human talents is represented in all the major races of man and in all socioeconomic levels, it is unjust to allow the mere fact of an individual’s racial or social background to affect the treatment accorded to him. All persons rightfully must be regarded on the basis of their individual qualities and merits, and all social, educational, and economic institutions must have built into them the mechanisms for insuring and maximizing the treatment of persons according to their individual behavior.

Jensen noted that if people considered social problems only from the perspective of individuals, there would be no “race problem.” That is, however, a philosophy that few adopt. People like to compare groups, and assess whether there are group differences in “the most desirable and the least desirable social and occupational roles in a society.” The fact that different races are disproportionately represented in such roles in America, and that so much current thinking revolves around this fact, compels research on all the reasons why this inequality exists. To what extent is the inequality due to unfairness, or the use of intrinsically irrelevant criteria like skin color by decision-makers, and to what extent is it due to racial differences in the distributions of indisputably relevant characteristics? According to Jensen, these questions can be answered “only through unfettered research”, and no reasonable hypothesis must be ruled out of court for ideological reasons. Attitudes to the contrary “represent a danger to free inquiry and, consequently, in the long run, work to the disadvantage of society’s general welfare.”

Everyone agrees, Jensen wrote, that environmental causes, including past history, contribute to intellectual, educational, and occupational disparities between whites and blacks in America. However, the possible contribution of heredity to racial differences has been “greatly ignored, almost to the point of being a tabooed subject, just as were the topics of venereal disease and birth control a generation or so ago.”

Groups that are geographically or socially isolated from each other for many generations will differ in their gene pools, and will therefore probably show differences in highly heritable phenotypic traits. Races are “breeding populations” where matings are much more common within populations than between populations. Technically, races are distinguished by their different distributions of allele frequencies. Genetic differences are manifested in “virtually every anatomical, physiological, and biochemical comparison one can make between representative samples”, and this surely applies to the brain as well.

The pertinent question about racial-genetic differences in behavioral traits is not about their existence but about the direction and magnitude of the differences, and the medical, social, educational, and other consequences thereof. Some genetic differences are of no consequence, and the idea that all genetic differences arise and persist due to natural selection cannot be accepted.

Dreger and Miller (1960, 1968) and Shuey (1966) reviewed the evidence for black-white IQ differences. Whites outscore blacks by 15 points (1 standard deviation) on average, and this magnitude is quite similar across 81 different tests analyzed in Shuey’s review of 382 studies. 15 percent of blacks outscore the average white individual. If black and white populations were of the same size, 23 percent of IQ differences would be explained by race, with within-race differences explaining 77 percent. Controlling for SES reduces the black-white gap to about 11 points. Blacks perform relatively worse in tests that are “culture-free” or “culture-fair”, in tests of abstract abilities, and in non-verbal tests. The variance of black IQ scores has been found to be lower than that of whites, by 40% according to one study.
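The overlap and variance-share figures in the paragraph above follow from simple normal-distribution arithmetic, which can be sketched as below. This is a minimal illustration, not Jensen’s own calculation: the function names are hypothetical, both distributions are assumed to be normal, and both groups are assumed to be of equal size. Under the further simplification of equal standard deviations of 15, the between-group share comes out at 20 percent rather than the 23 percent quoted above, so the quoted figure presumably rests on variance inputs that are not spelled out in this synopsis.

```python
from statistics import NormalDist


def share_above_other_mean(gap, sd_lower):
    """Share of the lower-scoring group scoring above the other group's mean.

    Assumes the lower-scoring group is normally distributed and that its
    mean lies `gap` points below the higher-scoring group's mean.
    """
    return 1.0 - NormalDist(mu=0.0, sigma=sd_lower).cdf(gap)


def variance_share_between(gap, sd_a, sd_b):
    """Share of total variance lying between two groups of equal size."""
    between = (gap / 2.0) ** 2              # each mean sits gap/2 from the grand mean
    within = (sd_a ** 2 + sd_b ** 2) / 2.0  # average within-group variance
    return between / (between + within)


# Illustrative inputs: a 15-point gap and (as an assumption) equal SDs of 15.
overlap = share_above_other_mean(gap=15.0, sd_lower=15.0)
between_share = variance_share_between(gap=15.0, sd_a=15.0, sd_b=15.0)

print(f"share above the other group's mean: {overlap:.3f}")
print(f"between-group share of variance:    {between_share:.3f}")
```

The two standard deviations are kept as separate parameters so that other assumptions, such as a smaller variance in one group, can be tried.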

In tests of scholastic achievement, whites and Asian Americans outscore blacks by about 1 standard deviation, as indicated by the Coleman Report. The gap is relatively constant from grades 1 through 12. Puerto Ricans, Mexican-Americans, and American Indians outscore blacks to a smaller degree.

The black-white disadvantage cannot be completely or directly explained by discrimination or inequitable schooling. Given that intelligence variation is strongly influenced by genetics, it is not unreasonable to propose that genetic differences may be involved in the black-white gap. While this hypothesis has been met with forceful condemnation in social science, “it has been neither contradicted nor discredited by evidence.”

Jensen formulated his position on the plausibility of the genetic explanation in the following way:

The fact that a reasonable hypothesis has not been rigorously proved does not mean that it should be summarily dismissed. It only means that we need more appropriate research for putting it to the test. I believe such definitive research is entirely possible but has not yet been done. So all we are left with are various lines of evidence, no one of which is definitive alone, but which, viewed all together, make it a not unreasonable hypothesis that genetic factors are strongly implicated in the average Negro-white intelligence difference. The preponderance of the evidence is, in my opinion, less consistent with a strictly environmental hypothesis than with a genetic hypothesis, which, of course, does not exclude the influence of environment or its interaction with genetic factors.

Jensen then went on to enumerate various points that he viewed as especially relevant for understanding the causes of the black-white IQ gap. He noted that no one has managed to show that controlling statistically for environment and education would equalize black and white IQs. Then he drew attention to the fact that the proportion of mentally retarded children (defined as IQ<75) is higher in black families than white families at all SES levels. While an environmental hypothesis would predict less of a difference at high SES levels, that is not in fact observed. A genetic hypothesis supplies a ready explanation for this observation: regression to the mean.

Research shows, in fact, that low-SES white children outscore high-SES blacks, on average, and it seems improbable that the cultural opportunities available to poor white children would be superior to those available to middle- and upper-class black children. While environmental explanations of the regression effect have been devised, they often seem to strain credibility.

Environmental explanations of group differences are usually ad hoc in nature: they provide a plausible explanation for the particular case that they were devised to explain but lack generality across situations. The existence of an environmental difference is never a sufficient causal explanation of a group difference. Jensen used the example of father absence as an explanation of the black-white IQ gap, pointing out that research does not support the idea that father’s presence or absence makes an independent contribution to IQ or scholastic outcomes.

The Coleman Report assessed many socioeconomic and environmental factors often believed to be major sources of individual and group differences in scholastic performance, such as reading material and cultural amenities in the home, structural integrity of the home, foreign language in the home, preschool attendance, parents’ education, parents’ educational desires for child, parents’ interest in child’s school work, time spent on homework, and child’s self-concept (self-esteem). These factors were all correlated with scholastic performance within races and ethnic groups, but they were not systematically related to group differences. For example, American Indians were more disadvantaged when it came to environmental factors than blacks, yet they outscored blacks in ability and achievement tests. [Note]

Black infants have been found to develop precociously, especially motorically, when compared to white infants. Developmental precocity correlates negatively with parental SES in whites. High-SES black infants are more precocious than high-SES white infants, while no difference has been found between low-SES black and white infants. These findings can be considered in light of the fact that adverse prenatal, perinatal, and postnatal complications lead to developmental delay. Black precocity in comparison with whites is also found for certain physiological indices of development, such as the rate of ossification of cartilage. Black babies also mature at a lower birth-weight than white babies.

In adults, the largest sampling of white and black IQ scores comes from the administration of the Armed Forces Qualification Test (AFQT) to representative samples of millions of men. As of 1966, the overall failure rates for whites and blacks in the test were 19 percent and 68 percent, respectively, with an eligibility cut-off point equivalent to an IQ of 86 or so. Approximately half of the black families were middle-class and above when the AFQT was administered, so “even if we assumed that all of the lower 50 percent of Negroes on the SES scale failed the AFQT, it would still mean that at least 36 percent of the middle SES Negroes failed the test, a failure rate almost twice as high as that of the white population for all levels of SES.” [Note]
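The lower bound in the quoted passage is a matter of simple accounting, and it can be checked with a short sketch. The function name and its parameters are hypothetical; the inputs are the percentages given in the paragraph above, together with the quote’s own worst-case assumption that everyone in the lower half of the SES scale failed.

```python
def min_failure_rate_upper(overall_fail, lower_share, lower_fail=1.0):
    """Lowest possible failure rate in the upper-SES part of a group.

    overall_fail: failure rate of the whole group
    lower_share:  fraction of the group in the lower-SES part
    lower_fail:   assumed failure rate in the lower-SES part (1.0 = all fail)
    """
    upper_share = 1.0 - lower_share
    # Failures that cannot be attributed to the lower-SES part.
    upper_failures = overall_fail - lower_share * lower_fail
    return max(0.0, upper_failures / upper_share)


bound = min_failure_rate_upper(overall_fail=0.68, lower_share=0.50)
ratio_to_white_rate = bound / 0.19  # 0.19 is the overall white failure rate

print(f"minimum upper-SES failure rate: {bound:.2f}")
print(f"ratio to overall white rate:    {ratio_to_white_rate:.2f}")
```

With these inputs the bound is 0.36, roughly 1.9 times the 19 percent white failure rate, which is the “almost twice as high” of the quotation.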

Jensen ended his discussion of race differences here, asking, perhaps rhetorically, whether such findings question the credibility of exclusively environmental explanations of observed differences.

1.17. Raising intelligence

Jensen noted that the cognitive demands of work are rising, and there will be increasingly fewer jobs available for low-IQ individuals. Thus the advantages of raising intelligence seem obvious. He noted that a small cognitive elite, perhaps 2 percent of the population, is probably responsible for civilizational advances, but that the rest are able to assimilate and enjoy the consequences of the advances. He also noted that tests and degrees can become barriers to entry to jobs where high cognitive ability is correlated with performance but is not required. He suggested that making people more adept at the “essential requirements of a given job” is a more feasible goal than raising intelligence or academic achievement.

Jensen wrote that given the high heritability of IQ and the threshold nature of environmental effects on it, solely improving the environment of the economically disadvantaged would not be expected to lead to large IQ gains. He criticized an argument by Milton Schwebel, according to whom providing adequate environmental conditions for the currently environmentally deprived children (estimated at 26 percent of the population) would boost their IQs by 20 points. Jensen noted that this is unrealistic considering that it would boost the IQs of the deprived above those of the non-deprived already enjoying adequate environments.

Jensen argued that not only educational services but also public health, social services, and welfare and employment practices are important for boosting intelligence. However, he warned that direct improvements in the environment could have indirect biological consequences as well.

1.18. Dysgenics

Jensen reviewed data on fertility and IQ, concluding that white fertility does not appear to be dysgenic, but that black fertility appears to be so. [Note] On the latter finding, he wrote:

Is there a danger that current welfare policies, unaided by eugenic foresight, could lead to the genetic enslavement of a substantial segment of our population? The possible consequences of our failure seriously to study these questions may well be viewed by future generations as our society’s greatest injustice to Negro Americans.

1.19. Intensive educational interventions

Jensen noted that while large-scale compensatory programs have made little difference for the disadvantaged, much more positive results have been obtained from intensive small-scale experiments “where maximum cultural enrichment and instructional ingenuity are lavished on a small group of children by a team of experts.”

Small enrichment and cognitive stimulation programs have been found to result in gains of around 5–20 points for IQ, and around 0.5–2 standard deviations for scholastic achievement. An analysis by Rick Heber of 29 intensive preschool programs found an average gain of 5 to 10 IQ points at the end of preschool. More gains in IQ are seen when the program comprises special cognitive training. The most intensive programs go beyond the classroom and involve daily sessions in the child’s home, and some programs of this sort have reported gains of up to 20 IQ points. Gains appear to be restricted to children from deprived backgrounds, and are not seen in non-disadvantaged children.

Next, Jensen cast doubt on the findings of intensive intervention studies by pointing to certain limitations in them. These include:

Lack of control groups in many studies.

Selection of kids for programs on the basis of low IQ, which means that they tend to achieve gains simply due to regression to the mean. Studies with control groups almost always show a regression effect in the control group.

Relative ease of achieving IQ gains in small children–for example, getting two additional Stanford-Binet items right boosts IQ from 85 to 93 at age four, while at age 10 the boost is from 85 to 88.

Materials similar to those used in IQ tests are often found in nursery schools, which may explain IQ gains. Jensen recounted once visiting “an experimental preschool using the Stanford-Binet to assess pretest—post-test gains, in which some of the Stanford-Binet test materials were openly accessible to the children throughout their time in the school as part of the enrichment paraphernalia.” [Note]

Pre-intervention IQ scores are often poorly measured because small children from deprived backgrounds are unfamiliar with the testing situation. Gains during preschool programs may be gains in test-savviness rather than ability.

It is not clear what the psychometric nature of intervention gains in IQ is–are they on g, or on something less important? [Note]

It is notable that gains from enrichment programs are of a similar magnitude and durability as the effects of direct coaching and practice on IQ.

The fadeout effect has been widely observed. Gains in IQ due to enrichment programs have a strong tendency to vanish over time. [Note]

Dubious scalability of small-scale interventions.

Enrichment programs often work by precipitating the acquisition of skills and knowledge that children would acquire anyway at a somewhat later age. Children can learn many things “prematurely” by associative/rote learning, but that does not mean that complex cognitive structures are being developed.
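Two of the limitations listed above, regression to the mean and the age-dependence of IQ gains, lend themselves to a small numerical sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the regression example uses made-up values for the population mean and the test-retest correlation, and the age example uses the classical ratio IQ (mental age divided by chronological age) with a hypothetical credit of two months of mental age per test item. That credit was chosen because it roughly reproduces the figures given above; the actual scoring behind those figures is not described in this synopsis.

```python
def expected_retest(score, mean, retest_correlation):
    """Expected retest score for someone selected on an observed score."""
    return mean + retest_correlation * (score - mean)


def ratio_iq(mental_age_months, chronological_age_months):
    """Classical ratio IQ: 100 * mental age / chronological age."""
    return 100.0 * mental_age_months / chronological_age_months


def iq_after_extra_items(start_iq, age_years, items, months_per_item):
    """Ratio IQ after passing `items` more items at a fixed age."""
    age_months = 12 * age_years
    mental_age = start_iq / 100.0 * age_months  # mental age implied by start_iq
    return ratio_iq(mental_age + items * months_per_item, age_months)


# Regression to the mean: hypothetical population mean 90, observed score 75,
# hypothetical test-retest correlation 0.8. No intervention is involved.
retest = expected_retest(score=75.0, mean=90.0, retest_correlation=0.8)

# Two extra items at ages 4 and 10, starting from IQ 85 in both cases.
age4 = iq_after_extra_items(start_iq=85.0, age_years=4, items=2, months_per_item=2.0)
age10 = iq_after_extra_items(start_iq=85.0, age_years=10, items=2, months_per_item=2.0)

print(f"expected retest score without any program: {retest:.1f}")
print(f"IQ after two extra items at age 4:  {age4:.1f}")
print(f"IQ after two extra items at age 10: {age10:.1f}")
```

Under these assumptions a child selected at 75 would be expected to score 78 on retest with no program at all, and the same two items are worth about 8 points at age 4 but only about 3 points at age 10.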

Next, Jensen reviewed some prominent intervention studies:

The Indiana Project involved deprived, low-IQ Appalachian white children five years of age. A special year-long kindergarten program resulted in gains of 4–10.8 IQ points when compared to control groups.

The Perry Preschool Project in Ypsilanti, Michigan, involved disadvantaged, low IQ children and their parents, seeking to remedy especially verbal skills. There was a gain of 8.9 IQ points after one year of the preschool, but by the end of second grade the gain had faded to 1.6 IQ points, a non-significant difference from the control group.

The Early Training Project run from Peabody College was an enrichment and cognitive stimulation program that involved disadvantaged children and their mothers. Four years after the start of the program, the experimental group had gained 7.2 IQ points over a control group.

The Durham Education Improvement Program was a preschool program for children from impoverished homes. The participants attained average gains of 2.62 to 9.27 IQ points, depending on the test.

The Bereiter-Engelmann program at the University of Illinois sought to teach specific cognitive skills and scholastic knowledge in small groups. Gains over 18 months were 8–10 IQ points and higher for specific content tests. No control group was used. According to Jensen, the program showed that scholastic performance is easier to boost than IQ, at least in the early years.

A preschool program by Merle Karnes at the University of Illinois attempted to ameliorate specific learning deficits in disadvantaged 3-year-olds. By age 4 the experimental group gained 19.7 IQ points over the control group. A small sample size and lack of longer-term follow-up preclude strong interpretations of these findings.

According to Rosenthal and Jacobson’s famous Pygmalion experiment, low expectations by teachers explain why disadvantaged children perform relatively poorly on IQ tests. They tested the effect of manipulating teachers’ expectations on the IQ development of randomly chosen students, and found that it significantly boosted IQs. Jensen noted that there were various deficiencies in the design of the experiment, and in how the results have been analyzed, and suggested that it be “replicated under better conditions before any conclusions from the study be taken seriously or used as a basis for educational policy.” [Note]

Jensen’s tentative overall conclusion regarding attempts to increase IQ was that the payoff from preschool and compensatory education programs is small. He thought that improving scholastic performance was a more feasible goal and one that should be emphasized in lieu of attempts to boost IQ. Educators should also assess gains using tests of specific skills rather than IQ. Jensen thought that raising general intelligence was more in the province of the biological sciences than psychology and education. He also thought that the goal of making disadvantaged children indistinguishable from middle-class children was unrealistic.

1.20. Level I and II abilities

Jensen postulated the existence of two types of learning ability: Associative learning ability (Level I), and cognitive or conceptual learning and problem-solving ability (Level II). Level I abilities involve relatively little transformation of the input, and there is a high correspondence between the stimulus input and the response output. Digit memory, serial rote learning, and recall of visually or verbally presented materials are some tasks that are thought to tap into Level I abilities. Level II abilities, in contrast, involve transformation and elaboration of the stimulus input to construct a response. Tests with a high g loading and a low cultural loading, such as Raven’s matrices, tap into Level II abilities.

The significance of the Level I/Level II distinction is that while there are large racial and SES differences in Level II abilities (favoring white and middle- and upper-class individuals), group differences in Level I abilities are small. Jensen therefore believed that the distinction was of “great potential importance to the education of many of the children called disadvantaged.” Because traditional classroom instruction was principally developed to be compatible with the ability patterns of middle-class students, schools place a greater emphasis on cognitive learning than associative learning. To maximize the potential of students from different genetic and cultural backgrounds, associative learning should be given a greater role in teaching.

While Jensen thought that more research was needed to fully substantiate the Level I/Level II taxonomy [Note] , he nevertheless regarded it as a highly promising way of thinking about learning, and closed the article by proposing that schooling should not be uniform but rather reflect the diversity of human abilities:

If diversity of mental abilities, as of most other human characteristics, is a basic fact of nature, as the evidence indicates, and if the ideal of universal education is to be successfully pursued, it seems a reasonable conclusion that schools and society must provide a range and diversity of educational methods, programs, and goals, and of occupational opportunities, just as wide as the range of human abilities. Accordingly, the ideal of equality of educational opportunity should not be interpreted as uniformity of facilities, instructional techniques, and educational aims for all children. Diversity rather than uniformity of approaches and aims would seem to be the key to making education rewarding for children of different patterns of ability. The reality of individual differences thus need not mean educational rewards for some children and frustration and defeat for others.

1.21. In a nutshell

Much of Jensen’s article was an explication of basic theories and findings from psychometrics and quantitative genetics. Given that the foundational ideas and many excellent empirical studies in those fields date to the first half of the 20th century, Jensen’s article was built on a firm foundation, and there are many things in the article that can still be read with profit. His critical comments on what can and cannot be learned from experiments seeking to boost intelligence and educational achievement remain germane, too.

Jensen’s answer to the two-part question posed in the title of his paper was that there appears to be not much that can be done to boost IQ, but that scholastic achievement is much more amenable to intervention. The latter conviction stemmed from the fact that, firstly, much more than IQ is involved in school success, and, secondly, that those non-IQ things are either reasonably evenly distributed across races and social classes (Level I abilities), or strongly influenced by the shared environment and therefore presumably more modifiable. With the benefit of hindsight, both of the claims about school achievement seem overly optimistic. As detailed in the notes to the synopsis, Jensen had a misplaced faith in the importance of what he called Level I abilities, and his estimate of the contribution of the shared environment to academic achievement appears much too high in light of later research.

As Johnson (2012) pointed out much later, Jensen’s preparation for his article was so thorough that he anticipated just about all the criticisms that were to be presented against his thesis. The attacks on the article rarely came from perspectives that he had failed to consider. It was more a question of how much importance and credibility Jensen versus his critics attributed to a given viewpoint. Nevertheless, much less was known about intelligence and differences between races in those days, and the article was only a starting point for Jensen’s later work.

2. Updating Jensen’s model of race and IQ

2.1. The development of Jensen’s program

It was a stroke of good fortune for race realism as a paradigm and research program that Arthur Jensen turned his considerable talents to questions of intelligence and group differences. Looking at his early publications from the 1950s through the mid-1960s, it was not at all obvious that he was to have a defining influence on the study of intelligence and race. Early on, he published little or nothing on these topics, and his research was focused on the experimental study of human learning, and clinical and personality psychology. His psychological education at Berkeley and Columbia had been behaviorist and psychoanalytic in theoretical orientation, and it was only through self-study and personal contacts with individual differences researchers and geneticists that he was eventually able to escape those then fashionable intellectual dead-ends. [Note] He pointed to his sojourn in Hans Eysenck’s lab in London as a turning point in his career, reflecting on its importance in the following manner decades later:

I emphasize my postdoctoral work with Eysenck, because I believe it planted the seeds of virtually everything I have done since then. It put me on the path that I have followed, in one way or another, for all of my later research. Although each of the many subsequent byways could not have been anticipated, they all led more or less consistently in one general direction–what came to be known as the London School of differential psychology, originated by Galton and with Spearman, Burt, and Eysenck successively as its leading exponents. (I knew personally only Eysenck and Burt.) The London School is not really a school or even a doctrine or a theory. Rather, it is a general view of psychology as a natural science and as essentially a branch of biology. Its central concern is variability in human behavior. It is Darwinian in that it views both interspecies variation and an important part of intraspecies variation (both individual and group differences) in certain classes of behavior as products of the evolutionary process. It is behavior-genetic in that the evolutionary process depends upon genetic variation and selection, and the neural basis of behavioral capacities is subject to these evolutionary mechanisms the same as other physical characteristics. It is quantitative in that it emphasizes the objective measurement and taxonomy of behavior and the operational definition of latent traits or hypothetical constructs. It is analytical in that it subjects quantitative data to mathematical formulation and statistical inference. It is experimental in that it typically obtains measurements, both behavioral and physiological, under specifically defined and controlled conditions. It is reductionist in that it aims theoretically to explain complex phenomena in terms of simpler, more elemental processes. 
It is monistic (as opposed to dualistic) in that it neither posits nor seeks any explanatory principle that does not consist of strictly physical processes: it views complex psychological phenomena as emerging solely from interactions among more elemental neurophysiological processes and their past and present interactions with environmental conditions. –Jensen, 1998a

In an early article reviewing theories of personality, such as they stood in the 1950s, Jensen pointed out that “theories in psychology are seldom disproved; they just fade away” (Jensen, 1958). This ephemerality and non-incrementalism make much of research in psychology frivolous. In contrast to that, through Jensen’s work race realism became anchored in some of the strongest, most permanent ideas in the social and behavioral sciences, ones that have not shown signs of fading away over the last 100 or more years.

What started as a provisional hypothesis about race and IQ in the 1969 paper grew, over decades, into a full-blown research program. Partly to fill in lacunae in the research literature and partly to respond to criticisms of his arguments, Jensen ended up studying almost every aspect of the problem with characteristic thoroughness. An early outline of this research program is presented in his 1973 book Educability and Group Differences. This book, which is still worth reading, is an update and extension of the 1969 article, and is both more tightly argued and more confident and expansive in its outlook. It contains at least an inkling of everything Jensen was to pursue in his research until his death in 2012.

The next milestone among his publications was Bias in Mental Testing, a 1980 tome that was instrumental in putting to rest the once popular conjecture that test bias is an important explanation of the black-white IQ gap. In 1985, Jensen published “The nature of the black–white difference on various psychometric tests: Spearman’s hypothesis”, a Behavioral and Brain Sciences target article where he laid out the case that the general factor of intelligence, or g, was the main locus of black-white IQ differences (Jensen, 1985). As I will outline later, Spearman’s hypothesis is important because of the specificity that it confers to the IQ gap.

The elaboration of the meaning and significance of the g construct became a great preoccupation of Jensen’s career. While the question of the structure of cognitive ability is logically separate from the question of race differences, Jensen’s monistic view of differences within and between groups meant that his research interests fed fruitfully into each other. He formulated his monism in this way:

There is fundamentally, in my opinion, no difference, psychologically and genetically, between individual differences and group differences. Individual differences often simply get tabulated so as to show up as group differences—between schools in different neighborhoods, between different racial groups, between cities and regions. They then become a political and ideological, not just a psychological, matter. –Jensen (1973)

According to Spearman’s hypothesis, the nature of g and the nature of race differences are intertwined. In 1998, Jensen published his magnum opus, The g Factor: The Science of Mental Ability. This book remains the go-to source for anyone wishing to understand cognitive ability. The book, especially its chapters 11 and 12, contains Jensen’s fullest statement of his research program on race differences in IQ. He wrote about what he termed the default hypothesis in this way (p. 444):

The default hypothesis states that human individual differences and population differences in heritable behavioral capacities, as products of the evolutionary process in the distant past, are essentially composed of the same stuff, so to speak, controlled by differences in allele frequencies, and that differences in allele frequencies between populations exist for all heritable characteristics, physical or behavioral, in which we find individual differences within populations. With respect to the brain and its heritable behavioral correlates, the default hypothesis holds that individual differences and population differences do not result from differences in the brain’s basic structural operating mechanisms per se, but result entirely from other aspects of cerebral physiology that modify the sensitivity, efficiency, and effectiveness of the basic information processes that mediate the individual’s responses to certain aspects of the environment. […] The population differences reflect differences in allele frequencies of the same genes that cause individual differences. Population differences also reflect environmental effects, as do individual differences, and these may differ in frequency between populations, as do allele frequencies.

Jensen denied that there are essential, qualitative differences in the cognitive abilities of blacks and whites. Rather, he stated that the differences are accidental (in the philosophical sense) and quantitative. Smart blacks and smart whites and stupid blacks and stupid whites are smart and stupid in the same ways, and black-white differences are of the same nature as the differences between smart and stupid subgroups of whites or blacks. The between-race differences are due to the same causes, genetic and environmental, that operate within each race. [Note]

2.2. Modern restatement of Jensen’s model

Using CFA terminology (for definitions, see e.g., Brown, 2014), Jensen’s default model, or my interpretation thereof, has the following properties:

A latent, reflective g factor is the largest source of variance and covariance in intellectual tasks among individuals.

The black-white IQ gap is around 1 standard deviation, favoring whites, and is mainly (>>50 percent) a g factor gap.

There are black-white differences in non-g factors and test specificities as well, but these are minor compared to the g gap and sometimes directionally favor blacks.

“Black” and “white” refer to self-identified groups.

The g factor can be measured without bias in blacks and whites using commonly administered IQ tests. In other words, strict measurement invariance with respect to race is usually attainable.

Individual differences in IQ in both blacks and whites are predominantly (up to 80% in adults) due to additive genetic influences.

Differences between whites and blacks in IQ scores are mainly (>50 percent) due to differences in the frequencies of alleles influencing g.

The same alleles influence both within-race and between-race differences.

Some (<50 percent) of the IQ gap may be due to non-genetic influences, including microenvironmental effects (e.g., lead poisoning) and indirect genetic effects (“genetic nurture”), especially when IQ is measured in childhood.

Sociological and social-psychological causes, such as discrimination, have at best a minor effect on the gap.

The model properly concerns the black-white IQ gap in the United States only, but to the extent that white and black Americans can be considered as representative samples of the indigenous populations of Europe and sub-Saharan Africa, the model has much to say about the rest of the world as well.

This list is, firstly, a general conceptual representation of the default model, and all empirical analyses of the black-white gap can be thought of as tests of this model, even if many of the listed properties are not necessarily explicitly considered in such analyses. Secondly, it can be considered as a description of a single structural equation model (SEM) that at least potentially incorporates all the listed properties in a single analysis. I will discuss the possible identifiability (i.e., whether its parameters can be estimated from data) of such a model below.

Theoretical impetus for this model derives, besides Jensen’s work, especially from an important paper by Lubke et al. (2003) where it is argued that psychometric measurement invariance has strong causal implications in the study of group differences: when measurement invariance holds, the factors that explain differences within groups must also explain them between groups; the sources of group differences are the same as the sources of individual differences within each group. Given that strict measurement invariance for IQ appears to usually hold between black and white Americans (e.g., Dolan, 2000; Dolan & Hamaker, 2001; Lubke et al., 2003; Trundt, 2013; Frisby & Beaujean, 2015; Lasker et al., 2019), the multigroup measurement invariance model can be seen as a phenotypic manifestation of Jensen’s default hypothesis.

Given the causal implications of the measurement invariance model, a study finding that invariantly measured latent factors and g in particular are highly heritable in both blacks and whites would shift the weight of evidence, buttressing the default model. It would not, however, be a direct test of the model. To directly investigate the etiology of race differences, we need to combine psychometric models with biometric ones. Most biometric models deal only with variances and covariances (or correlations) among observed variables, while the means of the variables, and thus group differences, are ignored. However, as shown in Dolan et al. (1992), it is possible to incorporate means in biometric models, for example in the classical twin model.

A practical example of biometric multigroup modeling is Rowe and Cleveland’s (1996) study of academic achievement in a sample of black and white children. This study—which found evidence of a genetic contribution to black-white differences—is an important proof of concept, but the fact that the sample used was small and unrepresentative, and consisted of pairs of full and half siblings (rather than MZ and DZ twin pairs which can be analyzed in a more powerful and better understood framework) limits its credibility. Another limitation is that the study was based on observed test scores, making it vulnerable to the criticism that the results simply recapitulate the biases of the tests used.

2.2.1. The model

Black-white IQ differences can be depicted as an SEM model that incorporates (most of) the properties enumerated at the beginning of the previous section. While the biometric part of the model appears to not be a practical tool at the moment, I will discuss its properties because it is of theoretical interest and because I think that it holds much promise and could be further developed.

The model can be divided into psychometric and biometric stages. The first, psychometric stage would look like this:

To clarify my notation, circles are latent variables to be estimated and rectangles are observed variables (tests), while the triangle denotes a mean difference between whites and blacks (the “1” in the triangle is simply a notational convention). The parameters of the model are indicated with letters and numbers on the paths that connect the variables. The arrowheads show the direction of causality for each association between the variables in the model. For notational simplicity, it is assumed that the variances of all variables are unity.

There are five tests in the model, and their covariances are accounted for by the latent g factor. The residuals of the tests (e₁–e₅) not explained by g represent measurement error and variance specific only to a particular test. The model is formally a multigroup confirmatory factor model, with the white submodel on the left and the black submodel on the right.

The first stage of the model can also be called the strict measurement invariance model, because its purpose is to establish that strict invariance holds between whites and blacks. An analysis of measurement invariance involves establishing that the following four conditions are met in a multigroup confirmatory factor model (note that each condition is a precondition for the next one on the list):

Configural invariance: The number of common factors is the same across groups, and the same indicators (tests) load on the same factors in all groups.

Metric invariance: The factor loadings are the same across groups.

Scalar invariance: The intercepts of the indicators are equal across groups, i.e., mean differences on indicators between groups are consistent with the size of the factor loadings and can be attributed to the latent common factor(s).

Strict invariance: Variances of the residuals are equal across groups.

If these conditions are met in a sequential testing scheme where various indices of fit can be used, differences between groups can be attributed to the same latent factor (or factors) that causes differences within groups. See Lubke et al. (2003) for more discussion of measurement invariance and group differences. I have discussed the topic here, too.

It can be seen that exactly the same variables and paths are included in the black and white submodels. This indicates that configural invariance is expected to be true. The factor loadings a–e are denoted with the same letters in blacks and whites, which means that they are constrained to be equal across races. This corresponds to metric invariance.

The only difference between the black and white models is in the values on the paths from the triangle to the g factors. In whites, this value is fixed at 0, while in blacks this parameter, Δ, is estimated in relation to the white value of 0. Blacks and whites are therefore allowed to differ in their mean value on g, and the mean difference is estimated as a contrast between the races, rather than as a difference on some absolute scale. If all black-white differences in tests #1–#5 can be explained by differences in the mean value of g (while constraining factor loadings to equality between races, i.e., metric invariance), scalar invariance holds. Finally, it can be seen from the graph that the loadings of the tests on their unique residuals (s₁–s₅) must be constrained to equality between blacks and whites. Given that the variances of the residuals are unity, this guarantees that strict measurement invariance holds. It should be emphasized that the means of the residuals of all tests are zero. Only mean differences in g are permitted to explain mean differences in test scores between races. (The model can also accommodate race differences in the variances of latent abilities, but that possibility is ignored here.)

If the first stage of the model fits the data well, we can conclude that all individuals who have the same amount of latent ability have the same scores, on average, on all five tests, regardless of their racial identity. Therefore, the tests are unbiased and all black-white differences in them can be attributed to g.
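The constraints described above can be made concrete with a small numerical sketch. The loadings and the latent mean contrast below are hypothetical values chosen purely for illustration, not estimates from any dataset, and the group labels are generic. Assuming unit variances as in the text, the sketch builds the model-implied covariance matrix and test means, and shows that under strict invariance each test's mean gap is simply its g loading times the latent contrast Δ.

```python
import math

# Hypothetical standardized loadings of tests #1-#5 on g (paths a-e).
LOADINGS = [0.8, 0.7, 0.6, 0.5, 0.4]
# Hypothetical latent mean contrast on g; the reference group's mean is fixed at 0.
DELTA = -1.0


def residual_loadings(loadings):
    """Paths s1-s5, chosen so that each test has unit variance: a_i^2 + s_i^2 = 1."""
    return [math.sqrt(1.0 - a * a) for a in loadings]


def implied_covariance(loadings, residuals):
    """Model-implied covariance a a' + diag(s^2), with var(g) = var(e_i) = 1."""
    n = len(loadings)
    return [
        [loadings[i] * loadings[j] + (residuals[i] ** 2 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def implied_means(loadings, latent_mean):
    """Test means when intercepts and residual means are zero: mu_i = a_i * latent mean."""
    return [a * latent_mean for a in loadings]


s = residual_loadings(LOADINGS)
# The same loadings and residual paths are used for both groups, so the
# implied covariance matrix is shared; only the latent mean differs.
sigma = implied_covariance(LOADINGS, s)
mu_reference = implied_means(LOADINGS, 0.0)
mu_comparison = implied_means(LOADINGS, DELTA)
gaps = [c - r for c, r in zip(mu_comparison, mu_reference)]

for a, gap in zip(LOADINGS, gaps):
    print(f"loading={a:.2f}  implied gap={gap:.2f}")
```

In this setup the ratio of each implied gap to its loading is constant, which is the pattern that scalar invariance requires of the observed test means.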

While the end point of measurement invariance investigations is normally the discovery of greater or lesser invariance and perhaps the estimation of latent mean differences, the current model would go further by examining why the two races differ on g. In particular, biometric modeling enables the causal attribution of mean differences to genetic causes, shared (familial) environmental causes, and other, non-shared causes (accidents and other unique experiences). Other potential sources of influence, such as gene-environment correlations and interactions, could possibly also be included, but they are ignored here. The proportions of the IQ gap that come from biometric sources could therefore be estimated.

The second, biometric stage of the model can be depicted as this graph:

The invariant parameter values (a–e and s₁–s₅) that were estimated in the first stage are reused in the second stage, i.e., the parameters are fixed to previously obtained values. Some other parameters are fixed to zero. The parameter values marked in red are estimated in the second stage. The biometric variables included are A, C, and E, which correspond to genetic effects, shared environmental effects, and residual effects, respectively. The graph does not specify how the biometric components are estimated, but the classical twin model would be an obvious choice.

However, the model depicted above would not in practice be identified. To estimate the contribution of biometric components to group differences, there must be at least one more phenotypic variable than the number of biometric components included. A solution to this problem would be to remove the g factor from the model and estimate genetic and environmental influences on the five tests, as in this graph:

The first stage established that any differences between whites and blacks must be due to g. The three biometric variables explain 100 percent of the g variance, and, considered together, are perfectly isomorphic with g. The values on the paths from A, C, and E to tests #1–#5 (i.e., α₁–α₅, β₁–β₅, and γ₁–γ₅) are partial regression weights. In a standardized model, the squared values of these regression weights are equivalent to the proportions of the test’s variance that genetic, shared environmental, and residual effects on g explain. For example, if the estimate for the value of the path α₁ is .80, the proportion of the test’s variance that genetic effects on g explain is .80² = .64. This is not necessarily equal to the full heritability of the test because its residual can also be genetically influenced. However, the first stage of the model guarantees that there are no black-white mean differences in the residuals of any of the tests, so whatever genetic or environmental effects there are on the residuals can be ignored.
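The arithmetic in the example can be spelled out in a few lines. In this sketch only the value .80 for the first genetic path comes from the text; the values for the shared environmental and residual paths are hypothetical and are included only to show how the squared paths add up in a standardized model.

```python
def variance_share(path):
    """Proportion of a standardized test's variance explained through one standardized path."""
    return path ** 2


alpha_1 = 0.80   # genetic path for test #1, the value used in the text
beta_1 = 0.30    # shared environmental path (hypothetical)
gamma_1 = 0.20   # residual-effects path (hypothetical)

shares = {
    "A": variance_share(alpha_1),
    "C": variance_share(beta_1),
    "E": variance_share(gamma_1),
}
# Variance of test #1 that runs through g, and what is left for the test's own residual.
via_g = sum(shares.values())
test_residual = 1.0 - via_g

for name, share in shares.items():
    print(f"{name}: {share:.2f}")
print(f"via g: {via_g:.2f}, test-specific residual: {test_residual:.2f}")
```

The leftover test-specific variance is the part that, as noted above, may itself be genetically influenced without bearing on the group mean difference.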

The biometric model would in practice be estimated in two steps. At first, racial mean differences in the tests are ignored, and the values of the biometric parameters for each test are estimated. It should be noted that the model requires that the sizes of the biometric parameters be the same in blacks and whites (or at least that the ratios of the unstandardized biometric variances are the same across races; see Dolan et al., 1992). This may seem like a highly limiting constraint, but it should be noted that the heritabilities and environmentalities of IQ are generally quite comparable between whites and blacks (Pesta et al., 2020). Furthermore, the determinants of a latent, race-invariant g factor would be expected to be particularly similar across races. In any case, this is something that can be tested.

After the best values for genetic, shared environmental, and residual paths for each test have been obtained through model comparison (e.g., ACE vs. AE vs. ADE models), the means are added to the model and the model is fitted again. The triangle connected to the biometric components in the graph specifies that whites and blacks may differ in the mean values of any of the three biometric components influencing g; to facilitate model identification, white means are fixed to 0, and black means are estimated in relation to the white means. This is the crucial test of whether black-white differences in the tests and thus in g can be reproduced from mean differences in the latent biometric variables. If the fit of the model does not deteriorate when means are added to it, the genetic and environmental factors that account for individual differences also account for black-white differences.

The model will give estimates of mean black-white differences on the three biometric factors. Thus, it is, in principle, capable of providing exact answers to questions at the center of the race and IQ controversy. However, it appears to be the case that if a latent, reflective g does underlie the black-white IQ gap, then the biometric part of the model delineated will be empirically unidentified. This is because the genetic and environmental loadings of the tests on the biometric factors will not provide non-redundant information; given that g is the only source of common variation in the tests, the loadings of the different tests will be the same up to a multiplicative constant.
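The identification problem can be illustrated numerically. The sketch below assumes a common pathway structure in which each test's loading on A, C, and E equals its g loading times the path from that component to g. Under that assumption, two very different sets of biometric mean differences imply exactly the same test gaps, so the data cannot tell them apart. All numbers and names (G_LOADINGS, PATHS_TO_G, implied_gaps, is_rank_at_most_one) are hypothetical.

```python
import math

# Hypothetical g loadings of tests #1-#5; not taken from any dataset.
G_LOADINGS = [0.8, 0.7, 0.6, 0.5, 0.4]
# Hypothetical standardized paths from A, C, and E to g (their squares sum to 1).
PATH_A, PATH_C = 0.8, 0.3
PATH_E = math.sqrt(1.0 - PATH_A ** 2 - PATH_C ** 2)
PATHS_TO_G = (PATH_A, PATH_C, PATH_E)

# With g as the only common factor, each test's loading on A, C, E is its
# g loading times the corresponding path to g, so the columns are proportional.
LOADINGS_ACE = [[lam * p for p in PATHS_TO_G] for lam in G_LOADINGS]


def implied_gaps(loadings_ace, component_means):
    """Test mean gaps implied by mean differences on A, C, and E."""
    return [sum(l * m for l, m in zip(row, component_means)) for row in loadings_ace]


def is_rank_at_most_one(matrix, tol=1e-12):
    """True if every 2x2 minor vanishes, i.e. the matrix has rank at most one."""
    rows, cols = len(matrix), len(matrix[0])
    for i in range(rows):
        for k in range(i + 1, rows):
            for j in range(cols):
                for m in range(j + 1, cols):
                    minor = matrix[i][j] * matrix[k][m] - matrix[i][m] * matrix[k][j]
                    if abs(minor) > tol:
                        return False
    return True


# Two hypothetical scenarios with the same weighted sum of component means:
# the whole contrast placed on A, or the whole contrast placed on C.
all_in_a = (-1.0, 0.0, 0.0)
all_in_c = (0.0, -PATH_A / PATH_C, 0.0)

gaps_a = implied_gaps(LOADINGS_ACE, all_in_a)
gaps_c = implied_gaps(LOADINGS_ACE, all_in_c)

print("loading matrix has rank <= 1:", is_rank_at_most_one(LOADINGS_ACE))
print("gaps, contrast on A:", [round(g, 3) for g in gaps_a])
print("gaps, contrast on C:", [round(g, 3) for g in gaps_c])
```

Because the five tests supply only one independent piece of information about the means here, only a weighted sum of the three component mean differences can be recovered, not the components themselves.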

There are several other preconditions that must be met for the model to be usable. First, the tests must have a joint normal distribution, at least if maximum likelihood with its more moderate sample size requirements is used for parameter estimation. Second, as already mentioned, there must be at least one more test available than the number of biometric components included: if A, C, and E are estimated, there must be at least four tests (however, as pointed out in Dolan et al., 1992, C and E can be collapsed into a single environmental variable that is correlated between twins, reducing the number of tests needed by one in the classical twin model). Third and perhaps more problematically, the tests must be congeneric, i.e., a single common factor must account for all of their covariances in both blacks and whites.

The demands that this model puts on the data greatly limit its usability. However, given the flexibility of the SEM framework, it seems likely that there are ways to relax some of these requirements. For example, it might be possible to bypass the congenericity requirement by using a bifactor model. This would involve specifying an invariant bifactor model of black-white differences in the first stage, as in Frisby & Beaujean (2015). In the second stage, the loadings on the non-g factors and factor means from the first stage would be retained, while the g factor would be replaced with biometric factors. It would seem that in this setting it might be possible to biometrically decompose mean differences in a g factor that is derived from non-congeneric indicators.

However, regardless of the congenericity requirement, it appears that the nature of g would make the biometric model of the means empirically unidentified, as discussed above. In technical terms, this is because the g model is a common pathway model while the biometric mean model is an independent pathway model (for definitions, see Franic et al., 2013). Would it nevertheless be possible to make the model identified by adding some kinds of information to it, in the same way that certain identifying assumptions of the classical twin model can be relaxed if additional information is available (cf. Derks et al., 2006 and Dolan et al., 2019)? This is one interesting problem for hereditarians to pursue. Meanwhile, however, other approaches, such as the admixture method discussed later, will have to be used to estimate the contribution of genetic differences to the black-white gap.

2.3. Possible counterarguments to the default model

2.3.1. Race as a unit of analysis

It is often suggested that because races are not “natural kinds”, using them as units of genetic analysis is mistaken. After all, one could find reasonable justifications for lumping or splitting the genetic diversity of humanity differently from how it is done in, say, American discussions of race. The answer to this argument is, firstly, that biological taxonomy is not about “natural kinds.” Jerry Coyne pointed out that it is easy to justify human racial differentiation if one uses the usual biological criteria, rather than the non-biological criteria of “naturalness”:

What are races?

In my own field of evolutionary biology, races of animals (also called “subspecies” or “ecotypes”) are morphologically distinguishable populations that live in allopatry (i.e. are geographically separated). There is no firm criterion on how much morphological difference it takes to delimit a race. Races of mice, for example, are described solely on the basis of difference in coat color, which could involve only one or two genes.

Under that criterion, are there human races?

Yes. As we all know, there are morphologically different groups of people who live in different areas, though those differences are blurring due to recent innovations in transportation that have led to more admixture between human groups.

How many human races are there?

That’s pretty much unanswerable, because human variation is nested in groups, for their ancestry, which is based on evolutionary differences, is nested in groups. So, for example, one could delimit “Caucasians” as a race, but within that group there are genetically different and morphologically different subgroups, including Finns, southern Europeans, Bedouins, and the like. The number of human races delimited by biologists has ranged from three to over thirty.

I like the pragmatic perspective advocated in Fuerst (2015), according to which a biologically valid unit of analysis is one that is deeply enmeshed in some research program. If you want to argue that race is not a valid variable to use in such research, abstract conceptual arguments have no bite; you have to show that the way race is actually used in a research program is invalid. In hereditarian research on the black-white IQ gap the only thing that really needs to be true about race is that self-identified whites differ in terms of many allele frequencies from self-identified blacks, and this is something that is trivial to demonstrate these days (graph from Kirkegaard, 2019):

Given that “everything is heritable” (Polderman et al., 2015), almost any socially salient group will differ from others due to inherited causes in at least some ways. That is why I have never found the “race is not real” argument even remotely persuasive. Much less biologically “real” categories than race are perfectly suitable for genetic analysis.

Another pragmatic reason for using race in genetic analyses is that race is one of the central explanatory variables used by non-hereditarians in American social science. You cannot foreground race in the analysis of all social problems and expect others to completely ignore its obvious genetic correlates.

2.3.2. Are hereditarians overly egalitarian?

A possible, rather ironical counterargument to the default model follows from the fact that many genetic variants causally associated with IQ are rare and may be “private” to specific populations. Whereas the hereditarian model posits that race differences in intelligence are quantitative (all races have the same abilities, with substantial differences only in central tendencies and, possibly, variances), differences in the genetic architecture of cognition open up the possibility of deeper, essential differences. For example, the analysis by Hill et al. (2018) suggests that close to half of the genetic contribution to IQ variation is due to single nucleotide polymorphisms with a minor allele frequency in the range of 0.001–0.01. Many or most variants that are so rare may be private to specific populations. If the genetic basis of intelligence differences is substantially different in different races, is it reasonable to assume abilities to be at all comparable between races? Have hereditarians underestimated the nature and extent of racial differences?

I think that the challenge of this kind of causal heterogeneity is more apparent than real. Genetic differences between races are of interest to behavioral science because of their downstream, phenotypic effects—on cognitive ability in this case. Downstream effects are a psychological and psychometric rather than genetic issue, and we already know that, to the best of our knowledge, cognitive ability does not differ qualitatively between whites and blacks. The fact that a certain causal genetic locus is polymorphic in one population while the same locus is homozygous in all other populations does not threaten the validity of population comparisons for the phenotype in question. From a causal perspective, such loci are no different from loci that are fixed versus polymorphic among different subpopulations of the same population. For example, if the average additive effect of some allele on IQ is x versus 0 for the other allele, then the effect of the (biallelic) locus in members of the “polymorphic” population is either 0, x, or 2x while in all members of other, “fixed” populations the effect is 0 or 2x, depending on which allele is fixed.
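The locus-level example can be written out as a short sketch. The per-copy effect x comes from the text; the allele frequency and the use of Hardy-Weinberg genotype proportions are assumptions added here for illustration, and the function names are hypothetical. The sketch assumes x is nonzero.

```python
def genotype_effects(allele_freq, x):
    """Map each possible locus effect (0, x, 2x) to its probability.

    Assumes Hardy-Weinberg proportions (an illustrative assumption) and an
    additive effect of x per copy of the allele, with x != 0.
    Effects with zero probability are dropped.
    """
    p = allele_freq
    q = 1.0 - p
    distribution = {0.0: q * q, x: 2.0 * p * q, 2.0 * x: p * p}
    return {effect: prob for effect, prob in distribution.items() if prob > 0.0}


def mean_effect(distribution):
    """Average effect of the locus in a population."""
    return sum(effect * prob for effect, prob in distribution.items())


x = 0.5  # hypothetical per-copy effect, in arbitrary units

polymorphic = genotype_effects(0.3, x)   # both alleles present
fixed_for = genotype_effects(1.0, x)     # increasing allele fixed
fixed_against = genotype_effects(0.0, x) # other allele fixed

print(sorted(polymorphic))    # possible effects in the polymorphic population
print(sorted(fixed_for))      # single possible effect when the allele is fixed
print(sorted(fixed_against))  # single possible effect when it is absent
print(mean_effect(polymorphic), mean_effect(fixed_for), mean_effect(fixed_against))
```

In every case the locus contributes to the phenotype on the same scale, which is why a population comparison of the phenotype remains meaningful whether or not the locus varies within each population.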

I think that causal heterogeneity is true at least to some extent, but that it does not pose a conceptual problem for the default model. It does, however, make genetic variant discovery harder. The ability to explain all heritable group differences in terms of individual alleles is in the distant future. Therefore, methods such as twin studies and admixture analyses that can indirectly estimate the total effect of all genetic variants remain important.

2.3.4. Modifiability of highly heritable characteristics

It is often said that high heritability does not imply that the phenotype is not modifiable. This is true in a purely logical sense: genes work through physical mechanisms in which we can in principle intervene just like in any other physical mechanisms. However, something being possible in principle is quite different from it being actually possible here and now. We know that a specific dietary intervention helps allay the negative cognitive effects of phenylketonuria, but that is only the case because we have some mechanistic understanding of this rare Mendelian condition. In contrast, we certainly do not understand how the thousands of genetic polymorphisms that are involved in the normal, quantitative variation in IQ exert their influence. Understanding and changing the functioning of up to thousands of polymorphisms, each with a tiny effect on the phenotype, is a very different undertaking than manipulating a single large-effect Mendelian trait—and, in fact, most Mendelian disorders are much less intelligible and treatable than phenylketonuria.

In practice, most attempts to boost intelligence are educational and cognitive-psychological in nature. They do not rely on any molecular-level understanding of the mechanisms of intelligence. Instead, they are based on various hunches about how intelligence might possibly work. The track record of these efforts in producing permanent ability gains is predictably poor (e.g., Protzko, 2015; Sala & Gobet, 2017). This has not stopped people from discounting heritability as uninteresting or as lacking in policy relevance. In effect, they have argued that it is sufficient that we can mentally conceive of environmental changes that would equalize outcomes. Sesardic (2005, pp. 84–85) noted that such arguments represent the “curious triumph of the possible over the actual.” I think we should dismiss the modifiable-in-principle argument as nonsense, just as we dismiss people promising miracle cures for currently incurable diseases.

Jensen suggested in his 1969 paper that IQ enhancement is more realistically within the purview of biology than of psychology or education. From a hereditarian perspective, it is certainly possible to equalize white and black IQ distributions if one can resort to eugenics. There is plenty of IQ-linked genetic variation in blacks, and instituting policies that bias black reproduction so that high-IQ parents will have more children than low-IQ parents will inevitably lead to the closing of the black-white gap over time, assuming that white fertility remains less eugenic. In effect, this is what has happened in the case of some recently established black diaspora populations in Western countries. Because of immigrant selection based on educational credentials and financial resources, some expatriate populations (and their offspring) are substantially smarter, on average, than their coethnics in the old country. [Note]

2.3.5. Flynn effect: The revolution that fizzled out

In the last few decades, environmental or nurturist views have dominated social science, but this apparent victory of nurture over nature has been based much more often on skillful rhetoric than on any firm empirical findings or theoretical breakthroughs. In contrast, hereditarianism, while constantly facing strenuous (but mainly rhetorical) challenges, has gone from strength to strength as an empirical research program, especially when it comes to the individual differences paradigm of mainstream behavior genetics.

Nurturism has nevertheless achieved one apparently great triumph: the discovery that raw scores on IQ tests have rapidly in