Or nearly so. I was planning to publish this blog article on the 31st of December 2014. As you can see, I failed at this task and didn’t finish in time. Anyway, I wrote this article mainly because I am bothered that when people cite The Bell Curve, the typical opponent responds with a link to Wikipedia, specifically the part related to the “controversy” of The Bell Curve. It goes without saying that these persons did not read the books written in response to The Bell Curve. In fact, they have certainly read none of them. It is ridiculous to cite a book you didn’t read, but apparently it does not bother many people, as far as I can see.

For the book’s 20th anniversary, I found it appropriate to write a defense of it. Or more precisely, a critical comment on the critics. I have decided to read carefully one of these books I have access to, and from what I have read here and there, it is probably the best book ever written against The Bell Curve. I know that Richard Lynn (1999) has already written a review. But I wanted to go into the details. The title of the book I’m reviewing is:

Devlin, B. (1997). Intelligence, Genes and Success: Scientists Respond to the Bell Curve. Springer.

In fact, I read that book some time ago, but didn’t feel the need to read everything in detail. And I was unwilling to write a lengthy review. But I have changed my mind because of some nasty cowards.



Summary

Concerning Devlin’s book, the title is somewhat disconcerting. “Scientists Respond to the Bell Curve” suggests to me that they do not regard Herrnstein & Murray as serious scientists, and perhaps not as scientists at all. By the same token, they use an appeal to authority. I prefer the scientific way of discrediting an argument to this one.

Anyway, if I have to give a brief summary, I will say that I appreciate Carroll’s chapter, and also that of Glymour, even though I don’t like the way he expresses his ideas, using complicated and obscure terms. I also appreciate Daniels et al.’s chapter, even though there are approximations in what they say and in what they have done. Winship & Korenman’s chapter is not bad either. Finally, I am surprised by Belke’s chapter, because I expected it to be a bad and inaccurate summary, but it’s not. Generally, however, there are plenty of errors.

Now, concerning one of the main claims of the book, namely that the statistical methods employed by The Bell Curve are deeply flawed, I have some disagreement. The argument most often used is that the authors employed a weak measure of the environment, i.e., SES (a composite of two parental occupation variables, two parental education variables, and a family income variable). But the authors already answered this: the reason was that IQ can mediate the link between these environmental variables and the outcome variable. Murray even believed they could have controlled for too many confounders. In fact, the real problem with the analysis of Herrnstein & Murray is that they usually don’t include interaction effects, whether between SES and IQ, age and SES (or IQ), race and IQ, gender and IQ, etc. Instead, they simply add age and/or gender as single variable(s). Usually, they ignore the variable of race and focus on the white population. They could have improved their analysis, but given the large number of peer-reviewed articles I have read, I am left with the impression that a great number of social scientists would have done the analysis the same way Herrnstein & Murray did, i.e., without interaction effects. I am very confident when I say that interaction effects are rarely used in social science. If Herrnstein & Murray are bad scientists, I am afraid that there are many more bad scientists than Devlin et al. would believe.
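To show what is meant by an interaction effect, here is a minimal sketch in Python. It contrasts a model where IQ and SES enter as single variables with one that adds an IQ x SES product term. The logistic form, the coefficients and the function names are all assumptions chosen for illustration; none of them are taken from The Bell Curve or from Devlin's book.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not from any real analysis).
B0, B_IQ, B_SES, B_INT = -2.0, -0.8, -0.3, 0.2

def logit_main_effects(z_iq, z_ses):
    """Additive model: standardized IQ and SES enter as single variables."""
    return B0 + B_IQ * z_iq + B_SES * z_ses

def logit_with_interaction(z_iq, z_ses):
    """Same model plus an IQ x SES product (interaction) term."""
    return logit_main_effects(z_iq, z_ses) + B_INT * z_iq * z_ses

def probability(logit):
    """Convert a logit into a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

def iq_slope(model, z_ses):
    """Change in the logit for a one-SD increase in IQ at a given SES level."""
    return model(1.0, z_ses) - model(0.0, z_ses)

# Compare the IQ slope at low (-1 SD) and high (+1 SD) SES under each model.
for z_ses in (-1.0, 1.0):
    print(z_ses,
          round(iq_slope(logit_main_effects, z_ses), 2),
          round(iq_slope(logit_with_interaction, z_ses), 2))

# Predicted probability for an average person under the interaction model.
print(round(probability(logit_with_interaction(0.0, 0.0)), 3))
```

Without the product term, the IQ slope is forced to be the same at every SES level. With it, the slope is allowed to differ across SES levels, which is exactly what a main-effects-only analysis cannot detect.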

Ok. Enough talking. Let’s go…

Review

Chapter 1

Reexamining The Bell Curve

Stephen E. Fienberg and Daniel P. Resnick

Fienberg and Resnick (1997) present the authors. Murray was the author of Losing Ground, in which the argument was that the welfare system is costly and counterproductive and discourages people from working. Herrnstein was the author of I.Q. in the Meritocracy, in which the argument was that intelligence and social status have a genetic component. They then (p. 5) present the main arguments of The Bell Curve: high heritability of IQ, high predictivity of IQ, genetically mediated socioeconomic differences.

Fienberg and Resnick go on (pp. 6-8) to narrate the history of the eugenics movement. Karl Pearson and Francis Galton specialized in statistics, through which they tried to understand the laws of inheritance. Pearson and Galton remarked that the government did not pay enough attention to biological factors. They believed that training and education cannot create intelligence. Intelligence must be bred. Pearson was fully aware of the problem of inferring causation from correlation, so the statistical methods needed to be improved, and more sophisticated techniques were later employed. Ronald Aylmer Fisher, geneticist and statistician, was also an important figure in the eugenics movement. The statistical tools that underpin the studies of genetics and IQ originate from Fisher. Although brief, the remarks made by Fienberg and Resnick give the impression that none of these three scientists lacked intellectual integrity.

The account of how the paradigm shifted from nature to nurture is obscure. The shift seems to have occurred during the 1920s. And the reason advanced by Fienberg and Resnick (p. 12) is that scientific evidence had accumulated against the genetic theories. As I said, this is odd. Jensen (1973, 1998) reviewed many studies of this kind, and none of them were as old as the 1920s or earlier. I have no reason to believe these early studies carry more weight than the recent ones, especially if modern IQ tests are more reliable and reliability increases group differences (Jensen, 1980, pp. 383-384).

Chapter 2

A Synopsis of The Bell Curve

Terry W. Belke

Belke (1997) provides a short summary of The Bell Curve. As someone who has read the book several times, I can tell that Belke’s review is definitely a good one. He also cites all the pages he finds important, which is greatly appreciated. Here, I will describe how Belke summarizes The Bell Curve.

Chapter 1 tells us that, between 1900 and 1990, the probability of going to college increased dramatically for students in the upper half of the IQ distribution but decreased slightly for students in the lower half. Chapter 2 demonstrates that there is also occupational sorting by intelligence. High-IQ professions have grown tremendously since 1940, as has the proportion of individuals in the top decile of IQ in these professions. Chapter 3 tells us that the predictive power of IQ (especially general intelligence rather than specific skills) for job performance is high, and that IQ is more important than either education or age. But Belke could also have mentioned that the authors (1994, p. 77) did say that the predictivity of IQ increases with the tests’ g-loadings. Chapter 4 argues that the value of intelligence in the marketplace has increased, with wages in high-IQ occupations growing more rapidly than wages in low-IQ occupations. The more complex a society becomes, the more important IQ becomes. The prediction is a trend toward more class stratification. It is unfortunate that Belke did not mention that the authors said that heritability can change if the conditions producing variation change.

Chapters 5-12 present the statistical analyses of The Bell Curve on the NLSY79 data. Intelligence is more important than SES in predicting a wide range of social outcomes (poverty, schooling, parenting, welfare dependency, crime, civility and citizenship).

Chapter 13 relates to racial differences. The black-white difference in IQ amounts to 1.08 SD. The theories of cultural bias are untenable. Motivation is also irrelevant. When SES is partialled out, the gap is reduced by a third, but because SES has a genetic component, this method under-estimates the black-white difference. Black IQ increases with SES but does not converge to the white score (Belke is not accurate here, because the gap actually increases with SES). The NAEP reveals a narrowing of the black-white gap, although Belke did not say that the NAEP is not an IQ test. Belke could also have mentioned Herrnstein & Murray’s (1994, pp. 276-277) review of the literature on black-white IQ studies, which shows no secular narrowing. But he does say that the authors cautioned that genetic differences within groups are not necessarily generalizable to genetic differences between groups. He also covers the discussion of Spearman’s hypothesis: the higher the g-loading of a test, the larger the black-white gap. Belke also mentioned the authors’ emphasis on IQ malleability if the differences were genetic rather than environmental, and that they believe that the assumption that environmentally induced deficits are less hardwired and less real than genetically induced deficits is wrong (and so do I). Chapter 14 covers the research on racial differences in social outcomes, most of which are considerably reduced when IQ is held constant. Chapter 15 covers the dysgenics of IQ, with lower IQ people having more children (and at a younger age) than higher IQ people. The reason has to do with the fact that women wish to take advantage of career opportunities. Chapter 16 illustrates the prevalence of low IQ among people who suffer from social problems.

Chapter 17 covers the topic of IQ gain. Belke mentioned two successful experiments, one in Venezuela and another in coaching for the SAT, but curiously he doesn’t mention the authors’ skepticism about the robustness of these results. Then, Belke says that the authors believed that adoption could be more effective than schooling programs, whose effects often fade out. Chapter 18 talks about the stagnation of American education, which is related to the declining SAT scores among the most gifted students. The educational system has been dumbed down to meet the needs of average and below-average students. The SAT pool was shrinking, not expanding, which makes the common view that the SAT decline was due to the expansion of the SAT pool untenable. Chapter 19 touches on the subject of affirmative action in higher education. The racial difference between whites and blacks on the LSAT was -1.49 SD, and similar differences were reported for MCAT and GRE scores. This is way larger than the difference of 1 SD in IQ usually reported between blacks and whites. The authors suspect that a possible consequence of such a policy is the dropout of blacks who are aware of their own limited capacity to compete with smarter students. Chapter 20 treats affirmative action in the workplace. When blacks and whites are equated for IQ, blacks have been hired at higher rates since the 1960s, with the trend increasing into the 1980s. This concerns clerical, professional and technical jobs. The authors advocate that the goal of an affirmative action policy should be equality of opportunity rather than equality of outcome. Chapter 21 tells us about the possible scenarios associated with the expected further cognitive stratification in the future. Chapter 22 gives the authors’ recommendation, namely that a society must operate in such a manner as to allow individuals throughout the entire range of IQ to find a valued place. This could be done if the justice system started adopting simpler rules, which would make living a moral life easier for low IQ people. In the past, low IQ people were able to find a valued place, but not anymore in the contemporary world.

Chapter 3

Of Genes and IQ

Michael Daniels, Bernie Devlin, and Kathryn Roeder

Daniels et al. (1997) conduct a study showing that the heritability of IQ is upwardly biased because the usual figures do not remove non-additive genetic effects. The additive effects (narrow heritability) are what matter most. This study is the same as the Devlin et al. (1997) paper that has often been cited by environmentalists. They begin the chapter by saying that “In one eloquent volume, the authors serve up justification for that oft-heard refrain “the poor will always be poor.”” (p. 45). This is one thing I always find amusing among egalitarians. They can’t resist throwing their moralizing speech in the face of their opponents to make them appear as the bad guys. They continue by saying that “According to H&M, it’s all in the genes, and there is little that can be done about it. Indeed, if IQ and intelligence are highly heritable, H&M’s vision is plausible; if they are not highly heritable, their vision is only a phantasm.” (p. 46). They reiterate this when they claim that “It is narrow-sense heritability that is the critical quantity determining the likelihood of both of H&M’s nightmarish genetic visions: cognitive castes and dysgenics.” (p. 53). The first mistake is that H&M (1994, p. 106) never said it’s all in the genes, and the second mistake is that H&M (1994, pp. 313-315) said explicitly that things would not change if group differences were environmental rather than genetic. Belke’s chapter (1997, p. 29) mentioned that latter point as well.

Daniels et al. (1997, pp. 50-53) describe what additive and non-additive genetic effects are, and the importance of modeling the latter. Herrnstein & Murray (1994, pp. 105-108) accept a maximum value of 0.80 and a more plausible broad heritability estimate of 0.60. But Daniels et al. argue that the heritability is much lower if one considers only the narrow heritability. Their study focuses on the estimation of the shared early external environment (they call it preseparation environment or prenatal maternal effect). They expect such an effect to emerge because twins share the womb concurrently whereas siblings share the same womb serially. They argue that even if mothers may have similar personal habits from one pregnancy to another, the temporal separation between progeny ensures a diminished correlation of sibling IQ (p. 57). They explain in detail the necessity of evaluating non-additive (i.e., dominance) effects. They believe that only the additive genetic portion of the heritability has any predictive value, and that non-additive effects make it far more difficult to predict the outcome of a given mating based on knowledge of the phenotypes of the parents (pp. 52-53). For example, if the predicted IQ of a child of parents with IQs of 100 and 120 is 110, the IQ of the child might be far higher or lower than that of either parent if there were substantial interactions between genes.
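To give a rough idea of why the narrow heritability matters for prediction, here is a small sketch using the textbook regression of offspring on midparent, where the expected offspring value is the population mean plus the narrow heritability times the midparent deviation. This is my own illustration, not a calculation made by Daniels et al. The population mean of 100 and the function name are assumptions. The value of 0.34 is the additive component they report for their best-fitting model.

```python
def expected_offspring(parent_a, parent_b, h2_narrow, population_mean=100.0):
    """Textbook midparent regression: mean + h2 * (midparent - mean).

    population_mean=100 is an assumption made for this illustration.
    """
    midparent = (parent_a + parent_b) / 2
    return population_mean + h2_narrow * (midparent - population_mean)

# With a narrow heritability of 1, the prediction is simply the midparent value.
print(expected_offspring(100, 120, 1.0))

# With the additive component of 0.34, the prediction regresses toward the mean.
print(expected_offspring(100, 120, 0.34))
```

The lower the additive share, the more the prediction is pulled back toward the population mean, which is the sense in which only the additive portion has predictive value for a given mating.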

The expected correlations, which are determined by the degree of genetic relatedness, between children and midparent and among siblings (reared apart or together) and among dizygotic twins (opposite or same sex) are all 0.50. The observed correlation is 0.50 between children and midparent and 0.60 between DZ twins. And the observed correlations between siblings reared together and apart are 0.46 and 0.24. Given their non-additive model, it is expected that shared maternal effect will be substantial.

In their analysis, assortative mating was modeled (i.e., adjusted for). They compare models (e.g., III & IV) which attempted to model the early common environment (c²) as higher for twins than for siblings with models which constrain maternal effects to be zero (e.g., I & II). Models I & III, unlike models II and IV, assume c² to be equal for twin, sibling and parent-child correlations. The maternal (shared) effect seemed to be essential to achieve the best fit (according to the Bayes factor). In the best-fitting model (III), the total (broad) genetic effect was 0.48 and its additive and non-additive components were, respectively, 0.34 and 0.15. The maternal environment effects for twins and siblings were 0.20 and 0.05, figures which illustrate the extent of this non-additivity. The shared environment (c²) estimate was 0.17. Having reported these results, they claimed to have resolved the puzzle that Plomin & Loehlin (1989) were never able to resolve, and that is not even resolved today, as far as I know: why direct methods of estimating heritability consistently lead to higher estimates than the indirect methods. They argue that accounting for maternal effects and non-random mating explains this curious pattern (p. 58).

One problem with the results is that heritability increases with age, and they did not restrict the analysis to adults. They also do not correct for measurement error, as Lynn (1999) cogently noted. Furthermore, heritability (h²) is an r-squared measure and so is not an effect size. Its square root should have been used. The square roots of the maternal effect figures are SQRT(0.20)=0.45 and SQRT(0.05)=0.22. More problematic is that Bishop et al. (2003, Table 3) did not succeed in replicating their analysis. The likely reason is that Daniels and Devlin did not consider the effect of age, at least for the maternal environment. Bishop discovered that the DZ correlation was indeed higher than the non-adoptive sibling correlation at ages 2-4, but lower than the non-adoptive sibling correlation at ages 7-10. These numbers indicate a diminishing of special twin shared (environment) effects over time. In any case, Daniels and Devlin’s indirect evidence for the impact of the so-called “prenatal” maternal effect must be supported by direct evidence, i.e., interventions. Finally, the last blow was administered by Segal & Johnson (2009, p. 89). The relevant passage reads:
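The square roots mentioned above can be checked directly. The short sketch below takes the two maternal effect variance shares reported for model III and converts them to the correlation scale. The variable names are my own.

```python
import math

# Variance shares reported for the maternal environment effect in model III.
maternal_variance = {"twins": 0.20, "siblings": 0.05}

# The square root of a variance share puts it on the correlation scale.
maternal_paths = {group: math.sqrt(share) for group, share in maternal_variance.items()}

for group, path in maternal_paths.items():
    print(f"{group}: variance share {maternal_variance[group]:.2f} -> square root {path:.3f}")
```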

A common assumption is that sharing a womb enhances twins’ phenotypic similarity because the fetuses are equally affected by the mother’s diet, health, medications, and other factors. However, the unique effects of the prenatal environment tend to make twins less alike, not more alike, especially in the case of MZ twins. Furthermore, twins’ prenatal situation cannot be considered vis-à-vis measured traits without reference to twin type (MZ or DZ) and the presence in MZ twins of separate or shared placentas and fetal membranes. Devlin, Daniels, and Roeder (1997) overlooked these distinctions, incorrectly concluding that twins’ shared prenatal environments contribute to their IQ similarity. (It was found that 20% of the covariance between twins and 3% of the covariance between siblings was explained by shared prenatal factors.) Thus, this analysis produced lower estimates of genetic effects than most other studies.

Both Sesardic (2005, p. 108) and Lee (2010, p. 248) share the view that twin-specific prenatal environment does not necessarily inflate heritability.

Having reported their analysis, they provide some comments on Herrnstein & Murray’s predictions of cognitive castes and IQ dysgenics, and on additional subjects. Concerning castes, they argue there is no proof of ever-increasing assortative mating. Murray’s (2012) latest book, Coming Apart, suggests his prediction was correct, for the United States at least. Concerning IQ dysgenics, they think the Flynn effect contradicts the idea of a dysgenic effect. But there is no evidence that the Flynn effect is a real intelligence gain. Measurement invariance does not hold, which means that the IQ gains were not unidimensional. But, except for Beaujean & Osterlind (2008), no one has been able to decompose the IQ gains into real and contaminated (real + artifactual) gains. Even the techniques used to decompose IQ gains, i.e., IRT, have their own problems, known as ipsitivity (Clauser & Mazor, 1998, pp. 286, 292; Nandakumar, 1994, p. 17; Richwine, 2009, p. 54; Penfield & Camilli, 2007, pp. 161-162). In the end, it is premature to say anything conclusive about the Flynn effect. Still, they argue (p. 62) that the Flynn effect cannot refute the idea of IQ dysgenics. In that case, I do not understand why they have resorted to this argument. With regard to race differences, they noted that “It is not clear to us why IQ would be positively selected in Caucasians but not in Africans.” (p. 62). Lynn (2006) proposed an evolutionary theory to explain how these differences can emerge. Daniels et al. also cite the Scarr et al. (1977) study failing to confirm the genetic hypothesis, but Jensen (1998, pp. 479-481) remarked that there are problems with that study. See also Chuck’s post. Daniels et al. commented on Herrnstein & Murray’s view that environmental homogeneity increases genetic variation. They deem it false because the realized heritability is determined by the complex interplay of genes and environments, so that heritability can be zero, one, or in between, when environments are homogeneous (p. 64).

In the epilogue, their comment that “the subtle interplay of environment and genes rarely comes across in these writings, either because the authors judge the subject too complex for their readership or because they don’t grasp it themselves” suggests they failed to notice footnote 32 on page 107 of The Bell Curve. There, the authors mentioned assortative mating, genetic dominance and epistasis as additional sources of genetic variation. So, they seem aware of this subtlety, but they didn’t go into depth on this topic. After all, this was not the subject of the book.

Chapter 4

The Malleability of Intelligence Is Not Constrained by Heritability

Douglas Wahlsten

Wahlsten (1997) attacks Herrnstein & Murray’s ideas on heritability. He begins (p. 73) by saying that they are wrong in their affirmation that IQ malleability is limited by heritability estimates. The problem here is that Wahlsten is not a careful reader. The authors did say that “the heritability of a trait may change when the conditions producing variation change” (Herrnstein & Murray, 1994, p. 106). They also understand that, “as environments become more uniform, heritability rises.” (Herrnstein & Murray, 1994, p. 106). So, Wahlsten’s quote from Herrnstein & Murray must be placed in the right context. He just failed to do that. Furthermore, Sesardic (2005, pp. 154-156) has a good treatment of this subject. One element that has confused many people is that the emergence of an effect (genetically caused) is not the same as the persistence of that effect (environmentally caused). Thus, the above claim by Herrnstein & Murray does not imply that a phenotypic characteristic cannot be changed by environmental manipulation when the emergence of that phenotypic characteristic is entirely genetic. We can conclude that what is genetic (the emergence of an effect) is not readily modifiable, and what is readily modifiable (the persistence of an effect) is ipso facto not genetic.

Wahlsten then uses the example of PKU (phenylketonuria) to illustrate the fact that heritability does not constrain modifiability (p. 74). PKU is caused by a deficiency of the enzyme phenylalanine hydroxylase (PAH). Persons who lack an active form of this enzyme suffer from abnormally high levels of phenylalanine in the blood and severe brain damage, eventually leading to mental retardation, because they are unable to metabolize phenylalanine (an amino acid that is a necessary part of a normal human diet). But it was later found that PKU can be rapidly eliminated thanks to a special diet low in phenylalanine. The story is nicely told, the argument well made. But once again, Wahlsten missed the target because he misread The Bell Curve (1994, p. 106) and also failed to distinguish between the onset of an effect and its continued presence. It should be noted that environmental does not automatically mean modifiable, especially if we don’t know how to detect and manipulate the environmental causes of a characteristic.

Wahlsten (p. 78) continues by arguing that the Flynn effect does well to counter The Bell Curve’s main idea, and he even seems surprised that the authors dismiss the Flynn effect. This is another misreading, because Herrnstein & Murray (1994, p. 308) tell us that some researchers found that the Flynn effect could be due partly, if not entirely, to narrow skills rather than general intelligence per se. They had every reason to remain skeptical, as researchers did not at the time (and do not even today) understand the nature of the Flynn effect.

Wahlsten (p. 79) then resorts to a ridiculous argument: “The December 29, 1915 issue of the Chicago Herald trumpeted to its public: “Hear how Binet-Simon method classed mayor and other officials as morons” (reprinted in Ref. 16, p. 241)”. If you haven’t guessed what he meant, he is simply saying that IQ has poor predictivity. But the correlation of IQ with occupation and education is one of the strongest and most robust findings in the field of psychometrics (Schmidt & Hunter, 2004; Strenze, 2007). Wahlsten proves to be very dishonest by citing such an old article. At that time, IQ tests certainly had many more imperfections than they do today.

The author (p. 81) cited several studies showing an IQ gain for groups of children having 1 more year of school than the comparison group. The meta-analytic effect size for these four studies amounts to 4 IQ points (grade 1 versus kindergarten). Among those studies is the Morrison et al. (1995) study, which has been commented on by Rowe (1997, pp. 142-143). Rowe noticed that the two groups differ in reading achievement by 0.90 SD before schooling, by 2.63 SD for young grade 1 versus old kindergarten, and by 0.36 SD for grade 1 versus grade 2. The sample sizes were small (N=10 per group). However, if generalizable, this finding simply suggests that the group having a 1-year delay in schooling will soon catch up with the other group despite this 1-year deficit. In any case, this pattern is predicted given what usually happens in schooling programs: a large and rapid cognitive advantage for the experimental group over the control group, but progressive fade-out over the years. In other words, schooling can boost intellectual growth, but may not affect the level finally attained. This is why a follow-up study is so important.

Wahlsten (pp. 82-83) cites the Abecedarian, IHDP and MITP studies to illustrate how IQ can be boosted dramatically. I have covered this topic in my earlier blog post. But I note that Wahlsten ignores follow-up studies and gives no word of caution about the need for such reports. And even this fantastic IQ gain can be artifactual. Jensen (1969, p. 100) provides a magnificent illustration:

In addition to these factors, something else operates to boost scores five to ten points from first to second test, provided the first test is really the first. When I worked in a psychological clinic, I had to give individual intelligence tests to a variety of children, a good many of whom came from an impoverished background. Usually I felt these children were really brighter than their IQ would indicate. They often appeared inhibited in their responsiveness in the testing situation on their first visit to my office, and when this was the case I usually had them come in on two to four different days for half-hour sessions with me in a “play therapy” room, in which we did nothing more than get better acquainted by playing ball, using finger paints, drawing on the blackboard, making things out of clay, and so forth. As soon as the child seemed to be completely at home in this setting, I would retest him on a parallel form of the Stanford-Binet. A boost in IQ of 8 to 10 points or so was the rule; it rarely failed, but neither was the gain very often much above this. So I am inclined to doubt that IQ gains up to this amount in young disadvantaged children have much of anything to do with changes in ability. They are largely a result simply of getting a more accurate IQ by testing under more optimal conditions. Part of creating more optimal conditions in the case of disadvantaged children consists of giving at least two tests, the first only for practice and for letting the child get to know the examiner.

Like I said. Absolutely magnificent. This detail is truly a very important one, but I have never seen anyone else make this point.

Chapter 5

Racial and Ethnic Inequalities in Health: Environmental, Psychosocial, and Physiological Pathways

Burton Singer and Carol Ryff

Singer & Ryff (1997) devote a great deal of space to narrating anecdotal stories of black people in South Africa who suffer stress and humiliation (pp. 106-111). I don’t see the need to add such an emotional touch. In any case, they use these stories to illustrate their focus on psychological resilience and, eventually, the possible effects of such stress on health and, by the same token, IQ. Specifically, there are the black mothers without their husbands who feel anxiety and insecurity about not having enough basic resources, and there are the black men who feel their work and life in the mines is degrading and humiliating, which induces them to engage in antisocial activities and violence (p. 99), and which inevitably causes depressed IQs. They explain that the kinds of disabilities associated with age-related cognitive impairment differ between whites and blacks (p. 92), but that overall whites have better health than blacks (p. 93). They say explicitly that “racial discrimination is a central social structural feature of the processes involved in the transmission of tuberculosis” because “It is the convergence of high crowding, dilapidated housing, airborne particulates, poor nutrition, and compromised immunity that are the requisite conditions for the spread of this disease” (p. 94). They focus mainly on tuberculosis (pp. 99-100) and tell us that 70% of cases in the US occur among minorities, and that blacks live in environments conducive to the transmission of tuberculosis (p. 100) but also to hypertension (p. 113), and were shown to be the only racial group experiencing high isolation levels. And in South Africa, a society stratified by racial differences, the incidence of all forms of tuberculosis in 1979 was 18 among whites, 58 among Asians, 215 among coloureds, and 1465 among blacks per 100,000 people (p. 97).

With regard to hypertension, they admit its familial nature (p. 117), because the correlation between adult sibs usually varies between 0.2 and 0.3 for both systolic and diastolic blood pressure, and is similar or somewhat lower for the parent-offspring relationship. The correlations for systolic and diastolic blood pressure are, respectively, 0.55 and 0.58 for MZ twins, and 0.25 and 0.27 for DZ twins. But they argue that the additive genetic effects are probably lower because of plausible GxE interaction effects.
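For a rough sense of what these twin correlations imply, one can apply Falconer's approximation, h² = 2(rMZ − rDZ). This is my own back-of-the-envelope illustration, not a calculation made by Singer & Ryff. It assumes purely additive effects and equal environments for MZ and DZ twins, which is precisely what their GxE remark calls into question.

```python
def falconer_h2(r_mz, r_dz):
    """Falconer's approximation of heritability from twin correlations."""
    return 2 * (r_mz - r_dz)

# MZ and DZ twin correlations for blood pressure, as quoted from the chapter.
twin_correlations = {
    "systolic": (0.55, 0.25),
    "diastolic": (0.58, 0.27),
}

estimates = {trait: falconer_h2(r_mz, r_dz)
             for trait, (r_mz, r_dz) in twin_correlations.items()}

for trait, h2 in estimates.items():
    print(f"{trait}: h2 is roughly {h2:.2f}")
```

If non-additive or interaction effects are present, this simple formula overstates the additive component, which is the direction of the authors' objection.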

The main problem with the proposed argument is that these authors see a low IQ score as a product of poor environments, without considering the reverse causal path. The argument also does not account for the fact that the racial IQ gap increases with SES level. Another curiosity is the finding by Yeargin-Allsopp et al. (1995, Table 4) that the odds ratio (blacks over whites) of having mild mental retardation among children increases with SES. That is, the healthier blacks are, the further they fall behind whites. Overall, blacks were almost twice (OR=1.7) as likely as whites to be mentally retarded when adjusting for SES and birthweight. One must also read Currie’s (2005) paper, which shows mathematically how lead exposure, ADHD and poverty explain almost nothing of the black-white gap in school readiness. For a more general picture, see this article and this other one.

And, of course, their argument (p. 115) needs to assume that the black-white IQ gap must increase with age. Jensen’s (1974, pp. 998, 1000) review of longitudinal IQ studies shows no evidence for this. Farkas & Baron (2004) concluded the same with regard to the PPVT vocabulary test. Yet one limitation of these studies could be the (very probable) absence of correction for measurement errors.

Curiously enough, the authors end the chapter with a focus on the term “race”, which in their opinion is an arbitrary social construct (p. 117). I don’t see the point of mentioning this, as it is not relevant to the debate on race differences in IQ. Anyway, since they bring it up, my answer is that interested readers should read John Fuerst’s essay on The Nature of Race.

Chapter 6

Theoretical and Technical Issues in Identifying a Factor of General Intelligence

John B. Carroll

Carroll (1997) attempts to show by means of CFA whether there is a g factor or not. But he first recounts (p. 129) the early days of psychometrics. Spearman and colleagues believed that a single factor could account for the correlations among mental tests, but they later had to acknowledge the existence of other factors as well. Holzinger developed the bifactor model, in which the group factors and the general factor are independent. The bifactor model assumes that test scores can be modeled as linear combinations of scores on a general factor and one or more group factors. But the bifactor model was never widely accepted. Later, Thurstone advanced the idea that intelligence is composed of multiple factors, and he developed a method of factoring a correlation matrix and a method of rotating the axes of factor matrices to “simple structure” in order to facilitate the interpretation of factors. And the g factor vanished. But Spearman and Eysenck disputed this result and found a g factor, along with several group factors. In later publications, Thurstone admitted the possible existence of a general factor. Another advocate of multiple intelligence theories was Guilford, with his Structure-of-Intellect (SoI) model. Carroll reanalyzed many of Guilford’s datasets and found numerous higher-order factors, including factors that could be regarded as similar to Spearman’s g. Jensen (1998, pp. 115-117) discussed Guilford in more detail.

Carroll says that these earlier factorial methods were what has come to be known as exploratory factor analysis (EFA). He continues and says that, today, we have powerful techniques known as confirmatory (or structural) factor analysis. And he correctly makes the point (p. 131) that exploratory (e.g., EFA) and confirmatory (e.g., CFA) analyses are both needed. The first, exploratory step serves to identify possible and coherent models and theories, while the second, confirmatory step serves to compare the models by fitting them to the observed data. But he also mentions something even more important: exploratory analysis (e.g., factor analysis) cannot prove the existence of a g factor. Just because PC1 or PAF1 has the highest eigenvalue does not demonstrate at all the existence of a g factor. The mere fact that cognitive variables are positively correlated does not validate the presence of a single general factor, as it might indicate the presence of multiple general factors (p. 143). The first principal component derived from a matrix of randomly generated correlations is necessarily larger than the remaining components. He says that “factor analysis seeks to determine a factor matrix with the least number of factors, m, that will satisfactorily reproduce the given R” (p. 132); R stands for the observed correlation matrix. Satisfactory reproduction of R can be defined in several ways, e.g., “the extent to which m factors appear to account for the common factor variance (the communalities of the variables), or the extent to which the residuals (the differences between the observed correlations and the reproduced correlations) are close to zero” (p. 132) through CFA modeling.
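The point about random correlations can be checked with a minimal sketch. The eight uncorrelated variables, the sample size and the seed are arbitrary assumptions of mine, and the largest eigenvalue is obtained by simple power iteration rather than by whatever software Carroll used.

```python
import math
import random

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        standardized.append([(x - mean) / sd for x in col])
    p = len(columns)
    return [[sum(a * b for a, b in zip(standardized[i], standardized[j])) / n
             for j in range(p)] for i in range(p)]

def largest_eigenvalue(matrix, iterations=2000):
    """Power iteration; usable here because a correlation matrix is
    symmetric and positive semi-definite."""
    p = len(matrix)
    v = [1.0 + 0.01 * i for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, av))  # Rayleigh quotient

# Hypothetical data: 8 variables generated independently, so no true g exists.
rng = random.Random(0)
n_vars = 8
columns = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(n_vars)]

R = correlation_matrix(columns)
top = largest_eigenvalue(R)
print(f"largest eigenvalue: {top:.2f} (average eigenvalue is 1.00)")
print(f"share of variance of PC1: {top / n_vars:.1%}")
```

Since the eigenvalues of a correlation matrix sum to the number of variables, the largest one exceeds 1 as soon as sampling error produces any non-zero correlation. A dominant first component is therefore not, by itself, evidence for g.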

There are fundamental differences between exploratory and confirmatory factor analysis; the methods are actually complementary. The former is concerned with analyzing correlational data to suggest a satisfactory model for describing those data. The latter is concerned with appraising the probability that any given proposed model, even one that might seem quite unlikely, could generate the observed data. Exploratory factor analysis is essentially descriptive, while confirmatory factor analysis appeals to statistical significance testing. Confirmatory factor analysis cannot proceed until one proposes a model that can be tested. One source of such a model is a model produced by exploratory analysis, but this is not the only source; a model could be suggested by a psychological theory.

One important detail here is that if a data set is found to measure a single factor, it is a general factor, but only if the variables are all drawn from different parts of the cognitive domain (p. 144). This assumption would be violated if the variables are drawn from a limited portion of the cognitive domain, because they might then serve to define only a single first-stratum or second-stratum factor. A large and representative battery of tests is thus needed. Having said that, Carroll (1997) uses three old data sets on large cognitive test batteries. The variance-covariance matrices of these (sub)tests were submitted to CFA modeling in LISREL, and he attempted to attain a satisfactory model fit. He succeeded in all analyses, and in each of them the preferred model included a third-order g factor. Carroll (pp. 143-145, 151) warns that CFA does not prove that g is a “true ability” independent of the more specific cognitive abilities defined by various types of psychological tests and observations. Carroll concludes the chapter by saying that Herrnstein & Murray’s view on the general factor of intelligence is accurate. Readers interested in digging into this topic may want to read this essay.

Chapter 7

The Concept and Utility of Intelligence

Earl Hunt

Hunt (1997) tries to downgrade the meaningfulness of IQ tests. He does this in a very clumsy way. He says (p. 162) that IQ is not the only important predictor of job performance. Personality matters as well. Good to know, but there is nothing original in what he says. I can also add that Gottfredson (1997) argued that personality is important only in a restricted range of jobs. Anyway, Hunt continues and says (p. 163) that psychometricians disagree on whether there is a general factor of intelligence (Spearman and Jensen versus Thurstone, Cattell, Horn) but, curiously enough, he doesn’t mention that the theorists who favored the multiple intelligence theories were wrong.

Hunt (p. 164) discusses the Gf-Gc theory. To sum up, it’s an interactionist model. The greater the Gf (fluid), the greater the Gc (crystallized). Gf reflects the capacity to solve problems for which prior experience and learned knowledge are of little use, while Gc reflects consolidated knowledge gained through education, cultural information, and experience. The causality runs from Gf to Gc. They are correlated because one’s Gc, at time t, will be an increasing function of Gf at time t-1. And he then reveals something more significant: “the fact that there is a correlation between tests intended to draw out pure reasoning ability and tests intended to evaluate cultural knowledge does not distinguish between g and Gc-Gf theories” (p. 165). Equally astonishing is the claim that the fact that Gf and Gc measures respond differently to outside influences (Gf generally decreases from early adulthood whereas Gc measures increase throughout most of the working years) is enough to disprove general intelligence theories (p. 165).

But I wonder in what way. When Hunt says that Gf and Gc have different age trajectories, he should be aware of the fact that Gc can differ between people having similar Gf just because of cultural differences (Jensen, 1980, p. 235; Jensen, 1998, p. 123). So, when noticing the increasing trend in Gc over time at middle age, the effect may be a cultural one (consolidated knowledge gain), but the decline in Gf at an earlier age may not be cultural. So, there is no point in making this comparison.

That Gf and Gc have different properties does not invalidate g. I can provide another illustration. Braden (1994) explains that deaf people have a deficit of 1 SD on verbal IQ tests, compared to the general population, but (virtually) no deficit at all on nonverbal IQ tests. In Braden’s book (1994, p. 207), Jensen commented on Braden’s work, and explained the compatibility of g with modularity in light of Braden’s findings. The relevant passage reads:

A simple analogy might help to explain the theoretical compatibility between the positive correlations among all mental abilities (hence the existence of g) and the existence of modularity in mental abilities. Imagine a dozen factories (persons), each of which manufactures the same five different gadgets (modular abilities). Each gadget is produced by a different machine (module). The five machines are all connected to each other by a common gear chain which is powered by one motor. But each of the factories uses a different motor to drive the gear chain, and each factory’s motor runs at a different constant speed than the motors of every other factory. This will cause the factories to differ in their rates of output of the five gadgets (scores on five different tests). The factories will be said to differ in overall efficiency or capacity (g), because the rates of output of the five gadgets are positively correlated. If the correlations between output rates of the gadgets produced by all of the factories were factor analyzed, they would yield a large general factor (g). The output rates of gadgets would be positively correlated, but not perfectly correlated, because the sales demand for each gadget differs for each factory, and the machines that produce the gadgets with the larger sales are better serviced, better oiled, and kept in consistently better operating condition than the machines that make low-demand gadgets. Therefore, even though the five machines are all driven by the same motor, they differ in their efficiency and consistency of operation, making for less than a perfect correlation between their rates of output. Then imagine that in one factory the main drive-shaft of one of the machines breaks, so it cannot produce its gadgets (e.g., localized brain damage affecting a single module, but not of g). 
Or imagine a factory where there is a delay in the input of the raw materials from which one of the machines produces gadgets (analogous to a deaf child not receiving auditory verbal input). In still another factory, the gear chain to all but one of the machines breaks and they therefore fail to produce gadgets. But one machine remains powered by the motor receives its undivided energy and produces gadgets faster than if the motor had to run all the other machines as well (e.g., an idiot savant).

And finally, Hunt’s argument relies strongly on the Gf-Gc model of Cattell and Horn, but we know today that this model is not the best approximation of the structure of human intelligence. The VPR model is (Johnson & Bouchard, 2005). These authors even argued that their findings “call into question the appropriateness of the pervasive distinction between fluid and crystallized intelligence in psychological thinking about the nature of the structure of intellect”. In any case, it is odd to claim that “There is really no way that one can distinguish between g and correlated Gf-Gc factors on the basis of psychometric evidence alone.” (p. 165). Carroll (2003) clearly disproved this idea. He confirmed the existence of a third-order g factor on top of several second-order factors, which include Gf and Gc.

Hunt (p. 167) affirms that the correlation between IQ and job performance decreases as experience accumulates over time. He cites two books, which of course I do not have access to. But he also cites Ackerman (1987). However, Schmidt & Hunter (2004) have reviewed these studies, and they conclude that the predictive validity of IQ does not decrease over time and, if anything, increases with worker experience. It is interesting to note that all the studies cited by Schmidt & Hunter were published many years before the Devlin et al. (1997) book. Hunt either missed them or ignored them.

Hunt says (pp. 169-171) that the correlations between IQ subtests in the bottom half were higher than those in the top half of the IQ distribution. In other words, Hunt rediscovered the so-called Spearman’s Law of Diminishing Returns (SLODR). And he then concludes that “As predicted by cognitive theory, but virtually ignored by psychometricians, the data from conventional intelligence tests indicate that lack of “general intelligence” is pervasive, but that having high competence in one field is far from a guarantee of high competence in another.” (p. 170). But a better interpretation is that low-IQ people rely more on g than high-IQ people, because high-IQ people have more room for specialization since they are relieved from the stress associated with having a low IQ (Woodley, 2011, pp. 234-236). Indeed, low-IQ people presumably face some barriers in the labour market; basic needs are more dependent on general cognitive abilities while secondary needs are more related to narrow cognitive abilities (situational competence).

I am very disappointed in this article, but I appreciate the fact that Hunt (p. 161) refuses to see R² as a measure of effect size and prefers the correlation coefficient. He is perfectly right.

Chapter 8

Cognitive Ability, Wages, and Meritocracy

John Cawley, Karen Conneely, James Heckman, and Edward Vytlacil

Cawley et al. (1997) use regression to predict wages from the principal components extracted from the ASVAB subtests, first alone, and then along with variables of SES and/or human capital. They begin by saying (p. 180) that g is “an artifact of linear correlation analysis, not intelligence”. This passage is just hopeless. Did they read Carroll’s chapter? They even go on to say (p. 180) that Herrnstein & Murray claimed there is only one significant intelligence factor, called g, and that they fail to mention the existence of other factors of intelligence. Are these guys serious? Herrnstein & Murray (1994, pp. 14-15) already acknowledge the existence of these factors. And they go on: “They raise the immutability of cognitive ability when arguing against the effectiveness of social interventions.” (p. 181). Once again, where did Herrnstein & Murray (1994, p. 106) say that IQ is immutable? Nowhere. Cawley et al. (1997) continue by saying that IQ is not immutable because IQ rises with schooling. They do not answer the question of causality.

Concerning their analysis, they note that in the background model (local and national unemployment rates and a linear time variable), ability (either AFQT or g) contributes between 0.118 and 0.174 to the R² change. In the human capital model (grade completed, potential experience with its quadratic term, job tenure with its quadratic term), the marginal increase in R² due to ability (either AFQT or g) falls to between 0.011 and 0.034.

Having reported these numbers, the authors conclude by saying “payment is not made for “ability” alone, which violates the definition of meritocracy advanced by H&M” (p. 191). Are these guys doing it on purpose? Throughout the book, Herrnstein and Murray repeatedly say that IQ is not the only predictor of social outcomes. And this has nothing to do with H&M’s idea of meritocracy, which can be easily understood if you read pages 510-512 and 541-546 of The Bell Curve.

One obvious problem with the analysis of Cawley et al. (1997) is that it is not easily interpretable for non-initiated readers. They do not explain the meaning of the regression coefficients, e.g., what is the percentage change in wage when PC1 increases by 1 SD? Fortunately, this can be worked out. When the dependent variable is log-transformed, we simply exponentiate the coefficient of the independent variable. For example, the unstandardized coefficients for PC1 in the IQ-only model are 0.1952, 0.1647, 0.1823, 0.1531, 0.1965, 0.1535, respectively, for black females, black males, hispanic females, hispanic males, white females, white males. So, the exponentiated coefficients (multiplicative effects on wage, not odds ratios, since this is not a logistic regression) are 1.22, 1.18, 1.20, 1.17, 1.22, 1.17. An exponentiated coefficient of 1.17-1.22 means that for each SD gain in PC1, there is a 17-22% gain in wage. This is a modest effect in my opinion. Let’s look at the respective coefficients for the model including all the covariates mentioned above: 0.1235, 0.1045, 0.0904, 0.1084, 0.0903, 0.0828. Their exponentiated coefficients are, respectively, 1.13, 1.11, 1.09, 1.11, 1.09, 1.09. These coefficients show that the expected gain in wage is around 10% per 1 SD gain in PC1. Agreed, the effect of PC1 is smaller. But as everyone should know, when we adjust for SES variables, we may remove some of the effect due to IQ, if IQ exerts an indirect effect on wages through these SES variables. Finally, even if measurement error attenuates the correlation, the exponentiated coefficient won’t change that much.
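The conversion from log(wage) coefficients to percentage changes in wage can be sketched as follows. The coefficients are the ones quoted above; the function name and the layout are my own.

```python
import math

groups = ["black females", "black males", "hispanic females",
          "hispanic males", "white females", "white males"]

# Unstandardized PC1 coefficients on log(wage), as quoted in the text.
pc1_only = [0.1952, 0.1647, 0.1823, 0.1531, 0.1965, 0.1535]
pc1_full = [0.1235, 0.1045, 0.0904, 0.1084, 0.0903, 0.0828]

def percent_change(coef):
    """Percent change in wage for a one-unit (here 1 SD) increase in the
    predictor, when the outcome is log(wage)."""
    return (math.exp(coef) - 1) * 100

for name, b_only, b_full in zip(groups, pc1_only, pc1_full):
    print(f"{name:17s} IQ-only: {percent_change(b_only):5.1f}%   "
          f"with covariates: {percent_change(b_full):5.1f}%")
```

For small coefficients, exp(b) - 1 is close to b itself, which is why a coefficient of about 0.10 reads as roughly a 10% gain.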

Interestingly, an earlier publication by the same Cawley et al. (1996, Tables 7-8) shows that the importance of PC1 depends on the type of occupation. Among blue collar workers, PC1 has weak effects on log(wage) and the coefficients are, in the same order as defined above, 0.066, 0.047, 0.014, 0.090, 0.029, 0.038. The corresponding exponentiated coefficients are 1.07, 1.05, 1.01, 1.09, 1.03, 1.04. Among white collar workers, PC1 has coefficients of 0.217, 0.195, 0.150, 0.189, 0.122, 0.119, which correspond to exponentiated coefficients of 1.24, 1.22, 1.16, 1.21, 1.13, 1.13. These effects are already adjusted for human capital variables and, so, the effect of PC1 is underestimated. It’s curious they didn’t mention that their earlier work provides some evidence for the theory favored by Herrnstein & Murray, i.e., that the predictive power of IQ increases with job complexity.

Finally, a little quibble concerns the insertion of all Principal Components of the ASVAB into the wage regression equation. This is a weird practice, especially since most of these components are meaningless. And even the authors acknowledge that, and even more, “The signs of the coefficients of the second through tenth principal components are irrelevant because each principal component can be reconstructed using the negative of its ASVAB weights to explain an equal amount of ASVAB variance.” (p. 186). Personally, I would have simply used PC1, without considering PC2-PC10.

Chapter 9

The Hidden Gender Restriction: The Need for Proper Controls When Testing for Racial Discrimination

Alexander Cavallo, Hazem El-Abbadi, and Randal Heeb

Cavallo et al. (1997, Table 9.6) reanalyze Herrnstein & Murray’s (1994, p. 324) analysis and conclusion that the set of variables IQ+age is enough to erase the black-white wage gap. The authors fault Herrnstein & Murray (1994) for having ignored race*age interaction effects, i.e., race differences should not be calculated for a single age. Their Figure 9.1 shows the predicted 1989 earnings by age, with one regression line per race. The regression equation used for computing these predicted lines contains age, AFQT and parental SES as independent variables. At age 28-29, there is virtually no difference, but the black regression line is above the white regression line at younger ages (25-27) while the pattern reverses at later ages (30-32). This explains Herrnstein & Murray’s (1994, p. 323, footnote 13) result, because they calculate the wage gap for people at the average age of the NLSY sample. According to Cavallo et al., the mean age was 28.7. At the same time, Cavallo et al. didn’t use the NLSY79 data in a longitudinal manner, i.e., with repeated measures of wage (Cole & Maxwell, 2003). So, their analysis may not be trustworthy either. Anyway, their analysis repeated for each gender group separately is still worth mentioning. Their Figures 9.2-9.4 show that white men have an advantage over black men with increasing age but that black women have an advantage over white women with increasing age. All these subgroups have different intercepts and slopes. But most people may not have a stable job and economic situation in their late 20s. I would have restricted the sample to people aged 30+. But at that time, the data hadn’t yet been collected for this age category.

Cavallo et al. subsequently repeat the analysis with four separate regressions: on black males, white males, black females, white females. Economists know so well about the huge gender difference in wages (due in fact to gender roles) that the standard practice is to compute separate regressions (pp. 207-208). This time, they include as independent variables age, age^2, AFQT, SES, education, full-time experience, full-time experience^2, P.T. (part-time) experience, P.T. experience^2. The estimated coefficients are used to isolate the part of the black-white wage gap that is due to the independent effect of AFQT, following the Oaxaca-Blinder wage decomposition (pp. 210, 213). The analysis shows (p. 211) that 92% of the racial wage gap for men is attributable to premarket factors (of which 38% is due to AFQT, 18% to SES, 17% to education and 19% to experience) while the remaining 8% of the wage gap is attributable to other factors. These authors attribute this 8% to racial wage discrimination, because their viewpoint is that earnings can be truly color blind only if the regressions are similar across races when controlling for all the relevant background variables (pp. 204-205).
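For readers unfamiliar with the Oaxaca-Blinder decomposition, here is a minimal sketch with a single predictor and simulated data. Everything in it is hypothetical (the data, the effect sizes, the choice of group A's coefficients as the reference); it is not a reproduction of the model of Cavallo et al., which has many more covariates.

```python
import random

def ols_simple(x, y):
    """Intercept and slope of a one-predictor least-squares regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def oaxaca_blinder(x_a, y_a, x_b, y_b):
    """Twofold decomposition of mean(y_a) - mean(y_b), using group A's
    coefficients as the reference (a choice, not the only possible one)."""
    a0, a1 = ols_simple(x_a, y_a)
    b0, b1 = ols_simple(x_b, y_b)
    mean_xa = sum(x_a) / len(x_a)
    mean_xb = sum(x_b) / len(x_b)
    explained = (mean_xa - mean_xb) * a1            # gap in endowments
    unexplained = (a0 - b0) + mean_xb * (a1 - b1)   # gap in coefficients
    return explained, unexplained

# Hypothetical data: log wage depends on a test score; groups differ in
# mean score and slightly in intercept.
rng = random.Random(1)
x_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
x_b = [rng.gauss(-1.0, 1.0) for _ in range(2000)]
y_a = [2.00 + 0.15 * x + rng.gauss(0, 0.4) for x in x_a]
y_b = [1.98 + 0.15 * x + rng.gauss(0, 0.4) for x in x_b]

explained, unexplained = oaxaca_blinder(x_a, y_a, x_b, y_b)
gap = sum(y_a) / len(y_a) - sum(y_b) / len(y_b)
print(f"gap={gap:.3f}  explained={explained:.3f}  unexplained={unexplained:.3f}")
```

The two parts always sum to the raw gap in means. The "unexplained" part is what gets labeled discrimination, although it also absorbs any omitted variable.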

I agree with Cavallo et al. (1997) that Herrnstein & Murray’s analysis on this one was clumsy and I would never have recommended anyone to do it this way. On the other hand, Cavallo et al. used linear regression with log(wage). Even if this is what most researchers in social science would have done, the correct procedure in my opinion would be to use Poisson regression (probably with robust standard errors). They also do not analyze the age effect longitudinally, nor do they attempt to separate cohort and age effects by using the multilevel regression method suggested by Miyazaki & Raudenbush (2000) for longitudinal survey data.

Chapter 10

Does Staying in School Make You Smarter? The Effect of Education on IQ in The Bell Curve

Christopher Winship and Sanders Korenman

Winship & Korenman (1997) begin by summarizing the earlier studies showing that education can improve IQ (pp. 220-224). Being out of school for a given period of time is associated with substantial IQ loss. Several longitudinal studies show that IQ increases by 2 to 4 points for each additional year of education. Then, they reanalyze Herrnstein & Murray’s (1994, Appendix 3) analysis of the effect of education on IQ. They attempt to predict AFQT (given in 1981) by using educational attainment (measured in 1980), age, and earlier IQ tests given in 1979 (e.g., Stanford-Binet and/or WISC) as independent variables. They apply several corrections to Herrnstein & Murray’s analysis, e.g., proper handling of missing data, the addition of age and eventually parental SES as covariates, the use of Huber’s robust standard errors (for clustered data), and the use of errors-in-variables (EIV) regression. The reason given for EIV regression is that the independent variables education and IQ certainly have a reliability lower than 1. This unreliability attenuates the estimated effect sizes. EIV regression allows the specification of some “assumed” (i.e., usually specified on the basis of past research) reliability estimates. Their preferred model (which also includes parental SES as covariate) is one assuming that both early IQ and education have a reliability of 0.90. In this model (the 10th), a year of education adds 2.7 IQ points. The analysis is well done, but unfortunately, the data has no repeated measures of the relevant variables. In this situation, any conclusion can only be suggestive.
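The logic of correcting for unreliability can be illustrated with a minimal single-predictor sketch. The simulated data, the true slope of 2.0 and the sample size are my own assumptions; only the reliability of 0.90 comes from the model described above. A real EIV regression with several error-prone covariates uses a matrix version of this correction, so this is a simplification.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(42)
n = 20000
true_slope = 2.0      # hypothetical effect of the true predictor
reliability = 0.90    # assumed reliability of the observed predictor

# The true predictor has variance 1; the error variance is chosen so that
# var(true) / var(observed) equals the assumed reliability.
error_sd = ((1 - reliability) / reliability) ** 0.5
x_true = [rng.gauss(0, 1) for _ in range(n)]
x_obs = [x + rng.gauss(0, error_sd) for x in x_true]
y = [true_slope * x + rng.gauss(0, 1) for x in x_true]

naive = slope(x_obs, y)            # attenuated toward zero
corrected = naive / reliability    # single-predictor disattenuation
print(f"naive={naive:.2f}  corrected={corrected:.2f}  true={true_slope:.2f}")
```

The naive slope is biased toward zero by a factor close to the reliability, and dividing by the assumed reliability undoes that bias. The correction is only as good as the assumed reliability, which is why the authors report several scenarios.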

The authors do not distinguish between IQ gains due to real gains in intelligence and IQ gains due to knowledge gains. It is hard (if not impossible) to conceive, in my opinion, of an educational gain that does not incorporate knowledge gain. The consequence of this is that educational gain causes measurement bias. If school-related knowledge is elicited (at least in part) by the IQ test, a disadvantaged group of people who do not have the knowledge required to get the items correct will fail these items even if they have the same latent ability as the advantaged group. This inevitably causes the meaning of group differences to be ambiguous (Lubke et al. 2003, pp. 552-553). In a situation of unequal exposure, the test may be a measure of learning ability (intelligence) for some and of opportunity to learn for others (Shepard, 1987, p. 213). This is the reason why test-retest effects are not measurement invariant. The between-group difference is not entirely unidimensional, because a score reflects a component (e.g., knowledge) that is present in one group but not in the other group. That is the very essence of measurement bias, and this is an element that most (if not all) researchers whose focus is on education-induced IQ gains usually fail to grasp. The question of how much intelligence (not IQ) can be improved with schooling has never been answered. Not even today. Because of this, all research on educational gain until now has been hopelessly worthless.

Winship & Korenman’s belief that intelligence is malleable through cultural improvement (e.g., schooling) is also contradicted by Braden’s (1994) finding that deaf people have a large deficit in verbal IQ and scholastic achievement compared to normal-hearing children, but virtually no impairment in nonverbal IQ. This indicates that intelligence is not affected even by a drastic cultural change, including a deficit in schooling. The fact that deaf people have a deficit in (culturally-loaded) verbal tests substantiates the idea that education-induced IQ gain is essentially a knowledge gain, not an intelligence gain.

Even if education has an impact on IQ, Herrnstein & Murray’s argument (1994, p. 394) is that inequality in outcomes may increase, because wider availability of resources lets high-IQ persons learn more and faster than low-IQ persons (unless resources are provided only to low-IQ persons). This is also the conclusion reached by Ceci & Papierno (2005).

Chapter 11

Cognitive Ability, Environmental Factors, and Crime: Predicting Frequent Criminal Activity

Lucinda A. Manolakes

Manolakes (1997) reanalyzes Herrnstein & Murray’s (1994) analysis of the relationship between IQ and crime among men. They both use logistic regression, with a dichotomized variable of self-reported crime (more accurately, it’s the sum of many delinquency variables, after being recoded appropriately) coded as 1 if the score is in the top decile of criminal behavior and 0 otherwise. The independent variables are the AFQT, parental education, type of residence (urban or rural, if stayed or if moved in one of them), race, IQ*parental education, IQ*race, parental education*race. Although the author probably could (and should) have included the three-way interaction IQ*parental education*race, she says that this variable wasn’t significant. Given that the sample size is fairly large, I will not object, and it is likely that this variable would have a small coefficient.

While Herrnstein & Murray excluded black people from their analysis because they are known to underreport the frequency with which they engage in criminal acts, Manolakes argued that Herrnstein & Murray excluded black people because they rely on the work of Hindelang et al. who emphasize that criminal self-report scales are predominantly based on less serious crimes of high frequency (the type of crimes that whites are more likely to admit). But blacks are more likely to admit to less frequent but more serious crimes.

To make the aggregated delinquency variable, Manolakes left out some variables, e.g., running away from home, skipping school, drinking alcohol while under age, and using marijuana, because they are the least serious activities and are better not considered as crimes.

Manolakes says (p. 243) that the data is inconsistent with Herrnstein & Murray’s presumption that IQ is the only predictor of crime. Of course, if you distort your opponent’s ideas, you can refute them more easily.

Manolakes also argued (p. 243) that if the NLSY variables do not include questions regarding white collar crime, organized crime, corporate crime, consumer fraud, etc., the propensity of criminal behavior among high(er) IQ people is diminished. According to Manolakes, this omission may explain the relationship between lower IQ and higher crime.

The parameter estimates (coefficients) are reported in Table 11.2 but I don’t think it makes sense to report these coefficients or their exponentiated values (i.e., odds ratios) because the author has inserted many interactions. The effect of IQ thus cannot be understood solely from its own parameter; it also depends on the values of the other variables it interacts with. However, Figures 11.2 and 11.3 show the relevant predicted plots. When IQ increases, whites (blacks) are less (more) likely to commit delinquent acts. When parental SES increases, whites (blacks) are more (less) likely to commit delinquent acts. It is not clear how to explain these divergent patterns. Figure 11.4 is also very curious. It shows the probability of being in the top decile of criminal activities by parental education at each IQ level (lower quartile, median, upper quartile). At the lower quartile of IQ, the likelihood increases from 12.3% to 22.1%. At the median IQ, the likelihood increases from 14.4% to 17.3%. At the upper quartile of IQ, the likelihood decreases from 16.8% to 13.4%. Figures 11.5 and 11.6 report the same plots, but separately for whites and blacks. For whites, delinquency increases with parental education only at low and median IQ. For blacks, delinquency decreases with parental education only at median and upper IQ. As we can see, the regression lines are all very different, and difficult to interpret. And Manolakes does not even attempt to explain this, and she leaves that hard task to the criminologists, but ends the chapter by saying that such a result contradicts Herrnstein & Murray’s assumption that the justice system needs to be made simpler in order to avoid violent or illegal acts due to limited intelligence. But this conclusion can be correct only if the analysis is correct. And it is not.
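To see why a single coefficient is uninterpretable in the presence of interactions, consider this minimal sketch of a logistic model with one interaction term. The coefficients are hypothetical numbers chosen for illustration; they are NOT the estimates of Table 11.2, and the predictors are assumed to be standardized.

```python
import math

# Hypothetical coefficients for illustration only (not from Table 11.2).
coef = {"intercept": -2.0, "iq": -0.30, "parent_ed": 0.10,
        "iq_x_parent_ed": 0.25}

def probability(iq, parent_ed):
    """Predicted probability from a logistic model with an
    IQ x parental-education interaction."""
    logit = (coef["intercept"] + coef["iq"] * iq
             + coef["parent_ed"] * parent_ed
             + coef["iq_x_parent_ed"] * iq * parent_ed)
    return 1 / (1 + math.exp(-logit))

def iq_log_odds_slope(parent_ed):
    """Change in log-odds per unit of IQ; it depends on parental education."""
    return coef["iq"] + coef["iq_x_parent_ed"] * parent_ed

for parent_ed in (-1.0, 0.0, 2.0):
    print(f"parental education {parent_ed:+.1f}: "
          f"IQ slope {iq_log_odds_slope(parent_ed):+.2f}, "
          f"P(top decile | IQ=+1) {probability(1.0, parent_ed):.3f}")
```

With these made-up numbers, the IQ slope is negative at low parental education and positive at high parental education, so the "main effect" of IQ only describes people at parental education zero. This is why predicted plots are more informative than the table of coefficients.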

The likely reason why the result of Manolakes differs from that of Herrnstein & Murray is that the latter did not include blacks in the regression and did not use interaction effects either.

The big problem with both Manolakes’ and Herrnstein & Murray’s analyses is that dichotomizing a continuous variable can cause substantial loss of information and even “misclassification” (MacCallum et al., 2002). For example, people involved in delinquent activities 2 times per month can be very different from those involved 10 or 20 times per month. But logistic regression on a dichotomized outcome treats them as if they were no different from each other. Given the typical distribution of crime and delinquency variables (e.g., 70% of cases with score 0, 15% with score 1, 10% with scores 2 and 3, and 5% with scores 4 and more), the distribution is not symmetric around the median and the most appropriate analysis is, in my view, a Poisson regression for “rare events” variables. And I am planning to correct both Manolakes and Herrnstein & Murray in the near future…
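The information loss caused by a top-decile split can be illustrated with a minimal simulation. The data are hypothetical (a normally distributed outcome correlated about 0.5 with the predictor), and the sketch only compares correlations; it does not fit the Poisson model I recommend.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
n = 5000
predictor = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical continuous outcome correlated about 0.5 with the predictor.
outcome = [0.5 * p + math.sqrt(0.75) * rng.gauss(0, 1) for p in predictor]

cutoff = sorted(outcome)[int(0.9 * n)]                   # 90th percentile
top_decile = [1 if o >= cutoff else 0 for o in outcome]  # dichotomized outcome

r_continuous = pearson(predictor, outcome)
r_dichotomized = pearson(predictor, top_decile)
print(f"continuous outcome:   r = {r_continuous:.2f}")
print(f"top-decile indicator: r = {r_dichotomized:.2f}")
```

The same relationship looks markedly weaker once the outcome is collapsed into a 0/1 indicator, simply because all the variation within each category has been thrown away.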

Chapter 12

Social Statistics and Genuine Inquiry: Reflections on The Bell Curve

Clark Glymour

Glymour’s (1997) chapter is definitely an obscure one. I dislike the style of the author, but I still understand the main idea (at least, that’s what I think). He spends a lot of time and energy explaining that factor analysis and regression say nothing about causality and that any result from modeling has no consequences whatsoever if one cannot define a proper theory, principle, logic, and mechanism that can explain the pattern of the data.

Glymour writes “What troubles me more is that the principal methods of causal analysis used in The Bell Curve and throughout the social sciences are either provably unreliable in the circumstances in which they are commonly used or are of unknown reliability.” (p. 259). This guy is not even funny. Herrnstein & Murray never presented multiple regression as a causal analysis. And a lot of other practitioners also understand this.

Glymour then writes (p. 263) what is the most important passage of the chapter :

When social scientists speak of “theory,” however, they seldom mean either common-sense constraints on hypotheses or constraints derived from laboratory sciences or from the very construction of instruments. What they do mean varies from discipline to discipline, and is often at best vaguely connected with the particular hypotheses in statistical form that are applied to data. Suffice it to say “theory” is not the sort of well-established, severely tested, repeatedly confirmed, fundamental generalizations that make up, say, the theory of evolution or the theory of relativity. That is one of the reasons for the suspicion that the uses of “theory” or its euphemism, “substantive knowledge,” are so many fingers on the balance in social chemistry, but there are several other reasons. One is the ease of finding alternative models, consistent with common-sense constraints, that fit nonexperimental data as well or better than do “theory”-based models. (I will pass on illustrations, but in many cases it’s really easy.) Another is that when one examines practice closely, “theory”-based models are quite often really dredged from the data – investigators let the data speak (perhaps in a muffled voice) and then dissemble about what they have done.

I do not disagree with him this time. My impression is that he is probably right on this one. A lot of people in social science seem to commit the kind of fallacy usually derided by Austrian economists. The idea is illustrated in Ludwig von Mises’s book Human Action (1949), in statements such as “History cannot teach us any general rule, principle, or law” and “If there were no economic theory, reports concerning economic facts would be nothing more than a collection of unconnected data open to any arbitrary interpretation”. In short, what Mises (1949, pp. 41, 49-51) says is that one should not derive and infer a theory from the data, but use theories to interpret data. A theory that is data-driven is not worth calling a theory. And probably many theories advanced in the social sciences are hollow. So, testing these pseudo-theories through CFA-MGCFA models, SEM and other regression techniques is an enterprise doomed to fail. If this is what Glymour meant, I fully agree with him.

Glymour continues, “Factor analysis and regression are strategems for letting the data say more, and for letting prior human opinion determine less” (p. 264). I have nothing to say, but this is interesting.

Glymour (pp. 265, 268) has listed 8 necessary assumptions of factor analysis.

(1) There are a number of unmeasured features fixed in each person but continuously variable from person to person.

(2) That these features have some causal role in the production of responses to questions on psychometric tests, and the function giving the dependence of measured responses on unmeasured features is the same for all persons; this is supported by the high test-retest correlations of IQ, but that argument meets a number of contrary considerations, e.g., the dependence of scores on teachable fluency in the language in which the test is given.

(3) That variation of these features within the population causes the variation in response scores members of the population would give were the entire population tested; that the function giving the dependence of manifest responses on hidden features is the same for all persons, is without any foundation – if the dependencies were actually linear, however, differing coefficients for different persons would not much change the constraints factor models impose on large sample correlation matrices.

(4) That some of these unmeasured features cause the production of responses to more than one test item; that other features of persons influence their scores on psychometric tests is uncontroversial.

(5) That the correlation among test scores that would be found were the entire population to be tested is due entirely to those unmeasured features that influence two or more measured features; that all correlations are due to unmeasured common causes is known to be false of various psychometric and sociometric instruments, in which the responses given to earlier questions influence the responses given to later questions.

(6) The measured variables must be normally distributed linear functions of their causes; normality and linearity are harder to justify, but at least indirect evidence could be obtained from the marginal distributions of the measured variables and the appearance of constraints on the correlation matrix characteristic of linear dependencies, although tests for such constraints seem rarely to be done. In any case, the other issues could be repeated for nonlinear factor analysis.

(7) That measurement of some features must not influence the measures found for other features; that is, there is no sample selection bias (the data are missing at random).

(8) That two or more latent factors must not perfectly cancel the effects of one another on measured responses.

However, he focuses (p. 269) on the following objection :

There is another quite different consideration to which I give considerable weight. I have found very little speculation in the psychometric literature about the mechanisms by which unmeasured features – factors – are thought to bring about measured responses, and none that connects psychometric factors with the decomposition of abilities that cognitive neuropsychology began to reveal at about the same time psychometrics was conceived. Neither Spearman nor later psychometricians, so far as I know, thought of the factors as modular capacities, localized in specific tissues, nor did they connect them with distributed aspects of specific brain functions. (It may be that Spearman thought of his latent g more the way we think of virtues of character than the way we think of causes.) One of the early psychometricians, Godfrey Thomson, thought of the brain as a more or less homogeneous neural net, and argued that different cognitive tasks require more or less neural activity according to their difficulty. Thomson thought this picture accounted not only for the correlations of test scores but also for the “hierarchies” of correlations that were the basis of Spearman’s argument for general intelligence. The picture, as well as other considerations, led Thomson to reject all the assumptions I have listed. I think a more compelling reason to reject them is the failure of psychometrics to produce predictive (rather than post-hoc) meshes with an ever more elaborate understanding of the components of normal capacities. Psychometrics did nothing to predict the varieties of dyslexias, aphasia, agnosias, and other cognitive ills that can result from brain damage.

I can agree that the mechanisms of those factors have been poorly articulated. But Jensen (1998, p. 130) explains that some modules may be reflected in the primary factors while other modules may not show up as factors, e.g., the ability to acquire language, quick recognition memory for human faces, and three-dimensional space perception, because individual differences among normal persons are too slight for these virtually universal abilities to emerge as factors, or sources of variance.

Glymour also writes “If we adopt for the moment the first four basic psychometric assumptions, then on any of several pictures the distribution of unmeasured factors should be correlated. Suppose, for example, the factors have genetic causes that vary from person to person; there is no reason to think the genes for various factors are independently distributed. Suppose, again, that the factors are measures of the functioning or capacities of localized and physically linked modules. Then we should expect that how well one module works may depend on, and in turn influence, how well other modules linked to it work. Even so, a great number, perhaps the majority, of factor analytic studies assume the factors are uncorrelated; I cannot think of any reason for this assumption except, if wishes are sometimes reasons, the wish that it be so.” (p. 268). Indeed, why must the principal components necessarily be uncorrelated with one another in an unrotated PC analysis ? This is a difficult question. But, as noted above, Jensen (1998, pp. 119-121, 130-132) offers an answer to this puzzle: some modules may not show up as factors. Jensen also discusses the so-called idiots savants, i.e., those who typically have a low IQ and can barely take care of themselves but can nevertheless perform incredibly well in some specialized tasks, e.g., mental calculation, playing the piano by ear, etc., although rarely, if ever, does one find a savant with more than one of these narrow abilities.

Glymour writes “If both regressor and outcome influence sample selection, regression applied to the sample (no matter how large) will produce an (expected) nonzero value for the linear dependence, even when the regressor has no influence at all on the outcome variable.” (p. 271). There are regression models used in econometrics to deal with sample selection, such as tobit and truncated regressions. Maximum likelihood and multiple imputation (if correctly used) are also possible solutions to the problem of non-random missing data. Glymour goes on to say (pp. 271, 272-273) that omitted variable bias causes the coefficients of all independent variables in a regression to be biased and can also introduce or remove spurious correlations. Probably all practitioners of regression know the problem of omitted variables. But to claim that variables were omitted, one needs to explain and demonstrate through logical reasoning that a plausible factor was left out, rather than assume there must necessarily be one. A third problem pointed out by Glymour (p. 271) is that of reverse causation, i.e., Y causes X instead of X causing Y. For what it’s worth, econometricians have an old technique called instrumental variable (IV) regression (commonly estimated by 2-Stage Least Squares, or 2SLS) that deals with this problem. Econometricians can also use the Granger causality test in time series regression, which statistically evaluates the hypothesis Y->X against the hypothesis X->Y by using lagged values of the variables, even though in practice the Granger causality test is usually run as a simple bivariate analysis. In the field of psychometrics and psychology, it is more common to use path analysis and Structural Equation Models (SEM). But ideally, one has to use them in conjunction with longitudinal data in order to get around the problem of equivalent causal models (MacCallum et al., 1993; Cole & Maxwell, 2003). In any case, this reverse causality problem can be alleviated, more or less well.
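
Glymour’s first point, about selection on both regressor and outcome, can be checked with a small simulation. This is only a sketch under assumptions of my own: both variables are independent standard normals, and a case enters the sample only when their sum exceeds a hypothetical cutoff of 1. None of these values come from Glymour’s chapter.

```python
import random

rng = random.Random(1997)
N_CASES = 20000
CUTOFF = 1.0  # hypothetical selection threshold, for illustration only


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Regressor and outcome are generated independently: the true slope is zero.
regressor = [rng.gauss(0.0, 1.0) for _ in range(N_CASES)]
outcome = [rng.gauss(0.0, 1.0) for _ in range(N_CASES)]

# Selection depends on both variables: a case enters the sample only if
# the sum of its regressor and outcome exceeds the cutoff.
selected = [(x, y) for x, y in zip(regressor, outcome) if x + y > CUTOFF]
selected_x = [x for x, _ in selected]
selected_y = [y for _, y in selected]

slope_full = ols_slope(regressor, outcome)
slope_selected = ols_slope(selected_x, selected_y)

print(f"cases kept after selection: {len(selected)} of {N_CASES}")
print(f"slope in the full population: {slope_full:+.3f}")
print(f"slope in the selected sample: {slope_selected:+.3f}")
```

In the full population the slope stays close to zero, while the selected sample shows a clearly negative slope even though the regressor has no influence on the outcome. Whether the NLSY suffers from selection of this kind is a separate, empirical question that the simulation does not address.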

Glymour believes (p. 274) the result provided by Cawley et al. (1997) in this book illustrates what he says, and that the model preferred by The Bell Curve cannot be trusted, even though he does not believe that Cawley’s model has captured the influence of cognitive ability. Glymour certainly does not trust the current statistical methods.

Chapter 13

A “Head Start” in What Pursuit? IQ Versus Social Competence as the Objective of Early Intervention

Edward Zigler and Sally J. Styfco

Zigler & Styfco (1997) propose some corrections to Herrnstein & Murray’s review of educational intervention programs. They note “What is forgotten in these rash judgments is that most intervention programs were never intended to raise intelligence.” (p. 284). I’m puzzled. The key point is that if you boost education through repeated cognitive activities but intelligence remains unchanged despite this, the conclusion of The Bell Curve is left unchanged. Education simply does not boost intelligence.

Notably, in the case of Head Start, Zigler & Styfco say (pp. 286-287) that the goals were to improve the child’s physical health and mental processes and skills, with particular attention to conceptual and verbal skills; to help the emotional and social development of the child; to establish patterns and expectations of success; to increase the child’s capacity to relate positively to family members and others; to develop in the child and his family a responsible attitude toward society; and to increase the sense of dignity and self-worth within the child and his family. As far as I know, Head Start does not concentrate all of its resources on improving the child’s mental processes, but it is wrong to claim that cognitive ability has never been targeted in the programs. I agree with the authors that it makes no sense to say that Head Start fails to improve IQ if that judgment rests on the assumption that all of the funding was directed toward improving IQ, which is not the case. However, the goals of Head Start were to improve the child’s cognitive environment. So, it is legitimate to affirm that Head Start does not improve IQ, despite the huge financial resources that were allocated to improve the environment. On the other hand, we can still argue that if financial resources were allocated more efficiently, the outcomes could have been different. But in what way ? For example, the authors (p. 287) say that each Head Start center has six components : health screening and referral, mental health services, early childhood education, nutrition education and hot meals, social services for the child and family, and parent involvement. If one of these components is suspected to be inefficient in improving the child’s IQ, the administration can decide which component should be dropped. But if all of these components are believed to play a role in the cognitive development of the children, changing these components will be difficult.

Anyway, perhaps for this reason, Zigler & Styfco (p. 293) believe that the initial IQ gains following preschool experience were due to non-cognitive factors. As they say : “The physical and socioemotional aspects of development are more strongly controlled by the environment and, therefore, more effectively targeted by intervention”. They go on to say (p. 293) that IQ and academic measures correlate at 0.70, which corresponds to an R² of 0.49. They conclude that IQ explains only 49% of the total variance of academic achievement and, thus, that IQ is not a very robust predictor. The problem is that R² is not an effect size measure.
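
One way to see the difference between the two numbers is a short simulation. It is a sketch under assumptions of my own: IQ and achievement are simulated as standardized normal scores built to correlate at 0.70, and the sample size is arbitrary. With standardized scores, the regression slope equals the correlation, so a difference of one standard deviation in IQ goes with an expected difference of about 0.70 standard deviation in achievement, while 0.49 is the share of variance the two scores have in common.

```python
import math
import random

rng = random.Random(293)
N_CASES = 50000
R = 0.70  # the IQ-achievement correlation cited by Zigler & Styfco

# Simulated standardized scores (hypothetical data, not from the chapter).
iq_z = [rng.gauss(0.0, 1.0) for _ in range(N_CASES)]
achievement_z = [R * z + math.sqrt(1.0 - R ** 2) * rng.gauss(0.0, 1.0)
                 for z in iq_z]


def mean(values):
    return sum(values) / len(values)


mean_iq, mean_ach = mean(iq_z), mean(achievement_z)
sxy = sum((x - mean_iq) * (y - mean_ach) for x, y in zip(iq_z, achievement_z))
sxx = sum((x - mean_iq) ** 2 for x in iq_z)
syy = sum((y - mean_ach) ** 2 for y in achievement_z)

correlation = sxy / math.sqrt(sxx * syy)
r_squared = correlation ** 2
slope = sxy / sxx  # expected change in achievement per unit of IQ

print(f"correlation r: {correlation:.2f}")
print(f"variance explained r^2: {r_squared:.2f}")
print(f"expected achievement gain per 1 SD of IQ: {slope:.2f} SD")
```

The slope and the correlation describe how strongly achievement is expected to change with IQ; squaring the correlation answers a different question.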

Although they admit (p. 294) the fade out in IQ gains, they insist on the improvement observed in scholastic achievement and other social outcomes. More importantly, they observe (p. 295) that the current problem with reviews and meta-analyses is that the results of many different programs are combined so that the robust results are diluted by null effects from others. My review of the reviews of other investigators tells me that there is probably no exception to the rule; indeed, education improves social outcomes but not IQ.

Zigler & Styfco (p. 297) affirm that Herrnstein and Murray explain criminal behavior by low IQ alone. This is a caricature of their work. They never said that anywhere in their book.

Zigler & Styfco (p. 299) inform us that Head Start has improved not only the health status of the children but also the psychological well-being of their parents.

Zigler & Styfco now question the meaningfulness of IQ tests. They write “The value of the IQ construct as a predictor of performance outside of the realm of school also became suspect. For example, Mercer referred to the “6-hour” retarded child – one whose IQ and school test scores are low but who functions perfectly adequately before and after the school day. By the same token, there are many individuals who achieve very high IQ scores but do not behave competently at home, work, or in social settings. IQ, then, was just not measuring what early intervention specialists hoped to achieve.” (p. 300). In other words, they say that IQ has poor predictive validity. As I said previously, the predictive validity of IQ is one of the most robust findings in mental testing, and it has been known for quite a long time. If they are so eager to prove the absence of IQ predictivity, I wish them good luck.

Zigler & Styfco (p. 303) say that the quality of the programs may not be sufficiently high to meet the needs of very young at-risk children. Quality here involves features such as good teacher/child ratios, staff trained in early childhood, small group sizes, and developmentally appropriate curriculum. As far as I know, all these programs typically have such features. So, it is curious that they resort to this kind of argument. They say that in many public preschools, the number of children per teacher and the curriculum used often mirror typical kindergartens and are simply inappropriate for preschoolers. But the Abecedarian and Perry Preschool programs have a low child/teacher ratio (6/1) and yet they are both disappointing.

They also write “In our chapter we have railed against this narrow focus because we do not believe that intelligence is the only important human trait.” (p. 307). This is excellent. But who seriously believes that intelligence is the only important human trait ?

Chapter 14

Is There a Cognitive Elite in America?

Nicholas Lemann

Lemann (1997) analyzes Herrnstein & Murray’s idea that a cognitive elite is dominating the United States and, more generally, modern societies. Lemann (p. 320) says that if a cognitive elite emerges through assortative mating, it should have been taking form gradually over time, rather than overnight in the 1950s. However, Herrnstein & Murray say (1994, p. 111) that assortative mating increased between 1940 and 1987, especially among college-educated persons, and they also note (1994, p. 112) that a smart wife in the 1990s has a much greater dollar payoff for a man than she did fifty years ago. They believe that the feminist revolution (which began in the 1950s) has increased the likelihood of mating by cognitive ability, notably by increasing the odds that bright young women will be thrown in contact with bright young men during the years when people choose spouses. But Lemann also says that there is only weak evidence that high IQ people were beginning to accumulate in elite colleges during the 1950s. And he writes : “For example, Herrnstein and Murray’s figures on the (relatively low) average IQ scores at Ivy League schools in 1930, when pursued through the footnotes, turn out to come from the first administration of the Scholastic Aptitude Test to 8,040 students on June 23, 1926, and then the conversion of the scores to an IQ scale. But the takers were not actually students at Ivy League colleges; they were a self-selected group of high school students thinking of applying to Ivy League colleges. What Herrnstein and Murray report as the average IQ of Radcliffe College students is actually the average IQ of 233 high school girls who told the test administrators they’d like their scores sent to Radcliffe College.” (p. 320).

While Herrnstein & Murray believe that only people from a fairly narrow range of cognitive ability can become lawyers, Lemann (p. 321) argues that this is because only people who get above average scores on the Law School Admission Test (LSAT) are allowed to become lawyers, and not because of high IQ. True enough, but the authors never made this claim. They say that someone with a high IQ can fail at school, but someone who succeeds at school can hardly be a dumb person.

Lemann (p. 323) also says that as income rises above $100,000, the percentage of it derived from salaries, wages and business/profession steadily declines, replaced by long-term capital gains. In other words, Lemann has the impression that the top income shares are composed of inheritors and financiers rather than high-IQ professionals. The reason, I suspect, lies in the consequences of economic bubbles caused by the over-expansion of the money supply, with the kind of scenarios articulated in the Austrian Business Cycle Theory. There are certainly many resource misallocations due to this monetary injection; inheritors and financiers would probably constitute a smaller share of the top incomes were it not for the monetary injection of the central banks. In any case, if Lemann is correct, we should find that IQ is not predictive (or loses predictivity) when income rises to very high levels. And this should be examined empirically.

Chapter 15

Science, Public Policy, and The Bell Curve

Daniel P. Resnick and Stephen E. Fienberg

Resnick & Fienberg (1997) seem a little bit annoyed, as they are afraid that the book of Herrnstein & Murray may revive the kind of opinions held by Galton, Pearson and Fisher.

They also write (p. 330) “Some, such as Stephen Gould in the new edition of The Mismeasure of Man, have argued that Herrnstein and Murray’s reliance on factor analysis is the Achilles heel of their entire effort and have dismissed it accordingly”. Oh God. That looks interesting. Let me remember. The passage of Gould’s (1996, p. 373) book… where was it already…

Charles Spearman used factor analysis to identify a single axis – which he called g – that best identifies the common factor behind positive correlations among the tests. But Thurstone later showed that g could be made to disappear by simply rotating the factor axes to different positions. In one rotation, Thurstone placed the axes near the most widely separated of attributes among the tests – thus giving rise to the theory of multiple intelligences (verbal, mathematical, spatial, etc., with no overarching g). This theory (the “radical” view in Herrnstein and Murray’s classification) has been supported by many prominent psychometricians, including J. P. Guilford in the 1950s, and most prominently today by Howard Gardner. In this perspective, g cannot have inherent reality, for g emerges in one form of mathematical representation for correlations among tests, and disappears (or at least greatly attenuates) in other forms that are entirely equivalent in amounts of information explained. In any case, one can’t grasp the issue at all without a clear exposition of factor analysis – and The Bell Curve cops out completely on this central concept.

Ouch. That must be the one, no ? So, what now ? If my memory is correct, Carroll has written a chapter in Devlin’s book (1997) validating the existence of g, which has, by the same token, destroyed the multiple intelligence theories. But, more importantly, I would have sworn that Thurstone and Guilford were discredited quite a long time ago. And Gould resorts to this same argument in 1996 ?
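
For readers unfamiliar with rotation, the purely algebraic part of Gould’s passage can be sketched in a few lines. The loading matrix below is hypothetical (four tests, two orthogonal factors), and the 45-degree angle is chosen only because it gives a tidy result. The sketch shows that rotating the axes spreads the general factor over two group factors while leaving the correlations implied by the common factors untouched. It says nothing about which solution is psychologically meaningful, which is the substantive question.

```python
import math

# Hypothetical unrotated loadings for four tests on two orthogonal factors:
# column 0 is a general factor (all positive), column 1 a bipolar factor.
unrotated = [
    [0.70, 0.40],
    [0.70, 0.30],
    [0.70, -0.40],
    [0.70, -0.30],
]


def rotate(loadings, degrees):
    """Apply an orthogonal rotation of the factor axes to a loading matrix."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]


def implied_common_part(loadings):
    """Loadings times their transpose: the correlations implied by the
    common factors, with the communalities on the diagonal."""
    return [[sum(p * q for p, q in zip(row_i, row_j)) for row_j in loadings]
            for row_i in loadings]


rotated = rotate(unrotated, 45)

before = implied_common_part(unrotated)
after = implied_common_part(rotated)
max_difference = max(abs(x - y)
                     for row_b, row_a in zip(before, after)
                     for x, y in zip(row_b, row_a))

print("rotated loadings:")
for row in rotated:
    print("  " + "  ".join(f"{value:+.2f}" for value in row))
print(f"largest change in implied correlations: {max_difference:.1e}")
```

After rotation, the first two tests load mainly on one factor and the last two on the other, and no column of uniformly high loadings remains, yet the fit to the correlations is identical.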

Resnick & Fienberg write “The Bell Curve has been a “Pied Piper” for some segments of the social sciences. It deserves kudos for its way with words, drawing in readers who would otherwise be intimidated by numbers, whether or not they agree with Herrnstein and Murray’s conclusions. This success, however, has its price. We have looked a lot harder at the numbers than the typical reader is invited to do, and we are disappointed.” (p. 330). That’s good. Because I am also disappointed in your book, you see. So, we are even now.

They continue and say “Cultural transmission of traits that influence IQ is certainly possible. But they present no evidence for cultural stability and persistence, which is what their argument requires, and we find such stability unlikely given the rapid rate that culture can and does evolve.” (p. 333). Herrnstein and Murray believe otherwise, but I will not add any personal comment here.

They ask “If IQ scores correlate only in the 0.2 to 0.4 range with occupational success and income, how much importance can we assign to the cultural transmission of traits affecting the IQ of progeny?” (p. 334). These numbers are too low, and I recommend having a look at Strenze’s (2007) paper.

They say “Other variables, such as nutrition, quality of education, and peer culture, are, for various reasons, excluded from the analysis.” (p. 334). This, if I’m not mistaken, is what Jensen (1973, p. 235) termed the sociologist’s fallacy: the view that environmental variables have no genetic component.

They summarize (p. 335) the policy recommendations of Herrnstein & Murray. As expected, it’s a ridiculous caricature. The interested reader should read chapters 21-22 of The Bell Curve instead of this scam. It is amusing that Resnick & Fienberg have noted (p. 334) that many critics of The Bell Curve have used straw man arguments. But these guys are not necessarily doing better than the others.

They dislike Herrnstein & Murray’s opinion that ordinary people need, and should hope for, less government and more free market. They end the book with these closing words : “Because of their principled opposition to government, Herrnstein and Murray have denied Americans the support of public institutions in the struggle against rising inequality. Without government, it will be a very unequal struggle.” (p. 338). But Herrnstein & Murray’s distrust of government may have nothing to do with their being hereditarians. They (in particular Charles Murray) view the government as an inefficient allocator of resources for the same reasons generally advanced by libertarians, and especially Austrian economists. Government action causes resource misallocations, waste and unintended consequences. They (especially Charles Murray) believe in the efficiency of free markets, but Resnick & Fienberg apparently don’t.

Cited references