2400 words

Construct validity for IQ is fleeting. Some people may refer to Haier’s brain imaging data as evidence for construct validity for IQ, even though there are numerous problems with brain imaging and that neuroreductionist explanations for cognition are “probably not” possible (Uttal, 2014; also see Uttal, 2012). Construct validity refers to how well a test measures what it purports to measure—and this is non-existent for IQ (see Richardson and Norgate, 2014). If the tests did test what they purport to (intelligence), then they would be construct valid. I will show an example of a measure that was validated and shown to be reliable without circular reliance of the instrument itself; I will show that the measures people use in attempt to prove that IQ has construct validity fail; and finally I will provide an argument that the claim “IQ tests test intelligence” is false since the tests are not construct valid.

Jung and Haier (2007) formulated the P-FIT hypothesis—the Parieto-Frontal Intelligence Theory. The theory purports to show how individual differences in test scores are linked to variations in brain structure and function. There are, however, a few problems with the theory (as Richardson and Norgate, 2007 point out in the same issue; pg 162-163). IQ and brain region volumes are experience-dependent (eg Shonkoff et al, 2014; Betancourt et al, 2015; Lipina, 2016; Kim et al, 2019). So since they are experience-dependent, then different experiences will form different brains/test scores. Richardson and Norgate (2007) state that such bigger brain areas are not the cause of IQ, rather that, the cause of IQ is the experience-dependency of both: exposure to middle-class knowledge and skills leads to a better knowledge base for test-taking (Richardson, 2002), whereas access to better nutrition would be found in middle- and upper-classes, which, as Richardson and Norgate (2007) note, lower-quality, more energy-dense foods are more likely to be found in lower classes. Thus, Haier et al did not “find” what they purported too, based on simplistic correlations.

Now let me provide the argument about IQ test experience-dependency:

Premise 1: IQ tests are experience-dependent.

Premise 2: IQ tests are experience-dependent because some classes are more exposed to the knowledge and structure of the test by way of being born into a certain social class.

Premise 3: If IQ tests are experience-dependent because some social classes are more exposed to the knowledge and structure of the test along with whatever else comes with the membership of that social class then the tests test distance from the middle class and its knowledge structure.

Conclusion 1: IQ tests test distance from the middle class and its knowledge structure (P1, P2, P3).

Premise 4: If IQ tests test distance from the middle class and its knowledge structure, then how an individual scores on a test is a function of that individual’s cultural/social distance from the middle class.

Conclusion 2: How an individual scores on a test is a function of that individual’s cultural/social distance from the middle class since the items on the test are more likely to be found in the middle class (i.e., they are experience-dependent) and so, one who is of a lower class will necessarily score lower due to not being exposed to the items on the test (C1, P4)

Conclusion 3: IQ tests test distance from the middle class and its knowledge structure, thus, IQ scores are middle-class scores (C1, C2).

Still further regarding neuroimaging, we need to take a look at William Uttal’s work.

Uttal (2014) shows that “The problem is that both of these approaches are deeply flawed for methodological, conceptual, and empirical reasons. One reason is that simple models composed of a few neurons may simulate behavior but actually be based on completely different neuronal interactions. Therefore, the current best answer to the question asked in the title of this contribution [Are neuroreductionist explanations of cognition possible?] is–probably not.”

Uttal even has a book on meta-analyses and brain imaging—which, of course, has implications for Jung and Haier’s P-FIT theory. In his book Reliability in Cognitive Neuroscience: A Meta-meta Analysis, Uttal (2012: 2) writes:

There is a real possibility, therefore, that we are ascribing much too much meaning to what are possibly random, quasi-random, or irrelevant response patterns. That is, given the many factors that can influence a brain image, it may be that cognitive states and braib image activations are, in actuality, only weakly associated. Other cryptic, uncontrolled intervening factors may account for much, if not all, of the observed findings. Furthermore, differences in the localization patterns observed from one experiment to the next nowadays seems to reflect the inescapable fact that most of the brain is involved in virtually any cognitive process.

Uttal (2012: 86) also warns about individual variability throughout the day, writing:

However, based on these findings, McGonigle and his colleagues emphasized the lack of reliability even within this highly constrained single-subject experimental design. They warned that: “If researchers had access to only a single session from a single subject, erroneous conclusions are a possibility, in that responses to this single session may be claimed to be typical responses for this subject” (p. 708). The point, of course, is that if individual subjects are different from day to day, what chance will we have of answering the “where” question by pooling the results of a number of subjects?

That such neural activations gleaned from neuroimaging studies vary from individual to individual, and even time of day in regard to individual, means that these differences are not accounted for in such group analyses (meta-analyses). “… the pooling process could lead to grossly distorted interpretations that deviate greatly from the actual biological function of an individual brain. If this conclusion is generally confirmed, the goal of using pooled data to produce some kind of mythical average response to predict the location of activation sites on an individual brain would become less and less achievable“‘ (Uttal, 2012: 88).

Clearly, individual differences in brain imaging are not stable and they change day to day, hour to hour. Since this is the case, how does it make sense to pool (meta-analyze) such data and then point to a few brain images as important for X if there is such large variation in individuals day to day? Neuroimaging data is extremely variable, which I hope no one would deny. So when such studies are meta-analyzed, inter- and intrasubject variation is obscured.

The idea of an average or typical “activation region” is probably nonsensical in light of the neurophysiological and neuroanatomical differences among subjects. Researchers must acknowledge that pooling data obscures what may be meaningful differences among people and their brain mechanisms. THowever, there is an even more negative outcome. That is, by reifying some kinds of “average,” we may be abetting and preserving some false ideas concerning the localization of modular cognitive function (Uttal, 2012: 91).

So when we are dealing with the raw neuroimaging data (i.e., the unprocessed locations of activation peaks), the graphical plots provided of the peaks do not lead to convergence onto a small number of brain areas for that cognitive process.

… inconsistencies abound at all levels of data pooling when one uses brain imaging techniques to search for macroscopic regional correlates of cognitive processes. Individual subjects exhibit a high degree of day-to-day variability. Intersubject comparisons between subjects produce an even greater degree of variability. […] The overall pattern of inconsistency and unreliability that is evident in the literature to be reviewed here again suggests that intrinsic variability observed at the subject and experimental level propagates upward into the meta-analysis level and is not relieved by subsequent pooling of additional data or averaging. It does not encourage us to believe that the individual meta-analyses will provide a better answer to the localization of cognitive processes question than does any individual study. Indeed, it now seems plausible that carrying out a meta-analysis actually increases variability of the empirical findings (Uttal, 2012: 132).

So since reliability is low at all levels of neuroimaging analysis, it is very likely that the relations between particular brain regions and specific cognitive processes have not been established and may not even exist. The numerous reports purporting to find such relations report random and quasi-random fluctuations in extremely complex systems.

Construct validity (CV) is “the degree to which a test measures what it claims, or purports, to be measuring.” A “construct” is a theoretical psychological construct. So CV in this instance refers to whether IQ tests test intelligence. We accept that unseen functions measure what they purport to when they’re mechanistically related to differences in two variables. E.g, blood alcohol and consumption level nd the height of the mercury column and blood pressure. These measures are valid because they rely on well-known theoretical constructs. There is no theory for individual intelligence differences (Richardson, 2012). So IQ tests can’t be construct valid.

The accuracy of thermometers was established without circular reliance on the instrument itself. Thermometers measure temperature. IQ tests (supposedly) measure intelligence. There is a difference between these two, though: the reliability of thermometers measuring temperature was established without circular reliance on the thermometer itself (see Chang, 2007).

In regard to IQ tests, it is proposed that the tests are valid since they predict school performance and adult occupation levels, income and wealth. Though, this is circular reasoning and doesn’t establish the claim that IQ tests are valid measures (Richardson, 2017). IQ tests rely on other tests to attempt to prove they are valid. Though, as seen with the valid example of thermometers being validated without circular reliance on the instrument itself, IQ tests are said to be valid by claiming that it predicts test scores and life success. IQ and other similar tests are different versions of the same test, and so, it cannot be said that they are validated on that measure, since they are relating how “well” the test is valid with previous IQ tests, for example, the Stanford-Binet test. This is because “Most other tests have followed the Stanford–Binet in this regard (and, indeed are usually ‘validated’ by their level of agreement with it; Anastasi, 1990)” (Richardson, 2002: 301). How weird… new tests are validated with their agreement with other, non-construct valid tests, which does not, of course, prove the validity of IQ tests.

IQ tests are constructed by excising items that discriminate between better and worse test takers, meaning, of course, that the bell curve is not natural, but forced (see Simon, 1997). Humans make the bell curve, it is not a natural phenomenon re IQ tests, since the first tests produced weird-looking distributions. (Also see Richardson, 2017a, Chapter 2 for more arguments against the bell curve distribution.)

Finally, Richardson and Norgate (2014) write:

In scientific method, generally, we accept external, observable, differences as a valid measure of an unseen function when we can mechanistically relate differences in one to differences in the other (e.g., height of a column of mercury and blood pressure; white cell count and internal infection; erythrocyte sedimentation rate (ESR) and internal levels of inflammation; breath alcohol and level of consumption). Such measures are valid because they rely on detailed, and widely accepted, theoretical models of the functions in question. There is no such theory for cognitive ability nor, therefore, of the true nature of individual differences in cognitive functions.

That “There is no such theory for cognitive ability” is even admitted by lead IQ-ist Ian Deary in his 2001 book Intelligence: A Very Short Introduction, in which he writes “There is no such thing as a theory of human intelligence differences—not in the way that grown-up sciences like physics or chemistry have theories” (Richardson, 2012). Thus, due to this, this is yet another barrier against IQ’s attempted validity, since there is no such thing as a theory of human intelligence.

Conclusion

In sum, neuroimaging meta-analyses (like Jung and Haier, 2007; see also Richardson and Norgate, 2007 in the same issue, pg 162-163) do not show what they purport to show for numerous reasons. (1) There are, of course, consequences of malnutrition for brain development and lower classes are more likely to not have their nutritional needs met (Ruxton and Kirk, 1996); (2) low classes are more likely to be exposed to substance abuse (Karriker-Jaffe, 2013), which may well impact brain regions; (3) “Stress arising from the poor sense of control over circumstances, including financial and workplace insecurity, affects children and leaves “an indelible impression on brain structure and function” (Teicher 2002, p. 68; cf. Austin et al. 2005)” (Richardson and Norgate, 2007: 163); and (4) working-class attitudes are related to poor self-efficacy beliefs, which also affect test performance (Richardson, 2002). So, Jung and Haier’s (2007) theory “merely redescribes the class structure and social history of society and its unfortunate consequences” (Richardson and Norgate, 2007: 163).

In regard to neuroimaging, pooling together (meta-analyzing) numerous studies is fraught with conceptual and methodological problems, since a high-degree of individual variability exists. Thus, attempting to find “average” brain differences in individuals fails, and the meta-analytic technique used (eg by Jung and Haier, 2007) fails to find what they want to find: average brain areas where, supposedly, cognition occurs between individuals. Meta-analyzing such disparate studies does not show an “average” where cognitive processes occur, and thusly, cause differences in IQ test-taking. Reductionist neuroimaging studies do not, as is popularly believed, pinpoint where cognitive processes take place in the brain, they have not been established and they may not even exist.

Nueroreductionism does not work; attempting to reduce cognitive processes to different regions of the brain, even using meta-analytic techniques as discussed here, fail. There “probably cannot” be neuroreductionist explanations for cognition (Uttal, 2014), and so, using these studies to attempt to pinpoint where in the brain—supposedly—cognition occurs for such ancillary things such as IQ test-taking fails. (Neuro)Reductionism fails.

Since there is no theory of individual differences in IQ, then they cannot be construct valid. Even if there were a theory of individual differences, IQ tests would still not be construct valid, since it would need to be established that there is a mechanistic relation between IQ tests and variable X. Attempts at validating IQ tests rely on correlations with other tests and older IQ tests—but that’s what is under contention, IQ validity, and so, correlating with older tests does not give the requisite validity to IQ tests to make the claim “IQ tests test intelligence” true. IQ does not even measure ability for complex cognition; real-life tasks are more complex than the most complex items on any IQ test (Richardson and Norgate, 2014b)

Now, having said all that, the argument can be formulated very simply:

Premise 1: If the claim “IQ tests test intelligence” is true, then IQ tests must be construct valid.

Premise 2: IQ tests are not construct valid.

Conclusion: Therefore, the claim “IQ tests test intelligence” is false. (modus tollens, P1, P2)