If you or a family member, beset by a clinical or neurological problem, are given a face-to-face intelligence test, it is likely to be a Wechsler. It is considered the gold standard, and the Full Scale IQ result, the consequence of spending over an hour doing the 10 subtests, is like a decathlon score: you get a reliable result across a broad range of intelligence domains. It was David Wechsler’s pragmatic approach of testing different skills which gave the Wechsler dominance over other tests, particularly in the clinical domain. After a head injury, even a mild one, or when investigating incipient memory problems or childhood developmental disorders, it is usual to give a broad-band intelligence test to act as a baseline for other investigations. The positive manifold holds across a very broad domain of abilities, so specific abilities have to be judged against general ability. What may be evidence of a memory deficit in a bright person may be normal memory in a less able person. To some extent all specialized clinical tests act in the shadow of general intelligence.

Although Wechsler tests are the best known, others like the Woodcock-Johnson have established a very good record, particularly in occupational settings, and yet others like Raven’s Matrices have been widely used cross-culturally. All of them work and produce very useful results. On average, a retest on normal samples will be within 4 IQ points of the original test. Others on the list, as you will see below, include the Kaufman Assessment Battery, the Stanford-Binet, the Differential Ability Scales and the Reynolds Intellectual Assessment Scales.

However, there is a problem which might impact you if you are given the Wechsler tests. Apart from the Full Scale result they also produce factor scores, and sometimes test givers interpret these factor scores and the discrepancies between them in ways which may go beyond prudent extrapolation. Why is this? Well, you already know my dictum “No-one gets around sampling theory, not even the Spanish Inquisition”.

In my ancient professional history, the Wechsler had 5 verbal subtests, plus one supplementary test; and 5 non-verbal subtests, plus one supplementary. So, in addition to the full total, there were two factors, a Verbal IQ and a Performance IQ. Given that each factor summed up 5 or even 6 subtests, that seemed reasonable. This approach held up for decades, leading to easy comparability between childhood and adult results, and between results obtained in middle age and in old age. What happened next was either exciting or dismaying, depending on your attitude to what is now called an “update”.

Test constructors seem to have decided to extract more factors from the same subtests, perhaps to give users the impression that they were getting more bang for their buck. This is very silly, because all test result summaries are based on the individual items taken, and it is only worth declaring that a factor exists if it accounts for a large proportion of the variance. Otherwise, just give the subtest scores. If you extract too many factors you run into a sampling problem: your ratio of test items to factors goes down, and although it is not visible to test users, the reliability and validity are compromised. I was taught a simple rule of thumb: you should have at least five times as many people as variables. It might also be the case that you should have at least 5 subtests to each factor. As you will see below, there is a general trend for all intelligence tests to have a lower ratio of tests to factors. That is, the length of the test is pretty static, but as the years go by most of them claim to have found more factors.
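To make the arithmetic concrete, here is a minimal sketch of the subtests-per-factor ratio, together with the standard Spearman-Brown formula as a rough guide to how the reliability of a factor score falls when it rests on fewer subtests. The per-subtest reliability of 0.80 and the "hypothetical update" with 5 factors are illustrative assumptions, not figures taken from any test manual, and Spearman-Brown assumes the subtests are parallel measures, which real subtests only approximate.

```python
def subtests_per_factor(n_subtests: int, n_factors: int) -> float:
    """Average number of subtests contributing to each claimed factor."""
    if n_factors <= 0:
        raise ValueError("n_factors must be positive")
    return n_subtests / n_factors


def spearman_brown(single_reliability: float, k: float) -> float:
    """Reliability of a composite of k parallel parts (Spearman-Brown formula)."""
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


MIN_SUBTESTS_PER_FACTOR = 5     # the rule of thumb suggested in the text
ASSUMED_SUBTEST_RELIABILITY = 0.80  # hypothetical value, for illustration only

batteries = {
    # 10 core subtests and 2 factors, as described for the older Wechsler
    "older Wechsler (Verbal and Performance)": (10, 2),
    # hypothetical example of the same subtests carved into more factors
    "hypothetical update with 5 factors": (10, 5),
}

for name, (n_subtests, n_factors) in batteries.items():
    ratio = subtests_per_factor(n_subtests, n_factors)
    reliability = spearman_brown(ASSUMED_SUBTEST_RELIABILITY, ratio)
    verdict = "meets" if ratio >= MIN_SUBTESTS_PER_FACTOR else "falls below"
    print(f"{name}: {ratio:.1f} subtests per factor ({verdict} the rule of thumb), "
          f"approximate factor-score reliability {reliability:.2f}")
```

Under these assumptions, the same ten subtests support two factor scores more dependably than five, which is the sampling point in miniature.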

Revisiting the Historical Increase in the Number of Factors Measured by Commercial Intelligence Tests: An Update and Extension of Frazier and Youngstrom (2007)

Ryan J. McGill and Thomas J. Ward, Thomas W. Frazier and Eric A. Youngstrom. ISIR 2018 poster session.

The standardization sample sizes are mostly pretty good, given that this is an expensive process involving face-to-face testing lasting over an hour. However, some of the samples are small, and the number of presumed factors identified is unbelievably high. Here is the full poster:

https://drive.google.com/file/d/1oNDWhNIcXiX6IrPqGqEmNTrOPLsDvLHk/view?usp=sharing

Indeed, the same authors say of the most frequently used WISC-V (Wechsler Intelligence Scale for Children) that the best factorial solution for the test is one factor: general intelligence.

Construct validity of the Wechsler Intelligence Scale for Children – Fifth UK Edition: Exploratory and confirmatory factor analyses of the 16 primary and secondary subtests

Gary L. Canivez (Eastern Illinois University, Charleston, Illinois), Marley W. Watkins (Baylor University, Waco, Texas) and Ryan J. McGill (William & Mary, Williamsburg, Virginia)

https://drive.google.com/file/d/1vlljbuz4PWH3jIXvDScGWamFICYsRIPA/view?usp=sharing

In sum, the tests are fine, but too many factors are being claimed. This allows some clinicians free rein to speculate as to why a person does well on one factor and not another, proposing that there is a deficit due to some extraneous cause. This is a common claim in medico-legal cases. I think that test constructors and interpreters should stick to more reliable, valid and prudent claims. Intelligence tests, so long as they sample a broad range of abilities, can give you an accurate measure of overall ability. They can probably distinguish between verbal and non-verbal ability, but not always with confidence. At a pinch they can hazard a guess about three factors, but that is pushing it.

Don’t let yourself or your family members be over-factored.