I started this post to refute some specific arguments, but I changed my mind midstream and decided to add a lot more material than I initially envisioned. It is best viewed as something akin to a FAQ (Frequently Refuted Objections – FRO?) about standardized tests and their use in higher education.

SAT and income are not perfectly correlated

The SAT is certainly modestly correlated with parental income, but it is simply not true that the SAT is nothing more than a measure of family income.

I will briefly plot the 2011 SAT reading scores by income level to illustrate that the r**2 is considerably less than one.

There is significant overlap across the entire income distribution:

Note: These are simulated distributions assuming the data are approximately normally distributed at each income level. This is close enough to the truth to approximate the proportions of each group we are apt to find at particular score levels, or vice versa (it likely under-estimates the amount of overlap and outliers to some degree). You might also note that some of these simulated scores exceed the maximum (800) or fall below the minimum (200) scores the College Board reports. That is actually a function of arbitrary score ceilings and floors. The true distribution of “ability” would resemble this if they didn’t set arbitrary floors and ceilings (and evidence suggests that it would continue to predict in a similar fashion).
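The amount of overlap implied by this kind of simulation can be sketched in a few lines. The means and standard deviation below are hypothetical placeholders chosen only for illustration, not the College Board's 2011 figures; the normality assumption is the same one described in the note above.

```python
from statistics import NormalDist

# Hypothetical placeholder parameters (NOT College Board data):
# mean reading score for a low- and a high-income bracket, common SD.
low_income = NormalDist(mu=440, sigma=100)
high_income = NormalDist(mu=560, sigma=100)

# Overlapping coefficient: shared area under the two density curves (0 to 1).
overlap = low_income.overlap(high_income)

# Share of the low-income group scoring above the high-income group's mean.
share_above = 1 - low_income.cdf(high_income.mean)

print(f"overlap coefficient: {overlap:.2f}")
print(f"low-income share above high-income mean: {share_above:.2f}")
```

Even with group means more than a full standard deviation apart, as assumed here, a large share of the two distributions coincides. Swapping in the real group means and standard deviations is a matter of changing the two NormalDist lines.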

Income correlates well with lots of tests

Similarly sized income gaps are found on essentially all decent tests (including, notably, the ACT).

Likewise, the linear SES-test score relationship is NOT unique to the college standardized tests either. Similar patterns are found in NAEP scores.

These empirical regularities have been noted by others too

If you aggregate this sort of data at a high level (or really any reasonably well linearly correlated variables) you can, of course, produce much stronger correlations, as noise, measurement error, and other weakly correlated sources of variance tend to get averaged out.
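The aggregation effect is easy to reproduce with synthetic data. The sketch below uses made-up individuals whose outcome depends only weakly on a predictor (the 0.3 slope and the ten groups are arbitrary assumptions), then compares the individual-level correlation with the correlation between group means.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic individuals: y depends weakly on x plus a lot of noise.
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.3 * xi + random.gauss(0, 1) for xi in x]

# Aggregate: sort by x, cut into 10 equal-sized groups, take group means.
pairs = sorted(zip(x, y))
size = n // 10
group_x = [mean(p[0] for p in pairs[i:i + size]) for i in range(0, n, size)]
group_y = [mean(p[1] for p in pairs[i:i + size]) for i in range(0, n, size)]

r_individual = pearson(x, y)
r_grouped = pearson(group_x, group_y)
print(f"individual-level r: {r_individual:.2f}")
print(f"group-mean r:       {r_grouped:.2f}")
```

The individual-level correlation stays modest while the group-mean correlation approaches one, because averaging within each group removes most of the individual noise.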

Mean GPA, academic rigor, and the significance of HSGPA vary with SES

Low SES students typically take less rigorous courses and they tend to be graded more easily (even within the “same” subject). To wit, see the relationship between GPA and NAEP test scores by school SES (percent of school eligible for free or reduced price lunch–it’s widely used as a proxy for SES).

This despite the fact that mean GPAs are significantly lower in low SES schools.

In short, the gaps the SAT is measuring across income groups are both real and significant.

The same patterns hold true across racial groups, i.e., differences in GPA proportions and differences in the expected NAEP given the same GPA level.

Blacks earning a 3.75-4.00 GPA in 12th grade obtain mean NAEP scores below the mean score of whites earning 3.00-3.75 (about halfway between that and 2.50-2.99).

Likewise, blacks are considerably less likely to earn high grades.

Whites and Asians/Pacific Islanders are more than 300% as likely as blacks to obtain a 12th grade GPA greater than 3.75. This excludes dropouts and doesn’t even account for academic rigor (in other words, it’s generally harder for the same person to earn an A in an advanced calculus course than in remedial algebra).

Even controlling for SES, different racial groups see different average scores

If the SAT truly only measured income, we would also expect the differences across racial groups to be equalized once we control for income. We certainly do not find this. To the contrary, low income whites typically perform about as well as high income blacks (holding income constant we find a bit less than a one standard deviation gap between whites and blacks).

SES & B-W cognitive gaps start very young and keep growing

Significant predictive differences in cognitive ability are found as young as 18 months of age by SES.

Likewise, the black-white gap is found at least as early as 36 months of age.

The income and the black-white gaps are well established by 3rd grade and they grow a bit by 11th grade.

This is yet one more reason why test prep, fancy tutors, and the like are extremely unlikely to explain much of anything here.

The B-W test score gaps are not well explained by schools

Nor are the B-W gaps apt to be well explained by differences in “school quality” because the gaps are large within the very same elementary schools and they are even larger in higher performing schools.

These state tests are, incidentally, strong predictors of SAT and ACT scores.

(note: you can also read this study by Roland Fryer for more confirmation of B-W gaps even controlling for school, classroom, SES, birthweight, etc)

The differences in “school quality” are generally overstated

There is more variation in test scores within the very same classrooms than within schools, districts, and states.

If one could merely “purchase” test scores through income or wealth, we would expect much less variance within the school or classroom relative to broader (geographic) measures like district, state, or nation. That we don’t find this ought to raise questions in the minds of people who argue that the SAT and the like “only measure income”.

The SAT’s association with family income is almost entirely mediated by family education

This has been found in multiple studies.

Parents’ income has a significant association with SAT scores, but parents’ education is consistently stronger, and regression with effective controls for race, education, and other factors, usually suppresses the income variable to insignificance. The income variable achieved significance when the education threshold was high school diploma most likely because so few parents were dropouts that education was no longer effectively controlled, and parents’ income became a proxy variable for parents’ education…. Part of this dominance could result from heritability in test performance corresponding to parents’ educational attainment, given the high heritability estimates from twins studies for high-stakes standardized exams in the UK and the Netherlands (Bartels et al, 2002; et al, 2013).

Or see this report from the University of California system.

Note that in all cases parents’ education level is a much stronger predictor. For instance, SAT-V correlates with income at 0.16, whereas it correlates with education at 0.39. Parent education also correlates much better with key outcomes: 1st year GPA, cumulative GPA, 4-year graduation rates, etc. When people talk about socio-economic status (SES), keep in mind that it usually involves parent education and/or occupational status (which is also a proxy for education) to a significant degree, and that these components are doing most of the work when it comes to predicting outcomes like these. [Also, parents’ education and income correlate at 0.33 in this dataset]
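The three correlations quoted in the paragraph above are enough for a back-of-the-envelope check using the standard first-order partial correlation formula. This is only a sketch built on those three published figures, not a reproduction of the report's own regressions.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after partialling z out of both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Zero-order correlations quoted in the paragraph above.
r_sat_income = 0.16
r_sat_educ = 0.39
r_income_educ = 0.33

# SAT-V vs income, holding parent education constant.
sat_income_given_educ = partial_corr(r_sat_income, r_sat_educ, r_income_educ)
# SAT-V vs parent education, holding income constant.
sat_educ_given_income = partial_corr(r_sat_educ, r_sat_income, r_income_educ)

print(f"SAT-V ~ income | education: {sat_income_given_educ:.3f}")
print(f"SAT-V ~ education | income: {sat_educ_given_income:.3f}")
```

With these inputs the income correlation shrinks to roughly 0.04 once education is held constant, while the education correlation barely moves (roughly 0.36) once income is held constant.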

SES does not significantly mediate the predictive power of the SAT

Column 3 here shows that the residual attributable to SES is very small, -0.01 adjusting for national range restriction, and slightly positive within most schools (implying that high SES are somewhat under-predicted by SAT alone at an institution level).

This same analysis shows that SES and HSGPA are correlated nationally (column one, adjusted for range restriction) and that, in fact, HSGPA alone tends to under-predict high SES people (which goes to my earlier evidence concerning NAEP at given GPA levels).

Most competitive schools use something akin to an academic index, an equally weighted average of HS GPA and SAT (and sometimes SAT II), to ball-park students’ academic prospects, meaning that on average high SES people are under-predicted in practical terms. It’s a sure bet that if they don’t re-weight for the known academic rigor of the school and the curriculum the student took, they’ll systematically under-predict high SES students’ outcomes.

Some have misconstrued the Univ. of Calif. data to argue otherwise

Similar results were found by that much misconstrued University of California study. However, few of those who talked it up knew, or took care to mention, that the SAT was only weakened by the inclusion of the SAT II in the analysis (which is also well correlated with SES and the SAT I).

Without including SAT II and parent SES (income and education) in the analysis, their beta weights (standardized regression coefficients) for SAT I would have been much higher. It’s also worth pointing out that these beta weights imply that they’d be giving high SES applicants extra weight! This is the result of the under-prediction of HS GPA for high SES people.

A subsequent reanalysis of the same California university data makes my points clearer (see models 2 and 4).

Even controlling for parent SES and California high school rank (API) the SAT I has a total beta weight of about .38 in model 2 (0.28 + .10) and a beta of .23 in model 4 with the addition of HS GPA…. but again, this is controlling for HS GPA. Unless the anti-test people are willing to take many points off the HS GPA of high SES people or people that attend less competitive schools it’s nonsensical to argue that HSGPA is an appreciably stronger predictor! Without controlling for SES/school quality measures HSGPA loses much of its validity due to its statistical bias (especially systematic under-prediction of high SES).

The whole point of using standardized tests is that they are relatively unbiased predictors that allow for reasonable apples-to-apples comparisons, i.e., they don’t require adjustments for major systematic error. Their advocacy analysis really should have presented what this would look like without any adjustments for parent income, parent education, or school quality because most people want the admissions criteria to be at least neutral.

Someone else re-ran this analysis actually:

This multivariate regression summary table strongly suggests that a simple model using SAT I and HS GPA (model “D”) is a good predictor and that adding family income, parent education, and SAT II into the mix does little to improve the predictive validity. Model “C” also suggests that, contrary to the “income measurement” people, adding income and education does little to weaken the strength of the SAT I (compare to “D”).

Nor are these tests biased against minorities

According to an independent analysis of the University of California’s data, blacks and hispanics are somewhat over-predicted by SAT I and SAT II. They are over-predicted a good deal more by HSGPA, though.

This is generally consistent with analysis at a national level:

Although the national residuals are quite a bit larger (probably the result of relatively less range restriction).

The predictive power of the SAT is vastly under-estimated

Because students apply to different institutions based on their ability, because schools reject less qualified applicants, and because students tend to sort into different majors and take different courses based, in large part, on their academic strength (or lack thereof), the nominally reported correlations tend to seriously downplay the strength of this predictor in the national admissions strategy context. Many of these effects fall under the category of range restriction and can be adjusted for fairly easily. Others, like differential course selection behaviors, require more sophisticated methods to estimate their true effects.

If we look within institutions we typically find that SAT scores correlate with GPA at about 0.36. However, after adjusting for range restriction and course difficulty (within and across schools) the correlation coefficient increases to .67. Adding in HS-GPA increases the prediction to 0.78 (correcting for range restriction and course difficulty).
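To give a feel for how much range restriction alone can move a correlation, here is a minimal sketch of the textbook univariate correction (Thorndike's Case II). The studies cited may well use more elaborate multivariate corrections, and the SD ratios below are hypothetical values chosen for illustration, so this is not expected to reproduce the .67 figure, which also reflects the course-difficulty adjustment.

```python
from math import sqrt

def correct_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case II correction for direct range restriction.

    sd_ratio = predictor SD in the full applicant pool divided by the
    predictor SD in the restricted (enrolled) sample.
    """
    r, u = r_restricted, sd_ratio
    return (r * u) / sqrt(1 + r ** 2 * (u ** 2 - 1))

r_within = 0.36  # within-institution correlation from the text

# Hypothetical SD ratios, for illustration only.
for u in (1.0, 1.5, 2.0):
    corrected = correct_range_restriction(r_within, u)
    print(f"SD ratio {u:.1f}: corrected r = {corrected:.2f}")
```

A ratio of 1.0 leaves the correlation unchanged, and the corrected value rises as the enrolled sample becomes narrower relative to the full pool.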

The strength of this prediction does not weaken past freshman year

That is to say that it predicts freshman, sophomore, junior, and senior years equally well for all intents and purposes.

Holding HSGPA constant the SAT offers significant incremental validity

The SAT is well correlated with IQ tests

The SAT correlates about as well with IQ tests as one IQ test correlates with other IQ tests (or as the PSAT correlates with the SAT). In fact, I couldn’t find any statistically significant income effect controlling for IQ scores when I analyzed NLSY97.

I found more indication of an income “bias” in high school GPA:

Lumosity’s cognitive tests show strong correlations with SAT across universities

Do brain games essentially function as IQ tests? A recent analysis suggests they do. Data scientist Daniel Sternberg conducted an interesting analysis using Lumosity data. In his article titled Lumosity’s Smartest Colleges, he analyzed the scores of 89,699 users between the ages of 17 and 25 who attended a college or university and played the game for the very first time. He then examined the correlations between the median SAT and ACT scores (from the universities they attended) with performance on the aggregate score on Lumosity’s tests, which include the areas of Speed, Attention, Flexibility, Memory, and Problem Solving. So just like traditional intelligence and IQ tests, Lumosity has different measures of cognitive function. The correlation between the SAT and Lumosity score (r = .85) and the ACT and Lumosity score (r = .84) were both reasonably high. Here is the graph:

Research shows that some video games can be used as good measures of general intelligence (if we extract the general factor):

It is likely that Lumosity’s games function in a similar way (even if their product is unlikely to change general intelligence). This evidence is at least highly suggestive.

SAT test prep has little effect

SAT test prep generally has very modest effects (at best). Multiple studies have demonstrated this point.

By far the largest effect sizes belong to those preparation activities involving either a commercial course or private tutor [NEVERTHELESS THE SCORE CHANGES ARE NOT LARGE], and the effects differ for each section of the SAT. On average students with private tutors improve their math scores by 19 points more than those students without private tutors. The effect is less on the verbal section, where having a private tutor only improves scores on average by seven points. Taking a commercial course has a similarly large effect on math scores, improving them on average by 17 points, and has the largest effect on verbal scores, improving them on average by 13 points. With the exception of studying with a book, no other activity analyzed in this manner has an effect on test score changes that is statistically different from zero at a .05 significance level. … Does test preparation help improve student performance on the SAT and ACT? For students that have taken the test before and would like to boost their scores, coaching seems to help, but by a rather small amount. After controlling for group differences, the average coaching boost on the math section of the SAT is 14 to 15 points. The boost is smaller on the verbal section of the test, just 6 to 8 points. The combined effect of coaching on the SAT for the NELS sample is about 20 points.

20 combined points is equal to about 0.09 standard deviations. These are really modest effects.
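The arithmetic behind that conversion can be made explicit. The sketch below backs out the combined-score standard deviation implied by the two figures above, then shows what a 0.09 SD gain would mean for a student starting at the median, under the assumption (mine, for illustration) that combined scores are roughly normally distributed.

```python
from statistics import NormalDist

coaching_gain = 20     # combined points, from the quoted study
stated_effect = 0.09   # the same gain in SD units, as stated above

# Combined-score SD implied by those two numbers.
implied_sd = coaching_gain / stated_effect

# Percentile reached by a median student after a 0.09 SD gain,
# assuming an approximately normal score distribution.
new_percentile = 100 * NormalDist().cdf(stated_effect)

print(f"implied combined-score SD: {implied_sd:.0f} points")
print(f"median student moves to about the {new_percentile:.0f}th percentile")
```

Under these assumptions, the full coaching effect moves a median student up by only a few percentile points.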

Test prep rates do not vary all that much

Test prep varies little with income levels:

They do vary somewhat with race, but whites are the least likely of any major group to take it, and when they do they see smaller gains.

Blacks are significantly more likely to do test prep (according to several studies) and they see somewhat larger gains! Regardless, given the minimal differences in test prep and ample evidence that test prep has small effects even when used, it’s extremely unlikely to explain much of the systematic patterns we find nationwide with respect to SES or race.

SATs predict graduation rates between schools too

Using IPEDS it is possible to estimate the effect of a school’s (estimated) median SAT score on graduation rates and other outcomes.

source

Correlation coefficients

White: 0.78

Asian: 0.69

Black: 0.70

Hispanic: 0.71

Women: 0.79

Men: 0.82

Total: 0.82

When schools systematically discount standardized tests the effects are very obvious

For instance, US law schools grant admissions preferences of approximately two standard deviations to blacks across the entire pecking order (save HBCUs):

As a consequence of these policies trickling down nationally, approximately 50% of black law students are clustered in the bottom decile of their classes and most of them aren’t much higher than that.

This is a direct result of misguided policies like affirmative action and the results are highly predictable (like the SAT, the LSAT is a relatively unbiased predictor). Moreover, these effects carry over to graduation, bar passage rates, and even, amongst those who graduate and pass the bar, their employment success. They actually do somewhat worse than expected due to mismatch.

Similar outcomes are seen at the undergraduate level and in other competitive graduate and professional school programs (there is a reason why affirmative action doesn’t stop in undergrad….)

Heritability explains a great deal of these systematic SES relationships

There is a large and growing body of evidence that many behavioral/personality traits (phenotypes) are highly heritable and that, contrary to popular imagination and many social scientists, the so-called “shared environment”, i.e., that which siblings share in common (parents, housing, schools, neighborhoods, etc), explains very little of the variance on average.

Intelligence, one of the key traits here, is estimated to be more than 50% heritable according to many studies in rich western countries. The shared environment is pretty consistently close to zero (the remainder being “unshared”, i.e., measurement error or other influences that siblings do not share systematically). There are other relevant traits that affect outcomes like academic achievement and many of them are also heritable.
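For readers unfamiliar with how these three components are separated, here is a minimal sketch using Falconer's classical twin formulas. The twin correlations are hypothetical numbers picked for illustration only, and modern studies typically fit structural models rather than applying these formulas directly.

```python
def falconer_ace(r_mz, r_dz):
    """Classical Falconer decomposition from twin correlations.

    r_mz: trait correlation between identical (MZ) twins
    r_dz: trait correlation between fraternal (DZ) twins
    Returns (heritability, shared environment, unshared environment).
    """
    h2 = 2 * (r_mz - r_dz)  # additive genetic share
    c2 = 2 * r_dz - r_mz    # shared (family) environment share
    e2 = 1 - r_mz           # unshared environment + measurement error
    return h2, c2, e2

# Hypothetical twin correlations, for illustration only.
h2, c2, e2 = falconer_ace(r_mz=0.80, r_dz=0.42)
print(f"heritability: {h2:.2f}, shared env: {c2:.2f}, unshared: {e2:.2f}")
```

The three shares always sum to one by construction, so a high identical-twin correlation combined with a fraternal-twin correlation near half of it leaves little room for the shared environment.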

See this study on GCSE heritability in the UK (one of the main measures they use for admissions). They estimate that the GCSE score itself, which is almost certainly more “study-able” than our less curriculum-laden tests like the SAT and ACT, is more than 60% heritable (although the shared environment effect appears to be non-trivial).

Meanwhile, we know that intelligence is a decent predictor of adult SES:

In fact, intelligence is a better predictor of (future) SES than parent SES. Intelligence also predicts education and occupational status better than it does income.

I do not want to get into the weeds too much with this particular topic, but I do want to try (briefly) to open your mind to the possibility that the association of parent SES with academic, occupational, and other forms of “success” is mostly explained through the heritability of intelligence and other phenotypes of interest (e.g., conscientiousness, motivation, personality/extraversion, etc). Most people who attain high SES are significantly more intelligent than average and we know that intelligence is heritable. Even allowing for some regression towards the mean (which is *not* 100%), we should well expect that child SES would be quite well associated with the intelligence and other phenotypes that helped shape their parents’ success, especially over the course of several generations. We do not need to be a perfect “meritocracy” to find that heritability explains a great deal of these observed relationships.

If you look closely at SES or income mobility studies you will find that they resemble more commonly accepted heritable traits like height:

Note: If there were zero SES mobility, i.e., if children inherited their parents’ SES on average (no systematic bias in either direction), this slope should be approximately 1 (much steeper than the regression line in this plot). To the contrary, we find that the highest SES are significantly more downwardly mobile (relative to their parents) than those at the 50th percentile (~0 change) and those at the bottom are significantly more upwardly mobile (an average increase of ~30 percentile points from the bottom). Put differently, most of the relative “immobility” is happening in the middle, not the top or the bottom.
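This pattern is what plain regression towards the mean produces whenever the parent-child correlation is below one. The simulation below is only a sketch: the correlation of 0.4 is a hypothetical value, not an estimate from the mobility data, and both generations are assumed to be normally distributed on the underlying scale.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible
rho = 0.4       # hypothetical parent-child correlation, illustration only
n = 50000

# Parent and child scores with correlation rho, both with unit variance.
parent = [random.gauss(0, 1) for _ in range(n)]
child = [rho * p + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1) for p in parent]

def percentile_ranks(values):
    """Percentile rank (0 to 100) of each value within its own generation."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = 100.0 * (rank + 0.5) / len(values)
    return ranks

p_rank = percentile_ranks(parent)
c_rank = percentile_ranks(child)

def mean_change(lo, hi):
    """Average child-minus-parent percentile for parents ranked in [lo, hi)."""
    return mean(c - p for p, c in zip(p_rank, c_rank) if lo <= p < hi)

print(f"bottom decile:  {mean_change(0, 10):+.1f}")
print(f"middle (45-55): {mean_change(45, 55):+.1f}")
print(f"top decile:     {mean_change(90, 100):+.1f}")
```

Children of bottom-decile parents move up on average, children of top-decile parents move down, and the middle barely changes, all without any mechanism beyond an imperfect correlation between generations.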

Although there is clearly some regression towards the mean, (white) males with tall fathers tend to be quite a bit taller than average and males with short fathers tend to be shorter than average (even if taller than their fathers).