Imagine that you are a gubernatorial candidate who is making education and college preparedness a key facet of your campaign. Consider these average SAT scores for two states.

State          Quantitative   Verbal   Total
Connecticut    450            480      930
Mississippi    530            550      1080

Your data analysts assure you that this difference is statistically significant. You know that SAT scores are a strong overall measure of educational aptitude, and in particular that they are highly correlated with freshman-year performance and overall college outcomes. Those who score higher on the test tend to earn higher college grades, are less likely to drop out in their freshman year, are more likely to complete their degrees in four or six years, and are more likely to find full-time employment when they’re done.

You believe that making your state’s high school graduates more competitive in college admissions is key to improving the state’s economy. You also note that Connecticut has powerful teacher unions that represent almost all of the state’s public school teachers, while Mississippi’s public schools are largely free of teacher unions. Convinced by this data that getting rid of the unions will ultimately benefit your students, you resolve to make opposing teacher unions in your state a key part of your educational platform.

Is this a reasonable course of action?

Anyone who follows major educational trends would likely be surprised at these SAT results. After all, Connecticut consistently places among the highest-achieving states in educational outcomes, Mississippi among the worst. In fact, on the National Assessment of Educational Progress (NAEP), widely considered the gold standard of American educational testing, Connecticut recently ranked as the second-best state for 4th graders and the best for 8th graders. Mississippi ranked second-to-worst for both 4th graders and 8th graders. So what’s going on?

The key is participation rate, or the percentage of eligible juniors and seniors taking the SAT, as this scatter plot shows.

As can be seen, there is a strong negative relationship between participation rate and average SAT score: generally, the higher the percentage of students taking the test in a given state, the lower the average score. Why? Think about what it means for students in Mississippi, where the participation rate is 3%, to take the SAT. Those students are the ones who are most motivated to attend college and the ones who are most college-ready. In contrast, in Connecticut 88% of eligible juniors and seniors take the test. (Data.) This means that nearly every student of the appropriate age takes the SAT in Connecticut, including many who are unprepared or only marginally prepared for college. Most Mississippi students self-select out of the sample. Compare similar groups instead and Connecticut comes out ahead: the top-performing quintile (20%) of Connecticut students handily outperforms the top-performing quintile of Mississippi students. Typically, the highest state average in the country is that of North Dakota, where only 2% of those eligible take the SAT at all.

In other words, what we might have perceived as a difference in education quality was really the product of systematic differences in how the two populations were put together. The groups we compared were assembled non-randomly, in a way the raw averages hide. This is selection bias.
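To see how participation alone can produce a gap like the one in the table, here is a small simulation in Python. It is a sketch under loudly hypothetical assumptions, not a model of real SAT data: both "states" share one identical score distribution (mean 500, standard deviation 100), and students take the test in order of a noisy "college readiness" signal that tracks their score. The only thing that differs between the two runs is the participation rate, set to the 88% and 3% figures above. The function name `simulate_state_average` and every parameter are invented for illustration.

```python
import random
import statistics


def simulate_state_average(participation_rate, n_students=100_000, seed=0):
    """Average score among test-takers when only the most college-ready
    students opt in.

    Hypothetical model: every state draws from the SAME score distribution
    (mean 500, SD 100), and students take the test in order of a noisy
    'readiness' signal that tracks their score.
    """
    rng = random.Random(seed)  # same seed -> identical population in every run
    students = []
    for _ in range(n_students):
        score = rng.gauss(500, 100)
        readiness = score + rng.gauss(0, 50)  # imperfect proxy for score
        students.append((readiness, score))

    students.sort(reverse=True)  # most college-ready students first
    n_takers = max(1, round(participation_rate * n_students))
    return statistics.mean(score for _, score in students[:n_takers])


high = simulate_state_average(0.88)  # Connecticut-like participation
low = simulate_state_average(0.03)   # Mississippi-like participation
print(f"88% participation: average {high:.0f}")
print(f" 3% participation: average {low:.0f}")
```

Because both runs draw the same population from the same seed, any gap between the two printed averages comes entirely from who opts in, and in this toy model the low-participation average lands far above the high-participation one.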

*****

My hometown had three high schools: the local coed public high school (where I went) and two private Catholic high schools, one for boys and one for girls. People involved with the private high schools liked to brag about their students’ high scores on standardized tests, without bothering to mention that you had to score well on such a test to get into them in the first place. This is, as I’ve said before, akin to having a height requirement for your school and then bragging about how tall your student body is. And of course, another set of screens here also powerfully shapes outcomes: private schools cost a lot of money, so students who can’t afford them are screened out. Students from lower socioeconomic backgrounds perform consistently worse on a broad variety of metrics, so private schools are again advantaged in comparison to public ones. To draw conclusions about educational quality from student outcomes without randomization, or without rigorous attempts to control for differences in which students are sorted into which schools, programs, or pedagogies, is to ensure that you’ll draw unjustified conclusions.

Here’s an image that I often use to illustrate a far broader set of realities in education. It’s a regression analysis showing institutional averages on the Collegiate Learning Assessment (CLA), a standardized test of college learning and the subject of my dissertation. Each dot is a college’s average score; the blue dots are averages for freshmen, the red dots for seniors. The gap between the red and blue dots shows the degree of learning going on in this data set, which is robust for essentially all institutions. The very strong relationship between SAT scores and CLA scores shows the extent to which different incoming student populations, shaped by the inherent, powerful selection bias of the college admissions process, determine different test outcomes. (Note that very similar relationships are observed on similar tests such as ETS’s Proficiency Profile.) To blame educators at a school on the left-hand side of the regression for failing to match the schools on the right-hand side is to punish them for differences in the prerequisite ability of their students.

Harvard students have remarkable post-collegiate outcomes, academically and professionally. But then, Harvard invests millions of dollars in carefully managing its incoming student body. The truth is that most Harvard students are going to be fine wherever they go, which calls into question our assumptions about the quality of Harvard’s education itself. Or consider exclusive public high schools like New York’s Stuyvesant, a remarkably competitive institution where the city’s best and brightest students compete to enroll, thanks to the great educational benefits of attending. After all, the alumni of high schools such as Stuyvesant are a veritable Who’s Who of high achievers and success stories; those schools must be of unusually high quality. Except that attending those high schools simply doesn’t matter in terms of conventional educational outcomes. When you look at the edge cases, restricting your analysis to the students who barely made the admissions cutoff and those who barely missed it, you find no statistically meaningful differences between them. Of course, when you have a mechanism in place to screen out all of the students with the biggest disadvantages, you end up with an impressive-looking set of alumni. The admissions procedures at these schools don’t determine which students get the benefit of a better education; the perception of a better education is itself an artifact of the admissions procedure. The screening mechanism is the educational mechanism.

Thinking about selection bias compels us to consider our perceptions of educational cause and effect in general. A common complaint of liberal education reformers is that students who face consistent achievement gaps, such as poor minority students, suffer because they are systematically excluded from the best schools, screened out by the high housing prices of affluent, white districts. But what if this confuses cause and effect? Isn’t it more likely that we perceive those districts to be the best precisely because they effectively exclude students who suffer under the burdens of racial discrimination and poverty? Of course schools look good when, through geography and policy, they are responsible for educating only those students who receive the greatest socioeconomic advantages our society provides. But this reversal of perceived cause and effect is almost entirely absent from education talk, in either liberal or conservative media.

Immigrant students in American schools outperform their domestic peers, and the reason is about culture and attitude, the immigrant’s willingness to strive and persevere, right? Nah. Selection bias. So-called alternative charters have helped struggling districts turn things around, right? Not really; they’ve just artificially created selection bias. At Purdue, where there is a large Chinese student population, I always chuckled to hear domestic students say “Chinese people are all so rich!” It didn’t seem to occur to them that attending a school that costs more than $40,000 a year for international students acted as a natural screen to exclude the vast number of Chinese people who live in deep poverty. And I had to remind myself that my 8:00 AM writing classes weren’t going so much better than my 2:00 PM classes because I was somehow a better teacher in the mornings, but because the students who would sign up for an 8:00 AM class were probably the most motivated and prepared. There’s plenty of detailed work by people who know more than I do about the actual statistical impact of these issues and how to correct for them. But we all need to be aware of how deeply unequal populations influence our perceptions of educational quality.

Selection bias hides everywhere in education. Sometimes, in fact, it is deliberately hidden. A few years ago, Reuters undertook an exhaustive investigation of the ways that charter schools deliberately exclude the hardest-to-educate students, even though most are ostensibly required to accept all kinds of students, as traditional public schools must. For all the talk of charters as some sort of revolution in effective public schooling, what we find is that charter administrators work feverishly to tip the scales, finding all kinds of crafty ways to avoid educating the students who are hardest to educate. And even when we look past all of the dirty tricks they use (like, say, requiring parents to attend meetings held at times when most working parents can’t), there are all sorts of ways in which students are assigned to charter schools non-randomly and in ways that advantage those schools. Excluding students with cognitive and developmental disabilities is a notorious example. (Despite what many people presume, in most locales a majority of students with special needs take state-mandated standardized tests and are included in data like graduation rates.) Even the simple fact that parents typically have to opt their children in to charter school lotteries functions as a screening mechanism.
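The opt-in point can be sketched the same way. In the hypothetical district below (`simulate_charter_lottery` and all of its numbers are invented for illustration), families with more advantages are far more likely to enter the lottery, the lottery itself is a fair coin flip among applicants, and the charter has no effect by construction. Comparing lottery winners with everyone else in the district still makes the charter look good; comparing winners with lottery losers, who passed through the same opt-in screen, does not.

```python
import random
import statistics


def simulate_charter_lottery(charter_effect=0.0, n_students=50_000, seed=2):
    """Hypothetical district: families with more advantages (a latent
    'support' score) are far more likely to apply to the charter lottery.
    Half of applicants win a seat at random."""
    rng = random.Random(seed)
    winners, losers, non_applicants = [], [], []
    for _ in range(n_students):
        support = rng.gauss(0, 1)  # stand-in for family resources, motivation, etc.
        applies = rng.random() < (0.6 if support > 0 else 0.1)  # assumed opt-in rates
        baseline = 500 + 60 * support + rng.gauss(0, 50)
        if not applies:
            non_applicants.append(baseline)
        elif rng.random() < 0.5:  # fair lottery among applicants
            winners.append(baseline + charter_effect)
        else:
            losers.append(baseline)

    # Winners vs. the rest of the district: contaminated by who opted in.
    charter_vs_district = (statistics.mean(winners)
                           - statistics.mean(losers + non_applicants))
    # Winners vs. losers: both groups passed the same opt-in screen.
    winners_vs_losers = statistics.mean(winners) - statistics.mean(losers)
    return charter_vs_district, winners_vs_losers


district_gap, lottery_gap = simulate_charter_lottery(charter_effect=0.0)
print(f"charter winners vs. rest of district: {district_gap:+.1f}")
print(f"charter winners vs. lottery losers:   {lottery_gap:+.1f}")
```

Note that the winner-versus-loser comparison only tells you about the families who opted in; it says nothing directly about students who never entered the lottery.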

Large-scale studies of charter efficacy such as Stanford’s CREDO project argue confidently that they have controlled for the enormous number of potential screening mechanisms that hide in large-scale education research. These researchers are among the best in the world, and I don’t mean to disparage their work. But given the enormity of the stakes and the truth of Campbell’s Law, I have to report that I remain skeptical that we have ever truly controlled for all the ways that schools and their leaders cook the books and achieve non-random student populations. Given that random assignment to condition is the single most essential aspect of responsible social scientific study, I think caution is warranted. And as I’ll discuss in a future post, the observed impact of school quality on student outcomes in the cases where we are most confident that assignment to condition was truly random is not encouraging.

I find it’s nearly impossible to get people to think about selection bias when they consider schools and their quality. Parents look at a private school and say, look, all these kids are doing so well, I’ll send my troubled child and he’ll do well, too. They look at the army of strivers marching out of Stanford with their diplomas held high and say, boy, that’s a great school. And they look at the Harlem Children’s Zone schools and celebrate their outcome metrics, without pausing to consider that it’s a lot easier to get those outcomes when you’re constantly expelling the students most predisposed to fail. But we need to look deeper and recognize these dynamics if we want to evaluate the use of scarce educational resources fairly and effectively.

Tell me how your students are getting assigned to your school, and I can predict your outcomes – not perfectly, but well enough that it calls into question many of our core presumptions about how education works.