Submitted on February 2, 2012

Unlike many of my colleagues and friends, I personally support the use of standardized testing results in education policy, even, with caution and in a limited role, in high-stakes decisions. That said, I also think that the focus on test scores has gone way too far, and that they are being used unwisely – in many cases to a degree at which I believe the policies will not only fail to generate improvement, but may even do harm.

In addition, of course, tests have a very productive low-stakes role to play on the ground – for example, when teachers and administrators use the results for diagnosis and to inform instruction.

Frankly, I would be a lot more comfortable with the role of testing data – whether in policy, on the ground, or in our public discourse – but for the relentless flow of misinterpretation from both supporters and opponents. In my experience (which I acknowledge may not be representative of reality), by far the most common mistake is the conflation of student and school performance, as measured by testing results.

Consider the following three stylized arguments, which you can hear in some form almost every week:

- “Only one-third of our students are reading at grade level; our schools are failing.”
- “95 percent of the teachers in this district receive satisfactory ratings, but that can’t be accurate, because only half the students are proficient in math and reading.”
- “These reforms are working – state test scores have risen steadily.”

All three of these inferences are inappropriate for one primary reason: they fail to acknowledge that raw, unadjusted testing results – whether actual scores/proficiency rates or changes in those scores/rates – are not, by themselves, credible measures of school performance. They are largely (imperfect) measures of student performance. There is a difference.

Everyone involved in education knows that most of the variation in testing outcomes is “between students” – i.e., has to do with factors, most unmeasured/unobserved, that are attributes of the students themselves and their upbringing and environment (such as English proficiency, oral language development, background knowledge, family situation, etc.).

This well-established finding is sometimes interpreted to mean that schools (or teachers) can only exert minimal influence on student performance. That is false. Not only are schooling factors among the only targets within the purview of education policy, they can also be very influential. Improvements in the quality of schooling/instruction can have substantial effects on student outcomes (though I sometimes think we need to be more realistic about the pace of change).

Nevertheless, learning is complex and much (if not most) of it occurs outside of schools and/or before children reach schooling age. Test scores – and changes in those scores – are subject to these influences. A school with low test scores is not necessarily a “failing school,” just as a school with very high scores is not necessarily successful.

Similarly, one should not assume that a school’s slow score growth is necessarily caused by a problem in that school. The reason why the research on school (and teacher) effects is so complex is that much of it is geared toward controlling for all of the external factors that can be measured and are known to affect outcomes. In other words, the analysis is trying to isolate that portion of student performance that can reasonably be attributed to school performance. A great deal of the raw variation is also simple random error.
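The conflation described above can be made concrete with a toy simulation (the numbers and school names here are entirely hypothetical, chosen only for illustration). Each student’s score is the sum of a background component, a small true school effect, and noise; because the background component varies far more across students than school quality does, the raw school averages mostly rank schools by who enrolls, while a simple background adjustment recovers something close to the true school effects:

```python
import random
import statistics

random.seed(0)

# Hypothetical true school effects (small relative to student variation).
school_effects = {"A": 2.0, "B": 0.0, "C": -2.0}
# Hypothetical enrollment patterns: school A serves the least-advantaged
# students, school C the most-advantaged.
background_means = {"A": -5.0, "B": 0.0, "C": 5.0}

scores, backgrounds = {}, {}
for school, effect in school_effects.items():
    # student score = background + school effect + noise
    bg = [random.gauss(background_means[school], 10) for _ in range(500)]
    backgrounds[school] = bg
    scores[school] = [b + effect + random.gauss(0, 3) for b in bg]

# Raw school means mostly reflect enrollment, not school quality:
raw_means = {s: statistics.mean(v) for s, v in scores.items()}

# Subtracting each student's background (a stand-in for the statistical
# controls used in real school-effects research) recovers the true effects:
adjusted_means = {
    s: statistics.mean(sc - b for sc, b in zip(scores[s], backgrounds[s]))
    for s in scores
}

print(raw_means)       # school C looks "best" purely due to its students
print(adjusted_means)  # school A, the true top performer, now ranks first
```

In this sketch the raw averages rank the schools exactly backwards: the school with the weakest true effect posts the highest scores because it serves the most-advantaged students. Real analyses are far more involved (backgrounds are unobserved and must be proxied), but the logic of the adjustment is the same.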

Yes, when a group of students’ test scores rise over a few years, that’s a pretty good tentative indication that the school is doing something correctly. But it’s all a matter of degree. The gains (assuming they’re even measured with longitudinal data, which they often are not) will also reflect factors (e.g., prior achievement levels) that have nothing to do with the school, to an extent that can vary widely. If you rely solely on unadjusted testing results, you don’t know. And if you don’t know, you risk making decisions based on erroneous assumptions.

The worst part is that this distinction – between tests as measures of student performance versus school performance – is ignored by policymakers just as frequently as it is in our public discourse.

States are closing schools, handing out ratings and awarding grant money based on horribly flawed misinterpretations of raw testing data. It’s one thing for journalists and the public to make this mistake; it’s something else entirely for the people we rely on to decide education policy to make it too.

In short, I would be a lot more optimistic about “data-driven decision making” if so many of the decision makers weren’t such erratic drivers.

- Matt Di Carlo