Submitted on August 8, 2011

In education policy debates, the phrase “what works” is sometimes used to mean “what increases test scores.” Among those of us who believe that testing data have a productive role to play in education policy (even if we disagree on the details of that role), there is a constant struggle to interpret test-based evidence properly and put it in context. This effort to craft and maintain a framework for using assessment data productively is very important but, despite the careless claims of some public figures, it is also extremely difficult.

Equally important and difficult is the need to apply that framework consistently. For instance, a recent working paper from the National Bureau of Economic Research (NBER) examined whether gifted and talented (GT) programs boost student achievement. The researchers found that GT programs (and magnet schools as well) have little discernible impact on students’ test score gains. Another recent NBER paper reached the same conclusion about the highly selective “exam schools” in New York and Boston. Now, it’s certainly true that high-quality research on the test-based effects of these programs is still somewhat scarce, and these are only two (as yet unpublished) analyses, but their conclusions are worth noting.

Still, let’s speculate for a moment: Let’s say that, over the next few years, several other good studies reached the same conclusion. Would anyone, based on this evidence, be calling for the elimination of GT programs? I doubt it. Yet, if we faithfully applied the standards by which we sometimes judge other policy interventions, we would have to make a case for getting rid of GT.

For example, there is a common argument in education circles that teachers with master’s degrees do not produce larger test score gains than those without them, and so we should stop providing those who have them with a salary “bump” (the same basic argument is often applied to giving teachers raises for additional years of experience). New York City recently shut down its schoolwide bonus program in the wake of a RAND evaluation finding that the program was not associated with higher test scores. In fact, many people argue that we should close entire schools if their students consistently fail to make progress on assessments, and open new schools if their operators demonstrate an ability to make such progress.

Are these programs and policies somehow different from gifted and talented programs? Not really. GT programs (and similar programs such as magnet/exam schools) cost money, which of course also means that they leave less funding for other interventions. If a solid body of research found that they offered no test-based benefits, how could those who argue for eliminating master’s bumps and closing low-performing schools on the basis of the same kind of evidence remain silent when it comes to GT programs?

The answer, it seems to me, is simple: People acknowledge that GT programs, which offer special services such as advanced curricula, “hands-on” experience and specially trained teachers, provide benefits that cannot be measured in terms of test score gains. Supporters think that they’re worth paying for, even if they don’t boost scores, because they believe that these programs improve children’s educational experience in ways that are less “tangible” but perhaps just as important.

This is an enlightened, nuanced view of “what works,” as distinct from the clumsy and simplistic standard by which we sometimes (but not nearly always) judge other policies. Now, it’s certainly true that plenty of people do oppose GT and similar programs, but their arguments are typically based on other concerns, such as equity and fairness, rather than test-based results.

To be clear, I’m not trying to accuse anyone of hypocrisy. I understand that policy judgments often must be made based on imperfect evidence, and it’s difficult to find the sweet spot between this uncertainty and the need to act to improve performance. Moreover, while there are a few people who feel that standardized testing results should play absolutely no role in making decisions about teachers, students and schools, I am not among them (even if I frequently disagree with the manner in which those results are used).

What I am saying is that our education discourse sometimes goes too far in treating test score gains as a sufficient measure of “what works,” and it’s not always clear why the standard applies in some instances and not others. When pressed, nobody actually agrees with the “pure” test-based standard; even staunch advocates for using assessment data in high-stakes decisions usually acknowledge that these data are not, by themselves, sufficient to judge success and failure. Nevertheless, in some circumstances, we continue to behave as though they were sufficient (e.g., opening or closing schools because of operators’ record of test score gains, or lack thereof, or ending the practice of rewarding master’s degrees). Yet, if we were to apply that standard consistently, we would be calling for shutting down programs that the majority of citizens, parents, educators, and policymakers agree have intrinsic value, whether or not they improve the test-based bottom line.

In my opinion, test-based evidence can and should play a role in education policy decisions. But in the struggle to figure out how to use these data, let’s also pay attention to when we use them.

- Matt Di Carlo