Submitted on March 17, 2011

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

Over the weekend, New York Times columnist Nick Kristof made a persuasive argument that teachers should be paid more. In making his case, he also put forth a point that you’ve probably heard before: “One Los Angeles study found that having a teacher from the 25 percent most effective group of teachers for four years in a row would be enough to eliminate the black-white achievement gap."

This is an instance of what we might call the "X consecutive teachers” argument (sometimes it’s three, sometimes four or five). It is often invoked to support, directly or indirectly, specific policy prescriptions, such as merit pay, ending tenure, or, in this case, higher salaries (see also here and here). To his credit, Kristof’s use of the argument is on the cautious side, but there are plenty of examples in which it is used as evidence supporting particular policies.

Actually, the day after the column ran, in a 60 Minutes segment featuring “The Equity Project," a charter school that pays its teachers $125,000 a year, the school’s principal was asked how he planned to narrow the achievement gap with his school. His reply was: “The difference between a great teacher and a mediocre or poor teacher is several grade levels of achievement in a given year. A school that focuses all of its energy and its resources on fantastic teaching can bridge the achievement gap."

Indeed, it is among the most common arguments in our education policy debate today. In reality, however, it is little more than a stylistic riff on empirical research findings, and a rough one at that. It is not at all useful when it comes to choosing between different policy options.

The "X consecutive teachers” idea has its roots in a manuscript by William Sanders and June Rivers, which was originally released in 1996. Few education papers have been more influential. Sanders/Rivers used a dataset that linked students to teachers over time (rare at that time), and presented a brief set of growth model estimates (though not the kind that eventually made Sanders famous – he and his colleagues now provide the value-added estimates in thousands of districts, including entire states such as Ohio).

They estimated teachers’ value-added scores, and sorted them into quintiles (20 percent intervals). They then calculated that those students who happened to have three top quintile teachers in a row (in third through fifth grade) saw very large cumulative gains relative to students who had three bottom quintile (lowest 20 percent) teachers. Sanders and Rivers made no direct mention of closing the achievement gap in this paper (instead expressing the three-year effect in terms of percentile gains), but the die was cast.

Subsequent papers sought to confirm and elaborate on the Sanders/Rivers analysis: For instance, Bob Mendro and his colleagues found similarly large effects in Dallas, and June Rivers did the same in her doctoral dissertation reanalysis of the Tennessee data, this time employing the famous “Sanders model” (I couldn’t find a link to the dissertation).

A few years later, in a 2004 paper, economists Eric Hanushek and Steven Rivkin presented a review of prior research related to improving teacher quality (see here for Diane Ravitch's account of this paper’s presentation). Citing their previous work, they noted that, in any given year, students of teachers at the 85th “effectiveness” percentile made one-year testing gains that were about one-fifth the average score differences between high- and low-income students. So, they suggested, five years in a row with these great teachers could overcome the achievement gap. (Full disclosure: Ravitch is a Shanker Institute board member.)

The illustration was also, of course, used in the analysis of Los Angeles data cited by Kristof, even though his characterization of the findings is not quite correct (the actual result was that the difference between top and bottom quartile teachers was roughly one-fourth the size of the achievement gap). So, while the definition of “top teachers," the type of gap (race/income), and the number of years needed to “close” it vary by study, the "X consecutive teachers” argument is now a fixture in our education debate. But what does it really mean?

The first thing to keep in mind is that most of these are just extrapolations. The researchers didn’t follow a group of low-performing students over five years, assigning some to five consecutive great teachers and then measuring the outcome. Instead, they took the average one-year gains among students of “top teachers” (however defined), and then determined how many of these one-year gains are equivalent to the average aggregate achievement gap.
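To see how little is going on in the extrapolation, here is a minimal Python sketch of the arithmetic. The function name is my own invention, and the two inputs are simply the shares reported in the studies discussed above (one-fifth of the gap for Hanushek and Rivkin, one-fourth for Los Angeles). It models no actual students; it just divides.

```python
from fractions import Fraction
import math


def years_to_close_gap(gain_as_share_of_gap: Fraction) -> int:
    """Count the one-year gains needed to add up to the full gap.

    Assumes gains are purely additive, with no decay and no variation
    across students (an illustrative simplification, not a finding).
    """
    return math.ceil(1 / gain_as_share_of_gap)


# Shares of the achievement gap reported in the studies discussed above.
print(years_to_close_gap(Fraction(1, 5)))  # Hanushek/Rivkin: 5 years
print(years_to_close_gap(Fraction(1, 4)))  # Los Angeles: 4 years
```

That single division is the whole "X consecutive teachers” calculation, which is why the value of X shifts whenever the definition of a "top" teacher or the type of gap changes.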

In reality, of course, the students who made these big gains came from many different incomes and races. Some started out with higher scores, some low. Some students made slow, steady progress, while the gains by others were erratic. In short, one must be very careful in applying the estimated one-year testing gains among a large, diverse group of students to a hypothetical scenario in which a specific “type” of student (e.g., low-income) moved from one specific score to another (e.g., moving from the average for free lunch-eligible students to that of non-eligible students) over a period of years.

It is, at most, a hypothetical illustration of how big the gains are (at least according to these estimates), one that might be useful to those who are not accustomed to thinking in terms of standard deviations or percentiles. It is not an actual policy outcome.

On a similar note, the "X consecutive teachers” argument depends on the assumption that a teacher’s effect is “persistent” – that it does not diminish over time (see here for a great technical discussion of how this applies to Sanders and Rivers). So, for example, those students who gained 10 percentile points in one year with an “effective” teacher – going from, say, the 30th to the 40th percentile – would, if assigned to another one the next year, get to the 50th percentile, to the 60th the year after that, and so on. The assumption is that, each year, students start where they left off, and the effect of their previous teacher remains intact.

In contrast, there is ample evidence that a teacher’s effect on test scores “decays” rather quickly over time, and there is still little idea of how to best account for this phenomenon (though recent advances are very interesting). Since only part of a teacher’s effect persists – students don’t retain a great deal of what they learn – it’s a bit implausible to take one-year gains and project them out across several years. The "X consecutive teachers” argument kind of treats test score gains like weight gains – you can just add them up – and this belies the complex, transitory nature of teaching and learning.
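A rough sketch of why persistence matters, under loudly stated assumptions: I assume here that a fixed fraction of all previously accumulated gains survives each additional year (simple geometric decay), and the 50 percent rate below is purely hypothetical, chosen only for illustration. Neither the functional form nor the rate comes from the papers discussed above.

```python
def cumulative_gain(one_year_gain: float, years: int, persistence: float) -> float:
    """Total gain after `years` consecutive "effective" teachers.

    `persistence` is the ASSUMED fraction of previously accumulated gains
    that survives into the next year (1.0 means no decay at all).
    """
    total = 0.0
    for _ in range(years):
        # Last year's total shrinks, then this year's gain is added on top.
        total = total * persistence + one_year_gain
    return total


# Full persistence: 10-point gains simply add up, as the argument assumes.
print(cumulative_gain(10, 3, persistence=1.0))  # 30.0
# Hypothetical 50 percent persistence: the same three teachers yield far less.
print(cumulative_gain(10, 3, persistence=0.5))  # 17.5
```

Under full persistence the student in the example above climbs 30 percentile points in three years; with the hypothetical decay rate, the same three teachers produce a bit more than half of that.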

But these somewhat technical issues are less important than what the "X consecutive teachers” argument actually means for policy. Outside of a general suggestion that teacher quality is important (with which nobody disagrees), it doesn’t mean much at all.

Perhaps most notably, because of the imprecision of these growth models, various sources of bias, and year-to-year variation in students and conditions, very few teachers manage to be “top” teachers for three, four or five consecutive years. A huge chunk of the “top” teachers in year one are average – or even below average – in year two. Even more of them fall out of the “top” bracket in the third, fourth, and fifth years.

The papers discussed above do not report how many teachers are actually top-rated for three, four, or five years. However, based on my colleague’s discussions with a couple of the authors, it seems that only about 5-7 percent of teachers are rated in the “top” category for three consecutive years. It would be even fewer over four or five years (to ballpark it, using a 50 percent year-to-year stability rate [as in this paper], only 1-2 percent of teachers will be in the top quartile for five consecutive years). Making things worse, many who are consistently top-ranked will be misclassified as such, due to random error.
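The ballpark figure can be reproduced in a couple of lines. This is only a sketch: the function name is hypothetical, and it assumes the 50 percent stability rate applies in the same way to each year-to-year transition, which is a simplification on my part.

```python
def share_consistently_top(top_share: float, stability: float, years: int) -> float:
    """Share of all teachers rated "top" in every one of `years` years.

    Assumes a teacher who is "top" this year stays "top" next year with
    probability `stability`, applied identically to each transition.
    """
    return top_share * stability ** (years - 1)


# Top quartile, 50 percent year-to-year stability, five consecutive years.
print(share_consistently_top(0.25, 0.5, 5))  # 0.015625, about 1.6 percent
# The same assumptions over three consecutive years.
print(share_consistently_top(0.25, 0.5, 3))  # 0.0625, about 6 percent
```

The five-year result falls within the 1-2 percent range given above, and the three-year result is in line with the 5-7 percent figure.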

This is important, because the "X consecutive teachers” argument only carries concrete policy implications if we can accurately identify the “top” teachers. In reality, though, the ability to do so is still extremely limited.

So, in the context of policy debates, the argument proves almost nothing. All it really does - in a rather overblown, misleading fashion - is illustrate that teacher quality is important and should be improved, not that policies like merit pay, higher salaries, or charter schools will improve it.

This represents a fundamental problem that I have discussed before: The conflation of the important finding that teachers matter – that they vary in their effectiveness – with the assumption that teacher effects can be measured accurately at the level of the individual teacher (see here for a quick analogy explaining this dichotomy).

That said, let’s return briefly to Kristof’s column. Although the evidence on whether more money attracts better teachers is mixed, it is perfectly reasonable to believe that drastically higher starting salaries (over $100,000) could compel many talented folks, young and old, to pursue a teaching career when they otherwise wouldn’t (personally, I find the argument that merit pay would do so to be far less compelling). So, if Mr. Kristof (or the Equity Project principal) is arguing that we should boost teachers’ salaries, he’ll get no objection from me (or, I suspect, most teachers).

But the "X consecutive teachers” argument doesn’t help us evaluate whether this or anything else is a good idea. Using it in this fashion is both misleading and counterproductive. It makes huge promises that cannot be fulfilled, while also serving as justification for policies that it cannot justify. Teacher quality is a target, not an arrow.