In recent years, school districts from New York City to Los Angeles have revamped the way teachers are evaluated and rewarded. In the District of Columbia, teachers voted last year for the option to trade job security for merit pay, meaning that high-performing instructors could expect five-figure bonuses (and the laggards could fear pink slips). One particularly controversial component of D.C.’s move toward greater teacher accountability is the use of “value-added” evaluations based on the test scores of students in a teacher’s classroom. New York Governor Andrew Cuomo also took a tentative step into this teacher-evaluation minefield when he suggested in his State of the State address on Wednesday that New York teachers statewide should be evaluated using standardized test scores. Critics of value-added measures worry that putting so much emphasis on standardized tests will create a culture of “teaching to the test,” and they question whether standardized tests really tell us anything useful about whether students are acquiring the skills they’ll need to lead successful lives.

The findings of a new National Bureau of Economic Research study by economists at Harvard and Columbia suggest that at least some of these critics’ concerns may be misplaced. By following students in grades three through eight into adulthood, the research team was able to link, for the first time, value-added performance evaluations to life outcomes we actually care about. The economists found that teachers who boosted standardized test scores also better prepared their students for later life: Students who had high value-added teachers in grade school attended college at higher rates (and attended better colleges), were less likely to become teenage mothers, and earned more in early adulthood. High-performing teachers may more than justify much higher pay.

Why has the so-called accountability movement been so late in coming to education when we’ve been hearing for decades that our schools are in crisis? Because evaluating teacher performance is difficult. Generating comparable performance metrics for teachers across entire school districts requires going beyond the subjective assessments of principals, administrators, and others. Most schools have come to rely on standardized tests to compare student performance, and these same tests are now employed to evaluate the instructors who teach those students.

Of course, evaluating teachers based on the raw performance of their students would reward teachers for securing jobs in places with smart and diligent students, not for effective teaching. For example, Scarsdale students do far better on standardized tests than those in Harlem. This isn’t necessarily the result of higher-quality instruction in Scarsdale schools, but rather because Scarsdale—a high-income zip code known for its schools—attracts families that value education. Even within a single district, some schools naturally attract stronger students than others. If teachers and principals were evaluated on the level of student test scores compared to the performance of students in other schools, be they across town or across the country, they’d be tempted to put a lot of energy into luring high-performing students from other schools rather than improving the education of the students they already have.

To get around this problem, teacher performance is gauged instead by measuring how students perform compared to how they performed at the end of the previous school year—this is the so-called value-added approach. If test scores rise consistently for most students in a teacher’s classroom year after year, we can be pretty sure that it’s mostly because of the teacher. Of course, test score improvements can still be affected by all sorts of considerations beyond the teacher’s control, like class size, a community’s economic ups and downs, and random chance. The value-added technique does its best to control for these myriad factors, and to provide some indication of how certain researchers are about whether student improvements resulted from great teaching or just dumb luck.
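The core idea can be sketched in a few lines of code. This is a deliberately simplified toy, not the researchers’ actual method (which also adjusts for class size, demographics, and other factors, and applies statistical shrinkage to noisy estimates); all classrooms and scores below are hypothetical.

```python
def value_added(prior_scores, end_scores):
    """Average test-score gain of a teacher's class over one year:
    each student's end-of-year score minus their score a year earlier."""
    gains = [end - prior for prior, end in zip(prior_scores, end_scores)]
    return sum(gains) / len(gains)

# Two hypothetical classrooms with the SAME end-of-year average (75).
# Teacher A inherited weaker students and raised them substantially;
# teacher B inherited strong students and left them roughly where they were.
teacher_a = value_added(prior_scores=[60, 65, 70], end_scores=[72, 75, 78])  # 10.0
teacher_b = value_added(prior_scores=[72, 75, 78], end_scores=[73, 75, 77])  # 0.0
```

Comparing score *levels* would rank these two teachers identically; comparing *gains* credits teacher A for the improvement, which is the point of the value-added approach.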

Value added as a measurement tool has been mired in unresolvable controversy, though, because there has been strong disagreement over whether researchers have succeeded in isolating the teacher-specific component of student performance gains, and whether high value-added teachers do anything beyond giving a short-term boost to test scores. Do the gains persist with time? And do teachers who boost test scores actually improve the longer-term life prospects of their students?

The authors of the new study—Raj Chetty and John Friedman of Harvard along with Columbia professor Jonah Rockoff—are the first to link teacher value-added measures to outcomes beyond the classroom, using data on 2.5 million students from a large urban school district who attended grades three through eight between 1989 and 2009. The two decades of test score history allowed the researchers to generate value-added estimates for district teachers. And since they were able to follow both teachers and students over so many years, the economists were also able to see what happened to test scores when a teacher got transferred to a new school. They found that the arrival of a high value-added teacher produced a noticeable bump in student performance at the new school, and a significant drop in student test scores at the school she had just left. The pattern was reversed for low value-added teachers. These striking patterns (look at Figure 3 in the paper—no advanced math required) put to rest at least some of the concerns about whether value-added scores measure teacher quality, or something else.

Even more critically, the researchers were able to link a teacher’s value-added rating to their students’ outcomes later in life. It turns out that the effects of high value-added teachers in grade school continue to reverberate into adulthood. A student who lucks into the classroom of a teacher in the top quarter of the district’s instructors is about 0.5 percentage points more likely to attend college than a student of a middle-of-the-road teacher—all the result of a single year of grade school education. If this doesn’t sound like a big number, multiply by 30, the average number of students in each classroom, and note that in this school district, only about a third of students attend college in the first place. There are similarly sizeable effects on teenage motherhood, college quality, and retirement savings.
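The “multiply by 30” arithmetic above works out as follows. The 0.5-percentage-point effect and the classroom size of 30 are the figures cited in the article; everything else is straightforward back-of-envelope calculation.

```python
# Back-of-envelope version of the college-attendance effect described above.
per_student_effect = 0.005   # +0.5 percentage points per student, per year
class_size = 30              # average classroom size in the district

# Expected number of additional college-goers per classroom, per year,
# from one year with a top-quarter teacher instead of a median one.
extra_college_attendees = per_student_effect * class_size  # ~0.15
```

Roughly one extra college attendee for every seven classrooms a high value-added teacher instructs—small per student, but substantial aggregated over a teaching career.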

Most strikingly, the study connects teacher value added to earnings in young adulthood. Having a high value-added teacher boosts incomes by about 1 percent relative to a mediocre instructor. Multiplying over a lifetime of earnings and a classroom full of students benefiting from good instruction, the authors calculate that great teachers create more than a quarter of a million dollars in extra income for students in each of their classrooms.
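To see how a 1 percent boost compounds into a quarter of a million dollars, consider a rough sketch of the calculation. The 1 percent earnings effect and the 30-student classroom come from the article; the lifetime-earnings figure below is an assumed round number chosen purely for illustration, not a value from the study.

```python
# Hypothetical back-of-envelope version of the authors' calculation.
earnings_boost = 0.01        # +1% earnings from one year with a high value-added teacher
lifetime_earnings = 850_000  # ASSUMED lifetime earnings per student, in dollars (illustrative)
class_size = 30              # average classroom size cited in the article

# Total extra income generated across one classroom of students.
gain_per_classroom = earnings_boost * lifetime_earnings * class_size  # ~$255,000
```

Under these assumed numbers, one classroom-year with a great teacher generates roughly $255,000 in extra lifetime income, consistent with the article’s “more than a quarter of a million dollars.”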

Before we conclude that great teachers should get six-figure bonuses, rather than the relatively measly windfalls handed out to a select few D.C. teachers, a number of caveats are in order. First, one of the main critiques of evaluation based on student test scores is that it motivates instructors to “teach to the test.” Student test scores may be a useful measure of teacher quality, but when they’re actually used to evaluate and reward teachers, tests might start having a negative effect: Teachers who are too focused on maximizing end-of-year test outcomes may erase any long-term benefits. If the incentives were strong enough, teachers might even doctor their students’ scores for a bigger bonus or to save their own jobs. And it’s an open question whether such bonuses would keep the very best teachers in the classroom—the burnout rate of inner-city teachers is notoriously high, and those who do stick it out are motivated at least in part by a higher calling than a $10,000 bonus check. On the other hand, the extra cash and recognition surely help.

The new study clearly isn’t going to end the debate on the use of value added measures in evaluating teachers, nor should it. The authors are careful to caution their audience about how the link between teacher value added and students’ life outcomes might change once testing becomes high stakes for teachers themselves. Yet keeping such caveats in mind, this study will hopefully provide some clear-eyed analysis that can move forward the often contentious discussion on how to improve American education.