'Small typo' mars teacher evaluations

A single missing suffix among thousands of lines of programming code led a public school teacher in Washington, D.C., to be erroneously fired for incompetence, three teachers to miss out on $15,000 bonuses and 40 others to receive inaccurate job evaluations.

The miscalculation has raised alarms about the increasing reliance nationwide on complex “value-added” formulas, which use student test scores to estimate how much value teachers have added to their students’ academic performance. Those value-added metrics often carry high stakes: Teachers’ employment, pay and even their professional licenses can depend on them.


The Obama administration has used financial and policy levers, including Race to the Top grants and No Child Left Behind waivers, to nudge more states to rate teachers in part based on value-added formulas or other measures of student achievement. Education Secretary Arne Duncan has credited D.C.’s strong recent gains on national standardized tests in part to the district’s tough teacher evaluation policy, which was launched by former Chancellor Michelle Rhee.

But teachers have complained that the results fluctuate wildly from year to year — and can be affected by human error, like the missing suffix in the programming code for D.C. schools.

“You can’t simply take a bunch of data, apply an algorithm and use whatever pops out of a black box to judge teachers, students and our schools,” Randi Weingarten, president of the American Federation of Teachers, said this week. The AFT and its affiliates have signed off on contracts that use value-added measures as a significant portion of teacher evaluations — including in D.C. — but Weingarten called the trend “very troubling” nonetheless.


The problem in D.C. stemmed from “a very small typo” inserted into complex programming code during an upgrade earlier this year, said Barbara Devaney, chief operating officer of Mathematica Policy Research, the private firm that holds the contract to calculate value-added scores for the district.
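The article doesn’t reproduce Mathematica’s actual code, but a hypothetical sketch shows how a missing suffix can slip past review: the program still runs cleanly and simply reports the wrong number. Every name below is invented for illustration.

```python
# Hypothetical illustration only: these names are invented, not taken
# from Mathematica's actual value-added model.

def value_added(raw_score, adjusted_score):
    """Return a teacher's value-added estimate (toy version)."""
    score = raw_score            # unadjusted classroom average
    score_adj = adjusted_score   # average adjusted for student background
    # Intended: return score_adj. Dropping the "_adj" suffix still runs
    # without any error -- the code just quietly reports the wrong figure.
    return score                 # should have been: return score_adj

print(value_added(70.0, 72.5))   # prints 70.0 instead of the intended 72.5
```

Because nothing crashes, a bug like this can only be caught by comparing output against an independent calculation, which is why it survived a line-by-line code review.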

Devaney said the firm employs stringent quality control, which in this case included 40 hours of meetings to review the updated model and an analysis by independent programmers paid to comb through the code line by line. Yet no one noticed the missing suffix until yet another routine quality review took place this November — after the district had already distributed bonuses, layoff notices and evaluation scores based on the value-added data for the 2012-13 school year, Devaney said.

Jason Kamras, chief of human capital for the district, said Mathematica had certified that its results were accurate and had passed its quality control inspection before the district acted on the scores. When the error was belatedly discovered, the firm immediately recalculated those scores.

The recalculations produced “very small differences” in individual teachers’ scores, Devaney said. “But small differences can sometimes have big implications,” she added.

Mathematica’s other clients use different programming code, Devaney said, so the error should not affect other districts’ teacher ratings.

In all, the error affected 44 teachers in D.C. — about 10 percent of those who receive value-added scores based on their students’ standardized tests. Half were rated higher than they should have been and half were rated lower, Kamras said.


The teacher who was fired for an “ineffective” rating in fact should have been rated “minimally effective,” Kamras said. Three other teachers who scored “effective” were, in fact, “highly effective” on the district’s scale and deserved bonuses of $15,000 apiece.

Kamras said the district has already reached out to the teacher who was mistakenly fired with a job offer and the promise of salary payments retroactive to the start of the school year. He said the bonuses for the three highly effective teachers will be distributed immediately.

Kamras said he didn’t know if any of the teachers whose ratings were inflated by the Mathematica error received bonuses they didn’t deserve. Even if they did, he said the district will not ask them to repay the money. No one’s evaluation will be lowered as a result of the new calculations, he said.

But some critics noted that it may be impossible for the district to “hold harmless” all teachers affected by the error, as Kamras intends. A study released earlier this year found that getting a poor rating prompted many teachers to leave the district or quit the profession, even though they were not fired. It’s unclear whether any of the affected teachers may have altered their career plans after receiving scores that were lower than they actually deserved.

Both Kamras and Devaney said they stood by the concept of value-added despite the glitch. “This does not diminish my faith in value-added [measures] whatsoever,” Kamras said.

But Devaney acknowledged that typos aren’t the only source of potential errors in value-added measures.

When New York City calculated value-added ratings last year, city officials acknowledged that a teacher rated at the 50th percentile of her peers might actually have been as low as the 23rd — or as high as the 77th, a huge margin of error that persisted even when the city used three years of student test data to smooth out bumps.

A study that Mathematica conducted for the Department of Education in 2010 found that value-added estimates “are likely to be quite noisy.” Indeed, the study concluded that even when three years of student test data are used, as many as 50 percent of teachers will be misidentified — deemed average when they’re actually better or worse than their peers, or singled out for praise or condemnation when they’re actually average.
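A toy simulation, under assumptions that are mine rather than the study’s, shows how this kind of misidentification arises: when the noise in a teacher’s estimated score is as large as the true differences between teachers, sorting teachers into below-average, average and above-average bands misfiles a large share of them.

```python
# Toy simulation of measurement noise in value-added estimates.
# Assumptions (illustrative, not from the Mathematica study): true teacher
# effects and measurement noise are both standard-normal, and teachers are
# sorted into three roughly equal bands.
import random

random.seed(1)
THIRD = 0.43  # z-cutoff that splits a normal distribution into rough thirds

def band(x):
    """Classify a score as below-average (-1), average (0), or above (1)."""
    return -1 if x < -THIRD else (1 if x > THIRD else 0)

trials = 10_000
misidentified = sum(
    band(t) != band(t + random.gauss(0, 1))   # noisy estimate of effect t
    for t in (random.gauss(0, 1) for _ in range(trials))
)
print(misidentified / trials)  # a substantial fraction end up in the wrong band
```

The exact fraction depends on the assumed noise level, but the qualitative point matches the study: with realistic noise, even multi-year averages put many teachers in the wrong category.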

“I do think that value-added is a legitimate model to assess students and teachers,” Devaney said, “taking into account that there is measurement error. Any quality assessment will have measurement error.”

Another potential problem: About 70 percent of educators don’t teach a subject or a grade that’s covered by a state’s standardized tests, so there’s no data on them to feed into the value-added formula. Some districts have addressed this issue by giving those teachers scores based on the test performance of the school as a whole — including students they’ve never taught. Others are developing standardized tests to judge the performance of students — and, by extension, their teachers — in subjects such as physical education, music, art and kindergarten.

Most districts that use value-added measures weight the score at no more than 50 percent of a teacher’s overall performance rating. (In D.C., it’s 35 percent.) Other measures also carry weight, including the observations of principals who watch the teacher at work.
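The weighting above amounts to simple arithmetic. In this sketch, D.C.’s 35 percent value-added weight comes from the article; how the remaining 65 percent is split between principal observations and other measures is an assumption made for illustration.

```python
# Illustrative arithmetic only: the 35 percent value-added weight is from
# the article; the 50/15 split of the remaining weight across observations
# and other measures is an assumption for this sketch.

def overall_rating(value_added, observations, other):
    """Combine 0-100 component scores into one weighted overall rating."""
    return 0.35 * value_added + 0.50 * observations + 0.15 * other

print(overall_rating(80, 90, 85))  # 0.35*80 + 0.50*90 + 0.15*85 = 85.75
```

The point of capping the value-added weight is visible here: an error in the value-added component moves the overall rating by only a fraction of its size, though, as the D.C. case shows, even small shifts can cross a cutoff that triggers a firing or a bonus.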

In a concession to some states’ concerns about the complexity of value-added teacher evaluations, the Education Department has offered to give them extra time before basing personnel decisions on those evaluations.