New York educators are pushing back forcefully against the state’s controversial teacher evaluation system. This spring, the teachers’ associations of Rochester and Syracuse filed a lawsuit against the state, arguing that the rating metrics unfairly penalize teachers of disadvantaged students. Now Sheri G. Lederman, a lifelong teacher from Long Island, is challenging her “ineffective” rating as arbitrary and capricious, arguing that it rests on an ill-conceived and misapplied statistical model of teaching quality.

These suits converge on the issue of whether teachers should be judged on the basis of student test scores, and New York state is poised to set a nationwide precedent on the use of value-added testing data in teacher evaluations. While most parents and administrators would agree that educational accountability is essential, thorny questions persist about how the art of teaching should be appraised in a data-driven culture.

Value-added evaluation systems have been celebrated by U.S. Secretary of Education Arne Duncan and his 2010 Race to the Top initiative for their potential to distinguish highly effective teachers from ineffective ones. Value-added models (VAMs) draw on students’ prior test scores and their backgrounds, such as race and socioeconomic status, to forecast how well they ought to score on the current year’s standardized exam. If math or English students fail to reach these benchmarks, their teachers are deemed ineffective. This expectation game, however, springs from Byzantine formulas that the American Statistical Association has criticized, cautioning that VAMs typically measure “correlation, not causation.”

The underlying data can be shaky as well. Because tests are given only in certain subjects to certain age groups, 70 percent of educators in Florida last year received VAM rankings based on students or subjects they didn’t even teach. New York’s system determines whether a teacher is highly effective, effective, developing or ineffective using a triad of measures: 20 percent based on value-added modeling of students’ state test scores, 20 percent on district-level assessments and 60 percent on an array of other measures, such as classroom observations. Lederman’s value-added classification dropped two rungs in just one year, even though her students’ test scores were consistently more than double the state average for meeting standards.

One explanation for this change is that New York statisticians rejigger the VAM formula each year, effectively moving the goalposts without informing teachers. Such adjustments are far from trivial: researchers at the University of Colorado Boulder found that tweaks to the formula for reading outcomes would alter the effectiveness ratings of more than 50 percent of Los Angeles public school teachers. In New York an ineffective rating cannot be appealed, which is why Lederman turned to the courts. Her suit is driven not by money or politics; rather, she seeks to have her score clarified and recalculated.

Erroneous evaluations can have real-world ramifications. The public release of teachers’ ratings can damage their professional reputations and create future employment obstacles, including denial of tenure or dismissal. Lederman has job security after a 17-year career, but new teachers, who are increasingly the norm in classrooms nationwide, face serious risks. They have no record of prior evaluations to fall back on and, lacking seniority, are often assigned to classrooms of underperforming students with diverse learning needs.

The danger of VAMs is evident in the findings of a May 2014 study published by the American Educational Research Association, which found no consistent correlation between having high-scoring students and excelling on other measures of effective teaching. Drawing on data from six states, the report concluded, “The tests used for calculating VAM are not particularly able to detect differences in the content or quality of classroom instruction.”