PROVIDENCE, R.I. [Brown University] – Despite major reforms to evaluation systems designed to help school administrators distinguish between effective and ineffective teachers, a new study finds that fewer than one third of the teachers perceived as ineffective by principals are formally rated that way.

“Districts and states have invested considerable time and resources to reform teacher evaluation systems,” said co-author Matthew Kraft, an assistant professor of education at Brown University. “These reforms have placed conversations about core instructional practice at the center of the evaluation process — a welcome change. However, top-down system designs combined with rushed and under-resourced implementation have undercut their potential to differentiate and support teacher effectiveness.”

The study examined the impact of reforms made to teacher evaluation systems in the wake of The Widget Effect, a seminal 2009 report by The New Teacher Project (TNTP), a nonprofit founded by teachers that aims to address educational inequality through research, training and policy advancement. The report highlighted the failure of U.S. public schools to recognize and act on differences in teacher effectiveness. Because individual instructional performance was not being measured accurately or credibly, it found, teachers were viewed not as “individual professionals, but rather as interchangeable parts.”

“Today, almost every state has designed and adopted new teacher evaluation systems,” Kraft and co-author Emily Gilmour of Temple University wrote.

To assess the impact of those new systems, the authors compiled teacher performance ratings from 24 states that had adopted major reforms to their evaluation systems since 2009. They found that in most of those states, the percentage of teachers rated unsatisfactory had not changed. As before the reforms, fewer than 1 percent of teachers are rated unsatisfactory.

The new ratings did create differentiation in one area — at the top. Evaluation systems with multiple rating categories now differentiate between good and great teachers, Kraft said, but the reforms were less successful at differentiating performance at the bottom of the scale.

A district case study

To dig deeper, Kraft and Gilmour studied teacher evaluation ratings in a large urban district in the Northeast, pairing those ratings with qualitative surveys of evaluators and interviews with principals. The surveys and interviews revealed a large gap between the share of teachers whom evaluators considered ineffective and the share who were formally rated ineffective.

“On average, the evaluators who participated in our survey in 2012-13 estimated that 27.1 percent of all teachers in their schools were performing at a level below proficient,” the authors wrote. Those same evaluators predicted that only 23.6 percent of those low-performing teachers would receive a correspondingly low rating, but in fact, only 6.6 percent of teachers received such a rating.

Kraft and Gilmour repeated the surveys two years later, after evaluators had more experience with the teacher evaluation system, and found a similar pattern. More than 19 percent of teachers were perceived as below proficient, but only 6.3 percent were formally rated below proficient.

Mind the gap

Interviews the authors conducted with principals shed light on why so few teachers receive below proficient ratings.

According to the study, principals said time constraints kept them from giving teachers low ratings. Observing and documenting a teacher’s performance, collecting enough evidence to support a low rating, and creating and carrying out a plan to help the teacher improve all demanded a prohibitive amount of time. Some felt it was unfair to rate teachers below proficient when they lacked the capacity to give those teachers support. And if several teachers were performing poorly, a principal might take a “triage” approach, reserving low ratings for those who needed the most help.

Other principals gave higher ratings to low-performing teachers who were new or motivated and whose instruction could improve over time. Some withheld low ratings for fear that teachers would fixate on the consequences of a negative evaluation and become less receptive to constructive feedback. Still others wanted to shield teachers from potential job loss or to avoid unpleasant confrontations.

And some principals wished to focus on improving a low-performing teacher’s instruction rather than risk dismissing the teacher only to be saddled with a less effective educator from the district’s pool of excess tenured teachers.

Policy implications

“History shows that the success of policy initiatives depends on the will and capacity of local actors to implement reforms,” the authors wrote. Kraft and Gilmour suggested a variety of approaches to help principals align formal ratings with their actual assessments of instruction.

To address time constraints, the authors recommended creating a new evaluator role for expert teachers, such as the Peer Assistance and Review system, in districts where reducing principals’ role in classroom observations is both workable and desirable.

They also noted that mutual-consent hiring, in which both the teacher and the principal must agree to a transfer, could give principals confidence that they could replace an ineffective teacher with an effective one. Mutual consent replaces the practice of requiring principals to fill vacancies through the mandatory placement of tenured teachers from a district’s excess pool.

Centralized professional development programs for teachers would allow principals to rate low-performing teachers honestly without fearing that those teachers would be left without the resources to improve, the authors wrote. They also suggested that principals themselves receive training in navigating difficult conversations so they can confront low-performing teachers effectively.

Kraft and Gilmour also suggested approaching evaluation systems differently: “Systems that ask, ‘How is a teacher effective?’ rather than ‘How effective is a teacher?’ would recognize the full range of teachers’ strengths and weaknesses and, in doing so, provide a more precise picture of teacher effectiveness,” they wrote.