NEW YORK — Across the country, education reformers and their allies in both parties have revamped the way teachers are graded, abandoning methods under which nearly everyone was deemed satisfactory, even when students were falling behind.

More than half the states now require new teacher evaluation systems, and New York City will soon have one, too, because of a deal announced last week in Albany.

The changes, underway in some cities and states, are intended to provide meaningful feedback and, critically, to weed out weak performers.

And here are some of the early results: In Florida, 97 percent of teachers were deemed effective or highly effective in the most recent evaluations. In Tennessee, 98 percent of teachers were judged to be “at expectations.” In Michigan, 98 percent of teachers were rated effective or better.


Advocates of education reform concede that such rosy numbers, after many millions of dollars spent developing the new systems and thousands of hours of training, are worrisome.

“It is too soon to say that we’re where we started and it’s all been for nothing,” said Sandi Jacobs, vice president of the National Council on Teacher Quality, a research and policy organization. “But there are some alarm bells going off.”

The new systems, a central achievement of the reform movement, generally rate teachers on a combination of student progress, including their test scores, and observations by principals or others.

The Obama administration has encouraged states to adopt the new methods through grant programs like Race to the Top.

The teachers might all be rated above average, like students in Lake Wobegon, for the same reason that the older evaluation methods were considered lacking.

Principals, who are often responsible for the personal observation part of the grade, generally are not detached managerial types and can be loath to give teachers low marks.


“There’s a real culture shift that has to occur and there’s a lot of evidence that that hasn’t occurred yet,” Jacobs said.

But even the part of the grade that was intended to be objective, how students perform on standardized tests, has proved squishy.

In part, this is because tests have changed so much in recent years — and are changing still, because of the new Common Core curriculum standards that most states have adopted — that administrators have been unwilling to set the test-score bar too high for teachers. In many states, consecutive “ineffective” ratings are grounds for firing.

“We have changed proficiency standards 21 times in the last six years,” said Jackie Pons, the schools superintendent for Leon County, Fla. In the county, 100 percent of the teachers were rated “highly effective” or “effective.”

“How can you evaluate someone in a system when you change your levels all the time?” Pons asked.

Until recently, Florida teachers were typically observed once a year for about 20 minutes and deemed satisfactory or unsatisfactory.

Roughly 100 percent of them were rated satisfactory in 2010-11. Florida districts are spending $43 million in federal Race to the Top grant money on devising and introducing new evaluation methods.

Generally, 50 percent of the evaluation is now based on administrators’ observations of teachers and 50 percent on student growth as measured by test scores (districts can alter that ratio to some extent).

For the observation part, teachers are no longer rated simply on “classroom management” and “planning,” but rather on 60 specific elements, including “engaging students in cognitively complex tasks involving hypothesis generation and testing” and “demonstrating value and respect for low expectancy students.”


One Leon County principal, Melissa Fullmore of Ruediger Elementary School in Tallahassee, said that had it been solely up to her, one or two of her teachers would have been graded “highly effective,” the top category. Three would have been marked “needs improvement,” one rung up from the bottom, and the rest would have fallen under “effective.”

But because Leon County set the test-score bar so low, when her teachers’ marks came out, all but one were rated highly effective, and the other was categorized as effective.

“I wouldn’t put stock in the numbers,” Fullmore said.

Grover J. Whitehurst, director of the Brown Center on Education Policy at the Brookings Institution, said variations in teacher quality had been proven to affect student academic growth.

If an evaluation system is not finding a wider distribution of effectiveness, “it is flawed,” he said.