In the first large-scale analysis of new systems that evaluate teachers based partly on student test scores, two researchers found little or no correlation between quality teaching and the appraisals teachers received.

The study, published Tuesday in Educational Evaluation and Policy Analysis, a peer-reviewed journal of the American Educational Research Association, is the latest in a growing body of research that has cast doubt on whether it is possible for states to use empirical data in identifying good and bad teachers.

“The concern is that these state tests and these measures of evaluating teachers don’t really seem to be associated with the things we think of as defining good teaching,” said Morgan S. Polikoff, an assistant professor of education at the Rossier School of Education at the University of Southern California. He worked on the analysis with Andrew C. Porter, dean and professor of education at the Graduate School of Education at the University of Pennsylvania.

The number of states using teacher-evaluation systems based in part on student test scores has surged during the past five years. Many states and school districts are using the evaluation systems to make personnel decisions about hiring, firing and compensation.

The rapid adoption has been propelled by the Obama administration, which made the teacher-evaluation systems a requirement for any state that wanted to compete for Race to the Top grant money or receive a waiver from the most onerous demands of No Child Left Behind, the 2002 federal education law.

Thirty-five states and the District of Columbia require student achievement to be a “significant” or the “most significant” factor in teacher evaluations. Just 10 states do not require student test scores to be used in teacher evaluations.

Most states are using “value-added models” — or VAMs — which are statistical algorithms designed to figure out how much teachers contribute to their students’ learning, holding constant factors such as demographics.

Polikoff and Porter analyzed a subsample of 327 fourth- and eighth-grade mathematics and English-language-arts teachers across six school districts in New York, Dallas, Denver, Charlotte-Mecklenburg, Memphis and Florida’s Hillsborough County.

The data came from a larger project funded by the Bill and Melinda Gates Foundation known as the Measures of Effective Teaching. Polikoff and Porter’s work also was funded by a $125,000 grant from the Gates Foundation.

The researchers found that some teachers who were well-regarded based on student surveys, classroom observations by principals and other indicators of quality had students who scored poorly on tests. The opposite also was true.

Teacher-evaluation systems have stirred up controversy and some recent legal challenges.

The Houston Federation of Teachers filed a federal lawsuit this month charging that Houston’s “value-added” teacher-evaluation system violates educators’ rights.

Similar legal challenges have emerged in Tennessee and in Florida, where teachers are in an uproar over a state system that assesses some educators using scores of students they never taught.

Last month, the American Statistical Association urged states and school districts not to use VAM systems to make personnel decisions, noting that recent studies have found that teachers account for a maximum of about 14 percent of the variability in students’ test scores, with other factors responsible for the rest.

Polikoff said policymakers should rethink how they use VAMs.

“We need to slow down or ease off completely for the stakes for teachers, at least in the first few years, so we can get a sense of what do these things measure, what does it mean,” Polikoff said. “We’re moving these systems forward way ahead of the science in terms of the quality of the measures.”