(Photo: Thomas Lohnes/Getty Images)

The teaching evaluation is something of a staple of end-of-term college life. In theory, student feedback provides key information on how well a professor is doing in the classroom. In reality, a new analysis posted on the post-publication-review site ScienceOpen shows, students rate their female instructors lower than they rate their male instructors, and there's no simple way to fix or compensate for that bias.

Of course, this is not the first time that researchers have criticized student evaluations of teaching (SETs); it's an open secret that they have more to do with students' grades than actual teaching effectiveness. Nor is it the first time they've been criticized for gender bias. Just last year, economist Anne Boring found strong bias against female instructors in first-year undergraduate courses in France, and, in 2014, researchers at the University of North Carolina showed that simply changing the name of an online course instructor from a man's name to a woman's led to worse SETs.

But those studies may themselves have been biased, statistically speaking: they used off-the-shelf approaches to data analysis that didn't line up with how the two studies had gathered their data, says Philip Stark, a co-author of the latest research and an associate dean for mathematical and physical sciences at the University of California–Berkeley. So Stark got in touch with the original researchers, ultimately teaming up with Boring and with his own student Kellie Ottoboni to conduct the analysis correctly and to see what else the data had to tell them.

The results? "On average, male instructors get higher scores, [but] they get it for different reasons," Stark says. For example, the French data revealed more gender bias in some courses than others, and comparing the two studies suggests that different students are responsible for the bias in different contexts. In the French data, male students rated male instructors higher, and female students rated men and women about the same. In the North Carolina experimental study, female students were the ones who gave male instructors better scores. (By the way, judging by their students' final exam scores, men are no better, and maybe worse, instructors than women, Stark says.)

Complications like those mean "there's no way to fix the bias," Stark says. To do so, SETs—in particular, the degree of gender bias in SETs—would need to be comparable across academic subjects and educational contexts, but the data clearly shows they're not. "You can't just add half a point" to compensate for gender bias, Stark says.

Perhaps none of that would matter, except that academic institutions are increasingly using SETs to decide whom to hire and whether to grant junior faculty tenure—so if students' ratings are biased, so are those decisions. "I really think colleges and universities need to move away from using student evaluations for hiring, firing, and promotion decisions," Stark says.

Quick Studies is an award-winning series that sheds light on new research and discoveries that change the way we look at the world.