Imagine that you’re up for a promotion at your job, but before your superior decides whether you deserve it, you have to submit the comments section of an internet article that was written about you for assessment.

Sound a little absurd?

That’s in essence what we ask professors in higher education to do when they submit their teaching evaluations in their tenure and promotion portfolios. At the end of each semester, students are asked to fill out an evaluation of their professor. Typically, they are asked both to rate their professors on an ordinal scale (think 1–5, 5 being highest) and to provide written comments about their experience in the course.

In many cases, these written evaluations end up sounding more like something out of an internet comments section than a formal assessment of a professor’s teaching. Everything from personal attacks to text-speak (“GR8T CLASS!”) to sexual objectification has been observed by faculty members who dare to read their evaluation comments at the end of the semester.

But the fact that the evaluations can be cruel and informal to the point of uselessness isn’t even the problem. The problem is that there’s a significant and observable difference in the way teaching evaluations treat men versus women.

A new study I published with my co-author examines gender bias in student evaluations. We compared the content of the comments he and I received, both in the formal in-class student evaluations for our courses and in the informal comments on the popular website Rate My Professors. We found that a male professor was more likely to receive comments that address his qualifications and competence, and that refer to him as “professor.” We also found that a female professor was more likely to receive comments that mention her personality and her appearance, and that refer to her as a “teacher.”

The comments weren’t the only part of the evaluation process we examined. We also looked at the ordinal scale ratings of a man and a woman teaching identical online courses. Even though the man’s course had a lower average final grade than the woman’s, the man received higher evaluation scores on almost every question and in almost every category.

Think back to that promotion that you only get if you turn in the comments section on that article someone wrote about you. If you’re a woman, your comments are going to talk about whether you’re nice or rude and whether you’re hot or ugly; if you’re a man, they will talk about how qualified you are. And on a scale of 1–5, a man is going to receive ratings that are, on average, 0.4 points higher than a woman’s.

This is frustrating, perhaps more so given that ours is certainly not the first study to look at the ways that student evaluations are biased against female professors. But we might be among the first to make the case explicitly that the use of student evaluations in hiring, promotion, and tenure decisions represents a discrimination issue. The Equal Employment Opportunity Commission exists to enforce the laws that make it illegal to discriminate against a job applicant or employee based on sex. If the criteria for hiring and promoting faculty members are based on a metric that is inherently biased against women, is it not a form of discrimination?

It’s not just women who are suffering, either. My newest work looks at the relationship between race, gender, and evaluation scores (initial findings show that the only predictor of evaluations is whether a faculty member is a minority and/or a woman), and other work has looked at the relationship between speaking accented English and interpersonal evaluation scores. Repeated studies are demonstrating that evaluation scores are biased in favor of white, cisgender, American-born men.

This is not to say we should never evaluate teachers. Certainly, we can explore alternative methods of evaluating teaching effectiveness. We could use peer evaluations (though they might be subject to the same bias against women), self-evaluation, portfolios, or even simply adjust the evaluation scores given to women upward by 0.4 points, if that is found to be the average difference between men and women across disciplines and institutions. But until we’ve found a way to measure teaching effectiveness that isn’t biased against women, we simply cannot use teaching evaluations in any employment decisions in higher education.