We know that men and women are often described differently in performance evaluations, and now we have more information on exactly what some of those differences are. Researchers analyzed a large-scale military dataset (over 4,000 participants and 81,000 evaluations) to examine objective and subjective performance measures. They found no gender differences in objective measures (e.g., grades, fitness scores, class standing), but the subjective evaluations were very different. Negative words (like selfish, passive, and scattered) were much more frequently applied to women. The specific words used to describe men and women also differed. The most commonly used positive term to describe men was analytical, while for women it was compassionate. The most commonly used negative term to describe men was arrogant, while for women, it was inept — even though men’s and women’s performances were objectively the same.

We like to think of ourselves as unbiased and objective in our employment decisions, but with two equal candidates, who are you going to promote? Someone who is described in their performance evaluations as analytical or someone who is described as compassionate? On the other end of the employment spectrum, if you’re downsizing and have to fire someone and the two people in jeopardy are very similar, who are you going to fire? Someone perceived as arrogant or someone perceived as inept? Leadership attributions in performance evaluations are powerful.

A unique and fascinating dataset allowed us to explore the language used to describe individuals in subjective performance evaluations, and it provides evidence that, as we suspected, language in performance evaluations is applied differently to describe men and women. We analyzed a large-scale military dataset (over 4,000 participants and 81,000 evaluations) to examine objective and subjective performance measures. These measures included a list of 89 positive and negative leadership attributes used to assess leader performance in a military leadership setting.

The military provides an interesting and significant setting to evaluate gender bias as it is a long-standing and traditionally male profession that has, over several decades, worked to eliminate formal gender segregation and discrimination. For performance evaluations specifically, the military has long been predicated on meritocratic ideals of fairness and justice providing equal opportunity regardless of demographics. The top-down enforcement of equal employment opportunity policies, hierarchical organization by military rank and not social status characteristics, and recent total gender integration in all occupations are hallmarks of meritocratic organizations where we might expect less gender bias in performance evaluations.

In our analysis we found no gender differences in objective measures (e.g., grades, fitness scores, class standing), which is consistent with prior research. However, the subjective evaluations provided a wealth of interesting findings.

For starters, in terms of sheer numbers of attributes, we found no gender difference in the number of positive attributes assigned, but women were assigned significantly more negative attributes.

We also looked at which specific attributes were more often assigned to men and to women. This gives us a better idea of how gendered language is employed in leader evaluations. The most commonly used positive term to describe men was analytical, while for women it was compassionate. At the other extreme, the most commonly used negative term to describe men was arrogant. For women, it was inept. We found statistically significant gender differences in how often these terms (and others) were used (relative to the other positive or negative terms available for selection) when describing men and women — even though men’s and women’s performances were the same by more objective measures.

So what? Both “analytical” and “compassionate” reflect positively on the individual being evaluated. However, could one characterization be more valuable from an organizational standpoint? The term analytical is task-oriented, speaking to an individual’s ability to reason, to interpret, and to strategize, and lending support to the objectives or mission of the business. Compassion is relationship-oriented, contributing to a positive work environment and culture, but perhaps of less value to accomplishing the work at hand. When considering who to hire, who to promote, or who to compensate, which person, with which attribute, takes the prize?

Likewise, who is retained and who is fired? An arrogant employee may have a character flaw (and a negative impact on his work environment) but may still be able to accomplish the task or job. An inept person, in contrast, is clearly not qualified and presumably on her way out.

Our research on leadership attributes found significant differences in the assignment of 28 leadership attributes when applied to men and women. While men were more often assigned attributes such as analytical, competent, athletic, and dependable, women were more often assigned compassionate, enthusiastic, energetic, and organized. These results are consistent with societal attitudes that describe women leaders as more compassionate (the most assigned attribute overall) and organized than men leaders. Among the negative attributes, women were more often evaluated as inept, frivolous, gossip, excitable, scattered, temperamental, panicky, and indecisive, while men were more often evaluated as arrogant and irresponsible.

These are not just words — they can have real-life implications for employees and organizations. Language in performance evaluations can tell us what is valued and what is not in an organization. Employees also know what is valued and make choices and decisions about how well they fit in an organization and their potential to advance.

Our research is in line with other studies that have found differences in formal feedback for men and women. Some studies have shown that women are more likely to receive vague feedback that is not connected to objectives or business outcomes, which is a disadvantage when women are competing for job opportunities, promotions, and rewards, and in terms of women’s professional growth and identity. And women leaders often get conflicting feedback — told on the one hand that they’re too bossy or aggressive, but on the other that they should be more confident and assertive. A huge body of work has found that when women are collaborative and communal, they are not perceived as competent—but when they emphasize their competence, they’re seen as cold and unlikable, in a classic “double bind.”

One of the ironies of our findings is that many of the leadership traits that people say they most appreciate, want in a leader, or believe make a leader successful are the positive traits, such as compassion, that women leaders receive in their performance evaluations. So why isn’t this translating into more women in these roles? It’s one thing to describe an ideal leader; it’s another to describe a real person’s performance without being influenced by stereotypes about their gender, or stereotypes about what a leader should be.

Because of widely held societal beliefs about gender roles and leadership, when most people are asked to picture a leader, what they picture is a male leader. Even when women and men behave in leaderly ways among peers (speaking up with new ideas, for example), it’s men who are seen as leaders by the group, not women. And as our study shows, even in this era of talent management and diversity and inclusion initiatives, our formal feedback mechanisms are still suffering from the same biases, sending subtle messages to women that they aren’t “real leaders” and that men are.