Competence impressions from faces affect important decisions, such as hiring and voting. Here, using data-driven computational models, we identified the components of the competence stereotype. Faces manipulated by a competence model varied in attractiveness (Experiment 1a). However, faces could be manipulated on perceived competence controlling for attractiveness (Experiment 1b); moreover, faces perceived as more competent but not attractive were also perceived as more confident and masculine, suggesting a bias to perceive male faces as more competent than female faces (Experiment 2). Correspondingly, faces manipulated to appear competent but not attractive were more likely to be classified as male (Experiment 3). When masculinity cues that induced competence impressions were applied to real-life images, these cues were more effective on male faces (Experiment 4). These findings suggest that the main components of competence impressions are attractiveness, confidence, and masculinity, and they reveal gender biases in how we form important impressions of other people.

First impressions from facial appearance are formed effortlessly and shape significant social outcomes (Todorov, 2017; Todorov, Olivola, Dotsch, & Mende-Siedlecki, 2015). Impressions of competence are especially important, because they influence decisions about leadership selection (Antonakis & Eubanks, 2017). Intuitive judgments of competence from faces, for instance, can predict the results of political elections (Antonakis & Dalgas, 2009; Ballew & Todorov, 2007; Lenz & Lawson, 2011; Olivola & Todorov, 2010; Todorov, Mandisodza, Goren, & Hall, 2005) and company executives’ compensation (Graham, Harvey, & Puri, 2017; Stoker, Garretsen, & Spreeuwers, 2016). It is important to understand the perceptual basis of these impressions, because people act on them (e.g., choose their leaders on the basis of competence impressions) despite the dubious relationship between leaders’ actual competence and competence impressions from their faces (Stoker et al., 2016; Wyatt & Silvester, 2018).

Here, we investigated the visual ingredients of the competence stereotype. Facial attractiveness is one of these ingredients. Both empirical studies and computational models of facial impressions support the “halo effect” of attractiveness (Dion, Berscheid, & Walster, 1972; Landy & Sigall, 1974; Thorndike, 1920) on competence impressions. First, a meta-analysis showed a modest to strong association between attractiveness and perceived social and intellectual competence (Eagly, Ashmore, Makhijani, & Longo, 1991). Individuals with attractive faces are perceived as socially and occupationally competent (Dion et al., 1972; Landy & Sigall, 1974) and as having a higher social status (Webster & Driskell, 2015), which is strongly associated with perceived competence (Fiske, Cuddy, Glick, & Xu, 2002). In real-world data, judgments of competence and attractiveness from politicians’ faces (N = 244) are highly correlated (Olivola & Todorov, 2010). Second, data-driven models of facial impressions (Oosterhof & Todorov, 2008; Todorov & Oosterhof, 2011) show a strong similarity between models of competence and attractiveness (Todorov, Dotsch, Porter, Oosterhof, & Falvello, 2013). Because the two models exist in a common space, one can directly assess the similarities between them: The models of competence and attractiveness are indeed highly similar (ρ = .71), suggesting that people rely on attractiveness when forming impressions of competence.

We tested whether there are meaningful visual components other than attractiveness that contribute to competence impressions. Data-driven computational models of impressions (Todorov et al., 2013; Todorov & Oosterhof, 2011) are particularly suitable for addressing this question. Because the competence and attractiveness models are in the same statistical space, we can create a new competence model that is not confounded with attractiveness by either (a) making the new competence model statistically orthogonal (uncorrelated) to the attractiveness model or (b) forcing the new model to be negatively correlated with the attractiveness model by subtracting the attractiveness model from the competence model. To the extent that the new competence model (e.g., the resulting competence-minus-attractiveness model) is meaningful, faces that are perceived as more competent should not be perceived as more attractive. More importantly, if the model still predicts competence impressions, then by inspecting this model, we can identify meaningful components of competence impressions that are not readily apparent.

One potential component of these impressions is facial masculinity. When asked to evaluate themselves and others on multiple attributes, people evaluate men as more competent (Bem, 1974; Broverman, Vogel, Broverman, Clarkson, & Rosenkrantz, 1972; Spence, Helmreich, & Stapp, 1975) and more confident (Broverman et al., 1972; Spence et al., 1975) than women, on average. Further, beliefs associating men with competence, confidence, and semantically similar traits (e.g., independence, inventiveness) are held across diverse cultures (Williams & Best, 1990). However, the influence of masculinity on competence impressions may not be immediately apparent in the model of competence, because attractiveness is highly positively correlated with feminine facial appearance in both genders (Perrett et al., 1998; Rhodes, Hickford, & Jeffery, 2000; Said & Todorov, 2011; but see Rhodes, 2006). By controlling for the attractiveness of faces, we can directly test whether masculinity contributes to competence impressions.

Following this logic, we tested for gender biases in competence impressions and uncovered multiple components underlying these impressions. In Experiment 1a, we showed that judgments of both attractiveness and competence change as faces are manipulated to look more competent by the standard competence model (Todorov et al., 2013). In Experiment 1b, we created a new model of competence by subtracting the model of attractiveness and showed that faces manipulated by this model to look competent are indeed perceived as more competent but not more attractive. More importantly, we showed in Experiment 2 that these faces are also perceived as more masculine and confident. In Experiment 3, we showed that the competent-looking faces are more likely to be categorized as men and that the incompetent-looking faces are more likely to be categorized as women. In Experiment 4, we extended our findings to real-life face images. We showed that whereas masculinity cues increase competence impressions of male faces, they increase competence impressions of female faces only up to a point, beyond which they decrease perceived competence.

Experiment 1b

The results of Experiment 1a show that facial attractiveness is a major ingredient of competence impressions. However, it is unclear whether there are other meaningful ingredients when attractiveness is not positively correlated with competence impressions. In Experiment 1b, we created a new model—that is, the difference between the competence and attractiveness models (referred to hereafter as the difference model)—and applied this model to new faces. Theoretically, this model should force judgments of competence and attractiveness to be negatively correlated. However, the mapping between the model space and the psychological judgment space may not be linear (Oosterhof & Todorov, 2008; we also obtained results using a competence model orthogonal to the attractiveness model; see Experiment S1 in the Supplemental Material). To test how judgments of competence and attractiveness change as a function of the difference model, we asked participants to evaluate faces manipulated by this model on either competence or attractiveness.

Method

Participants
One hundred twenty-five MTurk workers (72 men, 53 women; age: M = 37.05 years, range = 18–70) participated for payment. A power analysis using G*Power 3.1.9 indicated that a sample size of 45 participants per condition would afford 90% power to detect a medium effect (R2 = .20) of impression-manipulation levels in participant-level regressions. We expected a medium effect size because removing attractiveness from the competence model (yielding the difference model) should attenuate the effects of the model manipulation on judgments.

Materials
To create the difference model, we subtracted each of the 100 parameters defining the attractiveness model from the corresponding parameter defining the competence model.
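The parameter-wise subtraction and the projection of identities along the resulting dimension can be sketched as follows. This is a minimal numpy illustration with random stand-ins for the 100-parameter model vectors and the face identity (the actual vectors of Todorov et al., 2013, are empirically derived); the `manipulate` helper and the unit-normalization of the model direction are assumptions made for the sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 100-parameter model vectors in the statistical face
# space (stand-ins for the empirically derived competence and
# attractiveness models).
competence_model = rng.standard_normal(100)
attractiveness_model = rng.standard_normal(100)

# The difference model: parameter-wise subtraction.
difference_model = competence_model - attractiveness_model

def manipulate(identity, model, sd_level):
    """Move an identity's parameter vector `sd_level` standard
    deviations along the (unit-normalized) model direction."""
    direction = model / np.linalg.norm(model)
    return identity + sd_level * direction

identity = rng.standard_normal(100)   # one synthetic face identity
levels = [-3, -2, -1, 0, 1, 2, 3]     # the 7 manipulation levels
faces = [manipulate(identity, difference_model, s) for s in levels]
```

Applying the seven levels to each of 25 identities would reproduce the 175-image stimulus structure described below.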
To the extent that the difference model works, faces manipulated to be perceived as more competent should also be perceived as less attractive than faces manipulated to be perceived as less competent, or at least as no more attractive than them. The same 25 identities created for Experiment 1a were employed (see Fig. S1). Each identity was projected at −3, −2, −1, 0, 1, 2, and 3 standard deviations on the dimension of the difference model, resulting in 7 faces per identity. As a result, as in Experiment 1a, the complete set of stimuli consisted of 175 face images (25 identities × 7 manipulation levels).

Procedure
The procedure was identical to that used in Experiment 1a except that faces were created using the difference model. As in Experiment 1a, we excluded from further analyses the responses of any participants with test-retest reliability less than or equal to 0: 18 participants in the competence-rating condition and 17 in the attractiveness-rating condition. We recruited additional participants so that we had 45 participants with test-retest reliability greater than 0 per impression condition. The interrater reliabilities were high (competence: α = .82; attractiveness: α = .75).

Results

Linear and quadratic regression models were fitted to the impression ratings to test whether competence and attractiveness impressions tracked the difference-model manipulation. For the regressions, the impression ratings were averaged across participants (face-level analysis, n = 25 for each impression) and across face identities (participant-level analysis, n = 45 for each impression). The model fit was good for the competence judgments but not the attractiveness judgments, showing that only the competence judgments were well explained as a function of the manipulation level (see Fig. 2). As the model-manipulation level increased, ratings of competence increased too, but ratings of attractiveness did not.
The effect of the model manipulation on the competence and attractiveness ratings was consistent across face identities (see Figs. 2 and S2). However, the impression manipulation had a far larger effect on the competence ratings than on the attractiveness ratings. The linear models explained 49% of the variance in the competence ratings, F(1, 173) = 163.04, p < .001, but only 6% of the variance in the attractiveness ratings, F(1, 173) = 11.61, p < .001. When we compared the coefficients from the regression models for competence and attractiveness, the manipulation induced a far larger change in the competence ratings (b1 = 0.21) than in the attractiveness ratings (b1 = −0.06; z = 17.39, p < .001). This finding shows that the difference model was indeed capable of varying the faces’ perceived competence without substantially varying their attractiveness ratings. If anything, more competent-looking faces were perceived as less attractive. The quadratic model explained 66% of the variance in the competence ratings, F(2, 172) = 165.52, p < .001, but only 13% of the variance in the attractiveness ratings, F(2, 172) = 13.26, p < .001. The quadratic fits were better than the linear fits—competence: F(1, 172) = 86.98, p < .001; attractiveness: F(1, 172) = 14.03, p < .001. When we compared the coefficients from the regression models for competence and attractiveness ratings, the face-impressions manipulation again induced a larger change in the competence ratings than in the attractiveness ratings for both the quadratic terms (competence: b2 = −0.07, attractiveness: b2 = −0.04; z = 4.40, p < .001) and linear terms (competence: b1 = 0.79, attractiveness: b1 = 0.23; z = 8.17, p < .001). The results were consistent with the analysis conducted at the level of participants (see Figs. S3 and S4).
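The reported coefficient comparisons (e.g., z = 17.39) are z tests on the difference between two independent regression slopes. A minimal sketch, assuming simple OLS fits on synthetic ratings that mimic the reported slopes (the noise level and the `compare_slopes` helper are illustrative assumptions, not the authors' analysis code):

```python
import numpy as np
from scipy import stats

def slope_and_se(x, y):
    """OLS slope and its standard error for a simple regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    mse = resid @ resid / (len(x) - 2)
    return b1, np.sqrt(mse / np.sum((x - x.mean()) ** 2))

def compare_slopes(x1, y1, x2, y2):
    """Two-tailed z test for the difference between two slopes."""
    b_a, se_a = slope_and_se(x1, y1)
    b_b, se_b = slope_and_se(x2, y2)
    z = (b_a - b_b) / np.sqrt(se_a**2 + se_b**2)
    return z, 2 * stats.norm.sf(abs(z))

# Synthetic stand-ins: a steep "competence" slope vs. a flat
# "attractiveness" slope over 25 identities x 7 manipulation levels.
rng = np.random.default_rng(1)
levels = np.tile(np.arange(-3, 4), 25)
competence = 0.21 * levels + rng.normal(0, 0.3, 175)
attract = -0.06 * levels + rng.normal(0, 0.3, 175)

z, p = compare_slopes(levels, competence, levels, attract)
```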
The linear models explained 8% of the variance in the competence ratings, F(1, 313) = 26.27, p < .001, but a nonsignificant amount (< 1%) of the variance in the attractiveness ratings, F(1, 313) = 1.36, p = .245. The quadratic models explained 11% of the variance in the competence ratings, F(2, 312) = 18.32, p < .001, but a nonsignificant amount (< 1%) of the variance in the attractiveness ratings, F(2, 312) = 1.45, p = .237. The results show that when facial cues of perceived competence were enhanced by the difference model, competence impressions increased but attractiveness impressions decreased (face-level analysis) or did not vary at all (participant-level analysis). This negative or null correlation between competence and attractiveness impressions contrasts with the high positive correlation between these impressions when facial cues of perceived competence were manipulated by the standard model in Experiment 1a (we also obtained results using a competence model orthogonal to the attractiveness model; that model could not control for the halo effect of attractiveness; see Experiment S1). These results show that perceived competence can be meaningfully manipulated while controlling for the halo effect of attractiveness.
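The linear-versus-quadratic comparisons above are nested-model F tests: the reduction in residual sum of squares from adding the quadratic term, tested against the full model's error. A sketch under the same synthetic-data assumptions as before (the curvature and noise values are illustrative, loosely mimicking the reported fits):

```python
import numpy as np
from scipy import stats

def rss(x, y, degree):
    """Residual sum of squares of a polynomial fit of given degree."""
    coefs = np.polyfit(x, y, degree)
    return float(np.sum((y - np.polyval(coefs, x)) ** 2))

def quadratic_vs_linear(x, y):
    """F test for adding a quadratic term to a linear model."""
    rss_lin, rss_quad = rss(x, y, 1), rss(x, y, 2)
    df_denom = len(x) - 3            # 3 parameters in the full model
    f = (rss_lin - rss_quad) / (rss_quad / df_denom)
    return f, stats.f.sf(f, 1, df_denom)

# Synthetic ratings with genuine curvature, as in the competence data:
# 25 identities x 7 levels.
rng = np.random.default_rng(2)
levels = np.tile(np.arange(-3, 4), 25)
ratings = 0.79 * levels - 0.07 * levels**2 + rng.normal(0, 0.5, 175)

f, p = quadratic_vs_linear(levels, ratings)
```

A significant F here means the quadratic model fits reliably better than the linear one, as found for both competence and attractiveness in the face-level analysis.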

Experiment 2

Visual inspection of the difference model (see Fig. 1) shows that as the faces increase in perceived competence, but not attractiveness, they express more confidence and look more masculine. This is consistent with prior research showing strong associations between competence impressions, confidence impressions, and gender (e.g., Spence et al., 1975), as well as research showing high correlations between femininity and attractiveness (e.g., Said & Todorov, 2011). To formally test whether facial confidence and masculinity underlie competence impressions, we asked participants to evaluate faces varying on perceived competence, but not attractiveness, on either masculinity or confidence.

Method

Participants
Ninety-eight MTurk workers (48 men, 49 women, 1 nonbinary/other gender; age: M = 38.31 years, range = 21–70) participated for payment. A power analysis using G*Power 3.1.9 indicated that a sample size of 45 participants per condition would afford 90% power to detect a medium effect (R2 = .20) of impression-manipulation levels in participant-level regressions.

Materials
The same 25 identities created for Experiments 1a and 1b were employed (see Fig. S1). Each identity was projected at −3, −2, −1, 0, 1, 2, and 3 standard deviations on the dimension of the difference model, resulting in 7 faces per identity. As a result, as in Experiment 1b, the complete set of stimuli consisted of 175 face images (25 identities × 7 manipulation levels).

Procedure
The procedure was identical to that used in the previous experiments except that each participant was randomly assigned to make judgments of either confidence or masculinity. As in the previous experiments, we excluded from further analyses the responses of any participants with test-retest reliability less than or equal to 0: 6 participants in the confidence-rating condition and 2 participants in the masculinity-rating condition.
We recruited additional participants so that we had 45 participants with test-retest reliability greater than 0 per impression condition. The interrater reliabilities were high (confidence: α = .96; masculinity: α = .98).

Results

Linear and quadratic regression models were fitted to the impression ratings to test whether the confidence- and masculinity-impression ratings tracked the difference-model manipulation. For the regressions, the impression ratings were averaged across participants (face-level analysis, n = 25 for each impression) and across face identities (participant-level analysis, n = 45 for each impression). The fit was good across all models, showing that the judgments were well explained as a function of the manipulation level (see Fig. 3). The effect of the model manipulation on the confidence and masculinity ratings was consistent across face identities (see Figs. 3 and S5 in the Supplemental Material). The linear model explained more than 85% of the variance in the ratings—confidence: R2 = .88, F(1, 173) = 1,283.66, p < .001; masculinity: R2 = .92, F(1, 173) = 2,045.29, p < .001. The quadratic model explained more than 85% of the variance—confidence: R2 = .90, F(2, 172) = 747.8, p < .001; masculinity: R2 = .94, F(2, 172) = 1,336.79, p < .001. The quadratic fits were better than the linear fits—confidence: F(1, 172) = 26.06, p < .001; masculinity: F(1, 172) = 49.92, p < .001. The results were similar when the analysis was conducted at the level of participants (see Figs. S6 and S7 in the Supplemental Material). The linear model explained more than 45% of the variance in the ratings—confidence: R2 = .45, F(1, 313) = 259.02, p < .001; masculinity: R2 = .73, F(1, 313) = 829.15, p < .001. The quadratic model also explained more than 45% of the variance—confidence: R2 = .46, F(2, 312) = 133.33, p < .001; masculinity: R2 = .74, F(2, 312) = 443.46, p < .001.
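The interrater reliabilities reported throughout (e.g., α = .96 for confidence) are Cronbach's alpha computed over the faces-by-raters rating matrix. A minimal sketch on synthetic ratings (the shared-signal-plus-noise data model is an assumption for illustration):

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for a faces x raters matrix of ratings."""
    ratings = np.asarray(ratings, float)
    k = ratings.shape[1]                       # number of raters
    item_vars = ratings.var(axis=0, ddof=1)    # variance per rater
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Synthetic example: 45 raters judging 175 faces; each rater's scores
# are a shared "true" signal plus independent noise of equal variance.
rng = np.random.default_rng(3)
true_scores = rng.normal(0, 1, 175)
ratings = true_scores[:, None] + rng.normal(0, 1, (175, 45))

alpha = cronbach_alpha(ratings)
```

With many raters, alpha approaches 1 even when individual raters are only moderately consistent, which is why aggregate ratings can be highly reliable.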
The results show that when facial cues of perceived competence are enhanced by the difference model, both confidence and masculinity impressions increase. These relationships were expected from the visual inspection of the model (see Fig. 1) and the previous literature, as discussed earlier (we obtained similar results using a competence model orthogonal to the attractiveness model; see Experiment S2 in the Supplemental Material). Competence and attractiveness impressions are not positively correlated in the faces used here, as shown in Experiment 1b. It follows that the variance in the competence impressions cannot be attributed to the halo effect of attractiveness, which is a significant natural confound of competence impressions. Thus, the results show that confidence and masculinity cues are important ingredients of competence impressions—ingredients that cannot be explained as a by-product of attractiveness.

Experiment 3

Masculinity cues are strongly related to perceptions of gender, which suggests the presence of gender biases in competence impressions. To directly test whether people use gender-related facial cues to judge competence, we asked participants to categorize faces varying on perceived competence as male or female. We used faces manipulated by both the standard competence model and the difference model. Given the high positive correlation between competence impressions and masculinity, especially in the absence of the halo effect, we expected that (a) participants would be more likely to categorize the faces as male as the level of model manipulation increased, irrespective of the model (i.e., the standard competence model or the difference model), and (b) this effect would be accentuated for faces generated by the difference model. Specifically, controlling for attractiveness, we expected that faces manipulated to be perceived as less competent would be more likely to be categorized as female.

Method

Participants
Thirty-one MTurk workers (22 men, 9 women; age: M = 36.32 years, range = 20–58) participated for payment. A power analysis using G*Power 3.1.9 indicated that a sample size of at least 26 participants would afford 95% power to detect a small to medium effect (f = .25) for the main effects of manipulation level and model type, as well as for their interaction.

Materials
We used both the face images manipulated by the competence model and the face images manipulated by the difference model. This created a combined pool of 350 face-image stimuli (2 models × 25 identities × 7 manipulation levels).

Procedure
Participants were asked to make a forced choice of perceived gender for each face. All participants were exposed to faces from both the competence and difference models.
Two versions of the study with the same length were created: Half of the participants were presented with 88 competence-model faces and 87 difference-model faces, whereas the other half were presented with 87 competence-model faces and 88 difference-model faces. There was no overlap in the face images between the two versions of the study. The 175 chosen stimuli were presented in random order to each participant. For each stimulus, the question “What is the gender of this person?” was presented with two options: male or female. Left and right arrow keys were used to indicate one or the other gender, and the gender–key mapping was counterbalanced. Before the experiment began, each participant was told to rely on gut instinct, not to spend too much time on each face, and that there were no right or wrong answers. Participants were given unlimited time. To assess intrarater reliability, we added 25 repeated trials randomly chosen from the first 175 trials in each version, bringing the total number of trials to 200. As in the previous experiments, we excluded the responses of participants with test-retest reliability less than or equal to 0: 1 participant. We recruited an additional participant so that we had 30 participants with test-retest reliability greater than 0.

Results

Overall, faces were more likely to be categorized as male than as female: The proportion of “male” responses to all faces (n = 350) averaged across participants was significantly higher than .5 (M = .79, SD = .31), t(349) = 17.55, p < .001. This bias may be attributed mainly to the fact that the faces were bald, which creates a strong bias to perceive faces as male. Nevertheless, as shown in Figure 4, as the competence-manipulation level increased in both models, the categorization of faces as male increased, too.
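The overall male-categorization bias reported above is a one-sample t test of the per-face proportion of "male" responses against the chance value of .5. A sketch on synthetic proportions (the beta-distributed stand-in data, chosen to mimic the reported mean of about .79, is an assumption for illustration):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in: for each of the 350 faces, the proportion of
# 30 raters who categorized it as male, skewed toward "male".
rng = np.random.default_rng(4)
male_props = rng.beta(4, 1.2, 350)   # mean of this distribution ~ .77

# One-sample t test against the chance proportion of .5.
t, p = stats.ttest_1samp(male_props, 0.5)
```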
To test whether the perceived gender of faces tracked the model manipulations, we conducted a 7 (manipulation level) × 2 (model type) repeated measures analysis of variance on the proportion of “male” responses for each face. This analysis found that perceived gender varied as a function of both manipulation level and the type of model, as well as their interaction. First, faces were more likely to be categorized as male when they were manipulated to look more competent, irrespective of the model type, as indicated by a main effect of impression-manipulation level, F(6, 144) = 464.32, p < .001, η2 = .88. Second, faces were more likely to be categorized as male when they were manipulated by the competence model than when they were manipulated by the difference model, as indicated by a main effect of impression model, F(1, 24) = 796.55, p < .001, η2 = .69. Third, the difference model led to a much larger difference in the proportion of “male” categorization responses as a function of the manipulation level than the competence model did, as indicated by the interaction effect between manipulation level and impression model, F(6, 144) = 291.93, p < .001, η2 = .78. This interaction effect reveals that the two models had differential effects on gender perception. When faces were varied by the standard competence model, most of the faces were likely to be categorized as male, despite the main effect of the manipulation level. This bias to perceive the faces as male could be attributed to the fact that they were all bald. However, once the attractiveness of the faces was subtracted from the faces manipulated by the competence model, as in the faces manipulated by the difference model, gender-categorization responses changed dramatically.
Whereas faces manipulated to be perceived as competent (but not more attractive) were categorized as male, faces manipulated to be perceived as less competent (but not less attractive) were categorized as female (we obtained similar results using a competence model orthogonal to the attractiveness model; see Experiment S3 in the Supplemental Material). This effect shows that after the positive covariance between attractiveness and competence impressions is visually removed, the variance in the masculinity cues becomes much more prominent in the faces.

Action Editor
Alice J. O’Toole served as action editor for this article.

Author Contributions
A. Todorov developed the study concept. All the authors contributed to the study design. D. Oh and E. A. Buck collected and analyzed the data. All the authors wrote the manuscript and approved the final manuscript for submission.

ORCID iD
DongWon Oh https://orcid.org/0000-0002-2105-3756

Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.

Supplemental Material
Additional supporting information can be found at http://journals.sagepub.com/doi/suppl/10.1177/0956797618813092

Open Practices
All data and materials have been made publicly available via the Open Science Framework and can be accessed at osf.io/ygzx3 and osf.io/86kfq, respectively. The complete Open Practices Disclosure for this article can be found at http://journals.sagepub.com/doi/suppl/10.1177/0956797618813092. This article has received the badges for Open Data and Open Materials. More information about the Open Practices badges can be found at http://www.psychologicalscience.org/publications/badges.