By Jesse Singal

Diversity trainings are big business. In the United States, companies spend about £6.1 billion per year, by one estimate, on programmes aimed at making workplaces more inclusive and welcoming to members of often-underrepresented groups (British numbers aren’t easy to come by, but according to one recent survey, over a third of recruiters are planning to increase their investment in diversity initiatives).

Unfortunately, there’s little evidence-backed consensus about which sorts of diversity programmes work, and why, and there have been long-standing concerns in some quarters that these programmes don’t do much at all, or that they could actually be harmful. In part because of this dearth of evidence, the market for pro-diversity interventions is a bit of a Wild West with regard to quality.

For a new paper in PNAS, a prominent team of researchers, including Katherine Milkman, Angela Duckworth, and Adam Grant of the University of Pennsylvania’s Wharton School, partnered with a large global organisation to measure the real-world impact of the researchers’ own anti-bias intervention, designed principally to “promote inclusive attitudes and behaviors toward women, whereas a secondary focus was to promote the inclusion of other underrepresented groups (e.g., racial minorities).” The results were mixed at best – and unfortunately there are good reasons to be sceptical that even the more positive results are as positive as they seem.

The company in question emailed 10,983 employees, inviting them to “complete a new inclusive leadership workplace training,” as the researchers sum it up. Out of those, 3,016 employees, 38.5 per cent of them American, answered the call and were assigned, in thirds, either to one of two online treatment groups or to a control group.

“In the gender-bias and general-bias trainings,” write the researchers, “participants learned about the psychological processes that underlie stereotyping and research that shows how stereotyping can result in bias and inequity in the workplace, completed and received feedback on an Implicit Association Test assessing their associations between gender and career-oriented words, and learned about strategies to overcome stereotyping in the workplace.” There was little difference between the so-called gender-bias and general-bias trainings, except for minor points of emphasis, and the researchers ended up collapsing them into one training condition, comparing the results to those in a control group who received a “stylistically similar training, but [with a focus on] psychological safety and active listening rather than stereotyping.” On average, the bias training took 68 minutes to complete, and was evaluated on the basis of responses to an attitudes questionnaire and real-world behavioural data taken up to 20 weeks post-intervention.

The key chart summarising all the researchers’ findings is shown at the top of this post. One thing that immediately jumps out is that the top row, which deals with attitudes, shows more impressive results than the bottom row, which deals with actual behaviour (though even the top-row results are modest in terms of the size of their effect). But in looking at the attitudinal measures the researchers used, critical readers might fairly wonder whether they’re really measuring what they are supposed to be measuring – i.e. support for women in the workplace.

For example, the “attitudinal support for women” score was drawn from levels of (dis)agreement with items like “Discrimination against women is no longer a problem” and “Society has reached the point where women and men have equal opportunities for achievement.” But arguably, whether an individual believes equality has already been accomplished is different from whether they believe that women should be fully supported in the workplace. Someone can believe that discrimination against women is no longer a problem but still believe that women should be treated in exactly the same way as men. These questionnaire statements seem to be geared more toward measuring political liberalism or conservatism with regard to gender issues than toward measuring misogynist attitudes directly. So the risk is that the training successfully increases political liberalism (to a modest degree), but it’s not possible to tell from the questionnaire items whether it alters the kind of attitudes that are likely to more directly underlie discriminatory behaviour.

Arguably, there is also a problem with the “gender-bias acknowledgment” result, which was based on participants in the training group admitting (more than the control participants did) to having gender biases more similar to those of the average person. This finding is complicated by the fact that the bias training involved taking the gender-bias Implicit Association Test (IAT). Setting aside the other well-documented problems with the IAT, the gender IAT, in particular, has produced some strange results. As this chart from FiveThirtyEight showed in 2017, the normal “pattern” of IATs relating to minority or marginalised groups – in which conservatives (and members of majority groups) tend to score higher for implicit bias toward such groups than liberals (and members of minority groups) – is more or less reversed when it comes to the gender-bias IAT:

Specifically, results from the gender IAT suggest that women are more implicitly biased against women than men are, that this is the case at every point on the political spectrum, and that within many subgroups, political liberals are more biased against women than conservatives.

Setting aside these questionable patterns, a lot of people in the bias-training condition were likely told they were biased by their IAT result. So when they subsequently filled out a questionnaire asking them to gauge their level of anti-woman bias, were they simply echoing what a dubious test told them? It certainly seems like a possibility, and if so, should that count as an ethical, successful educational intervention?

Meanwhile, the behavioural results (based on participants’ choices of whom to mentor, which new hires to speak to, and which colleagues’ excellence to recognise) were far less impressive than the attitudinal ones. A lot of the effects were clustered right around zero and/or differed greatly on the basis of sex or nationality (which isn’t a good sign for programmes that are designed to be delivered to large, diverse groups of employees). The biggest effect involved an email invitation, 14 weeks after follow-up, to talk on the phone with either a male or a female new hire. There was no overall effect, but among women only, those in the bias-training group were more likely than those in the control condition, at a statistically significant level, to agree to talk to newly hired women rather than newly hired men.

Where does all this leave us? It’s hard to say. The results definitely weren’t impressive, and the strongest came from questionable attitude measures and, potentially, participants’ responses to a controversial test of implicit bias. This outcome shouldn’t be too surprising: if you surveyed psychologists and asked them whether a single, 68-minute training could generate sizable behavioural effects weeks and months later, most would probably tell you no. Behaviour change is hard. Plus, even the modest results seen here were for employees who volunteered to take part, rather than a random sample. Yet diversity trainings, at their best, will have a positive effect on the general population of a company or institution, not just those who express a desire to take part.

That said, setting aside quibbles with this particular evaluation and programme, this study is laudable for showing exactly how diversity programmes should be evaluated: with an eye toward behavioural outcomes measured a significant period of time after the programme itself is delivered. Until there are more studies like this one, those who want to use diversity training to make companies and other institutions more welcoming have no option but to feel around in the dark for what works.

—The mixed effects of online diversity training

Post written by Jesse Singal (@JesseSingal) for the BPS Research Digest. Jesse is a contributing writer at BPS Research Digest and New York Magazine, and he publishes his own newsletter featuring behavioral-science-talk. He is also working on a book about why shoddy behavioral-science claims sometimes go viral for Farrar, Straus and Giroux.