Another day, another study purporting to find that Tech Is Sexist. Since it’s showing up here, you probably already guessed how this is going to end. Most of this analysis is not original to me – Hacker News had figured a lot of it out before I even woke up this morning – but I think it’ll at least be helpful to collect all the information in one easily linkable place.

The study is Gender Bias In Open Source: Pull Request Acceptance Of Women Vs. Men. It’s a pretty neat idea: “pull requests” are discrete units of contribution to an open source project which are either accepted or rejected by the community, so just check which ones are submitted by men vs. women and whether one gender gets a higher acceptance rate than the other. This is a little harder than it sounds – people on GitHub use nicks that don’t always give gender cues – but the researchers wrote a program to automatically link contributor emails to Google Plus pages so they could figure out users’ genders.

This alone can’t rule out that one gender is genuinely doing something differently than another, so they had another neat trick: they wrote another program that automatically scored accounts on obvious gender cues: for example, somebody whose nickname was JaneSmith01, or somebody who had a photo of themselves on their profile. By comparing obviously gendered participants with non-obviously gendered participants whom the researchers had nevertheless been able to find the gender of, they should be able to tell whether there’s gender bias in request acceptances.

Because GitHub is big and their study is automated, they manage to get a really nice sample size – about 2.5 million pull requests by men and 150,000 by women.

They find that women get more (!) requests accepted than men for all of the top ten programming languages. They check some possible confounders – whether women make smaller changes (easier to get accepted) or whether their changes are more likely to serve an immediate project need (again, easier to get accepted) – and in fact find the opposite: women’s changes are larger and less likely to serve project needs. That makes their better performance extra impressive.

So the big question is whether this changes based on obviousness of gender. The paper doesn’t give a lot of the analyses I want to see, and doesn’t make its data public, so we’ll have to go with the limited information they provide. They do not provide an analysis of the population as a whole (!) but they do give us a subgroup analysis by “insider status”, ie whether the person has contributed to that project before.

Among insiders, women do the same as men when gender is hidden, but better than men when gender is revealed. In other words, if you know somebody’s a woman, you’re more likely to approve her request than you would be on the merits alone. We can’t quantify exactly how much this is, because the paper doesn’t provide numbers, just graphs. Eyeballing the graph, it looks like being a woman gives you about a 1% advantage. I don’t see any discussion of this result, even though it’s half the study, and as far as I can tell the more statistically significant half.

Among outsiders, women do the same as/better than men when gender is hidden, and the same as/worse than men when gender is revealed. I can’t be more specific than this because the study doesn’t give numbers and I’m trying to eyeball confidence intervals on graphs. The study itself says that women do worse than men when gender is revealed, so since the researchers presumably have access to the actual numbers, that might mean the confidence intervals don’t overlap. From eyeballing the graph, it looks like the difference is about one percentage point – ie, men get their requests approved 64% of the time, and women 63% of the time. Once again, it’s hard to tell by graph-eyeballing whether these two numbers are within each other’s confidence intervals.
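To make the "is 63% vs. 64% significant?" question concrete, here is a minimal sketch of a two-proportion z-test. The acceptance rates are my graph-eyeballed figures from above; the sample sizes are purely hypothetical, since the paper does not report the subgroup counts. The test also assumes every pull request is an independent trial, which is almost certainly not true when one person submits many requests.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    # Pooled acceptance rate under the null hypothesis of no difference.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Eyeballed rates: men 64%, women 63%. Sample sizes are HYPOTHETICAL.
z_small, p_small = two_proportion_z_test(0.64, 1_000, 0.63, 1_000)
z_large, p_large = two_proportion_z_test(0.64, 100_000, 0.63, 100_000)

print(f"n = 1,000 per group:   z = {z_small:.2f}, p = {p_small:.3g}")
print(f"n = 100,000 per group: z = {z_large:.2f}, p = {p_large:.3g}")
```

The same one-point gap is nowhere near significant with a thousand requests per group and comfortably significant with a hundred thousand, which is why the missing numbers matter so much here.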

The paper concludes that “for insiders…we see little evidence of bias…for outsiders, we see evidence of gender bias: women’s acceptance rates are 71.8% when they use gender neutral profiles, but drop to 62.5% when their gender is identifiable. There is a similar drop for men, but the effect is not as strong.”

In other words, they conclude there is gender bias among outsiders because obvious-women do worse than gender-anonymized-women. They admit that obvious-men also do worse than gender-anonymized-men, but they ignore this effect because it’s smaller. They do not report doing a test of statistical significance on whether it is really smaller or not.
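The missing test would look something like the sketch below: compare the female drop to the male drop and ask whether the difference between the two drops is distinguishable from zero. The women’s rates (71.8% and 62.5%) are the ones quoted from the paper. Everything else is an assumption on my part: the men’s gender-neutral rate is a placeholder because the paper does not report it, the men’s gendered rate is eyeballed from a graph, and all four sample sizes are invented. Like any simple z-test, this treats pull requests as independent.

```python
import math

def drop_difference_z_test(women_neutral, women_gendered, men_neutral, men_gendered):
    """Test whether the women's drop differs from the men's drop.

    Each argument is a tuple (acceptance_rate, number_of_pull_requests).
    Returns (difference_in_drops, z, two_sided_p_value).
    """
    groups = (women_neutral, women_gendered, men_neutral, men_gendered)
    drop_women = women_neutral[0] - women_gendered[0]
    drop_men = men_neutral[0] - men_gendered[0]
    # Variances of four independent sample proportions add.
    se = math.sqrt(sum(rate * (1 - rate) / n for rate, n in groups))
    diff = drop_women - drop_men
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, z, p_value

# Women's rates are from the paper; men's neutral rate and ALL sample
# sizes are HYPOTHETICAL placeholders, not figures from the study.
diff, z, p_value = drop_difference_z_test(
    women_neutral=(0.718, 2_000),
    women_gendered=(0.625, 2_000),
    men_neutral=(0.70, 20_000),
    men_gendered=(0.64, 20_000),
)
print(f"difference in drops = {diff:.3f}, z = {z:.2f}, p = {p_value:.3g}")
```

With the real counts plugged in, this is a one-line answer to "is the male effect really smaller?", which is exactly the answer the paper does not give.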

So:

1. Among insiders, women get more requests accepted than men.

2. Among insiders, people are biased towards women, that is, revealing genders gives women an advantage over men above and beyond the case where genders are hidden.

3. Among outsiders, women still get more requests accepted than men.

4. Among outsiders, revealing genders appears to show a bias against women. It’s not clear if this is statistically significant.

5. When all genders are revealed among outsiders, men appear to have their requests accepted at a rate of 64%, and women of 63%. The study does not provide enough information to determine whether this is statistically significant. Eyeballing it, it looks like it might be, just barely.

6. The study describes its main finding as being that women have fewer requests approved when their gender is known. It hides on page 16 that men also have fewer requests approved when their gender is known. It describes the effect for women as larger, but does not report the size of the male effect, nor whether the difference is statistically significant. Eyeballing it, the male effect looks about 2/3 the size of the female effect, and maybe significantly different?

7. The study has no hypothesis for why both sexes have fewer requests approved when their gender is known, without which it seems kind of hard to speculate about the significance of the phenomenon for one gender in particular. For example, suppose that the reason revealing gender decreases acceptance rates is because corporate contributors tend to use their (gendered) real names and non-corporate contributors tend to use handles like 133T_HAXX0R. And suppose that the best people of all genders go to work at corporations, but a bigger percent of men go there than women. Then being non-gendered would be a higher sign of quality in a man than in a woman. This is obviously a silly just-so story, but my point is that without knowing why all genders show a decline after unblinding, it’s premature to speculate about why their declines are of different magnitudes – and it doesn’t take much to get so small a difference.

8. There’s no study-wide analysis, and no description of how many different subgroup analyses the study tried before settling on Insiders vs. Outsiders (nor how many different definitions of Insider vs. Outsider they tried). Remember, for every subgroup you try, you need to do a Bonferroni correction. This study does not do any Bonferroni corrections; given its already ambiguous confidence intervals, a proper correction would almost certainly destroy the finding.

9. We still have that result from before that women’s changes are larger and less likely to serve immediate needs, both of which make them less likely to be accepted. No attempt was made to control for this.
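For anyone unfamiliar with the Bonferroni correction from point 8, a minimal sketch: if you run m tests and want to keep the overall false-positive rate at alpha, each individual test has to clear alpha / m instead of alpha. The p-values below are made up for illustration; the paper does not report any.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return, for each p-value, whether it survives a Bonferroni correction."""
    # Each of the m tests must clear the stricter threshold alpha / m.
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# HYPOTHETICAL p-values from four subgroup analyses (not from the study).
hypothetical_p_values = [0.03, 0.20, 0.008, 0.60]
survivors = bonferroni_significant(hypothetical_p_values)
print(survivors)  # [False, False, True, False]
```

Note that 0.03 would pass as "significant" on its own, but not once you account for having looked in four places.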

“Science” “journalism”, care to give a completely proportionate and reasonable response to this study?

Here’s Business Insider: Sexism Is Rampant Among Programmers On GitHub, Research Finds. “A new research report shows just how ridiculously tough it can be to be a woman programmer, especially in the very male-dominated world of open-source software….it also shows that women face a giant hurdle of “gender bias” when others assess their work. This research also helps explain the bigger problem: why so many women who do enter tech don’t stick around in it, and often move on to other industries within 10 years. Why bang your head against the wall for longer than a decade?” [EDIT: the title has since been changed]

Here’s Tech Times: Women Code Better Than Men But Only If They Hide Their Gender: “Interestingly enough, among users who were not well known in the coding community, coding suggestions from those whose profiles clearly stated that the users were women had a far lower acceptance rate than suggestions from those who did not make their gender known. What this means is that there is a bias against women in the coding world.” (Note the proportionate and reasonable use of the term “far lower acceptance rate” to refer to a female vs. male acceptance rate of, in the worst case, 63% vs. 64%.)

Here’s Vice.com: Women Are Better At Coding Than Men: “If feminism has taught us anything, it’s that almost all men are sexist. As this GitHub data shows, whether or not bros think that they view women as equals, women’s work is not being judged impartially. On the web, a vile male hive mind is running an assault mission against women in tech.”

This is normally the part at which I would question how a study got through peer review, but luckily this time there is a very simple answer: it didn’t. If you read the study, you may notice the giant red “NOT PEER-REVIEWED” sign on the top of every page. The paper was uploaded to a pre-peer-review site asking for comments. The authors appear to be undergraduate students.

I don’t blame the authors for doing a neat study and uploading it to a website. I do blame the entire world media up to and including the BBC for swallowing it uncritically. Note that two of the three news sources above failed to report that it is not peer-reviewed.

Oh, one more thing. A commenter on the paper’s pre-print asked for a breakdown by approver gender, and the authors mentioned that “Our analysis (not in this paper — we’ve cut a lot out to keep it crisp) shows that women are harder on other women than they are on men. Men are harder on other men than they are on women.”

Depending on what this means – since it was cut out of the paper to “keep it crisp”, we can’t be sure – it sounds like the effect is mainly from women rejecting other women’s contributions, and men being pretty accepting of them. Given the way the media predictably spun this paper, it is hard for me to conceive of a level of crispness which justifies not providing this information.

So, let’s review. A non-peer-reviewed paper shows that women get more requests accepted than men. In one subgroup, unblinding gender gives women a bigger advantage; in another subgroup, unblinding gender gives men a bigger advantage. When gender is unblinded, both men and women do worse; it’s unclear if there are statistically significant differences in this regard. Only one of the study’s subgroups showed lower acceptance for women than men, and the size of the difference was 63% vs. 64%, which may or may not be statistically significant. This may or may not be related to the fact, demonstrated in the study, that women propose bigger and less-immediately-useful changes on average; no attempt was made to control for this. This tiny amount of discrimination against women seems to be mostly from other women, not from men.

The media uses this to conclude that “a vile male hive mind is running an assault mission against women in tech.”

Every time I say I’m nervous about the institutionalized social justice movement, people tell me that I’m crazy, that I’m just sexist and privileged, and that feminism is merely the belief that women are people so any discomfort with it is totally beyond the pale. I would nevertheless like to re-emphasize my concerns at this point.

[EDIT: I don’t have much of a quarrel with the authors, who seem to have done an interesting study and are doing the correct thing by submitting it for peer review. I have a big quarrel with “science” “journalists” for the way they reported it. If any of the authors read this and want my peer review suggestions, I would recommend:

1. Report gender-unblinding results for the entire population before you get into the insiders-vs.-outsiders dichotomy.

2. Give all numbers represented on graphs as actual numbers too.

3. Declare how many different subgroup groupings you tried, and do appropriate Bonferroni corrections.

4. Report the magnitude of the male drop vs. the female drop after gender-unblinding, test if they’re different, and report the test results.

5. Add the part about men being harder on men and vice versa, give numbers, and do significance tests.

6. Try to find an explanation for why both groups’ rates dropped with gender-unblinding. If you can’t, at least say so in the Discussion and propose some possibilities.

7. Fix the way you present “Women’s acceptance rates are 71.8% when they use gender neutral profiles, but drop to 62.5% when their gender is identifiable”, at the very least by adding the comparable numbers about the similar drop for men in the same sentence. Otherwise this will be the heading for every single news article about the study and nobody will acknowledge that the drop for men exists at all. This will happen anyway no matter what you do, but at least it won’t be your fault.

8. If possible, control for your finding that women’s changes are larger and less-needed and see how that affects results. If this sounds complicated, I bet you could find people here who are willing to help you.

9. Please release an anonymized version of the data; it should be okay if you delete all identifiable information.]