If you are an academic and on social media, then over the last weekend your feed was probably full of mentions of an article by economist Justin Wolfers in the New York Times titled “A Family-Friendly Policy That’s Friendliest to Male Professors.”

It describes a study by three economists of the effects of parental tenure extension policies, which give an extra year on the tenure clock when people become new parents. The conclusion is that tenure extension policies do make it easier for men to get tenure, but they unexpectedly make it harder for women. The finding has a counterintuitive flavor – a policy couched in gender-neutral terms and designed to help families actually widens a gender gap.

Except there are a bunch of odd things that start to stick out when you look more closely at the details, and especially at the original study.

Let’s start with the numbers in the NYT writeup:

The policies led to a 19 percentage-point rise in the probability that a male economist would earn tenure at his first job. In contrast, women’s chances of gaining tenure fell by 22 percentage points. Before the arrival of tenure extension, a little less than 30 percent of both women and men at these institutions gained tenure at their first jobs.

Two things caught my attention when I read this. First, that a 30% tenure rate sounded awfully low to me (this is at the top-50 PhD-granting economics departments). Second, that tenure extension policies took the field from parity (“30 percent of both men and women”) to a 6-to-1 lopsided rate favoring men (the effects are percentage points, so it goes to a 49% tenure rate for men vs. 8% for women). That would be a humongous effect size.

Regarding the 30% tenure rate, it turns out the key words are “at their first jobs.” This analysis compared people who got tenure at their first job to everybody else — which means that leaving for a better outside offer is treated the same in this analysis as being denied tenure. So the tenure-at-first-job variable is not a clear indicator of whether the policy is helping or hurting a career. What if you look at the effect of the policy on getting tenure anywhere? The authors did that, and they summarize the analysis succinctly: “We find no evidence that gender-neutral tenure clock stopping policies reduce the fraction of women who ultimately get tenure somewhere” (p. 4). That seems pretty important.

What about that swing from gender-neutral to a 6-to-1 disparity in the at-first-job analysis? Consider this: “There are relatively few women hired at each university during the sample period. On average, only four female assistant professors were hired at each university between 1985 and 2004, compared to 17 male assistant professors” (p. 17). That was a stop-right-there moment for me: if you are an economics department worried about gender equality, maybe instead of rethinking tenure extensions you should be looking at your damn hiring practices. But as far as the present study goes, there are n = 62 women at institutions that never adopted gender-neutral tenure extension policies, and n = 129 at institutions that did. (It’s even worse than that because only a fraction of them are relevant for estimating the policy effect; more on that below). With a small sample there is going to be a lot of uncertainty in the estimates under the best of conditions. And it’s not the best of conditions: Within the comparison group (the departments that never adopted a tenure extension policy), there are big, differential changes in men’s and women’s tenure rates over the study period (1985 to 2004): Over time, men’s tenure rate drops by about 25%, and women’s tenure rate doubles from 12% to 25%. Any observed effect of a department adopting a tenure-extension policy is going to have to be estimated in comparison to that noisy, moving target.

Critically, the statistical comparison of tenure-extension policy is averaged over every assistant professor in the sample, regardless of whether the individual professor used the policy. (The authors don’t have data on who took a tenure extension, or even on who had kids.) But causation is only defined for those individuals in whom we could observe a potential outcome at either level of the treatment. In plain English: “How does this policy affect people” only makes sense for people who could have been affected by the policy — meaning people who had kids as assistant professors, and therefore could have taken an extension if one were available. So if the policy did have an effect in this dataset, we should expect it to be a very small one because we are averaging it with a bunch of cases that by definition could not possibly show the effect. In light of that, a larger effect should make us more skeptical, not more persuaded.

There is also the odd finding that in departments that offered tenure extension policies, men took less time to get to tenure (about 1 year less on average). This is the opposite of what you’d expect if “men who took parental leave used the extra year to publish their research” as the NYT writeup claims. The original study authors offer a complicated, speculative story about why time-to-tenure would not be expected to change in the obvious way. If you accept the story, it requires invoking a bunch of mechanisms that are not measured in the paper and likely would add more noise and interpretive ambiguity to the estimates of interest.

There were still other analytic decisions that I had trouble understanding. For example, the authors excluded people who had 0 or 1 publication in their first 2 years. Isn’t this variance to go into the didn’t-get-tenure side of the analyses? And the analyses includes a whole bunch of covariates without a lot of discussion (and no pre-registration to limit researcher degrees of freedom). One of the covariates had a strange effect: holding a degree from a top-10 PhD-granting institution makes it less likely that you will get tenure in your first job. This does make sense if you think that top-10 graduates are likely to get killer outside offers – but then that just reinforces the lack of clarity about what the tenure-in-first-job variable is really an indicator of.

But when all is said and done, probably the most important part of the paper is two sentences right on the title page:

IZA Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of such a paper should account for its provisional character.

The NYT writeup does no such thing; in fact it goes the opposite direction, trying to draw broad generalizations and make policy recommendations. This is no slight against the study’s original authors – it is typical in economics to circulate working papers for discussion and critique. Maybe they’d have compelling responses to everything I said, who knows? But at this stage, I have a hard time seeing how this working paper is ready for a popular media writeup for general consumption.

The biggest worry I have is that university administrators might take these results and run with them. I do agree with the fundamental motivation for doing this study, which is that policies need to be evaluated by their effects. Sometimes superficially gender-neutral policies have disparate impacts when they run into the realities of biological and social roles (“primary caregiver” leave policies being a case in point). It’s fairly obvious that in many ways the academic workplace is not structured to support involved parenthood, especially motherhood. But are tenure extension policies making the problem worse, better, or are they indifferent? For all the reasons outlined above, I don’t think this research gives us an actionable answer. Policy should come from cumulative knowledge, not noisy and ambiguous preliminary findings.

In the meantime, what administrator would not love to be able to put on the appearance of We Are Doing Something by rolling back a benefit? It would be a lot cheaper and easier than fixing disparities in the hiring process, providing subsidized child care, or offering true paid leave. I hope this piece does not license them to do that.

Thanks to Ryan Light and Rich Lucas, who dug into the paper and first raised some of the issues I discuss here in response to my initial perplexed tweet.