As a postdoc at Cornell University in the late 2000s, with the aid of a few collaborators and an army of undergraduates, I carried out a big experiment that I’ll call the Cornell Experiment, testing whether female Drosophila melanogaster face a tradeoff between making babies and defending themselves against infectious disease. We decided to look at the relationship between these traits in two environments: one in which females were housed in vials supplemented with a copious amount of dietary yeast, and one in which they were housed in vials without the extra yeast.

Our reasoning for doing this environmental manipulation was that I had previously found, as part of my dissertation research, that yeast availability had a very pronounced effect on female immune function and fecundity: in the presence of excess yeast, females made more babies and were better at defending themselves against a bacterial infection. I’ll call this result the Riverside Experiment. I’m pretty proud of both of these papers, but that’s not why I’m writing.

It turns out that when I did what I thought was the exact same environmental manipulation, adding dietary yeast to vials to experimentally manipulate both immune function and female fecundity, I got entirely different results. In other words, in the Cornell Experiment I was unable to replicate one of the major results from my dissertation! Needless to say, I was a bit surprised. What could be going on?

Half of the result in the Cornell Experiment was consistent: females in vials with excess yeast had more babies. However, there was absolutely no difference in how well they defended themselves against a bacterial infection. We did use a different bacterium in the Cornell Experiment as well as a different population of D. melanogaster. Could these differences lead to such a dramatically different outcome? We thought this was unlikely.

Instead, we argued that the difference was likely due to the different fly food we used for the two experiments. In the Riverside Experiment, flies were kept on agar-cornmeal-molasses food, while in the Cornell Experiment they were kept on an agar-dextrose-yeast medium. The reason we suspected that this difference in the food was the culprit was that females on the Cornell food were already having lots of babies even when we didn’t add excess dietary yeast, while on the Riverside food, female fecundity was much lower in the absence of excess yeast. In the paper, we wrote: “In support of this hypothesis [that the difference between the two studies was the type of fly food used], the fecundity of females in yeast-unlimited condition in McKean and Nunney (2005) [the Riverside Experiment] was 6.7 times that of the yeast-limiting condition, compared to only a 2.5-fold increase seen in the present study.”

In a subsequent collaboration, the Canadian Experiment (which was never published), we again looked at the immune function of females kept in yeast-supplemented or regular vials. The Canadian Experiment was done with agar-cornmeal-molasses food, and as we predicted, we again saw that females in vials supplemented with yeast had significantly greater defense against a bacterial infection than less-well-fed females in regular vials.

If you’ve read this far, congratulations, because it’s pretty dry, a bit technical, and certainly some inside baseball. It’s also very important. It’s important because science is facing what many are calling a “reproducibility crisis.” A recent survey of 1,576 scientists found that 70 percent reported being unable to replicate a result from another research group, and 50 percent said they had also experienced an inability to replicate one of their own results.

For me, the lesson about reproducibility from the experiments at Riverside, Cornell, and Canada is that subtle environmental effects can profoundly affect experimental results. Many fly biologists are well aware of this. In my own lab, we kept track of when food was made and tested whether our phenotypes varied with different batches of food. When experimental flies were sampled from the same vial, we kept track of “rearing vial” in our statistical analyses and got experimental flies from as many different rearing vials as possible in order to reduce spurious results. We also kept track of, or tried to control for, the density of flies in vials, temperature, light and dark cycles, and the time of day when experiments were done.
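The value of tracking rearing vial can be illustrated with a small simulation. This is a hypothetical sketch, not an analysis from any of the experiments described here: it assumes flies from the same vial share a batch-level effect on a measured phenotype, and shows why sampling from many vials gives a more stable estimate than sampling the same number of flies from only a few vials.

```python
# Hypothetical sketch: why sampling from many rearing vials matters.
# Flies from the same vial share a vial-level effect; drawing many flies
# from few vials lets that shared noise dominate the group estimate.
import random
import statistics

random.seed(1)

def simulate_group(n_vials, flies_per_vial):
    """Simulated phenotype scores: baseline + shared vial effect + fly noise.

    All parameter values are illustrative assumptions, not measured values.
    """
    scores = []
    for _ in range(n_vials):
        vial_effect = random.gauss(0, 2.0)  # batch-level variation (assumed SD)
        for _ in range(flies_per_vial):
            scores.append(10 + vial_effect + random.gauss(0, 1.0))
    return scores

# Same total sample size (60 flies), different vial structure.
few_vials = simulate_group(n_vials=2, flies_per_vial=30)
many_vials = simulate_group(n_vials=30, flies_per_vial=2)

# With only 2 vials, the group mean can drift far from the true value (10)
# purely from vial-level noise; with 30 vials it is far more stable.
print(statistics.mean(few_vials), statistics.mean(many_vials))
```

Running the simulation many times shows that the spread of group means is several-fold larger in the few-vials design, which is exactly the kind of spurious between-group difference that spreading sampling across vials (and modeling vial in the analysis) is meant to guard against.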

Subtle environmental effects on the reproducibility of results are, of course, not unique to studies with flies. Some of my favorite examples come from studies of mice and other rodents. For example, sex differences in immune function in voles depend on whether the animals are housed individually or in mixed- or same-sex pairs. In perhaps the most insidious example, a study found that male researchers induced a stress response in mice, a response not seen in mice handled by female researchers.

Given that the food used, the vial a fly came from, whether a mouse had a cage mate, or even whether a man or a woman is running the study can affect experimental results, it is perhaps not surprising that reproducibility is difficult. So what should we do? As researchers, we should be conscious of subtle environmental effects and control for them as much as possible. We should also start including fuller descriptions of how plants, animals, and cells are handled in the lab, so that groups attempting to replicate a result can do as much as possible to recreate the precise experimental conditions. Lastly, we need more programs like the Reproducibility Initiative, which received grant funding to attempt to verify the results of 50 landmark cancer studies.

Of course, there are many other potential causes of the reproducibility crisis, but understanding the contribution of subtle environmental differences will improve our science and allow us to draw more robust and generalizable conclusions.