Pre-clinical animal research is typically based on single-laboratory studies conducted under highly standardized conditions, a practice that is universally encouraged in animal science courses and textbooks. But does this insistence on uniformity produce the most reliable results? In a new study publishing February 22 in the open access journal PLOS Biology, researchers from the Universities of Bern and Edinburgh ask whether such highly standardized laboratory conditions risk producing results that hold only under very specific conditions. They show that this near-universal practice may actually help to explain the poor reproducibility of pre-clinical animal research, and that more diverse study samples may do better.

The authors used computer simulations based on 440 pre-clinical studies across 13 different treatments in animal models of stroke, heart attack, and breast cancer, and compared the reproducibility of results between single-laboratory and multi-laboratory studies. Their findings indicate that multi-laboratory studies -- or other ways of creating more diverse study samples -- can significantly improve the reproducibility of experimental results.

To simulate such multi-laboratory studies, the researchers combined data from multiple studies, as if several laboratories had conducted them in parallel. They found that single-laboratory studies produced greater variation between study results. In contrast, multi-laboratory studies, comprising as few as two to four laboratories, produced much more consistent results, thereby increasing reproducibility without a need for larger sample sizes. "Our findings demonstrate that standardization is a cause of poor reproducibility, as it ignores biologically relevant variation," says lead author Prof. Hanno Würbel, director of the Division of Animal Welfare at the University of Bern.

The scientists first selected 50 independent studies on the effect of body temperature management (hypothermia) on infarct volume, an indicator of stroke severity, in rodent models of stroke. A meta-analysis of these 50 studies showed that hypothermia reduces stroke severity by about 50%. The researchers used this estimate as a yardstick for comparing the accuracy and reproducibility of results from single-laboratory studies and from multi-laboratory studies including two, three, or four laboratories. The proportion of studies that accurately predicted the 50% reduction of infarct volume increased from under 50% in single-lab studies to 73% in two-lab studies, to 83% in three-lab studies, and to 87% in four-lab studies. "This increase in the proportion of accurate study results with increasing numbers of laboratories reflects the improved reproducibility of results from multi-laboratory studies," says co-author Dr. Bernhard Voelkl.
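The logic of this comparison can be illustrated with a toy simulation. The sketch below is not the authors' code and does not use their data: it generates synthetic studies under an assumed model in which each laboratory has its own true effect scattered around the overall 50% reduction. The function name `coverage` and all parameter values (between-laboratory spread, within-laboratory spread, 24 animals per study) are hypothetical choices for illustration. It counts how often a naive 95% confidence interval contains the true effect when the same total number of animals is split across one or more laboratories.

```python
import random
import statistics


def coverage(n_labs, total_n=24, true_effect=0.5, between_lab_sd=0.2,
             within_lab_sd=0.3, n_studies=2000, seed=1):
    """Fraction of simulated studies whose naive 95% CI contains true_effect.

    All default values are illustrative assumptions, not figures from the study.
    """
    rng = random.Random(seed)
    per_lab = total_n // n_labs  # same total sample size, split across labs
    hits = 0
    for _ in range(n_studies):
        observations = []
        for _ in range(n_labs):
            # Each lab's own conditions shift the effect it can observe.
            lab_effect = rng.gauss(true_effect, between_lab_sd)
            observations.extend(
                rng.gauss(lab_effect, within_lab_sd) for _ in range(per_lab)
            )
        mean = statistics.fmean(observations)
        # Naive pooled standard error; 1.96 is the normal approximation.
        se = statistics.stdev(observations) / len(observations) ** 0.5
        if abs(mean - true_effect) <= 1.96 * se:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(f"{k} lab(s): {coverage(k):.2f} of intervals contain the true effect")
```

Under these assumptions the proportion of accurate studies rises as laboratories are added, even though the total number of animals stays fixed. The exact proportions depend entirely on the assumed spreads and should not be read as a reproduction of the published figures.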

The researchers then repeated the analysis with 12 further treatments in animal models of stroke, heart attack, and breast cancer to assess whether their findings were generalizable; in all cases they found that accuracy and reproducibility increased with the number of participating laboratories. They also simulated different sample sizes and found that, rather than solving the problem, simply increasing sample size in single-laboratory studies made things worse, with larger sample sizes rendering the results even less accurate.
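One way to see why larger samples can backfire is a simple random-effects sketch. This is an illustrative assumption, not the model reported in the study: suppose each laboratory's true effect deviates from the overall effect $\mu$ with variance $\tau^2$, animals within a laboratory vary with variance $\sigma^2$, and a study spreads $N$ animals evenly over $k$ laboratories. The variance of the study's estimate, and the squared standard error a single-laboratory analysis would report, are then approximately:

```latex
\operatorname{Var}(\hat{\mu}) = \frac{\tau^2}{k} + \frac{\sigma^2}{N},
\qquad
\widehat{\mathrm{SE}}^2_{k=1} \approx \frac{\sigma^2}{N}
```

Under these assumptions, increasing $N$ in a single laboratory shrinks only the second term. The reported standard error tends to zero while the error due to laboratory-specific conditions stays at $\tau^2$, so the confidence interval narrows around the laboratory-specific effect rather than around $\mu$. Only adding laboratories reduces the $\tau^2 / k$ term.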

These findings demonstrate that standardization is a major cause of poor reproducibility in pre-clinical animal research. Poor reproducibility calls into question the benefit of animal experiments and creates the need for additional replicate experiments to answer a given research question conclusively. "Our findings show that more representative study samples will improve the reproducibility of animal research and prevent wasting animals and other resources for inconclusive research," Hanno Würbel says. He further concludes, "Multi-laboratory studies should replace standardized single-laboratory studies as the gold standard, at least for late-phase preclinical trials." These improvements require neither many participating laboratories nor larger sample sizes. Indeed, the greatest improvement in reproducibility was observed between single-laboratory studies and studies involving two labs.