The lack of reproducibility of preclinical experimentation has implications for sustaining trust in and ensuring the viability and funding of the academic research enterprise. Here I identify problematic behaviors and practices and suggest solutions to enhance reproducibility in translational research.

Main Text

As I contemplated the content of my lecture at a recent Keystone symposium, the potential topics to be addressed were tantalizing. The theme of this Keystone meeting, “New Therapeutics for Diabetes and Obesity,” was nicely aligned with the interests of my lab and our studies examining the therapeutic potential of gut hormones. On the program was a litany of leading scientists discussing the most exciting advances in metabolic disease research, encompassing brown fat, central nervous system control of glucose and body weight, advances in islet biology, stem cells, peptide therapeutics, mouse and human genetics, immunotherapy, and neuromodulation. There were even lectures devoted to discussing existing and novel preclinical animal models widely used to test promising pathways and compounds for efficacy in treating experimental diabetes and obesity. There has rarely been a more exciting time to unravel the molecular mechanisms and pathways controlling energy intake and assimilation, and the abnormalities in these pathways that predispose us to the development of diabetes and obesity.

In the back of my mind, however, was the ever-present nagging voice of sobering reality. The voice reminds me, a clinician scientist, that the vast majority of genes, proteins, pathways, and targets that provide impressive and exciting results in preclinical studies on a daily basis (and a justifiable number of exciting, high-profile, widely publicized publications) usually do not survive the difficult path toward rigorous target validation and ultimate clinical development. Although the joy of an exciting result in preclinical studies need not be extinguished by the stark reality that many findings will simply not be reproducible in animals, let alone translatable to human studies, we have developed a culture of hype and exaggerated expectations that often fall far short of the promises made. These reproducibility challenges are not confined to the study of metabolism. They are also independent of the much less common, but equally worrisome, issue of scientific misconduct, which, like the Lernaean Hydra, continues to raise its ugly head on a daily basis no matter how hard the community tries to extinguish egregious scientific behavior.

Reproducibility issues in basic science are now being debated regularly. Published surveys from industry colleagues routinely highlight difficulties in validating or reproducing major research findings from academic laboratories (Prinz et al., 2011). The National Institutes of Health (NIH) leadership has identified a number of remediable problems in need of correction. These span experimental design, pitfalls in animal experimentation, use of appropriate statistics, transparency of methodology, and over-interpretation of data (Collins and Tabak, 2014). Indeed, agencies within the NIH have convened conferences to discuss the reproducibility challenges plaguing preclinical research, providing constructive guidance and recommended reporting standards designed to enhance transparency and reproducibility (Landis et al., 2012). These recommendations include:

- random assignment of animals;
- blinding of preclinical treatment groups;
- more rigorous sample and effect size calculations;
- formal rules for handling data involving outliers;
- pre-specified primary and secondary endpoints;
- replication of key experimental findings (Landis et al., 2012).

While laudable, these recommendations have not been formally adopted by journals, and reading the metabolism literature suggests that the majority of laboratories do not strictly adhere to these "best practices."
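Two of the recommendations listed above, random assignment of animals and blinding of treatment groups, are simple to implement in practice. The sketch below shows one minimal way to do this in Python. It assumes equal group sizes and a third party who holds the unblinding key. The function name `randomize_and_blind` and the coded-label format are illustrative choices, not part of any published standard.

```python
import random

def randomize_and_blind(animal_ids, groups, seed):
    """Randomly assign animals to equal-sized groups and generate blinded codes.

    Returns (allocation, key):
      allocation maps animal ID -> opaque code (the only label the experimenter sees);
      key maps code -> treatment group (held by a third party until analysis is locked).
    """
    if len(animal_ids) % len(groups) != 0:
        raise ValueError("number of animals must be divisible by the number of groups")
    rng = random.Random(seed)  # a recorded seed makes the allocation auditable
    per_group = len(animal_ids) // len(groups)
    labels = [g for g in groups for _ in range(per_group)]
    rng.shuffle(labels)  # random assignment: each animal's group is set by the shuffle
    codes = rng.sample(range(100, 1000), len(animal_ids))  # unique codes carrying no group information
    allocation, key = {}, {}
    for animal, label, code in zip(animal_ids, labels, codes):
        allocation[animal] = f"A{code}"
        key[f"A{code}"] = label
    return allocation, key

# Hypothetical cohort of 12 mice split between vehicle and drug.
mice = [f"M{i:02d}" for i in range(1, 13)]
allocation, key = randomize_and_blind(mice, ["vehicle", "drug"], seed=2016)
for mouse, code in allocation.items():
    print(mouse, code)  # sheet for the experimenter; group labels stay hidden
```

Recording the seed and keeping `key` with someone outside the experiment allows the allocation to be reproduced and audited without unblinding the person who scores the outcomes.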

Herein, I discuss challenges and issues that might account for some of the chasm between two things: our astounding collective success in explaining, treating, and often vanquishing metabolic disease in preclinical studies, and our meager success rate in moving most of these discoveries from bench to bedside. The recurring paradigm of the spectacular preclinical discovery is often highlighted at Keystone and related meetings. It prompts me to consider our collective, expensive, and often disappointing inability to reproduce and translate the majority of exciting early-stage research into therapeutic interventions with clinical utility. Even attempting to define reproducibility precisely can be contentious, as the term embodies different concepts, accepted ranges of variability, and criteria that differ across fields and scientific communities (Goodman et al., 2016). Using examples from our own area of gut hormone research, I illustrate anecdotally some commonly encountered challenges and pitfalls in the design and interpretation of experimental data. Finally, to stimulate discussion, I offer examples of and suggestions for fueling ongoing efforts to improve and standardize preclinical research. My hope is that our community may find greater success in the pursuit of preclinical reproducibility and, ultimately, enhanced bench-to-bedside translatability.

Execution and Reporting of Clinical versus Preclinical Studies

Human clinical trials are often carried out using a randomized double-blinded design, and outliers, or suboptimal responders, are not discarded from the analysis. Clinical researchers are expected to account for and report on every study subject screened and ultimately enrolled in a clinical trial, even if subjects drop out or move away. Non-responders are not simply discarded from the trial results, and statistical methods are employed to account for study subjects who may not complete the entire study. Both efficacy and safety are critically scrutinized. The primary and secondary outcomes must be stated in advance, which minimizes extensive post hoc number crunching to find statistically significant unexpected outcomes. Moreover, many large clinical trials study large numbers of genetically diverse subjects from different regions of the world, both male and female, often spanning a wide range of ages. It is certainly true that clinical trials with negative results are reported more slowly, and sometimes not at all. Even so, efforts to improve reporting, such as the initiative embodied within alltrials.net, are likely to produce further improvements in comprehensive reporting. It would be viewed as completely unacceptable for an academic clinical trials unit to carry out dozens of clinical studies in human subjects and report only the most promising results from studies that produced a favorable outcome. Contrast this situation with current norms and expectations for preclinical studies and research in animals.

Figure 1. Proposed Template for Reporting Animal Use and Analysis in Preclinical Studies. Designated the Consolidated Standards of Animal Experiment ReporTing (CONSAERT) flow diagram.
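Retaining non-responders matters because discarding them after the fact inflates the apparent effect. The toy calculation below makes this concrete. The response values are invented for illustration, and the 0.5 mmol/L responder cutoff is an arbitrary, hypothetical post hoc threshold.

```python
from statistics import mean

def mean_difference(treated, control):
    """Effect estimate: mean response in treated animals minus mean in controls."""
    return mean(treated) - mean(control)

# Invented reductions in blood glucose (mmol/L) for 8 control and 8 treated animals.
control = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.1, 0.0]
treated = [1.5, 1.2, 0.1, -0.1, 1.4, 0.0, 1.3, 0.2]

# Analysis of every allocated animal, as expected in a clinical trial.
all_animals = mean_difference(treated, control)

# Post hoc exclusion of treated "non-responders" below a hypothetical cutoff.
responders = [x for x in treated if x >= 0.5]
responders_only = mean_difference(responders, control)

print(f"all animals (n={len(treated)}): {all_animals:.2f}")
print(f"responders only (n={len(responders)}): {responders_only:.2f}")
```

With these invented numbers, dropping four of the eight treated animals doubles the estimated effect, from 0.65 to 1.30 mmol/L, without any change in the underlying biology.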
In many discovery-focused basic science laboratories, we seek to understand disease pathophysiology, often homing in on a small number of genes, proteins, and pathways that receive intensive scrutiny. The research is frequently exploratory and open-ended, with large numbers of variables and endpoints quantified simultaneously. It is common practice to perform dozens, if not hundreds, of experiments in cells, mice, and rats, yet often only a small subset of the data is reported and made available for scrutiny. Not surprisingly, the majority of published manuscripts contain the most promising and exciting results: those that "worked the best" or that yielded data and mechanisms consistent with the prevailing hypothesis. What happens to the dozens of experiments that "did not work"? The phrase is a euphemism meaning that the results did not turn out the way the scientists wanted and/or did not support the story being assembled in the laboratory. We could accept that our theories may be incorrect or unimportant, or that our favorite molecule may not produce robust and reproducible actions in multiple models. Instead, we frequently soldier on, trying different time points, concentrations, conditions, and animal models, until we get just the "right result," which ends up in a figure in our paper. The majority of negative, contradictory, or divergent results may never see the light of day. How would scientists (and journal reviewers and editors) react to a proposal mandating the accounting for and reporting of all results in all animal studies? Such a proposal might take the form of a formal standardized structured report (Figure 1), perhaps included as a mandatory online appendix accessible to the reviewer and, ultimately, the reader. This transparency would likely prove sobering and, for some, an unwelcome obligation. Yet it might frame the more promising results in a more realistic light and broader context.
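Trying new time points, concentrations, or models until an experiment "works" has a predictable statistical cost. Suppose each attempt is an independent test at α = 0.05 and the true effect is zero. Then the chance that at least one attempt reaches significance is 1 - (1 - α)^k. The Python sketch below computes that figure and checks it by simulation. It rests on several simplifying assumptions: the attempts are independent, the standard deviation is known to be 1, and a two-sided z-test is used. These assumptions are not a model of any particular study.

```python
import random
from statistics import NormalDist, mean

def chance_of_false_positive(alpha, k):
    """Probability that at least one of k independent null experiments gives p < alpha."""
    return 1 - (1 - alpha) ** k

def simulate_retrying(k, n=8, alpha=0.05, trials=2000, seed=1):
    """Monte Carlo estimate of the same probability.

    Each trial tries up to k experimental variations in which treatment truly has
    no effect, stopping at the first 'significant' result. Groups of n animals are
    drawn from N(0, 1) and compared with a two-sided z-test assuming SD = 1.
    """
    rng = random.Random(seed)
    z = NormalDist()
    se = (2 / n) ** 0.5  # standard error of a difference in means when SD = 1
    successes = 0
    for _ in range(trials):
        for _ in range(k):
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            p = 2 * (1 - z.cdf(abs(mean(a) - mean(b)) / se))
            if p < alpha:
                successes += 1
                break  # the "right result" has been found; stop trying
    return successes / trials

for k in (1, 5, 10):
    print(k, round(chance_of_false_positive(0.05, k), 3), simulate_retrying(k))
```

With ten attempts, the probability of at least one spurious "success" is already about 40%. This is one reason why pre-specified endpoints and full reporting of every attempt carry so much weight.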
There is a great deal to be said for exploratory discovery research, uncovering findings and mechanisms that account for some, but not all, of the pathophysiology in a particular disease model. However, our temptation to generalize and elevate the significance of our positive results likely contributes to future challenges that arise when other scientists, using slightly different methods, doses, conditions, and animals, cannot reproduce the results we publish.

How Reproducible Is Basic Science Research?

Many scientists might scoff with indignation if someone questions the reproducibility of their own research; after all, this "reproducibility issue" is usually someone else's problem. Nevertheless, some honest individuals have thoughtfully discussed the challenges inherent in reproducing observations within their own lab, let alone across laboratories (Woods and Begg, 2015). The available evidence from multiple fields clearly indicates that we have a systemic reproducibility problem in basic science research. Below, I review some of the contemporary issues that contribute to reproducibility challenges within preclinical metabolic research. Many of these problems are likely relevant well beyond the field of metabolism, and the issues raised may resonate with a broader community of basic scientists.

Cell Lines

Considerable debate has focused on the identification and reliability of cell lines and, while progress has been made in this area, the problem continues to fester. As a postdoctoral fellow in the mid-1980s, I was excited to have isolated, with colleagues, a new human glucagon-producing cell line. We were convinced this would be an invaluable reagent for the study of human glucagon biosynthesis and secretion, and we had assembled a good many figures for our envisioned paper. Like many things in life, what seemed too good to be true actually was: analysis of genomic DNA from my "human cells" revealed the presence of repeated DNA sequences from both human and rat DNA. It turned out that our "human glucagonoma" cell line was likely a mixture of HeLa cells and our new RIN1056A glucagon-producing cell line, and the party was over. Several years later, we also discovered mycoplasma contamination of our hamster glucagon-producing cell line. We then wasted several valuable months redoing key experiments after re-deriving "mycoplasma-free" InR1G9 hamster glucagonoma cells.
In hindsight, it was a valuable learning experience to identify, early on, the pitfalls of using incompletely characterized or infected cell lines for basic science studies.
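Human cell line identity, the problem in the anecdote above, is commonly checked by short tandem repeat (STR) profiling. In this method, a line's alleles at a panel of loci are compared against a reference profile. One widely used comparison is the Tanabe percent match, 2 × (shared alleles) / (alleles in query + alleles in reference) × 100. Matches of roughly 80% or more are usually taken to indicate a common origin. The sketch below implements that formula. The allele values are invented for illustration and are not the profile of any real cell line. The 80% cutoff should be checked against whichever authentication standard a laboratory follows.

```python
def tanabe_percent_match(query, reference):
    """Tanabe percent match between two STR profiles.

    Profiles map locus name -> alleles observed at that locus. Only loci typed in
    both profiles are compared; a homozygous locus contributes a single allele.
    """
    shared = query_total = reference_total = 0
    for locus in query.keys() & reference.keys():
        q, r = set(query[locus]), set(reference[locus])
        shared += len(q & r)
        query_total += len(q)
        reference_total += len(r)
    if query_total + reference_total == 0:
        raise ValueError("profiles share no typed loci")
    return 100 * 2 * shared / (query_total + reference_total)

# Invented STR profiles (allele repeat numbers), for illustration only.
reference = {"D5S818": (11, 12), "D13S317": (12, 13), "D7S820": (8, 12),
             "D16S539": (9, 10), "vWA": (16, 18), "TH01": (7,),
             "TPOX": (8, 12), "CSF1PO": (9, 10)}
same_line = dict(reference, vWA=(16, 17))  # identical except one allele at one locus
other_line = {"D5S818": (10, 13), "D13S317": (8, 11), "D7S820": (9, 10),
              "D16S539": (11, 12), "vWA": (15, 17), "TH01": (6, 9),
              "TPOX": (8, 11), "CSF1PO": (11, 12)}

for name, profile in [("same_line", same_line), ("other_line", other_line)]:
    score = tanabe_percent_match(profile, reference)
    verdict = "consistent with reference" if score >= 80 else "does not match reference"
    print(f"{name}: {score:.1f}% -> {verdict}")
```

Here `same_line` scores about 93% and `other_line` about 6%, so only the first would be accepted as the reference line.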

Surprisingly, however, considerable effort has been devoted to recognizing the problems associated with cell line identification and verification (Freedman et al., 2015), yet there is scant evidence that scientists have routinely adopted these guidelines to ensure the fidelity and rigor of their own cell line research. The origin and sourcing of cell lines are often inadequately described in publications, further hampering efforts to reproduce published data. Differences in cell line source, passage number, cell density, culture conditions, and experimental technique may logically account for some degree of variability. Even so, many publications provide inadequate details that preclude careful reproduction of cell line experiments. Indeed, the trend in many journals is toward minimizing the length of methods sections. Regular use of simple quality control techniques to verify cell line identity and detect contamination would greatly enhance the validity of cell line data. Yet many institutions, granting agencies, and journals do not insist on detailed cell line reporting (Freedman et al., 2015). Stem cell-derived cells are emerging as tools for the study of islet biology. Replicating such cells within and across laboratories will likely require very precise disclosure of the exact composition of cell culture media and all essential exogenous additives and growth factors. It will also require specification of the sex and age of the donor, as well as extensive molecular characterization and footprinting criteria.

Antibodies: Quicksand for the Non-curious
Equally vexatious is the ongoing crisis promulgated by the use of antibodies that have not been properly validated. Because of poor sensitivity and/or specificity, such antibodies generate irreproducible or incorrect data. This challenge extends to all fields of research that use antibodies, and every researcher has their own story of "problematic antibodies." In the incretin field, dozens of published papers have used commercial antibodies to detect the GLP-1 receptor. Our own laboratory experience, regrettably, is that most of these antibodies do not detect the GLP-1 receptor. The use of non-specific antibodies, together with studies employing small numbers of animals and inadequate controls for animal and histology experiments, has led to significant misconceptions in the incretin field. It has also fueled, in part, the imbroglio surrounding GLP-1R expression in normal and neoplastic pancreatic cells (Drucker, 2013).
Several publications and editorials have highlighted the major problems and inadequacies associated with the use of flawed GLP-1R antisera (Gore, 2013; Panjwani et al., 2013; Pyke et al., 2014; Pyke and Knudsen, 2013). One might have hoped that disseminating this information through publications and discussions at meetings would eliminate the problem. Indeed, many editorials continue to highlight the importance of extensive antibody validation. Sadly, our paper describing problems with the sensitivity and specificity of GLP-1R antisera appeared online in November 2012 (Panjwani et al., 2013). Yet I estimate that about every other week I still read another new publication reporting data obtained with suspect or incompletely characterized GLP-1R antisera. What does this say about the thoroughness and credibility of our community of reviewers, editors, and scientific colleagues? Not surprisingly, we also encounter substantial problems with the sensitivity and specificity of antisera widely used to detect the GLP-2 receptor (Drucker and Yusta, 2014). Our unpublished studies raise similar issues regarding the specificity of antibodies for the glucagon and GIP receptors.

The pervasive problem of inadequate antibodies has received widespread attention; however, both experienced and junior investigators still fail to routinely characterize the antibodies used in their laboratories. A survey of over 500 scientists revealed that fewer than half of junior investigators (<5 years out) reported validating antibodies. These data speak to our ongoing need as a research community to properly educate our trainees in the appropriate characterization, validation, and use of antibodies for research purposes (Freedman et al., 2016). How can the antibody problem be solved? Frank public discussion of problematic antibodies is increasing. Coupled with publicly available databases documenting inadequate or appropriate antibody validation (Baker, 2015), it represents an important step in energizing the community to pay more attention to antibody quality. CRISPR technology enables rapid generation of "knockout" cells, facilitating assessment of antibody specificity. Some scientists and a few antibody companies have joined discussions to more rigorously characterize antibody sensitivity and specificity (Bradbury and Plückthun, 2015). These discussions include calls for DNA sequence verification and the use of strictly recombinant antibodies, but progress has been slow. Nevertheless, there seems to be a clear commercial opportunity for a next-generation antibody company that develops a reputation for excellence in antibody characterization. No company presently operating in the space currently enjoys such a reputation.
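Knockout cells, mentioned above as a tool for assessing antibody specificity, turn validation into a simple comparison. A specific antibody should give little or no signal above background once its target gene is deleted. A minimal Python sketch of that check follows. The signal values are invented. The 10% residual-signal cutoff is a hypothetical threshold chosen for illustration, not an accepted standard.

```python
from statistics import mean

def knockout_check(wt_signal, ko_signal, background, max_residual=0.10):
    """Compare antibody signal in wild-type (WT) versus knockout (KO) cells.

    Residual fraction = (mean KO - background) / (mean WT - background).
    A specific antibody should lose nearly all signal in KO cells.
    Returns (residual_fraction, passes).
    """
    wt = mean(wt_signal) - background
    ko = mean(ko_signal) - background
    if wt <= 0:
        raise ValueError("no WT signal above background: a sensitivity problem, so specificity is untestable")
    residual = max(ko, 0) / wt
    return residual, residual <= max_residual

# Invented replicate intensities (arbitrary units) for two hypothetical antibodies.
wt = [1000, 1100, 900]
antibody_a_ko = [60, 40, 50]     # signal largely lost in knockout cells
antibody_b_ko = [800, 760, 840]  # signal persists, consistent with off-target binding

for name, ko in [("antibody A", antibody_a_ko), ("antibody B", antibody_b_ko)]:
    residual, ok = knockout_check(wt, ko, background=20)
    print(f"{name}: residual {residual:.0%} -> {'specific' if ok else 'fails specificity check'}")
```

An antibody like B would still produce convincing-looking staining in wild-type tissue, which is exactly why a knockout control is informative where a wild-type image alone is not.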

Animal Models for Metabolism Research

Many scientists study the pathophysiology of diabetes using animal models, with a view to developing novel therapeutic agents. In some instances, we use these animal models to interrogate mechanisms and identify key genes and proteins underlying islet dysfunction, disordered hepatic glucose production, or insulin resistance. Such insulin resistance may arise through disturbances in signaling within muscle, adipose tissue, brain, and other organs. Alternatively, we may already have developed promising new therapeutic agents for the treatment of diabetes or obesity and seek to test their efficacy in representative animal models. There is broad awareness that the majority of risk for human diabetes arises through small contributions from dozens of genes with modest effect sizes (Fuchsberger et al., 2016). Surprisingly, the diabetes research community nonetheless continues to rely heavily on predominantly monogenic models of disease. These include the Akita, ob/ob, and db/db mice and corresponding rat models such as the Zucker fatty and Zucker diabetic fatty rats.
The seminal metabolic importance of the leptin signaling pathway is beyond dispute. Nevertheless, the genes encoding leptin and the leptin receptor are not associated with the risk of developing type 2 diabetes (T2D) in human population studies. These rodent models recapitulate some, if not many, of the features encountered in human subjects with diabetes, obesity, and insulin resistance, yet they are likely suboptimal models with which to study potential therapies for T2D. Most of these animal models develop diabetes and obesity at accelerated rates, some with rapid development of β cell failure. This course is quite unlike the indolent, slowly progressive natural history of human T2D (Wang et al., 2014). Leptin administration is strikingly effective in ameliorating diabetes in leptin-sensitive mice and rats. The same cannot be said for the efficacy of leptin in human subjects with T2D or in insulin-treated subjects with type 1 diabetes (T1D) (Vasandani et al., 2016). Similar challenges surround the extensive use of the Ins2Akita mouse for studies of β cell failure and diabetic nephropathy independent of obesity and insulin resistance. The Ins2Akita mouse is an excellent model for analysis of endoplasmic reticulum stress and β cell failure. However, mutations within the human insulin gene do not make significant contributions to the genetic risk for T2D. Several months of high-fat feeding in diabetes-prone mice or rats is much more expensive and time consuming. Even so, it is more likely to recapitulate many of the features and natural history evident in human subjects with slowly progressive weight gain who ultimately develop T2D.
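The contrast drawn above, between human diabetes risk spread across many genes of modest effect and monogenic rodent models, can be made concrete with an additive polygenic score. Such a score sums each carried risk allele weighted by the log of its per-allele odds ratio. In the sketch below, the 60 variants and their uniform odds ratio of 1.10 are hypothetical values picked to represent "modest effect sizes." They are not estimates from any genetic study, and the additive model ignores gene-gene and gene-environment interactions.

```python
import math

# Hypothetical per-allele odds ratios for 60 common variants (invented, uniform for simplicity).
odds_ratios = {f"variant_{i}": 1.10 for i in range(60)}

def polygenic_log_odds(genotype, odds_ratios):
    """Additive score: sum over variants of (risk-allele count, 0-2) x log(odds ratio)."""
    return sum(count * math.log(odds_ratios[v]) for v, count in genotype.items())

# Two hypothetical individuals: B carries 10 more risk alleles than A.
person_a = {v: 1 for v in odds_ratios}
person_b = dict(person_a)
for v in list(odds_ratios)[:10]:
    person_b[v] = 2

score_gap = polygenic_log_odds(person_b, odds_ratios) - polygenic_log_odds(person_a, odds_ratios)
print(f"any single variant: odds ratio {odds_ratios['variant_0']:.2f}")
print(f"ten extra risk alleles combined: odds ratio {math.exp(score_gap):.2f}")
```

No single locus carries much of the risk (odds ratio 1.10); the effect emerges only in aggregate (about 2.59 for ten extra alleles). In a model such as ob/ob or db/db, by contrast, disruption of one gene drives the phenotype.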

Equally problematic may be the extensive reliance on male mice and rats, as hyperglycemia and diabetes develop less frequently in females of widely used strains. The NIH and other granting agencies have issued directives to ensure balanced use of both male and female animals and cell lines in preclinical studies (Clayton and Collins, 2014). Notwithstanding these directives, the impact and ultimate success of these initiatives remain unclear. Given the widespread epidemic of diabetes and obesity in women, how well do findings made predominantly in male mice and rats with diabetes predict efficacy in future translational studies in women? The use of non-human primates for diabetes research might yield data more easily reproduced in human clinical studies (Harwood et al., 2012). However, the limitations and expense inherent in these studies suggest that they are likely to be more valuable for confirmatory than for exploratory or discovery experiments.

How Reproducible Are Mouse Experiments?

Poor reproducibility of animal data may reflect several problems: challenges with experimental design, small numbers, inadequate statistical analyses, or failure to communicate sufficient information to allow careful repetition of the experiments (Kilkenny et al., 2009). Differences among animal facilities can greatly influence murine metabolic phenotypes. These include noise, bedding, diet, water supply, circadian rhythms, light/dark cycles, and gut microbial populations. Indeed, even within experienced consortia dedicated to the detailed and careful standardization of mouse phenotyping across sites, considerable variation in phenotyping may still exist within and between centers (Hrabě de Angelis et al., 2015). For someone who spends a great deal of time studying the function of the gastrointestinal tract, variability in phenotypic responses across mouse experiments is not a trivial issue. Our laboratory has repeatedly encountered scenarios, often involving studies of gut inflammation, in which we cannot precisely reproduce our own phenotypes or those reported by others, often in papers published in very good journals. Quantitative reproducibility is a recurring challenge for some of our experiments, including:

- the extent of local and systemic inflammation induced by varying doses of lipopolysaccharide;
- the timing and magnitude of inflammatory bowel disease development in the Il10−/− mouse;
- the analysis of intestinal permeability, barrier function, and systemic inflammation following gut injury or high-fat feeding;
- the evanescent nature of gut inflammation, which may be inconsistent from one study to another within our own laboratory.

We experienced these reproducibility challenges when we moved our laboratory across the street from the Toronto General Hospital to the Mount Sinai Hospital about 10 years ago.
After re-deriving mouse lines and reanalyzing several of our most striking gut phenotypes, we were stunned and disappointed. A few of our most exciting observations made in one mouse facility had simply failed to transfer and were no longer evident in the new animal facility across the street.
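Small group sizes compound the facility-to-facility variability described above, because any extra variance erodes statistical power. A common starting point is the normal-approximation sample size for comparing two group means: n per group = 2 × (z(1 - α/2) + z(1 - β))² × σ² / δ². Here δ is the smallest difference worth detecting and σ is the standard deviation. The sketch below implements this formula. The effect sizes, and the larger σ used to mimic added between-facility variance, are hypothetical. A calculation based on the t distribution will typically call for one or two more animals per group.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Animals per group for a two-sided, two-sample comparison of means
    (normal approximation): n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta) ** 2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Hypothetical scenarios, in arbitrary units of the outcome.
print(n_per_group(delta=1.0, sd=1.0))   # effect equal to one SD -> 16
print(n_per_group(delta=0.5, sd=1.0))   # effect half as large -> 63
print(n_per_group(delta=1.0, sd=1.25))  # same effect, SD inflated by facility variation -> 25
```

Halving the effect size roughly quadruples the required group size. A modest increase in variance, such as might follow a change of facility, pushes a study that was adequately powered with 16 animals per group to needing about 25.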

It is widely recognized that germ-free mice exhibit profound metabolic differences compared to genetically identical, yet conventionally raised, animals (Claus et al., 2008). Even so, the impact of subtle changes in gut microbial populations may not always be accounted for when interpreting phenotypes. Genetic strains, animal facility environments, varying microbial populations, and specific diets interact in complex ways, and these interactions may produce important differences in key metabolic phenotypes within the same mouse line (Ussar et al., 2015). Not all laboratories have the intellectual curiosity and resources required to untangle these relationships. Intestinal microbial dysbiosis is potentially relevant to the pathophysiology and treatment of murine diabetes, obesity, and insulin resistance. Nevertheless, it remains uncertain whether many exciting observations implying a causal role for dysbiosis can be reproducibly translated to humans with metabolic disorders.

Mice Are Not Always Good Models for Studying Disease Pathophysiology Relevant to Humans

If substantial challenges limit the universal reproducibility of many preclinical studies, even greater obstacles arise in reproducing and translating key findings from preclinical studies to humans. Much has been written about the poor reproducibility of preclinical studies in oncology (Begley and Ellis, 2012) and cardiovascular biology (Libby, 2015); however, the field of translational metabolism may not be so different. Inflammation is a powerful driver of murine metabolic phenotypes such as islet dysfunction, non-alcoholic fatty liver disease, atherosclerosis, and insulin resistance. The extent to which inflammation similarly drives the pathophysiology of these disorders in humans is more difficult to ascertain. One study compared gene expression profiles from multiple distinct mouse models of liver inflammation with RNA profiles from liver biopsies of patients with NASH. It revealed substantial inter-species differences and surprisingly little overlap in gene expression (Teufel et al., 2016). Similar species-specific differences have become evident in studies of anti-inflammatory interventions for the treatment of diabetes. These interventions often produce robust resolution of experimental insulin resistance and diabetes in many animal models (Donath, 2014). In contrast, targeting interleukin-1, tumor necrosis factor-α, or the NF-κB pathway in humans produced relatively modest, often clinically insignificant, improvements in insulin secretion and glucose control (Donath, 2016). Hence, robust and clinically meaningful translation of immune interventions for metabolic disorders from animals to humans remains challenging.
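How little two gene expression signatures overlap, as in the mouse-versus-human liver comparison described above, can be quantified in more than one way. The following minimal sketch assumes each model is summarized as a set of differentially expressed genes. It reports two measures:

- the Jaccard index (shared genes divided by all genes in either set);
- a hypergeometric p-value for whether the overlap exceeds what random sets of the same sizes would produce.

The gene identifiers, set sizes, and 10,000-gene background are invented. This is not the analysis used in the study cited in the text.

```python
from math import comb

def jaccard(a, b):
    """Shared elements divided by elements in either set (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b)

def overlap_p_value(k, n_a, n_b, universe):
    """Hypergeometric upper tail: P(overlap >= k) when a set of n_b genes is drawn at
    random from `universe` genes, n_a of which belong to the other signature."""
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, i) * comb(universe - n_a, n_b - i) for i in range(k, upper + 1))
    return hits / comb(universe, n_b)

# Invented signatures: 100 genes each, sharing 10, from a 10,000-gene background.
mouse_signature = {f"GENE_{i}" for i in range(0, 100)}
human_signature = {f"GENE_{i}" for i in range(90, 190)}
shared = len(mouse_signature & human_signature)

print(f"shared genes: {shared}")
print(f"Jaccard index: {jaccard(mouse_signature, human_signature):.3f}")
print(f"P(overlap >= {shared} by chance): "
      f"{overlap_p_value(shared, len(mouse_signature), len(human_signature), 10_000):.1e}")
```

The two numbers answer different questions. An overlap can be far larger than chance (a tiny p-value) while 90% of each signature remains unshared (a Jaccard index near 0.05). This is the sense in which cross-species profiles can show "surprisingly little overlap."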

Recent studies also highlight the limitations inherent in broadly extrapolating data from inbred mice, housed at particular temperatures, to other animals or species. Housing temperature has profound effects on metabolically sensitive tissues, not only on the extent of beige or brown adipose tissue activation (Speakman and Keijer, 2012); it may also modify the development of tissue and systemic inflammation and experimental atherosclerosis (Tian et al., 2016). Furthermore, the immune system of mice raised in pathogen-free barrier facilities differs notably from the innate and adaptive immune systems characterized in pet store mice (Beura et al., 2016), findings with clear implications for developing immune interventions that can be translated from the laboratory into the clinic. Large differences in metabolic rate, basal cardiovascular function, feeding behavior, hepatic lipid metabolism, and other species-specific physiological parameters may also contribute to difficulties in translating preclinical research findings across species.

Translational Challenges in Enteroendocrine Biology

The successful development of gut hormone-based therapies for the treatment of diabetes and obesity has re-energized the field of enteroendocrine cell (EEC) biology. However, the complexity of multiple EEC populations within the small and large bowel, coupled with important species-specific differences in the molecular characterization and function of EECs, raises caveats for translation and drug development (Drucker, 2016). EEC scientists have long been aware that some regulators of GLP-1 secretion in rodents, such as GIP, gastrin-releasing peptide, leptin, insulin, artificial sweeteners, and various neurotransmitters, do not appear to robustly stimulate GLP-1 secretion in humans (Lim and Brubaker, 2006; Pais et al., 2016).

A number of molecules targeting gut hormone secretion have worked beautifully in rodents but not so well in humans. At least five companies tested the clinical efficacy of chemically distinct GPR119 agonists in subjects with T2D, following promising activity (stimulation of incretin and insulin secretion) of these same agonists in diabetic mice and rats. The clinical results in normal and diabetic human subjects were uniformly disappointing, and no GPR119 agonist progressed beyond phase 2 clinical trials in subjects with T2D (Ritter et al., 2016). Although the lead GPR40 agonist, TAK-875, was discontinued primarily because of hepatic toxicity, preclinical studies of GPR40 agonists demonstrated robust stimulation of incretin and insulin secretion; however, not all of these findings were reproduced in human studies (Mancini and Poitout, 2015). The importance of species-specific differences in receptor signaling, the differential effects of full versus biased versus allosteric agonism, and the differential actions of GPR40 agonists on rodent versus human enteroendocrine cells requires more extensive evaluation (Mancini and Poitout, 2015).

Most academic investigators do not have the resources required to assess the efficacy of promising therapeutics simultaneously in both preclinical and clinical studies. Hodge and colleagues developed a therapeutic mixture of four natural compounds targeting gut-related mechanisms linked to weight loss and glucoregulation. Each compound exhibited favorable pharmacological activity in preclinical studies, including stimulation of EEC activity and GLP-1 secretion and activation of fatty acid receptors (Hodge et al., 2016). GSK457 combined these four compounds (oligofructosaccharide, apple pectin, blackcurrant extract, and oleic acid) in a 5:5:2:3 ratio. Administered alone or in combination with a GLP-1R agonist, GSK457 produced robust weight loss, reductions in glycemia, and amelioration of hepatic steatosis in high-fat diet-fed or db/db mice. The same investigators then assessed the efficacy of GSK457 for 6 weeks in three different groups of human subjects with or without T2D, on top of baseline metformin or liraglutide therapy. Disappointingly, no meaningful improvement in body weight or glycemic parameters was detected in subjects treated with GSK457 (Hodge et al., 2016), highlighting further challenges in translating robust results from preclinical studies into the clinic.

Young versus Older Animals and Likelihood of Translation

Even after avoiding many of the pitfalls enumerated above, the most carefully done science in rodents may fail to translate to other species, including humans, because of inherent physiological differences between small and large animals and between rodents and humans. For obvious reasons of availability, cost, and efficiency (time to complete an experiment), young mice, often 2–6 months of age, are widely used in preclinical research. Although T2D has sadly become more prevalent in our children, most human subjects with T2D are much older, often in the fifth through ninth decades of life. Consequently, many have experienced years of low-grade tissue inflammation and fibrosis, dyslipidemia, weight gain, and hypertension, associated with a gradual progression from impaired glucose tolerance to frank dysglycemia and T2D. The suitability of young mice, often of a single strain (C57BL/6J), for assessing the translational potential of new therapeutic mechanisms is therefore questionable. Younger animals generally exhibit greater capacity for organ repair, cellular plasticity, and cell proliferation than older animals. Indeed, the neuroscience literature is replete with reports of age-associated reductions in cognition and synaptic plasticity, and it is generally easier to prevent disease in young mice (T1D in NOD mice, atherosclerosis in apoE−/− mice) than to reverse established disease in much older animals. Nevertheless, funding agencies and journals have no consistent expectation that key novel and compelling results, frequently touted as having exciting translational potential, be examined critically not only in younger mice but also in older animals with established disease.

Journals, Editors, Public Relations Staff, and the Media

In 2009, several years after the clinical introduction of incretin-based therapies, I was startled to read a press release from a prestigious medical center recommending restrictions on the clinical use of incretin-based therapies, based on findings in a very small number of rats in a preclinical study. The controversy surrounding the expression and biological activity of the GLP-1R in normal and neoplastic rodent and human pancreatic tissue continued for several years and was associated with inappropriate statistical analysis, questionable interpretation of data from adverse event reporting system databases, inadequately controlled animal studies, and technically flawed analysis of a small number of human histology samples. Nevertheless, many of these reports, with their inappropriate clinical conclusions, were published in respected journals and widely disseminated in the media, with feature stories in the New York Times and other leading media outlets. Collectively, a series of peer-reviewed papers contained a substantial amount of alarmist misinformation, coupled with recommendations, often trumpeted by accompanying editorials, that incretin-based therapy should be curtailed or withdrawn. These publications and their conclusions supported the filing of lawsuits and led to much patient anxiety, ultimately requiring the attention of regulatory authorities (Egan et al., 2014). After considerable time and financial resources had been expended on independent attempts to reproduce many of the key findings, regulatory agencies concluded that many of the original reports alleging serious safety issues were suboptimal and that key conclusions could not be independently reproduced (Bonner-Weir et al., 2014; Drucker, 2013; Egan et al., 2014). Long-term outcome studies in human subjects have subsequently validated the safety and therapeutic benefit of several incretin-based therapies (Drucker, 2016).

The rising tide of angst and discussion about the fidelity and reproducibility of preclinical research is likely to foster greater awareness of methodological pitfalls and challenges in experimental design and execution. A growing chorus of scientific societies, independent funding agencies and journals, and independent investigators has taken up the rallying cry, and thoughtful recommendations and consensus statements continue to accumulate. These actions alone are unlikely to make a major dent in the status quo, which continues to be problematic (Macleod et al., 2015). Behavioral economics teaches us that a mixture of incentives and penalties stands the best chance of producing meaningful change in the way we do research. Although scientists must take ultimate responsibility for their standards, ethics, training environments, validation of reagents, experimental results, and the reproducibility of their published data, the complex web of individuals contributing to the current reproducibility crisis is worth mentioning. Journal editors attend many meetings, socialize and network with hundreds of scientists, and compete for the most exciting papers from the best labs. Editors in turn are aware that the most tantalizing papers will garner the greatest media visibility and will ultimately, if indirectly, attract more page views and advertising revenue and boost the journal's reputation, impact factor, and profitability. Corresponding authors will not infrequently receive a small editorial nudge if they have not framed the biomedical and translational importance of their basic science with sufficient positivity, panache, and verve. The same scientist will, at many institutions, receive regular email requests, periodic visits, and exhortations from public relations or media communications officers, anxious for exciting new stories that highlight the wonderful work being done within the institution.
There are monthly newsletters to fill, websites to update, fundraising pitches and portfolios to embellish, and local media contacts who always need a new story. The media has an extraordinary appetite for scientific and medical information, especially stories with a hint of therapeutic relevance. The media beast is insatiable, although even my mother has now learned that "medical breakthrough" stories on television and radio, in print, or on the internet and social media are almost always exaggerated and often frankly incorrect.

How did we arrive at this state of affairs? Although it is admittedly more difficult to become an elite fighter pilot, astronaut, or head of state, competition for faculty positions and resources in the best academic institutions is fierce, and the most valuable currency continues to be publications in "the best journals," ideally coupled with already secured independent funding. To obtain these prestigious publications, one must meet the standards and expectations of journal editors, who prize research that is spectacular, highly novel, and ideally accompanied by well-defined reductionist mechanisms and immediate, obvious translational relevance. Given these pressures, it is perhaps not surprising that most research in my own area of gut hormone action has not been published in the top journals. Perhaps this reflects a degree of mediocrity and lack of scientific talent and imagination (mea culpa) among most individuals who have pursued the biology of regulatory peptides. Alternatively, it may reflect the field's historical tradition of careful, incremental physiology, studying how peptides that circulate at very low levels engage intricate communication mechanisms, in part through neuronal pathways, that may be challenging to tease out and simplify. Nevertheless, my plodding colleagues and I have witnessed the development, approval, and clinical use of several new drug classes for diabetes, obesity, and gastrointestinal disease, working in a field where 99% of the papers are published in "mid-tier," yet respectable, physiology, biochemistry, endocrinology, and gastroenterology journals. Hence, despite a paucity of high-impact papers in the best journals, it seems clear that careful, incremental, solid science, although rarely flashy, may, brick by brick, help build a field that is reproducible within and across many species, ultimately enabling successful drug development programs (Drucker, 2015).

Moving beyond the Status Quo toward Highly Reproducible Research

Figure 2. Issues Contributing to Suboptimal Reproducibility of Preclinical Research Are Highlighted. Strategies to enhance research reproducibility are outlined.

Although sunlight may be the best disinfectant for many problems, simply highlighting existing challenges (Figure 2) anecdotally, or publishing guidelines, commentaries, or position papers, though perhaps helpful, seems unlikely to move the needle in a meaningful way. Below, I provide some simple suggestions, including recommendations made by others, that may be useful for enhancing research reproducibility (Figure 2).

Transparency in Communication

A very simple strategy for making research easier to reproduce is to provide sufficient experimental detail, including a careful description and the source of the reagents, cell lines, and animals used in each experiment. While this sounds obvious, assessments of the extent to which publications routinely provide sufficient information to facilitate reproducibility reveal major gaps in our collective communication. Vasilevsky and colleagues surveyed 238 journal articles across five biomedical research disciplines and found that 54% of the research resources used to carry out experiments were not adequately described or identified (Vasilevsky et al., 2013). Simply strengthening journal requirements for detailed identification and reporting of research reagents would increase the likelihood that future scientists attempting to reproduce key findings use the same reagents, conditions, and animals. Although some journals cite space limitations that preclude more extensive information, the widespread use of online supplemental information should facilitate, not hinder, precise research communication.

Feature Discussions of Research Reproducibility at Scientific Meetings

We all attend scientific meetings to hear and present the latest science, network, and learn new techniques and best practices from colleagues. Although sessions are often devoted to challenges and pitfalls inherent to specific research areas, it is rare to see sessions dedicated to reproducibility issues in research. Imagine if most scientific meetings held a regular panel discussion, with representation from scientists, funding agencies, and journal editors, devoted to recent experiences with fostering research reproducibility. Common problems could be highlighted and solutions proposed. Since new methodological issues and techniques surface constantly, annual "reproducibility" symposia need not become stale or repetitive. Simply ingraining the habit of regularly discussing reproducibility problems and identifying ways to improve our research practices would make us more accountable to each other within our own scientific ecosystems.

Design and Reporting of Animal Experiments

Extensive recommendations for the design and reporting of animal experiments have been developed in the form of the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines. These recommendations are comprehensive and span study design, housing, timing of experiments, detailed description and reporting of the animals used, animal husbandry, sample size calculations, randomization, analysis and reporting of outcomes (including all positive and negative results), adverse events, and the use of appropriate statistical methods (Kilkenny et al., 2010). Despite widespread endorsement of the guidelines by funding agencies and multiple journals, their impact to date on the reporting of animal experiments has been modest (Baker et al., 2014). Although some journals have developed checklists of required information that must accompany each article, most of these checklists fall short of the reporting requirements recommended for preclinical studies. It is noteworthy that in the clinical trial domain, most journals, review boards, and funding agencies now enforce mandatory registration of clinical trials, with obligatory reporting requirements, on ClinicalTrials.gov. This allows assessment of the design, pre-specified outcomes, and, ideally, results of a clinical trial, with results sometimes available in advance of a peer-reviewed publication. In contrast, the details of ongoing animal experimentation within laboratories and institutions are opaque and inaccessible.
It is not uncommon for scientists testing the efficacy of a new therapeutic agent to try numerous doses, concentrations, and time points, different modes of administration, multiple animal models, and mice of different ages and health status, before finally landing on the experiment that "works the best"; this one experiment then makes it into the paper as a key figure. None of the dozens of experiments that did not work are ever divulged. It is expected and understandable that each envisioned experimental result (often a therapeutic response) will require exploration and refinement of the pharmacokinetic and pharmacodynamic relationships unique to each reagent. Nevertheless, it also seems likely that in some instances the experiment does not work most of the time, the hypothesis is generally not sustained, and the scientist simply keeps searching for just the right conditions that will provide the desired results, however narrowly applicable they might be. Reporting of negative results (conditions and models tried in which an expected result was not obtained) would likely be extremely valuable to the research community, yet at present there is no uniformly accepted or promoted mechanism enabling or requiring it. These unpublished experiments are not restricted to the academic sector, as the pharmaceutical industry also undertakes a great deal of preclinical research that may never be publicly disclosed. Scientists would likely be disinclined to volunteer such information unless mandated to do so by funding agencies, institutional guidelines, animal care committees, or journals. After all, who wants to disclose that one's exciting new therapeutic agent or mechanism worked in only 10% of the experiments in which it was tested?
Nevertheless, for scientists, journals, and funding agencies serious about reproducibility, a mechanism for reporting all results, including negative results, would go a long way toward enhancing reproducibility and would likely save colleagues a great deal of time and money. The increasing recognition that questioning existing results or reporting negative results has great value has fostered new venues for discussing reproducibility, such as PubPeer (https://pubpeer.com/) and the Preclinical Reproducibility and Robustness publication channel developed by F1000Research (http://f1000research.com/channels/PRR). While these online forums are in their infancy, have growing pains, and may foster problematic discussions and accusations, the quality of the comments and the motives of participants should improve over time. In the future, journals seem likely to develop their own portals, linked to individual papers, to facilitate the updating of published results and the reporting and tracking of efforts to reproduce key published findings.
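
The cost of leaving the experiments that did not work unreported can be made concrete with a short calculation. The sketch below assumes, purely for illustration, k independent experiments of an agent that has no real effect, each judged "positive" at a conventional significance threshold of α = 0.05; these values are assumptions, not data from any study discussed here. The chance that at least one experiment looks positive is then 1 − (1 − α)^k, which the code computes directly and checks with a seeded simulation (under the null hypothesis, p-values are uniformly distributed).

```python
import random


def prob_any_false_positive(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent experiments of an
    ineffective agent reaches significance at threshold alpha."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_any_false_positive(k: int, alpha: float = 0.05,
                                trials: int = 20_000, seed: int = 0) -> float:
    """Seeded Monte Carlo check of the formula. Under the null hypothesis a
    p-value is uniform on [0, 1), so an experiment 'works' when p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One campaign of k attempts; count it if any single attempt 'works'.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"k={k:2d}  formula={prob_any_false_positive(k):.3f}  "
              f"simulated={simulate_any_false_positive(k):.3f}")
```

Under these assumptions, 20 attempts give roughly a 64% chance that at least one looks like it "works," which is why the unreported attempts carry information that the single published experiment does not.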

Tracking Reproducibility: The R Index

The McNamara, or quantitative, fallacy rests on the belief that decisions should be based only on quantitative data, because what cannot be measured may not exist. Similarly, Michael Bloomberg ran his business and New York City in part by reminding colleagues, "In God we trust. Everyone else bring data."

A combination of transparency and accountability for one's track record is an accepted part of how we judge scientists; however, measurement of reproducibility is missing from most metrics used to evaluate scientists. Although all scientists are data driven, we have no accepted way to track and quantify our own record of scientific reproducibility. This knowledge gap extends to funding agencies and journals, which collectively have little understanding of whether the science they fund and publish, respectively, turns out to be reproducible. At present, reproducibility is the subject of informal water cooler or bar room discussions, editorial musings, and consensus conferences (McNutt, 2014), but there has been little attempt to construct and validate an index for research reproducibility. Simple steps could be taken by granting agencies, which might ask for a half-page description of an applicant's major reproducible research findings. This requirement would not apply to junior scientists and might take effect after 10 years of independent productivity.

Rather than only quantifying citations, papers in top journals, and research funding, why not quantify reproducibility? Imagine if each scientist were associated with an RS (Reproducibility Scientist) index, reflecting the number of that scientist's papers whose key findings had been reproduced by at least one other independent research group. Of course, one would have to carefully debate and refine the meaning of "reproducible" (Goodman et al., 2016), but perhaps one could start simply by requiring that (a) the key findings and (b) at least 50% of the experimental data from a single paper were independently reproduced by at least one other research group. So a senior scientist with an RS index of 40 would have published 40 research papers whose findings were independently reproduced by others. Adjudication of the reproducibility of each paper would have to be carefully tracked over time, ideally by an independent body, which would require funding to sustain its activities. This type of undertaking presents major feasibility challenges, but also opportunities for independent organizations dedicated to the reproducibility of published research.

The reproducibility index should not be restricted to scientists. Each journal should also have an associated RJ (Reproducibility Journal) index, similarly reflecting the number of papers it publishes that are ultimately found to be reproducible. The R indices could also be divided by the total number of publications (per scientist or journal) to yield R%S and R%J indices, reflecting the proportion of total papers and output ultimately found to be reproducible. Although it would take some time to generate these metrics and establish their validity and utility, what scientist or journal would aspire to a low reproducibility index? Hiring, promotion, funding, and award deliberations would ideally incorporate assessment of reproducibility as one additional factor in ranking candidates. Who would want to fund, hire, or reward a scientist with a low reproducibility index? A potential advantage of this index is that it does not matter whether one regularly publishes only in high-impact journals or simply does careful, meaningful science published in subspecialty or mid-tier general science journals. Although no metric is likely to be without flaws or critics, pursuing high-quality reproducible science without formally measuring reproducibility is unlikely to succeed. As Begley and Ioannidis have noted, "We get what we incentivize" (Begley and Ioannidis, 2015), and if we fail to measure and incentivize careful, reproducible science, we are unlikely to change the landscape of our current problematic scientific enterprise. The importance and attractiveness of reproducibility research could be enhanced by having funding agencies allocate a dedicated proportion of new funding to grants directed at research reproducibility, focused on the most potentially transformative findings within a field.
Alternatively, companies in the private sector might pool resources to establish small independent reproducibility research laboratories, tasked with reproducing key findings within a select number of scientific disciplines, with results profiled annually at conferences and in journals. As the editorial leadership of this journal has noted, ensuring reproducibility and experimental replication is the responsibility of the entire scientific community (Emambokus and Granger, 2015), not just someone else's problem.
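
The R indices proposed above reduce to simple counting once each paper has been adjudicated. The following is a hypothetical sketch, not an existing tool: the `Paper` record, its fields, and the function names are illustrative assumptions, and `is_reproduced` merely encodes the provisional criterion suggested in the text (key findings plus at least 50% of the experimental data reproduced by at least one independent group). In practice, the replication records would come from the independent adjudicating body described above, and the toy records below are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record of one published paper (illustrative fields only)."""
    authors: list[str]
    journal: str
    # One entry per independent replication attempt:
    # (key_findings_reproduced, fraction_of_experimental_data_reproduced)
    replications: list[tuple[bool, float]] = field(default_factory=list)


def is_reproduced(paper: Paper, min_fraction: float = 0.5) -> bool:
    """Provisional criterion from the text: at least one independent group
    reproduced (a) the key findings and (b) >= 50% of the experimental data."""
    return any(key_ok and fraction >= min_fraction
               for key_ok, fraction in paper.replications)


def r_index(papers: list[Paper]) -> int:
    """Number of papers judged reproducible."""
    return sum(1 for p in papers if is_reproduced(p))


def r_percent(papers: list[Paper]) -> float:
    """Reproducible papers as a percentage of all papers (0.0 if none)."""
    return 100.0 * r_index(papers) / len(papers) if papers else 0.0


def scientist_indices(papers: list[Paper], name: str) -> tuple[int, float]:
    """(RS, R%S) for one scientist, over papers listing that scientist as an author."""
    own = [p for p in papers if name in p.authors]
    return r_index(own), r_percent(own)


def journal_indices(papers: list[Paper], journal: str) -> tuple[int, float]:
    """(RJ, R%J) for one journal, over the papers it published."""
    own = [p for p in papers if p.journal == journal]
    return r_index(own), r_percent(own)


# Toy, made-up records purely to exercise the functions.
EXAMPLE_PAPERS = [
    Paper(["Author A"], "Journal X", [(True, 0.8)]),                 # reproduced
    Paper(["Author A", "Author B"], "Journal Y", [(True, 0.3)]),    # < 50% of data
    Paper(["Author A"], "Journal X", [(False, 0.9), (True, 0.6)]),  # 2nd attempt succeeds
    Paper(["Author B"], "Journal X", []),                           # never tested
]

if __name__ == "__main__":
    print(scientist_indices(EXAMPLE_PAPERS, "Author A"))  # RS = 2 of 3 papers
    print(journal_indices(EXAMPLE_PAPERS, "Journal X"))   # RJ = 2 of 3 papers
```

In this sketch, papers that no one has yet attempted to reproduce count in the denominator of R%S and R%J but not in the numerator; whether untested papers should instead be excluded is the kind of definitional choice that would itself need debate.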