Applying our survey tool to 360 randomly sampled hydrology articles published in 2017 shows that progressively fewer articles satisfy each of the increasingly strict reproducibility requirements, from artifact availability through full reproduction of the published results (Fig. 2). For example, 70.3% of the 360 sampled articles stated that some materials were available, but we could access only 48.6% of those materials online (Fig. 3). Only 5.6% of sampled articles made data, model/code, and directions publicly available, and just 1.1% both made artifacts available and were fully reproduced. We partially reproduced an additional 0.6% of articles.
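
To make the nested percentages concrete, the short sketch below converts them back into approximate article counts. The rounded figures are our reconstruction from the reported rates, not the survey's raw tallies.

```python
# Illustrative arithmetic only: converting the reported percentages of the
# 360-article sample back into (rounded) article counts.
n_sampled = 360

stated_available  = round(0.703 * n_sampled)          # ~253 articles claimed some materials were available
accessible_online = round(0.486 * stated_available)   # ~123 of those were actually accessible online
fully_documented  = round(0.056 * n_sampled)          # 20 articles with data, model/code, and directions
fully_reproduced  = round(0.011 * n_sampled)          # 4 articles fully reproduced
partly_reproduced = round(0.006 * n_sampled)          # 2 articles partially reproduced

print(stated_available, accessible_online, fully_documented,
      fully_reproduced, partly_reproduced)  # 253 123 20 4 2
```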

Figure 2: Number of papers progressing through the survey questions to determine availability and reproducibility requirements.

Figure 3: Data, model, and code availability by journal (summary of Q4 and Q5).

Artifact Availability

Across all sampled publications, the most common primary artifact provided was input data, followed by code/model/software and then directions to run them (Fig. 4). All three primary artifacts are needed to reproduce modeled results. Secondary artifacts, such as hardware/software requirements, common file formats, unique and persistent digital identifiers, and metadata, were made available at much lower rates than the primary artifacts. Articles published in Environmental Modelling & Software (EM&S) provided data/model/code, directions, hardware/software requirements, common file formats, and metadata at rates at least twice those of the other journals.

Figure 4: Availability of artifacts organized by journal. All percentages are based on the total number of sampled papers for each journal. Refer to Fig. 3 or the text for full journal titles.

Sampled articles use different methods to make artifacts available, and these methods differ markedly across journals (Fig. 4). A majority (61.9%) of sampled EM&S articles made at least some artifacts available online. By contrast, Hydrology and Earth System Sciences (HESS) and Water Resources Research (WRR) had high percentages of articles whose materials were available only upon request to the first author (38.5–40.2%). Meanwhile, the Journal of Hydrology (JoH), the Journal of the American Water Resources Association (JAWRA), and the Journal of Water Resources Planning and Management (JWRPM) had large proportions of articles where data were available within the article or as supplemental material. These three journals also had high proportions of sampled papers in which research artifacts were not available at all.

Reproducibility of Results

Twenty sampled articles (5.6% of the total) made the required input data, software/model/code, and directions available, allowing an attempt to reproduce published results. We fully reproduced results for four articles35–38 and partially reproduced results for two more39,40. We were unable to reproduce results for four articles41–44 that nonetheless appeared to provide the necessary materials. During the reproduction attempts, we found that 10 of the 20 articles did not in fact have all the required artifacts, despite initially passing our availability screening. The survey permitted multiple reasons per article for the partial and failed reproductions: unclear directions (4 articles), failure to generate results (3 articles), hardware/software errors (2 articles), and results that differed from the publication (1 article). A common issue in cases where we could not generate results was that folder and file locations were hard-coded to work only on the author's computer. Where such issues were obvious, we tried, with limited success, to fix them. Other articles pointed to general data gateways, like the USGS streamgauge network, with no further details, or required expensive proprietary software. Of the 10 articles that had all artifacts available, five were published in EM&S, two each in HESS and WRR, and one in JWRPM.
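
To illustrate the hard-coded path problem, the hypothetical snippet below contrasts a path that resolves only on the original author's machine with one resolved relative to the script itself; the folder and file names are invented for the example.

```python
from pathlib import Path

# Brittle: resolves only on the original author's machine.
# data_file = Path("C:/Users/author/project/inputs/streamflow.csv")

# Portable: locate inputs relative to the script, so the archived code
# runs wherever it is unpacked. "inputs/streamflow.csv" is hypothetical.
data_file = Path(__file__).resolve().parent / "inputs" / "streamflow.csv"
with data_file.open() as f:
    streamflow = f.read()
```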

Estimated Reproducibility for Population

Because the stratified sampling method oversampled articles with certain reproducibility keywords, we used bootstrap resampling (see Methods) to estimate that 0.6 to 6.8% of all 1,989 articles published in 2017 in the six journals studied here would be reproducible (95% confidence interval). We estimated that 28.0% (95% confidence interval: 23.1–32.6%) of all articles published in these journals during 2017 provided at least some of the artifacts necessary for reproducibility (Fig. 5, black). EM&S differed from the other journals in having a large proportion of articles with some or all data available (31.8–64.0%) and relatively high estimates of reproducibility (Fig. 5).
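
As a sketch of how such a population estimate can be formed, the code below resamples each sampling stratum with replacement and weights the stratum means by population size. The strata, counts, and outcomes here are illustrative placeholders, not the study's data (though the placeholder populations sum to 1,989 and the placeholder samples to 360 for consistency with the text).

```python
import numpy as np

rng = np.random.default_rng(2017)

# Placeholder strata: population size N and binary sample outcomes
# (1 = at least partially reproducible, 0 = not). The real survey
# stratified on reproducibility keywords; these counts are invented.
strata = [
    {"N": 150,  "outcomes": np.array([1] * 5 + [0] * 115)},   # oversampled keyword stratum
    {"N": 1839, "outcomes": np.array([1] * 1 + [0] * 239)},   # remaining articles
]
total_N = sum(s["N"] for s in strata)

def one_bootstrap_estimate():
    """Resample each stratum with replacement; weight by population share."""
    est = 0.0
    for s in strata:
        resample = rng.choice(s["outcomes"], size=s["outcomes"].size, replace=True)
        est += (s["N"] / total_N) * resample.mean()
    return est

boot = np.array([one_bootstrap_estimate() for _ in range(10_000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"population reproducibility: {np.median(boot):.1%} (95% CI {lo:.1%}-{hi:.1%})")
```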

Figure 5: Population estimate of reproducibility for all papers published in 2017. Results are sorted by journal, with "Total" representing all six journals. The median estimate is represented by a point; vertical bars show the 95% confidence interval. Refer to Fig. 3 or the text for full journal titles.

Using Keywords to Identify Reproducible Articles

We found that five of the six articles with partial or complete reproducibility included reproducibility-related keywords in their abstracts (full list in Methods). The positive hit rate for articles with reproducibility keywords (4.2%) is significantly greater than for articles without them (0.4%; two-sample chi-squared test with Yates's continuity correction, p = 0.014). These findings confirm the value of reproducibility keywords for identifying reproducible articles and reaffirm how difficult it is to know at the outset whether the results presented in an article are reproducible.
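
The test itself is straightforward to run; the sketch below uses scipy's chi2_contingency, which applies Yates's correction to 2 × 2 tables when correction=True. The cell counts are approximate reconstructions from the reported rates (5 of roughly 119 keyword articles versus 1 of roughly 241 others), so the resulting p-value will differ somewhat from the published 0.014.

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 contingency table: rows = abstract keywords present / absent,
# columns = at least partially reproducible / not. Cell counts are
# approximate reconstructions, not the study's exact tallies.
table = np.array([
    [5, 114],   # keyword articles: 5/119 ~ 4.2% reproducible
    [1, 240],   # other articles:   1/241 ~ 0.4% reproducible
])

chi2, p, dof, expected = chi2_contingency(table, correction=True)  # Yates's correction
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
```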

Time Required to Determine Availability and Reproducibility

We recorded and analyzed the time required to complete the survey to show the incremental effort needed to determine the availability of article artifacts and the reproducibility of results (Fig. 6). For example, determining the availability of input data, model/software/code, and directions for a single publication took as little as 5 to 14 min (25–75% range). Reproducing results for a single paper required 25 to 86 min (25–75% range), with an upper outlier of 200 min. There were no statistically detectable differences between journals in the time required to determine the availability of digital artifacts or to reproduce results.
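
For reference, the 25–75% ranges quoted above are simple percentiles of the self-reported times; a minimal sketch with invented timing data:

```python
import numpy as np

# Invented self-reported times (minutes); the actual data underlie Fig. 6.
availability_times = np.array([5, 6, 7, 9, 10, 11, 12, 13, 14, 18])
reproduction_times = np.array([25, 30, 42, 55, 70, 86, 95, 200])  # 200 = outlier

for label, times in (("availability", availability_times),
                     ("reproduction", reproduction_times)):
    q25, q75 = np.percentile(times, [25, 75])
    print(f"{label}: 25-75% range = {q25:.0f}-{q75:.0f} min")
```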

Figure 6: Self-reported time to complete the survey, organized by the survey's ending question. Each reviewed paper is shown by a dot, while the mean is represented by a red diamond. Distribution density is shown by width.

Reproducibility and Journal Policies

Among the six hydrology and water resources journals we studied, the HESS and WRR policies in effect during the 2017 review period required articles to state how data, models, and code can be accessed. In contrast, the 2017 policies of EM&S, JWRPM, JoH, and JAWRA merely encouraged this practice. EM&S further recommends that articles include an explicit "Software and/or data availability" section and requires authors to make software essential to the paper available to reviewers during the review process (Supplemental Material). HESS includes an assets tab in each publication, populated from the article's Code and Data Availability sections. EM&S, WRR, and JoH are all signatories of the Transparency and Openness Promotion (TOP) policy framework45, while HESS participates in the FAIR (Findable, Accessible, Interoperable, and Reusable) data project46.

Stronger journal data availability policies and open data commitments tend to coincide with higher rates of artifact availability and result reproducibility, although there is considerable variation among these journals, likely due to differences in implementation or other factors. For example, EM&S, which only encourages authors to make artifacts available, had the highest rate of articles that did so (Fig. 3), and this high rate persisted across nearly every artifact category (Fig. 4). Although EM&S used "should" rather than "must" statements, its policy was by far the most specific for papers with a software component (Supplemental Material), which may explain the high participation rate. EM&S is also a software-focused journal, which may attract papers and authors more attuned to reproducible software. In contrast, HESS and WRR, which require data availability statements, had lower percentages of articles that made artifacts available and more papers that direct readers to the authors or to third parties for data, models, or code (Fig. 3). These directional statements tend to appear in the Data Availability section of HESS articles and in the Acknowledgements of WRR articles. The final group, JoH, JAWRA, and JWRPM, which also merely encouraged authors to make artifacts available, had high proportions of articles without available digital artifacts (Fig. 3). The HESS and WRR requirement for data availability statements appears to encourage authors to select low-effort options, such as directing readers to contact the author, rather than working to provide a research article and supporting materials that are available, reproducible, and replicable. In July 2018, JWRPM began requiring authors to state the availability of data, models, and code, similar to HESS and WRR47.