UW anthropologist: Why researchers should share computer code

Kim Eckart UW News

For years, scientists have discussed whether and how to share data from painstaking research and costly experiments. Some are further along in their efforts toward “open science” than others: Fields such as astronomy and oceanography, for example, involve such expensive and large-scale equipment and logistical challenges to data collection that collaboration among institutions has become the norm.

Meanwhile, a variety of academic journals, including several in the Nature Research family, are turning their attention to another aspect of the research process: computer programming code. Code is becoming increasingly important in research because scientists are often writing their own computer programs to interpret their data, rather than using commercial software packages. Some journals now include scientific data and code as part of the peer-review process.

And now, with the May 25 online publication of a commentary by Ben Marwick, University of Washington associate professor of anthropology, and 13 other colleagues at universities across the United States and Europe, there are conventions and tools that researchers can use to make code sharing easier and more efficient. The team’s paper advocating the sharing of code appears in Nature Neuroscience, while the journal, in an editorial, announces a pilot project asking future authors to make their code available for review.

Making the programs behind the research accessible allows other scientists to test the code and reproduce the computations in an experiment — in other words, to reproduce results and solidify findings. It’s the “how the sausage is made” part of research, Marwick said. It also allows the code to be used by other researchers in new studies, making it easier for scientists to build on the work of their colleagues.

“What we’re missing is the convention of sharing code or the tools for turning data into useful discoveries or information,” Marwick said. “Researchers say it’s great to have the data available in a paper — increasingly raw data are available in supplementary files or specialized online repositories — but the code for performing the clever analyses in between the raw data and the published figures and tables are still inaccessible.”

Other Nature Research journals, such as Nature Methods and Nature Biotechnology, provide for code review as part of the article evaluation process. Since 2014, the company has encouraged writers to make their code available upon request.

The Nature Neuroscience pilot focuses on three elements: whether the code supporting an author’s main claims is publicly accessible; whether the code functions without mistakes; and whether it produces the results cited.
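
As a rough illustration of the second and third elements, the sketch below runs a shared analysis and compares its output with a reported number. Everything in it is a hypothetical stand-in rather than the journal's actual review procedure: the `analysis` function, the `check_reproduces` helper, the sample data and the tolerance are all assumptions made for the example. The first element, public accessibility, is not something a snippet like this can check.

```python
import math
import statistics


def analysis(measurements):
    """Hypothetical shared analysis: the mean of the raw measurements."""
    return statistics.mean(measurements)


def check_reproduces(analysis_fn, raw_data, reported_value, rel_tol=1e-6):
    """Return (ran_ok, matches) for a shared analysis function.

    ran_ok  -- the code ran to completion without raising an exception
    matches -- its output agrees with the reported value within rel_tol
    """
    try:
        result = analysis_fn(raw_data)
    except Exception:
        # The code did not run cleanly, so there is no result to compare.
        return False, False
    return True, math.isclose(result, reported_value, rel_tol=rel_tol)


# Made-up data and a made-up "reported" value, for illustration only.
raw_data = [2.0, 4.0, 6.0]
print(check_reproduces(analysis, raw_data, reported_value=4.0))
```

A reviewer could apply the same pattern to each figure or table in a paper, assuming the authors supply both the raw data and the function that produced the published value.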

“This is a commitment from a high-impact journal to raise software to the status of a regular research product, that it’s not just a tool that gets discarded along the way, or hidden on a researcher’s computer where no-one else can benefit from it,” Marwick said. “In the future, scientific disciplines will be shifting to a position where you need to share your code as well as your data. It will be easier to reproduce someone’s new discovery, and incorporate their discoveries into your own work.”

Imagine this scenario, Marwick said: A neuroscientist is trying to find new ways to identify early-stage tumors using 3-D brain imagery. She comes up with an algorithm that can pick out specific pixel values in an image, which helps lead to early tumor detection. By sharing the computer code and its mathematical algorithm, the scientist could facilitate a breakthrough.

The Nature Neuroscience paper resulted from a two-day workshop held in 2014 in the United Kingdom, to which Marwick, an archaeologist, was invited because of his efforts in using code and promoting open science in archaeology. A Senior Data Science Fellow at the UW eScience Institute, Marwick is active in the institute’s Reproducibility and Open Science Group, which works on issues, tools and practices to enhance data sharing, preservation and reproducibility.

Bill Howe, associate director of the eScience Institute, said code sharing is part of the future. “Reproducibility is literally the definition of science, and as science moves from the lab to the computer, code sharing must be at the core of how we conduct research and train students.”

An open science approach to sharing code has its critics, and some scientists raise legal and ethical questions about the repercussions. How do researchers get proper credit for the code they share? How should code be cited in the scholarly literature? How will it count toward tenure and promotion applications? How is sharing code compatible with patents and commercialization of software technology?

Marwick, who specializes in prehistoric human evolutionary ecology in Southeast Asia and Australia, has been advocating for code-sharing and related open science initiatives in archaeology through the Society for American Archaeology.

“I’m just trying to shift the needle in my discipline to a practice that benefits everyone — researchers and the public,” he said.
