Photo by Jeremy Olson on Unsplash

Data silos

Researchers use a combination of software tools and instruments to collect and analyze data. To keep data entry efficient, they store each kind of data in a different medium: thoughts in a paper notebook, raw instrument data on disk, analysis code in a Jupyter notebook on GitHub, tracked parameters in a spreadsheet. This quickly creates data silos, because none of these storage media are set up to connect to each other easily, and the connections exist only in the researcher’s head. If she wants to update a colleague, she has to describe the workflow and recreate the links in an email or a presentation, which in turn becomes another data silo.

Photo by Ricardo Viana on Unsplash

Consistency

Since research deals with the unknown, there are few conventions for what to store or for how to name the steps and parameters of an experiment. New things to name come up all the time, so naming happens on the go. We have seen every imaginable way to name temperature, and laboratory notebook entries range from extremely detailed narratives to lists of acronyms and numbers that are not human-readable.
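To make the naming problem concrete, here is a minimal Python sketch of how differently spelled parameter names could be mapped onto one canonical name. The alias table, the spellings in it, and the function names `canonical_name` and `harmonize` are assumptions made up for illustration; a real lab would have to agree on its own vocabulary.

```python
import re

# Hypothetical alias table: each canonical name maps to spellings seen in the wild.
CANONICAL_ALIASES = {
    "temperature": {"t", "temp", "temperature", "temp_c", "sample_temp"},
    "pressure": {"p", "pres", "pressure"},
}

# Invert the table once so lookups go from alias to canonical name.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in CANONICAL_ALIASES.items()
    for alias in aliases
}


def _normalize(raw):
    """Lowercase the name and collapse punctuation and spaces into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")


def canonical_name(raw):
    """Return the canonical parameter name, or None if the spelling is unknown."""
    return ALIAS_TO_CANONICAL.get(_normalize(raw))


def harmonize(record):
    """Rename known keys of a record; unknown keys are kept unchanged."""
    return {canonical_name(key) or key: value for key, value in record.items()}
```

With this table, `harmonize({"Temp.": 293.0, "notes": "first run"})` renames the first key to `temperature` and leaves `notes` alone. The sketch only works for spellings someone has already listed, which is exactly the limitation described above: new names appear faster than any table can be maintained.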

Photo by Simson Petrol on Unsplash

Digitalization

Most researchers still use a paper notebook as a central source of truth in their experiments. The paper notebook has stood the test of time because it is incredibly flexible, easy to use, relatively secure, and hard to falsify.

Photo by Cesar Carlevarino Aragon on Unsplash

We need tools!

This list is by no means exhaustive, but it shows that there are many practical hurdles to overcome before researchers can collaborate seamlessly. There is more standing in the way of open research than the fear of getting scooped on a publication or a patent. Researchers currently need powerful incentives to share their data, because sharing is not just a matter of putting all of their data on a publicly accessible server every night. It takes significant time and effort to bring the data into a form that is useful to other researchers, and that effort can be similar to the time it takes to write a publication. With the current pressure to publish continually, it is no wonder that journals for negative results are not taking off. If researchers already spend half of their time publishing, they will choose the success stories over mountains of inconclusive data.

For open science to become a reality, we need tools that allow researchers to efficiently share every single data point instantly, in a way that expresses how and why it was created.
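As a rough illustration of what "how and why" could look like in practice, the following Python sketch attaches that context to a single data point and serializes it for sharing. The class name `DataPoint`, its fields, and the example values are hypothetical and not taken from any existing tool.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now():
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class DataPoint:
    """One measured or computed value together with its context (hypothetical schema)."""
    name: str
    value: float
    unit: str
    how: str                  # instrument, script, or procedure that produced the value
    why: str                  # question or hypothesis the value is meant to address
    derived_from: tuple = ()  # identifiers of the raw files or records it depends on
    recorded_at: str = field(default_factory=_utc_now)


def to_json(point):
    """Serialize a data point so it can be shared without a separate explanation."""
    return json.dumps(asdict(point), sort_keys=True)


# Example values are invented for illustration only.
point = DataPoint(
    name="temperature",
    value=293.15,
    unit="K",
    how="thermocouple reading, averaged over 10 s",
    why="check thermal stability before the main run",
    derived_from=("raw/run_017.csv",),
)
shared = to_json(point)
```

The design choice here is that the context travels with the value instead of living in a separate notebook or email, so a colleague who receives `shared` does not have to reconstruct the links.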