The past year has been a momentous period for preprint-driven open access. Elsevier has made two major acquisitions: SSRN, with its edited research networks, and bepress, with its Digital Commons institutional repository service. Digital Science, a Springer Nature sibling, has worked to develop its presence too, expanding figshare from a data repository into a full institutional solution and, more recently, improving its support for preprints. As commercial providers buy and build their way into the institutional repository and preprint marketplace, the not-for-profit Center for Open Science (COS) is offering an alternative by expanding the number of what it calls preprint services powered by its platform. Today, COS announced six new services, including a national repository for Indonesia and a variety of new disciplinary services. Although still relatively small in scale, COS is building a platform for the research community that is controlled by a not-for-profit and therefore presents an intriguing and potentially powerful alternative.

COS

The Center for Open Science (COS) characterizes itself as “a non-profit technology startup founded in 2013 with a mission to increase openness, integrity, and reproducibility of scientific research.” Its board is composed of scientists and social scientists with several interesting institutional affiliations, perhaps most notably the National Academy of Sciences. Its strategic plan is available for all to review.

It is admirably transparent about its finances and sponsors. Its largest supporter every year has, by a wide margin, been the Laura and John Arnold Foundation, but it has also received funding from Sloan, Templeton, IMLS, NSF, NIH, Hewlett, DARPA, and others.

I recently spoke with Brian Nosek, the COS Executive Director, about its newest preprint communities and its broader direction. COS is building a platform on which any scientist can perform research more openly than might otherwise be the case. Its platform allows researchers to use the services they have already selected (such as figshare, Mendeley, and Dropbox) but in a way that drives greater interoperability across the research lifecycle. It is also selectively building services of its own. Because preprints are well established in some fields but barely utilized in others, Nosek says, it makes sense for COS to drive the development of infrastructure that will accelerate greater sharing.

Preprint Communities

COS takes the approach of making its platform available for other communities to build on. For the most part, rather than soliciting specific communities to adopt its platform, COS makes it available to a variety of communities that might welcome having a home, such as learned societies, research funders, and grassroots initiatives. Nosek points to evidence of “pent-up demand” in communities as diverse as library and information science, mind and contemplative practices, nutritional sciences, and sport and exercise research. COS provides the infrastructure, and each community is responsible for soliciting contributions and drawing in scholars to use the service.

In many ways, this approach to adding communities is quite similar to the one Elsevier’s SSRN takes. Its research networks, each directed by one or more senior researchers in the field, provide a community space for a given discipline or field. Chemistry made for a small headline this summer: just as the American Chemical Society’s ChemRxiv preprint service was launching on Digital Science’s figshare platform, SSRN announced a new research network for chemistry. As these preprint communities find platform homes, there seems to be no shortage of platforms competing for research papers.

Preprint Providers

Preprints are found in a number of different repository services, organized around both disciplinary and institutional communities. Beyond COS, there are several notable service providers: Cornell University, Digital Science, the not-for-profit DuraSpace, and Elsevier. Comparisons are tricky, but it is worth examining both scale and velocity.

There are a number of different specific definitions of the term “preprint,” as well as some interesting questions about whether the term is breaking down. But several providers are able to estimate the number of preprints or research articles they make available through repository-like services:

Provider | Number of research articles
COS | 3,794
Elsevier (includes bepress and SSRN) | 3,050,000
DuraSpace | Not available
Cornell University (includes arXiv) | 1,297,000
Digital Science (through figshare; includes ChemRxiv, institutional repositories, etc.) | Not available

This set of measures shows that some providers are better established than others. The absolute size of the corpus matters for content availability and discoverability and, as I will discuss below, for the new approaches that both COS and Elsevier are pursuing.

But absolute measures can also be misleading, since they mask rates of growth (or contraction). Today and over the past year, we have seen both COS and SSRN add new communities, which drives growth, and we have also seen several changes in platform ownership. As a result, we should expect scholars, funders, and libraries to keep changing their preferred platforms for deposit or switching platform partners.

Velocity may therefore be as relevant a metric as absolute numbers. Prior to today’s announcement, COS’s velocity, measured in research articles deposited per month, was approximately 500. Over the past 22 months, Cornell’s arXiv has grown by an average of roughly 6,300 deposits per month. Since it was acquired by Elsevier, SSRN has added an average of 12,000 papers per month.

Even from these metrics, it is hard to know which platforms are gaining and which are losing momentum, but it is clear that COS would have its work cut out for it if it were interested in competing on numbers (which, Nosek emphasizes, is not how COS defines its success). Platforms persuade scholars to deposit research articles by reaching them through institutional affiliations or disciplinary structures, or, as ResearchGate and Academia have shown, through peer networks. Marketing to scholars, directly or indirectly, is key.

COS’s interest in preprints is part of a larger vision of “disrupting scholarly publishing.” New features for commenting, moderation, and peer review will soon enable various kinds of journals to be run on this infrastructure. These are ambitious directions, and exciting ones if they can be realized, since they open up the possibility of new kinds of journals linked more directly, across the research lifecycle, to their underlying preprints. Without the burden of an established journals business, COS may find an opportunity to be more disruptive than it could be in preprints alone. On the other hand, there is a risk that entering the journal market could pull COS away from building out full support for the research workflow.

The Turn to Workflow

It is impossible to understand the COS vision without addressing what I call the turn to workflow. In an emerging view that I consider extremely important, the repository is not a service or a platform of its own but rather a waystation along a researcher workflow. All scientists and many other types of scholars have a researcher workflow: the steps they take from project definition and funding, through research and analysis, to publication and showcasing. Elsevier and COS are two organizations placing big bets not just on individual research services but on providing an integrated workflow, Elsevier through a major spurt of acquisitions that really began with Mendeley, and COS through the development of its Open Science Framework (OSF).

Their visions are different. Elsevier is pursuing a strategy of tying each of its researcher workflow services ever more closely together, offering the prospect of an increasingly seamless researcher experience. It plans to monetize these services, in part by gathering and aggregating data that make its analytics and decision-support products as powerful as possible. While quick to say that individual products do not involve user lock-in, Elsevier is exploring a variety of opportunities to increase the “stickiness” of its services and tie its products together.

COS has a very different vision. As Nosek puts it, “community controlled services is our long-term aim…We perceive our role in the phase to create the public goods infrastructure.” The first step is to provide “services that researchers know they need now as a hook” to get them using the platform. This includes growing interoperability with other workflow providers, such as using Elsevier’s Mendeley Data as a dataset repository and making SSRN and Digital Commons preprints discoverable via SHARE. COS can then add important interstitial services and other components that improve scientific integrity. Thus COS offers an alternative to a commercially provided research workflow while at the same time looking for ways to integrate some of that workflow’s key offerings. It is for this reason that COS talks about providing a framework for services more so than insisting on becoming a service provider itself. Its preprint communities fit into this broader approach.

Looking Ahead

One key priority for COS is establishing its sustainability. It lacks access to the level of capital that Elsevier and other providers can bring to bear (Elsevier reportedly paid £150 million for bepress, the parent of Digital Commons, alone). Over time, COS will need to build a model for recurring funding, possibly along the lines of Cornell’s arXiv. While some funding streams can inhibit usage, others can benefit customer and provider alike if they build closer links with “customers.” Selling sustainability, even under an ultimately open model, can add discipline and focus.

Already, COS sees growing interest in its platform from diverse communities. As scholars bring their preprints to these communities, they will have an opportunity to experience COS’s broader researcher workflow offerings, and some will stay for the full research lifecycle. It is entirely possible that its platform could grow substantially, especially if COS can secure partnerships with more learned societies. Nor should we rule out attrition from some of the alternatives as the market re-sorts itself.

One abiding interest seems to be connecting institutional priorities, such as compliance and showcasing, with the disciplinary communities through which scholars tend to organize their research. If COS, like figshare, continues to evolve to offer both institutional and disciplinary windows into its preprint collections and broader data, it may have a strong advantage. Elsevier will need to integrate Digital Commons and SSRN to achieve a similar offering, albeit starting from a much larger base of content than its competitors.

Ultimately, a key question is emerging for higher education institutions: To what extent, and under what conditions, does it make sense to outsource core scholarly infrastructure? The Center for Open Science is providing what Nosek calls a “pragmatic strategy” to offer a vision of something other than complete outsourcing to commercial providers.