Using open data and open services, a large-scale study of the state of open and free access has found that both the number and proportion of articles freely available to the public are growing, having reached 45% of the literature published in 2015. Juan Pablo Alperin reveals more about this study and suggests we are at the beginning of a new era of large-scale, big data research on the state of scholarly publishing and public access to research. Services such as Unpaywall let us see what research people are trying to access as well as what is already freely available, and their usage data again shows almost half of overall demand being met by some type of open access.

Earlier this year, the ImpactStory team of Heather Piwowar and Jason Priem launched Unpaywall, a new browser extension that helps users find free, easy-to-access research. Since officially launching in April, Unpaywall has been installed by over 85,000 users who have collectively made over 75 million requests. Like its cousin, the OA Button, and their illegal counterpart, Sci-Hub, Unpaywall demonstrates the pent-up demand that exists for access to research.

All three of these tools have something in common: their users have found research they are interested in reading but need help gaining access to it. If academics and non-academics alike are able to find research they are interested in but need tools such as these for access, we must ask ourselves: how many more people are being locked out? Or, put another way, how much are we limiting our capacity to use and benefit from all the research and scholarship we produce? Fortunately, data from Unpaywall (or, more precisely, from its underlying database, oaDOI) can help shed some light on this question.

In a recently published preprint, the creators of Unpaywall, along with a group of co-authors (myself included) from three universities, have conducted a large-scale analysis of the state of open and free access. We found that both the number and proportion of articles that are freely available to the public are growing, having reached 45% of the literature published in 2015 (Figure 1). While this is not the first study to attempt to measure the prevalence of OA (see p5 of our paper for more details of other such studies), this work stands out because of the size of the sample and the method of automatically assessing whether free versions of the articles exist (especially archived copies), and because the underlying data and service are openly available. In this sense, it is most similar, both in methods and scale, to the work of Archambault et al. (2014) (whose work is used to power the 1Science database). Like our own, their study found approximately half of the scientific papers in their sample were free to read.

Figure 1: Number of articles (left panel) and proportion of articles (right panel) with OA copies, estimated based on a random sample of 100,000 articles with Crossref DOIs. Gold: published in an open-access journal (as defined by the DOAJ); Green: toll-access on the publisher page, but there is a free copy in an OA repository; Hybrid: free under an open license in a toll-access journal; Bronze: free to read on the publisher page, but without a license; Closed: all other articles, including those shared only on an academic social network or in Sci-Hub. Source: Piwowar H, Priem J, Larivière V, Alperin JP, Matthias L, Norlander B, Farley A, West J, Haustein S. (2017) “The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles”, PeerJ Preprints. This work is licensed under a CC BY 4.0 license.
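
For readers who think in code, the categories in the caption above can be read as a simple decision rule. The Python below is only a minimal sketch under that reading. The field names and the `classify` and `oa_shares` helpers are hypothetical, not oaDOI's actual schema or code, and the order in which the checks are applied is an assumption drawn from the caption's wording.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical fields for illustration; not oaDOI's real data model.
    in_doaj_journal: bool         # journal is listed in the DOAJ
    free_on_publisher_page: bool  # full text is free to read at the publisher
    has_open_license: bool        # the publisher copy carries an open license
    in_oa_repository: bool        # a free copy exists in an OA repository


def classify(article: Article) -> str:
    """Assign one of the five categories from the Figure 1 caption."""
    if article.in_doaj_journal:
        return "gold"
    if article.free_on_publisher_page:
        # Free at the publisher: licensed copies are hybrid, unlicensed are bronze.
        return "hybrid" if article.has_open_license else "bronze"
    if article.in_oa_repository:
        # Toll-access at the publisher, but a repository copy is free.
        return "green"
    # Everything else, including copies found only on social networks or Sci-Hub.
    return "closed"


def oa_shares(articles: list[Article]) -> dict[str, float]:
    """Return the share of articles falling into each category."""
    counts = Counter(classify(a) for a in articles)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Applied to a random sample of articles, `oa_shares` would yield the kind of proportions plotted in the right panel. Checking the publisher page before the repository is what keeps an article with both a free publisher copy and an archived copy from being counted as Green.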

These studies, and other recent work such as the study of Sci-Hub usage data, are just the beginning of a new era of large-scale (dare we say, big data?!) research on the state of scholarly publishing and the public’s need for research. As we see new open access mandates – from universities, funders, and governments alike – these studies should make us acutely aware of the value of having open data, along with high-quality metadata, in machine readable forms. Such data is only now becoming available. Both the ImpactStory and 1Science teams have had to resort to developing tools that pool data from many different sources and work around many of the limitations of the currently available information.

The results of these efforts are databases (of which oaDOI has been made freely and openly available) that can be used to inform and shape decisions about what research researchers and the public are given access to, and how. The value is multiplied when the databases are combined with usage data from places like Unpaywall, the OA Button, and Sci-Hub, which allow us to see not only what research is freely available but what research people are actually trying to access. In the case of Unpaywall, a sample drawn from a single week of use shows that almost 50% of the articles Unpaywall users are interested in reading are freely available, about a third of them in OA journals (Table 1).

Table 1: Prevalence of OA by type in sample of 100,000 accesses by Unpaywall users.

What now seems clear is that publishers are recognising the importance of providing access. Data from these studies shows an obvious demand for access, and publishers are responding by launching OA journals, as seen by the growth of Gold OA content (content published in an open-access journal, as defined by the DOAJ). It can also be seen in the large proportion of Bronze OA (a new term coined by the authors, referring to articles that are free to read on the publisher page, but that have not been explicitly licensed as being open). If we put on an optimist’s lens and view the glass as half-full, it is certainly encouraging to see our system of scholarly communications gives users half of the access they need. We can, of course, also view it from the glass-half-empty perspective, and consider that we still have more than halfway to go.

What is certain, regardless of whether you are a glass-half-empty or half-full kind of person, is that we are finally in a position to make these assessments with some reliability. This study gives us a reproducible benchmark measure of the extent to which we are enabling researchers and the public to access scholarly work. Ever the optimist, I want to believe we will soon be able to quench the world’s thirst for knowledge.

This blog post is based on the author’s co-written article, “The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles”, a preprint currently available at PeerJ Preprints (DOI: 10.7287/peerj.preprints.3119v1).

Note: This article gives the views of the author, and not the position of the LSE Impact Blog, nor of the London School of Economics.

About the author

Juan Pablo Alperin is an Assistant Professor in the School of Publishing at Simon Fraser University, an Associate Director of Research for the Public Knowledge Project, and the co-Director of the ScholCommLab. He is a multi-disciplinary scholar, with training in computer science, geography, and education, whose research focuses on the public’s use of research. He can sometimes be found at juan@alperin.ca, and always on Twitter at @juancommander.