Posted on behalf of Andrea Chiarelli (Consultant) and Rob Johnson (Director) of Research Consulting.

Open access discovery tools enable users to find scholarly articles that are available in open form, whether on a publisher’s website or elsewhere. This is a technically challenging endeavour that also requires a deep understanding of the scholarly communications landscape, the underpinning infrastructure and the needs of very different stakeholder groups, such as researchers, publishers, service providers and the general public.

It is currently estimated that between a third and a half of journal articles have open versions available at or soon after publication (source: STM Report 2018). However, open versions are often shared via channels (e.g. repositories, preprint servers) that are disconnected from the journal landing page where a paywalled article would typically reside. This is a considerable obstacle for readers, as open versions of academic literature risk languishing in undiscoverable places rather than being in plain sight for everyone to find.

The objective of OA, and the rationale for the significant investment in it, is to enable open versions of academic literature to be found, re-used and to have an impact beyond what would be possible if they were behind a paywall. Hence, OA discovery is a priority for the UK and international research communities. The OA discovery landscape is also crowded: OA discovery products compete for space and efficacy against established public infrastructure, library discovery services and commercial services. Finally, illegal access to academic literature via Sci-Hub appears to be drawing attention away from OA discovery tools, as a wide variety of end users of research are reported to be using it instead of pursuing access via lawful pathways.

A number of OA discovery tools are available today, and libraries and researchers are making important decisions about how to engage with and use such products. In this context, Jisc asked Research Consulting to prepare the present review to help guide Jisc’s future understanding of the area and its support for members.

Why do OA discovery tools matter?

Discovering free-to-read academic articles is frequently difficult, as there is no single place where these can be found. From the perspective of a reader browsing the web for articles to read, OA discovery tools are most relevant in a specific scenario – imagine you are visiting a journal web page and find an article you wish to read:

If it is OA, you don’t really need OA discovery tools to carry out any searches as the article will be immediately available to browse and, likely, to download.

If it is not OA, however, OA discovery tools become relevant: finding whether and where an open version exists is a real challenge! An open version may, for instance, be held in an institutional repository (Green OA) or on a preprint server.

OA discovery tools tend to operate in similar ways. First, a user installs the tool’s browser extension. Then, when the extension detects a DOI as the user browses the internet, it uses that DOI to run a search against:

The tool’s own database, if one is available; and

A range of external databases, such as PMC, CORE (a Jisc-funded service), BASE or another tool’s database (e.g. the Unpaywall dataset).

If an OA copy is found, the browser extension typically displays an icon in the browser window that takes the user to the open copy of the article.
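The workflow above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the code of Unpaywall, OA Button or any other tool: the DOI regular expression is a common heuristic that catches most modern DOIs but not every valid one, and both the tool’s own database and the external databases are stand-in dictionaries rather than real services such as PMC, CORE or BASE. All names (`detect_doi`, `find_oa_copy`, the example DOIs and URLs) are made up for the example.

```python
import re
from typing import Callable, Dict, Iterable, Optional

# Heuristic pattern for modern DOIs: "10." + 4-9 digit registrant code + "/" + suffix.
# It is an illustrative assumption and will miss some unusual (mostly older) DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:a-z0-9]+", re.IGNORECASE)

# A "source" is any function that takes a DOI and returns an OA URL or None.
Source = Callable[[str], Optional[str]]


def detect_doi(page_text: str) -> Optional[str]:
    """Return the first DOI-like string in the page text, lower-cased, or None."""
    match = DOI_PATTERN.search(page_text)
    if match is None:
        return None
    # Trailing punctuation is usually sentence punctuation, not part of the DOI.
    return match.group(0).rstrip(".,;").lower()


def find_oa_copy(doi: str, own_db: Dict[str, str],
                 external_sources: Iterable[Source]) -> Optional[str]:
    """Check the tool's own database first, then each external source in turn."""
    if doi in own_db:
        return own_db[doi]
    for lookup in external_sources:
        url = lookup(doi)
        if url:
            return url
    return None  # no open copy found: the extension would show no icon


# Hypothetical data standing in for a tool's own database and one external source.
own_db = {"10.1234/example.001": "https://repository.example.ac.uk/1/paper.pdf"}
external_repository = {"10.5678/example.002": "https://preprints.example.org/abc"}
sources = [external_repository.get]

doi = detect_doi("Cite as: https://doi.org/10.5678/Example.002.")
print(doi)                                 # 10.5678/example.002
print(find_oa_copy(doi, own_db, sources))  # https://preprints.example.org/abc
```

Querying the tool’s own database before falling back to external sources mirrors the order described above; a real extension would also need caching, timeouts and rate limiting for network calls, all of which are omitted here.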

Today, two key tools are available to achieve the above: Unpaywall and OA Button. These focus solely on OA discovery and have been around for a few years. Another tool offering a browser extension and OA discovery features is Kopernio; however, it was designed primarily to help readers access content via their institutional subscriptions, rather than for the use case discussed above. The OA discovery landscape as a whole also includes a range of other players that do not offer browser extensions but index OA material (exclusively or alongside paywalled literature), which is then searchable on their websites. Examples include 1findr, Dimensions, Google Scholar and CORE, which is managed by Jisc and the Open University. Services for libraries and publishers are also available, such as Anywhere Access, which simplifies workflows for end users to reach OA content via institutional libraries and publisher websites.

In the course of our work, we spoke with a range of stakeholders across the OA discovery landscape, and one of the key observations raised was that “for now, these tools are workarounds trying to fix a broken system”. This referred to the (slow) transition to OA: the more articles become OA on publishers’ platforms, the less readers will need OA discovery browser extensions to find free-to-read copies. However, at this stage, it is difficult to predict how long the transition might take and, therefore, OA discovery tools are likely to be meeting at least a medium-term need. They are also expected to evolve with the market and diversify their operations to some extent: for example, Unpaywall received a grant from the Arcadia Fund, which is being used to support the development of an AI-powered tool aiming to make research understandable to potential users beyond academia.

Interest from academic libraries is growing

Library system vendors have been investing substantially in tools to discover literature, including so-called web-scale discovery services, the leading examples of which are EBSCO Discovery, ProQuest Summon, Ex Libris Primo and OCLC WorldCat Discovery. These services are collectively installed at almost 10,000 customer sites. Increasingly, OA discovery tools are being integrated into library systems and link resolvers, as well as into traditional abstracting and indexing databases (e.g. Scopus, Web of Science).

Some examples of efforts to integrate OA discovery in library systems include:

Kopernio announced a collaboration with the California Institute of Technology (Caltech), with a view to offering one-click access to both subscription and open-access content.

OA Button received a grant from the Arcadia Fund to work on improving interlibrary loans via their GetPDF and InstantILL products.

CORE recently entered a partnership with ProQuest to deliver more content within their library discovery services (Ex Libris Primo and Ex Libris Summon).

Digital Science’s Dimensions and 1Science’s 1findr service (now owned by Elsevier) aim to combine web-scale discovery and analytics functions in a single product.

The issue of legitimacy in the OA discovery landscape

The elephant in the room when it comes to OA discovery is the difficulty of identifying legitimate results when using OA discovery browser extensions: when a tool presents a user with a free-to-read version, it is very difficult for the user to verify its provenance and to know whether the article was shared lawfully in the first place.

Most tools appear to be making efforts in this direction, particularly by not including ResearchGate among their sources. This scholarly collaboration network is known for hosting a large number of academic articles in versions that infringe copyright and violate publishers’ policies; using ResearchGate as a data source in OA discovery tools would therefore be likely to surface illegally shared versions.
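One simple way a tool might keep a known problematic source out of its results is a hostname blocklist applied to candidate URLs before they are shown to the user. The sketch below is hypothetical: the blocklist contents, the function names and the candidate URLs are illustrative, and real tools may instead exclude sources earlier, at harvesting time, rather than at query time.

```python
from typing import Iterable, List, Set
from urllib.parse import urlparse

# Hypothetical blocklist of hostnames whose copies should never be surfaced.
EXCLUDED_HOSTS: Set[str] = {"researchgate.net"}


def is_excluded(url: str, excluded_hosts: Set[str] = EXCLUDED_HOSTS) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == blocked or host.endswith("." + blocked)
               for blocked in excluded_hosts)


def filter_candidates(urls: Iterable[str]) -> List[str]:
    """Drop candidate OA URLs hosted on excluded domains, preserving order."""
    return [url for url in urls if not is_excluded(url)]


candidates = [
    "https://www.researchgate.net/publication/123_example",
    "https://repository.example.ac.uk/id/eprint/42/paper.pdf",
]
print(filter_candidates(candidates))
```

Matching on the parsed hostname, rather than searching the whole URL for a substring, means that a subdomain such as `www.researchgate.net` is caught while an unrelated host whose name merely contains the blocked string is not.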

At this stage, there seems to be agreement that no tool is able to guarantee that 100% of its content is shared legitimately. This remains an important area of focus for all OA discovery tools, but our interviewees stressed that the issue is felt more keenly by librarians, who need to recommend legitimate sources of academic literature, than by end users simply trying to access a publication of interest. However, the topic is relevant to a much broader audience, including publishers, research funders, university administrators and more.

Selectively bridging the metadata gap to enable better OA discovery

Better metadata in the scholarly communications landscape is desirable per se, but it’s not the solution to all technical OA discovery issues.

It would be desirable to have more interoperable metadata (including on licensing) in institutional repositories (Green OA) and preprint servers, as these are some of the locations where OA content is most likely to be found when the original article is paywalled. Improved and more interoperable metadata in these locations could reduce the effort OA discovery tools must invest in maintaining their own databases (Unpaywall, in particular) and crawling the internet for free copies of paywalled literature. Furthermore, it would help to address the issue of legitimacy by providing firm evidence that research deposited in institutional repositories and preprint servers is appropriately licensed for sharing.
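As an illustration of what interoperable repository metadata can look like to a discovery tool, the sketch below reads a single simple Dublin Core (`oai_dc`) record, the format many repositories expose through the OAI-PMH protocol, and extracts a DOI, other URLs and a rights statement. The record itself is invented, and treating a Creative Commons URL in `dc:rights` as evidence of an open licence is an assumption; repositories do not populate these fields consistently, which is the gap described above.

```python
import xml.etree.ElementTree as ET

# Namespaces used by simple Dublin Core records served over OAI-PMH.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical oai_dc metadata block, as it might appear inside an OAI-PMH <record>.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example accepted manuscript</dc:title>
  <dc:identifier>https://repository.example.ac.uk/id/eprint/42/paper.pdf</dc:identifier>
  <dc:identifier>https://doi.org/10.1234/example.001</dc:identifier>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/</dc:rights>
</oai_dc:dc>
"""


def summarise_record(xml_text: str) -> dict:
    """Pull a DOI, other URLs and an (assumed) open-licence flag from an oai_dc record."""
    root = ET.fromstring(xml_text.strip())
    identifiers = [el.text.strip() for el in root.findall("dc:identifier", NS) if el.text]
    rights = [el.text.strip() for el in root.findall("dc:rights", NS) if el.text]
    doi = next((i.split("doi.org/", 1)[1].lower() for i in identifiers if "doi.org/" in i),
               None)
    return {
        "doi": doi,
        "urls": [i for i in identifiers if "doi.org/" not in i],
        # Assumption: a Creative Commons licence URL signals that sharing is permitted.
        "cc_licensed": any("creativecommons.org/licenses/" in r for r in rights),
    }


print(summarise_record(SAMPLE_RECORD))
```

When the DOI and a machine-readable licence are present, linking the repository copy to the published article and checking that it may be shared becomes a straightforward parsing step rather than a crawling and guessing exercise.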

An open question in the area of OA discovery is what proportion of the total academic literature is available in an open version. Here, Crossref would be a solid starting point: improved licensing metadata on Crossref would be useful to paint a clearer picture of the OA landscape, but would not be particularly useful to an end user who is already on a journal web page trying to read a research output.
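To show how licensing metadata could feed such an estimate, the sketch below counts the share of works carrying an open licence. The records are hypothetical and trimmed down, loosely shaped like the work records returned by the Crossref REST API (a `license` list whose entries carry `URL`, `content-version` and `delay-in-days`). The rule for what counts as “open” (a Creative Commons licence on the version of record with no embargo delay) is an assumption made for illustration, and the output is not a measurement of anything.

```python
from typing import Iterable


def has_open_licence(work: dict) -> bool:
    """Assumed rule: a Creative Commons licence on the version of record, no delay."""
    for lic in work.get("license", []):
        url = lic.get("URL", "").lower()
        if ("creativecommons.org/licenses/" in url
                and lic.get("content-version") == "vor"
                and lic.get("delay-in-days", 0) == 0):
            return True
    return False


def open_share(works: Iterable[dict]) -> float:
    """Fraction of works whose licensing metadata meets the assumed 'open' rule."""
    works = list(works)
    if not works:
        return 0.0
    return sum(has_open_licence(w) for w in works) / len(works)


# Hypothetical, trimmed records (not real Crossref data).
sample_works = [
    {"DOI": "10.1234/a", "license": [{"URL": "http://creativecommons.org/licenses/by/4.0/",
                                      "content-version": "vor", "delay-in-days": 0}]},
    {"DOI": "10.1234/b", "license": [{"URL": "https://publisher.example.com/tdm-licence",
                                      "content-version": "tdm", "delay-in-days": 0}]},
    {"DOI": "10.1234/c"},  # no licensing metadata deposited at all
]
print(open_share(sample_works))
```

The third record illustrates why better licensing metadata matters: with nothing deposited, the sketch cannot tell a closed article from one whose licence is simply unrecorded, so it counts it as not open. A count of this kind also sees only publisher versions registered with Crossref, not Green OA copies held elsewhere.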

The future of OA discovery tools

OA discovery tools perform a relatively simple task, which is to match a DOI, title or other query to an OA URL. However, the effort needed to match DOIs/queries to OA URLs accurately is not negligible: OA discovery tools need a high recall rate (i.e. an OA URL is returned for as many as possible of the DOIs/queries that have an open copy) but also high precision (i.e. each OA URL returned actually corresponds to the DOI/query). If erroneous matches are present in the database, user trust will decrease: this leads some tools to perform manual checks of the links between DOIs and OA URLs to keep precision as high as possible.
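The two measures can be made concrete with a small evaluation against a hand-checked sample, which is one way the results of manual checks could be put to use. In this hypothetical sketch, `gold` maps each DOI to the set of URLs verified as genuine open copies (empty if none is known), and `predicted` holds what a tool returned; precision is the share of returned URLs that are correct, and recall is the share of DOIs with a known open copy for which a correct URL was returned.

```python
from typing import Dict, Optional, Set, Tuple


def precision_recall(predicted: Dict[str, Optional[str]],
                     gold: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Compute precision and recall of DOI -> OA URL matches.

    predicted: DOI -> OA URL returned by the tool (None if nothing was returned).
    gold: DOI -> URLs manually verified as genuine open copies (empty set if none known).
    """
    returned = {doi: url for doi, url in predicted.items() if url is not None}
    correct = sum(1 for doi, url in returned.items() if url in gold.get(doi, set()))
    with_open_copy = sum(1 for urls in gold.values() if urls)
    precision = correct / len(returned) if returned else 0.0
    recall = correct / with_open_copy if with_open_copy else 0.0
    return precision, recall


# Hypothetical hand-checked sample of four DOIs.
gold = {
    "10.1234/a": {"https://repo.example.ac.uk/a.pdf"},
    "10.1234/b": {"https://preprints.example.org/b"},
    "10.1234/c": {"https://repo.example.ac.uk/c.pdf"},
    "10.1234/d": set(),  # no open copy known to exist
}
predicted = {
    "10.1234/a": "https://repo.example.ac.uk/a.pdf",  # correct match
    "10.1234/b": "https://repo.example.ac.uk/x.pdf",  # wrong URL: hurts precision
    "10.1234/c": None,                                # missed copy: hurts recall
    "10.1234/d": None,                                # correctly returns nothing
}
print(precision_recall(predicted, gold))
```

With one correct match, one wrong URL and one missed copy, this toy sample gives a precision of 0.5 and a recall of one third, showing how the two measures penalise different kinds of error.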

Assuming that a reliable and accurate database matching DOIs, titles or other queries to OA URLs is available, the computational effort needed to connect the former to the latter is relatively low. The algorithms required are also simple; the more demanding task is developing a user-friendly browser extension and user interface.
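Once such a database exists, the lookup itself can be as simple as a hash map keyed on normalised DOIs, with a normalised title as a fallback key. This is a hypothetical sketch: the class and function names are made up, the normalisation rules (lower-casing, stripping resolver prefixes, collapsing punctuation in titles) are common-sense assumptions, and exact title matching is deliberately naive; a real tool would need fuzzier matching and checks to keep precision high.

```python
import re
import unicodedata
from typing import Dict, Iterable, Optional, Tuple

# Resolver prefixes commonly seen in front of DOIs (an illustrative, not exhaustive, list).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(raw: str) -> str:
    """Lower-case a DOI (DOIs are case-insensitive) and strip any resolver prefix."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):].strip()
    return doi


def normalise_title(raw: str) -> str:
    """Fold accents, lower-case and collapse punctuation/whitespace to single spaces."""
    ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", ascii_text.lower()).strip()


class OAIndex:
    """Hypothetical in-memory index with average O(1) lookups by DOI or exact title."""

    def __init__(self, records: Iterable[Tuple[str, str, str]]):
        self.by_doi: Dict[str, str] = {}
        self.by_title: Dict[str, str] = {}
        for doi, title, url in records:
            self.by_doi[normalise_doi(doi)] = url
            self.by_title[normalise_title(title)] = url

    def lookup(self, doi: Optional[str] = None,
               title: Optional[str] = None) -> Optional[str]:
        """Try the DOI first; fall back to the normalised title."""
        if doi:
            url = self.by_doi.get(normalise_doi(doi))
            if url:
                return url
        if title:
            return self.by_title.get(normalise_title(title))
        return None


index = OAIndex([("10.1234/Example.001", "Open Access: A Review",
                  "https://repository.example.ac.uk/42.pdf")])
print(index.lookup(doi="https://doi.org/10.1234/example.001"))
print(index.lookup(title="open access - a review"))
```

Both calls return the same repository URL: the cost of a query is a couple of dictionary lookups, which is why the hard part lies in building and curating the index rather than in querying it.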

In order to improve OA discovery, the scholarly communications community will have to focus on metadata. We expect that improvements in metadata on the side of institutional repositories and preprint servers would be the most effective way to support OA discovery tools, but publishers can also support OA discovery by providing more complete and consistent licensing information when submitting records to Crossref. The role of persistent identifiers (DOIs in particular) should not be underestimated either: most OA discovery tools match DOIs to OA URLs in their databases, so it is essential that DOIs are assigned to all research outputs deemed to be within the scope of OA discovery.

Our review shows that the OA discovery landscape is crowded with exciting and fast-growing solutions. Given the level of experimentation in the area and the continuous changes in the open access arena, it is hard to tell whether one or more of the tools will prevail. Users may coalesce around one or a small number of leading tools in time, but this will depend on a number of factors, including not only performance but also trust and financial sustainability.

The role of Jisc and other players in the area, including universities, libraries and publishers, is clear: end users will need guidance and advice to navigate this complex environment and understand what approach to OA discovery best matches their needs and everyday practices. However, one question remains – how can OA discovery (a largely technical matter) be further integrated into existing workflows to enable a seamless user experience?

About the authors

Andrea Chiarelli (Consultant) and Rob Johnson (Director) work at Research Consulting, a UK-based consultancy active in the higher education sector that supports Jisc on a range of projects in scholarly communications, open science, research data and more.