The volume of academic research articles is increasing exponentially. However, the ease with which we are able to find these articles depends on the capabilities of the search systems that we use. These systems (bibliographic databases like Scopus and academic search engines like Google Scholar) act as important gatekeepers between authors and readers. A recent study found that many of these systems are difficult to use, non-transparent and do not adhere to scientific standards. As a result, researchers find fewer relevant records, searching takes longer, or does not have the necessary scientific rigour. In this post, Neal Haddaway and Michael Gusenbauer argue that to address these issues academic searching needs to adopt the principles of FAIR (Findable, Accessible, Interoperable, Reusable), and be radically overhauled.

Finding the right information for your research has always been difficult. Before computers, academic search consisted of scanning abstract card catalogues in libraries. This was time-consuming and impossible to sustain as publication rates increased. The advent of the internet allowed researchers to perform targeted searches in specialised digital bibliographic databases that use ‘exact-match models’ for finding relevant terms, but subscriptions to these services are hugely expensive and users require knowledge of search syntax. In other words, you would need to know in advance exactly what you are searching for. In recent years, researchers have largely switched to ‘intelligent’ search systems, like Google Scholar, that are free, highly intuitive, and use semantics, algorithms, or artificial intelligence to suggest the most relevant research based on what the software ‘thinks’ is most likely to be relevant.

However, the two most popular forms of academic search, web-based search engines and commercial bibliographic databases, both have flaws. For this reason, we believe the whole system of academic search needs a drastic overhaul.

Firstly, search engines like Google Scholar are increasingly the first choice for research discovery, because they offer easy access to a large stock of literature for day-to-day searches. What is often forgotten, however, is that searches on Google Scholar are neither reproducible nor transparent. Repeated searches often retrieve different results, and users cannot specify detailed search queries, leaving it to the system to interpret what the user wants. As the matching and ranking algorithms of semantic or AI-based search engines are often unknown, even to the providers themselves, these systems do not allow comprehensive searching.

These issues are perhaps less significant in day-to-day searches, where we want to locate a particular research paper efficiently. However, systematic reviews in particular need to use rigorous, scientific methods in their quest for research evidence. Searches for articles must be as objective, reproducible and transparent as possible. With systems like Google Scholar, searches are not reproducible, even though reproducibility is a central tenet of the scientific method. Furthermore, it is virtually impossible to export results for documentation in systematic reviews or systematic maps. If you try to manually download search results in bulk (as would be needed in a systematic review), your IP address is likely to be locked after a short time: an effort to stop ‘bots’ from reverse engineering the Google Scholar algorithm and the information in its databases.

Secondly, commercial bibliographic databases and platforms, like Web of Science and Scopus, might seem powerful and efficient, but they also have limitations that make accessing articles for research, such as rigorous evidence synthesis, highly challenging, not to mention frustrating. For example, most of these databases restrict how many records users can download. Some only allow a maximum of 500 citations at a time. Information retrieval specialists typically have to export tens of thousands of search results within a systematic review or systematic map for the purposes of transparency and rigour: doing so takes days because of these restrictions and introduces bias and error to the retrieval process. In addition, the costs of these paywalled resources are restrictively high, prohibitively so for researchers working in resource-constrained contexts like low- and middle-income countries and small organisations. As a result, even though research articles are increasingly being published Open Access, researchers cannot easily identify them because they lack access to these search systems.
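To get a feel for what such export caps mean in practice, the following minimal Python sketch counts the manual export rounds needed for one result set. The 500-record cap is the figure mentioned above; the result-set size of 20,000 and the function name `export_batches` are illustrative assumptions, not measurements from any particular database.

```python
def export_batches(total_records: int, batch_limit: int = 500) -> list[tuple[int, int]]:
    """Return the 1-based (start, end) record ranges, one per manual export."""
    if total_records < 0 or batch_limit <= 0:
        raise ValueError("total_records must be >= 0 and batch_limit must be > 0")
    return [
        (start, min(start + batch_limit - 1, total_records))
        for start in range(1, total_records + 1, batch_limit)
    ]


# Hypothetical systematic-review search returning 20,000 records.
batches = export_batches(20_000)
print(len(batches))             # 40
print(batches[0], batches[-1])  # (1, 500) (19501, 20000)
```

Under these assumptions a single search string already needs 40 separate downloads, each of which is an opportunity to skip or repeat a range by mistake.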

In a recent study we investigated the specific capabilities and limitations of 28 popular search systems, showing that many bibliographic resources are not fit for purpose in systematic reviews. In one way or another, all of the systems have limitations in how users can combine keywords into search strings, or interact with the search results. Because of these restrictions they are less suitable for academic searches, and they place the burden on users to know these limitations in order to search effectively. This is especially problematic, as academics typically have their go-to search system – in many cases Google Scholar – and use it for all kinds of searches without knowing that their search is highly biased, non-transparent and non-repeatable.
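The difference that a fully specified search string makes can be sketched in a few lines of Python. This is a toy illustration only: the three records, the function names and the query are hypothetical, and real databases index far more than titles. It shows an exact-match query of the form forest AND (review OR meta-analysis), plus a fingerprint of the result set that could be reported so that others can check whether a repeated search retrieved the same records.

```python
import hashlib

# Hypothetical mini-corpus: record ID -> title.
RECORDS = {
    "r1": "Systematic review of forest restoration outcomes",
    "r2": "Forest management and biodiversity: a meta-analysis",
    "r3": "Urban biodiversity in city parks",
}


def exact_match_search(records, all_of, any_of):
    """Return sorted IDs whose title contains every term in all_of
    and at least one term in any_of (whole-word, case-insensitive)."""
    hits = []
    for record_id, title in records.items():
        words = set(title.lower().replace(":", " ").split())
        if all(t in words for t in all_of) and any(t in words for t in any_of):
            hits.append(record_id)
    return sorted(hits)


def fingerprint(record_ids):
    """Hash the ordered result set so two searches can be compared."""
    return hashlib.sha256("\n".join(record_ids).encode("utf-8")).hexdigest()


first = exact_match_search(RECORDS, all_of=["forest"], any_of=["review", "meta-analysis"])
second = exact_match_search(RECORDS, all_of=["forest"], any_of=["review", "meta-analysis"])
print(first)                                      # ['r1', 'r2']
print(fingerprint(first) == fingerprint(second))  # True
```

Because the matching rule is explicit, the same query over the same records always returns the same set, which is exactly what cannot be verified when the ranking logic is hidden.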

The problem of inadequate search capabilities is becoming more pressing. On the one hand, more and more so-called ‘semantic’ or ‘intelligent’ search systems, like Microsoft Academic or Semantic Scholar, are being developed. On the other hand, researchers increasingly need to search systematically (i.e. in a repeatable and transparent manner): the stock of systematic reviews has doubled within the last four years alone. This is not surprising, as researchers need these reviews to stay up to date in their field and to gain insights on a specific topic based on a systematic synthesis of evidence across contexts.

Despite the limitations of current search systems, we see promise in the increasingly dynamic and diverse search system landscape. New solutions are regularly appearing, like Lens.org, that aim to improve how we discover research. Now, we must direct these technical efforts to respect scientific standards that improve accessibility of research findings. Specifically, we believe there is a very real need to drastically overhaul how we discover research, driven by the same ethos as in the Open Science movement. The FAIR data principles offer an excellent set of criteria that search system providers can adapt to make their search systems more adequate for scientific search, not just for systematic searching, but also in day-to-day research discovery:

Findable: Databases should be transparent in how search queries are interpreted and in the way they select and rank relevant records. With this transparency, researchers should be able to choose fit-for-purpose databases based clearly on their merits.

Accessible: Databases should be free to use for research discovery (detailed analysis or visualisation could require payment). This way researchers can access all knowledge available via search.

Interoperable: Search results should be readily exportable in bulk for integration into evidence synthesis and citation network analysis (similar to the concept of ‘research weaving’ proposed by Shinichi Nakagawa and colleagues). Standardised export formats help analysis across databases.

Reusable: Citation information (including abstracts) should not be restricted by copyright, so that summaries, text analyses and similar outputs can be reused and published.
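As a rough illustration of the interoperability point, the Python sketch below merges exports from two databases, removes duplicates by DOI, and writes each record using common RIS tags (TY, TI, AU, PY, DO, ER). The records, DOIs and function names are hypothetical placeholders, and the sketch assumes every record carries a DOI, which real exports often do not.

```python
def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_records(*exports):
    """Combine exports from several databases, keeping one record per DOI."""
    merged = {}
    for export in exports:
        for record in export:
            merged.setdefault(normalise_doi(record["doi"]), record)
    return list(merged.values())


def to_ris(record) -> str:
    """Serialise one journal-article record with basic RIS tags."""
    lines = ["TY  - JOUR", f"TI  - {record['title']}"]
    lines += [f"AU  - {author}" for author in record["authors"]]
    lines += [
        f"PY  - {record['year']}",
        f"DO  - {normalise_doi(record['doi'])}",
        "ER  - ",
    ]
    return "\n".join(lines)


# Hypothetical exports from two databases; the first record appears in both.
database_a = [
    {"title": "Example study one", "authors": ["Doe, J."], "year": 2019,
     "doi": "10.1234/example.001"},
]
database_b = [
    {"title": "Example study one", "authors": ["Doe, J."], "year": 2019,
     "doi": "https://doi.org/10.1234/EXAMPLE.001"},
    {"title": "Example study two", "authors": ["Roe, A.", "Poe, B."], "year": 2020,
     "doi": "10.1234/example.002"},
]

unique = merge_records(database_a, database_b)
print(len(unique))  # 2
print("\n".join(to_ris(r) for r in unique))
```

When every database exports the same fields in the same format, this kind of merging is a few lines of code; when they do not, it becomes manual work for each source.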

By developing academic search systems in this way, we can futureproof research discovery against increasingly appreciated limitations, like bias and lack of comprehensiveness, and make it an equitable and FAIR practice. In addition, we need to educate users so that they can decide which systems fit their search needs and use the best systems in the best way. In this regard, we want to use our research to make the search system landscape more transparent. We hope to encourage academics to search more attentively, and search system providers to raise the quality of their systems to the standard that science requires, for better search and better results.

Note: This article gives the views of the author, and not the position of the LSE Impact Blog, nor of the London School of Economics.

Image credit: People in hedge maze, via Good Free Photos, (Public Domain)