Abstract Open access publication rates have been steadily increasing over time. In spite of this growth, academics in low income settings struggle to gain access to the full canon of research literature. While the vast majority of open access repositories and funding organizations with open access policies are based in high income countries, the geographic patterns of open access publication itself are not well characterized. In this study, we developed a computational approach to better understand the topical and geographical landscape of open access publications in the biomedical research literature. Surprisingly, we found a strong negative correlation between country per capita income and the percentage of open access publication. Open access publication rates were particularly high in sub-Saharan Africa, but vastly lower in the Middle East and North Africa, South Asia, and East Asia and the Pacific. These effects persisted when considering papers only bearing authors from within each region and income group. However, papers resulting from international collaborations did have a higher percentage of OA than single-country papers, and inter-regional collaboration increased OA publication for all world regions. There was no clear relationship between the number of open access policies in a region and the percentage of open access publications in that region. To understand the distribution of open access across topics of biomedical research, we examined keywords that were most enriched and depleted in open access papers. Keywords related to genomics, computational biology, animal models, and infectious disease were enriched in open access publications, while keywords related to the environment, nursing, and surgery were depleted in open access publications. This work identifies geographic regions and fields of research that could be priority areas for open access advocacy. 
The finding that open access publication rates are highest in sub-Saharan Africa and low income countries suggests that factors other than open access policy strongly influence authors’ decisions to make their work openly accessible. The high proportion of OA resulting from international collaborations indicates yet another benefit of collaborative research. Certain applied fields of medical research, notably nursing, surgery, and environmental fields, appear to have a greater proportion of fee-for-access publications, which presumably creates barriers that prevent researchers and practitioners in low income settings from accessing the literature in those fields.

Citation: Iyandemye J, Thomas MP (2019) Low income countries have the highest percentages of open access publication: A systematic computational analysis of the biomedical literature. PLoS ONE 14(7): e0220229. https://doi.org/10.1371/journal.pone.0220229 Editor: Wolfgang Glanzel, KU Leuven, BELGIUM Received: March 9, 2019; Accepted: July 1, 2019; Published: July 29, 2019 Copyright: © 2019 Iyandemye, Thomas. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: The underlying data are available on PubMed and in the supplemental tables, and the analyses are available in a public and persistent Github repository at: https://github.com/iyandemye/oa_project. The methods section of the paper and the Github repository available at this URL contain complete instructions to access the data through PubMed and replicate the findings of the study. Funding: This work received an internal research grant from the University of Global Health Equity (ughe.org) to MPT. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction Open access (OA) describes materials that are free to access and read online, either through publisher websites or through publication repositories. It seems self-evident that OA publication maximizes the benefits of scientific findings for researchers, funders, and the public [1]. Some OA advocates now argue that all research publications should be openly accessible by default [2] and that access to knowledge stemming from research should be considered a fundamental human right [3]. In keeping with this, many government agencies, private foundations, and universities have concluded that the results of research they support should be openly accessible and have adopted mandates and policies to support OA publication. This has been accompanied by steady growth in OA repositories [4]. The most common routes to OA publication are “gold” OA, in which papers are made immediately available from the publisher under a Creative Commons license, and “green” OA, in which papers are deposited by authors or publishers in a public repository. In addition, a large fraction of the literature is made available on publishers’ websites without an explicit OA license. Most funders that have OA policies mandate that authors deposit papers in repositories, thus promoting the green OA publication route, but some have more recently established policies intended to promote gold OA [5]. Although there is evidence that OA policies and compliance efforts have increased OA publication [6], OA policies that promote green OA can place the burden of compliance upon authors, who may misunderstand OA or the policies. In spite of growth in OA publication over time, more than 50% of newly published research can still only be legally accessed with an institutional license or by paying publishers’ fees [7].
These fees are often too high for institutions and individuals based in low income countries [8,9], which has spurred initiatives to provide greater access to research literature in developing countries. The Research4Life initiative, a public-private partnership involving five United Nations agencies, provides free access to a large number of paywalled journals and books for organizations based in low income countries, notably through the HINARI program, which focuses on biomedical research literature [9]. At the same time, services such as Sci-Hub have sprung up to provide access to pirated articles, thus circumventing publishers entirely. A recent study indicated that Sci-Hub provides access to a greater proportion of the published literature than a major US-based research university library [10]. This highlights the reality that the costs and complexities involved with licensing copyrighted research articles make adequate access a challenge, even for well-resourced universities in high income countries. The “green” route of OA has been encouraged by funder policies, but also by an enormous growth in the number of OA repositories, particularly in Europe and North America. The number of OA repositories based in Africa lags far behind other parts of the world. According to the Registry of Open Access Repositories (ROAR), fewer than 4% of such repositories worldwide in 2018 were based in Africa [11]. Similarly, the vast majority of funding organizations with OA policies as of 2018 were based in Europe and North America, with less than 3% of total OA policies originating from organizations based in Africa [12]. It is generally believed that open access tracks with development [13] and that the Western world leads the OA movement due to technology and a more supportive publishing environment [14].
It has been speculated that publication fees, which are more common for OA papers, could have an inhibitory effect on OA publication by authors from low income countries [8,15], though such an inhibitory effect might be offset by fee waivers frequently granted to authors from these countries. Based on this confluence of factors, the prediction that OA publication rates are lower in low income countries than in other countries seems reasonable. There is an increasing amount of evidence for a variety of benefits of OA publication, including economic, social, and academic benefits [15]. In academic circles, article citations are generally conflated with article importance, so one proxy for the impact of OA itself is the number of citations of OA articles compared to citations of similar non-OA articles. A majority of studies have identified a citation advantage for OA publications [14], and the OA citation advantage holds for international collaborations that focus on poverty-related diseases [16], though this advantage does vary by field of research and remains controversial [7]. OA articles are more likely to be accessed and downloaded than non-OA articles [17]. Some have speculated that OA publication will improve the consumption of scientific literature, but not the production of scientific research, in developing countries [18]; however, there is some evidence that providing free access to journal articles increases the publication output of researchers based in low income countries [19]. The interaction between OA publication and international collaboration has not been studied extensively, but it is well established that the percentage of publications stemming from international collaborations is steadily increasing over time [20]. The research output of low income countries, particularly countries in sub-Saharan Africa, is also increasing [21].
In some countries in sub-Saharan Africa, papers resulting from international collaborations account for a large proportion of total research output [22], so it is important to account for the effects of international collaboration on the research output of these countries. It is well established that international collaborations tend to have a positive impact on the number of citations of papers [21,23,24]. The biomedical sciences have a higher percentage of OA publication than other fields of inquiry [7,25], presumably due to funders’ mandates and the utilitarian value of providing free access to biomedical research findings [18], but aside from research focused on specific diseases [16], we are not aware of studies that have evaluated OA across all different fields of study within the biomedical sciences. It seems likely that OA publication rates are not uniformly distributed across fields within biology and medicine, but the nature of this distribution is not known. Similarly, although the geographic distributions of OA repositories and policies are well documented, it is less clear how OA publication itself is globally distributed. In this study, we set out to determine the geographic distribution and topical distribution of OA publication in biomedical research indexed in PubMed.

Methods We used PubMed to identify a set of papers that matched specified search criteria. MEDLINE indexes journals that cover a broad array of topics, so we limited our search to papers in the biological sciences and medicine using MeSH terms and MeSH headings. The exact search term was: ("2015/01/01"[Date - Publication] : "2015/12/31"[Date - Publication]) AND (((Health Care category [mh]) OR (Psychiatry and Psychology category [mh])) OR ("Education"[MeSH Term]) OR ("Biological Science Disciplines"[MeSH Term])). These search criteria were designed to return a large body of literature, but restrict results to work in the biomedical sciences or medical education and exclude work in related fields, such as physics, mathematics, and the humanities (all of which also have MeSH terms). MeSH headings are hierarchical, and PubMed returns all papers that are below a given term in the hierarchy by default [26], so this search returned a very large volume of papers, comprising approximately 63% of all MEDLINE-indexed papers for 2015. We downloaded all of the PubMed IDs returned, then used the Entrez e-utilities to extract MeSH terms, digital object identifier (DOI), and affiliation metadata for each paper. We used Unpaywall [7] to identify the OA status of each paper, using the DOI to identify the paper. Unpaywall comprehensively tracks OA publications by compiling open access status from a wide variety of resources, institutional repositories, and databases. The affiliation strings were split into substrings using regular expressions, and we used a two-tiered approach to identify the country named in the substrings. We first identified countries of affiliation by their names, abbreviations, or major cities named in the affiliations. If this failed to yield a result, we submitted the affiliation substring to the Google Maps Geocoding API [27]. For analysis of world economies and world regions, we used World Bank data from 2015 [28].
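The two-tiered country identification described above can be illustrated with a minimal Python sketch. This is not the code from the project repository: the function names, the semicolon-based splitting rule, and the tiny `KNOWN_PLACES` table are all hypothetical stand-ins, and the geocoding step is represented by an optional caller-supplied function rather than a real API call.

```python
import re

# Hypothetical lookup table for illustration only. A real table would cover
# all countries, their common abbreviations, and many more major cities.
KNOWN_PLACES = {
    "rwanda": "Rwanda",
    "kigali": "Rwanda",
    "united states": "United States",
    "usa": "United States",
    "boston": "United States",
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
    "london": "United Kingdom",
}


def split_affiliations(affiliation_field):
    """Split a raw affiliation field into one substring per institution.

    Assumption: institutions are separated by semicolons.
    """
    parts = re.split(r"\s*;\s*", affiliation_field)
    cleaned = [part.strip(" .") for part in parts]
    return [part for part in cleaned if part]


def lookup_country(substring):
    """Tier 1: match a known country name, abbreviation, or city.

    Comma-separated tokens are scanned right to left, because the country
    usually comes last in an affiliation.
    """
    tokens = [token.strip(" .").lower() for token in substring.split(",")]
    for token in reversed(tokens):
        if token in KNOWN_PLACES:
            return KNOWN_PLACES[token]
    return None


def resolve_country(substring, geocode=None):
    """Try the local lookup first, then fall back to a geocoder (tier 2).

    `geocode` is a hypothetical callable that takes the substring and
    returns a country name or None; it stands in for a geocoding service.
    """
    country = lookup_country(substring)
    if country is None and geocode is not None:
        country = geocode(substring)
    return country


def countries_for_paper(affiliation_field, geocode=None):
    """Return the set of distinct countries found across all affiliations."""
    found = set()
    for substring in split_affiliations(affiliation_field):
        country = resolve_country(substring, geocode)
        if country:
            found.add(country)
    return found
```

Under these assumptions, a paper whose `countries_for_paper` result contains more than one country would count as an international collaboration. Keeping the geocoder as an injected function means the cheap local lookup handles most affiliations, and the external service is consulted only for the remainder.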
For analysis of enriched and depleted MeSH terms, we split terms into individual words and tabulated all instances of each word. Word frequencies were normalized to total word counts. Only words that appeared more than 4,000 times in the full set of words extracted from MeSH terms were considered for this portion of the analysis, and words that were more than 33.33% enriched or depleted in OA papers relative to non-OA papers were retained. All of the data extraction and processing was done with Python, and the code is openly accessible on GitHub at https://github.com/iyandemye/oa_project.
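The word-level enrichment calculation can be sketched as follows. This is an illustrative reading of the description above, not the repository code: the function names are hypothetical, and enrichment is assumed here to mean the relative difference between a word's normalized frequency in OA papers and in non-OA papers.

```python
from collections import Counter


def word_counts(mesh_terms_per_paper):
    """Tabulate individual words across the MeSH terms of a set of papers.

    `mesh_terms_per_paper` is an iterable of lists of MeSH term strings,
    one list per paper. Commas are treated as word separators.
    """
    counts = Counter()
    for terms in mesh_terms_per_paper:
        for term in terms:
            words = term.replace(",", " ").split()
            counts.update(word.lower() for word in words)
    return counts


def enrichment(oa_counts, non_oa_counts, min_total=4000, threshold=1 / 3):
    """Return words enriched or depleted in OA papers by more than `threshold`.

    Assumption: enrichment is (oa_freq - non_oa_freq) / non_oa_freq, where
    each frequency is the word count divided by the total word count of
    its group. Positive values mean enriched, negative mean depleted.
    """
    oa_total = sum(oa_counts.values())
    non_oa_total = sum(non_oa_counts.values())
    results = {}
    for word in set(oa_counts) | set(non_oa_counts):
        oa_n = oa_counts.get(word, 0)
        non_oa_n = non_oa_counts.get(word, 0)
        # Keep only words that appear more than `min_total` times overall.
        if oa_n + non_oa_n <= min_total:
            continue
        # The relative difference is undefined without a non-OA baseline.
        if non_oa_n == 0 or oa_total == 0:
            continue
        oa_freq = oa_n / oa_total
        non_oa_freq = non_oa_n / non_oa_total
        relative = (oa_freq - non_oa_freq) / non_oa_freq
        if abs(relative) > threshold:
            results[word] = relative
    return results
```

The defaults mirror the thresholds stated in the text (more than 4,000 occurrences, more than 33.33% difference); both are exposed as parameters so the sensitivity of the word list to these cutoffs can be checked.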

Discussion This work highlights unappreciated complexities in the geography of OA publication. The percentage of OA publication was highest in low income countries and particularly in sub-Saharan Africa, which has few OA policies and repositories, suggesting that factors in addition to OA policies play a major role in authors’ decisions to publish OA papers. We also observed a consistent effect of international and inter-regional collaboration: papers with authors based in multiple countries or regions had a substantially higher percentage of OA publication than their single-country and single-region counterparts. We hypothesize that authors based in low income countries who routinely struggle to gain access to pay-to-view academic literature are motivated to make their work freely available to other researchers. However, there are other factors at play. First, research in low income countries may disproportionately come from fields that are over-represented in the OA literature and supported by funders with OA policies (such as research on infectious diseases like HIV and malaria). In support of this, other investigators have documented that research into poverty-related diseases has a very high percentage of open access publication [16], and in this study, MeSH terms related to these conditions are enriched in OA papers. Second, OA publication fee waivers offered to authors in low income countries are likely to encourage greater rates of OA publication by these authors. Finally, international collaboration certainly influences OA publication rates, though our own results indicate that collaboration alone cannot explain the high percentage of OA in sub-Saharan Africa and low income countries. It is likely that a combination of these factors influences authors’ decisions to publish OA papers. Subsequent studies could investigate the factors that impact OA publication decisions in low income countries. 
The very low proportion of OA publication in certain areas, particularly the Middle East and North Africa and South Asia, points to geographic regions where additional work could be done to increase OA publication. Both of these regions have comparatively few institutions and funders with OA policies, but other factors may be involved in the low rates of OA publication. Notably, many OA journals only offer full fee waivers for low income countries; it is possible that partial fee waivers are insufficient to incentivize authors from middle income countries to submit their work to OA journals. The moderate percentage of OA publication in Europe and North America could be an indication of the success of OA policies that have been put in place by funders from these regions in recent years. Finally, the strong association between international collaboration and OA publication observed in this study indicates yet another benefit to collaborative research, in addition to other benefits that have been documented in previous work [21,23,24]. Our topical analysis also points to important trends in OA publication. It is concerning that keywords related to nursing, environmental health, and surgery are under-represented in the OA literature. This suggests that publications in these fields are more likely to require a fee for access than publications in other biomedical fields. Low income countries may have lax environmental regulations and high burdens of certain environmental contaminants [30]; there is a large burden of untreated surgical disease in low income countries [31]; and task-shifting increases the importance of nursing care in low income settings [32]. In other words, some of the fields of study that are most applicable and actionable in low income countries have a body of literature that is the least accessible to practitioners and researchers in these countries. There are limitations to the current study.
Not all journals report affiliations in PubMed, and affiliations are not formatted consistently between different journals. Moreover, by focusing on biomedical literature indexed in PubMed and tagged with MeSH terms, this work does not represent all published literature in 2015; rather, it represents a subset of the literature that meets MEDLINE editorial standards [33,34]. To be returned by the search, papers had to be tagged with MeSH terms (therefore they had to be indexed by MEDLINE). This affords certain advantages: it allows for topical analysis of OA publication, and MEDLINE editorial standards substantially reduce the chance that so-called “predatory” journals will be indexed [35]. In addition to the relatively strict MEDLINE editorial criteria, which undoubtedly bias PubMed results, it is well documented that different publication databases have variable representation of literature in different fields or from different countries [36]. Previous work has indicated that a large percentage of authors who publish in “predatory” journals are from Asia and Africa, so publications from these regions are likely to be under-represented in the current study [37]. In spite of these limitations, PubMed is a preferred resource for many biomedical researchers, and the results of this study are indicative of the accessibility of material returned by a PubMed search. Moreover, the results of this study are corroborated by another analysis, conducted using the Web of Science, in which the percentage OA was examined for selected countries in different world regions [38]. This suggests that the trends we observe in this paper may extend beyond the biomedical sciences and the PubMed database, but further research is needed to clarify this point.
The Unpaywall database, which was used to determine OA status, is comprehensive, but there is no single definitive index of all OA publications, so it is also possible that some country- or topic-level bias was introduced at the step of identifying the OA status of each article. By working with publicly accessible data and making the computer code for these analyses freely available on GitHub, we hope to help others replicate and build upon the work we have done here.

Supporting information S1 Table. Total number of papers identified, and percentages OA, for all countries identified in the analysis. https://doi.org/10.1371/journal.pone.0220229.s001 (CSV) S2 Table. Full MeSH term under- and over-representation in OA papers. Only terms with 500 or more counts in the full dataset were considered, and those that were 33.33% more or less enriched are presented here. https://doi.org/10.1371/journal.pone.0220229.s002 (CSV)

Acknowledgments This work was supported by a research grant from the University of Global Health Equity. We would like to thank Dr. Abebe Bekele and other members of the staff and faculty at the University for their support of this work.