Emergence and growth of open access

Over the last 20 years the publishing of scientific peer-reviewed journal articles has gone through a revolution triggered by the technical possibilities offered by the internet. Firstly, electronic publishing has become the dominant distribution channel for scholarly journals. Secondly, the low cost of setting up new electronic journals has enabled both scholars and publishers to experiment with new business models, where anybody with internet access can read the articles ('open access' or OA) and the resources required to operate journals are collected by means other than charging readers. Similarly, increased availability can be achieved by scientists uploading the prepublication versions of their articles published in subscription journals to OA web repositories such as PubMed Central. The majority of publishers now allow some form of archiving in their copyright agreements with authors, sometimes requiring an embargo period. Major research funders such as the National Institutes of Health (NIH) and the Wellcome Trust have started requiring their grantees to publish OA, either in open access journals (gold OA) or in repositories (green OA). A recent study showed that 20.4% of articles published in 2008 were freely available on the web: 8.5% directly in journals and 11.9% as archived copies in some type of repository [1].

In the latter half of the 1990s when journals created by individual scientists were dominating OA publishing, these journals were not considered by most academics a serious alternative to subscription publishing. There were doubts about both the sustainability of the journals and the quality of the peer review. These journals were usually not indexed in the Web of Science, and initially they lacked the prestige that academics need from publishing. Quite often their topics were related to the internet and its possibilities, as exemplified by the Journal of Medical Internet Research, which in 15 years has managed to become a leading journal in its field.

A second wave of OA journals consisted of established subscription journals, mainly owned by societies, whose publishers decided to make the electronic versions of their journals freely accessible. Such journals are particularly important in certain regions of the world, for example Latin America and Japan, where portals such as Scielo and J-stage host hundreds of journals at no cost to the publishers. One of the earliest journals to make its electronic version OA was BMJ, which since 1998 has made its research articles freely available.

The third wave of OA journals was started by two new publishers, BioMedCentral and Public Library of Science (PLoS). They pioneered the use of article processing charges (APCs) as the central means of financing professional publishing of OA journals. Since 2000 the importance of the APC business model for funding OA publishing has grown rapidly. BioMedCentral was purchased in 2008 by Springer, and over the last couple of years almost all leading subscription publishers have started full open access journals funded by APCs. The leading scientific OA journals using the APC model tend to charge between US$2,000 and US$3,000 for publishing, but the average APC across all journals charging APCs listed in the Directory of Open Access Journals was US$900 in 2010 [2]. In many fields the payment of such charges is a substantial barrier to submissions. In a broad survey of authors who had published in scholarly journals, 39% of respondents who had not published in OA journals mentioned problems in funding article-processing fees as a reason [3].

Subscription publishers have also tried an OA option called hybrid journals, where authors can pay a fee (typically around US$3,000) to have the electronic versions of their articles made OA within what is otherwise a subscription journal. The uptake for hybrid journals in general has been very limited, at about 1% to 2% for the major publishers [4].

Does OA threaten to undermine scientific peer review?

The starting point for this study is the claim, often made by publishers and publishers' organizations, that the proliferation of OA would set in motion changes in the publishing system that would seriously undermine the current peer review system and hence the quality of scientific publishing. Suber has written an excellent overview of this discussion [5]. Lobbying using this argument has in particular been directed against government mandates for OA, such as that implemented by the NIH for their grantees. It is claimed that the resulting increase in posting of manuscript copies to OA repositories would lead to wide-scale cancellation of subscriptions, putting traditional publishers, both commercial and society, in jeopardy and in the long run eroding scientific quality control. This scenario is based on the assumption that OA publishers would take over an increasing part of the publishing industry and would not provide the same level of rigorous peer review as traditional subscription publishers, which would result in a decline in the quality of scholarly publishing. The NIH have documented that their mandate has not in fact caused any harm to publishers [6].

The critique has in particular been focused on OA publishers that charge authors APCs. Superficially such publishers would seem to be inclined to accept substandard articles, since their income is linearly dependent on the number of papers they publish. There have in fact been reports of some APC-funded OA publishers with extremely low quality standards [7]. Reports of such cases in the professional press, such as the recent article 'Open access attracts swindlers and idealists' [8] in the Finnish Medical Journal, a journal read by the majority of practicing physicians in Finland, can by the choice of title alone contribute to a negative image of OA publishing. The founding of the Open Access Scholarly Publishers Association, which in particular strives to establish quality standards for OA journals, was in part a reaction by reputable OA publishers to the appearance of such publishers on the market.

One of the questions in the above-mentioned survey of scholarly authors [3] dealt with the 'myths' about open access, including the quality issue. On a Likert scale, researchers in general tended to disagree with the statements 'Open access undermines the system of peer review' and 'Open access publishing leads to an increase in the publication of poor quality research' (results reported in Figure 4; [3]). It thus seems that a majority of scholars, or at least of those who completed this very widely disseminated survey, did not share this negative perception of the quality of OA publishing.

Aim of this study

Scientific quality is a difficult concept to quantify. In general terms, very rigorous peer review procedures should raise the quality of journals by screening out low quality articles and improving manuscripts via the reviewers' comments. In this respect one could assume that the novel peer review procedures used by certain OA journals such as PLoS ONE would lower the quality. However, such journals essentially leave it to the readers to affirm the quality through metrics such as the number of citations per article. In practice the only proxy for quality that is generally accepted and widely available across journals is citation statistics. In the choice of title for this article we have hence consciously avoided the term scientific 'quality' and chosen to use 'impact' instead, which is closely related to citations, as in the impact factor used in Journal Citation Reports.

It has now been 20 years since the emergence of the first OA journals and 10 years since the launch of the first major OA journals funded by APCs. The number of peer-reviewed articles published in OA journals was already around 190,000 in 2009 and growing at the rate of 30% per annum [9]. Roughly half of the articles are published in journals charging APCs [2]. Enough time has also passed for the higher-quality OA journals, and in particular journals that have been OA from their inception, to be indexed by major citation indexes such as the Web of Science and Scopus. In the last few years academic search engines such as Google Scholar have also emerged, but the data generated by these automated searches is too unstructured to be used for a study of the citation counts of large numbers of articles or full journals. In contrast, both the Journal Citation Reports (JCR) and SCOPUS, via the data available on the SCImago portal, provide aggregated data in the form of impact factors, which can be used for comparing OA and subscription journals.

This provides empirical data enabling us to ask meaningful questions such as: 'How frequently are articles published in OA journals cited compared to articles in non-OA journals?' Although the citation level cannot directly be equated to scientific quality, it is widely accepted as a proxy for quality in the academic world, and is the only practical way of getting comprehensive quantitative data concerning the impact of journals and the articles they contain. The aim of this study was thus to compare OA and subscription journals in terms of the average number of citations received both at the journal and article level.
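The distinction between averaging at the journal level and at the article level can be made concrete with a small sketch. The code below is illustrative only: the `Journal` record, the function names and the toy numbers are assumptions introduced here, not data or methods from this study. It shows that an unweighted mean over journals and a pooled mean over articles can differ noticeably when journals vary in size.

```python
from dataclasses import dataclass


@dataclass
class Journal:
    """Hypothetical record; names and numbers are illustrative only."""
    name: str
    articles: int   # articles published in the citation window
    citations: int  # citations received by those articles


def journal_level_mean(journals):
    """Unweighted mean of each journal's citations-per-article ratio."""
    ratios = [j.citations / j.articles for j in journals]
    return sum(ratios) / len(ratios)


def article_level_mean(journals):
    """Mean citations per article, pooling all articles across journals."""
    total_citations = sum(j.citations for j in journals)
    total_articles = sum(j.articles for j in journals)
    return total_citations / total_articles


# Toy input (assumed values): one large, well-cited journal and one small one.
toy = [
    Journal("Large journal (hypothetical)", articles=1000, citations=3000),
    Journal("Small journal (hypothetical)", articles=50, citations=50),
]

print(journal_level_mean(toy))  # each journal counts equally
print(article_level_mean(toy))  # each article counts equally
```

In this toy case the journal-level figure treats the small journal as half of the sample, whereas the article-level figure is dominated by the large journal, which is why the two levels are worth reporting separately.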

Earlier studies

Over the past 10 years there have been numerous studies reporting that scientific articles that are freely available on the internet are cited more frequently than articles only available to subscribers (for overviews see Swan [10] and Wagner [11]). Most of these studies have been conducted by comparing articles in subscription journals where some authors have made their articles freely available in archives. Gargouri et al. [12] found a clear citation advantage of the same size both for articles where the author's institution mandated OA and for articles archived voluntarily. They also found that the citation advantage was proportionally larger for highly cited articles. Some authors claim that when factors such as authors selecting their better work for OA dissemination are eliminated, the advantage, at least concerning citations in Web of Science journals, is low or even non-existent. Evans and Reimer, using extensive Web of Science data, report an overall global effect of 8% more citations, but with a clearly higher level of around 20% for developing countries [13]. Davis, in a randomized trial involving 36 mainly US-based journals, found no citation effect but a positive effect on downloads [14]. His study was, however, limited to high-impact journals with wide subscription bases.

Assuming that there is some level of citation advantage, this would mean that the articles published in full OA journals would receive an additional citation advantage, beyond their intrinsic quality, from their availability. In practice it would, however, be very difficult to separate out the effects of these two underlying factors. A share of the articles in subscription journals (approximately 15%) also benefit from increased citations due to the existence of freely available archival copies, as noted for instance by Gargouri et al. [12]. If there were a consensus on the size of the citation advantage for being freely available, it would be possible to correct for this effect. Since the estimates of this factor vary so much across studies, we are hesitant to attempt such a correction.

However, we do not necessarily need to take this factor into account explicitly when assessing the quality level of the global OA journal corpus. If articles in OA journals on average receive as many citations as articles in subscription journals, then their overall scientific impact (as measured by getting cited) is also equal. OA is just one of several factors influencing the citation levels of particular journals, others being the prestige of the journals, the interest of the topics of the articles, the quality of the layout for easy reading, timeliness of publication and so on.

Journals that were launched as OA by relatively new publishers such as PLoS or BMC have disadvantages in other respects. They lack the established reputation of publishers that have been in business for decades. The reputation of these journals is also hindered by a large, though shrinking, number of researchers who believe that electronic-only OA journals are somehow inferior to their more established subscription counterparts. In this study we will therefore make no attempt to look separately at the citation effect of OA, due to the complexity of the issue and the lack of a reliable estimate of the effect.

There are a few previous studies that have tried to determine the overall quality of OA journal publishing as compared to traditional subscription publishing. McVeigh studied the characteristics of the 239 OA journals included in the 2003 Journal Citation Reports [15]. Her report contains very illustrative figures showing the positions of these journals in the ranking distribution within their respective scientific disciplines. Overall, OA journals were represented more heavily among the lower-ranking journals, but there were also 14 OA journals in the top 10% in their disciplines. She also mentions that 22,095 articles were published in these OA journals in 2003. In considering the results from this early study it is important to bear in mind the highly skewed regional and age distributions of the journals in question. Only 43% of the OA journals were published in North America or Western Europe, and the vast majority of the journals were old established journals that had recently decided to make their electronic content openly available.

Giglia [16] set out to duplicate the McVeigh study to the extent possible. Giglia was now able to rely solely on the DOAJ index for information about which journals were OA and identified 385 titles to study, using JCR from 2008 as the starting point. Giglia studied the distribution of titles in different percentiles of rank in their discipline using the same breakdown as McVeigh. All in all the results were not much different from the earlier study. Giglia found that 38% of the 355 OA journals in Science Citation Index and 54% of the 30 OA journals in Social Science Citation Index were in the top half of the rankings in JCR.

Miguel et al. [17] focused on studying how well represented gold and green OA journals were in citation indexes. They were able to combine DOAJ data with data from the SCOPUS citation database, which covers more journals than JCR, and could also use the average citation counts from the SCImago database. The results highlighted how OA journals have achieved a share of around 15% of all SCOPUS indexed journals for Asia and Africa and a remarkable 73% for Latin America. Of particular interest for this study was that some of the figures in the article showed the average number of citations per document in a 2-year window (calculated over journals) for particular journal categories. Thus the overall average number of citations was around 0.8 for OA journals, 1.6 for subscription journals allowing green posting and 0.8 for subscription journals not allowing green posting. They found highly differentiated average citation levels for nine different broad disciplines. They also found very clear differences in the citation levels between regions, with North American and European OA journals performing at a much higher level than journals from other parts of the world. In both the disciplinary and regional breakdowns the non-OA journals followed the same patterns, so that the performance of OA journals relative to non-OA journals was fairly stable.
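For readers unfamiliar with the metric, the 2-year citations-per-document figure and its average over a journal category can be written out as follows. This is a sketch in assumed notation, following the usual convention for such indicators rather than a formula quoted from Miguel et al.: for journal j and census year y, C_j(y, t) denotes the citations received in year y by documents that j published in year t, N_j(t) the number of documents j published in year t, and G a journal category (for example, all OA journals).

```latex
\[
  \mathrm{CPD}_j(y) \;=\;
  \frac{C_j(y,\,y-1) + C_j(y,\,y-2)}{N_j(y-1) + N_j(y-2)},
  \qquad
  \overline{\mathrm{CPD}}_G(y) \;=\;
  \frac{1}{|G|} \sum_{j \in G} \mathrm{CPD}_j(y)
\]
```

The second expression is an unweighted mean over journals, which corresponds to the phrase 'calculated over journals': each journal contributes equally regardless of how many documents it publishes.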