In this article, we summarize the ebb and flow of the debate around the various and complex aspects of conventional (editorially-controlled) peer review. In particular, we highlight how innovative systems are attempting to resolve the major issues associated with traditional models, explore how new platforms could improve the process in the future, and consider what this means for the identity, role, and purpose of peer review within diverse research communities. The aim of this discussion is not to undermine any specific model of peer review in a quest for systemic upheaval, or to advocate any particular alternative model. Rather, we acknowledge that the idea of peer review is critical for research and advancing our knowledge, and as such we provide a foundation here for future exploration and creativity in diversifying and improving an essential component of scholarly communication.

In spite of all of these criticisms, it remains clear that the ideal of peer review still plays a fundamental role in scholarly communication ( Goodman et al. , 1994 ; Pierie et al. , 1996 ; Ware, 2008 ) and retains a high level of respect from the research community ( Bedeian, 2003 ; Greaves et al. , 2006 ; Gibson et al. , 2008 ). One primary reason why peer review has persisted is that it remains a unique way of assigning credit and differentiating research publications from other types of literature, including blogs, media articles, and books. This perception, combined with a general lack of awareness of the historical context of peer review and of research examining its potential flaws, as well as the conflation of the process with the ideology behind it, has sustained its ubiquitous usage and continued proliferation in academia. This has led to the widely-held perception that peer review is a singular and static process, and to its acceptance as a social norm. It is difficult to move away from a process that has now become so deeply embedded within oligarchic research institutes. The consequence of this is that, irrespective of any systemic flaws, peer review remains one of the essential pillars of trust when it comes to scientific communication ( Haider & Åström, 2017 ).

On top of all of these potential issues, some critics go even further, arguing that, at its worst, peer review can be detrimental to research. By operating as a closed system, it protects the status quo and suppresses research viewed as radical, innovative, or contrary to the theoretical perspectives of referees ( Alvesson & Sandberg, 2014 ; Benda & Engels, 2011 ; Horrobin, 1990 ; Mahoney, 1977 ; Merton, 1968 ), even though it is precisely these qualities that underpin and advance research. As a consequence, questions arise about the competency and integrity of traditional peer review, such as: who are the gatekeepers and how are their gates constructed; what is the balance of tensions between authors, reviewers, and editors; what inherent biases are associated with this; does this enable a fair or a structurally skewed system of peer review to exist; and what are the repercussions of this for our knowledge generation and communication systems?

Such a discrepancy between a dynamic history and remembered consistency could be a consequence of peer review processes being central to both scholarly identity as a whole and to the identity and boundaries of specific communities ( Moore et al. , 2017 ). Indeed, this story linking identity to peer review is taught to junior researchers as a community norm, often without the much-needed historical context. More work on how peer review, alongside other community practices, contributes to community building and sustainability would be valuable. Examining criticisms of conventional peer review and proposals for change through the lens of community formation and identity may be a productive avenue for future research.

As mentioned above, there is an increasing quantity and quality of research that examines how publication processes, selection, and peer review evolved from the 17th to the early 20th century, and how this relates to broader social patterns ( Baldwin, 2017a ; Baldwin, 2017b ; Moxham & Fyfe, 2016 ). However, there is much less research critically exploring the diversity of selection and peer review processes in the mid- to late-20th century. Indeed, there seems to be a remarkable discrepancy between the historical work we do have ( Baldwin, 2017a ; Gupta, 2016 ; Shuttleworth & Charnley, 2016 ) and apparent community views that “we have always done it this way,” alongside what sometimes feels like a wilful effort to ignore the current diversity of practice.

In many cases, there is an attempt to link the goals of peer review processes with Mertonian norms ( Lee et al. , 2013 ; Merton, 1973 ) (i.e., universalism, communalism, disinterestedness, and organized skepticism) as a way of showing their relation to shared community values. The Mertonian norm of organized skepticism is the most obvious link, while the norm of disinterestedness can be linked to efforts to reduce systemic bias, and the norm of communalism to the expectation of contribution to peer review as part of community membership (i.e., duty). In contrast to the emphasis on supposedly shared social values, relatively little attention has been paid to the diversity of peer review processes across journals, disciplines, and time. This is especially the case as the (scientific) scholarly community appears overall to have a strong investment in a “creation myth” that links the beginning of scholarly publishing—the founding of The Philosophical Transactions of the Royal Society —to the invention of peer review. The two are often regarded as necessarily coupled, largely ignoring the complex and interwoven history of peer review and publishing. This has consequences, as an individual’s identity as a scholar is strongly tied to specific forms of publication that are evaluated in particular ways ( Moore et al. , 2017 ). A scholar’s first research article, PhD thesis, or first book are significant life events. Membership of a community, therefore, is validated by the peers who review this newly contributed work. Community investment in the idea that these processes have “always been followed” appears very strong, but ultimately remains a fallacy.

Due to the increasingly systematic use of external peer review, its processes have become entwined with the core activities of scholarly communication. Without approval through peer review, which assesses importance, validity, and journal suitability, research articles will not be sent to print. The historical motivation for selecting amongst submitted articles for distribution was primarily economic. With scholarly publishing turning into an essentially loss-making business, the costs of printing and paper needed to be limited ( Fyfe, 2015 ). The rising number of submissions, particularly in the 20th century, required the management of this selection process to be distributed. While in the digital world the costs of dissemination have dropped, the marginal cost of publishing articles is far from zero (e.g., due to time and management, hosting, marketing, technical and ethical checks, among other services). The economic motivations for still imposing selectivity in a digital environment, and for applying peer review as a mechanism for this, have received limited attention or questioning, and the practice is often regarded as simply how things are done. Selectivity is now often attributed to quality control, but is based on the false assumption that peer review requires careful selection of specific reviewers to assure a definitive level of adequate quality, termed the “Fallacy of Misplaced Focus” by Kelty et al. (2008) .

More recently, there has been a new wave of innovation in peer review, which we term “the revolution” phase ( Figure 2 ; note that this is a non-exhaustive overview of the peer review landscape). The pace of this is accelerating rapidly, with the majority of changes occurring in the last five to ten years. This could be related to initiatives such as the San Francisco Declaration on Research Assessment ( ascb.org/dora/ ; DORA), which called for systemic changes in the way that scientific research outputs are evaluated. Digital-born journals, such as PLOS ONE , introduced commenting on published papers. This spurred developments in cross-publisher annotation platforms like PubPeer and PaperHive . Some journals, such as F1000 Research and The Winnower , rely exclusively on a model where peer review is conducted after the manuscripts are made publicly available. Other services, such as Publons , enable reviewers to claim recognition for their activities as referees. Platforms such as ScienceOpen provide a search engine combined with peer review across publishers on all documents, regardless of whether manuscripts have been previously reviewed. Each of these innovations has partial parallels to other social Web applications or platforms in terms of transparency, reputation, performance assessment, and community engagement. It remains to be seen whether these innovations and new models of evaluation will become more popular than traditional peer review.

1.1.3 The peer review revolution. In the last several decades, there have been substantial efforts to decouple peer review from the publishing process ( Figure 2 ; Schmidt & Görögh (2017) ). This has typically been done either by adopting peer review as an overlay process on top of formally published research articles, or by pursuing a “publish first, filter later” protocol, with peer review taking place after the initial publication of research results ( McKiernan et al. , 2016 ; Moed, 2007 ). Here, the meaning of “publication” becomes “making public”, as opposed to the traditional sense, in which it also implies having been peer reviewed. In fields such as Physics and Mathematics, it has traditionally been commonplace for authors to send their colleagues either paper or electronic copies of their manuscripts for pre-submission evaluation. Launched in 1991, arXiv ( arxiv.org ) formalized this process by creating a central network for whole communities to access such e-prints. Today, arXiv has more than one million e-prints from various research fields and receives more than 8,000 monthly submissions ( arXiv, 2017 ). Here, e-prints or pre-prints are not formally peer reviewed prior to publication, but still undergo a certain degree of moderation in order to filter out non-scientific content. This practice represents a significant shift, as public dissemination was decoupled from a traditional peer review process, resulting in increased visibility and citation rates ( Davis & Fromerth, 2007 ; Moed, 2007 ). The launch of Open Journal Systems ( openjournalsystems.com ; OJS) in 2001 offered a step towards bringing journals and peer review back to their community-led roots. As of 2015, the OJS platform provided the technical infrastructure and editorial and peer review workflow management support to more than 10,000 journals ( Public Knowledge Project, 2016 ). Its exceptionally low cost was perhaps responsible for around half of these journals appearing in the developing world ( Edgar & Willinsky, 2010 ).

This editor-led process of peer review became increasingly important in the post-World War II decades, due to the development of a modern academic prestige economy based on the perception of quality or excellence and symbolism surrounding journal-based publications ( Baldwin, 2017a ; Fyfe et al. , 2017 ). The increasing professionalization of academies enabled commercial publishers to use peer review as a way of legitimizing their journals ( Baldwin, 2015 ; Fyfe et al. , 2017 ), and capitalized on the traditional perception of peer review as a voluntary duty performed by academics. A consequence of this was that peer review became a more homogenized process that enabled private publishing companies to establish a dominant, oligarchic marketplace position ( Larivière et al. , 2015 ). This represented a shift from peer review as a more synergistic activity between academics, to commercial entities selling it as an added value service back to the same academic community that was performing it freely for them. The estimated cost of peer review is a minimum of $1.9bn USD per year (in 2008; Research Information Network, 2008 ), representing a substantial vested financial interest in maintaining the current process of peer review ( Smith, 2010 ). This figure does not even include the time spent by typically unpaid reviewers, or account for overhead costs in publisher management or the wasteful redundancy of the reject-resubmit cycle authors enter when chasing journal prestige ( Jubb, 2016 ). The result of this is that peer review has now become enormously complicated. By allowing the process of peer review to become managed by a hyper-competitive industry, developments in scholarly publishing have become strongly coupled to the transforming nature of academic research institutes. These have evolved into internationally competitive businesses that strive for quality through publisher-mediated journals by attempting to align these products with the academic ideal of research excellence ( Moore et al. , 2017 ). This is plausibly related to, or even a consequence of, broader shifts towards a more competitive neoliberal academia and society at large. Here, emphasis is largely placed on production and standing, value, or utility ( Gupta, 2016 ), as opposed to the original primary focus of research on discovery and novel results.

1.1.2 Adaptation through commercialisation. Through time, the diversity, quantity, and specialization of the material presented to journal editors increased. This made it necessary to seek assistance outside the immediate group of knowledgeable reviewers from the journals’ sponsoring societies ( Burnham, 1990 ). Peer review evolved to become a largely outsourced process, which still persists in modern scholarly publishing today, where publishers call upon external specialists to validate journal submissions. The current system of peer review only became widespread in the mid-20th century (and in some disciplines, the late 20th century or early 21st; see Graf, 2014 , for an example of a major philological journal which began systematic peer review in 2011). Nature , now considered a top journal, did not implement such a formal peer review process until 1967 ( nature.com/nature/history/timeline_1960s.html ).

From these early developments, the process of independent review of scientific reports by acknowledged experts gradually emerged. However, the review process was more similar to that of non-scholarly publishing, as the editors were the only ones to appraise manuscripts before printing ( Burnham, 1990 ). As early as 1731, the Royal Society of Edinburgh adopted a formal peer review process in which materials submitted for publication in Medical Essays and Observations were vetted and evaluated by additional knowledgeable members ( Kronick, 1990 ; Spier, 2002 ). In 1752, the United Kingdom’s Royal Society created a “Committee on Papers” to review and select texts for publication in Philosophical Transactions ( Fitzpatrick, 2011b , Chapter One). The primary purpose of this process was to select material for publication in order to account for the limited distribution capacity, and this remained the principal purpose of peer review for more than two centuries.

1.1.1 The early history of peer review. The origins of scholarly peer review of research articles are commonly associated with the formation of national academies in 17th-century Europe, although some have found foreshadowing of the practice ( Al-Rahawi, c900 ; Spier, 2002 ). We call this period the primordial time of peer review ( Figure 1 ). Biagioli (2002) described in detail the gradual differentiation of peer review from book censorship, and the role that state licensing and censorship systems played in 16th-century Europe; a period when monographs were the primary mode of communication. Several years after the Royal Society of London (1660) was established, it created its own in-house journal, Philosophical Transactions ; around the same time, Denis de Sallo published the first issue of Journal des Sçavans . Both of these journals were first published in 1665. In London, Henry Oldenburg was appointed Secretary to the Royal Society and became the founding editor of Philosophical Transactions . Here, he took on the role of gathering, reporting, critiquing, and editing the work of others, as well as initiating the process of peer review as it is now commonly performed ( Manten, 1980 ; Oldenburg, 1665 ). Due to this origin, peer review emerged as part of the social practices of gentlemanly learned societies. These social practices also included organizing meetings and arranging the publications of society members, while being responsible for editorial curation, financial protection, and the assignment of individual prestige ( Moxham & Fyfe, 2016 ). The development of these prototypical scientific journals gradually replaced the exchange of experimental reports and findings through correspondence, formalizing a process that had been essentially personal and informal until then. “Peer review”, during this time, was more of a civil, collegial discussion in the form of letters between authors and the publication editors ( Baldwin, 2017b ). Social pressures to generate new audiences for research, as well as new technological developments such as the steam-powered press, were also crucial. The development of peer reviewed journals thus became part of a process to deliver research to both generalist and specialist audiences, to improve the status of societies, and to fulfil their scholarly missions ( Shuttleworth & Charnley, 2016 ).

Any discussion on innovations in peer review must take into account its historical context. By understanding the history of scholarly publishing and the interwoven evolution of peer review, we recognize that neither is a static entity, but that in fact they covary with each other, and therefore should be treated as such. By learning from historical experiences, we can also become more aware of how to shape future directions of peer review evolution and gain insight into what the process should look like in an optimal world. The term “peer review” itself only appeared in the scientific press in the 1960s. Even in the 1970s, it was associated with grant review and not with evaluation and selection for publishing ( Baldwin, 2017a ). However, the history of evaluation and selection processes for publication clearly predates the 1970s.

The goal of this article is to investigate the historical evolution in the theory and application of peer review in a socio-technological context. We use this as the basis to consider how specific traits of consumer social Web platforms can be combined to create an optimized hybrid peer review model that is more efficient, democratic, and accountable than the traditional process.

Traditionally, the function of peer review has been as a vetting procedure or gatekeeper to assist the distribution of limited resources—for instance, space in peer reviewed print publication venues, research time at specialized research facilities, or competitive research funds. Nowadays, it is also used to assess whether and how a given piece of research fits into the overall body of existing scholarly knowledge, and which journal it is suitable for and should appear in. This has consequences for whether the body of published research produced by an individual merits consideration for a more advanced position within academic or industrial research. With the advent of the Internet, the physical constraints on distribution are no longer present, and, at least in theory, we are now able to disseminate research content rapidly and at relatively negligible cost ( Moore et al. , 2017 ). This has led to the increasing popularity of digital-only publication venues that vet submissions based on the soundness of the research (e.g., PLOS , PeerJ ). Such flexibility in the filtering function reduces, but does not eliminate, the role of peer review as a selective gatekeeper. Due to such innovations, ongoing discussions about peer review are intimately linked to contemporaneous developments in Open Access (OA) publishing and to broader changes in open research ( Tennant et al. , 2016 ).

Peer review is the process in which experts are invited to assess the quality, novelty, validity, and potential impact of research by others, typically while it is in the form of a manuscript for an article, conference, or book ( Spier, 2002 ). For the purposes of this article, we are exclusively addressing peer review in the context of manuscripts for research articles, unless specifically indicated; different forms of peer review are used in other contexts such as hiring, promotion, tenure, or awarding research grants (see, e.g., Fitzpatrick, 2011b , p. 16). Peer review comes in various flavors that result from different approaches to the relative timing of the review (with respect to article drafting, submission, or publication) and the transparency of the process (what is known to whom about submissions, authors, reviewers and reviews) ( Ross-Hellauer, 2017 ). The criteria used for evaluation, including methodological soundness or expected impact, are also important variables to consider. In spite of the diversity of the process, it is generally perceived by researchers and the wider public alike as the gold standard that defines scholarly publishing, and is often deemed the primary determinant of scientific, theoretical, and empirical validity ( Kronick, 1990 ). Consequently, peer review is a vital component at the core of research communication processes, with repercussions for the very structure of academia, which largely operates through a peer reviewed publication-based reward and incentive system ( Moore et al. , 2017 ). However, peer review is applied inconsistently both in theory and practice ( Pontille & Torny, 2015 ), and generally lacks any form of transparency or formal standardization. As such, it remains difficult to know what we actually mean when we identify something as a “peer reviewed publication.”

Coupled with the demise of services such as Axios Review , the generally low uptake of decoupled peer review processes suggests an overall reluctance of many research communities to move outside of the traditional coupled model. In this section, we have discussed a range of different arguments, variably successful platforms, and surveys and reports about peer review. Taken together, these reveal an enormous amount of friction against experimenting with peer review beyond that which is typically, and incorrectly, viewed as the only way of doing it. This reluctance is emphasized in recent surveys; for instance, that by Ross-Hellauer (2017) suggests that while attitudes towards the principles of OPR are rapidly becoming more positive, faith in its execution is not. We can perhaps expect this divergence given the rapid pace of innovation, which has not yet been accompanied by rigorous or longitudinal evidence that these models are superior to the traditional process at either a population or system-wide level. Cultural or social inertia, then, is defined by this cycle between low uptake and limited incentives and evidence. Perhaps more important is the general under-appreciation of the intimate relationship between social and technological barriers, an appreciation that is undoubtedly required to overcome this cycle. The proliferation of social media over the last decade provides excellent examples of how digital communities can leverage new technologies to great effect.

While several new overlay journals are currently thriving, their history of success is invariably limited, and most journals that experimented with the model returned to their traditional coupled roots ( Priem & Hemminger, 2012 ). Axios Review was closed down in early 2017 due to a lack of uptake from researchers, with the founder stating: “I blame the lack of uptake on a deep inertia in the researcher community in adopting new workflows” ( Davis, 2017 ). Finally, it is probably worth mentioning that not a single overlay journal appears to have emerged outside of physics and math ( Priem & Hemminger, 2012 ). This is despite the fast growth of arXiv spin-offs like bioRxiv , and potential layered peer review through services such as ScienceOpen or the recently launched Peer Community In ( peercommunityin.org ).

2.5.4 Limitations of decoupled peer review. Despite a general appeal for post-publication peer review and considerable innovation in this field, the appetite among researchers is limited, reflecting an overall lack of engagement with the process (e.g., Nature (2010) ). As recently as 2012, it was reported that relatively few platforms allowed users to evaluate manuscripts post-publication ( Yarkoni, 2012 ). Even platforms such as PLOS have a restricted scope and limited user base: analysis of publicly available usage statistics indicates that, at the time of writing, PLOS articles have each received an average of 0.06 ratings and 0.15 comments (see also Ware (2011) ). Part of this may be due to how post-publication peer review is perceived culturally, with the name itself being considered anathema, or even an oxymoron, as most researchers usually consider a published article to be one that has already undergone formal peer review. At present, it is clear that while there are numerous platforms providing decoupled peer review services, these are largely non-interoperable. The result of this, especially for post-publication services, is that most evaluations are difficult to discover, lost, or rarely available in an appropriate context or platform for re-use. To date, little effort seems to have been focused on aggregating the content of these services, which hinders the recognition of such evaluations as a valuable community process and limits their use in additional evaluation or assessment decisions.

Endorsements and recommendations are a form of peer review that can facilitate re-use of published works. This has been most evident in the Open Educational Resources (OER) movement, in which peer review and testimonials on Open Education repositories, such as Merlot , form a way to filter the many resources available. Peer review, including recommendations, has been effectively utilized in the creation and sharing of Open Textbooks. Petrides et al. (2011) and Harley et al. (2010) found that proof of peer review by trusted experts was a significant factor leading to adoption of textbooks by instructors who expressed concern about the quality of a free textbook. Some OER reviewers are even paid for their reviews ( Open Access Textbook Task Force, 2010 ), while other reviews are performed by volunteer editors and the users of the resources ( info.merlot.org/merlothelp/merlot_peer_review_information.htm ).

2.5.3 Peer Review by Endorsement. A relatively new mode of named pre-publication review is that of pre-arranged and invited review, originally proposed as author-guided peer review ( Perakakis et al. , 2010 ), which ScienceOpen terms Peer Review by Endorsement (PRE) ( about.scienceopen.com/peerreview-by-endorsement-pre/ ). This has also been implemented at RIO , and is functionally similar to the Contributed Submissions of PNAS ( pnas.org/site/authors/editorialpolicies.xhtml#contributed ). This model requires an author to solicit reviews from their peers prior to submission in order to assess the suitability of a manuscript for publication. While some might see this as a potential source of bias, it is worth bearing in mind that many journals already ask authors whom they want to review their papers, or whom they wish to exclude. To avoid potential pre-submission bias, reviewer identities and their endorsements are made publicly available alongside manuscripts, which also prevents potentially deleterious editorial criteria from inhibiting the publication of research. Proponents also argue that PRE is a cheaper, faster, and more efficient alternative to the traditional publisher-mediated method, without being any less legitimate or any more biased. In theory, depending on the state of the manuscript, this means that submissions can be published much more rapidly, as less processing is required. PRE also has the potential advantage of being more useful to non-native English speaking authors by allowing them to work with editors and reviewers in their first languages.

2.5.2 Two-stage peer review and Registered Reports. Registered Reports represent a significant departure from conventional peer review in terms of relative timing and increased rigour ( Chambers et al. , 2014 ; Chambers et al. , 2017 ; Nosek & Lakens, 2014 ). Here, peer review is split into two stages. Research questions and methodology (i.e., the study design itself) are subject to a first round of evaluation prior to any data collection or analysis taking place ( Figure 4 ). If a protocol is found to be of sufficient quality to pass this stage, the study is then provisionally accepted for publication. Once the research has been completed and written up, completed manuscripts are then subject to a second stage of peer review which, in addition to affirming the soundness of the results, also confirms that data collection and analysis occurred in accordance with the originally described methodology. The format, originally introduced by the psychology journals Cortex and Perspectives in Psychological Science in 2013, is now used in some form by more than 40 journals ( Nature Human Behaviour, 2017 ). Registered Reports are designed to boost research integrity by ensuring the publication of all research results, which helps reduce publication bias. In contrast to the traditional model of publication, where “positive” results are more likely to be published, here the results remain unknown at the time of the initial review, and therefore “negative” results are just as likely to be published. Such a process is designed to incentivize data-sharing, to guard against dubious practices such as selective reporting of results (via so-called “p-hacking” and “HARKing”—Hypothesizing After the Results are Known) and low statistical power, and to prioritize accurate reporting over what is perceived to be of higher impact or more worthy of publication.

A similar approach to that of overlay journals is being developed by PubPub ( pubpub.org ), which allows authors to self-publish their work. PubPub then provides a mechanism for creating overlay journals that can draw from and curate the content hosted on the platform itself. This model incorporates the pre-print server and final article publishing into one contained system. EPISCIENCES is another platform that facilitates the creation of peer reviewed journals, with their content hosted on digital repositories ( Berthaud et al. , 2014 ). ScienceOpen provides editorially-managed collections of articles drawn from pre-prints and a combination of open access and non-open venues (e.g., scienceopen.com/collection/Science20 ). Editors compile articles to form a collection, write an editorial, and can invite referees to peer review the articles. This process is mediated by ORCID for quality control, and by CrossRef and Creative Commons licensing for appropriate recognition. Such collections are essentially equivalent to community-mediated overlay journals, with the difference that they also draw on additional sources beyond pre-prints.

2.5.1 Pre-prints and overlay journals. In fields such as mathematics, astrophysics, or cosmology, research communities already commonly publish their work on arXiv ( Larivière et al. , 2014 ). To date, this platform has accumulated more than one million research documents – pre-prints or e-prints – and currently receives 8,000 submissions a month with no costs to authors. arXiv also sparked innovation for a number of communication and validation tools within restricted communities, although these seem to be largely local, non-interoperable, and do not appear to have disrupted the traditional scholarly publishing process to any great extent ( Marra, 2017 ). In other fields, the uptake of pre-prints has been relatively slow, although it is gaining momentum with the development of platforms such as bioRxiv and several newly established ones through the Center for Open Science , including engrXiv ( engrXiv.org ) and psyarXiv ( psyarxiv.com ), and social movements such as ASAPBio ( asapbio.org ). Manuscripts submitted to these pre-print servers are typically draft versions posted prior to formal submission to a journal for peer review. The primary motivation for this is the lengthy time taken for peer review and formal publication; a consequence is that peer review then occurs only after the manuscripts have been made public. However, sometimes these articles are not submitted anywhere else and form what some regard as grey literature ( Luzi, 2000 ). Papers on digital repositories are cited on a daily basis and much research builds upon them, although they may suffer from a stigma of not having the scientific stamp of approval of peer review ( Adam, 2010 ). Some journal policies explicitly attempt to limit their citation in peer-reviewed publications (e.g., Nature nature.com/nature/authors/gta/#a5.4 and Cell cell.com/cell/authors ), and recently the scholarly publishing sector even attempted to discredit their recognition as valuable publications ( asapbio.org/faseb ). In spite of this, the popularity and success of pre-prints is attested by their citation records, with four of the top five venues in physics and maths being arXiv sub-sections ( scholar.google.com/citations?view_op=top_venues&hl=en&vq=phy ). Similarly, the single most highly cited venue in economics is the NBER Working Papers server ( scholar.google.com/citations?view_op=top_venues&hl=en&vq=bus_economics ), according to the Google Scholar h5-index.

LIBRE ( openscholar.org.uk/libre ) is a free, multidisciplinary, digital article repository for formal publication and community-based evaluation. Reviewers’ assessments, citation indices, community ratings, and usage statistics are used by LIBRE to calculate multiparametric performance metrics. At any time, authors can upload an improved version of their article or decide to send it to an academic journal. Launched in 2013, LIBRE was subsequently combined with the Self-Journal of Science ( sjscience.org ) under the shared heading of Open Scholar ( openscholar.org.uk ). One of the tools that Open Scholar offers is a peer review module for integration with institutional repositories, which is designed to bring research evaluation back into the hands of research communities themselves ( openscholar.org.uk/open-peer-review-module-for-repositories ). Academic Karma is another new service that facilitates peer review of pre-prints from a range of sources ( academickarma.org/ ).
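To make the notion of a multiparametric performance metric more concrete, the following minimal Python sketch shows how several normalized signals might be folded into a single score. This is a hypothetical illustration only, not LIBRE’s actual algorithm; the weights, rating scales, and caps are all assumptions made for the example.

```python
# Hypothetical sketch: combining normalized evaluation signals into one score.
# All weights, scales, and caps below are illustrative assumptions.

def performance_score(reviewer_ratings, citations, community_rating, downloads,
                      weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted combination of signals, each first normalized into [0, 1]."""
    signals = (
        sum(reviewer_ratings) / (len(reviewer_ratings) or 1) / 5.0,  # mean of 1-5 referee ratings
        min(citations / 100.0, 1.0),      # citation count, capped at an arbitrary threshold
        community_rating / 5.0,           # mean community rating on a 1-5 scale
        min(downloads / 1000.0, 1.0),     # usage statistics, capped at an arbitrary threshold
    )
    return sum(w * s for w, s in zip(weights, signals))

# Example: three referee ratings, 42 citations, a 4.2 community rating, 350 downloads.
print(round(performance_score([4, 5, 3], 42, 4.2, 350), 3))
```

The design choice such a metric forces into the open is the weighting itself: any platform combining heterogeneous signals must decide, explicitly, how much a referee assessment counts relative to usage or citations.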

Initiatives such as the Peerage of Science ( peerageofscience.org ), RUBRIQ ( rubriq.com ), and Axios Review ( axiosreview.org ; closed in 2017) have implemented a decoupled model of peer review. These tools operate on the same core principles as traditional peer review, but authors submit their manuscripts to the platforms first, rather than to journals. The platforms provide the referees, either via subject-specific editors or via self-managed agreements. After the referees have provided their comments and the manuscript has been improved, the platform forwards the manuscript and the referee reports to a journal. Some journal policies accept the platform reviews as if they came from the journal’s own pool of reviewers, while others still require the journal’s handling editor to look for additional reviewers. While these services usually charge authors a fee, the costs can sometimes be deducted from any publication fees once the article has been published. Journals accept deduction of these costs because they benefit by receiving manuscripts that have already been assessed for journal fit and have been through a round of revisions, thereby reducing their workload. A consortium of publishers and commercial vendors recently established the Manuscript Exchange Common Approach (MECA; manuscriptexchange.org ) as a form of portable review in order to cut down inefficiency and redundancy. Yet, it is still at too early a stage to comment on its viability.

One proposal to transform scholarly publishing is to decouple the concept of the journal and its functions (e.g., archiving, registration and dissemination) from peer review and the certification that this provides. Some even hail this decoupling process as the “paradigm shift” that scholarly publishing needs ( Priem & Hemminger, 2012 ). Some publishers, journals, and platforms are now more adventurously exploring peer review that occurs after publication ( Figure 3 ). Here, the principle is that all research deserves the opportunity to be published (usually pending some form of initial editorial selectivity), and that filtering through peer review occurs subsequently to the actual communication of research articles (i.e., a publish then filter process). This is often termed “post-publication peer review”, a potentially confusing label given the ambiguity of what constitutes “publication” in the digital age, and given that the process may be applied to manuscripts that have or have not previously been peer reviewed ( blogs.openaire.eu/?p=1205 ). Numerous venues now provide inbuilt systems for post-publication peer review, including RIO , PubPub , ScienceOpen , The Winnower , and F1000 Research . In addition to the systems adopted by journals, other post-publication annotation and commenting services exist independently of any specific journal or publisher and operate across platforms, such as hypothes.is , PaperHive , and PubPeer .

Applying a single, blanket policy regarding anonymity would greatly degrade the ability of science to move forward, especially without the flexibility to manage exceptions. The reasons to avoid one definitive policy are the inherent complexity of peer review systems, the interplay with different cultural aspects within the various sub-sectors of research, and the difficulty in identifying whether anonymous or identified work is objectively better. As a general overview of the current peer review ecosystem, Nobarany & Booth (2017) recently recommended that, due to this inherent diversity, peer review policies and support systems should remain flexible and customizable to suit the needs of different research communities. We expect that, by emphasizing both the differences and the commonalities in shared values across research communities, we will see a new diversity of OPR processes developed across disciplines in the future. Remaining ignorant of this diversity of practices and of the inherent biases in peer review, as both social and physical processes, would be an unwise approach to future innovations.

While there are relatively few large-scale investigations of the extent and mode of bias within peer review (although see Lee et al. (2013) for an excellent overview of the different levels at which bias can potentially be injected into the process), these studies together indicate that inherent biases are systemically embedded within the process, and must be accounted for prior to any further developments in peer review. This range of population-level investigations into attitudes and applications of anonymity, and the extent of any biases resulting from this, exposes a highly complex picture, and there is little consensus on its impact at a system-wide scale. However, based on these often polarised studies, it is difficult to escape the conclusion that peer review is highly subjective, rarely impartial, and definitely not as homogeneous as it is often regarded to be.

2.4.3 The impact of identification and anonymity on bias. One of the biggest criticisms levied at peer review is that, like many human endeavours, it is intrinsically biased and not the objective and impartial process many regard it to be. The question is no longer about whether or not it is biased, but to what extent it is in different social dimensions. One of the major issues is that peer review suffers from systemic confirmatory bias, with only results that are deemed as significant, statistically or otherwise, being selected for publication ( Mahoney, 1977 ). This creates a distinct bias within the published research record ( van Assen et al. , 2014 ), and perverts the research process itself by creating an incentive system that is almost entirely publication-oriented. Others have described the issues with such asymmetric evaluation criteria as lacking the core values of a scientific process ( Bon et al. , 2017 ).

In an ideal world, we would expect that strong, honest, and constructive feedback is well received by authors, no matter their career stage. Yet, it seems that this is not the case, or at least there seems to be the very real perception that it is not, and this is just as important from a social perspective. Retaliation against referees in such a negative manner represents serious academic misconduct ( Fox, 1994 ; Rennie, 2003 ). It is important to note, however, that this is not a direct consequence of OPR, but instead a failure of the general academic system to mitigate and act against inappropriate behavior. Increased transparency can only aid in preventing and tackling the potential issues of abuse and publication misconduct, a safeguard that is almost entirely absent within a closed system. COPE provides advice to editors and publishers on publication ethics, and on how to handle cases of research and publication misconduct, including during peer review. COPE could be used as the basis for developing formal mechanisms adapted to innovative models of peer review, including those outlined in this paper. Any new OPR ecosystem could also draw on the experience accumulated by Online Dispute Resolution (ODR) researchers and practitioners over the past 20 years. ODR can be defined as “the application of information and communications technology to the prevention, management, and resolution of disputes” ( Katsh & Rule, 2015 ), and could be implemented to prevent, mitigate, and deal with any potential misconduct during peer review alongside COPE. Therefore, the behaviour feared in author backlash is highly unlikely to be deemed acceptable in the current academic system, and if it does occur, it can be dealt with through increased transparency. Furthermore, bias and retaliation exist even in a double blind review process ( Baggs et al. , 2008 ; Snodgrass, 2007 ; Tomkins et al. , 2017 ), which is generally considered to be more conservative or protective. Such widespread identification of bias highlights this as a more general issue within peer review and academia more broadly, and we should be careful not to attribute it to any particular mode or trait of peer review. This is particularly relevant for more specialized fields, where the pool of potential authors and reviewers is relatively small ( Riggs, 1995 ). Nonetheless, careful engagement with researchers, especially high-risk or marginalized communities, should be a necessary and vital step prior to implementation of any system of reviewer transparency.

2.4.2 The dark side of identification. The debate over signed versus unsigned reviews is not to be taken lightly. Early career researchers in particular are some of the most conservative in this area, as they may be afraid that by signing overly critical reviews (i.e., those which investigate the research more thoroughly), they will become targets for retaliation from more senior researchers. In this case, the justification for reviewer anonymity is to protect junior researchers, as well as other marginalized demographics, from bad behaviour. Furthermore, author anonymity could potentially save junior authors from public humiliation by more established members of the research community, should errors be exposed in the evaluation of their work. These potential issues are at least part of the cause of a general attitude of conservatism within the research community towards OPR. Indeed, they come up as the most prominent resistance factor in almost every formal discussion on the topic of open peer review (e.g., Darling (2015) ; Godlee et al. (1998) ; McCormack (2009) ; Pontille & Torny (2014) ; Snodgrass (2007) ; van Rooyen et al. (1998) ). However, it is not immediately clear how this widely-proclaimed but poorly documented potential abuse of signed reviews is any different from what would occur in a closed system anyway, as anonymity provides a potential mechanism for referee abuse. The fear is that most backlashes would be external to the peer review process itself, and would occur in private; this is probably also the main reason why such abuse has not been widely documented. However, it can also be argued that by reviewing with the prior knowledge of open identification, such backlashes are prevented, since researchers do not want to tarnish their reputations in a public forum. Under these circumstances, openness becomes a means to hold both referees and authors accountable for their public discourse, as well as making the editors’ decisions on referee selection and publication public. Either way, there is little documented evidence that such retaliations actually occur either commonly or systematically. If they did, then publishers that employ this model, such as Frontiers or BioMed Central , would be under serious question, instead of thriving as they are.

2.4.1 Reviewing the evidence. Baggs et al. (2008) investigated the beliefs and preferences of reviewers about blinding. Their results showed double blinding was preferred by 94% of reviewers, although some identified advantages to an un-blinded process. When author names were blinded, 62% of reviewers could not identify the authors, while 17% could identify authors ≤ 10% of the time. Walsh et al. (2000) conducted a survey in which 76% of reviewers agreed to sign their reviews. In this case, signed reviews were of higher quality, were more courteous, and took longer to complete than unsigned reviews. Reviewers who signed were also more likely to recommend publication. In their study to explore the review process from the reviewers’ perspectives, Snell & Spencer (2005) found that reviewers would be willing to sign their reviews and feel that the process should be transparent. Yet, a similar study by Melero & Lopez-Santovena (2001) found that 75% of surveyed respondents were in favor of reviewer anonymity, while only 17% were against it.

Strong but often conflicting arguments and attitudes exist on both sides of the anonymity debate (see e.g., Prechelt et al. (2017) ). In theory, anonymous reviewers are protected from potential backlashes for expressing themselves fully, and are therefore more likely to be honest in their assessments. Further, there is some evidence to suggest that double blind review can increase the acceptance rate of women-authored articles in the published literature ( Darling, 2015 ). However, this kind of anonymity can be difficult to protect, as there are ways in which identities can be revealed, albeit non-maliciously, such as through language and phrasing, prior knowledge of the research and the specific angle being taken, previous presentation at a conference, or even simple Web-based searches.

There are different levels of bi-directional anonymity throughout the peer review process, including whether or not the referees know who the authors are but not vice versa (single blind, the most common; Ware, 2008 ), or whether both parties remain anonymous to each other (double blind) ( Table 1 ). Traditional double blind review is based on the idea that peer evaluations should be impartial and based on the research, not ad hominem, but there has been considerable discussion over whether reviewer identities should remain anonymous (e.g., Baggs et al. (2008) ; Pontille & Torny (2014) ; Snodgrass (2007) ) ( Figure 3 ). Models such as triple-blind peer review even go a step further, where authors and their affiliations are reciprocally anonymous to the handling editor and the reviewers. This attempts to nullify the effects of one’s scientific reputation, institution, or location on the peer review process, and is employed at the Open Access journal Science Matters ( sciencematters.io ), launched in early 2016.

Unresolved issues with posting review reports include whether or not this should be done for manuscripts that are ultimately not published, the impact of author identification or anonymity, and whether disclosure of an author’s career stage has potential consequences for their reputation. Furthermore, the actual readership and usage of published reports remains ambiguous in a world where researchers are typically already inundated with published articles to read. The benefits of publicizing reports might not be seen until further down the line from the initial publication and, therefore, their immediate value might be difficult to convey and measure in current research environments. Finally, different populations of reviewers with different cultural norms and identities will undoubtedly have varying perspectives on this issue, and it is unlikely that any single policy or solution to posting referee reports will ever be widely adopted.

When BioMed Central launched in 2000, it quickly recognized the value of including both the reviewers’ names and the peer review history (pre-publication) alongside published manuscripts in their medical journals. Since then, further reflections on open peer review ( Godlee, 2002 ) led to the adoption of a variety of OPR models. For example, the Frontiers series now publishes all referee names alongside articles, EMBO journals publish a review process file with the articles, with referees remaining anonymous but editors being named, and in 2009 PLOS added public commenting features to its published articles. More recently launched journals, such as PeerJ , have a system where both the reviews and the names of the referees can optionally be made public, and journals such as Nature Communications and the European Journal of Neuroscience have started to adopt this method of OPR as well.

Publishing peer review reports appears to have little or no impact on the overall process but may encourage more civility from referees. In a small survey, Nicholson & Alperin (2016) found that approximately 75% of survey respondents (n=79) perceived that public peer review would change the tone or content of the reviews, and 80% of responses indicated that performing peer reviews that would eventually be made public would not require a significantly higher amount of work. However, the responses also indicated that an incentive is needed for referees to engage in open peer review. This would include recognition by performance review or tenure committees (27%), peers publishing their reviews (26%), being paid in some way such as with an honorarium or waived APC (24%), and getting positive feedback on reviews from journal editors (16%). Only 3% (one response) indicated that nothing could motivate them to participate in an open peer review of this kind. Leek et al. (2011) showed that when referees’ comments were made public, significantly more cooperative interactions were formed, while the risk of incorrect comments decreased. Moreover, referees and authors who participated in cooperative interactions had a reviewing accuracy rate that was 11% higher. On the other hand, the possibility of publishing the reviews online has also been associated with a high decline rate among potential peer reviewers, and an increase in the amount of time taken to write a review, but with no effect on review quality ( van Rooyen et al. , 2010 ). This suggests that the barriers to publishing review reports are inherently social, rather than technical.

It is ironic that, while assessments of articles can never be evidence-based without the publication of referee reports, peer reviewed articles are still almost ubiquitously regarded as bearing an authoritative stamp of quality. The issue here is that the attainment of peer reviewed status will always be based on an undefined, and only ever relative, quality threshold due to the opacity of the process. This is quite an unscientific practice: researchers rely almost entirely on heuristics and on trust in a concealed process and the intrinsic reputation of the journal, rather than on any verifiable evidence of quality. This can ultimately result in what Kelty et al. (2008) term the “Fallacy of Misplaced Finality”: the assumption that research has a single, final form, to which everyone applies different criteria of quality.

In a study of two journals, one where reports were not published and another where they were, Bornmann et al. (2012) found that publicized comments were much longer. Furthermore, there was an increased chance that they would result in a constructive dialogue between the author, reviewers, and wider community, and they might therefore be better for improving the content of a manuscript. On the other hand, unpublished reviews tended to have more of a selective function, determining whether a manuscript is appropriate for a particular journal (i.e., focusing on the editorial process). Therefore, depending on the journal, different types of peer review could be better suited to perform different functions, and could be optimized in that direction. Transparency of the peer review process can also be used as an indicator of peer review quality, thereby potentially providing a tool to predict quality in new journals for which the peer review model is known ( Godlee, 2002 ; Morrison, 2006 ; Wicherts, 2016 ), if desired. In the latter study, journals with higher transparency ratings were less likely to accept flawed papers and showed a higher impact as measured by Google Scholar’s h5-index ( Wicherts, 2016 ).

The rationale behind publishing referee reports lies in providing increased context and transparency to the peer review process—the making of the sausage, so to speak. Often, valuable insights are shared in reviews that would otherwise remain hidden if not published. By publishing reports, peer review then has the potential to become a supportive and collaborative process that is viewed more as an ongoing dialogue between groups of scientists to progressively assess the quality of research. Furthermore, the reviews themselves are opened up for analysis and inspection, including how authors respond to the reviews, which adds an additional layer of quality control and a means for accountability and verification. There are additional educational benefits to publishing peer reviews, such as for training purposes or journal clubs. At present, some publisher policies are extremely vague about the re-use rights and ownership of peer review reports ( Schiermeier, 2017 ).

The Publons platform provides a semi-automated mechanism to formally recognize the work of editors and referees, who can receive due credit for their reviewing activities, both pre- and post-publication. Researchers can also choose whether to publish their full reports, depending on publisher and journal policies. Publons also provides a ranking for the quality of the reviewed research article, and users can endorse, follow, and recommend reviews. Other platforms, such as F1000 Research and ScienceOpen , link post-publication peer review activities with CrossRef DOIs to make them more citable, essentially treating them as equivalent to a normal Open Access research paper. ORCID (Open Researcher and Contributor ID) provides a stable means of integrating with platforms such as Publons and ImpactStory in order to receive due credit for reviews. ORCID is rapidly becoming part of the critical infrastructure for OPR, and for greater shifts towards open scholarship ( Dappert et al. , 2017 ). Exposing peer reviews through these platforms links accountability to receiving credit. Therefore, they offer possible solutions to the dual issues of rigor and reward, while potentially ameliorating the growing threat of reviewer fatigue. Whether such initiatives will be successful remains to be seen, although Publons was recently acquired by Clarivate Analytics , suggesting that the process could become commercialized as this domain rapidly evolves ( Van Noorden, 2017 ). In spite of this, the outcome is most likely to depend on whether funding agencies and those in charge of tenure, hiring, and promotion will use peer review activities to help evaluate candidates. This, in turn, likely depends on whether research communities themselves choose to embrace any such crediting or accounting systems for peer review.

2.2.3 Progress in crediting peer review. Any acknowledgement model to credit reviewers also raises the obvious question of how to facilitate this model within an anonymous peer review system. Incentivizing peer review could alleviate much of its potential burden by widening the potential referee pool. This can also help to diversify the process and inject transparency into peer review, a solution that is especially appealing when considering that it is often a small minority of researchers who perform the vast majority of peer reviews ( Fox et al. , 2017 ; Gropp et al. , 2017 ); for example, in biomedical research, only 20 percent of researchers perform 70–95 percent of the reviews ( Kovanis et al. , 2016 ). In 2014, a working group on peer review services (CASRAI) was established to “develop recommendations for data fields, descriptors, persistence, resolution, and citation, and describe options for linking peer-review activities with a person identifier such as ORCID ” ( Paglione & Lawrence, 2015 ). The idea here is that by standardizing the description of peer review activities, it becomes easier to attribute them, and therefore to recognize and reward them.
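To illustrate what such a standardized description of a peer review activity might contain, the sketch below defines a hypothetical, machine-readable record in Python. The field names, values, and schema are assumptions made purely for illustration, not the CASRAI or ORCID specification; the point is that an activity can be attributed to a persistent person identifier and credited even when the report itself, or the reviewed manuscript, remains confidential.

```python
# Hypothetical sketch of a standardized, attributable peer-review activity record.
# Field names and identifiers are illustrative placeholders, not a published schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class ReviewActivity:
    reviewer_orcid: str       # persistent person identifier for attribution
    review_doi: str           # persistent identifier for the review itself, if public
    subject_doi: str          # identifier of the reviewed manuscript (may be withheld)
    organization: str         # journal, platform, or funder that convened the review
    completion_date: str      # ISO 8601 date
    review_type: str          # e.g. "pre-publication" or "post-publication"
    identity_visibility: str  # e.g. "open", "anonymous"
    report_visibility: str    # e.g. "public", "editor-only"

record = ReviewActivity(
    reviewer_orcid="0000-0000-0000-0000",      # placeholder identifier
    review_doi="10.1234/example.review.1",     # placeholder DOI
    subject_doi="10.1234/example.article.1",   # placeholder DOI
    organization="Example Journal of Open Science",
    completion_date="2017-06-30",
    review_type="pre-publication",
    identity_visibility="anonymous",
    report_visibility="editor-only",
)
print(json.dumps(asdict(record), indent=2))
```

A record of this kind shows how anonymity and credit can coexist: the reviewer is named to the credit system via a persistent identifier, while the visibility fields allow the report and the reviewer-manuscript link to remain hidden from the public.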

2.2.2 Increasing demand for recognition. Traditional approaches to credit fall short of any systematic feedback or recognition, such as that granted for publications. A change here is clearly required, given the wealth of currently unrewarded time and effort that academics devote to peer review. A recent survey of nearly 3,000 peer reviewers by the large commercial publisher Wiley showed that feedback and acknowledgement for work as referees are valued far above either cash reimbursements or payment in kind ( Warne, 2016 ). As of today, peer review is poorly acknowledged by practically all research assessment bodies, institutions, granting agencies, and publishers. Wiley’s survey reports that 80% of researchers agree that there is insufficient recognition for peer review as a valuable research activity, and that researchers would commit more time to peer review if it became a formally recognized activity for assessments, funding opportunities, and promotion ( Warne, 2016 ). While this may be true, it is important to note that commercial publishers, including Wiley , have a vested interest in retaining the current, freely provided service of peer review, since this is what provides their journals, like society-led journals, with their main stamp of legitimacy and quality (“added value”). Therefore, one of the root causes of the lack of appropriate recognition and incentivization is, ironically, publishers themselves, who have strong motivations to find non-monetary forms of reviewer recognition. Indeed, the business model of almost every large scholarly publisher is predicated on free work by peer reviewers, and it is unlikely that the present system would function financially with market-rate reimbursement of peer reviewers. Hence, this survey could represent a biased view of the actual situation. Other research shows a similar picture: approximately 70% of respondents to a small survey by Nicholson & Alperin (2016) indicated that they would list peer review as a professional service on their curriculum vitae, and 27% of respondents mentioned formal recognition in assessment as a factor that would motivate them to participate in public peer review. These numbers indicate that the lack of credit referees receive for peer review is a contributing factor to the perceived stagnation of the traditional models. Furthermore, acceptance rates are lower in humanities and social sciences journals, and higher in physical sciences and engineering journals ( Ware, 2008 ). This means there are distinct disciplinary variations in the number of reviews performed by a researcher relative to their publications, which suggests that there is scope for using this either to provide different incentive structures or to increase acceptance rates and thereby decrease referee fatigue ( Lyman, 2013 ).

2.2.1 Traditional methods of recognition. One current way to recognize peer review is to thank anonymous referees in the Acknowledgement sections of published papers. In these cases, the referees do not receive any individual public recognition for their work unless they explicitly agree to sign their reviews. Another common form of acknowledgement is a private thank-you note from the journal or editor, which usually takes the form of an automated email upon completion of the review. In addition, journals often list and thank all reviewers in a special issue or on their website once a year, providing another way to credit reviewers. Another idea that journals and publishers have tried is to list their best reviewers (e.g., by Vines (2015a) for Molecular Ecology ), or, following a suggestion by Pullum (1984) , to name referees who recommend acceptance in the article colophon. A single blind version of this recommendation was adopted by Digital Medievalist from 2005 to 2016 (see Wikipedia contributors, 2017 , and bit.ly/DigitalMedievalistArchive for examples preserved in the Internet Archive); Digital Medievalist stopped using this model and removed the colophon as part of its move to the Open Library of Humanities (cf. journal.digitalmedievalist.org). Referees recognized in these ways can then integrate this into their scholarly profiles in order to differentiate themselves from other researchers or referees. Currently, most tenure and review committees do not consider peer review activities as required or sufficient in the process of professional advancement or tenure evaluation. Instead, contributing in some form to peer review is viewed as expected or normal behavior for all researchers.

Of these, the latter two can both potentially reduce the quality of peer review, open or otherwise, and therefore affect the overall quality of published research. Paradoxically, while the Internet empowers us to communicate information virtually instantaneously, the turnaround time for peer-reviewed publications is as far from this as it has ever been. One potential solution is to encourage referees by providing additional recognition and credit for their work. The present lack of bona fide incentives for referees is perhaps the main factor responsible for indifference to editorial outcomes, which ultimately leads to the increased proliferation of low-quality research ( D’Andrea & O’Dwyer, 2017 ).

The vast majority of researchers see peer review as an integral and fundamental part of their work. They often even consider peer review part of an altruistic cultural duty, or a quid pro quo service, closely associated with the identity of being part of their research community. Generally, journals do not provide any remuneration or compensation for these services; notable exceptions are the UK-based publisher Veruscript ( veruscript.com/about/who-we-are ) and Collabra ( collabra.org/about/our-model ), published by the University of California Press. To be invited to review a research article is perceived as a great honor, especially for junior researchers, due to the recognition of expertise, i.e., the attainment of the status of a peer. However, the current system is facing new challenges as the number of published papers continues to increase rapidly ( Albert et al. , 2016 ), with more than one million articles published in peer reviewed, English-language journals every year ( Larsen & Von Ins, 2010 ). Some estimates are even as high as 2–2.5 million per year ( Plume & van Weijen, 2014 ), and this number is expected to double approximately every nine years at current rates ( Bornmann & Mutz, 2015 ). There are several possible solutions to this issue:

With all of these complex evolutionary trajectories, it is clear that peer review is undergoing a phase of experimentation in line with the evolving scholarly ecosystem. However, despite the range of new innovations, engagement with these experimental open models is still far from commonplace. The ubiquitously practiced and much more favored traditional model (which, as noted above, is itself diverse) remains deeply entrenched and, although ironically not especially traditional in historical terms, is nonetheless currently revered. Practices such as self-publishing and predatory or deceptive publishing cast a shadow of doubt on the validity of research that is posted openly online following these models, including work carrying traditional scholarly imprints ( Fitzpatrick, 2011a ; Tennant et al. , 2016 ). The inertia hindering widespread adoption of new models of peer review can be ascribed to what is often termed “cultural inertia” within scholarly research. Cultural inertia, the tendency of communities to cling to a traditional trajectory, is shaped by a complex ecosystem of individuals and groups. These often have highly polarized motivations (i.e., capitalistic commercialism versus knowledge generation versus careerism versus output measurement), and they operate within an academic hierarchy that imposes a power dynamic that can suppress innovative practices ( Burris, 2004 ; Magee & Galinsky, 2008 ).

However, the context of this transparency and the implications of different levels of transparency at different stages of the review process are rarely explored, and achieving transparency is difficult at a variety of levels. How and where we inject transparency into the system has implications for the magnitude of transformation, and therefore the general concept of OPR is highly heterogeneous in meaning, scope, and consequences. New suggestions for modifying peer review vary from fairly incremental, small-scale changes to those that encompass an almost total and radical transformation of the present system. The various parts of the “revolutionary” phase of peer review undoubtedly combine these OPR traits in different ways, and the result remains a very heterogeneous landscape. Table 3 provides an overview of the advantages and disadvantages of the different approaches to anonymity and openness in peer review.

A core question is how to transform traditional peer review into a process aligned with the latest advances in what is now widely termed “open science”. This is tied to broader developments in how we as a society communicate, thanks to the inherent capacity that the Web provides for open, collaborative, and social communication. Many of the suggestions and new models for improving peer review are geared towards increasing the transparency and ultimately the reliability, efficiency, and accountability of the publishing process, and aligning peer review norms to support these aims. These traits are desired by all actors in the system, and increasing transparency moves peer review towards a more open model.

Novel ideas about “Open Peer Review” (OPR) systems are rapidly emerging, and innovation has been accelerating over the last several years ( Figure 2 ; Table 3 ). The advent of OPR is complex, and multiple aspects of the term are often used interchangeably, or conflated, without appropriate prior definition. Currently, there is no formally established definition of OPR that is accepted by the scholarly research and publishing community ( Ford, 2013 ). The simplest definitions, by McCormack (2009) and Mulligan et al. (2008) , presented OPR as a process that does not attempt “to mask the identity of authors or reviewers” ( McCormack, 2009 , p.63), thereby explicitly referring to openness in terms of personal identification or anonymity. Ware (2011, p.25) expanded on reviewer disclosure practices: “Open peer review can mean the opposite of double blind, in which authors’ and reviewers’ identities are both known to each other (and sometimes publicly disclosed), but discussion is complicated by the fact that it is also used to describe other approaches such as where the reviewers remain anonymous but their reports are published.” Other authors define OPR differently, for example by including the publication of all dialogue during the process ( Shotton, 2012 ), or by running it as a publicly participative commentary ( Greaves et al. , 2006 ). A recent survey by OpenAIRE found 122 different definitions of OPR in use, exemplifying the extent of this issue. This diversity was distilled into a single proposed definition comprising seven different open traits: participation, identity, reports, interaction, platforms, pre-review manuscripts, and final-version commenting ( Ross-Hellauer, 2017 ).
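As a loose illustration of how these seven traits can be treated as independent dimensions rather than a single yes/no notion of openness, a given venue's review model could be described by a simple profile such as the sketch below; the trait names follow Ross-Hellauer (2017), while the profile structure and example values are purely hypothetical.

    # Sketch: describing a venue's review model along the seven OPR traits
    # distilled by Ross-Hellauer (2017). The structure and values are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class OPRProfile:
        open_participation: bool             # can the wider community take part?
        open_identities: bool                # are author and reviewer identities disclosed?
        open_reports: bool                   # are review reports published?
        open_interaction: bool               # can authors and reviewers discuss directly?
        open_platforms: bool                 # is review decoupled from the publishing venue?
        open_prereview_manuscripts: bool     # are manuscripts public before review?
        open_final_version_commenting: bool  # is post-publication commenting enabled?

    # Hypothetical profile for a journal that publishes signed reports alongside articles
    example_venue = OPRProfile(
        open_participation=False,
        open_identities=True,
        open_reports=True,
        open_interaction=True,
        open_platforms=False,
        open_prereview_manuscripts=True,
        open_final_version_commenting=True,
    )
    print(example_venue)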

The diversification of peer review is intrinsically coupled with wider developments in scholarly publishing. When it comes to the gate-keeping function of peer review, innovation is noticeable in some digital-only, or “born open,” journals, such as PLOS ONE and PeerJ . These explicitly request referees to ignore any notion of novelty, significance, or impact when judging whether work should become accessible to the research community, and instead to focus on whether the research was conducted properly and whether the conclusions are based on the presented results. This arguably more objective method has met some resistance, even receiving the somewhat derogatory label “peer review lite” from some corners of the scholarly publishing industry ( Pinfield, 2016 ). Such a perception is largely a hangover from the commercial age of publishing, and now seems superfluous and discordant with any modern Web-based model of scholarly communication. The relative timing of peer review to publication is a further major innovation, with journals such as F1000 Research publishing articles prior to any formal peer review process. Some of the advantages and disadvantages of these different variations of open peer review are explored in Table 2 .

Over time, three principal forms of journal peer review have evolved: single blind, double blind, and open ( Table 1 ). Of these, single blind, where reviewers are anonymous but authors are not, is the most widely used form in most disciplines, because the process is comparably less onerous and less expensive to operate than the alternatives. Double blind peer review, where both authors and reviewers are reciprocally anonymous, requires considerable effort to remove all traces of the author’s identity from the manuscript under review ( Blank, 1991 ). For a detailed comparison of double versus single blind review, Snodgrass (2007) provides an excellent summary. These are generally considered to be the traditional forms of peer review, with the advent of open peer review introducing substantial additional complexity into the discussion ( Ross-Hellauer, 2017 ).

3 Potential future models

As we have discussed in detail above, there has been considerable technological innovation in peer review over the last decade, which is leading to critical examination of it as a social process. Much of this has been driven by the advent of Web 2.0 technologies and new social media platforms, and an overall shift towards a more open system of scholarly communication. Previous work in this arena has described features of a Reddit-like model, combined with additional personalized features of other social platforms, such as Stack Exchange, Netflix, and Amazon (Yarkoni, 2012). Here, we develop upon this by considering additional traits of models such as Wikipedia, GitHub, and Blockchain, and discuss these in the context of the rapidly evolving socio-technological environment around the present system of peer review. In any vision of the future of scholarly publishing (Kriegeskorte et al., 2012), the evolution of peer review and evaluation systems must be considered. Any future peer review platform or system would greatly benefit from considering the following key features (a brief illustrative sketch follows the list):

1. Quality control and moderation, possibly through openness and transparency;

2. Certification via personalized reputation or performance metrics;

3. Incentive structures to motivate and encourage engagement.
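As referenced above, the following is a purely illustrative sketch of how these three features might be represented as separable components of a hypothetical platform; all class and function names are our own inventions rather than any existing system's API.

    # Illustrative sketch only: the three features above as separable components
    # of a hypothetical review platform. All names are invented for illustration.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Review:
        reviewer_id: str
        text: str
        flagged: bool = False      # set by moderation (feature 1)
        endorsements: int = 0      # community votes on the review

    @dataclass
    class ReviewerProfile:
        reviewer_id: str
        reputation: int = 0                              # certification metric (feature 2)
        badges: List[str] = field(default_factory=list)  # incentives (feature 3)

    def moderate(review: Review, banned_phrases=("ad hominem",)) -> Review:
        """Quality control: flag reviews that breach simple community rules."""
        review.flagged = any(p in review.text.lower() for p in banned_phrases)
        return review

    def credit(profile: ReviewerProfile, review: Review) -> ReviewerProfile:
        """Turn accepted reviews into reputation and badges."""
        if not review.flagged:
            profile.reputation += 1 + review.endorsements
            if profile.reputation >= 10 and "trusted reviewer" not in profile.badges:
                profile.badges.append("trusted reviewer")
        return profile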

While discussing a number of principles that should guide the implementation of novel platforms for evaluating scientific work, Yarkoni (2012) argued that many of the problems researchers face have already been successfully addressed by a range of non-research focused social Web applications. Therefore, developing next-generation platforms for scientific evaluations should focus on adapting the best currently used approaches for these rather than on innovating entirely new ones (Neylon & Wu, 2009; Priem & Hemminger, 2010; Yarkoni, 2012). One important element that will determine the success or failure of any such peer-to-peer reputation or evaluation system is a critical mass of researcher uptake. This has to be carefully balanced with the demands and uptakes of restricted scholarly communities, which have inherently different motivations and practices in peer review. A remaining issue is the aforementioned cultural inertia, which can lead to low adoption of anything innovative or disruptive to traditional workflows in research. This is a perfectly natural trait for communities, where ideas out-pace technological innovation, which in turn out-paces the development of social norms. Hence, rather than proposing an entirely new platform or model of peer review, our approach here is to consider the advantages and disadvantages of existing models and innovations in social services and technologies (Table 4). We then explore ways in which such traits can be adapted, combined, and applied to build a more effective and efficient peer review system, while potentially reducing friction to its uptake.

Table 4.

Feature | Description | Pros | Cons/Risks | Existing models
Voting or rating | Quantified review evaluation (5 stars, points), including up- and down-votes | Community-driven, quality filter, simple and efficient | Randomized procedure, auto-promotion, gaming, popularity bias, non-static | Reddit, Stack Exchange, Amazon
Openness | Public visibility of review content | Responsibility, accountability, context, higher quality | Peer pressure, potential lower quality, invites retaliation | All
Reputation | Reviewer evaluation and ranking (points, review statistics) | Quality filter, reward, motivation | Imbalance based on user status, encourages gaming, platform-specific | Stack Exchange, GitHub, Amazon
Public commenting | Visible comments on paper/review | Living/organic paper, community involvement, progressive, inclusive | Prone to harassment, time consuming, non-interoperable, low re-use | Reddit, Stack Exchange, Hypothesis
Version control | Managed releases and configurations | Living/organic objects, verifiable, progressive, well-organized | Citation tracking, time consuming, low trust of content | GitHub, Wikipedia
Incentivization | Encouragement to engage with platform and process via badges/money or recognition | Motivation, return on investment | Research monetization, can be perverted by greed, expensive | Stack Exchange, Blockchain
Authentication and certification | Filtering of contributors via verification process | Fraud control, author protection, stability | Hacking, difficult to manage | Blockchain
Moderation | Filtering of inappropriate behavior in comments, rating | Community-driven, quality filter | Censorship, mainstream speech | Reddit, Stack Exchange

3.1 A Reddit-based model

Reddit (reddit.com) is an open-source, community-based platform where users submit comments and original or linked content, organized into thematic lists of subreddits. As Yarkoni (2012) noted, a thematic list of subreddits could be automatically generated for any peer review platform using keyword metadata from sources such as the National Library of Medicine's Medical Subject Headings (MeSH) ontology. Members, or redditors, can upvote or downvote any submission based on quality and relevance, and publicly comment on all shared content. Individuals can subscribe to contribution lists, and articles can be organized by time (newest to oldest) or level of engagement. Quality control is invoked by moderation through subreddit mods, who can filter and remove inappropriate comments and links. A score is given for each link and comment as the sum of upvotes minus downvotes, thus providing an overall ranking system. On Reddit, highly scoring submissions are relatively ephemeral: an automatic down-ranking algorithm shifts them further down lists as new content is added, typically within 24 hours of initial posting.

3.1.1 Reddit as an existing "journal" of science. The subreddit for Science (reddit.com/r/science) is a highly-moderated discussion channel, curated by at least 600 professional researchers and with more than 15 million subscribers at the time of writing. The forum has even been described as "The world's largest 2-way dialogue between scientists and the public" (Owens, 2014). Contributors here can add flair to their posts as a way of thematically organizing them by research discipline, analogous to the container function of a typical journal. Individuals can also have flair as a form of subject-specific credibility (i.e., a peer status) upon provision of proof of education in their topic. Public contributions from peers are subsequently stamped with a status and area of expertise, such as "Grad student|Earth Sciences." Scientists already engage further with Reddit through science AMAs (Ask Me Anythings), which tend to be quite popular. However, the level of discourse here is generally not as deep as that expected of peer review, and is more akin to a form of science communication or public engagement with research. In this way, Reddit has the potential to drive enormous amounts of traffic to primary research; there is even a phenomenon known as the "Reddit hug of death", whereby servers become overloaded and crash due to Reddit-based traffic. The /r/science subreddit is viewed as a venue for "scientists and lay audiences to openly discuss scientific ideas in a civilized and educational manner", according to the organizer, Dr. Nathan Allen (Lee, 2015). As such, an additional appeal of this model is that it could increase the public level of scientific literacy and understanding.

3.1.2 Reddit-style peer evaluation. The essential feature of any Reddit-style model with potential parallels to peer review is that links to scientific research can be shared and ranked (upvoted or downvoted) by the community. All links or texts can be publicly discussed in terms of methods, context, and implications, similar to any post-publication commenting system. Such a process for peer review could essentially operate as an additional layer on top of a preprint archive or repository, much like a social version of an overlay journal.
Ultimately, a public commenting system like this could achieve the same depth of peer evaluation as the formal process, but as a crowd-sourced process. However, it is important to note that this is a mode of instantaneous publication prior to peer review, with filtering through interaction occurring post-publication. Furthermore, comments can receive similar treatment to submitted content, in that they can be upvoted, downvoted, and further commented upon in a cascading process. An advantage of this is that multiple comment threads can form on single posts and viewers can track individual discussions. Here, the highest-ranked comments could simply be presented at the top of the thread, while the lowest-ranked remain at the bottom. In theory, a subreddit could be created for any sub-topic within research, and a simple nested hierarchical taxonomy could make this as precise or as broad as warranted by individual communities. Reddit allows any user to create their own subreddit, pending certain status achievements through platform engagement. In addition, this could be moderated externally through ORCID, similar to the approach taken by ScienceOpen, in which five items in a peer's ORCID profile are required to perform a peer review; or, in this case, to create a new subreddit. Connection to a social network within academia, such as ORCID, further allows community validation, verification, and judgement of importance. For example, being able to see whether senior figures in a given field have read or upvoted certain threads can be highly influential in decisions to engage with that thread, and vice versa. A very similar process already occurs at the Self Journal of Science, where contributors have a choice of voting either "This article has reached scientific standards" or "This article still needs revisions", with public disclosure of who has voted in either direction. Threaded commenting could also be implemented, as it is vital to the success of any collaborative filtering platform and provides a highly efficient corrective mechanism. Peer evaluation in this form emphasizes progress and research as a discourse over piecemeal publications, or objects as part of a lengthier process. Such a system could be applied to other forms of scientific output, including code, data, and images, thereby allowing contributors to claim credit for their full range of research outputs. Comments could be signed by default, pseudonymous, or anonymized until a contributor chooses to reveal their identity. If required, anonymized comments could be filtered out automatically by users. Key to this would be peer identity verification, which could be done at the back-end via email or integrated via ORCID.

3.1.3 Translating engagement into prestige. Reddit karma points are awarded for sharing links and comments, and having these upvoted or downvoted by other registered members. The simplest implementation of such a voting system for peer review would allow interaction with any article in the database with a single click. This form of field-specific social recommendation of content simultaneously creates both a filter and a structured feed, similar to Facebook and Google+, and can easily be automated. With this, contributions receive a rating, and these ratings accumulate into a peer-based score that could be translated into a quantified level of community-granted prestige. Ratings are transparent, and contributions and their ratings can be viewed on a public profile page.
More sophisticated approaches could include graded ratings (e.g., five-point responses, like those used by Amazon) or separate rating dimensions providing peers with an immediate snapshot of the strengths and weaknesses of each article. Such a system is already in place at ScienceOpen, where referees evaluate an article for importance, validity, completeness, and comprehensibility using a five-star system. For any given set of articles retrieved from the database, a ranking algorithm could be used to dynamically order articles on the basis of a combination of quality (an article's aggregate rating within the system, as at Stack Exchange), relevance (using a recommendation system akin to Amazon or ScienceOpen), and recency (newly added articles could receive a boost). By default, the same algorithm would be applied for all peers, as on Reddit. The challenge here is making any such karma points commensurate with the effort required to obtain them, and also ensuring that they are valued by the broader research community and assessment bodies. This could be facilitated through a simple badge incentive system, such as that designed by the Center for Open Science for core open practices (cos.io/our-services/open-science-badges/).

3.1.4 Can the wisdom of crowds work with peer review? One might consider a Reddit-style model as pitching quantity against quality. Typically, comments provided on Reddit are not at the same level of depth and rigor as those we would expect from traditional peer review; that is, there is more to research evaluation than simply upvoting or downvoting. Furthermore, the range of expertise is highly variable due to the inclusion of specialists and non-specialists as equals ("peers") within a single thread. However, there is no reason why a user prestige system akin to Reddit flair could not be used to differentiate varying levels of expertise. The primary advantage here is that the number of participants is uncapped, emphasizing the potential Reddit has for scaling up participation in peer review. With a Reddit model, we must hold faith that sheer numbers will be sufficient to provide an optimal assessment of any given contribution, and that any such assessment will ultimately yield a consensus of high quality and reusable results. Social review of this sort must therefore consider at what point the process of review should be constrained in order to produce such a consensus, and one that is not self-selective as a factor of engagement rather than accuracy. This is termed the "Principle of Multiple Magnifications" by Kelty et al. (2008), which surmises that, in spite of self-selectivity, more reviewers and more data about them will always be better than fewer reviewers and less data. The additional challenge, then, will be to capture and archive consensus points for external re-use. Journals such as F1000 Research already have such a tagging system, where reviewers can mark a submission as approved after peer review iterations. "The rich get richer" is one potential risk for this style of system: content from more prominent researchers may receive relatively more comments and ratings, and ultimately hype, as with any hierarchical system, including that for traditional scholarly publishing. Research from unknown authors may go relatively under-noticed and under-used, but will at least have been publicized. One solution to this is having a core community of editors, drawing on the r/science subreddit's community of moderators.
The editors could be empowered to invite peers to contribute to discussion threads, essentially wielding the same executive power as a journal editor, but combined with that of a forum moderator. Recent evidence suggests that such intelligent crowd reviewing has the potential to be an efficient and high-quality process (List, 2017).
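To make the ranking mechanics concrete, the sketch below combines a Reddit-style net vote score with an aggregate quality rating and a recency boost, loosely following the quality/relevance/recency combination suggested above. The weights, the 24-hour decay constant, and the field names are arbitrary illustrative assumptions, not Reddit's actual algorithm.

    # Illustrative ranking for a Reddit-style review layer over preprints.
    # Weights and the decay constant are assumptions, not Reddit's algorithm.
    import math
    import time

    def vote_score(upvotes: int, downvotes: int) -> int:
        # Net community score: upvotes minus downvotes
        return upvotes - downvotes

    def rank(upvotes: int, downvotes: int, avg_rating: float,
             posted_at: float, now: float, half_life_hours: float = 24.0) -> float:
        """Combine community votes, aggregate quality rating, and recency."""
        age_hours = (now - posted_at) / 3600.0
        recency_boost = math.exp(-age_hours / half_life_hours)  # newer items rank higher
        return vote_score(upvotes, downvotes) + 2.0 * avg_rating + 5.0 * recency_boost

    now = time.time()
    submissions = [
        {"title": "Preprint A", "up": 40, "down": 5, "rating": 4.2, "posted": now - 2 * 3600},
        {"title": "Preprint B", "up": 15, "down": 1, "rating": 4.8, "posted": now - 30 * 3600},
    ]
    for s in sorted(submissions,
                    key=lambda s: rank(s["up"], s["down"], s["rating"], s["posted"], now),
                    reverse=True):
        print(s["title"])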

3.2 An Amazon-style rate and review model

Amazon was one of the first websites to allow the posting of public customer book reviews. The process is completely open and informal, so that anyone can write a review and vote, usually provided that they have purchased the product. Customer reviews of this sort are peer-generated product evaluations hosted on a third-party website, such as Amazon (Mudambi & Schuff, 2010). Here, usernames can be either real identities or pseudonyms. Reviews can also include images and have a header summary. In addition, a fully searchable question and answer section on individual product pages allows users to ask specific questions, answered by the page creator and voted on by the community, with top-voted answers displayed at the top. Chevalier & Mayzlin (2006) investigated the Amazon review system, finding that, while reviews on the site tended to be more positive, negative reviews had a greater impact in determining sales. Reviews of this sort can therefore be thought of in terms of value addition or subtraction for a product or content, and ultimately can be used to guide a third party's evaluation of a product and purchase decision (i.e., a selectivity process).

3.2.1 Amazon's star-rating system. Star-rating systems are used frequently at a high level in academia, and are commonly used to define research excellence, albeit perhaps in a flawed and arguably detrimental way; e.g., the Research Excellence Framework in the UK (ref.ac.uk) (Mhurchú et al., 2017; Moore et al., 2017; Murphy & Sage, 2014). A study of Web 2.0 services and their use in alternative forms of scholarly communication by UK researchers found that nearly half (47%) of those surveyed expected that peer review would be complemented by citation and usage metrics and user ratings in the future (Procter et al., 2010a; Procter et al., 2010b). Amazon provides a sophisticated collaborative filtering system based on five-star ratings, usually combined with several lines of comments and timestamps. The system is summarized as the proportion of total customer reviews that have rated a product at each star level, and an average star rating is also given for each item. A low rating (one star) indicates an extremely negative view, whereas a high rating (five stars) reflects a positive view of the product. An intermediate score (three stars) can either represent a balance between negative and positive points, or merely reflect a nonchalant attitude towards a product. These ratings reveal fundamental details of accountability and are a sign of popularity and quality for items and sellers. It is not immediately clear how useful such a star-rating system would be for research, or whether positive, moderate, or negative ratings would be most informative. A rating by itself would be fairly useless for researchers without the context and justification behind it. It is also unclear how a combined rate-and-review system would work for non-traditional research outputs, as the extremity and depth of reviews have been shown to vary depending on the type of content (Mudambi & Schuff, 2010). Furthermore, the ubiquitous five-star rating tool used across the Web is flawed in practice and produces highly skewed results. For one, when people rank products or write reviews online, they are more likely to leave positive feedback.
The vast majority of ratings on YouTube, for instance, are five stars, and this pattern is repeated across the Web, with an overall average estimated at about 4.3 stars regardless of the object being rated (Crotty, 2009). Ware (2011) confirmed this average for articles rated in PLOS, suggesting that academic rating systems operate in a similar manner to other social platforms. Rating systems also select for popularity rather than quality, which is the opposite of what scholarly evaluation seeks (Ware, 2011). Another problem with commenting and rating systems is that they are open to gaming and manipulation. The Amazon system has been widely abused, and it has been demonstrated how easy it is for an individual or a small group of friends to influence popularity metrics even on hugely-visited websites like Time 100 (Emilsson, 2015; Harmon & Metaxas, 2010). Amazon has historically prohibited compensation for reviews, prosecuting businesses who pay for fake reviews as well as the individuals who write them. One exception, however, was that reviewers could post an honest review in exchange for a free or discounted product, as long as they disclosed that fact. A recent study of over seven million reviews indicated that the average rating for products with these incentivized reviews was higher than for non-incentivized ones (Review Meta, 2016). Aiming to contain this phenomenon, Amazon recently adapted its Community Guidelines to eliminate incentivized reviews. As mentioned above, ScienceOpen offers a five-star rating system for papers, combined with post-publication peer review, but here the incentive is simply that the review content can be re-used, credited, and cited. How this translates into user and community perception in an academic environment remains an interesting question for further research.

3.2.2 Reviewing the reviewers. At Amazon, users can vote on whether or not a review was helpful, with simple binary yes or no options. Potential abuse can also be reported and curtailed here by creating a system of community-governed moderation. After a sufficient number of yes votes, a user is upgraded to a spotlight reviewer through what is essentially a popularity contest, and their reviews are given more prominence. Top reviews are those which receive the most helpful upvotes, usually because they provide more detailed information about a product. One potential way of improving rating and commenting systems is to weight ratings according to the reputation of the rater (as done on Amazon, eBay, and Wikipedia). Reputation systems intend to achieve three things: foster good behavior, penalize bad behavior, and reduce the risk of harm to others as a result of bad behavior (Ubois, 2003). Key features are that reputation can rise and fall, and that reputation is based on behavior rather than social connections, thus prioritizing engagement over popularity. In addition, reputation systems do not have to use the true names of the participants but, to be effective and robust, they must be tied to an enduring identity infrastructure. Frishauf (2009) proposed a reputation system for peer review in which the review would be undertaken by people of known reputation, thereby setting a quality threshold that could be integrated into any social review platform and automated (e.g., via ORCID). A further problem with reputation systems is that having a single formula to derive reputation leaves the system open to gaming, as with almost any process that can be measured and quantified.
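As a rough sketch of the reputation-weighting idea above, the snippet below computes an average star rating in which each rating counts in proportion to the rater's reputation; the weighting scheme is our own illustrative assumption, not the formula used by Amazon, eBay, or Wikipedia.

    # Illustrative reputation-weighted aggregation of star ratings.
    # The weighting is an assumption, not any platform's actual formula.
    def weighted_rating(ratings):
        """ratings: iterable of (stars, rater_reputation) pairs."""
        pairs = list(ratings)
        total_weight = sum(rep for _, rep in pairs)
        if total_weight == 0:
            return None  # no raters with any reputation yet
        return sum(stars * rep for stars, rep in pairs) / total_weight

    # Two low-reputation five-star ratings are outweighed by one experienced rater's three stars
    print(round(weighted_rating([(5, 1), (5, 1), (3, 20)]), 2))  # 3.18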
Gashler (2008) proposed a decentralized, secured system in which each reviewer would digitally sign each paper they review, so that the digital signature links the review with the paper. Such a web of reviewers and papers could be data mined to reveal information on the influence and connectedness of individual researchers within the research community. Depending on how the data were mined, this could serve as a reputation or web-of-trust system that would be resistant to gaming because it specifies no particular metric.
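A minimal sketch of this signing idea is given below, using Ed25519 signatures from the Python cryptography package. Gashler's proposal is described here only at the level of digitally signed links between reviews and papers, so the concrete representation (hashing the paper and review together before signing) is our own illustrative assumption rather than his specification.

    # Sketch: a reviewer signs a (paper, review) pair so the review is verifiably
    # linked to both the paper and the reviewer's public key. The data layout is
    # an illustrative assumption, not Gashler's (2008) specification.
    import hashlib
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    paper = b"full text or persistent identifier of the reviewed paper"
    review = b"the reviewer's report"

    # The signed message binds the review to a hash of the paper
    message = hashlib.sha256(paper).digest() + hashlib.sha256(review).digest()

    reviewer_key = Ed25519PrivateKey.generate()
    signature = reviewer_key.sign(message)

    # Anyone holding the reviewer's public key can verify the link
    public_key = reviewer_key.public_key()
    try:
        public_key.verify(signature, message)
        print("review is authentically linked to the paper")
    except InvalidSignature:
        print("signature does not match")

Mining the resulting graph of signed links, as Gashler suggests, would then operate over public keys and paper identifiers rather than over any single prescribed metric.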

3.3 A Stack Exchange/Overflow-style model

Stack Exchange (stackexchange.com) is a collective intelligence system comprising multiple individual question and answer sites, many of which are already geared towards particular research communities, including maths and physics. The most popular site within Stack Exchange is Stack Overflow, a community of software developers and a place where professionals exchange problems, ideas, and solutions. Stack Exchange works by having users publish a specific problem, to which others then contribute in a discussion of that issue. Some consider this format to be a form of dynamic publishing (Heller et al., 2014). The appeal of Stack Exchange is that threaded discussions are often brief, concise, and geared towards solutions, all in a typical Web forum format. Highly regarded answers are positioned towards the top of threads, with others concatenated beneath. Like the Amazon model of weighted ratings, voting in Stack Exchange is more of a process that controls relative visibility. The result is a library of topical questions with high-quality discussion threads and answers, developed by capturing the long tail of knowledge from communities of experts. The main distinction between this and scholarly publishing is that new material is rarely the focus of discussion threads. However, the ultimate goal remains the same: to improve knowledge and understanding of a particular issue. As such, Stack Exchange is about creating self-governing communities and a public, collaborative knowledge exchange forum based on software (Begel et al., 2013).

3.3.1 Existing Overflow-style platforms. Some subject-specific platforms for research communities already exist that are similar to, or based on, Stack Exchange technology. These include BioStars (biostars.org), a rapidly growing bioinformatics resource whose use has contributed to the completion of traditional peer reviewed publications (Parnell et al., 2011). Another is PhysicsOverflow, a platform for real-time discussions between physics professionals combined with an open peer review system (Pallavi Sudhir & Knöpfel, 2015). PhysicsOverflow forms the counterpart forum to MathOverflow (Tausczik et al., 2014), with both containing a graduate-level question and answer forum and an Open Problems section for collaboration on research issues. Both have a Reviews section to complement formal journal-led peer review, where peers can submit preprints (e.g., from arXiv) for public peer evaluation, considered by most to be an "arXiv 2.0". Responses are divided into reviews and comments, and given a score based on votes for originality and accuracy. Similar to Reddit, there are moderators, but these are democratically elected by the community itself. Motivation for engaging with these platforms comes from a personal desire to assist colleagues, progress research, and receive recognition for it (Kubátová, 2012) – the same as that for peer review. Together, both have created open, community-led collaboration and discussion platforms for their research disciplines.

3.3.2 Community-granted reputation and prestige. One of the key features of Stack Exchange is its inbuilt, community-based reputation system, karma, similar to that of Reddit. Identified peers rate or endorse the contributions of others and can indicate whether those contributions are positive (useful or informative) or negative.
This provides a point-based reputation system for individuals, based not just on the quantity of their engagement with the platform and its peers, but also on the quality and relevance of those engagements, as assessed by the wider engaging community (stackoverflow.com/help/whats-reputation). Peers have their status and moderation privileges within the platform upgraded as they gain reputation. Such automated privilege administration provides a strong social incentive for engaging with the community. Furthermore, peers who asked the original questions mark the answers they consider most correct, thereby acknowledging the most significant contributions while providing a stamp of trustworthiness. This has the additional consequence of reducing the strain of evaluation and information overload for other peers by facilitating more rapid decision making, a behavior based on simple cognitive heuristics (e.g., social influences such as the "bandwagon effect" and position bias) (Burghardt et al., 2017). Threads can also be closed once questions have been answered sufficiently, based on a community decision, which enables maximum gain of potential karma points. This terminates further contribution but ensures that the knowledge is captured for future needs. Karma and reputation can thus be achieved and incentivized by building and contributing to a growing community and by providing knowledgeable and comprehensible answers on a specific topic. Within this system, reputation points are distributed for social activities that are akin to peer review, such as answering questions, giving advice, providing feedback, providing data, and generally improving the quality of work in the open. The points directly reflect an individual's contribution to that specific research community. Such processes ultimately have a very low barrier to entry, but they also expose peer review to potential gamification through integration with a reputation engine, a social bias which proliferates through any technoculture (Belojevic et al., 2014).

3.3.3 Badge acquisition on Stack Overflow. An additional important feature of Stack Overflow is the acquisition of merit badges, which p