Scholarly Communication in Sociology

by Philip N. Cohen

University of Maryland

MIT Libraries Visiting Scholar

April 2019





Introduction

Overview

Scholarly publishing takes place in an institutional arena that is opaque to its practitioners. As readers, writers, reviewers, and editors, we have no clear view of the system within which we’re working. Researchers starting their careers receive (if they’re lucky) folk wisdom and mythology handed down from advisor to advisee, geared more toward individual success (or survival) than toward attaining a systemic perspective. They may learn how to get their work into the right journals or books, but often don’t learn why that is the outcome that matters for their careers, how the field arrived at that decision, and what the alternatives are – or should be. Gaining a wider perspective is important both for shaping individual careers and for confronting the systemic problems we face as a community of knowledge creators and purveyors.

This primer starts from the premise that sociologists, especially those early in their careers, need to learn about the system of scholarly communication, and that sociology itself can help us toward that goal. Understanding the political economy of the system within which publication takes place is necessary for us to fulfill our roles as citizens of the research community: people who, consciously or not, play an active role in shaping the future of that system. Responsible citizenship requires learning about the institutional actors in the system and how they are governed, as well as who pays and who profits within the field, and who wins or loses.

Scholarly communication

Before we can understand the political economy of scholarly communication, we need to know something about the structure of the information itself. To get to that point it’s helpful to step outside the discipline and see it from the perspective of libraries. Libraries are responsible for collecting, describing, disseminating, and preserving our research. In keeping with that perspective, I use the general term “scholarly communication” rather than simply “publishing.” Publishing is what you do to get your research out to readers, while scholarly communication is the system that encompasses that activity: “the system through which research and other scholarly writings are created, evaluated for quality, disseminated to the scholarly community, and preserved for future use.”

Researchers share their work with various audiences through working papers, preprints, conference presentations, journal articles, and books. In addition to these research products, sociologists also blog, tweet, podcast, speak, and write for nonacademic publications about research. There once was a discrete, formal scholarly record, consisting of finished books and articles published by academic publishers and sitting on library shelves. But now we recognize a much more complex web of interactions – some formal, some informal – that underlies the development of social science. Although “final” publications may seem to be the most important outputs of the research lifecycle, as shown in Figure 1, the entire cycle produces a scholarly record that scientists need to do their work.

Figure 1. The research lifecycle, as depicted by the Center for Open Science.

The movement of scholarly research from a print world to a networked digital world has profound implications. And that transition is still happening, even if it appears self-evident to people entering the field today. What once seemed like fixed and final products written in private by individual scholars at elite institutions – mostly books and articles – are increasingly displaced by fluid, evolving work by networks of collaborators operating across institutions, and in the public eye. In addition to many linked creators, there are also many linked audiences simultaneously reading and interacting with our work (and each other). And the final products are no longer so final: we usually don’t own or hold physical copies of published research, but rather read digital copies of work that is licensed (or rented) rather than purchased, and which may be revised at any time. These changes may be technological in origin, but their effects are social.

How to use this primer

I hope this primer will offer useful guidance for your career, to help you succeed in a competitive, opaque, inefficient system with little accountability. Knowing how the scholarly communication system works will help you navigate it to advance your career. But I also hope to help you see the bigger picture, and to become an engaged citizen within this system, so that we can work together to improve it.

Social science is a collective endeavor to discover and communicate knowledge. If we understand the scholarly communication system we can make decisions to enhance the scientific enterprise while producing and delivering the knowledge we seek. But if we don’t learn to challenge this system where it is flawed – individually and collectively – and develop alternative practices and institutions where they are needed, social science will continue to fall short of its transformative potential.

How it works

Formats

In this section I describe how the different dissemination formats work from the point of view of the scholars who use them, and their role in the system of scholarly communication. These means of communication are technological devices with social origins and implications, with historical inertia as well as capacity for change and development.

Working papers

A working paper is a completed draft of original scholarship that has not undergone peer review and may be subject to revision. Working papers emerged in the pre-digital era as an important way for researchers to circulate preliminary work for feedback, and to stake their claim to a particular piece of research more quickly than the journal system allowed. They were distributed selectively through the mail or handed out at conferences. Some were formal, such as the one displayed in Figure 2 (on the left), which I wrote when I worked at the U.S. Census Bureau in 1999. It belongs to a titled series, “Population Division Work Papers” (which now includes more than 100 papers), and is numbered within that series. Others are informal, and may be no more than a paper draft with the words “working paper” on the title page.

Figure 2: A working paper title page from 1999 (left), and the version published in Demography in 2000 (right).

Working papers may be singular documents that represent the last word on a piece of research. Or they may be drafts that continue to evolve, perhaps presented at conferences, or submitted to one or more journals for peer review and eventual publication (as happened with our paper; the published version is on the right in Figure 2). Over the course of that journey through multiple versions, the link to the original work may or may not be preserved, and copies of the working paper version may continue to circulate.

Working papers have an uncertain status in the scholarly literature. In some cases being part of a formal series connotes legitimacy, but as publications outside the peer review system they generally carry less prestige. If they are produced by a non-academic entity, working papers are considered part of the “gray literature,” a somewhat archaic term that refers to work that may be less authoritative than “published” research.

Whether they are called “working papers” or not, the status of “unpublished” papers has changed, as they are increasingly identified as objects in the scholarly record. They are often cited in formal academic writing and assigned Digital Object Identifiers (DOIs), which gives them more permanence. In this respect they are merging with the category of papers known as preprints.

Preprints

There is no settled definition of preprints. Some people only use the term to refer to papers that have not been peer reviewed. Others use it for pre-published versions of papers that have been published by a journal (confusingly, these are also known as “postprints” in some circles). In some disciplines, including mathematics, physics, and the life sciences, preprints have achieved a formal status. Authors list preprints on their CVs, and government funding agencies permit citations to preprints in formal reports and applications.

Preprints were popularized in the 1990s in mathematics and physics, with the establishment of the arXiv server (pronounced “archive”), which now hosts 1.5 million papers in many disciplines. SocArXiv (pronounced “sosh-archive”), of which I am the director, was established for the social sciences in 2016. To help reduce the confusion associated with different terminology, we refer to everything on SocArXiv as “papers,” and let authors call them whatever they like. Systems such as arXiv and SocArXiv provide a crucial service by allowing authors to update versions of their papers under a consistent bibliographic entry, so that one link always takes readers to the latest version.

Working papers and preprints are both part of the scholarly record. When they are properly distributed, archived, and cited, they establish precedence, which means authors can use them to claim credit for their discoveries. Because it often takes a long time for peer-reviewed journal articles to appear, these systems have become an important way of moving science forward more rapidly, while allowing researchers to document their efforts.

Conferences

Conferences give researchers a venue to present their work publicly and to establish its precedence. When conferences are managed properly, they play an important role in the research lifecycle. Submissions to conferences, whether they are complete papers or abstracts, go through some vetting process (which, in sociology, is not considered peer review) before being accepted and added to the program. Between the submission and the presentation, some conferences (though no major sociology conferences) make the papers publicly accessible. And after the presentation they may be indexed and archived. A conference paper handled this way becomes a preprint, which can be versioned and attributed to the author in citations (SocArXiv hosts such papers, for example).

Unfortunately, in practice conference presentations are among the least well-documented components of the scholarly record. For some conferences there is no written paper corresponding to presentations, but instead only a title or (sometimes) an abstract. The publicness of conferences is also not well understood. Some researchers give presentations at public conferences but ask the audience not to further disseminate the research presented (a request that cannot be considered binding). Many others simply do not share the underlying work (which is why you see people tweeting blurry phone pictures of the slides).

Several problems result from failure to document conference presentations and make them publicly available. First, the author cannot establish precedence for the work. In other words, a researcher cannot reasonably complain if someone uses the ideas without attribution. Second, authors limit the potential feedback they get. Finally, papers that are only verbally delivered exacerbate problems of accessibility at conferences. There may still be benefits to conference presentations in such cases. Authors can get useful experience, alert interested audience members to their efforts, and hear valuable feedback – and of course they gain a line to add to their CVs. But as a piece of the scholarly record, their value is lost. Distributing and preserving conference papers is an important professional responsibility that individual scholars should take upon themselves if the conferences in which they participate do not.

Journals and books

Journals and books are the traditional finished products of scholarly research. Journal articles, which are shorter and can be published more quickly, are the medium through which most scientific knowledge is recorded and disseminated. Scholarly journals began as the products of scholarly societies, and remain the main identifying feature of most societies today, from Science (published by the American Association for the Advancement of Science) to American Sociological Review (published by the American Sociological Association). Many journals are also published by universities or by private companies, and for those companies they generate billions of dollars in profits annually. We will return to the issue of journal business models below.

In scientific disciplines most original research is published in journals. Sociology is unusual in that our research may be published in journals or books, reflecting the traditions of qualitative research and theory in the discipline. Sociology and social science books come in three major genres: scholarly monographs, general interest or trade books, and textbooks. These are industry publishing conventions, not strict categories. Scholarly monographs are specialized, aimed at expert audiences, written in a scholarly style (e.g., with footnotes or in-text citations), and peer reviewed before publication. They are published by university presses (named for the universities that host them), or else by publishers that have academic reputations (such as Routledge), which lend them academic legitimacy or authority. Much of the most important sociology is published in this format.

Trade books are written for a wider audience, and if they have scholarly citations they are presented in the back, out of the general reader’s direct view. Trade books gain their authority less from the publisher than from the reputations of their authors. So, for example, Evicted: Poverty and Profit in the American City, by Matthew Desmond, was published as a trade book (and won the Pulitzer Prize in general nonfiction), but it had academic authority because he was a well-known sociology professor with a position at Princeton University. Textbooks are designed for students and classrooms, and generally are not considered to be research contributions, or sources of new knowledge. That is one reason research libraries don’t devote their limited budgets to comprehensive textbook collections.

In terms of scholarly communication, books of original research are considered part of the scholarly record. Like papers of various kinds, books are also increasingly available electronically, and licensed or rented as digital copies. And like journal articles, they may have been published in advance as working papers or preprints, or presented at conferences.

Peer review

Journals

Publishing in sociology journals is more complicated than it looks, because the process requires the coordinated efforts of many different actors – authors, editors, reviewers, publishers, and technology companies – in an idiosyncratic system that only reveals a small fraction of itself to the authors and readers. In theory, the process in sociology works through these steps:

1. An author submits a paper to a single journal.

2. The journal editor assesses the paper’s appropriateness for the journal.

3. If appropriate, the editor solicits reviews from experts.

4. The reviewers send the editor their reviews, which are shared with the author, along with a recommendation, which is not.

5. The editor considers the reviews and recommendations and decides between:

A. Accept

B. Accept conditionally, subject to limited conditions

C. Invitation to revise and resubmit (R&R)

D. Reject

If the decision is R&R, the process essentially repeats, except with an added discursive layer: the author resubmits the paper with a memo responding to the reviewers, which the editor and reviewers take into account along with the revised paper. At the revision stage, the process may or may not include the same reviewers. If the decision is “reject,” the author may submit the paper to a different journal and repeat the process, with or without making revisions first. As Figure 3 shows, at least in ASA journals the initial decision is usually a rejection, and very few papers are accepted on first submission.

Figure 3: Outcome of the first round of peer review at ASA journals. (Rejection rates are over 70 percent for five journals, with R&R rates below 25 percent and a tiny fraction of acceptances. The exceptions are Sociological Methodology, which gives more R&Rs than rejections, and Socius, which accepts 60 percent of papers.)

In almost all sociology journals, the peer review process is double-blind. That is, the authors and the reviewers are not told each other’s identities, and only the editors know the identity of both. The authors submit a “blinded” copy of their paper without their names on it, and in some cases authors are forbidden from citing their own work in the paper, or must cite it in a masked way, such as “Author (2018).” In turn, the editor returns the reviews to the author stripped of identifying information, so the reviewers remain anonymous as well.

Despite the common perception that blind peer review has always been a feature of scientific research, this practice only became widespread in scientific journals in the mid-twentieth century, and was only called “peer review” beginning in the 1970s. Before that, editors decided what to publish, sometimes in consultation with colleagues, assistants, or outside consultants, which may or may not have included soliciting anonymous opinions. The idea that review by “peers” (researchers in the same field) is the gold standard for evaluating science emerged in the post-WWII period, promoted by academics trying to keep control of their research budgets out of the hands of politicians.

Peer review often serves both the researcher and the scientific community well. The process may lead to improved research, prevent erroneous or unreliable findings from reaching a wide audience, help prioritize the work in a given field, and provide an authoritative assessment of research quality, which is used to evaluate researchers for rewards and career advancement. In addition, the reputation of the peer review system itself serves as an important source of legitimacy to people without expertise in the field, including journalists, policymakers, researchers in other fields, and the general public.

On the other hand, there are a number of systemic problems with peer review as it is practiced in sociology (and other disciplines), which the lack of transparency in the process makes difficult to identify or rectify. These include:

Time and efficiency. At American Sociological Association (ASA) journals it takes 9 weeks on average to get a decision on the first round of reviews. The decision on a revision takes an average of 6 weeks. And then the time from acceptance to print is another 5 months, for a total average time to print of 9 months (not including the time authors spend on the revisions). During that time the papers are generally not shared with the wider audience of interested parties. If a paper is rejected, authors frequently submit it to a new journal, where the process repeats, and because the reviews don’t travel with the paper, a new group of reviewers may end up repeating the efforts of the first.

Figure 4: Total review and production time at ASA journals. (The figure shows an average total time of 8.9 months, with three journals over 15 months and four below 8 months.)

Incentives and rewards. Sociologists receive no formal credit for conducting journal reviews, and their review efforts are not publicly disclosed. (Journals often publish reviewer names in a periodic list, but the extent and quality of their contributions are not shown.) As the number of journals and articles grows, editors have more difficulty recruiting reviewers, and the reviewing burden falls increasingly on those who agree to review.

Quality. Peer review as a gatekeeping function can make errors of commission (accepting bad work) and omission (rejecting good work), and an irregular system of volunteers is bound to fail at times. The prevalence of these errors is impossible to ascertain, but there are some clear indicators of problems, as fraudulent work and erroneous analyses sometimes slip through. Moreover, agreement between reviewers is generally low, which means that there is a substantial random element in determining the fate of a given paper.

Bias. Blind review is intended to reduce bias in the evaluation of research, for example by preventing reviewers from being impressed by an author’s personal or institutional prestige, or by encouraging junior scholars to share their critical views of senior scholars’ work. On the other hand, blind review also protects the identities of reviewers who do a bad job or discriminate against authors, whose identities or affiliations are often in fact discernible to knowledgeable reviewers. And blind review does not prevent bias by editors. Finally, the need to impress reviewers (and editors) exacerbates publication bias, which is the tendency to publish significant findings while ignoring important null results. As a result, researchers often bury their boring results in the “file drawer,” where they don’t contribute to the scholarly record.

Accountability. Whatever its benefits, peer review as currently practiced comes at the cost of accountability: powerful actors make decisions that are not transparent to the people affected by the outcome. Even in the case of non-profit professional associations such as ASA, the only outcome open to public scrutiny is the corpus of published papers. (Although ASA publishes statistics on acceptance and rejection rates, and review time, the content and flow of decisions remain hidden.)

Books

Academic books are peer reviewed, too, but the process is quite different from that used for journal articles. Books may be peer reviewed at the proposal stage – when the author submits an outline of the book rather than a complete draft – or when the draft is complete, or both. The process varies with the practices of the particular publisher, the status of the author, and how determined the publisher is to publish the book, based on its potential sales or the author’s prestige.

Reviews of books are generally single-blind, with the reviewers knowing the identity of the author. The peer review process with books is more explicitly geared to market questions, such as sales and classroom potential. Reviewers are sometimes asked to comment on market appeal among target audiences as well as theoretical, technical, or analytic issues.

Unlike with journal articles, multiple presses may compete to win the contract for a single book, and simultaneous submission to several publishers is acceptable. Authors with an established reputation may be able to sign a contract for a book without review, while junior authors need their proposal to pass review first. Nevertheless, academic books are almost always peer reviewed, at least nominally, before publication.

The nature of scrutiny before publication differs between books and articles, but the difference extends to the period after publication as well. Journal articles that are peer reviewed are usually seen as having passed muster and bear the stamp of legitimacy. Books, however, are subject to public scrutiny through formal book reviews after the book is published.

Innovations in peer review

As awareness of the problems in the peer review system grows, a number of new models and experiments have emerged. None of these is dominant, but they are a growing presence in scholarly communication. These are the most prominent alternative models for journal review:

Non-selective review. Most journals, and all the leading journals in sociology, limit the number of papers they publish to fit them in printed volumes. This is to manage their production costs (such as editing and proofreading), and to arbitrarily constrain the number of papers that can have “top” status in the discipline. As a result, good papers are routinely rejected by the top journals, after which they are usually published in lower-status journals, causing publication delays and making inefficient use of reviewers’ time. An alternative model is to ask peer reviewers whether a paper is good enough to publish – sound, reliable, competently executed – and then to publish papers that meet that threshold with less regard for how important or popular they will be, and with no limit on the number that can be accepted. In sociology, this model is practiced by Socius, which seeks to publish “all scientifically sound sociological research.” Such journals can have quick review and production times (see Figure X) because they provide less feedback, concentrating on the publication decision rather than on extensive revisions to improve papers.

Open review. Some journals, publishers, and research funders outside of sociology advocate publishing the peer reviews of papers along with the papers themselves. Their goal is to increase transparency in the review process, share the ideas and experiences of reviewers, provide models for junior scholars to follow, and make it possible to properly credit review effort. Some journals publish just the reviews, while others also include the reviewer names. To make reviewer reports functional parts of the scholarly record, they can now be archived and assigned DOIs, making them discoverable in searches.

Post-publication review. Now that journals are not necessary for printing and distributing research, why do we still use journal-run peer review to determine what research will be published, and whether it should be declared important, before all but a small handful of people have read it? This question leads to the idea of post-publication peer review: publishing research first, and then using peer review to determine how accurate or ground-breaking it is. This is a logical extension of preprints. An “overlay journal” conducts peer review of papers already available on preprint servers, and then simply publishes a list of the papers it has “accepted” – papers that have been available to read all along. Thus, in post-publication peer review, the review and publication functions are carried out independently. In theory, this could also allow the same article to be “published” by different journals, thereby reaching different audiences, and allow articles to be further revised after they are published.

Registered reports. Another radical way to restructure peer review is with registered reports. Under this plan, authors “register” research designs and hypotheses, and submit them to peer review before they collect their data and perform their analyses. This “review before results known” is meant to prevent the common practice of “hypothesizing after results are known,” in which authors write up their research as if they had correctly anticipated the results. Journals use peer review at this stage to decide whether to publish the results of the research regardless of the outcome (assuming the research is executed properly). This process is intended to prevent publication bias.

Rights and licenses

Once research is reviewed and accepted, the next step is a legal agreement between the author and the publisher. Copyright law is complicated, and most academics are rightfully uninterested in mastering it. The trick is to learn the most important principles and how they intersect with common practices and problems in our system of scholarly communication. The rules at play in this arena involve professional norms and obligations, copyrights, and licenses.

Professional norms and obligations

Much of the creation and dissemination of research in sociology is governed by norms and obligations rather than legal rules. For example, the practices of citing the work of others, sharing credit with co-authors, keeping peer reviews confidential, and providing other researchers with copies of relevant materials upon request are all commonly expected but not legally required. So although in an extreme case plagiarism may be fraud in the eyes of the law, or a violation of copyright, most of the time it is handled as a violation of professional ethics. Such ethical violations can have devastating career consequences, but they usually don’t lead to criminal or civil penalties. Copyright violations, on the other hand, are legal violations, but small-scale violations rarely become legal cases, and if they do the consequences are relatively minor. (The ASA Code of Ethics is a good guide to the relevant principles.)

Copyright and licenses

In copyright law, an author owns the copyright to their creative work at the moment of creation, regardless of whether they put a “©” on it or register it with the copyright office. The owner of a copyright has the legal right to control how the work will be used, and to reap the rewards from its sale (with exceptions for fair use, which includes some educational purposes). The major exception to this is “work-for-hire,” or work done as part of one’s employment, in which case the employer owns the copyright. In practice, instead of claiming work-for-hire ownership, most universities have employment policies that allow their employees to keep copyright over their own research (but everyone should know their university’s policy). For sociologists the issue of intellectual property is not as contentious as it is in disciplines such as chemistry or engineering, where research outputs developed by faculty may be worth millions of dollars.

In copyright terms, scholarly communication is a system for managing the rights that researchers create when they generate work so that people can read it. Most of the time research needs to be copied and distributed in order for people to read it, and this happens through different kinds of licensing.

Assuming authors do own the copyright to their work, the copyright and license issues are handled in the agreement that authors sign before their books or articles are published. In my experience, most authors do not read these long, dense, legal agreements carefully, and don’t fully understand their implications. Further, authors are presented with these agreements at a moment when they are excited to see the article published, and when they fear any objection on their part might jeopardize that accomplishment, which may have a profound influence on their career. Finally, the agreement is usually made between an author who has no legal expertise and only rarely enters into such agreements on one side, and a powerful corporation with lawyers who devote their careers to crafting and enforcing these agreements on the other. In short, authors are at a profound disadvantage while making these consequential legal agreements.

Transferring copyright

In one model of agreement, the author transfers the copyright to the journal, and the publisher can then do whatever it wants with the article. However, if that were the end of the agreement, the author would have no more right to distribute the work than anyone else, which would offend authors. So in the agreement the journal gives back to the author a license to do some things with the work they used to own.

As an example of this kind of arrangement, here is how the copyright transfer agreement for American Sociological Review (published by Sage for ASA) worked as of fall 2018. First, the author gives the copyright to ASA, with virtually no limitations:

Contributor transfers and assigns to the Society … all right, title, and interest in copyright, and all of the rights comprised therein … including without limitation … the exclusive right to reproduce, publish, republish, prepare all foreign language translations and other derivative works, distribute, sell, license, transfer, transmit, and publicly display copies of, and otherwise use the Contribution... and the exclusive right to license or otherwise authorize others to do all of the foregoing, and the right to assign and transfer the rights granted hereunder.

Once ASA owns the article, it then (several pages later) gives the author permission to use it in limited ways. Authors cannot publicly share the final PDF of the article, or use it for commercial purposes, but they are permitted to:

Distribute photocopies for teaching purposes or to research colleagues on an individual basis.

Share the version that was originally submitted to the journal – that is, before improvements made in the peer review and editorial process.

Share the final, peer-reviewed version (not the journal PDF), but only after a 12-month embargo period, and only on non-commercial platforms.

Note that once the author gives ASA the copyright in the clause quoted above, the association is free to publish and distribute the work on whatever terms it prefers. ASA then contracts with Sage to publish and distribute the article, and the author has no right to influence (or even see) that contract. For example, ASA is free to collect royalties from other publishers for reuse of articles from its journals. Under the contract, Sage sells the journals and returns a portion of the proceeds to ASA, a point we will return to below.

Granting a license

When someone owns a copyright, they can grant licenses to other people to use it for specified purposes (as ASA lets authors share some versions of the papers they wrote). In recent years this process has become much simpler, thanks to Creative Commons (CC). With a CC license, rather than give licenses to individuals one at a time, authors can give permission to the public, and clearly communicate any restrictions in a standardized format. (See Figure 5.) For example, I put a CC license on this primer, as indicated by the logo and license link on the title page.

Figure 5: Creative Commons licenses (creativecommons.org).
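The standardized format works because the standard CC licenses are built from a few combinable elements: Attribution (BY) is always required, NonCommercial (NC) is optional, and ShareAlike (SA) and NoDerivatives (ND) cannot be combined, since ND forbids sharing adaptations that SA would otherwise govern. As a minimal sketch of that structure, the Python below generates the six standard license names from those elements. The function name cc_licenses is hypothetical and not part of any Creative Commons tool, and the sketch leaves out the separate CC0 public domain dedication.

```python
from itertools import product


def cc_licenses():
    """Enumerate the six standard Creative Commons licenses (hypothetical helper).

    Every license includes Attribution (BY). NonCommercial (NC) is optional.
    ShareAlike (SA) and NoDerivatives (ND) are mutually exclusive, so the
    "derivatives" slot holds at most one of them.
    """
    licenses = []
    for noncommercial, derivatives in product([False, True], [None, "SA", "ND"]):
        parts = ["BY"]
        if noncommercial:
            parts.append("NC")
        if derivatives:
            parts.append(derivatives)
        licenses.append("CC " + "-".join(parts))
    return licenses


for name in cc_licenses():
    print(name)
```

Because the SA/ND choice is modeled as a single slot, the invalid pairing "BY-SA-ND" can never be generated, which mirrors why there are six licenses rather than eight.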

With open access journals (which we will discuss below), such as Socius and Sociological Science, authors usually don’t transfer their copyright to the publisher. Rather, they either give the journal a license to publish the article, or just apply a CC license that lets anyone publish it. For example, under the author agreement with Socius, the author gives the journal permission to publish the article under a CC license that lets anyone read and share it. In the case of Sociological Science, on the other hand, the author merely applies a CC license to the paper when they give it to the journal. The author keeps the copyright and releases the paper to the public on the condition it will be cited as published in the journal.

With these license agreements, the journal arranges to publish an article but does not own it, and the author (or their institution) retains the right to sell or distribute it. This may be the most consequential decision authors and their institutions make in the business of publishing, as it shifts not only the economics but also the politics of scholarly communication.

Data, code, and other research materials

The research lifecycle depicted in Figure X is not just about the production of written works. The lifecycle also involves the creation, use, discovery, and reuse of data, code for processing and analyzing data, and other research materials. As network technology makes it easier to transport and share these products, libraries have developed standards for describing, preserving, and discovering them, which is part of the work of scholarly communication. Publishers increasingly provide archive services so that supplementary materials can be included with published work – such as making data available for a particular article. And archives such as SocArXiv also store and link to materials for readers. As with published papers, data and other materials are now citable research objects, with DOIs and other identifiers.

Sociology lags behind some adjacent disciplines in the adoption of open practices for data, code, and research materials. In 2015, students in Cristobal Young's graduate statistics course asked 53 sociologists for replication packages (data and code) for their published papers, and only 15 authors provided the needed information. No major sociology journals require data sharing with publication of an article, although some encourage it. Here are several policies that show the range of practices in other disciplines:

At the open-access publisher Public Library of Science (PLOS), authors must make the data they used publicly available at the time the article is published, with some exceptions (e.g., ethical restrictions regarding confidentiality).

The Journal of Social Psychology requires that research materials be shared when a paper is published. These include survey instruments and questionnaires, and scripts used in experiments.

The American Journal of Political Science requires that any published article be accompanied by data and materials that enable other researchers to replicate the findings reported in the article (this applies to qualitative as well as quantitative research). And the journal verifies that the replication works before it publishes the article, by running the program and checking the output, at which point it makes those materials publicly available in its own archive.

Of course, ethical protection of research subjects requires careful regulation of what data can be shared, and with whom. Across disciplines, medical scientists and social scientists are more reluctant to share the data they collect than are physicists and biologists, for example. But standards and practices regarding data secrecy in sociology have come under scrutiny, with some researchers advocating a more open stance within ethical limits.

Some researchers understandably worry that sharing their research data and materials requires labor on their part that will provide benefits primarily to others, thus slowing down their own work while helping competing researchers publish results first. Solving this problem requires technology that reduces the burden of sharing materials, demanding ethical standards on collaboration and attribution, and institutional incentives to adopt open research practices. Sharing has become much easier, with platforms such as GitHub, Figshare, and the Open Science Framework offering convenient services, along with other applications for operating dynamic research notebooks that share as you go (within limits set by the researcher). As shared research materials have become more accessible, authors increasingly cite them in their publications, which allows people to measure the impact of this work, and provides some incentive to make materials available.

Who pays, who profits

In its economics, the scholarly communication system is a hybrid that combines elements of state sponsorship (such as government grants for research, and salaries for professors at state institutions), non-profit organizations (including private universities), private industry researchers, and individual researchers and consumers (including students). Some key resources are free to use (such as the Google Scholar database). Others appear free for individuals to use only because their purchase is not visible – mainly databases and publications to which universities subscribe through their libraries. Others are expensive, such as many books. Pricing and paying for information is a complex problem, creating markets in which the conflict between consumers’ need for transparency and sellers’ desire for secrecy is heightened. As a result, even if you are determined to figure out how much things cost, or should cost, and who pays for what (which most academics are not!) it’s a confusing maze. Recent developments have not clarified the picture.

Many products in the system are sold and distributed in different ways at once. For example, consider the different ways someone may gain access to an article in American Sociological Review. Members of ASA can access ASR articles, at an annual membership cost of $51 to $377, depending on their academic rank and income. Or one might gain access to ASR through a library subscription costing their university $730 per year. However, chances are the library is buying the journal as part of a large bundle of journals sold by Sage, so the exact cost of ASR is unknown. Some people also may access ASR articles through institutional subscriptions to the JSTOR database, or, as individual unaffiliated scholars, through JSTOR's allowance of six free articles per month. Finally, an unaffiliated individual who finds an ASR article on the journal homepage at Sage may stumble onto the confusing page shown in Figure 6, offering one-day access to the single article for $36.

Figure 6: American Sociological Review’s paywall.

Arriving at this page is what people mean by the term "hit the paywall": the moment one's research is stopped by a demand for payment. Someone who could join ASA and access every article the association has ever published for as little as $51 is instead asked to pay $36 for a single article, a price almost no one chooses to pay. It makes little sense to sell individual articles, because they are generally only valuable as part of a network of information, including citations, author information, and other publications. In effect, therefore, the paywall page is not so much an active sales spot as it is a backstop to prevent people from getting around all of the other ways there are to pay for journal content. If you aren't associated with an organization that pays for it, the research is essentially unavailable to you (at least legally). Although hardly anyone pays at the paywall page, the system is still defined by the existence of paywalls, which divide those who belong to subscribing institutions from those who don't.

In this section we will explore how this works, with an emphasis on journal publishing, beginning with journals published by academic societies such as ASA.

Academic associations

Scholarly learned societies, now more often known as academic associations, emerged during the scientific revolution, and their publications defined science in the modern era, beginning with the Royal Society and its journal, Philosophical Transactions, which began publishing in 1665. Since that time, the journal has defined the academic association, and usually served as its economic foundation as well.

When the American Sociological Society (which later became ASA) was founded in 1905, its first publication, Papers and Proceedings from the 1906 annual meeting, offered membership for $3 per year, for which members would receive one benefit: the association’s publications. The launch of American Sociological Review in 1936 reflected the association’s coming of age and led to its national dominance. In the first issue the editorial board wrote to the membership, “The Review belongs to you collectively and should be made to express your interests.” Membership was $6 per year, “of which four dollars is for subscription to the review.” Belonging to the association meant the opportunity to read as well as contribute to the journal. (Membership is no longer required for publishing in the journal.)

Today, ASA is the dominant sociology society, and publications are its largest source of income. Of revenue of $7.3 million reported by ASA in 2016, 45 percent came from publications, almost all from its scholarly journals and the publishing agreement between the association and Sage. The other six sociology societies shown in Figure 7 are much smaller, with annual revenue under $1 million. (The Population Association of America, which includes many sociologists, is somewhat bigger at $1.4 million.) For these seven societies, 44 percent of all revenue comes from publications, and for all but the PAA, publications are the biggest source of income. These societies pay for their activities largely with money raised from outside their organizations.

Figure 7. Percentage of total revenue from publications, meetings, and member dues, for select U.S. sociology societies. Calculated from 2016 non-profit tax filings.

How do societies get revenue from their journals? A few societies serve as their own publishers, but most partner with a publishing company. Details of the publishing agreements are not publicly available, but in broad terms they involve the societies delivering content (facilitating the peer review, selecting articles, and editing them), which a publisher distributes (marketing, subscriptions, printing and shipping, online distribution, and managing the paywall). The publisher returns a portion of the sales revenue to the society, and keeps the rest for its operations and profit. In the case of ASA journals, the association keeps the copyright, and licenses the content to Sage for distribution.

I say societies "deliver" content, rather than "producing" content, because most of the work of producing journal articles is done by academic employees as part of their job duties, and is not paid for by the association, the journals, or the publisher. This includes, of course, conducting the research and writing the articles, but also the massive job of peer-reviewing them (which is divided among thousands of people). Academic editors of society journals are usually faculty who do the work as part of their regular jobs, and they may or may not receive time off from their institutions to devote to their journals. In the case of ASA journals, editors receive a small honorarium from the association, which is not intended to pay for their time. Associate editors, editorial board members, and peer reviewers are not paid by the association or the publisher, so their efforts amount to a subsidy of the publication process by their employers, which are mostly universities. And of course universities, through their libraries, are also the primary customers for the finished products.

Academic associations like ASA do a lot more than publish journals, but much of what they do is subsidized by their journal revenue. For example, ASA issues public statements about matters of concern to sociologists, runs a minority fellowship program, conducts research about the discipline, publishes ethics standards, and represents the interests of sociologists in other forums. At present, publication revenue – especially institutional subscriptions to its most popular journals – makes this other work possible. In effect, then, subscribing universities are persuaded to pay for the activities of ASA and other academic associations because they need access to society research journals.

Figure 8. The economic flow of association journal publishing with commercial partner.

The economic flow of association journal publishing is depicted in Figure 8. In summary, the research for journals is paid for by grants and university salaries that support faculty and other researchers, as is most of the work of publishing, namely the peer reviewing and editing. That content is delivered to the association, which takes on the job of communicating the research to the academic community and wider audiences, organizing the labor that universities supply. The costs of editing – managing reviews, copyediting, and formatting – are paid for by a share of the subscription revenue returned by the publishers. Additional revenue is diverted to other association priorities. Finally, the remainder of the subscription revenue goes to publisher operations and profits.

The main value contribution to the final product, in terms of labor time and infrastructure expense, is the research and reviewing, which is paid for by the academics’ employers and grants. And the main source of income is academic library subscriptions. You could look at this as universities paying for different stages of scholarly communication in different ways – paying academic employees to write articles on one hand, and paying the publisher for the journals to bring the research output back to campus on the other. To critics of this system, however, it looks like universities are paying for the same research twice.

Several of the most prominent sociology journals are published by university presses, which are non-profit organizations but not the same as academic associations. These include two journals that predate ASR: American Journal of Sociology, which is edited by the sociology department at the University of Chicago and published with University of Chicago Press; and Social Forces, by the sociology department at the University of North Carolina with Oxford University Press. Although non-profit, they are not run by membership organizations such as ASA, and thus don’t have the same kind of accountability. And they are also paywalled journals that generate revenue windfalls for the departments and publishers that produce them, even if not for the shareholders of for-profit companies.

For-profit publishers

Association journals insert a non-profit actor between the researcher and the commercial journal. However, scholarly publishing, at least with regard to journals, is increasingly the domain of for-profit publishers. Some of the most prestigious (and profitable) journals in the natural sciences, such as Cell and Nature, are published by private companies without the imprint of academic associations. This is less the case in sociology, although some journals, such as Social Science Research (Elsevier) and Work and Occupations (Sage), are commercial products. From the point of view of commercial publishers, the difference between an association journal and one that they produce themselves is that in the former case they pay associations licensing fees, which are used to pay for editing operations, while in the latter case they pay editorial teams directly to facilitate peer review and edit the content. In both cases, the final product is sold by the publisher under the same institutional subscription model. Association journals carry an important stamp of academic legitimacy, which greatly enhances the value of their brand.

The scholarly publishing industry is increasingly dominated by a small number of highly profitable publishers. The "big five" scholarly publishers (Elsevier, Wiley-Blackwell, Springer, Taylor & Francis, and Sage) published just over 50 percent of all scholarly journal articles by 2013. In the social sciences the market share of those companies was even higher, 70 percent. And in sociology itself their share is 79 percent, as shown in Figure 9. Rising prices enabled by market domination have followed. The average journal subscription price for college and university libraries increased 24 percent from 2014 to 2018, which is four times the overall rate of inflation. In sociology, the institutional subscription price of ASR increased 135 percent from 2011 to 2018, from $311 to $730, more than nine times the rate of inflation.

Figure 9. Publisher share of sociology articles. From Web of Science Sociology category, 5000 most recent articles on 8 Oct 2018.
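The ASR price figure above can be checked with the ordinary percent-change formula, (new − old) / old × 100. The minimal Python sketch below applies it to the two prices quoted in the text; the function name percent_change is just an illustrative helper, and the comparison with inflation is not modeled because it would require a consumer price index series that the text does not supply.

```python
def percent_change(old, new):
    """Return the percentage change from old to new (illustrative helper)."""
    return (new - old) / old * 100


# ASR institutional subscription prices in U.S. dollars, as quoted in the text.
asr_2011, asr_2018 = 311, 730
change = percent_change(asr_2011, asr_2018)
print(f"ASR price increase, 2011-2018: {change:.0f}%")  # ASR price increase, 2011-2018: 135%
```

The unrounded result is about 134.7 percent, which the text reports as 135 percent.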

The profits at the top five publishers are much higher than those of most companies. In fact, the largest, Elsevier, had a profit margin of 37 percent each year from 2011 to 2017, reaching $1 billion in profit on $2.8 billion in revenue in 2017. How can the companies selling scholarly articles charge so much without losing customers to competitors? Consider five factors:

First, size and dominant (oligopoly) position in the research market are crucial. Elsevier published 430,000 articles in 2,500 journals in 2017. When few companies dominate an industry, they can charge higher prices without fear of meaningful competition from smaller firms.

Second, journals get most of the inputs for their products for free, which increases their profit margins.

Third, the people who use (read) their journals are not the ones paying for them – their libraries are – so increases in price do not lead to lower demand by readers.

Fourth, research is bundled into journals with prestigious reputations, so journals are uniquely valuable. Readers (or libraries) can’t decide just to buy the good articles, they must have access to the prestigious journals to maintain their own reputations as institutions.

Finally, journals are in turn bundled into publisher packages with hundreds or thousands of other journals (and the terms of the deals cannot be shared between consumers), so libraries are relatively powerless to shop or bargain competitively.

Note that these factors driving publishing profits apply whether the journal content comes through an academic association or not – they are inherent in the paywall journal subscription business model. The economic flow of commercial journal publishing without an association imprint is depicted in Figure 10. In this simpler process, the association role is removed, and association activities are not subsidized. However, rather than saving subscribers money as a result, such journals instead return greater profits to the publishers.

Figure 10. The economic flow of commercial journal publishing.

Open access models

In contrast to subscription-based publishing models, open access (OA) refers to scholarly publishing that is free to read and also, ideally, free to digitally copy and distribute. The OA concept dates to 2002, when a group of advocates declared that the internet made possible “the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds.” In practice OA is often more limited, but always implies at least free access for readers.

Gold, Platinum

Open access can be accomplished through a number of different models that may or may not look like traditional journals. In publishing parlance, any model in which the research is made immediately available for free by the publisher is known as "Gold OA." Most peer-reviewed open-access journals do not charge publishing fees. The most common fee-based Gold OA model is one in which the research institution, the funding agency, or the author pays for the publication process. Instead of charging libraries for subscriptions, the journal covers its costs (and any profits) by charging what are known as article processing charges (APCs).

The leading examples of Gold OA / APC publishing in sociology are the online journals Sociological Science, published by a non-profit group of sociologists; and Socius, published by ASA with Sage. The model was pioneered in the natural sciences by PLOS One, also non-profit, which publishes thousands of articles per year in an online-only format. Not coincidentally, these journals also tend to practice non-selective review, publishing all the papers they receive that meet their quality criteria. Without page limits, this model is scalable, as the journal can add staff and computing resources to match the demand. Prices to publish a single paper in these journals vary from a few hundred dollars in Sociological Science and Socius to more than $5,000 in the top Elsevier journals. In a simpler, and rarer, variation of OA, publishing is supported directly by a private foundation or other funder. These so-called “Platinum OA” journals include the peer-reviewed journal Demographic Research, which is run by the Max Planck Institute in Germany.

Authors who don’t have grants or institutions to fund their publications may pay APCs from their own pockets. This is often the case in sociology, which explains some of the resistance to this model in our discipline. Another source of opposition is the perception that APC journals are simply shifting the access problem, replacing the exclusion of readers who can’t pay with the exclusion of authors who can’t pay, especially those in less wealthy countries. Finally, some critics of fee-based Gold OA argue it leads to lower publication standards because journals have a financial incentive to accept papers. And there are some “pay to play” journals that exploit desperate or unscrupulous authors by selling them rubber-stamp peer review for their CVs.

The counter argument in favor of fee-based OA is that disseminating research should be considered a research expense, so research institutions should pay for publishing rather than passing on the cost to readers. The current subscription system draws money from less wealthy colleges and universities to pay for research produced at more elite research institutions. To illustrate this, I extracted the institutional affiliations of all authors from the 48 articles in American Sociological Review in 2017. If each of the articles was charged an equal share of the production cost, and the cost was then divided among co-authors of each paper, I found that 14 elite private universities would pay for 36 percent of the publication cost, and another 32 percent would be paid by 21 major public research universities. The remaining third would come from smaller universities and those outside the U.S. (many of which are also wealthy). Thus, billing the research producers rather than subscribers would have a progressive effect on the system – passing more costs onto wealthier institutions – while at the same time making the research available free to all readers. On the other hand, a steep per-article charge would pose an impossible burden on authors at less wealthy institutions.
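The allocation described above is a form of fractional counting: each article carries an equal share of the total cost, and that share is split evenly among the article's co-authors before the amounts are summed by institution. The Python sketch below illustrates the procedure only. It assumes each author has exactly one affiliation, and it uses invented toy data with made-up institution categories, not the 2017 ASR author list; the function name allocate_costs is hypothetical.

```python
from collections import defaultdict


def allocate_costs(articles, total_cost):
    """Fractionally allocate a total publication cost by institution type.

    Each article is charged an equal share of total_cost, and that share is
    divided equally among the article's co-authors. Assumes one affiliation
    per author (a simplifying assumption for illustration).
    """
    per_article = total_cost / len(articles)
    shares = defaultdict(float)
    for authors in articles:
        per_author = per_article / len(authors)
        for institution_type in authors:
            shares[institution_type] += per_author
    return dict(shares)


# Hypothetical toy data: one list of author affiliation categories per article.
articles = [
    ["elite private", "elite private"],
    ["public research"],
    ["elite private", "other", "public research", "other"],
]

shares = allocate_costs(articles, total_cost=300.0)
for institution_type, amount in sorted(shares.items()):
    print(f"{institution_type}: {amount / 300.0:.1%} of the cost")
```

In this toy example a single-author article sends its whole share to one institution, while the four-author article spreads its share across three categories, which is why co-authorship patterns matter for who would pay under author-side billing.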

Hybrid

Shortly after the birth of the OA movement, publishers devised a model to facilitate open access for readers without jeopardizing their subscription revenue. Called Hybrid OA, this model allows research institutions or authors to publish in subscription-based journals, but to pay up front for the articles they write to be available online without charge to readers. Most commercial journals now offer this option. Hybrid OA allows some of the costs of publishing to be borne by those funding or conducting the research, while still taking advantage of the benefits of journal publishing (including status and prestige). Some research funders have been willing to pay high APCs to have their work in prestigious journals and still open to all readers, especially in medical and natural sciences, where, for example, Elsevier charges more than $5,000 for open publication of an article in Lancet or Cell. An obvious downside to this model is that research from wealthier institutions may be free to read, while that from authors who can't pay remains behind paywalls, another form of exclusion.

Commercial publishers in sociology offer open options in their subscription journals as well. Authors publishing in Sage journals (including ASA journals) can pay $3,000 to have their work open immediately, as can those in Elsevier’s Social Science Research ($1,800). These options are rarely purchased, however.

In theory, of course, making some articles available for free lowers the value of a subscription to that journal, and should therefore lower the price. Because subscription pricing is so opaque, however, there is a real risk that the hybrid OA model merely allows publishers to increase profits by charging for the same work twice: once from the authors, and again from institutional subscribers who purchase access to the whole journal. And APCs are higher on average for hybrid journals than they are for OA journals. Major publishers pledge to reduce subscription fees for journals in which authors pay for their articles to be open, but these practices are not subject to verification. For that reason, and because hybrid publishing supports the continued existence of the subscription fee model, many major research funders are now turning against the practice, and prohibiting their grant money from being spent on hybrid journal APCs.

Green

Rather than pay subscription-based journals to make specific content open, under the so-called Green OA model researchers or their institutions make some version of their published work available to the public directly. Most journal author agreements permit authors to distribute their work in its less-than-final form, for example, as it was before copyediting or formatting. Authors can deposit these versions in an institutional or disciplinary repository such as SocArXiv (or, if it is already up as a preprint before publication, leave it there). The work also may be posted in an archive at the researcher’s university. Green OA allows journals to continue selling research papers through subscriptions even as readers can read them for free through a different site. The most complete Green OA policies, like that at Harvard, allow the institution to deposit all faculty research in its public repository. The faculty at Harvard does this by granting a nonexclusive right to the university to distribute the work, which supersedes the agreements authors sign with publishers.

Pirate OA

Until now I’ve described the legal and orderly operation of the scholarly communication system. But like any system it also has vulnerabilities, conflicts, and corruption. And, given the political nature of the contests over access to the production and consumption of knowledge, there is also civil disobedience and protest. There are large quantities of valuable information on the line, and often the only thing separating the haves from have-nots is a password. (People who share Netflix passwords are familiar with this situation.)

On an individual level, many people cheat a little here and there on academic publishers, by sharing passwords, passing copies of documents to friends or colleagues who don't have licensed access to them, or posting copies on their personal websites. (You can see people contributing to this illicit sharing system with the #icanhazpdf hashtag on Twitter.) All of these may be considered minor infractions of copyright or licensing agreements, which might ease individual frustrations with the paywall system but don't fundamentally undermine it.

But unauthorized sharing was taken to a new level by Sci-Hub, a project of Alexandra Elbakyan, a previously unknown graduate student in Kazakhstan. She devised a system for gathering university login credentials and used them to amass an archive of tens of millions of scholarly articles. And her system automatically grabs a copy of any new article a user requests. By 2016 readers were downloading more than 5 million papers per month from the system, led by users in China and India. Elbakyan is an international fugitive, with major publishers attempting to find legal means to stop her. The site has moved around the internet (as of this writing it is available at a Taiwan address, sci-hub.tw, among others), and many systems have attempted to block access to the archive.

Stopping an operation like Sci-Hub may seem like a hopeless task for the publishing industry, although they are trying (if your university has recently implemented mandatory two-factor authentication for logging into the campus system, you may have Sci-Hub to thank). Consider that in 2018 one person was able to carry almost the whole archive – 60 million articles, more than 100 terabytes – in a single suitcase, and hide it in India. With information so mobile and distributed, and copied at such low cost, undermining the paywall seems almost infinitely cheaper than protecting it. One reason for rising subscription prices, ironically, may be the increasing costs of blocking access by such hackers. The Sci-Hub experience illustrates the vulnerability of the dominant subscription business model – a problem that would not exist if the costs of publication were paid before publication rather than by subscribers.

Preprint repositories

If, as noted, preprints are sometimes considered “gray” literature, the business models of preprint services may seem equally ambiguous. All major preprint repositories are free for authors to use and offer open access to readers, but they have different business models. Here are the most prominent services:

Institutional repositories for the work of university faculty and staff.

SocArXiv, for work in the social sciences (as well as arts and humanities, education, and law). Hosted by the non-profit Center for Open Science (COS), which is funded by private foundation grants and also hosts a number of other services (e.g., PsyArXiv, EngrXiv). The COS platform is open source, so anyone can use the technology.

Social Science Research Network (SSRN). The largest social science preprint service, SSRN is owned by Elsevier, which lets authors and readers use it for free with registration, while selling services such as subscriptions to new paper announcements.

ArXiv, for math, physics, and related disciplines. Based at Cornell University and supported by voluntary membership dues from research libraries.

bioRxiv, for life sciences. Supported by foundations.

ResearchGate and Academia.edu, run by private, for-profit companies that allow authors to post and share their work for free (not limited to preprints). They hope to generate revenue by offering social media features for networking, then selling data about users or placing ads.

Preprint repositories are conceptually similar to repositories of data and software, such as GitHub, Figshare, and the Open Science Framework, which all allow researchers to store and distribute research materials. What sets these services apart from simple storage platforms (such as Dropbox or Google Drive) is how they facilitate operations such as version control, open collaboration, and the generation of the metadata and licenses required for academic work.

Whether they are working as preprint services – distributing work that has not been peer-reviewed – or as part of a green OA strategy, services such as SocArXiv reveal an important fact about scholarly communication: archiving and distributing research doesn’t have to be expensive. Some services, such as peer review, editing, marketing, and print distribution, clearly require more money to support. And preprint services do more than simply distribute PDFs – they also provide data preservation infrastructure, as well as storage and dissemination of metadata (such as DOIs and author IDs). But preprint services show that some goals of the system can be accomplished for a lot less money, especially when provided by non-profit organizations and universities. This was the original promise of the Internet. However, like the idea that virtually free network communication would reduce inequality and radically improve democracy, fulfilling that promise has proved challenging, to say the least.

Libraries and metadata

Academic libraries serve both an institutional mission and a public mission. They store and manage scholarly communication for the constituents of their communities, schools, colleges, or universities. And they usually extend access to those resources to the wider community as much as they can. In keeping with the evolving nature of scholarly communication, in recent decades libraries have done more managing of digital licenses and materials and less storage and maintenance of print collections. Unfortunately, the invisibility of licensing agreements and web delivery has created the impression among many academic users that digital materials are available for free. So even as a growing share of (often shrinking) library budgets is devoted to providing access to online resources, academic libraries have lost relevance in the eyes of uninformed users. The diminished need to visit the library building to access its resources obviously exacerbates this. I illustrated above the place of the library and its budget in the subscription journal system. Here I turn briefly to the importance of the library’s role in provisioning the metadata of scholarly communication.

Metadata is data about data, and in this context it refers to the information about research outputs. When the products of scholarly communication are properly documented, they become the scholarly record. Beyond such obvious elements as the author name, title of the work, and publication source, the metadata about research products may include version histories, physical properties, grant information, copyright and license data, and digital location. These data points are collected and then linked, allowing them to be archived and then served up to readers by library databases.

Figure 11. Examples of metadata records, clockwise: Web of Science record for Mark Granovetter’s “Strength of Weak Ties”; copyright page from Patricia Hill Collins’ book Black Feminist Thought; Harvard Dataverse record for Matthew Desmond’s Milwaukee Area Renters Study.

The examples in Figure 11 show the most common metadata identifiers. For digital objects, these are DOIs (digital object identifiers). Both the article (Granovetter’s “Strength of Weak Ties”) and the data file (Matthew Desmond’s Milwaukee survey) have DOIs, which allow people to find and link to the objects online. For the book shown, Black Feminist Thought, by Patricia Hill Collins, you can see the ISBNs (International Standard Book Numbers) used in book catalogs, as well as the Library of Congress call number (HQ1426.C633), which determines where it is shelved in university libraries. These records also include subject keywords and other technical and legal information.
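To make the role of DOIs more concrete, here is a minimal Python sketch of how a DOI can be split into its prefix (which always begins with “10.”) and its suffix, and then turned into a link through the public doi.org resolver, which redirects to wherever the object currently lives. The function names and the DOI 10.1234/example.5678 are hypothetical, chosen for illustration; this is a sketch under those assumptions, not a complete implementation of the DOI syntax rules.

```python
from urllib.parse import quote

# Hypothetical helper names, for illustration only.
LEADING_FORMS = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into its prefix (10.<registrant>) and its suffix."""
    text = doi.strip()
    for lead in LEADING_FORMS:
        if text.lower().startswith(lead):
            text = text[len(lead):]
            break
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build a resolver link; doi.org redirects it to the object's current location."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters in the suffix that are unsafe in a URL, keeping "/".
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


print(doi_to_url("doi:10.1234/example.5678"))  # https://doi.org/10.1234/example.5678
```

Because the link points at the resolver rather than at a publisher’s page, it keeps working even if the object moves, which is what makes DOIs useful for citation.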

The evolution of metadata into digital formats has allowed them to be processed and analyzed by machines, greatly enhancing our ability to map and understand the global knowledge network. The metadata for research outputs like these are the basis for massive databases that, in addition to journal subscriptions, are some of the more expensive items in college and university library budgets. They are also tied into library catalogs, allowing subscribers to locate and access digital copies. For books, most libraries use a database called WorldCat, which contains information about library holdings, allowing readers to find copies in their own or other libraries, through interlibrary loan or e-books. And services such as Crossref allow library users to access articles found in databases through their own library’s collection.

In addition to DOIs, which serve as stable object identifiers, bibliographers must also be able to uniquely identify authors, even when they have common names – another complex task that goes unnoticed by most users of the system. In recent years the nonprofit ORCID service has addressed this need by assigning each author a uniform 16-character code (the last character is a check digit, which may be an X). For example, publication databases include about a dozen researchers named Daniel Schneider, but the one who is a sociologist at the University of California, Berkeley can be identified by his ORCID ID, 0000-0001-6786-0302.
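The final character of an ORCID identifier is a check character, which lets software catch mistyped IDs before they are linked to the wrong person. ORCID documents this as the ISO 7064 MOD 11-2 algorithm; the Python sketch below is a minimal rendering of it (the function names are hypothetical), tested against the Daniel Schneider ID quoted above.

```python
import re

# Four groups of four characters; the last character is a check
# character (0-9 or X) computed with ISO 7064 MOD 11-2.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base_digits: str) -> str:
    """Compute the MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the ID is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0001-6786-0302"))  # True
print(is_valid_orcid("0000-0001-6786-0303"))  # False: one digit mistyped
```

A check like this only confirms that an ID is well formed; confirming that it belongs to a particular sociologist still requires looking up the ORCID record.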

Although some bibliographic databases are free to use, such as Google Scholar and Microsoft Academic, others are private and sold through subscriptions, including Web of Science (owned by Clarivate) and Scopus (owned by Elsevier). One of the powerful features of these systems is their aggregation of citation data – records of research citing other research – which allow researchers to understand the networks essential for scholarship. Citation data also form the basis for evaluation of journals, publishers, and scholars themselves – and that has important political and economic implications.

Bibliometrics

Anyone doing academic research might benefit from knowing which works in the scholarly record have been most important, and citation data provide one clear measure of influence. Of course, having a lot of citations doesn’t necessarily mean a work is good, only that it is important to other researchers. And a citation count is no substitute for reading the work, and the works that cite it, to truly understand its place in the development of scholarship. Nevertheless, analyzing citations and other quantitative measures of influence has become a central feature of the scholarly communication system, used by libraries in deciding which journals to subscribe to, by deans and tenure committees in deciding which professors to hire and promote, and by scholars in deciding where to publish their work.

There are many statistical measures of scholarly output, or bibliometrics, which are themselves studied in the fields of scientometrics and informetrics. Here I will describe two important measures, which are the basis for many variations.

Impact factor

When a piece of scholarship is first published it’s not possible to gauge its importance immediately unless you are already familiar with its specific research field. One of the functions of journals is to alert potential readers to good new research, and the placement of articles in prestigious journals serves as a key indicator. Placement in a good journal, or with a good book press, is also a status marker that is taken to signify the esteem of the experts in the field.

Since at least 1927, librarians have been using the number of citations to the articles in a journal as a way to decide whether to subscribe to that journal. More recently, bibliographers introduced a standard method for comparing journals, known as the journal impact factor (JIF). This requires data for three years, and is calculated as the number of citations in the third year to articles published over the two prior years, divided by the total number of articles published in those two years.

For example, ASR published 95 articles in 2015-16, and those articles were cited 481 times in 2017 by journals indexed in Web of Science, so the JIF of ASR is 481/95 = 5.1. This allows for a comparison of impact across journals. The comparable calculation for Social Science Research is 482/273 = 1.8, so it’s clear that ASR is a more widely cited journal. However, JIFs are less helpful for comparing journals in different fields. For example, the JIF for the top medical journal, New England Journal of Medicine, is 79.3, because there are many more medical journals, publishing and citing more articles at higher rates, and more quickly, than sociology journals.
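The JIF arithmetic is simple enough to express in a few lines. This minimal Python sketch (the function name is hypothetical) reproduces the two calculations above from the figures quoted in the text. It assumes the citation and item counts have already been extracted from a database such as Web of Science, which is where the real work, and the real cost, lies.

```python
def journal_impact_factor(citations_in_census_year: int,
                          items_prior_two_years: int) -> float:
    """Citations received in year Y to items published in Y-1 and Y-2,
    divided by the number of items published in Y-1 and Y-2."""
    if items_prior_two_years <= 0:
        raise ValueError("need at least one item in the two-year window")
    return citations_in_census_year / items_prior_two_years


# Figures quoted in the text: citations in 2017 to items published 2015-16.
asr = journal_impact_factor(481, 95)   # American Sociological Review
ssr = journal_impact_factor(482, 273)  # Social Science Research
print(f"ASR {asr:.1f}, SSR {ssr:.1f}")  # ASR 5.1, SSR 1.8
```

Note that the two journals received almost the same number of citations; the difference in JIF comes entirely from the denominator, since Social Science Research published nearly three times as many articles.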

In addition to complications in making comparisons, there are limitations to JIFs. They depend on what journals and articles are in the database being used. And they mostly measure short-term impact. Further, they are often misused to judge the importance of articles rather than journals. That is, if you are a librarian deciding which journals to subscribe to, JIF is a useful way of knowing which journals your users might want to access. But if you are evaluating a scholar’s research, knowing that they published in a high-JIF journal does not mean that their article will turn out to be important. To illustrate this, Figure 12 shows the distribution of citations to papers published in ASR over a ten-year period (shown as citations per year to account for time since publication). Excluding 2018, the mean number of citations per year is 5.8, but while the top 10 percent of ASR papers were cited 20 times per year, the bottom 10 percent were cited 0.3 times per year on average. Based on the JIF alone, one shouldn’t conclude that a given new article in ASR will be important or highly cited.

Figure 12. Citations per year: American Sociological Review articles published 2009-2018.
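Why a journal-level average says little about any one article can be seen with a small calculation. The Python sketch below uses hypothetical toy data, not the ASR data in Figure 12, and applies the same normalization described above: citations divided by years since publication, excluding papers from the current year. Because a single heavily cited paper pulls the mean well above the median, the average overstates what a typical paper achieves.

```python
from statistics import mean, median

# Hypothetical toy data (NOT the ASR data): (total citations, publication year).
papers = [(200, 2010), (40, 2012), (30, 2013), (12, 2014), (10, 2015),
          (6, 2016), (4, 2016), (2, 2017), (1, 2015), (0, 2017)]
REFERENCE_YEAR = 2018


def citations_per_year(cites: int, pub_year: int,
                       ref_year: int = REFERENCE_YEAR) -> float:
    """Normalize by years since publication so older papers are not favored."""
    years = ref_year - pub_year
    if years <= 0:
        raise ValueError("papers from the reference year are excluded")
    return cites / years


def decile_means(values: list[float]) -> tuple[float, float]:
    """Mean of the top and bottom 10 percent (at least one paper each)."""
    ordered = sorted(values, reverse=True)
    k = max(1, len(ordered) // 10)
    return mean(ordered[:k]), mean(ordered[-k:])


rates = [citations_per_year(c, y) for c, y in papers]
top, bottom = decile_means(rates)
print(f"mean {mean(rates):.2f}, median {median(rates):.2f}, "
      f"top 10% {top:.1f}, bottom 10% {bottom:.1f}")
# mean 5.13, median 3.00, top 10% 25.0, bottom 10% 0.0
```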

Individual citation measures

If journal impact factors represent attempts to quantitatively assess the quality or impact of scholarly journals, individual citation measures serve a similar function for scholars themselves. The personal and professional stakes are high, because academic institutions make long-term commitments – often lifetime commitments – to faculty employees after a single evaluation. And the decisions are often made by people who are not familiar with, or qualified to judge, the quality of the person’s research. These include people in unrelated disciplines who are included in hiring and promotion review committees or higher administration. As a result, a complex system for evaluation has arisen, fueled by citation data collected and sold to universities by large for-profit companies.

One could judge a researcher by how much they have published, or the impact factor of the journals they publish in, or the prestige of their book publisher. But the number of items published may not be reflective of their impact, and the journals’ measures might not reflect the quality of the individual works. Counting citations to the author’s own work provides a more precise assessment.

Commercial sources such as Web of Science, Google, Elsevier, and Microsoft offer citation analyses of authors that include the number of their publications, the total number of citations, citations per year, citations per paper, citations per paper per year, and so on. To summarize all that data, bibliographers have developed a number of different indexes, the most prevalent of which is the h-index. An author’s h-index is the largest number h such that h of their works have each been cited at least h times. The h-index thus attempts to measure quantity and impact together.
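The definition translates directly into a procedure: sort an author’s works by citation count, then find the last rank at which the count still meets or exceeds the rank. The Python sketch below is a minimal version of that procedure (the function name and citation counts are hypothetical). The last two calls show the property noted below in the Beckfield and Winship comparison: extra citations to a top paper, and extra uncited papers, leave the index unchanged.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h works have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only fall from here, so no later rank can qualify
    return h


# Hypothetical citation counts for one author's papers.
print(h_index([10, 8, 5, 4, 3]))     # 4
print(h_index([1000, 8, 5, 4, 3]))   # 4: more citations to the top paper don't count
print(h_index([10, 8, 5, 4, 3, 0]))  # 4: an uncited paper doesn't count
```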

To illustrate the h-index, I put the citation figures for two Harvard sociologists, Jason Beckfield and Christopher Winship, in Figure 13. Beckfield’s 21st most-cited paper was cited 21 times, while Winship’s 20th most-cited paper was cited 20 times. Thus their h-indexes are almost the same, even though Winship has accumulated more than twice as many total citations (3848 versus 1552), and Winship has three papers that have been cited more than 400 times, while Beckfield’s most-cited paper was cited 220 times. The h-index does not reward additional citations to one’s most highly cited works, or additional publications with few or no citations (although these may add to the index once they have been around longer).

Figure 13. H-index calculations for Jason Beckfield and Christopher Winship, with data from Web of Science.

H-index is convenient to calculate from readily available data, but like any single measure it involves compromises. First, it favors more senior scholars, because new works often take a long time to generate enough citations to affect the score. Modifications have been proposed to address this, including one that normalizes the index by the number of years since an author’s first publication, and one that calculates average annual increases in the score.
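The first modification mentioned above can be sketched in a couple of lines. This minimal Python version divides the h-index by the number of years since an author’s first publication, a form sometimes called the m-quotient. Conventions for counting career years vary (some count the first year as year one), so treat the exact denominator, the function name, and the example authors as assumptions.

```python
def m_quotient(h: int, first_pub_year: int, current_year: int) -> float:
    """h-index divided by career length in years (one common convention)."""
    career_years = current_year - first_pub_year
    if career_years <= 0:
        raise ValueError("career length must be positive")
    return h / career_years


# Two hypothetical authors with the same h-index but different career lengths.
print(m_quotient(20, 1998, 2018))  # 1.0
print(m_quotient(20, 2008, 2018))  # 2.0
```

Under this adjustment the junior author, who reached the same h-index in half the time, scores twice as high.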

Second, the h-index does not account for multiple authors who play different roles in the publication process. This is one reason the index often is not useful for comparison across fields, as bioscientists, for example, often have many more authors on an individual work than do sociologists. Again, variations on the index may divide credit between authors, but it’s impossible to determine relative contributions precisely from author order. (This is part of a much larger problem of apportioning attribution credit in scholarly communication generally.)
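One simple way to divide credit, shown here as a hypothetical Python sketch rather than any specific published index, is to split each paper’s citations equally among its authors before computing the h-index. Equal splitting sidesteps the author-order problem rather than solving it, which is the limitation noted above. The paper data and function names are invented for illustration.

```python
def h_index(scores: list[float]) -> int:
    """Largest h such that h works have a score of at least h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


def fractional_h_index(papers: list[tuple[int, int]]) -> int:
    """Divide each paper's citations equally among its authors, then take h.
    Equal division ignores author order and roles entirely."""
    return h_index([cites / n_authors for cites, n_authors in papers])


# Hypothetical (citations, number of authors) pairs for one researcher.
papers = [(30, 3), (20, 10), (9, 1), (6, 2), (4, 1)]
print(h_index([c for c, _ in papers]))  # 4 with full credit for every paper
print(fractional_h_index(papers))       # 3 with equal fractional credit
```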

Third, any system of citation needs to address the question of what materials and citations will be indexed. The reference earlier to preprints as “gray” literature reflects this ambiguity, and there is also a problem of citation tracking in books. Consider that in the Web of Science database, which is mostly limited to peer-reviewed journal articles, Jason Beckfield’s most-cited work has 221 citations, while Google Scholar, which includes books, preprints, and other less carefully selected sources, counts 533 citations to the same paper. Although the aggregation of citations appears to happen “automatically” to most actors in the system, the curation of citations is technically demanding and expensive to do properly, as it requires gathering and decoding information in different formats from many different sources – many of which are proprietary and require expensive subscriptions. Finally, note that any citation metric will require a method of uniquely identifying authors and linking them to their work, such as the ORCID service described above.

Altmetrics

As the scholarly record has expanded, with communication formats and platforms proliferating, some bibliographers have attempted to capture wider indicators of research impact than traditional citation counts. The advocates for altmetrics argue that academic citations take years to emerge, and represent only a slice of how different groups of readers interact with and use research. So they seek to count alternative indicators, including how many times a particular work is shared on social media, mentioned on blogs or in the news media, downloaded, or viewed online, and the number of users who import the reference into their citation manager (such as Mendeley or Zotero). The leading purveyor of altmetrics is the private company Altmetric, which is owned by the company that owns Springer and publishes Nature journals. The Altmetric report for an article in the Springer journal Demography is shown in Figure 14. At the time this image was captured, the article had only been posted for three days, but the Altmetric score already gave an indication of its popularity – putting it in the top 5 percent of research scored by the company.

Figure 14. Altmetric report for an article in Demography, January 2019.
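A composite score like the one in Figure 14 can be thought of as combining counts of mentions from different sources, with some sources counting more than others. The Python sketch below shows that idea as a plain weighted sum. The source categories, weights, counts, and function name are all hypothetical, and the sketch does not reproduce the method Altmetric actually uses, which is not described here.

```python
# Hypothetical weights for illustration only; not Altmetric's actual weighting.
HYPOTHETICAL_WEIGHTS = {"news": 3.0, "blog": 2.0, "social_media": 1.0}


def attention_score(mentions: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of mention counts by source type."""
    unknown = set(mentions) - set(weights)
    if unknown:
        # Refuse to guess a weight for a source the scheme doesn't cover.
        raise KeyError(f"no weight for: {sorted(unknown)}")
    return sum(weights[source] * count for source, count in mentions.items())


# Hypothetical counts for one article in its first days online.
print(attention_score({"news": 2, "blog": 1, "social_media": 10},
                      HYPOTHETICAL_WEIGHTS))  # 18.0
```

Whoever chooses the weights decides which kinds of attention count most, which is one reason such scores deserve the scrutiny discussed below.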

Instant altmetrics serve the insatiable appetite for real-time response measurement. And they represent the kind of dilemma that is common in the expanding scholarly communication ecosystem. There is no doubt this kind of information can be used for socially valuable ends, but it is also a profit pipeline for companies that put their own interests first. And what is useful feedback for researchers is also a tool for productivity monitoring – or surveillance – by those who employ researchers. Such data about data is an expanding area of interest for major publishers, who are diversifying their profit sources as potential threats to subscription-based publishing profits loom.

Recommendations

Readers of this primer will have diverse values and goals with regard to scholarly communication, as we do with our research. The brief recommendations here reflect my objective of achieving a “more inclusive, open, equitable, and sustainable scholarly knowledge ecosystem,” in the language of the MIT Grand Challenges Summit report (to which I contributed).

Research

Academic research is a multibillion-dollar global industry, employing millions of people, with tremendous influence, and scholarly communication is its lifeblood. Partly for that reason, a sociological research agenda for scholarly communication involves core questions of interest to the discipline. Just as research on labor markets, for example, is important both for informing research and policy on that specific question and for advancing social scientific knowledge generally, so too can research on the scholarly communication system serve targeted as well as broad purposes. The selective references here are meant to suggest possible avenues for future research.

Sociologists interested in classic questions regarding the creation, control, and use of ideas – such as in mass media, entertainment, and politics – might incorporate academic research and its hierarchies into their maps of contested communication terrain. Those who study the concentration of economic power should find that scholarly publishing provides ample material, both with regard to market dominance and political or cultural influence. And studies of the social locations and identities of people in decision-making positions might include research on the actors in the review and publication processes in academia. The system also is an important arena in which to explore models of institutional and social change, including social movements (e.g., the OA movement), non-profits, and universities.

There are also a variety of discrete empirical studies that would make useful contributions to the understanding of, and possibilities for change in, the scholarly communication ecosystem. For example, we lack generalizable information about possible bias and discrimination in the peer review process in sociology specifically, and social science generally. These questions can be addressed by observational, archival, or experimental methods. We also need empirical studies of who serves as authors (including author order), editors, editorial board members, and other decision-makers in various locations within the publishing system, and the implications of disparities in those compositions. For example, a simple count of authors in 13 recent issues of American Sociological Review and American Journal of Sociology shows that 33 percent of their authors are women. What are the causes or consequences of such disparities? Further, how have changes in publishing practices affected the development of academic careers? Data-mining techniques make such assessments of the scholarly literature and its hierarchies much more powerful. Efforts to improve this system will require a strong evidence base, and sociology can (should) help.


Figure 15. Still image from Piled Higher and Deeper (PHD Comics). 2012. “Open Access Explained!” https://www.youtube.com/watch?v=L5rVH1KGBCY.

Policy

The policy environment for scholarly communication involves actors at many levels, from individual libraries and universities to funding institutions, state and national legislatures, and regulatory authorities. Here are several principles for policy reform that may help guide your advocacy as these debates progress.

Ownership

Underlying all of scholarly communication is the fundamental question of who owns the work. The high prices and profit-making practices of publishers, which are increasingly dominating library budgets, would not be as harmful if the work could be made available in other ways. This is the central insight of the preprint innovation, under which authors make a version of their written work publicly available before (or in addition to) letting publishers distribute it. In the best case, the publisher holds only a license to distribute the work; in a weaker version, the publisher holds the copyright but grants a license back to the author or their institution.

Consider this document, which I have made available under a non-restrictive license that requires only attribution, while keeping my copyright as the author. If a publisher chooses to copy this and sell it for a profit, that would not restrict anyone else’s access to the work. In the (unlikely) event they produced it in such a way that people chose to buy it – for example, by making it into a video, translating it into another language, or printing and shipping it – that would represent a successful use of market mechanisms, and I would welcome it (the license requires attribution back to this document). Crucially, one reason I would not object is that I do not rely on sales of this document for my income; I produced it as part of my job as an employed faculty member (in my case employed at a state institution, but the same principle extends to private schools).

This illustrates the importance of open licensing in the scholarly communication system. Academic social science research is funded at the front end in the public interest – by research institutions – rather than through the sale of its products. Sales of already-open research products do not threaten this interest. By this reasoning, our current crisis with regard to private profits and public access could be mitigated by simply keeping ownership of research outputs in the hands of researchers or their institutions. This fundamental principle should motivate our policy reforms.

Incentives

In the fall of 2018 I conducted interviews with a number of leaders in scholarly communication. I asked Brian Nosek, a psychologist and Director of the Center for Open Science, where in the current system the dominant practices do not align with the values of openness. He said:

The elements of the system that I think are most problematic are that the incentives for individual reward are not necessarily aligned with the values that we have for scholarly activity. … It isn’t clear how I benefit from being transparent from my work, of being openly sharing – not just of the output, the final paper, but the process by which I generated those claims. Did I pursue a confirmatory design, or was this exploratory? Did I analyze the data lots of different ways, or not? Is my data available for you to examine, to see if my claims are believable? All of that, at present, is aspirational. Yes, we value that in scholarly activity, but we don’t incentivize it.

Our best efforts to change the culture of scholarly research are limited by the failure to make incentives align with our values. In sociology, none of the major journals require openness with regard to data and materials (or even a declaration of whether they are available). Most formal policies on hiring, promotion, and tenure do not explicitly reward openness.

To embrace the goals of making scholarship “inclusive, open, equitable, and sustainable,” we will need explicit policies to reward practices that promote them, with accountability for those charged with implementing them. For this conclusion I draw from the research on diversity in organizations, which shows that aspirational declarations are less effective than clear goals attached to distributed formal accountability. Research funders should apply concrete criteria and requirements for open scholarship (as many are now doing). Formalization of such rewards will lead individual and collective practices to align performance with incentives. At the symbolic level, we can use award criteria to signal core values, as when the American Sociological Association changed its dissertation award rules to require that eligible dissertations be publicly available. At the level of research evaluation, this should include reorienting assessment away from prestige indicators such as journal impact factor, and toward more substantive measures of value or impact.

Investment

The cost of scholarly communication, including library budgets in particular, is increasingly dominated by rising subscription costs. In addition, scholarly societies (including the American Sociological Association) are too dependent on subscription fees to fund their work. All of this constrains our capacity to develop and invest in more open and equitable alternatives while continuing to serve current needs. Getting out from under this structure may be expensive in the medium run but should save money in the long run. This won’t be possible by diverting money from library budgets alone. Direct efforts to wrest control and resources away from profiteering companies, especially by canceling large subscription contracts, may be useful and necessary, but they are not in themselves a solution. Research funders and institutions need to invest in infrastructure for preserving and disseminating scholarly outputs – including data and research materials – and in the organizational capacity to use them effectively.

Personal

Research and policy intervention may be essential for achieving the cultural change necessary to improve the scholarly communication system. Nevertheless, individuals have both the responsibility and opportunity to help influence that development in a direction that better achieves our goals and supports our values.

Education

People engaged in the work of scholarly communication should take steps to inform themselves about the structures and organizational dynamics of this system. Inertia is a powerful force in academia, and failure to educate ourselves can result in unconsciously reproducing regressive or inefficient practices. As sociologists, we need to understand that learning how to get ahead in the system as individuals is not the same as learning how the system works for or against others, especially those with less access to participate in and benefit from scholarly communication. Both kinds of knowledge are vital, and should be formally integrated into research training programs.

Open practices

As with any reform process, individuals who want to promote progress in the arena of scholarly communication have to balance personal responsibility with individual pursuits. It won’t be possible to build an entirely new system around your own practices and construct a successful career in the current system at the same time. Striking that balance is a difficult task. Modeling best practices publicly is an important contribution individuals can make, both communicating to others the value of innovation and contributing directly to the success of progressive initiatives.

From the perspective of social science as a collective endeavor, the ideal scholarly communication strategy entails open sharing of research methods, materials, and outputs at all stages, using services and infrastructure that are publicly owned or controlled, and integrating scholarly collaboration and peer review to enhance our own and others’ research. By the theory of research openness, the more our research practices approximate such an orientation, the greater our collective efficacy, efficiency, and accountability. The definitive empirical test of that theory may be elusive – and there may be unintended consequences or negative effects that follow from such practices – but it represents an extension of the philosophy of science as an iterative process across many actors over time, all benefiting from as well as contributing to the evolving collective body of knowledge.

So, where short of achieving this ideal should one compromise, and where should one press forward? A simple approach, and one we have encouraged with the development of SocArXiv and similar services, is to add a sharing or transparency layer across the research lifecycle. That is, these services allow a researcher to follow the conventional path of research – from data collection and analysis, to working papers and conference presentations, to review and publication of results – while taking steps to make open and accessible some or all of the work products along the way. For example, researchers can post early drafts of papers, research materials, and datasets in open repositories, and make public an openly licensed copy of the resulting peer-reviewed publications. This is a modular approach, which allows opening the process and products either partially or fully, and at different stages. Such practices enhance accountability, protect precedence, and provide assistance to other researchers (known or unknown) who may benefit from our efforts at earlier stages in the process.

Mindful of the incentive bottleneck described by Brian Nosek above, a number of innovators have developed tools to integrate openness into the research workflow without creating burdens on researchers that require sacrifices of their own time and effort for uncertain career benefit to themselves. One important model of this approach is the Open Science Framework (osf.io), an open source, non-profit project of the Center for Open Science that is free to use. OSF enables collaborative project management using popular cloud storage services (such as Dropbox) and citation managers (such as Zotero), while allowing researchers to make individual components of the project public or private. The platform includes data preservation, project wikis, and version control, provides DOIs, links to ORCID and other author profiles, and serves papers to SocArXiv and other preprint servers. The goal is to make the research process faster and more efficient, while also promoting the values of openness and equity. Integrating openness into the research process, in ways that don’t slow it down or entail additional costs, allows researchers to gain the personal as well as collective benefits of sharing and collaborating that such an orientation encourages.

The value of our work

Academics work in an institutional setting that combines normative expectations and commitments along with profit motives. For example, we perform peer review services (in part) because we believe it serves ends that we value – improving science, helping researchers improve their work – but in so doing we are also performing labor that often generates profits for others. This isn’t necessarily wrong, of course. Life is full of such compromises. But there are costs, and opportunity costs, associated with the choices we make about how and for whom (and with whom) to work.

One goal of the movement for open and equitable scholarly communication is to persuade academics that our normative commitments should extend beyond scholarly obligations – such as publishing our work, or reviewing and editing the work of others. We should also consider whether our work contributes to institutions or organizations that strengthen our values. And this is not a strictly individual pursuit. Like labor unions, or the mobilization of consumers to improve corporate practices, academia offers an opportunity for individuals to pool their resources and deploy them for the common good.

For example, I have decided not to publish in or review for Elsevier journals anymore (although I have in the past). In addition to engaging in business practices I don’t support, the company that owns Elsevier (currently RELX Group) has a long history of promoting policies that run against the values I advocate here, and it continues to lobby against progressive policies. In the U.S. alone, the company has spent an average of $1.7 million per year on lobbying over the last decade, concentrated on opposing policies that would require the free publication of federally funded research. And through its political action committee, Elsevier contributed more than $70,000 over the last decade to the members of Congress who sponsored a bill to block federal open access regulations. The decision about which publishers to work for, and give content to, should involve an informed assessment of each organization’s role in the scholarly communication system.

We should also reconsider our own orientation toward the evaluation of scholarship. Accurately assessing the quality and impact of scholarship requires direct engagement with the work itself, yet as non-experts in much of what we evaluate, we must rely on the proxy reports of others. The challenge is to credit real assessments of quality and impact rather than superficial or misleading scores and metrics generated by the commercial publishing system. For example, a paper in a journal with a high impact factor is more likely to be good than one published in a minor journal. But that is a very noisy and inefficient method of evaluation, and relying on that information alone does a disservice to the scholars involved, especially those at the margins of academia with fewer resources and those without high-status social networks at their disposal.

The assessment of a scholar’s work should also include consideration of their research practices. Do they share their data and research materials, so that their work can be evaluated more holistically and can contribute to the wider social science? Do they collaborate generously and publish openly, so that others may benefit from their contributions? And do they contribute to the collective effort to improve our work, and the systems in which we work? Our research is more than the number of citations to it and the journal titles on our CVs – it is a multifaceted contribution to the common good.

This work is licensed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/.