This essay traces the history of refereeing at specialist scientific journals and at funding bodies and shows that it was only in the late twentieth century that peer review came to be seen as a process central to scientific practice. Throughout the nineteenth century and into much of the twentieth, external referee reports were considered an optional part of journal editing or grant making. The idea that refereeing is a requirement for scientific legitimacy seems to have arisen first in the Cold War United States. In the 1970s, in the wake of a series of attacks on scientific funding, American scientists faced a dilemma: there was increasing pressure for science to be accountable to those who funded it, but scientists wanted to ensure their continuing influence over funding decisions. Scientists and their supporters cast expert refereeing—or “peer review,” as it was increasingly called—as the crucial process that ensured the credibility of science as a whole. Taking funding decisions out of expert hands, they argued, would be a corruption of science itself. This public elevation of peer review both reinforced and spread the belief that only peer-reviewed science was scientifically legitimate.

By the time this essay reaches print, it will have passed through a gauntlet familiar to scholars in every discipline. The editor of this journal will have sent it to two or three anonymous referees. If the referees deemed the contents potentially worthy of publication (and this article could have been rejected once, twice, or many times before finding a home), they will have offered comments on how it should be improved before publication. I will have addressed those critiques in subsequent revisions, strengthening the article’s evidence and arguments until the editor and referees agree that the most serious objections have been answered. The final result is a research article in a peer-reviewed journal—a piece of writing that will count for more professional credit, and hold more scholarly credibility, than work posted on my personal website or uploaded to a preprint server without external review.

Peer-reviewed writing and grants are given a special status in most scholarly communities, and this is perhaps nowhere more true than in the sciences, where findings that have not faced the test of peer review are usually seen as preliminary or unreliable. My favorite example of this attitude toward peer review has long been an evocative headline about a 2012 Physics Letters B paper: “CERN’s Higgs Boson Discovery Passes Peer Review, Becomes Actual Science.”1

The most widely accepted story about peer review’s origin credits Henry Oldenburg with inventing it for the seventeenth-century Philosophical Transactions of the Royal Society, creating the impression that refereeing has been an unchanging part of science for over three hundred years. However, new historical work is beginning to shed more light on peer review’s development—and the real story is far more complicated than the neat tale of Oldenburg inventing refereeing out of whole cloth during the Scientific Revolution.

Most existing histories of peer review have focused on the emergence of the scientific referee during the nineteenth century or on the inner workings of referee systems at particular journals. Those studies have shown that refereeing was not initially thought of as a process that bestowed scientific credibility and that many high-profile journals and grant organizations had unsystematic (or nonexistent) refereeing processes well into the twentieth century. So how did we get to our current moment, in which peer review is thought to be so crucial to science that new findings aren’t “actual science” without it?

In this essay, I argue that the vision of peer review as a process central to science can be traced to the Cold War United States, where various stakeholders sought to navigate a growing tension between desires for scientific autonomy and public accountability in controversies over government science funding. I begin by tracing the development of refereeing systems at scientific journals, the institutions where scholars have done the most work on refereeing’s history. From there, I explore the development of refereeing at funding bodies, where comparatively little historical work has been done. Despite important differences between refereeing procedures at the two types of institutions, refereeing at both journals and grant organizations became more heavily emphasized in the late 1960s and 1970s, and I show that the Cold War United States is the crucial context for understanding that increased emphasis.

The massive expansion of government funding for scientific research in the postwar United States led to more scrutiny of science—and to suggestions that scientists should be more accountable to the public. Scientists balked at the suggestion that their methods or conclusions might be vetted by scientific laymen but did not want to surrender the public status or funding opportunities they had gained. A controversy over peer review procedures at the National Science Foundation (NSF) in the 1970s highlighted the conflict between scientists’ desire to keep scientific decisions in expert hands and the belief that public funds made scientists answerable to laymen and legislators. At a hearing about the NSF’s review procedures, various stakeholders argued that peer review was the only acceptable method of selecting which proposals to fund, and the outcome of the controversy placed increased emphasis on referee opinions at the NSF. The episode both reflected and helped cement the view that peer review was central to the proper practice of science.

Refereeing in Journals through World War II

Because refereeing is so important to scientific and scholarly work, over the past several decades many scientists, physicians, journalists, and sociologists have written papers and books analyzing the contemporary state of peer review. Modern economists, sociologists, and philosophers of science have continued this investigation of how peer review functions within the scientific community, and there has been some excellent recent work on journal refereeing. The sociologist Joanne Gaudet has studied the methods and assumptions of peer review in journals, arguing that peer review systems are shaped by disciplinary conditions and economic contexts. The philosopher Carole Lee and her collaborators have undertaken a study of the functions of bias in peer review and have provocatively argued that it may not be possible or desirable to eliminate biases.2 Interestingly, peer review’s best-known historical origin story was penned not by a historian but by a pair of sociologists. In their 1971 paper “Patterns of Evaluation in Science: Institutionalization, Structure, and Functions of the Referee System,” Harriet Zuckerman and Robert Merton claimed that refereeing had its origins in the seventeenth-century Royal Society of London. Zuckerman and Merton wrote that when the Royal Society gave its Secretary, Henry Oldenburg, permission to compile the Philosophical Transactions in 1665, he immediately decided to gather expert opinions on the papers submitted for publication. This version of peer review’s history has been widely circulated and repeated in other scholarly papers, creating the pervasive impression that refereeing has been a part of science ever since the first scientific journal was created.3 However, more recent historical work has illustrated that Oldenburg did not employ any system resembling modern refereeing at the Philosophical Transactions.
Other early scientific societies, such as the Académie des Sciences in France, did have procedures for evaluating the work of their members before it was circulated, but those systems of internal critique do not seem to map onto or lead into the systematic external refereeing we know today. Aileen Fyfe, Julie McDougall-Waters, and Noah Moxham have helpfully described this period as the “prehistory” of peer review, highlighting the gaps between the general idea of having peers comment on each other’s work and the formal systems of “peer review” that would develop later.4 Far from springing fully grown from the Scientific Revolution, refereeing developed in the nineteenth century and spread slowly and haphazardly, encountering much skepticism and criticism along the way.5 While some scientific societies had procedures for orally vetting submissions to their journals as early as the eighteenth century, written reports by specialists in a paper’s field did not enter the picture until 1831, when William Whewell proposed that two Fellows of the Royal Society should write down their views on submissions to the Philosophical Transactions and have their comments published in the new journal Proceedings of the Royal Society of London.6 Whewell’s plan to publish referee reports was quickly abandoned, but the practice of sending papers submitted for the Transactions (and, later, the Proceedings) out for referee opinions endured and expanded. By the mid-nineteenth century, arranging refereeing was one of the chief responsibilities of the Secretaries of the Royal Society. Although Whewell had intended for his referees’ identities to be known to both the author and the journal’s readers, the Royal Society quickly decided that referees would give more candid advice if they remained anonymous. Indeed, their reports were not even sent to the paper’s author—the referee report was considered a confidential document for the internal use of the Royal Society.
If one of the Royal Society Secretaries wished to communicate concerns to the author, he would use a personal letter to relay the referees’ comments without revealing their identities.7 Referee anonymity, therefore, was linked with the practice of refereeing very early in its history. Over the course of the nineteenth and early twentieth centuries, an increasing number of scientific and learned societies in the United States and the United Kingdom adopted the practice of systematically consulting anonymous referees about submitted papers. Great Britain’s Geological Society and Royal Society of Chemistry, for instance, both adopted refereeing systems for their publications in the nineteenth century, while the American Physical Society and the American Sociological Association adopted them in the early twentieth century. During this period, concerns about the quality of the scientific literature led to the “referee” being reconceptualized as a gatekeeper—someone responsible for ensuring that a scientific paper merited publication. That vision of the referee as a guardian of the scientific literature led many journals to place refereeing largely in the hands of a small number of elite scientists, who therefore had tremendous power to shape a journal’s contents in their field of expertise.8 In the interwar period, a broader push toward standardization during the Progressive Era had a tremendous impact on scientific practice, particularly in the United States. The quest for standardization seems to have been one impetus that influenced the development of increasingly formal refereeing procedures at British and American scientific societies. Refereeing procedures at the American physics journal Physical Review, for example, became much more standardized during the 1920s and 1930s. 
Referees who had once written free-form letters sharing their general impressions of Physical Review submissions were now asked to fill out forms assessing a paper’s suitability according to a predetermined list of criteria. However, most papers accepted for Physical Review never went out to referees at all; the editor accepted most papers on his own authority, consulting referees only when he thought he might want to reject a paper. It was not until the 1960s that all Physical Review papers were sent out for external referee opinions.9 Commercial scientific journals such as the Philosophical Magazine and Nature largely kept editorial decisions in-house until the mid-twentieth century.10 Many of these commercial journals were the personal projects of dynamic editors who felt qualified to evaluate any contributions that might come their way. Unlike the publications of scientific societies, which were published whenever the society had enough material to print an issue, commercial journals had to meet monthly or weekly deadlines and often could not afford to reject too many contributions—or wait for external referees to submit reports. Refereeing also remained relatively uncommon outside the English-speaking scientific world in the nineteenth and early twentieth centuries. Journals affiliated with academic institutes in France and Germany, for example, generally did not employ refereeing procedures. In 1835, when the French Académie des Sciences founded the Comptes Rendus Hebdomadaires des Séances de l’Académie des Sciences, the Académie deliberately eschewed refereeing in order to publish the new periodical more quickly. Germany’s most prominent journals, such as the Annalen der Physik und Chemie, were generally under the control of a powerful editor who preferred to make publication decisions without relying on referee opinions. Scientists unaccustomed to outside refereeing did not necessarily see the practice as a superior system for scientific journals. 
Famously, in 1936 Albert Einstein was incensed to find that the editor of Physical Review had sent his submission to another physicist for commentary. In a terse note to the editor, John Tate, Einstein wrote that he and his coauthor “had not authorized you to show [our manuscript] to specialists before it is printed. I see no reason to address the—in any case erroneous—comments of your anonymous expert. On the basis of this incident I prefer to publish the paper elsewhere.”11 Well into the twentieth century, a reputable scientific journal could and did publish papers without consulting referees. Many journals that adopted refereeing in the mid-twentieth century did so not because of epistemic concerns but to lighten the editorial workload. At the American weekly Science, for instance, the editorial board performed almost all refereeing in-house for the first half of the twentieth century. The journal began relying more heavily on external referees in the 1950s after members of the editorial board complained that “the job of refereeing and suggesting revisions for hundreds of technical papers is neither the best use of their time nor pleasant, satisfying work.” Another example of a journal that adopted refereeing to ease editorial burdens is the American Journal of Medicine (AJM). When AJM was founded in 1946, its editor Alexander Gutman worked to ensure fast publication by handling the vast majority of acceptances or rejections himself. However, as the journal became more popular, Gutman was unable to keep up with the number of submissions. By the mid-1960s submissions to AJM were being sent out for external refereeing to ensure that the journal could maintain its reputation for fast publication.12 Other prominent English-language journals adopted systematic external refereeing even later. The New England Journal of Medicine began having two outside reviewers consider all potentially acceptable papers in the late 1960s. 
Nature began employing referees for every paper it published only in 1973.13 Well into the 1970s, the British medical journal the Lancet relied heavily on editorial judgment, with editors accepting or rejecting up to 90 percent of submissions themselves.14 Significantly, there seems to have been a growing perception that Americans placed more emphasis on systematic external refereeing than their counterparts in other nations. David Davies, the British geophysicist who edited Nature from 1973 to 1980, had been working at MIT when he accepted the position. He would later recall that he and his American colleagues were alarmed by Nature’s unsystematic attitude toward refereeing, and he felt that “getting the refereeing system beyond reproach” was his most important task as editor. By the late 1970s the editor of the Lancet worried that its articles might not be taken seriously in the United States unless the journal began employing peer review rather than allowing the editor to accept papers on his own authority. Despite such concerns, the Lancet decided to limit the influence its reviewers had over the editor. In a 1989 editorial, the journal complained that “in the United States far too much is being asked of peer review” and proudly assured readers that at the Lancet “reviewers are advisers not decision makers.”15

Refereeing at Funding Bodies

Increased burdens on editorial boards provide one clue about why refereeing became more widely used at journals in the late twentieth century; however, that does not explain how the scientific community came to view refereeing as crucial for scientific legitimacy. To understand that shift, we must look not only at journals but at funding bodies. Sociologists have done a great deal of excellent work on refereeing systems at funding bodies, especially in the past decade. One notable project is the Comparative Assessment of Peer Review (CAPR), a four-year study of peer review procedures at six worldwide grant organizations that analyzed how peer review processes do or do not balance the scientific community’s desire for autonomy with funding bodies’ desire for relevant and accountable scientific research. Michèle Lamont’s 2009 book How Professors Think examined grant peer review panels in the humanities, illustrating interesting divides in what different fields consider “excellent” work, as well as illuminating the social functions peer review serves for members of the scholarly community.16 However, far less historical work has been done on the development of refereeing systems at funding bodies, perhaps because so few funding bodies used systematic refereeing procedures prior to the late twentieth century. Many funding bodies had unsystematic or internal review processes that placed heavy responsibility in the hands of organization employees. Private funding bodies such as the Rockefeller Foundation, for example, generally left funding decisions in the hands of trusted middle managers well into the postwar period, awarding money via what Robert Kohler has described as a “patronage system.”17 The same was true for many publicly funded grant organizations.
The German Research Foundation, created in 1920 and initially called the Emergency Association for German Science, deliberately chose to rely on a small number of elite scientists for opinions on grant proposals, and much of the evaluation focused on the personal qualities of the applicants. Well into the twentieth century, a single three-man committee evaluated all applications for the Royal Society of London’s Government Grants; though all were invited to apply, the process awarded those grants almost exclusively to Fellows of the Royal Society.18 Grant organizations affiliated with learned societies or with governments were most likely to utilize external refereeing, and some of them adopted it in response to Progressive Era pushes for standardization. For example, the American National Research Council, the private research arm of the National Academy of Sciences, developed increasingly formalized refereeing procedures during the 1920s and 1930s, in part to ensure fairness to researchers without established national reputations. But many government funding bodies, including ones formed in the mid-twentieth century, eschewed systematic external refereeing in favor of creating an in-house panel of researchers who would evaluate all grants. In Great Britain, the government created the Medical Research Council (MRC) in the aftermath of World War I to promote medical research. The members of the council were a mix of scientific experts and legislators, and funding decisions were left entirely up to the council members before World War II. In 1946, the MRC created an Appointments and Grants Committee to help deal with the administrative work of grant applications, but this committee was ultimately answerable to the council as well and appears to have made only occasional use of outside opinions.19 Similarly, when the U.S. 
government formed the National Institutes of Health (NIH) in 1948, its Division of Research Grants initially evaluated grant applications with little or no consultation from outside referees.20 Instead, each application went first to a small “study section” composed of NIH-affiliated scientific experts in a particular field. From there, the study sections’ recommendations were forwarded to an NIH council of scientists and laymen, which added its own recommendations. Final decision-making power rested in the hands of the institute directors, heads of NIH member institutions such as the National Cancer Institute and the National Eye Institute. While the directors took the earlier evaluations into account, they were not obligated to follow the recommendations of the study sections or the council. Furthermore, NIH applicants would receive little information about why their grants had been accepted or rejected. Deliberations about the grants were considered confidential and internal to the NIH. The National Science Foundation (NSF), established by the National Science Foundation Act of 1950, was more reliant on outside experts than the NIH. From the outset, its proposal review process was designed to involve a range of experts from across the basic science fields that the NSF was going to support. Some proposals were sent out for “ad hoc” mail review, in which copies of the proposal were mailed to scientists who submitted their comments by return mail. Other proposals were evaluated by special panels of experts assembled in Washington. The choice of the reviewers, as well as the choice of panel review versus mail review, was left up to NSF employees. 
By 1975, according to internal NSF statistics, 44 percent of proposals were evaluated by ad hoc mail review, 28 percent received panel review, and 28 percent received reports from a mix of the two systems.21 Like the NIH, however, the NSF ultimately left final decisions about which grants to fund in the hands of its leaders. Referee opinions were seen as only one piece in the decision—an important piece, but not the determining factor as to whether the NSF would award or decline funding. As one internal NSF document explained the system, “It is intended that the program official, taking into account all pertinent factors and information input, will make a final decision.”22 Furthermore, scientists who submitted proposals would not be given copies of their reviews, only a summary report prepared by an NSF employee stating the major reasons for the acceptance or rejection of the proposal. Both the NSF and the NIH, however, would come to place more emphasis on external referee opinions following a 1975 controversy that I will discuss in more detail later in this essay. Today, just as it would be difficult to find a respectable scientific journal that does not use peer review, it is nearly impossible to find a major funding body that does not employ an external peer review process to evaluate its grants. Many major grant organizations now cite their use of external referees—that is, independent experts who are not employed by the grant organization—to illustrate the care they take with proposals. The Medical Research Council website has a page devoted to the details of its “external peer review process.” The German Research Foundation assures applicants that it consults over nine thousand external referees per year. 
The NIH has multiple pages on its website explaining the steps in its review process and the ways in which it ensures the “integrity and confidentiality” of grant refereeing.23 Interesting contrasts emerge when we consider grant refereeing and journal refereeing side by side. Funding bodies’ frequent practice of assembling a panel of experts, often meeting in person, stands in contrast to the system of sending a paper out for two or three opinions at journals. Furthermore, while journal referees were in theory only judging the paper at hand, grant applicants had to submit CVs and evidence of previous work. Grant referees explicitly considered the applicant’s background as a factor in their decision, making the referee report a judgment about both this particular proposal and the scientist’s body of work—and, often, his or her personal qualities—as a whole. There is also an epistemological difference. Journal referees are evaluating science that has already been performed and judging whether the paper’s conclusions ought to be given the stamp of authenticity that publication in a specialist journal bestows. Grant referees are, in theory, judging science that has yet to be done, and there is often an implicit sense in discussions about grant refereeing that even the best proposals pose something of a risk. Perhaps the most notable historical contrast between grant refereeing and journal refereeing is how comparatively rare grant refereeing was up to the mid-twentieth century. Most funding bodies kept their decisions among a small number of experts, in a system akin to the editorial board model so many journals had employed before refereeing became too much of a burden for the board members.
Furthermore, journal refereeing procedures seem to have had little influence on grant refereeing procedures, and vice versa; the two systems, as John Burnham argued, seem to have evolved largely independently of one another.24 But that independence does not mean historians should continue to evaluate these two manifestations of refereeing entirely separately from one another. It seems unlikely to be a coincidence that the late twentieth century—specifically, the late 1960s and early 1970s—was the moment when both journals and funding bodies found themselves under increasing pressure to employ external referees. Nor does coincidence seem likely when we consider that it was American journals and funding bodies that were the most likely to employ refereeing and Americans who were seen as more dependent on refereeing than their counterparts in other countries. Something else was happening in the United States as well: increasingly, Americans were calling refereeing by a new term, “peer review.”

The Rise of “Peer Review”

The term “peer review” has an intriguing linguistic history. It appears to have originated not from journals, and not in reference to external referees, but from review committees at grant organizations and in the medical community. In the 1960s and early 1970s, for example, major American newspapers usually used “peer review” to refer to reviews of medical practices for compliance with Medicare and Medicaid.25 To track the introduction of the term “peer review” into the English-speaking scientific community, it is useful to look at Science and Nature, two prominent scientific weeklies that contained news and commentary. Science first used the term “peer review” in a 1965 article about the NIH. Joseph D. Cooper explained that as part of an evaluation of the NIH’s review process, “project grantees … were asked to make assessments of the output of a system of project approval in which they were intimately involved both as grantees and as members of peer review groups.” However, Science used the term only infrequently until the 1970s—and then mostly in reference to review procedures specific to medical journals and the NIH.26 Across the Atlantic Ocean, the British Nature did not use the term at all until 1971 and for most of the 1970s used it only to refer to American grant or institutional review in the biomedical sciences.
One Nature article in 1975 described “the so-called peer review system” specifically as a “method for judging the relative merits of competing grant proposals.”27 In the medical literature, the New England Journal of Medicine first used the term “peer review” in 1969 to refer to the procedures for ensuring that hospitals and physicians were abiding by the regulations of the new Medicaid funding programs: “Although state health authorities have been concerned with standard formulation and facility licensing and regulation for many years, few state programs can boast of the establishment of effective monitoring or ‘peer review’ methods to assure the adequacy of institutional and professional services under Medicaid.”28 Similarly, the term was first used in the Journal of the American Medical Association (JAMA) in 1970 to describe physicians’ assessments of each other’s practices under the Professional Standards Review Organization (PSRO) associated with Medicaid; this continued to be JAMA’s most frequent use of the phrase through the 1970s. “Peer review means different things to different people,” the physician Irvine H. Page explained in a 1973 editorial for JAMA. “To most American physicians it means PSRO, to the British House of Lords it means Peers examining other Peers for moral turpitude, and to the scientific community, it means Study Sections and Councils that determine a grantee’s financial and possibly research future.” Notably, journal refereeing was not one of the definitions Page offered, and JAMA would not use the term “peer review” to refer to journal refereeing until 1981.29 Franz J. Ingelfinger, the influential editor of the New England Journal of Medicine, was one of the first people to refer to journal refereeing as “peer review” in print. In 1968, Ingelfinger was invited to write an opinion piece for the Lancet on the purpose of the medical journal.
He wrote that “to be accepted the article must pass peer review, and ideally either must pass with high honours or must undergo appropriate revision.” The term would not appear again in the Lancet until the 1970s. Ingelfinger was also the first to use the term “peer review” for journal refereeing in Science, which he did in a 1970 article on the current state of medical literature. In a 1970 article on the New England Journal of Medicine’s review process, a member of the editorial staff (likely Ingelfinger) wrote that the journal tried to maintain quality and integrity through “a ‘peer review’ of submitted articles whose findings are weighed in final judgment by editors and their advisers.” The biologist Thomas Jukes (an Englishman at the University of California, Berkeley, and a regular Nature columnist) was the first Nature author to call journal refereeing “peer review,” in 1977.30 The shift from “refereeing” to “peer review” is not merely a linguistic curiosity. Calling external refereeing at journals or funding bodies “peer review” established that the evaluation of a paper or grant proposal could only be done by experts—the peers of the person who submitted the work. The new term established a narrow range of acceptable reviewers and implicitly deemed those without a scientific background unqualified to evaluate the work in question. Moreover, it seems clear that this shift in terminology originated in the United States. One Nature contributor, Lord Zuckerman, indicated that many considered the phrase an Americanism, referring to the process of “what the Americans call ‘peer review’” in a 1972 letter to the Correspondence column.31

Funding Bodies in the Postwar United States

The late twentieth-century United States seems to be a crucial context for understanding when and how “peer review” came to be seen as central to science. Changes to the way science was funded in the United States provide one potential clue as to why. Between 1948 and 1953, federal spending on scientific research in the United States increased by a factor of twenty-five, adjusted for inflation.32 In the 1950s, as Cold War tensions rose, few Americans questioned funding for scientific research; to many legislators and taxpayers, the idea that the Soviets might be ahead of the Americans in science and technology seemed sufficient justification for any spending. However, the most intense post-Sputnik anxiety soon abated, and as early as the mid-1960s legislators and analysts were questioning whether the government funding for science was yielding desirable results. In 1966, the Department of Defense (DOD) issued the Project Hindsight report, the result of an investigation into the outcomes of DOD funding for scientific research. The Hindsight report concluded that while applied research had yielded massive benefits in the form of new military technologies, the DOD’s investment in basic research had not produced similar advances. The report led many policy makers to question military and government expenditures on basic research.33 There were also questions about whether organizations like the NSF and the NIH were spending their funding wisely. In the early 1960s, Democratic Congressman Lawrence H. Fountain of North Carolina raised concerns that NIH grants were being mismanaged. Fountain had become alarmed after examining some of the NIH’s appropriations requests and learning that scientists could adjust their grant budgets without NIH approval. He pushed for a closer examination of the NIH’s finances, claiming that significant funds were being redirected to nonresearch expenses.
Fountain’s investigation brought him into conflict with NIH director James Shannon, a confident, polished physiologist who first made his name as a researcher at the New York University School of Medicine. Initially Shannon seemed to agree that some grants might benefit from closer supervision and agreed to consider new supervisory rules. However, the NIH moved slowly to implement any changes. In March 1962, a frustrated Fountain assembled a subcommittee to evaluate the NIH’s progress and called Shannon to testify.34 At the hearing, Shannon came across as dismissive, even arrogant. He assured the committee that grant recipients “are selected on the basis of a rigorous screening by their scientific peers” and that “all subsequent administrative actions having to do with adjustments of budgets, and so forth, are essentially trivial in relation to this basic selection process.”35 He insisted that scientists themselves were the best judges of how grant money should be spent and that the inner workings of the scientific community naturally safeguarded against fraud or excessive requests. In Shannon’s view, the NIH’s internal grant review process chose the best scientific applicants, and that was sufficient to guarantee that funds were spent in a responsible manner. However, an investigation by Fountain’s committee had revealed that at least one NIH grantee had, indeed, mismanaged funds. The NIH had made several grants to a private company called Public Service Research, Incorporated. In their required accounting documents, Public Service Research had included several expenses that were not strictly research related, such as money to recruit personnel and funds for moving and renovating the company’s offices. NIH policy did not permit those kinds of expenses to be charged to research grants, but the breach had apparently gone unnoticed and had not prevented Public Service Research from being awarded further grants. 
While no one accused the company of intentional fraud, even Director Shannon was forced to admit that Public Service Research had misspent the awards and that the NIH should have noticed the problems before Fountain’s audit brought them to light.36 The committee questioned Shannon extensively about the NIH’s refereeing procedures, asking him to walk them through exactly how the NIH chose which proposals to fund. At the end of that questioning, they seem to have concluded that the NIH’s method of choosing grants via internal review was unremarkable and unproblematic. Instead, Fountain and his supporters blamed the NIH’s accounting system for the problems. Fountain concluded that NIH grant spending should be much more closely supervised and that grantees should have to obtain approval before changing their budgets, and he rallied his fellow fiscal conservatives to pass legislation requiring NIH grantees to account for their spending more precisely. The result was new and much stricter accounting practices for NIH grant recipients and significant reorganization at the NIH. Postaward accounting was no longer the responsibility of the Division of Research Grants; instead, once grants were awarded, all spending would be overseen by a new and separate grants management division.37 It is significant that although many at the hearings agreed that Public Service Research probably should not have received multiple grants, no one suggested that the NIH ought to rethink its grant reviewing method. They only argued that grant accounting should be more strictly monitored. Internal review by the NIH was still considered an acceptable method of choosing which grants to fund, and the word “referee” appears nowhere in the transcript of the hearings. Insufficient reliance on external referees was not yet a point of criticism. That, however, would soon change.

Controversies at the NSF in the 1970s

The 1970s proved to be an even more tenuous time for American governmental science funding. The Cold War entered its period of détente, and tensions between the United States and the USSR temporarily abated. This blunted one of the most persuasive arguments in favor of increased science funding. Furthermore, as the Vietnam War became more and more controversial, ties between scientists and the military came under scrutiny. Antiwar activists on many college campuses protested departments and research institutes that held large military contracts and in some cases persuaded universities to cut formal ties with research institutes that specialized in military work. An increasing number of scientists also expressed concerns over the power of the “military-academic-industrial” complex; a group called “Science for the People” disrupted several meetings of the American Association for the Advancement of Science (AAAS) in the early 1970s, shouting down speakers with military ties (such as Glenn Seaborg, president-elect of the AAAS and the discoverer of plutonium) and decrying the political overtones of certain new scientific theories, most notably sociobiology.38 Meanwhile, the U.S. economy was facing major challenges. The combination of an oil crisis with a period of stagflation led to widespread unemployment, a decrease in real wages, and budget restrictions at a wide range of organizations. Science, for example, adopted strict page limits in response to declining AAAS membership dues and increasing paper prices.39 The federal government was under particularly intense pressure to cut its budget in the face of a decreased tax base and widespread economic anxiety. All of this combined to create an environment where legislators on both sides of the aisle had reason to question government spending on scientific research. By 1975, two Republican Congressmen and a Democratic Senator were vocally criticizing the NSF.
Congressman John Conlan (a Republican from Arizona), Congressman Robert Bauman (a Republican from Maryland), and Senator William Proxmire (a Democrat from Wisconsin) each had distinct reasons for questioning the NSF’s budget and grant-making procedures. Conlan was a first-wave member of the Christian Right who was gearing up to run for the Senate, billing himself as a social conservative who would bring Christian values to his government service.40 Two NSF-funded school programs had long been controversial among Arizona’s social conservatives: “Man, a Course of Study” (MACOS) and the “Individualized Science Instructional System” (ISIS). MACOS was a social sciences curriculum that covered the social habits of several different species of animals, as well as the Netsilik Eskimos; critics accused it of promoting moral relativism and focusing on violence, incest, and cannibalism among animals.41 ISIS, a program aimed at fourth graders, came under fire for allegedly including too much explicit discussion of human sexuality. During the hearings over the NSF’s 1976 appropriations request, Conlan stridently criticized both programs and questioned whether the NSF was spending its budget in the interests of the American public. Conlan soon found an ally in Bauman, a rising star in the Republican Party.42 Bauman, a small-government advocate and a founder of the American Conservative Union, began pushing for more fiscal accountability from the NSF. Eventually he proposed an amendment to the 1976 NSF appropriations request requiring the NSF to receive Congressional approval for any grants it wanted to fund. The amendment was approved and added to the appropriations bill. It was Proxmire, however, whose NSF criticisms made the biggest public splash. Proxmire joined the Senate in 1957 after winning a special election to replace the recently deceased Joseph McCarthy. 
He would go on to a long and colorful Senate career marked by principled, stubborn stances on many controversial issues. He was an early opponent of the Vietnam War, an early advocate of campaign finance reform, and a fierce critic of anything that he perceived as wasteful spending. In March 1975 Proxmire began issuing his Golden Fleece Award, a badge of dishonor that he bestowed each month on the government project that he deemed the worst use of taxpayer money. Proxmire would “award” his Golden Fleece every month from March 1975 until his Senate retirement in 1988. The first two Golden Fleece Awards went to NSF projects. In March, Proxmire singled out a sociological study at the University of Wisconsin about interpersonal attraction. “I believe that 200 million other Americans want to leave some things in life a mystery, and right on top of the things we don’t want to know is why a man falls in love with a woman and vice versa,” Proxmire proclaimed. “So National Science Foundation—get out of the love racket.” The April award went to the psychologist Ronald Hutchinson’s study of why humans, rats, and monkeys clench their jaws in moments of stress. “The funding of this nonsense makes me almost angry enough to scream and kick or even clench my jaw,” said Proxmire. “The good doctor has made a fortune from his monkeys and in the process made a monkey out of the American taxpayer.”43 Proxmire’s Golden Fleeces won widespread media coverage, and letters complaining about those studies began flooding the NSF offices.44 The researchers involved were also singled out for criticism and angry mail—even threats. Proxmire, for his part, would make the NSF a favorite target of the Golden Fleece Awards, insisting that the organization funded frivolous and wasteful projects that were of no use to the American people. 
The school curriculum controversy—in particular, the criticisms of MACOS—also received a great deal of press coverage.45 But the controversy was not only being carried out in the media. Dozens of letters were exchanged between the NSF, Proxmire, Conlan, and Bauman as the three legislators sought further information on particular NSF grants. By March 1975, the controversy had grown so intense that NSF Director H. Guyford Stever announced that no further funds would be approved “either for MACOS or any other precollegiate science course development … until we have conducted a thorough review of the NSF effort in these areas and reported to the National Science Board and Congress with recommendations.” That decision did not mollify Conlan, who was determined to learn how the NSF had decided to fund ISIS and was frustrated when the foundation refused to give him copies of the grant’s peer review reports: “I would again remind you that I am a Member of Congress on a Committee charged with the oversight of the National Science Foundation. … Consequently, I do again demand that you make available the peer reviewer comments originally demanded by me—in their original and complete form, not paraphrased.”46 NSF leaders, however, argued that peer review depended on the anonymity of the referees and said that the NSF was not obligated to disclose the reports to anyone, including members of Congress. In a letter to Conlan, Stever asserted that referee reports were filed under an “implied promise of confidentiality” that was “well understood and accepted by thousands of reviewers” and that releasing the text of the reports or the names of the reviewers would constitute a breach of trust. 
In a later letter, he argued that any change to the NSF’s policy of keeping referees anonymous would be at odds with the concept of peer review itself: “Maintaining the confidentiality of comments solicited by the Foundation on grant proposals and the identity of the reviewer making these comments is a fundamental principle on which the Peer Review System of the National Science Foundation and also of other agencies is based. … If this confidentiality principle were to be changed, the entire process whereby the wisdom and knowledge of outstanding scientists and educators throughout the country is drawn upon in evaluating the worth of proposals would thereby be changed.”47 Confidentiality would continue to be a bone of contention between the NSF and its critics as the conflict progressed. The careful phrase “implied promise of confidentiality” suggests that instructions given to NSF reviewers did not always explicitly promise that the reports would be kept confidential; however, Stever’s response to Conlan was consistent with internal NSF policy. NSF correspondence with both grant applicants and Congressmen routinely stated that it was the NSF’s practice not to disclose the full text of the referee reports or the names of the reviewers. “[Referees] are private citizens who serve voluntarily without compensation, often devoting several hours of their own time to review the proposal,” NSF General Counsel Charles F. Brown told one disgruntled applicant. “If the name of a reviewer were made public, the reviewer might be subject to harassment or badgering by the principal investigator.”48

The NSF Hearings: Scientific Autonomy or Public Accountability?

The public debate and behind-the-scenes arguments over the NSF’s funding decisions led to the National Science Foundation Peer Review Special Oversight Hearings, held before the Subcommittee on Science, Research, and Technology in July 1975. Bauman, Conlan, and Proxmire were not members of the committee but were invited to testify at the hearings, as were NSF Director H. Guyford Stever and Deputy Director Richard Atkinson. Over a dozen scientists, sociologists, and other academics were also asked to testify about their opinions on peer review. The hearings generated nearly twelve hundred pages of transcripts and supplemental documents. Those transcripts contain not only discussion of the NSF’s particular review process but much discussion of the concept of peer review itself—its outcomes, its ideal form, and its role in scientific practice. The discussion of the NSF’s review process focused on two key points: first, what criteria should be used to choose which NSF proposals to fund, and, second, how much weight should be given to the referees’ opinions about a proposal. During questioning, both Bauman and Conlan acknowledged that their interest in the NSF stemmed from concerns that some of the funded proposals were not beneficial to the American public. Bauman argued that “we are dealing here with a finite quantity of money taken in the form of taxes from people against their will in a very difficult time in our economy and being spent by a Federal agency, spent in a manner now questioned,” and that the NSF “must demonstrate that it is using tax money in a prudent manner, in such a way that the taxpayers can expect that there will be some payoff from NSF-funded research.”49 Conlan’s complaints were more focused; much of his testimony centered on his objections to MACOS and ISIS and his frustration at the NSF’s refusal to provide him with the full, verbatim referee reports.
Conlan described the NSF’s peer review system as a secretive process that allowed program directors to elicit any decision they wanted: “Here is an amazing system, gentlemen, where individual program managers are given carte blanche authority to select peer reviewers. … [It is] a completely arbitrary system that is closed and unaccountable to the scientific community and to the Congress. … It is common knowledge in the science community that NSF program managers can get whatever answer they want out of the peer review system.”50 Conlan’s criticisms of the NSF, however, did not lead him to criticize the concept of peer review in general. Instead, he argued that the NSF’s review system was designed to allow the directors, rather than the reviewers, to make decisions about the quality of a proposal. Because no one outside the NSF was allowed to see the reports, said Conlan, neither scientists nor Congressmen could verify that NSF staff members were listening to the referees. Conlan claimed that NSF program managers occasionally went so far as to misrepresent their referees’ comments. He recounted a conversation with one of the ISIS peer reviewers, Philip Morrison of MIT, and reported that Morrison had been very critical of the proposal and was surprised to learn that it received funding.51 Conlan thus proposed that he and his fellow Congressional critics were acting in defense of proper peer review. Referees, not directors, should be trusted to determine which proposals were most deserving of funding—and if all NSF reports were made available, NSF program officers would not be able to manipulate or discount their contents so easily. He therefore argued that all peer review reports, for funded and unfunded grants, should be available to Congress and to grant applicants. 
In Conlan’s view, this would stop the NSF from making any further questionable grants: “if peer review and NSF grants management are open, I think the area of faulty grantmanship or inadvisable or incompetent awards will take care of themselves.” Conlan also suggested that the reviewers’ names ought to be available to applicants, arguing that this would encourage more objectivity in the refereeing process: reviewers would work harder to be fair if their names would be attached to their reports.52 Unsurprisingly, the NSF leadership defended the foundation against the charges of secrecy and cronyism. Stever vigorously defended the role of program officers, arguing that their expertise and professionalism were crucial to the NSF’s work. However, he said, “it is the Foundation’s internal practice to make sure that no member of its professional staff, no matter how competent, is permitted to make a decision in private on behalf of the Foundation.” Stever attributed the Golden Fleece and curriculum controversies not to flaws in the NSF’s review process but to a gap between public expectations and scientific realities. “Society is expecting more of science,” he said. “Some of these expectations are reasonable, but some are unreasonable expectations of magical results on a faster timetable than ever will be realized.” Because the benefits of scientific research would not always be immediate or obvious, said Stever, the foundation believed that proposals should be judged on the quality of the science. “The foundation never supports frivolous research,” he insisted. “Scientific excellence is the primary criterion for NSF support.”53 The only way to choose proposals based on scientific excellence, Stever and Atkinson argued, was through a peer review process—by placing proposals in the hands of those who were best qualified to know what scientific excellence looked like. 
The foundation consulted with “members of the scientific community who are expert in the specialty or discipline involved before making a recommendation to support or to decline a specific proposal,” Stever explained. “The persons consulted are expected to be the scientific peers of those proposing to do the research; thus the growth of the term ‘peer review.’”54 Stever repeatedly insisted that these peers could give candid advice only if their identities were kept confidential; otherwise, reviewers might fear professional reprisal if they did not recommend particular proposals, especially those by senior colleagues in their field. The NSF’s critics and defenders disagreed about whether the NSF itself had done a good job with its refereeing procedures, but in many ways the claims they made about peer review itself were remarkably similar. Bauman, Conlan, and Stever all argued that peer review was critical and that referee opinions should be weighed heavily when determining which NSF proposals should receive funding—indeed, Conlan’s major criticism was that referee opinions were not given enough weight at the NSF. However, Conlan and Bauman both argued that the NSF’s peer reviewers could perform their most important work if their identities were public knowledge. Stever vehemently disagreed, arguing that if reviewers had to sign their names to their reports they might face pressure to give favorable reviews to particular proposals. Referees could be free to speak their minds only if they knew that they would remain anonymous. His comments cast the referee as a person who was performing a selfless service to science with no hope of reward, but also as a potentially malleable figure who might succumb to external pressure if his or her identity became known. This disagreement over anonymity reveals a major difference between the NSF’s critics and its supporters. 
NSF leaders supported anonymous peer review because, they argued, it was the only way to ensure candid and accurate feedback about proposals. Legislative critics, on the other hand, felt that the NSF’s decisions should be more transparent to legislators, taxpayers, and the scientists who applied for grants. In other words, the NSF’s priority was to ensure scientific excellence; the critics’ priority was to make the process accountable to the public.

We Ask Too Much of It: The Limits of Peer Review?

Not everyone at the 1975 hearings, however, was convinced that peer review ensured either scientific excellence or public accountability. William D. Carey, the Executive Officer of the AAAS, argued that Congress and the NSF might be asking too much of peer review. Peer review, he told the committee, “is not a fail-safe procedure, nor one that should silence disagreements on the theory that there is something sacramental about it.” Carey endorsed the older system of using NSF peer review reports as advisory documents, saying, “Peer review has its uses as a first round of proposal screening, but it does not absolve the Government program manager from full responsibility for the decision to fund or reject a proposal. … There are some things that we should not ask of peer review. We should not ask it to take Government agencies off the hook on the question of protecting the public purse.”55 It was Proxmire, however, who expressed the most negative ideas about peer review. In his submitted testimony, he wrote, “I have received a number of letters pointing out that the peer review system serves to perpetuate the funding of established people, ideas and institutions.” Proxmire seemed to hold out little hope that peer review could be anything other than “incestuous.” He argued that NSF reviewers would favor proposals whose research had been published in peer-reviewed journals, that peer-reviewed journals would look favorably on articles that had received NSF funding, and that “we come full circle, when we realize that many of the top researchers in a scientific field have been recipients of NSF grants, reviewers of NSF grants and finally editors of their technical journals.”56 But interestingly, any limitations of peer review received little discussion during the testimony of the NSF officials. Almost all of the actors at the 1975 hearings agreed that peer review was the appropriate mechanism for evaluating grants.
Peer review, according to those at the hearings, was “a fair and effective system,” “an indispensable component of the decisionmaking process for allocating funds,” and “an integral feature of ‘the scientific method.’” Subcommittee chair James Symington seemed to sum up the hearings’ attitude toward peer review when he said: “Witnesses overwhelmingly agree that some form of peer review should continue to be used to assist in the allocation of funds for scientific research. To be sure, no witness—including the NSF—has claimed that the Foundation’s peer review system and its decisionmaking ability are perfect. … Nevertheless, it appears that, as a base to work from, the peer review concept is seen as fundamentally sound.”57 In the 1960s, the term “peer review” was only beginning to be introduced; by 1975, peer review was seen as a central part of scientific knowledge making in the United States. The belief that peer review was “fundamentally sound” strongly shaped the NSF’s reactions to the criticisms. Most significantly, Stever announced at the hearings that as of 1 January 1976 applicants would receive complete, verbatim copies of their referee reports with the reviewers’ names redacted. He argued that keeping the referees anonymous was the only way to ensure their candor and incorruptibility but offered the verbatim reports as a compromise. Stever and Atkinson also indicated that in the future the NSF would lean more heavily on peer review, and less heavily on the judgment of NSF staff, when deciding which grants to fund. A new audit office would ensure that funding decisions placed appropriate weight on positive and negative referee reports. Finally, the NSF commissioned a report on its peer review process from the RAND Corporation to learn more about scientists’ views as to its efficacy and fairness.58 Meanwhile, although the NIH was not directly involved in the 1975 controversy, NIH leadership—including the new director, Donald H. 
Fredrickson—watched the debate closely. In response to the uproar, the NIH took some precautionary steps to reform its own peer review system. Between 1975 and 1978, the NIH gradually gave external reviewers a more central role in deciding which grant proposals should be funded. Like the NSF, the NIH also began providing more detailed comments about proposals to NIH applicants.59 Notably, both the NSF’s and the NIH’s reforms placed more emphasis on the opinions of external referees. The contrast with the NIH hearings in the 1960s is striking. In 1962, even with NIH grants under fire, few questioned the NIH’s internal review system. In 1975, however, NSF employees’ power to determine which grants should be funded was a major point of criticism. Funding body employees were individuals with biases; peer review, however, could be relied on to produce answers about what was and was not good science. The judgment of the anonymous referees could serve as a stand-in for the judgment of the scientific community as a whole. Individuals could not be trusted, but the system of peer review could. Peer review was thus elevated from an optional bureaucratic process to a system that was supposed to ensure the quality and trustworthiness of science.

Conclusion

Few walked away from the 1975 NSF hearings completely satisfied with the outcome. The NSF’s education programs were significantly downsized, and funding for MACOS and ISIS was largely eliminated. However, Conlan and Bauman did not get the Congressional oversight of NSF grants they had advocated, and the Bauman amendment was removed from the NSF’s appropriations request. Furthermore, the NSF reforms were not quite as sweeping as they might have seemed. NSF program officers retained a great deal of influence over funding decisions—influence that grew when the NSF changed the name of its review process from “peer review” to “merit review” in the 1980s, emphasizing that concerns besides scientific excellence (such as national interests) would be part of the decision-making process.60 The 1975 peer review controversies are most noteworthy not because of their effect on the actual practice of refereeing at funding bodies but because of their implications for public conceptions of peer review. The hearings provided a moment for stakeholders with different views about science and its funding to debate what the practice of peer review was for. What emerged at the hearings was a general (though not unanimous) consensus that the NSF—and any other organization—had to rely on external referees in order to judge “good science” properly. At the hearings, peer review was cast as a process that was crucial to the way science worked, one that had to be preserved and defended in order for science to work properly in the future.61 That vision of peer review has endured and expanded, resulting in headlines like the one dubbing the Higgs boson “actual science.” However, Carey’s and Proxmire’s concerns about peer review have also endured. In fact, peer review seems to be in a moment of crisis.
In recent years, high-profile papers have passed peer review only to be heavily criticized after publication or retracted amid allegations of fraud.62 Some studies of peer review outcomes have suggested that women and underrepresented minorities are more likely to receive unfavorable referee reports than their colleagues. Other observers have argued that current peer review procedures suppress innovative research, thus contributing to a public perception that most scientific research is irrelevant.63 In 2011, Great Britain’s House of Commons commissioned a report on the state of peer review, which concluded that while peer review “is crucial to the reputation and reliability of scientific research,” many scientists believe that the system stifles progress and is often biased, and that “there is little solid evidence on its efficacy.”64 There are also significant gaps between peer review in theory and peer review in practice. Although the system of peer review is supposed to determine what gets published and funded, program officers at funding bodies still shape the type of science their organizations support, and journal editors still retain tremendous power over what appears in their journals’ pages.65 Some scientists wonder if systematic peer review is still the best method of evaluating scientific research and have argued that peer review could be eliminated with little cost to the quality of the scientific literature.66 Scientists and laymen alike are frustrated when peer review does not instantly reward innovative papers with publication in a top journal or bestows grant money on a decorated scientist while leaving more ambitious work by a younger colleague unfunded. Many of the complaints about peer review’s perceived failures tend to assume that it is supposed to distinguish good science from bad science with perfect accuracy. But that is not the purpose for which refereeing was initially designed.
The current “crisis” of peer review arguably has its origins in this moment in the 1970s, when the process was cast as the only acceptable method of evaluating scientific quality. The more we have expected of peer review, the more its opportunities to disappoint have expanded. Peer review’s perceived failures may have their roots in the gap between modern expectations of refereeing and the more modest functions it was initially designed to fulfill.