The issue of nonreplicable evidence has attracted considerable attention across biomedical and other sciences. This concern is accompanied by an increasing interest in reforming research incentives and practices. How to optimally perform these reforms is a scientific problem in itself, and economics has several scientific methods that can help evaluate research reforms. Here, we review these methods and show their potential. Prominent among them are mathematical modeling and laboratory experiments that constitute affordable ways to approximate the effects of policies with wide-ranging implications.

Funding: University of Southampton www.southampton.ac.uk (grant number 512188118 SSF). The work of John Ioannidis is supported by an unrestricted gift from Sue and Bob O'Donnell. METRICS is funded by a grant from the Laura and John Arnold Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Introduction

Serious worries have been voiced concerning a “reproducibility crisis” in many biomedical as well as social sciences; this crisis of confidence is fueled by the observation that numerous established findings may correspond to false positives that cannot be reproduced [1–5]. In response to the aforementioned concerns, several reforms have been put forward in various disciplines, purported to increase reproducibility [6]. Special focus has been placed on reforming researcher incentives [7,8,9], and some specific proposals have attracted considerable attention [10,11,12]. However, the study of behavioral responses to incentives is typically not the main focus of biomedical disciplines.

Behavioral responses to incentives may be evaluated with some modeling approaches followed in economics and related disciplines (e.g., political science). These disciplines have a policy focus, supported by the systematic study of how behavior responds to incentives. Formal economic tools are continually evolving and can be usefully employed for any policy analysis, but as yet they tend to be relatively unknown to the biomedical community. It is important to better understand these tools, especially when so many critical reforms of academic structures and incentives are being proposed. In this paper, our objectives are, first, to illustrate the possible benefits of economic analysis with concrete examples from existing reforms in which this analysis provides new insights and, second, to provide a relatively broad review of the relevant tools that can be employed to assess future reform proposals in biomedical sciences.

Although this review focuses on economics and related disciplines, some of the rigorous tools we review here are also used outside the social sciences (laboratory experimentation is common in psychology, game theory/dynamic modeling is widely used in evolutionary biology, and randomized controlled trials are common in clinical medicine). Clearly, relevant contributions from these disciplines will naturally be included in this review.

Key concepts

Social phenomena exhibit a level of complexity and practical or ethical constraints that often make them not easily amenable to direct experimentation. However, the relevant problems may be approximated with mathematical modeling and empirical methods based on modeling. This approach has led to insightful conceptual developments that are worth summarizing.

Strategic interaction. The incentives that one individual faces depend on the expectations about others’ behavior, which in turn depends on their incentives and beliefs. A stylized model can illustrate the point: upon submitting an article for publication, a researcher has several possible “strategies.” In particular, she may opt to reveal all relevant details, gloss over important details, or even grossly falsify the evidence. Hence, implementing proposals that increase transparency (e.g., protocol preregistration, sharing of full data, etc.) will affect the relative benefit of each of these options. What each researcher is likely to do depends also on what she expects other researchers will do. “Game Theory” is the mathematical branch of economics that tackles interdependences of this sort.

Cost–benefit analysis. Economic models can explicitly address benefits and costs, including “opportunity costs.” Some reform proposals may become unattractive because of the accompanying costs. For example, considerable time and effort may be required to audit labs, replicate experiments, or meticulously prepare raw data for sharing. When this opportunity cost becomes too high, implementing transparency reforms might lead to a worse state of affairs. “Welfare Economics” systematically compares the costs and benefits for society resulting from a policy change.

Asymmetric information. Different actors in the scientific environment possess different kinds of useful information. This is important because some agents (funding agencies, the general public, etc.) wish to affect the behavior of others (researchers) with the purpose of achieving certain desirable outcomes, e.g., a greater overall rate of knowledge accumulation. An important branch of “Information Economics” is agency theory, which analyzes what a “principal” needs to consider in order to control the behavior of an “agent” who has superior information.

Public goods. When a certain action has a greater benefit for society as a whole than for the individual who chooses it, the action has characteristics of a “public good.” This is notable because in the presence of public goods, if everyone pursues their self-interest, society as a whole loses. In particular, scientific reproducibility can be viewed as a public good. Some scholars dispute that scientific knowledge is a public good in the sense that nobody can be excluded from its benefits. Instead, science may be a “contribution good,” since experts cannot be excluded from benefits but nonexperts can be [13].

Intellectual property versus free competition. The degree to which the government should grant legal protection of intellectual property may be decided based on economic arguments. Current research in biomedicine is often conducted by private entities (such as pharmaceutical companies or entrepreneurial start-ups). Given the obvious trade-off between transparency and trade secrecy, economic reasoning is required in order to analyze the arguments for “stealth research,” which is not shared with the wider scientific community [14].

Mathematical modeling of incentives

Of course, tensions between individual and social objectives in the pursuit of science have been acknowledged and recognized for some time [15, 16]. Mathematical modeling can provide a rigorous framework for analyzing the potential effects of policy changes. Moreover, a good model may allow the analyst to uncover and specify mechanisms that would have been unclear otherwise. In particular, game theory is a useful tool to assess possible consequences of institutional reforms on individual incentives and aggregate outcomes.

To illustrate, consider a policy of strictly reporting research with perfect honesty, completeness, and thoroughness (e.g., fully implementing reporting guidelines such as CONSORT or Preferred Reporting Items for Systematic Reviews and Meta-Analyses [PRISMA] [17, 18], using proper statistical methods and reporting the full results). Such a policy would try to rule out “lying by omission” (e.g., not reporting all details of the design, especially those that may generate concerns about the study, or using questionable research practices [19,20] that will deliver seemingly more significant and seemingly more robust results) but not conscious overt fraud (e.g., fabrication of data, reporting nonexisting analyses). Assuming that such a policy will not be too cumbersome to implement and monitor (so that misleading omission will indeed be precluded), consider a model of competition for publishing mediated by scientific journals that was developed by Gall and Maniadis [21]. The model aims for simplicity rather than generality, but is well suited to demonstrate the working of game theoretic analysis, revealing the strategic interdependency between different activities that will determine what one should expect from different policies. As suggested by Stephan [22], academic competition can be modeled as a tournament.
Assume that researchers compete for one publication spot, and they can spend effort on “sexing up” their result, engaging in either “lying by omission” or conscious fraud. A higher level of cheating offers an advantage in publishing but has higher cost. Nash equilibrium analysis tells us that preventing “mild cheating” will also decrease the frequency of “extreme” cheating and reduce questionable behavior in total. Such strategic complementarity is not uncommon and also appears in a number of other games, such as the well-known paper–scissors–rock game. The result is robust to changes in parameters and model specifications and would support the policy of full disclosure with maximal transparency (Fig 1). From a dynamic point of view, a lower prevalence of questionable behavior today yields more robust findings, which in turn will provide a more solid basis for future research. This will also affect the desirability of engaging in questionable behavior in the future, for instance, by increasing the potential for robust, significant results or raising the cost of questionable behavior.
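The tournament logic can be made concrete with a toy example. The sketch below is not the Gall and Maniadis model [21]; it is a minimal two-researcher contest written for illustration, in which the higher cheating level wins the single publication slot, ties are split, and the cost figures (3/10 of the prize for omission, 3/5 for fraud) are assumptions chosen for the example. The helper names `payoff_matrix`, `solve`, and `symmetric_equilibria` are likewise hypothetical. The code enumerates supports to find symmetric Nash equilibria, first with all three strategies available and then with omission ruled out.

```python
from fractions import Fraction
from itertools import combinations


def payoff_matrix(costs, prize=Fraction(1)):
    """Row researcher's payoff; strategies are ordered by cheating level."""
    n = len(costs)
    u = []
    for a in range(n):
        row = []
        for b in range(n):
            # The higher cheating level wins the slot; ties split it evenly.
            win = Fraction(1) if a > b else Fraction(1, 2) if a == b else Fraction(0)
            row.append(win * prize - costs[a])
        u.append(row)
    return u


def solve(A, b):
    """Gauss-Jordan elimination over Fractions; returns None if A is singular."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def symmetric_equilibria(u):
    """Enumerate supports and keep the mixes that no single strategy can beat."""
    n = len(u)
    found = []
    for k in range(1, n + 1):
        for support in combinations(range(n), k):
            # Unknowns: probabilities on the support plus the common payoff v.
            A = [[u[i][j] for j in support] + [Fraction(-1)] for i in support]
            A.append([Fraction(1)] * k + [Fraction(0)])
            rhs = [Fraction(0)] * k + [Fraction(1)]
            sol = solve(A, rhs)
            if sol is None or any(p <= 0 for p in sol[:k]):
                continue
            mix = [Fraction(0)] * n
            for p, s in zip(sol[:k], support):
                mix[s] = p
            value = sol[k]
            if all(sum(u[i][j] * mix[j] for j in range(n)) <= value for i in range(n)):
                found.append(mix)
    return found


# Hypothetical costs: honest = 0, omission = 3/10, fraud = 3/5 of the prize.
full = symmetric_equilibria(payoff_matrix([Fraction(0), Fraction(3, 10), Fraction(3, 5)]))
# Policy: omission is ruled out, so only honesty and fraud remain.
restricted = symmetric_equilibria(payoff_matrix([Fraction(0), Fraction(3, 5)]))
print(full, restricted)
```

Under these assumed numbers, the unrestricted game has a single symmetric equilibrium in which honesty, omission, and fraud are played with probabilities 2/5, 1/5, and 2/5. Once omission is unavailable, honesty is the only symmetric equilibrium. The intuition in this toy case is that fraud is too costly to be worth using against an honest rival but pays against a rival who omits details, so removing the mild option also removes the reason to escalate. Other cost values can give different patterns, so the example illustrates the mechanism rather than establishing its generality.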

Fig 1. Modeling the consequences of reporting research with perfect honesty, omitting relevant details, or committing overt fraud. https://doi.org/10.1371/journal.pbio.2001846.g001

Bobtcheff and colleagues [23] point to another detrimental effect of winner-takes-all contests in scientific research: intense competition for attention could lead researchers to compromise on quality in order to be the first to publish a new result. Indeed, recent contributions from rigorous population models using evolutionary tools indicate that small and poor designs tend to yield an advantage in the dynamic publication race [24,25]. The higher the reward for a successful publication, the higher the temptation is to engage in questionable activities. An editor or reader who is aware of this reasoning will therefore discount the evidence or have incentives to check the result more diligently. Lacetera and Zirulia [26] use a mathematical model of the interaction between a researcher and a recipient (e.g., editor or reader), allowing for monitoring by the latter. They find ambiguous effects of policies that reduce the cost of monitoring or increase the rewards of successful publication, depending on the precise parametrization of their model.

Discounting findings that are too good to be true lies at the heart of “persuasion games.” A persuasion game has two players: a “sender” that conveys verifiable information and the “receiver” of this information. In applications, the sender role could correspond to a researcher, a reviewer, or a journal, and the counterpart role of the receiver could correspond to a reviewer, an editor, or the general public/general readership, respectively. For instance, for clinical drug trials, their industry sponsors provide empirical evidence on the effectiveness of a drug to decision-makers, e.g., regulators who decide whether to license the drug or clinicians who ponder whether to use it with their patients.
The sender has a private interest in convincing the receiver that a certain assertion (e.g., that a drug is effective) is true and may have some degrees of freedom in what information to convey. For instance, one may decide to take multiple looks at the data and stop clinical trials once a desired empirical result emerges or use more readily obtained favorable results from surrogate endpoints. Milgrom [27] summarizes some basic insights from persuasion games. If the information that the sender could have sent is perfectly known, a rational receiver perfectly discounts the sender’s exaggeration and infers the actual information (this is called the “unravelling argument”). Thus, there is no need for external intervention to improve information sharing. Similarly, as for disclosing research procedures, the well-known unravelling results by Grossman and Milgrom [28, 29] would suggest that expert referees will infer the worst from a sender’s lack of transparency, which in turn disciplines the sender. Unfortunately, this is no longer true if the receiver is uncertain about what information the sender could have revealed and what remains opaque or hidden. This insight suggests that a useful policy for reducing false-positives might entail enhancing transparency about the researchers’ degrees of freedom. The sender may also first determine how much research to perform and then what to disclose to the receiver, yielding incentives to conduct an excessive number of trials and to selectively report the best-looking results [30]. A rational receiver will realize this, and the sender will therefore anticipate that very powerful evidence will be needed to convince the receiver. In any equilibrium of the game, the sender will conduct too many trials reaching for the largest possible sample and will reveal all results. The ability to selectively report will induce excessive experimentation by the sender but will benefit society, as this extra knowledge is fully revealed. 
This result again relies crucially on the receiver’s rationality and his perfect knowledge of the sender’s preferences and his arsenal of questionable research practices. Otherwise, not all information is revealed in equilibrium. The sender may even opt to conceal some information that would otherwise serve his interests (in order to avoid revealing his preferences). In another interesting case, if the sender knows that with some probability he will face a naïve receiver (who takes the information at face value), mandatory disclosure is useful because the sender is likely to conceal some negative results. The effects of strategic interaction are subtle and often yield surprising policy implications, emphasizing the need for an explicit game-theoretic framework. Ottaviani and colleagues [31,32] examine the optimal policies of receivers, such as regulatory authorities in drug approval procedures. Rational authorities will fully anticipate that any approval policy will induce the sender to respond strategically, e.g., by choosing the number of trials until a desirable empirical pattern emerges or fiddling with the assignment of subjects to treatment and control groups. In equilibrium, the authority has correct expectations on the sender’s manipulation and uses this information to interpret the results reported. If the players in this game are rational, the authority will correctly infer all information that is generated by the sender’s experimentation. Since the sender’s information is fully inferred by the receiver, the interesting question is whether certain rules, such as approval standards or transparency requirements, induce the sender to generate more or less information. For instance, Ottaviani and colleagues identify cases where commitment to well-defined approval standards can mitigate problems of excessive research. 
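The unravelling argument, and the way it breaks down when the receiver is unsure what the sender could have revealed, can be illustrated with a small numerical sketch. The setup is an assumption made for illustration and is not taken from the cited papers: quality levels are equally likely integers, disclosure is verifiable, and with probability `p_no_evidence` the sender has nothing to show and is therefore silent by necessity. The function name `silent_types` is hypothetical.

```python
from fractions import Fraction


def silent_types(types, p_no_evidence=Fraction(0)):
    """Iterate best responses until the set of informed senders who stay silent is stable.

    types: equally likely, verifiable quality levels (integers).
    p_no_evidence: probability that the sender has no evidence and must stay silent.
    Returns (types that withhold when informed, receiver's belief after silence).
    """
    n = len(types)
    prior_mean = Fraction(sum(types), n)
    silent = sorted(types)
    while True:
        # Bayes' rule: silence comes from uninformed senders or informed withholders.
        mass = p_no_evidence + (1 - p_no_evidence) * Fraction(len(silent), n)
        belief = (p_no_evidence * prior_mean
                  + (1 - p_no_evidence) * Fraction(sum(silent), n)) / mass
        # An informed sender discloses whenever the truth beats the belief after silence.
        stay = [t for t in silent if t <= belief]
        if stay == silent:
            return silent, belief
        silent = stay


# Receiver knows the sender always has evidence: full unravelling.
known = silent_types(list(range(1, 11)))
# Receiver thinks the sender may have no evidence half of the time (assumed value).
uncertain = silent_types(list(range(1, 11)), Fraction(1, 2))
print(known, uncertain)
```

With ten quality levels and a sender who is known to hold evidence, only the lowest type remains silent, so silence is fully informative. When the receiver assigns probability one half to the sender having no evidence, the four lowest types can hide behind that possibility, which is the sense in which transparency about the sender's degrees of freedom matters.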
Felgenhauer and Schulte [33] show that increasing the costs of presenting additional evidence can increase the informational value of a given set of evidence and can be socially beneficial because it “separates wheat from chaff.” Following this reasoning, the informational value of evidence may differ between different fields or journals, reflecting disparities in generating new evidence and the value of being published, respectively. This would suggest that in disciplines in which generating new evidence is cheap (or in disciplines in which articles tend to be submitted to a small number of elite journals, in which the possible reward is higher) standards should be more conservative and demanding than in fields in which generating evidence is more costly or the publication stakes are lower. This model thus suggests a surprising beneficial side-effect of raising the research documentation standards. The mathematical biology/ecology literature has also tackled the question of whether increasing the difficulty of publication (according to some criterion, e.g., statistical significance) could have beneficial effects. Some studies find that limiting the communication of research findings can sometimes have beneficial effects on the informational value of observed results [34]. However, other studies find the opposite and argue that their conclusion is driven by the absence of an assumed explicit or implicit cost of publishing or reading articles [35]. Park, Peacey, and Munafò [36] point out that researchers learn about other informed agents’ opinions, adjusting their beliefs about the likely true answer to research questions. Such observational learning may lead to herding (relying more on other researchers’ opinions) and a loss of socially valuable information. Allowing reviewers to have a modicum of subjectivity in their recommendation may mitigate the problem.
Accordingly, proposals for introducing a system to achieve more “mechanical decisions” at the review stage may have a negative effect by exacerbating herding. There are many more issues in the design and analysis of research practices that mathematical modeling tools from economics and other disciplines could perhaps fruitfully address. Two examples are incentives in peer review and the role of intermediaries in science. Economic theory can improve our understanding of why incentives for referees are so low [37]. The literature on “platform competition” may be readily applied to examine the role that intermediaries (such as journals, editors, or publishing houses) may play in ensuring credibility of empirical research, for instance, in light of the emergence of open access journals [38].

The role of the lab

In recent decades, controlled laboratory experiments have become more popular in economics. These experiments are typically computer-based, use a neutral framing (to avoid priming subjects), and offer nontrivial monetary incentives [39]. Plott [40] argues that the lab can be used as a “testbed” to address the effects of a policy change: “[…] first conduct experiments with a policy (preferably several competing policies) implemented in a simple environment. The outcomes are evaluated according to some pre-specified criteria, such as efficiency, which can be measured in an experimental environment. If performance is sufficiently bad, a policy is to be dropped, and if it shows promise, then the environment is complicated to offer the policy a more complex challenge.” The focus is not only on proof of principle but also on whether a given mechanism works for reasons consistent with the principles behind the mechanism’s design [41].

Roth [42] argues that “design economics” (a combination of economic theory, computation, and experiments) can be used to analyze and test the properties of new institutions. The most well-known application in medicine might be Roth’s market-design approach for reforming the market for new physicians in the United States and Canada (Table 1). In the absence of centralized intervention, this market exhibited a natural inefficiency: the timing of agreements between new doctors and hospitals unraveled to increasingly early dates (even two years before the end of a physician’s training). Kagel and Roth [43] examined experimentally whether mechanisms with good theoretical properties are superior to those that lack such desirable properties. They found that lab behavior reproduces the evidence from natural settings, which lends support to the idea that it is the allocation mechanism that drives differences in the real world rather than uncontrolled differences across markets.
Other examples of economic modeling successfully complemented by laboratory experiments include optimal auction design for radio spectrum licenses [44] and studying the consequences of issuing tradable “emission permits” to polluting companies [45].

Table 1. The market for new doctors: Economic modeling and experiments. https://doi.org/10.1371/journal.pbio.2001846.t001

The combination of economic theory and laboratory experiments can be fruitfully applied to the problem of reforms in research. For example, policies that aim to alter practices at the journal or funder level are likely to have far-reaching “general equilibrium” effects. This means that entire markets will be affected by the policy change and often more than one market. For this reason, it is difficult for a randomized controlled trial to fully capture the relevant effects, and the economics lab can offer complementary evidence.

Consider, for example, an editorial policy that makes mandatory the full documentation necessary for scientific reproducibility. When there is competition across journals, the response of other journals to the policy change will be critical. For example, suppose some journals adopt the policy (e.g., by requiring preregistration and full data and protocol sharing), but others do not. Then the proportion of papers allowing full reproducibility will increase in the former journals [47]. However, this does not imply that the proportion of such papers will increase across the entire field. Authors who benefit from these practices will send their papers preferentially to journals that have adopted the policy and avoid others. The whole “market” may not experience an increase in reproducible practices. A randomized controlled trial at any given journal may then yield a misleading conclusion about the possible consequences of such policy changes. Economic modeling can help simulate the whole “market,” and lab experiments, complemented by rigorous field evidence, can provide useful insights. Theoretical analysis can also identify the likely intensity of a policy intervention required, depending on observable circumstances.
For instance, when competition among journals undermines propping up reproducibility, a coordinated, centralized solution is needed. This can be achieved, for example, if authorities such as promotion committees and scientific associations recognize and offer more credit for publications in journals that impose high reproducibility standards. This will induce all journals to shift to a new regime in a concerted manner. Moreover, most research funders are interested in the consequences of their policies according to some criterion, for instance, aiming to maximize the volume of reproducible knowledge from the activities that they support. It might be too costly for them to initiate their assessment by performing a randomized trial. However, they may use economic modeling and the laboratory to attack the problem in a simplified form before embarking on a decision to conduct a costly randomized trial or to scale up a policy plan.
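The sorting argument made earlier in this section, that reproducibility can rise in adopting journals without rising across the field, can be checked with simple arithmetic. The sketch below assumes pure sorting: authors' practices are fixed and only their choice of journal responds to the policy. The paper counts, the `sorting` parameter, and the function name `reproducible_shares` are illustrative assumptions, not quantities from the text.

```python
from fractions import Fraction


def reproducible_shares(n_reproducible, n_other, sorting):
    """Share of reproducible papers in adopting journals, other journals, and the field.

    sorting = 0 means authors split evenly across the two journal groups;
    sorting = 1 means every author submits to the group that suits them.
    """
    to_adopters_repro = n_reproducible * (1 + sorting) / 2
    to_adopters_other = n_other * (1 - sorting) / 2
    to_rest_repro = n_reproducible - to_adopters_repro
    to_rest_other = n_other - to_adopters_other
    adopters = to_adopters_repro / (to_adopters_repro + to_adopters_other)
    rest = to_rest_repro / (to_rest_repro + to_rest_other)
    # Pure sorting moves papers between journals but does not change the total.
    field = Fraction(n_reproducible, n_reproducible + n_other)
    return adopters, rest, field


before = reproducible_shares(30, 70, Fraction(0))     # no policy, even split
after = reproducible_shares(30, 70, Fraction(4, 5))   # assumed strong sorting
print(before, after)
```

In this hypothetical field of 100 papers, the share of reproducible papers in adopting journals rises from 3/10 to 27/34, while the field-wide share stays at 3/10. A trial run inside one adopting journal would observe only the first number, which is why a model of the whole "market" is needed to interpret it.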

Testing models of researcher incentives

Laboratory experiments in economics can also inform realistic mathematical models. Almost all of the persuasion models described above assume that agents would happily deceive others if that would suit their own interests. However, introspection and morality suggest that this might not necessarily be the case. Indeed, while early economics experiments found that more than half of subjects lie often [48,49], many subjects do not lie fully, and the extent of alignment of incentives between the deceiver and the deceived also matters. Hence, there is a need to estimate the precise psychic costs of deceiving. Fischbacher and Föllmi‐Heusi [50] use an experimental design that allows for more honest revelation of pure aversion to lying, net of social influences. About half of participants (students from Zurich) lie in the experiment, with about 22% doing so “completely.” In contrast, when using a similar experimental design for a representative sample of the German population, almost no participants chose to lie [51]. In a recent meta-analysis of experiments sharing this design, Abeler and colleagues [52] found that subjects forego about three-fourths of the potential gains from lying. Gneezy and colleagues [53] categorize behavior into different types and find that lying is increasing in its benefit and shows a small tendency to increase over time. A third of subjects in each period opt to always reveal the truth, while 28% choose the money-maximizing strategy.

Psychological experimental studies of unethical behavior focus less on measurement of aggregate cheating and more on revealing the complex nature of behavior under ethical dilemmas. This literature has taught us important lessons. Research misbehavior is likely to take place in a “group setting” (that allows diffusion of responsibility), and it concerns particularly creative people.
Both factors tend to be associated with higher tendencies to engage in immoral behavior [54,55]. Moreover, observation of others’ cheating behavior tends to increase our own but only when the perpetrator is identified as an “in-group” member [56]. This points to the need of additional research on how scientists identify with certain groups. This type of experimental evidence is complementary to surveys that tackle scientific misbehavior directly but face possible misrepresentation biases. Fanelli (see [20]) summarizes findings from several disciplines: a majority of researchers are involved in some type of questionable practices, although only 3% admit falsifying or fabricating data. There is a clear need for more survey and experimental evidence that employs researchers as participants and concentrates on a scientific context. An example of such an approach is the recent psychological research by Bakker and colleagues [57], who show that research psychologists have a flawed intuition about the power of their research designs. In summary, laboratory experiments using economic tools hold a double promise. First, they can be used as simple tests of the viability and efficiency of alternative scientific practices (often complementing field evidence). Second, they may illuminate principles of human behavior that are likely to underlie behavior in the research environment and thus inform formal theories of such behavior.
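As a brief illustration of how lying can be measured without observing any individual lie, consider the private-report designs discussed earlier in this section, in which each subject privately observes a random outcome (for instance, a die roll) and is paid according to what they report. The sketch below assumes a fair die with one best-paying face and assumes that subjects either report truthfully or always report that face. The 35% input is an illustrative figure rather than a value taken from the text, and the function name `share_lying_fully` is hypothetical.

```python
from fractions import Fraction


def share_lying_fully(share_top_report, n_faces=6):
    """Estimated share of subjects who always report the best-paying face.

    Assumes everyone else reports truthfully, so the top face is reported with
    probability 1/n_faces + lam * (1 - 1/n_faces); this solves for lam.
    """
    honest_rate = Fraction(1, n_faces)
    return (share_top_report - honest_rate) / (1 - honest_rate)


# Hypothetical observation: 35% of subjects report the best-paying face.
estimate = share_lying_fully(Fraction(35, 100))
print(float(estimate))
```

Under these assumptions, a 35% reporting rate for the best-paying face implies that 22% of subjects lie fully, since honest reporting alone would produce that face only one time in six. The estimate is identified from the aggregate distribution of reports, so no individual subject can be shown to have lied.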