To draw causal conclusions about the efficacy of a psychological intervention, researchers must compare the treatment condition with a control group that accounts for improvements caused by factors other than the treatment. Using an active control helps to rule out the possibility that improvement in the experimental group resulted from a placebo effect. Although active control groups are superior to “no-contact” controls, only when the active control group induces the same expectation of improvement as the experimental group can we attribute differential improvements to the potency of the treatment. Despite the need to match expectations between treatment and control groups, almost no psychological intervention studies do so. This failure to control for expectations is not a minor omission; it is a fundamental design flaw that potentially undermines any causal inference. We illustrate these principles with a detailed example from the video-game-training literature, showing how the use of an active control group does not eliminate expectation differences. The problem permeates other interventions as well, including those targeting mental health, cognition, and educational achievement. Fortunately, measuring expectations and adopting alternative experimental designs make it possible to control for placebo effects, thereby increasing confidence in the causal efficacy of psychological interventions.

To draw causal conclusions about the efficacy of a psychological intervention, researchers must compare the treatment condition with a baseline or control group that accounts for improvements caused by factors other than the treatment. In pharmacological research, the control group receives a sugar pill (a placebo) that looks identical to the experimental pill, meaning that participants cannot tell whether they are in the experimental condition or the control condition. Because they are blind to their condition assignment, they should not hold different expectations for the effectiveness of the pill, and any difference between the groups on the outcome measure may be attributed to the effect of the treatment.1

Compared with drug trials, psychological interventions face bigger challenges in accounting for placebo effects. Participants in psychological interventions typically know which treatment they received. For example, participants undergoing an experimental cognitive therapy for anxiety are aware that they are receiving treatment and are likely to expect to improve as a result. Measuring the effectiveness of this therapy by comparing it with a no-treatment control condition would be inadequate because the two groups would have different expectations for improvement, and few scientists would accept such a comparison as compelling evidence that the ingredients of the therapy were responsible for observed improvements. A better comparison would be with an active control group, one that receives a similar therapy that does not specifically target their anxiety.

Many researchers, reviewers, and editors of psychology intervention studies apparently believe that including an active control group automatically controls for placebo effects. We have come to this conclusion because published papers regularly include causal claims about the effectiveness of an intervention without any attempt to test whether the experimental and control groups shared the same expectations. This failure to control for the confounding effect of differential expectations is not a minor omission; it is a fundamental design flaw that potentially undermines any causal inference. Absent any measurement of expectations, conclusions about the effectiveness of an intervention, whether the intervention is designed to improve education, mental health, well-being, or perceptual and cognitive abilities, are suspect. We should distrust those conclusions just as we discount findings from a drug study in which participants knew they were getting the treatment.

To illustrate how such a lack of verification undermines claims of intervention effectiveness, we examine in detail the claim that action video-game training enhances perceptual and cognitive abilities. We focus on the game-training literature not because it is a particularly egregious example of poor design, but because it is better than most—unlike many other psychology interventions, game-training studies typically include active control conditions that are closely matched to the training condition. Nevertheless, they still do not adequately account for expectation effects.

A Broader Problem

Although our example singled out video-game interventions, the placebo problem is pernicious and pervasive, affecting most cognitive interventions in psychology. For example, one highly cited intervention study (Mahncke et al., 2006) compared training with a commercial brain-fitness program against two control groups: an active control group and a no-contact control group. The training group completed auditory tasks that adapted to participants’ performance, continuously challenging them. The active control group watched educational DVDs and performed only the pretest and posttest tasks (i.e., their learning from the DVDs was not tested). Compared with the two control groups, the brain-fitness group improved more from pretraining to posttraining on a different set of auditory memory tasks.

This finding has been used to promote the commercial brain-fitness program as scientifically validated, but the study lacks an adequate control for placebo effects, meaning that it does not provide compelling evidence for the effectiveness of the intervention. First, the similarity of the training tasks to the outcome measures means that the training group probably had greater reason to expect improvements; participants who watched DVDs have little reason to expect improved auditory memory performance. The authors took the lack of a difference between the DVD group and the no-contact control as evidence “... that there is no meaningful placebo effect.” That inference is premature: the active control group provided no check against a differential placebo effect because it did not equate expectations with those of the intervention group. Remarkably, the authors concluded that the lack of a difference between the DVD and no-contact control groups means that “future studies may not need to include both types of control groups.” Dispensing with active control groups altogether would invalidate any conclusions about training effectiveness. Only with an appropriate active control group, one that equates expectations with those of the training group, can researchers draw a causal conclusion about training effectiveness.

As another example, take the exciting claim that adaptive memory exercises improve IQ in both children and adults. Most studies have included only a no-contact control, which does not eliminate placebo effects (e.g., Jaeggi, Buschkuehl, Jonides, & Perrig, 2008; Rudebeck, Bor, Ormond, O’Reilly, & Lee, 2012). In fact, when other researchers did measure expectations (Redick et al., 2012), those who received memory training believed that they had shown improved intelligence, memory, and ability to complete daily activities after training. As with our video-game results, in the absence of an active control group that equates expected performance improvements on each outcome measure, any observed improvements might reflect perceived or expected benefits rather than actual benefits of training.

Devising an appropriate control condition for a psychological intervention can be challenging. Take the case of the link between playing violent games and aggression (e.g., Anderson et al., 2010; Ferguson & Kilburn, 2009). Participants viewing graphic materials or playing violent games in a lab are likely to expect a link to aggression, or at least more of a link than they would for playing nonviolent puzzle, sports, or racing games.
What active control condition could overcome the surface plausibility of that association, thereby eliminating expectation effects and demand characteristics (Ferguson & Dyck, 2012; see Adachi & Willoughby, 2011, for an alternative explanation for violent-game effects when games are more closely matched)? Whenever the effect of an intervention maps onto participant beliefs about what should result from it, definitive claims about the effect of the intervention itself are inappropriate.

These placebo problems are not limited to cognitive interventions. Take the claim that daily writing improves physical and mental health (see Pennebaker, 1996, for a review). In such studies, participants in the experimental group typically write (repeatedly) about personal thoughts and feelings, experienced trauma, or highly emotional issues. In contrast, those in the control condition typically write about trivial topics (e.g., “Describe the outfit you are wearing today in detail” or “Describe the things you do before class on a typical Monday”; Park & Blumberg, 2002). Matching the activity of the experimental and active control groups is laudable, but the two groups presumably differ in their expectations of therapeutic benefits, meaning that any improvements might result from a differential placebo effect.

The lack of placebo controls in psychotherapy interventions is not new; it has been discussed for decades (e.g., Rosenthal & Frank, 1956). But it persists. Consider the newly emerging field of Internet-based psychotherapy. Many studies use only wait-list controls, some use control conditions in which participants simply read information about their condition, and others use online support groups with no guidance or interaction with an online therapist (e.g., Carlbring et al., 2011; B. Klein, Richards, & Austin, 2006). Without a control for differential expectations, the mechanisms through which these interventions produce their effects (placebo or nonplacebo) are difficult to determine.

A Way Forward

The lack of masked condition assignment in psychological interventions is not a minor inconvenience; it is a fundamental design flaw, and experimenters have an obligation to test for the possible consequences of this design limitation. Although some have claimed that placebo control groups in psychological interventions, such as those examining the effect of game play on cognition, are impossible (Bavelier & Davidson, 2013), that limitation does not excuse researchers from the requirement to account for expectation effects before inferring that an intervention was effective. There are methods to measure and account for the influence of differential expectations and demand characteristics. These include explicitly assessing expectations, carefully choosing outcome measures that are not influenced by differential expectations, and using alternative designs that manipulate and measure expectation effects directly.

Assessing expectations

Our surveys illustrate one approach that can test for the possibility of differential placebo effects in already-published intervention studies. Using Amazon Mechanical Turk, we found that the active control conditions typically used in video-game studies provide an inadequate baseline because participants believe that the action-game treatments will produce bigger improvements in visual processing than will the control games. The same approach could be used for other interventions, both as a check for placebo problems and as a way to choose outcome measures for future interventions. For example, participants undergoing an aerobic exercise intervention show greater cognitive improvements than do those in stretching and toning control groups (e.g., Colcombe & Kramer, 2003; Kramer et al., 1999). By recruiting a separate group of participants, describing each intervention (or having them participate in one or two sessions), and checking their expectations, it would be possible to test whether differential expectations are consistent with the pattern of training benefits. Although this method might not generate expectations as strong as engaging in the entire intervention would, it could serve as one of several checks on expectations, and it could help when selecting the most appropriate active control task. In the case of exercise and cognition, we suspect the pattern of expectations would be comparable for the treatment and control conditions, but without empirical verification, differential placebo effects remain a possibility. Again, the lack of such tests is not a minor omission; such checks are a necessary precondition for causal claims given the lack of a truly double-blind design.

In addition to checking for differential expectations after the fact, researchers could test for them during the study itself (e.g., O. Klein et al., 2012; Orne, 1969). This method has the advantage that expectation and improvement can be measured in the same subjects (the danger, though, is that tests of expectancy may be reactive). As a hypothetical example, consider a driving intervention aimed at reducing reaction time to road hazards in a sample of older drivers. If participants in each condition were asked to report their beliefs in the efficacy of the training, the pattern shown in Figure 2a would be comforting: Participants’ beliefs are not systematically related to the degree of improvement. The pattern in Figure 2b would be cause for concern, though: It is consistent with an effect driven by expectations rather than the treatment.3 See Serfaty, Csipke, Haworth, Murad, and King (2011) for a careful consideration of potential expectation effects in the depression literature and Redick et al. (2012) in the cognitive-training literature.
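To make this within-study check concrete, the sketch below simulates the hypothetical driving example and regresses each participant’s pre–post improvement on his or her self-reported efficacy belief, separately by condition. Everything here is illustrative: the sample sizes, rating scale, and effect sizes are our own assumptions, not data from any actual study. A flat relation in both conditions resembles the reassuring pattern of Figure 2a; a positive slope resembles the worrying pattern of Figure 2b.

```python
# A minimal sketch of the within-study expectancy check, assuming each
# participant rated belief in the training's efficacy (1-7) and has a
# pre/post improvement score. All data are simulated and hypothetical.
import numpy as np
from scipy import stats

def expectancy_check(beliefs, gains, label):
    """Regress improvement on efficacy beliefs within one condition."""
    result = stats.linregress(beliefs, gains)
    print(f"{label}: slope = {result.slope:.2f}, "
          f"r = {result.rvalue:.2f}, p = {result.pvalue:.3f}")
    return result

rng = np.random.default_rng(seed=1)
n = 40  # hypothetical sample size per condition

# Simulated treatment group whose gains track beliefs (Figure 2b pattern).
treat_beliefs = rng.integers(1, 8, size=n)
treat_gains = 5 + 1.5 * treat_beliefs + rng.normal(0, 3, size=n)

# Simulated control group whose gains are unrelated to beliefs (Figure 2a).
ctrl_beliefs = rng.integers(1, 8, size=n)
ctrl_gains = 5 + rng.normal(0, 3, size=n)

t = expectancy_check(treat_beliefs, treat_gains, "treatment")
c = expectancy_check(ctrl_beliefs, ctrl_gains, "control")

# A reliable slope difference between conditions (crude z-test on the two
# slope estimates) would suggest two mechanisms of improvement, one driven
# by expectations and one by the intervention (cf. note 3).
z = (t.slope - c.slope) / np.hypot(t.stderr, c.stderr)
print(f"slope difference: z = {z:.2f}")
```

A null slope in both groups would not by itself establish efficacy, but a Figure 2b pattern in the treatment group alone should temper causal claims.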
Choosing the right tasks

Even better than measuring expectations during a study or after the fact would be to choose an active control task or outcome measure on the basis of an independent assessment of expectations. For example, a game-training study could choose an outcome measure that shows no difference in expectations between the action game and the control game but that the hypothesis predicts should benefit from action-game training. An even stronger manipulation would choose an outcome measure for which participants expect to benefit more from the control game. If training on the action game then produced greater improvements, the effect could not result from differential expectations.

Note that differential expectations do not necessarily account for differential improvements; such expectations might lack causal potency, and differential expectations might not produce differences in actual performance across conditions. However, the presence of differential expectations undermines claims about the power of a treatment. Only by isolating the active ingredient of the experimental treatment can we draw firm causal conclusions about its impact.

One possible way to isolate a treatment effect from differential expectations is to demonstrate, empirically, that expectations cannot influence performance on an outcome measure. A task that is objectively impervious to experimentally increased motivation or expectations should be less subject to placebo effects in a training study. For example, if task performance is unchanged by a large incentive for good performance, then different expectations for improvement on that task might have little effect. Such a null effect of motivation on task performance provides a check on the causal potency of differential expectations (see the sketch at the end of this subsection). Researchers could take this approach one step further by maximizing motivation to perform well on the pretraining tasks. If subjects are highly motivated and incentivized to perform well during the pretest, then any further improvements are less likely to result solely from expectations of improvement. This procedure would provide a better baseline against which to isolate the effect of the treatment. As we note later, however, expectations can have effects that go beyond increasing motivation to perform well.
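The incentive check can be made concrete with a simple analysis, sketched below on simulated data. The task, scores, group sizes, and tolerance margin are all hypothetical assumptions of ours; the one addition to a plain null-hypothesis test is an equivalence test (two one-sided tests, TOST), because failing to detect an incentive effect is weaker evidence than demonstrating that any effect falls within a negligible range.

```python
# A minimal sketch of the incentive check: if a large performance incentive
# leaves scores on a candidate outcome task essentially unchanged, then
# differential expectations are less likely to inflate scores on that task.
# Data, group sizes, and the equivalence margin are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
standard = rng.normal(100, 15, size=50)      # baseline instructions
incentivized = rng.normal(101, 15, size=50)  # large bonus for performance

# Conventional test: is there any detectable incentive effect at all?
t_stat, p_value = stats.ttest_ind(incentivized, standard)

# Equivalence test (TOST): shift the incentivized scores by +/- the margin
# and run one-sided tests; both must be significant to claim the incentive
# effect lies within +/- 5 points.
margin = 5.0
p_lower = stats.ttest_ind(incentivized + margin, standard,
                          alternative="greater").pvalue
p_upper = stats.ttest_ind(incentivized - margin, standard,
                          alternative="less").pvalue
p_tost = max(p_lower, p_upper)

print(f"difference test: t = {t_stat:.2f}, p = {p_value:.3f}")
print(f"equivalence (TOST, +/-{margin} points): p = {p_tost:.3f}")
```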
Alternative designs

When it is ethical, experimenters could manipulate expectations directly to test whether a particular outcome measure is sensitive to expectation effects (O’Leary & Borkovec, 1978). For example, in a neutral-expectancy design, half of each group (experimental and active control) is led to believe that the intervention they are receiving will improve their outcome, whereas the other half is led to hold neutral expectations (see Clifasefi, Takarangi, & Bergman, 2006, for an example from the alcohol-intoxication literature). In a counterdemand design, participants are led to believe that benefits will accrue only after a specified amount of training or experience, and they are tested before and after this period. By directly manipulating expectations, these designs help isolate the effects of expectation from other effects of an intervention.

A dose-response design, in which different groups receive different amounts of treatment, might also be diagnostic; a cognitive-training intervention that produces the same degree of effect on an outcome measure after one training session as after 100 training sessions is suspect. However, dose-response effects could still result from expectations that change with the amount of treatment experienced.

Component-control manipulations, to some extent, also address the effect of expectancy on outcomes. In this method, a multicomponent intervention serves as the experimental treatment, whereas the same treatment minus one component serves as the control. Given the similarity of the treatments, placebo effects are less likely (although researchers still must test for them). Such designs help isolate the possible mechanisms responsible for improvement (for an example of this method in the video-game and cognition literature, see Brown, May, Nyman, & Palmer, 2012). However, if the active control group still contains enough of the active ingredient, then it might show benefits as well. Although component-control designs provide specificity about possible causal mechanisms underlying improvements, they do not necessarily eliminate differential expectations.
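To illustrate the logic of the neutral-expectancy design described above, the sketch below simulates a 2 × 2 factorial crossing condition (treatment vs. active control) with induced expectancy (positive vs. neutral). The simulated effect sizes are hypothetical assumptions of ours. A condition main effect that persists under neutral expectations, without a condition-by-expectancy interaction, would support a nonplacebo treatment effect.

```python
# A minimal sketch of a neutral-expectancy factorial, with simulated data
# and hypothetical effect sizes (a 3-point treatment effect and a 2-point
# expectancy effect on the outcome's gain score).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(seed=3)
n_per_cell = 40  # hypothetical participants per cell
rows = []
for condition in ("treatment", "active_control"):
    for expectancy in ("positive", "neutral"):
        gains = rng.normal(0.0, 5.0, size=n_per_cell)
        if condition == "treatment":
            gains += 3.0  # assumed genuine treatment effect
        if expectancy == "positive":
            gains += 2.0  # assumed expectation effect
        rows.extend({"condition": condition,
                     "expectancy": expectancy,
                     "gain": g} for g in gains)
df = pd.DataFrame(rows)

# Two-way ANOVA: the condition main effect reflects the treatment's active
# ingredient; the interaction tests whether the apparent treatment benefit
# depends on the induced expectations.
model = smf.ols("gain ~ condition * expectancy", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```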

“Just a Placebo Effect?”

We have discussed placebo effects largely in terms of expectations influencing the motivation to perform well on an outcome measure (e.g., someone devoting more effort to a memory measure after completing memory training because he or she now expects to perform better). However, placebo effects can operate in other ways and take many forms (for review, see Benedetti, Mayberg, Wager, Stohler, & Zubieta, 2005; Price, Finniss, & Benedetti, 2008). Much of the work on the power of placebo effects has focused on pain reduction. Placebos can trigger the release of endogenous opioids and can also reduce pain through nonopioid mechanisms (Montgomery & Kirsch, 1996). Placebo treatments are associated with functional brain changes, including decreased activity in pain-related brain areas (Wager et al., 2004). Placebos also can operate via classical conditioning: If the act of taking medication is associated with a physiological response, an inert placebo can trigger a similar conditioned response (Stockhorst, Steingrüber, & Scherbaum, 2000). Finally, expectancies can affect memory for previous experiences (Price et al., 1999), biasing self-report and subjective outcome measures in favor of an intervention.

Placebo effects are real and worthy of explanation in their own right, and we do not mean to dismiss their important (and clinically relevant) effects in medical and psychological interventions. However, whenever researchers want to attribute causal potency to the intervention itself, it is incumbent on them to verify that the improvements are not driven by expectations.

Setting the Bar Too High?

Given the challenges inherent in conducting psychology interventions, studies necessarily lack some of the critical controls of a double-blind clinical trial. Even studies with weak control conditions can provide useful speculative evidence for possible causal relationships, though, particularly early in a field’s development. Although expectations can and should be assessed in all intervention studies, when they are not, researchers should temper causal conclusions appropriately and discuss potential placebo effects explicitly. Is it unfair to demand adequate testing of and control for placebo effects in all psychological interventions? We think not, but others may disagree. Below we address several of the more common reactions to these guidelines that we have encountered in our discussions with colleagues and in the literature.

The requirement to control for placebo problems will make it too difficult to “get an effect”

In other words, imposing a requirement for adequate active control conditions will produce too many false negatives in studies of training benefits (Schubert & Strobach, 2012). Balancing the risk of missing a real effect against the risk of false positives is essential. However, those risks must be considered in light of the consequences of not knowing whether effects are due to the treatment itself or to participants’ expectations. We do not see why controlling for the confound of differential expectations undermines the chances of finding a true benefit if one exists.

The early, exploratory stages of research should tolerate less rigorous adherence to methodological standards

Perhaps the initial study in a field should have license to use less-than-ideal control conditions to identify possible treatments, provided the authors acknowledge those limits. Even then, a study lacking appropriate controls risks wasting effort, money, and time as researchers pursue false leads. Moreover, the methods of an initial, flawed study can become entrenched as standard practice, leading to their perpetuation; new studies justify their lack of control by citing previous studies that did the same. For that reason, we argue that any intervention, even one addressing a new experimental question, should include adequate tests for expectation effects.

Our methods are better than those used in other psychology intervention studies

All intervention studies should use adequate controls for placebo effects, and the fact that other studies neglect such controls does not justify substandard practices. For example, the use of active control conditions in the video-game-training literature is better than the common use of no-contact controls in the working-memory-training literature, but that does not excuse the lack of placebo controls in either. “Everyone else is doing it” does not justify the use of a poor design.

Converging evidence overcomes the weaknesses in any individual study, thereby justifying causal conclusions

Replication and converging evidence are welcome, but convergence means little if the individual studies do not eliminate confounds. In some areas, such as the video-game literature, researchers often appeal to cross-sectional data comparing gamers with nongamers as converging evidence that games cause changes in perception and cognition. Of course, nonexperimental studies suffer from a host of other problems (namely, third-variable and directionality problems), and such designs do not permit any causal conclusions (Boot et al., 2011; Kristjánsson, 2013). Converging evidence bolsters causal claims only to the extent that we have confidence in the methods of the individual studies providing the evidence.

Final Thoughts

Expectation effects and placebo effects are known problems and, in many ways, are interesting in their own right. In some cases, whether improvements result from the treatment or from a placebo effect is irrelevant; if the expectation that a treatment will alleviate anxiety leads to less anxiety, the patient still benefits (although demand characteristics may be more of a concern, in this case leading to “benefits” that appear only in the laboratory). Many treatments in use today might work in part through placebo effects or work better through an interaction between nonplacebo and placebo effects. However, because we are scientists interested in mechanisms of improvement, and our research is funded on the basis of understanding the causal efficacy of treatments, it matters whether improvements are placebo-driven. Only when we know the mechanisms through which improvements occur can we design interventions that tap those mechanisms.

Despite full awareness of the reasons for and benefits of double-blind designs, psychologists persist in drawing inappropriate inferences from designs that lack adequate controls. Without measuring and controlling for placebo effects, such studies provide little more than speculation about the causes of improvements. In the case of cognitive interventions, the field has had enough speculation. Researchers, reviewers, and editors should no longer accept inadequate control conditions, and causal claims should be rejected unless a study demonstrably eliminates differential placebo effects. We are hopeful that, with better designs and better checks on placebo effects, future research will provide more compelling evidence for the effectiveness of interventions. We have outlined a number of methods, designs, and approaches that, when considered together, can lead to a better understanding of how psychological interventions induce improvements.

Acknowledgements

D. J. Simons and W. R. Boot designed the survey and developed the idea for the article. D. J. Simons implemented the surveys using materials prepared by C. Stutts and C. Stothart. C. Stothart conducted the statistical analyses, and W. R. Boot and C. Stutts prepared figures. W. R. Boot and C. Stothart wrote the first draft of the manuscript, and D. J. Simons and W. R. Boot edited and revised it. Reported data are available online at http://www.openscienceframework.org/project/7EB6A/

Declaration of Conflicting Interests

The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.

Notes

1. An exception would be if the experimental drug induced noticeable side effects. If so, participants might become unblinded to their condition, requiring additional checks to ensure that differential expectations were not responsible for observed changes on outcome measures (e.g., see Moscucci, Byrne, Weintraub, & Cox, 1987).

2. Expectations for improvement are not the same as motivation to perform well in general, and controlling for overall motivation is not the same as controlling for differential placebo effects. Differences in overall motivation might lead to improvements on all tasks, but differential expectations might lead to improvements that are more selective. Some papers conflate these two effects, arguing that controlling for overall motivation eliminates placebo effects. For example, Cain, Landau, and Shimamura (2012) showed that action-game players did not outperform non–action-game players on a story-memory task and used that lack of a difference to argue against a placebo effect or demand characteristics as an explanation for group differences on other tasks. Our survey shows why that inference is unmerited: our participants also expected no differential improvement on the story-memory task, but they showed different expectations for other tasks. The concern about inadequate control groups applies both to differences in overall motivation and to differential expectations for individual tasks as a function of the training condition.

3. Note that the pattern in Figure 2b also might be due to awareness of the actual effectiveness of the intervention; the results do not distinguish between these possibilities. A slope difference between the experimental and control conditions would suggest two different mechanisms for improvement, one driven by expectations and one by the intervention.