The idea that self-control is like a muscle—temporarily weakened following exertion, but strengthened with practice over time—is an elegant analogy that has grown increasingly popular over the past 15 years or so. This so-called “strength model of self-control” (for a review, see [1]) posits that engaging in self-control (e.g., overriding prepotent responses, ignoring distracting stimuli, making choices) draws from an internal bank of limited self-control resources. Performing an act of self-control diminishes these resources, and thereby reduces the ability to effectively engage in any subsequent act of self-control, a state termed “ego-depletion.”

Support for this ego-depletion effect stems primarily from experiments using the sequential task paradigm (e.g., [2]). In this paradigm, participants are first randomly assigned either to an ego-depletion condition in which they perform a self-control task (e.g., suppressing facial expressions, performing the incongruent trials of a Stroop task) or to a corresponding control condition in which they perform a task assumed to require less self-regulatory effort (e.g., freely expressing emotions, performing the neutral or congruent trials of the Stroop task). After the initial task, all participants perform a different task (often referred to as the outcome task) assumed to also rely on self-control. Worse performance on the outcome task by depletion participants compared to control participants is interpreted as evidence for the ego-depletion phenomenon and for the strength model of self-control.

Since the initial demonstration of this phenomenon [2], the ego-depletion effect has been reported across a wide array of domains such as decision-making [3], social rejection [4], and executive functioning [5], and thus appears to be a robust and reliable phenomenon. Indeed, a meta-analysis of 83 ego-depletion articles reported that although some published studies failed to replicate the phenomenon, the vast majority of published studies demonstrate evidence consistent with ego-depletion, with a medium-to-large average effect size of 0.62 (Cohen’s d; [6]). Although there have been some recent attempts to develop theoretical alternatives to the strength model (e.g., [7–8]), even these models seem to be based on the assumption that ego-depletion is a robust phenomenon that must be explained by empirically grounded theories of self-control.

Despite such prior evidence, some recently published articles have begun to cast doubt not only on the magnitude and robustness of the ego-depletion effect but even on its very existence, as will be reviewed shortly. In light of recent controversies surrounding the replicability of some well-known social psychological phenomena, such as behavioral priming [9–11], it seems highly important to rigorously examine how reliable and robust the ego-depletion effect actually is. The present study contributes to this effort by reporting one pre-registered study that tested a set of a priori hypotheses about the ego-depletion effect and several variables that may moderate the magnitude of the effect.

Is the Ego-Depletion Effect Robust and Reliable?

There are a number of reasons why the robustness of the ego-depletion effect needs reexamination, despite Hagger et al.’s [6] meta-analytic finding of a medium-to-large average effect size (Cohen’s d = 0.62). Here, we review several of them.

Publication bias. In the Hagger et al. [6] meta-analysis, 198 individual experiments were analyzed, but, of those, 47 did not show statistically significant results. There are also more recent failures to replicate the ego-depletion effect [12–13]. The question is how common such replication failures are and how many additional failures remain in the “file drawer,” given the well-known publication bias (i.e., the reluctance of researchers to submit, and of journals to publish, null findings). The rate of confirmed hypotheses in published psychology studies is estimated at 92% [14], which is much higher than should be expected given typical effect sizes and statistical power. This points to the strong likelihood of selective reporting of confirmatory and significant results [15], leading small or unreliable effects to appear larger and more reliable than they actually are, and possibly even cultivating the illusion that a phenomenon exists when it actually does not [16]. Emerging data-analytic techniques have recently been applied to the ego-depletion literature to assess the credibility of this body of research. One technique, the “incredibility index” (IC index) [17], computes the probability that a set of studies contains fewer non-significant findings than would be expected given the studies’ statistical power. Carter and McCullough [18] applied this technique to the studies included in the Hagger et al. [6] meta-analysis. A post-hoc power analysis estimated average power to be 0.55, which resulted in an IC index greater than 0.999. That is, there is a greater than 99.9% probability that the studies included in the Hagger et al. [6] meta-analysis were pulled from a larger body of research, much of it unpublished, that included more null and negative results than the 47 (out of 198) originally reported.
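The binomial logic behind the IC index can be sketched in a few lines of Python. This is our own illustration under simplifying assumptions (independent studies, a single shared power value), not Schimmack’s [17] exact procedure; the figures are those described above:

```python
from math import comb

def incredibility_index(n_studies, n_nonsig, power):
    """1 minus the binomial probability of observing n_nonsig or fewer
    non-significant results among n_studies studies of the given power.
    Values near 1 mean the observed count is implausibly low."""
    p_miss = 1.0 - power  # chance a single study misses significance
    p_at_most = sum(
        comb(n_studies, k) * p_miss**k * (1.0 - p_miss) ** (n_studies - k)
        for k in range(n_nonsig + 1)
    )
    return 1.0 - p_at_most

# 198 studies, 47 non-significant, estimated average power 0.55
# (the figures discussed above); the index exceeds 0.999.
print(incredibility_index(198, 47, 0.55) > 0.999)
```

With average power of 0.55, roughly 89 of 198 studies should have missed significance; observing only 47 is what drives the index above 0.999.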

Small-study effects. Another, related reason for reexamining the ego-depletion effect is that the majority of studies employed only small samples. Effect sizes from small samples are highly variable (i.e., they have large standard errors), which occasionally leads to inflated effect-size estimates. Small sample sizes are especially problematic in the presence of publication bias. For example, assuming a true effect of zero, small studies would produce both positive and negative effects of highly variable magnitude. However, given the publication bias against reporting non-significant findings and findings that contradict a prevailing theory, only the strong positive effects are made known to the scientific community. In this example, publication bias leads to small-study effects (i.e., smaller studies systematically yield different effect sizes than larger studies). Carter and McCullough [18] estimated the influence of small-study effects in the ego-depletion literature by examining the correlation between the magnitude of effect sizes and their standard errors for the studies included in the Hagger et al. [6] meta-analysis, and found a significant positive correlation. The majority of previous ego-depletion studies used surprisingly small sample sizes, with an average n of 27 per condition (inter-quartile range between ns of 17 and 31), yielding power ranging from 0.31 to 0.69, lower than the recommended 0.80. These small studies tended to report larger ego-depletion effect sizes than those that used more adequate sample sizes. After statistically controlling for small-study effects, Carter and McCullough [18] suggested that the true ego-depletion effect may be smaller than typically reported in the literature (though some disagree with the use of some of these meta-analytic techniques; [19]). Furthermore, although small-study effects can have a variety of causes, Carter and McCullough eliminated some alternative explanations and concluded that the pattern was likely due to publication bias. If the true effect is smaller than researchers expect it to be, it will be difficult to detect unless a larger than typical sample size is used.
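The scenario described above (a true effect of zero, small samples, and publication of only positive significant results) can be made concrete with a short simulation. This is a purely illustrative sketch with hypothetical parameters; only the n of 27 per condition is taken from the figures above:

```python
import random
import statistics

def mean_published_effect(n_studies=2000, n_per_group=27, seed=1):
    """Simulate two-group experiments with a TRUE effect of zero and
    'publish' only those whose observed Cohen's d is positive and
    roughly significant (d above ~1.96 standard errors)."""
    rng = random.Random(seed)
    d_crit = 1.96 * (2.0 / n_per_group) ** 0.5  # approximate cutoff for d
    published = []
    for _ in range(n_studies):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        d = (statistics.mean(a) - statistics.mean(b)) / statistics.stdev(a + b)
        if d > d_crit:  # the rest go into the file drawer
            published.append(d)
    return statistics.mean(published)

# The surviving studies average a sizable positive d even though
# the true effect is exactly zero.
print(round(mean_published_effect(), 2))
```

In this sketch the published studies average a d of roughly 0.6 despite a true effect of zero, illustrating how file-drawering small studies can manufacture an apparently medium-to-large meta-analytic effect.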

Potential p-hacking. Given the known publication bias against null and negative findings, another possible reason for reexamining the ego-depletion effect is that pressure to reach statistical significance (i.e., a p-value less than .05) may cause researchers to engage in questionable research practices [20] that exploit flexibility in data collection and analysis (sometimes called “researcher degrees of freedom”; [21]). This flexibility includes, but is not limited to, conducting analyses throughout data collection in search of statistical significance (e.g., data peeking), controlling for covariates (e.g., gender) without compelling justification, and hypothesizing after the results are known (HARKing; [22]). Such p-hacking practices lead to inflated effect sizes and an increased rate of false positives. These practices appear to be quite widespread in psychology [20, 23], and there is some evidence that they may have occurred in the ego-depletion literature. For example, significant findings were occasionally obtained by using one-tailed analyses (e.g., [24]) or by inappropriate rounding of p-values (e.g., reporting a p-value of .054 as p < .05; [25–27]). Also, despite the lack of theoretical justification for doing so, some studies controlled for covariates (e.g., frustration experienced during the initial task; [28]) only after the standard analysis failed to reach significance, or removed an entire group (e.g., women; [29]) post hoc to show a significant ego-depletion effect.
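The consequences of one such practice, data peeking, are easy to demonstrate by simulation. The following is an illustrative sketch with hypothetical sample sizes, using a z-test approximation rather than any procedure from the studies cited above:

```python
import random
import statistics

def peeking_false_positive_rate(n_sims=2000, peeks=(10, 20, 30, 40, 50), seed=2):
    """Estimate the false-positive rate when a researcher tests after
    each batch of participants and stops at the first p < .05, even
    though the TRUE group difference is zero (known SD = 1, z-test)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a, b = [], []
        for n in peeks:
            while len(a) < n:  # collect up to the next peek point
                a.append(rng.gauss(0.0, 1.0))
                b.append(rng.gauss(0.0, 1.0))
            z = (statistics.mean(a) - statistics.mean(b)) / (2.0 / n) ** 0.5
            if abs(z) > 1.96:  # 'significant' -- stop collecting and report
                hits += 1
                break
    return hits / n_sims

# Nominal alpha is .05, but five peeks push the realized rate well above it.
print(round(peeking_false_positive_rate(), 3))
```

With five looks at the data, the realized false-positive rate in this sketch lands well above the nominal .05, roughly double or more.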

Lack of clear understanding of potential moderator variables affecting the ego-depletion effect. In addition to these methodological issues, another important reason for reexamining the ego-depletion effect is that, despite the popularity of the phenomenon in both the scientific community and the media, very little is understood about robust and systematic moderator variables, such as task characteristics and individual differences, that would shed light on the circumstances under which the effect is attenuated or intensified. In other words, ego-depletion may be easier to detect with particular types of self-control tasks or with particular types of participants. This is the issue we tackle most directly in the present study. Regarding task characteristics, there is some evidence that task difficulty may moderate the ego-depletion effect, although previous studies addressing this possibility have yielded inconsistent results [5, 30]. In addition, the ego-depletion effect may be stronger for longer outcome tasks, as they might require more self-control and thus be more likely to lead to ego-depletion. Similarly, individual differences in motivation and effort might moderate the effect. For example, participants who are especially motivated to perform the depletion task might end up in a greater state of ego-depletion, so considering this moderator might improve the likelihood of detecting the effect. Another potential moderator of the magnitude of the ego-depletion effect is variation in how closely participants follow the instructions during the depletion phase. For example, in the widely used White Bear task (e.g., [25–27]), participants in the depletion condition are instructed to write down their thoughts while not thinking of a white bear. Assuming that participants follow instructions, this task should require them to engage in self-control. However, even though condition differences in effort have been reported (e.g., [25]), given the nature of the task, it is difficult to objectively measure and verify what participants are actually doing during the task (i.e., suppressing specific thoughts in the depletion condition versus not suppressing any thoughts at all in the control condition). Similarly, in the widely used video-viewing attention control task, participants view a 6-min video in which footage of a woman being interviewed appears in one portion of the screen, while words appear one at a time in another portion (e.g., [5, 31–33]). Those in the depletion condition are instructed to ignore the words and instead focus all of their attention on the woman being interviewed (presumably taxing self-control). Participants in the control condition receive no mention of the words whatsoever, even though they are a highly salient feature of the video. Therefore, some control participants may purposely memorize the words if they think they will be tested on them after the video, while others may exert effort to ignore them. To our knowledge, no ego-depletion studies using this task have explicitly measured whether participants in the depletion condition actually followed instructions, or whether participants in the control condition ignored the words, memorized them, or viewed the video passively. Taken together, these variations in strategy during the depletion task could blur the distinction between the control and depletion conditions.