The field of psychology is in the middle of a painful, deeply humbling period of introspection. Long-held psychological theories are failing replication tests, forcing researchers to question the strength of their methods and the institutions that underlie them.

Take Joseph Hilgard, a psychologist at the University of Pennsylvania.

In a recent blog post titled "I was wrong," he fesses up to adding a shoddy conclusion to the psychological literature (with the help of colleagues) while he was a graduate student at the University of Missouri. "[W]e ran a study, and the study told us nothing was going on," he writes. "We shook the data a bit more until something slightly more newsworthy fell out of it."

This is a bold and honest move, the type that gives me reasons to be optimistic for the future of the science. He's confessing to a practice called p-hacking, or the cherry-picking of data after an experiment is run in order to find a significant, publishable result. While this has been commonplace in psychology, researchers are now reckoning with the fact that p-hacks greatly increase the chances that their journals are filled with false positives. It's p-hacks like the one Hilgard and his colleagues used that gave weight to a theory called ego depletion, the very foundation of which is now being called into question.
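To see why that kind of flexibility fills journals with false positives, here is a rough sketch. It is not drawn from Hilgard's study. It assumes a hypothetical researcher who tries several independent analyses on data that contain no real effect, each at the conventional 0.05 threshold, and reports whichever one comes out significant. The function names and the independence assumption are illustrative only.

```python
import random


def chance_of_false_positive(num_analyses, alpha=0.05):
    """Probability that at least one of several independent tests comes out
    'significant' when no real effect exists (illustrative assumption)."""
    return 1 - (1 - alpha) ** num_analyses


def simulate_p_hacking(num_analyses, alpha=0.05, trials=20000, seed=0):
    """Estimate the same probability by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis, each p-value is uniform on [0, 1].
        p_values = [rng.random() for _ in range(num_analyses)]
        # The hypothetical p-hacker reports only the best-looking analysis.
        if min(p_values) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, round(chance_of_false_positive(k), 3),
              round(simulate_p_hacking(k), 3))
```

Under these assumptions, the chance of a spurious "finding" climbs from 5 percent with a single analysis to roughly 40 percent with ten. Real analyses run on the same data set are correlated, so the exact inflation would differ, but the direction is the same.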

Ego depletion is the theory that when a task requires a lot of mental energy (resisting temptations, regulating emotions), it depletes a store of internal willpower and dulls our mental edge. A forthcoming paper in Perspectives on Psychological Science finds no evidence of ego depletion in a trial of more than 2,000 participants across a couple dozen labs.

The study from Hilgard and his colleagues was a spin on a classic ego-depletion experiment. In their test, participants were assigned to play video games of varying levels of violence and difficulty, and then later took a brain teaser to test how much of their willpower had been sapped. The researchers wanted to find out if it was the game's violent content that led to a decrease in willpower or if it was the game's difficulty.

But there was a problem: The experiment found no effects for game violence or for game difficulty.

"So what did we do?" Hilgard writes. "We needed some kind of effect to publish, so we reported an exploratory analysis, finding a moderated-mediation model that sounded plausible enough."

They found that if they ran the numbers accounting for a player's experience level with video games, they could achieve a significant result.

This newfound correlation was weak, the data was messy, and it barely touched the threshold of significance. But it was publishable. What's more, upon drafting the report, the researchers changed their hypothesis to match this finding.

There are a few big problems with this. The biggest is that their experiment was not designed to study experience level as a main effect. If it had been, they perhaps would have done a better job of recruiting participants with varying ranges of experience. "Only 25 people out of the sample of 238 met the post-hoc definition of 'experienced player,'" Hilgard writes. Such a small subgroup leaves the study with much less statistical power to find a real result.
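A back-of-the-envelope sketch shows how much a lopsided split can cost. It is not the moderated-mediation analysis the researchers ran. It assumes a simple comparison of two group means using a normal approximation, treats the other 213 participants as the comparison group, and picks a hypothetical standardized effect size of 0.3. The function name and those values are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(n_group_a, n_group_b, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means
    under a normal approximation (illustrative only)."""
    normal = NormalDist()
    critical_z = normal.inv_cdf(1 - alpha / 2)
    # The standard error of the standardized difference is dominated
    # by the smaller group.
    standard_error = sqrt(1 / n_group_a + 1 / n_group_b)
    # Ignores the negligible chance of significance in the wrong direction.
    return normal.cdf(effect_size / standard_error - critical_z)


if __name__ == "__main__":
    assumed_effect = 0.3  # hypothetical effect size, not from the study
    print("25 vs. 213:", round(approximate_power(25, 213, assumed_effect), 2))
    print("119 vs. 119:", round(approximate_power(119, 119, assumed_effect), 2))
```

With that assumed effect, the 25-versus-213 split has roughly a 30 percent chance of detecting it, compared with more than 60 percent if the same 238 people were split evenly.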

The other big problem is that it adds a confirmatory finding to the ego depletion literature, when in actuality there's a greater chance that the study is inconclusive. On Twitter, Hilgard points out the paper was recently mis-cited to make it seem even more impactful than it really was.

"Amusingly, the one citation to this paper is a miscite. (We did not measure aggressive responses.)" Joe Hilgard (@JoeHilgard), March 21, 2016

All of this goes to show how individual instances of p-hacking can snowball into a pile of research that collapses when its foundations are tested.

It's for this reason that many research psychologists are calling for the preregistration of study designs. This would lock researchers into their hypotheses before the first line of data comes in. It would hold them more accountable for recruiting the right type of participants and drawing solid conclusions from them. It would also make it more acceptable to publish a negative finding. Because in the case of ego depletion, it's looking like the negative findings are the most solid.

"It's embarrassing that we structured the whole paper to be about the exploratory analysis, rather than the null results," Hilgard writes.

Ultimately, as I've argued, the fact that people like Hilgard are embracing transparency and publicly admitting to their errors will be good for the science.

He continued: "In the end, I'm grateful that the RRR [registered replication reports] has set the record straight on ego depletion. It means our paper probably won't get cited much except as a methodological or rhetorical example, but it also means that our paper isn't going to clutter up the literature and confuse things in the future."

A growing number of scientists like Hilgard don't want the public's trust just because they wear lab coats and can figure out a way to get published in a fancy journal. They want to earn it by being more honest and open. By making their data more accessible, by preregistering experiments, and by frankly admitting to errors, they just might achieve their goal.

Correction: This article originally misidentified the name of the journal that will publish a replication report on ego depletion. The paper will appear in Perspectives on Psychological Science, not Social Psychological and Personality.