Sometimes, to get statistically significant results, researchers adjust their analyses until the numbers comply. This is called p-hacking. Changing the outcome being measured, for instance, can change how many patients “qualify” under the inclusion and exclusion criteria, quietly altering the actual groups being studied.
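
A small simulation shows how quickly this pays off. The sketch below is my own illustration, not code from any actual study; it assumes Python with numpy and scipy, and every parameter (a thousand studies, ten candidate outcomes, thirty subjects per arm) is invented. It runs null “studies” in which no intervention works at all, but lets the analyst shop among ten outcomes and report the best one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_outcomes, n_per_arm = 1000, 10, 30

false_positives = 0
for _ in range(n_studies):
    # Both arms are drawn from the same distribution: no true effect anywhere.
    treated = rng.normal(0, 1, size=(n_outcomes, n_per_arm))
    control = rng.normal(0, 1, size=(n_outcomes, n_per_arm))
    # "Hack" the analysis: test every candidate outcome, report the best one.
    best_p = min(stats.ttest_ind(t, c).pvalue for t, c in zip(treated, control))
    if best_p < 0.05:
        false_positives += 1

print(f"null studies reporting 'significance': {false_positives / n_studies:.0%}")
# Expect roughly 1 - 0.95**10, about 40%, instead of the nominal 5%.
```

With ten outcomes to choose from, roughly 40 percent of these going-nowhere studies can honestly report a p-value below 0.05 for something.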

3) Be careful when designing studies and picking outcomes. Too often, when trying to show that subjects changed their diet or exercise habits, we simply ask them whether they did. That invites self-report bias. If a study’s intervention is an educational program telling students to walk more and watch less TV, we shouldn’t be surprised when the students report doing exactly that, even when their body fat percentage hasn’t changed at all.

Because interventions tend to be delivered to groups (randomizing whole classes or schools, for instance), it’s important that results be analyzed at the group level. Statistically, there are only as many independent “participants” as there are groups. Too often, researchers run their statistics on the individual students instead, and the “improvements” they find reflect chance differences between the groups, not the interventions, as the simulation below illustrates.
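
Here is a minimal sketch of the problem, again assuming Python with numpy and scipy. The setup is invented for illustration: four classrooms per arm, fifty pupils per classroom, a modest classroom-level effect, and no real intervention effect at all.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials, clusters_per_arm, pupils = 2000, 4, 50

def simulate_arm():
    # Each classroom has its own random baseline; the intervention does nothing.
    classroom_effect = rng.normal(0, 0.5, size=(clusters_per_arm, 1))
    return classroom_effect + rng.normal(0, 1, size=(clusters_per_arm, pupils))

pupil_level_fp = classroom_level_fp = 0
for _ in range(n_trials):
    a, b = simulate_arm(), simulate_arm()
    # Wrong: treat every pupil as an independent participant.
    if stats.ttest_ind(a.ravel(), b.ravel()).pvalue < 0.05:
        pupil_level_fp += 1
    # Right: the classroom is the unit of analysis, so compare classroom means.
    if stats.ttest_ind(a.mean(axis=1), b.mean(axis=1)).pvalue < 0.05:
        classroom_level_fp += 1

print(f"false positives, pupil-level test:     {pupil_level_fp / n_trials:.0%}")
print(f"false positives, classroom-level test: {classroom_level_fp / n_trials:.0%}")
```

Under these assumptions, analyzing pupils as if they were independent makes roughly half of these do-nothing trials come out “significant,” while the classroom-level analysis stays near the nominal 5 percent.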

4) Not significant means not significant. Negative results, those that fail to support the researcher’s hypothesis, should not be spun as positive. Researchers are often tempted to argue that such results are still clinically significant, or that they have “promise.”

Sometimes, researchers want to test one intervention against an already proven one. If they find no significant difference, they conclude that the two are equally effective. This can be a mistake: failing to detect a difference is not the same as demonstrating equivalence, because a small study may simply lack the power to pick up a real gap, as the sketch below shows. Genuine equivalence claims require dedicated, adequately powered non-inferiority designs.
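
A minimal sketch, once more assuming Python with numpy and scipy; the effect size, sample size, and units are all invented. Here the new treatment really is 0.4 standard deviations worse than the proven one, yet small trials usually fail to notice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_trials, n_per_arm = 2000, 20
true_deficit = 0.4  # the new treatment really is 0.4 SD worse (invented value)

no_difference_found = 0
for _ in range(n_trials):
    proven = rng.normal(1.0, 1.0, n_per_arm)
    newer = rng.normal(1.0 - true_deficit, 1.0, n_per_arm)
    if stats.ttest_ind(proven, newer).pvalue >= 0.05:
        no_difference_found += 1

print(f"trials finding 'no significant difference': {no_difference_found / n_trials:.0%}")
# With 20 patients per arm, power to detect a 0.4 SD gap is only about a
# quarter, so most trials "find" no difference. That is low power, not
# equivalence.
```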

5) Don’t assume that an intervention is better than nothing. Most studies conduct a two-sided analysis. This means they test whether an intervention is better or worse than the comparison, and consider the results significant if the p-value is less than 0.05. In some studies, though, researchers assume that interventions can only help people lose weight, never gain it. They therefore conduct a one-sided test, which effectively doubles the allowable p-value: results that would not have been significant become so, as the example below shows.
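
The arithmetic is easy to check. This sketch assumes Python with scipy 1.6 or later, which exposes the `alternative` argument of `ttest_ind`; the weight-change numbers are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical weight-change data in kg; all values are invented.
intervention = rng.normal(-1.0, 3.0, 40)  # modest average weight loss
control = rng.normal(0.0, 3.0, 40)

two_sided = stats.ttest_ind(intervention, control, alternative="two-sided").pvalue
one_sided = stats.ttest_ind(intervention, control, alternative="less").pvalue

print(f"two-sided p = {two_sided:.3f}")
print(f"one-sided p = {one_sided:.3f}")
# When the observed effect points in the hypothesized direction, the one-sided
# p-value is exactly half the two-sided one: a two-sided p of 0.08 turns into
# a "significant" 0.04 just by assuming the intervention cannot cause harm.
```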