Submitted on November 1, 2011

If you follow education research – or quantitative work in any field – you’ll often hear the term “significant effect." For example, you will frequently read research papers saying that a given intervention, such as charter school attendance or participation in a tutoring program, had “significant effects," positive or negative, on achievement outcomes.

This term by itself is usually sufficient to get people who support the policy in question extremely excited, and to compel them to announce boldly that their policy “works." They’re often overinterpreting the results, but the mistake is understandable: “significant effect” is a statistical term, and it doesn’t always mean what it appears to mean. As most people understand the words, “significant effects” are often neither significant nor necessarily effects.

Let’s very quickly clear this up, one word at a time, working backwards.

In education research, the term “effect” usually refers to an estimate from a model, such as a regression. For example, I might want to see how education influences income, but, in order to isolate this relationship, I need to control for other factors that also affect income, such as industry and experience. Put more simply, I want to look at the average relationship between education and income among people who have the same level of experience, work in the same industry and share other characteristics that shape income. That quantified relationship – usually controlling for a host of different variables – is often called an "effect."
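To make "controlling for" a bit more concrete, here is a minimal sketch in Python using simulated data. Everything in it is an assumption made up for illustration: the two industries, the pay levels, the share of degree holders and the "true" $5,000 payoff to a degree. It compares the raw income gap between people with and without a degree to the gap among people in the same industry, which is a simple stand-in for what a regression with control variables does.

```python
import random
from statistics import mean

random.seed(0)

TRUE_EFFECT = 5.0  # hypothetical income gain (in $1000s) from holding a degree


def simulate(n=20000):
    """Build fake (industry, degree, income) records. All numbers are invented."""
    rows = []
    for _ in range(n):
        industry = random.choice(["retail", "finance"])
        # Assumption: the higher-paying industry also employs more degree holders.
        p_degree = 0.7 if industry == "finance" else 0.3
        degree = 1 if random.random() < p_degree else 0
        base = 60.0 if industry == "finance" else 35.0
        income = base + TRUE_EFFECT * degree + random.gauss(0, 8)
        rows.append((industry, degree, income))
    return rows


def raw_gap(rows):
    """Average income of degree holders minus everyone else, ignoring industry."""
    with_degree = [inc for _, d, inc in rows if d == 1]
    without = [inc for _, d, inc in rows if d == 0]
    return mean(with_degree) - mean(without)


def adjusted_gap(rows):
    """Average the within-industry gaps, weighting each industry by its size."""
    total, weight = 0.0, 0
    for industry in sorted({r[0] for r in rows}):
        group = [r for r in rows if r[0] == industry]
        total += raw_gap(group) * len(group)
        weight += len(group)
    return total / weight


rows = simulate()
raw = raw_gap(rows)
adjusted = adjusted_gap(rows)
print(f"raw gap:      {raw:.1f}")
print(f"adjusted gap: {adjusted:.1f} (simulated true effect: {TRUE_EFFECT})")
```

In this made-up world, the raw gap mixes the payoff to a degree with the fact that degree holders cluster in the better-paying industry. Comparing like with like brings the estimate much closer to the value built into the simulation.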

But we can’t randomly assign education to people the way we would a pharmaceutical drug. And there are dozens of interrelated variables that might affect income, many of which, such as ability or effort, can’t even be measured directly.

In good models using large, detailed datasets with a thorough set of control variables, a statistically significant “effect” might serve as pretty good tentative evidence that there is a causal relationship between two variables – e.g., that having more education leads to higher earnings, at least to some degree, all else being equal. Sometimes, it’s even possible for social scientists to randomly assign “treatment” (e.g., merit pay programs), or to exploit random assignment when it occurs on its own (e.g., charter school lotteries). One can be relatively confident that the results from well-executed studies using random assignment reflect a causal relationship and are less likely to be biased by unmeasured influences. Even in these cases, however, there are usually validity-related questions left open, such as whether a program’s effect in one context or location will be the same elsewhere.
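The value of random assignment can be illustrated with another small simulation. This is only a sketch under invented assumptions: a hypothetical program that raises scores by two points, and an unmeasured trait ("effort") that also raises scores. In one scenario the high-effort students opt in; in the other, a coin flip decides who participates.

```python
import random
from statistics import mean

random.seed(2)

TRUE_EFFECT = 2.0  # hypothetical program effect, in test score points


def outcome(effort, treated):
    """Score depends on unmeasured effort, the program, and random noise."""
    return 50 + 10 * effort + TRUE_EFFECT * treated + random.gauss(0, 5)


def estimated_gap(assign, n=20000):
    """Difference in mean scores between participants and non-participants."""
    treated_scores, control_scores = [], []
    for _ in range(n):
        effort = random.gauss(0, 1)  # the researcher never observes this
        treated = assign(effort)
        scores = treated_scores if treated else control_scores
        scores.append(outcome(effort, treated))
    return mean(treated_scores) - mean(control_scores)


# Scenario 1: students with above-average effort choose to participate.
self_selected = estimated_gap(lambda effort: effort > 0)

# Scenario 2: a lottery decides, regardless of effort.
lottery = estimated_gap(lambda effort: random.random() < 0.5)

print(f"self-selected estimate: {self_selected:.1f}")
print(f"lottery estimate:       {lottery:.1f} (simulated true effect: {TRUE_EFFECT})")
```

When students select themselves, the simple comparison attributes the payoff to effort to the program. Under the lottery, effort is balanced across the two groups on average, so the same comparison lands near the effect built into the simulation.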

So, in general, when you hear about “effects," especially those estimated without the benefit of random assignment, it's best to think of them as relationships or associations that are often (but by no means always) causal to some extent. The estimates of their size also vary in precision and in the degree to which they reflect the influence of unmeasured factors.

Then there’s the term “significant." “Significant” is of course a truncated form of “statistically significant." Statistical significance means we can be confident that a given relationship is not zero. That is, the relationship or difference is probably not just random “noise." A significant effect can be either positive (we can be confident it’s greater than zero) or negative (we can be confident it’s less than zero). In other words, it is “significant” insofar as it’s not nothing. The better way to think about it is “discernible." There’s something there.
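For readers who want to see the mechanics, here is a rough sketch of one common way to check whether a difference is "discernible": a large-sample test of the gap between two group averages. The scores are simulated, and the group means, spread and sample sizes are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean, variance

random.seed(1)


def two_sample_z(a, b):
    """Return (difference, standard error, two-sided p-value) for mean(a) - mean(b).

    Uses the normal approximation, which is reasonable for large samples.
    """
    diff = mean(a) - mean(b)
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, p


# Hypothetical test scores: the first group's true average is 5 points higher.
group_a = [random.gauss(255, 30) for _ in range(5000)]
group_b = [random.gauss(250, 30) for _ in range(5000)]

diff, se, p = two_sample_z(group_a, group_b)
low, high = diff - 1.96 * se, diff + 1.96 * se  # approximate 95% interval

print(f"estimated difference: {diff:.2f}")
print(f"95% interval: {low:.2f} to {high:.2f}")
print(f"p-value: {p:.4g}")
print("discernible from zero" if p < 0.05 else "not discernible from zero")
```

The 0.05 cutoff is a convention, not a law of nature. The interval is the more informative part: if it excludes zero, the difference is "significant" in the statistical sense, whatever its size.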

In our education/income example, a “significant positive effect” of education on income means that one can be confident that, on average, more educated people earn more than people with less education, even when we control for experience, industry and, presumably, a bunch of other variables that might be associated with income.

(Side note: One can also test for statistical significance of simpler relationships that are not properly called "effects," such as whether there is a difference between test scores in one year compared with a prior year.)

Most importantly, as I mentioned in a previous post, an “effect” that is statistically significant is not necessarily educationally meaningful. Remember – significant means that the relationship is not zero, but that doesn’t mean it’s big or even moderate. Quite often, “significant” effects are so small as to be rather meaningless, especially when using big datasets. You need to check the size of the "effect," the proper interpretation of which depends on the outcome used, the type and duration of “treatment” in question and other factors.
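The link between dataset size and significance is easy to see with a little arithmetic. The sketch below assumes an observed difference of exactly 0.02 standard deviations between two equal-sized groups (a hypothetical and very small gap) and asks what p-value a large-sample test would produce at different sample sizes. The function name and the numbers are made up for illustration.

```python
from statistics import NormalDist


def p_value_for(effect_sd, n_per_group):
    """Two-sided p-value for an observed gap of `effect_sd` standard deviations
    between two groups of `n_per_group` each (large-sample z-test)."""
    # The standard error of the gap, in standard deviation units, is sqrt(2 / n).
    z = effect_sd / (2 / n_per_group) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


EFFECT_SD = 0.02  # hypothetical: a gap of 2 percent of a standard deviation

for n in (1_000, 10_000, 100_000, 1_000_000):
    p = p_value_for(EFFECT_SD, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>9,}: p = {p:.4g} ({verdict})")
```

The size of the gap never changes here. Only the sample size does, and with enough observations even this tiny difference clears the conventional 0.05 bar.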

For example, today's NAEP results indicated a "significant increase" in fourth and eighth grade math and eighth grade reading, but in all three cases, the increase was as modest as it gets – just one scale score point, roughly a month of "learning." Certainly, this change warrants attention, but it may not square with most people's definition of "significant" (and it may also reflect differences in the students taking the test).

So, summing up, when you hear that something has a “statistically significant effect” on something else, remember that it’s not necessarily significant or an effect in the common use of those words. It’s best to think of such findings as “statistically discernible relationships." They can be big or small, they’re not necessarily causal, and they can vary widely in terms of precision.

- Matt Di Carlo