You’ve just performed a hypothesis test and your results are statistically significant. Hurray! These results are important, right? Not so fast. Statistical significance does not necessarily mean that the results are practically significant in a real-world sense of importance.
In this blog post, I’ll talk about the differences between practical significance and statistical significance, and how to determine if your results are meaningful in the real world.
The hypothesis testing procedure determines whether the sample results that you obtain are likely if you assume the null hypothesis is correct for the population. If the results are sufficiently improbable under that assumption, then you can reject the null hypothesis and conclude that an effect exists. In other words, the evidence in your sample is strong enough to pass the threshold you defined with the significance level (alpha). Your results are statistically significant.
You use p-values to determine statistical significance in hypothesis tests such as t-tests, ANOVA, and tests of regression coefficients, among many others. Consequently, it might seem logical that p-values and statistical significance relate to importance. However, that assumption is false because conditions other than large effect sizes can produce tiny p-values.
Hypothesis tests with small effect sizes can produce very low p-values when you have a large sample size and/or the data have low variability. Consequently, effect sizes that are trivial in the practical sense can be highly statistically significant.
Here’s how small effect sizes can still produce tiny p-values:
You have a very large sample size. As the sample size increases, the hypothesis test gains greater statistical power to detect small effects. With a large enough sample size, the hypothesis test can detect an effect that is so minuscule that it is meaningless in a practical sense.
The sample variability is very low. When your sample data have low variability, hypothesis tests can produce more precise estimates of the population’s effect. This precision allows the test to detect tiny effects.
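To make those two conditions concrete, here is a minimal sketch in Python. It assumes a two-sample comparison with equal group sizes and a shared standard deviation, and it uses a normal approximation rather than a full t-test. The function name two_sample_p_value and all of the numbers (a 0.5-point effect, the sample sizes, the standard deviations) are hypothetical choices for illustration, not values from a real study.

```python
import math


def two_sample_p_value(effect, sd, n_per_group):
    """Two-sided p-value for a difference in means (normal approximation).

    Assumes two equal-sized groups that share the same standard deviation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = effect / standard_error
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


effect = 0.5  # hypothetical, deliberately tiny effect

# Same effect and variability, growing sample size.
for n in (50, 500, 5_000, 50_000):
    p = two_sample_p_value(effect, sd=15.0, n_per_group=n)
    print(f"sd=15, n={n:>6}: p = {p:.4g}")

# Same effect and sample size, shrinking variability.
for sd in (15.0, 5.0, 1.0):
    p = two_sample_p_value(effect, sd=sd, n_per_group=50)
    print(f"n=50, sd={sd:>4}: p = {p:.4g}")
```

The effect never changes in either loop. Only the sample size or the variability does, yet the p-value drops from clearly non-significant to well below the usual 0.05 level.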
Statistical significance indicates only that you have sufficient evidence to conclude that an effect exists. It is a mathematical definition that does not know anything about the subject area and what constitutes an important effect.
While statistical significance relates to whether an effect exists, practical significance refers to the magnitude of the effect. However, no statistical test can tell you whether the effect is large enough to be important in your field of study. Instead, you need to apply your subject area knowledge and expertise to determine whether the effect is big enough to be meaningful in the real world. In other words, is it large enough to care about?
How do you do this? I find that it is helpful to identify the smallest effect size that still has some practical significance. Again, this process requires that you use your knowledge of the subject to make this determination. If your study’s effect size is at least as large as this smallest meaningful effect, your results are practically significant.
For example, suppose we are evaluating a training program by comparing the test scores of program participants to those of people who study on their own. Further, we decide that the difference between these two groups must be at least 5 points to represent a practically meaningful effect size. Any difference smaller than 5 points is too small to care about.
After performing the study, the analysis finds a statistically significant difference between the two groups. Participants in the training program score an average of 3 points higher on a 100-point test. While these results are statistically significant, the 3-point difference is less than our 5-point threshold. Consequently, our study provides evidence that this effect exists, but it is too small to be meaningful in the real world. The time and money that participants spend on the training program are not worth an average improvement of only 3 points.
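The two questions in this example can be kept separate in code, too. The sketch below is only an illustration: assess_result is a hypothetical helper, and the p-value of 0.01 is an assumed number because the example does not report one. The 3-point effect and the 5-point threshold come from the training program example.

```python
def assess_result(effect, p_value, smallest_meaningful_effect, alpha=0.05):
    """Answer the statistical and the practical question separately."""
    return {
        # Statistical significance: is the evidence strong enough?
        "statistically_significant": p_value <= alpha,
        # Practical significance: is the effect big enough to care about?
        # The threshold comes from subject-area knowledge, not from a test.
        "practically_significant": abs(effect) >= smallest_meaningful_effect,
    }


# 3-point effect, assumed p-value of 0.01, 5-point threshold.
verdict = assess_result(effect=3.0, p_value=0.01, smallest_meaningful_effect=5.0)
print(verdict)
# {'statistically_significant': True, 'practically_significant': False}
```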
Not all statistically significant differences are interesting!
Related post: Effect Sizes in Statistics
Use Confidence Intervals to Determine Practical Significance
That sounds pretty straightforward. Unfortunately, there is one small complication. The effect size in your study is only an estimate because it is based on a sample. Thanks to sampling error, there is a margin of error around the estimated effect.
We need a method to determine whether the estimated effect is still practically significant when we factor in this margin of error. Enter confidence intervals!
A confidence interval is a range of values that likely contains the population value. I’ve written about confidence intervals extensively elsewhere, so I’ll keep it short here. The crucial idea is that confidence intervals incorporate the margin of error by creating a range around the estimated effect. The population value is likely to fall within that range. Your task is to determine whether all, some, or none of that range represents practically significant effects.
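As a rough sketch of how that range is built, the Python snippet below computes a confidence interval from an estimated effect and its standard error. It assumes a normal approximation, which is reasonable for large samples (a t-distribution would be the usual choice for small ones). The function name confidence_interval and the standard error of 1.0 are hypothetical. The 3-point estimate is the training program effect from earlier.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, confidence=0.95):
    """Return (low, high) for the estimate using a normal approximation."""
    # Critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin_of_error = z * standard_error
    return estimate - margin_of_error, estimate + margin_of_error


# Estimated effect of 3 points with an assumed standard error of 1 point.
low, high = confidence_interval(estimate=3.0, standard_error=1.0)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

With these assumed inputs, the interval runs from roughly 1.04 to 4.96. The entire range sits above zero but below the 5-point threshold, so none of it represents a practically significant effect.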
Related post: How Confidence Intervals Work
Example of Using Confidence Intervals for Practical Significance
Suppose we conduct two studies on the training program described above. Both studies produce statistically significant results and an estimated effect of 9 points. These effects look good because they’re both greater than our smallest meaningful effect size of 5. However, these estimates don’t incorporate the margin of error. The confidence intervals (CIs) for both studies below provide that crucial information.
Study A’s CI extends from values that are too small to be meaningful (<5) to those that are large enough to be meaningful. Even though the study is statistically significant and the estimated effect is 9, the CI creates doubt about whether the actual population effect is large enough to be meaningful. The CI tells us that if we implement the program on a larger scale, we might produce only an average 3-point increase! We can’t be sure about practical significance after we include the margin of error around the estimate.
On the other hand, the CI for Study B contains only meaningful effect sizes. We can be more confident that the population effect size is large enough for us to care about!
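Here is a minimal sketch of that comparison in Python. The interval endpoints are assumptions for illustration: the text only says that Study A’s CI reaches down to about 3 points and that Study B’s CI stays above the 5-point threshold, so I picked (3, 15) and (7, 11) around the estimate of 9. The helper classify_interval is hypothetical, and it assumes that larger positive effects are the beneficial direction.

```python
def classify_interval(ci_low, ci_high, smallest_meaningful_effect):
    """Report how much of a CI covers practically meaningful effects.

    Assumes positive effects are the beneficial direction.
    """
    if ci_low >= smallest_meaningful_effect:
        meaningful_part = "all"
    elif ci_high < smallest_meaningful_effect:
        meaningful_part = "none"
    else:
        meaningful_part = "some"
    return {
        # An interval that excludes zero is statistically significant.
        "excludes_zero": ci_low > 0 or ci_high < 0,
        "meaningful_part_of_range": meaningful_part,
    }


# Hypothetical endpoints; both studies estimate an effect of 9 points.
studies = {"Study A": (3.0, 15.0), "Study B": (7.0, 11.0)}

for name, (ci_low, ci_high) in studies.items():
    print(name, classify_interval(ci_low, ci_high, smallest_meaningful_effect=5.0))
```

Under these assumed endpoints, both intervals exclude zero, but only Study B’s range consists entirely of meaningful effects. Study A lands in the "some" category, which is exactly the doubtful case.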
I really like confidence intervals because you can use them to determine both statistical significance (if they exclude zero) and practical significance. Confidence intervals focus on the size of the effect and the uncertainty around the estimate rather than just whether the effect exists.
In closing, statistical significance indicates that your sample provides sufficient evidence to conclude that the effect exists in the population. Practical significance asks whether that effect is large enough to care about. Use statistical analyses to determine statistical significance and subject-area expertise to assess practical significance.