Failing to reject the null hypothesis is an odd way to state that the results of your hypothesis test are not statistically significant. Why the peculiar phrasing? “Fail to reject” sounds like one of those double negatives that writing classes taught you to avoid. What does it mean exactly? There’s an excellent reason for the odd wording!
In this post, learn what it means when you fail to reject the null hypothesis and why that’s the correct wording. While accepting the null hypothesis sounds more straightforward, it is not statistically correct!
Before proceeding, let’s recap some necessary information. In all statistical hypothesis tests, you have the following two hypotheses:
- The null hypothesis states that there is no effect or relationship between the variables.
- The alternative hypothesis states that the effect or relationship exists.
We assume that the null hypothesis is correct until we have enough evidence to suggest otherwise.
After you perform a hypothesis test, there are only two possible outcomes.
- When your p-value is less than or equal to your significance level, you reject the null hypothesis. The data favors the alternative hypothesis. Congratulations! Your results are statistically significant.
- When your p-value is greater than your significance level, you fail to reject the null hypothesis. Your results are not significant. You’ll learn more about interpreting this outcome later in this post.
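The two-outcome rule above can be sketched as a tiny Python function. The function name `hypothesis_test_decision` and the default significance level of 0.05 are illustrative assumptions, not part of any particular statistics package. Notice that no input ever produces "accept the null hypothesis."

```python
def hypothesis_test_decision(p_value: float, alpha: float = 0.05) -> str:
    """Apply the reject / fail-to-reject decision rule.

    p_value: the p-value reported by your hypothesis test.
    alpha:   the significance level chosen before running the test
             (0.05 is an assumed, commonly used default).
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if p_value <= alpha:
        return "reject the null hypothesis"
    # There is deliberately no "accept the null hypothesis" branch.
    return "fail to reject the null hypothesis"


print(hypothesis_test_decision(0.03))  # → reject the null hypothesis
print(hypothesis_test_decision(0.05))  # → reject the null hypothesis (p equals alpha)
print(hypothesis_test_decision(0.20))  # → fail to reject the null hypothesis
```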
Related post: Hypothesis Testing Overview
Why Don’t Statisticians Accept the Null Hypothesis?
To understand why we don’t accept the null, consider that you can’t prove a negative. A lack of evidence only means that you haven’t proven that something exists. It does not prove that something doesn’t exist. It might exist, but your study missed it. That’s a huge difference, and it’s the reason for the convoluted wording. Let’s look at several analogies.
Species Presumed to be Extinct
Australian Tree Lobsters were assumed to be extinct. There was no evidence that any were still living because no one had seen a live one for decades. Yet in 2001, scientists found a small surviving population on Ball’s Pyramid, a rocky islet near Lord Howe Island. The same thing happened to the Gracilidris Ant and the Nelson Shrew, among many others. Dedicated scientists were looking for these species but hadn’t been in the right place at the right time to observe them. A lack of evidence isn’t proof that something doesn’t exist!
Presumption of Innocence in a Court Trial
In a trial, we start with the assumption that the defendant is innocent until proven guilty. The prosecutor must work hard to exceed an evidentiary standard to obtain a guilty verdict. If the prosecutor does not meet that burden, it doesn’t prove the defendant is innocent. Instead, there was insufficient evidence to conclude that he is guilty.
Perhaps the prosecutor conducted a shoddy investigation and missed clues, or perhaps the defendant successfully covered his tracks. Consequently, the verdict in these cases is “not guilty.” That judgment doesn’t say the defendant is proven innocent, just that there wasn’t enough evidence to move the jury from the default assumption of innocence.
When you’re performing hypothesis tests in statistical studies, you typically want to find an effect or relationship between variables. The default position in a hypothesis test is that the null hypothesis is correct. As in a court case, the sample evidence must meet an evidentiary standard, which is the significance level, before you can conclude that an effect exists.
The hypothesis test assesses the evidence in your sample. If your test fails to detect an effect, it’s not proof that the effect doesn’t exist. It just means your sample didn’t contain enough evidence to conclude that it exists. Like the species that were presumed extinct, or the prosecutor who missed clues, the effect might exist in the overall population even though your particular sample didn’t reveal it. Consequently, the test results fail to reject the null hypothesis, which is analogous to a “not guilty” verdict in a trial. There just wasn’t enough evidence to move the hypothesis test from the default position that the null is true.
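A quick simulation makes this concrete. The sketch below is a minimal illustration under stated assumptions: normally distributed data with a known standard deviation (so a simple one-sample z-test applies), samples of 20 observations, a hypothetical true effect of 0.3, and a fixed random seed. The helper names `z_test_p_value` and `fail_to_reject_rate` are made up for this example. It runs thousands of "studies" in two worlds, one where the null is true and one where a real effect exists, and counts how often each study fails to reject the null.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided one-sample z-test p-value, assuming the population SD is known."""
    n = len(sample)
    z = (fmean(sample) - null_mean) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def fail_to_reject_rate(true_mean, n=20, sigma=1.0, alpha=0.05, studies=2000, seed=1):
    """Fraction of simulated studies that fail to reject H0: population mean == 0."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(studies):
        # Each "study" draws a fresh random sample from the same population.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_p_value(sample, sigma=sigma) > alpha:
            misses += 1
    return misses / studies


# World 1: the null hypothesis is true (no effect in the population).
print(f"No effect:   {fail_to_reject_rate(true_mean=0.0):.1%} of studies fail to reject")
# World 2: a real effect of 0.3 exists in the population.
print(f"Real effect: {fail_to_reject_rate(true_mean=0.3):.1%} of studies fail to reject")
```

Under these assumptions, roughly 95% of studies fail to reject when there is no effect, but roughly three-quarters also fail to reject when the effect is real. A single "fail to reject" result therefore can't tell you which world you're in.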
The critical point across these analogies is that a lack of evidence does not prove something does not exist—just that you didn’t find it in your specific investigation. Hence, you never accept the null hypothesis.
Related post: The Significance Level as an Evidentiary Standard
What Does Fail to Reject the Null Hypothesis Mean?
Accepting the null hypothesis would indicate that you’ve proven an effect doesn’t exist. As you’ve seen, that’s not the case at all. You can’t prove a negative! Instead, the strength of your evidence falls short of being able to reject the null. Consequently, we fail to reject it.
Failing to reject the null indicates that our sample did not provide sufficient evidence to conclude that the effect exists. However, at the same time, that lack of evidence doesn’t prove that the effect does not exist. Capturing all that information leads to the convoluted wording!
What are the possible implications of failing to reject the null hypothesis? Let’s work through them.
First, it is possible that the effect truly doesn’t exist in the population, which is why your hypothesis test didn’t detect it in the sample. Makes sense, right? While that is one possibility, it doesn’t end there.
Another possibility is that the effect exists in the population, but the test didn’t detect it for a variety of reasons. These reasons include the following:
- The sample size was too small to detect the effect.
- The variability in the data was too high. The effect exists, but the noise in your data swamped the signal (effect).
- By chance, you collected a fluky sample. When dealing with random samples, chance always plays a role in the results. The luck of the draw might have caused your sample not to reflect an effect that exists in the population.
Notice how studies that collect a small amount of data or low-quality data are likely to miss an effect that exists? These studies have inadequate statistical power, which is the probability that a test detects an effect that truly exists. We certainly don’t want to take results from low-quality studies as proof that something doesn’t exist!
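To see how sample size and variability drive power, here is a minimal sketch that computes power analytically for a two-sided one-sample z-test. It assumes normally distributed data with a known standard deviation `sigma`; the function name `z_test_power` and the effect size of 0.3 are hypothetical choices for illustration, not values from a real study.

```python
import math
from statistics import NormalDist


def z_test_power(effect, n, sigma=1.0, alpha=0.05):
    """Power of a two-sided one-sample z-test to detect a true mean shift `effect`.

    Assumes normally distributed data with a known population SD `sigma`.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Expected z statistic when the effect is real: the signal relative to its noise.
    shift = effect / (sigma / math.sqrt(n))
    # Probability that the observed z statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


effect = 0.3  # hypothetical true effect, in the data's units
for sigma in (1.0, 2.0):      # low vs. high variability
    for n in (10, 30, 100):   # small to larger samples
        power = z_test_power(effect, n, sigma)
        print(f"sigma={sigma:.0f}, n={n:>3}: power={power:.2f}, miss rate={1 - power:.2f}")
```

Power rises with sample size and falls as variability grows. Even in the best case shown (n=100, sigma=1), power is only about 0.85, so roughly 15% of well-run studies would still miss this real effect purely by chance.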
However, failing to detect an effect does not necessarily mean a study is low-quality. Random chance in the sampling process can work against even the best research projects!