Failing to reject the null hypothesis is an odd way to state that the results of your hypothesis test are not statistically significant. Why the peculiar phrasing? “Fail to reject” sounds like one of those double negatives that writing classes taught you to avoid. What does it mean exactly? There’s an excellent reason for the odd wording!

In this post, learn what it means when you fail to reject the null hypothesis and why that’s the correct wording. While accepting the null hypothesis sounds more straightforward, it is not statistically correct!

Before proceeding, let’s recap some necessary information. In all statistical hypothesis tests, you have the following two hypotheses:

- The null hypothesis states that there is no effect or relationship between the variables.
- The alternative hypothesis states the effect or relationship exists.

We assume that the null hypothesis is correct until we have enough evidence to suggest otherwise.

After you perform a hypothesis test, there are only two possible outcomes.

- When your p-value is less than or equal to your significance level, you reject the null hypothesis. The data favors the alternative hypothesis. Congratulations! Your results are statistically significant.
- When your p-value is greater than your significance level, you fail to reject the null hypothesis. Your results are not significant. You’ll learn more about interpreting this outcome later in this post.

**Related post**: Hypothesis Testing Overview

## Why Don’t Statisticians Accept the Null Hypothesis?

To understand why we don’t accept the null, consider the fact that you can’t prove a negative. A lack of evidence only means that you haven’t proven that something exists. It does not prove that something doesn’t exist. It might exist, but your study missed it. That’s a huge difference and it is the reason for the convoluted wording. Let’s look at several analogies.

### Species Presumed to be Extinct

Australian Tree Lobsters were assumed to be extinct. There was no evidence that any were still living because no one had seen them for decades. Yet in 1960, scientists observed them. The same thing happened to the Gracilidris Ant and the Nelson Shrew among many others. Dedicated scientists were looking for these species but hadn’t been in the right time and place to observe them. Lack of proof doesn’t represent proof that something doesn’t exist!

### Criminal Trials

In a trial, we start with the assumption that the defendant is innocent until proven guilty. The prosecutor must work hard to exceed an evidentiary standard to obtain a guilty verdict. If the prosecutor does not meet that burden, it doesn’t prove the defendant is innocent. Instead, there was insufficient evidence to conclude he is guilty.

Perhaps the prosecutor conducted a shoddy investigation and missed clues? Or, the defendant successfully covered his tracks? Consequently, the verdict in these cases is “not guilty.” That judgment doesn’t say the defendant is proven innocent, just that there wasn’t enough evidence to move the jury from the default assumption of innocence.

### Hypothesis Tests

When you’re performing hypothesis tests in statistical studies, you typically want to find an effect or relationship between variables. The default position in a hypothesis test is that the null hypothesis is correct. Like a court case, the sample evidence must exceed the evidentiary standard, which is the significance level, to conclude that an effect exists.

The hypothesis test assesses the evidence in your sample. If your test fails to detect an effect, it’s not proof that the effect doesn’t exist. It just means your sample contained an insufficient amount of evidence to conclude that it exists. Like the species that were presumed extinct, or the prosecutor who missed clues, the effect might exist in the overall population but not in your particular sample. Consequently, the test results fail to reject the null hypothesis, which is analogous to a “not guilty” verdict in a trial. There just wasn’t enough evidence to move the hypothesis test from the default position that the null is true.

The critical point across these analogies is that a lack of evidence does not prove something does not exist—just that you didn’t find it in your specific investigation. Hence, you never accept the null hypothesis.

**Related post**: The Significance Level as an Evidentiary Standard

## What Does Fail to Reject the Null Hypothesis Mean?

Accepting the null hypothesis would indicate that you’ve proven an effect doesn’t exist. As you’ve seen, that’s not the case at all. You can’t prove a negative! Instead, the strength of your evidence falls short of being able to reject the null. Consequently, we fail to reject it.

Failing to reject the null indicates that our sample did not provide sufficient evidence to conclude that the effect exists. However, at the same time, that lack of evidence doesn’t prove that the effect does not exist. Capturing all that information leads to the convoluted wording!

What are the possible implications of failing to reject the null hypothesis? Let’s work through them.

First, it is possible that the effect truly doesn’t exist in the population, which is why your hypothesis test didn’t detect it in the sample. Makes sense, right? While that *is* one possibility, it doesn’t end there.

Another possibility is that the effect exists in the population, but the test didn’t detect it for a variety of reasons. These reasons include the following:

- The sample size was too small to detect the effect.
- The variability in the data was too high. The effect exists, but the noise in your data swamped the signal (effect).
- By chance, you collected a fluky sample. When dealing with random samples, chance always plays a role in the results. The luck of the draw might have caused your sample not to reflect an effect that exists in the population.

Notice how studies that collect a small amount of data or low-quality data are likely to miss an effect that exists? These studies had inadequate statistical power to detect the effect. We certainly don’t want to take results from low-quality studies as proof that something doesn’t exist!

However, failing to detect an effect does not necessarily mean a study is low-quality. Random chance in the sampling process can work against even the best research projects!

Tony says

Hi Jim,

I hope all is well. I am enjoying your blog. I am not a statistician, however, I use statistical formulae to provide insight on the direction in which data is going. I have used both the regression analysis and a T-Test. I know that both use a null hypothesis and an alternative hypothesis. Could you please clarity the difference between a regression analysis and a T-Test? Are there conditions where one is a better option than the other?

Thanks,

Tony

Jim Frost says

Hi Tony,

t-Tests compare the means of one or two groups. Regression analysis typically describes the relationships between a set of independent variables and the dependent variables. Interestingly, you can actually use regression analysis to perform a t-test. However, that would be overkill. If you just want to compare the means of one or two groups, use a t-test. Read my post about performing t-tests in Excel to see what they can do. If you have a more complex model than just comparing one or two means, regression might be the way to go. Read my post about when to use regression analysis.

Vijay says

Hi Jim,

This article is really enlightening but there is still some darkness looming around.

I see that low p-values mean strong evidence against null hypothesis and finding such a sample is highly unlikely when null hypothesis is true.

So , is it OK to say that when p-value is 0.01 , it was very unlikely to have found such a sample but we still found it and hence finding such a sample has not occurred just by chance which leads towards rejection of null hypothesis.

Jim Frost says

Hi Vijay,

That’s mostly correct. I wouldn’t say, “has not occurred by chance.” So, when you get a very low p-value it does mean that you are unlikely to obtain that sample if the null is true. However, once you obtain that result, you don’t know for sure which of the two occurred:

You really don’t know for sure. However, by the decision making results you set about the strength of evidence required to reject the null, you conclude that the effect exists. Just always be aware that it could be a false positive.

That’s all a long way of saying that your sample was

unlikelyto occur by chance if the null is true.Amna says

Why do we consult the statistical tables to find out the critical values of our test statistics?

Jim Frost says

Hi Amna,

Statistical tables started back in the “olden days” when computers didn’t exist. You’d calculate the test statistic value for your sample. Then, you’d look in the appropriate table and using the degrees of freedom for your design and find the critical values for the test statistic. If the value of your test statistics exceeded the critical value, your results were statistically significant.

With powerful and readily available computers, researchers could analyze their data and calculate the p-values and compare them directly to the significance level.

I hope that answers your question!

Shazzad Hossain says

If we are not able to reject the null hypothesis. What could

be the solution?

Jim Frost says

Hi Shazzad,

The first thing to recognize is that failing to reject the null hypothesis might not be an error. If the null hypothesis is false, then the correct outcome is failing to reject the null.

However, if the null hypothesis is false and you fail to reject, it is a type II error, or a false negative. Read my post about types of errors in hypothesis tests for more information.

This type of error can occur for a variety of reasons, including the following:

There are various other possibilities, but those are several common problems.

I hope this helps!

Jaid says

Hi Jim,

Thank you so much for this article! I am taking my first Statistics class in college and I have one question about this.

I understand that the default position is that the null is correct, and you explained that (just like a court case), the sample evidence must EXCEED the “evidentiary standard” (which is the significance level) to conclude that an effect/relationship exists. And, if an effect/relationship exists, that means that it’s the alternative hypothesis that “wins” (not sure if that’s the correct way of wording it, but I’m trying to make this as simple as possible in my head!).

But what I don’t understand is that if the P-value is GREATER than the significance value, we fail to reject the null….because shouldn’t a higher P-value, mean that our sample evidence EXCEEDS the evidentiary standard (aka the significance level), and therefore an effect/relationship exists? In my mind it would make more sense to reject the null, because our P-value is higher and therefore we have enough evidence to reject the null.

I hope I worded this in a way that makes sense. Thank you in advance!

Jaid

Jim Frost says

Hi Jaid,

That’s a great question. The key thing to remember is that higher p-values correspond to weaker evidence against the null hypothesis. A high p-value indicates that your sample is likely (high probability = high p-value) if the null hypothesis is true. Conversely, low p-values represent stronger evidence against the null. You were unlikely (low probability = low p-value) to have collect a sample with the measured characteristics if the null is true.

So, there is negative correlation between p-values and strength of evidence against the null hypothesis. Low p-values indicate stronger evidence. Higher p-value represent weaker evidence.

In a nutshell, you reject the null hypothesis with a low p-value because it indicates your sample data are unusual if the null is true. When it’s unusual enough, you reject the null.

I hope that answers your question!

Michael says

There is something I am confused about. If our significance level is .05 and our resulting p-value is .02 (thus the strength of our evidence is strong enough to reject the null hypothesis), do we state that we reject the null hypothesis with 95% confidence or 98% confidence?

My guess is our confidence level is 95% since or alpha was .05. But if the strength of our evidence is 98%, why wouldn’t we use that as our stated confidence in our results?

Jim Frost says

Hi Michael,

You’d state that you can reject the null at a significance level of 5% or conversely at the 95% confidence level. A key reason is to avoid cherry picking your results. In other words, you don’t want to choose the significance level based on your results.

Consequently, set the significance level/confidence level before performing your analysis. Then, use those preset levels to determine statistical significance. I always recommend including the exact p-value when you report on statistical significance. Exact p-values do provide information about the strength of evidence against the null.

ADEKANMBI Dende Ibrahim says

Thank you for sharing this knowledge , it is very appropriate in explaining some observations in the study of forest biodiversity.

Moe Moe Htay says

Thank you so much. This provides for my research

Vishnu Vinjamuri says

If one couples this with what they call estimated monetary value of risk in risk management, one can take better decisions.

AL says

Thank you for providing this clear insight.

Vishnu Vinjamuri says

Nice article Jim. The risk of such failure obviously reduces when a lower significance level is specified.One benefits most by reading this article in conjunction with your other article “Understanding Significance Levels in Statistics”.

Ratnadeep Sharma says

That’s fine. My question is why doesn’t the numerical value of type 1 error coincide with the significance level in the backdrop that the type 1 error and the significance level are both the same ? I hope you got my question.

Jim Frost says

Hi, they are equal. As I indicated, the significance level equals the type I error rate.

Ratnadeep Sharma says

Kindly elighten me on one confusion. We set out our significance level before setting our hypothesis. When we calculate the type 1 error, which happens to be a significance level, the numerical value doesn’t equals (either undermining value comes out or an exceeding value comescout ) our significance level that was preassigned. Why is this so ?

Jim Frost says

Hi Ratnadeep,

You’re correct. The significance level (alpha) is the same as the type I error rate. However, you compare the p-value to the significance level. It’s the p-value that can be greater than or less than the significance level.

The significance level is the evidentiary standard. How strong does the evidence in your sample need to be before you can reject the null? The p-value indicates the strength of the evidence that is present in your sample. By comparing the p-value to the significance level, you’re comparing the actual strength of the sample evidence to the evidentiary standard to determine whether your sample evidence is strong enough to conclude that the effect exists in the population.

I write about this in my post about the understanding significance levels. I think that will help answer your questions!