Failing to reject the null hypothesis is an odd way to state that the results of your hypothesis test are not statistically significant. Why the peculiar phrasing? “Fail to reject” sounds like one of those double negatives that writing classes taught you to avoid. What does it mean exactly? There’s an excellent reason for the odd wording!
In this post, learn what it means when you fail to reject the null hypothesis and why that’s the correct wording. While accepting the null hypothesis sounds more straightforward, it is not statistically correct!
Before proceeding, let’s recap some necessary information. In all statistical hypothesis tests, you have the following two hypotheses:
- The null hypothesis states that there is no effect or relationship between the variables.
- The alternative hypothesis states the effect or relationship exists.
We assume that the null hypothesis is correct until we have enough evidence to suggest otherwise.
After you perform a hypothesis test, there are only two possible outcomes.
- When your p-value is less than or equal to your significance level, you reject the null hypothesis. The data favors the alternative hypothesis. Congratulations! Your results are statistically significant.
- When your p-value is greater than your significance level, you fail to reject the null hypothesis. Your results are not significant. You’ll learn more about interpreting this outcome later in this post.
Related posts: Hypothesis Testing Overview and The Null Hypothesis
Why Don’t Statisticians Accept the Null Hypothesis?
To understand why we don’t accept the null, consider the fact that you can’t prove a negative. A lack of evidence only means that you haven’t proven that something exists. It does not prove that something doesn’t exist. It might exist, but your study missed it. That’s a huge difference and it is the reason for the convoluted wording. Let’s look at several analogies.
Species Presumed to be Extinct
Australian Tree Lobsters were assumed to be extinct. There was no evidence that any were still living because no one had seen them for decades. Yet in 1960, scientists observed them. The same thing happened to the Gracilidris Ant and the Nelson Shrew among many others. Dedicated scientists were looking for these species but hadn’t been in the right time and place to observe them. The same idea applies to ghosts!
Lack of proof doesn’t represent proof that something doesn’t exist!
Criminal Trials
In a trial, we start with the assumption that the defendant is innocent until proven guilty. The prosecutor must work hard to exceed an evidentiary standard to obtain a guilty verdict. If the prosecutor does not meet that burden, it doesn’t prove the defendant is innocent. Instead, there was insufficient evidence to conclude he is guilty.
Perhaps the prosecutor conducted a shoddy investigation and missed clues? Or, the defendant successfully covered his tracks? Consequently, the verdict in these cases is “not guilty.” That judgment doesn’t say the defendant is proven innocent, just that there wasn’t enough evidence to move the jury from the default assumption of innocence.
Hypothesis Tests
When you’re performing hypothesis tests in statistical studies, you typically want to find an effect or relationship between variables. The default position in a hypothesis test is that the null hypothesis is correct. Like a court case, the sample evidence must exceed the evidentiary standard, which is the significance level, to conclude that an effect exists.
The hypothesis test assesses the evidence in your sample. If your test fails to detect an effect, it’s not proof that the effect doesn’t exist. It just means your sample contained an insufficient amount of evidence to conclude that it exists. Like the species that were presumed extinct, or the prosecutor who missed clues, the effect might exist in the overall population but not in your particular sample. Consequently, the test results fail to reject the null hypothesis, which is analogous to a “not guilty” verdict in a trial. There just wasn’t enough evidence to move the hypothesis test from the default position that the null is true.
The critical point across these analogies is that a lack of evidence does not prove something does not exist—just that you didn’t find it in your specific investigation. Hence, you never accept the null hypothesis.
Related post: The Significance Level as an Evidentiary Standard
What Does Fail to Reject the Null Hypothesis Mean?
Accepting the null hypothesis would indicate that you’ve proven an effect doesn’t exist. As you’ve seen, that’s not the case at all. You can’t prove a negative! Instead, the strength of your evidence falls short of being able to reject the null. Consequently, we fail to reject it.
Failing to reject the null indicates that our sample did not provide sufficient evidence to conclude that the effect exists. However, at the same time, that lack of evidence doesn’t prove that the effect does not exist. Capturing all that information leads to the convoluted wording!
What are the possible implications of failing to reject the null hypothesis? Let’s work through them.
First, it is possible that the effect truly doesn’t exist in the population, which is why your hypothesis test didn’t detect it in the sample. Makes sense, right? While that is one possibility, it doesn’t end there.
Another possibility is that the effect exists in the population, but the test didn’t detect it for a variety of reasons. These reasons include the following:
- The sample size was too small to detect the effect.
- The variability in the data was too high. The effect exists, but the noise in your data swamped the signal (effect).
- By chance, you collected a fluky sample. When dealing with random samples, chance always plays a role in the results. The luck of the draw might have caused your sample not to reflect an effect that exists in the population.
Notice how studies that collect a small amount of data or low-quality data are likely to miss an effect that exists? These studies had inadequate statistical power to detect the effect. We certainly don’t want to take results from low-quality studies as proof that something doesn’t exist!
However, failing to detect an effect does not necessarily mean a study is low-quality. Random chance in the sampling process can work against even the best research projects!
If you’re learning about hypothesis testing and like the approach I use in my blog, check out my eBook!
Dr. Dilip Raj , Senior Professor, Dept of Preventive Medicine, SMS Medical College, Jaipur, INDIA. says
Thank you very much for explaining the topic. It brings clarity and makes statistics very simple and interesting. Its helping me in the field of Medical Research.
Roya Mojtahedi says
Hi Jim,
My question is that can I reverse Null hyposthesis and start with Null: µ1 ≠ µ2 ?
Then, if I can reject Null, I will end up with µ1=µ2 for mean comparison and this what I am looking for. But isn’t this cheating?
Jim Frost says
Hi Roya,
That can be done but it requires you to revamp the entire test. Keep in mind that the reason you normally start out with the null equating to no relationship is because the researchers typically want to prove that a relationship or effect exists. This format forces the researchers to collect a substantial amount of high quality data to have a chance at demonstrating that an effect exists. If they collect a small sample and/or poor quality (e.g., noisy or imprecise), then the results default back to the null stating that no effect exists. So, they have to collect good data and work hard to get findings that suggest the effect exists.
There are tests that flip it around as you suggest where the null states that a relationship does exist. For example, researchers perform an equivalency test when they want to show that there is no difference. That the groups are equal. The test is designed such that it requires a good sample size and high quality data to have a chance at proving equivalency. If they have a small sample size and/or poor quality data, the results default back to the groups being unequal, which is not what they want to show.
So, choose the null hypothesis and corresponding analysis based on what you hope to find. Choose the null hypothesis that forces you to work hard to reject it and get the results that you want. It forces you to collect better evidence to make your case and the results default back to what you don’t want if you do a poor job.
I hope that makes sense!
Lou Harland-Jones says
Really appreciate how you have been able to explain something difficult in very simple terms. Also covering why you can’t accept a null hypothesis – something which I think is frequently missed. Thank you, Jim.
Javier says
Hi Jim, I really appreciate your blog, making difficult things sound simple is a great gift.
I have a doubt about the p-value. You said there are two options when it comes to hypothesis tests results . Reject or failing to reject the null, depending on the p-value and your significant level.
But… a P-value of 0,001 means a stronger evidence than a P-value of 0,01? ( both with a significant level of 5%. Or It doesn`t matter, and just every p-Value under your significant level means the same burden of evidence against the null?
I hope I made my point clear.
Thanks a lot for your time.
Jim Frost says
Hi Javier,
There are different schools of thought about this question. The traditional approach is clear cut. Your results are statistically significance when your p-value is less than or equal to your significance level. When the p-value is greater than the significance level, your results are not significant.
However, as you point out, lower p-values indicate stronger evidence against the null hypothesis. I write about this aspect of p-values in several articles, interpreting p-values (near the end) and p-values and reproducibility.
Personally, I consider both aspects. P-values near 0.05 provide weak evidence. Consequently, I’d be willing to say that p-values less than or equal to 0.05 are statistically significant, but when they’re near 0.05, I’d consider it as a preliminary result that requires more research. However, if the p-value is less 0.01, or even better 0.001, then that’s much stronger evidence and I’ll give those results more weight in my evaluation.
If you read those two articles, I think you’ll see what I mean.
Penny Bird says
HI,
I have a quick question that you may be able to help me with. I am using SPSS and carrying out a Mann W U Test it says to retain the null hypothesis. The hypothesis is that males are faster than women at completing a task. So is that saying that they are or are not
Jim Frost says
Hi Penny,
In that case, your sample data provides insufficient evidence to conclude that males are faster. The results do not prove that males and females are the same speed. You just don’t have enough evidence to say males are faster. In this post, I cover the reasons why you can’t prove the null is true.
Ali Qamar says
Hello Sir,
What if I have to prove in my hypothesis that there shouldn’t be any affect of treatment on patients? Can I say that if my null hypothesis is accepted i have got my results (no effect)? I am confused what to do in this situation. As for null hypothesis we always have to write it with some type of equality.
What if I want my result to be what i have stated in null hypothesis i.e. no effect? How to write statements in this case? I am using non parametric test, Mann whitney u test
Jim Frost says
Hi Ali,
You need to perform an equivalence test, which is a special type of procedure when you want to prove that the results are equal. The problem with a regular hypothesis test is that when you fail to reject the null, you’re not proving that they the outcomes are equal. You can fail to reject the null thanks to a small sample size, noisy data, or a small effect size even when the outcomes are truly different at the population level. An equivalence test sets things up so you need strong evidence to really show that two outcomes are equal.
Unfortunately, I don’t have any content for equivalence testing at this point, but you can read an article about it at Wikipedia: Equivalence Test.
Lowell says
Great explanation and great analogies! Thanks.
Wynn says
I got problems with analysis.
I did wound healing experiments with drugs treatment (total 9 groups). When I do the 2-way ANOVA in excel, I got the significant results in sample (Drug Treatment) and columns (Day, Timeline) . But I did not get the significantly results in interactions. Can I still reject the null hypothesis and continue the post-hoc test?
Thank you very much.
Gaurav Bansal says
Hi Jim,
There are so many books covering maths/programming related to statistics/DS, but may be hardly any book to develop an intuitive understanding. Thanks to you for filling up that gap. After statistics, hypothesis-testing, regression, will it be possible for you to write such books on more topics in DS such as trees, deep-learning etc.
I recently started with reading your book on hypothesis testing (just finished the first chapter). I have a question w.r.t the fuel cost example (from first chapter), where a random sample of 25 families (with sample mean 330.6) is taken. To do the hypothesis testing here, we are taking a sampling distribution with a mean of 260. Then based on the p-value and significance level, we find whether to reject or accept the null hypothesis. The entire decision (to accept or reject the null hypothesis) is based on the sampling distribution about which i have the following questions :
a) we are assuming that the sampling distribution is normally distributed. what if it has some other distribution, how can we find that ?
b) We have assumed that the sampling distribution is normally distributed and then further assumed that its mean is 260 (as required for the hypothesis testing). But we need the standard deviation as well to define the normal distribution, can you please let me know how do we find the standard deviation for the sampling distribution ?
Thanks.
henry says
Maybe its the idea of “Innocent until proven guilty”? Your Null assume the person is not guilty, and your alternative assumes the person is guilty, only when you have enough evidence (finding statistical significance P0.05 you have failed to reject null hypothesis, null stands,implying the person is not guilty. Or, the person remain innocent.. Correct me if you think it’s wrong but this is the way I interpreted.
Jim Frost says
Hi Henry,
I used the courtroom/trial analogy within this post. Read that for more details. I’d agree with your general take on the issue except when you have enough evidence you actually reject the null, which in the trial means the defendant is found guilty.
Ahmed K. F. says
Can regression analysis be done using 5 companies variables for predicting working capital management and profitability positive/negative relationship?
Also, does null hypothesis rejecting means whatsoever is stated in null hypothesis that is false proved through regression analysis?
I have very less knowledge about regression analysis. Please help me, Sir. As I have my project report due on next week. Thanks in advance!
Jim Frost says
Hi Ahmed, yes, regression analysis can be used for the scenario you describe as long as you have the required data.
For more about the null hypothesis in relation to regression analysis, read my post about regression coefficients and their p-values. I describe the null hypothesis in it.
Piere says
With regards to the legal example above. While your explanation makes sense when simplified to this statistical level, from a legal perspective it is not correct. The presumption of innocence means one does not need to be proven innocent. They are innocent. The onus of proof lies with proving they are guilty. So if you can’t prove someones guilt then in fact you must accept the null hypothesis that they are innocent. It’s not a statistical test so a little bit misleading using it an example, although I see why you would.
If it were a statistical test, then we would probably be rather paranoid that everyone is a murderer but they just haven’t been proven to be one yet.
Great article though, a nice simple and thoughtout explanation.
Jim Frost says
Hi Piere,
It seems like you misread my post. The hypothesis testing/legal analogy is very strong both in making the case and in the result.
In hypothesis testing, the data have to show beyond a reasonable doubt that the alternative hypothesis is true. In a court case, the prosecutor has to present sufficient evidence to show beyond a reasonable doubt that the defendant is guilty.
In terms of the test/case results. When the evidence (data) is insufficient, you fail to reject the null hypothesis but you do not conclude that the data proves the null is true. In a legal case that has insufficient evidence, the jury finds the defendant to be “not guilty” but they do not say that s/he is proven innocent. To your point specifically, it is not accurate to say that “not guilty” is the same as “proven innocent.”
It’s a very strong parallel.
Mary says
Hi Jim
Just a question, in my research on hypotheses for an assignment, I am finding it difficult to find an exact definition for a hypothesis itself. I know the defintion, but I’m looking for a citable explanation, any ideas?
Jim Frost says
Hi Mary,
To be clear, do you need to come up with a statistical hypothesis? That’s one where you’ll use a particular statistical hypothesis test. If so, I’ll need to know more about what you’re studying, your variables, and the type of hypothesis test you plan to use.
There are also scientific hypotheses that you’ll state in your proposals, study papers, etc. Those are different from statistical hypotheses (although related). However, those are very study area specific and I don’t cover those types on this blog because this is a statistical blog. But, if it’s a statistical hypothesis for a hypothesis test, then let me know the information I mention above and I can help you out!
wan says
Hi, good read, I’m kind of a novice here, so I’m trying to write a research paper, and I’m trying to make a hypothesis. however looking at the literature, there are contradicting results.
researcher A found that there is relationship between X and Y
however, researcher B found that there is no relationship between X and Y
therefore, what is the null hypothesis between X and y? do we choose what we assumed to be correct for our study? or is is somehow related to the alternative hypothesis? I’m confused.
thank you very much for the help.
Jim Frost says
Hi Wan,
Hypotheses for a statistical test are different than a researcher’s hypothesis. When you’re constructing the statistical hypothesis, you don’t need to consider what other researchers have found. Instead, you construct them so that the test only produces statistically significant results (rejecting the null) when your data provides strong evidence. I talk about that process in this post.
Typically, researchers are hoping to establish that an effect or relationship exists. Consequently, the null and alternative hypotheses are typically the following:
Null: The effect or relationship doesn’t not exist.
Alternative: The effect or relationship does exist.
However, if you’re hoping to prove that there is no effect or no relationship, you then need to flip those hypotheses and use a special test, such as an equivalences test.
So, there’s no need to consider what researchers have found but instead what you’re looking for. In most cases, you are looking for an effect/relationship, so you’d go with the hypotheses as I show them above.
I hope that helps!
Jan says
Great, deep detailed answer. Appreciated!
JOSE ANDRES PEREZ MENDOZA says
Thank you for explaining it too clearly. I have the following situation with a Box Bohnken design of three levels and three factors for multiple responses. F-value for second order model is not significant (failing to reject null hypothesis, p-value > 0.05) but, lack of fit of the model is not significant. What can you suggest me about statistical analysis?
Jim Frost says
Hi Jose,
Are your first order effects significant?
You want the lack of fit to be nonsignificant. If it’s significant, that means the model doesn’t fit the data well. So, you’re good there! 🙂
Nate says
Hi Jim,
thank you for all the explicit explanation on the subject.
However, i still got a question about “accepting the null hypothesis”. from textbook, the p-value is the probability that a statistic would take a value that is as extreme as or more extreme than that actually observed.
so, that’s why when p<0.01 we reject the null hypothesis, because it's too rare (p0.05, i can understand that for most cases we cannot accept the null, for example, if p=0.5, it means that the probability to get a statistic from the distribution is 0.5, which is totally random.
But how about when the p is very close to 1, like p=0.95, or p=0.99999999, can’t we say that the probability that the statistic is not from this distribution is less than 0.05, | or in another way, the probability that the statistic is from the distribution is almost 1. can’t we accept the null in such circumstance?
ananth says
Wow! This is beautifully explained. “Lack of proof doesn’t represent proof that something doesn’t exist!”. This kinda, hit me with such force. Can I then, use the same analogy for many other things in life? LOL! 🙂
H0 = God does not exist; H1 = God does exist;
WE fail to reject H0 as there is no evidence.
Thank you sir, this has answered many of my questions, statistically speaking! No pun intended with the above.
Jim Frost says
Hi, LOL, I’m glad it had such meaning for you! I’ll leave the determination about the existence of god up to each person, but in general, yes, I think statistical thinking can be helpful when applied to real life. It is important to realize that lack of proof truly is not proof that something doesn’t exist. But, I also consider other statistical concepts, such as confounders and sampling methodology, to be useful keeping in mind when I’m considering everyday life stuff–even when I’m not statistically analyzing it. Those concepts are generally helpful when trying to figure out what is going on in your life! Are there other alternative explanations? Is what you’re perceiving likely to be biased by something that’s affecting the “data” you can observe? Am I drawing a conclusion based on a large or small sample? How strong is the evidence?
A lot of those concepts are great considerations even when you’re just informally assessing and draw conclusions about things happening in your daily life.
Geetanjali Kapoor says
Dear Jim, thanks for clarifying. absolutely, now it makes sense. the topic is murky but it is good to have your guidance, and be clear. I have not come across an instructor as clear in explaining as you do. Appreciate your direction. Thanks a lot, Geetanjali
Jim Frost says
Hi Geetanjali,
I’m glad my website is helpful! That makes my day hearing that. Thanks so much for writing!
Brang San says
Hi Jim. I am doing data analyis for my masters thesis and my hypothesis testings were insignificant. And I am ok with that. But there is something bothering me. It is the low reliabilities of the 4-Items sub-scales (.55, .68, .75), though the overall alpha is good (.85). I just wonder if it is affecting my hypothesis testings.
Saumya Srivastava says
Thank you sir for replying, yes sir we it’s a RCT study.. where we did within and between the groups analysis and found p>0.05 in between the groups using Mann Whitney U test. So in such cases if the results comes like this we need to
Mention that we failed reject the null hypothesis? Is that correct? Whether it tells that the study is inefficient as we couldn’t accept the alternative hypothesis. Thanks is advance.
Jim Frost says
Hi Saumya, ah, this becomes clearer. When ask statistical questions, please be sure to include all relevant information because the details are extremely important. I didn’t know it was an RCT with a treatment and control group. Yes, given that your p-value is greater than your significance level, you fail to reject the null hypothesis. The results are not significant. The experiment provides insufficient evidence to conclude that the outcome in the treatment group is different than the control group.
By the way, you never accept the alternative hypothesis (or the null). The two options are to either reject the null or fail to reject the null. In your case, you fail to reject the null hypothesis.
I hope this helps!
Saumya Srivastava says
Sir,
p value is0.05, by which we interpret that both the groups are equally effective. In this case I had to reject the alternative hypothesis/ failed to reject null hypothessis.
Saumya Srivastava says
sir, within the group analysis the p value for both the groups is significant (p0.05, by which we interpret that though both the treatments are effective, there in no difference between the efficacy of one over the other.. in other words.. no intervention is superior and both are equally effective.
Jim Frost says
Hi Saumya,
Thanks for the additional details. If I understand correctly, there were separate analyses before that determined each treatment had a statistically significance effect. However, when you compare the two treatments, there difference between them is not statistically significant.
If that’s the case, the interpretation is fairly straightforward. You have evidence that suggests that both treatments are effective. However, you don’t have evidence to conclude that one is better than the other.
Saumya Srivastava says
Hi thank you for a wonderful explanation. I have a doubt:
My Null hypothesis says: no significant difference between the effect fo A and B treatment
Alternative hypothesis: there will be significant difference between the effect of A and B treatment.
and my results show that i fail to reject null hypothesis.. Both the treatments were effective, but not significant difference.. how do I interpret this?
Jim Frost says
Hi Saumya,
First, I need to ask you a question. If your p-value is not significant, and so you fail to reject the null, why do you say that the treatment is effective? I can answer you question better after knowing the reason you say that. Thanks!
Geetanjali Kapoor says
Dear Jim,
thanks for making stats much more understandable and answering all question so painstakingly.
I understand the following on p value and null.
If our sample yields a p value of .01, it means that that there is a 1% probability that our kind of sample exists in the population. that is a rare event. So why shouldn’t we accept the HO as the probability of our event was v rare. Pls can you correct me.
Thanks, G
Jim Frost says
Hi Geetanjali,
That’s a great question! They key thing to remember is that p-values are a conditional probability. P-value calculations assume that the null hypothesis is true. So, a p-value of 0.01 indicates that there is a 1% probability of observing your sample results, or more extreme, *IF* the null hypothesis is true.
The kicker is that we don’t whether the null is true or not. But, using this process does limit the likelihood of a false positive to your significance level (alpha). But, we don’t know whether the null is true and you had an unusual sample or whether the null is false. Usually, with a p-value of 0.01, we’d reject the null and conclude it is false.
I hope that answered your question. This topic can be murky and I wasn’t quite clear which part you needed clarification.
Abhilash Singh says
Hello Jim,
Thank you for the wonderful explanation. However, I was just curious to know that what if in a particular test, we get a p-value less than the level of significance, leading to evidence against null hypothesis. Is there any possibility that our interpretation of population effect might be wrong due to randomness of samples? Also, how do we conclude whether the evidence is enough for our alternate hypothesis?
Jim Frost says
Hi Abhilash,
Yes, unfortunately, when you’re working with samples, there’s always the possibility that random chance will cause your sample to not represent the population. For information about these errors, read my post about the types of errors in hypothesis testing.
In hypothesis testing, you determine whether your evidence is strong enough to reject the null. You don’t accept the alternative hypothesis. I cover that in my post about interpreting p-values.
Dan says
Hi, I am trying to interpret this phenomenon after my research.
The null hypothesis states that “The use of combined drugs A and B does not lower blood pressure when compared to if drug A or B is used singularly”
The alternate hypothesis states:
The use of combined drugs A and B lower blood pressure compared to if drug A or B is used singularly.
At the end of the study, majority of the people did not actually combine drugs A and B, rather indicated they either used drug A or drug B but not a combination. I am finding it very difficult to explain this outcome more so that it is a descriptive research. Please how do I go about this? Thanks a lot
Lisa Cook says
Hi Jim,
What confuses me is how we set/determine the null hypothesis? For example stating that two sets of data are either no different or have no relationship will give completely different outcomes, so which is correct?
Is the null that they are different or the same?
Jim Frost says
Hi Lisa,
Typically, the null states there is no effect/no relationship. That’s true for 99% of hypothesis tests. However, there are some equivalence tests where you are trying to prove that the groups are equal. In that case, the null hypothesis states that groups are not equal.
The null hypothesis is typically what you *don’t* want to find. You have to work hard, design a good experiment, collect good data, and end up with sufficient evidence to favor the alternative hypothesis. Usually in an experiment you want to find an effect. So, usually the null states there is no effect and you have get good evidence to reject that notion.
However, there are a few tests where you actually want to prove something is equal, so you need the null to state that they’re not equal in those cases and then do all the hard work and gather good data to suggest that they are equal. Basically, set up the hypothesis so it takes a good experiment and solid evidence to be able to reject the null and favor the hypothesis that you’re hoping is true.
Mottakin Ahmed says
Thank you for the explanation. I have one question that.
If Null hypothesis is failed to reject than is possible to interpret the analysis further?
Jim Frost says
Hi Mottakin,
Typically, if your result is that you fail to reject the null hypothesis there’s not much further interpretation. You don’t want to be in a situation where you’re endlessly trying new things on a quest for obtaining significant results. That’s data mining.
Tony says
Hi Jim,
I hope all is well. I am enjoying your blog. I am not a statistician, however, I use statistical formulae to provide insight on the direction in which data is going. I have used both the regression analysis and a T-Test. I know that both use a null hypothesis and an alternative hypothesis. Could you please clarity the difference between a regression analysis and a T-Test? Are there conditions where one is a better option than the other?
Thanks,
Tony
Jim Frost says
Hi Tony,
t-Tests compare the means of one or two groups. Regression analysis typically describes the relationships between a set of independent variables and the dependent variables. Interestingly, you can actually use regression analysis to perform a t-test. However, that would be overkill. If you just want to compare the means of one or two groups, use a t-test. Read my post about performing t-tests in Excel to see what they can do. If you have a more complex model than just comparing one or two means, regression might be the way to go. Read my post about when to use regression analysis.
Vijay says
Hi Jim,
This article is really enlightening but there is still some darkness looming around.
I see that low p-values mean strong evidence against null hypothesis and finding such a sample is highly unlikely when null hypothesis is true.
So , is it OK to say that when p-value is 0.01 , it was very unlikely to have found such a sample but we still found it and hence finding such a sample has not occurred just by chance which leads towards rejection of null hypothesis.
Jim Frost says
Hi Vijay,
That’s mostly correct. I wouldn’t say, “has not occurred by chance.” So, when you get a very low p-value it does mean that you are unlikely to obtain that sample if the null is true. However, once you obtain that result, you don’t know for sure which of the two occurred:
You really don’t know for sure. However, by the decision making results you set about the strength of evidence required to reject the null, you conclude that the effect exists. Just always be aware that it could be a false positive.
That’s all a long way of saying that your sample was unlikely to occur by chance if the null is true.
Amna says
Why do we consult the statistical tables to find out the critical values of our test statistics?
Jim Frost says
Hi Amna,
Statistical tables started back in the “olden days” when computers didn’t exist. You’d calculate the test statistic value for your sample. Then, you’d look in the appropriate table and using the degrees of freedom for your design and find the critical values for the test statistic. If the value of your test statistics exceeded the critical value, your results were statistically significant.
With powerful and readily available computers, researchers could analyze their data and calculate the p-values and compare them directly to the significance level.
I hope that answers your question!
Shazzad Hossain says
If we are not able to reject the null hypothesis. What could
be the solution?
Jim Frost says
Hi Shazzad,
The first thing to recognize is that failing to reject the null hypothesis might not be an error. If the null hypothesis is false, then the correct outcome is failing to reject the null.
However, if the null hypothesis is false and you fail to reject, it is a type II error, or a false negative. Read my post about types of errors in hypothesis tests for more information.
This type of error can occur for a variety of reasons, including the following:
There are various other possibilities, but those are several common problems.
I hope this helps!
Jaid says
Hi Jim,
Thank you so much for this article! I am taking my first Statistics class in college and I have one question about this.
I understand that the default position is that the null is correct, and you explained that (just like a court case), the sample evidence must EXCEED the “evidentiary standard” (which is the significance level) to conclude that an effect/relationship exists. And, if an effect/relationship exists, that means that it’s the alternative hypothesis that “wins” (not sure if that’s the correct way of wording it, but I’m trying to make this as simple as possible in my head!).
But what I don’t understand is that if the P-value is GREATER than the significance value, we fail to reject the null….because shouldn’t a higher P-value, mean that our sample evidence EXCEEDS the evidentiary standard (aka the significance level), and therefore an effect/relationship exists? In my mind it would make more sense to reject the null, because our P-value is higher and therefore we have enough evidence to reject the null.
I hope I worded this in a way that makes sense. Thank you in advance!
Jaid
Jim Frost says
Hi Jaid,
That’s a great question. The key thing to remember is that higher p-values correspond to weaker evidence against the null hypothesis. A high p-value indicates that your sample is likely (high probability = high p-value) if the null hypothesis is true. Conversely, low p-values represent stronger evidence against the null. You were unlikely (low probability = low p-value) to have collect a sample with the measured characteristics if the null is true.
So, there is negative correlation between p-values and strength of evidence against the null hypothesis. Low p-values indicate stronger evidence. Higher p-value represent weaker evidence.
In a nutshell, you reject the null hypothesis with a low p-value because it indicates your sample data are unusual if the null is true. When it’s unusual enough, you reject the null.
I hope that answers your question!
Michael says
There is something I am confused about. If our significance level is .05 and our resulting p-value is .02 (thus the strength of our evidence is strong enough to reject the null hypothesis), do we state that we reject the null hypothesis with 95% confidence or 98% confidence?
My guess is our confidence level is 95% since or alpha was .05. But if the strength of our evidence is 98%, why wouldn’t we use that as our stated confidence in our results?
Jim Frost says
Hi Michael,
You’d state that you can reject the null at a significance level of 5% or conversely at the 95% confidence level. A key reason is to avoid cherry picking your results. In other words, you don’t want to choose the significance level based on your results.
Consequently, set the significance level/confidence level before performing your analysis. Then, use those preset levels to determine statistical significance. I always recommend including the exact p-value when you report on statistical significance. Exact p-values do provide information about the strength of evidence against the null.
ADEKANMBI Dende Ibrahim says
Thank you for sharing this knowledge , it is very appropriate in explaining some observations in the study of forest biodiversity.
Moe Moe Htay says
Thank you so much. This provides for my research
Vishnu Vinjamuri says
If one couples this with what they call estimated monetary value of risk in risk management, one can take better decisions.
AL says
Thank you for providing this clear insight.
Vishnu Vinjamuri says
Nice article Jim. The risk of such failure obviously reduces when a lower significance level is specified.One benefits most by reading this article in conjunction with your other article “Understanding Significance Levels in Statistics”.
Ratnadeep Sharma says
That’s fine. My question is why doesn’t the numerical value of type 1 error coincide with the significance level in the backdrop that the type 1 error and the significance level are both the same ? I hope you got my question.
Jim Frost says
Hi, they are equal. As I indicated, the significance level equals the type I error rate.
Ratnadeep Sharma says
Kindly elighten me on one confusion. We set out our significance level before setting our hypothesis. When we calculate the type 1 error, which happens to be a significance level, the numerical value doesn’t equals (either undermining value comes out or an exceeding value comescout ) our significance level that was preassigned. Why is this so ?
Jim Frost says
Hi Ratnadeep,
You’re correct. The significance level (alpha) is the same as the type I error rate. However, you compare the p-value to the significance level. It’s the p-value that can be greater than or less than the significance level.
The significance level is the evidentiary standard. How strong does the evidence in your sample need to be before you can reject the null? The p-value indicates the strength of the evidence that is present in your sample. By comparing the p-value to the significance level, you’re comparing the actual strength of the sample evidence to the evidentiary standard to determine whether your sample evidence is strong enough to conclude that the effect exists in the population.
I write about this in my post about the understanding significance levels. I think that will help answer your questions!