P values determine whether your hypothesis test results are statistically significant. Statistical analyses use them all over the place. You’ll find P values in t-tests, distribution tests, ANOVA, and regression analysis. P values have become so important that they’ve taken on a life of their own. They can determine which studies are published, which projects receive funding, and which university faculty members become tenured!

Ironically, despite being so influential, P values are misinterpreted very frequently. What *is* the correct interpretation of P values? What do P values *really* mean? That’s the topic of this post!

P values are a slippery concept. Don’t worry. I’ll explain P values using an intuitive, concept-based approach so you can avoid a very common misinterpretation that can cause you serious problems.

## What Is the Null Hypothesis?

P values are directly connected to the null hypothesis. So, we need to cover that first!

In all hypothesis tests, the researchers are testing an effect of some sort. The effect can be the effectiveness of a new vaccination, the durability of a new product, and so on. There is some benefit or difference that the researchers hope to identify.

However, it’s possible that there really is no effect or no difference between the experimental groups. In statistics, we call this lack of an effect the null hypothesis. When you assess the results of a hypothesis test, you can think of the null hypothesis as the devil’s advocate position, or the position you take for the sake of argument.

To understand this idea, imagine a hypothetical study of a medication that we know is completely useless. In other words, the null hypothesis is true. There is no difference at the population level between subjects who take the medication and subjects who don’t.

Even though the null hypothesis is correct, we’ll most likely see an effect in the sample because of random sample error. It’s an incredibly rare occurrence for the sample data to exactly match the actual population value. Therefore, the position you take for the sake of argument (devil’s advocate) is that random sample error produces the observed sample effect rather than it being a true effect.

## What Are P values?

P-values indicate the believability of the devil’s advocate case that the null hypothesis is true given the sample data. They gauge how consistent your sample statistics are with the null hypothesis. Specifically, if the null hypothesis is correct, what is the probability of obtaining an effect at least as large as the one in your sample?

- High P-values: Your sample results are consistent with a null hypothesis that is true.
- Low P-values: Your sample results are not consistent with a null hypothesis that is true.

If your P value is small enough, you can conclude that your sample is so incompatible with the null hypothesis that you can reject the null for the entire population. P-values are an integral part of inferential statistics because they help you use your sample to draw conclusions about a population.

**Background information**: Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

## How Do You Interpret P values?

Here is the technical definition of P values:

P values are the probability of observing a sample statistic that is at least as extreme as your sample statistic when you assume that the null hypothesis is true.

Let’s go back to our hypothetical medication study. Suppose the hypothesis test generates a P value of 0.03. You’d interpret this P-value as follows:

If the medicine has no effect in the population as a whole, 3% of studies will obtain the effect observed in your sample, or larger, because of random sample error.
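To make this interpretation concrete, here is a minimal Python simulation sketch. It assumes a hypothetical two-group study with 50 subjects per group, normally distributed outcomes with a known standard deviation of 1 (so a simple z-test applies), and an observed difference in means chosen so that its two-sided p-value is 0.03. The function names are illustrative, not from any library. Because both groups are drawn from the same population, the null hypothesis is true by construction, so the share of simulated studies that reach the observed difference, or a larger one, should land close to the p-value.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(observed_diff, n_per_group=50, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (sigma assumed known)."""
    standard_error = sigma * (2 / n_per_group) ** 0.5
    z = abs(observed_diff) / standard_error
    return 2 * (1 - NormalDist().cdf(z))


def share_of_null_studies_as_extreme(observed_diff, n_per_group=50, sigma=1.0,
                                     n_studies=10_000, seed=1):
    """Simulate studies where the medication does nothing and count how often
    random sample error alone produces a difference at least as large."""
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_studies):
        # Null hypothesis is true: both groups come from the same population.
        treated = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        if abs(fmean(treated) - fmean(control)) >= abs(observed_diff):
            as_extreme += 1
    return as_extreme / n_studies


# Pick an observed difference whose p-value is 0.03 (about 0.434 here).
standard_error = (2 / 50) ** 0.5
observed = NormalDist().inv_cdf(1 - 0.03 / 2) * standard_error

print(round(two_sided_p(observed), 4))             # 0.03
print(share_of_null_studies_as_extreme(observed))  # close to 0.03
```

Simulating more studies tightens the match; the point is only that the p-value describes how often null-true studies produce results this extreme.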

How probable are your sample data if the null hypothesis is correct? That’s the only question that P values answer. This restriction segues to a very frequent and problematic misinterpretation.

**Related post**: Understanding P values can be easier using a graphical approach: How Hypothesis Tests Work: Significance Levels and P-values.

## P values Are *NOT* an Error Rate

Unfortunately, P values are frequently misinterpreted. A common mistake is the notion that they represent the likelihood of rejecting a null hypothesis that is actually true (Type I error). The idea that P values are the probability of making a mistake is WRONG! You can read a blog post I wrote to learn *why* P values are misinterpreted so frequently.

You can’t use P values to directly calculate the error rate for several reasons.

First, P value calculations assume that the null hypothesis is correct. Thus, from the P value’s point of view, the null hypothesis is 100% true. Remember, P values assume that the null is true and the observed sample effect is caused by random sample error.

Second, P values tell you how consistent your sample data are with a null hypothesis that is true. However, when your data are very inconsistent with the null hypothesis, P values can’t determine which of the following two possibilities is more probable:

- The null hypothesis is true, but your sample is unusual due to random sample error.
- The null hypothesis is false.

To figure out which option is true, you must apply expert knowledge of the study area and, very importantly, assess the results of similar studies.
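One way to see why the p-value alone can’t settle this is to simulate many studies, some testing real effects and some testing useless treatments. This Python sketch is illustrative only: it assumes each study reports a z statistic, that a hypothetical fraction `prior_real` of the tested effects are real, and that real effects produce an expected z of `true_z`. Both values are made up for the example; in practice they would have to come from subject-area knowledge and similar studies. It then asks: among studies whose p-value landed between 0.045 and 0.05, how many had a true null?

```python
import random
from statistics import NormalDist


def share_of_true_nulls(prior_real=0.5, true_z=2.5, p_low=0.045, p_high=0.05,
                        n_studies=200_000, seed=7):
    """Among simulated studies with p in (p_low, p_high], return the fraction
    in which the null hypothesis was actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    true_null_hits = real_effect_hits = 0
    for _ in range(n_studies):
        effect_is_real = rng.random() < prior_real
        # The test statistic is centred on 0 when the null is true.
        z = rng.gauss(true_z if effect_is_real else 0.0, 1.0)
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p_low < p <= p_high:
            if effect_is_real:
                real_effect_hits += 1
            else:
                true_null_hits += 1
    return true_null_hits / (true_null_hits + real_effect_hits)


for prior in (0.1, 0.5, 0.9):
    print(prior, round(share_of_true_nulls(prior_real=prior), 2))
```

Under these particular assumptions, roughly a quarter of the p ≈ 0.05 results come from true nulls when half of the tested effects are real, and about three quarters do when only 10% are real. The same p-value supports very different conclusions depending on information the p-value doesn’t contain.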

Going back to our medication study, let’s highlight the correct and incorrect way to interpret the P value of 0.03:

**Correct:** Assuming the medication has zero effect in the population, you’d obtain the sample effect, or larger, in 3% of studies because of random sample error.

**Incorrect:** There’s a 3% chance of making a mistake by rejecting the null hypothesis.

Yes, I realize that the incorrect definition seems more straightforward, and that’s probably why it is so common. But, using this definition gives you a false sense of security, as I’ll show you next.

**Related posts**: See a graphical illustration of how t-tests and the F-test in ANOVA produce P values.

## What Is the True Error Rate?

The difference between the correct and incorrect interpretation is not just a matter of wording. There is a fundamental difference in the amount of evidence against the null hypothesis that each definition implies.

The P value for our medication study is 0.03. If you interpret that P value as a 3% chance of making a mistake by rejecting the null hypothesis, you’d feel like you’re on pretty safe ground. However, after reading this post, you should realize that P values are not an error rate and you can’t interpret them this way.

So, if the P value is not the error rate for our study, what is the error rate? Hint: It’s higher!

As I explained earlier, you can’t directly calculate an error rate based on a P value, at least not using the frequentist approach that produces P values. However, you can estimate error rates associated with P values by using the Bayesian approach and simulation studies.

Sellke et al.* have done this. While the exact error rate varies based on different assumptions, the values below use run-of-the-mill assumptions.

| P value | Probability of rejecting a null hypothesis that is true |
|---|---|
| 0.05 | At least 23% (and typically close to 50%) |
| 0.01 | At least 7% (and typically close to 15%) |
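For readers curious how a p-value can be turned into an estimate like these, the Sellke et al. paper proposes a simple calibration: for p < 1/e, the Bayes factor in favor of the null is at least −e·p·ln(p), which, with 50:50 prior odds, gives a lower bound on the probability that the null is true. The Python sketch below only evaluates that formula (the function name is hypothetical). It yields about 29% at p = 0.05 and 11% at p = 0.01, somewhat above the “at least” figures in the table, which is a reminder that these bounds shift with the assumptions behind them.

```python
import math


def min_prob_null_true(p):
    """Sellke-Bayarri-Berger lower bound on P(null is true | data), assuming
    equal prior probabilities for the null and alternative hypotheses."""
    if not 0 < p < 1 / math.e:
        raise ValueError("the calibration applies only for 0 < p < 1/e")
    bayes_factor_bound = -math.e * p * math.log(p)  # lower bound on B(null : alt)
    return 1 / (1 + 1 / bayes_factor_bound)


for p in (0.05, 0.01):
    print(p, round(min_prob_null_true(p), 3))
# prints "0.05 0.289", then "0.01 0.111"
```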

These higher error rates probably surprise you! Regrettably, the common misconception that P values are the error rate produces the false impression of considerably more evidence against the null hypothesis than is warranted. A single study with a P value around 0.05 does not provide substantial evidence that the sample effect actually exists in the population.

These estimated error rates emphasize the need to have lower P values and replicate studies that confirm the initial results before you can safely conclude that an effect exists at the population level. In fact, studies with lower P values have higher reproducibility rates in follow-up studies. Learn about the Types of Errors in Hypothesis Testing.

Now that you know how to interpret P values correctly, check out my Five P Value Tips to Avoid Being Fooled by False Positives and Other Misleading Results!

Typically, you’re hoping for low p-values, but even high p-values have benefits!

### Reference

*Thomas Sellke, M. J. Bayarri, and James O. Berger, “Calibration of p-values for Testing Precise Null Hypotheses,” The American Statistician, Vol. 55, No. 1, February 2001.

Edward says

Hi Jim,

Thanks for the helpful posts. I have been browsing your blog for some time now and I have gained a lot.

One quick question:

What happens if the null hypothesis is rejected based on the t-test but we can’t reject it by looking at the p-value?

I know one is derived from the other. But which one should we look at first to be able to say something about the null hypothesis: the t-statistic or the p-value in the t-test?

The same applies to ANOVA as well.

Which one do we look at first? Whether Significance F is less than the F statistic, or the P-value alone?

Jim Frost says

Hi Edward,

You can reject the null hypothesis either by determining whether the test statistic (t, F, chi-square, etc.) falls into the critical region or by comparing the p-value to the significance level. These two methods will always agree. If the test statistic falls within the critical region, then the p-value is less than or equal to the significance level.

Because the two methods are 100% consistent, you can use either one to evaluate statistical significance. You don’t need to use both methods, except maybe when you’re learning about how it all works. Personally, I find it easiest just to look at the p-value.
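A quick way to convince yourself that the two methods agree is to run both on the same test statistics. This Python sketch swaps in a two-sided z-test, because the standard library has a normal distribution but no t distribution; the same logic carries over to t, F, and chi-square tests. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def reject_by_critical_region(z, alpha=0.05):
    """Reject when the statistic falls in either tail beyond the critical value."""
    critical_value = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) >= critical_value


def reject_by_p_value(z, alpha=0.05):
    """Reject when the two-sided p-value is at or below the significance level."""
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p_value <= alpha


for z in (0.8, 1.5, 1.95, 1.97, 2.6, -3.1):
    print(f"z = {z:5.2f}  critical region: {reject_by_critical_region(z)}  "
          f"p-value: {reject_by_p_value(z)}")
```

Both functions ask the same question from two directions: whether |z| clears the critical value is equivalent to whether the tail area beyond |z| is at most alpha.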

To see how both methods work, read my posts about how hypothesis tests work, how t-tests work, and how the F-test works in one-way ANOVA.

I hope this helps!

Yash Guleria says

Hi Jim,

I have 3 p values .. 0, 2E-12 and 3.2E-316.

I don’t know what is wrong, but how do I interpret these values?

Jim Frost says

Hi Yash,

Those p-values are written using scientific notation. Scientific notation is a convenient way to represent very large and very small numbers. In your case, these represent very small p-values.

The number after the E specifies the direction and number of places to move the decimal point. For example, the negative 12 value in “E-12” indicates you need to move the decimal point 12 places to the left. On the other hand, positive values indicate that you need to shift the decimal point to the right.

These values are smaller than any reasonable significance level and, therefore, represent statistically significant results. You can reject the null hypothesis for whichever hypothesis test you are performing.
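If you’d like to check the decimal-point shifting yourself, Python’s standard `decimal` module can print an E-notation value in plain decimal form. The helper name `expand` is just for illustration, and the second example uses 6.79974E-16, another value that comes up in this comment thread.

```python
from decimal import Decimal


def expand(sci_text):
    """Rewrite a number given in E-notation as an ordinary decimal string."""
    return format(Decimal(sci_text), "f")


print(expand("2E-12"))        # 0.000000000002 (decimal point moved 12 places left)
print(expand("6.79974E-16"))  # 0.000000000000000679974

# Comparing against a significance level works directly on the parsed floats.
alpha = 0.05
for text in ("0", "2E-12", "3.2E-316"):
    print(text, float(text) <= alpha)  # True for all three
```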

I hope this helps!

adamson okunmwendia says

You are good, Jim. You are the best.

Trent says

Jim, thank you so, so much for your patience and help over the past week. I think I can finally say that I get it. It’s not easy to keep everything straight, but your simple breakdown in your most recent post really helped to clear everything up. Even though I previously read about p-values and type I errors in your other blog posts, I guess I needed to re-hear and re-think those tricky concepts in a variety of different ways to finally absorb them. I finally feel comfortable enough to share these cool insights with my research peers, and I’ll point them to your blog for extra stats goodies!

Thank you so much, again. I’m slowly making my way through your blog (trying to balance grad school at the same time); I look forward to your other posts!

aloha

trent

P.S. Please do email me about the notification issue, I don’t believe I received an email from you yet. Your blog has really helped me get a better grasp of stats (I found your blog from your chocolate vs mustard analogy for interaction analyses, that was brilliant!), and so I’d be more than happy to help with the notification issue in any way I can.

Jim Frost says

Hi Trent,

You’re very welcome! P-values are a very confusing concept. Somewhere in one of my posts, I have a link to an external article that shows how even scientists have a hard time describing what they are! They’re not intuitive. And, when you conduct a study, p-values really aren’t exactly what you want them to be. You *want* them to be the probability that the null is true. That would be the most helpful. Unfortunately, they’re not that, and they can’t be that. I’m not sure if you read it, but I’ve written a post about why p-values are so easy to misunderstand.

Despite these difficulties, p-values provide valuable information. In fact, as I write in an article, there’s a relationship between p-values and the reproducibility of studies.

Just a couple more p-value posts to read if you’re interested and haven’t read them already!

Best of luck with grad school! I’m sure you’ll do great!

By the way, I did email you. If you haven’t received it, that’s odd! I will try again from a different email address over the weekend.

A says

Jim… I cannot explain how many videos I have watched and articles I have read to try and understand this and you just cleared it all up with this. Saved my life. Thank you, thank you, thank you.

Jim Frost says

You’re very welcome! Presenting statistics in a clear manner is my goal for the website. So, it makes my day when I hear that my articles have actually helped people! Thanks for writing!

Trent says

Hi Jim, thank you so much for your reply! I’m sorry I wasn’t able to check back in until now. It seems that I still haven’t been able to connect the final pieces of the puzzle, based on your response to: “Thus, for a sample statistic assessed by a large group of similar studies, a P<0.05 would translate to a Type I error rate of <5%."

This is where I'm getting stuck:

Prior to a study, researchers typically set their significance level (alpha level) to 0.05. Researchers will then compare their p-value to the alpha level of 0.05 to determine if their results are statistically significant. If P<0.05, then the results are statistically significant at an alpha level of 0.05, which by extension means that the results have a 5% or lower probability of being a false positive (since the alpha level was set to 0.05, and alpha level = probability of a false positive), right? If this is all true, then a P<0.05 for a study with a significance level of 0.05 does not have a false positive probability of 23% (and typically close to 50%)… it has a 5% or lower probability of being a false positive.

That said, based on your article, I know I'm messing up my logic somewhere, but I can't figure out where…

aloha

trent

P.S. I double-checked my Gmail spam and trash folders, and there were no notification emails for any of your replies.

Jim Frost says

Hi Trent,

I’m going to send you an email soon about the notification issue. So, be on the lookout for that.

I *think* part of the confusion is over the issue of single studies versus a group of studies. Or, relatedly, a single p-value versus a range of p-values. Alpha is a range of p-values and applies to a group of studies. All studies (the group) that have p-values less than or equal to 0.05 (range of p-values) have a Type I error rate of 0.05. That error rate applies to the group of studies. You can’t apply it to a single study (i.e., a single p-value).

A single p-value for a single study is not that type of error rate at all. It represents the probability of obtaining your sample if the null is true. In other words, the p-value calculations begin with the assumption that the null is true. Therefore, you cannot use the p-value to determine the probability that the null (or alternative) hypothesis is true. In other words, you can’t map p-values to the false positive rate.

So, when you say “If P<0.05, then the results are statistically significant at an alpha level of 0.05, which by extension means that the results have a 5% or lower probability of being a false positive (since the alpha level was set to 0.05, and alpha level = probability of a false positive)," that's not true. For one thing, the p-value assumes the null *is* true. For another, the group of studies as a whole has an error rate of 0.05, but you don't know the error rate for an individual study. Additionally, you just don't know whether the null is true or false. The error rate only applies to studies where the null is true. And, the p-value calculations assume the null is true. But, you don't know for sure whether it is true or not for any given study.

Let's go back to what I said about the p-values being the "devil's advocate" argument. For any treatment effect that you observe in sample data, you can make the argument that the effect is simply random sampling error rather than a true effect. The p-value essentially says, "OK, let's assume the null is true. How likely was it for us to observe these results in that case?" If the probability is low, you were unlikely to obtain that sample if the null is true. It pokes a hole in the devil's advocate argument. It's important to remember that p-values are a probability related to obtaining your data assuming the null is true and *not* a probability that the null is true. You're trying to equate p-values to the probability of the null being true, which is not possible with the Frequentist approach.

Trent says

Hi Jim,

Thank you for your reply. The two other articles you linked were really helpful. I think I’m almost there with understanding the whole picture. May I clarify my current understanding with you?

Alpha applies to a group of similar studies, thus we can’t directly translate the p-value of a single study to the Type I error rate for a given hypothesis. However, using simulation studies or Bayesian methods, we can estimate the Type I error rate–from a single study–for a P=0.05 sample statistic to 23% (and typically close to 50%).

That said, in order to estimate the Type I error rate directly using alpha (and P-values), we need to see the results from a group of similar studies (i.e., a meta-analysis). Thus, for a sample statistic assessed by a large group of similar studies, a P<0.05 would translate to a Type I error rate of <5%.

How did I do?

aloha

trent

P.S. I'm unsure how the "Notify me of new comments via email" function is supposed to work on your blog, but it didn't notify me via email of your reply. So I had no idea that you replied to my comment until I checked back on this post.

Jim Frost says

Hi Trent,

I’m glad the other articles were helpful! There’s actually quite a bit to take in to understand p-values. It’s possible to come up with a brief definition, but it implies a thorough knowledge of underlying concepts! I will look into the Notify function. It should email you. I’ll hunt around in the settings to be sure, but I believe it is set up to send emails. Is there a chance it went to your junk folder?

Yes! That’s *very* close! Just a couple of minor quibbles and clarifications. I wouldn’t say that you use simulation and Bayesian methods to estimate the Type I error rate. That’s specific to the hypothesis testing framework, and it applies to a group of similar studies. Alpha = the Type I error rate, and both apply to a group of studies.

Simulation studies and Bayesian methods can help you take a P-value from an individual study and estimate the probability of a false discovery (or false positive). P-values relate to individual studies, and the probability of a false positive applies to that individual study. So, we’ve moved from probabilities for a group of studies (Alpha/Type I error) to the probability of a false positive for an individual study. To make that shift from a group to an individual study, we must switch methodologies because the Frequentist method cannot calculate the false discovery rate for a single study.

An important note: for simulation studies or Bayesian methodology to estimate the false discovery rate, you need additional information beyond just the sample data. You need an estimate of the probability that the alternative hypothesis is true at the beginning of the study. This is known as the prior probability in Bayesian circles. To develop this probability, you already need to know and incorporate external information into the calculations. This information can come from a group of similar studies, as you mention. This probability, along with the P-value, affects the false discovery rate. That’s why there is a range of values for any given P-value. There is no direct, fixed mapping of p-values to the false discovery rate. A criticism of the prior probability is that it must be estimated. Presumably, the researchers are performing a study because they’re not sure whether the alternative is true.

It’s not clear to me what you mean in your sentence, “Thus, for a sample statistic assessed by a large group of similar studies, a P<0.05 would translate to a Type I error rate of <5%.” I’ll assume you’re referring to a p-value from a meta-analysis. In that case, it still depends on the prior probability. If the prior probability is very high, the false discovery rate will be low. Conversely, if the prior probability is low, the false discovery rate will be higher. You can’t state a general rule like the one in your sentence.

Thanks for writing with the interesting questions!

Trent says

Hi Jim, wonderful post! A lot to chew on. May I clarify a point of confusion?

I’ve been taught that alpha is the probability of committing a Type I error. In addition, studies typically set alpha to 0.05, and beta to 0.20 (giving a power of 0.8).

Based on your article, this must be false. A true statement should read:

“Studies typically set the P-VALUE cut-off to 0.05, and beta to 0.20 (giving a power of 0.8).”

Logically following, this means that alpha is generally not set to anything. And for a study with a p-value cut-off of 0.05, the alpha would actually be about 0.23 (and typically close to 0.50).

Is my understanding, correct?

aloha

trent

Jim Frost says

Hi Trent,

It’s correct that alpha (aka the significance level) represents the probability of a type I error. Hypothesis tests are designed so that the researchers can set that value. However, it’s not possible to set beta. You can estimate beta using a power analysis. Power is just 1 - beta. However, power analyses are estimates and not something you’re technically setting like you do with alpha. I write more about this in my post about Type I and Type II errors.

I definitely understand your confusion regarding p-values and alpha. The important thing to keep in mind is that alpha really applies to a class of studies. Of all studies that use an alpha of 0.05 and the null is true, you’d expect to obtain significant results (i.e., a false positive) in 5% of those cases.
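Alpha’s meaning as a long-run rate across a class of studies is easy to check by simulation. This Python sketch assumes each study’s test statistic is standard normal when the null is true (a two-sided z-test) and simulates 100,000 hypothetical studies in which the null holds; the function name is illustrative. The share that come out statistically significant should sit near alpha, but nothing in the output tells you which individual significant result is a false positive.

```python
import random
from statistics import NormalDist


def share_significant_when_null_true(alpha=0.05, n_studies=100_000, seed=3):
    """Fraction of simulated null-true studies that reject the null at alpha."""
    rng = random.Random(seed)
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_studies):
        z = rng.gauss(0.0, 1.0)  # the null is true in every simulated study
        if abs(z) >= critical_value:
            significant += 1
    return significant / n_studies


print(share_significant_when_null_true(0.05))  # close to 0.05
print(share_significant_when_null_true(0.01))  # close to 0.01
```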

P-values represent the strength of the evidence against the null for an individual study. You can state it as being the probability of obtaining the observed outcome, or more extreme, if the null is true. However, you can’t state that it is the probability of the null being true. It’s the probability of the outcome if you assume the null is true (which you don’t really know for sure). Not the probability of whether the null is true.

I think based on what you write, you might be confusing that issue (re: alpha actually being 0.23). Both P-values and alpha relate to cases where the null is true–which you don’t know. The false positive error rates which I think you’re getting at, and I write about at the end, are dealing with the probability of the null being true. In the former, you’re assuming the null is true while in the latter you’re calculating the probability of whether it is true. Using the Frequentist approach (p-values, alpha) you cannot calculate the probability of the null being true. However, you can do that using simulation studies and sometimes using Bayesian methods.

I always think this is a bit easier to understand using graphs and so highly recommend reading my post about p-values and the significance level, which primarily uses graphs.

I hope this helps!

YIHENEW says

Thank you. You give me good insight

David says

Awesome read! How would sample size affect the True Error rate? I would assume since p-values tend to become smaller as sample size increases, that would also effectively reduce the True Error rate since you are more confident about the population (assuming True Error means type I and type II errors).

Jim Frost says

Hi David, Thanks, and I’m glad you enjoyed the article!

There are two types of errors in hypothesis testing. So, let’s see how changing the sample size affects them. You might want to read my article about Type I and Type II Errors in Hypothesis Testing.

There are three basic components for calculating p-values: the effect size, the variability in the data, and the sample size. For the sake of discussion, let’s hold the effect size and the variability constant and just increase the sample size. In that case, you would expect that the p-values would decrease. Frequentists will cringe at this, but lower p-values are associated with lower false discovery rates (Type I errors). Additionally, increasing the sample size while holding the other two factors constant will increase the power of your test. Power is just (1 – Type II error rate). So, you’d expect the Type II errors (false negatives) to decrease. Increasing the sample size is good all around because it lowers both types of error *for a single study*! I explain the italicized text later!

However, there are a couple of important caveats. Of course, as I point out in this article, you can’t calculate any error rates from the p-value using the frequentist approach. There’s no direct mapping from p-values to an error rate. You can use simulation studies and the Bayesian approach to estimate the false positive rate from the p-value. However, this requires an estimate of the a priori probability that the alternative hypothesis is correct. That information might be hard to obtain. After all, you’re conducting the study because you don’t know. Additionally, it’s always difficult to calculate the type II error rate. So, while you can say that increasing the sample size should reduce both type I and type II errors, you don’t really know what they are! By the way, in a related vein, you might want to read how P-values correlate with the reproducibility of scientific studies.
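The link between sample size and Type II errors can be sketched with the textbook power formula for a two-sided one-sample z-test. This Python example assumes a known standard deviation and a hypothetical standardized effect size of 0.3 that stays fixed while n grows; the names are illustrative. Setting the effect size to zero makes the “power” collapse to alpha, the Type I error rate the test holds constant.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect_size is the true mean difference divided by the known standard
    deviation, so the test statistic is centred at effect_size * sqrt(n).
    """
    critical_value = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    # Probability of landing in either rejection region when the effect is real.
    return STD_NORMAL.cdf(shift - critical_value) + STD_NORMAL.cdf(-shift - critical_value)


for n in (10, 25, 50, 100):
    power = z_test_power(0.3, n)
    print(f"n = {n:3d}  power = {power:.2f}  Type II error rate = {1 - power:.2f}")
```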

Let’s return to Frequentist approach because there’s another side of things that isn’t obvious. In contrast with the earlier example for an individual study, the Frequentist approach talks about the Type I errors not for an individual study but for a class of studies that use the same significance level. A result is statistically significant when the p-value is less than the significance level. The significance level equals the Type I Error for all studies that use a particular significance level. For example, 5% of all studies that use a significance level of 0.05 should be false positives. Of course, when you see significant test results, you don’t know for sure which ones are real effects and which ones are false discoveries.

Let’s now hold the other two factors constant but *reduce* the sample size. Let’s reduce it enough so that you have low power for detecting an effect. As your statistical power decreases, your test is less likely to detect real effects when they exist (the Type II error rate increases). However, the hypothesis test controls, or holds constant, the Type I error rate at your significance level. That’s built into the test. If you have a low power hypothesis test, the test’s ability to detect a real effect is low, but its false positive rate remains the same. Consequently, when you obtain statistically significant results for a test with low power, you need to be wary because they’re relatively likely to be a false positive and less likely to represent a real effect.

That’s probably more than what you wanted, but it’s a fascinating topic!
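The warning about low-power tests can be put into numbers with a simple expected-value calculation over a group of studies that all use alpha = 0.05. This Python sketch assumes a hypothetical share of tested effects that are real (`prior_real`, 0.5 by default) and a given power; both are inputs you would have to supply, which is exactly the extra information discussed in this thread. The result applies to the group of studies with p ≤ alpha, not to any single p-value.

```python
def share_of_significant_that_are_false(power, prior_real=0.5, alpha=0.05):
    """Expected fraction of significant results that are false positives."""
    false_positives = alpha * (1 - prior_real)  # null true, yet p <= alpha
    true_positives = power * prior_real        # effect real and detected
    return false_positives / (false_positives + true_positives)


for power in (0.8, 0.5, 0.2, 0.1):
    share = share_of_significant_that_are_false(power)
    print(f"power = {power:.1f}  false share of significant results = {share:.3f}")
```

Alpha stays at 0.05 throughout, yet under these assumptions the false share of significant results climbs from about 6% at 80% power to a third at 10% power. Lowering `prior_real` pushes it higher still.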

Tetyana says

Dear Jim, thank you very much for your posts!

Does it mean that after I have obtained some small p-value, I have to do some other tests?

Jim Frost says

Hi Tetyana,

After you obtain a small p-value, you can reject the null hypothesis. You don’t necessarily need to perform other tests. I just want analysts to avoid a common misinterpretation. Obtaining a statistically significant result is still a good thing, but you have to keep in mind what it really represents.

Ahmad Allam says

Thank you.

Ahmad Allam says

Thank you very much. You reassured me. Appreciated.

How could I record this result in a scientific manuscript?

Jim Frost says

Hi Ahmad,

I think it’s perfectly acceptable to report such a small p-value using the scientific notation that is in your output. The other option would be to report it as a regular value by moving the decimal point 16 places to the left, but that takes up so much more room. So, I’d use scientific notation. It’s there to save space for extremely small and large values depending on the context.

Ahmad Allam says

Hi Jim. Thanks for this valuable post. I hope you can help me with this: I got this result (6.79974E-16). What does that mean?

Appreciated.

Jim Frost says

Hi Ahmad,

That is called scientific notation. The E-16 in it indicates that you need to move the decimal point 16 digits to the left. That’s a very small value. Therefore, you have a very significant p-value!

Pamela Marcum says

What an awesome post! Should be required reading for all STEM students.

Jim Frost says

Thanks, Pamela. That means a lot to me!

Amit Kumar Sahoo says

Thanks, Jim, for your response. I think I got it.

Amit Kumar Sahoo says

Hi Jim,

Thanks for the post. I’m a little confused by the statement below:

“If the medicine has no effect in the population as a whole, 3% of studies will obtain the effect observed in your sample, or larger, because of random sample error.”

Now, as per the definition:

“P-values indicate the believability of the devil’s advocate case that the null hypothesis is true given the sample data.”

So doesn’t that mean a higher P value would lead us to accept the alternate hypothesis, since the probability of the alternate happening when the null is true is higher? I’m not able to wrap my head around this concept.

Amit

Jim Frost says

Hi Amit,

Great question! So, the first thing to realize is that the null and alternative hypotheses are mutually exclusive. If the probability of the alternative being true is higher, then the probability of the null must be lower.

However, the p-value doesn’t indicate the probability of either hypothesis being true. This is a very common misconception. Anytime you start linking p-values to the probability that a hypothesis is true, you know you’re going in the wrong direction!

P-values represent the probability of obtaining the effect observed in your sample, or more extreme, if the null hypothesis is true. It’s a probability of obtaining your data assuming the null is true. Consequently, a low p-value indicates that you were unlikely to obtain the sample data that was collected if the null is true. In this manner, lower p-values represent stronger evidence against the null hypothesis. Lower p-values indicate that your data are less compatible with the null hypothesis.

I think this is easier to understand graphically. I have a link in this post to another post How Hypothesis Tests Work: Significance Levels and P-values. This post shows how it works with graphs. I’d recommend taking a look at it.

I hope this helps!

Khursheed statistics says

Hello sir, hope you are fine.

I have no words. You have cleared up a lot of statistics concepts for me, and I am really happy.

Whatever you are uploading is awesome!

Jim Frost says

Hi Khursheed, I’m so happy to hear that you found this post to be helpful. Thanks for the encouraging words. They mean a lot to me!

naseer says

What should be the nature of the relationship of p values (especially Bonferroni corrected) with the Cohen’s d values for the same set of data?

Sean Saunders says

Jim, thanks for this post, but perhaps you could clarify something for me: assuming that H0 is true, if we set an alpha=0.05 level of significance and get a p-value less than that as the result of our sample data, wouldn’t that indicate, since less than 5% of samples would have such an effect due to random sample error, that there is only a 5% chance of getting such a sample, and thus, a 5% chance of rejecting the null hypothesis incorrectly? What am I missing here? Almost every stats book I’ve ever read has presented the concept this way (a type 1 error is even called an alpha-error!) Thanks for your feedback!

Jim Frost says

Hi Sean, thanks for your comment. Yes, you’re absolutely correct. The significance level (alpha) is the type I error rate. It’s the probability that you will reject the null hypothesis when it is true. However, the p-value is not an error rate. It’s a bit confusing because you compare one to the other.

In the post above, I provide a link to a post where I explain significance levels and p-values using graphs. I think it’s much easier to understand that way. I’ll explain below, but check that post out too.

Both alpha and p-values refer to regions on a probability distribution plot. You need an area under the curve to calculate probabilities. You can calculate probabilities for regions, but not for a specific value.

That works fine for alpha. If the null is true, you expect sample values to fall in the critical regions X% of the time based on the significance level that you specify. For p-values, the problems occur when you want to know the error rate for your specific study. You can’t do that for a single value from an individual study because you need an area under the curve.

The best you can say for p-values is: if the null is true, then you’d expect X% of studies to have an effect at least as large as the one in your study. X = your P-value. Notice the “at least as large.” That’s needed to produce the range of values for an area under the curve. It also means you can’t apply the percentage to your specific study. You can apply it only to the entire range of theoretical studies that have an effect at least as large as yours. That range collectively has an error rate that equals the p-value, but not your study alone.
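The “area under the curve” idea can be checked numerically. This Python sketch assumes a z-test, so the null distribution is the standard normal, and integrates its density over every value at least as extreme as an observed statistic (both tails) using the trapezoid rule. The function name, the cut-off at 10, and the number of steps are illustrative choices. The integrated area matches the p-value from the normal CDF.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tail_area(z_obs, upper=10.0, steps=10_000):
    """Two-sided p-value computed as an area: integrate the null density over
    |z| >= |z_obs| with the trapezoid rule. Area beyond `upper` is negligible."""
    start = abs(z_obs)
    width = (upper - start) / steps
    one_tail = 0.5 * (STD_NORMAL.pdf(start) + STD_NORMAL.pdf(upper))
    one_tail += sum(STD_NORMAL.pdf(start + i * width) for i in range(1, steps))
    one_tail *= width
    return 2 * one_tail  # the symmetric lower tail has the same area


z_obs = 2.17
print(tail_area(z_obs))                 # about 0.03
print(2 * (1 - STD_NORMAL.cdf(z_obs)))  # about 0.03, from the CDF directly
```

The observed statistic sits at the inner edge of the region, so every other value that contributes to the area is more extreme than the one your study actually produced.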

Another thing to consider is that, within the range defined by the p-value, your study provides the weakest results because it defines the point closest to the null. So, the overall error rate for the range is largely based on theoretical studies that provide stronger evidence than your actual study!

In a similar fashion, if you reject the null for your study using an alpha = 0.05, you know that all studies in the critical region have a Type I error rate = 0.05. Again, this applies to the entire range of studies and not yours alone.

I hope this all makes sense. Again, read the other post and it’s easier to see with graphs.