P values determine whether your hypothesis test results are statistically significant. Statistics use them all over the place. You’ll find P values in t-tests, distribution tests, ANOVA, and regression analysis. P values have become so important that they’ve taken on a life of their own. They can determine which studies are published, which projects receive funding, and which university faculty members become tenured!
Ironically, despite being so influential, P values are misinterpreted very frequently. What is the correct interpretation of P values? What do P values really mean? That’s the topic of this post!
P values are a slippery concept. Don’t worry. I’ll explain p-values using an intuitive, concept-based approach so you can avoid making a widespread misinterpretation that can cause serious problems.
What Is the Null Hypothesis?
P values are directly connected to the null hypothesis. So, we need to cover that first!
In all hypothesis tests, the researchers are testing an effect of some sort. The effect can be the effectiveness of a new vaccination, the durability of a new product, and so on. There is some benefit or difference that the researchers hope to identify.
However, it’s possible that there actually is no effect or no difference between the experimental groups. In statistics, we call this lack of an effect the null hypothesis. When you assess the results of a hypothesis test, you can think of the null hypothesis as the devil’s advocate position, or the position you take for the sake of argument.
To understand this idea, imagine a hypothetical study for medication that we know is entirely useless. In other words, the null hypothesis is true. There is no difference at the population level between subjects who take the medication and subjects who don’t.
Despite the null being true, you will likely observe an effect in the sample data due to random sampling error. It is improbable that a sample will ever exactly equal the null hypothesis value. Therefore, the position you take for the sake of argument (devil’s advocate) is that random sampling error produced the observed sample effect, rather than a genuine population effect.
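To see this in action, here is a minimal Python sketch (the population parameters, group sizes, and seed are arbitrary choices for illustration): both groups are drawn from the same population, so the true effect is exactly zero, yet the observed sample effect is not.

```python
import numpy as np

rng = np.random.default_rng(42)

# Both groups come from the SAME population (mean 100, SD 15),
# so the null hypothesis is true: the medication has zero effect.
control = rng.normal(loc=100, scale=15, size=30)
treated = rng.normal(loc=100, scale=15, size=30)

# Random sampling error alone still produces a nonzero observed effect.
observed_effect = treated.mean() - control.mean()
print(f"Observed difference in means: {observed_effect:.2f}")
```

Rerunning with different seeds gives a different nonzero difference every time, which is exactly the sampling error the devil's advocate position attributes the effect to.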
What Are P values?
P-values indicate the believability of the devil’s advocate case that the null hypothesis is correct given the sample data. They gauge how consistent your sample statistics are with the null hypothesis. Specifically, if the null hypothesis is right, what is the probability of obtaining an effect at least as large as the one in your sample?
- High P-values: Your sample results are consistent with a true null hypothesis.
- Low P-values: Your sample results are not consistent with a true null hypothesis.
If your P value is small enough, you can conclude that your sample is so incompatible with the null hypothesis that you can reject the null for the entire population. P-values are an integral part of inferential statistics because they help you use your sample to draw conclusions about a population.
How Do You Interpret P values?
Here is the technical definition of P values:
P values are the probability of observing a sample statistic that is at least as extreme as your sample statistic when you assume that the null hypothesis is true.
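For a two-sided test, this definition can be written compactly (here $T$ is the test statistic and $t_{\text{obs}}$ is its observed value in your sample):

```latex
p = \Pr\left( |T| \ge |t_{\text{obs}}| \;\middle|\; H_0 \text{ is true} \right)
```

Note that the conditioning runs one way only: the probability is computed *assuming* the null is true, which is why a P value cannot tell you the probability that the null *is* true.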
Let’s go back to our hypothetical medication study. Suppose the hypothesis test generates a P value of 0.03. You’d interpret this P-value as follows:
If the medicine has no effect in the population as a whole, 3% of studies will obtain the effect observed in your sample, or larger, because of random sampling error.
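You can reproduce this interpretation directly with a simulation. The sketch below is illustrative, not the calculation behind any particular test: the observed effect (8.5 points), the population SD, and the group sizes are made-up values chosen so the result lands near the 0.03 in the example. It simulates many studies in which the null is true and counts how often sampling error alone produces an effect at least as large as the observed one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sd = 30, 15          # per-group sample size and population SD (assumed)
observed_effect = 8.5   # hypothetical difference in means from "our" study

# Simulate many studies in which the null hypothesis is TRUE:
# both groups are drawn from the same population.
studies = 100_000
control = rng.normal(100, sd, size=(studies, n)).mean(axis=1)
treated = rng.normal(100, sd, size=(studies, n)).mean(axis=1)
null_effects = treated - control

# The simulated p-value: the share of null studies whose effect is
# at least as extreme as the one we observed (two-sided).
p_sim = np.mean(np.abs(null_effects) >= observed_effect)
print(f"Fraction of null studies with an effect >= {observed_effect}: {p_sim:.3f}")
```

With these made-up numbers the fraction comes out close to 0.03, matching the verbal interpretation: "3% of studies will obtain the effect observed in your sample, or larger."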
How probable are your sample data if the null hypothesis is correct? That’s the only question that P values answer. This restriction segues to a very persistent and problematic misinterpretation.
Related post: Understanding P values can be easier with a graphical approach. Read How Hypothesis Tests Work: Significance Levels and P-values to learn about significance levels from a conceptual standpoint.
P values Are NOT an Error Rate
Unfortunately, P values are frequently misinterpreted. A common mistake is believing that they represent the likelihood of rejecting a null hypothesis that is actually true (a Type I error). The idea that P values are the probability of making a mistake is WRONG! You can read a blog post I wrote to learn why P values are misinterpreted so frequently.
You can’t use P values to directly calculate the error rate for several reasons.
First, P value calculations assume that the null hypothesis is correct. Thus, from the P value’s point of view, the null hypothesis is 100% true. Remember, P values assume that the null is true and that sampling error caused the observed sample effect.
Second, P values tell you how consistent your sample data are with a true null hypothesis. However, when your data are very inconsistent with the null hypothesis, P values can’t determine which of the following two possibilities is more probable:
- The null hypothesis is true, but your sample is unusual due to random sampling error.
- The null hypothesis is false.
To figure out which option is right, you must apply expert knowledge of the study area and, very importantly, assess the results of similar studies.
Going back to our medication study, let’s highlight the correct and incorrect way to interpret the P value of 0.03:
- Correct: Assuming the medication has zero effect in the population, you’d obtain the sample effect, or larger, in 3% of studies because of random sampling error.
- Incorrect: There’s a 3% chance of making a mistake by rejecting the null hypothesis.
Yes, I realize that the incorrect definition seems more straightforward, and that’s why it is so common. Unfortunately, using this definition gives you a false sense of security, as I’ll show you next.
What Is the True Error Rate?
The difference between the correct and incorrect interpretation is not just a matter of wording. There is a fundamental difference in the amount of evidence against the null hypothesis that each definition implies.
The P value for our medication study is 0.03. If you interpret that P value as a 3% chance of making a mistake by rejecting the null hypothesis, you’d feel like you’re on pretty safe ground. However, after reading this post, you should realize that P values are not an error rate, and you can’t interpret them this way.
If the P value is not the error rate for our study, what is the error rate? Hint: It’s higher!
As I explained earlier, you can’t directly calculate an error rate based on a P value, at least not using the frequentist approach that produces P values. However, you can estimate error rates associated with P values by using the Bayesian approach and simulation studies.
Sellke et al.* have done this. While the exact error rate varies based on different assumptions, the values below use run-of-the-mill assumptions.
| P value | Probability of rejecting a true null hypothesis |
|---------|-------------------------------------------------|
| 0.05    | At least 23% (and typically close to 50%)       |
| 0.01    | At least 7% (and typically close to 15%)        |
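A simulation can build intuition for why these rates exceed the P value itself. The sketch below is my own illustration, not Sellke et al.'s exact method, and its assumptions are invented for the example: half of all tested hypotheses are truly null, and non-null effects have a fixed standardized size of 0.5. It then looks only at studies whose result landed near p = 0.05 and asks what share of those had a true null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
studies, n = 200_000, 30
null_true = rng.random(studies) < 0.5     # assume 50% of hypotheses are truly null
effect = np.where(null_true, 0.0, 0.5)    # assumed standardized effect when not null

# One z-statistic per study: with n standard-normal observations per study,
# the standardized sample mean is N(effect * sqrt(n), 1).
z = rng.normal(effect * np.sqrt(n), 1.0)
p = 2 * stats.norm.sf(np.abs(z))          # two-sided p-values

# Among studies whose p-value landed near 0.05, how many nulls were true?
near_005 = (p > 0.04) & (p <= 0.05)
false_positive_rate = null_true[near_005].mean()
print(f"Share of true nulls among results with p ~ 0.05: {false_positive_rate:.2f}")
```

Under these made-up assumptions, roughly a quarter of the results sitting near p = 0.05 come from true nulls, far more than the 5% the naive "error rate" reading suggests. Changing the share of true nulls or the effect size moves the number, which is why the table reports ranges rather than a single value.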
These higher error rates probably surprise you! Regrettably, the common misconception that P values are the error rate produces the false impression of considerably more evidence against the null hypothesis than is warranted. A single study with a P value around 0.05 does not provide substantial evidence that the sample effect exists in the population. For more information about how these false positive rates are calculated, read my post about P-values, Error Rates, and False Positives.
These estimated error rates emphasize the need to have lower P values and replicate studies that confirm the initial results before you can safely conclude that an effect exists at the population level. Additionally, studies with smaller P values have higher reproducibility rates in follow-up studies. Learn about the Types of Errors in Hypothesis Testing.
Now that you know how to interpret P values correctly, check out my Five P Value Tips to Avoid Being Fooled by False Positives and Other Misleading Results!
Typically, you’re hoping for low p-values, but even high p-values have benefits!
*Thomas Sellke, M. J. Bayarri, and James O. Berger, “Calibration of p Values for Testing Precise Null Hypotheses,” The American Statistician, February 2001, Vol. 55, No. 1.