P values determine whether your hypothesis test results are statistically significant. Statisticians use them all over the place. You’ll find P values in t-tests, distribution tests, ANOVA, and regression analysis. P values have become so important that they’ve taken on a life of their own. They can determine which studies are published, which projects receive funding, and which university faculty members become tenured!
Ironically, despite being so influential, P values are misinterpreted very frequently. What is the correct interpretation of P values? What do P values really mean? That’s the topic of this post!
P values are a slippery concept. Don’t worry. I’ll explain P values using an intuitive, concept-based approach so you can avoid a very common misinterpretation that can cause you serious problems.
What Is the Null Hypothesis?
P values are directly connected to the null hypothesis. So, we need to cover that first!
In all hypothesis tests, the researchers are testing an effect of some sort. The effect can be the effectiveness of a new vaccination, the durability of a new product, and so on. There is some benefit or difference that the researchers hope to identify.
However, it’s possible that there really is no effect or no difference between the experimental groups. In statistics, we call this lack of an effect the null hypothesis. When you assess the results of a hypothesis test, you can think of the null hypothesis as the devil’s advocate position, or the position you take for the sake of argument.
To understand this idea, imagine a hypothetical study of a medication that we know is completely useless. In other words, the null hypothesis is true. There is no difference at the population level between subjects who take the medication and subjects who don’t.
Even though the null hypothesis is correct, we’ll most likely see an effect in the sample because of random sample error. It’s an incredibly rare occurrence for the sample data to exactly match the actual population value. Therefore, the position you take for the sake of argument (devil’s advocate) is that random sample error produces the observed sample effect rather than it being a true effect.
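To make this concrete, here is a minimal Python sketch of one study of the useless medication. The population mean, standard deviation, and group size are hypothetical values chosen only for illustration. Both groups are drawn from the same population, so any difference between them is random sample error.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: the medication does nothing, so both groups
# are drawn from exactly the same distribution (the null is true).
POP_MEAN, POP_SD, GROUP_SIZE = 100.0, 15.0, 30

treated = [random.gauss(POP_MEAN, POP_SD) for _ in range(GROUP_SIZE)]
control = [random.gauss(POP_MEAN, POP_SD) for _ in range(GROUP_SIZE)]

# The population effect is zero, yet the sample effect is not.
sample_effect = statistics.mean(treated) - statistics.mean(control)
print("True population effect: 0.0")
print(f"Observed sample effect: {sample_effect:.2f}")
```

Change the seed and the observed effect changes too, but it essentially never lands on exactly zero. That nonzero sample effect is what the devil’s advocate attributes to chance.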
What Are P values?
P values indicate how well your sample data fit the devil’s advocate case that the null hypothesis is true. They gauge how consistent your sample statistics are with the null hypothesis. Specifically, if the null hypothesis is correct, what is the probability of obtaining an effect at least as large as the one in your sample?
- High P-values: Your sample results are consistent with a null hypothesis that is true.
- Low P-values: Your sample results are not consistent with a null hypothesis that is true.
If your P value is small enough, you can conclude that your sample is so incompatible with the null hypothesis that you can reject the null for the entire population. P-values are an integral part of inferential statistics because they help you use your sample to draw conclusions about a population.
How Do You Interpret P values?
Here is the technical definition of P values:
P values are the probability of observing a sample statistic that is at least as extreme as your sample statistic when you assume that the null hypothesis is true.
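One way to see this definition in action is a permutation test, sketched below in Python. This is an illustration under stated assumptions, not the method any particular statistical package uses: the function name `permutation_p_value` and the two small data sets are made up for the example. Shuffling the group labels mimics a world where the null hypothesis is true, and the P value is the share of shuffles that produce a difference at least as extreme as the observed one.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided P value for a difference in means, via label shuffling."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under the null, group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical outcome scores, invented for this example.
treated = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 5.4, 5.7]
control = [4.9, 5.0, 4.6, 5.2, 4.7, 5.1, 4.5, 5.3]

p = permutation_p_value(treated, control)
print(f"Permutation P value: {p:.4f}")
```

Notice that the calculation never asks whether the null hypothesis is true. It assumes the null and then asks how unusual the observed sample would be.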
Let’s go back to our hypothetical medication study. Suppose the hypothesis test generates a P value of 0.03. You’d interpret this P-value as follows:
If the medicine has no effect in the population as a whole, 3% of studies will obtain the effect observed in your sample, or larger, because of random sample error.
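You can check this reading with a quick simulation. The Python sketch below assumes, for simplicity, a two-sided z-test, so each study boils down to one standard normal test statistic when the null is true. The number of simulated studies is an arbitrary choice.

```python
import random
from statistics import NormalDist

random.seed(42)  # fixed seed so the sketch is repeatable

p_observed = 0.03
# Size of the test statistic that corresponds to a two-sided P value of 0.03.
z_observed = NormalDist().inv_cdf(1 - p_observed / 2)

# Simulate many studies of a medication with no effect (null is true).
n_studies = 200_000
as_extreme = sum(
    abs(random.gauss(0.0, 1.0)) >= z_observed for _ in range(n_studies)
)
share = as_extreme / n_studies
print(f"Share of null studies at least as extreme: {share:.4f}")
```

The share comes out close to 0.03. That is all the P value says: how often a no-effect world produces results like yours.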
How probable are your sample data if the null hypothesis is correct? That’s the only question that P values answer. This restriction segues to a very frequent and problematic misinterpretation.
Related post: Understanding P values can be easier using a graphical approach: How Hypothesis Tests Work: Significance Levels and P-values.
P values Are NOT an Error Rate
Unfortunately, P values are frequently misinterpreted. A common mistake is the notion that they represent the likelihood of rejecting a null hypothesis that is actually true (Type I error). The idea that P values are the probability of making a mistake is WRONG! You can read a blog post I wrote to learn why P values are misinterpreted so frequently.
You can’t use P values to directly calculate the error rate for several reasons.
First, P value calculations assume that the null hypothesis is correct. Thus, from the P value’s point of view, the null hypothesis is 100% true. Remember, P values assume that the null is true and the observed sample effect is caused by random sample error.
Second, P values tell you how consistent your sample data are with a null hypothesis that is true. However, when your data are very inconsistent with the null hypothesis, P values can’t determine which of the following two possibilities is more probable:
- The null hypothesis is true, but your sample is unusual due to random sample error.
- The null hypothesis is false.
To judge which option is more plausible, you must apply expert knowledge of the study area and, very importantly, assess the results of similar studies.
Going back to our medication study, let’s highlight the correct and incorrect way to interpret the P value of 0.03:
- Correct: Assuming the medication has zero effect in the population, you’d obtain the sample effect, or larger, in 3% of studies because of random sample error.
- Incorrect: There’s a 3% chance of making a mistake by rejecting the null hypothesis.
Yes, I realize that the incorrect definition seems more straightforward, and that’s probably why it is so common. But using this definition gives you a false sense of security, as I’ll show you next.
What Is the True Error Rate?
The difference between the correct and incorrect interpretation is not just a matter of wording. There is a fundamental difference in the amount of evidence against the null hypothesis that each definition implies.
The P value for our medication study is 0.03. If you interpret that P value as a 3% chance of making a mistake by rejecting the null hypothesis, you’d feel like you’re on pretty safe ground. However, after reading this post, you should realize that P values are not an error rate and you can’t interpret them this way.
So, if the P value is not the error rate for our study, what is the error rate? Hint: It’s higher!
As I explained earlier, you can’t directly calculate an error rate based on a P value, at least not using the frequentist approach that produces P values. However, you can estimate error rates associated with P values by using the Bayesian approach and simulation studies.
Sellke et al.* have done this. While the exact error rate varies based on different assumptions, the values below use run-of-the-mill assumptions.
| P value | Probability of rejecting a null hypothesis that is true |
|---|---|
| 0.05 | At least 23% (and typically close to 50%) |
| 0.01 | At least 7% (and typically close to 15%) |
These higher error rates probably surprise you! Regrettably, the common misconception that P values are the error rate produces the false impression of considerably more evidence against the null hypothesis than is warranted. A single study with a P value around 0.05 does not provide substantial evidence that the sample effect actually exists in the population.
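A small simulation shows why the error rate runs so much higher than the P value. The Python sketch below is not a reproduction of the Sellke et al. analysis. It rests on my own illustrative assumptions: half of all studies test a true null, real effects are drawn from a normal distribution with a standard deviation of 2 standard errors, and every study uses a two-sided z-test. It then looks only at studies whose P value lands between 0.04 and 0.05 and asks how many of them had a true null.

```python
import math
import random

random.seed(7)  # fixed seed so the sketch is repeatable

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Illustrative assumptions, not values taken from the cited paper.
N_STUDIES = 400_000
PRIOR_NULL_TRUE = 0.5   # half of all studies test a useless treatment
EFFECT_SD = 2.0         # spread of real effects, in standard-error units

null_hits = 0  # studies with P near 0.05 where the null was true
real_hits = 0  # studies with P near 0.05 where a real effect existed
for _ in range(N_STUDIES):
    null_is_true = random.random() < PRIOR_NULL_TRUE
    true_effect = 0.0 if null_is_true else random.gauss(0.0, EFFECT_SD)
    z = random.gauss(true_effect, 1.0)
    if 0.04 <= two_sided_p(z) <= 0.05:
        if null_is_true:
            null_hits += 1
        else:
            real_hits += 1

share_null = null_hits / (null_hits + real_hits)
print(f"Share of P ~ 0.05 results with a true null: {share_null:.2f}")
```

Under these assumptions, the share of true nulls among results with P values near 0.05 is far above 5%. Different assumptions about how common and how large real effects are will move the number, which is why the table gives a range rather than a single value.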
These estimated error rates emphasize the need for lower P values and for replication studies that confirm the initial results before you can safely conclude that an effect exists at the population level. In fact, studies with lower P values have higher reproducibility rates in follow-up studies. Learn about the Types of Errors in Hypothesis Testing.
Now that you know how to interpret P values correctly, check out my Five P Value Tips to Avoid Being Fooled by False Positives and Other Misleading Results!
Typically, you’re hoping for low p-values, but even high p-values have benefits!
*Thomas Sellke, M. J. Bayarri, and James O. Berger, Calibration of p-values for Testing Precise Null Hypotheses, The American Statistician, February 2001, Vol. 55, No. 1.