Can high p-values be helpful? What do high p-values mean?
Typically, when you perform a hypothesis test, you want to obtain low p-values that are statistically significant. Low p-values are sexy. They represent exciting findings and can help you get articles published.
However, you might be surprised to learn that higher p-values, the ones that are not statistically significant, are also valuable. In this post, I’ll show you the potential value of a p-value that is greater than 0.05, or whatever significance level you’re using.
The Role of Hypothesis Testing and P-Values
I’ve written about hypothesis testing and interpreting p-values in many other blog posts. I’ll summarize them for this blog post, but please read the related posts for more details.
Hypothesis testing is a form of inferential statistics. You want to use your sample data to draw conclusions about the entire population. When you collect a random sample, you might observe an effect within the sample, such as a difference between group means. But, does that effect exist in the population? Or, is it just random error in the sample?
For example, suppose you’re comparing two teaching methods and want to determine whether one method produces higher mean test scores. In your sample data, you see that the mean for Method A is greater than the mean for Method B. However, random samples contain random error, which makes your sample means very unlikely to equal the population means precisely. Unfortunately, the difference between the sample means of two teaching methods can represent either an effect that exists in the population or random error in your sample.
This point is where p-values and significance levels come in. Typically, you want p-values that are less than your significance level (e.g., 0.05) because they indicate that your sample evidence is strong enough to conclude that Method A is better than Method B for the entire population. Teaching method appears to have a real effect. Exciting stuff!
Higher P-Values and Their Importance
However, for this post, I’ll go in the opposite direction and try to help you appreciate higher, nonsignificant p-values! These are cases where you cannot conclude that an effect exists in the population. For the teaching method example above, a higher p-value indicates that we have insufficient evidence to conclude that one teaching method is better than the other.
Let’s graphically illustrate three different hypothetical studies about teaching methods in the plots below. Which of the following three studies have statistically significant results? The difference between the two groups is the effect size for each study. Here’s the CSV data file: studies.
All three studies appear to have differences between their sample means. However, even if the population means are exactly equal, the sample means are unlikely to be equal. We need to separate the signal (real differences) from the noise (random error). That’s where hypothesis tests play a role.
The table displays the p-values from the 2-sample t-tests for the three studies.
Surprise! Only the study with the smallest difference between means is statistically significant!
The key takeaway here is that you can use graphs to illustrate experimental results, but you must use hypothesis tests to draw conclusions about effects in the population. Don’t jump to conclusions because the patterns in your graph might represent random error!
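To make the idea of testing a difference between two groups concrete, here is a minimal Python sketch. It is not the 2-sample t-test used for the studies in this post. Instead, it swaps in a permutation test, which needs only the standard library and asks the same question: how often would shuffling the group labels at random produce a difference at least as large as the one observed? The scores, group sizes, and the function name `permutation_p_value` are all hypothetical and invented for illustration. They are not the data from the CSV file.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for the difference between group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled scores mimics "teaching method makes no difference".
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical test scores, generated only for illustration.
rng = random.Random(42)
method_a = [rng.gauss(78, 15) for _ in range(10)]
method_b = [rng.gauss(70, 15) for _ in range(10)]

observed_diff = statistics.fmean(method_a) - statistics.fmean(method_b)
p_value = permutation_p_value(method_a, method_b)
print(f"Observed difference in means: {observed_diff:.1f}")
print(f"Permutation p-value: {p_value:.3f}")
```

If the printed p-value is above your significance level, the observed gap is the kind of difference that random shuffling produces fairly often, so the sample does not provide enough evidence to conclude that an effect exists in the population.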
P-values Greater Than the Significance Level
A crucial point to remember is that the effect size that you see in the graphs is only one of several factors that influence statistical significance. These factors include the following:
- Effect size: Larger effect sizes are less likely to represent random error. However, by itself, the effect size is insufficient.
- Sample size: Larger sample sizes allow hypothesis tests to detect smaller effects.
- Variability: When your sample data are more variable, random sampling error is more likely to produce substantial differences between groups even when no effect exists in the population.
You can have a large effect size, but if your sample size is small and/or the variability in your sample is high, random error can produce large differences between the groups. High p-values help identify cases where random error is a likely culprit for differences between groups in your sample.
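The way sample size and variability drive chance differences can be sketched with a small simulation. In the Python code below, both groups are drawn from the same population, so any difference between the sample means is pure random error. The population mean of 70, the 5-point gap, the sample sizes, and the standard deviations are assumptions chosen for illustration, and `chance_of_big_gap` is a hypothetical helper, not part of any library.

```python
import random
import statistics


def chance_of_big_gap(n_per_group, sd, gap=5.0, trials=2000, seed=1):
    """Share of simulated studies whose sample means differ by at least `gap`
    when both groups come from the SAME population (no real effect)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(70, sd) for _ in range(n_per_group)]
        b = [rng.gauss(70, sd) for _ in range(n_per_group)]
        if abs(statistics.fmean(a) - statistics.fmean(b)) >= gap:
            hits += 1
    return hits / trials


# Compare small vs. large samples and high vs. low variability.
results = {}
for n, sd in [(10, 20), (10, 5), (100, 20), (100, 5)]:
    results[(n, sd)] = chance_of_big_gap(n, sd)
    print(f"n={n:>3} per group, sd={sd:>2}: "
          f"{results[(n, sd)]:.1%} of no-effect studies show a gap of 5+ points")
```

Under these assumptions, small and noisy samples should produce a 5-point gap by chance far more often than large, consistent samples do. That is exactly the situation where a high p-value warns you not to trust the difference you see.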
Studies one and two, which are not significant, show the protective function of high p-values in action. For these studies, the differences in the graphs above might be random error even though it looks like there is a real difference. It’s tempting to jump to conclusions and shout to the world that Method A is better, “Everyone, start teaching using Method A!”
However, the higher p-values for the first two studies indicate that our sample evidence is not strong enough to reject the notion that we’re observing random sampling error. If it is random error, Method A isn’t truly producing better results than Method B. Instead, the luck of the draw created a sample where subjects in the Method A group were, by chance, able to score higher for some reason other than teaching method, such as a greater inherent ability. In fact, if you performed the study again, it would not be surprising if the difference vanished or even went in the other direction!
What High P-Values Mean and Don’t Mean
One thing to note: a high p-value does not prove that your groups are equal or that there is no effect. High p-values indicate that your evidence is not strong enough to suggest an effect exists in the population. An effect might exist, but it’s possible that the effect size is too small, the sample size is too small, or there is too much variability for the hypothesis test to detect it.
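A quick simulation can show why a high p-value is not proof of no effect. In the sketch below, a real 5-point difference between the methods is built into the simulated populations, yet small studies usually fail to detect it. To stay within Python's standard library, the code uses a two-sided z-test that treats the standard deviation as known. That is a simplification and not the 2-sample t-test used elsewhere in this post. The true difference, standard deviation, sample sizes, and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics


def z_test_p_value(a, b, sd):
    """Two-sided p-value for a difference in means, treating `sd` as known."""
    standard_error = sd * math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / standard_error
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))


def share_significant(n_per_group, true_diff=5.0, sd=15.0, alpha=0.05,
                      trials=2000, seed=2):
    """Share of simulated studies that reach p < alpha when a real effect exists."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(trials):
        a = [rng.gauss(70 + true_diff, sd) for _ in range(n_per_group)]
        b = [rng.gauss(70, sd) for _ in range(n_per_group)]
        if z_test_p_value(a, b, sd) < alpha:
            significant += 1
    return significant / trials


power_small = share_significant(n_per_group=10)
power_large = share_significant(n_per_group=200)
print(f"n=10 per group:  {power_small:.1%} of studies detect the real effect")
print(f"n=200 per group: {power_large:.1%} of studies detect the real effect")
```

With only 10 subjects per group, most of these simulated studies should return a high p-value even though the effect is real. The high p-value reflects insufficient evidence, not the absence of an effect.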
While you might not like obtaining results that are not statistically significant, these results can stop you from jumping to conclusions and making decisions based on random noise in your data! High p-values help prevent costly mistakes. After all, if you base decisions on random error, you won’t gain the benefits you expect. This protection against jumping to conclusions applies to studies about teaching methods, medication effectiveness, product strength, and so on.
High p-values can be a valuable caution against making rash decisions or drawing conclusions based on differences that look important but might be random error!