What is the Bonferroni Correction?
Running several statistical tests during one study can be tricky. The problem lies in the alpha value, also known as the significance level. An alpha value works well for one hypothesis test but becomes problematic for multiple tests by allowing too many type I errors. These errors occur when a test leads you to reject the null hypothesis when it is actually correct—a false positive.
While the Type I error rate equals the significance level for a single test, it grows as you increase the number of tests. Statisticians call the set of hypothesis tests in a study a “family.” The family-wise error rate (FWER) is the probability of making at least one Type I error across all the tests in that family.
The Bonferroni correction adjusts your significance level for these additional tests. Its goal is to reduce the family-wise error rate (FWER) so that it does not exceed your original significance level (e.g., 0.05). By controlling the FWER, the Bonferroni correction helps prevent those erroneous “significant” findings.
How to Calculate the Bonferroni Correction
The Bonferroni correction counteracts the family-wise error rate problem by adjusting the alpha value based on the number of tests.
To find your adjusted significance level, divide the significance level (α) for a single test by the number of tests (n).
Bonferroni correction = α / n
For example, if your original, single-test alpha is 0.05 and you have a set of five hypothesis tests, your adjusted significance level is 0.05 / 5 = 0.01. Each result is statistically significant when its p-value is less than or equal to the adjusted significance level.
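The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `bonferroni_alpha` and the five p-values are made up for the example.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Return the Bonferroni-adjusted significance level, alpha / n."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical p-values from five hypothesis tests (invented for illustration).
p_values = [0.003, 0.012, 0.008, 0.049, 0.200]

adjusted_alpha = bonferroni_alpha(0.05, len(p_values))

# A result is significant when its p-value is <= the adjusted level.
significant = [p <= adjusted_alpha for p in p_values]

print(f"Adjusted significance level: {adjusted_alpha:.4f}")
for p, is_sig in zip(p_values, significant):
    label = "significant" if is_sig else "not significant"
    print(f"p = {p:.3f} -> {label}")
```

Notice that p-values of 0.012 and 0.049 would pass an unadjusted 0.05 threshold but fail the adjusted one.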
This adjusted alpha value helps control the overall rate of false positives, ensuring your findings are more reliable.
Scenarios for Using the Bonferroni Correction
The Bonferroni correction applies whenever a single study runs multiple hypothesis tests. Let’s look at some common scenarios.
Comparing group means: You might run a series of t-tests to compare group means. For example, if your experiment has six groups, comparing all their means involves 15 t-tests.
Studying relationships between multiple variables: A correlation matrix contains many pairwise correlations, each with its own p-value. For example, correlating three variables with four others (a 3 × 4 matrix) produces 12 correlations.
Examining more than one endpoint in a clinical trial: An endpoint is an outcome that the experiment evaluates statistically. A clinical trial can assess multiple measures, such as quality of life, symptoms, overall survival, and response rate. Each endpoint has its own hypothesis test.
These examples show you how it is surprisingly easy for a single study to conduct many hypothesis tests.
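To see how quickly the test count grows when you compare every pair of group means, here is a small sketch. The helper name `pairwise_comparisons` is hypothetical; it simply counts the ways to choose 2 groups out of k.

```python
from math import comb


def pairwise_comparisons(n_groups: int) -> int:
    """Number of distinct pairs of groups, i.e. k * (k - 1) / 2."""
    return comb(n_groups, 2)


# The six-group experiment from the text, plus a few other sizes.
for k in (3, 4, 6, 10):
    print(f"{k} groups -> {pairwise_comparisons(k)} pairwise t-tests")
```

Six groups give the 15 t-tests mentioned above, and the count keeps climbing roughly with the square of the number of groups.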
Why Use the Bonferroni Correction?
The Bonferroni correction adjusts for the number of hypothesis tests in your study. But is the number of tests a problem that analysts really need to address? Let’s find out!
The probability of a Type I error for a single test equals the significance level. For example, with a significance level of 0.05, a single test has a 5% probability of a false positive.
However, the error rate for a family of tests is always higher than for an individual test. As the number of hypothesis tests increases, the chance that at least one is a false positive grows. With enough tests, they’re practically guaranteed to produce a Type I error—unless you use the Bonferroni correction.
The table below shows how increasing the number of hypothesis tests causes the family-wise error rate to increase.
For independent tests, the family-wise error rate formula is 1 – (1 – α)^C, where α is your significance level for a single test and C is the number of tests.
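The formula is easy to evaluate yourself. The sketch below assumes independent tests and a single-test significance level of 0.05; the function name `family_wise_error_rate` and the list of test counts are chosen only for illustration.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """FWER for independent tests: 1 - (1 - alpha) ** C."""
    return 1 - (1 - alpha) ** n_tests


alpha = 0.05

# Illustrative numbers of tests; any positive integers would work.
for c in (1, 2, 5, 10, 15, 20, 50):
    fwer = family_wise_error_rate(alpha, c)
    print(f"{c:>3} tests -> FWER = {fwer:.3f}")
```

With one test, the FWER is just the significance level. Every additional test pushes it higher.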
So why use the Bonferroni correction? To control the number of false positives.
Let’s say you compare the means of six groups corresponding to 15 hypothesis tests. Your significance level is 0.05. Without the Bonferroni correction, there’s a 54% chance that at least one of your test results is a false positive. Startling, isn’t it?
When your family-wise error rate is too high, you can’t trust the significant results you find! The false positives become such a problem that not controlling the error rate is a form of p-hacking. Learn more about What is P-Hacking: Methods & Best Practices.
However, using the Bonferroni correction, your family-wise error rate stays at or just below the original individual significance level. In our example, our 15 hypothesis tests will collectively have a FWER of no more than 0.05 if we use the adjustment.
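You can check this for the 15-test example by plugging the adjusted significance level into the FWER formula. This is a sketch that assumes the tests are independent; the variable names are illustrative.

```python
alpha = 0.05
n_tests = 15

# Bonferroni-adjusted significance level for each individual test.
adjusted_alpha = alpha / n_tests

# FWER for independent tests: 1 - (1 - per-test alpha) ** C.
fwer_unadjusted = 1 - (1 - alpha) ** n_tests
fwer_adjusted = 1 - (1 - adjusted_alpha) ** n_tests

print(f"Per-test alpha with Bonferroni: {adjusted_alpha:.5f}")
print(f"FWER without correction:        {fwer_unadjusted:.3f}")
print(f"FWER with correction:           {fwer_adjusted:.4f}")
```

The corrected FWER lands slightly under 0.05, which is why the Bonferroni correction is described as controlling the error rate rather than hitting the target exactly.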
Controversy Over Error Rate Corrections
This safety net, however, comes with a catch. The Bonferroni correction reduces statistical power, which is the ability to detect effects when they exist. Because each test must clear a stricter significance level, the adjustment makes finding genuine effects less likely, increasing the chance of false negatives.
Defining a “family” of hypotheses can also be challenging. The adjusted significance level depends on how many tests you include in the family, so that choice can change which results count as significant.
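To get a feel for the power loss, here is a rough sketch using a two-sided z-test. Everything specific in it is an assumption for illustration: the function name `z_test_power`, the choice of a z-test, and a true effect that sits 2.8 standard errors from the null value (picked because it gives about 80% power at an alpha of 0.05).

```python
from statistics import NormalDist


def z_test_power(alpha: float, shift: float) -> float:
    """Power of a two-sided z-test when the true effect is `shift` standard errors."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls beyond either critical value.
    upper_tail = 1 - std_normal.cdf(critical - shift)
    lower_tail = std_normal.cdf(-critical - shift)
    return upper_tail + lower_tail


shift = 2.8  # hypothetical true effect, in standard-error units
power_unadjusted = z_test_power(0.05, shift)
power_adjusted = z_test_power(0.05 / 15, shift)  # Bonferroni with 15 tests

print(f"Power at alpha = 0.05:      {power_unadjusted:.2f}")
print(f"Power at alpha = 0.05 / 15: {power_adjusted:.2f}")
```

Under these assumptions, the same true effect becomes noticeably harder to detect once the significance level is divided by 15.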
However, it’s essential to note that these issues are common in FWER control methods and are not unique to the Bonferroni correction.
In short, you must evaluate the problems associated with type I vs. type II errors for a particular study. Are false positives or false negatives more problematic? The answer varies with each research project.
Finally, if you determine that a family-wise error rate correction is appropriate, there is the statistical question of which control method is best. The Bonferroni correction is just one of several methods. Statisticians recommend using it when you have a smaller number of hypothesis tests that are not correlated. However, if you have many tests and/or they are correlated, the Bonferroni correction reduces statistical power too much. It is too conservative.
There are other methods to consider. To learn about some of them, read my article about Post Hoc Tests.
So, there you have it! The Bonferroni correction is a simple, effective way to manage the risks associated with multiple comparisons. It keeps your family-wise error rate in check and your results robust.