## What is the Bonferroni Correction?

The Bonferroni correction adjusts your significance level to control the overall probability of a Type I error (false positive) for multiple hypothesis tests.

Running several statistical tests during one study can be tricky. The problem lies in the alpha value, also known as the significance level. An alpha value works well for one hypothesis test but becomes problematic for multiple tests by allowing too many Type I errors. These errors occur when a test leads you to reject the null hypothesis when it is actually correct, producing a false positive.

While the Type I error rate equals the significance level for a single test, it grows as you increase the number of tests. Multiple hypothesis tests in a study are a “family.” The family-wise error rate (FWER) is the probability of making at least one Type I error among *all* the hypothesis tests when performing multiple tests.

The Bonferroni correction adjusts your significance level for these additional tests. Its goal is to keep the family-wise error rate (FWER) at or below your original significance level (e.g., 0.05). By controlling the FWER, the Bonferroni correction helps prevent those erroneous “significant” findings.

Learn more about Type I and Type II Errors and the Significance Level.

## How to Calculate the Bonferroni Correction

The Bonferroni correction counteracts the family-wise error rate problem by adjusting the alpha value based on the number of tests.

To find your adjusted significance level, divide the significance level (α) for a single test by the number of tests (n).

Bonferroni correction = α / n

For example, if your original, single test alpha is 0.05 and you have a set of five hypothesis tests, your adjusted significance level is 0.05 / 5 = 0.01. Your results are statistically significant when your p-value is less than or equal to the adjusted significance level.

This adjusted alpha value helps control the overall rate of false positives, ensuring your findings are more reliable.
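As a quick illustration, here is the adjustment from the example above computed directly. The p-values are made up purely for demonstration:

```python
# Bonferroni-adjusted significance level, using the article's example:
# a single-test alpha of 0.05 and a family of five hypothesis tests.
alpha = 0.05
n_tests = 5
adjusted_alpha = alpha / n_tests  # 0.05 / 5 = 0.01

# Hypothetical p-values for the five tests (made up for illustration).
p_values = [0.002, 0.008, 0.02, 0.04, 0.30]

# A result is significant only if its p-value is at or below the
# adjusted level, not the original 0.05.
significant = [p <= adjusted_alpha for p in p_values]
# Only the first two tests remain significant after the correction.
```

Notice that 0.02 and 0.04 would have been "significant" at the unadjusted 0.05 level but fail at the corrected 0.01 threshold.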

## Scenarios for Using the Bonferroni Correction

Hopefully, you understand why you’d use the Bonferroni correction. Now let’s look at some of the scenarios for using it.

**Comparing group means**: You might run a series of t-tests to compare group means. For example, if your experiment has six groups, comparing all their means involves 15 t-tests.

**Studying relationships between multiple variables**: Correlating many variables produces a matrix of pairwise correlations, each with a p-value. For example, correlating one set of 3 variables with another set of 4 produces 12 correlations.

**Examining more than one endpoint in a clinical trial**: An endpoint is an outcome that the experiment evaluates statistically. A clinical trial can assess multiple measures, such as quality of life, symptoms, overall survival, and response rate. Each endpoint has its own hypothesis test.

These examples show you how it is surprisingly easy for a single study to conduct many hypothesis tests.
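The counts from these scenarios are easy to verify (Python shown for illustration):

```python
from math import comb

# Pairwise comparisons of g group means require C(g, 2) t-tests.
groups = 6
n_t_tests = comb(groups, 2)  # 15 t-tests for 6 groups

# Correlating one set of 3 variables against another set of 4
# yields one correlation (and one p-value) per pair.
n_correlations = 3 * 4  # 12 correlations
```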

## Why Use The Bonferroni Correction?

The Bonferroni Correction adjusts for the number of hypothesis tests in your study. But is the number of tests a problem that analysts really need to address? Let’s find out!

The probability of a Type I error for a single test equals the significance level. For example, with a significance level of 0.05, a single test has a 5% probability of a false positive.

However, the error rate for a family of tests is always higher than for an individual test. As the number of hypothesis tests increases, the chance that at least one is a false positive grows. With enough tests, they’re practically guaranteed to produce a Type I error—unless you use the Bonferroni correction.

The table below shows how increasing the number of hypothesis tests causes the family-wise error rate to increase.

| Number of Tests | Family-Wise Error Rate (α = 0.05) |
|-----------------|-----------------------------------|
| 1               | 0.05                              |
| 5               | 0.23                              |
| 10              | 0.40                              |
| 15              | 0.54                              |
| 20              | 0.64                              |

The family-wise error rate formula is 1 – (1 – α)^C. Alpha is your significance level for a single test, and C is the number of tests.
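Plugging that formula into a short script shows how quickly the FWER climbs as the family of tests grows:

```python
def family_wise_error_rate(alpha, n_tests):
    """FWER for n independent tests, each at significance level alpha:
    1 - (1 - alpha)^C, where C is the number of tests."""
    return 1 - (1 - alpha) ** n_tests

# Error rates climb quickly as the family grows (alpha = 0.05).
rates = {c: round(family_wise_error_rate(0.05, c), 3)
         for c in (1, 5, 10, 15, 20)}
# rates -> {1: 0.05, 5: 0.226, 10: 0.401, 15: 0.537, 20: 0.642}
```

Note that this calculation assumes independent tests; correlated tests inflate the error rate less dramatically.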

So why use the Bonferroni correction? To control the number of false positives.

Let’s say you compare the means of six groups corresponding to 15 hypothesis tests. Your significance level is 0.05. Without the Bonferroni correction, there’s a 54% chance that at least one of your test results is a false positive. Startling, isn’t it?

When your family-wise error rate is too high, you can’t trust the significant results you find! The false positives become such a problem that not controlling the error rate is a form of p-hacking. Learn more about What is P-Hacking: Methods & Best Practices.

However, using the Bonferroni correction, your family-wise error rate is held at or below the original individual significance level. In our example, with the adjustment, our 15 hypothesis tests collectively have an FWER of no more than 0.05.
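You can check this with the same FWER formula. For 15 independent tests, each run at the Bonferroni-adjusted level of 0.05 / 15, the family-wise rate lands just under the original 0.05:

```python
# Each of the 15 tests uses alpha / n as its significance level.
alpha = 0.05
n_tests = 15
adjusted_alpha = alpha / n_tests  # about 0.0033 per test

# FWER for 15 independent tests at the adjusted level:
fwer = 1 - (1 - adjusted_alpha) ** n_tests
# fwer is roughly 0.049, just under the 0.05 target.
```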

## Controversy Over Error Rate Corrections

This safety net, however, comes with a catch. The Bonferroni correction works by reducing statistical power, which is the ability to detect effects when they exist. By its very nature, the adjustment makes finding genuine effects less likely, increasing the chance of false negatives.

Additionally, defining a ‘family’ of hypotheses can be challenging and might influence your adjusted test results. The number of tests included in this family could sway the outcomes.

However, it’s essential to note that these issues are common in FWER control methods and are not unique to the Bonferroni correction.

In short, you must evaluate the problems associated with type I vs. type II errors for a particular study. Are false positives or false negatives more problematic? The answer varies with each research project.

Finally, if you determine that a family-wise error rate correction is appropriate, there is the statistical question of which control method is best. The Bonferroni correction is just one of several methods. Statisticians recommend using it when you have a smaller number of hypothesis tests that are not correlated. However, if you have many tests and/or they are correlated, the Bonferroni correction reduces statistical power too much. It is too conservative.

There are other methods to consider. To learn about some of them, read my article about Post Hoc Tests.

So, there you have it! The Bonferroni correction is a simple, effective way to manage the risks associated with multiple comparisons. It keeps your family-wise error rate in check and your results robust.

## Reference

Dunn, Olive Jean. “Multiple Comparisons Among Means.” *Journal of the American Statistical Association*, vol. 56, no. 293, 1961, pp. 52–64.

Ferraz Gonçalves says

Hi Jim,

The Bonferroni test is too conservative because it can eliminate variables, creating false negatives. Would it be correct, before applying the Bonferroni correction, to first screen the variables using p < 0.05? This would reduce the number of variables included in the correction and, therefore, would reduce the number of false negatives.

Can you give me your opinion on this?

Thank you,

Ferraz Gonçalves

Jim Frost says

Hi Ferraz,

That’s a great question that I had to give some thought.

By reducing the number of variables, you reduce the number of comparisons, which leads to a less severe adjustment. Apparently, some analysts use this method, although I have not.

The idea behind the approach sounds intuitive. But there are some potential problems. One is that it is potentially a form of p-hacking. If a researcher were to look at the full results based on the full set of variables and then based on the results decide to use this method, it can be a form of cherry-picking. It would be important for the researcher to prespecify the exact process before looking at any results to help minimize cherry-picking.

Additionally, the Bonferroni correction is designed to reduce the familywise error rate for the full set of comparisons. Filtering out variables based on their p-values can introduce bias into that process.

So, in a nutshell, the proposed method can reduce the conservativeness but potentially introduces bias due to p-hacking and applying the process to a pre-filtered subset.

Instead, you might use a procedure like the Benjamini-Hochberg procedure that provides a good balance between identifying true effects and controlling for false positives. In other words, avoid using an overly conservative approach (e.g., the Bonferroni correction) to begin with, particularly when you’re tempted to tinker with it. Start by finding a more appropriate method.
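For readers curious how Benjamini-Hochberg differs mechanically from Bonferroni, here is a minimal sketch of the step-up procedure (the p-values in the comments are invented for illustration):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans indicating which hypotheses are
    rejected while controlling the false discovery rate at level q
    (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    # Sort p-values ascending, remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    threshold_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= (rank / m) * q:
            threshold_rank = rank
    # Reject every hypothesis up to and including that rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= threshold_rank
    return rejected

# With made-up p-values [0.01, 0.02, 0.03, 0.20], BH rejects the
# first three, while Bonferroni (0.05 / 4 = 0.0125) rejects only
# the first.
bh_result = benjamini_hochberg([0.01, 0.02, 0.03, 0.20])
```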

Alvaro says

Hi Jim,

Thanks for your post.

I’ve read in some papers that when an investigator performs an interim analysis (taking a look at the data) before recruitment is completely done, that should be considered a multiple comparison, and the final p-value (with all expected data) should be adjusted; the Bonferroni correction is usually cited for this.

But my question is, is that really necessary, taking into account that the investigator just took a look at the data and did not make any change to the original research protocol?

Thanks a lot for your help.

Greetings

Jim Frost says

Hi Alvaro,

There are several issues here.

The first is the one that you raise. To answer your question, yes, I think it’s generally a good idea to use a correction when you take an interim look at the data. That interim look involves a hypothesis test. Then you perform another hypothesis test at the end. Hence, you have multiple hypothesis tests, which can inflate the familywise error rate. That’s precisely what the Bonferroni correction is designed for.

However, you don’t necessarily need to use the Bonferroni correction. There are others available that might be more powerful, such as O’Brien-Fleming or Pocock boundaries.

There are other issues loaded in your question though. Data peeking (aka interim analyses) can be a form of p-hacking. Even if the investigator merely “took a look” at the data without making any changes to the original research protocol, there’s still a potential risk. The mere act of observing the data can introduce bias, consciously or unconsciously, in subsequent decisions or interpretations. For instance, if an investigator sees promising results in an interim analysis, they might, even unintentionally, be more optimistic in their subsequent analysis or interpretation of the final data. Learn more about P-hacking (including data peeking).

Ask yourself this: if the investigator is truly not making any changes to the protocol, why take an early look at the data? There would not seem to be any reason for the early look in that case. Hence, they should just perform the experiment as designed, analyze the data at the end, and not “peek.” That approach will help maintain the integrity of their results.

Collin says

Greetings Jim!

Under what circumstances do we use the Holm adjustment method instead of the Bonferroni correction?

Thank you!

Jim Frost says

Hi Collin,

In general, if you are conducting a small number of tests and want to be conservative in your approach, the Bonferroni correction might be suitable.

If you are concerned about the loss of power with the Bonferroni correction and are conducting a moderate to large number of tests, the Holm adjustment can be a better alternative. It provides a balance between controlling for Type I errors while not being overly conservative.
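To make the difference concrete, here is an illustrative sketch of the Holm step-down procedure, which compares the sorted p-values to progressively more lenient thresholds (the example p-values are invented):

```python
def holm_rejections(p_values, alpha=0.05):
    """Holm step-down: compare the i-th smallest p-value (1-based)
    to alpha / (m - i + 1) and stop at the first failure."""
    m = len(p_values)
    # Sort p-values ascending, remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / (m - step + 1):
            rejected[idx] = True
        else:
            break  # step-down: once a test fails, stop rejecting
    return rejected

# With made-up p-values [0.01, 0.015, 0.04], Holm rejects all three
# (thresholds 0.0167, 0.025, 0.05), while plain Bonferroni
# (alpha / 3 = 0.0167 for every test) would fail to reject 0.04.
holm_result = holm_rejections([0.01, 0.015, 0.04])
```

Holm always rejects at least as many hypotheses as Bonferroni while still controlling the FWER, which is why it recovers some of the lost power.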

gwr says

Jim, I’m running multiple correlations between different second-order factors of IQ, which are, of course, known to be highly related. (I’m actually exploring the patterns of heteroscedasticity in higher-ability populations, rather than the Pearson correlation, but that is a whole other question and not what I’m asking about here.) One of my supervisors found a correction for running multiple correlations, as I’d asked what I should do if I’m pulled up for multiple comparisons, which I understand does something conceptually similar to a Bonferroni correction. However, since the factors are known to be highly related, according to your post, this step may be redundant, and it certainly sounds as if it would make any analysis too conservative. All I need to argue this is the best reference to cite to back up my point.

CAN JUSTIN KIESSLING (PhD) says

Great summary