Power in statistics is the probability that a hypothesis test can detect an effect in a sample when it exists in the population. It is the sensitivity of a hypothesis test. When an effect exists in the population, how likely is the test to detect it in your sample?
High statistical power occurs when a hypothesis test is likely to find an effect that exists in the population. A low-power test is unlikely to detect that effect.
For example, if statistical power is 80%, a hypothesis test has an 80% chance of detecting an effect that actually exists. Now imagine you're performing a study that has only 10% power. That's not good because the test has a 90% chance of missing the effect.
In this post, learn about statistical power, why it matters, how to increase it, and how to calculate it for a study.
Why Power in Statistics Matters
In all hypothesis tests, the researchers are testing an effect of some sort. It can be the effectiveness of a new medication, the strength of a new product, etc. There is a relationship or difference between groups that the researchers hope to identify. Learn more about Effects in Statistics.
Unfortunately, a hypothesis test can fail to detect an effect even when it does exist. This problem happens more frequently when the test has low statistical power.
Consequently, power is a crucial concept to understand before starting a study. Imagine the scenario where an effect exists in the population, but the test fails to detect it in the sample. Not only did the researchers waste their time and money on the project, but they've also failed to identify an effect that exists. As a result, they're missing out on the benefits the effect would have provided!
Clearly, researchers want an experimental design that produces high statistical power! Unfortunately, if the design is lacking, a study can be doomed to fail from the start.
Power matters in statistics because you don’t want to spend time and money on a project only to miss an effect that exists! It is vital to estimate the power of a statistical test before beginning a study to help ensure it has a reasonable chance of detecting an effect if one exists.
Statistical Power and Hypothesis Testing Errors
To better understand power in statistics, you first need to know why and how hypothesis tests can make incorrect decisions.
Related post: Overview of Hypothesis Testing
Why do hypothesis tests make errors?
Hypothesis tests use samples to draw conclusions about entire populations. Researchers use these tests because it’s rarely possible to measure a whole population. So, they’re stuck with samples.
Unfortunately, samples don't always accurately reflect the population. Statisticians define sampling error as the difference between a sample and the target population. Occasionally, this error can be large enough to cause hypothesis tests to draw the wrong conclusions. Statistical power matters here because higher power reduces the chance that sampling error hides a real effect. Learn more about Sampling Error: Definition, Sources & Minimizing.
How do they make errors?
Samples sometimes show effects that don’t exist in the population, or they don’t display effects that do exist. Hypothesis tests try to manage these errors, but they’re not perfect. Statisticians have devised clever names for these two types of errors—Type I and Type II errors!
- Type I: The hypothesis test rejects a true null hypothesis (false positive).
- Type II: The hypothesis test fails to reject a false null hypothesis (false negative).
Power in statistics relates only to Type II errors, the false negatives: the effect exists in the population, but the test doesn't detect it in the sample. Hence, we won't deal with Type I errors for the rest of this post. If you want to know more about both errors, read my post, Types of Errors in Hypothesis Testing.
The Type II error rate (known as beta or β) is the probability of a false negative for a hypothesis test. The complement of the Type II error rate is the probability of correctly detecting an effect (i.e., a true positive), which is the definition of statistical power. In mathematical terms, statistical power = 1 − β.
For example, if the Type II error rate is 0.2, then statistical power is 1 − 0.2 = 0.8. It logically follows that a lower Type II error rate equates to higher power.
Analysts are typically more interested in estimating power than beta.
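To make the relationship between β and power concrete, here is a minimal Python sketch that estimates power by simulation. It assumes a simplified setting the post doesn't specify: a two-sided, two-sample z-test on normally distributed data with a known, common standard deviation. The function name simulate_power and the example values (a true difference of 0.5, σ = 1, 64 observations per group) are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean

def simulate_power(effect, sigma, n, alpha=0.05, reps=5000, seed=1):
    """Estimate power of a two-sided, two-sample z-test by simulation.

    Hypothetical helper. Assumes both groups are normal with a known, common
    standard deviation `sigma` and `n` observations each. The true difference
    in means is `effect`, so every non-rejection is a Type II error.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sigma * (2 / n) ** 0.5                   # standard error of the mean difference
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treatment = [rng.gauss(effect, sigma) for _ in range(n)]
        z = (fmean(treatment) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1                       # the test detected the real effect
    return rejections / reps

power = simulate_power(effect=0.5, sigma=1.0, n=64)
beta = 1 - power                                  # estimated Type II error rate
print(f"power ≈ {power:.3f}, beta ≈ {beta:.3f}")
```

Because the effect truly exists in every simulated study, the fraction of significant results estimates power, and the remaining fraction estimates β. With these settings, the estimate should land near 0.8.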
How to Increase Statistical Power
Now that you know why power in statistics is essential, how do you ensure that your hypothesis test has high power?
Let’s start by understanding the factors that affect power in statistics. The following conditions increase a hypothesis test’s ability to detect an effect:
- Larger sample sizes.
- Larger effect sizes.
- Lower variability in the population.
- Higher significance level (alpha) (e.g., 5% → 10%).
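The four factors above can be checked numerically. The sketch below assumes the same simplified two-sided, two-sample z-test (normal data, known σ) and uses the normal-approximation power formula; a t-test would give slightly different numbers. The function z_test_power and the baseline values are hypothetical, chosen only to show the direction of each change.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Hypothetical helper. Assumes normal data, a known common standard
    deviation `sigma`, and `n` observations per group.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma * (2 / n) ** 0.5)  # true difference in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

baseline = dict(effect=0.5, sigma=1.0, n=64, alpha=0.05)
scenarios = {
    "baseline": {},
    "larger sample (n=100)": {"n": 100},
    "larger effect (0.7)": {"effect": 0.7},
    "less variability (sigma=0.8)": {"sigma": 0.8},
    "higher alpha (0.10)": {"alpha": 0.10},
}
# Change one factor at a time relative to the baseline.
results = {name: z_test_power(**{**baseline, **change}) for name, change in scenarios.items()}
for name, p in results.items():
    print(f"{name:30s} power = {p:.3f}")
```

Each scenario changes a single factor, so any difference from the baseline comes from that factor alone. All four changes raise power above the baseline of roughly 0.81.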
Of these factors, researchers typically have the most control over the sample size. Consequently, that’s your go-to method for increasing statistical power.
Effect sizes and variability are often inherent to the subject area you’re studying. Researchers have less control over them than the sample size. However, there might be some steps you can take to increase the effect size (e.g., larger treatments) or reduce the variability (e.g., tightly controlled lab conditions).
Do not choose a significance level to increase statistical power. Instead, set it based on your risk tolerance for a false positive. Usually, you’ll want to leave it at 5% unless you have a compelling reason to change it. To learn more, read my post about Understanding Significance Levels.
Studies typically want at least 80% power, but sometimes they need even more. How do you plan for a study to have that much capability from the start? Perform a power analysis before collecting data!
A statistical power analysis helps determine how large your sample must be to detect an effect. This process requires entering the following information into your statistical software:
- Effect size estimate
- Population variability estimate
- Statistical power target
- Significance level
Notice that the effect size and population variability values are estimates. Typically, you’ll produce these estimates through literature reviews and subject-area knowledge. The quality of your power analysis depends on having reasonable estimates!
After entering the required information, your statistical software displays the sample size necessary to achieve your target value for statistical power. I recommend using G*Power for this type of analysis. It’s free!
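As a rough illustration of what the software computes, here is a minimal Python sketch of the normal-approximation sample size formula for comparing two group means, n per group = 2((z₁₋α/₂ + z₁₋β)σ/δ)². It takes the same four inputs listed above. The function name and example values are hypothetical. A t-test calculation such as the one in G*Power typically returns a slightly larger n (about 64 per group for these inputs) because it accounts for estimating the standard deviation from the sample.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect, sigma, power=0.80, alpha=0.05):
    """Per-group n for a two-sided, two-sample z-test (normal approximation).

    Hypothetical helper. `effect` and `sigma` are the estimated effect size
    and population standard deviation from the power analysis inputs.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value set by the significance level
    z_power = z(power)          # standard normal quantile matching the power target
    n = 2 * ((z_alpha + z_power) * sigma / effect) ** 2
    return math.ceil(n)         # round up so power is at least the target

n = sample_size_per_group(effect=0.5, sigma=1.0)
print(n)  # 63
```

Raising the power target, lowering alpha, shrinking the expected effect, or increasing the expected variability all push the required sample size up.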
For more detail and worked examples, read my article, How to Calculate Sample Size Needed for Power.
For readers who are up for a slightly more complex topic: failing to detect an effect is not the only problem with low-power studies. When such a study happens to produce a significant result, it tends to report an exaggerated effect size! For more information, read Low Power Tests Exaggerate Effect Sizes.