Determining a good sample size for a study is always an important issue. After all, using the wrong sample size can doom your study from the start. Fortunately, power analysis can find the answer for you. Power analysis combines statistical analysis, subject-area knowledge, and your requirements to help you derive the optimal sample size for your study.
Statistical power in a hypothesis test is the probability that the test will detect an effect that actually exists. As you’ll see in this post, both under-powered and over-powered studies are problematic. Let’s learn how to find a good sample size for your study! Learn more about Statistical Power.
When you perform hypothesis testing, there is a lot of preplanning you must do before collecting any data. This planning includes identifying the data you will gather, how you will collect it, and how you will measure it among many other details. A crucial part of the planning is determining how much data you need to collect. I’ll show you how to estimate the sample size for your study.
Before we get to estimating sample size requirements, let’s review the factors that influence statistical significance. This process will help you see the value of formally going through a power and sample size analysis rather than guessing.
Related post: 5 Steps for Conducting Scientific Studies with Statistical Analyses
Factors Involved in Statistical Significance
Look at the chart below and identify which study found a real treatment effect and which one didn’t. Within each study, the difference between the treatment group and the control group is the sample estimate of the effect size.
Did either study obtain significant results? The estimated effects in both studies can represent either a real effect or random sample error. You don’t have enough information to make that determination. Hypothesis tests incorporate these considerations to determine whether the results are statistically significant.
- Effect size: The larger the effect size, the less likely it is to be random error. It’s clear that Study A exhibits a more substantial effect in the sample—but that’s insufficient by itself.
- Sample size: Larger sample sizes allow hypothesis tests to detect smaller effects. If Study B’s sample size is large enough, its more modest effect can be statistically significant.
- Variability: When your sample data have greater variability, random sampling error is more likely to produce considerable differences between the experimental groups even when there is no real effect. If the sample data in Study A have sufficient variability, random error might be responsible for the large difference.
Hypothesis testing takes all of this information and uses it to calculate the p-value—which you use to determine statistical significance. The key takeaway is that the statistical significance of any effect depends collectively on the size of the effect, the sample size, and the variability present in the sample data. Consequently, you cannot determine a good sample size in a vacuum because the three factors are intertwined.
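To make the interplay concrete, here is a minimal Python sketch of the test statistic for a 2-sample t-test, assuming equal group sizes and a common standard deviation. The function name and all input numbers are hypothetical and chosen only to show the direction in which each factor pushes the result; they are not taken from Study A or Study B.

```python
import math

def two_sample_t_statistic(difference, std_dev, n_per_group):
    """t-statistic for two equal-sized groups that share one standard deviation."""
    standard_error = std_dev * math.sqrt(2 / n_per_group)
    return difference / standard_error

# Hypothetical inputs: change one factor at a time relative to the baseline.
baseline = two_sample_t_statistic(difference=5, std_dev=10, n_per_group=20)
bigger_effect = two_sample_t_statistic(difference=10, std_dev=10, n_per_group=20)
bigger_sample = two_sample_t_statistic(difference=5, std_dev=10, n_per_group=80)
more_variability = two_sample_t_statistic(difference=5, std_dev=20, n_per_group=20)

print(round(baseline, 2), round(bigger_effect, 2),
      round(bigger_sample, 2), round(more_variability, 2))
```

A larger t-statistic corresponds to a smaller p-value, so a larger effect or a larger sample pushes toward significance, while greater variability pushes away from it.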
Related post: How Hypothesis Tests Work
Statistical Power of a Hypothesis Test
Because we’re talking about determining the sample size for a study that has not been performed yet, you need to learn about a fourth consideration: statistical power. Statistical power is the probability that a hypothesis test correctly concludes that an effect exists when it truly exists in the population. In other words, it is the probability that the test correctly rejects a false null hypothesis. Consequently, power is the complement of the Type II error rate (β), which is the probability of failing to detect a real effect: Power = 1 – β. The power of the test depends on the other three factors.
For example, if your study has 80% power, it has an 80% chance of detecting an effect that exists. Let this point be a reminder that when you work with samples, nothing is guaranteed! When an effect actually exists in the population, your study might not detect it because you are working with a sample. Samples contain sample error, which can occasionally cause a random sample to misrepresent the population.
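One way to build intuition for what "80% power" means is to simulate many studies in which the effect really exists and count how often the test detects it. The sketch below is a simplification under stated assumptions: it uses a two-sided z-test with a known standard deviation rather than a t-test, a significance level of 0.05, and a hypothetical design (true difference of 1 standard deviation, 16 subjects per group) that should have power of roughly 80%.

```python
import math
import random
from statistics import NormalDist, fmean

def simulated_power(difference, std_dev, n_per_group, alpha=0.05,
                    n_studies=2000, seed=1):
    """Fraction of simulated studies whose two-sided z-test rejects the null."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = std_dev * math.sqrt(2 / n_per_group)
    rejections = 0
    for _ in range(n_studies):
        # Draw one random sample per group; the effect truly exists.
        control = [rng.gauss(0, std_dev) for _ in range(n_per_group)]
        treatment = [rng.gauss(difference, std_dev) for _ in range(n_per_group)]
        z = (fmean(treatment) - fmean(control)) / standard_error
        if abs(z) > critical_z:
            rejections += 1
    return rejections / n_studies

# Hypothetical design: true difference of 1 standard deviation, 16 per group.
power_estimate = simulated_power(difference=1.0, std_dev=1.0, n_per_group=16)
print(f"Estimated power: {power_estimate:.2f}")
```

Roughly one simulated study in five misses the effect even though it is real, which is the sampling error described above.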
Related post: Types of Errors in Hypothesis Testing
Goals of a Power and Sample Size Analysis
Power analysis involves taking these three considerations, adding subject-area knowledge, and managing tradeoffs to settle on a sample size. During this process, you must rely heavily on your expertise to provide reasonable estimates of the input values.
Power analysis helps you manage an essential tradeoff. As you increase the sample size, the hypothesis test gains a greater ability to detect small effects. This situation sounds great. However, larger sample sizes cost more money. And, there is a point where an effect becomes so minuscule that it is meaningless in a practical sense.
You don’t want to collect a large and expensive sample only to be able to detect an effect that is too small to be useful! Nor do you want an underpowered study that has a low probability of detecting an important effect. Your goal is to collect a large enough sample to have sufficient power to detect a meaningful effect—but not too large to be wasteful.
As you’ll see in the upcoming examples, the analyst provides numeric values that correspond to “a reasonable chance” and “meaningful effect.” These values allow you to tailor the analysis to your needs.
All of these details might sound complicated, but a statistical power analysis helps you manage them. In fact, going through this procedure forces you to focus on the relevant information. Typically, you specify three of the four factors discussed above and your statistical software calculates the remaining value. For instance, if you specify the smallest effect size that is practically significant, variability, and power, the software calculates the required sample size.
Let’s work through some examples in different scenarios to bring this to life.
2-Sample t-Test Power Analysis for Sample Size
Suppose we’re conducting a 2-sample t-test to determine which of two materials is stronger. If one type of material is significantly stronger than the other, we’ll use that material in our process. Furthermore, we’ve tested these materials in a pilot study, which provides background knowledge for the estimates.
In a power and sample size analysis, statistical software presents you with a dialog box something like the following:
We’ll go through these fields one-by-one. First off, we will leave Sample sizes blank because we want the software to calculate this value.
Differences
Differences is often a confusing value to enter. Do not enter your guess for the difference between the two types of material. Instead, use your expertise to identify the smallest difference that is still meaningful for your application. In other words, you consider smaller differences to be inconsequential. It would not be worthwhile to expend resources to detect them.
By choosing this value carefully, you tailor the experiment so that it has a reasonable chance of detecting useful differences while allowing smaller, non-useful differences to remain potentially undetected. This value helps prevent us from collecting an unnecessarily large sample.
For our example, we’ll enter 5 because smaller differences are unimportant for our process.
Power values
Power values is where we specify the probability that the statistical hypothesis test detects the difference in the sample if that difference exists in the population. This field is where you define the “reasonable chance” that I mentioned earlier. If you hold the other input values constant and increase the test’s power, the required sample size also increases. The proper value to enter in this field depends on norms in your study area or industry. Common power values are 0.8 and 0.9.
We’ll enter a power of 0.9 so that the 2-sample t-test has a 90% chance of detecting a difference of 5.
Standard deviation
Standard deviation is the field where we enter the data variability. We need to enter an estimate for the standard deviation of material strength. Analysts frequently base these estimates on pilot studies and historical research data. Inputting better variability estimates will produce more reliable power analysis results. Consequently, you should strive to improve these estimates over time as you perform additional studies and testing. Providing good estimates of the standard deviation is often the most difficult part of a power and sample size analysis.
For our example, we’ll assume that the two types of material have a standard deviation of 4 units of strength. After we click OK, we see the results.
Related post: Measures of Variability
Interpreting the Statistical Power Analysis and Sample Size Results
Statistical power and sample size analysis provides both numeric and graphical results, as shown below.
The text output indicates that we need 15 samples per group (total of 30) to have a 90% chance of detecting a difference of 5 units.
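If you want to sanity-check a result like this by hand, a common shortcut is the normal approximation for comparing two means. The sketch below assumes a two-sided test with a significance level of 0.05 (the post does not state the alpha used) and a hypothetical function name. With a difference of 5, a standard deviation of 4, and power of 0.9, it gives about 13.4, which rounds up to 14. That is one fewer than the 15 reported above, which is expected because statistical software bases the calculation on the t-distribution, and that matters at small sample sizes.

```python
import math
from statistics import NormalDist

def sample_size_per_group(difference, std_dev, power, alpha=0.05):
    """Approximate n per group for a two-sided comparison of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return 2 * ((z_alpha + z_power) * std_dev / difference) ** 2

n_raw = sample_size_per_group(difference=5, std_dev=4, power=0.9)
n_per_group = math.ceil(n_raw)  # always round up to a whole subject
print(f"{n_raw:.2f} -> {n_per_group} per group")
```

Treat the approximation as a rough check and rely on dedicated software for the final number.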
The dot on the Power Curve corresponds to the information in the text output. However, by studying the entire graph, we can learn additional information about how statistical power varies by the difference. If we start at the dot and move down the curve to a difference of 2.5, we learn that the test has a power of approximately 0.4 (40%). This power is too low. However, we indicated that differences less than 5 were not practically significant to our process. Consequently, having low power to detect a difference of 2.5 is not problematic.
Conversely, follow the curve up from the dot and notice how power quickly increases to nearly 100% before we reach a difference of 6. This design satisfies the process requirements while using a manageable sample size of 15 per group.
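The shape of a power curve like this one can be approximated in a few lines of Python. This is a sketch using the normal approximation with an assumed two-sided alpha of 0.05 and a hypothetical function name, so the values will run slightly higher than t-based software output. For 15 per group and a standard deviation of 4, it puts power near 0.40 at a difference of 2.5 and above 0.9 at a difference of 5.

```python
import math
from statistics import NormalDist

def approx_power_two_sided(difference, std_dev, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided comparison of two means."""
    z = NormalDist()
    critical_z = z.inv_cdf(1 - alpha / 2)
    shift = difference / (std_dev * math.sqrt(2 / n_per_group))
    # Probability of landing in either rejection region given the true shift.
    return z.cdf(shift - critical_z) + z.cdf(-shift - critical_z)

for difference in (2.5, 5, 6):
    power = approx_power_two_sided(difference, std_dev=4, n_per_group=15)
    print(f"difference {difference}: power ~ {power:.2f}")
```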
Other Power Analysis Options
Now, let’s explore a few more options that are available for power analysis. This time we’ll use a one-tailed test and have the software calculate a value other than sample size.
Suppose we are again comparing the strengths of two types of material. However, in this scenario, we are currently using one kind of material and are considering switching to another. We will change to the new material only if it is stronger than our current material. Again, the smallest difference in strength that is meaningful to our process is 5 units. The standard deviation in this study is now 7. Further, let’s assume that our company uses a standard sample size of 20, and we need approval to increase it to 40. Because the standard deviation (7) is larger than the smallest meaningful difference (5), we might need a larger sample.
In this scenario, the test needs to determine only whether the new material is stronger than the current material. Consequently, we can use a one-tailed test. This type of test provides greater statistical power to determine whether the new material is stronger than the old material, but no power to determine if the current material is stronger than the new—which is acceptable given the dictates of the new scenario.
In this analysis, we’ll enter the two potential values for Sample sizes and leave Power values blank. The software will estimate the power of the test for detecting a difference of 5 for designs with both 20 and 40 samples per group.
We fill in the dialog box as follows:
And, in Options, we choose the following one-tailed test:
Interpreting the Power and Sample Size Results
The statistical output indicates that a design with 20 samples per group (a total of 40) has a ~72% chance of detecting a difference of 5. Generally, this power is considered to be too low. However, a design with 40 samples per group (80 total) achieves a power of ~94%, which is almost always acceptable. Hopefully, the power analysis convinces management to approve the larger sample size.
Assess the Power Curve graph to see how the power varies by the difference. For example, the curve for the sample size of 20 indicates that the smaller design does not achieve 90% power until the difference is approximately 6.5. If increasing the sample size is genuinely cost prohibitive, perhaps accepting 90% power for a difference of 6.5, rather than 5, is acceptable. Use your process knowledge to make this type of determination.
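The figures in this section can be roughly reproduced with the normal approximation for a one-tailed test. The sketch below assumes a significance level of 0.05 and uses hypothetical function names. Because it ignores the t-distribution, expect small deviations from the software output: it gives power of about 0.73 and 0.94 for 20 and 40 per group, and a detectable difference of about 6.5 at 90% power with 20 per group.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def approx_power_one_sided(difference, std_dev, n_per_group, alpha=0.05):
    """Normal-approximation power for a one-tailed comparison of two means."""
    standard_error = std_dev * math.sqrt(2 / n_per_group)
    return Z.cdf(difference / standard_error - Z.inv_cdf(1 - alpha))

def detectable_difference(std_dev, n_per_group, power, alpha=0.05):
    """Smallest difference the one-tailed design detects with the given power."""
    standard_error = std_dev * math.sqrt(2 / n_per_group)
    return (Z.inv_cdf(1 - alpha) + Z.inv_cdf(power)) * standard_error

power_20 = approx_power_one_sided(5, std_dev=7, n_per_group=20)
power_40 = approx_power_one_sided(5, std_dev=7, n_per_group=40)
difference_at_90 = detectable_difference(std_dev=7, n_per_group=20, power=0.9)

print(f"n = 20: power ~ {power_20:.2f}")
print(f"n = 40: power ~ {power_40:.2f}")
print(f"n = 20 reaches 90% power at a difference of ~ {difference_at_90:.1f}")
```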
Use Power Analysis for Sample Size Estimation for All Studies
Throughout this post, we’ve been looking at continuous data, and using the 2-sample t-test specifically. For continuous data, you can also use power analysis to assess sample sizes for ANOVA and DOE designs. Additionally, there are hypothesis tests for other types of data, such as proportions tests (binomial data) and rates of occurrence (Poisson data). These tests have their own corresponding power and sample size analyses.
In general, when you move away from continuous data to these other types of data, your sample size requirements increase. And, there are unique intricacies in each. For instance, in a proportions test, the sample size you need to detect a given difference depends on where the proportions fall between 0 and 1, not only on the size of the difference. Many factors can affect the optimal sample size. Power analysis helps you navigate these concerns.
After reading this post, I hope you see how power analysis combines statistical analyses, subject-area knowledge, and your requirements to help you derive the optimal sample size for your specific needs. If you don’t perform this analysis, you risk performing a study that is either likely to miss an important effect or has an exorbitantly large sample size. I’ve written a post about a Mythbusters experiment that had no chance of detecting an effect because they guessed a sample size instead of performing a power analysis.
In this post, I’ve focused on how power affects your test’s ability to detect a real effect. However, low-power tests also exaggerate effect sizes!
Finally, experimentation is an iterative process. As you conduct more studies in an area, you’ll develop better estimates to input into power and sample size analyses and gain a clearer picture of how to proceed.
Hi again Jim, apologies if this was posted multiple times, but I looked into the Bonferroni correction and saw this equation:
α_new = α_original / n
where:
α_original: The original α level
n: The total number of comparisons or tests being performed
Seeing this, would 6000 or 1000 be the n in my case? Would I also have to perform this once or more than once? Second question: after finding this out, when performing the power analysis that you mentioned, do I have to do it multiple times to account for the different combinations of states that I will match with each other?
Hi Alex,
In this context, n is the number of comparisons between groups. If you want to compare all groups to each other (i.e., all pairwise comparisons), then with 6 groups you’ll have 15 comparisons. So, n = 15. However, you don’t necessarily need to compare all groups. It depends on your research question. If you can avoid all pairwise comparisons, it’s a good thing. Just decide on your comparisons and record them in your plans before proceeding with the project. If you wait until after analyzing the data, you might (even if subconsciously) be tempted to cherry pick the comparisons that give good results.
As an example of an alternative to all pairwise comparisons, you might compare five of the states to one reference state in your sample. That reduces the pairwise comparisons (n) from 15 to 5. That helps because you’re dividing alpha by the number of comparisons. A lower n won’t lower your Bonferroni-corrected significance level as much:
0.05/15 ≈ 0.003
0.05/5 = 0.01
You’ll need an extremely low p-value with 15 comparisons (< 0.003) to get a significant result. Hence, you’re more likely to detect an effect with 5 comparisons vs. 15! But let your research question guide you. Which comparisons do you need to perform to answer your research question? I cover this topic in my post about Using Post Hoc Tests with ANOVA. Of course, you’re not working with ANOVA. But if you need information about what the familywise error rate is and why you need to control it, it’ll be helpful. The same ideas will apply to the multiple comparisons you’re making with the 2 proportions test. In your case, if you go with 15 comparisons (all pairwise for the 6 states) and don’t apply a correction, your familywise error rate is 0.54. Over a 50% chance of a false positive!
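The arithmetic in this reply can be checked with a short Python sketch. It assumes the comparisons are independent when computing the uncorrected familywise error rate, which is the usual simplification behind the 0.54 figure; the variable names are illustrative.

```python
import math

alpha = 0.05
n_groups = 6

all_pairs = math.comb(n_groups, 2)  # every state compared with every other state
vs_reference = n_groups - 1         # five states compared with one reference state

for n_comparisons in (all_pairs, vs_reference):
    corrected_alpha = alpha / n_comparisons
    # Chance of at least one false positive if no correction is applied,
    # assuming the comparisons are independent.
    familywise_error = 1 - (1 - alpha) ** n_comparisons
    print(n_comparisons, round(corrected_alpha, 4), round(familywise_error, 2))
```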
Hello again Jim, I looked at your other page about the margin of error and I had a few extra questions. The approach I would be taking is, as you said, using 1000 people from each state for a comparison with the surveys. I saw the formula that you had, so would my confidence level for this instance be 95%? Also, as your formula is listed, would my bottom number be 1000 or would it be 6000, or would I have to use the “Finding the Margin of Error for Other Percentages” formula instead?
Hi Alex,
Typically, surveys don’t delve so deep into statistical differences between groups in the responses. At least not that I’ve seen. Usually, they’ll calculate and report the margin of error. If the margins don’t overlap, you can assume the difference is statistically significant. However, as I point out in the margin of error post, that process is conservative because the difference can be statistically significant even with a little overlap.
What you need to do for your case is perform a power analysis for a two-sample proportions test. That’s beyond what most public opinion surveys do but will get you the answers you need. In your case, the proportions you’re testing are the proportion of individuals in state A who respond a particular way to a survey item and the proportion in state B who respond that way to the item.
I didn’t realize that you were performing hypothesis testing with your survey data, or I would’ve mentioned this from the start! Because you’re comparing six states, you’re also facing the problem of multiple comparisons increasing the familywise error rate for that set of comparisons. You’ll need to use something like a Bonferroni correction to appropriately lower the significance level you use, which will affect the sample size you need for a particular power.
I hope that helps!
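As a rough illustration of how a Bonferroni-corrected significance level feeds into the sample size for a two-sample proportions test, here is a Python sketch based on the normal approximation. The proportions (0.50 vs. 0.60) and the power of 0.8 are purely hypothetical placeholders, not values from Alex’s study, and the function name is made up for this example. Dedicated software such as G*Power may use a different method and give somewhat different numbers.

```python
import math
from statistics import NormalDist

def n_per_group_two_proportions(p1, p2, power, alpha):
    """Approximate n per group for a two-sided test of two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / (p1 - p2) ** 2)

# Hypothetical proportions for state A and state B.
n_unadjusted = n_per_group_two_proportions(0.50, 0.60, power=0.8, alpha=0.05)
# Same design with alpha divided by 15 pairwise comparisons.
n_bonferroni = n_per_group_two_proportions(0.50, 0.60, power=0.8, alpha=0.05 / 15)

print(f"alpha = 0.05:    {n_unadjusted} per group")
print(f"alpha = 0.05/15: {n_bonferroni} per group")
```

The corrected significance level raises the required sample size per state noticeably, which is why the number of planned comparisons belongs in the power analysis.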
Hello Jim, I am hoping you can give me some guidance here. I am currently doing an assignment involving this subject, and my professor said this to me: “There’s no rationale for the six thousand surveys. How did you arrive at your sample size? You need to report the power analysis (and numbers you used in that analysis) to arrive at your chosen sample size. Like everything else in scientific writing, the sample size needs justification.” My study involves six states and getting specific individuals’ opinions from each state on crime and how it has affected them. Surveys are my method of choice here, so my question is how would I arrive at a sample size. I had thought 6,000 was a starting point but am unsure if that’s right?
Hi Alex,
With surveys you typically calculate the sample size to produce a specific margin of error. Click the link to learn more about that and how to tell whether there are differences. It’s a slightly different process than power analysis in other contexts, but it’s related. The big question is how precise you want your estimates to be. And if you have groups you want to compare, that can affect the calculations.
For instance, 6,000 would generally be considered a large sample size for survey research. However, if you’re comparing subgroups within your sample, that can affect how many you need. I don’t know if you plan to do this or not, but if you wanted to compare the differences between the six states, that means you’d have about 1,000 per state. That’s still fairly decent but you’ll have a larger margin of error. You’ll need to know whether your primary interest is estimates for the total sample or differences between subgroups. If it’s differences between subgroups, that always increases your required sample size.
That’s not to say that 1000 per state isn’t enough. I don’t know. But you’d do the margin of error calculations to see if it produces sufficient precision for your needs. The process involves a combination of doing the MoE calculations and knowing the required precision (or possibly standards in your subject area).
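To see how the margin of error changes between the full sample and a single state, here is a minimal Python sketch of the standard margin of error formula for a proportion. It assumes simple random sampling, a 95% confidence level, and the most conservative proportion of 0.5; the function name is illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(n, proportion=0.5, confidence=0.95):
    """Margin of error for a sample proportion under simple random sampling."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(proportion * (1 - proportion) / n)

for n in (6000, 1000):
    print(f"n = {n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, the margin of error is about 1.3 percentage points for the full 6,000 and about 3.1 percentage points for 1,000 per state.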
So can a “power analysis” be done to get the sample size for a proposed survey instead of calculating the sample size? In other words, is a “power analysis” the same as calculating the sample size when doing a research study? Thank you.
Hi Ronrico,
There’s definitely a related concept. For surveys, you typically need to calculate the margin of error. Click the link to read my post about it!
Hi Jim,
Wonderful post!
I was wondering, how would I be able to determine if a sample size is large enough for a paper that I’m reading, assuming they do not give the power calculation?
If they do give the power calculation, should the power be 80% or over for statistically significant results?
Thank you so much 🙂
Hi Ramona,
Determining whether a study’s sample size and, hence, its statistical power, are sufficient isn’t quite as straightforward as it might appear. It’s tempting to take the study’s sample size, effect size, and variability and enter them into a power analysis. However, that’s problematic. What happens is that if the study has statistically significant findings the power analysis will always indicate sufficient sample size/power. However, if the study has non-significant results, the power analysis will always indicate that the sample size/power are insufficient.
That’s a problem because it’s possible to obtain significant results with low power studies and insignificant results with high power studies. It’s important to recognize all these cases because significant low power studies will exaggerate the effect sizes and insignificant high power studies are more likely to indicate that the effect does not exist in the population.
What you need to do instead is enter the study’s sample size, use a literature review to obtain reasonable estimates of the variability (if possible), and then enter an effect size that represents either the literature’s collective best estimate of it or a minimum effect size that is still practically meaningful. Note that you are not using the study’s estimates for these calculations for the reasons I indicated earlier!
Hi Sir Jim!
I’d like to know how I can utilize the G*Power calculator to figure out the sample size for my study. It essentially employed stratified random sampling. I’m hoping you’ll respond! Best wishes!
Hi Ellie,
It depends on how you’ve conducted your stratified sampling and what you want to test. Are you comparing the strata within your sample? If so, you’d just select the type of test, such as a t-test, and then enter your values. By default, G*Power assumes that your group sizes are equal. That’s fine if you’re using a disproportionate stratified sampling design and set all your strata to the same size. However, if your strata sizes are unequal, you’ll need to adjust the allocation ratio.
Hello Jim. I want your help in calculating the sample size for my study. I have three groups: the first group is control (normal), the second is a clinical population group undergoing treatment 1, and the third is a colonics group (same disease as group 2) undergoing treatment 2. So here I will first compare some parameters pre- and post-treatment for groups 2 and 3 separately. Then I will compare groups 2 and 3 before treatment and after treatment, and then compare baseline parameters and after-treatment parameters across all three groups. I hope I have not confused you. I want to know the sample size for my three groups. My hypothesis is that the two treatments will improve the parameters in groups 2 and 3; what I want to check is which treatment (1 or 2) is most effective. I request you to kindly help me in this regard.
Dear Jim,
I have a question regarding calculating the sample size in this scenario: I’m doing a hospital-based study (chart review study) where I will include all patients who have had a specific disease (celiac disease) in the last 5 years. How would I know that the number I will get is sufficient to answer my research questions, considering that this disease is rare? Suppose, for example, I ended up with 100 patients; how would I know that I can use this sample for further analysis? Is there a way to calculate ahead of time the minimum number of patients needed to do my research?
Hello,
I am looking to determine the sample size necessary to detect differences in bird populations (composition and abundance) between forest treatment types. I assume I would use an ANOVA given that I have control units. My data will be bird occurrence data, so I imagine Poisson distribution. I have zero pilot data, though. Do you have any recommendations for reading up on ways to simulate or bootstrap data in this situation for use in making variability estimates?
Thank you!!
Hi Lorelle,
Yes, I’d think you’d use something like Poisson regression or negative binomial regression because of the count data. I write a little bit about them in my post about choosing the correct type of regression analysis. You can include categorical variables for forest types.
I don’t have good ideas for developing variability estimates. That can be the most difficult part of a power analysis. I’d recommend reading up on the literature as much as possible. Perhaps others have conducted similar research and you can use their estimates. Unfortunately, if you don’t have any data, you can’t bootstrap or simulate it.
I wish I had some better advice, but the best I can think of is to look through the literature for comparable studies. That’s always a good idea anyway, but here it’ll help you with the power analysis too.
I am confused about some parts as I am new to this. Let’s assume I have the difference in means, the standard error, and a power of 80%, so I have the information needed to get a sample size (delta, sd, power). But how would I know this is the correct sample size to get 80% power? Which type do I need to use: paired, two.sample, or one.sample? After running power.t.test, I get a sample size of 8.7 for two sample and 6 for one sample, and I am not sure which is correct. How do I determine that?
Hi Zakson,
The correct test depends on the nature of the data you collect. Are you comparing the means of two groups? In that case, you need to use a 2-sample t-test. If you have one group and are comparing its mean to a test value, you need a 1-sample t-test.
You can read about the purposes and interpretations of the various t-tests in my post about How to do t-tests in Excel. That should be helpful even if you’re not using Excel. Also, I write more about how t-tests work, which will be helpful in showing you what each test can do.
I hope that helps!
Hey there! What sort of test would be best to determine sample size needed for a study determining a 10% difference between two groups at a power of say 80%? Thanks!
Hi Kristin, you’d need to perform a power and sample size analysis for a 2-sample t-test. As I indicate in this post, you’ll need to supply an estimate of the population’s standard deviation, the difference you want to detect, and the power, and the procedure will tell you the sample size per group.
I have an essay question if anyone can help me with:
Do a calculation: write down what you think the typical power of a psychological study really is and what percentage of research hypotheses are “good” hypotheses. Assume that journals reserve 10% of their pages for publishing null results. Under these assumptions, what percentage of published psychological research is wrong? Do you agree that this analysis makes sense, or is this the wrong way to think about “right” and “wrong” research?
Hi Hayley,
I can’t do your essay for you, but I’ve written two blog posts that should be extremely helpful for your assignment.
Reproducibility in Psychology Experiments
Low power tests exaggerate effect sizes
Those two should give you some good food for thought!
Dear Jim
I have a question regarding sample size calculation for a laboratory study. The laboratory evaluation includes evaluation of the marginal integrity of 2 dental materials vs. a control material. What type of test should I use?
Hi Eman, that largely depends on the type of data you’re collecting for your outcome. If marginal integrity is continuous data and you want to compare the means between the control and two treatment groups, one-way ANOVA is a great place to start.
Hi Jim, what if I want to run mixed model ANOVAS twice (on two different dependent variables) – would I have to then double the sample size that I calculated using g power? Thanks, Joanna
Hi Jim. What about molecular data? For instance, I sequenced my 6 samples, 3 controls and 3 treatments, but each sample (tank replicate) consists of 500-800 individual biological replicates (larvae). Given the analysis after sequencing, there are thousands of genes that may show mean differences between the control and treatment. My concern is, does power analysis still play a fair role here, given that increasing the “sample size” (the number of tank replicates) to the 5 or more suggested by power analysis to get power >0.8 is nearly impossible in a physical setting?
Hi Jim,
I have somewhat of a basic question. I am performing some animal studies and looking at the effect of preservation solution on ischemia reperfusion injury following transplantation. I am comparing 5 different preservation solutions. What should be my sample size for each group? I want to know how exactly I can calculate that.
Thanks
Hi Siba,
You’ll need to have an estimate of the effect. Or, an estimate of the minimum effect size that is practically meaningful in a real-world sense. If you’re comparing means, you’ll also need an estimate of the variability. The nature of what and how to determine the sample size depends on the type of hypothesis test you’ll be using. That in turn depends on the nature of your outcome variable. Are you comparing means with continuous data or comparing proportions with binary data? But in all cases you’ll need that effect size estimate.
You’ll also need software to calculate that for you. I recommend a freeware program called G*Power. Although, most statistical applications can do these power calculations. I cover examples in this post that should be helpful for you.
If you have 5 solutions and you want to compare their means, you’ll need to perform power and sample size calculations for one-way ANOVA.
Hi Jim,
I’ve calculated that I need 34 pairs for a paired t-test with an alpha=0.05 and beta=0.10 with standard deviation of 1.945 to detect a 1.0 increase in the difference. If after 5 pairs I run my hypothesis tests and I find that the difference is significant (i.e. I reject the null hypothesis) is there a need to complete the remaining 29 pairs?
Thanks,
Sam
Thank you for the explanation. I am currently using G*Power to determine my sample size, but I am still confused about the effect size. Let’s say I use a medium effect size for a correlation, and the suggested sample size is 138 (example). But when I use a medium effect size for a t-test to find differences between two independent groups, the suggested sample size is 300 (example). So which sample size should I take? Does the same effect size need to be used for every statistical test, or does each statistical test have a different effect size?
Hi Jim
I want to calculate the sample size for my animal studies. We have designed a novel neural probe and want to perform an experiment to test the functionality of these probes in rat brain. This is a binary study, i.e., either the probe works or it doesn’t (success or failure), and because it’s a new technology, it lacks any previous literature.
Can anyone please suggest which statistical analysis (test) I should use and what parameters, e.g., effect size, I should use? I am using G*Power and looking for a 95% confidence level.
Thanks in Advance
Vishal
Hi Vishal,
It sounds like you need to use a 2-sample proportions test. It’s one of the many hypothesis tests that I cover in my new Hypothesis Testing ebook. You’ll find the details about how and why to use it, assumptions, interpretations and examples for it.
As for using G*Power to estimate power and sample size, under the Test family drop-down list, choose Exact. Under the Statistical test drop-down, choose Proportions: Inequality, two independent groups (Fisher’s exact test). That assumes that your two groups have different probes. From there, you’ll need to enter estimates for your study based on whatever background subject-area research/knowledge you have.
I hope this helps!
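For a rough sense of scale before opening G*Power, the sketch below uses the common normal-approximation formula for comparing two independent proportions. The success rates (0.9 for the new probe, 0.6 for the comparison) are purely hypothetical placeholders, and the function name is illustrative. Fisher's exact test, as selected in G*Power, may call for a somewhat different sample size.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided test of
    two independent proportions (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under the null hypothesis
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical success rates; replace them with subject-area estimates.
print(n_per_group_two_proportions(p1=0.9, p2=0.6))
```

Smaller differences between the two proportions drive the required sample size up quickly, which is why a realistic effect size estimate matters so much.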
Hi Jim,
Is it scientifically appropriate to use G*Power for sample size calculations in clinical biomedical research?
Hi, yes, G*Power should be appropriate to use for statistical analyses in any area. Did you have a specific concern about it?
Thank you, Jim, for the app reference. I am checking it out right now. #TeamNoSleep
Hi Jamie, Ah, yes, #TeamNoSleep. I’ve unfortunately been on that team! 🙂
Hi Jim,
What is the name of the software you use?
Hi Hamed,
I’m using Minitab statistical software. If you’d like free software to calculate power and sample sizes, I highly recommend G*Power.
Hi Jim,
I would like to calculate power for a Poisson regression (my DV consists of count data). Do you have any guidance on how to do so?
Hi Veronica,
Unfortunately, I’m not familiar with an application that will calculate power for Poisson regression. If your counts are large enough (lambda greater than 10), the Poisson distribution approximates a normal distribution. You might then be able to use power analysis for linear multiple regression, which I have seen in the free application G*Power. That might give you an idea at least. I’m not sure about power analysis specifically for Poisson regression.
Dear Jim, your post looks very nice. I have just one question: how could I calculate the sample size and power for an “Equal variances” test comparing more than 2 samples? Is it mandatory, as in t-tests? Which test statistic is used in that test?
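To illustrate how the normal approximation improves as lambda grows, here is a small sketch that measures the largest gap between the Poisson CDF and a continuity-corrected normal CDF with the same mean and variance. The helper names are hypothetical and the lambda values are chosen only for illustration.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
    return total


def max_cdf_gap(lam):
    """Largest absolute gap between the Poisson CDF and the normal CDF
    (continuity-corrected), checked from 0 up to lam + 6 standard deviations."""
    normal = NormalDist(mu=lam, sigma=math.sqrt(lam))
    upper = int(lam + 6 * math.sqrt(lam))
    return max(
        abs(poisson_cdf(k, lam) - normal.cdf(k + 0.5)) for k in range(upper + 1)
    )


for lam in (1, 5, 10, 50):
    print(lam, round(max_cdf_gap(lam), 4))
```

The gap shrinks as lambda increases, which is the reasoning behind treating large counts as approximately normal.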
Thanks in advance for your tip
Hi Ciro, to be honest, I’ve never seen a power analysis for an equal variances test with more than two samples!
The test statistic depends upon which of several methods you use: the F-test, Levene’s test, or Bartlett’s test.
While it would be nice to estimate power for this type of test, I don’t think it’s a common practice and I haven’t seen it available in the software I have checked.
Why are the sample sizes here all so small?
Hi Meng,
For sample sizes, large and small are relative. Given the parameters entered, which include the effect size you want to detect, the properties of the data, and the desired power, the sample sizes are exactly the correct size! Of course, you’re always working with estimates for these values and there’s a chance your estimates are off. But, the proper sample size depends on the nature of all those properties.
I’m curious, was there some reason why you were expecting larger sample sizes? Sometimes you’ll see big studies, such as medical trials. In some cases with lives on the line, you’ll want very large sample sizes that go beyond just the issue of statistical power. But, for many scientific studies where the stakes aren’t so high, they use the approach described here.
Is the formula n = (z × standard deviation / margin of error)² already a power analysis? I’m looking for a power analysis for just estimating a statistic (descriptive statistics) and not hypothesis testing as in many cases of inferential statistics. Does that formula suffice? Thanks in advance 😊
Hi Andrew,
You might not realize it, but you’re asking me a trick question! The answer for how you calculate power for descriptive statistics is that you don’t calculate power for descriptive statistics.
Descriptive statistics simply describe the characteristics of a particular group. You’re not making inferences about a larger population. Consequently, there is no hypothesis testing. Power relates to the probability that a hypothesis test will detect a population effect that actually exists. Consequently, if there is no hypothesis test/inferences about a population, there’s no reason to calculate power.
Relatedly, descriptive statistics do not involve a margin of error based on random sampling. The mean of a group is a specific known value without error (excluding measurement error) because you’re measuring all members of that group.
For more information about this topic, read my post about the differences between descriptive and inferential statistics.
Hi sir,
I just wanted to understand whether the confidence interval and power are the same.
Thanks for your explanation, Jim.
Jim,
I would like to design a test for the following problem
(under the assumption that the Poisson distribution applies):
Samples from a population can be either defective or not (e.g. some technical component from a production)
Out of a random sample of N, there should be at most k defective occurrences, with a 95% probability (e.g. N = 100’000, k = 30).
I would like to design a test for this (testing this Hypothesis) with a sample size N1 (different from N).
What should my limit on k1 (defective occurrences from the sample of N1) be?
Such that I can say that with a 95% confidence, there will be at most k occurrences out of N samples.
E.g. N2 = 20’000. k1 = ???
Any hints how to tackle this problem?
Many thanks in advance
Tom
Hi Tom,
To me, it sounds like you need to use the binomial distribution rather than the Poisson distribution. You use the binomial distribution when you have binary data and you know the probability of an event and the number of trials. That sounds like your scenario!
In the graph below, I illustrate a binomial distribution where we assume the defect rate is 0.001 and the sample size is 100,000. I had the software shade the upper and lower ~2.5% of the tails. 95% of the outcomes should fall within the middle.
If you have sample data, you can use the Proportions hypothesis test, which is based on the binomial distribution. If you have a single sample, use the Proportions test to determine whether your sample is significantly different from a target probability and to construct a confidence interval.
I hope this helps!
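The middle 95% of that binomial distribution can also be computed directly. The sketch below is one way to do it with the standard library alone, using the defect rate of 0.001 and sample size of 100,000 from the reply above. The function name is hypothetical, and the recurrence between successive probabilities avoids computing huge factorials.

```python
def binomial_central_interval(n, p, coverage=0.95):
    """Return (lower, upper) counts that bound the central `coverage`
    portion of a Binomial(n, p) distribution.

    lower: smallest k with P(X <= k) >= lower tail probability
    upper: smallest k with P(X <= k) >= 1 - upper tail probability
    """
    tail = (1 - coverage) / 2
    pmf = (1 - p) ** n  # P(X = 0)
    cdf = pmf           # P(X <= 0)
    lower = None
    for k in range(n + 1):
        if lower is None and cdf >= tail:
            lower = k
        if cdf >= 1 - tail:
            return lower, k
        # P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n - k) / (k + 1) * p / (1 - p)
        cdf += pmf
    return lower, n


low, high = binomial_central_interval(n=100_000, p=0.001)
print(f"About 95% of samples should contain between {low} and {high} defects.")
```

With an expected count of 100 defects, the bounds should fall roughly 20 counts on either side of 100.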
Hi Jim,
Thanks very much for putting together this very helpful and informative page. I just have a quick question about statistical power: it’s been surprisingly difficult for me to locate an answer to it in the literature.
I want to calculate the sample size required in order to reach a certain level of a priori statistical power in my experiment. My question is about what ‘sample size’ means in this type of calculation. Does it mean the number of participants or the number of data points? If there is one data point per participant, then these numbers will obviously be the same. However, I’m using a mixed-effects logistic regression model in which there are multiple data points nested within each participant. (Each participant produces multiple ‘yes/no’ responses.)
It would seem odd if the calculation of a priori statistical power did not differentiate between whether each participant produces one response or multiple responses.
Gavin
Thank you so much sir for the lucid explanation. Really appreciate your kind help. Many Thanks!
Dear sir,
When I search online for sample size determination, I predominantly see mention of the margin of error formula for its calculation.
At other places, like your website, I see the use of effect size, desired power, etc. for the same calculation.
I’m struggling to reconcile these two approaches. Is there a link between the two?
I wish to determine the sample size for testing a hypothesis with sufficient power, say 80% or 90%. Please guide me.
Hi Khalid, a margin of error (MOE) quantifies the amount of random sampling error in the estimation of a parameter, such as the mean or proportion. MOEs represent the uncertainty about how well the sample estimates from a study represent the true population value and are related to confidence intervals. In a confidence interval, the margin of error is the distance between the sample estimate and each endpoint of the CI.
Margins of error are commonly used for surveys. For example, suppose a survey finds that 75% of the respondents like the product, with a MOE of 3 percent. This result indicates that we can be 95% confident that between 72% and 78% of the population like the product.
If you conduct a study, you can estimate the sample size that you need to achieve a specific margin of error. The narrower the MOE, the more precise the estimate. If you have requirements about the precision of the estimates, then you might need to estimate the margin of error based on different sample sizes. This is simply one form of power and sample size analysis where the focus is on how sample sizes relate to the margin of error.
However, if you need to calculate power to detect an effect, use the methods I describe in this post.
In summary, determine what your requirements are and use the corresponding analysis. Do you need to estimate a sample size that produces a level of precision that you specify for the estimates? Or, do you need to estimate a sample size that produces an amount of power to detect a specific size effect? Of course, these are related questions and it comes down to what you want to enter as your criteria.
I hope this helps!
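As a minimal sketch of the margin-of-error approach, the functions below solve the usual large-sample formulas for n. The function names are hypothetical. The proportion example reuses the 75% and 3 percent figures from the survey illustration above, and the standard deviation in the mean example is an arbitrary placeholder.

```python
import math
from statistics import NormalDist


def n_for_proportion_moe(moe, p=0.5, confidence=0.95):
    """Sample size so a proportion's confidence interval has half-width `moe`.
    p = 0.5 is the most conservative planning value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)


def n_for_mean_moe(moe, sd, confidence=0.95):
    """Sample size so a mean's confidence interval has half-width `moe`:
    n = (z * sd / moe) squared."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / moe) ** 2)


print(n_for_proportion_moe(moe=0.03, p=0.75))  # survey example from the reply
print(n_for_proportion_moe(moe=0.03))          # conservative p = 0.5
print(n_for_mean_moe(moe=1.0, sd=5.0))         # sd = 5.0 is a placeholder
```

Note that these formulas take a confidence level and a desired precision as inputs, but no effect size or power, which is the practical difference from the power analysis described in this post.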
I benefited greatly from this, and it gave me a lot to think about!
Jim,
Thank you so much for this very intuitive article on sample size.
Thank you,
Ashwini
Hi Ashwini, you’re very welcome! I’m glad it was helpful!
Thank you. This was very helpful.
You’re very welcome, Hellen! I’m glad you found it to be helpful!
Thanks for your answer Jim. I was indeed aware of this tool, which is great for demonstration. I think I’ll stick to it.
Awaiting your book!
Thanks! If all goes well, the first one should be out in September 2018!
Once again, a nice demonstration. Thanks Jim.
I was wondering which software you used in your examples. Is it, perhaps, R or G*Power? And, would you have any suggestions on an (online/offline) tool that can be used in class?
Thanks!
Hi George, thank you very much! I’m glad it was helpful! I used Minitab for the examples, but I would imagine that most statistical software has similar features.
I found this interactive tool for displaying how power, alpha, effect size, etc. are related. Perhaps this is what you’re looking for?
Thanks for the information. Please explain how to calculate the sample size for a case-control study when different studies report different prevalences for different parameters.
Thanks, sir.
I want to salute you, but you are too far away.
Sir, please send me some articles on probability distributions.
Most kindness.