Effect Sizes in Statistics

By Jim Frost

Effect sizes in statistics quantify the differences between group means and the relationships between variables. While analysts often focus on statistical significance using p-values, effect sizes determine the practical importance of the findings.

Image: different sized shoes. Effect sizes can be small, medium, and large!

In experiments and other studies, analysts typically assess relationships between variables. Effect sizes represent the magnitude of a relationship between variables. For example, you might want to know whether average health outcomes differ between the control group and a treatment group receiving a new medicine. Or, you might want to determine whether processing temperatures relate to a product’s strength.

Effect sizes tell you whether these relationships are strong or weak. Do these variables have a large or negligible impact on the outcome? The experimental medicine might improve health outcomes, but is it a trivial or substantial improvement? This type of information is crucial in determining whether the effect is meaningful in real-world applications.

Effect sizes come in two general flavors, unstandardized and standardized. Depending on your field, you might be more familiar with one or the other.

In this post, you’ll learn about both unstandardized and standardized effect sizes. Specifically, we’ll look at the following effect sizes:

Unstandardized: Mean differences between groups and regression coefficients
Standardized: Correlation coefficients, Cohen’s d, eta squared, and omega squared.

Finally, I close the post by explaining the difference between statistical significance and effect sizes, and why you need to consider both.

Unstandardized Effect Sizes

Unstandardized effect sizes use the natural units of the data. Using the raw data units can be convenient when you intuitively understand those units. This is often the case with tangible concepts, such as weight, money, temperature, etc.

Let’s look at two common types of unstandardized effect sizes, the mean difference between groups and regression coefficients.

Mean Differences between Groups

This one is simple. Just subtract the group means to calculate the unstandardized effect size:

Difference Between Group Means = Group 1 Mean – Group 2 Mean

The Group 1 and Group 2 means can be those of the treatment and control groups, the posttest and pretest means, two different types of treatments, and so on.

For example, imagine we’re developing a weight loss pill. The control group loses an average of 5 kg while the treatment group loses an average of 15 kg during the study. The effect size is 15 kg – 5 kg = 10 kg. That’s the mean difference between the two groups.

Because you are only subtracting means, the units remain the natural data units. In the example, we’re using kilograms. Consequently, the effect size is 10 kg.
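
If you'd like to see that arithmetic in code, here is a minimal Python sketch. The individual weight-loss values are made up for illustration (they are not from a real study); I chose them so the group means match the 5 kg and 15 kg in the example. The function name mean_difference is my own.

```python
from statistics import mean

# Hypothetical weight loss in kg for each participant (made-up values
# chosen so the group means are 5 kg and 15 kg, as in the example).
control = [4.0, 6.0, 5.0, 3.0, 7.0]
treatment = [14.0, 16.0, 15.0, 13.0, 17.0]


def mean_difference(group1, group2):
    """Unstandardized effect size: Group 1 mean minus Group 2 mean."""
    return mean(group1) - mean(group2)


effect_kg = mean_difference(treatment, control)
print(f"Mean difference: {effect_kg:.1f} kg")  # prints "Mean difference: 10.0 kg"
```

Swapping the order of the groups only flips the sign, so state which group you subtracted from which when you report the result.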

Regression Coefficients

Regression coefficients are effect sizes that indicate the relationship between variables. Each coefficient is expressed in the units of your model’s dependent variable per one-unit change in the corresponding independent variable.

For example, suppose you fit a regression model with years of experience as an independent variable and income in U.S. dollars as the dependent variable. The model estimates a coefficient for years of experience of, say, 867. This value indicates that for every one-year increase in experience, income increases by an average of $867.

That value is the effect size for the relationship between years of experience and income. It is an unstandardized effect size because it uses the natural units of the dependent variable, U.S. dollars.
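
Here is a rough sketch of how that coefficient comes out of a simple least-squares fit. The data are hypothetical: I constructed the five income values so that the fitted slope lands on the 867 from the example. The helper name simple_ols is my own, and a real analysis would normally use statistical software rather than hand-rolled sums.

```python
from statistics import mean

# Hypothetical data, constructed so the fitted slope is 867 dollars per year.
years_experience = [1, 3, 5, 7, 9]
income_usd = [41167, 42001, 44935, 45469, 48103]


def simple_ols(x, y):
    """Least-squares slope and intercept for one independent variable."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx                  # dollars per year of experience
    intercept = y_bar - slope * x_bar  # dollars
    return slope, intercept


slope, intercept = simple_ols(years_experience, income_usd)
print(f"Each extra year of experience: about ${slope:,.0f} more income on average")
```

Notice that the slope carries units (dollars per year). If you recorded income in thousands of dollars instead, the coefficient would shrink by a factor of 1,000 even though the relationship is unchanged. That dependence on units is what makes it unstandardized.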

Standardized Effect Sizes

Standardized effect sizes do not use the original data units. Instead, they are unitless, allowing you to compare results between studies and variables that use different units.

Additionally, standardized effect sizes are useful for experiments where the original units are not inherently meaningful or are potentially confusing to your readers. For example, think back to the years of experience and income example. That study reported its results in U.S. dollars, but feel free to substitute your local currency. As a measurement unit, your currency is inherently meaningful to you. You understand what the magnitude of the value represents.

Conversely, many psychology studies use inventories to assess personality characteristics. Those inventory units are not inherently meaningful. For example, it might not be self-evident whether a 10-point difference on a specific inventory represents a small or large effect. Even if you know the answer because it’s your specialty, your readers might not!

However, by standardizing the effect size and removing the data units, the effect’s magnitude becomes apparent. You can compare it to other findings and you don’t need to be familiar with the original units to understand the results.

Consider using standardized effect sizes for comparisons between studies and different variables, or when the original units are not intuitively meaningful. Meta-analyses often use standardized effect sizes from many studies to summarize a set of findings.

Let’s examine several common standardized effect sizes, including correlation coefficients, Cohen’s d, eta squared, and omega squared.

Correlation coefficients

Image: scatterplot of data with a correlation of +0.8.

You might not think of correlation coefficients as standardized effect sizes, but they are a standardized alternative to regression coefficients. Correlation does not use the original data units and all values fall between -1 and +1. You can use them to compare the strengths of the relationships between different pairs of variables because they use a standardized scale.

In the regression coefficient example, recall that the coefficient of 867 represents the mean change of the dependent variable in U.S. dollars. You could instead report the correlation between experience and income.

To understand the potential strength of correlation coefficients, consider different studies that find correlations between height and weight, processing temperature and product strength, and hours of sunlight and depression scores. These studies assess relationships between entirely different types of variables that use different measurement units.

Now imagine these pairs of variables all have the same correlation coefficient. Even though the pairs are highly dissimilar, you know that the strengths of the relationships are equal. Or, if one had a higher correlation, you’d quickly see that it has a stronger relationship. The diverse nature of the variables is not a problem at all because correlation coefficients are standardized!

Instead of correlation coefficients, you can also use standardized regression coefficients for the same reasons.
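
To make the "standardized" part concrete, here is a small Python sketch. It uses made-up height and weight values and two helper functions I named pearson_r and standardized_slope. Under those assumptions, it shows that changing the measurement units leaves the correlation untouched, and that with a single independent variable the standardized regression slope equals the correlation coefficient.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient, computed from sums of squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def standardized_slope(x, y):
    """Slope of y on x, rescaled by the two standard deviations."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    raw_slope = sxy / sxx
    return raw_slope * stdev(x) / stdev(y)


# Hypothetical heights and weights (made-up values).
height_cm = [150.0, 160.0, 165.0, 172.0, 180.0, 188.0]
weight_kg = [52.0, 58.0, 66.0, 70.0, 80.0, 85.0]

# The same data expressed in different units.
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]

r_metric = pearson_r(height_cm, weight_kg)
r_imperial = pearson_r(height_in, weight_lb)
beta = standardized_slope(height_cm, weight_kg)

print(f"r (cm, kg) = {r_metric:.4f}")
print(f"r (in, lb) = {r_imperial:.4f}")
print(f"standardized slope = {beta:.4f}")
```

All three printed values should agree, which is the sense in which these measures are unitless.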

Cohen’s d

Cohen’s d is a standardized effect size for differences between group means. For the unstandardized effect size, you just subtract the group means. To standardize it, divide that difference by the standard deviation. It’s an appropriate effect size to report with t-test and ANOVA results.

The numerator is simply the unstandardized effect size, which you divide by the standard deviation. The standard deviation is either the pooled standard deviation for both groups or the standard deviation of the control group. Because both parts of the fraction use the same units, the division process cancels them out and produces a unitless result.

Cohen’s d represents the effect size by indicating how large the unstandardized effect is relative to the data’s variability. Think of it as a signal-to-noise ratio. A large Cohen’s d means the effect (signal) is large relative to the variability (noise). A d of 1 indicates that the effect is the same magnitude as the variability. A 2 signifies that the effect is twice the size of the variability. Etc.

For example, if the unstandardized effect size is 10 and the standard deviation is 2, Cohen’s d is an impressive 5. However, if you have the same effect size of 10 and the standard deviation is also 10, Cohen’s d is a much less impressive 1. The effect is on par with the variability in the data.
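
The calculation can be sketched in a few lines of Python. The function names and the raw data are my own illustrations (the data are made up), and I use the pooled standard deviation here, which is one of the two choices mentioned above.

```python
from statistics import mean, variance


def pooled_sd(group1, group2):
    """Pooled standard deviation of two groups (sample variances, n - 1)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return pooled_var ** 0.5


def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    return (mean(group1) - mean(group2)) / pooled_sd(group1, group2)


def d_from_summary(mean_diff, sd):
    """Cohen's d when you already have the mean difference and the SD."""
    return mean_diff / sd


# The two scenarios from the text: same effect, different variability.
print(d_from_summary(10, 2))   # prints 5.0
print(d_from_summary(10, 10))  # prints 1.0

# Hypothetical raw data (made-up values) for the two-group version.
treatment = [14.0, 16.0, 15.0, 13.0, 17.0]
control = [4.0, 6.0, 5.0, 3.0, 7.0]
print(f"d from raw data: {cohens_d(treatment, control):.2f}")
```

The same 10-unit difference produces very different d values depending on the noise around it, which is exactly the signal-to-noise idea.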

As you gain experience in your field of study, you’ll learn which effect sizes are considered small, medium, and large. Cohen suggested that values of 0.2, 0.5, and 0.8 represent small, medium, and large effects. However, these values don’t apply to all subject areas. Instead, build up a familiarity with Cohen’s d values in your subject area.

Learn more about Cohen’s d.

Eta Squared and Omega Squared

Eta Squared and the related Omega Squared are standardized effect sizes that indicate the percentage of the variance that each categorical variable in an ANOVA model explains. Values can range from 0 to 100%. These effect sizes are similar to R-squared, which represents the percentage of the variance that all variables in the model collectively explain.

Each categorical variable has a value that indicates the percentage of the variance that it explains. Like R-squared, eta squared and omega squared are intuitive measures that you can use to compare variable effect sizes between models.

The difference between eta squared and omega squared is that omega squared adjusts for bias present in eta squared, particularly for small samples. Typically, statisticians prefer omega squared because it is a less biased estimator.
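
For a one-way ANOVA with a single categorical variable, both measures come straight from the sums of squares. The sketch below uses the standard one-way formulas with three small made-up groups; the function name one_way_effect_sizes is my own, and models with several factors need the corresponding partial versions that your software reports.

```python
from statistics import mean


def one_way_effect_sizes(groups):
    """Return (eta squared, omega squared) for a one-way ANOVA layout."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)

    # Variation between group means and variation within groups.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ss_total = ss_between + ss_within

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_within = ss_within / df_within

    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq


# Hypothetical outcome values for three groups (made-up data).
groups = [[4, 5, 6, 5], [6, 7, 8, 7], [5, 6, 7, 6]]
eta_sq, omega_sq = one_way_effect_sizes(groups)
print(f"eta squared   = {eta_sq:.1%}")
print(f"omega squared = {omega_sq:.1%}")
```

With a sample this small, omega squared comes out noticeably lower than eta squared, which illustrates the bias adjustment described above.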

Related post: How to Interpret R-squared

Effect Sizes and Statistical Significance

Historically, statistical results were all about statistical significance. Statistical significance was the goal. However, that emphasis has changed over time. Analysts have increasingly reported effect sizes to show that their findings are important in the real world.

What is the difference between these two concepts?

After performing a hypothesis test, statistically significant results indicate that your sample provides sufficient evidence to conclude that an effect exists in the population. Specifically, statistical significance suggests that the population effect is unlikely to equal zero.

That’s a good start. It helps rule out random sampling error as the culprit for an apparent effect in your sample.

While the word “significant” makes the results sound important, it doesn’t necessarily mean the effect size is meaningful in the real world. Again, it suggests only a non-zero effect, which includes trivial findings.

If you have a large sample size and/or a low amount of variability in your data, hypothesis tests can produce significant p-values for trivial effects.

Conversely, effect sizes indicate the magnitudes of those effects. By assessing the effect size, you can determine whether the effect is meaningful in the real world or trivial with no practical importance.
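
Here is a quick illustration of how sample size alone can turn a trivial effect into a significant one. It is only a sketch: the numbers are hypothetical, the helper two_sample_z_test is my own, and I use a normal (z) approximation with equal group sizes and a common standard deviation rather than a full t-test.

```python
from statistics import NormalDist


def two_sample_z_test(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (normal approximation).

    Assumes both groups share the same standard deviation and size.
    """
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = mean_diff / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


mean_diff, sd = 0.1, 10.0   # hypothetical: a trivial effect
cohens_d = mean_diff / sd   # 0.01 regardless of sample size

for n in (100, 1_000_000):
    z, p = two_sample_z_test(mean_diff, sd, n)
    print(f"n per group = {n:>9,}: d = {cohens_d:.2f}, z = {z:.2f}, p = {p:.3g}")
```

The effect size is identical in both runs. Only the p-value changes, and with a million observations per group it drops far below 0.05 even though the effect is tiny.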

In a nutshell, here’s the difference:

Statistical significance: After accounting for random sampling error, your sample suggests that a non-zero effect exists in the population.
Effect sizes: The magnitude of the effect. It answers questions about how much or how well the treatment works. Are the relationships strong or weak?

Consider both Effect Size and Statistical Significance!

It’s essential to use both statistics together. After all, you can have a sizeable effect apparent in your sample that is not significant. In that case, random sampling error might be creating the appearance of an effect in the sample even though no effect exists in the population.

When your results are statistically significant, assess the effect size to determine whether it is practically important.

To get bonus points from me, interpret the effect size with confidence intervals to evaluate the estimate’s precision.

For additional information on this topic, including more about the role of confidence intervals in this process, read my post about Practical versus Statistical Significance.
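
As a rough sketch of the confidence interval idea, the Python below builds an approximate 95% CI around a mean difference. The summary statistics are hypothetical, the function name mean_diff_ci is my own, and I use a normal approximation with equal group sizes and a common standard deviation. For small samples, a t-based interval from your statistical software is the better choice.

```python
from statistics import NormalDist


def mean_diff_ci(mean_diff, sd, n_per_group, confidence=0.95):
    """Approximate confidence interval for a difference between two group means."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * standard_error
    return mean_diff - margin, mean_diff + margin


# Hypothetical summary statistics: 10 kg mean difference, SD of 8 kg.
for n in (10, 50, 500):
    low, high = mean_diff_ci(10.0, 8.0, n)
    print(f"n per group = {n:>3}: 95% CI = ({low:.1f}, {high:.1f}) kg")
```

The point estimate is 10 kg every time, but the interval tightens as the sample grows. A wide interval tells you the true effect could be much smaller or larger than the estimate suggests.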

Comments

Zoe says

June 19, 2024 at 11:33 am

Hi Jim, I’ve been using your blog and books to learn statistics for some time and have found all your information extremely helpful. I haven’t been able to find an answer to a question I have on any of your materials, and I was hoping you might be able to clarify.

I understand that generally people compare effect sizes within a single model, but I’m interested in comparing effect sizes across models. The models in question are drawn from the same sample, and although the variables in the regression model differ, they are conceptually similar, which is why I’d be interested in finding a method to compare them that yields statistically significant results.

Thanks for all your help!

Dr Dilip Raj says

April 28, 2024 at 2:57 am

very informative.

Charlotte says

September 11, 2023 at 2:07 pm

Hi Jim,

That’s really helpful, thanks.

If you’ve done a bivariate test, for example, and found that the result is statistically significant, and probability sampling methods have been used, am I correct in saying I could make an inference about effect size then?

Speaking of different sample methods, I am not sure why some researchers talk about statistical significance and effect size when they have used non-probability sampling methods. Even if they get a statistically significant result, if they’ve used a non-probability sampling method, then surely it isn’t valid, as the sample is unlikely to be representative of the population. Therefore, I don’t believe you can infer effect size either.

Best wishes,

Charlotte

- Jim Frost says
  
  September 11, 2023 at 5:46 pm
  
  Hi Charlotte,
  
  Thanks for the great questions!
  
  Yes, you’re correct on both points, although I’ll offer some caveats on cases where researchers don’t use probability sampling.
  
  Regarding the first question about effect size, you can draw an unbiased inference in that case. Just keep in mind that statistical significance doesn’t necessarily indicate the result is practically significant. It just means that the effect observed in the sample likely exists in the population. Assess the effect’s CI and determine whether it’s practically significant using subject-area knowledge.
  
  As for studies that don’t use probability sampling, that’s a definite limitation that the researchers should be transparent about. It reduces the ability to generalize the sample results to the population. However, that doesn’t mean the results of those studies are worthless. Even though they didn’t use probability sampling, the sample might approximate the population closely enough to trust the results. Sometimes you can assess the sample characteristics and determine whether they’re different from the population. That’s not as good as using probability sampling, but the results might still be trustworthy.
  
  In other cases, it might be for exploratory research. The idea is to find out if there’s any potential there for better quality follow-up research. In some cases, researchers work with hard-to-contact populations where probability sampling is impossible, so non-probability sampling is the only way to study that population. Or, there might be time or funding constraints. Probability sampling tends to be more expensive and time consuming.
  
  In short, I wouldn’t totally disregard the results of studies that use non-probability sampling. It’s a limitation but sometimes it’s an unavoidable one. But by evaluating their methodology and the sample, you might be able to trust the results to an extent if the sample approximates the population in the characteristics relevant to the study. Or, you might be able to determine whether the results are biased high or low and interpret the results accordingly.
  
  This is a case where you don’t want perfect to be the enemy of good.
  
Charlotte says

September 5, 2023 at 4:47 am

Hi Jim – I enjoyed reading your post, so thank you! I have a question, I was hoping you’d be able to answer please? Can effect size statistics be both descriptive and inferential? So can they tell us the size of the relationship between the variables in the sample (so descriptive) and also in the population (so inferential)? Any help would be appreciated. Take care.

- Jim Frost says
  
  September 5, 2023 at 4:49 pm
  
  Hi Charlotte,
  
  That’s a great question. The answer is that sometimes it’s possible for an effect size to apply both to a sample and a population, but not always.
  
  For starters, an effect size always applies to the sample you’re assessing. No matter how you collect the sample or anything else, you can always state that the effect size you found in the sample applies to that sample. That’s the easy part.
  
  The trickier part begins when you want to generalize beyond the sample to an entire population. This is the point where you have to start worrying about how the sample was collected. Is it a representative sample? That’s a key point for generalizing the results, but there are other issues that all relate to the external validity of an experiment. Click the link to learn about the issues that negatively affect external validity/generalizability.
  
  In short, there are many reasons why you might not be able to generalize the sample effect size to the population. So, definitely the effect size you find in a sample applies to that sample (descriptive). But it may or may not apply to a population (inferential). Typically, you need to design and conduct an experiment very carefully to be able to generalize the results to a larger population.
  
  I hope that answers your question!
  
PETER Y JI says

August 29, 2023 at 5:24 pm

I have a general question about sample size and its relation to effect sizes. I’m certain that I’m demonstrating my novice understanding of the issue so apologies in advance.

My colleague insisted that as sample size increases, it solely and directly decreases the effect size, almost as if this was an axiom. I’m pretty sure he is not entirely correct, but I wanted to see if my explanation is on target. Moreover, my perception is that my colleagues have a general understanding that large sample sizes solve all research problems, so it is not surprising my colleague is fixated on sample size.

The study context was the impact of contracting Covid on the intelligence levels of elderly populations.

My response is that sample size must be considered when evaluating a study’s effect size, but sample size is not related to effect size in a direct linear fashion. My thought process is below.

Small sample sizes produce larger effect sizes because of sampling error and the central limit theorem. Small sample sizes have high variability and likely result in large effect sizes due to instability in their means and standard errors.
Small sample sizes do produce highly variable effect sizes because with small sample sizes, effect sizes are unreliable and hence more variable.
But the opposite is not necessarily true. Increasing the sample size does not have a direct linear effect of decreasing the effect size.
Increasing sample size does increase certainty of estimates. But increasing sample size increases the ability to detect an effect if it exists; it does not directly reduce the size of the effect.
I think the problem is whether the effect size is considered a priori vs. post-hoc. A priori, theoretically I conceptualize effect sizes as what we expect the difference to be, or the strength of the relationship, based on what we know about the issue at hand. In this case, the effect size of the impact of Covid on intelligence is likely to be small, after accounting for other factors impacting intelligence, especially in the elderly population. From a design measurement perspective, in this case, it is difficult to determine if the way that the researchers measured intelligence would actually detect the effect of Covid on intelligence.
Theoretically, the effect size is already there to detect. So a large sample size won’t necessarily “reduce” the effect size, simply because the effect is already there to begin with. But increasing the sample size likely introduces confounds, covariates, and more error, thereby lowering the effect size because uncertainty is introduced. But if the effect is there, then the effect is there in whatever size. Increasing only the sample size does not directly shrink the effect. So, post-hoc, it might seem there is a small effect size after the sample size is increased, but other factors should be considered rather than just the sample size.
If I followed my colleague’s logic, then if all studies increased their sample sizes, all effects would disappear. Effect sizes should be evaluated not just by the sample size, but by the conceptualization of the study and its intentions. Better research designs and replication are necessary for examining effect sizes.
So my colleague is “sort of” correct. While increasing sample size might result in lower effect sizes, sample size is not the only consideration here. I’m sure I’m missing something in my argument, but I’m not sure what it is.

As an epilogue, my colleagues seem fixated on “rules” and “what looks right” and “cookie cutter” approaches to research. It’s almost as if, large sample sizes solve all research concerns. I keep arguing that conceptualization and the context of the study must be considered when evaluating the veracity of the results. Honestly, it feels like an isolated and lost battle.

Comments appreciated. Many thanks.

Peter Ji

- Jim Frost says
  
  August 29, 2023 at 5:45 pm
  
  Hi Peter,
  
  I think you might be slightly misunderstanding your colleague. Although, I can’t be sure because I don’t know exactly what they said.
  
  Larger sample sizes do decrease the detectable effect size. That is true and I think that is what your colleague was trying to say. I’d add a few caveats to that.
  
  When you increase the sample size, the power for detecting a given effect size tends to increase. Alternatively, you can say that as you increase sample size and if you hold statistical power constant at a particular level (e.g., 80%), you’ll be able to detect a smaller effect size at that level. Both are two sides of the same coin. As sample size increases, you have a better chance of detecting a given effect size or you can detect a smaller effect size with a given power.
  
  You are correct that increasing the sample size does not cause the effect size to decrease. There should be no relation between sample size and average effect size. What changes as you increase the sample size is that the precision of the estimate increases (smaller MOE, which is good). That increased precision is what allows you to detect a smaller effect with a larger sample.
  
  If your colleague was truly suggesting that increasing the sample size would tend to reduce the effect size, that is incorrect. But if they mean “detectable” effect size, then that’s essentially correct with the few caveats and conditions I mentioned.
  
  Larger samples are hugely beneficial to statistical analysis, although they don’t solve all problems. But they do increase your ability to detect smaller effects and increase your statistical power. So, if an effect exists in the population, a larger sample size is more likely to find it, assuming you’re holding everything else constant methodology-wise.
  
Stan Alekman says

September 1, 2021 at 2:33 pm

Jim, I always calculate and report the confidence interval of the effect size in addition to statistical significance because it is a random variable. I have done studies where the effect size is impressive but the CI is so wide that the point value is much less impressive. Replication is needed.

I have seen reports of effect size expressed as standard deviation units. What meaning or interpretation can I give to these values?

Thank you.

Stan Alekman

Maria Fionda says

July 19, 2021 at 1:50 pm

Hello Jim,

First, thank you for the helpful explanation! I’m wondering if you can help me understand how statistical significance (p-value), effect size *and* odds ratios work to tell the story, then. For example, the chi-square results of two categorical variables, Student Group (A vs B) and Re-enrollment (yes vs no):

Student Group | Re-enrolled: Yes | Re-enrolled: No
Group A | 20 | 7
Group B | 1416 | 1276

Chi-square (1, N=2719) = 4.95, p = 0.0261
Phi-coefficient (effect size): 0.0426
Odds ratio: 2.59 = Group A re-enrollment rate is 159% higher

It seems like the effect size is so negligible (i.e. it is saying that that significant difference is driven only by the large sample size but not because there is a real difference in the rates of re-enrollment between the two groups) but then the odds ratio makes it look like there is indeed a large difference. What story are these numbers telling?

I’m not sure if you offer paid consultation services. I tried clicking around the website to see if I could request this but all I could find was your suggestion that we ask our question on the article most closely related. I figured either this one or your chi-square article (https://statisticsbyjim.com/hypothesis-testing/chi-square-test-independence-example/) would be the most appropriate ones.

My sincere apologies if this question is beyond the intended scope of your services.

Tess says

June 6, 2021 at 12:14 pm

Hi Jim, firstly, I want to say thank you for your incredibly helpful blogs and your dedication to making statistics more accessible. I appreciate that my question is probably rather vague, but I wondered if you had some general advice about appraising effect sizes in published research literature (my subject area is psychology). I have noticed reviews and meta-analyses discussing the likelihood of studies being underpowered in relation to effect sizes. I have a general understanding of this area, but I am always left feeling a bit stuck when I see comments in papers such as:

“There were no differences between the clusters in terms of depression, anxiety or somatoform dissociation, although the effect sizes (0.72, 0.70 and 0.41 respectively) suggest that this was a product of low power.”

I find myself thinking:

– How did the authors glance at these results and come up with such conclusions?
– Are they saying that if the study was better powered, they would likely find a significant difference with the mentioned effect sizes?

I would appreciate your input on this and any general guidance you might have for interpreting effect sizes in published literature.

- Jim Frost says
  
  June 6, 2021 at 8:20 pm
  
  Hi Tess,
  
  You’re very welcome and I’m so glad that my website has been helpful!
  
  It’s great that you’re trusting your instincts with your doubts about that type of assertion. No, those authors should not be stating something like that. The truth is that when you obtain insignificant results, you don’t know whether you’d get significant results if you had a larger sample size or not. While I’m not exactly sure what their thought process was, I have an idea.
  
  If those effect sizes are Cohen’s d, then they represent medium to large effect sizes. And, they’re probably thinking, well, if they’re medium to large but they’re not significant, they would be significant if we had gathered a larger sample. Basically, these authors are assuming that if they had a larger sample size, they would continue to obtain the same size effects, which would become significant. That’s not how it works! If you could just do that, you’d never need a larger sample size. Just get a small one and extrapolate out for a large one.
  
  The reality is that when you have a small sample size (underpowered), the estimated effect sizes are more unstable. That means they can swing around wildly and produce unusual values. It’s easier to get unusual results with a small sample than a large sample. However, a large sample won’t necessarily follow the same pattern because a large sample smooths out those erratic swings in the estimated effect sizes.
  
  In fact, I think one of the largest benefits of p-values is when you actually get a large, insignificant p-value. It’s a protection against the very thing the authors have mistakenly concluded. The insignificant p-value warns you that while you have an apparent effect, given the variability in the sample and the size of the sample, there’s a large chance that the apparent effect in the sample represents random sample error and NOT a true effect in the population. For more information, read a post that I’ve written about Can High P-values Be Meaningful.
  
  Additionally, it’s a known problem that underpowered studies tend to produce inflated effect sizes. For more information, read my post about Low Power Tests Exaggerate Effect Sizes.
  
Evgeniya says

June 4, 2021 at 1:56 am

Maybe I should use a different effect size? In my regression model the dependent variable and explanatory variables of interest are quantitative, but the regression coefficients are not meaningful.

Evgeniya says

May 31, 2021 at 1:50 pm

Hello Jim, thanks for a very helpful and comprehensive article. But I have one question: can I use omega squared after regression to measure effect sizes if my independent variables of interest are quantitative?

Kenneth Tuttle Wilhelm says

May 3, 2021 at 11:42 pm

Evening Jim,

Thank you for the reply. Currently, the common replication of studies, in our curriculum (IB), is limited to t-tests, and Mann-Whitney, maybe the occasional ANOVA.

I rarely see anyone looking at skew or kurtosis. So this means no one is looking at the sample distribution and whether it’s normal or not. Which then forces most into the non-parametric end. And other things like an automatic rejection of parametric analysis of data from Likert scales, as an example.

Psychology, of course, is such a multi-factor research area that I believe it makes sense that, when considering the characteristics of a sample population, there will be instances where means may be similar (CLT) and yet there may be wide variability present in an experimental group. Hence, my question about whether post-hoc or preplanned analysis is more appropriate.

The issue is that students replicating prior research may only be looking at means, as that’s all the original research focused on, or at least reported. So it’s only after collecting the data and observing the descriptive stats that a difference in variance is noticed, and one becomes interested in checking the variance.

In my particular environment, the school population is very multi-cultural (over 60% of students are from foreign countries), and even the surrounding general population is multi-ethnic. So if results show one IV-DV relationship (possibly a control) with an IV-DV relationship (experimental) showing a visual difference between variances, to me, it makes sense to at least consider the difference as having some informative value.

At a minimum, analysis of the variances, allows for students to demonstrate further critical thinking, with a bit of extra stats results added to the discussion and recommendations.

Cheers

Kenneth Wilhelm says

May 1, 2021 at 9:37 pm

Hello Jim,

You used the word ‘…historically…’, which is my segue into a couple of questions.

When designing an experiment that relies on, let’s say, a two-sample t-test, should researchers indicate in their procedures that they’ll also analyse the differences in variance? Or would it be acceptable to do this post hoc, after observing that while the means are not different, the graph and numbers hint that the variances may be statistically different?

Not infrequently, I notice means that are not significantly different, maybe leaning towards significance but short of the alpha, while the standard deviations of the two groups do appear (visually) to be different. I’m a teacher of Psychology in an international school, and the curriculum programme requires students to replicate studies. So I’m looking at design, analysis, and results on a regular basis.

My own grad stats class was 28 years ago, so I’m more than rusty. I’ve finished reading through your Hypothesis Testing book: lots of stuff forgotten, and quite a bit that’s new. Now moving on to Regression Analysis. (And joining the ASA as a teacher to boot.)

- Jim Frost says
  
  May 3, 2021 at 9:53 pm
  
  Hi Kenneth,
  
  I’m so glad that you’re finding my Hypothesis Testing book to be helpful! That sounds like a great requirement for students! I wrote about the lack of replication and the relationship to p-values in the field of Psychology.
  
  Ideally, the researchers would note in their plans that differences between the means and/or standard deviations would be important and, consequently, plan from the beginning to test both.
  
  I’m assuming that differences in the standard deviation would be a meaningful finding? If so, that’s a good reason. However, if you’re just assessing the difference in variability only because the difference between the means was not significant but you’re hoping to find something, that’s a different matter.
  
  I don’t see anything wrong with doing the extra analysis based on observing the results. I’d include that in the discussion for transparency. And really consider whether that is an important finding or whether the only reason you’re looking at the variance is because the mean difference was not significant. You might also consider why, if a difference in variability is important, you didn’t think of it earlier. If it’s a worthwhile aspect to study, work it into the plan from the beginning!
  
  There is a danger in adding analyses at the end: you can keep performing analyses until something significant pops out. That’s a form of cherry picking. I assume that’s not the case you’re describing, and that you’re just adding the variability analysis?
  
  Best of luck on your exciting journey!
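
A preplanned version of the analysis discussed in this thread, testing both the means and the variability, might look roughly like the sketch below. It is only an illustration under stated assumptions: the scores are made up, the function name `compare_means_and_spread` is hypothetical, and the choice of Welch's t-test plus the median-centered Levene test (Brown-Forsythe) is one reasonable option rather than a method prescribed anywhere in this post.

```python
import numpy as np
from scipy import stats


def compare_means_and_spread(control, treatment):
    """Test for a difference in means and a difference in variability.

    Hypothetical helper for illustration. Both tests should be named
    in the analysis plan before the data are collected.
    """
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)

    # Welch's t-test: compares means without assuming equal variances.
    welch = stats.ttest_ind(control, treatment, equal_var=False)

    # Levene's test centered on the median (Brown-Forsythe variant):
    # compares spread and is fairly robust to non-normal data.
    levene = stats.levene(control, treatment, center="median")

    return {
        "mean_diff": treatment.mean() - control.mean(),
        "sd_control": control.std(ddof=1),
        "sd_treatment": treatment.std(ddof=1),
        "welch_p": float(welch.pvalue),
        "levene_p": float(levene.pvalue),
    }


# Made-up scores: identical means, very different spread.
control_scores = [48, 49, 50, 51, 52, 50, 49, 51]
treatment_scores = [35, 65, 40, 60, 45, 55, 30, 70]

result = compare_means_and_spread(control_scores, treatment_scores)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With data like these, the means test finds nothing while the variability test flags a clear difference. If the variability test was added only after seeing the results, that belongs in the discussion for transparency.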
  
Stan Alekman says

March 29, 2021 at 3:39 pm

Hi Jim. Regarding unstandardized effect sizes, one can compute the confidence interval of the mean difference between groups to get an impression of the uncertainty in the observed effect size. This can no doubt be done for standardized effect sizes as well, although I have never done it.

- Jim Frost says
  
  March 29, 2021 at 3:42 pm
  
  Hi Stan,
  
  You are absolutely correct as usual, both for unstandardized and standardized effect sizes! I include that as a bonus point right near the end.
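
The confidence intervals mentioned in this exchange can be sketched as follows. This is a minimal illustration, not a definitive implementation: the data and the function names `mean_difference_ci` and `cohens_d_ci` are hypothetical, the interval for the mean difference uses the Welch (unequal variances) approach, and the interval for Cohen's d relies on a common large-sample approximation of its standard error, which can be rough for small samples.

```python
import numpy as np
from scipy import stats


def mean_difference_ci(a, b, confidence=0.95):
    """Unstandardized effect: mean(a) - mean(b) with a Welch t interval."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a.mean() - b.mean()
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    t_crit = stats.t.ppf(0.5 + confidence / 2, df)
    return diff, diff - t_crit * se, diff + t_crit * se


def cohens_d_ci(a, b, confidence=0.95):
    """Standardized effect: Cohen's d with an approximate normal interval."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    pooled_sd = np.sqrt(
        ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    )
    d = (a.mean() - b.mean()) / pooled_sd
    # Large-sample approximation of the standard error of d (an assumption).
    se = np.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z_crit = stats.norm.ppf(0.5 + confidence / 2)
    return d, d - z_crit * se, d + z_crit * se


# Made-up measurements for two groups.
treatment = [10, 12, 14, 16, 18]
control = [5, 7, 9, 11, 13]

diff, diff_lo, diff_hi = mean_difference_ci(treatment, control)
d, d_lo, d_hi = cohens_d_ci(treatment, control)
print(f"Mean difference: {diff:.2f} (95% CI {diff_lo:.2f} to {diff_hi:.2f})")
print(f"Cohen's d: {d:.2f} (approx. 95% CI {d_lo:.2f} to {d_hi:.2f})")
```

A wide interval around either estimate signals that the sample does not pin down the effect size precisely, whichever scale it is reported on.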
  
Lucas says

March 27, 2021 at 10:06 am

Hello Jim, how are you? Very nice article, like your others. Jim, I was wondering if it is possible to get the printed versions of your books. Thank you so much.

Regards from Uruguay.

Lucas

- Jim Frost says
  
  March 29, 2021 at 3:19 pm
  
  Hi Lucas!
  
  I have a soft spot for Uruguay because my wife is from there!
  
  Yes, you can definitely get print versions of all my books. You can order them from Amazon. In My Webstore, I provide Amazon links for multiple countries. You can also order them from other online retailers, and some physical bookstores can order them for you.
  
antonius suhartomo says

March 25, 2021 at 12:29 pm

Thanks, Jim, for this explanation. Coming from an electrical engineering background, I learned something new.

- Jim Frost says
  
  March 25, 2021 at 3:19 pm
  
  You’re very welcome, Antonius! I’m always glad to hear when someone gets something new out of it! 🙂
  
Marty Shudak says

March 24, 2021 at 4:00 pm

Jim, thank you for such an intuitive description of effect sizes. I may be misreading this part, but in the last sentence under the heading “Mean Differences between Groups,” shouldn’t the effect size be 10?

- Jim Frost says
  
  March 24, 2021 at 8:28 pm
  
  Hi Marty, thanks and yes, you’re absolutely correct about that! I fixed it. Higher math like that is challenging! 😉
  