Effect sizes in statistics quantify the differences between group means and the relationships between variables. While analysts often focus on statistical significance using p-values, effect sizes determine the practical importance of the findings.
In experiments and other studies, analysts typically assess relationships between variables. Effect sizes represent the magnitude of a relationship between variables. For example, you might want to know whether average health outcomes differ between the control group and a treatment group receiving a new medicine. Or, you might want to determine whether processing temperatures relate to a product’s strength.
Effect sizes tell you whether these relationships are strong or weak. Do these variables have a large or negligible impact on the outcome? The experimental medicine might improve health outcomes, but is it a trivial or substantial improvement? This type of information is crucial in determining whether the effect is meaningful in real-world applications.
Effect sizes come in two general flavors, unstandardized and standardized. Depending on your field, you might be more familiar with one or the other.
In this post, you’ll learn about both unstandardized and standardized effect sizes. Specifically, we’ll look at the following effect sizes:
- Unstandardized: Mean differences between groups and regression coefficients
- Standardized: Correlation coefficients, Cohen’s d, eta squared, and omega squared.
Finally, I close the post by explaining the difference between statistical significance and effect sizes, and why you need to consider both.
Unstandardized Effect Sizes
Unstandardized effect sizes use the natural units of the data. Using the raw data units can be convenient when you intuitively understand those units. This is often the case with tangible concepts, such as weight, money, temperature, etc.
Let’s look at two common types of unstandardized effect sizes, the mean difference between groups and regression coefficients.
Mean Differences between Groups
This one is simple. Subtract one group mean from the other to calculate the unstandardized effect size:
Difference Between Group Means = Group 1 Mean – Group 2 Mean
Groups 1 and 2 can be the treatment and control groups, the posttest and pretest measurements, two different types of treatments, and so on.
For example, imagine we’re developing a weight loss pill. The control group loses an average of 5 kg while the treatment group loses an average of 15 kg during the study. The effect size is 15 – 5 = 10 kg. That’s the mean difference between the two groups.
Because you are only subtracting means, the units remain the natural data units. In the example, we’re using kilograms. Consequently, the effect size is 10 kg.
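Here is a minimal Python sketch of that subtraction. The participant-level values below are hypothetical numbers I picked so that the group means come out to 15 kg and 5 kg; they are not data from a real study.

```python
from statistics import mean

# Hypothetical kilograms lost per participant (illustrative only).
# The values are chosen so the group means are 15 kg and 5 kg.
treatment_kg = [13.0, 17.0, 15.5, 14.5, 15.0]
control_kg = [4.0, 6.0, 5.5, 4.5, 5.0]


def mean_difference(group1, group2):
    """Unstandardized effect size: Group 1 mean minus Group 2 mean."""
    return mean(group1) - mean(group2)


effect_kg = mean_difference(treatment_kg, control_kg)
print(f"Mean difference: {effect_kg:.1f} kg")
```

The result stays in kilograms because subtraction doesn’t change the units.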
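Regression Coefficients
Regression coefficients are an effect size that indicates the relationship between variables. These coefficients use the units of your model’s dependent variable: each coefficient is the mean change in the dependent variable for a one-unit increase in its independent variable.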
For example, suppose you fit a regression model with years of experience as an independent variable and income in U.S. dollars as the dependent variable. The model estimates a coefficient for years of experience of, say, 867. This value indicates that for every one-year increase in experience, income increases by an average of $867.
That value is the effect size for the relationship between years of experience and income. It is an unstandardized effect size because it uses the natural units of the dependent variable, U.S. dollars.
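To see where a coefficient like 867 comes from, here’s a small sketch of ordinary least squares for a single independent variable, using only Python’s standard library. The experience and income values are hypothetical; I constructed them so the fitted slope lands on the 867 from the example. In practice, you’d typically fit the model with a statistics package rather than this hand-rolled function.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: years of experience and annual income in U.S. dollars.
experience_years = [1, 2, 3, 4, 5, 6]
income_usd = [40967, 41634, 42601, 43468, 44235, 45302]

intercept, slope = simple_ols(experience_years, income_usd)
# The slope is in the dependent variable's units: dollars per year of experience.
print(f"Income rises by about ${slope:,.0f} per additional year of experience")
```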
Standardized Effect Sizes
Standardized effect sizes do not use the original data units. Instead, they are unitless, allowing you to compare results between studies and variables that use different units.
Additionally, standardized effect sizes are useful for experiments where the original units are not inherently meaningful or might confuse your readers. For example, think back to the years of experience and income example. That study reported its results in U.S. dollars (substitute your local currency if you prefer). As a measurement unit, your currency is inherently meaningful to you. You understand what the magnitude of the value represents.
Conversely, many psychology studies use inventories to assess personality characteristics. Those inventory units are not inherently meaningful. For example, it might not be self-evident whether a 10-point difference on a specific inventory represents a small or large effect. Even if you know the answer because it’s your specialty, your readers might not!
However, when you standardize the effect size and remove the data units, the effect’s magnitude becomes apparent. You can compare it to other findings, and you don’t need to be familiar with the original units to understand the results.
Consider using standardized effect sizes when you need to compare results across studies or different variables, or when the original units are not intuitively meaningful. Meta-analyses often use standardized effect sizes from many studies to summarize a set of findings.
Let’s examine several common standardized effect sizes, including correlation coefficients, Cohen’s d, eta squared, and omega squared.
Correlation Coefficients
You might not think of correlation coefficients as standardized effect sizes, but they are a standardized alternative to regression coefficients. Correlations do not use the original data units, and all values fall between -1 and +1. Because they use a standardized scale, you can use them to compare the strengths of the relationships between different pairs of variables.
In the regression coefficient example, recall that the coefficient of 867 represents the mean change of the dependent variable in U.S. dollars. You could instead report the correlation between experience and income.
To appreciate the advantage of correlation coefficients, consider different studies that find correlations between height and weight, processing temperature and product strength, and hours of sunlight and depression scores. These studies assess relationships between entirely different types of variables that use different measurement units.
Now imagine these pairs of variables all have the same correlation coefficient. Even though the pairs are highly dissimilar, you know that the strengths of the relationships are equal. Or, if one had a higher correlation, you’d quickly see that it has a stronger relationship. The diverse nature of the variables is not a problem at all because correlation coefficients are standardized!
Alternatively, you can use standardized regression coefficients for the same reasons.
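The sketch below illustrates why the units drop out. It computes Pearson’s correlation for hypothetical experience and income values, then recomputes it after converting experience to months and income to thousands of dollars; the two correlations match. It also shows one common way to standardize a regression slope: multiply it by the standard deviation of the independent variable and divide by the standard deviation of the dependent variable. With a single independent variable, that standardized slope equals the correlation.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_slope(x, y):
    """Unstandardized least-squares slope for one independent variable."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy / sum((a - x_bar) ** 2 for a in x)


# Hypothetical data (illustrative only).
experience_years = [1, 2, 3, 4, 5, 6]
income_usd = [40967, 41634, 42601, 43468, 44235, 45302]

r_original = pearson_r(experience_years, income_usd)

# Same relationship in different units: months and thousands of dollars.
experience_months = [12 * v for v in experience_years]
income_thousands = [v / 1000 for v in income_usd]
r_rescaled = pearson_r(experience_months, income_thousands)

# Standardized slope = raw slope * SD(x) / SD(y).
beta = ols_slope(experience_years, income_usd) * stdev(experience_years) / stdev(income_usd)

print(f"r (years, dollars):    {r_original:.4f}")
print(f"r (months, thousands): {r_rescaled:.4f}")
print(f"standardized slope:    {beta:.4f}")
```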
Cohen’s d
Cohen’s d is a standardized effect size for differences between group means. For the unstandardized effect size, you just subtract the group means. To standardize it, divide that difference by the standard deviation. It’s an appropriate effect size to report with t-test and ANOVA results.
Cohen’s d = (Group 1 Mean – Group 2 Mean) / Standard Deviation
The numerator is simply the unstandardized effect size, which you divide by the standard deviation. The standard deviation is either the pooled standard deviation of both groups or the standard deviation of the control group. Because both parts of the fraction use the same units, the division cancels them out and produces a unitless result.
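Here’s a minimal sketch of both versions of the denominator in Python. The pooled standard deviation weights each group’s sample variance by its degrees of freedom. Dividing by the control group’s standard deviation instead is often called Glass’s delta; the `use_control_sd` flag is my own illustrative parameter, not part of any library. The weight loss values are hypothetical, chosen so the mean difference is 10 kg as in the pill example.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cohens_d(group1, group2, use_control_sd=False):
    """Standardized mean difference between group1 and group2.

    By default, divides by the pooled standard deviation of both groups.
    With use_control_sd=True, divides by group2's standard deviation
    (treating group2 as the control group), often called Glass's delta.
    """
    diff = mean(group1) - mean(group2)  # the unstandardized effect size
    if use_control_sd:
        return diff / stdev(group2)
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return diff / sqrt(pooled_var)


# Hypothetical kilograms lost (illustrative only); mean difference is 10 kg.
treatment_kg = [13.0, 17.0, 15.5, 14.5, 15.0]
control_kg = [4.0, 6.0, 5.5, 4.5, 5.0]

d_pooled = cohens_d(treatment_kg, control_kg)
d_control = cohens_d(treatment_kg, control_kg, use_control_sd=True)
print(f"Cohen's d (pooled SD):  {d_pooled:.2f}")
print(f"Using control SD only:  {d_control:.2f}")
```

The kilograms in the numerator and denominator cancel, so both results are unitless.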
Cohen’s d represents the effect size by indicating how large the unstandardized effect is relative to the data’s variability. Think of it as a signal-to-noise ratio. A large Cohen’s d means the effect (signal) is large relative to the variability (noise). A d of 1 indicates that the effect is the same magnitude as the variability, a d of 2 signifies that the effect is twice the size of the variability, and so on.
For example, if the unstandardized effect size is 10 and the standard deviation is 2, Cohen’s d is an impressive 5. However, if you have the same effect size of 10 and the standard deviation is also 10, Cohen’s d is a much less impressive 1. The effect is on par with the variability in the data.
As you gain experience in your field of study, you’ll learn which effect sizes are considered small, medium, and large. Cohen suggested that values of 0.2, 0.5, and 0.8 represent small, medium, and large effects. However, these values don’t apply to all subject areas. Instead, build up a familiarity with Cohen’s d values in your subject area.
Learn more about Cohen’s d.
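As a rough illustration, the function below labels a d value using Cohen’s 0.2, 0.5, and 0.8 benchmarks. I’ve made the cutoffs parameters because, as noted above, they don’t apply to every subject area, so you can swap in values that reflect your field. The function name and the "below small" label are my own choices. The loop reproduces the arithmetic from the earlier example: an effect of 10 with a standard deviation of 2, and with a standard deviation of 10.

```python
def label_cohens_d(d, small=0.2, medium=0.5, large=0.8):
    """Label |d| against benchmark cutoffs (defaults are Cohen's suggestions)."""
    size = abs(d)  # the sign shows direction, not magnitude
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"


for effect, sd in [(10, 2), (10, 10)]:
    d = effect / sd
    print(f"effect={effect}, sd={sd}: d={d:.1f} ({label_cohens_d(d)})")
```

Both cases clear Cohen’s "large" cutoff even though one d is five times the other, which is one more reason to look at the value itself rather than only its label.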
Eta Squared and Omega Squared
Eta squared and the related omega squared are standardized effect sizes that indicate the percentage of the variance that each categorical variable in an ANOVA model explains. Values range from 0 to 100%, although sample estimates of omega squared can fall slightly below zero when an effect is very small. These effect sizes are similar to R-squared, which represents the percentage of the variance that all variables in the model collectively explain.
Each categorical variable has a value that indicates the percentage of the variance that it explains. Like R-squared, eta squared and omega squared are intuitive measures that you can use to compare variable effect sizes between models.
The difference between eta squared and omega squared is that omega squared adjusts for the upward bias present in eta squared, which is most pronounced in small samples. Typically, statisticians prefer omega squared because it is a less biased estimator of the population effect.
Related post: How to Interpret R-squared
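For a one-way ANOVA, both statistics come from the sums of squares. Eta squared is SS_between / SS_total, while omega squared subtracts an estimate of the chance contribution: (SS_between − df_between × MS_within) / (SS_total + MS_within). The sketch below computes both for three small hypothetical groups. It covers only the one-factor case; designs with several factors use related formulas, such as partial eta squared.

```python
from statistics import mean


def eta_omega_squared(*groups):
    """Eta squared and omega squared for a one-way ANOVA on the given groups."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    # Between-group and total sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_within = ss_within / df_within
    eta_sq = ss_between / ss_total
    # Omega squared shrinks the estimate to correct eta squared's upward bias.
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq


# Three hypothetical groups (illustrative only).
eta_sq, omega_sq = eta_omega_squared([4, 5, 6], [6, 7, 8], [5, 6, 7])
print(f"eta squared:   {eta_sq:.3f}")
print(f"omega squared: {omega_sq:.3f}")
```

With these values, eta squared is 0.5 (50% of the variance), while omega squared is about 0.31, a noticeably smaller estimate because the groups are tiny.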
Effect Sizes and Statistical Significance
Historically, statistical significance was the main goal when reporting statistical results. However, that emphasis has changed over time. Analysts have increasingly reported effect sizes to show that their findings are important in the real world.
What is the difference between these two concepts?
After performing a hypothesis test, statistically significant results indicate that your sample provides sufficient evidence to conclude that an effect exists in the population. Specifically, statistical significance suggests that the population effect is unlikely to equal zero.
That’s a good start. It helps rule out random sampling error as the culprit for an apparent effect in your sample.
While the word “significant” makes the results sound important, it doesn’t necessarily mean the effect size is meaningful in the real world. Again, it suggests only a non-zero effect, which includes trivial findings.
If you have a large sample size and/or a low amount of variability in your data, hypothesis tests can produce significant p-values for trivial effects.
Conversely, effect sizes indicate the magnitudes of those effects. By assessing the effect size, you can determine whether the effect is meaningful in the real world or trivial with no practical importance.
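A quick calculation shows how a large sample can make a trivial effect significant. This sketch uses a two-sample z-test computed from summary statistics, assuming equal group sizes and a known common standard deviation, which simplifies the t-test you’d normally run. The numbers are hypothetical: a 0.2 kg difference with a standard deviation of 10 kg, so Cohen’s d is only 0.02.

```python
from math import erfc, sqrt


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p-value from a two-sample z-test with a known common SD.

    Assumes equal group sizes and normally distributed sample means.
    """
    se = sd * sqrt(2 / n_per_group)  # standard error of the mean difference
    z = mean_diff / se
    return erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) for a standard normal Z


mean_diff_kg, sd_kg = 0.2, 10.0
d = mean_diff_kg / sd_kg  # the same tiny effect size in both scenarios

for n in (100, 100_000):
    p = two_sided_p(mean_diff_kg, sd_kg, n)
    print(f"n per group = {n:>7,}: d = {d:.2f}, p = {p:.2g}")
```

The effect size is identical in both rows, but only the very large sample produces a tiny p-value.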
In a nutshell, here’s the difference:
- Statistical significance: After accounting for random sampling error, your sample suggests that a non-zero effect exists in the population.
- Effect sizes: The magnitude of the effect. It answers questions about how much or how well the treatment works. Are the relationships strong or weak?
Consider both Effect Size and Statistical Significance!
It’s essential to use both statistics together. After all, your sample can show a sizeable effect that is not statistically significant. In that case, random sampling error might be creating the appearance of an effect in the sample even though the effect might not exist in the population.
When your results are statistically significant, assess the effect size to determine whether it is practically important.
To get bonus points from me, interpret the effect size with confidence intervals to evaluate the estimate’s precision.
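One way to get a confidence interval for Cohen’s d is a large-sample normal approximation to its standard error, sketched below. It’s an approximation that behaves reasonably with moderate to large samples; exact intervals rely on the noncentral t distribution, which many statistics packages implement. The d value and group sizes in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def cohens_d_ci(d, n1, n2, level=0.95):
    """Approximate confidence interval for Cohen's d (normal approximation)."""
    # Large-sample standard error of d for two independent groups.
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return d - z * se, d + z * se


# Hypothetical result: d = 0.5 with 50 participants per group.
low, high = cohens_d_ci(0.5, 50, 50)
print(f"d = 0.50, 95% CI approximately [{low:.2f}, {high:.2f}]")
```

Here the interval runs from below Cohen’s "small" benchmark to above his "large" one, a sign that the estimate is imprecise even though its midpoint is a medium effect.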
For additional information on this topic, including more about the role of confidence intervals in this process, read my post about Practical versus Statistical Significance.