Welch’s ANOVA is an alternative to the traditional analysis of variance (ANOVA), and it offers some serious benefits. One-way analysis of variance determines whether differences between the means of at least three groups are statistically significant. For decades, introductory statistics classes have taught the classic Fisher’s one-way ANOVA that uses the F-test. It’s a standard statistical analysis, and you might think it’s pretty much set in stone by now. Surprise, there’s a significant change occurring in the world of one-way analysis of variance!
There is a new kid on the ANOVA block. Well, not a new kid, but an old kid who’s gaining in popularity.
Let me acquaint you with Welch’s ANOVA. You use it for the same reasons as the classic statistical test, to assess the means of three or more groups. However, Welch’s analysis of variance provides critical benefits and protections because you can use it even when your groups have unequal variances. In fact, you read it here first; Welch’s ANOVA might knock out the classic version.
In this post, I’ll explain the dangers of using the classic analysis of variance with unequal variances, describe the benefits of using Welch’s ANOVA, and interpret a Welch’s ANOVA example with the Games-Howell post hoc test.
One-Way ANOVA Assumptions
Welch’s ANOVA enters the discussion because it can help you get out of a tricky situation with an assumption. Like all statistical tests, one-way ANOVA has some assumptions. If you fail to satisfy the assumptions, you might not be able to trust the results. Simulation studies have been crucial in revealing which assumptions are strict requirements and which are more lenient.
The Classic one-way test assumes that all groups share a common standard deviation (or variance) even when their means are different. Unfortunately, simulation studies find that this assumption is a strict requirement. If your groups have unequal variances, your results can be incorrect if you use the classic test. On the other hand, Welch’s ANOVA isn’t sensitive to unequal variances.
Before I delve into the importance of this assumption, I’ll briefly describe how the simulation study tested it.
Comparing Welch’s ANOVA to Fisher’s
For all hypothesis tests, you specify the significance level. Ideally, the significance level equals the probability of rejecting a null hypothesis that is true (Type I error). This error is basically a false positive because the test results (a small p-value) lead you to believe incorrectly that some of the group means are different. When tests produce valid results, the Type I error rate equals the significance level. For example, if your significance level is 0.05, then 5% of tests should have this error when the null is true.
The investigators who perform a simulation study know when the null hypothesis is true or false. They can use this knowledge to determine whether the proportion of tests with a Type I error matches the significance level, which is the target. The researchers can generate data that violate an assumption to determine whether it affects the results. The larger the difference between the significance level and the Type I error rate, the more critical it becomes to satisfy the assumption.
Related post: Types of Error in Hypothesis Testing
Simulation Results for Unequal Variances
The simulation study assessed 50 different conditions related to unequal variances. For each condition, the computer drew 10,000 random samples and statistically analyzed them using both Welch’s ANOVA and the traditional one-way test.
For the Classic ANOVA, the simulation study found that unequal standard deviations cause the Type I error rate to shift away from the significance level target. If the group sizes are equal and the significance level is 0.05, the actual error rate falls between 0.02 and 0.08. However, if the groups have different sizes, the error rates can be as large as 0.22!
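To get a feel for how this kind of simulation works, here is a minimal Python sketch. It is not the study’s code. The group sizes, standard deviations, number of simulated samples, and the function name `classic_type1_rate` are my own assumptions, and it relies on NumPy and SciPy’s `f_oneway`. Because every group is drawn with the same mean, the null hypothesis is true and every significant result is a false positive.

```python
import numpy as np
from scipy import stats


def classic_type1_rate(sizes, sds, n_sims=2000, alpha=0.05, seed=0):
    """Estimate the Type I error rate of the classic F-test by simulation.

    All groups share a population mean of 0, so the null hypothesis is
    true and each rejection counts as a false positive.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # One normally distributed sample per group, each with its own SD.
        groups = [rng.normal(0.0, sd, size=n) for n, sd in zip(sizes, sds)]
        p_value = stats.f_oneway(*groups).pvalue
        rejections += int(p_value < alpha)
    return rejections / n_sims


# Hypothetical conditions, not the ones used in the simulation study.
sizes = (5, 10, 30)
equal_rate = classic_type1_rate(sizes, sds=(1, 1, 1))
unequal_rate = classic_type1_rate(sizes, sds=(4, 2, 1))

print(f"Equal SDs:   {equal_rate:.3f}")
print(f"Unequal SDs: {unequal_rate:.3f}")
```

In the second condition, the smallest group has the largest standard deviation, which is the kind of arrangement that tends to push the classic test’s false positive rate above the significance level. The exact rates you see depend on the seed and the conditions you choose.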
Welch’s ANOVA to the Rescue
If you determine that your groups have standard deviations that are unequal, what should you do? Use Welch’s ANOVA! The same simulation study found that Welch’s analysis of variance is unaffected by unequal variances. In fact, Welch’s ANOVA explicitly does not assume that the variances are equal.
Let’s compare the simulation study results for the two types of analysis of variance when standard deviations are unequal, and the significance level is 0.05.
- Classic ANOVA error rates extend from 0.02 to 0.22.
- Welch’s ANOVA error rates have a much smaller range of 0.046 to 0.054.
In fact, it’s fine to use Welch’s ANOVA even when your groups do have equal variances because its statistical power is nearly equivalent to that of the Classic test. Welch’s analysis of variance is an excellent analysis that you can use all the time for one-way analysis of variance. It completely wipes away the need to worry about the assumption of homogeneous variances.
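If you’d like to see what the test computes, here is a short Python sketch of the standard Welch formula. Each group is weighted by its sample size divided by its sample variance, and the denominator degrees of freedom are adjusted based on those weights. The function name `welch_anova` and the sample values are mine and purely illustrative; treat this as a sketch that assumes NumPy and SciPy are available, not as a replacement for your statistical software.

```python
import numpy as np
from scipy import stats


def welch_anova(*groups):
    """Welch's one-way ANOVA for k >= 2 independent groups.

    Returns (F statistic, numerator df, denominator df, p-value).
    """
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])

    # Weight each group by the precision of its mean: n_i / s_i^2.
    w = n / variances
    w_total = w.sum()
    weighted_mean = np.sum(w * means) / w_total

    # Weighted between-group variation.
    numerator = np.sum(w * (means - weighted_mean) ** 2) / (k - 1)

    # Adjustment term built from the relative weights and group sizes.
    lam = np.sum((1 - w / w_total) ** 2 / (n - 1))
    denominator = 1 + 2 * (k - 2) / (k**2 - 1) * lam

    f_stat = numerator / denominator
    df_num = k - 1
    df_den = (k**2 - 1) / (3 * lam)
    p_value = stats.f.sf(f_stat, df_num, df_den)
    return f_stat, df_num, df_den, p_value


# Made-up values with visibly different spreads (illustration only).
group_a = [10.1, 9.8, 10.3, 10.0, 9.9]
group_b = [12.5, 8.1, 14.9, 9.7, 11.2, 13.6]
group_c = [15.2, 15.9, 14.8, 15.5]

f_stat, df_num, df_den, p_value = welch_anova(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, df = ({df_num}, {df_den:.2f}), p = {p_value:.4g}")
```

Notice that no pooled variance appears anywhere in the calculation, which is why the test does not need the equal variances assumption. With only two groups, the F statistic reduces to the square of Welch’s t statistic.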
Welch’s ANOVA Example
In this example, our data are the ground reaction forces that are generated by jumping from steps of different heights. You can download the CSV data file for the WelchsANOVAExample.
First, I’ll graph the data to give us a good sense of the situation. The chart below is an interval plot that displays the group means and 95% confidence intervals.
The interval widths are based on the individual standard deviation of each group, and they look different across groups. So, Welch’s analysis of variance is a good choice for these data.
Next, I’ll perform the hypothesis test. Depending on your statistical software, Welch’s procedure might be a separate command, or you may need to tell the software not to assume equal variances. The Welch’s ANOVA output is below.
The output for Welch’s ANOVA is relatively similar to the Classic test, although you’ll notice that it does not contain the usual analysis of variance table. As with any hypothesis test, compare the p-value to your significance level to determine whether the differences between the means are statistically significant. For our example results, the very low p-value indicates that these results are statistically significant. Our sample provides sufficient evidence to conclude that not all group means are equal in the population.
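If you work in Python, one way to request the test is the `anova_oneway` function in statsmodels, which belongs to the second kind of software: you tell it not to assume equal variances through the `use_var` argument. The sketch below assumes statsmodels is installed. The numbers are made up for illustration and are not the ground reaction force data from this example.

```python
import numpy as np
from statsmodels.stats.oneway import anova_oneway

# Illustrative values only; NOT the data from the example above.
low = np.array([10.1, 9.8, 10.3, 10.0, 9.9])
mid = np.array([12.5, 8.1, 14.9, 9.7, 11.2, 13.6])
high = np.array([15.2, 15.9, 14.8, 15.5])
samples = [low, mid, high]

# Welch's ANOVA: do not assume equal variances.
welch = anova_oneway(samples, use_var="unequal", welch_correction=True)

# Classic ANOVA for comparison: assumes a common variance.
classic = anova_oneway(samples, use_var="equal")

print(f"Welch:   F = {welch.statistic:.2f}, "
      f"df = ({welch.df_num:.0f}, {welch.df_denom:.2f}), p = {welch.pvalue:.4g}")
print(f"Classic: F = {classic.statistic:.2f}, "
      f"df = ({classic.df_num:.0f}, {classic.df_denom:.0f}), p = {classic.pvalue:.4g}")
```

The denominator degrees of freedom for Welch’s version are usually not a whole number because they are estimated from the group variances and sample sizes.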
Using Post Hoc Tests with Welch’s ANOVA
While the overall results above indicate that not all group means are equal, we don’t know which differences between group means are statistically significant. To identify significant differences between specific groups, you need to perform a pairwise comparisons post hoc test. When you use Welch’s ANOVA, you can use the Games-Howell multiple comparisons method.
For more information about this process, read my post about Using Post Hoc Tests with ANOVA.
The Games-Howell post hoc test is most like Tukey’s method for Classic ANOVA. Both procedures do the following:
- Control the joint error rate for the entire series of comparisons.
- Compare all possible pairs of groups within a collection of groups.
The Games-Howell post hoc test, like Welch’s analysis of variance, does not require the groups to have equal standard deviations. Conversely, Tukey’s method does require equal standard deviations.
The Games-Howell post hoc test results are below:
None of the confidence intervals for the differences between group means contain zero. Consequently, these confidence intervals indicate that the differences between all pairs of groups are statistically significant.
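For readers who want to see what the procedure computes, here is a minimal Python sketch of Games-Howell comparisons. For each pair of groups, it calculates an unpooled standard error, Welch-Satterthwaite degrees of freedom, and a p-value and confidence interval from the studentized range distribution. The function name `games_howell` and the sample values are my own for illustration, and the code assumes a SciPy version that includes `scipy.stats.studentized_range`. It will not reproduce the results above because it does not use the example data.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def games_howell(groups, alpha=0.05):
    """Games-Howell pairwise comparisons for a dict of {label: values}.

    Returns a list of tuples:
    (label_a, label_b, difference, ci_low, ci_high, p_value).
    """
    k = len(groups)
    # Per group: mean, variance of the mean (s^2 / n), and sample size.
    summary = {
        label: (np.mean(v), np.var(v, ddof=1) / len(v), len(v))
        for label, v in groups.items()
    }
    results = []
    for a, b in combinations(summary, 2):
        mean_a, var_mean_a, n_a = summary[a]
        mean_b, var_mean_b, n_b = summary[b]
        diff = mean_b - mean_a
        se = np.sqrt(var_mean_a + var_mean_b)
        # Welch-Satterthwaite degrees of freedom for this pair.
        df = (var_mean_a + var_mean_b) ** 2 / (
            var_mean_a**2 / (n_a - 1) + var_mean_b**2 / (n_b - 1)
        )
        # The studentized range statistic is sqrt(2) times |t|.
        q = np.sqrt(2) * abs(diff) / se
        p_value = stats.studentized_range.sf(q, k, df)
        q_crit = stats.studentized_range.ppf(1 - alpha, k, df)
        margin = q_crit / np.sqrt(2) * se
        results.append((a, b, diff, diff - margin, diff + margin, p_value))
    return results


# Made-up values for illustration only.
data = {
    "low": [10.1, 9.8, 10.3, 10.0, 9.9],
    "mid": [12.5, 8.1, 14.9, 9.7, 11.2, 13.6],
    "high": [15.2, 15.9, 14.8, 15.5],
}
comparisons = games_howell(data)
for a, b, diff, ci_low, ci_high, p in comparisons:
    print(f"{b} - {a}: {diff:.2f}  CI ({ci_low:.2f}, {ci_high:.2f})  p = {p:.4f}")
```

As in the interpretation above, a pair differs significantly at the chosen level when its confidence interval excludes zero.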
I hope you’ll consider using Welch’s ANOVA anytime you need to perform a one-way test of the means!