What is ANCOVA?
ANCOVA, or the analysis of covariance, is a powerful statistical method that analyzes the differences between two or more group means while controlling for the effects of at least one continuous covariate.
ANCOVA is a potent tool because it adjusts for the effects of covariates in the model. By isolating the effect of the categorical independent variable on the dependent variable, researchers can draw more accurate and reliable conclusions from their data.
In this post, learn about ANCOVA vs ANOVA, how it works, the benefits it provides, and its assumptions. Plus, we’ll work through an ANCOVA example and interpret it!
How are ANCOVA and ANOVA different?
ANCOVA is an extension of ANOVA. While ANOVA can compare the means of three or more groups, it cannot control for covariates. ANCOVA builds on ANOVA by introducing one or more covariates into the model.
In an ANCOVA model, you must specify the dependent variable (continuous outcome), at least one categorical variable that defines the comparison groups, and at least one covariate.
ANCOVA is simply an ANOVA model that includes at least one covariate.
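For readers who like to see the model written out, here’s a sketch of ANCOVA with one factor and one covariate in a common textbook notation. The symbols are my own labels for illustration, not output from any particular software.

```latex
y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij}
```

Here, y_{ij} is the outcome for participant j in group i, \mu is the overall mean, \tau_i is the effect of group i, x_{ij} is the covariate value, \beta is the slope shared by all groups, and \varepsilon_{ij} is the error. Drop the \beta term and you’re left with the one-way ANOVA model.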
Covariates are continuous independent variables that influence the dependent variable but are not of primary interest to the study. Additionally, the experimenters do not control the covariates. Instead, they only observe and record their values. In contrast, they do control the categorical factors and set them at specific values for the study.
Researchers refer to covariates as nuisance variables because they:
- Are uncontrolled conditions in the experiment.
- Can influence the outcome.
This unfortunate combination of attributes allows covariates to introduce both imprecision and bias into the results. You can see why they’re a nuisance!
Even though the researchers aren’t interested in these variables, they must find a way to deal with them. That’s where ANCOVA comes in!
Two-Fold Benefits for Analysis of Covariance
Fortunately, you can use an ANCOVA model to control covariates statistically. Simply put, ANCOVA removes the effects of the covariates on the dependent variable, allowing for a more accurate assessment of the relationship between the categorical factors and the outcome.
ANCOVA does the following:
- Increases statistical power and precision by accounting for some of the within-group variability.
- Reduces confounder bias by adjusting for preexisting differences between groups on the covariate.
Let’s think through an ANCOVA example to understand the potential improvements of using this method. Then we’ll perform the analysis.
Suppose we want to determine which of three teaching methods is the best by comparing their mean test scores. We can include a pretest score as a covariate to account for participants having different starting skill levels.
How does ANCOVA use the covariate to improve the results relative to ANOVA for this example?
Power and Precision Increases
Individual differences in academic ability can significantly impact the outcome. In fact, even within a single teaching method group, there can be substantial variation in participants’ skills. This unexplained variation (error) can obscure the true impact of each method.
By including pretest scores as a covariate, the ANCOVA model can adjust for the initial skill level of each participant. This adjustment allows for a clearer and more accurate understanding of whether a participant’s success on the final test was due to the teaching method or their preexisting ability.
In the context of the F-test’s calculations, ANCOVA explains a portion of the within-group variability for each teaching method by attributing it to the pretest score. Using the covariate to reduce the error, ANCOVA can better detect differences between teaching methods (power) and provide greater precision when estimating mean test score differences between the groups (effect sizes).
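To make the error reduction concrete, here’s a minimal sketch that fits the same simulated data with and without the covariate and compares the error mean squares. I’m assuming NumPy is available. The data, the group effects, and the helper name error_mean_square are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Simulated (not real) data: three teaching methods, 30 students each.
n_per_group = 30
group = np.repeat([0, 1, 2], n_per_group)
pretest = rng.normal(60, 10, size=group.size)
method_effect = np.array([0.0, 5.0, 10.0])[group]  # assumed true effects
noise = rng.normal(0, 3, size=group.size)
posttest = 20 + 0.8 * pretest + method_effect + noise

# Dummy-coded design: intercept plus indicators for methods 1 and 2.
dummies = np.column_stack(
    [np.ones(group.size), group == 1, group == 2]
).astype(float)
X_anova = dummies                                 # factor only
X_ancova = np.column_stack([dummies, pretest])    # factor + covariate


def error_mean_square(X, y):
    """Fit by least squares and return SSE / residual degrees of freedom."""
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    return float(residuals @ residuals) / (len(y) - X.shape[1])


mse_anova = error_mean_square(X_anova, posttest)
mse_ancova = error_mean_square(X_ancova, posttest)
print(f"Error mean square without covariate: {mse_anova:.1f}")
print(f"Error mean square with covariate:    {mse_ancova:.1f}")
```

Because the error mean square is the denominator of the F-test, shrinking it is what buys the extra power. How much it shrinks depends entirely on how strongly the covariate relates to the outcome, which I set to be fairly strong in this simulation.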
ANCOVA also helps with bias. If the groups have preexisting differences in ability, those differences can bias the final results. Imagine one group starting with more high achievers than the other groups. At the end of the study, the average test score for that group will be higher than warranted because of its early lead in skills rather than the teaching method itself.
ANCOVA models adjust for preexisting differences between the groups, creating a level playing field for unbiased comparisons of the teaching methods.
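Here’s a small sketch of that adjustment using the classic adjusted-means calculation, which assumes equal slopes across groups. The scores are made up so the arithmetic is easy to follow, and the helper name centered_sums is my own. Group B starts with stronger students, so its raw lead overstates the teaching method’s effect.

```python
from statistics import fmean

# Made-up (pretest, posttest) pairs for illustration only.
# Group B happens to start with stronger students.
groups = {
    "A": [(40, 60), (50, 70), (60, 80)],
    "B": [(60, 90), (70, 100), (80, 110)],
}


def centered_sums(pairs):
    """Return (Sxy, Sxx) computed around the group's own means."""
    x_mean = fmean(x for x, _ in pairs)
    y_mean = fmean(y for _, y in pairs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    return sxy, sxx


# Pooled within-group slope of posttest on pretest (equal slopes assumed).
sums = [centered_sums(pairs) for pairs in groups.values()]
pooled_slope = sum(sxy for sxy, _ in sums) / sum(sxx for _, sxx in sums)

grand_pretest_mean = fmean(x for pairs in groups.values() for x, _ in pairs)

raw_means = {}
adjusted_means = {}
for name, pairs in groups.items():
    x_mean = fmean(x for x, _ in pairs)
    y_mean = fmean(y for _, y in pairs)
    raw_means[name] = y_mean
    # Shift each group mean to its expected value at the overall pretest mean.
    adjusted_means[name] = y_mean - pooled_slope * (x_mean - grand_pretest_mean)

raw_diff = raw_means["B"] - raw_means["A"]
adjusted_diff = adjusted_means["B"] - adjusted_means["A"]
print(f"Raw difference (B - A):      {raw_diff:.1f}")
print(f"Adjusted difference (B - A): {adjusted_diff:.1f}")
```

In this toy data, the raw gap is 30 points, but 20 of those points come from Group B’s 20-point head start on the pretest. After the adjustment, the gap attributable to the method is 10 points.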
Let’s perform an ANCOVA analysis! I’ll stick with the teaching method example we’ve been working with, but I’ll use only two groups for simplicity, methods A and B. Download the CSV dataset to try it yourself: ANCOVA.
In the model, I’ve entered Posttest Score as the dependent variable (continuous outcome), Teaching Method as the categorical factor, and Pretest Score as the covariate.
In the first set of output, we see that Teaching Method has a very low p-value. The mean difference between teaching methods is statistically significant!
The Pretest Score covariate is significant. It, too, has a relationship with the dependent variable. If it weren’t significant, you could consider removing it, making it an ANOVA model.
So, how do the teaching methods compare? In the coefficient output below, Method B has a coefficient of 10, indicating its group mean is 10 points higher than Method A’s after adjusting for Pretest Score. That’s our estimated effect size!
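If you’d like to see where a coefficient like this comes from, here’s a minimal sketch that fits the same kind of model with plain least squares. I’m assuming NumPy is available. The data are simulated stand-ins, not the CSV from this post, and I built a 10-point Method B advantage into the simulation, so the numbers won’t match the output discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility

# Simulated stand-in data (NOT the article's CSV): 50 students per method.
n = 50
method_b = np.repeat([0.0, 1.0], n)       # 0 = Method A, 1 = Method B
pretest = rng.normal(60, 10, size=2 * n)
assumed_effect = 10.0                     # B - A gap built into the simulation
noise = rng.normal(0, 3, size=2 * n)
posttest = 15 + 0.9 * pretest + assumed_effect * method_b + noise

# Columns: intercept, Method B indicator (the factor), pretest (the covariate).
X = np.column_stack([np.ones(2 * n), method_b, pretest])
coefs, *_ = np.linalg.lstsq(X, posttest, rcond=None)
intercept, method_b_coef, pretest_slope = coefs

print(f"Intercept:            {intercept:.2f}")
print(f"Method B coefficient: {method_b_coef:.2f}")
print(f"Pretest slope:        {pretest_slope:.2f}")
```

The Method B coefficient is the estimated difference between the groups at any fixed pretest score. If you prefer a formula interface, statsmodels can fit the same model with something like ols("posttest ~ C(method) + pretest", data=df), where the column names are placeholders for your own.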
It’s always beneficial to see your data to gain a better understanding. Here’s our data with the regression line for each group.
The vertical shift between the two regression lines is the adjusted mean difference between the two groups, which is 10 points for our example. ANCOVA determines whether this line shift is statistically significant. Notice how the lines are parallel? More on that in the assumptions section! Learn more about Regression Lines and Their Equations.
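In equation form, the two parallel lines look like this. The symbols b_0, b_1, and b_2 are my own labels for the intercept, the pretest slope, and the Method B coefficient.

```latex
\begin{aligned}
\hat{y}_A &= b_0 + b_1 x \\
\hat{y}_B &= (b_0 + b_2) + b_1 x \\
\hat{y}_B - \hat{y}_A &= b_2
\end{aligned}
```

Both lines share the slope b_1, so the pretest score x cancels out of the difference. That’s why a single number, b_2, describes the gap between the methods at every pretest score.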
ANCOVA is a linear model. Consequently, it has the same assumptions as ordinary least squares regression—with an addition (kind of).
Here’s a simplified list of the ANCOVA assumptions:
- Linear relationships adequately explain the outcomes.
- Independent variables are not correlated with the error term.
- Observations of the error term are uncorrelated with each other.
- Error term has a constant variance.
- No perfect correlation between independent variables.
- Error term follows a normal distribution.
If your model doesn’t satisfy these assumptions, the results might be untrustworthy. To learn about them in more detail, read my post Ordinary Least Squares Assumptions.
Homogeneity of Slopes Assumption
ANCOVA also has the homogeneity of regression slopes assumption. I don’t classify this issue as an assumption because it’s OK to have unequal slopes. In fact, some data require it. Instead, it’s actually about specifying a model that fits your data adequately and then knowing how to interpret the results correctly. Learn about Specifying the Correct Model.
As you saw in our example analysis, each group in an ANCOVA has a regression line. If those regressions all have the same slope, interpreting the mean differences between groups is simple because they are constant across all values of your covariate.
In the illustration below, points on the regression lines represent the group means for any given covariate value. When the slopes are equal, the differences between group means are constant across all covariate values. That’s nice and easy to interpret because there is only one mean difference for each pair of groups—just like our ANCOVA example!
Check the homogeneity of regression slopes assumption by including an interaction term (grouping factor*covariate) in the ANCOVA model. If the interaction term is:
- Not significant, you have insufficient evidence that the slopes differ, and you can remove this term.
- Significant, the slopes are not equal.
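Here’s a rough sketch of that check as a comparison between the model without the interaction term and the model with it. I’m assuming NumPy is available. The data are simulated with deliberately unequal slopes, and the helper name sse is my own.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed for reproducibility

# Simulated data with unequal slopes built in (an assumption of this sketch):
# Method B's slope is 0.5 steeper than Method A's.
n = 50
method_b = np.repeat([0.0, 1.0], n)
pretest = rng.normal(60, 10, size=2 * n)
noise = rng.normal(0, 3, size=2 * n)
posttest = 15 + 0.6 * pretest + method_b * (-20 + 0.5 * pretest) + noise


def sse(X, y):
    """Sum of squared residuals from a least squares fit."""
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    return float(residuals @ residuals)


# Reduced model: factor + covariate. Full model adds factor*covariate.
X_reduced = np.column_stack([np.ones(2 * n), method_b, pretest])
X_full = np.column_stack([X_reduced, method_b * pretest])

sse_reduced = sse(X_reduced, posttest)
sse_full = sse(X_full, posttest)
df_error = 2 * n - X_full.shape[1]

# F statistic for the single interaction term (1 numerator degree of freedom).
f_stat = (sse_reduced - sse_full) / (sse_full / df_error)
print(f"Interaction F statistic: {f_stat:.1f} on 1 and {df_error} df")
```

To turn the F statistic into a p-value, compare it with an F distribution with 1 and df_error degrees of freedom. If SciPy is installed, scipy.stats.f.sf(f_stat, 1, df_error) is one way to do that. Most statistical software reports this test for you when you add the interaction term.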
If your data naturally produce regression lines with different slopes, that’s not a problem. You just need to model it correctly using an interaction term and know how to interpret it.
When you have unequal slopes, the differences between group means are not constant across covariate values. In other words, the differences depend on the value of the covariate. Consequently, you must pick several covariate values and compare the group means at those points.
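As a small illustration, here’s how the group difference changes with the covariate once the model has an interaction term. The coefficient values are hypothetical, chosen only to keep the arithmetic simple.

```python
# Hypothetical coefficients from a model with an interaction term.
method_b_coef = -20.0     # B - A gap when the pretest score is 0
interaction_coef = 0.5    # extra slope for Method B relative to Method A


def mean_difference(pretest):
    """Estimated B - A difference in posttest means at a given pretest score."""
    return method_b_coef + interaction_coef * pretest


# Compare the groups at a low, middle, and high pretest score.
differences = {x: mean_difference(x) for x in (40, 60, 80)}
for pretest_score, diff in differences.items():
    print(f"Pretest {pretest_score}: B - A = {diff:.1f}")
```

With these made-up numbers, the methods are tied at a pretest score of 40, while Method B leads by 10 points at 60 and by 20 points at 80. There is no single mean difference to report, which is why you need to pick covariate values that are meaningful for your study.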
The illustration below shows how unequal slopes cause the differences between group means to change with the covariate.