What is a Covariate?
Covariates are continuous independent variables (or predictors) in a regression or ANOVA model. These variables can explain some of the variability in the dependent variable.
That definition of covariates is simple enough. However, the usage of the term has changed over time. Consequently, analysts can have drastically different contexts in mind when discussing covariates.
Historically, statisticians considered covariates to be a subtype of continuous predictors that appears only in ANOVA models, usually relating to designed experiments (DOE). Originally, they were part of experimental designs where the primary variables of interest are categorical factors that the researchers control.
In these designs, most other potential explanatory variables (confounders) are addressed by controlling the experimental environment and using a randomized design. However, analysts might be aware of uncontrollable variables that could influence the outcome in some studies.
These nuisance variables are covariates. They’re a nuisance because they can increase both variability and bias.
Including these nuisance variables as covariates in the model statistically controls their impact on the dependent variable, which can increase statistical power and reduce confounder bias. Learn more about How Confounders Can Bias Your Results.
So, the historical definition of a covariate is that it is:
- In an randomized experimental design where researchers set the categorical factors of primary interest.
- A continuous, independent variable that researchers measure (as opposed to setting).
- Uncontrollable and can’t be randomized (i.e., a nuisance).
- Not a primary variable of interest even though it correlates with the outcome.
I’ve heard long-time researchers stick steadfastly to this definition and even firmly proclaim that the analytical procedure must enter a covariate into the model last to calculate the sums of squares correctly!
In current times, the historical definition of covariate has faded somewhat. Many analysts use this term as a synonym for a continuous predictor—not only for the specific subset of experimental design cases I describe above.
In current usage, a covariate might be a primary variable of interest in a non-DOE context!
In an analytical sense, the modern usage is valid. Covariates in the stricter context performs the same function as continuous predictors in the broader definition.
Just be aware that some analysts will have an extremely specific context in mind when discussing covariates. Others will be thinking in much broader terms!
Let’s look at a covariate example that fits the original definition involving an experimental design.
Consider a manufacturing process where temperature and pressure are experimental factors. The experimenters set the temperature controls at A, B, and C and the pressure controls at X, Y, and Z. While temperature and pressure are continuous variables, the experiment treats them as categorical factors because the researchers set them to several specific values.
To minimize sources of variation and the effect of other variables, the researchers control the experimental environment as much as possible and use randomization to determine the settings for each experimental run. All in all, it’s a highly controlled, randomized experiment.
However, the researchers know from experience that humidity levels also affect the outcome. Unfortunately, humidity is much harder to control because it depends on outdoor conditions and is impossible to regulate throughout the manufacturing environment. Consequently, they record humidity as a covariate during each experimental run so the ANCOVA model can account for its effect.
The manufacturer is primarily interested in how Temperature and Pressure affect their manufacturing outcome. However, by including humidity as a covariate, the model can control for changing humidity conditions during the experiment.