Statistics By Jim

Making statistics intuitive

Covariates: Definition & Uses

By Jim Frost 6 Comments

What is a Covariate?

Covariates are continuous independent variables (or predictors) in a regression or ANOVA model. These variables can explain some of the variability in the dependent variable.

That definition of covariates is simple enough. However, the usage of the term has changed over time. Consequently, analysts can have drastically different contexts in mind when discussing covariates.

Historically, statisticians considered covariates to be a subtype of continuous predictor that appears only in ANOVA models, usually in designed experiments (DOE). Originally, they were part of experimental designs where the primary variables of interest are categorical factors that the researchers control.

In these designs, most other potential explanatory variables (confounders) are addressed by controlling the experimental environment and using a randomized design. However, analysts might be aware of uncontrollable variables that could influence the outcome in some studies.

These nuisance variables are covariates. They’re a nuisance because they can increase both variability and bias.

Including these nuisance variables as covariates in the model statistically controls their impact on the dependent variable, which can increase statistical power and reduce confounder bias. Learn more about How Confounders Can Bias Your Results.
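To make that idea concrete, here is a minimal simulation sketch in Python with NumPy. Everything in it is an assumption for illustration and none of it comes from a real study: the two-group design, the variable names (`group`, `nuisance`), the effect sizes, and the noise level. It fits the same data with and without the nuisance variable and compares the residual variance.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n_per_group = 50
group = np.repeat([0, 1], n_per_group)           # 0 = control, 1 = treatment (set by researchers)
nuisance = rng.normal(50, 10, size=group.size)   # measured during each run, not set

# Hypothetical data-generating process: the outcome depends on both variables.
true_effect = 2.0
y = 10 + true_effect * group + 0.5 * nuisance + rng.normal(0, 1, size=group.size)

def fit(X, y):
    """Ordinary least squares; returns coefficients and residual variance."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    return coef, resid @ resid / dof

ones = np.ones_like(y)
coef_without, var_without = fit(np.column_stack([ones, group]), y)
coef_with, var_with = fit(np.column_stack([ones, group, nuisance]), y)

print(f"Residual variance without covariate: {var_without:.2f}")
print(f"Residual variance with covariate:    {var_with:.2f}")
print(f"Treatment effect estimate with covariate: {coef_with[1]:.2f}")
```

In this simulated setup, the model that includes the covariate leaves much less unexplained variability, which is what gives the test of the treatment effect more power.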

So, the historical definition of a covariate is that it is:

  • Part of a randomized experimental design where researchers set the categorical factors of primary interest.
  • A continuous, independent variable that researchers measure (as opposed to setting).
  • Uncontrollable and can’t be randomized (i.e., a nuisance).
  • Not a primary variable of interest even though it correlates with the outcome.

When you include a covariate in an ANOVA model, it becomes an ANCOVA model (Analysis of Covariance). Learn more about ANCOVA: Uses, Assumptions & Example.
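For readers who like to see the model written out, a common textbook form of the one-factor ANCOVA model is sketched below. The symbols are my notation for illustration and are not defined elsewhere in this post: $y_{ij}$ is the outcome for observation $j$ in factor level $i$, $\mu$ is the overall mean, $\tau_i$ is the effect of factor level $i$, $x_{ij}$ is the covariate value, $\bar{x}$ is the covariate mean, $\beta$ is the covariate slope, and $\varepsilon_{ij}$ is the random error.

```latex
y_{ij} = \mu + \tau_i + \beta \, (x_{ij} - \bar{x}) + \varepsilon_{ij}
```

Dropping the $\beta \, (x_{ij} - \bar{x})$ term gives the ordinary one-way ANOVA model, so the covariate is the only thing that separates the two.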

I’ve heard long-time researchers stick steadfastly to this definition and even firmly proclaim that the analytical procedure must enter a covariate into the model last to calculate the sums of squares correctly!

Related posts: Experimental Designs and Independent vs. Dependent Variables

Modern Usage

Today, the historical definition of covariate has faded somewhat. Many analysts use the term as a synonym for any continuous predictor, not only for the specific subset of experimental design cases I describe above.

In current usage, a covariate might be a primary variable of interest in a non-DOE context!

In an analytical sense, the modern usage is valid. Covariates in the stricter context perform the same function as continuous predictors in the broader definition.

Just be aware that some analysts will have an extremely specific context in mind when discussing covariates. Others will be thinking in much broader terms!

Covariate Example

Let’s look at a covariate example that fits the original definition involving an experimental design.

Consider a manufacturing process where temperature and pressure are experimental factors. The experimenters set the temperature controls at A, B, and C and the pressure controls at X, Y, and Z. While temperature and pressure are continuous variables, the experiment treats them as categorical factors because the researchers set them to several specific values.

To minimize sources of variation and the effect of other variables, the researchers control the experimental environment as much as possible and use randomization to determine the settings for each experimental run. All in all, it’s a highly controlled, randomized experiment.

However, the researchers know from experience that humidity levels also affect the outcome. Unfortunately, humidity is much harder to control because it depends on outdoor conditions and is impossible to regulate throughout the manufacturing environment. Consequently, they record humidity as a covariate during each experimental run so the ANCOVA model can account for its effect.

The manufacturer is primarily interested in how Temperature and Pressure affect their manufacturing outcome. However, by including humidity as a covariate, the model can control for changing humidity conditions during the experiment.
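Here is a rough sketch of how an analysis like this one could be set up in Python with NumPy. The data are simulated. The effect sizes, the four replicates per setting, the humidity range, and the main-effects-only model (no temperature by pressure interaction) are all assumptions I made up for illustration. The factors enter the model as indicator (dummy) columns, and humidity enters as a single continuous column.

```python
import numpy as np

rng = np.random.default_rng(42)

temps = ["A", "B", "C"]
pressures = ["X", "Y", "Z"]
replicates = 4

# Full factorial: every temperature/pressure combination, replicated.
runs = [(t, p) for t in temps for p in pressures for _ in range(replicates)]
order = rng.permutation(len(runs))  # randomized run order
runs = [runs[i] for i in order]

humidity = rng.uniform(30, 70, size=len(runs))  # recorded per run, not controlled

# Hypothetical true effects, used only to simulate an outcome.
temp_effect = {"A": 0.0, "B": 3.0, "C": 5.0}
press_effect = {"X": 0.0, "Y": -2.0, "Z": 1.0}
humidity_slope = 0.4

outcome = np.array([
    20 + temp_effect[t] + press_effect[p] + humidity_slope * h
    for (t, p), h in zip(runs, humidity)
]) + rng.normal(0, 1, size=len(runs))

def dummy(levels, values):
    """Indicator columns for every level except the first (reference) level."""
    return np.column_stack(
        [[1.0 if v == lvl else 0.0 for v in values] for lvl in levels[1:]]
    )

# Design matrix: intercept, factor indicators, then the continuous covariate.
X = np.column_stack([
    np.ones(len(runs)),
    dummy(temps, [t for t, _ in runs]),
    dummy(pressures, [p for _, p in runs]),
    humidity,
])
names = ["Intercept", "Temp B vs A", "Temp C vs A",
         "Pressure Y vs X", "Pressure Z vs X", "Humidity"]

coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
estimates = dict(zip(names, coef))
for name, value in estimates.items():
    print(f"{name:>16}: {value:6.2f}")
```

The temperature and pressure coefficients are the comparisons of primary interest. The humidity coefficient is there to absorb variation that the experimenters could not control.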

Filed Under: ANOVA Tagged With: conceptual, data types

Comments

  1. Swati Puranik says

    September 11, 2023 at 5:24 am

    Hi Jim,

    Thank you for the detailed explanation. But I am confused about a dataset that I have been handed. It is a series of RCBD field experiments measuring various traits (plant height, flowering days etc.) conducted over three years on different genotypes of a particular plant species. I have three replicates for each measurement. So, primarily the hypothesis questions are:
    1. Is the trait affected by differences in the genotype?
    2. Is the trait affected by changes in the year?
    3. Is the trait affected by both genotype and year?
    4. Is there an impact of replicates?

    So, which one should I consider as factor and which one as a covariate?

    To further increase the complexity, these genotypes have different ploidy levels. But I am assuming that since this is something that can be experimentally measured but not humanly controlled, it should be included as a covariate.

    • Jim Frost says

      September 11, 2023 at 5:56 pm

      Hi Swati,

      From what you write, it sounds like the following are the types of variables in your study:

      Outcomes: the various traits. Probably need to fit separate models for each trait/outcome.
      Factors: genotype, year (you might include year as a blocking variable instead.)
      Covariate: ploidy levels (if it’s a continuous variable)

      As for determining the role of replicates, assess the consistency of results across replicates. If there’s large variation, uncontrolled variables might be affecting the results. You’re hoping for low variation between replicates.

      I hope that helps!

  2. CH says

    September 9, 2023 at 3:30 pm

    Can race, ethnicity, and school type be used as covariates in studies on high school students if the predictor variables are parent future expectations for students and the outcome variable is student grades? My rationale is that these variables may explain some of the variance in the model. The sample population is from grades 9-12. I have used these variables in a stepwise regression and now I am rethinking it based on your definition of covariate.

    • Jim Frost says

      September 9, 2023 at 11:40 pm

      Hi,

      Yes, your logic is sound for including them! The only thing is that the variables you mention are categorical variables rather than continuous. So, they can’t technically be covariates because that term is reserved for continuous variables. However, you can include those categorical variables in your model for the same reasons as you do for covariates. You can call them demographic control variables or something like that. So, again, your logic for including them is sound! 🙂

  3. Collin says

    November 27, 2022 at 8:16 am

    Can’t the type of experimental design, say, a randomized complete block design (RCBD) or a Latin square design, rule out the effects of a potential confounding factor? In other words, isn’t the consideration of a covariate only applicable to RCD trials?

    • Jim Frost says

      November 27, 2022 at 6:54 pm

      Hi Collin,

      Blocked designs, including Latin Square designs, are one way to handle nuisance variables. Blocks are essentially a categorical nuisance variable. For example, a block might represent days when you think the experimental conditions might change over the different days on which experimental runs occur. With blocks, you might not even be sure exactly what the nuisance is, or it might be a combination of variables, such as with blocking by day. Blocks can also represent known factors, such as material batches, shifts, etc. But either way, blocks are categorical.

      Covariates are another method for handling continuous nuisance variables. You’ll enter the nuisance variables with continuous values. Humidity is a good example of a covariate. It’s not categorical but quite clearly a continuous variable where you’d enter the percentage.

      And you can use blocks and covariates together too!


    Copyright © 2023 · Jim Frost · Privacy Policy