Independent variables and dependent variables are the two fundamental types of variables in statistical modeling and experimental design. Analysts use them to understand relationships between variables and to estimate effect sizes. In other words: what effect does one variable have on another?

In this post, learn the definitions of independent and dependent variables, how to identify each type, how they differ between different types of studies, and see examples of them in use.

## What is an Independent Variable?

Independent variables (IVs) are the ones that you include in the model to explain or predict changes in the dependent variable. The name helps you understand their role in statistical analysis. These variables are *independent*. In this context, independent indicates that they stand alone and other variables in the model do not influence them. The researchers are not seeking to understand what causes the independent variables to change.

Independent variables are also known as predictors, factors, treatment variables, explanatory variables, input variables, x-variables, and right-hand variables—because they appear on the right side of the equals sign in a regression equation. In notation, statisticians commonly denote them using Xs. On graphs, analysts place independent variables on the horizontal, or X, axis.

In machine learning, independent variables are known as features.

For example, in a plant growth study, the independent variables might be soil moisture (continuous) and type of fertilizer (categorical).

Statistical models will estimate effect sizes for the independent variables.

**Related post**: Effect Sizes in Statistics

### Including independent variables in studies

The nature of independent variables changes based on the type of experiment or study:

**Controlled experiments**: Researchers systematically control and set the values of the independent variables. In randomized experiments, relationships between independent and dependent variables tend to be causal. The independent variables *cause* changes in the dependent variable.

**Observational studies**: Researchers do not set the values of the explanatory variables but instead observe them in their natural environment. When the independent and dependent variables are correlated, those relationships might not be causal.

When you include one independent variable in a regression model, you are performing simple regression. For more than one independent variable, it is multiple regression. Despite the different names, it’s really the same analysis with the same interpretations and assumptions.
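To make the distinction concrete, here is a minimal sketch of simple regression (one IV) in pure Python, using made-up plant growth measurements. Multiple regression extends the same least-squares idea to several IVs, though in practice you would fit it with statistical software rather than by hand.

```python
def simple_regression(x, y):
    """Fit y = b0 + b1*x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance of x and y divided by variance of x
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
         / sum((xi - mean_x) ** 2 for xi in x)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical soil moisture readings (IV) and plant growth (DV)
moisture = [1.0, 2.0, 3.0, 4.0]
growth = [3.1, 5.0, 6.9, 9.0]

intercept, slope = simple_regression(moisture, growth)
print(round(intercept, 2), round(slope, 2))  # prints: 1.1 1.96
```

The slope is the estimated effect size: each additional unit of soil moisture associates with roughly 1.96 additional units of growth in this toy data.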

Determining which IVs to include in a statistical model is known as model specification. That process involves in-depth research and many subject-area, theoretical, and statistical considerations. At its most basic level, you’ll want to include the predictors you are specifically assessing in your study and confounding variables that will bias your results if you don’t add them—particularly for observational studies.

For more information about choosing independent variables, read my post about Specifying the Correct Regression Model.

**Related posts**: Randomized Experiments, Observational Studies, and Confounding Variables

## What is a Dependent Variable?

The dependent variable (DV) is what you want to use the model to explain or predict. The values of this variable *depend* on other variables. It is the outcome that you’re studying. It’s also known as the response variable, outcome variable, and left-hand variable. Statisticians commonly denote them using a Y. Traditionally, graphs place dependent variables on the vertical, or Y, axis.

For example, in the plant growth study example, a measure of plant growth is the dependent variable. That is the outcome of the experiment, and we want to determine what affects it.

## How to Identify Independent and Dependent Variables

If you’re reading a study’s write-up, how do you distinguish independent variables from dependent variables? Here are some tips!

### Identifying IVs

How statisticians discuss independent variables changes depending on the field of study and type of experiment.

In randomized experiments, look for the following descriptions to identify the independent variables:

- Independent variables cause changes in another variable.
- The researchers control the values of the independent variables. They are controlled or manipulated variables.
- Experiments often refer to them as factors or experimental factors. In areas such as medicine, they might be risk factors.
- Treatment and control groups always correspond to an independent variable. In this case, the independent variable is a categorical grouping variable that defines the experimental groups to which participants belong. Each group is a level of that variable.

In observational studies, independent variables are a bit different. While the researchers likely want to establish causation, that’s harder to do with this type of study, so they often won’t use the word “cause.” They also don’t set the values of the predictors. Some independent variables are the study’s focus, while others help keep the results valid.

Here’s how to recognize independent variables in observational studies:

- IVs explain the variability, predict, or correlate with changes in the dependent variable.
- Researchers in observational studies must include confounding variables (i.e., confounders) to keep the statistical results valid even if they are not the primary interest of the study. For example, these might include the participants’ socio-economic status or other background information that the researchers aren’t focused on but can explain some of the dependent variable’s variability.
- The write-up says the results are adjusted for, or controlled for, a variable.

Regardless of the study type, if you see an estimated effect size, it is an independent variable.

### Identifying DVs

Dependent variables are the outcome. The IVs explain the variability in, or cause changes in, the DV. Focus on the “depends” aspect: the value of the dependent variable depends on the IVs. If Y depends on X, then Y is the dependent variable. This rule applies to both randomized experiments and observational studies.

In an observational study about the effects of smoking, the researchers observe the subjects’ smoking status (smoker/non-smoker) and their lung cancer rates. It’s an observational study because they cannot randomly assign subjects to either the smoking or non-smoking group. In this study, the researchers want to know whether lung cancer rates depend on smoking status. Therefore, the lung cancer rate is the dependent variable.

In a randomized COVID-19 vaccine experiment, the researchers randomly assign subjects to the treatment or control group. They want to determine whether COVID-19 infection rates depend on vaccination status. Hence, the infection rate is the DV.

Note that a variable can be an independent variable in one study but a dependent variable in another. It depends on the context.

For example, one study might assess how the amount of exercise (IV) affects health (DV). However, another study might study the factors (IVs) that influence how much someone exercises (DV). The amount of exercise is an independent variable in one study but a dependent variable in the other!

## How Analyses Use IVs and DVs

Regression analysis and ANOVA mathematically describe the relationships between each independent variable and the dependent variable. Typically, you want to determine how changes in one or more predictors associate with changes in the dependent variable. These analyses estimate an effect size for each independent variable.

Suppose researchers study the relationship between wattage, several types of filaments, and the output from a light bulb. In this study, light output is the dependent variable because it depends on the other two variables. Wattage (continuous) and filament type (categorical) are the independent variables.

After performing the regression analysis, the researchers will understand the nature of the relationship between these variables. How much does the light output increase on average for each additional watt? Does the mean light output differ by filament types? They will also learn whether these effects are statistically significant.
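The light bulb study can be sketched as a regression with NumPy. The numbers below are invented for illustration, and the output values are constructed from a known formula so the recovered coefficients are easy to check; a real study would of course have noisy measurements.

```python
import numpy as np

# Made-up design: four wattages, each tested with two filament types
watts = np.array([40.0, 60.0, 75.0, 100.0, 40.0, 60.0, 75.0, 100.0])
filament = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Encode the categorical IV as an indicator variable (1 = filament B)
is_b = np.array([1.0 if f == "B" else 0.0 for f in filament])

# Hypothetical output: 5 lumen baseline + 14 lumens per watt + 30 for filament B
output = 5.0 + 14.0 * watts + 30.0 * is_b

# Design matrix: intercept, wattage (continuous IV), filament indicator (categorical IV)
X = np.column_stack([np.ones_like(watts), watts, is_b])
coefs, *_ = np.linalg.lstsq(X, output, rcond=None)
print(coefs)  # approximately [5.0, 14.0, 30.0]
```

The second coefficient answers “how much does light output increase per additional watt?” and the third answers “how does mean output differ by filament type?”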

**Related post**: When to Use Regression Analysis

## Graphing Independent and Dependent Variables

As I mentioned earlier, graphs traditionally display the independent variables on the horizontal X-axis and the dependent variable on the vertical Y-axis. The type of graph depends on the nature of the variables. Here are a couple of examples.

Suppose you run an experiment to determine whether various teaching methods affect learning outcomes. Teaching method is a categorical predictor that defines the experimental groups. To display this type of data, you can use a boxplot, as shown below.

The groups are along the horizontal axis, while the dependent variable, learning outcomes, is on the vertical. From the graph, method 4 has the best results. A one-way ANOVA will tell you whether these results are statistically significant. Learn more about interpreting boxplots.
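A boxplot like this can be produced with matplotlib. The scores below are simulated for four hypothetical teaching methods; the point is the axis convention, with the categorical IV on the X-axis and the DV on the Y-axis.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt

random.seed(1)
# Simulated learning-outcome scores for each teaching method (categorical IV)
scores = {f"Method {i}": [random.gauss(70 + 3 * i, 5) for _ in range(30)]
          for i in range(1, 5)}

fig, ax = plt.subplots()
boxes = ax.boxplot(list(scores.values()))
ax.set_xticklabels(scores.keys())
ax.set_xlabel("Teaching method (independent variable)")
ax.set_ylabel("Learning outcome (dependent variable)")
fig.savefig("teaching_methods.png")
```

One box per group makes it easy to compare the distribution of the dependent variable across the levels of the independent variable.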

Now, imagine that you are studying people’s height and weight. Specifically, do height increases cause weight to increase? Consequently, height is the independent variable on the horizontal axis, and weight is the dependent variable on the vertical axis. You can use a scatterplot to display this type of data.

It appears that as height increases, weight tends to increase. Regression analysis will tell you if these results are statistically significant. Learn more about interpreting scatterplots.


Yashika says

Hi Jim,

Thanks a lot for creating this excellent blog. This is my go-to resource for Statistics.

I had been pondering over a question for some time, and it would be great if you could shed some light on this.

In linear and non-linear regression, should the distribution of independent and dependent variables be unskewed?

When is there a need to transform the data (say, Box-Cox transformation), and do we transform the independent variables as well?

Thank You

Kelly Anderson says

If I use an independent variable (X) and it displays a low p-value (<.05), why is it that when I introduce another independent variable to the regression, the coefficient and p-value of X from the first regression change to look insignificant? The second variable that I introduced has a low p-value in the regression.

Jim Frost says

Hi Kelly,

Keep in mind that the significance of each IV is calculated after accounting for the variance that all the other variables in the model explain, assuming you’re using the standard adjusted sums of squares rather than sequential sums of squares. The sums of squares (SS) measure how much dependent variable variability each IV accounts for. In the explanation below, I’ll assume you’re using the standard adjusted SS.

So, let’s say that originally you have X1 in the model along with some other IVs. Your model estimates the significance of X1 after assessing the variability that the other IVs account for and finds that X1 is significant. Now, you add X2 to the model in addition to X1 and the other IVs. Now, when assessing X1, the model accounts for the variability of the IVs including the newly added X2. And apparently X2 explains a good portion of the variability. X1 is no longer able to account for that variability, which causes it to not be statistically significant.

In other words, X2 explains some of the variability that X1 previously explained. Because X1 no longer explains it, it is no longer significant.
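You can see this effect in a quick simulation (not your actual data, of course): below, y is really driven by X2, but X1 correlates strongly with X2, so X1 looks highly significant until X2 enters the model.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)   # X2 is highly correlated with X1
y = 1.0 + 0.5 * x2 + 0.5 * rng.normal(size=n)  # y truly depends on X2 only

def t_stats(X, y):
    """OLS coefficients and their t-statistics via the normal equations."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return beta, beta / se

ones = np.ones(n)
_, t_alone = t_stats(np.column_stack([ones, x1]), y)      # model: y ~ X1
_, t_both = t_stats(np.column_stack([ones, x1, x2]), y)   # model: y ~ X1 + X2

print(f"t for X1 alone: {t_alone[1]:.1f}")   # large: X1 acts as a proxy for X2
print(f"t for X1 with X2: {t_both[1]:.1f}")  # shrinks once X2 absorbs that variability
```

With X1 alone, its t-statistic is large because it soaks up the variability that X2 explains. Add X2, and X1’s t-statistic collapses toward zero.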

Additionally, the significance of IVs is more likely to change when you add or remove IVs that are correlated with each other. This condition of correlated IVs is known as multicollinearity, and it can be a problem when there is too much of it. Given the change in significance, I’d check your model for multicollinearity just to be safe! You can read my post about multicollinearity for more detail.

I hope that helps!
