Independent variables and dependent variables are the two fundamental types of variables in statistical modeling and experimental designs. Analysts use these methods to understand the relationships between the variables and estimate effect sizes. What effect does one variable have on another?
In this post, learn the definitions of independent and dependent variables, how to identify each type, how they differ across types of studies, and see examples of them in use.
What is an Independent Variable?
Independent variables (IVs) are the ones that you include in the model to explain or predict changes in the dependent variable. The name helps you understand their role in statistical analysis. These variables are independent. In this context, independent indicates that they stand alone and other variables in the model do not influence them. The researchers are not seeking to understand what causes the independent variables to change.
Independent variables are also known as predictors, factors, treatment variables, explanatory variables, input variables, x-variables, and right-hand variables (because they appear on the right side of the equals sign in a regression equation). In notation, statisticians commonly denote them using Xs. On graphs, analysts place independent variables on the horizontal, or X, axis.
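To make the "right side of the equals sign" idea concrete, here is the generic form of a linear regression equation with k independent variables. The symbols (β for the coefficients, ε for the random error) are standard textbook notation assumed here, not taken from a specific model in this post.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon
```

The dependent variable Y stands alone on the left, each independent variable X_j appears on the right, and its coefficient β_j is the effect size the model estimates for that variable.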
In machine learning, independent variables are known as features.
For example, in a plant growth study, the independent variables might be soil moisture (continuous) and type of fertilizer (categorical).
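Continuous and categorical independent variables enter a model differently. The sketch below shows one common approach, reference (dummy) coding, for a single plant in the study. The fertilizer levels ("none", "organic", "synthetic") and the function name `encode_plant` are hypothetical, chosen only for illustration.

```python
# Hypothetical levels for the categorical IV "fertilizer type". The first
# level is the reference category and gets no indicator column of its own.
FERTILIZER_LEVELS = ["none", "organic", "synthetic"]


def encode_plant(soil_moisture: float, fertilizer: str) -> list[float]:
    """Return one row of predictor values: intercept, moisture, dummies."""
    if fertilizer not in FERTILIZER_LEVELS:
        raise ValueError(f"unknown fertilizer type: {fertilizer!r}")
    # A continuous IV enters the model as its measured value.
    row = [1.0, soil_moisture]
    # A categorical IV becomes one 0/1 indicator per non-reference level.
    for level in FERTILIZER_LEVELS[1:]:
        row.append(1.0 if fertilizer == level else 0.0)
    return row


print(encode_plant(0.35, "organic"))  # [1.0, 0.35, 1.0, 0.0]
print(encode_plant(0.20, "none"))     # [1.0, 0.2, 0.0, 0.0]
```

A plant that received the reference fertilizer has zeros in every indicator column, so each indicator's coefficient measures the difference from that reference level.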
Statistical models will estimate effect sizes for the independent variables.
Related post: Effect Sizes in Statistics
Including independent variables in studies
The nature of independent variables changes based on the type of experiment or study:
Controlled experiments: Researchers systematically control and set the values of the independent variables. In randomized experiments, relationships between independent and dependent variables tend to be causal. The independent variables cause changes in the dependent variable.
Observational studies: Researchers do not set the values of the explanatory variables but instead observe them in their natural environment. When the independent and dependent variables are correlated, those relationships might not be causal.
When you include one independent variable in a regression model, you are performing simple regression. For more than one independent variable, it is multiple regression. Despite the different names, it’s really the same analysis with the same interpretations and assumptions.
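Because simple and multiple regression are the same least-squares analysis, one fitting routine handles both. This is a minimal sketch using NumPy's `np.linalg.lstsq` on simulated data; the helper `fit_ols`, the sample size, and the true coefficients (2.0 and -1.5) are assumptions made up for the demonstration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + columns of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:  # a single IV passed as a 1-D array
        X = X[:, np.newaxis]
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, b1, b2, ...]


rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Simulated DV with known effects: 2.0 for x1 and -1.5 for x2.
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

simple = fit_ols(x1, y)                           # one IV: simple regression
multiple = fit_ols(np.column_stack([x1, x2]), y)  # two IVs: multiple regression
print(simple.round(2), multiple.round(2))
```

Passing one column gives simple regression; passing two gives multiple regression. Here x1 and x2 are generated independently, so the simple-regression slope for x1 should also land near 2.0. With correlated predictors, the two fits can disagree.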
Determining which IVs to include in a statistical model is known as model specification. That process involves in-depth research and many subject-area, theoretical, and statistical considerations. At its most basic level, you’ll want to include the predictors you are specifically assessing in your study and confounding variables that will bias your results if you don’t add them—particularly for observational studies.
For more information about choosing independent variables, read my post about Specifying the Correct Regression Model.
What is a Dependent Variable?
The dependent variable (DV) is what you want to use the model to explain or predict. The values of this variable depend on other variables. It is the outcome that you’re studying. It’s also known as the response variable, outcome variable, and left-hand variable. Statisticians commonly denote it using a Y. Traditionally, graphs place dependent variables on the vertical, or Y, axis.
For example, in the plant growth study example, a measure of plant growth is the dependent variable. That is the outcome of the experiment, and we want to determine what affects it.
How to Identify Independent and Dependent Variables
If you’re reading a study’s write-up, how do you distinguish independent variables from dependent variables? Here are some tips!
How statisticians discuss independent variables changes depending on the field of study and type of experiment.
In randomized experiments, look for the following descriptions to identify the independent variables:
- Independent variables cause changes in another variable.
- The researchers control the values of the independent variables. They are controlled or manipulated variables.
- Experiments often refer to them as factors or experimental factors. In areas such as medicine, they might be risk factors.
- A variable that assigns participants to treatment and control groups is always an independent variable. It is a categorical grouping variable that defines the experimental groups to which participants belong, and each group is one level of that variable.
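When the independent variable is a treatment/control grouping variable coded as 0 and 1, its regression coefficient is simply the difference between the two group means. The sketch below checks this with made-up outcome values using only Python's standard library.

```python
from statistics import mean

# Hypothetical outcomes for the two levels of a treatment/control IV.
control = [4.1, 3.8, 4.4, 4.0]
treatment = [5.0, 5.3, 4.7, 5.2]

# Code the grouping variable as 0 (control) and 1 (treatment).
x = [0.0] * len(control) + [1.0] * len(treatment)
y = control + treatment

# Ordinary least-squares slope for a single predictor.
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

print(round(slope, 3))                            # effect size from the regression
print(round(mean(treatment) - mean(control), 3))  # difference in group means
```

Both lines report the same number, which is why the estimated effect for a treatment variable can be read directly as "how much higher, on average, the treatment group scored."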
In observational studies, independent variables are a bit different. While the researchers likely want to establish causation, that’s harder to do with this type of study, so they often won’t use the word “cause.” They also don’t set the values of the predictors. Some independent variables are the study’s focus, while others help keep the results valid.
Here’s how to recognize independent variables in observational studies:
- IVs explain, predict, or correlate with changes in the dependent variable.
- Researchers in observational studies must include confounding variables (i.e., confounders) to keep the statistical results valid even if they are not the primary interest of the study. For example, these might include the participants’ socio-economic status or other background information that the researchers aren’t focused on but can explain some of the dependent variable’s variability.
- The results are “adjusted for” or “controlled for” a variable. That variable is an independent variable.
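The reason observational studies include confounders shows up clearly in a small simulation. In this hypothetical setup (the variable names, coefficients, and seed are all assumptions), a background variable influences both the exposure and the outcome. The true effect of the exposure is 1.0, but leaving the confounder out of the model biases that estimate.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Hypothetical data: a background confounder (think socio-economic status)
# influences both the exposure of interest and the outcome.
confounder = rng.normal(size=n)
exposure = 0.8 * confounder + rng.normal(size=n)
outcome = 1.0 * exposure + 2.0 * confounder + rng.normal(size=n)  # true effect: 1.0


def ols_slopes(y, *ivs):
    """OLS coefficients for y on the given IVs, intercept dropped."""
    design = np.column_stack([np.ones(len(y)), *ivs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


naive = ols_slopes(outcome, exposure)[0]                 # confounder omitted
adjusted = ols_slopes(outcome, exposure, confounder)[0]  # adjusted for confounder
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

With these made-up settings, the naive estimate should come out near 2.0 (roughly 1 + 2 × 0.8 / 1.64), while the adjusted estimate should be close to the true 1.0.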
Regardless of the study type, if you see an estimated effect size for a variable, that variable is an independent variable.
Dependent variables are the outcome. The IVs explain the variability in, or cause changes in, the DV. Focus on the “depends” aspect. The value of the dependent variable depends on the IVs. If Y depends on X, then Y is the dependent variable. This aspect applies to both randomized experiments and observational studies.
In an observational study about the effects of smoking, the researchers observe the subjects’ smoking status (smoker/non-smoker) and their lung cancer rates. It’s an observational study because they cannot randomly assign subjects to either the smoking or non-smoking group. In this study, the researchers want to know whether lung cancer rates depend on smoking status. Therefore, the lung cancer rate is the dependent variable.
In a randomized COVID-19 vaccine experiment, the researchers randomly assign subjects to the treatment or control group. They want to determine whether COVID-19 infection rates depend on vaccination status. Hence, the infection rate is the DV.
Note that a variable can be an independent variable in one study but a dependent variable in another. It depends on the context.
For example, one study might assess how the amount of exercise (IV) affects health (DV). However, another study might examine the factors (IVs) that influence how much someone exercises (DV). The amount of exercise is an independent variable in one study but a dependent variable in the other!
How Analyses Use IVs and DVs
Regression analysis and ANOVA mathematically describe the relationships between each independent variable and the dependent variable. Typically, you want to determine how changes in one or more predictors associate with changes in the dependent variable. These analyses estimate an effect size for each independent variable.
Suppose researchers study the relationship between wattage, several types of filaments, and the output from a light bulb. In this study, light output is the dependent variable because it depends on the other two variables. Wattage (continuous) and filament type (categorical) are the independent variables.
After performing the regression analysis, the researchers will understand the nature of the relationship between these variables. How much does the light output increase on average for each additional watt? Does the mean light output differ by filament type? They will also learn whether these effects are statistically significant.
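A light bulb analysis like this one can be sketched as a regression with one continuous and one categorical predictor. All data here are simulated, and the filament labels A, B, and C, the true per-watt slope of 12, and the filament shifts are invented for illustration. The sketch estimates effect sizes only; it does not compute the p-values needed to judge statistical significance.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300

# Hypothetical data: wattage (continuous IV) and filament type (categorical IV).
wattage = rng.uniform(25, 100, size=n)
filament = rng.choice(["A", "B", "C"], size=n)
true_shift = {"A": 0.0, "B": 40.0, "C": -25.0}  # made-up filament effects
light = (
    50
    + 12.0 * wattage
    + np.array([true_shift[f] for f in filament])
    + rng.normal(scale=20, size=n)
)  # DV: light output

# Design matrix: intercept, wattage, and 0/1 dummies for B and C (A = reference).
design = np.column_stack([
    np.ones(n),
    wattage,
    (filament == "B").astype(float),
    (filament == "C").astype(float),
])
coef, *_ = np.linalg.lstsq(design, light, rcond=None)
intercept, per_watt, b_vs_a, c_vs_a = coef

print(f"Mean increase per additional watt: {per_watt:.2f}")
print(f"Filament B vs A: {b_vs_a:.1f}; filament C vs A: {c_vs_a:.1f}")
```

The wattage coefficient answers "how much does light output increase on average for each additional watt?", and each dummy coefficient is the mean difference in light output between that filament type and the reference type A, holding wattage constant.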
Related post: When to Use Regression Analysis
Graphing Independent and Dependent Variables
As I mentioned earlier, graphs traditionally display the independent variables on the horizontal X-axis and the dependent variable on the vertical Y-axis. The type of graph depends on the nature of the variables. Here are a couple of examples.
Suppose you conduct an experiment to determine whether various teaching methods affect learning outcomes. Teaching method is a categorical predictor that defines the experimental groups. To display this type of data, you can use a boxplot, as shown below.
The groups are along the horizontal axis, while the dependent variable, learning outcomes, is on the vertical. From the graph, method 4 has the best results. A one-way ANOVA will tell you whether these results are statistically significant. Learn more about interpreting boxplots.
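The one-way ANOVA compares the variability between group means to the variability within groups. This is a minimal standard-library sketch of the F statistic; the scores for the four teaching methods are hypothetical and are not the data behind the boxplot.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: spread of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical learning-outcome scores for four teaching methods.
scores = [
    [70, 72, 68, 75],  # method 1
    [74, 71, 77, 73],  # method 2
    [69, 73, 70, 72],  # method 3
    [82, 85, 80, 84],  # method 4
]
f_stat, df1, df2 = one_way_anova_f(scores)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

A large F suggests the group means differ by more than within-group noise would explain. To get the p-value, you would compare F to an F distribution with these degrees of freedom; SciPy's `scipy.stats.f_oneway` performs both steps.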
Now, imagine that you are studying people’s height and weight. Specifically, do increases in height cause weight to increase? Because this question asks whether weight depends on height, height is the independent variable on the horizontal axis, and weight is the dependent variable on the vertical axis. You can use a scatterplot to display this type of data.
It appears that as height increases, weight tends to increase. Regression analysis will tell you if these results are statistically significant. Learn more about interpreting scatterplots.