What is Multiple Linear Regression?
Multiple linear regression (MLR) is a statistical method that quantifies the linear relationship between multiple independent variables (predictors) and a single continuous dependent variable (outcome). It estimates this relationship by finding the best-fitting linear equation for the data points.
Multiple linear regression analysis has two primary goals:
- Understanding the relationships between the independent variables and the dependent variable.
- Predicting the outcome based on the values of the independent variables.
This analysis typically estimates the relationships using the least squares method, which identifies the regression equation that minimizes the squared differences between observed values and predicted values (i.e., the residual sum of squares (RSS)).
Multiple Linear Regression Equation
The standard notation for a multiple linear regression model is the following:
![]()
Where:
- Y is the dependent variable (DV or the outcome you’re predicting).
- X1, X2, … , Xk are the independent variables (IVs or predictors).
- β0 is the intercept (the predicted value of Y when all X-variables = 0).
- β1, β2, … , βk are coefficients (slopes) for the IVs, each representing the average change in Y associated with a one-unit increase in its corresponding predictor, holding all other predictors constant.
- ε represents random error or residuals, showing how individual observations deviate from the predicted values.
To understand the relationships, analysts examine the regression coefficients (β1, β2, … , βk) to determine how strongly, and in what direction, each predictor independently influences the outcome, while holding the other predictors constant.
Predicting outcomes involves substituting values for the independent variables (X1, X2, … , Xk) into the multiple linear regression equation.
Example and Interpreting Results
For example, suppose a researcher studying factors affecting home prices fits a multiple linear regression model with three predictors: square footage (SqFt), number of bedrooms (Bedrooms), and age of the home (Age). The estimated regression equation might be:
![]()
The coefficient interpretations are as follows:
- The coefficient for square footage (150) means each additional square foot corresponds, on average, to a $150 increase in home price, holding other variables constant.
- The coefficient for number of bedrooms (20,000) indicates each additional bedroom corresponds, on average, to a $20,000 increase in home price, holding other variables constant.
- The coefficient for age of home (-1,000) suggests that each additional year of home age corresponds, on average, to a $1,000 decrease in price, holding other variables constant.
If you consider a home with 2,000 square feet, 3 bedrooms, and an age of 10 years, the predicted home price would be:
![]()
Multiple linear regression provides both a quantitative understanding of how strongly each predictor independently influences the outcome and a method for predicting the outcome value for specific combinations of predictor values. Analysts assess the model’s effectiveness by examining R-squared to determine how much variation in the outcome the predictors collectively explain and use residual plots to verify that the linear model satisfies underlying assumptions.
« Back to Glossary Index