What is Linear Regression?
Linear regression models the relationship between one or more explanatory variables and an outcome variable. These variables are known as the independent and dependent variables, respectively. When there is one independent variable (IV), the procedure is known as simple linear regression. When there is more than one IV, statisticians refer to it as multiple regression.
Learn more about independent and dependent variables.
This flexible analysis allows you to untangle complicated research questions by modeling and controlling for all relevant variables. It lets you isolate the role that each variable plays. The procedure uses sample data to estimate the population parameters. The regression coefficients in your statistical output are the parameter estimates.
Learn more about when you should use regression analysis.
Linear regression has two primary purposes—understanding the relationships between variables and forecasting.
- The coefficients represent the estimated magnitude and direction (positive/negative) of the relationship between each independent variable and the dependent variable.
- A linear regression equation allows you to predict the mean value of the dependent variable given values of the independent variables that you specify.
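Both purposes can be sketched in a few lines of code. The data and numbers below are made up purely for illustration; the sketch uses NumPy's least-squares solver to estimate the coefficients (relationship) and then plugs a new IV value into the fitted equation (forecasting).

```python
import numpy as np

# Hypothetical sample data: one independent variable (x) and a dependent variable (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with a column of ones for the constant (intercept) term.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: the solution contains the parameter estimates.
intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]

# Purpose 1: the slope estimates the magnitude and direction of the relationship.
# Purpose 2: the fitted equation predicts the mean of y for a specified x value.
predicted = intercept + slope * 6.0
```

Here the positive slope indicates that y increases with x, and `predicted` is the model's estimate of the mean dependent-variable value when the IV equals 6.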
Despite the name, linear regression can model curved relationships. In this context, the term “linear” describes the form of the regression equation. A regression equation is linear when all its terms are one of the following:
- The constant.
- A parameter multiplying an independent variable.
Additionally, a linear regression equation can only add terms together, producing one general form:
Dependent variable = constant + parameter * IV + … + parameter * IV
Statisticians refer to this form as being linear in the parameters. Hence, you cannot include parameters in an exponent in linear regression, but you can raise a variable to a power to model curvature.
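To see "linear in the parameters" concretely, here is a small sketch with made-up data that follows a curved trend. Squaring the variable adds curvature, yet the equation y = b0 + b1*x + b2*x² remains linear because every term is still a parameter multiplying a variable (or the constant).

```python
import numpy as np

# Hypothetical data that follows an exactly quadratic trend: y = x^2 + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = x**2 + 1.0

# Raising x to a power models curvature, but the model stays linear in the
# parameters: each column of the design matrix is multiplied by one parameter.
X = np.column_stack([np.ones_like(x), x, x**2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
```

Because the parameters enter only as multipliers, ordinary least squares can still estimate them; a parameter inside an exponent (e.g., y = b0 * e^(b1*x)) would make the model nonlinear.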
Linear regression was the original form that statisticians studied, and it is the easiest type of model to fit and interpret. However, a linear model cannot fit some datasets well, and in those cases a nonlinear model is required.
Specifying the correct model requires balancing subject-area knowledge, statistical results, and satisfying the assumptions.
Learn more about the difference between linear and nonlinear models and specifying the correct regression model.
Linear Regression Assumptions
Least squares regression, also known as ordinary least squares, is the most common form of linear regression. However, there are other types, such as least absolute deviation and ridge regression.
Each type has a set of assumptions that you primarily assess using the residuals. Residuals are the difference between the observed value and the mean value that the model predicts for that observation. If you fail to satisfy the assumptions, the results might not be valid.
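The residual calculation itself is simple subtraction, one value per observation. The observed and predicted values below are hypothetical, just to show the arithmetic.

```python
import numpy as np

# Hypothetical observed values and the model's predicted (fitted) values.
observed = np.array([10.0, 12.0, 9.0, 15.0])
predicted = np.array([9.5, 12.5, 9.0, 14.0])

# A residual is the observed value minus the model's predicted value.
residuals = observed - predicted
```

Assumption checks typically plot these residuals (against fitted values, in order, or in a histogram) to look for patterns that signal a violated assumption.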
Learn more about the assumptions for ordinary least squares.
Example of Linear Regression
Suppose we use linear regression to model how the outside temperature in Celsius and insulation thickness in centimeters, our two independent variables, relate to air conditioning costs in dollars (the dependent variable).
Let’s interpret the results for the following multiple linear regression equation:
Air Conditioning Costs ($) = 2 * Temperature (C) - 1.5 * Insulation (cm)
The coefficient sign for Temperature is positive (+2), which indicates a positive relationship between temperature and costs. As the temperature increases, so do air conditioning costs. More specifically, the coefficient value of 2 indicates that for every 1 C increase, the average air conditioning cost increases by two dollars.
On the other hand, the negative coefficient for Insulation (-1.5) represents a negative relationship between insulation and air conditioning costs. As insulation thickness increases, air conditioning costs decrease. For every 1 cm increase, the average air conditioning cost drops by $1.50.
We can also enter values for temperature and insulation into this linear regression equation to predict the mean air conditioning cost.
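For example, the equation above can be written as a small function; the input values below are hypothetical.

```python
def predicted_cost(temperature_c: float, insulation_cm: float) -> float:
    """Predicted mean air conditioning cost from the example equation:
    Costs ($) = 2 * Temperature (C) - 1.5 * Insulation (cm)."""
    return 2.0 * temperature_c - 1.5 * insulation_cm

# Hypothetical inputs: a 30 C day with 10 cm of insulation.
cost = predicted_cost(30.0, 10.0)  # 2*30 - 1.5*10 = 45.0
```

The result is the model's estimate of the *mean* cost for those settings, not a guaranteed value for any single day.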