
Statistics By Jim

Making statistics intuitive


Linear Regression Equation Explained

By Jim Frost 2 Comments

A linear regression equation describes the relationship between the independent variables (IVs) and the dependent variable (DV). It can also predict new values of the DV for the IV values you specify.

In this post, we’ll explore the various parts of the regression line equation and understand how to interpret it using an example.

I’ll mainly look at simple regression, which has only one independent variable. These models are easy to graph, and we can more intuitively understand the linear regression equation.

Related post: Independent and Dependent Variables

Deriving the Linear Regression Equation

Least squares regression produces a linear regression equation, providing your key results all in one place. How does the regression procedure calculate the equation?

The process is complex, and analysts always use software to fit the models. For this post, I’ll show you the general process.

Consider this height and weight dataset. There is clearly a relationship between these two variables. But how would you draw the best line to represent it mathematically?

Scatterplot of height and weight data.

Regression analysis draws a line through these points that minimizes their overall distance from the line. More specifically, least squares regression minimizes the sum of the squared differences between the data points and the line, which statisticians call the sum of squared errors (SSE).
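Written out as a formula, the quantity being minimized looks like this. The symbols are assumptions chosen for illustration rather than notation taken from the graphs in this post: y_i is the observed value for observation i, \hat{y}_i is the value on the line at the same X, and n is the number of observations.

```latex
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```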

Let’s fit the model!

Now, we can see the line and its corresponding linear regression equation, which I’ve circled.

Graph that displays the linear regression equation for our model.

This graph shows the observations with a line representing the regression model. Following the practice in statistics, the Y-axis (vertical) displays the dependent variable, weight. The X-axis (horizontal) shows the independent variable, which is height.

The line is also known as the fitted line, and it produces a smaller SSE than any other line you can draw through these observations.
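To make the "smaller SSE than any other line" idea concrete, here is a minimal Python sketch. The five height and weight values are invented for illustration and are not the dataset plotted above, and the function names `sse` and `fit_least_squares` are hypothetical helpers. The fit uses the standard closed-form formulas for simple regression, which is one way to obtain the least squares line and not necessarily how any particular statistics package computes it.

```python
# Hypothetical height (m) and weight (kg) values, invented for illustration.
heights = [1.30, 1.40, 1.50, 1.60, 1.70]
weights = [25.0, 34.0, 46.0, 55.0, 67.0]


def sse(intercept, slope, xs, ys):
    """Sum of squared differences between observed ys and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def fit_least_squares(xs, ys):
    """Closed-form least squares estimates for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_least_squares(heights, weights)
best_sse = sse(intercept, slope, heights, weights)

# Nudge the intercept and slope in every direction and recompute the SSE.
nudged_sses = [
    sse(intercept + d_int, slope + d_slope, heights, weights)
    for d_int in (-1.0, 0.0, 1.0)
    for d_slope in (-1.0, 0.0, 1.0)
    if (d_int, d_slope) != (0.0, 0.0)
]

print(f"Fitted line: weight = {intercept:.1f} + {slope:.1f} * height")
print("Fitted line has the smallest SSE:", all(best_sse < s for s in nudged_sses))
```

Every nudged line ends up with a larger SSE than the fitted line, which is what "least squares" means in practice.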

Like any line you’ve studied in algebra, you can describe the fitted line with an equation. For this analysis, we call it the equation for the regression line. So, let’s quickly revisit algebra!

Learn more about the X and Y Axis.

Equation for a Line

Think back to algebra and the equation for a line: y = mx + b.

In the equation for a line,

  • Y = the vertical value.
  • M = slope (rise/run).
  • X = the horizontal value.
  • B = the value of Y when X = 0 (i.e., y-intercept).

Slope = \frac{\text{Change in Y}}{\text{Change in X}}

So, if the slope is 3, then as X increases by 1, Y increases by 1 × 3 = 3. Conversely, if the slope is -3, then Y decreases by 3 for each one-unit increase in X. Consequently, you need to know both the sign and the value to understand the direction of the line and how steeply it rises or falls. Higher absolute values correspond to steeper slopes.

B is the Y-intercept. It tells you the value of Y for the line as it crosses the Y-axis, as indicated by the red dot below.

A line drawing that illustrates the constant in regression analysis, also known as the y-intercept.

Using the intercept and slope together allows you to calculate any point on a line. The example below displays lines for two equations.

Examples of graphing line equations using algebra.
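The algebra above can be sketched in a few lines of Python. The function name `line_y` and the example slope and intercept values are assumptions chosen for illustration; they are not the equations shown in the graph.

```python
def line_y(m, x, b):
    """Return y for the line y = mx + b."""
    return m * x + b


# With a slope of 3 and an intercept of 1, each one-unit step in x adds 3 to y.
for x in (0, 1, 2):
    print(x, line_y(3, x, 1))

# A slope of -3 gives a line that falls by 3 for each one-unit step in x.
for x in (0, 1, 2):
    print(x, line_y(-3, x, 1))
```

When x is 0, the function returns b, which is the y-intercept.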

Applying these Ideas to a Linear Regression Equation

A regression line equation uses the same ideas. Here’s how the regression concepts correspond to algebra:

  • Y-axis represents values of the dependent variable.
  • X-axis represents values of the independent variable.
  • Sign of coefficient indicates whether the relationship is positive or negative.
  • Coefficient value is the slope.
  • Constant or Y-intercept is B.

In regression analysis, the procedure estimates the best values for the constant and coefficients. Typically, regression models switch the order of terms in the equation compared to algebra by displaying the constant first and then the coefficients. It also uses different notation, as shown below for simple regression.

Y = \beta_0 + \beta_1 X_1

Using this notation, β0 is the constant, while β1 is the coefficient for X1. Multiple regression just adds more βkXk terms to the equation, one for each of the K independent variables (Xs).
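For reference, extending the simple regression notation to K independent variables gives the general form below. This is the standard way to write a multiple regression equation; it is added here as an illustration and does not describe a model fitted in this post.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K
```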

On the fitted line plots below, I’ve circled portions of the linear regression equation to identify its components.

Coefficient = Slope

Visually, the fitted line has a positive slope corresponding to the positive coefficient sign we obtained earlier. The slope of the line equals the +106.5 coefficient that I circled in the equation for the regression line. This coefficient indicates how much mean weight increases as we increase height by one unit.

Graph that highlights the coefficient in the linear regression equation.

Constant = Y-intercept

You’ll notice that the previous graph doesn’t display the regression line crossing the Y-axis. The constant is so negative that the default axes settings don’t show it. I’ve adjusted the axes in the chart below to include the y-intercept and illustrate the constant more clearly.

Graph that highlights the constant in a linear regression model.

If you extend the regression line downwards until it reaches the Y-axis, you’ll find that the y-intercept value is -114.3—just as the equation indicates.

Interpreting the Regression Line Equation

Let’s combine all these parts of a linear regression equation and see how to interpret them.

  • Coefficient signs: Indicate whether the dependent variable increases (+) or decreases (-) as the IV increases.
  • Coefficient values: Represent the average change in the DV given a one-unit increase in the IV.
  • Constant: Value of the DV when the IV equals zero.

Next, we’ll apply that to the linear regression equation from our model.

Weight kg = -114.3 + 106.5 Height M

The coefficient sign is positive, meaning that weight tends to increase as height increases. Additionally, the coefficient is 106.5. This value indicates that if you increase height by 1m, weight increases by an average of 106.5kg. However, our data have a range of only 0.4m. So, a full meter falls outside what the data cover, and it makes more sense to use a fraction of one. For example, with an additional 0.1m, you’d expect a 10.65kg increase.
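The scaling in that last example is just the coefficient multiplied by the change in height. Here is a minimal Python sketch; the constant and function names are hypothetical, and the 106.5 value comes from the equation above.

```python
HEIGHT_COEFFICIENT = 106.5  # kg per meter, from the fitted equation


def expected_weight_change(height_change_m):
    """Average change in weight (kg) for a given change in height (m)."""
    return HEIGHT_COEFFICIENT * height_change_m


print(round(expected_weight_change(0.1), 2))  # 10.65
```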

Learn more about How to Interpret Coefficients and Their P-Values.

For our model, the constant in the linear regression equation technically indicates the mean weight is -114.3kg when height equals zero. However, that doesn’t make intuitive sense, especially because we have a negative constant. A negative weight? How do we explain that?

Notice in the graph that the data are far from the y-axis. Our model doesn’t apply to that region. You can only interpret the model for your data’s observation space.

Hence, we can’t interpret the constant for our model, which is fairly typical in regression analysis for multiple reasons. Learn more about that by reading about The Constant in Regression Analysis.

Using a Linear Regression Equation for Predictions

You can enter values for the independent variables in a regression line equation to predict the mean of the dependent variable. For our model, we can enter a height to predict the average weight.

A couple of caveats. The equation only applies to the range of the data. So, we need to stick with heights between 1.3m and 1.7m. Also, the data are for pre-teen girls. Consequently, the regression model is valid only for that population.

With that in mind, let’s calculate the mean weight for a girl who is 1.6m tall by entering that value into our linear regression equation:

Weight kg = -114.3 + 106.5 * 1.6 = 56.1.

The average weight for someone 1.6m tall in this population is 56.1kg. On the chart below, notice how the line passes through this point (1.6, 56.1). Also, note that you’ll see data points above and below the line. The prediction gives the average, but that range of data points represents the prediction’s precision.

Example of using a linear regression equation to make a prediction.
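The prediction and the range caveat can be sketched together in Python. The constant and coefficient come from the equation in this post, and the 1.3m to 1.7m limits come from the caveat above. The function name `predict_mean_weight` and the choice to raise an error outside that range are assumptions made for illustration.

```python
INTERCEPT = -114.3  # constant from the fitted equation (kg)
SLOPE = 106.5       # coefficient for height (kg per meter)
MIN_HEIGHT_M, MAX_HEIGHT_M = 1.3, 1.7  # range of heights in the data


def predict_mean_weight(height_m):
    """Predict mean weight (kg) for a height (m) within the data's range."""
    if not MIN_HEIGHT_M <= height_m <= MAX_HEIGHT_M:
        raise ValueError(
            f"height {height_m}m is outside the range the model covers "
            f"({MIN_HEIGHT_M}m to {MAX_HEIGHT_M}m)"
        )
    return INTERCEPT + SLOPE * height_m


print(round(predict_mean_weight(1.6), 1))  # 56.1
```

Refusing to predict outside the observed range is one simple way to avoid extrapolating the line into regions where the model doesn’t apply, such as a height of zero.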

Learn more about Using Regression to Make Predictions.

When you know how to interpret a linear regression equation, you gain valuable insights into the model. However, for your conclusions to be trustworthy, be sure that you satisfy the least squares regression assumptions.


Filed Under: Regression Tagged With: analysis example, interpreting results

Reader Interactions

Comments

  1. Sultan Mahmood says

    January 29, 2023 at 11:24 am

    I was searching during last 4 days for regression calculation & interpretation, I’ve learnt a lot. Still I’m searching to interpret the calculated data by MINITAB.

  2. Jeff says

    October 29, 2022 at 4:59 pm

    Hi Jim

    Wondering what regression (or other) test we would run if we had a research question that looked at somethin like : does work out intensity (low, medium, high) and self reporting of sleep quality (good vs. bad) – our independent variables – affect the BMI % of respondents (where BMI % will be our dependent variable). Would multiple regression be the appropriate test? Multiple discriminant analysis perhaps? MANOVA?

    Thanks Jim, looking forward to your thoughts
