Linear Regression Equation Explained

By Jim Frost

A linear regression equation describes the relationship between the independent variables (IVs) and the dependent variable (DV). It can also predict new values of the DV for the IV values you specify.

In this post, we’ll explore the various parts of the regression line equation and understand how to interpret it using an example.

I’ll mainly look at simple regression, which has only one independent variable. These models are easy to graph, and we can more intuitively understand the linear regression equation.

Related post: Independent and Dependent Variables

Deriving the Linear Regression Equation

Least squares regression produces a linear regression equation, providing your key results all in one place. How does the regression procedure calculate the equation?

The process is mathematically involved, and analysts almost always use software to fit the models. For this post, I’ll show you the general process.

Consider this height and weight dataset. There is clearly a relationship between these two variables. But how would you draw the best line to represent it mathematically?

Scatterplot of height and weight data.

Regression analysis draws a line through these points that minimizes their overall distance from the line. More specifically, least squares regression minimizes the sum of the squared differences between the data points and the line, which statisticians call the sum of squared errors (SSE).

To learn how least squares regression calculates the coefficients and y-intercept with a worked example, read my post Least Squares Regression: Definition, Formulas & Example.
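To make the idea concrete, here is a minimal sketch in Python of what least squares regression does. The height and weight values are hypothetical, not the data from this post; the sketch computes the sum of squared errors (SSE) for a line and the closed-form least squares estimates of the slope and intercept.

```python
import numpy as np

# Hypothetical height (m) and weight (kg) values -- not the data from this post.
height = np.array([1.30, 1.40, 1.47, 1.55, 1.62, 1.70])
weight = np.array([25.0, 32.0, 41.0, 48.0, 57.0, 66.0])

def sse(intercept, slope):
    """Sum of squared errors between the observations and the line intercept + slope*x."""
    predicted = intercept + slope * height
    return np.sum((weight - predicted) ** 2)

# Closed-form least squares estimates for simple regression:
#   slope = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
#   intercept = y_bar - slope * x_bar
x_bar, y_bar = height.mean(), weight.mean()
slope = np.sum((height - x_bar) * (weight - y_bar)) / np.sum((height - x_bar) ** 2)
intercept = y_bar - slope * x_bar

print(f"Fitted line: weight = {intercept:.1f} + {slope:.1f} * height")
print(f"SSE of the fitted line:      {sse(intercept, slope):.1f}")
print(f"SSE of a slightly shifted line: {sse(intercept + 5, slope):.1f}")  # always larger
```

Shifting or tilting the fitted line in any direction increases the SSE, which is exactly what “least squares” means.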

Let’s fit the model!

Now, we can see the line and its corresponding linear regression equation, which I’ve circled.

Graph that displays the linear regression equation for our model.

This graph shows the observations with a line representing the regression model. Following standard practice in statistics, the Y-axis (vertical) displays the dependent variable, weight. The X-axis (horizontal) shows the independent variable, height.

The line is also known as the fitted line, and it produces a smaller SSE than any other line you can draw through these observations.

Like the lines you studied in algebra, the fitted line can be described with an equation. For this analysis, we call it the equation for the regression line. So, let’s quickly revisit algebra!

Learn more about the X and Y Axis.

Equation for a Line

Think back to algebra and the equation for a line: y = mx + b.

In the equation for a line,

  • y = the vertical value.
  • m = slope (rise/run).
  • x = the horizontal value.
  • b = the value of y when x = 0 (i.e., the y-intercept).

Slope formula: m = rise/run = (y2 − y1) / (x2 − x1).

So, if the slope is 3, then as x increases by 1, y increases by 1 × 3 = 3. Conversely, if the slope is −3, then y decreases by 3 as x increases by 1. Consequently, you need to know both the sign and the value to understand the direction of the line and how steeply it rises or falls. Higher absolute values correspond to steeper slopes.

b is the y-intercept. It tells you the value of y where the line crosses the y-axis, as indicated by the red dot below.

A line drawing that illustrates the constant in regression analysis, also known as the y-intercept.

Using the intercept and slope together allows you to calculate any point on a line. The example below displays lines for two equations.

Examples of graphing line equations using algebra.

If you need a refresher, read my Guide to the Slope Intercept Form of Linear Equations.
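If it helps, here is a tiny Python sketch of that algebra: given a slope m and an intercept b, y = mx + b gives the y value at any x. The slopes and intercepts below are hypothetical examples.

```python
def y_on_line(m, b, x):
    """Return y = m*x + b: the point on the line with slope m and intercept b at x."""
    return m * x + b

# Hypothetical line with slope 3 and intercept 2: y rises by 3 for each 1-unit increase in x.
print(y_on_line(3, 2, 0))   # 2, the y-intercept (value of y when x = 0)
print(y_on_line(3, 2, 1))   # 5
# A line with slope -3 falls as x increases.
print(y_on_line(-3, 2, 1))  # -1
```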

Applying these Ideas to a Linear Regression Equation

A regression line equation uses the same ideas. Here’s how the regression concepts correspond to algebra:

  • Y-axis represents values of the dependent variable.
  • X-axis represents values of the independent variable.
  • Sign of coefficient indicates whether the relationship is positive or negative.
  • Coefficient value is the slope.
  • Constant or Y-intercept is B.

In regression analysis, the procedure estimates the best values for the constant and coefficients. Typically, regression models switch the order of terms in the equation compared to algebra, displaying the constant first and then the coefficients. Regression also uses different notation, as shown below for simple regression.

Simple regression equation: Y = β0 + β1X.

Using this notation, β0 is the constant, while β1 is the coefficient for X. Multiple regression just adds more βkXk terms to the equation up to K independent variables (Xs).
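As an illustration, here is a short sketch of fitting the simple regression equation with the statsmodels library in Python. The height and weight values are hypothetical, so the estimates won’t match the −114.3 and 106.5 from the fitted line plot above.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical height (m) and weight (kg) values -- not the data from this post.
height = np.array([1.30, 1.40, 1.47, 1.55, 1.62, 1.70])
weight = np.array([25.0, 32.0, 41.0, 48.0, 57.0, 66.0])

X = sm.add_constant(height)      # adds the column of 1s whose coefficient is the constant (β0)
model = sm.OLS(weight, X).fit()  # ordinary least squares fit

b0, b1 = model.params            # β0 (constant) and β1 (coefficient for height)
print(f"weight = {b0:.1f} + {b1:.1f} * height")
```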

On the fitted line plots below, I’ve circled portions of the linear regression equation to identify its components.

Coefficient = Slope

Visually, the fitted line has a positive slope corresponding to the positive coefficient sign we obtained earlier. The slope of the line equals the +106.5 coefficient that I circled in the equation for the regression line. This coefficient indicates how much mean weight increases as we increase height by one unit.

Graph that highlights the coefficient in the linear regression equation.

Constant = Y-intercept

You’ll notice that the previous graph doesn’t display the regression line crossing the Y-axis. The constant is so negative that the default axes settings don’t show it. I’ve adjusted the axes in the chart below to include the y-intercept and illustrate the constant more clearly.

Graph that highlights the constant in a linear regression model.

If you extend the regression line downwards until it reaches the Y-axis, you’ll find that the y-intercept value is -114.3—just as the equation indicates.

Interpreting the Regression Line Equation

Let’s combine all these parts of a linear regression equation and see how to interpret them.

  • Coefficient sign: Indicates whether the dependent variable increases (+) or decreases (−) as the IV increases.
  • Coefficient value: Represents the average change in the DV given a one-unit increase in the IV.
  • Constant: The value of the DV when the IV equals zero.

Next, we’ll apply that to the linear regression equation from our model.

Weight (kg) = -114.3 + 106.5 × Height (m)

The coefficient sign is positive, meaning that weight tends to increase as height increases. Additionally, the coefficient is 106.5. This value indicates that if you increase height by 1 m, weight increases by an average of 106.5 kg. However, our data cover a height range of only 0.4 m, so a full one-meter increase doesn’t apply here; use a fraction of a meter instead. For example, for an additional 0.1 m, you’d expect a 10.65 kg increase on average.
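A quick sketch of that arithmetic, using the 106.5 coefficient from the fitted equation:

```python
# The coefficient from the fitted equation: average kg change per 1 m change in height.
coefficient = 106.5

# Scale the one-unit interpretation down to increases that fit the data's range.
for delta_height_m in (1.0, 0.1, 0.05):
    print(f"+{delta_height_m} m -> +{coefficient * delta_height_m:.2f} kg on average")
```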

Learn more about How to Interpret Coefficients and Their P-Values.

For our model, the constant in the linear regression equation technically indicates the mean weight is -114.3 kg when height equals zero. However, that doesn’t make intuitive sense, especially because we have a negative constant. A negative weight? How do we explain that?

Notice in the graph that the data are far from the y-axis. Our model doesn’t apply to that region. You can only interpret the model for your data’s observation space.

Hence, we can’t interpret the constant for our model, which is fairly typical in regression analysis for multiple reasons. Learn more about that by reading about The Constant in Regression Analysis.

Using a Linear Regression Equation for Predictions

You can enter values for the independent variables in a regression line equation to predict the mean of the dependent variable. For our model, we can enter a height to predict the average weight.

A couple of caveats. The equation only applies to the range of the data, so we need to stick with heights between 1.3 and 1.7 m. Also, the data are for pre-teen girls. Consequently, the regression model is valid only for that population.

With that in mind, let’s calculate the mean weight for a girl who is 1.6 m tall by entering that value into our linear regression equation:

Weight (kg) = -114.3 + 106.5 × 1.6 = 56.1

The average weight for someone 1.6 m tall in this population is 56.1 kg. On the chart below, notice how the line passes through this point (1.6, 56.1). Also, note that you’ll see data points above and below the line. The prediction gives the average, but the spread of the data points around the line reflects the prediction’s precision.
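Here is a minimal sketch of that prediction as a small Python function; the 1.3 to 1.7 m bounds are the approximate data range mentioned above, and the coefficients come from the fitted equation.

```python
def predict_mean_weight(height_m):
    """Predicted mean weight (kg) from the fitted equation, within the data's range."""
    if not 1.3 <= height_m <= 1.7:
        raise ValueError("Height is outside the range the model was fit on (about 1.3-1.7 m).")
    return -114.3 + 106.5 * height_m

print(f"{predict_mean_weight(1.6):.1f}")  # 56.1
```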

Example of using a linear regression equation to make a prediction.

Learn more about Using Regression to Make Predictions.

When you know how to interpret a linear regression equation, you gain valuable insights into the model. However, for your conclusions to be trustworthy, be sure that you satisfy the least squares regression assumptions.



Comments

  1. Klaus says

    May 25, 2023 at 1:20 pm

    Hello Jim,
    first of all, I have to say, it is a pleasure to read your books!
    I’m focused on the linear relationship and the different statistical tests, interpretation etc.
    The final goal is a paper concerning the estimation of investments in the area of plant construction, on the basis of historical data and a recommended non-linear relationship (y = X^n).
    The accuracy of the mathematical model will be described with a confidence interval for the expected values and a prediction interval for individuals. An additional interesting point is the calculation of a prediction interval for future observations.
    My question is based on a book which describes the confidence interval of a linear relationship for the expected values as a linear function! This is contrary to my understanding, statistics books, and the law of large numbers.
    I would greatly appreciate any information regarding that discrepancy.
    Best wishes from Germany and thank you in advance
    Klaus

  2. Sultan Mahmood says

    January 29, 2023 at 11:24 am

    I was searching for the last 4 days for regression calculation and interpretation, and I’ve learnt a lot. I’m still searching for how to interpret the data calculated by MINITAB.

  3. Jeff says

    October 29, 2022 at 4:59 pm

    Hi Jim

    Wondering what regression (or other) test we would run if we had a research question that looked at something like: does workout intensity (low, medium, high) and self-reporting of sleep quality (good vs. bad) – our independent variables – affect the BMI % of respondents (where BMI % will be our dependent variable)? Would multiple regression be the appropriate test? Multiple discriminant analysis perhaps? MANOVA?

    Thanks Jim, looking forward to your thoughts

