The constant term in regression analysis is the value at which the regression line crosses the y-axis. The constant is also known as the y-intercept. That sounds simple enough, right? Mathematically, the regression constant really is that simple. However, the difficulties begin when you try to interpret the *meaning* of the y-intercept in your regression output.

Why is it difficult to interpret the constant term? Because the y-intercept is almost always meaningless! Surprisingly, while the constant doesn’t usually have a meaning, it is almost always vital to include it in your regression models!

In this post, I will teach you all about the constant in regression analysis.

## The Definition of the Constant is Correct but Misleading

The constant is often defined as the mean of the dependent variable when you set all of the independent variables in your model to zero. In a purely mathematical sense, this definition is correct. Unfortunately, it’s frequently impossible to set all variables to zero because this combination can be an impossible or irrational arrangement.

I use the example below in my post about how to interpret regression p-values and coefficients. The graph displays a regression model that assesses the relationship between height and weight. For this post, I modified the y-axis scale to illustrate the y-intercept, but the overall results haven’t changed.

If you extend the regression line downwards until you reach the point where it crosses the y-axis, you’ll find that the y-intercept value is negative!

In fact, the regression equation shows us that the negative intercept is -114.3. Using the traditional definition for the regression constant, if height is zero, the expected mean weight is -114.3 kilograms! Huh? Neither a zero height nor a negative weight makes any sense at all!

The negative y-intercept for this regression model has no real meaning, and you should not try attributing one to it.

You think that is a head scratcher? Try imagining a regression analysis with multiple independent variables. The more variables you have, the less likely it is that each and every one of them can equal zero simultaneously.

If the independent variables can’t all equal zero, or you get an impossible negative y-intercept, don’t interpret the value of the y-intercept!

## The Y-Intercept Might Be Outside of the Observed Data

I’ll stipulate that, in a few cases, it is possible for all independent variables to equal zero simultaneously. However, to have any chance of interpreting the constant, this all zero data point must be within the observation space of your dataset.

As a general statistical guideline, never make a prediction for a point that is outside the range of observed values that you used to fit the regression model. The relationship between the variables can change as you move outside the observed region—but you don’t know it changes because you don’t have *that* data!

This guideline comes into play here because the constant predicts the dependent variable for a particular point. If your data don’t include the all-zero data point, don’t believe the y-intercept.

I’ll use the height and weight regression example again to show you how this works. This model estimates its parameters using data from middle school girls whose heights and weights fall within a certain range. We should not trust this estimated relationship for values that fall outside the observed range. Fortunately, for this example, we can deduce that the relationship does change by using common sense.

I’ve indicated the mean height and weight for a newborn baby on the graph with a red circle. This height isn’t exactly zero, but it is as close as possible. By looking at the chart, it is evident that the actual relationship must change over the extended range!

The observed relationship is locally linear, but it must curve as it decreases below the observed values. Don’t predict outside the range of your data! This principle is an additional reason why the y-intercept might not be interpretable.
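A quick way to apply this guideline is to check whether zero even falls inside the observed range of each predictor. The sketch below is a hypothetical helper (the function name and the example values are mine, not from any library or dataset). With several predictors, passing this check is necessary but not sufficient, because the all-zero combination can still lie outside the region where you actually have data.

```python
def origin_is_inside_observed_range(columns):
    """Return True only if 0 lies within [min, max] of every predictor."""
    return all(min(values) <= 0 <= max(values) for values in columns.values())


# Hypothetical predictor values, for illustration only.
heights_only = {"height_m": [1.32, 1.41, 1.50, 1.58, 1.66]}
dose_study = {"dose_mg": [0, 5, 10, 20]}

# Zero height was never observed, so the constant is an extrapolation.
print(origin_is_inside_observed_range(heights_only))

# A zero dose was observed, so the constant at least sits inside the data.
print(origin_is_inside_observed_range(dose_study))
```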

## The Constant Absorbs the Bias for the Regression Model

Now, let’s assume that all of the predictors in your model can reasonably equal zero *and* you specifically collect data in that area. You should be good to interpret the constant, right? Unfortunately, the y-intercept might still be garbage!

The estimate of the y-intercept partly reflects any relevant variables that you left out of the regression model. When you leave relevant variables out, this can produce bias in the model. Bias exists if the residuals have an overall positive or negative mean. In other words, the model tends to make predictions that are systematically too high or too low. The constant term prevents this overall bias by forcing the residual mean to equal zero.

Imagine that you can move the regression line up or down to the point where the residual mean equals zero. For example, if the regression produces residuals with a positive average, just move the line up until the mean equals zero. This process is how the constant ensures that the regression model satisfies the critical assumption that the residual average equals zero. However, this process does not focus on producing a y-intercept that is meaningful for your study area. Instead, it focuses entirely on providing that mean of zero.

The constant ensures the residuals don’t have an overall bias, but that might make it meaningless.

**Related post**: Seven Classical Assumptions of OLS Linear Regression

## Generally It Is Essential to Include the Constant in a Regression Model

The reason I just discussed explains why you should almost always have the constant in your regression model—it forces the residuals to have that crucial zero mean.
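For readers who like to see the math, here is a short sketch of why this happens in simple OLS regression. The notation ($b_0$ for the constant, $b_1$ for the slope, $e_i$ for the residuals) is mine. OLS chooses the estimates that minimize the sum of squared residuals:

$$
S(b_0, b_1) = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2
$$

Setting the derivative with respect to the constant to zero gives:

$$
\frac{\partial S}{\partial b_0} = -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = -2 \sum_{i=1}^{n} e_i = 0
\quad \Longrightarrow \quad \sum_{i=1}^{n} e_i = 0
$$

This condition exists only because $b_0$ is in the model. If you drop the constant, the only remaining condition is $\sum_{i} x_i e_i = 0$, which does not require the residuals to sum to zero.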

Furthermore, if you don’t include the constant in your regression model, you are actually setting the constant to equal zero. This action forces the regression line to go through the origin. In other words, a model without the constant assumes that the dependent variable equals zero whenever all of the independent variables equal zero.

If this isn’t correct for your study area, your regression model will exhibit bias without the constant. To illustrate this, I’ll use the height and weight example again, but this time I won’t include the constant. Below, there is only a height coefficient but no constant.

Now, I’ll draw a green line based on this equation on the previous graph. This comparison allows us to assess the regression model when we include and exclude the constant.

Clearly, the green line does not fit the data at all. Its slope is nowhere close to being correct, and its fitted values are biased.

When it comes to using and interpreting the constant in a regression model, you should almost always include the constant in your regression model even though it is almost never worth interpreting. The key benefit of regression analysis is determining how changes in the independent variables are associated with shifts in the dependent variable. Don’t think about the y-intercept too much!

If you’re learning regression, check out my Regression Tutorial!

If you’re learning regression and like the approach I use in my blog, check out my eBook!

**Note: I wrote a different version of this post that appeared elsewhere. I’ve completely rewritten and updated it for my blog site.**

Andre says

Hi Jim,

Your explanations of difficult and often confusing statistical concepts are the very best I have come across so far. Simply great!

Cynthia Johnson says

Hi Jim, I’m having a problem interpreting r = .085 and r-squared = .007. Would you say it’s a strong relationship? I’m learning.

Jim Frost says

Hi Cynthia,

An R-squared of 0.007 represents a very weak relationship. For more information, read my post about Interpreting R-squared.

Help says

Hi,

I have searched many of your articles to find the answer, but I am still unsure. Why is it that OLS assumes the mean of the error to be zero? And why does this cause a problem if it does not hold? I would be very grateful for any help/insight you could offer.

Many thanks!

Jim Frost says

Hi,

The answer is that you want your model’s predictions (i.e., fitted values) to be correct on average. A residual is the difference between the observed value and the value the model predicts: Residual = Observed value – Fitted value. If the average of the residuals equals zero, then your model’s predictions are correct on average. While the residual for any given observation might not equal zero, the over-predictions and under-predictions balance out. In other words, your model isn’t systematically predicting too high or too low.

However, if the average of the residuals is a positive value, that indicates that overall the observed values are greater than the fitted values. In other words, your model systematically under-predicts the observed values. Conversely, if the average of the residuals is negative, your model systematically over-predicts the observed values.

For more information about this, read my post about OLS assumptions.

I hope this helps!

Saeideh says

Many thanks for your detailed explanation. Unfortunately, I still cannot understand why we use multiple regression. As you mentioned, in simple regression we consider just one independent variable, and in multiple regression we have more than one. But when we want to investigate the effect of one of them on y, we hold the other IVs fixed and, in fact, we again have simple regression.

Why don’t we do it with several simple regressions instead of multiple regression?

Thank you

Jim Frost says

Hi,

Ah, I see where the misunderstanding is. No, when you add additional variables to the model it provides more information about the other variables. Holding other variables constant definitely does not return you to simple regression because you’re learning about all the variables. Furthermore, if you perform simple regression and it turns out multiple variables are involved in the subject area, the coefficient for your single IV might be biased–which is why you wouldn’t want to use several simple regression models instead of one multiple regression model.

I have a post that explains this aspect in much more detail. Please read my post, When Should I Use Regression Analysis? This is a great introductory post that talks in more detail about regression, including the importance of holding other variables constant.

Additionally, see my post about omitted variable bias. It’s a bit more advanced, but it shows the potential bias I mentioned earlier. It provides an example of how a simple regression model was biased until I added another variable.

Those posts should answer your questions!

saeideh says

Hi

thank you for your useful explanation.

Could you please describe the difference between simple and multiple regression? In both of them we consider the other independent variables as constant or fixed, so will the interpretation be the same?

Jim Frost says

Hi,

Simple and multiple regression are really the same analysis. The only thing that changes is the number of independent variables (IVs) in the model. Simple regression indicates there is only one IV. Simple regression models are easy to graph because you can plot the dependent variable (DV) on the y-axis and the IV on the x-axis. Multiple regression simply indicates there is more than one IV in the model. It’s true, when you have multiple IVs, the coefficient represents the effect of one IV when holding the values of the other IVs constant. In simple regression, because there is only one IV, there are no other IVs to hold constant. In either case, you interpret the coefficients the same way: the mean change in the DV associated with a 1 unit change in the IV.

I hope that helps!

GARVITA JHAMB says

hi!

do we check the p value given for constant?

Jim Frost says

Hi Garvita,

You can check the p-value for the constant. If it’s less than your significance level (e.g., 0.05), then the value of the constant is significantly different from zero. However, for all the reasons I cite in this post, you usually cannot interpret the constant. Consequently, knowing that it’s different from zero doesn’t provide much information in most cases.

Dr. N. Rathankar says

Hi jim

In a regression model, i find that sum of the residuals = +1. and the r squared value is 0.64, which clearly says that there is 80% correlation.

1. what additional correction should i make to the model, so that the sum of the residuals = 0

2. if the residual sum is larger, does the correlation coefficient value fall down. i mean are these two inversely correlated

3. while calculating r, how do i infer causation between the two variables and is there any measure to infer causation

Jim Frost says

Hi,

Are you using ordinary least squares regression? And, are you including the constant in the model? Including the constant in the model should cause your residuals to sum to zero even when there are other problems with the model.

If you are fitting the constant, I’m not sure why they sum to 1. If your dependent variable is very large, perhaps 1 is very small in relation? I’m not sure.

I don’t understand your second question. But, if the sum of the residuals does not equal zero, it suggests that your model is biased.

There is no statistical measure that assesses causation. To learn why, read my post about causation versus correlation. This post also shows how you can assess causation.

Reni Kuruvila says

Can the result or output of a regression equation be a negative value. when i am using this to predict sales of a product?

Curtis says

Is it possible to purchase a hard copy of the book?

Harsh Chadha says

Hi Jim,

I wanted to ask a question which might not be directly related to the intercept in a regression model. What if the variation in my dependent variable (Y variable) is very low? In such cases, do we expect the intercept to capture most of the movement in Y?

Jim Frost says

Hi Harsh,

That might or might not be the case, which is true for any model. Let’s start with a more general case for all linear models and then work to the situation that you describe.

The F-test of overall significance determines whether your model predicts the DV better than a model that contains only the intercept. For any model where this test is not statistically significant, the intercept only model, which is the mean, predicts the Y variable as well as your model. In other words, your model does not account for the variability of the DV better than the mean (which is the y-intercept in an intercept-only model). So, what you describe is a possibility for any model and you can test for it with the overall test of significance.

Now, if the variability in the Y variable is very restricted, it’s going to be harder to use the variability in the IVs to explain the variability in the DV because there isn’t much of it. It’s harder for variables to co-vary when one does not vary much. However, it’s still possible to have a significant model in the situation you describe. To see if this condition applies to your model, simply check that F-test of overall significance!

Gandalf says

Hello Jim,

Very clear explanations, but the graphs in this article are not visible.

Jim Frost says

That’s very strange. It’s probably a temporary web connection issue. I’ve had that post up for several years now and haven’t heard others with that problem. Please try again, and hit refresh if necessary. Let me know how that goes.

Achala bhtaatraidhakal says

DEAR SIR,GOOD EVENING

IN MY STUDY I OBTAIN THIS TABLE, HOW I DESCRIBED MY FINDINGS

Coefficients (Model 1; dependent variable: systolic BP2nd)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 136.002 | 4.361 | | 31.189 | .000 |
| sex | -1.545 | 2.251 | -.043 | -.687 | .493 |
| smoking history | -5.728 | 2.533 | -.143 | -2.261 | .024 |

Nikhil says

In the regression analysis, in case p-value for the intercept is higher than 0.05, should the intercept be still considered? Or the p-value of the intercept is immaterial?

Jim Frost says

Hi Nikhil,

Yes, I’d still include the intercept in the model. If the value of the constant is not significant but still not equal to zero, then forcing the regression line through the origin will bias the coefficient estimates. Typically, you don’t need to interpret the constant or its p-value.

Stefan Brovis says

Hi Jim,

Let me reformulate, the exact question (past paper) is stated as follows:

In the standard regression model, the assumption is made that the effect of the regressors is linear (through Xbeta), and that the disturbances affect the mean in an additive fashion. This at first sight sounds like a severe limitation of generality. Why is the limitation not as large as it seems at first sight?

I am guessing that it has to do with a limitation of the OLS assumption.

I am not sure myself whether i really understood the question. I thought perhaps you could make sense of it.

Jim Frost says

Hi Stefan,

The author’s question is pretty vague because he/she doesn’t explain why it appears like a severe limitation. But, I’ll take a stab at it.

The author describes a linear model. Linear models are a very restricted form of all possible regression models, which I describe in my post about the differences between linear and nonlinear models. And, there are various assumptions about the residuals that must be met to produce valid results, which I describe in my post about OLS assumptions.

So, when you’re using linear models, you’re in a situation where the form of your model is very restricted and there are requirements for the properties of the residuals. It might seem that this combination of restrictions and requirements is problematic in terms of the general usability of linear regression to model relationships.

However, it’s not as severe as that may sound because there are various techniques for getting around both sets of limitations. You can use polynomials and other forms of variables to model curvature using linear regression. And various data transformations can resolve problems with the residuals.

That might be what the author is getting at but it’s hard to know for sure with such a general statement.

Stefan Brovis says

Hi Jim,

As i understand we assume linearity of the coefficients in the standard regression model as well as that the errors affect the mean in an additive way, which are in fact a limitation of generality but why does it not seem to be such a big issue at first sight?

Jim Frost says

Hi Stefan,

I don’t understand your question. Why do you think additive errors seem like a larger problem than they are?

Daniel says

How do we interpret a negative intercept/constant? Do we interpret it as is, e.g., the average weight of a child is -20 kg when height is zero? Please clarify.

Jim Frost says

Hi Daniel,

If you look in this post directly below the first fitted line plot, you’ll see that I discuss how there is no interpretation for this particular constant. The rest of the post details the various reasons why you usually can’t interpret the constant. For the height and weight example specifically, I’m sure the nature of the relationship between height and weight changes over the range of the independent variable (Height). Go down to the 2nd fitted line plot in this post, and right under that I discuss how that must be happening. But, we’re looking at a restricted range of heights and weights, and the model works for this range, but not outside the range–hence we can’t interpret the constant because it is outside the range of the data.

Hope that helps!

jean manirere says

Hello Jim, what if Y doesn’t change and X changes. Example: Y=2,2,2,2,2 and X=1,2,3,4,5 How is the Correlation ?

Jim Frost says

Hi Jean,

In order to be correlated, the two variables need to co-vary around their respective means. In other words, as one variable changes relative to its mean, the other variable tends to change in either the same or opposite direction relative to its mean. Because Y does not vary around its mean in your example, it’s impossible for them to have a non-zero correlation. Hence, the correlation is zero (or it probably produces an error because there is no variability at all in Y). Be sure to read my post about correlation!

Fabian Moodley says

How will you interpret a constant in a mean equation that’s highly significant?

Kapil Agrawal says

Very clear and crisp explanation

Jim Frost says

Hi Kapil, I’m so happy to hear that it was easy to understand! That’s always my goal when writing blog posts.