The constant term in regression analysis is the value at which the regression line crosses the y-axis. The constant is also known as the y-intercept. That sounds simple enough, right? Mathematically, the regression constant really is that simple. However, the difficulties begin when you try to interpret the *meaning* of the y-intercept in your regression output.

Why is it difficult to interpret the constant term? Because the y-intercept is almost always meaningless! Surprisingly, while the constant doesn’t usually have a meaning, it is almost always vital to include it in your regression models!

In this post, I will teach you all about the constant in regression analysis.

## The Definition of the Constant is Correct but Misleading

The constant is often defined as the mean of the dependent variable when you set all of the independent variables in your model to zero. In a purely mathematical sense, this definition is correct. Unfortunately, it’s frequently impossible to set all of the variables to zero because that combination can be physically impossible or nonsensical.
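In symbols, the textbook definition can be sketched as follows. The notation is an assumption added for illustration (the post itself uses no symbols): $\beta_0$ stands for the constant, $\beta_1, \dots, \beta_k$ for the coefficients, and $X_1, \dots, X_k$ for the independent variables.

$$
E[Y \mid X_1, \dots, X_k] = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k
\quad\Longrightarrow\quad
E[Y \mid X_1 = 0, \dots, X_k = 0] = \beta_0
$$

Every term except $\beta_0$ vanishes when all of the $X$ values are zero, which is why the constant equals the expected value of $Y$ at that point, whether or not that point can exist in practice.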

I use the example below in my post about how to interpret regression p-values and coefficients. The graph displays a regression model that assesses the relationship between height and weight. For this post, I modified the y-axis scale to illustrate the y-intercept, but the overall results haven’t changed.

If you extend the regression line downwards until you reach the point where it crosses the y-axis, you’ll find that the y-intercept value is negative!

In fact, the regression equation shows us that the negative intercept is -114.3. Using the traditional definition for the regression constant, if height is zero, the expected mean weight is -114.3 kilograms! Huh? Neither a zero height nor a negative weight makes any sense at all!

The negative y-intercept for this regression model has no real meaning, and you should not try attributing one to it.

You think that is a head scratcher? Try imagining a regression analysis with multiple independent variables. The more variables you have, the less likely it is that each and every one of them can equal zero simultaneously.

If the independent variables can’t all equal zero, or you get an impossible negative y-intercept, don’t interpret the value of the y-intercept!

## The Y-Intercept Might Be Outside of the Observed Data

I’ll stipulate that, in a few cases, it is possible for all independent variables to equal zero simultaneously. However, to have any chance of interpreting the constant, this all-zero data point must be within the observation space of your dataset.

As a general statistical guideline, never make a prediction for a point that is outside the range of observed values that you used to fit the regression model. The relationship between the variables can change as you move outside the observed region—but you don’t know it changes because you don’t have *that* data!

This guideline comes into play here because the constant predicts the dependent variable for a particular point. If your data don’t include the all-zero data point, don’t believe the y-intercept.

I’ll use the height and weight regression example again to show you how this works. This model estimates its parameters using data from middle school girls whose heights and weights fall within a certain range. We should not trust this estimated relationship for values that fall outside the observed range. Fortunately, for this example, we can deduce that the relationship does change by using common sense.

I’ve indicated the mean height and weight for a newborn baby on the graph with a red circle. This height isn’t exactly zero, but it is as close as possible. By looking at the chart, it is evident that the actual relationship must change over the extended range!

The observed relationship is locally linear, but it must curve as it decreases below the observed values. Don’t predict outside the range of your data! This principle is an additional reason why the y-intercept might not be interpretable.
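To make the extrapolation risk concrete, here is a minimal Python sketch. It does not use the height and weight data from this post. The squared relationship in `true_curve`, the `fit_line` helper, and the observed range of 4 to 8 are all assumptions chosen purely for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def true_curve(x):
    # Hypothetical curved relationship, made up for this illustration.
    return x ** 2


# We only "observe" x between 4 and 8, far away from zero.
x_obs = [4.0, 5.0, 6.0, 7.0, 8.0]
y_obs = [true_curve(x) for x in x_obs]

intercept, slope = fit_line(x_obs, y_obs)

# Inside the observed range, the straight line is a decent approximation.
worst_in_range_error = max(
    abs((intercept + slope * x) - y) for x, y in zip(x_obs, y_obs)
)

# At x = 0, the fitted line and the true curve disagree badly.
error_at_zero = abs(intercept - true_curve(0.0))

print(f"fitted line: y = {intercept:.1f} + {slope:.1f} * x")
print(f"worst error inside the observed range: {worst_in_range_error:.1f}")
print(f"error of the intercept at x = 0: {error_at_zero:.1f}")
```

In this made-up case the fitted line is never off by more than 2 units inside the observed range, yet its intercept is -34 while the true curve passes through 0. The line is a fine local approximation and a poor guide to what happens at zero.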

## The Constant Absorbs the Bias for the Regression Model

Now, let’s assume that all of the predictors in your model can reasonably equal zero *and* you specifically collect data in that area. You should be good to interpret the constant, right? Unfortunately, the y-intercept might still be garbage!

Part of the estimated value of the y-intercept reflects relevant variables that you left out of the regression model. When you leave relevant variables out, this can produce bias in the model. Bias exists if the residuals have an overall positive or negative mean. In other words, the model tends to make predictions that are systematically too high or too low. The constant term prevents this overall bias by forcing the residual mean to equal zero.

Imagine that you can move the regression line up or down to the point where the residual mean equals zero. For example, if the regression produces residuals with a positive average, just move the line up until the mean equals zero. This process is how the constant ensures that the regression model satisfies the critical assumption that the residual average equals zero. However, this process does not focus on producing a y-intercept that is meaningful for your study area. Instead, it focuses entirely on providing that mean of zero.

The constant ensures the residuals don’t have an overall bias, but that might make it meaningless.
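The "move the line up or down" idea can be sketched in a few lines of Python. The data values are hypothetical, and holding the slope fixed at its least-squares value while shifting the line is a simplification for illustration; in practice, software estimates the slope and the constant together.

```python
from statistics import mean

# Hypothetical data, made up for this illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 13.0, 15.0, 16.0, 19.0]

# Least-squares slope for a single predictor.
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

# Start with a line through the origin (constant = 0) and measure
# how far off its predictions are on average.
residuals_before = [yi - slope * xi for xi, yi in zip(x, y)]
shift = mean(residuals_before)

# Move the line up by that average. The size of the shift is the constant.
constant = shift
residuals_after = [yi - (constant + slope * xi) for xi, yi in zip(x, y)]

print(f"slope: {slope:.2f}")
print(f"residual mean before shifting: {mean(residuals_before):.2f}")
print(f"constant after shifting: {constant:.2f}")
print(f"residual mean after shifting is ~0: {abs(mean(residuals_after)) < 1e-9}")
```

The shift that zeroes out the residual mean is exactly the usual least-squares intercept (the mean of `y` minus the slope times the mean of `x`). Nothing in that calculation asks whether the resulting value is meaningful at zero.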

**Related post**: Seven Classical Assumptions of OLS Linear Regression

## Generally It Is Essential to Include the Constant in a Regression Model

The reason I just discussed explains why you should almost always have the constant in your regression model—it forces the residuals to have that crucial zero mean.

Furthermore, if you don’t include the constant in your regression model, you are actually setting the constant to equal zero. This action forces the regression line to go through the origin. In other words, a model that doesn’t include the constant requires the fitted value of the dependent variable to equal zero when all of the independent variables equal zero.

If this isn’t correct for your study area, your regression model will exhibit bias without the constant. To illustrate this, I’ll use the height and weight example again, but this time I won’t include the constant. Below, there is only a height coefficient but no constant.

Now, I’ll draw a green line based on this equation on the previous graph. This comparison allows us to assess the regression model when we include and exclude the constant.

Clearly, the green line does not fit the data at all. Its slope is nowhere close to being correct, and its fitted values are biased.
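The same comparison can be sketched numerically. This example uses small hypothetical data, not the height and weight dataset from this post, and fits the model once with a constant and once forced through the origin.

```python
from statistics import mean

# Hypothetical data, made up for this illustration (not the post's dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 13.0, 15.0, 16.0, 19.0]

# Model with a constant: y = b0 + b1 * x
x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
resid_with = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Model without a constant (forced through the origin): y = b * x
b_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
resid_without = [yi - b_origin * xi for xi, yi in zip(x, y)]

print(f"with constant:    slope = {b1:.2f}")
print(f"without constant: slope = {b_origin:.2f}")
print(f"residual mean without constant: {mean(resid_without):.2f}")
```

For these numbers, the slope jumps from 1.7 to 4.4 when the constant is dropped, and the residuals of the no-constant model average 1.8 instead of zero. The line has to tilt steeply to reach the origin, so it underpredicts on average across the observed data.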

When it comes to the constant, you should almost always include it in your regression model even though it is almost never worth interpreting. The key benefit of regression analysis is determining how changes in the independent variables are associated with shifts in the dependent variable. Don’t think about the y-intercept too much!

**Note: I wrote a different version of this post that appeared elsewhere. I’ve completely rewritten and updated it for my blog site.**

Harsh Chadha says

Hi Jim,

I wanted to ask a question which might not be directly related to the intercept in a regression model. What if the variation in my dependent variable (Y variable) is very low? In such cases, do we expect the intercept to capture most of the movement in Y?

Jim Frost says

Hi Harsh,

That might or might not be the case, which is true for any model. Let’s start with a more general case for all linear models and then work to the situation that you describe.

The F-test of overall significance determines whether your model predicts the DV better than a model that contains only the intercept. For any model where this test is not statistically significant, the intercept-only model, which simply predicts the mean of the DV, predicts the Y variable as well as your model. In other words, your model does not account for the variability of the DV better than the mean (which is the y-intercept in an intercept-only model). So, what you describe is a possibility for any model and you can test for it with the overall test of significance.

Now, if the variability in the Y variable is very restricted, it’s going to be harder to use the variability in the IVs to explain the variability in the DV because there isn’t much of it. It’s harder for variables to co-vary when one does not vary much. However, it’s still possible to have a significant model in the situation you describe. To see if this condition applies to your model, simply check that F-test of overall significance!
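As a rough sketch of what the overall F-test compares, the Python below computes the F statistic by hand for a one-predictor model. The data are hypothetical, and the variable names are assumptions for illustration. Converting the statistic to a p-value needs the F distribution with `k` and `n - k - 1` degrees of freedom, which statistical software reports for you.

```python
from statistics import mean

# Hypothetical data, made up for this illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 13.0, 15.0, 16.0, 19.0]
n, k = len(y), 1  # k = number of independent variables

# Intercept-only model: its single coefficient is the mean of y,
# so it predicts the same value for every observation.
y_bar = mean(y)
sse_intercept_only = sum((yi - y_bar) ** 2 for yi in y)

# Full model: y = b0 + b1 * x, fit by least squares.
x_bar = mean(x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
sse_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# F compares the error the full model removes (per predictor)
# with the error that remains (per residual degree of freedom).
f_statistic = ((sse_intercept_only - sse_full) / k) / (sse_full / (n - k - 1))

print(f"SSE, intercept-only model: {sse_intercept_only:.2f}")
print(f"SSE, full model: {sse_full:.2f}")
print(f"F statistic: {f_statistic:.2f}")
```

A large F statistic means the predictors remove much more error than the mean alone does. When the DV barely varies, the intercept-only error is already small, so there is less room for the full model to improve on it.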

Gandalf says

Hello Jim,

Very clear explanations, but the graphs in this article are not visible.

Jim Frost says

That’s very strange. It’s probably a temporary web connection issue. I’ve had that post up for several years now and haven’t heard of others having that problem. Please try again, and hit refresh if necessary. Let me know how that goes.

Achala bhtaatraidhakal says

Dear Sir, good evening.

In my study I obtained this table. How should I describe my findings?

Coefficients (a)

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 136.002 | 4.361 | | 31.189 | .000 |
| | sex | -1.545 | 2.251 | -.043 | -.687 | .493 |
| | smoking history | -5.728 | 2.533 | -.143 | -2.261 | .024 |

a. Dependent Variable: systolic BP2nd

Nikhil says

In regression analysis, if the p-value for the intercept is higher than 0.05, should the intercept still be considered? Or is the p-value of the intercept immaterial?

Jim Frost says

Hi Nikhil,

Yes, I’d still include the intercept in the model. If the constant is not statistically significant but its true value is still not equal to zero, then forcing the regression line through the origin will bias the coefficient estimates. Typically, you don’t need to interpret the constant or its p-value.

Stefan Brovis says

Hi Jim,

Let me reformulate, the exact question (past paper) is stated as follows:

In the standard regression model, the assumption is made that the effect of the regressors is linear (through Xbeta), and that the disturbances affect the mean in an additive fashion. This at first sight sounds like a severe limitation of generality. Why is the limitation not as large as it seems at first sight?

I am guessing that it has to do with a limitation of the OLS assumption.

I am not sure myself whether I really understood the question. I thought perhaps you could make sense of it.

Jim Frost says

Hi Stefan,

The author’s question is pretty vague because he/she doesn’t explain why it appears like a severe limitation. But, I’ll take a stab at it.

The author describes a linear model. Linear models are a very restricted form of all possible regression models, which I describe in my post about the differences between linear and nonlinear models. And, there are various assumptions about the residuals that must be met to produce valid results, which I describe in my post about OLS assumptions.

So, when you’re using linear models, you’re in a situation where the form of your model is very restricted and there are requirements for the properties of the residuals. It might seem that this combination of restrictions and requirements is problematic in terms of the general usability of linear regression to model relationships.

However, it’s not as severe as that may sound because there are various techniques for getting around both sets of limitations. You can use polynomials and other forms of variables to model curvature using linear regression. And various data transformations can resolve problems with the residuals.
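As a sketch of the polynomial point, consider a quadratic model. The symbols here are assumptions for illustration: $\beta_0$ is the constant, $\beta_1$ and $\beta_2$ are coefficients, and $\varepsilon$ is the additive error term.

$$
Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon
$$

The fitted line curves in $X$, but the model is still linear in the coefficients: each $\beta$ multiplies a term and the terms are added together. Treating $X^2$ as just another predictor column lets ordinary linear regression estimate the curve.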

That might be what the author is getting at but it’s hard to know for sure with such a general statement.

Stefan Brovis says

Hi Jim,

As I understand it, we assume linearity of the coefficients in the standard regression model, as well as that the errors affect the mean in an additive way. These are in fact a limitation of generality, but why does it not seem to be such a big issue at first sight?

Jim Frost says

Hi Stefan,

I don’t understand your question. Why do you think additive errors seem like a larger problem than they are?

Daniel says

How do we interpret a negative intercept/constant? Do we interpret it as it is, e.g., the average weight of a child is -20 kg when height is zero? Please clarify.

Jim Frost says

Hi Daniel,

If you look in this post directly below the first fitted line plot, you’ll see that I discuss how there is no interpretation for this particular constant. The rest of the post details the various reasons why you usually can’t interpret the constant. For the height and weight example specifically, I’m sure the nature of the relationship between height and weight changes over the range of the independent variable (Height). Go down to the 2nd fitted line plot in this post, and right under that I discuss how that must be happening. But, we’re looking at a restricted range of heights and weights, and the model works for this range, but not outside the range–hence we can’t interpret the constant because it is outside the range of the data.

Hope that helps!

jean manirere says

Hello Jim, what if Y doesn’t change and X changes? Example: Y=2,2,2,2,2 and X=1,2,3,4,5. What is the correlation?

Jim Frost says

Hi Jean,

In order to be correlated, the two variables need to co-vary around their respective means. In other words, as one variable changes relative to its mean, the other variable tends to change in either the same or opposite direction relative to its mean. Because Y does not vary around its mean in your example, it’s impossible for them to have a non-zero correlation. Strictly speaking, the covariance is zero and the correlation coefficient is undefined, because its formula divides by the standard deviation of Y, which is zero. Software will typically report an error or a missing value. Be sure to read my post about correlation!
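The arithmetic for this example can be sketched in Python. The `pearson_r` helper is a hypothetical function written for this illustration. It returns `None` when the correlation is undefined, and the `y_varying` values are a made-up contrast case.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation; returns None when it is undefined."""
    x_bar, y_bar = mean(x), mean(y)
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    sum_yy = sum((yi - y_bar) ** 2 for yi in y)
    if sum_xx == 0 or sum_yy == 0:
        # A variable that never changes makes the denominator zero.
        return None
    return sum_xy / sqrt(sum_xx * sum_yy)


x = [1, 2, 3, 4, 5]
y_constant = [2, 2, 2, 2, 2]   # the example from the question
y_varying = [2, 4, 6, 8, 10]   # hypothetical contrast case

print(pearson_r(x, y_constant))  # undefined: Y has no variability
print(pearson_r(x, y_varying))   # Y moves in step with X
```

With the constant Y, both the cross-product sum and the Y sum of squares are zero, so the ratio is 0/0 and the helper returns `None`. With the varying Y, the same function returns a perfect correlation of 1.0.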

Fabian Moodley says

How will you interpret a constant in a mean equation that’s highly significant?

Kapil Agrawal says

Very clear and crisp explanation

Jim Frost says

Hi Kapil, I’m so happy to hear that it was easy to understand! That’s always my goal when writing blog posts.