How to Interpret the Constant (Y Intercept) in Regression Analysis

By Jim Frost

The constant term in regression analysis is the value at which the regression line crosses the y-axis. The constant is also known as the y-intercept. That sounds simple enough, right? Mathematically, the regression constant really is that simple. However, the difficulties begin when you try to interpret the meaning of the y-intercept in your regression output.

Why is it difficult to interpret the constant term? Because the y-intercept is almost always meaningless! Surprisingly, while the constant doesn’t usually have a meaning, it is almost always vital to include it in your regression models!

In this post, I will teach you all about the constant in regression analysis.

Linear regression uses the Slope Intercept Form of a Linear Equation. Click the link for a refresher!

The Definition of the Constant is Correct but Misleading

The constant is often defined as the mean of the dependent variable when you set all of the independent variables in your model to zero. In a purely mathematical sense, this definition is correct. Unfortunately, it’s frequently impossible to set all variables to zero because this combination can be an impossible or irrational arrangement.

I use the example below in my post about how to interpret regression p-values and coefficients. The graph displays a regression model that assesses the relationship between height and weight. For this post, I modified the y-axis scale to illustrate the y-intercept, but the overall results haven’t changed.

If you extend the regression line downwards until you reach the point where it crosses the y-axis, you’ll find that the y-intercept value is negative!

In fact, the regression equation shows us that the negative intercept is -114.3. Using the traditional definition for the regression constant, if height is zero, the expected mean weight is -114.3 kilograms! Huh? Neither a zero height nor a negative weight makes any sense at all!

The negative y-intercept for this regression model has no real meaning, and you should not try attributing one to it.
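To see how a negative intercept can fall out of a perfectly reasonable fit, here is a minimal sketch. The height and weight values are made up for illustration (they are not the data behind the graph), and `fit_line` is a hypothetical helper. The code fits a simple least squares line and then evaluates it at a height of zero.

```python
# Hypothetical heights (meters) and weights (kilograms); NOT the article's data.
heights = [1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60, 1.65]
weights = [28.0, 31.5, 36.0, 40.5, 44.0, 50.5, 55.0, 61.5]


def fit_line(x, y):
    """Return (slope, intercept) of the simple least squares line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = fit_line(heights, weights)

# The constant is simply the fitted value when height equals zero.
weight_at_zero_height = intercept + slope * 0.0

print(f"slope: {slope:.2f} kg per meter")
print(f"intercept (fitted weight at zero height): {weight_at_zero_height:.2f} kg")
print(f"smallest observed height: {min(heights):.2f} m")
```

With these made-up numbers, the fitted weight at zero height comes out negative even though the line tracks the observed heights closely. The intercept is a mathematical anchor for the line, not a prediction anyone should act on.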

You think that is a head scratcher? Try imagining a regression analysis with multiple independent variables. The more variables you have, the less likely it is that each and every one of them can equal zero simultaneously.

If the independent variables can’t all equal zero, or you get an impossible negative y-intercept, don’t interpret the value of the y-intercept!

The Y-Intercept Might Be Outside of the Observed Data

I’ll stipulate that, in a few cases, it is possible for all independent variables to equal zero simultaneously. However, to have any chance of interpreting the constant, this all zero data point must be within the observation space of your dataset.

As a general statistical guideline, never make a prediction for a point that is outside the range of observed values that you used to fit the regression model. The relationship between the variables can change as you move outside the observed region—but you don’t know it changes because you don’t have that data!

This guideline comes into play here because the constant predicts the dependent variable for a particular point. If your data don’t include the all-zero data point, don’t believe the y-intercept.

I’ll use the height and weight regression example again to show you how this works. This model estimates its parameters using data from middle school girls whose heights and weights fall within a certain range. We should not trust this estimated relationship for values that fall outside the observed range. Fortunately, for this example, we can deduce that the relationship does change by using common sense.

I’ve indicated the mean height and weight for a newborn baby on the graph with a red circle. This height isn’t exactly zero, but it is as close as possible. By looking at the chart, it is evident that the actual relationship must change over the extended range!

The observed relationship is locally linear, but it must curve as it decreases below the observed values. Don’t predict outside the range of your data! This principle is an additional reason why the y-intercept might not be interpretable.
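As a rough illustration of why extrapolating down to the intercept is risky, the sketch below assumes a hypothetical curved relationship (y = x squared, chosen only for convenience) and fits a straight line using x values from 10 to 20 only. The function names and data are assumptions, not part of the height and weight example.

```python
def true_relationship(x):
    """Hypothetical curved relationship, assumed for illustration only."""
    return x ** 2


def fit_line(x, y):
    """Return (slope, intercept) of the simple least squares line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# The observation space covers x = 10 through 20 only.
observed_x = list(range(10, 21))
observed_y = [true_relationship(x) for x in observed_x]

slope, intercept = fit_line(observed_x, observed_y)

# Largest miss inside the observed range versus the miss at x = 0.
max_error_in_range = max(
    abs(true_relationship(x) - (intercept + slope * x)) for x in observed_x
)
error_at_zero = abs(true_relationship(0) - intercept)

print(f"largest error inside the observed range: {max_error_in_range:.1f}")
print(f"error of the intercept at x = 0: {error_at_zero:.1f}")
```

The straight line is a decent local approximation, but the error at x = 0 is many times larger than any error inside the observed range, because the line knows nothing about how the curve behaves out there.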

The Constant Absorbs the Bias for the Regression Model

Now, let’s assume that all of the predictors in your model can reasonably equal zero and you specifically collect data in that area. You should be good to interpret the constant, right? Unfortunately, the y-intercept might still be garbage!

Part of the y-intercept’s estimated value reflects relevant variables that are excluded from the regression model. When you leave relevant variables out, this can produce bias in the model. Bias exists if the residuals have an overall positive or negative mean. In other words, the model tends to make predictions that are systematically too high or too low. The constant term prevents this overall bias by forcing the residual mean to equal zero.

Imagine that you can move the regression line up or down to the point where the residual mean equals zero. For example, if the regression produces residuals with a positive average, just move the line up until the mean equals zero. This process is how the constant ensures that the regression model satisfies the critical assumption that the residual average equals zero. However, this process does not focus on producing a y-intercept that is meaningful for your study area. Instead, it focuses entirely on providing that mean of zero.

The constant ensures the residuals don’t have an overall bias, but that might make it meaningless.
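The idea of moving the line up or down until the residual mean equals zero can be sketched in a few lines. The data and the starting intercept of zero are assumptions for illustration. The code keeps the least squares slope fixed, measures the mean residual, and shifts the line by that amount.

```python
# Made-up predictor and response values, for illustration only.
x = [1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60, 1.65]
y = [28.0, 31.5, 36.0, 40.5, 44.0, 50.5, 55.0, 61.5]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least squares slope.
slope = (
    sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    / sum((xi - x_mean) ** 2 for xi in x)
)


def mean_residual(intercept):
    """Average of observed minus fitted values for a line with this intercept."""
    return sum(yi - (intercept + slope * xi) for xi, yi in zip(x, y)) / n


start_intercept = 0.0                    # arbitrary starting position of the line
shift = mean_residual(start_intercept)   # positive: line too low, negative: too high
shifted_intercept = start_intercept + shift

# The usual least squares formula for the constant gives the same answer.
ols_intercept = y_mean - slope * x_mean

print(f"mean residual before the shift: {shift:.3f}")
print(f"mean residual after the shift:  {mean_residual(shifted_intercept):.3f}")
print(f"shifted intercept: {shifted_intercept:.3f}, OLS intercept: {ols_intercept:.3f}")
```

The shifted intercept matches the least squares constant. Nothing in this calculation asks whether the resulting value is meaningful; it only asks for a residual mean of zero.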

Generally It Is Essential to Include the Constant in a Regression Model

The reason I just discussed explains why you should almost always have the constant in your regression model—it forces the residuals to have that crucial zero mean.

Furthermore, if you don’t include the constant in your regression model, you are actually setting the constant to equal zero. This action forces the regression line to go through the origin. In other words, a model that doesn’t include the constant requires the fitted value of the dependent variable to equal zero when all of the independent variables equal zero.

If this isn’t correct for your study area, your regression model will exhibit bias without the constant. To illustrate this, I’ll use the height and weight example again, but this time I won’t include the constant. Below, there is only a height coefficient but no constant.

Now, I’ll draw a green line based on this equation on the previous graph. This comparison allows us to assess the regression model when we include and exclude the constant.

Clearly, the green line does not fit the data at all. Its slope is nowhere close to being correct, and its fitted values are biased.
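A small sketch can mimic this comparison. The height and weight values below are made up (they are not the data behind the graph). The no-constant slope uses the standard through-the-origin formula, sum(x*y) / sum(x*x).

```python
# Hypothetical heights (meters) and weights (kilograms); NOT the article's data.
heights = [1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60, 1.65]
weights = [28.0, 31.5, 36.0, 40.5, 44.0, 50.5, 55.0, 61.5]

n = len(heights)
h_mean = sum(heights) / n
w_mean = sum(weights) / n

# Model WITH the constant: ordinary least squares.
slope_with = (
    sum((h - h_mean) * (w - w_mean) for h, w in zip(heights, weights))
    / sum((h - h_mean) ** 2 for h in heights)
)
intercept = w_mean - slope_with * h_mean

# Model WITHOUT the constant: the line is forced through the origin.
slope_origin = (
    sum(h * w for h, w in zip(heights, weights))
    / sum(h * h for h in heights)
)

resid_with = [w - (intercept + slope_with * h) for h, w in zip(heights, weights)]
resid_origin = [w - slope_origin * h for h, w in zip(heights, weights)]

mean_resid_with = sum(resid_with) / n
mean_resid_origin = sum(resid_origin) / n

print(f"slope with constant:    {slope_with:.2f}")
print(f"slope without constant: {slope_origin:.2f}")
print(f"mean residual with constant:    {mean_resid_with:.3f}")
print(f"mean residual without constant: {mean_resid_origin:.3f}")
print(f"no-constant residuals, shortest and tallest: "
      f"{resid_origin[0]:.2f}, {resid_origin[-1]:.2f}")
```

With these numbers, dropping the constant flattens the slope dramatically, leaves a nonzero residual mean, and produces fitted values that are too high for the shortest subjects and too low for the tallest.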

You should almost always include the constant in your regression model, even though it is almost never worth interpreting. The key benefit of regression analysis is determining how changes in the independent variables are associated with shifts in the dependent variable. Don’t think about the y-intercept too much!

To learn how least squares regression calculates the coefficients and y-intercept with a worked example, read my post Least Squares Regression: Definition, Formulas & Example.

If you’re learning regression and like the approach I use in my blog, check out my Intuitive Guide to Regression Analysis book! You can find it on Amazon and other retailers.

Note: I wrote a different version of this post that appeared elsewhere. I’ve completely rewritten and updated it for my blog site.

Comments

Gabriel Intriago says

October 24, 2023 at 4:31 pm

Hi Jim,

I have your book! In which chapter/section of your book do you touch about the relation between categorical independent variables and intercept?

Sithembile Shezi says

September 26, 2023 at 3:56 pm

Hi Jim
can you please interpret and explain what it means if I get:
Multiple R = 0,13
R square = 0,02
F significance = 0,0009
Pvalue = 0, X variable1 = 0,0009
Coefficient intercept = 81,27
X variable 1 = 0,44

Thank you

[email protected] says

June 21, 2023 at 1:49 pm

It seems intuitive to me that in a regression analysis any independent variable that has a significance less than the significance of the constant should be dropped. Is this correct?

- Jim Frost says
  
  June 21, 2023 at 8:18 pm
  
  Hi,
  
  Your wording was a bit vague. But I’ll assume the scenario you’re referring to is that your model has an IV with a p-value greater than your significance level, making it statistically insignificant. In this case, it might seem like you should remove it, but that’s not always the case.
  
  To answer that question, you need to understand whether there’s a strong theoretical reason to include the IV. If theory suggests that it should be in the model, you might leave it in the model even when it is not significant. Leaving an insignificant variable in the model typically doesn’t harm the model. However, removing a variable that should be in the model even if it is not significant can bias the other IVs. That’s called omitted variable bias. In those cases, it can be better to leave the insignificant variable in the model. Of course, you’d explain all that in your write up.
  
  In short, leaving an extra variable in the model is less problematic than removing a variable that should be included, even if not significant.
  
  Top things to consider are theoretical reasons for retaining the variable even if it is insignificant and whether removing the insignificant IV notably affects the coefficient estimates of the other variables.
  
  So, I always say “consider” removing insignificant IVs from the model. Saying you “should” is too strong.
  
  As for the constant specifically, you almost never want to remove the constant from the model regardless of the p-value or theory. The bias potential for removing the constant is extraordinarily high! Don’t do that!
  
  Finally, you mention comparing the significance of IVs to the constant. Don’t do that either! The significance of the constant has nothing to do with keeping or removing any of the IVs!
  
Aaqib says

December 12, 2021 at 11:30 am

I have a question Sir, how do regressors have zero sample correlation with the residuals in OLS

- Jim Frost says
  
  December 12, 2021 at 11:32 pm
  
  Hi, I write about the assumptions of linear regression, including the one you mention in this post: Classical Assumptions of Linear Regression. For your question, focus on assumption #3 in that post.
  
Aaqib says

December 12, 2021 at 11:27 am

Thanks a lot Sir,the points were very lucidly explained even though I’m new to statistics. Helped me a lot. Thanks again.

Mitzi Frances Litan says

November 18, 2021 at 10:28 pm

Hi Jim!
What if the x1-x3 values are less than 0.05 however the value of y intercept is higher than 0.05? What does that mean? Thank you!

- Jim Frost says
  
  November 21, 2021 at 8:18 pm
  
  Hi Mitzi,
  
  That just means that you can’t say your constant is different from zero, which is not a big deal because you usually can’t interpret the constant anyway. Your IVs are significant, which is a good thing! And the non-significant constant doesn’t affect that.
  
Chirantha says

October 13, 2021 at 1:51 pm

I’m having the same issue as agus. Thanks Mr Jim frost for your answer. Do you have any documents to refer about this issue.

- Jim Frost says
  
  October 13, 2021 at 4:49 pm
  
  Hi Chirantha,
  
  I suppose you mean besides my own article and book? That’s a basic interpretation issue. I believe any regression textbook will discuss this interpretation. My go to textbook for this area is Applied Linear Statistical Models by Neter et al.
  
agus salem says

September 18, 2021 at 12:32 am

Hi mr Jim , I wanna ask question, I have run my multiple linear regression analysis via spss , and the best model is :

constant : 5,7 sig : 0,557
Hypertension: 5,87 sig: 0,005
Head volume: 4,89 sig: 0,04
with dependent variable: MMSE score

from the result above the Constant / y intercept is not statistically significant (p>0,005) but the independent variable in the model both of them are statistically significant

How can I interpret this result and how can I draw the conclusion ?

I’m still confusing about this

Thank you and I appreciate with your explanation above

- Jim Frost says
  
  September 19, 2021 at 12:37 am
  
  Hi Agus,
  
  There’s no conflict here. The insignificant p-value for the constant just means that the constant for your model is not significantly different from zero. But that doesn’t really mean much in most cases. And, it doesn’t affect how you interpret the coefficients.
  
  As I mention in this article, you don’t need to worry about the value of the constant in most cases.
  
Saumya Gupta says

May 25, 2021 at 3:32 am

This is a great post.

Jim, I had another question in mind. What do we usually do in case of negative parameter estimates for predictors where they should show positive contribution, for example media spends for sales. This is when VIF for each of these predictors in the model show no multicollinearity phenomenon and variables are standardized (mean removed, scaled to unit variance).
I don’t wanna remove them, how to interpret such impacts.

I’m new to this, sorry for being silly, if I was !

- Jim Frost says
  
  May 26, 2021 at 4:42 pm
  
  Hi Saumya,
  
  It’s definitely not a silly question. There are several reasons why this might be occurring and you really need to determine why it’s occurring. Unfortunately, I can’t tell you what is causing your unexpected signs, but I’ll point you toward two main culprits that you can investigate.
  
  Omitted variable bias/confounders: Leaving important variables out of your model can bias the variables that you include.
  Overfitting: If you include too many terms in your model given the number of observations, you can get strange results.
  
  I would have also mentioned multicollinearity, because that can cause signs to flip, but you’ve indicated that’s not a problem.
  
  Also, I’d highly recommend picking up my regression analysis book, which you can find in my web store. In it, I cover many aspects of regression analysis in detail!
  
Lucia says

May 14, 2021 at 6:41 pm

Does this also apply if the intercept has a p-value that is significant? I’m doing a linear regression and getting a p-value of <2e-16 for both the intercept and the independent variable. Should I subtract the p-value of the independent variable from the p-value for the intercept? The correlation doesn't look very significant when I make a scatterplot for the two variables

- Jim Frost says
  
  May 14, 2021 at 11:50 pm
  
  Hi Lucia,
  
  Yes, even when the intercept has a significant p-value, there’s usually no real interpretation for it.
  
  No, you don’t want to subtract p-values like that! Each p-value applies to a specific term in the model and they’re independent of each other. It sounds like your intercept is significant (which doesn’t really mean anything) and the independent variable is significant (which is important).
  
  As for why the correlation doesn’t look significant, there are several possibilities. One, it could be the scaling you’re using for the graph that makes it look flat. Second, if you have a large enough sample size, even very low correlations can be significant. Significance just means that your sample evidence is strong enough to conclude that a relationship is likely to exist in the population. Significance doesn’t necessarily indicate that it is a practically important effect. So, it’s definitely possible that you have a significant relationship but it’s not very meaningful in the real world! You’ll need to use subject area knowledge and look at the coefficient for the IV and see if it suggests a meaningful effect or not.
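The sample size point can be checked with a little arithmetic. This sketch uses the standard t statistic for testing a Pearson correlation against zero, t = r * sqrt(n - 2) / sqrt(1 - r^2). The correlation of 0.05 and both sample sizes are made-up values, and `correlation_t_statistic` is a hypothetical helper name.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


weak_r = 0.05  # a very weak correlation, assumed for illustration

t_small_sample = correlation_t_statistic(weak_r, n=100)
t_large_sample = correlation_t_statistic(weak_r, n=10_000)

# 1.96 is the large-sample two-sided cutoff at the 0.05 level.
# Small samples need a slightly larger cutoff, which does not change the picture here.
APPROX_CRITICAL_VALUE = 1.96

print(f"n = 100:    t = {t_small_sample:.2f}")
print(f"n = 10,000: t = {t_large_sample:.2f}")
print(f"approximate cutoff: {APPROX_CRITICAL_VALUE}")
```

The same weak correlation falls well short of the cutoff with 100 observations and clears it comfortably with 10,000, even though the strength of the relationship has not changed.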
  
  I hope that helps!
  
Erick Carvalho Campos says

May 12, 2021 at 9:01 am

That explanation was simply beautiful! Thank you Jim!

- Jim Frost says
  
  May 15, 2021 at 11:16 pm
  
  You’re very welcome, Erick!
  
Michael says

May 11, 2021 at 6:18 pm

I’m doing a logistic regression analysis. My Y is a binary dependent variable. My independent variables are significant. However, the intercept is NOT significant. Is this okay?

- Jim Frost says
  
  May 12, 2021 at 12:27 am
  
  Hi Michael, that’s entirely fine. The constant usually doesn’t have any meaning for reasons I state in this article. That’s even more the case for logistic regression! So, whether it is significant or not is usually completely unimportant.
  
Sumayya Kamal says

April 28, 2021 at 8:12 am

Hi there, it was the best article I found across the net! Huge thanks!
But I didn’t understand what shall we do with an intercept that goes outside the observed data especially in Multiple Linear Regression? Shall it be ignored or still can be put inside the equation y=b0+m1x1+m2x2+..?

- Jim Frost says
  
  April 28, 2021 at 10:22 pm
  
  Hi Sumayya,
  
  Basically, yes, you’d just ignore it! It serves a function in producing residuals that have a mean of zero, but there’s no interpretation for the constant itself! You’d still include the constant in the equation. And, if you’re using the equation to calculate a predicted value, you’d need to use the constant for that purpose. But, the constant itself really has no interpretation in the situation you describe.
  
Gabriel Setiawan says

March 22, 2021 at 3:00 am

Hi Jim, first of all, thank you for your explanation, it was very understandable. I have one problem that maybe you can help me sort it out.
In my SPSS regression result, my (X) variable is statistically significant, while the constant is not. From your explanation, I know that this is not a problem, since it is just constant. However, the problem is that my supervisor requires me to explain why it’s not significant. Can you please help me solve this?
Note: both (X) and (Y) variables are using likert scale, so they have no real 0 value. From what I’ve read in several websites, I suppose this could be the reason why my constant is not significant (the value of constant won’t be significant since my variables can’t reach 0 value).

- Jim Frost says
  
  March 23, 2021 at 3:33 pm
  
  Hi Gabriel,
  
  The strict technical reason is that whatever the value of your constant is, the difference between it and zero is not statistically significant. Your data contain insufficient evidence to suggest that the constant for the population does not equal zero. In other words, you can’t rule out that the mean of the DV equals zero when all your IVs equal zero.
  
  However, that’s no big deal either way in most cases. As I show, there are many reasons to not trust the value of the constant. And, even if you did trust it, it often does not matter whether it equals zero or not when the IVs are all zero. In other words, there are very few cases where you both trust the value of the constant and care if it’s different from zero!
  
  Because you don’t have an all-zero data point (for the Y and all the Xs), it is outside your observation space. You have no reason at all to trust the constant. Its only purpose in your model is to prevent bias, as I show in the post.
  
  Finally, if you’re using Likert scale data for the DV, you should be using ordinal logistic regression to model it. And, for the IVs, you’ll need to include them either as continuous or categorical data. Ordinal IVs (such as Likert items) do present some difficulties in analyzing.
  
  I hope that helps!
  
Joseph Lombardi says

January 28, 2021 at 2:41 pm

Jim, thanks for setting me straight. This is great stuff.

Joseph Lombardi says

January 27, 2021 at 10:43 am

“If you center all your continuous variables, then the constant equals the mean DV value when all IVs are at their means. That combination will be in the range of your dataset. However, you still have to worry about the bias.”

I just finished (re)reading your post on centering the data: https://statisticsbyjim.com/regression/standardize-variables-regression/

Yeah, I think I’m going to rerun all my models after converting all the (continuous) IVs to their Z-scores. (And it might even reduce some uncomfortably high VIFs I’m seeing.)

- Jim Frost says
  
  January 27, 2021 at 9:56 pm
  
  Hi Joe,
  
  I’d recommend centering (just subtracting the means) rather than standardizing (subtracting mean and dividing by the standard deviation, i.e., z-scores). When you center your continuous variables, the interpretation of the coefficients remains the same and you get the same benefits when it comes to reducing multicollinearity. Note that centering/standardizing only helps reduce structural multicollinearity, which occurs when you have polynomials or interaction terms. However, anytime you include those types of terms, I do recommend centering your variables.
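A quick way to see the structural multicollinearity point is to compare the correlation between a predictor and its square before and after centering. The x values here are made up, and `correlation` is a hypothetical helper that computes the ordinary Pearson correlation.

```python
import math


def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    cov = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b))
    var_a = sum((ai - a_mean) ** 2 for ai in a)
    var_b = sum((bi - b_mean) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up predictor values, evenly spaced from 10 to 20.
x = list(range(10, 21))
x_mean = sum(x) / len(x)
x_centered = [xi - x_mean for xi in x]  # centering: subtract the mean only

# Correlation between the linear term and the squared term.
raw_corr = correlation(x, [xi ** 2 for xi in x])
centered_corr = correlation(x_centered, [xc ** 2 for xc in x_centered])

print(f"corr(x, x^2) before centering: {raw_corr:.4f}")
print(f"corr(x, x^2) after centering:  {centered_corr:.4f}")
```

Because these x values are symmetric around their mean, centering removes the correlation entirely. With real, less symmetric data, expect a large reduction rather than exactly zero.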
  
Joseph Lombardi says

January 26, 2021 at 10:58 am

Thanks, Jim, for the reply.

I always have to keep reminding myself that the Y-intercept is there to absorb any bias, even if it makes sense for the IV and the DV to be zero simultaneously. But I was still curious, from a purely theoretical POV, about the Confidence Interval of the constant. Regardless whether zero falls within the CI, it should be included. And regarding bias, I find using the AVERAGE and SLOPE functions on the Predicted-DV and Error columns will point out trouble right away (although heteroscedasticity isn’t apparent until I run a plot).

- Jim Frost says
  
  January 26, 2021 at 9:22 pm
  
  Hi Joe,
  
  The CI will have a similar bias as the estimate of the constant. For example, if the constant is biased high, then the CI will also be biased high.
  
  In theory, if there was no bias in the constant and the origin is included in your data, then the constant, its p-value, and the CI all work together to tell a tale, just like they do for coefficient estimates. The p-value tells you whether the estimate of the constant is significantly different from zero. If you have a significant p-value at the 0.05 significance level, then the 95% CI will also exclude zero. Altogether, this condition indicates that your sample provides sufficient evidence to conclude that the population value of the constant for your model does not equal zero. The CI will indicate the range it is likely to fall within.
  
  But, that’s all in theory. In practice, it’s usually difficult to pull meaning from the constant. If you center all your continuous variables, then the constant equals the mean DV value when all IVs are at their means. That combination will be in the range of your dataset. However, you still have to worry about the bias.
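The centering claim is easy to verify numerically. In this sketch the data are made up and `fit_line` is a hypothetical helper. It fits the same simple regression twice, once on the raw predictor and once on the centered predictor.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the simple least squares line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Made-up predictor and response values, for illustration only.
x = [12.0, 15.0, 19.0, 23.0, 26.0, 31.0]
y = [40.0, 46.0, 51.0, 60.0, 63.0, 75.0]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)
x_centered = [xi - x_mean for xi in x]

slope_raw, intercept_raw = fit_line(x, y)
slope_centered, intercept_centered = fit_line(x_centered, y)

print(f"raw:      slope = {slope_raw:.4f}, constant = {intercept_raw:.4f}")
print(f"centered: slope = {slope_centered:.4f}, constant = {intercept_centered:.4f}")
print(f"mean of the DV: {y_mean:.4f}")
```

The slope is unchanged, and the constant from the centered fit equals the mean of the DV, a point that sits inside the observed data rather than out at zero.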
  
Joseph Lombardi says

January 11, 2021 at 4:16 pm

Hey, Jim. Happy New Year!

I just reread pp. 64-73 of your book “Regression Analysis: An Intuitive Guide” on the Y-Intercept, and I have a(nother) question: Can you tell us about the p-value and Standard Error associated with that Constant?

Can we draw similar conclusions about it that we might make with the p-values and SE of the coefficients of the IVs? If the value of the constant plus or minus 2SE includes the zero value, can we say the Constant is not statistically significant and, conversely, if it doesn’t include zero, that it is?

Cheers,
Joe

- Jim Frost says
  
  January 12, 2021 at 5:08 pm
  
  Hi Joe,
  
  Technically, what you write is correct. If the constant is statistically significant, you can reject the null hypothesis that the constant equals zero. Similarly, when the constant is statistically significant, its confidence interval will exclude zero.
  
  However, all the reasons I state for why it’s hard to interpret the value of the constant itself also make it difficult to interpret those other statistics related to the constant. The constant might be significant and you’ll conclude that it doesn’t equal zero. But, does the origin fall within your range of data? Are you sure the constant isn’t absorbing bias? It’s just much more difficult to interpret the constant and draw conclusions about it.
  
  That’s why I don’t spend time on those statistics in the context of the constant. But, in the rare cases where it is valid to interpret them, you use the same principles as you do for the coefficient estimates.
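For those rare cases, here is a minimal sketch of the plus or minus 2 SE check for the constant in a simple regression. The data are made up. The standard error uses the textbook formula for the intercept, s * sqrt(1/n + x_mean^2 / Sxx). The multiplier of 2 is only a rule of thumb; an exact interval would use the t critical value with n - 2 degrees of freedom.

```python
import math

# Made-up predictor and response values, for illustration only.
x = [1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60, 1.65]
y = [28.0, 31.5, 36.0, 40.5, 44.0, 50.5, 55.0, 61.5]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residual standard error with n - 2 degrees of freedom.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the constant and its t value.
se_intercept = s * math.sqrt(1 / n + x_mean ** 2 / sxx)
t_value = intercept / se_intercept

# Rule-of-thumb interval: constant plus or minus 2 standard errors.
lower = intercept - 2 * se_intercept
upper = intercept + 2 * se_intercept
interval_excludes_zero = lower > 0 or upper < 0

print(f"constant = {intercept:.2f}, SE = {se_intercept:.2f}, t = {t_value:.2f}")
print(f"rule-of-thumb interval: ({lower:.2f}, {upper:.2f})")
print(f"interval excludes zero: {interval_excludes_zero}")
```

The interval excludes zero exactly when the absolute t value exceeds 2, which is the same logic you would apply to a coefficient. Whether the result means anything still depends on the caveats above.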
  
Joseph Lombardi says

December 3, 2020 at 12:31 pm

Which makes this sentence from your blog post above even more meaningful: The constant term prevents this overall bias by forcing the residual mean to equal zero.

Thanks a ton!

Joseph Lombardi says

December 2, 2020 at 12:33 pm

Hey, Jim. I just reread the section of your book “Regression Analysis: An Intuitive Guide…” on the regression constant / Y-Intercept. I must say, you have a few funny lines in there.

I am glad I found this blog post, b/c I have a question about a simple model with one continuous IV. When I force the regression line through the origin, the R-square improves dramatically, but the Standard Error gets worse slightly. Does that make ANY sense at all to you? It is worth mentioning that I understand my subject data pretty well, and it makes sense that the IV and DV would be zero simultaneously, which is why I even bothered to run the regression twice.

- Jim Frost says
  
  December 3, 2020 at 1:49 am
  
  Hi Joseph,
  
  I’m glad my humor is OK! I try to lighten things up a bit with humor. Although, according to my daughter, my sense of humor is terrible. So, I don’t include too much! Thanks so much for supporting my book. I really appreciate it!
  
  Yes, what you describe does make sense. When you don’t include the constant in the model (force it through the origin), R-squared changes. With the constant, R-squared is the amount of variability around the dependent variable’s mean for which the model accounts. Without the constant, it’s the variability around zero. Consequently, you can’t even compare R-squared values between models with and without the constant because they’re measuring different things. In other words, the higher R-squared you’re seeing without the constant doesn’t mean it’s a better model.
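The following sketch reproduces that pattern with made-up data. It assumes the no-constant R-squared is computed around zero (1 - SSE / sum(y^2)), as described above, while the usual R-squared is computed around the mean of the DV.

```python
import math

# Made-up data: a weak relationship with a DV that sits far from zero.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 9.0, 14.0, 10.0, 15.0, 12.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Model WITH the constant.
slope = (
    sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    / sum((xi - x_mean) ** 2 for xi in x)
)
intercept = y_mean - slope * x_mean
sse_with = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sst_around_mean = sum((yi - y_mean) ** 2 for yi in y)
r2_with = 1 - sse_with / sst_around_mean

# Model WITHOUT the constant (forced through the origin).
slope_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
sse_origin = sum((yi - slope_origin * xi) ** 2 for xi, yi in zip(x, y))
sst_around_zero = sum(yi ** 2 for yi in y)
r2_origin = 1 - sse_origin / sst_around_zero

# Standard error of the regression: sqrt(SSE / residual degrees of freedom).
s_with = math.sqrt(sse_with / (n - 2))
s_origin = math.sqrt(sse_origin / (n - 1))

print(f"with constant:    R-squared = {r2_with:.3f}, S = {s_with:.2f}")
print(f"without constant: R-squared = {r2_origin:.3f}, S = {s_origin:.2f}")
```

With these numbers, the no-constant model reports a much higher R-squared even though its residuals are larger and its standard error is worse. The two R-squared values use different baselines, so the comparison says nothing about which model fits better.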
  
  Even when you’d expect the DV and IV to equal zero together, I’d recommend including the constant. When you expect it to go through the origin for theoretical reasons, it might not quite go through the origin in your sample because of random sampling error. If that’s the case, it’ll bias your coefficient estimates. You really don’t lose anything by including the constant in your model.
  
pheej says

October 16, 2020 at 2:53 pm

if my intercept is 129 a positive one how do i interpret it?

- Jim Frost says
  
  October 16, 2020 at 3:20 pm
  
  Hi Pheej,
  
  You’re looking at the correct article to answer your question. Read the section near the beginning titled, “The Definition of the Constant is Correct but Misleading.” You’ll find your answer in that section. It doesn’t make sense for me to retype it here in the comment but all the details are there! If anything is still not clear, please don’t hesitate to ask!
  
John says

October 13, 2020 at 10:52 am

So what does the intercept mean when you have an interaction? I thought the intercept was when all independent terms are 0. Below I have a regression with an interaction of height x education to predict earnings. When I just include the main effects, the intercept looks like what I expect (-85294), but when I add the interaction it is 38433. What’s going on?
Thanks,
John

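Coefficients (main effects only) [remaining columns and rows not preserved]:
             Estimate  Std. Error  t value
(Intercept)  -85294.7      9239.3   -9.232

Coefficients (with the height:ed interaction):
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  38433.60    48668.60    0.790   0.42986
height        -744.22      726.70   -1.024   0.30599
ed           -6699.50     3581.35   -1.871   0.06164 .
height:ed      138.16       53.36    2.589   0.00974 **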

- Jim Frost says
  
  October 13, 2020 at 12:45 pm
  
  Hi John,
  
  Yes, that’s correct in the mathematical sense. However, as I point out, there are many reasons to not trust the value of the constant.
  
  When you have an interaction term, you need to think about what makes that term equal zero. That determination depends on the types of variables in the interaction term, continuous, categorical, or both.
  
  If you have two continuous variables, they both need to equal zero because 0*0 = 0. Technically, this term equals zero if only one variable is zero. But, if the other variable isn’t zero, then that doesn’t satisfy the all zero condition.
  
  If you have categorical variables, then both variables are at their baseline/reference value. Whenever you include a categorical variable, one level needs to be removed and it becomes the baseline value. I don’t have a post about how this works (i.e., the recoding for categorical variables), but I discuss it in my regression ebook. Changing the baseline value will change the value of the constant, whether you have an interaction term or not.
  
  Finally, if you have a continuous and categorical variable, the continuous variable must equal zero and the categorical variable must be at its baseline value.
  
  That was background information for what the constant represents when you have interaction terms.
  
  For your case, it’s not surprising for the estimates to change when you change the terms in the model. On the face of it, that’s not unusual at all. I’m not sure of the units for the dependent variable, but that does seem to be a fairly large change. However, it’s possible that without the interaction term, the constant might be very biased.
  
  Again, as I point out in the blog post, there are many reasons to not trust and try to interpret the constant. As a general rule, don’t evaluate the quality of the model by whether the constant equals the value you expect. There are a few exceptions to that rule, but in general don’t read much into the constant’s value.
  
  As you fit the model, be sure to check the residual plots and compare the coefficients to what you expect theoretically–not including the constant. In your case, I’d create an interaction plot and see if what it depicts also matches theoretical expectations. If everything else suggests including the interaction term, don’t let the change in the constant prevent you from including the interaction. The value of the constant usually isn’t reliable enough to use to assess the model’s quality.
  
  I hope this helps!
  
Arman says

August 24, 2020 at 1:57 pm

Thank you, Jim! This answer is exactly what I was looking for on the Internet. Glad someone else (Fabiha) has asked about it already and you answered. Saved me time!

nishantha arunapriya says

August 22, 2020 at 1:14 pm

is there intercept essential for regression equation?

- Jim Frost says
  
  August 24, 2020 at 12:39 am
  
  Hi Nishantha, as I write in this post, you can fit a regression model without the intercept, but doing so is almost always a bad idea.
  
Ali says

July 30, 2020 at 11:21 am

Thank you for the helpful tips.
Would appreciate if you can help interpret the coefficient table result of multiple regression analysis (Dependent variable/Constant/y intercept -0.124, Sig 0.538 but all my 4 independent variables are significant and has positive B values.

- Jim Frost says
  
  July 31, 2020 at 2:43 pm
  
  Hi Ali,
  
  I’m not sure what you need to know? If your constant is -0.124, that indicates that when all your IVs are zero (or at the mean if you center all continuous variables), the predicted mean DV equals -0.124. However, that value is not significantly different from zero.
  
  However, I hope you’ve read this article and realize that interpreting the constant is often not wise!
  
  Finally, having an insignificant constant is not a problem–in case that’s what you’re wondering about. It’s far more important that your IVs are significant.
  
  I hope that helps!
  
PF Duralwes says

July 1, 2020 at 9:47 am

Hi Jim. If I’m not mistaken you are speaking exclusively about continuous independent variables in this article, as the interpretation (of coefficients and p-values, especially for the intercept) changes when you are dealing with categorical variables. Do you have any recommendations here? I am having very little luck in my online search thus far…

- Jim Frost says
  
  July 1, 2020 at 11:00 pm
  
  Hi PF,
  
  Yes, that’s true. I’m talking about continuous independent variables. I don’t really have a post about categorical variables and the items you mention. However, I do write about categorical independent variables in detail in my regression analysis ebook. I cover how regression handles this type of variable, interpreting the coefficients and p-values, etc.
  
Kate says

April 21, 2020 at 10:35 am

Hi Jim,

This post was so helpful! If I am presenting standardized OLS estimates, should I omit the constant from my results table since it is equal to zero? In the journal that I’m interested in submitting to, I’ve seen some tables where the constant is still include although they are presenting standardized coefficients. I’m assuming they just pulled those from the unstandardized output. Is that okay or is that actually wrong?

Thank you so much!
Kate

- Jim Frost says
  
  April 23, 2020 at 1:07 am
  
  Hi Kate,
  
  First and foremost, I’d follow the standard practices in the journal. However, generally, I prefer to include all the results. If you standardize your IVs, you still obtain a constant, but its interpretation changes. Usually, the constant is the mean value of the dependent variable when all IVs equal zero. Of course, I outline many reasons in this post why we don’t usually interpret the constant, which might be the reason it’s not usually reported.
  
  However, when you standardize your IVs to get the standardized coefficients, you still obtain a constant but now it represents the mean dependent value when all IVs are at their means. That can actually be a bit useful in some situations.
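As a rough illustration of that point, the sketch below fits a one-IV model twice: once on the raw IV and once on the mean-centered IV. Centering is the part of standardizing that shifts the constant, so the example centers only. The heights and weights are made-up numbers and `fit_simple_ols` is a hypothetical helper, not output from the model in this post.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical heights (m) and weights (kg), made up for illustration.
height = [1.50, 1.60, 1.70, 1.80, 1.90]
weight = [52.0, 58.0, 67.0, 75.0, 80.0]

# Raw IV: the constant is the predicted weight at height = 0.
raw_intercept, raw_slope = fit_simple_ols(height, weight)

# Centered IV: the constant is the predicted weight at the mean height.
centered_height = [h - mean(height) for h in height]
centered_intercept, centered_slope = fit_simple_ols(centered_height, weight)

print(raw_intercept, centered_intercept, mean(weight))
```

With these numbers the raw constant is negative, while the centered constant matches the mean weight. The slope is the same in both fits.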
  
  When you say that your constant is zero, do you mean it has a high p-value? It’s not exactly zero, right?
  
  But, in the end, I’d follow the standards of the journal.
  
Fabiha says

April 16, 2020 at 5:00 pm

hi jim
if my y intercept is grater than p value 0.05 should it have any effect on it and what will be the significant level ?

- Jim Frost says
  
  April 16, 2020 at 10:34 pm
  
  Hi Fabiha,
  
  Typically, you don’t need to worry about the significance, or lack thereof, for the constant. When you have a constant that is not statistically significant, it just indicates that you have insufficient evidence to conclude that it is different from zero. However, there are many reasons not to interpret the constant, as I discuss in this post.
  
Ksenia says

April 9, 2020 at 6:26 pm

Hi Jim,

My constant has a very low p-value which is below significance level of (ie. 0.05). Which means we have a 99% confidence of rejecting a null hypothesis. Thus it must mean that constant has an effect on my dependent variable (Trade). I am just unsure if a constant can possibly be significant in bilateral trade model.

- Jim Frost says
  
  April 10, 2020 at 7:39 pm
  
  Hi Ksenia,
  
  The constant doesn’t really have an effect. It’s an estimate of the dependent variable mean when all independent variables equal zero. A significant p-value for the constant simply indicates that you have sufficient evidence to conclude that the constant doesn’t equal zero.
  
  However, be sure to note all the warnings I include throughout this post about interpreting the constant. It usually doesn’t have a meaningful interpretation for various reasons. A significant p-value does not indicate that you can interpret the constant in a meaningful way. Instead, if any of the problems I mention apply to your model, not only is the constant potentially biased, but your p-value is invalid.
  
  Unless you have some strong need to interpret the constant, I wouldn’t spend much time thinking about it.
  
Bhoomika Batra says

February 20, 2020 at 10:56 pm

I have 1 dv and many IDv including dummy variables. These variables came out of literature review. I don’t understand that why my dv and IDV are not correlated when many studies have been taken place using same methodology which is the reason my r square is just 11%

Djawed says

December 31, 2019 at 5:16 am

Dear Jim Frost ? please how to prove that the y-intercept must not be significantly different from zero ?
Thanks in advance

- Jim Frost says
  
  December 31, 2019 at 3:34 pm
  
  Hi Djawed,
  
  I’m not sure that I understand your question entirely.
  
  If you’re asking whether the y-intercept must equal zero, that’s not typically true. Usually, it’s ok if your intercept does not equal zero. In fact, as I discuss in this blog post, the value of the intercept is usually meaningless for a variety of reasons.
  
  There might be specialized applications where you’d theoretically expect the intercept to equal zero. In those cases, you’d need to check the p-value for the intercept. If the p-value is less than your significance level (e.g., 0.05), then your intercept is significantly different from zero. If your p-value is greater than the significance level, the intercept is not significantly different from zero.
  
  Please note that even when you’d expect the constant to equal zero, given all the issues that I describe in this post, it might not equal zero even when you have a good model. Look at the height and weight model for an example of where you’d expect the intercept to be zero but it legitimately is not zero for the data collected.
  
  I hope this helps!
  
Peeyush says

December 30, 2019 at 10:01 pm

Yes, problem solved. Thank you sir.

peeyush bangur says

December 30, 2019 at 4:43 pm

Dear Jim,
In my regression equation I have 1 dependent variable and 1 independent variable. The p-value of the constant is 0.8863. Does it impact the result?

- Jim Frost says
  
  December 30, 2019 at 4:51 pm
  
  Hi Peeyush,
  
  Typically, we don’t interpret the constant or its p-value. There might be specialized cases where it is both important and valid to do so, but that’s not typical. So, unless it has special meaning for your analysis, I wouldn’t worry about it.
  
  Technically, the high p-value indicates that your constant is not significantly different from zero. However, as I describe in this post, there are many reasons why you can’t trust the value of the constant. Consequently, knowing whether this untrustworthy value is significantly different from zero is usually meaningless.
  
  If the p-value for your independent variable is significant and the residual plots look good, then you can feel confident that there is a statistically significant relationship between your IV and DV. The non-significant constant doesn’t impact that relationship at all.
  
  I hope this helps!
  
isaac says

December 14, 2019 at 9:55 pm

please, if your p-value in a regression is far above the 0.05 significant level you set, is that a cause for worry, and how do you interpret the results.

- Jim Frost says
  
  December 16, 2019 at 4:53 pm
  
  Hi Isaac,
  
  If you’re referring to p-values for regression coefficients (aka parameter estimates), please read my post about how to interpret p-values and regression coefficients. If you have further questions after reading that post, please put them in the comment section for that post. Thanks.
  
Dinesh satyal says

December 3, 2019 at 8:56 am

Hi jim
I got the equation = -0.517+1.868 X in regression. How do I interpret it?

- Jim Frost says
  
  December 3, 2019 at 9:09 am
  
  Hi Dinesh,
  
  The post above tells you about the constant, which is -0.517 in your equation. Read my post about how to interpret regression coefficients to interpret your coefficient of 1.868.
  
Andre says

November 4, 2019 at 3:51 pm

Hi Jim,

Your explanations of difficult and often confusing statistical concepts are the very best I have come across so far. Simply great!

Cynthia Johnson says

October 30, 2019 at 11:02 pm

Hi Jim, I’m having a problem interpreting r .085 and r square .007. Would you say it’s a strong relationship? I’m learning.

- Jim Frost says
  
  October 31, 2019 at 4:51 pm
  
  Hi Cynthia,
  
  An R-squared of 0.007 represents a very weak relationship. For more information, read my post about Interpreting R-squared.
  
Help says

October 25, 2019 at 5:38 am

Hi,
I have searched many of your articles to find the answer but I am still unsure. Why is it that OLS assumes the mean of the error to be zero? And why does this cause a problem if it does not hold? I would be very grateful for any help/insight you could offer.
Many thanks!

- Jim Frost says
  
  October 25, 2019 at 2:24 pm
  
  Hi,
  
  The answer is because you want your model’s predictions (i.e., fitted values) to be correct on average. A residual is the difference between the observed value and the value the model predicts. Residual = Observed value – Fitted value. If the average of the residuals equals zero, then your model’s predictions are correct on average. While the residual for any given observation might not equal zero, the positive and negative residuals balance each other out. In other words, your model isn’t systematically predicting too high or too low.
  
  However, if the average of the residuals is a positive value, that indicates that overall the observed values are greater than the fitted values. In other words, your model systematically under-predicts the observed values. Conversely, if the average of the residuals is negative, your model systematically over-predicts the observed values.
  
  For more information about this, read my post about OLS assumptions.
  
  I hope this helps!
  
Saeideh says

October 20, 2019 at 1:15 am

Many thanks for your detailed explanation. Unfortunately I cannot still understand why we use multiple regression. As you mentioned, in simple regression we just consider one independent variable and in multiple regression, we have more than one independent variable but when we want to investigate the effect of one of them on y, we hold other IVs fix and infact we have again simple regression.
Why we don’t do it with several simple regression instead of multiple regression.
Thank you

- Jim Frost says
  
  October 20, 2019 at 1:24 am
  
  Hi,
  
  Ah, I see where the misunderstanding is. No, when you add additional variables to the model it provides more information about the other variables. Holding other variables constant definitely does not return you to simple regression because you’re learning about all the variables. Furthermore, if you perform simple regression and it turns out multiple variables are involved in the subject area, the coefficient for your single IV might be biased–which is why you wouldn’t want to use several simple regression models instead of one multiple regression model.
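To make the bias concrete, here is a small sketch with made-up data in which the DV depends on two correlated IVs. The data, the coefficients (2 and 3), and the helper `simple_slope` are all assumptions for illustration. To keep the example short, the multiple regression coefficient is obtained by first removing the part of `x1` explained by `x2` and then regressing `y` on what is left (the Frisch-Waugh-Lovell approach), rather than by calling a regression library.

```python
from statistics import mean

def simple_slope(x, y):
    """Slope of the least-squares line of y on x (constant included)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Made-up data: x1 and x2 are correlated, and y depends on both.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

# Simple regression of y on x1 alone: x1 also picks up the effect of x2.
slope_simple = simple_slope(x1, y)

# Multiple regression coefficient for x1, holding x2 constant:
# 1) regress x1 on x2 and keep the residuals,
# 2) regress y on those residuals.
b = simple_slope(x2, x1)
a = mean(x1) - b * mean(x2)
x1_residual = [v1 - (a + b * v2) for v1, v2 in zip(x1, x2)]
slope_multiple = simple_slope(x1_residual, y)

print(slope_simple, slope_multiple)
```

In this constructed example the simple regression slope for `x1` comes out well above 2 because `x1` is standing in for the omitted `x2`, while the coefficient that holds `x2` constant recovers the 2 used to build the data.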
  
  I have a post that explains this aspect in much more detail. Please read my post, When Should I Use Regression Analysis? This is a great introductory post that talks in more detail about regression, including the importance of holding other variables constant.
  
  Additionally, see my post about omitted variable bias. It’s a bit more advanced, but it shows the potential bias I mentioned earlier. It provides an example of how a simple regression model was biased until I added another variable.
  
  Those posts should answer your questions!
  
saeideh says

October 19, 2019 at 4:14 am

Hi
thank you for your useful explanation.
could you please describe the difference between simple and multiple regression? because in both of them we consider the other independent variables as a constant or fix. so the interpretation will be the same?

- Jim Frost says
  
  October 19, 2019 at 11:59 pm
  
  Hi,
  
  Simple and multiple regression are really the same analysis. The only thing that changes is the number of independent variables (IVs) in the model. Simple regression indicates there is only one IV. Simple regression models are easy to graph because you can plot the dependent variable (DV) on the y-axis and the IV on the x-axis. Multiple regression simply indicates there is more than one IV in the model. It’s true, when you have multiple IVs, the coefficient represents the effect of one IV when holding the values of the other IVs constant. In simple regression, because there is only one IV, there are no other IVs to hold constant. In either case, you interpret the coefficients the same way: the mean change in the DV associated with a 1 unit change in the IV.
  
  I hope that helps!
  
GARVITA JHAMB says

September 9, 2019 at 5:38 am

hi!
do we check the p value given for constant?

- Jim Frost says
  
  September 10, 2019 at 11:26 pm
  
  Hi Garvita,
  
  You can check the p-value for the constant. If it’s less than your significance level (e.g., 0.05), then the value of the constant is significantly different from zero. However, for all the reasons I cite in this post, you usually cannot interpret the constant. Consequently, knowing that it’s different from zero doesn’t provide much information in most cases.
  
Dr. N. Rathankar says

August 1, 2019 at 8:48 am

Hi jim
In a regression model, i find that sum of the residuals = +1. and the r squared value is 0.64, which clearly says that there is 80% correlation.
1. what additional correction should i make to the model, so that the sum of the residuals = 0
2. if the residual sum is larger, does the correlation coefficient value fall down. i mean are these two inversely correlated
3. while calculating r, how do i infer causation between the two variables and is there any measure to infer causation

- Jim Frost says
  
  August 1, 2019 at 11:41 pm
  
  Hi,
  
  Are you using ordinary least squares regression? And, are you including the constant in the model? Including the constant in the model should cause your residuals to sum to zero even when there are other problems with the model.
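A quick way to check this yourself is sketched below, using made-up data and the textbook least squares formulas for one IV. It fits the same data with and without a constant and sums the residuals from each fit. The variable names are hypothetical.

```python
from statistics import mean

# Hypothetical data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

# OLS with a constant: y = b0 + b1*x
x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
residuals_with_constant = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# OLS without a constant (line forced through the origin): y = b*x
b_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
residuals_without_constant = [yi - b_origin * xi for xi, yi in zip(x, y)]

sum_with = sum(residuals_with_constant)        # zero up to floating-point rounding
sum_without = sum(residuals_without_constant)  # generally not zero

print(sum_with, sum_without)
```

If your software reports a residual sum that is clearly not zero, the first thing to confirm is that the constant was actually included in the fit.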
  
  If you are fitting the constant, I’m not sure why they sum to 1. If your dependent variable is very large, perhaps 1 is very small in relation? I’m not sure.
  
  I don’t understand your second question. But, if the sum of the residuals does not equal zero, it suggests that your model is biased.
  
  There is no statistical measure that assesses causation. To learn why, read my post about causation versus correlation. This post also shows how you can assess causation.
  
Reni Kuruvila says

June 2, 2019 at 9:23 am

Can the result or output of a regression equation be a negative value. when i am using this to predict sales of a product?

Curtis says

May 11, 2019 at 3:14 am

Is it possible to purchase a hard copy of the book?

Harsh Chadha says

March 11, 2019 at 2:21 am

Hi Jim,

I wanted to ask a question which might not be directly related to the intercept in a regression model. What if the variation in my dependent variable (Y variable) is very low? In such cases, do we expect the intercept to capture most of the movement in Y?

- Jim Frost says
  
  March 11, 2019 at 4:58 pm
  
  Hi Harsh,
  
  That might or might not be the case, which is true for any model. Let’s start with a more general case for all linear models and then work to the situation that you describe.
  
  The F-test of overall significance determines whether your model predicts the DV better than a model that contains only the intercept. For any model where this test is not statistically significant, the intercept only model, which is the mean, predicts the Y variable as well as your model. In other words, your model does not account for the variability of the DV better than the mean (which is the y-intercept in an intercept-only model). So, what you describe is a possibility for any model and you can test for it with the overall test of significance.
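The comparison behind that test can be sketched in a few lines. The data below are made up, and the example stops at the F-statistic itself. Turning it into a p-value requires the F distribution with p and n - p - 1 degrees of freedom, which your statistical software handles.

```python
from statistics import mean

# Hypothetical data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
n, p = len(y), 1  # n observations, p independent variables

# Intercept-only model: every fitted value is the mean of y.
y_bar = mean(y)
sse_intercept_only = sum((yi - y_bar) ** 2 for yi in y)

# Full model with a constant and one IV.
x_bar = mean(x)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
sse_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# F compares the reduction in error per added term to the remaining error.
f_statistic = ((sse_intercept_only - sse_full) / p) / (sse_full / (n - p - 1))

print(sse_intercept_only, sse_full, f_statistic)
```

A large F-statistic means the IV reduces the error far more than you would expect by chance relative to the intercept-only model. A small one means the mean alone does about as well.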
  
  Now, if the variability in the Y variable is very restricted, it’s going to be harder to use the variability in the IVs to explain the variability in the DV because there isn’t much of it. It’s harder for variables to co-vary when one does not vary much. However, it’s still possible to have a significant model in the situation you describe. To see if this condition applies to your model, simply check that F-test of overall significance!
  
Gandalf says

February 8, 2019 at 2:42 pm

Hello Jim,

Very clarified explanations, But graphs in this article are not visible.

- Jim Frost says
  
  February 8, 2019 at 4:58 pm
  
  That’s very strange. It’s probably a temporary web connection issue. I’ve had that post up for several years now and haven’t heard others with that problem. Please try again, and hit refresh if necessary. Let me know how that goes.
  
Achala bhtaatraidhakal says

November 6, 2018 at 2:29 am

DEAR SIR,GOOD EVENING

IN MY STUDY I OBTAIN THIS TABLE, HOW I DESCRIBED MY FINDINGS
Coefficients(a)
Model                Unstandardized B   Std. Error   Standardized Beta   t        Sig.
1  (Constant)        136.002            4.361                            31.189   .000
   sex               -1.545             2.251        -.043               -.687    .493
   smoking history   -5.728             2.533        -.143               -2.261   .024
a. Dependent Variable: systolic BP2nd

Nikhil says

October 17, 2018 at 11:21 pm

In the regression analysis, in case p-value for the intercept is higher than 0.05, should the intercept be still considered? Or the p-value of the intercept is immaterial?

- Jim Frost says
  
  October 17, 2018 at 11:30 pm
  
  Hi Nikhil,
  
  Yes, I’d still include the intercept in the model. If the value of the constant is not significant but still not equal to zero, then forcing the regression line through the origin will bias the coefficient estimates. Typically, you don’t need to interpret the constant or the p-value.
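Here is a minimal sketch of that bias, assuming made-up data that follow y = 10 + 2x exactly. Fitting with a constant recovers the slope of 2. Forcing the line through the origin makes the slope absorb the missing constant.

```python
from statistics import mean

# Made-up data generated from y = 10 + 2x with no noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0 + 2.0 * xi for xi in x]

# Slope when the model includes a constant.
x_bar, y_bar = mean(x), mean(y)
slope_with_constant = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                       / sum((xi - x_bar) ** 2 for xi in x))

# Slope when the line is forced through the origin (no constant).
slope_through_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

print(slope_with_constant, slope_through_origin)
```

Here the through-origin slope is more than double the true slope, even though the data contain no noise at all.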
  
Stefan Brovis says

October 17, 2018 at 7:16 am

Hi Jim,

Let me reformulate, the exact question (past paper) is stated as follows:

In the standard regression model, the assumption is made that the effect of the
regressors is linear (through Xbeta), and that the disturbances affect the mean in an additive
fashion. This at first sight sounds like a severe limitation of generality. Why is the
limitation not as large as it seems at first sight?

I am guessing that it has to do with a limitation of the OLS assumption.
I am not sure myself whether i really understood the question. I thought perhaps you could make sense of it.

- Jim Frost says
  
  October 17, 2018 at 10:57 am
  
  Hi Stefan,
  
  The author’s question is pretty vague because he/she doesn’t explain why it appears like a severe limitation. But, I’ll take a stab at it.
  
  The author describes a linear model. Linear models are a very restricted form of all possible regression models, which I describe in my post about the differences between linear and nonlinear models. And, there are various assumptions about the residuals that must be met to produce valid results, which I describe in my post about OLS assumptions.
  
  So, when you’re using linear models, you’re in a situation where the form of your model is very restricted and there are requirements for the properties of the residuals. It might seem that this combination of restrictions and requirements is problematic in terms of the general usability of linear regression to model relationships.
  
  However, it’s not as severe as that may sound because there are various techniques for getting around both sets of limitations. You can use polynomials and other forms of variables to model curvature using linear regression. And various data transformations can resolve problems with the residuals.
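As a sketch of the polynomial idea: a model such as y = b0 + b1*x + b2*x^2 is curved in x but still linear in the coefficients, so ordinary least squares can fit it. The example below uses made-up data and two hypothetical helpers (`solve_linear_system` and `fit_polynomial`) that solve the normal equations directly. In practice you would let your statistical software do this.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    k = degree + 1
    design = [[xi ** j for j in range(k)] for xi in x]  # columns: 1, x, x^2, ...
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up curved data generated from y = 1 + 2x + 0.5x^2 with no noise.
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * xi + 0.5 * xi ** 2 for xi in x]

b0, b1, b2 = fit_polynomial(x, y, degree=2)
print(b0, b1, b2)
```

The squared term is just another column in the model, so the fit recovers the curve used to generate the data without any nonlinear estimation.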
  
  That might be what the author is getting at but it’s hard to know for sure with such a general statement.
  
Stefan Brovis says

October 16, 2018 at 8:20 am

Hi Jim,

As i understand we assume linearity of the coefficients in the standard regression model as well as that the errors affect the mean in an additive way, which are in fact a limitation of generality but why does it not seem to be such a big issue at first sight?

- Jim Frost says
  
  October 16, 2018 at 10:02 am
  
  Hi Stefan,
  
  I don’t understand your question. Why do you think additive errors seem like a larger problem than they are?
  
Daniel says

October 11, 2018 at 12:05 pm

How do we interpret a negative intercept/constant? Do we interpret it as it is, e.g., the average weight of a child is -20kg when height is zero? Please clarify.

- Jim Frost says
  
  October 11, 2018 at 2:19 pm
  
  Hi Daniel,
  
  If you look in this post directly below the first fitted line plot, you’ll see that I discuss how there is no interpretation for this particular constant. The rest of the post details the various reasons why you usually can’t interpret the constant. For the height and weight example specifically, I’m sure the nature of the relationship between height and weight changes over the range of the independent variable (Height). Go down to the 2nd fitted line plot in this post, and right under that I discuss how that must be happening. But, we’re looking at a restricted range of heights and weights, and the model works for this range, but not outside the range–hence we can’t interpret the constant because it is outside the range of the data.
  
  Hope that helps!
  
jean manirere says

September 10, 2018 at 11:50 pm

Hello Jim, what if Y doesn’t change and X changes. Example: Y=2,2,2,2,2 and X=1,2,3,4,5 How is the Correlation ?

- Jim Frost says
  
  September 11, 2018 at 12:06 am
  
  Hi Jean,
  
  In order to be correlated, the two variables need to co-vary around their respective means. In other words, as one variable changes relative to its mean, the other variable tends to change in either the same or opposite direction relative to its mean. Because Y does not vary around its mean in your example, it’s impossible for them to have a non-zero correlation. Hence, the correlation is zero (or it probably produces an error because there is no variability at all in Y). Be sure to read my post about correlation!
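The sketch below works through that example by hand. The helper `pearson_r` is a hypothetical function written for this illustration. It returns `None` when either variable has no variability, because the usual formula would then divide by zero. The second Y series is made up for contrast.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation, or None when either variable has zero variance."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined: division by zero
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
y_constant = [2, 2, 2, 2, 2]   # the Y values from the question
y_varying = [2, 4, 6, 8, 10]   # made-up contrast case

r_constant = pearson_r(x, y_constant)
r_varying = pearson_r(x, y_varying)
print(r_constant, r_varying)
```

With the constant Y, the covariance is zero and so is the variance of Y, which is why software typically reports an error or a missing value rather than a correlation of exactly zero.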
  
Fabian Moodley says

August 16, 2018 at 7:15 am

how will you interpret a constant in a mean equation
that’s highly significant?

Kapil Agrawal says

May 1, 2018 at 6:45 pm

Very clear and crisp explanation

- Jim Frost says
  
  May 1, 2018 at 11:02 pm
  
  Hi Kapil, I’m so happy to hear that it was easy to understand! That’s always my goal when writing blog posts.
  