The standard error of the regression (S) and R-squared are two key goodness-of-fit measures for regression analysis. While R-squared is the most well-known amongst the goodness-of-fit statistics, I think it is a bit over-hyped.

In this post, I’ll compare these two statistics. We’ll also work through a regression example to help make the comparison. I think you’ll see that the oft overlooked standard error of the regression can tell you things that the high and mighty R-squared simply can’t. At the very least, you’ll find that the standard error of the regression is a great tool to add to your statistical toolkit!

## Comparison of R-squared to the Standard Error of the Regression (S)

You can find the standard error of the regression, also known as the standard error of the estimate, near R-squared in the goodness-of-fit section of most statistical output. Both of these measures give you a numeric assessment of how well a model fits the sample data. However, there are differences between the two statistics.

- The standard error of the regression provides the absolute measure of the typical distance that the data points fall from the regression line. S is in the units of the dependent variable.
- R-squared provides the relative measure of the percentage of the dependent variable variance that the model explains. R-squared can range from 0 to 100%.
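To make the absolute-versus-relative distinction concrete, here is a minimal sketch that computes both statistics for a simple one-predictor model. The data and the helper names (`fit_line`, `s_and_r_squared`) are made up for illustration and are not from the example later in this post. Rescaling the dependent variable by 10 multiplies S by 10 but leaves R-squared unchanged.

```python
import math

def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def s_and_r_squared(x, y):
    """Return (S, R-squared) for a straight-line fit with a constant."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    s = math.sqrt(ss_res / (len(y) - 2))  # error DF = n - 2 (slope + constant)
    return s, 1 - ss_res / ss_tot

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [10, 12, 15, 19, 24]

s, r2 = s_and_r_squared(x, y)

# Same data with the dependent variable expressed in units 10 times smaller.
y_scaled = [10 * yi for yi in y]
s_scaled, r2_scaled = s_and_r_squared(x, y_scaled)

print(f"original units: S = {s:.4f}, R-squared = {r2:.4f}")
print(f"rescaled units: S = {s_scaled:.4f}, R-squared = {r2_scaled:.4f}")
```

S follows the units of the dependent variable, while R-squared is unitless and does not change.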

An analogy makes the difference very clear. Suppose we’re talking about how fast a car is traveling.

R-squared is equivalent to saying that the car went 80% faster. That sounds a lot faster! However, it makes a huge difference whether the initial speed was 20 MPH or 90 MPH. The *increased* velocity based on the percentage can be either 16 MPH or 72 MPH, respectively. One is lame, and the other is very impressive. If you need to know exactly how much faster, the relative measure just isn’t going to tell you.

The standard error of the regression is equivalent to telling you directly how many MPH faster the car is traveling. The car went 72 MPH faster. Now that’s impressive!

Let’s move on to how we can use these two goodness-of-fit measures in regression analysis.

## Standard Error of the Regression and R-squared in Practice

In my view, the standard error of the regression has several advantages. S tells you straight up how precise the model’s predictions are using the units of the dependent variable. This statistic indicates how far the data points are from the regression line on average. You want lower values of S because it signifies that the distances between the data points and the fitted values are smaller. S is also valid for both linear and nonlinear regression models. This fact is convenient if you need to compare the fit between both types of models.

For R-squared, you want the regression model to explain higher percentages of the variance. Higher R-squared values indicate that the data points are closer to the fitted values. While higher R-squared values are good, they don’t tell you how far the data points are from the regression line. Additionally, R-squared is valid for only linear models. You can’t use R-squared to compare a linear model to a nonlinear model.

Note: Linear models can use polynomials to model curvature. I’m using the term linear to refer to models that are linear in the parameters. Read my post that explains the difference between linear and nonlinear regression models.

## Example Regression Model: BMI and Body Fat Percentage

This regression model describes the relationship between body mass index (BMI) and body fat percentage in middle school girls. It’s a linear model that uses a polynomial term to model the curvature. The fitted line plot indicates that the standard error of the regression is 3.53399% body fat. The interpretation of this S is that the standard distance between the observations and the regression line is 3.5% body fat.

S measures the precision of the model’s predictions. Consequently, we can use S to obtain a rough estimate of the 95% prediction interval. About 95% of the data points are within a range that extends from +/- 2 * standard error of the regression from the fitted line.

For the regression example, approximately 95% of the data points lie between the regression line and +/- 7% body fat.
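The rule of thumb above is simple arithmetic, sketched here. Only the value of S comes from the example. The fitted value of 30% body fat and the function name `rough_95_interval` are hypothetical, used to show what the rough interval looks like around one prediction.

```python
S = 3.53399  # standard error of the regression from the example, in % body fat

def rough_95_interval(fitted_value, s):
    """Rough 95% range for observations around a fitted value: +/- 2 * S."""
    return fitted_value - 2 * s, fitted_value + 2 * s

half_width = 2 * S
low, high = rough_95_interval(30.0, S)  # 30.0 is a hypothetical fitted value

print(f"half-width: +/- {half_width:.2f}% body fat")
print(f"rough 95% range around a fitted value of 30: {low:.2f} to {high:.2f}")
```

This is only an approximation. An exact prediction interval uses a t-value and widens as you move away from the mean of the predictor.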

The R-squared is 76.1%. I have an entire blog post dedicated to interpreting R-squared. So, I won’t cover that in detail here.

**Related posts**: Making Predictions with Regression Analysis and Understand Precision in Applied Regression to Avoid Costly Mistakes

## I Often Prefer the Standard Error of the Regression

R-squared is a percentage, which seems easy to understand. However, I often appreciate the standard error of the regression a bit more. I value the concrete insight provided by using the original units of the dependent variable. If I’m using the regression model to produce predictions, S tells me at a glance if the model is sufficiently precise.

On the other hand, R-squared doesn’t have any units, and it feels more ambiguous than S. If all we know is that R-squared is 76.1%, we don’t know how wrong the model is on average. You do need a high R-squared to produce precise predictions, but you don’t know how high it must be exactly. It’s impossible to use R-squared to evaluate the precision of the predictions.

To demonstrate this, we’ll look at the regression example. Let’s assume that our predictions must be within +/- 5% of the observed values to be useful. If we know only that R-squared is 76.1%, can we determine whether our model is sufficiently precise? No, you can’t tell using R-squared.

However, you *can* use the standard error of the regression. For our model to have the required precision, S must be less than 2.5% because 2.5 * 2 = 5. In an instant, we know that our S (3.5) is too large. We need a more precise model. Thanks S!
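The same check can be written as a tiny function. This is a sketch of the rule-of-thumb comparison only, assuming the +/- 2 * S approximation. The function name `is_precise_enough` is hypothetical.

```python
def is_precise_enough(s, required_half_width, multiplier=2.0):
    """True if the rough 95% range (+/- multiplier * S) fits the requirement."""
    return multiplier * s <= required_half_width

required = 5.0           # predictions must be within +/- 5% body fat
max_s = required / 2.0   # largest S that satisfies the requirement
model_s = 3.53399        # S from the example model

print(f"S must be at most {max_s}")
print(f"model is precise enough: {is_precise_enough(model_s, required)}")
```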

While I really like the standard error of the regression, you can, of course, consider both goodness-of-fit measures simultaneously. This is the statistical equivalent of having your cake and eating it!

If you’re learning regression and like the approach I use in my blog, check out my eBook!

Stelios says

Hi Jim,

You’re doing an amazing work in this site and your book is also very insightful.

I have a question which I am not sure if it is answered either here or in the book.

My understanding is that as long as we use 2 OLS models on the same data, we can compare their R^2 to find out which provides a better fit. Is this also the case with S.E. of regression ?

Paul Madus Ejikeme says

Hi Jim,

I came across the following statement and am not sure I understood what it means!

“If the adjusted R2 in your output is 60%, you can be 90% confident that the population value is between 40-80%.”

Please help.

Jim Frost says

Hi Paul,

What it sounds like to me is that someone is creating a 90% confidence interval around the adjusted R-squared. That’s entirely valid even though it’s not usually done. The adjusted R-squared is a sample estimate of the population parameter–just like the mean, standard deviation, and regression coefficients. You’ll always have a margin of error around sample estimates. Not understanding the model or the context, I can’t vouch for the range itself. But, it sounds like a 90% CI to me.

The fact that adjusted R-squared is in the center of the distribution rings true to me. The sample adjusted R-squared is an unbiased estimator of the population value. However, regular R-squared is biased too high. In other words, the sample R-squared tends to overestimate the population R-squared. Not relevant to your question exactly, but an interesting side point.

I hope this helps!

Jerry Miller says

Hi Jim, your article states that R-squared from a linear model can’t be compared to an R-squared from a different type of model. But doesn’t logistic regression also produce an R-squared value? I was under the impression that R-squared means the same thing regardless of the type of regression used, which is the percent of the variation in the dependent variable that is accounted by the independent variable(s) in the model.

Jim Frost says

Hi Jerry,

What I say is very specifically that you can’t compare R-square between linear models and nonlinear models because R-squared is not valid for nonlinear models. As you’ll read in that post, you should not even calculate R-squared for nonlinear models–although some statistical software packages do that erroneously. And, I’m using “nonlinear” in the strict statistical sense for nonlinear models. A regression model with a polynomial models curvature but it is actually a linear model and you can compare R-square values in that case.

That all said, I’d be careful about comparing R-squared between linear and logistic regression models. For one thing, it’s often a deviance R-squared that is reported for logistic models. And the R-squared for those models can be influenced by the method in which the data are recorded. I don’t know all the ins and outs for that but if you’re planning to make that comparison, do some research first to be sure!

R-squared is very much a linear model construct.

philoinme says

Talking about the first part, are the 95% of data points which have to be within the 2*SE , observed values or predicted values? I mean should we check for observed values to be close to the fitted line or vice-versa (predicted values must be close to observed values). I am asking because both doesn’t seem to be same

Jim Frost says

Oh, sorry, I misunderstood! As a general rule of thumb, 95% of the observed values should fall within +/- 2SE of the fitted values. It helps you determine how close most of the observed values are to the fitted value. Equivalently, in a fitted line plot, how close do most of the plotted data points fall to the fitted line. It’s an assessment of how far the observed values tend to fall from the fitted values.
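One way to check this rule of thumb on your own data is to count how many observed values fall within +/- 2 * S of their fitted values. The sketch below uses made-up observed and fitted values and a made-up S of 3.5. The function name `share_within_two_s` is hypothetical.

```python
def share_within_two_s(observed, fitted, s):
    """Fraction of observed values that lie within +/- 2 * S of the fitted values."""
    inside = sum(1 for y, f in zip(observed, fitted) if abs(y - f) <= 2 * s)
    return inside / len(observed)

# Hypothetical values, for illustration only.
fitted = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
observed = [21, 20, 24.5, 29, 26.5, 38, 31.5, 36.5, 32.5, 39]
s = 3.5  # assumed standard error of the regression for this made-up model

share = share_within_two_s(observed, fitted, s)
print(f"{share:.0%} of observations fall within +/- {2 * s} of the fitted values")
```

With a well-behaved model and a reasonable sample size, this share should be close to 95%.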

philoinme says

Thanks for the clarification!

Now it is clear about what is compared to what. While Standard Error for regression is one method to understand and compare model precision. My inference from this article: https://statisticsbyjim.com/regression/standard-error-regression-vs-r-squared is that calculation of SE and using it for model precision is no different from calculation of CI. Only that, the calculation is done on every fitted value and observed value is checked to be within the range. And finally, the number of such data points which fall matters for the model to be precise.

I am considering this means to compare models as calculation of prediction intervals is out of scope for the software or I do not have the programming knowledge to implement it for the same purpose.

philoinme says

Hello Jim,

I am trying to compare two models for precision.

Within this post, by reading this – ” About 95% of the data points are within a range that extends from +/- 2 * standard error of the regression from the fitted line.”, I get that observed values must be within pred_value +/- 2*SE.

But there is also another conception I get when you demonstrate in below paragraphs. It is that the predictions must be within +/- 5% of the observed values to be useful.

Which one is right?

And how do I calculate prediction interval, the calculation for both seems same (using SE) but PI is wider than CI, how?

Be well!

Akshay Kotha

Jim Frost says

Hi Akshay,

These are two different things and they don’t necessarily agree!

The first part about “95% of the data points . . .” is a factual statement derived from statistical analysis of the prediction error. It’s the capability of your regression model to make precise predictions.

The second statement “+/- 5% . . .” is the analyst’s requirements for precision. Requirements are generated by the analysts (or their clients) based on the specific needs of their application. In other words, it’s something that is determined outside of the statistical analysis–such as business, manufacturing, engineering, or decision-making requirements. I need the predictions to be this precise or they’re not useful.

The idea is that you compare the actual precision (the first part) to the required precision (the second part) to be able to determine whether the model’s predictions are sufficiently precise. Does the model satisfy the requirements? Hopefully yes but they might not. In the example in this post, the model does not satisfy the requirements (which I generated just to show how the idea works).

I don’t have the formula handy but you should be able to find it in most textbooks or software.

Mani says

Thanks a lot sir.

But how can i calculate S from the model .

S = sqrt(SS(res)/(n-p-1))

Is the above formula right?

Jim Frost says

Hi Mani, yep, that’s the formula for S!

Mani says

Hey sir ,hope you will be fine. I have visited your site for a lot of time.Its really helpful.

I have question that as you mention above that the value of R squared should be high .But i have seen some published paper,they have used OLS but their R squared values are very low like 0.04,0.1.Then how these paper are published?because they have very low square.These paper used cross_sectional data.and problem is related to child health under 5 years age.The link of paper is given below.And also what the value of F tell.

And secondly, you discuss the standard error of the regression. Is it the RMSE (root mean square error) or some other statistic? I’m a bit confused. Please help.

https://academic.oup.com/jn/article/134/10/2579/4688437

Jim Frost says

Hi Mani,

I’m very happy to hear that you’ve found my site to be helpful! 🙂

I briefly looked over the article. The R-squared values I saw ranged from 0.15 to 0.28. Those are fairly low but not necessarily problematic. Even though the R-squared is low, they do have statistically significant main effects and interaction effects. This fact allows them to draw conclusions about the relationships between these variables even though R-squared is low. The low R-squared indicates that the model accounts for a small portion of the variability in their outcome variables. However, there are still relationships between the independent variables and the dependent variables. This combination might not appear to make sense but it is not necessarily a problem. I’ve written a blog post about this topic exactly. If you read my post about regression models with low R-squared values and significant variables, I think it’ll answer many of your questions! The situation I describe in that post is occurring in this study.

As for the F-value, it’s used for the overall significance of the model. Typically, you don’t interpret the F-value itself but rather the p-value which is based on the F-value. For more information, read my post about the F-test of Overall Significance.

The standard error of the regression (s) is the square root of the adjusted mean squares. That’s related but a bit different from the RMSE because S adjusts for the error degrees of freedom. I’ll add that to this post to make that point clear. Thanks!
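The difference is easiest to see side by side. In this sketch, "RMSE" means the version that divides the sum of squared residuals by n, which is an assumption about the definition being compared. Some software reports the degrees-of-freedom-adjusted value under the RMSE label, so check your package's documentation. The residuals are made up, and the function names are hypothetical.

```python
import math

def rmse_unadjusted(residuals):
    """Square root of the sum of squared residuals divided by n."""
    return math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))

def standard_error_of_regression(residuals, n_terms):
    """Square root of SSE divided by the error DF, n - p - 1.

    n_terms is p, the number of terms in the model not counting the constant.
    """
    error_df = len(residuals) - n_terms - 1
    return math.sqrt(sum(r ** 2 for r in residuals) / error_df)

# Hypothetical residuals from a model with one predictor plus a constant.
residuals = [1.0, -0.5, -1.0, -0.5, 1.0]

rmse = rmse_unadjusted(residuals)
s = standard_error_of_regression(residuals, n_terms=1)
print(f"RMSE (divide by n)      = {rmse:.4f}")
print(f"S (divide by n - p - 1) = {s:.4f}")
```

S is always at least as large as the unadjusted RMSE, and the gap shrinks as the sample size grows relative to the number of terms.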

Des says

Hello Jim,

Many thanks for your reply! I was actually all set to extract the median responses from the scales and do a logistic ordinal regression, but I was guided away from that by one of my professors and led toward standard multiple regression, so I feel I have no choice but to walk this path (I want to get a good grade, and they are the ones that give it to me…).

Perhaps I should also clarify that none of my data are single item Likert responses, rather they are multi-item Likert scales of which the means were calculated and then have been treated as continuous data for multiple regression (for example to measure the effects of curiosity and certain orientations on learning motivation). I can confirm that I have well over N > 50 + (8 x number of variables), all the regressions thankfully, sit safely within the parameters of the OLS assumptions, and that they generate robust p-values ( < .001 ) for the ANOVAs and almost all of the regression model coefficient values (one is <.01) . That being the case, would it make more sense to focus on the R-sq values as opposed to the Std-Errors of the Coefficient? For example one regression model (3 IVs) I get an R-sq value of .71, and the Std-Errors of the Coefficient seem reasonable (.04 ~ .07). With there being no units, but the values derived from a scale of 1-7, would it still sound reasonable to say something regarding those Std Errors?

Apologies if I'm not making sense, this is my first rodeo. I'm very much enjoying it though! Fascinating stuff.

Cheers,

Des

Jim Frost says

Ah, ok, if the dependent variable is an average of Likert scale responses, I can see how there’s a good chance you could treat that as a continuous dependent variable. Just be sure to check those residual plots. That’s always good advice, but doubly so in your case! And, yes, if you’re averaging them, ordinal logistic regression would not be a good choice. So, it sounds like they’re suggesting the right path!

I’m glad you’re enjoying it. The field of statistics is the science of learning from data, which is pretty cool stuff! 🙂

Des says

Hello Jim,

I have to say this whole blog site is a goldmine of useful information for those of us who are still at the nascent stages of our statistical journeys (in my case, I’ve gradually been cranking up the stats throughout a masters in the social sciences and am now well into my dissertation). Many thanks for your clear and easy to follow explanations!

It’s a fascinating subject, but also surprising how much opinion there is when it comes to statistics, which before I started, looking from the outside, thought was cut-and-dry objective. My question relates to the much debated subject of using parametric analyses on the means of Likert scale responses from questionnaires, specifically multiple regression. I was advised to do so by the stats guru in our department and the previous studies I have used as foundations for my own work also used parametric analyses on very similar data. In the case of using means from Likert scales of 1–7 (Strongly disagree – Strongly agree, N = 269, normally distributed data) for both the IVs and DV, given that there are no simple units that are ready to be divided up like Mph or BMI (kg/m-sq), can one still make reasonable sense of the Std-Error of the Coefficient? In that case, isn’t the R-sq value more intuitively coherent?

Many thanks for this site!

Des

Jim Frost says

Hi Des,

Thanks for the kind words. I’m very happy to hear that my site has been helpful! That makes my day!

In terms of using Likert variables in regression analysis, if you’re talking about using one for the dependent variable, there’s a straightforward answer. Use ordinal logistic regression because that’s designed for an ordinal dependent variable. Likert data are ordinal.

It’s more complicated if you want to use them as independent variables. In that case, you can try including them as continuous variables, but be sure to check the residual plots to see if you’re satisfying all of the OLS assumptions. If the residual plots look good, then there’s no problem. You can try the usual methods of fitting curvature to help with that.

If that doesn’t work, you can include ordinal data as a categorical variable and then the analysis treats each response as a group in your data and assesses the difference between the group means. The downside of that is that a 7-point Likert scale will use six degrees of freedom, which can be problematic if you have a small sample size or a number of such variables. If necessary, you could try combining groups.

Outside of regression analysis, you might be interested in this blog post I wrote about using a parametric or non-parametric test to assess means for Likert data.

ross wright says

Thank you for your insights and explanations regarding the ‘S’ and ‘R-Squared’ statistics. Can I ask you to interpret a set of data for a different context so that I can make sure I am interpreting the data correctly (my statistics knowledge is quite rusty); The context I am investigating is the prediction of counter productive work behaviours (as measured by a bespoke measure) by a general personality test. The test publisher provides the following regression statistics:-

Rsquared = .416; Adjusted R squared = .399; Standard error of the estimate = .3972.

How should I interpret the ‘S’ statistic in this context. Thankyou

PJ says

Hi Jim,

Thanks.

I got the concept of CI and PI.

The data set I am working has 6 observations for each x and y. x and y have r2 = 0.84.

I need to predict y value for any given value of x. So in this case, which one is good PI or CI ?

I understand that PI is more reliable as it has broad range but May I request to help suggesting on it with reason.

Jim Frost says

Hi PJ,

Both the CI and the PI are equally reliable. Notice how they both have 95% confidence levels. It’s just that you use them for different reasons. The link I gave you in my previous comment takes you to a post that I wrote where I cover why you’d use each one. Rather than retyping what I wrote there, I’d just ask that you go and check that one out. Either one can be good. It depends what you need to use it for (I know for the prediction but read the post and you’ll see what I mean).

There are two sections in that post which you should pay particular attention to.:

Precision of the Predictions

Interpreting the Regression Prediction Results

PJ says

Hi Jim,

Great explanation, thank you.

I have query regarding scatter distribution prediction and linear regression.

Suppose we have x values as -> 120.0, 161.9, 142.8, 102.5, 113.8 and y-> 111.1, 142.9, 142.1, 117.3,120.3

r2 is 0.7624 and straight line equation is y = 0.5396x + 57.562

>Suppose if x = 130 then y=185.272 +/- delta

what will be this delta and how to calculate it? So that we can say with confidence that y will lie within this range.

Jim Frost says

Hi PJ

There’s really nothing called delta in that context, at least not that I’m aware of. I think you’re referring to either confidence intervals of the prediction or prediction intervals. Those are two different types of intervals for fitted values.

The fitted value for 130 using that model is 127.711.

The 95% confidence interval of the prediction is: [115.840, 139.582]

The 95% prediction interval is: [98.7193, 156.703]

To learn about the difference between these intervals, read my post about using regression to make predictions. You can use the information in that post to determine which interval you need.

I hope this helps!
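For readers who want to see where those numbers come from, here is a sketch that reproduces them from the five data points in the question using the standard simple-regression formulas. The t critical value for 3 degrees of freedom is hard-coded rather than looked up from a library, and the variable names are illustrative.

```python
import math

# Data from the question above.
x = [120.0, 161.9, 142.8, 102.5, 113.8]
y = [111.1, 142.9, 142.1, 117.3, 120.3]
x0 = 130.0

# Two-sided 95% t critical value for n - 2 = 3 degrees of freedom (hard-coded).
T_CRIT = 3.182446

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = intercept + slope * x0

# Standard error of the regression: sqrt(SSE / (n - 2)).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# The CI is for the mean response at x0. The PI adds the scatter of a single
# new observation, which is why it is wider.
leverage = 1 / n + (x0 - mean_x) ** 2 / sxx
ci_half = T_CRIT * s * math.sqrt(leverage)
pi_half = T_CRIT * s * math.sqrt(1 + leverage)

ci = (fitted - ci_half, fitted + ci_half)
pi = (fitted - pi_half, fitted + pi_half)

print(f"fitted value at x = 130: {fitted:.3f}")
print(f"95% CI of the prediction: [{ci[0]:.3f}, {ci[1]:.3f}]")
print(f"95% prediction interval:  [{pi[0]:.3f}, {pi[1]:.3f}]")
```

The only difference between the two half-widths is the extra 1 under the square root for the prediction interval.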

Jeremy says

Does logistic regression produce an R-squared?

Asif Razzaq says

Excellent way of teaching! Thank you Prof.

merve says

Thanks a lot! So nice!

Josh says

Jim you are an absolute legend, thanks for the great posts!

Jim Frost says

Thank you, Josh! I really appreciate that. I’m glad the posts are helpful!

Akis says

So what is the formula for the S? I have been looking for it, and different websites give different answers. Is it closely related to RMSE? Two sites provide the formula for the RMSE as the S and in general, I find it hard to find some good resources for it.

Thank you in advance.

Keryn says

Great explanation Jim, thank you. How do I interpret standard error for categorical variables? I have gender (male=0, female=1) and std error is 1.030. Std Coefficient beta = .082 and unstandardised B =2.040.

Jim Frost says

Hi Keryn,

I think there’s some confusing terminology here that is making this more complicated for you!

Standard errors of the coefficients are different statistics than the standard error of the regression, which I talk about in this post. The similarity is that these different standard errors measure the precision of an estimate. For the standard error of the regression, it’s the precision of the prediction. For the S.E. of the coefficients, it’s the precision of the coefficient estimate. Smaller values represent more precise estimates. However, when I want to assess precision of the coefficient estimates, I use confidence intervals of those estimates as they’re easier to interpret.

And, it’s not totally clear to me, but I think you might also be mixing in standardized coefficients, which SPSS refers to as beta (for some strange reason). Standardized coefficients are when you take the continuous independent variables and subtract the mean and divide by the standard deviation to get their standardized scores. And then you fit the model using these standardized variables rather than with the original data. You can read about one reason why here: identifying the most important independent variable in a regression model. However, you can only do that with continuous variables, but you mention categorical variables–so that confuses me! (Even though you’re using 1s and 0s, the software shouldn’t treat it as continuous data.)

I hope this helps!

Keegan says

Hi Jim,

First off, great weblinks and explanations. I have used MATLAB for non linear regression and obtained the following results:

Sea level ~ b1 + b2*SST + b3*Air temp + b4*VLM

Estimated Coefficients:

Estimate SE tStat pValue

_________ ________ ________ __________

b1 0.0089803 0.78185 0.011486 0.99084

b2 73.269 14.867 4.9284 1.2143e-06

b3 126.71 13.924 9.1001 4.2421e-18

b4 0.21917 0.047559 4.6083 5.4584e-06

Number of observations: 406, Error degrees of freedom: 402

Root Mean Squared Error: 15.3

R-Squared: 1, Adjusted R-Squared 1

F-statistic vs. zero model: 1.16e+06, p-value = 0

Firstly, I get an R2 value of 1. So I am a little confused on whether this result is good. And secondly, how would I know if the model is good?

Jim Frost says

Hi Keegan,

Assuming these are real data rather than numbers from a mathematical function, I’d guess that Matlab is rounding up for R-squared–i.e. 99.9999 to 1. If there’s even a little bit of noise in the data, you won’t have an R-squared of one. Given the extremely high R-squared, the model appears to provide a great fit to the data. Additionally, you have relatively many observations to the number of terms in the model, which is great.

Here are a couple of additional things I’d check.

Take a look at the residual plots just to be sure that it’s an unbiased fit.

Check the signs and magnitudes of the coefficient estimates to be sure they match theoretical values.

If those look good, then I’d say you have a good model!

Ahmed says

Hey Jim. I really appreciate this post. I learned something new today. In this short time I have also become bit of a fan of S 🙂 Thank you!

Jim Frost says

Hi Ahmed, that’s great! S needs fans because, you know, R-squared gets all of the attention! 🙂

Khushpal says

Hi Jim, Thanks for your efforts. I have 1 question around Standard error. When we run linear regression using Excel, there are 2 Standard Error displayed.

The first Standard Error is in the Summary section, and the other is a column alongside the Coefficient column.

Can you please explain the difference?

Thanks in Advance

Jim Frost says

Hi Khushpal,

It’s been awhile since I’ve used Excel to perform regression! I looked into it. The standard error of the regression in Excel is referred to as SEY or S(Y). In the 2X5 matrix of results, the standard error of the regression is in the 2nd column, 3rd row. You can read about that in my post about the standard error of the regression.

I believe the other standard error is for the coefficient. I’m not sure how it works in Excel when you have more than one coefficient. Typically, you’d have a standard error for each coefficient. The standard error is the standard deviation of the sampling distribution for coefficient estimates. Smaller values represent more precise estimates. However, you don’t usually interpret standard errors for parameter estimates directly. Standard errors are used to calculate confidence intervals, which I find are easier to interpret (outside of the standard error of the regression).

I hope this helps!

Khurram Siddiqui says

Hi Jim,

Thanks for the detail – I was comparing the unnormalized SSE, e.g. Sum(Yi – Y^i), not the normalized one, SQRT(SSE/N).

That’s why I was a bit confused – you covered the normalized one, not the unnormalized one. That makes sense. Thanks for giving such insights.

Geoff says

Hi Jim, thanks for the explanation. To explain why I am asking this question, we do a regular test where the “X” values are markers that never change. The “Y” values are the positions of the marker in a solution. We run a Regression to see if the markers are sufficiently close to the fitted line. If I look at the equations for Standard Error of the Regression and Standard Error of the Slope, the only difference is, SE of the Slope has sqrt(∑(Xi – X-bar)^2) in the denominator. Since the Xi and X-bar values are always the same, it seems in this case that SE of the Slope can be used with the same validity as SE of Regression. In fact, the SE of Slope value is always > 1 and SE of Regression is a tiny decimal, which is not as friendly if checking visually.

SE Regression = sqrt(∑(Yi – Y-hat)^2 / (n-2))

SE Slope = sqrt(∑(Yi – Y-hat)^2 / (n-2)) / sqrt(∑(Xi – X-bar)^2)

Jim Frost says

Hi Geoff, thanks for providing the context, which is always key in statistics!

Yes, I think you’re correct about that. Essentially the denominator is a constant in your scenario. You’re just dividing the SE Regression value by a constant. If you’re looking for a more user friendly value and only need to identify smaller values (rather than interpreting the meaning of the value), I agree with what you’re saying. Additionally, if you have standards for the SE Regression value that you have to meet, you can use that constant value to map SE Regression values to SE Slope values.

Be aware that if you ever change the X values, that’s all thrown off. You wouldn’t be able to compare the pre- and post-change SE slope values to each other and you’d need to re-calculate any standards– at least for how you want to use them as a proxy for SE Regression.

Usually, the two statistics provide different types of information that you’d use for different reasons and that was what was confusing me earlier!
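The relationship discussed in this thread can be checked numerically. The sketch below uses made-up data and a hypothetical function name, and it assumes a simple regression with one predictor and a constant. It shows that the SE of the slope is the SE of the regression divided by sqrt(∑(Xi – X-bar)^2), which is a constant as long as the X values never change.

```python
import math

def se_regression_and_slope(x, y):
    """Return (SE of the regression, SE of the slope, Sxx) for a simple regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return s, s / math.sqrt(sxx), sxx

# Hypothetical fixed X markers with two different sets of Y readings.
x = [1, 2, 3, 4, 5]
run_a = [10, 12, 15, 19, 24]
run_b = [11, 12, 16, 18, 25]

s_a, se_slope_a, sxx = se_regression_and_slope(x, run_a)
s_b, se_slope_b, _ = se_regression_and_slope(x, run_b)

# With the X values fixed, the ratio S / SE(slope) is the same for every run.
print(f"run A: S = {s_a:.4f}, SE slope = {se_slope_a:.4f}, ratio = {s_a / se_slope_a:.4f}")
print(f"run B: S = {s_b:.4f}, SE slope = {se_slope_b:.4f}, ratio = {s_b / se_slope_b:.4f}")
```

Because the ratio depends only on the X values, ranking runs by SE of the slope gives the same order as ranking them by S, but only while the X values stay the same.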

Geoff says

Hi Jim, I just wondered if you had any feedback on my comment on April 15. Thanks

Jim Frost says

Hi Geoff, so sorry, I somehow missed your original comment. I’ll get back to you shortly about that. (I’m a little travel weary at the moment!)

Julia Hunter says

Hey Jim,

Thank you for your help regarding the regression constant. This helps explain the errors I may face with making my y-intercept = 0. However, I still need to calculate the standard error of regression. If I am setting my y-intercept to 0, what is the best equation for this situation? I have found the following equation, but want to make sure it is correct:

s = √((∑y^2 – b0 ∑y – b1 ∑xy) / (n-2))

I assume that the middle part of the equation (b0 ∑y) would just be equal to 0 because of the forced y-intercept to 0.

Thank you for all of your help!

Jim Frost says

Hey Julia,

Sorry, when I meant that it affects the standard error of the regression (s), I meant that it can affect the value of s rather than the equation itself. As you can see in the equation below, you’re simply summing the squared differences between the observed and fitted values and dividing by the DF–and then taking the square root of that. When you force the equation through zero, you still sum the squared differences. However, the fitted values will be different if the forced zero intercept changes the slope of the fitted line.

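The standard error of the regression (s) is the square root of the mean square error (MSE):

$$S = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - p - 1}}$$

where:

n = the number of observations.

p = the number of terms in the model not counting the constant.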
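To illustrate how forcing the line through zero can change the value of s, here is a sketch that fits the same made-up data with and without a constant. It assumes the error degrees of freedom equal the number of observations minus the number of estimated coefficients (n - 2 with a constant, n - 1 without one for a single predictor). The function names are hypothetical.

```python
import math

def s_with_constant(x, y):
    """S for a simple regression that includes a constant (error DF = n - 2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))

def s_through_origin(x, y):
    """S for a line forced through zero (assumed error DF = n - 1)."""
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
    sse = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 1))

# Hypothetical data whose best-fit line does not pass through the origin.
x = [1, 2, 3, 4, 5]
y = [10, 12, 15, 19, 24]

s_const = s_with_constant(x, y)
s_origin = s_through_origin(x, y)
print(f"S with a constant:       {s_const:.4f}")
print(f"S forced through origin: {s_origin:.4f}")
```

In this made-up case the forced zero intercept changes the slope and the fitted values, so the residuals and S are larger.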

Julia Hunter says

Hey Jim,

I am comparing data with a forced zero intercept. Does forcing the intercept to zero affect the outcome of the standard error of regression?

Thank you for the better understanding of standard error of regression.

Jim Frost says

Hi Julia,

Yes, it does! I’ve written a blog post about the regression constant that touches on this. Towards the end, I talk about models where you leave the constant out, which forces the fitted line to go through the origin. This affects both R-squared and the standard error of the regression. For reasons that I discuss in that post, you have to be very careful when you force the line to go through zero!

I hope this helps!

Allif says

Thank you so much, Jim

Geoff says

Hi Jim, great stuff. Would you consider the combined Standard Error of the Slope and the Standard Error of the Intercept as useful as the Standard Error of the Regression (after summing the slope and intercept variances, then taking the square root)?

Jim Frost says

Hi Geoff, sorry for the delay in replying to this! I don’t think those two standard errors (of the slope and of the intercept) are as useful as the standard error of the regression because they provide a different kind of information.

The standard error of the regression tells you how far the observations tend to fall from the fitted values. It’s essentially the standard deviation for the population of residuals. That seems to be useful information because it’s telling you in absolute terms the typical size of a residual. You can also obtain a similar type of information with prediction intervals.

The standard error of the slope tells you the standard deviation of the sampling distribution for the slope. And for the constant, it’s the standard deviation of the sampling distribution for the constant. I don’t find that type of information to be as useful. They’re starting to get at the precision of the estimates for the slope and the constant. However, if you’re looking for that type of information, I’d recommend obtaining the confidence intervals for the slopes and the constant instead. These CIs provide ranges that are likely to include the population values for the slope and the constant.
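Here is a small sketch that computes all three standard errors and the confidence intervals side by side for a simple regression. The data are made up, and the critical value 2.306 is the two-sided 95% value of Student’s t for 8 degrees of freedom, hardcoded so that the example needs only the standard library.

```python
import math

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.3, 2.9, 4.2, 4.8, 6.1, 6.7, 8.2, 8.8, 10.1, 10.9]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the regression: typical size of a residual, in units of y.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard errors of the coefficients: precision of the estimates.
se_b1 = s / math.sqrt(sxx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)

# 95% confidence intervals; 2.306 is t(0.975) for n - 2 = 8 degrees of freedom.
T_CRIT = 2.306
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)
ci_b0 = (b0 - T_CRIT * se_b0, b0 + T_CRIT * se_b0)

print(f"S = {s:.3f}")
print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, 95% CI = ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
print(f"constant = {b0:.3f}, SE = {se_b0:.3f}, 95% CI = ({ci_b0[0]:.3f}, {ci_b0[1]:.3f})")
```

Notice that the standard error of the slope is S divided by the square root of the spread in X, so it depends on the X values as well as on the size of the residuals.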

I hope this helps!

Jozef says

Dear Jim, thanks for your explanation.

I have a question concerning this topic. When I used two models (Cubic and Power), I received these values for the Standard Error of Estimate: Cubic – 213, Power – 0.113. Based on these two numbers, the Power model is a better model than the Cubic model. But when I look at the graphs, I see that the curve for the Cubic model describes my data better than the Power model curve. Is there any explanation for why this happens?

Many thanks.

Jim Frost says

Hi Jozef,

Without fully understanding the model you fit, it’s hard to say. However, two possibilities come to mind.

1. Is it possible that differences in graph scaling are making it appear that the Cubic model fits better when it actually doesn’t?

2. Are you taking the log of both sides of the equation for the power model?

I suspect that #2 is more likely the case. If that is true, keep in mind that S is based on the transformed data rather than the natural units. In this case, it is not valid to compare S and R-squared between these models.
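As a rough illustration of the second possibility, the sketch below fits a power model by taking logs of both sides and then computes S twice: once on the log scale, which is what the software would report for the transformed model, and once after converting the fitted values back to the original units. The data and the helper names `fit_line` and `standard_error` are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def standard_error(y, fitted, n_coefs):
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return math.sqrt(sse / (len(y) - n_coefs))

# Hypothetical data, invented for illustration only (roughly y = 100 * x^2).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [105.0, 390.0, 920.0, 1580.0, 2550.0, 3500.0]

# Power model y = A * x^b, fit as the linear model ln(y) = a + b * ln(x).
log_x = [math.log(xi) for xi in x]
log_y = [math.log(yi) for yi in y]
a, b = fit_line(log_x, log_y)

# S on the log scale: its units are ln(y), not y.
s_log = standard_error(log_y, [a + b * lx for lx in log_x], n_coefs=2)

# The same fitted curve, with residuals measured in the original units of y.
fitted_natural = [math.exp(a) * xi ** b for xi in x]
s_natural = standard_error(y, fitted_natural, n_coefs=2)

print(f"S in log units:     {s_log:.4f}")
print(f"S in original units: {s_natural:.1f}")
```

The same fitted curve yields a tiny S on the log scale and a much larger one in the original units, so a small S from a log-transformed model cannot be compared directly with the S from a model fit in natural units.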

Wordpektif says

Hi Jim,

Thanks for your amazing explanation! But I am still confused about the units for S. How does S (S^2 = SSE / (N – P)) give a percentage?

Thanks so much.

Jim Frost says

Hi, please note that, as I explain in the blog post, S is unlike R-squared because it is not a percentage. However, I can see how the specific example I used can create a bit of confusion.

The standard error of the regression (S) provides the absolute measure of the typical distance that the data points fall from the regression line. S is in the units of the dependent variable.

For the specific example that I used, the dependent variable is body fat percentage. So, in this case, the units of the dependent variable are percentages, which means that S is a percentage. However, suppose we create a different model where the dependent variable is weight in kilograms. For that model, S represents kilograms.

I hope this clarifies things. That was a great question!

Troy says

Great information! I’m studying stats and I’m wondering if there is a coefficient of variation that can be used to measure the variation in the regression line? And is it possible to interpret the standard error of the regression as a percentage without units?

Colin Ware says

Thanks so much.

-Colin

Colin Ware says

Dear Jim,

I want to compare two models using the criteria of which best fits the data through a linear regression.

E.g. model A gives me r2 = 0.9, model B gives me r2 = 0.97.

The models are each designed to account for the same 300 observations, and there are two model parameters.

Can I take the ratio of the residual variance Ra =(1-r2) for A and the residual variance (Rb) for B and use this to do a simple F test?

Perhaps where the number of degrees of freedom = 300 -4. The number 4 relates to the number of degrees of freedom in my model (2) and the number of degrees of freedom in the regression (2).

For this example, the F ratio will be 0.1/0.03 = 3.33 and is highly significant.

Hope this all makes sense. Also, if it does, is there a standard reference I can use in a publication?

Jim Frost says

Hi Colin, statistically, that approach sounds OK. F-tests are good for comparing different models. The publication I know offhand that discusses this is the standard linear model textbook that I always recommend: Applied Linear Regression Models by Kutner et al. Here’s a paper from Duke about using F-tests to compare models.

As you do this, keep the following points in mind:

– check the residual plots

– make sure that the models (i.e., coefficient signs and magnitudes) make theoretical sense

It’s never good practice to choose the final model based solely on statistical significance. The residuals have to look good and the model has to make sense.

I hope this helps!

Paul says

Sorry – I meant

1 – SS(residual) / SS(total)

Paul says

Thanks Jim!

Those links are very helpful. Although I now can’t help wondering *why* this identity:

Explained variance + Error variance = Total variance

no longer holds for nonlinear models. Is there a proof anywhere that you can point me to? I understand that Spiess and Neumeyer have done an experimental study, it’s just that I’d like to convince myself one way or the other about the particular technique I’m using.

A related question: If I define R-squared a different way, say as:

1 – SS(regression) / SS(total)

is this any better for the nonlinear case?

thanks!

Jim Frost says

Hi Paul, apologies for the delay in replying. I haven’t looked into a specific proof about why this is true, but it has to exist. The key point is that R-squared was designed for linear regression, which is a very specific case. When you think about all of the potential model forms (virtually infinite), linear models are just one very restricted type. In a way, it’s surprising how often linear models provide an adequate fit. We like them because they’re easier to interpret, we can get p-values for the predictors, and, of course, R-squared. However, for all the math to work out correctly, you need the overarching framework that linear models provide. Once you move outside that framework, linear calculations literally don’t add up correctly!

I don’t think redefining R-squared resolves the underlying problem.

I know this doesn’t answer your specific question directly, but that’s the gist of the problem.
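This is not a proof, but a quick numeric check shows what “don’t add up” means. The sketch below uses made-up data and compares the three sums of squares for an ordinary least squares line with a constant and for a one-parameter nonlinear model, y = exp(b * x), fit by least squares with a simple ternary search. The choice of model and the search bounds are my own assumptions for the example.

```python
import math

def sums_of_squares(y, fitted):
    """Return (total, regression, error) sums of squares around the mean of y."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sst, ssr, sse

# Hypothetical data, invented for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.1, 3.9, 8.5, 15.0]
n = len(x)

# Linear model with a constant, fit by ordinary least squares.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
lin_sst, lin_ssr, lin_sse = sums_of_squares(y, [b0 + b1 * xi for xi in x])

# Nonlinear model y = exp(b * x), fit by least squares using a ternary search
# over b in [0, 2] (bounds chosen by eye for this data set).
def sse_for(b):
    return sum((yi - math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))

lo, hi = 0.0, 2.0
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if sse_for(m1) < sse_for(m2):
        hi = m2
    else:
        lo = m1
b_hat = (lo + hi) / 2
nl_sst, nl_ssr, nl_sse = sums_of_squares(y, [math.exp(b_hat * xi) for xi in x])

print(f"linear:    SSR + SSE = {lin_ssr + lin_sse:.4f}, SST = {lin_sst:.4f}")
print(f"nonlinear: SSR + SSE = {nl_ssr + nl_sse:.4f}, SST = {nl_sst:.4f}")
```

For the linear fit, the two numbers agree to rounding error. For the nonlinear fit, they differ because the residuals are no longer uncorrelated with the fitted values, which leaves a cross-product term that the identity ignores.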

Paul says

Hi Jim,

thanks for this great article! I have one question though: I’m completely unclear about why R-squared would be an appropriate measure of fit, if I have a model that is linear in the parameters, but potentially highly nonlinear in terms of the mapping from inputs to outputs. The reason I ask is that I’m using Gaussian Process models for regression, which can produce highly nonlinear interpolant functions, but the response is linear in the parameters. I can’t decide if R-squared is a valid measure or not! Any pointers?

thanks,

Paul

Jim Frost says

Hi Paul,

Thanks! As you surmise, it comes down to whether the model is linear in the parameters. If it is, you have a linear model and R-squared is appropriate. To be sure that we’re on the same page about what “linear in the parameters” means, read my post about the difference between linear and nonlinear models.

As for why it is appropriate, it’s based on the math behind the scenes. I talk about this in a post where I explain why R-squared is not valid for nonlinear models. I include a reference that can help you understand in more depth.

Also, be aware that if you transform the data, which is a way to use a linear model to fit nonlinear data, then both R-squared and S apply to the transformed data rather than the original data. This fact can cause both of these statistics to be misleading.

I hope this helps!

Jim

YS KIM says

Thanks for the post. I am taking a STAT course now, and this post is really helpful for understanding R^2 vs. SE.

Jim Frost says

You’re very welcome. I’m happy to hear that you found it to be helpful! Best of luck with your Stats course!

Meysam Rahmanian Shahri says

Hi, Thank you for your complete explanation.

Actually, we have an experimental data set and several power correlations that try to estimate the experimental data. I want to know which of these correlations provides the best goodness of fit!

How can we determine the SEE of our correlations? Can we transform a power correlation to a linear correlation and then calculate the R-squared and SSE of the linear correlation as criteria for comparing the goodness of fit between the correlations? I ask because I think the SSE calculated for the power correlation and for the transformed (linear) correlation would not be the same. What do you recommend as a more quantitative way to compare the goodness of fit between the correlations?

Thanks in advance for your help.

Jim Frost says

Hi, I’m not completely clear about the situation you are describing. Can you fit the different nonlinear models and compare the SEE?

Meysam Rahmanian Shahri says

Hi Jim, thank you for your amazing explanation.

If a nonlinear function such as a power function is used to fit the data set, can we convert the nonlinear function to a linear function just by taking the Ln of the dependent and independent variables and then use R-squared to show the goodness of fit?

Jim Frost says

Hi Meysam, there are some ways of using linear models to fit nonlinear functions, such as using logs to transform the data. In this case, the linear model produces an R-squared value. However, the R-squared value is applicable only to the transformed data and not the original data.

Meysam Rahmanian Shahri says

Hi Jim, Do you think that using Mean Relative Error (MRE) is an appropriate way to evaluate the goodness of fit for non linear functions?

Jim Frost says

Hi Meysam, I’m not familiar with using MRE to evaluate nonlinear regression models, so I can’t comment.

Tugba says

Hi Jim. Thank you very much for the explanation! Since the program can calculate SSE, I can calculate S! I have one more question for clarification. You said N is the number of observations. It is probably a very basic statistical definition, but my understanding is that the number of observations is the number of data points (for example, in an x-y scatter plot). Is my thinking correct? Again, thanks!

One side note. In general, your website is excellent! Easy to read, follow and understand! I have read some of your other posts, and I have just started to read the basics too. Moreover, your responses are quick and super helpful! Thank you very much!

Jim Frost says

Hi, yes, that’s correct about data points.

Thanks for the kind words about my website. I really appreciate them, and they give me good motivation to keep going with it!

Happy new year!

Khurram Siddiqui says

Hi Jim,

Nice article. Two questions here:

What about SE increasing when more variables come into the model? If we built the same model with twice as many variables, might SE be twice as big?

SE tells you the absolute value of your line’s goodness of fit. On the other hand, R-squared tells you how good you are (in %) compared with your baseline. SE does not tell you this.

Thanks

Jim Frost says

Hi Khurram,

For the same dataset, R-squared and S will tend to move in opposite directions based on your model. As R-squared increases, S will tend to get smaller. Remember, smaller is better for S. R-squared never decreases when you add a variable, even one that’s not statistically significant. However, S is more like adjusted R-squared. Adjusted R-squared increases only when you add a good independent variable (technically, |t| > 1). S uses the same adjustment as adjusted R-squared, so it decreases only when you add “good” variables. If you add twice as many variables to your model, S should decrease unless all of those extra variables are not good. Read my post about adjusted R-squared for more details about that.

Yes, S tells you the absolute value for the standard distance that the residuals fall from the fitted values. R-squared indicates the percentage of the variance of the dependent variable around its mean that the model accounts for. Different types of information–absolute versus relative measures. You can use them both as needed.
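To see the adjustment at work, here is the smallest case I can think of, sketched with made-up data: a model with only the constant versus the same model after adding one predictor that has almost nothing to do with the response. The helper name `fit_stats` is hypothetical.

```python
import math

# Hypothetical data, invented for illustration only: x is nearly unrelated to y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 4.0, 6.0, 2.0, 5.0]
n = len(y)

y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)

def fit_stats(sse, n_coefs):
    """R-squared, adjusted R-squared, and S for a model with n_coefs coefficients."""
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - n_coefs)) / (sst / (n - 1))
    s = math.sqrt(sse / (n - n_coefs))
    return r2, adj_r2, s

# Model 1: constant only. Every fitted value is the mean, so SSE equals SST.
r2_1, adj_1, s_1 = fit_stats(sst, n_coefs=1)

# Model 2: constant plus x, fit by ordinary least squares.
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse_2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2_2, adj_2, s_2 = fit_stats(sse_2, n_coefs=2)
t_stat = b1 / (s_2 / math.sqrt(sxx))

print(f"constant only: R2={r2_1:.3f}, adj R2={adj_1:.3f}, S={s_1:.3f}")
print(f"constant + x:  R2={r2_2:.3f}, adj R2={adj_2:.3f}, S={s_2:.3f}, t={t_stat:.2f}")
```

Because the t-value for x is below 1 in absolute value, R-squared still goes up, but adjusted R-squared goes down and S goes up. Both are penalized by the lost degree of freedom.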

I hope this helps!

Tugba says

Excellent explanation! Thank you very much! I use a nonlinear fitting function to fit my experimental data to an equation to estimate two parameters. I would like to use the Standard Error of Regression instead of R-squared. However, I could not find the Standard Error of Regression in the program I am using. Among the Goodness-of-Fit Statistics listed on the program’s website, I have seen Sum of Squares Due to Error, which is described as “This statistic measures the total deviation of the response values from the fit to the response values. It is also called the summed square of residuals and is usually labeled as SSE. A value closer to 0 indicates that the model has a smaller random error component, and that the fit will be more useful for prediction.” I wonder if you could explain the difference between these two statistics? Many thanks!

Jim Frost says

Hi Tugba, SSE and S are related. In fact, you need to calculate SSE before you can calculate S. SSE is the sum of the squared residuals. It’s true that higher values indicate higher error, but it’s impossible to interpret SSE by itself. For one thing, it’s in squared units rather than the natural units. It’s also a sum, so it grows with both the size of each squared residual and the number of residuals. A larger sample will have a larger SSE just because there are more residuals. As you add more residuals, this sum just keeps going up even when error is low. The formula for S is below:

S = √(SSE / (N – P))

N is the number of observations and P is the number of parameters. S takes the sum of squares and divides it by (N – P), which controls for the number of observations and parameters. Then it takes the square root so that the result is in natural units.
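A quick sketch of why SSE is hard to interpret on its own while S is not. The data are invented, and repeating the same observations three times is an artificial device used only to grow the sample without changing the fit. The helper name `fit_and_summarize` is hypothetical, and P is 2 here (a constant and a slope).

```python
import math

def fit_and_summarize(x, y):
    """Fit y = b0 + b1 * x by least squares and return (SSE, S) with P = 2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return sse, math.sqrt(sse / (n - 2))

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

sse_small, s_small = fit_and_summarize(x, y)

# The same observations repeated three times: same line, three times the residuals.
sse_big, s_big = fit_and_summarize(x * 3, y * 3)

print(f"N = {len(x)}:  SSE = {sse_small:.3f}, S = {s_small:.3f}")
print(f"N = {len(x) * 3}: SSE = {sse_big:.3f}, S = {s_big:.3f}")
```

SSE triples even though the scatter around the line has not changed at all, while S stays in the same neighborhood and remains in the units of the dependent variable.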

I hope this helps!

Garima Jain says

Very helpful. Great explanations. Thank you so so so much!

Jim Frost says

Thank you! I’m so glad to hear that it was helpful!

Kevin M Armengol says

you explain statistical concepts very well!

Jim Frost says

Thank you!

Michael Heitmeier says

Would it be correct to say that R-squared does not work for non-linear models because the mean (which the R2 calculation depends on) is not capturing the essence of non-linear data in the way that it does for linear data? A mean can be calculated for both but linear data varies around the mean in a much more straightforward way.

Jim Frost says

Hi Michael, that’s not quite it. There are some calculations for R-squared that literally don’t add up correctly in nonlinear regression. I’ve written a post about this which covers those calculations and the problems they produce: R-squared is Not Valid for Nonlinear Regression.

I hope this helps clarify things!

Jim

sid says

Hi Jim,

I am a beginner in statistics and have below doubts:

1) Is the standard error of the regression (S) the same as the “mean squared error (MSE)”, and is this S or MSE actually used to figure out the standard errors for b1 and b0, which are essentially estimates of B1 and B0, respectively, for the population regression line:

Y = B0 + B1*X + e

?

2) I didn’t get the below line:

“For the regression example, approximately 95% of the data points lie between the regression line and +/- 7% body fat”

It’ll be really helpful if you can guide me, how did we arrive at “7%” value ?

3) For the line just before this:

“About 95% of the data points are within a range that extends from +/- 2 * standard error of the regression from the fitted line.”

I believe, for population regression line :

Y = B0 + B1*X + e

95% confidence interval for B1 approximately takes the form:

b1 +- 2*SE(b1)

95% confidence interval for B0 approximately takes the form:

b0 +- 2*SE(b0)

and that is how we get the +/- 2 * standard error of the regression from the fitted line. Please correct me if I am wrong on this.

Thanks in advance!

hamza says

Great site for understanding statistics.

Thanks a lot, sir.

Jim Frost says

You’re very welcome! I’m glad you found it to be helpful!

yadawananda neog says

Really appreciable explanation.

Jim Frost says

Thank you!