The standard error of the regression (S) and R-squared are two key goodness-of-fit measures for regression analysis. While R-squared is the most well-known amongst the goodness-of-fit statistics, I think it is a bit over-hyped.

In this post, I’ll compare these two statistics. We’ll also work through a regression example to help make the comparison. I think you’ll see that the oft overlooked standard error of the regression can tell you things that the high and mighty R-squared simply can’t. At the very least, you’ll find that the standard error of the regression is a great tool to add to your statistical toolkit!

## Comparison of R-squared to the Standard Error of the Regression (S)

You can find the standard error of the regression, also known as the standard error of the estimate, near R-squared in the goodness-of-fit section of most statistical output. Both of these measures give you a numeric assessment of how well a model fits the sample data. However, there are differences between the two statistics.

- The standard error of the regression provides the absolute measure of the typical distance that the data points fall from the regression line. S is in the units of the dependent variable.
- R-squared provides the relative measure of the percentage of the dependent variable variance that the model explains. R-squared can range from 0 to 100%.
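To make the two definitions concrete, here is a minimal Python sketch that computes both statistics for a straight-line fit. The data are made up for illustration, and the helper names (`fit_line`, `goodness_of_fit`) are hypothetical rather than taken from any statistics package.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x, using the closed-form solution."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def goodness_of_fit(x, y, n_params=2):
    """Return (S, R-squared) for the fitted line; n_params counts the constant."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s = math.sqrt(sse / (len(y) - n_params))  # in the units of y
    r_squared = 1 - sse / sst                 # unitless proportion
    return s, r_squared

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

s, r2 = goodness_of_fit(x, y)

# Changing the units of y (here, multiplying by 10) rescales S
# but leaves R-squared unchanged.
s_scaled, r2_scaled = goodness_of_fit(x, [10 * yi for yi in y])
```

Rescaling the dependent variable changes S by the same factor and leaves R-squared where it was, which is the absolute-versus-relative distinction in the list above.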

An analogy makes the difference very clear. Suppose we’re talking about how fast a car is traveling.

R-squared is equivalent to saying that the car went 80% faster. That sounds a lot faster! However, it makes a huge difference whether the initial speed was 20 MPH or 90 MPH. The *increased* velocity based on the percentage can be either 16 MPH or 72 MPH, respectively. One is lame, and the other is very impressive. If you need to know exactly how much faster, the relative measure just isn’t going to tell you.

The standard error of the regression is equivalent to telling you directly how many MPH faster the car is traveling. The car went 72 MPH faster. Now that’s impressive!

Let’s move on to how we can use these two goodness-of-fit measures in regression analysis.

## Standard Error of the Regression and R-squared in Practice

In my view, the standard error of the regression has several advantages. S tells you straight up how precise the model’s predictions are using the units of the dependent variable. This statistic indicates how far the data points are from the regression line on average. You want lower values of S because it signifies that the distances between the data points and the fitted values are smaller. S is also valid for both linear and nonlinear regression models. This fact is convenient if you need to compare the fit between both types of models.

For R-squared, you want the regression model to explain higher percentages of the variance. Higher R-squared values indicate that the data points are closer to the fitted values. While higher R-squared values are good, they don’t tell you how far the data points are from the regression line. Additionally, R-squared is valid for only linear models. You can’t use R-squared to compare a linear model to a nonlinear model.

Note: Linear models can use polynomials to model curvature. I’m using the term linear to refer to models that are linear in the parameters. Read my post that explains the difference between linear and nonlinear regression models.

## Example Regression Model: BMI and Body Fat Percentage

This regression model describes the relationship between body mass index (BMI) and body fat percentage in middle school girls. It’s a linear model that uses a polynomial term to model the curvature. The fitted line plot indicates that the standard error of the regression is 3.53399% body fat. The interpretation of this S is that the standard distance between the observations and the regression line is 3.5% body fat.

S measures the precision of the model’s predictions. Consequently, we can use S to obtain a rough estimate of the 95% prediction interval. About 95% of the data points are within a range that extends from +/- 2 * standard error of the regression from the fitted line.

For the regression example, approximately 95% of the data points lie between the regression line and +/- 7% body fat.
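The 7% figure is just twice S, rounded. A small Python sketch of that arithmetic follows; the fitted value of 30.0 is a hypothetical number chosen only to show the interval, and the function name is illustrative.

```python
s = 3.53399  # standard error of the regression from the example, in % body fat

# Rough 95% prediction half-width: about two standard errors of the regression.
half_width = 2 * s

def rough_prediction_interval(fitted_value, s):
    """Approximate 95% range for an observation around a fitted value."""
    return fitted_value - 2 * s, fitted_value + 2 * s

# 30.0 is a hypothetical fitted body fat percentage, not a value from the post.
low, high = rough_prediction_interval(30.0, s)
```

This is only the rule-of-thumb version; a prediction interval from statistical software also accounts for uncertainty in the estimated coefficients.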

The R-squared is 76.1%. I have an entire blog post dedicated to interpreting R-squared. So, I won’t cover that in detail here.

**Related posts**: Making Predictions with Regression Analysis and Understand Precision in Applied Regression to Avoid Costly Mistakes

## I Often Prefer the Standard Error of the Regression

R-squared is a percentage, which seems easy to understand. However, I often appreciate the standard error of the regression a bit more. I value the concrete insight provided by using the original units of the dependent variable. If I’m using the regression model to produce predictions, S tells me at a glance if the model is sufficiently precise.

On the other hand, R-squared doesn’t have any units, and it feels more ambiguous than S. If all we know is that R-squared is 76.1%, we don’t know how wrong the model is on average. You do need a high R-squared to produce precise predictions, but you don’t know how high it must be exactly. It’s impossible to use R-squared to evaluate the precision of the predictions.

To demonstrate this, we’ll look at the regression example. Let’s assume that our predictions must be within +/- 5% of the observed values to be useful. If we know only that R-squared is 76.1%, can we determine whether our model is sufficiently precise? No, you can’t tell using R-squared.

However, you *can* use the standard error of the regression. For our model to have the required precision, S must be less than 2.5% because 2.5 * 2 = 5. In an instant, we know that our S (3.5) is too large. We need a more precise model. Thanks S!
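The same check can be written as a couple of lines of Python. This is a sketch of the rule of thumb only, with hypothetical function names.

```python
def max_allowed_s(required_half_width):
    """Largest S that keeps the rough 95% interval within the required half-width."""
    return required_half_width / 2

def is_precise_enough(s, required_half_width):
    """True when roughly 95% of observations fall within the required range."""
    return s < max_allowed_s(required_half_width)

s_model = 3.53399  # S from the BMI example, in % body fat
requirement = 5    # predictions must be within +/- 5% body fat

threshold = max_allowed_s(requirement)
model_ok = is_precise_enough(s_model, requirement)
```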

While I really like the standard error of the regression, you can, of course, consider both goodness-of-fit measures simultaneously. This is the statistical equivalent of having your cake and eating it!

If you’re learning regression, check out my Regression Tutorial!

yadawananda neog says

Really appreciable explanation.

Jim Frost says

Thank you!

hamza says

Great site for understanding statistics.

Thanks a lot, sir.

Jim Frost says

You’re very welcome! I’m glad you found it to be helpful!

sid says

Hi Jim,

I am a beginner in statistics and have the doubts below:

1) Is the standard error of the regression (S) the same as the “mean squared error (MSE)”, and is this S or MSE used to work out the standard errors of b1 and b0, which are the estimates of B1 and B0 for the population regression line:

Y = B0 + B1*X + e

?

2) I didn’t get the below line:

“For the regression example, approximately 95% of the data points lie between the regression line and +/- 7% body fat”

It’ll be really helpful if you can guide me, how did we arrive at “7%” value ?

3) For the line just before this:

“About 95% of the data points are within a range that extends from +/- 2 * standard error of the regression from the fitted line.”

I believe, for population regression line :

Y = B0 + B1*X + e

The 95% confidence interval for B1 approximately takes the form:

b1 +/- 2*SE(b1)

The 95% confidence interval for B0 approximately takes the form:

b0 +/- 2*SE(b0)

and that is how we get the +/- 2 * standard error of the regression from the fitted line. Please correct me if I am wrong on this.

Thanks in advance!

Michael Heitmeier says

Would it be correct to say that R-squared does not work for non-linear models because the mean (which the R2 calculation depends on) is not capturing the essence of non-linear data in the way that it does for linear data? A mean can be calculated for both but linear data varies around the mean in a much more straightforward way.

Jim Frost says

Hi Michael, that’s not quite it. There are some calculations for R-squared that literally don’t add up correctly in nonlinear regression. I’ve written a post about this which covers those calculations and the problems they produce: R-squared is Not Valid for Nonlinear Regression.

I hope this helps clarify things!

Jim

Kevin M Armengol says

you explain statistical concepts very well!

Jim Frost says

Thank you!

Garima Jain says

Very helpful. Great explanations. Thank you so so so much!

Jim Frost says

Thank you! I’m so glad to hear that it was helpful!

Tugba says

Excellent explanation! Thank you very much! I use a nonlinear fitting function to fit my experimental data to an equation to estimate two parameters. I would like to use the Standard Error of Regression instead of R-squared. However, I could not find the Standard Error of Regression in the program I am using. Among the Goodness-of-Fit Statistics listed on the program’s website, I have seen Sum of Squares Due to Error, which is described as “This statistic measures the total deviation of the response values from the fit to the response values. It is also called the summed square of residuals and is usually labeled as SSE. A value closer to 0 indicates that the model has a smaller random error component, and that the fit will be more useful for prediction.” I wonder if you could explain what the difference between these two methods is? Many thanks!

Jim Frost says

Hi Tugba, SSE and S are related. In fact, you need to calculate SSE before you can calculate S. SSE is the sum of the squared residuals. And, it’s true that higher values indicate higher error, but it’s impossible to interpret SSE by itself. For one thing, it’s in squared units rather than the natural units. For another, it’s a sum, so it grows with both the size of each squared residual and the number of residuals. A larger sample size will have a larger SSE just because there are more residuals. As you add more residuals, this sum just keeps going up even when error is low. The formula for S is below:

S = sqrt(SSE / (N-P))

N is the number of observations and P is the number of parameters. S takes the squared sum and divides it by (N-P). This controls for the number of observations and parameters. And, then takes the square root so it’s in natural units.
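If your software reports only SSE, the conversion described above is short enough to do by hand. Here is a minimal Python sketch; the function name and the SSE values are made up for illustration.

```python
import math

def standard_error_of_regression(sse, n_obs, n_params):
    """S = sqrt(SSE / (N - P)), in the units of the dependent variable."""
    if n_obs <= n_params:
        raise ValueError("need more observations than parameters")
    return math.sqrt(sse / (n_obs - n_params))

# Two hypothetical fits with very different SSE values but the same
# typical error per observation once sample size is accounted for.
small_sample = standard_error_of_regression(sse=98.0, n_obs=100, n_params=2)
large_sample = standard_error_of_regression(sse=998.0, n_obs=1000, n_params=2)
```

The second SSE is roughly ten times the first, yet both fits give the same S, which is why SSE alone is hard to interpret.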

I hope this helps!

Khurram Siddiqui says

Hi Jim,

Nice article. Two questions here:

What about SE increasing when more variables come into the model? If we built the same model with twice as many variables, might SE be twice as big?

SE tells you the absolute value of your line’s goodness of fit. On the other hand, R-squared tells you how good you are (in %) compared with your baseline. SE does not tell you this.

Thanks

Jim Frost says

Hi Khurram,

For the same dataset, R-squared and S will tend to move in opposite directions based on your model. As R-squared increases, S will tend to get smaller. Remember, smaller is better for S. R-squared will always increase as you add any variable, even when it’s not statistically significant. However, S is more like adjusted R-squared. Adjusted R-squared increases only when you add a good independent variable (technically, |t| > 1). S uses the same adjustment as adjusted R-squared. It will decrease only when you add “good” variables. If you add twice as many variables to your model, S should decrease unless all of those extra variables are not good. Read my post about adjusted R-squared for more details about that.
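A small Python sketch of that behavior, using hypothetical sums of squares rather than a real dataset. It assumes the usual formulas, with the parameter count including the constant; the three "models" are invented numbers chosen to show one weak and one useful added variable.

```python
import math

def r_squared(sse, sst):
    return 1 - sse / sst

def adjusted_r_squared(sse, sst, n_obs, n_params):
    # Compares the error variance to the total variance, each per degree of freedom.
    return 1 - (sse / (n_obs - n_params)) / (sst / (n_obs - 1))

def s_value(sse, n_obs, n_params):
    return math.sqrt(sse / (n_obs - n_params))

N, SST = 30, 1000.0  # hypothetical sample size and total sum of squares

def summarize(sse, n_params):
    return (
        r_squared(sse, SST),
        adjusted_r_squared(sse, SST, N, n_params),
        s_value(sse, N, n_params),
    )

# Hypothetical fits: a base model, then the same model plus one extra variable.
r2_base, adj_base, s_base = summarize(sse=400.0, n_params=2)
r2_weak, adj_weak, s_weak = summarize(sse=395.0, n_params=3)  # extra variable barely helps
r2_good, adj_good, s_good = summarize(sse=300.0, n_params=3)  # extra variable helps a lot
```

With the weak variable, R-squared still creeps up while adjusted R-squared falls and S rises. With the useful variable, all three move in the favorable direction.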

Yes, S tells you the absolute value for the standard distance that the residuals fall from the fitted values. R-squared indicates the percentage of the variance of the dependent variable around its mean that the model accounts for. Different types of information–absolute versus relative measures. You can use them both as needed.

I hope this helps!

Tugba says

Hi Jim. Thank you very much for the explanation! Since the program can calculate SSE, I can calculate S! I have one more question for clarification. You said N is the number of observations. It is probably a very basic statistical definition, but my understanding is that the number of observations is the number of data points (for example, in an x-y scatter plot). Is my thinking correct? Again, thanks!

One side note. In general, your website is excellent! Easy to read, follow and understand! I read some of your other posts and I have just started to read the basics too. Moreover, your responses are quick and super helpful! Thank you very much!

Jim Frost says

Hi, yes, that’s correct about data points.

Thanks for the kind words about my website. I really appreciate them, and they give me good motivation to keep going with it!

Happy new year!

Meysam Rahmanian Shahri says

Hi Jim, Thank you for your amazing explanation.

If a nonlinear function such as a power function is used to fit the data set, can we convert the nonlinear function to a linear function just by taking the Ln of the dependent and independent variables, and then use R-squared to show the goodness of fit?

Jim Frost says

Hi Meysam, there are some ways of using linear models to fit nonlinear functions, such as using logs to transform the data. In this case, the linear model produces an R-squared value. However, the R-squared value is applicable only to the transformed data and not the original data.

Meysam Rahmanian Shahri says

Hi Jim, Do you think that using Mean Relative Error (MRE) is an appropriate way to evaluate the goodness of fit for non linear functions?

Jim Frost says

Hi Meysam, I’m not familiar with using MRE to evaluate nonlinear regression models, so I can’t comment.

Meysam Rahmanian Shahri says

Hi, Thank you for your complete explanation.

Actually, we have an experimental data set and several power correlations that try to estimate the experimental data. I want to know which of these correlations provides the best goodness of fit!

How can we determine the SEE of our correlations? Can we transform the power correlation to a linear correlation and then calculate the R-squared and SSE of the linear correlation as criteria to compare the goodness of fit between the presented correlations? I ask because I think the calculated SSE for the power correlation and the transformed (linear) correlation would not be the same. What do you recommend as a more quantitative way to compare the goodness of fit between the correlations?

Thanks in advance for your help.

Jim Frost says

Hi, I’m not completely clear about the situation you are describing. Can you fit the different nonlinear models and compare the SEE?

YS KIM says

Thanks for the post. I am taking a STAT course now, and this post is really helpful for understanding R^2 vs. SE.

Jim Frost says

You’re very welcome. I’m happy to hear that you found it to be helpful! Best of luck with your Stats course!

Paul says

Hi Jim,

thanks for this great article! I have one question though: I’m completely unclear about why R-squared would be an appropriate measure of fit, if I have a model that is linear in the parameters, but potentially highly nonlinear in terms of the mapping from inputs to outputs. The reason I ask is that I’m using Gaussian Process models for regression, which can produce highly nonlinear interpolant functions, but the response is linear in the parameters. I can’t decide if R-squared is a valid measure or not! Any pointers?

thanks,

Paul

Jim Frost says

Hi Paul,

Thanks! As you surmise, it comes down to whether the model is linear in the parameters. If it is, you have a linear model and R-squared is appropriate. To be sure that we’re on the same page about what “linear in the parameters” means, read my post about the difference between linear and nonlinear models.

As for why it is appropriate, it’s based on the math behind the scenes. I talk about this in a post where I explain why R-squared is not valid for nonlinear models. I include a reference that can help you understand in more depth.

Also, be aware that if you transform the data, which is a way to use a linear model to fit nonlinear data, then both R-squared and S apply to the transformed data rather than the original data. This fact can cause both of these statistics to be misleading.

I hope this helps!

Jim

Paul says

Thanks Jim!

Those links are very helpful. Although I now can’t help wondering *why* this identity:

Explained variance + Error variance = Total variance

no longer holds for nonlinear models. Is there a proof anywhere that you can point me to? I understand that Spiess and Neumeyer have done an experimental study, it’s just that I’d like to convince myself one way or the other about the particular technique I’m using.

A related question: If I define R-squared a different way, say as:

1 – SS(regression) / SS(total)

is this any better for the nonlinear case?

thanks!

Jim Frost says

Hi Paul, apologies for the delay in replying. I haven’t looked into a specific proof about why this is true, but it has to exist. The key point is that R-squared was designed for linear regression, which is a very specific case. When you think about all of the potential model forms (virtually infinite), linear models are just one very restricted type. In a way, it’s surprising how often linear models provide an adequate fit. We like them because they’re easier to interpret, we can get p-values for the predictors, and, of course, R-squared. However, for all the math to work out correctly, you need the overarching framework that linear models provide. Once you move outside that framework, linear calculations literally don’t add up correctly!

I don’t think redefining R-squared resolves the underlying problem.

I know this doesn’t answer your specific question directly, but that’s the gist of the problem.

Paul says

Sorry – I meant

1 – SS(residual) / SS(total)
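For a linear least-squares fit that includes a constant, the two ways of writing R-squared agree because the sums of squares add up. Here is a small Python sketch with made-up data that checks only the linear case; it says nothing either way about nonlinear fits.

```python
# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least-squares line y = b0 + b1*x.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Two equivalent definitions of R-squared in the linear case.
r2_from_regression = ss_regression / ss_total
r2_from_residual = 1 - ss_residual / ss_total
```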

Colin Ware says

Dear Jim,

I want to compare two models using the criteria of which best fits the data through a linear regression.

E.g. model A gives me r2 = 0.9, model B gives me r2 = 0.97.

The models are each designed to account for the same 300 observations, and there are two model parameters.

Can I take the ratio of the residual variance Ra =(1-r2) for A and the residual variance (Rb) for B and use this to do a simple F test?

Perhaps where the number of degrees of freedom = 300 -4. The number 4 relates to the number of degrees of freedom in my model (2) and the number of degrees of freedom in the regression (2).

For this example, the F ratio will be 0.1/0.03 = 3.33 and is highly significant.

Hope this all makes sense. Also, if it does, is there a standard reference I can use in a publication?

Jim Frost says

Hi Colin, statistically, that approach sounds OK. F-tests are good for comparing different models. The publication I know offhand that discusses this is the standard linear model textbook that I always recommend: Applied Linear Regression Models by Kutner et al. Here’s a paper from Duke about using F-tests to compare models.

As you do this, keep the following points in mind:

– check the residual plots

– make sure that the models (i.e., coefficient signs and magnitudes) make theoretical sense

It’s never good practice to choose the final model based solely on statistical significance. The residuals have to look good and the model has to make sense.

I hope this helps!

Colin Ware says

Thanks so much.

-Colin

Troy says

Great information! I’m studying stats and I’m wondering if there is a coefficient of variation that can be used to measure the variation in the regression line? And is it possible to interpret the standard error of the regression as a percentage without units?

Wordpektif says

Hi Jim,

Thanks for your amazing explanation! But I am still confused about the unit for S. How does S (S^2 = SSE / (N-P)) give a percent unit?

Thanks so much.

Jim Frost says

Hi, please note that as I explain in the blog post, that S is unlike R-squared because it is not a percentage. Although, I can see how the specific example I used can create a bit of confusion.

The standard error of the regression (S) provides the absolute measure of the typical distance that the data points fall from the regression line. S is in the units of the dependent variable.

For the specific example that I used, the dependent variable is body fat percentage. So, in this case the unit of the dependent variable is a percentage, which means that S is a percentage. However, suppose we create a different model where the dependent variable is weight in kilograms. For that model, S represents kilograms.

I hope this clarifies things. That was a great question!

Jozef says

Dear Jim, thanks for your explanation.

I have got a question concerning this topic. When I used two models (Cubic and Power), I received values of the Standard Error of Estimate (Cubic: 213, Power: 0.113). Based on these two numbers, the Power model is a better model than the Cubic model. But when I look at the graphs, I see that the curve for the Cubic model describes my data better than the Power model curve. Is there any explanation for why this happens?

Many thanks.

Jim Frost says

Hi Jozef,

Without fully understanding the model you fit, it’s hard to say. However, two possibilities come to mind.

1. Is it possible that differences in graph scaling are making it appear that the Cubic model fits better when it actually doesn’t?

2. Are you taking the log of both sides of the equation for the power model?

I suspect that #2 is more likely the case. If that is true, keep in mind that S is based on the transformed data rather than the natural units. In this case, it is not valid to compare S and R-squared between these models.
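To see how large the gap can be, here is a minimal Python sketch that fits a power model by taking logs of both sides and then computes S twice: once on the log scale and once after converting the fitted values back to the original units. The data, the true exponent, and the alternating 10% errors are all made up for illustration.

```python
import math

# Made-up data following y = 100 * x^1.5 with alternating +/-10% errors.
x = [1, 2, 4, 8, 16, 32]
noise = [1.1, 0.9, 1.1, 0.9, 1.1, 0.9]
y = [100 * xi ** 1.5 * f for xi, f in zip(x, noise)]

# Fit ln(y) = b0 + b1 * ln(x) with closed-form least squares.
log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]
n = len(x)
lx_bar = sum(log_x) / n
ly_bar = sum(log_y) / n
b1 = sum((lx - lx_bar) * (ly - ly_bar) for lx, ly in zip(log_x, log_y)) / sum(
    (lx - lx_bar) ** 2 for lx in log_x
)
b0 = ly_bar - b1 * lx_bar
log_fitted = [b0 + b1 * lx for lx in log_x]

# S on the transformed (log) scale: this is what the software would report.
sse_log = sum((ly - lf) ** 2 for ly, lf in zip(log_y, log_fitted))
s_log = math.sqrt(sse_log / (n - 2))

# S in the original units, using back-transformed fitted values.
sse_orig = sum((yi - math.exp(lf)) ** 2 for yi, lf in zip(y, log_fitted))
s_orig = math.sqrt(sse_orig / (n - 2))
```

The log-scale S is a small decimal while the original-scale S is in the hundreds for the same fit, so a value like 0.113 from a log-transformed power model is not on the same scale as 213 from a cubic model fit to the raw data.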

Geoff says

Hi Jim, great stuff. Would you consider the combined Standard Error of the Slope and the Standard Error of the Intercept as useful as the Standard Error of the Regression (after summing the slope and intercept variances, then taking the square root)?

Jim Frost says

Hi Geoff, sorry for the delay in replying to this! I don’t think those two standard errors (of the slope and of the intercept) are as useful as the standard error of the regression because the information they provide is different.

The standard error of the regression tells you how far the observations tend to fall from the fitted values. It’s essentially the standard deviation for the population of residuals. That seems to be useful information because it’s telling you in absolute terms the typical size of a residual. You can also obtain a similar type of information with prediction intervals.

The standard error of the slope tells you the standard deviation of the sampling distribution for the slope. And for the constant, it’s the standard deviation of the sampling distribution for the constant. I don’t find that type of information to be as useful. They’re starting to get at the precision of the estimates for the slope and the constant. However, if you’re looking for that type of information, I’d recommend obtaining the confidence intervals for the slopes and the constant instead. These CIs provide ranges that are likely to include the population values for the slope and the constant.

I hope this helps!

Allif says

Thank you so much jim

Julia Hunter says

Hey Jim,

I am comparing data with a forced zero intercept. Does forcing the intercept to zero affect the outcome of the standard error of regression?

Thank you for the better understanding of standard error of regression.

Jim Frost says

Hi Julia,

Yes it does! I’ve written a blog post about the regression constant that touches on this. Towards the end, I talk about models where you leave the constant out of the model, which forces the fitted line to go through the origin. This affects both R-squared and the standard error of the regression. For reasons that I discuss in that post, you have to be very careful when you force the line to go through zero!

I hope this helps!

Julia Hunter says

Hey Jim,

Thank you for your help regarding the regression constant. This helps explain the errors I may face with making my y-intercept = 0. However, I still need to calculate the standard error of regression. If I am setting my y-intercept to 0, what is the best equation for this situation? I have found the following equation, but want to make sure it is correct:

s = √((∑y^2 – b0 ∑y – b1 ∑xy) / (n-2))

I assume that the middle part of the equation (b0 ∑y) would just be equal to 0 because of the forced y-intercept to 0.

Thank you for all of your help!

Jim Frost says

Hey Julia,

Sorry, when I said that it affects the standard error of the regression (s), I meant that it can affect the value of s rather than the equation itself. As you can see in the equation below, you’re simply summing the squared differences between the observed and fitted values and dividing by the DF, and then taking the square root of that. When you force the equation through zero, you still sum the squared differences. However, the fitted values will be different if the forced zero intercept changes the slope of the fitted line.

The standard error of the regression (s) is the square root of the mean square error (MSE), where:

MSE = ∑(Yi – Y-hat)^2 / (n – p – 1)

p = the number of terms in the model not counting the constant.
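For the forced-zero case, here is a minimal Python sketch with made-up data. With the constant dropped, only the slope is estimated, so this sketch divides by n – 1. That degrees-of-freedom choice is an assumption made here, not something stated in the thread. The sketch also checks that the shortcut form ∑y^2 – b1 ∑xy gives the same sum of squared residuals as summing them directly.

```python
import math

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 3.9, 6.1, 8.0, 9.8]
n = len(x)

# Least-squares slope for a line forced through the origin: y = b1 * x.
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi ** 2 for xi in x)
b1 = sum_xy / sum_xx

# Sum of squared residuals, computed directly from the fitted values.
sse_direct = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Shortcut form from the question, with the b0 term equal to zero.
sse_shortcut = sum(yi ** 2 for yi in y) - b1 * sum_xy

# Assumption: one estimated parameter (the slope), so DF = n - 1.
s_origin = math.sqrt(sse_direct / (n - 1))
```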

Geoff says

Hi Jim, I just wondered if you had any feedback on my comment on April 15. Thanks

Jim Frost says

Hi Geoff, so sorry, I somehow missed your original comment. I’ll get back to you shortly about that. (I’m a little travel weary at the moment!)

Geoff says

Hi Jim, thanks for the explanation. To explain why I am asking this question, we do a regular test where the “X” values are markers that never change. The “Y” values are the positions of the marker in a solution. We run a Regression to see if the markers are sufficiently close to the fitted line. If I look at the equations for Standard Error of the Regression and Standard Error of the Slope, the only difference is that SE of the Slope has sqrt(∑(Xi – X-bar)^2) in the denominator. Since the Xi and X-bar values are always the same, it seems in this case that SE of the Slope can be used with the same validity as SE of Regression. In fact, the SE of Slope value is always > 1 and SE of Regression is a tiny decimal, which is not as friendly if checking visually.

SE Regression = sqrt(∑(Yi – Y-hat)^2 / (n-2))

SE Slope = sqrt(∑(Yi – Y-hat)^2 / (n-2)) / sqrt(∑(Xi – X-bar)^2)

Jim Frost says

Hi Geoff, thanks for providing the context, which is always key in statistics!

Yes, I think you’re correct about that. Essentially the denominator is a constant in your scenario. You’re just dividing the SE Regression value by a constant. If you’re looking for a more user friendly value and only need to identify smaller values (rather than interpreting the meaning of the value), I agree with what you’re saying. Additionally, if you have standards for the SE Regression value that you have to meet, you can use that constant value to map SE Regression values to SE Slope values.
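A short Python sketch of that mapping, using made-up marker positions and two hypothetical test runs. It assumes a straight-line fit with a constant, so the degrees of freedom are n – 2 as in the formulas above; the function name is illustrative.

```python
import math

def se_regression_and_slope(x, y):
    """Return (SE of the regression, SE of the slope) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_reg = math.sqrt(sse / (n - 2))
    se_slope = se_reg / math.sqrt(sxx)
    return se_reg, se_slope

# Fixed X markers (made up) and two hypothetical runs of Y measurements.
x = [10, 20, 30, 40, 50]
run_1 = [1.02, 1.98, 3.05, 3.96, 5.01]
run_2 = [1.10, 1.85, 3.20, 3.90, 4.95]

se_reg_1, se_slope_1 = se_regression_and_slope(x, run_1)
se_reg_2, se_slope_2 = se_regression_and_slope(x, run_2)

# With fixed X values, the ratio between the two statistics is the same constant.
ratio_1 = se_reg_1 / se_slope_1
ratio_2 = se_reg_2 / se_slope_2
```

Because the ratio depends only on the X values, any threshold set for SE Regression can be converted to an SE Slope threshold by dividing by that constant.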

Be aware that if you ever change the X values, that’s all thrown off. You wouldn’t be able to compare the pre- and post-change SE slope values to each other and you’d need to re-calculate any standards– at least for how you want to use them as a proxy for SE Regression.

Usually, the two statistics provide different types of information that you’d use for different reasons and that was what was confusing me earlier!

Khurram Siddiqui says

Hi Jim,

Thanks for the detail. I was comparing the unnormalized SSE, e.g. Sum(Yi – Y^i), not the normalized one, SQRT(SSE/N).

That’s why I was a bit confused. You covered the normalized one, not the unnormalized one, so that makes sense. Thanks for giving such insights.