The standard error of the regression (S) and R-squared are two key goodness-of-fit measures for regression analysis. While R-squared is the best-known of the goodness-of-fit statistics, I think it is a bit overhyped.

In this post, I’ll compare these two statistics and work through a regression example to help make the comparison. I think you’ll see that the oft-overlooked standard error of the regression can tell you things that the high and mighty R-squared simply can’t. At the very least, you’ll find that the standard error of the regression is a great tool to add to your statistical toolkit!

## Comparison of R-squared to the Standard Error of the Regression (S)

You can find the standard error of the regression, also known as the standard error of the estimate, near R-squared in the goodness-of-fit section of most statistical output. Both of these measures give you a numeric assessment of how well a model fits the sample data. However, there are differences between the two statistics.

- The standard error of the regression provides the absolute measure of the typical distance that the data points fall from the regression line. S is in the units of the dependent variable.
- R-squared provides the relative measure of the percentage of the dependent variable variance that the model explains. R-squared can range from 0 to 100%.
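For reference, here are the usual textbook definitions of both statistics. In this notation (mine, not taken from any particular package's output), $y_i$ are the observed values, $\hat{y}_i$ the fitted values, $\bar{y}$ the mean of the dependent variable, $n$ the number of observations, and $p$ the number of estimated parameters, including the constant.

```latex
\[
S = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p}},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]
```

S keeps the units of the dependent variable because it is the square root of an averaged squared residual. R-squared divides two sums of squares that share the same units, so the units cancel and only a proportion remains.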

An analogy makes the difference very clear. Suppose we’re talking about how fast a car is traveling.

R-squared is equivalent to saying that the car went 80% faster. That sounds a lot faster! However, it makes a huge difference whether the initial speed was 20 MPH or 90 MPH. The *increased* velocity based on the percentage can be either 16 MPH or 72 MPH, respectively. One is lame, and the other is very impressive. If you need to know exactly how much faster, the relative measure just isn’t going to tell you.

The standard error of the regression is equivalent to telling you directly how many MPH faster the car is traveling. The car went 72 MPH faster. Now that’s impressive!

Let’s move on to how we can use these two goodness-of-fit measures in regression analysis.

## Standard Error of the Regression and R-squared in Practice

In my view, the standard error of the regression has several advantages. S tells you straight up how precise the model’s predictions are using the units of the dependent variable. This statistic indicates roughly how far the data points typically fall from the regression line. You want lower values of S because they signify that the distances between the data points and the fitted values are smaller. S is also valid for both linear and nonlinear regression models, which is convenient if you need to compare the fit of the two types of models.

For R-squared, you want the regression model to explain higher percentages of the variance. Higher R-squared values indicate that the data points are closer to the fitted values. While higher R-squared values are good, they don’t tell you how far the data points are from the regression line. Additionally, R-squared is valid only for linear models, so you can’t use it to compare a linear model to a nonlinear model.

Note: Linear models can use polynomials to model curvature. I’m using the term linear to refer to models that are linear in the parameters. Read my post that explains the difference between linear and nonlinear regression models.

## Example Regression Model: BMI and Body Fat Percentage

This regression model describes the relationship between body mass index (BMI) and body fat percentage in middle school girls. It’s a linear model that uses a polynomial term to model the curvature. The fitted line plot indicates that the standard error of the regression is 3.53399% body fat. The interpretation of this S is that the typical distance between the observations and the regression line is about 3.5% body fat.
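To make the calculation concrete, here is a minimal sketch of how S and R-squared come out of a quadratic fit, assuming NumPy is available. The data are hypothetical: the coefficients and the noise level are invented purely for illustration and are not the middle school study’s data or results. The helper name `fit_quadratic_and_score` is also hypothetical.

```python
import numpy as np

def fit_quadratic_and_score(x, y):
    """Fit y = b0 + b1*x + b2*x**2 by least squares and return (S, R-squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coefs = np.polyfit(x, y, deg=2)        # coefficients, highest power first
    fitted = np.polyval(coefs, x)
    residuals = y - fitted
    n, p = len(y), 3                        # 3 estimated parameters: b0, b1, b2
    sse = np.sum(residuals ** 2)            # error (residual) sum of squares
    sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
    s = np.sqrt(sse / (n - p))              # standard error of the regression
    r_squared = 1 - sse / sst
    return s, r_squared

# Hypothetical data (NOT the BMI study): a curved relationship plus normal noise
# with a standard deviation of 3.5 percentage points.
rng = np.random.default_rng(0)
bmi = rng.uniform(15, 35, size=90)
body_fat = -20 + 2.6 * bmi - 0.025 * bmi ** 2 + rng.normal(0, 3.5, size=90)

s, r2 = fit_quadratic_and_score(bmi, body_fat)
print(f"S = {s:.2f} (% body fat), R-squared = {r2:.1%}")
```

Because the model is linear in its parameters, the polynomial term does not change how either statistic is computed; only the parameter count `p` in the denominator of S reflects the extra term.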

S measures the precision of the model’s predictions. Consequently, we can use S to obtain a rough estimate of the 95% prediction interval: about 95% of the data points fall within +/- 2 * S of the fitted line.

For the regression example, approximately 95% of the data points lie within +/- 7% body fat of the regression line (2 * 3.53 ≈ 7).
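Here is a short Python sketch of that arithmetic, assuming NumPy. It also sanity-checks the rule of thumb on simulated residuals, under the assumption that the residuals are roughly normal with a standard deviation equal to S. The simulation is illustrative only and does not use the study’s data.

```python
import numpy as np

S = 3.53399  # standard error of the regression from the BMI example (% body fat)

half_width = 2 * S
print(f"Rough 95% band: fitted value +/- {half_width:.2f}% body fat")  # +/- 7.07

# Sanity check of the rule of thumb on simulated residuals.
# Assumption: residuals are roughly normal with standard deviation S.
rng = np.random.default_rng(1)
residuals = rng.normal(0, S, size=100_000)
share_inside = np.mean(np.abs(residuals) <= half_width)
print(f"Share of simulated residuals within +/- 2S: {share_inside:.3f}")  # roughly 0.95
```

Keep in mind that this shortcut ignores the uncertainty in estimating the regression line itself. A formal prediction interval from statistical software is somewhat wider, especially with small samples or for predictions far from the mean of the predictors.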

The R-squared is 76.1%. I have an entire blog post dedicated to interpreting R-squared. So, I won’t cover that in detail here.

**Related posts**: Making Predictions with Regression Analysis and Understand Precision in Applied Regression to Avoid Costly Mistakes

## I Often Prefer the Standard Error of the Regression

R-squared is a percentage, which seems easy to understand. However, I often appreciate the standard error of the regression a bit more. I value the concrete insight provided by using the original units of the dependent variable. If I’m using the regression model to produce predictions, S tells me at a glance if the model is sufficiently precise.

On the other hand, R-squared doesn’t have any units, and it feels more ambiguous than S. If all we know is that R-squared is 76.1%, we don’t know how wrong the model is on average. You do need a high R-squared to produce precise predictions, but you don’t know how high it must be exactly. It’s impossible to use R-squared to evaluate the precision of the predictions.
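One way to see this is to change only the units of the dependent variable. In the minimal sketch below (NumPy assumed, made-up data, hypothetical helper `s_and_r2`), the same straight-line regression is fit twice: once to the original response and once to the response multiplied by 10. R-squared comes out the same both times, while S grows tenfold. R-squared alone cannot distinguish the two situations.

```python
import numpy as np

def s_and_r2(x, y):
    """Straight-line least-squares fit; returns (S, R-squared)."""
    slope, intercept = np.polyfit(x, y, deg=1)   # highest power first
    residuals = y - (intercept + slope * x)
    sse = np.sum(residuals ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    s = np.sqrt(sse / (len(y) - 2))              # 2 parameters: intercept and slope
    return s, 1 - sse / sst

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y_small = 5 + 1.5 * x + rng.normal(0, 2, size=50)   # hypothetical response
y_large = 10 * y_small                               # same data, response scaled by 10

s1, r2_1 = s_and_r2(x, y_small)
s2, r2_2 = s_and_r2(x, y_large)
print(f"R-squared: {r2_1:.3f} vs {r2_2:.3f}")  # same value
print(f"S:         {s1:.2f} vs {s2:.2f}")      # second S is 10 times the first
```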

To demonstrate this, we’ll look at the regression example. Let’s assume that our predictions must be within +/- 5% body fat of the observed values to be useful. If we know only that R-squared is 76.1%, can we determine whether our model is sufficiently precise? No, you can’t tell using R-squared.

However, you *can* use the standard error of the regression. For our model to have the required precision, S must be less than 2.5% because 2.5 * 2 = 5. In an instant, we know that our S (3.5) is too large. We need a more precise model. Thanks S!
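The check is simple enough to write as a one-line rule. The function and constant names below are hypothetical, and the rule is the same rough +/- 2 * S approximation used in this post, not a formal prediction interval.

```python
def meets_precision(s, required_margin):
    """Rough check: about 95% of observations fall within +/- 2*S of the
    fitted line, so S must be below half of the required margin."""
    return 2 * s < required_margin

S_BMI_MODEL = 3.53399      # S from the BMI example, in % body fat
REQUIRED_MARGIN = 5.0      # predictions must be within +/- 5% body fat

print(meets_precision(S_BMI_MODEL, REQUIRED_MARGIN))  # False: 2 * 3.53 = 7.07 > 5
print(REQUIRED_MARGIN / 2)                            # 2.5, the largest acceptable S
```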

While I really like the standard error of the regression, you can, of course, consider both goodness-of-fit measures simultaneously. This is the statistical equivalent of having your cake and eating it too!

yadawananda neog says

Really appreciable explanation.

Jim Frost says

Thank you!

hamza says

Great site for understanding statistics.

Thanks a lot, sir!

Jim Frost says

You’re very welcome! I’m glad you found it to be helpful!

sid says

Hi Jim,

I am a beginner in statistics and have the following doubts:

1) Is the standard error of the regression (S) the same as the “mean squared error (MSE)”, and is S or MSE what is actually used to figure out the standard errors of b1 and b0, which are the estimates of B1 and B0, respectively, for the population regression line:

Y = B0 + B1*X + e

?

2) I didn’t get the below line:

“For the regression example, approximately 95% of the data points lie between the regression line and +/- 7% body fat”

It’ll be really helpful if you can guide me: how did we arrive at the “7%” value?

3) For the line just before this:

“About 95% of the data points are within a range that extends from +/- 2 * standard error of the regression from the fitted line.”

I believe, for population regression line :

Y = B0 + B1*X + e

95% confidence interval for B1 approximately takes the form:

b1 +- 2*SE(b1)

95% confidence interval for B0 approximately takes the form:

b0 +- 2*SE(b0)

and that is how we get the +/- 2 * standard error of the regression from the fitted line. Please correct me if I am wrong on this.

Thanks in advance!

Michael Heitmeier says

Would it be correct to say that R-squared does not work for non-linear models because the mean (which the R2 calculation depends on) is not capturing the essence of non-linear data in the way that it does for linear data? A mean can be calculated for both but linear data varies around the mean in a much more straightforward way.

Jim Frost says

Hi Michael, that’s not quite it. There are some calculations for R-squared that literally don’t add up correctly in nonlinear regression. I’ve written a post that covers those calculations and the problems they produce: R-squared is Not Valid for Nonlinear Regression.

I hope this helps clarify things!

Jim

Kevin M Armengol says

You explain statistical concepts very well!

Jim Frost says

Thank you!