R-squared tends to reward you for including too many independent variables in a regression model, and it doesn’t provide any incentive to stop adding more. Adjusted R-squared and predicted R-squared use different approaches to help you fight that impulse to add too many. The protection that adjusted R-squared and predicted R-squared provide is critical because too many terms in a model can produce results that you can’t trust. These statistics help you include the correct number of independent variables in your regression model.

Multiple regression analysis can seduce you! Yep, you read it here first. It’s an incredibly tempting statistical analysis that practically begs you to include additional independent variables in your model. Every time you add a variable, the R-squared increases, which tempts you to add more. Some of the independent variables *will* be statistically significant. Perhaps there is an actual relationship? Or, is it just a chance correlation?

You just pop the variables into the model as they occur to you or just because the data are readily available. Higher-order polynomials curve your regression line any which way you want. But, are you fitting real relationships or just playing connect the dots? Meanwhile, the R-squared increases, mischievously convincing you to include yet more variables!

In my post about interpreting R-squared, I show how evaluating how well a linear regression model fits the data is not as intuitive as you may think. Now, I’ll explore reasons why you need to use adjusted R-squared and predicted R-squared to help you specify a good regression model!

## Some Problems with R-squared

Previously, I demonstrated that you cannot use R-squared to conclude whether your model is biased. To check for this bias, you need to check your residual plots. Unfortunately, there are yet more problems with R-squared that we need to address.

**Problem 1:** R-squared increases every time you add an independent variable to the model. The R-squared *never* decreases, not even when it’s just a chance correlation between variables. A regression model that contains more independent variables than another model can look like it provides a better fit merely because it contains more variables.
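Problem 1 is easy to check for yourself. The sketch below uses simulated data (not data from this post): one real predictor plus five columns of pure noise. All names, the seed, and the sample size are assumptions chosen for illustration, and the small least-squares solver is only there to keep the example dependency-free.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R-squared of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xij for bj, xij in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30
x_real = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x_real]
# Five predictors that are pure noise, unrelated to y by construction
noise_columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]

# Fit the real predictor plus 0, 1, ..., 5 noise predictors
r2_path = [r_squared([x_real] + noise_columns[:k], y) for k in range(6)]
for k, r2 in enumerate(r2_path):
    print(f"{k} noise predictors: R-squared = {r2:.4f}")
```

Each printed R-squared should be at least as large as the one before it, even though the added columns carry no information about the response.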

**Problem 2:** When a model contains an excessive number of independent variables and polynomial terms, it becomes overly customized to fit the peculiarities and random noise in your sample rather than reflecting the entire population. Statisticians call this overfitting the model, and it produces deceptively high R-squared values and a decreased capability for precise predictions.

Fortunately for us, adjusted R-squared and predicted R-squared address both of these problems.

## What Is the Adjusted R-squared?

Use adjusted R-squared to compare the goodness-of-fit for regression models that contain differing numbers of independent variables.

Let’s say you are comparing a model with five independent variables to a model with one variable, and the five-variable model has a higher R-squared. Is the model with five variables actually a better model, or does it just have more variables? To determine this, just compare the adjusted R-squared values!

The adjusted R-squared adjusts for the number of terms in the model. Importantly, its value increases only when the new term improves the model fit more than expected by chance alone. The adjusted R-squared value actually decreases when the term doesn’t improve the model fit by a sufficient amount.
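To make the adjustment concrete, here is a minimal sketch of the usual textbook formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p is the number of predictors excluding the intercept. The function name and the numbers are made up for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors (plus an intercept) and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical case: a fourth predictor nudges R-squared from 0.500 to 0.505
base = adjusted_r2(0.500, n=20, p=3)
bigger = adjusted_r2(0.505, n=20, p=4)
print(f"3 predictors: adjusted R-squared = {base:.4f}")
print(f"4 predictors: adjusted R-squared = {bigger:.4f}")
```

In this made-up case, R-squared rises slightly but adjusted R-squared falls, because the small gain in fit does not make up for the degree of freedom spent on the extra term.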

The example below shows how the adjusted R-squared increases up to a point and then decreases. On the other hand, R-squared blithely increases with each and every additional independent variable.

In this example, the researchers might want to include only three independent variables in their regression model. My R-squared blog post shows how an under-specified model (too few terms) can produce biased estimates. However, an overspecified model (too many terms) can reduce the model’s precision. In other words, both the coefficient estimates and predicted values can have larger margins of error around them. That’s why you don’t want to include too many terms in the regression model!

## What Is the Predicted R-squared?

Use predicted R-squared to determine how well a regression model makes predictions. This statistic helps you identify cases where the model provides a good fit for the existing data but isn’t as good at making predictions. However, even if you aren’t using your model to make predictions, predicted R-squared still offers valuable insights about your model.

Statistical software calculates predicted R-squared using the following procedure:

- It removes a data point from the dataset.
- It calculates the regression equation using the remaining data.
- It evaluates how well the model predicts the removed observation.
- It repeats this for all data points in the dataset.
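The steps above can be sketched in a few lines for the simplest case, a straight-line model with one predictor. This is only an illustration of the leave-one-out idea, not how any particular package implements it. The data values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def total_sum_of_squares(ys):
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys)

def r_squared(xs, ys):
    """Regular R-squared: every point is used to fit the line that predicts it."""
    intercept, slope = fit_line(xs, ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - sse / total_sum_of_squares(ys)

def predicted_r_squared(xs, ys):
    """Predicted R-squared: each point is predicted by a line fit without it."""
    press = 0.0
    for i in range(len(xs)):
        # Remove observation i and refit on the rest
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Squared error when predicting the removed observation
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    # The total sum of squares uses all of the original observations
    return 1.0 - press / total_sum_of_squares(ys)

# Hypothetical data, roughly y = 2x with a little noise
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
r2 = r_squared(xs, ys)
pred_r2 = predicted_r_squared(xs, ys)
print(f"R-squared:           {r2:.4f}")
print(f"Predicted R-squared: {pred_r2:.4f}")
```

Because each observation is predicted by a model that never saw it, the predicted R-squared comes out lower than the regular R-squared, though only slightly here because the relationship in these made-up data is genuine.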

Predicted R-squared helps you determine whether you are overfitting a regression model. Again, an overfit model includes an excessive number of terms, and it begins to fit the random noise in your sample.

By its very definition, it is not possible to predict random noise. Consequently, if your model fits a lot of random noise, the predicted R-squared value must fall. A predicted R-squared that is distinctly smaller than R-squared is a warning sign that you are overfitting the model. Try reducing the number of terms.

If I had to name my favorite flavor of R-squared, it would be predicted R-squared!

**Related post**: Overfitting Regression Models: Problems, Detection, and Avoidance

## Example of an Overfit Model and Predicted R-squared

You can try this example using this CSV data file: PresidentRanking.

These data come from an analysis I performed that assessed the relationship between the highest approval rating that a U.S. President achieved and their rank by historians. I found no correlation between these variables, as shown in the fitted line plot. It’s nearly a perfect example of no relationship because it is a flat line with an R-squared of 0.7%!

Now, imagine that we are chasing a high R-squared and we fit the model using a cubic term that provides an S-shape.

Amazing! R-squared and adjusted R-squared look great! The coefficients are statistically significant because their p-values are all less than 0.05. I didn’t show the residual plots, but they look good as well.

Hold on a moment! We’re just twisting the regression line to force it to connect the dots rather than finding an actual relationship. We overfit the model, and the predicted R-squared of 0% gives this away.

If the predicted R-squared is small compared to R-squared, you might be overfitting the model even if the independent variables are statistically significant.

## A Caution about the Problems of Chasing a High R-squared

All study areas involve a certain amount of variability that you can’t explain. If you chase a high R-squared by including an excessive number of variables, you force the model to explain the unexplainable. This is not good. While this approach *can* obtain higher R-squared values, it comes at the cost of misleading regression coefficients, p-values, and R-squared values, as well as imprecise predictions.

Adjusted R-squared and predicted R-squared help you resist the urge to add too many independent variables to your model.

- Adjusted R-squared compares models with different numbers of variables.
- Predicted R-squared can guard against models that are too complicated.

Remember, the great power that comes with multiple regression analysis requires your restraint to use it wisely!

If you’re learning regression, check out my Regression Tutorial!

Duc-Anh Luong says

Hi Jim,

I have a question about the calculation of the predicted R-squared in linear regression.

(1). Is it true that each time we remove one data point, we have to fit the model again and use this model to predict the value of the removed data point?

(2). Is it possible to get negative predicted R-squared?

Many thanks

Duc Anh

Jim Frost says

Hi Duc-Anh,

When the statistical software calculates predicted R-squared, it systematically removes each observation and determines how well the model based on all of the other observations predicts that value. The software does this for all observations in the dataset and calculates the predicted error sum of squares (PRESS). It then uses the PRESS to calculate the predicted R-squared. For regular R-squared, it uses the error sum of squares (SSE) instead. All of these calculations occur behind the scenes. You don’t need to worry about refitting the model for each observation. All you need to do is assess the predicted R-squared with that process in mind so you know what it really means.

Yes, it is possible to obtain a negative predicted R-squared. However, some statistical software, such as Minitab, rounds these negative values up to zero.

Thank you for writing with your excellent questions,

Jim

Franklin Moormann says

I’m trying to create my own formula to calculate predicted rsquared and this was the only information that I found on how to do it. I believe the formula to do this is predicted r2 = 1 – (press / tss) so would you systematically leave off one data point at a time and calculate the press statistic and tss statistic and add those values to a final total and calculate predicted r2 at the end?

Jim Frost says

Hi Franklin, here are the predicted R-squared and PRESS formulas. The formulas don’t actually go through and remove each observation one at a time, but they are equivalent to that process.
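For reference, the standard textbook forms for an ordinary least squares model are usually written as below. The notation is an assumption on my part and may differ from the formulas linked in this comment: e_i is the ordinary residual for observation i, h_ii is the i-th diagonal element of the hat matrix H, X is the design matrix, and SST is the total sum of squares computed from all observations.

```latex
\text{PRESS} = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^{2},
\qquad
H = X (X^{\top} X)^{-1} X^{\top},
\qquad
R^{2}_{\text{pred}} = 1 - \frac{\text{PRESS}}{\text{SST}}
```

Each term e_i / (1 − h_ii) equals the prediction error you would get by removing observation i, refitting, and predicting it, which is why no explicit refitting is needed. Since PRESS is a sum of squares, it can never be negative. Predicted R-squared itself becomes negative whenever PRESS exceeds SST.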

Franklin Moormann says

I’m only supposed to remove one observation at a time to recalculate the prediction model but after that, I’m supposed to use all original observations to run the calculations for press and tss?

Tim says

Hi Jim,

I know that R-squared is calculated differently in logistic regression. I wonder what you would do if a reviewer asked you to provide a similar indicator.

Thanks!

Tim

Jim Frost says

Hi Tim,

There are two measures I’m most familiar with for logistic regression. One is deviance R-squared for binary logistic regression. This statistic measures the proportion of the deviance in the dependent variable that the model explains. Unlike R-squared, the format of the data affects the deviance R-squared.

The other is Akaike Information Criterion (AIC), which measures the quality of a model based on fit and the number of terms in the model.
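As a rough guide, these two measures are commonly defined as follows. This is a sketch of the usual definitions, and individual packages may differ in details. The symbols are assumptions introduced here: D_model and D_null are the deviances of the fitted model and of the intercept-only model, k is the number of estimated parameters, and L-hat is the maximized likelihood.

```latex
R^{2}_{\text{dev}} = 1 - \frac{D_{\text{model}}}{D_{\text{null}}},
\qquad
\text{AIC} = 2k - 2 \ln \hat{L}
```

A higher deviance R-squared indicates a better fit. AIC works the other way: when you compare models fit to the same data, the lower AIC is preferred, and the 2k term penalizes extra parameters.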

Jim

Franklin Moormann says

I have no clue how to do diagonal elements in C# so I guess I’m going to have to go through and eliminate one observation at a time and then calculate the press and rss after each elimination. Since I’m doing that, how would I calculate the press statistic instead of doing the diagonal matrix stuff?

Franklin Moormann says

I found a workaround but I’m now getting a negative value for the press statistic so when I divide by the total sum of squares it is returning 1 which I know isn’t correct

Jim Frost says

Hi Franklin, actually for predicted R-squared (and adjusted R-squared) it is possible to get negative values!

Jun Li says

Hello Jim,

I developed a nonlinear regression model in RStudio with R2 (0.904), R2(adj) 0.864 and R2 (predicted) 0.919. I wonder if it is possible for the predicted R2 to be higher than the normal R2?

Hope for your reply.

Jun Li

Jim Frost says

Hi Jun Li,

First we need to make sure we’re clear on some terminology. Did you develop a true nonlinear model or is it a linear model that uses polynomials to model curvature? You can read about the differences in my post: The Difference between Linear and Nonlinear Models.

It’s an important distinction because R-squared and its variants are not valid for nonlinear models. If you are truly using a nonlinear model, I suppose it might be possible to obtain a Predicted R-squared that is higher than R-squared. Maybe. But, you shouldn’t be using any of those R-squared values because they are invalid. You can use another goodness-of-fit statistic, such as the standard error of the regression.

For linear models, you can’t obtain a predicted R-squared that is higher than R-squared. That scenario would indicate that the model predicts new observations *better* than it predicts the values used during the model fitting process. That makes no sense.

I hope this helps!

Franklin Moormann says

I’m not explaining well enough I believe. This is my formula results using junk data (with a rsquared value of 0.2)

Predicted Rsquared = 1 – (PRESS / TSS) = 1 – (-1.04 / 67408.86) = 1.00

So as you can see something is definitely wrong.

Allan Paolo Labartinos Almajose says

Hi Jim! I’d like to ask for help regarding the calculation of predicted R-squared values. To be honest, my nose bled (lol) after seeing the formula for the PRESS you provided in one of the comments above. Is there a ‘layman’s way’ of computing this?

Actually, I had this idea:

– I remove one data point

– I regress the remaining points using the same model

– I try to predict the missing data point using the same model previously recalculated (the one with the reduced data point)

– The difference between the prediction of the model with complete data points and the prediction of the new model with one data point removed is the PRESS of that point?

– I do this again for all of the remaining points

– I add all of the PRESS for each point, then sum-square everything, then compute R^2 normally, then this R^2 is now the predicted R^2?

Is this even correct? I don’t know, this is just a wild guess. Please help me, I am totally at a loss here. Thanks!

Jim Frost says

Hi Allan, you’re very close! Think about how you usually calculate sums of squares. It’s the sum of the squared deviations between the fitted values and the observations. PRESS is similar except it is the sum of the squared deviations between the fitted value of each removed observation and the removed observation. So, the procedure basically removes each observation and uses the model to predict that observation and squares the difference between the two. It does that systematically for all observations and sums those squared differences. For your 4th point, you never fit the model with all observations when calculating predicted R-squared. Instead, there is always one removed observation and you’re essentially seeing how well the model predicts each removed observation. I hope this makes it more clear!

ALMAS KHURSHEED says

hi sir

I am very confused about how to write an interpretation statement for R2 if the value is 0.68.

Can you please help me out?

thank you

Jim Frost says

Hi Almas, it means that the independent variables in your model collectively account for 68% of the variability in the dependent variable around its mean. Click the link in the post to go to my post where I talk about R-squared in more detail. I hope this helps!

MUHAMMAD K. N. says

Hi Jim! I am working on research that monitors insect pest population fluctuation in the entomological field, but I obtained mostly weak R-squared regression results and felt disturbed. What advice can you give me in this regard?

Thanks

Muhammad.

Jim Frost says

Hi Muhammad! Unfortunately, that situation isn’t too uncommon and I’ve written a blog post that is specifically about it:

Interpreting a Regression Model with a Low R-squared

A low R-squared might or might not be a problem. If you have significant independent variables and your main goal is to understand the relationships between the variables, a low R-squared is not necessarily a problem.

However, if your main goal is to produce precise predictions, it can be a problem.

The blog post I recommend covers these scenarios and shows how it works. I think it’ll make your situation more clear!

Emanuel Lindström says

Hi Jim!

Awesome blog, and awesome posts! I’m learning a lot!!

I have 2 questions;

1. How is the predicted R-squared actually calculated? The step-by-step process you describe is iterated for each data point in the population, but does that mean you get as many predicted R-squared as there are data points, or do you do an additional step after iterating over all the data points?

2. Does predicted R-squared work even for large samples? I mean, it’s easy to see how the polynomial line in the image changes if you remove a data point, but if there are more data points (100 more, or even 1000 more), wouldn’t the over-fitted polynomial line stay the same and predict the one omitted data point?

Again, thanks for an amazing resource!

Jim Frost says

Hi Emanuel,

Thanks so much! I’m glad you have found it to be helpful!

About predicted R-squared, which is really my favorite type of R-squared. Think about the error sum of squares (SSE). This is where you take the squared differences between each observation and the fitted value and sum them up across all observations. It’s also known as the residual sum of squares because it’s the sum of the squared residuals. A small value produces a high R-squared.

For predicted R-squared, you use the predicted error sum of squares (PRESS), which is similar to the SSE. To calculate PRESS, you remove a point, refit the model, and then use the model to predict the removed observation. Then, you take the removed value and subtract the predicted value and then square this difference. You repeat for all of the removed values. You end up with a squared difference for each value when it is removed. You then sum those squared differences and you have PRESS. A low PRESS value produces a high predicted R-squared. So, it’s fairly analogous to the SSE but the squared differences are based on predicting the missing values versus values that were used to fit the model.
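If you want to check the remove-and-refit description against the shortcut that software typically uses, the sketch below does both for a one-predictor model. The shortcut divides each ordinary residual by one minus its leverage, and for a straight-line fit the leverage is 1/n + (x − x̄)²/Sxx. The data and names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data, roughly y = 2x with a little noise
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
n = len(xs)

# Ordinary residuals and SSE from the fit to all observations
intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)

# Shortcut: scale each residual by 1 / (1 - leverage), with no refitting
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
leverages = [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]
press_shortcut = sum((e / (1.0 - h)) ** 2 for e, h in zip(residuals, leverages))

# Brute force: remove each point, refit, predict the removed point
press_refit = 0.0
for i in range(n):
    b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    press_refit += (ys[i] - (b0 + b1 * xs[i])) ** 2

print(f"SSE:              {sse:.6f}")
print(f"PRESS (shortcut): {press_shortcut:.6f}")
print(f"PRESS (refit):    {press_refit:.6f}")
```

The two PRESS values should agree to within rounding, and PRESS should be at least as large as SSE. A negative PRESS would point to a bug in the calculation, because it is a sum of squared differences.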

Regarding point 2, yes, you’re correct, when you have more data points, it’s harder to overfit your model and, hence, you wouldn’t expect a much lower predicted R-squared. Imagine you have 1,000 data points that follow the same U-shaped pattern. In that case, you’d be really sure about that curved relationship because such a large number of data points aren’t going to follow that curve by chance. That’s why you wouldn’t expect the predicted R-squared to drop when you have many data points. However, fewer data points can produce that pattern by chance. If you remove one, it changes that relationship noticeably. You’re not really certain that the relationship really is that U-shape. Predicted R-squared detects this uncertainty and that’s why it drops.

Overfitting depends on the number of observations per term in the model, as you can read about in my post about overfitting. You’d need a very, very complex model to overfit a dataset with 1000 observations!

I hope this helps!

Franklin Moormann says

When calculating predicted rsquared for a full dataset of 4000 data points, you would do all 4000 or a random sample of those 4000 data points?

Jim Frost says

The procedure always cycles through the complete dataset and systematically removes one data point at a time to calculate predicted R-squared.

Sundar says

Dear Jim,

As usual, brilliant post. However, I would like to know about “The adjusted R-squared value actually decreases when the term doesn’t improve the model fit by a sufficient amount.” How does the adjusted R-squared determine whether the addition of a variable has a positive or negative effect on the model?

Thanks


Jim Frost says

Hi Sundar, the adjusted R-squared value decreases when the absolute value of the t-value for the new term’s coefficient is less than 1.
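One way to see where the threshold of 1 comes from is the short derivation below. It is a sketch under the usual ordinary least squares assumptions, with notation introduced here for illustration: the current model has p predictors and error sum of squares SSE_0, adding one term gives SSE_1, and t is the t-value of the added term.

```latex
\begin{aligned}
R^{2}_{\text{adj}} &= 1 - \frac{\text{SSE}/(n-p-1)}{\text{SST}/(n-1)}
  && \text{(definition, for a model with } p \text{ predictors)} \\
t^{2} &= \frac{\text{SSE}_0 - \text{SSE}_1}{\text{SSE}_1/(n-p-2)}
  \;\Longrightarrow\;
  \text{SSE}_0 = \text{SSE}_1 \left( 1 + \frac{t^{2}}{n-p-2} \right) \\
R^{2}_{\text{adj}} \text{ increases}
  &\iff \frac{\text{SSE}_1}{n-p-2} < \frac{\text{SSE}_0}{n-p-1} \\
  &\iff n-p-1 < (n-p-2) + t^{2} \\
  &\iff t^{2} > 1
\end{aligned}
```

So adjusted R-squared rises when |t| > 1, falls when |t| < 1, and stays the same when |t| equals 1. A t-value barely above 1 is far from statistically significant, so an increase in adjusted R-squared alone does not show that a term belongs in the model.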

Juan says

Dear Jim,

I recently started using Minitab for DoE. I work with an extraction process to evaluate the recovery (Yield) of proteins. Evaluation of a half-factorial set of experiments with 5 variables DoE gave me a very good regression model with R2(98.29%); adj-R2 (97.35%) and pred-R2 (95.57%). However I noticed that my model indicates that the curvature is significant (P = 0.022). What is the effect of this curvature on the predictive power of the model? in other words, is this model still good to make predictions? or is a CCD required?

Jim Frost says

Hi Juan,

Yes, if the software detects curvature, it is usually a good idea to model that curvature. While R-squared is high, you are trying to model a curve using a straight line, and that will lead to biased predictions. For example, certain ranges of predictions might be systematically too high while other ranges could be systematically too low. In my post about R-squared, in the section “Are High R-squared Values Always Great?”, I show an example where the R-squared value is at 98.5% but the predictions are biased. Your case is probably something like that–although obviously not necessarily mirroring the specific relationship that I show. A high R-squared, and adjusted R-squared, don’t necessarily indicate that the model provides an unbiased fit. Check those residual plots!

Thanks for writing. I hope this helps!

Juan says

Dear Jim,

Thanks for your explanation and fast response. Congratulations on such a good blog, it is very valuable to be able to discuss / understand these topics in a more friendly manner.

With respect to my question, I still have a couple of doubts.

– I can understand that one could obtain a high R2 and R2 (adj) in a model with significant curvature but, shouldn’t the R2(pred) be generally low?

– isn’t the prediction power of the regression covered by including in the regression equation the center point?

In other words (correct me if I’m wrong), when curvature is significant in the regression model, then the R2(Pred) is not relevant anymore and the model should not be used for predictive purposes?

Considering your comment on the residual plots, My versus fit seems (not clear though) that there might be a pattern (scatter reduced as the fitted value is higher). Thus I did a regression after a Box-cox transformation (Lambda =0.25) , eliminated variables with P>0.1 and I obtain a regression where curvature is not significant (P= 0.2!!) and again great R2 values (R2:99%; R2adj: 98.7 and R2Pred: 96.8%)…how to interpret this? is this resulting regression trustworthy and could it be used for predictive purposes?

Thanks in advance for your time,

Regards,

Juan S.

Jim Frost says

Hi Juan,

Yes, it’s definitely possible that Predicted R-squared would be affected by inadequately modeling the curvature. However, the degree to which the lack-of-fit affects it depends on how inadequate the fit is and the number of observations. So, I couldn’t tell you specifically for your case whether it would be low or not. But, definitely the lack of fit would impact it to some degree.

Center points allow you to detect curvature but are not sufficient to model the curvature.

I would agree, as I mention in my previous response, that I would not use the model to make predictions when you know that it inadequately fits curvature that is present in the data. In that sense, yes, it doesn’t matter what Predicted R-squared is because you know the predictions are biased. As I mentioned, high R-squared values of any type do not indicate that your model provides an unbiased fit.

That pattern that you describe is heteroscedasticity. In my post about it, I discuss other options for resolving it. A Box-Cox transformation is a recognized way to fix this problem, but I usually save that for the last solution I try. I prefer solutions that involve less data manipulation. I’m also a bit leery of how it transformed away the curvature issue. However, I don’t have any specific reason to say that you shouldn’t trust the model based on the limited information that I have. Just be sure to closely examine the coefficients and be really certain that the signs and magnitudes fit with theory.

Also, be aware that the model fit statistics (the various R-squared values and S) apply to the transformed response variable and not the response using natural units. That can make the model appear better than it is. Although, they were high before the transformation, so no reason for concern.

Juan says

Dear Jim,

Thanks a lot for your response, it answered some questions that I had for quite some time without finding a clear/understandable explanation. I will certainly continue to follow the blog, it is a very valuable source of information, especially for us non-statisticians. I have already recommended it to my colleagues and I’m sure they will agree with me.

Best regards,

Juan F.

Jim Frost says

Thanks so much, Juan. I appreciate that!

Tejaswi Dalavi says

what is the exact difference between R square & adjusted R square.which is better?

Jim Frost says

Hi Tejaswi, you’re in the right place to learn about the differences. This blog post describes adjusted R-squared. In it, there’s a link to my blog post about the regular R-squared. Between the two posts, you’ll know all about both types. Adjusted R-squared is the better of the two. Although, my favorite is actually predicted R-squared.

Kripa says

Hi Sir,

Can you help me to interpret R squared value of .166 and Adjusted R squared value of .158?

Jim Frost says

Hi Kripa,

These blog posts should provide you with enough information so you know how to interpret these values.

The R-squared value indicates that your model accounts for 16.6% of the variation in the dependent variable around its mean. That’s usually considered a low amount. You typically interpret adjusted R-squared in conjunction with the adjusted R-squared values from other models. Use adjusted R-squared to compare the fit of models with a different number of independent variables.

Additionally, regular R-squared from a sample is biased. It tends to over-estimate the true R-squared for the population. Adjusted R-squared corrects for most of this bias, so it is a much less biased estimate of the population value.

I hope this helps!