R-squared tends to reward you for including too many independent variables in a regression model, and it doesn’t provide any incentive to stop adding more. Adjusted R-squared and predicted R-squared use different approaches to help you fight that impulse to add too many. The protection that adjusted R-squared and predicted R-squared provide is critical because too many terms in a model can produce results that you can’t trust. These statistics help you include the correct number of independent variables in your regression model.

Multiple regression analysis can seduce you! Yep, you read it here first. It’s an incredibly tempting statistical analysis that practically begs you to include additional independent variables in your model. Every time you add a variable, the R-squared increases, which tempts you to add more. Some of the independent variables *will* be statistically significant. Perhaps there is an actual relationship? Or, is it just a chance correlation?

You just pop the variables into the model as they occur to you or just because the data are readily available. Higher-order polynomials curve your regression line any which way you want. But, are you fitting real relationships or just playing connect the dots? Meanwhile, the R-squared increases, mischievously convincing you to include yet more variables!

In my post about interpreting R-squared, I show how evaluating how well a linear regression model fits the data is not as intuitive as you may think. Now, I’ll explore reasons why you need to use adjusted R-squared and predicted R-squared to help you specify a good regression model!

## Some Problems with R-squared

Previously, I demonstrated that you cannot use R-squared to conclude whether your model is biased. To check for this bias, you need to check your residual plots. Unfortunately, there are yet more problems with R-squared that we need to address.

**Problem 1:** R-squared increases every time you add an independent variable to the model. The R-squared *never* decreases, not even when it’s just a chance correlation between variables. A regression model that contains more independent variables than another model can look like it provides a better fit merely because it contains more variables.
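Problem 1 can be illustrated with a minimal sketch, assuming NumPy and a synthetic dataset. The helper `r_squared`, the simulated variables, and the seed are illustrative assumptions, not part of the original analysis. Only `x` is truly related to `y`; the other ten columns are pure noise, yet R-squared does not drop as they are added.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an ordinary least squares fit that includes an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    sse = np.sum(residuals ** 2)          # error sum of squares
    sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - sse / sst

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)   # only x is truly related to y
noise = rng.normal(size=(n, 10))   # ten columns of pure noise

# Fit a sequence of nested models, adding one noise column at a time.
r2_by_model_size = []
for k in range(noise.shape[1] + 1):
    X = np.column_stack([x, noise[:, :k]])
    r2_by_model_size.append(r_squared(X, y))

for k, r2 in enumerate(r2_by_model_size):
    print(f"{k:2d} noise variables added: R-squared = {r2:.4f}")
```

Because each model nests the previous one, least squares can always reproduce the smaller model's fit by setting the new coefficient to zero, so the R-squared sequence cannot go down.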

**Problem 2:** When a model contains an excessive number of independent variables and polynomial terms, it becomes overly customized to fit the peculiarities and random noise in your sample rather than reflecting the entire population. Statisticians call this overfitting the model, and it produces deceptively high R-squared values and a decreased capability for precise predictions.

Fortunately for us, adjusted R-squared and predicted R-squared address both of these problems.

## What Is the Adjusted R-squared?

Use adjusted R-squared to compare the goodness-of-fit for regression models that contain differing numbers of independent variables.

Let’s say you are comparing a model with five independent variables to a model with one variable, and the five-variable model has a higher R-squared. Is the model with five variables actually a better model, or does it just have more variables? To determine this, just compare the adjusted R-squared values!

The adjusted R-squared adjusts for the number of terms in the model. Importantly, its value increases only when the new term improves the model fit more than expected by chance alone. The adjusted R-squared value actually decreases when the term doesn’t improve the model fit by a sufficient amount.
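The adjustment can be sketched with the standard formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of terms excluding the intercept. The function name and the R-squared values below are hypothetical and chosen only to show the penalty at work.

```python
def adjusted_r_squared(r2, n_obs, n_terms):
    """Adjusted R-squared for a model with an intercept.

    n_terms counts the predictors in the model, excluding the intercept.
    """
    if n_obs - n_terms - 1 <= 0:
        raise ValueError("need more observations than terms plus the intercept")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_terms - 1)

# Hypothetical comparison: three extra terms raise R-squared only slightly.
small_model = adjusted_r_squared(0.50, n_obs=30, n_terms=3)
large_model = adjusted_r_squared(0.51, n_obs=30, n_terms=6)
print(f"3 terms: adjusted R-squared = {small_model:.3f}")
print(f"6 terms: adjusted R-squared = {large_model:.3f}")
```

In this made-up case the larger model has the higher R-squared but the lower adjusted R-squared, because the small gain in fit does not offset the three additional terms.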

The example below shows how the adjusted R-squared increases up to a point and then decreases. On the other hand, R-squared blithely increases with each and every additional independent variable.

In this example, the researchers might want to include only three independent variables in their regression model. My R-squared blog post shows how an under-specified model (too few terms) can produce biased estimates. However, an overspecified model (too many terms) can reduce the model’s precision. In other words, both the coefficient estimates and predicted values can have larger margins of error around them. That’s why you don’t want to include too many terms in the regression model!

## What Is the Predicted R-squared?

Use predicted R-squared to determine how well a regression model makes predictions. This statistic helps you identify cases where the model provides a good fit for the existing data but isn’t as good at making predictions. However, even if you aren’t using your model to make predictions, predicted R-squared still offers valuable insights about your model.

Statistical software calculates predicted R-squared using the following procedure:

- Removes a data point from the dataset.
- Calculates the regression equation using the remaining data.
- Evaluates how well the model predicts the removed observation.
- Repeats this for all data points in the dataset.
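The steps above can be sketched as an explicit leave-one-out loop. This is a minimal illustration assuming NumPy and ordinary least squares with an intercept. The function name `predicted_r_squared` and the synthetic data are assumptions, and real statistical packages typically use a faster algebraic shortcut rather than refitting.

```python
import numpy as np

def predicted_r_squared(X, y):
    """Leave-one-out predicted R-squared for an OLS fit with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    design = np.column_stack([np.ones(len(y)), X])

    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i                  # remove one data point
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        prediction = design[i] @ coef                  # predict the removed point
        press += (y[i] - prediction) ** 2              # accumulate squared error

    sst = np.sum((y - y.mean()) ** 2)                  # total sum of squares
    return 1.0 - press / sst

# Synthetic example: a real linear relationship plus noise.
rng = np.random.default_rng(42)
x = rng.normal(size=40)
y = 1.0 + 2.0 * x + rng.normal(size=40)
print(f"Predicted R-squared: {predicted_r_squared(x, y):.3f}")
```

The loop produces a single predicted R-squared for the whole dataset: each removed point contributes one squared prediction error, and the sum of those errors is compared with the total sum of squares.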

Predicted R-squared helps you determine whether you are overfitting a regression model. Again, an overfit model includes an excessive number of terms, and it begins to fit the random noise in your sample.

By its very definition, it is not possible to predict random noise. Consequently, if your model fits a lot of random noise, the predicted R-squared value must fall. A predicted R-squared that is distinctly smaller than R-squared is a warning sign that you are overfitting the model. Try reducing the number of terms.

If I had to name my favorite flavor of R-squared, it would be predicted R-squared!

**Related post**: Overfitting Regression Models: Problems, Detection, and Avoidance

## Example of an Overfit Model and Predicted R-squared

You can try this example using this CSV data file: PresidentRanking.

These data come from an analysis I performed that assessed the relationship between the highest approval rating that a U.S. President achieved and their rank by historians. I found no correlation between these variables, as shown in the fitted line plot. It’s nearly a perfect example of no relationship because it is a flat line with an R-squared of 0.7%!

Now, imagine that we are chasing a high R-squared and we fit the model using a cubic term that provides an S-shape.

Amazing! R-squared and adjusted R-squared look great! The coefficients are statistically significant because their p-values are all less than 0.05. I didn’t show the residual plots, but they look good as well.

Hold on a moment! We’re just twisting the regression line to force it to connect the dots rather than finding an actual relationship. We overfit the model, and the predicted R-squared of 0% gives this away.

If the predicted R-squared is small compared to R-squared, you might be overfitting the model even if the independent variables are statistically significant.

## A Caution about the Problems of Chasing a High R-squared

All study areas involve a certain amount of variability that you can’t explain. If you chase a high R-squared by including an excessive number of variables, you force the model to explain the unexplainable. This is not good. While this approach *can* obtain higher R-squared values, it comes at the cost of misleading regression coefficients, p-values, and R-squared values, along with imprecise predictions.

Adjusted R-squared and predicted R-squared help you resist the urge to add too many independent variables to your model.

- Adjusted R-squared compares models with different numbers of variables.
- Predicted R-squared can guard against models that are too complicated.

Remember, the great power that comes with multiple regression analysis requires your restraint to use it wisely!

If you’re learning regression, check out my Regression Tutorial!

Duc-Anh Luong says

Hi Jim,

I have question about calculation of the predicted R squared in the linear regression.

(1). Is it true that in each time when we remove 1 data point, we have to fit model again and use this model to predict the values of removed data point?

(2). Is it possible to get negative predicted R-squared?

Many thanks

Duc Anh

Jim Frost says

Hi Duc-Anh,

When the statistical software calculates predicted R-squared, it systematically removes each observation and determines how well the model based on all of the other observations predicts that value. The software does this for all observations in the dataset and calculates the predicted error sum of squares (PRESS). It then uses PRESS in place of the error sum of squares (SSE) that it uses to calculate regular R-squared. All of these calculations occur behind the scenes, so you don’t need to refit the model for each observation yourself. All you need to do is assess the predicted R-squared with that process in mind so you know what it really means.

Yes, it is possible to obtain a negative predicted R-squared. However, some statistical software, such as Minitab, rounds these negative values up to zero.

Thank you for writing with your excellent questions,

Jim

Franklin Moormann says

I’m trying to create my own formula to calculate predicted rsquared and this was the only information that I found on how to do it. I believe the formula to do this is predicted r2 = 1 – (press / tss) so would you systematically leave off one data point at a time and calculate the press statistic and tss statistic and add those values to a final total and calculate predicted r2 at the end?

Jim Frost says

Hi Franklin, here are the predicted R-squared and PRESS formulas. The formulas don’t actually go through and remove each observation one at a time, but they are equivalent to that process.
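For reference, a common textbook form of these quantities for ordinary least squares is sketched below. The notation is an assumption chosen here and may differ from the formulas linked in the comment: $e_i$ is the ordinary residual from the fit to all $n$ observations, $h_{ii}$ is the $i$-th diagonal element of the hat matrix $H = X(X^\top X)^{-1}X^\top$, $\hat{y}_{(i)}$ is the prediction for observation $i$ from the model fitted without it, and SST is the total sum of squares.

$$
\text{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2 = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^2
$$

$$
R^2_{\text{pred}} = 1 - \frac{\text{PRESS}}{\text{SST}}, \qquad \text{SST} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2
$$

Because PRESS is a sum of squared terms, it cannot be negative. Predicted R-squared turns negative only when PRESS exceeds SST.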

Franklin Moormann says

I’m only supposed to remove one observation at a time to recalculate the prediction model but after that, I’m supposed to use all original observations to run the calculations for press and tss?

Tim says

Hi Jim,

I know that R-squared is calculated differently in logistic regression. What would you do if a reviewer asked you to provide a similar indicator?

Thanks!

Tim

Jim Frost says

Hi Tim,

There are two measures I’m most familiar with for logistic regression. One is deviance R-squared for binary logistic regression. This statistic measures the proportion of the deviance in the dependent variable that the model explains. Unlike R-squared, the format of the data affects the deviance R-squared.

The other is Akaike Information Criterion (AIC), which measures the quality of a model based on fit and the number of terms in the model.

Jim
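The two measures mentioned in this reply can be sketched as follows. This is a minimal illustration assuming NumPy, data in binary response format (one row per observation with a 0/1 outcome), and fitted probabilities that have already been obtained from some logistic regression. The function names and the example probabilities are hypothetical, and the deviance R-squared shown is one common definition (1 minus the ratio of model deviance to null deviance), so a given software package may compute it differently.

```python
import numpy as np

def binary_deviance(y, p):
    """Deviance for 0/1 outcomes: -2 times the Bernoulli log-likelihood."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)  # avoid log(0)
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def deviance_r_squared(y, p):
    """Share of the null (intercept-only) deviance that the model explains."""
    y = np.asarray(y, dtype=float)
    null_p = np.full_like(y, y.mean())   # intercept-only model predicts the mean
    return 1.0 - binary_deviance(y, p) / binary_deviance(y, null_p)

def aic(y, p, n_params):
    """AIC = 2k - 2 ln(L); for 0/1 data, -2 ln(L) equals the deviance."""
    return 2 * n_params + binary_deviance(y, p)

# Hypothetical outcomes and fitted probabilities from a two-parameter model.
y = [0, 0, 1, 1, 1, 0]
p = [0.2, 0.3, 0.7, 0.8, 0.6, 0.4]
print(f"Deviance R-squared: {deviance_r_squared(y, p):.3f}")
print(f"AIC: {aic(y, p, n_params=2):.3f}")
```

AIC is only meaningful in comparison: when two models are fitted to the same data, the one with the lower AIC is preferred.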

Franklin Moormann says

I have no clue how to do diagonal elements in C# so I guess I’m going to have to go through and eliminate one observation at a time and then calculate the press and rss after each elimination. Since I’m doing that, how would I calculate the press statistic instead of doing the diagonal matrix stuff?

Franklin Moormann says

I found a workaround but I’m now getting a negative value for the press statistic so when I divide by the total sum of squares it is returning 1 which I know isn’t correct

Jim Frost says

Hi Franklin, actually for predicted R-squared (and adjusted R-squared) it is possible to get negative values!

Franklin Moormann says

I’m not explaining well enough I believe. This is my formula results using junk data (with a rsquared value of 0.2)

Predicted Rsquared = 1 – (PRESS / TSS) = 1 – (-1.04 / 67408.86) = 1.00

So as you can see something is definitely wrong.

Allan Paolo Labartinos Almajose says

Hi Jim! I’d like to ask for help regarding the calculation of predicted R-squared values. To be honest, my nose bled (lol) after seeing the formula for the PRESS you provided in one of the comments above. Is there a ‘layman’s way’ of computing this?

Actually, I had this idea:

– I remove one data point

– I regress the remaining points using the same model

– I try to predict the missing data point using the same model previously recalculated (the one with the reduced data point)

– The difference between the prediction of the model with complete data points and the prediction of the new model with one data point removed is the PRESS of that point?

– I do this again for all of the remaining points

– I add all of the PRESS for each point, then sum-square everything, then compute R^2 normally, then this R^2 is now the predicted R^2?

Is this even correct? I don’t know, this is just a wild guess. Please help me, I am totally at a loss here. Thanks!

Jim Frost says

Hi Allan, you’re very close! Think about how you usually calculate sums of squares. It’s the sum of the squared deviations between the fitted values and the observations. PRESS is similar except it is the sum of the squared deviations between the fitted value of each removed observation and the removed observation. So, the procedure basically removes each observation and uses the model to predict that observation and squares the difference between the two. It does that systematically for all observations and sums those squared differences. For your 4th point, you never fit the model with all observations when calculating predicted R-squared. Instead, there is always one removed observation and you’re essentially seeing how well the model predicts each removed observation. I hope this makes it more clear!
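For ordinary least squares, the same PRESS value can also be obtained without refitting, using the leverage identity in which each leave-one-out prediction error equals the ordinary residual divided by one minus that observation's leverage. The sketch below assumes NumPy and a full-rank design matrix. The function name `press_shortcut` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def press_shortcut(X, y):
    """PRESS for an OLS fit with an intercept, computed from a single fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    design = np.column_stack([np.ones(len(y)), X])

    # Ordinary residuals from the fit to all observations.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef

    # Leverages are the diagonal of the hat matrix; with a reduced QR
    # factorization they equal the row sums of squares of Q.
    q, _ = np.linalg.qr(design)
    leverage = np.sum(q ** 2, axis=1)

    # Each leave-one-out prediction error is residual / (1 - leverage).
    return np.sum((residuals / (1.0 - leverage)) ** 2)

# Synthetic example data.
rng = np.random.default_rng(7)
X = rng.normal(size=(20, 2))
y = 1.0 + X @ np.array([0.5, -1.0]) + rng.normal(size=20)
sst = np.sum((y - y.mean()) ** 2)
print(f"PRESS: {press_shortcut(X, y):.3f}")
print(f"Predicted R-squared: {1.0 - press_shortcut(X, y) / sst:.3f}")
```

The result matches the remove-and-refit procedure up to floating-point error, which is why software does not need to refit the model once per observation.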

ALMAS KHURSHEED says

hi sir

I am very confused about how to write an interpretation statement for R-squared if the value is 0.68.

Can you please help me out?

thank you

Jim Frost says

Hi Almas, it means that the independent variables in your model collectively account for 68% of the variability in the dependent variable around its mean. Click the link in the post to go to my post where I talk about R-squared in more detail. I hope this helps!

MUHAMMAD K. N. says

Hi Jim! I am working on research that monitors insect pest population fluctuations in the entomological field, but I mostly obtained weak R-squared results in my regressions and felt disturbed. What advice can you give me in this regard?

Thanks

Muhammad.

Jim Frost says

Hi Muhammad! Unfortunately, that situation isn’t too uncommon and I’ve written a blog post that is specifically about it:

Interpreting a Regression Model with a Low R-squared

A low R-squared might or might not be a problem. If you have significant independent variables and your main goal is to understand the relationships between the variables, a low R-squared is not necessarily a problem.

However, if your main goal is to produce precise predictions, it can be a problem.

The blog post I recommend covers these scenarios and shows how it works. I think it’ll make your situation more clear!

Emanuel Lindström says

Hi Jim!

Awesome blog, and awesome posts! I’m learning a lot!!

I have 2 questions;

1. How is the predicted R-squared actually calculated? The step-by-step process you describe is iterated for each data point in the population, but does that mean you get as many predicted R-squared as there are data points, or do you do an additional step after iterating over all the data points?

2. Does predicted R-squared work even for large samples? I mean, it’s easy to see how the polynomial line in the image changes if you remove a data point, but if there are more data points (100 more, or even 1000 more), wouldn’t the over-fitted polynomial line stay the same and predict the one omitted data point?

Again, thanks for an amazing resource!

Jim Frost says

Hi Emanuel,

Thanks so much! I’m glad you have found it to be helpful!

About predicted R-squared, which is really my favorite type of R-squared. Think about the error sum of squares (SSE). This is where you take the squared differences between each observation and the fitted value and sum them up across all observations. It’s also known as the residual sum of squares because it’s the sum of the squared residuals. A small value produces a high R-squared.

For predicted R-squared, you use the predicted error sum of squares (PRESS), which is similar to the SSE. To calculate PRESS, you remove a point, refit the model, and then use the model to predict the removed observation. Then, you take the removed value and subtract the predicted value and then square this difference. You repeat for all of the removed values. You end up with a squared difference for each value when it is removed. You then sum those squared differences and you have PRESS. A low PRESS value produces a high predicted R-squared. So, it’s fairly analogous to the SSE but the squared differences are based on predicting the missing values versus values that were used to fit the model.

Regarding point 2, yes, you’re correct: when you have more data points, it’s harder to overfit your model and, hence, you wouldn’t expect a much lower predicted R-squared. Imagine you have 1000 data points that follow the same U-shaped pattern. In that case, you’d be really sure about that curved relationship because such a large number of data points aren’t going to follow that curve by chance. That’s why you wouldn’t expect the predicted R-squared to drop when you have many data points. However, fewer data points can produce that pattern by chance. If you remove one, it changes that relationship noticeably. You’re not really certain that the relationship really is that U-shape. Predicted R-squared detects this uncertainty and that’s why it drops.

Overfitting depends on the number of observations per term in the model, as you can read about in my post about overfitting. You’d need a very, very complex model to overfit a dataset with 1000 observations!

I hope this helps!

Franklin Moormann says

When calculating predicted rsquared for a full dataset of 4000 data points, you would do all 4000 or a random sample of those 4000 data points?

Jim Frost says

The procedure always cycles through the complete dataset and systematically removes one data point at a time to calculate predicted R-squared.