R-squared tends to reward you for including too many independent variables in a regression model, and it doesn’t provide any incentive to stop adding more. Adjusted R-squared and predicted R-squared use different approaches to help you fight that impulse to add too many. The protection that adjusted R-squared and predicted R-squared provide is critical because too many terms in a model can produce results that you can’t trust. These statistics help you include the correct number of independent variables in your regression model.

Multiple regression analysis can seduce you! Yep, you read it here first. It’s an incredibly tempting statistical analysis that practically begs you to include additional independent variables in your model. Every time you add a variable, the R-squared increases, which tempts you to add more. Some of the independent variables *will* be statistically significant. Perhaps there is an actual relationship? Or, is it just a chance correlation?

You just pop the variables into the model as they occur to you or just because the data are readily available. Higher-order polynomials curve your regression line any which way you want. But, are you fitting real relationships or just playing connect the dots? Meanwhile, the R-squared increases, mischievously convincing you to include yet more variables!

In my post about interpreting R-squared, I show how evaluating how well a linear regression model fits the data is not as intuitive as you may think. Now, I’ll explore reasons why you need to use adjusted R-squared and predicted R-squared to help you specify a good regression model!

## Some Problems with R-squared

Previously, I demonstrated that you cannot use R-squared to conclude whether your model is biased. To check for this bias, you need to check your residual plots. Unfortunately, there are yet more problems with R-squared that we need to address.

**Problem 1:** R-squared increases, or at the very least never decreases, every time you add an independent variable to the model. It *never* goes down, not even when the new variable has only a chance correlation with the response. As a result, a regression model that contains more independent variables than another model can look like it provides a better fit merely because it contains more variables.
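To see Problem 1 in action, here is a minimal sketch in Python with NumPy. The simulated data and the `r_squared` helper are assumptions made up for illustration. The code starts with one real predictor and then keeps adding columns of pure random noise, refitting each time.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an ordinary least squares fit that includes a constant."""
    A = np.column_stack([np.ones(len(y)), X])      # add the constant term
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least squares coefficients
    resid = y - A @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - sse / sst

rng = np.random.default_rng(0)                     # hypothetical simulated data
n = 30
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)             # one real predictor plus noise

X = x.reshape(-1, 1)
r2_values = [r_squared(X, y)]
for _ in range(10):
    X = np.column_stack([X, rng.normal(size=n)])   # add a pure-noise predictor
    r2_values.append(r_squared(X, y))

for k, r2 in enumerate(r2_values, start=1):
    print(f"{k:2d} predictors: R-squared = {r2:.4f}")
```

Every added column is unrelated to the response by construction, yet the printed R-squared values never go down.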

**Problem 2:** When a model contains an excessive number of independent variables and polynomial terms, it becomes overly customized to fit the peculiarities and random noise in your sample rather than reflecting the entire population. Statisticians call this overfitting the model, and it produces deceptively high R-squared values and a decreased capability for precise predictions.

Fortunately for us, adjusted R-squared and predicted R-squared address both of these problems.

## What Is the Adjusted R-squared?

Use adjusted R-squared to compare the goodness-of-fit for regression models that contain differing numbers of independent variables.

Let’s say you are comparing a model with five independent variables to a model with one variable, and the five-variable model has a higher R-squared. Is the model with five variables actually a better model, or does it just have more variables? To determine this, just compare the adjusted R-squared values!

The adjusted R-squared adjusts for the number of terms in the model. Importantly, its value increases only when the new term improves the model fit more than expected by chance alone. The adjusted R-squared value actually decreases when the term doesn’t improve the model fit by a sufficient amount.
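If you want to see the adjustment itself, the usual formula is 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k is the number of terms excluding the constant. The sketch below applies it to a made-up sequence of R-squared values. The sample size and the R-squared values are hypothetical, chosen only to show the rise-then-fall pattern.

```python
def adjusted_r_squared(r2, n_obs, n_terms):
    """Adjusted R-squared for a model with a constant.

    n_terms counts the predictors/terms, excluding the constant.
    """
    df_error = n_obs - n_terms - 1
    if df_error <= 0:
        raise ValueError("need more observations than terms plus the constant")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / df_error

n_obs = 20                                        # hypothetical sample size
r2_by_terms = [0.60, 0.70, 0.75, 0.755, 0.757]    # hypothetical R-squared values

adj_by_terms = [
    adjusted_r_squared(r2, n_obs, k)
    for k, r2 in enumerate(r2_by_terms, start=1)
]

for k, (r2, adj) in enumerate(zip(r2_by_terms, adj_by_terms), start=1):
    print(f"{k} terms: R-sq = {r2:.3f}, R-sq(adj) = {adj:.3f}")
```

In this made-up sequence, R-squared keeps creeping upward, while the adjusted value peaks at three terms and then declines because the fourth and fifth terms add too little.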

The example below shows how the adjusted R-squared increases up to a point and then decreases. On the other hand, R-squared blithely increases with each and every additional independent variable.

In this example, the researchers might want to include only three independent variables in their regression model. My R-squared blog post shows how an under-specified model (too few terms) can produce biased estimates. However, an overspecified model (too many terms) can reduce the model’s precision. In other words, both the coefficient estimates and predicted values can have larger margins of error around them. That’s why you don’t want to include too many terms in the regression model!

## What Is the Predicted R-squared?

Use predicted R-squared to determine how well a regression model makes predictions. This statistic helps you identify cases where the model provides a good fit for the existing data but isn’t as good at making predictions. However, even if you aren’t using your model to make predictions, predicted R-squared still offers valuable insights about your model.

Statistical software calculates predicted R-squared using the following procedure:

- It removes a data point from the dataset.
- Calculates the regression equation.
- Evaluates how well the model predicts the missing observation.
- And, repeats this for all data points in the dataset.
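The steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not how any particular statistical package implements it. It assumes predicted R-squared is defined as 1 − PRESS/SST, where PRESS is the sum of the squared leave-one-out prediction errors. The function name and the simulated data are hypothetical.

```python
import numpy as np

def predicted_r_squared(X, y):
    """Predicted R-squared by explicitly refitting without each observation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])           # design matrix with constant
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                   # remove one data point
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)  # refit
        press += (y[i] - A[i] @ beta) ** 2         # error predicting the held-out point
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - press / sst                       # computed after all points are done

rng = np.random.default_rng(0)                     # hypothetical simulated data
x = np.linspace(0.0, 10.0, 20)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=20)
print(f"predicted R-squared = {predicted_r_squared(x, y):.3f}")
```

Refitting once per observation is slow for large datasets, but it mirrors the procedure step by step.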

Predicted R-squared helps you determine whether you are overfitting a regression model. Again, an overfit model includes an excessive number of terms, and it begins to fit the random noise in your sample.

By its very definition, it is not possible to predict random noise. Consequently, if your model fits a lot of random noise, the predicted R-squared value must fall. A predicted R-squared that is distinctly smaller than R-squared is a warning sign that you are overfitting the model. Try reducing the number of terms.

If I had to name my favorite flavor of R-squared, it would be predicted R-squared!

**Related post**: Overfitting Regression Models: Problems, Detection, and Avoidance

## Example of an Overfit Model and Predicted R-squared

You can try this example using this CSV data file: PresidentRanking.

These data come from an analysis I performed that assessed the relationship between the highest approval rating that a U.S. President achieved and their rank by historians. I found no correlation between these variables, as shown in the fitted line plot. It’s nearly a perfect example of no relationship because it is a flat line with an R-squared of 0.7%!

Now, imagine that we are chasing a high R-squared and we fit the model using a cubic term that provides an S-shape.

Amazing! R-squared and adjusted R-squared look great! The coefficients are statistically significant because their p-values are all less than 0.05. I didn’t show the residual plots, but they look good as well.

Hold on a moment! We’re just twisting the regression line to force it to connect the dots rather than finding an actual relationship. We overfit the model, and the predicted R-squared of 0% gives this away.

If the predicted R-squared is small compared to R-squared, you might be over-fitting the model even if the independent variables are statistically significant.
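You can reproduce the flavor of this example without the presidential data. The sketch below uses made-up data in which the response is pure noise, then fits polynomials of increasing order. Instead of refitting for each observation, it uses the leverage shortcut: for ordinary least squares, the leave-one-out prediction error equals the residual divided by one minus that point's leverage. The data and the `fit_summary` name are assumptions for illustration.

```python
import numpy as np

def fit_summary(x, y, degree):
    """Return (R-squared, predicted R-squared) for a polynomial fit."""
    A = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    Q, _ = np.linalg.qr(A)                         # orthonormal basis for the columns
    resid = y - Q @ (Q.T @ y)                      # ordinary residuals
    leverage = (Q ** 2).sum(axis=1)                # diagonal of the hat matrix
    press = ((resid / (1.0 - leverage)) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - (resid @ resid) / sst, 1.0 - press / sst

rng = np.random.default_rng(1)                     # hypothetical simulated data
x = np.linspace(-1.0, 1.0, 12)
y = rng.normal(size=12)                            # pure noise, unrelated to x

results = {degree: fit_summary(x, y, degree) for degree in (1, 3, 6)}
for degree, (r2, pred) in results.items():
    print(f"degree {degree}: R-sq = {r2:6.1%}, R-sq(pred) = {pred:6.1%}")
```

R-squared climbs as the polynomial order increases even though there is nothing real to fit, while predicted R-squared never exceeds it. A negative predicted value is typically reported as 0%.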

## A Caution about the Problems of Chasing a High R-squared

All study areas involve a certain amount of variability that you can’t explain. If you chase a high R-squared by including an excessive number of variables, you force the model to explain the unexplainable. This is not good. While this approach *can* obtain higher R-squared values, it comes at the cost of misleading regression coefficients, p-values, and R-squared values, along with imprecise predictions.

Adjusted R-squared and predicted R-squared help you resist the urge to add too many independent variables to your model.

- Adjusted R-squared compares models with different numbers of variables.
- Predicted R-squared can guard against models that are too complicated.

Remember, the great power that comes with multiple regression analysis requires your restraint to use it wisely!

If you’re learning regression, check out my Regression Tutorial!

**Note: I wrote a different version of this post that appeared elsewhere. I’ve completely rewritten and updated it for my blog site.**

Amogh Bharadwaj D N says

Hello Jim, based on the interpretation of the predicted R-squared and adjusted R-squared above, why is it that in the last example the adjusted R-squared is 50% but the predicted R-squared is 0%? Doesn’t a 50% adjusted R-squared provide a good estimate for the population?

Jim Frost says

Hi Amogh,

These two statistics are telling you different things. Adjusted R-squared includes a shrinkage factor to counteract the fact that regular R-squared is a biased estimator. Sample R-squared values tend to be higher than the true population value and adjusted R-squared corrects for that bias.

Predicted R-squared indicates how well a model without each observation would predict that observation.

Because what they measure is so different, it’s not surprising that the results can be different. I find that predicted R-squared tends to be more sensitive to models that are overly complicated. Overfitting is when the model starts to fit the random noise in the data. Because random noise is, by definition, not predictable, this problem shows up in the predicted R-squared. Adjusted R-squared is not designed to detect that problem–hence it doesn’t show up there.

I hope that helps clarify it!

Kyle Seibenick says

Hi Jim. Thanks for writing the regression ebook, this is a great refresher and enhancement of my skills.

I already saw this question and your response was “you don’t know of any easy way”. So I’ll ask – do you know of a hard way or manual way to calc predicted r2 (or “PRESS” as I’m seeing in other places) in Excel?

Tomingan says

Hi Jim

How can we conclusively tell that the number of IVs is optimal for a given DV? As you mentioned, the more IVs we add, the more the R-squared will continue to increase. So when or where is the stopping point? Is there any simple test that can be done? Please help, Jim. Thank you

Jim Frost says

Hi Tomingan,

Read the section about adjusted R-squared more carefully. I talk about that there! Adjusted R-squared gives you an idea.

You can also look at the number of observations for each term in the model, as I discuss in my post about overfitting regression models. Ultimately, the number of IVs you can add is limited by the number of observations.

Also, you should let theory guide your model building. There’s no simple test that’ll tell you when you get your model just right. However, by letting theory be your guide, you can get a better sense. Read my post about choosing the correct model for more information.

Also, given all your questions about regression analysis, you should get my regression analysis ebook! It covers all of this and more!

Tomingan says

Hi Jim

I have these readings: r = .344, r squared = .118, and adj. r squared = .084, from 1 DV and 5 IVs. My initial analysis is that there is a low positive correlation. About 11.8% of the DV is explained or supported by the IVs. There is no telling whether 5 IVs is a sufficient number. I believe 5 IVs is not the optimum number. What other testing can we do to identify the optimum number of IVs? Thank you Jim.

Tomingan says

Hi Jim

I value greatly your comment on the r and r squared as well as the adjusted r squared. Can you please indicate the best reference for this please. Thank you

Maxim says

Hello Jim,

Thank you for your blog! It helps a lot in doing my research. Could you please provide any reference for the predicted R squared? I have found a method for its calculation, but all I can reference so far is various posts in the internet. Thank you!

Derek says

Hey Jim,

I appreciate you sharing this article. I know that if an adjusted r-squared is 0.58, then the independent variables in my model collectively account for 58% of the variability in the dependent variable around its mean. I know that this is a basic question, but how would the interpretation differ if the predicted r-squared is 0.58 (instead of the adjusted r-squared)?

Thank you!

Derek

Jim Frost says

Hi Derek,

Typically, analysts use adjusted R-squared to compare models with different numbers of predictors, as I show in the post. But, interestingly, it has its own unique interpretation. While regular R-squared is the amount of variation the model accounts for in your sample, adjusted R-squared is an estimate for how much your model accounts for in the population. I write about this interpretation in this post.

But, on to your question! For predicted R-squared, the interpretation is the amount of variability that your model accounts for in new observations that were not used during the parameter estimation process.

Simon McGree says

Hi Jim

Thank you for your earlier reply to my comment. Below is a summary of my analysis

I have 43 years of annual crop yield data and 360 climate indices (rain, maximum and minimum temperature individual month and seasonal combinations). I use PCA to reduce the number of climate variables and deal with multicollinearity. The scree plot shows no obvious elbow so I retain 32 PCs or 99.9% of the variance. Some of the variables have a weak relationship with sugarcane so it is possible the first PCs have a weak relationship with sugarcane, another reason to perhaps retain more PCs. I then examine the absolute value of the PC coefficients, I select the climate variable with the highest coefficients to represent that PC.

I then use stepwise regression backward elimination. I stop at the highest R-sq predicted.

In the process of my paper undergoing review. I received the following “all data are used to screen for hindcast skill, and hence there is potential for “artificial skill”. The authors indicate that they used “leave‐one‐out‐cross validation”. However, they are using PCAs which does utilises all data in calculation of principal components. When this is done, statistical models have artificial skill in cross‐validation mode. Statistical models so derived will be useless in actual prediction; their apparent skill results from the fact that the crossvalidation is not truly on independent data because the entire sample was used to screen the predictors from PCAs.”

Appreciate your thoughts.

Rao says

Hi Jim,

Thanks for a great blog.

I’m curious why you say that adjusted R-square has no associated p-value.

The difference between R-square and adjusted R-square is in their degrees of freedom. I assume the sampling distribution of both is F; however, while these F distributions – defined by numerator and denominator degrees of freedom – should be different, it should be just as easy to show p-values for adjusted R-square as for R-square.

And yet, all regression output show just one ANOVA table – for population R-square estimated by sample R-square. Why is this?

Rao

Jim Frost says

Hi Rao,

I’ve never seen one developed or used. Usually, adjusted R-squared is used to compare models with differing numbers of predictors. Even with regular R-squared, you don’t usually see it discussed in relation to its p-value.

I don’t really know why. I’ve never seen a discussion of this issue. R-squared is a biased estimator whereas adjusted R-squared is not. I agree with your reasoning, but I don’t have an answer for why it’s not done.

Daisy says

Hi Jim, this is a hugely helpful website.

I am trying to calculate predicted R2 in Stata following mixed effects ML regression. Do you have any syntax for how to create it?

Jim Frost says

Hi Daisy, sorry, I’m not a stata user so I don’t know what command you’d use.

Mukhtar says

Thanks for the comment and suggestions. I really appreciate your effort in educating masses through your blog.

Jim Frost says

You’re very welcome, Mukhtar!

Mukhtar says

Hello Jim,

Is there any benchmark for the difference between the R-square, R-square (adj), and R-square (pred) values?

I have the following case and suspect that the model is overfit.

| Source | DF | Adj SS | Adj MS | F-Value | P-Value |
|---|---|---|---|---|---|
| Regression | 5 | 0.174696 | 0.034939 | 14.81 | 0.000 |
| Vc | 1 | 0.033814 | 0.033814 | 14.33 | 0.001 |
| ap*2 | 1 | 0.162968 | 0.162968 | 69.06 | 0.000 |
| fr*2 | 1 | 0.143943 | 0.143943 | 61.00 | 0.000 |
| A*2 | 1 | 0.015032 | 0.015032 | 6.37 | 0.020 |
| ap*fr | 1 | 0.151329 | 0.151329 | 64.13 | 0.000 |
| Error | 21 | 0.049556 | 0.002360 | | |
| Lack-of-Fit | 3 | 0.008540 | 0.002847 | 1.25 | 0.321 |
| Pure Error | 18 | 0.041017 | 0.002279 | | |
| Total | 26 | 0.224253 | | | |

Model Summary

| S | R-sq | R-sq(adj) | R-sq(pred) |
|---|---|---|---|
| 0.0485781 | 77.90% | 72.64% | 62.38% |

other models have the following results, with all p-values significant

| S | R-sq | R-sq(adj) | R-sq(pred) |
|---|---|---|---|
| 0.0587089 | 81.32% | 76.87% | 69.63% |
| 0.0058268 | 73.94% | 69.21% | 60.91% |

Thanks……

Jim Frost says

Hi Mukhtar,

There’s no standard guideline that I’m familiar with. But, I always start to worry when the difference is greater than 10%. For overfitting, you also need to consider the number of observations per model term. Given your output, I’d say you have some reason for concern about overfitting. I think reading the other post will help you out.
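As a rough check, the summary statistics can be recomputed from the ANOVA table quoted above, and the gap can be compared with my informal 10% threshold. The sketch below is plain Python and takes the sums of squares, degrees of freedom, and reported R-sq(pred) directly from that output. The variable names are just for illustration.

```python
# Values read from the ANOVA table and model summary in the question.
ss_error, df_error = 0.049556, 21
ss_total, df_total = 0.224253, 26
r2_pred_reported = 0.6238

r2 = 1.0 - ss_error / ss_total
r2_adj = 1.0 - (ss_error / df_error) / (ss_total / df_total)
gap = r2 - r2_pred_reported          # R-sq minus R-sq(pred)

print(f"R-sq      = {r2:.2%}")
print(f"R-sq(adj) = {r2_adj:.2%}")
print(f"R-sq minus R-sq(pred) = {gap:.1%}")
print("worth a closer look" if gap > 0.10 else "within the informal 10% threshold")
```

The recomputed R-sq and R-sq(adj) agree with the reported 77.90% and 72.64%, and the gap to the predicted R-squared is a bit over 15 percentage points. Remember that 10% is only a rule of thumb, not a standard.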

Parikshit says

Thanks Jim.

Parikshit says

Hi Jim,

I have two questions

1. What is the minimum r-square value, above which relation between variable and response can be considered significant?? Why??

2. In model if r-square is 0.80, what should be minimum level of r-square adjusted, to use the model for prediction??

Thanks

Parikshit

Jim Frost says

Hi,

For an R-squared to be statistically significant, the overall F-test for the model must be significant. To be practically significant, that depends on the field of study.

Use predicted R-squared to assess prediction, not adjusted R-squared. There’s no exact guideline for how close it must be. I start to worry when the difference is more than 0.1 (10%). However, you probably should be assessing the precision of the prediction as I describe in this post about S vs. R-squared.

Anne Wambui says

Hello, Thank you for the explanations. I have a questions. I have used multiple regression to compare three groups, when I removed one variable the model was not significant for one group, other independent variables became significant(they were not before) R squared decreased significantly for the second group, and one group has a slight decrease in R squared. How can I interpret this or meaning of this factor.

Jim Frost says

Hi Anne,

If you find that the significance of a predictor changes depending on which specific variables are included in the model, you might well have multicollinearity (correlated IVs). Read my post about multicollinearity for more information.

Heidi says

I completed a multiple regression analysis in Excel with three independent variables and the results show an R-squared value of 0.11 but an adjusted R-squared of 0.98. How could these values be so different? Also, Excel doesn’t give a predicted R-squared value. Is there another [easy] way to get it? The residuals show values for the predicted [dependent variable] but that can’t be it.

BTW, I really appreciate your blog – it is the only statistics info I’ve found that makes any sense at all. My textbook is all but useless. I still can’t claim to understand any of it, really, but reading your pages helps a lot – if only to get through the assignments with a passing grade. Thanks.

Jim Frost says

Hi Heidi,

I’m so glad my blog has been helpful!

For the first thing, it’s impossible for the R-squared value to be lower than the adjusted R-squared for the same model. There’s something off there. I don’t think there’s any easy way to get predicted R-squared with Excel.

It is possible to have a large difference between R-squared and adjusted R-squared. However, adjusted R-squared will always be smaller than R-squared. If there is a large difference, it might indicate you have too many predictors (IV) in your model. It comes down to the number of observations per term in your model. To see how this works, look at my post about Five Reasons Why Your R-squared can be Too High. In the first reason, you’ll read about adjusted R-squared and see a graph that shows how adjusted R-squared decreases by the sample size per term.

Julie Nielsen says

Hi Jim,

I have two models where I add time fixed-effects and robust and clustered standard errors. When I add FE and robust and clustered standard errors to my models, model 1’s R-squared increases while model 2’s R-squared becomes negative (from 0,301 to -0,385). If I look at the coefficients in the two models, none of them seems to be significant, but I don’t understand how one of the R-squared values can become negative?

ankita says

sir, i am getting predicted R squared value as zero. Is it normal? please help me out.

Jim Frost says

Hi Ankita,

Yes, it’s possible. Unlike regular R-squared, both adjusted and predicted R-squared can fall below 0%. In terms of interpretation, just interpret it as if it were 0%. It’s not good. Usually when you get a negative value, it means you have a very small sample size along with an overly complex model.

Elizabeth Causley says

I am new to this whole process and I am still learning. If I have an adjusted R-Squared of 0.05448 for data that includes 4 IV to 1 DV, what would I interpret that as? Also, I’m not sure if you can answer this here, but this also gives me a F-Statistic as 79.2 on 4, how would that be interpreted?

Any help would be greatly appreciated!

Jim Frost says

Hi Elizabeth,

It sounds like your four IVs explain a very low proportion of the variance in the DV. Are any of the p-values for the coefficients statistically significant?

What’s the p-value associated with the F-statistic? You’ll usually only interpret the p-value for the F-statistic rather than the F-value itself. You can read my post about the overall F-test for more information.

Swapnil says

Hi Jim,

Can you please help me out with this data. Is it statistically significant or not

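Model Summary

| S | R-sq | R-sq(adj) | R-sq(pred) |
|---|---|---|---|
| 52.0410 | 97.63% | 93.49% | 78.69% |

Analysis of Variance

| Source | DF | Adj SS | Adj MS | F-Value | P-Value |
|---|---|---|---|---|---|
| Model | 7 | 446694 | 63813 | 23.56 | 0.004 |
| Linear | 7 | 446694 | 63813 | 23.56 | 0.004 |
| A | 1 | 150522 | 150522 | 55.58 | 0.002 |
| B | 1 | 885 | 885 | 0.33 | 0.598 |
| C | 1 | 138967 | 138967 | 51.31 | 0.002 |
| D | 1 | 118108 | 118108 | 43.61 | 0.003 |
| E | 1 | 19212 | 19212 | 7.09 | 0.056 |
| F | 1 | 9624 | 9624 | 3.55 | 0.133 |
| G | 1 | 9377 | 9377 | 3.46 | 0.136 |
| Error | 4 | 10833 | 2708 | | |
| Total | 11 | 457527 | | | |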

Jim Frost says

Hi Swapnil,

Please read my post about regression coefficients and p-values. That post will show you how to determine significance and what it means. You have some insignificant terms that you should consider dropping from the model.

In a nutshell, it looks like overall your model is significant. Some of the predictors are significant while others are not. However, it looks like you might be overfitting your model. You might be including too many terms given your sample size, which can distort the results. Click the link to read about that.

If after reading those posts you have more specific questions, please post them in the comments for the relevant article. Thanks!

Suku says

Hi Jim,

I need your help in understanding the following :

R square value is 0.018

Adjusted R Square value is -0.024

R is 0.133

What does a negative Adjusted R Square value predict about the relationship between 1 DV and 1 IVs ?

Thanks

Suku

Jim Frost says

Hi Suku,

Just interpret the negative value as if it were zero. Your model does not explain variability in the DV.
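To see why the adjusted value can dip below zero, note that the adjustment formula turns negative exactly when R-squared is smaller than k/(n − 1), where k is the number of terms and n is the number of observations. The sketch below plugs in your R-squared of 0.018 with one IV. Your sample size wasn't given, so the values of `n_obs` are hypothetical.

```python
def adjusted_r_squared(r2, n_obs, n_terms):
    """Adjusted R-squared for a model with a constant and n_terms predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_terms - 1)

r2, n_terms = 0.018, 1               # values from the question
for n_obs in (10, 25, 50, 100):      # hypothetical sample sizes (actual n unknown)
    adj = adjusted_r_squared(r2, n_obs, n_terms)
    cutoff = n_terms / (n_obs - 1)   # adjusted R-squared < 0 exactly when r2 < cutoff
    print(f"n = {n_obs:3d}: cutoff = {cutoff:.4f}, adjusted R-squared = {adj:+.4f}")
```

With an R-squared this small, the adjusted value stays negative until the sample size pushes the cutoff below 0.018, which happens at a little under 60 observations for a single predictor.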

Alfer Jann D. Tantog says

Hello! How do you split R-squared among the predictor variables? I have read a journal wherein the R-squared is .400 = 40% and then they split the value between 3 predictors. 18 for predictor 1, 21.6 for predictor 2, and 0.4 for predictor 3. May I ask how can I calculate it?

Jim Frost says

Hi Alfer,

I suspect that you’re referring to the practice of the increase in R-squared that occurs when you include each predictor in the model last. That’s not exactly “splitting” the R-squared but I think it is what you’re referring to. I’ve written a post that talks about this method as a way of determining the importance of each predictor. I’d read that post to see if it answers your questions!
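If that is the method the journal used, a minimal sketch looks like this: fit the full model, then refit it once without each predictor and record how much R-squared drops. The simulated data, coefficient values, and helper name below are assumptions for illustration only.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an ordinary least squares fit that includes a constant."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(2)                   # hypothetical simulated data
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)  # third IV has no effect

full_r2 = r_squared(X, y)
increments = []
for j in range(X.shape[1]):
    reduced_r2 = r_squared(np.delete(X, j, axis=1), y)   # model without predictor j
    increments.append(full_r2 - reduced_r2)              # gain from adding j last

print(f"full model R-squared = {full_r2:.3f}")
for j, inc in enumerate(increments, start=1):
    print(f"predictor {j}: increase in R-squared when added last = {inc:.3f}")
```

These increases generally do not add up to the full R-squared when the predictors are correlated, which is why I wouldn't call it "splitting."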

Ida says

Hi Jim

I know I am a little slow here, but:

How can you tell if the adjusted R^2 is significant? Is it always significant if the p-value is higher than 0.5, or is there a number I can navigate from when it comes to interpreting the adjusted R^2

Thank you!

Jim Frost says

Hi Ida,

There’s no p-value for adjusted R-squared. Typically, you use it to compare models with different numbers of predictors/IVs. It’s more for comparing models rather than determining statistical significance. However, there is a p-value for the regular r-squared, although you might need to hunt for it in the statistical output. The F-test of overall significance produces a p-value. When that p-value is less than your significance level, you can reject the null hypothesis that R-squared equals zero.
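For reference, the overall F-statistic can be computed directly from R-squared: F = (R²/k) / ((1 − R²)/(n − k − 1)), with k terms and n observations. The sketch below uses the figures from a reader's output elsewhere in these comments (R-sq 97.63%, 12 observations, 7 terms). The function name is hypothetical, and the p-value step runs only if SciPy happens to be installed.

```python
def overall_f(r2, n_obs, n_terms):
    """Overall F-statistic and its degrees of freedom, computed from R-squared."""
    df_model = n_terms
    df_error = n_obs - n_terms - 1
    f_stat = (r2 / df_model) / ((1.0 - r2) / df_error)
    return f_stat, df_model, df_error

f_stat, df_model, df_error = overall_f(0.9763, 12, 7)
print(f"F({df_model}, {df_error}) = {f_stat:.2f}")

try:
    from scipy import stats                      # optional dependency
    p_value = stats.f.sf(f_stat, df_model, df_error)   # upper-tail probability
    print(f"p-value = {p_value:.3f}")
except ImportError:
    print("SciPy not installed; look up the p-value from an F table instead")
```

The result lands close to the F-value of 23.56 reported in that output. The small difference comes from using the rounded R-squared.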

I hope this helps!

Simon McGree says

Hi Jim

I’m concerned I have over fitted my models but first let me give you a bit of background.

I have 43 years of annual sugarcane and sugar data. I have 852 climate indices (rain, maximum and minimum temperature individual month and seasonal combinations). I use PCA to reduce the number of climate variables and deal with multicollinearity. The scree plot shows no obvious elbow so I retain 25 PCs or 99.9% of the variance. Some of the variables have a weak relationship with sugarcane so it is possible the first PCs have a weak relationship with sugarcane, another reason to perhaps retain more PCs. I then examine the absolute value of the PC coefficients, I focus on the four climate variables with the four highest coefficients. The representative variable for each coefficient that I take to the next stage is the one that has the strongest correlation coefficient with sugarcane and sugar yield respectively.

I then use stepwise regression backward elimination. I stop at the highest R-sq predicted. For the sugarcane model I have an adjusted R squared of 79% and predicted R squared of 73% (DF = 8). For the sugar model I have an adjusted R squared of 81% and predicted R squared of 73% (DF=11). How am I doing? Appreciate your thoughts.

I repeated the above with 70% of the variance retained. the R-sq adjusted and predicted values are much lower. It appears some key climate variables are lost by only retaining 70%

Regards

Simon

Jim Frost says

Hi Simon,

Based on what you write, I’d say you’re doing very well! I’d agree that the model with the higher predicted R-squared is likely to be better. As always, use your subject area knowledge to apply statistics correctly. But, I don’t see any obvious errors in your approach.

Rei says

So I’ve got to do a paper using regression analysis. I use 3 models (linear, quadratic, and exponential) for comparison. Each of them got:

Linear R2 : 0.197 R2 ad : 0.875

Quadratic R2 : 0.931 R2 ad : 0.794

Exponential R2 : 0.919 R2 ad : 0.879

Which model i choose..?

Jim Frost says

Hi Rei,

Choosing a model is more than just going by several R-squared statistics! Check graphs and theory. For more information, read my post about choosing the correct model. It’s not even possible to say that any of those three are the correct model with the information provided. And, if your model has curvature, which seems likely, read my post about curve fitting, which describes different methods and how to compare the resulting models.

Best of luck with your analysis!

MD VASEEM CHAVHAN says

Thanks for the explanation.

Please comment on the following model.

Model Summary

| S | R-sq | R-sq(adj) | R-sq(pred) |
|---|---|---|---|
| 14.7955 | 99.33% | 88.60% | 0.00% |

Jim Frost says

Hi,

Your example closely matches the example that I use in the section of this post titled, “Example of an Overfit Model and Predicted R-squared.” Read that section more closely. You have an overfit model.

I’ve also written a post about overfitting models that will help you understand.

mahesh says

Hi Jim

I run the regression analysis and getting following results of R squared, adjusted R2 and predicted R2.

Model Summary

| S | R-sq | R-sq(adj) | R-sq(pred) |
|---|---|---|---|
| 2047.24 | 99.11% | 99.03% | * |

my question,

why predicted R2 has * value?

Is model good adjusted R2 or predicted R2 has * value?

Thanks,

Mahesh

Jim Frost says

Hi Mahesh,

I’m not really sure. I’m drawing a blank as to why the procedure would be able to calculate R-squared and adjusted R-squared but not predicted R-squared. Is there a chance that you have only three observations? I’m thinking of a scenario where you have enough degrees of freedom to fit the model when you use all the observations but not enough for predicted R-squared where you’re systematically fitting the model multiple times where each time one observation is removed. That would suggest you have just barely enough degrees of freedom to begin with and you’re probably overfitting the model anyway. But, when one observation is removed you no longer have a sufficient number of DF.

I’m not sure that’s what is happening but it’s one possible scenario.

rezvan says

Hi Jim,

I am writing my paper about optimization the leaching process of Cd by RSM using DX7. I obtained R2= 0.79, adjusted R2=0.74, and predicted R2 = 0.59. The software in box cox proposed me to normalize data by transforming λ from 1 to 3, Then the results would change as follow R2 = 0.85, adjusted R2 = 0.80, and predicted R2 = 0.71. the other statistical tools like F-value , P-value and others would be approximately constant in terms of being significant or not significant. I am confused if i do transformation or not.

Thanks

Jim Frost says

Hi,

Check the residual plots for the model that does not transform the data. If the residual plots look good, you don’t need to transform the data. On the other hand, if you see a problem in the residual plots, such as severe nonnormality or heteroscedasticity, consider transforming the data. However, I always recommend that transformation should be the last resort. There are other methods that can fix these problems in some cases. These other methods involve fitting a better model. For example, a misspecified model can produce nonnormal residuals and heteroscedasticity. You’d want to be sure that you are specifying the correct model before considering a data transformation.

My article about heteroscedasticity (see link) discusses some of those other options for non-constant variance. My ebook about regression analysis goes into much more detail about when and why you might want to transform your data, when you wouldn’t, how to transform data, and how it all works. Those details would apply to your analysis as well.

Again, if your residual plots for the model that uses the untransformed data look good, don’t transform your data! Transformations can fix particular types of problems as a last resort.

I hope this helps! Best of luck with your analysis.

Mike says

Hi Jim.

Great Article. I would like some advice. I’m trying to build a linear regression model. I’ve determined what the control variables are going to be based on prior knowledge and previous literature. I now need to work out which of my 7 predictors to include in my final model with those control variables. In the past I have decided on which predictors to include in the final model based on significance, adding those with a p value <0.10. However, I've been speaking to a statistician, and instead they recommend choosing the model with the best adjusted r2 value. I've seen lots of studies using my usual method for variable selection, but I haven't come across any that selected variables based on adjusted r-squared values. So, I'm just wondering whether you would recommend choosing the model with the highest adjusted r-squared value, and whether you know of any papers that have selected variables for the final model using this method? Looking forward to hearing from you.

Mike

Jim Frost says

Hi Mike,

Choosing the correct model is almost as much of an art as it is a science. One thing I always highlight is the need to incorporate your subject-area knowledge about the underlying process/research question. Never go solely by statistical measures. I’d also add that there’s no single statistical measure that is best. In fact, the various measures can disagree. Adjusted R-squared is a good one to keep an eye on, but it can lead you astray. For example, if you start to overfit your model, the adjusted R-squared can look great, but your coefficients and their p-values are all messed up (technical term there!). Chasing a high R-squared or adjusted R-squared can lead to problems.

Also, it’s important at least to pay attention to the p-values of the coefficients. If you include too many variables that are not significant it reduces the precision of your model. Taken further, it can lead to the overfitting I referred to before. However, if you have to choose between the possibility of leaving out an important variable even though it’s not significant versus leaving it in even though you’re not sure, yes, it’s generally better to include it. And, perhaps that’s the thinking behind the recommendation. However, you shouldn’t take that too far!

I’d suggest reading my post about specifying the correct model. And, then for an illustration of how R-squared and adjusted R-squared can lead you astray, read my posts about overfitting and data mining, which show the dangers of only going by statistical measures. And, finally, automated variable selection procedures can point you in the correct direction, but research has found that they don’t identify the correct model in the majority of cases. Read my post about automated variable selection procedures for more information.

If after reading those posts you have more questions, don’t hesitate to post them. Also consider my ebook that focuses on regression in more detail!

I hope this helps!

Merpati says

Hello! I want to ask. All my R2, adj.R2 and predicted R2 got the value of 1.0000.

Is it acceptable? And if possible, could you help me to deduce this information bcs I, myself not so good in statistical analysis. Btw, the results were from three independent variables (pressure, time and temperature) with one dependent variable (antioxidant activity).

I hope that I can improve my understanding on this matter.

Thank you in advance ^^

Jim Frost says

Hi,

Unfortunately, no, that’s not normal. Usually you only obtain an R-squared of 1 under several related problematic circumstances.

If you fit a model that contains the same number of independent variables as observations, you’ll always get an R-squared of 1 (or 100%).

If you overfit a model, which means too many terms for your number of observations, you can get the same thing.

This can also happen with an automated procedure such as stepwise regression with a relatively small dataset and lots of candidate predictors.

I’m not sure what is going on with your data. If it’s a physical process where the measurements are very precise/accurate and there’s extremely low noise in the system, you can get R-squared values in the 90-99% range. Unless your software is rounding up, I’d be very skeptical. I’ve never seen a legitimate 100% in practice. 100% would indicate no random error in the model at all AND no measurement error at all. That just doesn’t happen in the real world. I’m assuming this is real world data rather than generated data.
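To see the first of those circumstances in action, here is a minimal sketch in Python, assuming NumPy is available. The data are pure random noise and the sample size is arbitrary. The model estimates as many parameters (the constant plus the predictors) as there are observations, so the fit passes through every point even though nothing real is being modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 6                                   # number of observations (arbitrary)
y = rng.normal(size=n)                  # response: pure noise
X_noise = rng.normal(size=(n, n - 1))   # n - 1 predictors: also pure noise

# Design matrix with a constant column, so the model estimates n parameters.
X = np.column_stack([np.ones(n), X_noise])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

sse = np.sum((y - fitted) ** 2)         # error sum of squares
sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
r_squared = 1 - sse / sst

print(f"R-squared = {r_squared:.6f}")
```

With no degrees of freedom left for error, the residuals are zero up to floating-point rounding, which is why a perfect R-squared on real data is a warning sign rather than good news.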

Jeff says

It’s incredible how clear and simple you can explain difficult concepts. Thank you, really

Ronnie says

Hello Jim,

Thanks so much for these posts! Just recently came across them and they’re incredibly useful!

I’m using partial least squares regression to model a response variable against spectral data. If I select a number of latent variables that produces regular R2=0.49 and predicted R2=0.27, using 56 observations of the response variables, what are your thoughts? Certainly, I’m fitting the calibration better than when making new prediction, but I also know that we should always expect the predicted R2 to be somewhat higher than the regular R2; and this is probably certainly the case with small number of observations.

Do you believe this type of fit would be justifiable given the relatively small number of observations used to calibrate?

Thanks very much for your help!

Ronnie

Jim Frost says

Hi Ronnie,

I’m really happy to hear that you found my site to be helpful!

Regular R-squared should be greater than Predicted R-squared. The model can’t predict new observations better than the data used to fit the model. You might be thinking of the test R-squared. The test R-squared is generally lower than the Predicted R-squared. A test R-squared is based on validation data. The software uses an existing model and a new dataset to see how well the model predicts values that were not used to estimate the model.

To make good predictions, you want Predicted R-squared to be close to the regular R-squared. And, you want the test R-squared to be close to the Predicted R-squared.

For your dataset, it appears like the regular R-squared and predicted R-squared are not that close. This condition indicates that your model doesn’t predict new observations as well as it fits the data used to fit the model. Chances are that your test R-squared would be even lower than the predicted R-squared.

I’m not knowledgeable in modeling spectral data, so I’m not sure how this fit compares to similar models and industry standards. I’d recommend doing some research to see what sort of fit is typical for this type of data and see how your model compares. Some study areas are inherently more or less predictable than other areas. So, I can’t really say whether the fit you’ve obtained is “justifiable.” The basic question you need to answer is whether the fit you obtain is representative of the study area and really the best you can do given the nature of the data. Or, do you need to improve the model to obtain a better fit. Those answers depend on subject-area knowledge.

Allan Paolo says

Hi again Jim!

I just want to take time to thank you. Thanks to this article (and to you of course) I was able to get my master’s degree. Thanks a lot!

Jim Frost says

Hi Allan!

You’re very welcome! I’m so happy for you, and your comment absolutely makes my day! 🙂

Patrik Silva says

Dear Jim!

I would like to know if you can clarify some of this points to me:

In the text section where the title is “What is the Predicted R-Squared”, I have read this:

“Statistical software calculates predicted R-squared using the following procedure:

1 – It removes a data point from the dataset.

2 – Calculates the regression equation.

3 – Evaluates how well the model predicts the missing observation.

4 – And, repeats this for all data points in the dataset.”

a) Is this procedure the same as what is called LOOCV (Leave one out Cross Validation)?

b) Which values do we compare to R-squared? Do we need to record the R-Squared in each time that we leave one out till the last observation?

I want to understand this procedure to see which statistic it corresponds to in SPSS software.

Thank you in advance!

PS

Al says

Hi Jim,

Very helpful post.

Regarding the issue of “how much of the variation in the y values does the regression model explain”

1. The adjusted-R-squared is the answer to this for multiple regression, yes?

2. Why don’t we also use adjusted-R-squared when answering the question for simple regression? (most stats textbooks use R-squared for this)

In general, when comparing a regression model with one independent variable to a model with multiple independent variables — do we compare them on adjusted-R-squared, or do we compare the adjusted-R-square of the second model with the R-squared of the first?

Thanks,

Al

Jim Frost says

Hi Al,

These are great questions! And, there is confusion in this area because many people don’t know exactly what R-squared measures.

Let’s start with the easy part. When you’re comparing models with different numbers of independent variables, use adjusted R-squared. Specifically, compare the adjusted R-squared from one model to the adjusted R-squared values of the other models. Don’t use the regular R-squared for any of the models.

Now, onto which R-squared to report for what models. Typically, analysts will report the regular R-squared for the final model that a study settles on. That’s the norm. However, I disagree with that practice a bit. I think that analysts should normally report the adjusted R-squared for all final models, even when it has only one independent variable. The reason why is because regular R-squared is a biased estimate. It tends to be too high. How much too high depends on the number of observations per term in the model. Adjusted R-squared corrects for this upwards bias. In other words, adjusted R-squared is an *unbiased* estimate of the amount of variance the model accounts for in the population–which is why I think it should be the value that is reported. I write more about this in my post Five Reasons Why Your R-squared can be Too High. It’s reason number 1.

Thanks for the great questions!

Jonathan says

Hello,

I have a challenge here i have rsquared of 0.6596 and an adjusted rsquared of -0.3617! How can this be interpreted? what can you say about this ?

Thanks

Jim Frost says

Hi Jonathan,

Chances are that you are severely overfitting your model. You probably have very few observations per model term. To learn more about this problem, read my post about overfitting!
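As a rough illustration of how that can happen, the sketch below applies the standard adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of predictors. Only the R-squared of 0.6596 comes from the comment. The sample sizes and term counts are hypothetical, chosen to show how few observations per term it takes to push the adjusted value below zero.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_terms: int) -> float:
    """Adjusted R-squared for a model with a constant plus n_terms predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_terms - 1)

# Hypothetical sample sizes; only R-squared = 0.6596 comes from the comment.
overfit = adjusted_r_squared(0.6596, n_obs=5, n_terms=3)
roomy = adjusted_r_squared(0.6596, n_obs=50, n_terms=3)

print(f"5 observations, 3 terms:  adjusted R-squared = {overfit:.4f}")
print(f"50 observations, 3 terms: adjusted R-squared = {roomy:.4f}")
```

Under these assumed numbers, the first case lands near -0.36, while the same R-squared with ten times as many observations keeps an adjusted value in the 0.6 range. The penalty depends entirely on how many observations support each term.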

Kripa says

Hi Sir,

Can you help me to interpret R squared value of .166 and Adjusted R squared value of .158?

Jim Frost says

Hi Kripa,

These blog posts should provide you with enough information so you know how to interpret these values.

The R-squared value indicates that your model accounts for 16.6% of the variation in the dependent variable around its mean. That’s usually considered a low amount. You typically interpret adjusted R-squared in conjunction with the adjusted R-squared values from other models. Use adjusted R-squared to compare the fit of models with a different number of independent variables.

Additionally, regular R-squared from a sample is biased. It tends to over-estimate the true R-squared for the population. Adjusted R-squared is an unbiased estimate of the population value.

I hope this helps!

Tejaswi Dalavi says

what is the exact difference between R square & adjusted R square.which is better?

Jim Frost says

Hi Tejaswi, you’re in the right place to learn about the differences. This blog post describes adjusted R-squared. In it, there’s a link to my blog post about the regular R-squared. Between the two posts, you’ll know all about both types. Adjusted R-squared is the better of the two. Although, my favorite is actually predicted R-squared.

Juan says

Dear Jim,

Thanks a lot for your response, it answered some questions that I had for quite some time without finding a clear/understandable explanation. I will certainly continue to follow the blog, it is a very valuable source of information specially for us non-statisticians. I have already recommended it to my colleges and i’m sure they will agree with me.

Best regards,

Juan F.

Jim Frost says

Thanks so much, Juan. I appreciate that!

Juan says

Dear Jim,

Thanks for your explanation and fast response. Congratulations on such a good blog, it is very valuable to be able to discuss / understand this topics in more friendly manner.

With respect to my question, I still have a couple of doubts.

– I can understand that one could obtain a high R2 and R2 (adj) in a model with significant curvature but, shouldn’t the R2(pred) be generally low?

– isn’t the prediction power of the regression covered by including in the regression equation the center point?

I other words (correct me if i’m wrong), when curvature is significant in the regression model, then the R2(Pred) is not relevant anymore and the model should not be used for predictive purposes?

Considering your comment on the residual plots, My versus fit seems (not clear though) that there might be a pattern (scatter reduced as the fitted value is higher). Thus I did a regression after a Box-cox transformation (Lambda =0.25) , eliminated variables with P>0.1 and I obtain a regression where curvature is not significant (P= 0.2!!) and again great R2 values (R2:99%; R2adj: 98.7 and R2Pred: 96.8%)…how to interpret this? is this resulting regression trustworthy and could it be used for predictive purposes?

Thanks in advance for your time,

Regards,

Juan S.

Jim Frost says

Hi Juan,

Yes, it’s definitely possible that Predicted R-squared would be affected by inadequately modeling the curvature. However, the degree to which the lack-of-fit affects it depends on how inadequate the fit is and the number of observations. So, I couldn’t tell you specifically for your case whether it would be low or not. But, definitely the lack of fit would impact it to some degree.

Center points allow you to detect curvature but are not sufficient to model the curvature.

I would agree, as I mention in my previous response, that I would not use the model to make predictions when you know that it inadequately fits curvature that is present in the data. In that sense, yes, it doesn’t matter what Predicted R-squared is because you know the predictions are biased. As I mentioned, high R-squared values of any type do not indicate that your model provides an unbiased fit.

That pattern that you describe is heteroscedasticity. In my post about it, I discuss other options for resolving it. A Box-Cox transformation is a recognized way to fix this problem, but I usually save that for the last solution I try. I prefer solutions that involve less data manipulation. I’m also a bit leery of how it transformed away the curvature issue. However, I don’t have any specific reason to say that you shouldn’t trust the model based on the limited information that I have. Just be sure to closely examine the coefficients and be really certain that the signs and magnitudes fit with theory.

Also, be aware that the model fit statistics (the various R-squared values and S) apply to the transformed response variable and not the response using natural units. That can make the model appear better than it is. Although, they were high before the transformation, so no reason for concern.

Juan says

Dear Jim,

I recently started using Minitab for DoE. I work with an extraction process to evaluate the recovery (Yield) of proteins. Evaluation of a half-factorial set of experiments with 5 variables DoE gave me a very good regression model with R2(98.29%); adj-R2 (97.35%) and pred-R2 (95.57%). However I noticed that my model indicates that the curvature is significant (P = 0.022). What is the effect of this curvature on the predictive power of the model? in other words, is this model still good to make predictions? or is a CCD required?

Jim Frost says

Hi Juan,

Yes, if the software detects curvature, it is usually a good idea to model that curvature. While R-squared is high, you are trying to model a curve using a straight line, and that will lead to biased predictions. For example, certain ranges of predictions might be systematically too high while other ranges could be systematically too low. In my post about R-squared, in the section “Are High R-squared Values Always Great?”, I show an example where the R-squared value is at 98.5% but the predictions are biased. Your case is probably something like that–although obviously not necessarily mirroring the specific relationship that I show. A high R-squared, and adjusted R-squared, don’t necessarily indicate that the model provides an unbiased fit. Check those residual plots!

Thanks for writing. I hope this helps!

Sundar says

Dear Jim,

As usual brilliant post. However I would like to know about “The adjusted R-squared value actually decreases when the term doesn’t improve the model fit by a sufficient amount.” How does the adjusted R-square determines if addition of a variable has a positive or negative effect on the model.

Thanks

Jim Frost says

Hi Sundar, the adjusted R-squared value decreases when the absolute value of the t-value for the new term’s coefficient is less than 1.
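That rule can be checked numerically. The sketch below, which assumes NumPy and uses simulated data, fits a one-predictor model and then adds an unrelated candidate term several times over. The helper `fit_ols` is a hypothetical name written for this illustration, not a library function. Each pass prints the absolute t-value of the added term next to the change in adjusted R-squared.

```python
import numpy as np

def fit_ols(X, y):
    """Return coefficients, t-values, and adjusted R-squared for OLS.

    X must already include a constant column.
    """
    n, k = X.shape                       # k counts the constant too
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    mse = sse / (n - k)                  # residual variance estimate
    t_values = coef / np.sqrt(mse * np.diag(xtx_inv))
    adj_r2 = 1 - mse / (sst / (n - 1))
    return coef, t_values, adj_r2

rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

base = np.column_stack([np.ones(n), x1])
_, _, adj_base = fit_ols(base, y)

results = []
for _ in range(5):
    extra = rng.normal(size=n)           # candidate term unrelated to y
    _, t_vals, adj_new = fit_ols(np.column_stack([base, extra]), y)
    abs_t, change = abs(t_vals[-1]), adj_new - adj_base
    results.append((abs_t, change))
    print(f"|t| = {abs_t:.2f}, change in adjusted R-squared = {change:+.4f}")
```

In every pass the sign of the change should line up with whether the absolute t-value is above or below 1. That bar is far lower than the usual significance thresholds, so an increase in adjusted R-squared alone is weak evidence that a term belongs in the model.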

Franklin Moormann says

When calculating predicted rsquared for a full dataset of 4000 data points, you would do all 4000 or a random sample of those 4000 data points?

Jim Frost says

The procedure always cycles through the complete dataset and systematically removes one data point at a time to calculate predicted R-squared.

Emanuel Lindström says

Hi Jim!

Awesome blog, and awesome posts! I’m learning a lot!!

I have 2 questions;

1. How is the predicted R-squared actually calculated? The step-by-step process you describe is iterated for each data point in the population, but does that mean you get as many predicted R-squared as there are data points, or do you do an additional step after iterating over all the data points?

2. Does predicted R-squared work even for large samples? I mean, it’s easy to see how the polynomial line in the image changes if you remove a data point, but if there are more data points (100 more, or even 1000 more), wouldn’t the over-fitted polynomial line stay the same and predict the one omitted data point?

Again, thanks for an amazing resource!

Jim Frost says

Hi Emanuel,

Thanks so much! I’m glad you have found it to be helpful!

About predicted R-squared, which is really my favorite type of R-squared. Think about the error sum of squares (SSE). This is where you take the squared differences between each observation and the fitted value and sum them up across all observations. It’s also known as the residual sum of squares because it’s the sum of the squared residuals. A small value produces a high R-squared.

For predicted R-squared, you use the predicted error sum of squares (PRESS), which is similar to the SSE. To calculate PRESS, you remove a point, refit the model, and then use the model to predict the removed observation. Then, you take the removed value and subtract the predicted value and then square this difference. You repeat for all of the removed values. You end up with a squared difference for each value when it is removed. You then sum those squared differences and you have PRESS. A low PRESS value produces a high predicted R-squared. So, it’s fairly analogous to the SSE but the squared differences are based on predicting the missing values versus values that were used to fit the model.
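The procedure just described can be sketched as an explicit loop. This is a minimal illustration in Python assuming NumPy, with simulated data and a hypothetical helper named `predicted_r_squared`. Statistical packages typically use a faster shortcut, but the loop follows the steps literally: remove a point, refit, predict the removed point, square the error, and sum.

```python
import numpy as np

def predicted_r_squared(X, y):
    """Predicted R-squared via an explicit leave-one-out loop.

    X must already include a constant column.
    """
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                      # drop observation i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        prediction = X[i] @ coef                      # predict the removed point
        press += (y[i] - prediction) ** 2             # squared prediction error
    sst = np.sum((y - y.mean()) ** 2)                 # uses all observations
    return 1 - press / sst, press

rng = np.random.default_rng(2)
n = 25
x = rng.uniform(0, 10, size=n)
y = 3.0 + 0.8 * x + rng.normal(scale=1.0, size=n)
X = np.column_stack([np.ones(n), x])

pred_r2, press = predicted_r_squared(X, y)

# Regular R-squared from a single fit on all the data, for comparison.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = np.sum((y - X @ coef) ** 2)
r2 = 1 - sse / np.sum((y - y.mean()) ** 2)

print(f"R-squared = {r2:.3f}, predicted R-squared = {pred_r2:.3f}")
```

Note that the loop produces one squared prediction error per observation and a single predicted R-squared at the end, not one R-squared per removed point. The total sum of squares is computed once from the full dataset.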

Regarding point 2, yes, you’re correct, when you have more data points, it’s harder to overfit your model and, hence, you wouldn’t expect a much lower predicted R-squared. Imagine you have 1000 data points that follow the same U-shaped pattern. In that case, you’d be really sure about that curved relationship because such a large number of data points aren’t going to follow that curve by chance. That’s why you wouldn’t expect the predicted R-squared to drop when you have many data points. However, fewer data points can produce that pattern by chance. If you remove one, it changes that relationship noticeably. You’re not really certain that the relationship really is that U-shape. Predicted R-squared detects this uncertainty and that’s why it drops.

Overfitting depends on the number of observations per term in the model, as you can read about in my post about overfitting. You’d need a very, very complex model to overfit a dataset with 1000 observations!

I hope this helps!

MUHAMMAD K. N. says

Hi Jim ! I am working for a research on monitoring insect pest population fluctuation in Entomological field, but I obtained mostly weaker r squared regression results and felt disturbed. What advise can you give me in this regards.

Thanks

Muhammad.

Jim Frost says

Hi Muhammad! Unfortunately, that situation isn’t too uncommon and I’ve written a blog post that is specifically about it:

Interpreting a Regression Model with a Low R-squared

A low R-square might or might not be a problem. If you have significant independent variables and your main goal is to understand the relationships between the variables, a low R-squared is not necessarily a problem.

However, if your main goal is to produce precise predictions, it can be a problem.

The blog post I recommend covers these scenarios and shows how it works. I think it’ll make your situation more clear!

ALMAS KHURSHEED says

hi sir

i am very confuse how to write interpret statement for r2 if value is 0.68

can u please help me out

thank you

Jim Frost says

Hi Almas, it means that the independent variables in your model collectively account for 68% of the variability in the dependent variable around its mean. Click the link in the post to go to my post where I talk about R-squared in more detail. I hope this helps!

Allan Paolo Labartinos Almajose says

Hi Jim! I’d like to ask for help regarding the calculation of predicted R-squared values. To be honest, my nose bled (lol) after seeing the formula for the PRESS you provided in one of the comments above. Is there a ‘layman’s way’ of computing this?

Actually, I had this idea:

– I remove one data point

– I regress the remaining points using the same model

– I try to predict the missing data point using the same model previously recalculated (the one with the reduced data point)

– The difference between the prediction of the model with complete data points and the prediction of the new model with one data point removed is the PRESS of that point?

– I do this again for all of the remaining points

– I add all of the PRESS for each point, then sum-square everything, then compute R^2 normally, then this R^2 is now the predicted R^2?

Is this even correct? I don’t know, this is just a wild guess. Please help me, I am totally at a loss here. Thanks!

Jim Frost says

Hi Allan, you’re very close! Think about how you usually calculate sums of squares. It’s the sum of the squared deviations between the fitted values and the observations. PRESS is similar except it is the sum of the squared deviations between the fitted value of each removed observation and the removed observation. So, the procedure basically removes each observation and uses the model to predict that observation and squares the difference between the two. It does that systematically for all observations and sums those squared differences. For your 4th point, you never fit the model with all observations when calculating predicted R-squared. Instead, there is always one removed observation and you’re essentially seeing how well the model predicts each removed observation. I hope this makes it more clear!

Franklin Moormann says

I’m not explaining well enough I believe. This is my formula results using junk data (with a rsquared value of 0.2)

Predicted Rsquared = 1 – (PRESS / TSS) = 1 – (-1.04 / 67408.86) = 1.00

So as you can see something is definitely wrong.

Franklin Moormann says

I have no clue how to do diagonal elements in C# so I guess I’m going to have to go through and eliminate one observation at a time and then calculate the press and rss after each elimination. Since I’m doing that, how would I calculate the press statistic instead of doing the diagonal matrix stuff?

Franklin Moormann says

I found a workaround but I’m now getting a negative value for the press statistic so when I divide by the total sum of squares it is returning 1 which I know isn’t correct

Jim Frost says

Hi Franklin, actually for predicted R-squared (and adjusted R-squared) it is possible to get negative values!

Jun Li says

Hello Jim,

I develop an nonlinear regression model in R studio with R2 (0.904), R2(adj) 0.864 and R2 (predicted) 0.919. I wonder if it is possible that predicted R2 higher than the normal R2?

Hope for your reply.

Jun Li

Jim Frost says

Hi Jun Li,

First we need to make sure we’re clear on some terminology. Did you develop a true nonlinear model or is it a linear model that uses polynomials to model curvature? You can read about the differences in my post: The Difference between Linear and Nonlinear Models.

It’s an important distinction because R-squared and its variants are not valid for nonlinear models. If you are truly using a nonlinear model, I suppose it might be possible to obtain a Predicted R-squared that is higher than R-squared. Maybe. But, you shouldn’t be using any of those R-squared values because they are invalid. You can use another goodness-of-fit statistic, such as the standard error of the regression.

For linear models, you can’t obtain a predicted R-squared that is higher than R-squared. That scenario would indicate that the model predicts new observations *better* than it predicts the values used during the model fitting process. That makes no sense.

I hope this helps!

Tim says

Hi Jim,

I know the way how R-squared is calculated in logistic regression is different. I wonder what would you do if a reviewer asks you to provide similar indicator.

Thanks!

Tim

Jim Frost says

Hi Tim,

There are two measures I’m most familiar with for logistic regression. One is deviance R-squared for binary logistic regression. This statistic measures the proportion of the deviance in the dependent variable that the model explains. Unlike R-squared, the format of the data affects the deviance R-squared.

The other is Akaike Information Criterion (AIC), which measures the quality of a model based on fit and the number of terms in the model.

Jim

Franklin Moormann says

I’m only supposed to remove one observation at a time to recalculate the prediction model but after that, I’m supposed to use all original observations to run the calculations for press and tss?

Franklin Moormann says

I’m trying to create my own formula to calculate predicted rsquared and this was the only information that I found on how to do it. I believe the formula to do this is predicted r2 = 1 – (press / tss) so would you systematically leave off one data point at a time and calculate the press statistic and tss statistic and add those values to a final total and calculate predicted r2 at the end?

Jim Frost says

Hi Franklin, here are the predicted R-squared and PRESS formulas. The formulas don’t actually go through and remove each observation one-at-a-time, but it is equivalent to that process.
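The equivalence can be checked directly. The sketch below assumes NumPy and simulated data, and it uses the standard shortcut for ordinary least squares: divide each ordinary residual by one minus its leverage (the matching diagonal element of the hat matrix), then square and sum. This is the usual textbook form and may not match the linked formulas symbol for symbol. The function names are hypothetical.

```python
import numpy as np

def press_shortcut(X, y):
    """PRESS from a single fit, using the diagonal of the hat matrix."""
    hat = X @ np.linalg.inv(X.T @ X) @ X.T     # hat (projection) matrix
    resid = y - hat @ y                        # ordinary residuals
    leverage = np.diag(hat)                    # h_ii for each observation
    return np.sum((resid / (1 - leverage)) ** 2)

def press_loop(X, y):
    """PRESS by refitting the model with each observation removed in turn."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ coef) ** 2
    return total

rng = np.random.default_rng(3)
n = 20
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x, x ** 2])   # constant, linear, and squared terms

sst = np.sum((y - y.mean()) ** 2)              # total sum of squares, all observations
press_a, press_b = press_shortcut(X, y), press_loop(X, y)

print(f"shortcut PRESS = {press_a:.6f}, loop PRESS = {press_b:.6f}")
print(f"predicted R-squared = {1 - press_a / sst:.4f}")
```

The two PRESS values should agree up to floating-point rounding. Because PRESS is a sum of squares, it cannot be negative. Predicted R-squared goes negative only when PRESS exceeds the total sum of squares.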

Duc-Anh Luong says

Hi Jim,

I have question about calculation of the predicted R squared in the linear regression.

(1). Is it true that in each time when we remove 1 data point, we have to fit model again and use this model to predict the values of removed data point?

(2). Is it possible to get negative predicted R-squared?

Many thanks

Duc Anh

Jim Frost says

Hi Duc-Anh,

When the statistical software calculates predicted R-squared, it systematically removes each observation and determines how well the model based on all of the other observations predicts that value. The software does this for all observations in the dataset and calculates the predicted error sum of squares (PRESS). It then uses the PRESS to calculate the predicted R-squared. For the regular R-squared, it uses the error sum of squares (SSE) instead. All of these calculations occur behind the scenes. You don’t need to worry about refitting the model for each observation. All you need to do is assess the predicted R-squared with that process in mind so you know what it really means.

Yes, it is possible to obtain a negative predicted R-squared. However, some statistical software, such as Minitab, rounds these negative values up to zero.

Thank you for writing with your excellent questions,

Jim