How High Does R-squared Need to Be?

By Jim Frost

How high does R-squared need to be in regression analysis? That seems to be an eternal question.

Previously, I explained how to interpret R-squared. I showed how the interpretation of R2 is not always straightforward. A low R-squared isn’t always a problem, and a high R-squared doesn’t automatically indicate that you have a good model.

So, how high should R-squared be? The definitive answer is . . . it depends. You’ll need some patience because my assertion is that this question is the wrong question. In this post, I reveal why it is the wrong question and which questions you should ask instead.

Related post: Five Reasons Why Your R-squared can be Too High

How High Does R-squared Need to Be Is the Wrong Question

How high does R-squared need to be? If you think about it, there is only one correct answer. R-squared should accurately reflect the percentage of the dependent variable variation that the linear model explains. Your R2 should not be any higher or lower than this value.

The correct R2 value depends on your study area. Different research questions have different amounts of variability that are inherently unexplainable. Case in point, humans are hard to predict. Any study that attempts to predict human behavior will tend to have R-squared values less than 50%. However, if you analyze a physical process and have very good measurements, you might expect R-squared values over 90%. There is no one-size-fits-all answer for how high R-squared should be.

Consequently, the answer to “how high does R-squared need to be?” is that it depends on the amount of variability that is actually explainable. Clearly, your R-squared should not be greater than the amount of variability that is actually explainable, although an overfit regression model can produce an R2 that exceeds it. To see if your R-squared is in the right ballpark, compare your R2 to those from other studies.

Chasing a high R2 value can produce an inflated R2 and a misleading model. Read my post about adjusted R-squared and predicted R-squared to see how these statistics can help you avoid these problems.
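To make this concrete, here is a minimal sketch in Python with statsmodels and simulated data (the data, seed, and numbers are purely illustrative and not from any study in this post). Piling junk predictors onto a model tends to nudge R2 upward, while adjusted R2 penalizes the extra terms:

```python
# Minimal sketch with simulated data: adding useless predictors tends to
# inflate plain R-squared, while adjusted R-squared penalizes the extra terms.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 50
x = rng.normal(size=n)
y = 2 * x + rng.normal(scale=3, size=n)      # one real predictor plus lots of noise

# Model 1: only the real predictor
fit1 = sm.OLS(y, sm.add_constant(x)).fit()

# Model 2: the real predictor plus 15 pure-noise predictors
junk = rng.normal(size=(n, 15))
fit2 = sm.OLS(y, sm.add_constant(np.column_stack([x, junk]))).fit()

print(f"R-squared:          {fit1.rsquared:.3f} vs {fit2.rsquared:.3f}")
print(f"Adjusted R-squared: {fit1.rsquared_adj:.3f} vs {fit2.rsquared_adj:.3f}")
```

Plain R2 can never decrease when you add predictors, so the second model’s R2 will be at least as high even though the additions are pure noise. Adjusted R2 does not share that guarantee, which is why it is the better statistic for comparing models of different sizes.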

Related post: Model Specification: Choosing the Correct Regression Model

Define Your Objectives for the Regression Model

When you wonder if the R-squared is high enough, it’s probably because you want to know if the regression model satisfies your objectives. Given your requirements, does the model meet your needs? Therefore, you need to define your objectives before proceeding.

To determine whether a model meets your objectives, you’ll need to ask different questions because R2 doesn’t address this issue. The correct questions depend on whether your primary goal for the model is:

  • To understand the relationships between the independent variables and dependent variable. Or,
  • To predict the dependent variable.

R-squared and Understanding the Relationships between the Variables

If your primary goal is to understand the relationships between the variables in your model, the answer to how high R-squared needs to be is very simple. For this objective, R2 is irrelevant.

This statement might surprise you. However, the interpretation of the significant relationships in a regression model does not change regardless of whether your R2 is 15% or 85%! The regression coefficients define the relationship between each independent variable and the dependent variable. The interpretation of the coefficients doesn’t change based on the value of R-squared.

Suppose we have a statistically significant coefficient that equals 2. This coefficient indicates that the mean of the dependent variable increases by 2 for every one-unit increase in the independent variable irrespective of the R2 value.
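Here is a quick sketch (simulated data, Python’s statsmodels; not an analysis from this post) that fits the same true relationship, slope = 2, under low noise and high noise. The estimated coefficient stays close to 2 in both fits while R2 swings widely:

```python
# Minimal sketch: the same true slope (2) estimated under low and high noise.
# The coefficient's interpretation doesn't change, even though R-squared does.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
X = sm.add_constant(x)

for noise_sd in (1, 8):                      # low noise, then high noise
    y = 5 + 2 * x + rng.normal(scale=noise_sd, size=x.size)
    fit = sm.OLS(y, X).fit()
    print(f"noise sd={noise_sd}: slope={fit.params[1]:.2f}, R-squared={fit.rsquared:.2f}")
```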

Related post: See a graphical illustration of why the interpretation of coefficients does not depend on R-squared.

The question about how high R-squared needs to be doesn’t make sense in this context because it doesn’t matter. A small R2 doesn’t nullify or change the interpretation of the coefficient for an independent variable that is statistically significant.

Instead of wondering if your R-squared value is high enough, you should ask the following questions to ensure that you can trust your results:

  • Do I have a sound basis for my model?
  • Can I trust my data?
  • Do the residual plots look good?
  • Do the results fit theory?
  • How do I interpret the regression coefficients and P-values?

R-squared and Predicting the Dependent Variable

On the other hand, if your primary goal is to use your regression model to predict the value of the dependent variable, R-squared is a consideration.

Predictions are more complex than just the single predicted value. Predictions include a margin of error. More precise predictions have a smaller amount of error.

R2 is relevant in this context because it is a measure of the error. Lower R2 values correspond to models with more error, which in turn produces predictions that are less precise. In other words, if your R2 is too low, your predictions will be too imprecise to be useful.

A low R-squared can be an indicator of imprecise predictions. However, R2 doesn’t tell you directly whether the predictions are sufficiently precise for your requirements.

We need a direct measure of precision that uses the units of the dependent variable. That’s why asking, “How high does R-squared need to be?” still is not the correct question.

Instead, you should ask the questions above plus the following question:

  • Are the prediction intervals precise enough for my requirements?

Using Prediction Intervals to Assess Precision

Most statistical software can calculate prediction intervals, and they are easy to use.

A prediction interval is a range where a single new observation is likely to fall given values of the independent variable(s) that you specify. These ranges incorporate the margin of error around the predicted value. If the prediction intervals are too wide, the predictions don’t provide useful information. Narrow prediction intervals represent more precise predictions.

[Fitted line plot displaying the prediction intervals for using BMI to predict body fat percentage.]

In my post about using regression analysis to make predictions, I present the model displayed in the graph. This model uses BMI to predict the percentage of body fat. The 95% prediction interval for a BMI of 18 is 16-30% body fat. We can be 95% confident that an individual with a BMI of 18 will fall within this range.
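If you want to see how such an interval comes out of the software, here is a minimal sketch in Python’s statsmodels. The data are simulated to stand in for the BMI and body fat example (the actual dataset and model from my prediction post are not reproduced here); the obs_ci columns hold the 95% prediction interval:

```python
# Minimal sketch: obtaining a 95% prediction interval for a new observation.
# The BMI/body fat relationship below is simulated, not the real dataset.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
bmi = rng.uniform(16, 40, size=150)
bodyfat = -10 + 1.1 * bmi + rng.normal(scale=4, size=bmi.size)

fit = sm.OLS(bodyfat, sm.add_constant(bmi)).fit()

# Prediction interval for a new individual with BMI = 18
new_x = sm.add_constant(np.array([18.0]), has_constant="add")
frame = fit.get_prediction(new_x).summary_frame(alpha=0.05)
print(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])   # obs_ci_* is the prediction interval
```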

At this point, you need to use client requirements, spec limits, and subject area knowledge to determine whether the prediction intervals are narrow enough to represent meaningful predictions. By assessing the prediction intervals, you are evaluating the precision of the model directly rather than relying on an arbitrary cut-off value for R-squared.

I’m not a medical expert, but I’d guess that the 14-point range of 16-30% is too wide to provide meaningful information. If this is true, our regression model is too imprecise to be useful.

Related posts: Understand Precision in Applied Regression to Avoid Costly Mistakes and Confidence Intervals vs Prediction Intervals vs Tolerance Intervals.

R-squared Is Overrated!

Asking “How high does R-squared need to be?” is usually not the correct question to ask. You probably want to know if the regression model can meet your needs. To this end, there are better questions that you should ask.

R-squared gets all of the attention for assessing the goodness-of-fit. It seems like a simple statistic to interpret. However, evaluating the fit involves more than just this single statistic. You need to use subject area knowledge, residual plots, coefficients, and prediction intervals if you’re making predictions.

However, R-squared does have some good uses. For one thing, compare your R2 value to values from similar studies. If your R2 is markedly higher or lower, you should investigate because there might be a problem.

Be sure to read my post about the standard error of the regression (S), which is a different type of goodness-of-fit measure that is more useful when you need to make predictions.
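As a rough illustration (simulated data again, not this post’s example), S is simply the residual standard error, and unlike R-squared it is reported in the dependent variable’s own units:

```python
# Minimal sketch: the standard error of the regression (S) is the typical
# distance between the observations and the fitted line, in the units of y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=100)
y = 5 + 2 * x + rng.normal(scale=4, size=x.size)

fit = sm.OLS(y, sm.add_constant(x)).fit()
S = np.sqrt(fit.mse_resid)                   # residual standard error
print(f"S = {S:.2f} (same units as y), R-squared = {fit.rsquared:.2f}")
```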

If you’re learning regression, check out my Regression Tutorial!


Comments

  1. Jerick Galindo Gingatan says

    May 14, 2022 at 11:36 am

Hello sir! Do you have a citation for this? “Lower R2 values correspond to models with more error, which in turn produces predictions that are less precise. In other words, if your R2 is too low, your predictions will be too imprecise to be useful.”

    • Jim Frost says

      May 14, 2022 at 4:01 pm

      Hi Jerick,

      This is a general property of linear models just due to how they work and their underlying calculations. Most any textbook should cover it. I always refer to Applied Linear Statistical Models by Neter et al.

  2. Arnola says

    April 27, 2022 at 8:54 pm

Hi! Thank you for this post. I was wondering if you have a citation for “If your primary goal is to understand the relationships between the variables in your model, the answer to how high R-squared needs to be is very simple. For this objective, R2 is irrelevant.”? Thank you in advance.

    • Jim Frost says

      April 28, 2022 at 12:18 am

      Hi Arnola,

      That’s a fundamental property of linear models. I’d imagine most textbooks would explain that. I always use Applied Linear Statistical Models by Neter et al.

  3. Kaitlyn says

    May 26, 2021 at 4:47 pm

    OK, thanks so much for the reply!

  4. Kaitlyn Suski says

    May 25, 2021 at 2:06 pm

    Hi. I was wondering if you have a citation for the assertion “Any study that attempts to predict human behavior will tend to have R-squared values less than 50%. ” Thanks!

    • Jim Frost says

      May 26, 2021 at 3:59 pm

      Hi Kaitlyn,

      Unfortunately, I don’t have a citation on hand. I have read that in the literature, but don’t remember where. I’ve also observed this occurring in the many studies I’ve read over the years. It’s just hard to predict human behavior compared to, say, a physical process. That shows up in the R-squared! Keep in mind that this is not a hard and fast rule. It’s a tendency. But, a strong one.

  5. kembhootha says

    September 21, 2020 at 3:28 am

    Mr. Frost, Thank you for these articles. I have generally taken a math-first approach to all of the statistical learning methods, and after hours of sweat, the intuition and logical conclusions made their appearance. I have been using your articles to supplement my own learning and I have found them to be incredibly enlightening in identifying patterns, intuitive thinking etc. Thank you for your efforts.

  6. avianto nugroho says

    December 29, 2018 at 10:06 am

    Hi Jim,

Thank you very much for your posts; they are very helpful, although I am still trying to figure every single theory out.

    I am Avi, a master student who is currently writing master’s thesis.

So, I am analysing my data using GAM. To this point, I have come up with several models and done model selection. As a result, I got a model which I think (still not sure) is the best model, considering that the best model is the one with the lowest AIC.

    My question is, which one do I have to choose between the highest R-squared and the lowest AIC? Or in between?

Kindly give me some advice on this case. Also, I think you should consider writing a post about GAM modelling or similar models, particularly how to run them in R, because I would say that your blog is simpler and more understandable for beginners in statistics.

    Thank you again. You’re just cool.

    All the best,
    Avi
