Regression analysis mathematically describes the relationship between independent variables and the dependent variable. It also allows you to predict the mean value of the dependent variable when you specify values for the independent variables. In this regression tutorial, I gather together a wide range of posts that I’ve written about regression analysis. My tutorial helps you go through the regression content in a systematic and logical order.

This tutorial covers many facets of regression analysis including selecting the correct type of regression analysis, specifying the best model, interpreting the results, assessing the fit of the model, generating predictions, and checking the assumptions. I close the post with examples of different types of regression analyses.

If you’re learning regression analysis, you might want to bookmark this tutorial!

## When to Use Regression and the Signs of a High-Quality Analysis

Before we get to the regression tutorials, I’ll cover several overarching issues.

Why use regression at all? What are common problems that trip up analysts? And, how do you differentiate a high-quality regression analysis from a less rigorous study? Read these posts to find out:

- When Should I Use Regression Analysis?: Learn what regression can do for you and when you should use it.
- Five Regression Tips for a Better Analysis: These tips help ensure that you perform a top-quality regression analysis.

## Tutorial: Choosing the Right Type of Regression Analysis

There are many different types of regression analysis. Choosing the right procedure depends on your data and the nature of the relationships, as these posts explain.

- Choosing the Correct Type of Regression Analysis: Reviews different regression methods by focusing on data types.
- How to Choose Between Linear and Nonlinear Regression: Determining which one to use by assessing the statistical output.
- The Difference between Linear and Nonlinear Models: Both kinds of models can fit curves, so what’s the difference?

## Tutorial: Specifying the Regression Model

Selecting the right type of regression analysis is just the start of the process. Next, you need to specify the model. Model specification is the process of determining which independent variables belong in the model and whether modeling curvature and interaction effects are appropriate.

Model specification is an iterative process. The interpretation and assumption confirmation sections of this tutorial explain how to assess your model and how to change the model based on the statistical output and graphs.

- Model Specification: Choosing the Correct Regression Model: I review standard statistical approaches, difficulties you may face, and offer some real-world advice.
- Using Data Mining to Select Your Regression Model Can Create Problems: This approach to choosing a model can produce misleading results. Learn how to detect and avoid this problem.
- Guide to Stepwise Regression and Best Subsets Regression: Two common tools for identifying candidate variables during the investigative stages of model building.
- Overfitting Regression Models: Overly complicated models can produce misleading R-squared values, regression coefficients, and p-values. Learn how to detect and avoid this problem.
- Curve Fitting Using Linear and Nonlinear Regression: When your data don’t follow a straight line, the model must fit the curvature. This post covers various methods for fitting curves.
- Understanding Interaction Effects: When the effect of one variable depends on the value of another variable, you need to include an interaction effect in your model; otherwise, the results will be misleading.
- When Do You Need to Standardize the Variables?: In specific situations, standardizing the independent variables can uncover statistically significant results.
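To make the idea of an interaction effect concrete, here is a minimal sketch. It assumes a hypothetical 2x2 design with two factors coded 0 or 1, and the cell means are made-up numbers for illustration only. In this saturated model, the interaction coefficient is the amount by which the effect of one factor changes when the other factor switches from 0 to 1.

```python
# Hypothetical cell means for a 2x2 design (made-up numbers for illustration).
# Keys are (a, b), with each factor coded 0 or 1.
cell_means = {(0, 0): 10.0, (1, 0): 14.0, (0, 1): 12.0, (1, 1): 22.0}

# Saturated model: y = b0 + b1*a + b2*b + b3*a*b
b0 = cell_means[(0, 0)]                   # mean when a = 0 and b = 0
b1 = cell_means[(1, 0)] - b0              # effect of a when b = 0
b2 = cell_means[(0, 1)] - b0              # effect of b when a = 0
b3 = cell_means[(1, 1)] - b0 - b1 - b2    # interaction term

# The effect of a is not a single number: it depends on the level of b.
effect_of_a_when_b0 = b1
effect_of_a_when_b1 = b1 + b3

print(f"b0={b0}, b1={b1}, b2={b2}, b3={b3}")
print(f"effect of a: {effect_of_a_when_b0} (b=0) vs {effect_of_a_when_b1} (b=1)")
```

If `b3` were zero, the effect of `a` would be the same at both levels of `b`, and a model without the interaction term would describe these means adequately.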

## Tutorial: Interpreting Regression Results

After choosing the type of regression and specifying the model, you need to interpret the results. The next set of posts explains how to interpret the results for various regression analysis statistics:

- Coefficients and p-values
- Constant (Y-intercept)
- Comparing regression slopes and constants with hypothesis tests
- R-squared and the goodness-of-fit
- How high does R-squared need to be?
- Interpreting a model with a low R-squared
- Adjusted R-squared and Predicted R-squared
- Standard error of the regression (S) vs. R-squared
- Five Reasons Your R-squared can be Too High: A high R-squared can occasionally signify a problem with your model.
- F-test of overall significance
- Identifying the Most Important Independent Variables: After settling on a model, analysts frequently ask, “Which variable is most important?”
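Several of the goodness-of-fit statistics listed above can be computed by hand for a simple model. The sketch below is only an illustration: it assumes a single predictor, the data are made up, and the helper names `fit_line` and `goodness_of_fit` are hypothetical. It computes R-squared, adjusted R-squared, and the standard error of the regression (S).

```python
import math

def fit_line(x, y):
    """Ordinary least squares for the line y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def goodness_of_fit(x, y, n_predictors=1):
    """Return R-squared, adjusted R-squared, and S for a fitted line."""
    intercept, slope = fit_line(x, y)
    n = len(y)
    y_bar = sum(y) / n
    # Error sum of squares (unexplained) and total sum of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    df_error = n - n_predictors - 1
    r2 = 1 - sse / sst
    # Adjusted R-squared penalizes the number of predictors in the model.
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_error
    # S is in the units of the dependent variable.
    s = math.sqrt(sse / df_error)
    return {"r2": r2, "adj_r2": adj_r2, "s": s}

# Made-up data for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
fit = goodness_of_fit(x, y)
print(fit)
```

R-squared is a unitless proportion, while S reports the typical size of the residuals in the units of the dependent variable, which is why the two statistics answer different questions.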

## Tutorial: Using Regression to Make Predictions

Analysts often use regression analysis to make predictions. In this section of the regression tutorial, learn how to make predictions and assess their precision.

- Making Predictions with Regression Analysis: This guide uses BMI to predict body fat percentage.
- Predicted R-squared: This statistic evaluates how well a model predicts the dependent variable for new observations.
- Understand Prediction Precision to Avoid Costly Mistakes: Research shows that presentation affects the number of interpretation mistakes. Covers prediction intervals.
- Prediction intervals versus other intervals: Prediction intervals indicate the precision of the predictions. I compare prediction intervals to different types of intervals.
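The difference between a prediction interval and a confidence interval for the mean, and the idea behind predicted R-squared, can be sketched for a one-predictor model. Everything below is an assumption made for illustration: the BMI and body fat numbers are made up, the helper names are hypothetical, and the t critical value of 2.306 (95% confidence, 8 degrees of freedom) is taken from a t-table rather than computed.

```python
import math

def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x and keep the pieces needed for intervals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return {"intercept": intercept, "slope": slope, "s": s, "n": n,
            "x_bar": x_bar, "sxx": sxx, "residuals": residuals}

def interval_half_widths(fit, x0, t_crit):
    """Half-widths of the CI for the mean response and the PI for a new value."""
    leverage = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    ci_half = t_crit * fit["s"] * math.sqrt(leverage)
    # The extra 1 accounts for the scatter of individual observations.
    pi_half = t_crit * fit["s"] * math.sqrt(1 + leverage)
    return ci_half, pi_half

def predicted_r_squared(fit, x, y):
    """1 - PRESS/SST, where PRESS uses leave-one-out residuals e / (1 - h)."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    press = 0.0
    for xi, e in zip(x, fit["residuals"]):
        h = 1 / fit["n"] + (xi - fit["x_bar"]) ** 2 / fit["sxx"]
        press += (e / (1 - h)) ** 2
    return 1 - press / sst

# Made-up data for illustration (n = 10, so 8 error degrees of freedom).
bmi = [18, 20, 22, 24, 26, 28, 30, 32, 34, 36]
body_fat = [15.2, 19.1, 21.0, 25.3, 26.8, 31.5, 32.2, 36.9, 38.1, 42.4]
T_CRIT = 2.306  # two-sided 95% critical value for 8 df, from a t-table

fit = fit_simple_ols(bmi, body_fat)
y_hat = fit["intercept"] + fit["slope"] * 25
ci_half, pi_half = interval_half_widths(fit, 25, T_CRIT)
pred_r2 = predicted_r_squared(fit, bmi, body_fat)
print(f"prediction at BMI 25: {y_hat:.2f}")
print(f"95% CI half-width: {ci_half:.2f}, 95% PI half-width: {pi_half:.2f}")
print(f"predicted R-squared: {pred_r2:.3f}")
```

The prediction interval is always wider than the confidence interval at the same point, because it covers an individual new observation rather than the mean response.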

## Tutorial: Checking Regression Assumptions and Fixing Problems

Like other statistical procedures, regression analysis has assumptions that you need to meet, or the results can be unreliable. In regression, you primarily verify the assumptions by assessing the residual plots. The posts below explain how to do this and present some methods for fixing problems.

- The Seven Classical Assumptions of OLS Linear Regression
- Residual plots: Shows what the graphs should look like and why they might not!
- Heteroscedasticity: The residuals should have a constant scatter (homoscedasticity). Shows how to detect this problem and various methods of fixing it.
- Multicollinearity: Highly correlated independent variables can be problematic, but not always! Explains how to identify this problem and several ways of resolving it.
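One common way to quantify multicollinearity is the variance inflation factor (VIF). The sketch below assumes the simplest case of exactly two predictors, where each predictor's VIF reduces to 1 / (1 - r²), with r the correlation between them. The data and the function names are made up for illustration, and the usual rule of thumb (values above roughly 5 to 10 being a warning sign) is a convention, not a hard cutoff.

```python
import math

def pearson_r(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Made-up predictors for illustration.
x1 = [1, 2, 3, 4, 5, 6]
x2_similar = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1]   # nearly a copy of x1
x2_unrelated = [2, 5, 1, 6, 3, 4]             # weakly related to x1

vif_high = vif_two_predictors(x1, x2_similar)
vif_low = vif_two_predictors(x1, x2_unrelated)
print(f"VIF with a near-duplicate predictor: {vif_high:.1f}")
print(f"VIF with a weakly related predictor: {vif_low:.2f}")
```

With more than two predictors, the VIF for each variable instead comes from regressing it on all of the other predictors, which this two-variable shortcut does not cover.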

## Examples of Different Types of Regression Analyses

The last part of the regression tutorial contains regression analysis examples. I’ll be adding more. Some of the examples are included in previous tutorial sections. Most of these regression examples include the datasets so you can try them yourself!

- Linear regression with a double-log transformation: Models the relationship between mammal mass and metabolic rate using a fitted line plot.
- Modeling the relationship between BMI and Body Fat Percentage with linear regression.
- Curve fitting with linear and nonlinear regression.
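As a rough illustration of the double-log approach in the first example, the sketch below fits a straight line to the logs of both variables. The data are generated from a made-up power law (y = 3 · x^0.75), not taken from the mammal dataset, and `fit_line` is a hypothetical helper. The slope of the log-log line recovers the exponent, and the exponential of the intercept recovers the multiplier.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for the line y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data following an exact power law, for illustration only.
mass = [1, 10, 100, 1000]
rate = [3 * m ** 0.75 for m in mass]

# Taking logs of both variables turns y = a * x^b into
# ln(y) = ln(a) + b * ln(x), which is linear in the parameters.
log_mass = [math.log(m) for m in mass]
log_rate = [math.log(r) for r in rate]

intercept, slope = fit_line(log_mass, log_rate)
print(f"estimated exponent b: {slope:.3f}")
print(f"estimated multiplier a: {math.exp(intercept):.3f}")
```

Real data contain noise, so the fitted exponent would only approximate the underlying value, and predictions need to be transformed back from the log scale.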

Tobden says

would you also throw some ideas on Instrumental variable and 2 SLS method please?

Jim Frost says

Those are great ideas! I’ll write about them in future posts.

bwbjlt says

such a splendid compilation, Thanks Jim

Jim Frost says

Thank you!

Farmanullah says

Great work by a great man. It is an easily accessible source for scholars. Sir, I am going to analyse data; please send me guidelines for selecting the best simple linear or multiple linear regression model. Thanks.

Jim Frost says

Hi, thank you so much for your kind words. I really appreciate it! I’ve written a blog post that I think is exactly what you need. It’ll help you choose the best regression model.

Nivedan says

Hi Jim!

Can you write on Logistic regression please!

Thank you

Jim Frost says

Hi! You bet! I plan to write about it in the near future!

Dina says

Hi, Jim!

I’m really happy to find your blog. It’s really helping, especially since you use basic English, so non-native speakers can understand it better than most textbooks. Thanks!

Jim Frost says

Hi Dina, you’re welcome! And, thanks so much for your kind words–you made my day!

Aftab Siddiqui says

Yes, the language of the topic is very easy. I would appreciate it, sir, if you could let me know: if the rank correlation is r = 0.8 and the sum of “D” squared = 33, how do we calculate the number of observations (n)?

Jim Frost says

I’m not sure what you mean by “D” square, but I believe you’ll need more information for that.

Md zishan hussain says

Hello Jim,

I am using stepwise regression to select significant variables in the model for prediction. How do I interpret BIC in variable selection?

regards,

Zishan

Jim Frost says

Hi, when comparing candidate models, you look for models with a lower BIC. A lower BIC indicates that a model is more likely to be the true model. BIC identifies the model that is more likely to have generated the observed data.
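As a minimal sketch of that comparison, the code below assumes least-squares models with normally distributed errors, for which BIC can be written (up to a constant shared by all models fit to the same data) as n·ln(SSE/n) + k·ln(n), where k is the number of estimated coefficients. The data and function names are made up for illustration.

```python
import math

def bic_gaussian(sse, n, k):
    """BIC up to an additive constant that is the same for every candidate model."""
    return n * math.log(sse / n) + k * math.log(n)

def sse_mean_only(y):
    """Error sum of squares for the intercept-only model."""
    y_bar = sum(y) / len(y)
    return sum((yi - y_bar) ** 2 for yi in y)

def sse_line(x, y):
    """Error sum of squares for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data with a clear trend, for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 4.1, 5.8, 8.2, 9.9, 12.1, 13.8, 16.2]
n = len(y)

bic_mean = bic_gaussian(sse_mean_only(y), n, k=1)   # intercept only
bic_line = bic_gaussian(sse_line(x, y), n, k=2)     # intercept + slope
print(f"BIC (intercept only): {bic_mean:.2f}")
print(f"BIC (with predictor): {bic_line:.2f}")
```

Only differences in BIC between models fit to the same data are meaningful; the k·ln(n) term is the penalty that keeps an extra variable from being favored unless it reduces the error enough.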

Yud says

Hello Jim

I’d like to know what your suggestions are with regard to the choice of regression for predicting the likelihood of participants falling into one of two categories (low Fear group coded 1 and high Fear coded 2) when looking at scores from several variables (e.g., external other locus of control, external social locus of control, internal locus of control, social phobia, and sleep quality).

It was suggested that I break the question up into smaller components. I’d appreciate your thoughts on it. Thanks!

Jim Frost says

Because you have a binary response (dependent variable), you’ll need to use binary logistic regression. I don’t know what types of predictors you have. If they’re continuous, you can just use them in the model and see how it works.

If they’re ordinal data, such as a Likert scale, you can still try using them as predictors in the model. However, ordinal data are less likely to satisfy all the assumptions. Check the residual plots. If including the ordinal data in the model doesn’t work, you can recode them as indicator variables (1s and 0s only, based on whether an observation meets a criterion or not). For example, if you have a scale of -2, -1, 0, 1, 2, you could recode it so observations with a positive score get a 1 while all other scores get a 0.
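As a minimal sketch of that recoding in Python (the scores here are made up for illustration, and the variable names are hypothetical):

```python
# Made-up responses on a -2..2 scale, for illustration only.
scores = [-2, -1, 0, 1, 2, 1, -1, 0]

# Indicator variable: 1 if the score is positive, 0 otherwise.
positive_indicator = [1 if s > 0 else 0 for s in scores]

print(positive_indicator)
```

The cutoff at zero is only one possible criterion; where to split the scale depends on what distinction matters in your subject area.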

Those are some ideas to try. Of course, what works best for your case depends on the subject area and types of data that you have.

I hope this helps!

Lisa says

thank you Jim this is helpful

Jim Frost says

You’re very welcome, Lisa! I’m glad you found it to be helpful!

Patrik Silva says

Hi Jim, I would like to see you writing something about Cross Validation (Training and test).

Patrik

RAJKUMAR R says

Very Good Explanation about regression ….Thank you sir for such a wonderful post….

Arnab Paul says

Independent variables range from 0 to 1 and corresponding dependent variables range from 1 to 5 . If we apply regression analysis to above and predict the value of y for any value of x that also ranges from 0 to 1, whether the value of y will always lie in the range 1 to 5?

Jim Frost says

In my experience, the predicted values can fall outside the range of the actual dependent variable. Assuming that you are referring to hard limits at 1 and 5, the regression analysis does not “understand” that those are limits. The extent to which the predicted values fall outside these limits depends on the amount of error in the model.
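Here is a small sketch of how that can happen. The data are made up for illustration: x stays between 0 and 1 and every observed y stays between 1 and 5, yet the fitted line predicts a value above 5 at x = 1.

```python
def fit_line(x, y):
    """Ordinary least squares for the line y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data: y rises quickly and then levels off at its upper limit of 5.
x = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [1, 3, 5, 5, 5]

intercept, slope = fit_line(x, y)
predictions = [intercept + slope * xi for xi in x]

print([round(p, 2) for p in predictions])
print("largest prediction:", round(max(predictions), 2))
```

The straight line has no way to represent the ceiling, so it overshoots where the data flatten out. A model that respects the limits would need a different functional form.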