Regression analysis mathematically describes the relationship between independent variables and the dependent variable. It also allows you to predict the mean value of the dependent variable when you specify values for the independent variables. In this regression tutorial, I gather together a wide range of posts that I’ve written about regression analysis. My tutorial helps you go through the regression content in a systematic and logical order.
This tutorial covers many facets of regression analysis including selecting the correct type of regression analysis, specifying the best model, interpreting the results, assessing the fit of the model, generating predictions, and checking the assumptions. I close the post with examples of different types of regression analyses.
If you’re learning regression analysis, you might want to bookmark this tutorial!
When to Use Regression and the Signs of a High-Quality Analysis
Before we get to the regression tutorials, I’ll cover several overarching issues.
Why use regression at all? What are common problems that trip up analysts? And, how do you differentiate a high-quality regression analysis from a less rigorous study? Read these posts to find out:
- When Should I Use Regression Analysis?: Learn what regression can do for you and when you should use it.
- Five Regression Tips for a Better Analysis: These tips help ensure that you perform a top-quality regression analysis.
Tutorial: Choosing the Right Type of Regression Analysis
There are many different types of regression analysis. Choosing the right procedure depends on your data and the nature of the relationships, as these posts explain.
- Choosing the Correct Type of Regression Analysis: Reviews different regression methods by focusing on data types.
- How to Choose Between Linear and Nonlinear Regression: Determining which one to use by assessing the statistical output.
- The Difference between Linear and Nonlinear Models: Both kinds of models can fit curves, so what’s the difference?
Tutorial: Specifying the Regression Model
Selecting the right type of regression analysis is just the start of the process. Next, you need to specify the model. Model specification is the process of determining which independent variables belong in the model and whether modeling curvature and interaction effects are appropriate.
Model specification is an iterative process. The interpretation and assumption confirmation sections of this tutorial explain how to assess your model and how to change the model based on the statistical output and graphs.
- Model Specification: Choosing the Correct Regression Model: I review standard statistical approaches, difficulties you may face, and offer some real-world advice.
- Using Data Mining to Select Your Regression Model Can Create Problems: This approach to choosing a model can produce misleading results. Learn how to detect and avoid this problem.
- Guide to Stepwise Regression and Best Subsets Regression: Two common tools for identifying candidate variables during the investigative stages of model building.
- Overfitting Regression Models: Overly complicated models can produce misleading R-squared values, regression coefficients, and p-values. Learn how to detect and avoid this problem.
- Curve Fitting Using Linear and Nonlinear Regression: When your data don’t follow a straight line, the model must fit the curvature. This post covers various methods for fitting curves.
- Understanding Interaction Effects: When the effect of one variable depends on the value of another variable, you need to include an interaction effect in your model otherwise the results will be misleading.
- When Do You Need to Standardize the Variables?: In specific situations, standardizing the independent variables can uncover statistically significant results.
- Confounding Variables and Omitted Variable Bias: The variables that you leave out of the model can bias the variables that you include.
Tutorial: Interpreting Regression Results
After choosing the type of regression and specifying the model, you need to interpret the results. The next set of posts explain how to interpret the results for various regression analysis statistics:
- Coefficients and p-values
- Constant (Y-intercept)
- Comparing regression slopes and constants with hypothesis tests
- R-squared and the goodness-of-fit
- How high does R-squared need be?
- Interpreting a model with a low R-squared
- Adjusted R-squared and Predicted R-squared
- Standard error of the regression (S) vs. R-squared
- Five Reasons Your R-squared can be Too High: A high R-squared can occasionally signify a problem with your model.
- F-test of overall significance
- Identifying the Most Important Independent Variables: After settling on a model, analysts frequently ask, “Which variable is most important?”
Tutorial: Using Regression to Make Predictions
Analysts often use regression analysis to make predictions. In this section of the regression tutorial, learn how to make predictions and assess their precision.
- Making Predictions with Regression Analysis: This guide uses BMI to predict body fat percentage.
- Predicted R-squared: This statistic evaluates how well a model predicts the dependent variable for new observations.
- Understand Prediction Precision to Avoid Costly Mistakes: Research shows that presentation affects the number of interpretation mistakes. Covers prediction intervals.
- Prediction intervals versus other intervals: Prediction intervals indicate the precision of the predictions. I compare prediction intervals to different types of intervals.
Tutorial: Checking Regression Assumptions and Fixing Problems
Like other statistical procedures, regression analysis has assumptions that you need to meet, or the results can be unreliable. In regression, you primarily verify the assumptions by assessing the residual plots. The posts below explain how to do this and present some methods for fixing problems.
- The Seven Classical Assumptions of OLS Linear Regression
- Residual plots: Shows what the graphs should look like and why they might not!
- Heteroscedasticity: The residuals should have a constant scatter (homoscedasticity). Shows how to detect this problem and various methods of fixing it.
- Multicollinearity: Highly correlated independent variables can be problematic, but not always! Explains how to identify this problem and several ways of resolving it.
Examples of Different Types of Regression Analyses
The last part of the regression tutorial contains regression analysis examples. I’ll be adding more. Some of the examples are included in previous tutorial sections. Most of these regression examples include the datasets so you can try it yourself!
- Linear regression with a double-log transformation: Models the relationship between mammal mass and metabolic rate using a fitted line plot.
- Modeling the relationship between BMI and Body Fat Percentage with linear regression.
- Curve fitting with linear and nonlinear regression.