Omitted variable bias occurs when a regression model leaves out relevant independent variables, which are known as confounding variables. This condition forces the model to attribute the effects of omitted variables to variables that are in the model, which biases the coefficient estimates. [Read more…] about Confounding Variables Can Bias Your Results

# Regression

## The Gauss-Markov Theorem and BLUE OLS Coefficient Estimates

The Gauss-Markov theorem states that if your linear regression model satisfies the first six classical assumptions, then ordinary least squares (OLS) regression produces unbiased estimates that have the smallest variance of all possible linear estimators. [Read more…] about The Gauss-Markov Theorem and BLUE OLS Coefficient Estimates

## 7 Classical Assumptions of Ordinary Least Squares (OLS) Linear Regression

Ordinary Least Squares (OLS) is the most common estimation method for linear models—and that’s true for a good reason. As long as your model satisfies the OLS assumptions for linear regression, you can rest easy knowing that you’re getting the best possible estimates. [Read more…] about 7 Classical Assumptions of Ordinary Least Squares (OLS) Linear Regression

## Regression Tutorial with Analysis Examples

Regression analysis mathematically describes the relationship between independent variables and the dependent variable. It also allows you to predict the mean value of the dependent variable when you specify values for the independent variables. In this regression tutorial, I gather together a wide range of posts that I’ve written about regression analysis. My tutorial helps you go through the regression content in a systematic and logical order. [Read more…] about Regression Tutorial with Analysis Examples

## Choosing the Correct Type of Regression Analysis

Regression analysis mathematically describes the relationship between a set of independent variables and a dependent variable. There are numerous types of regression models that you can use. This choice often depends on the kind of data you have for the dependent variable and the type of model that provides the best fit. In this post, I cover the more common types of regression analyses and how to decide which one is right for your data. [Read more…] about Choosing the Correct Type of Regression Analysis

## Understanding Interaction Effects in Statistics

Interaction effects occur when the effect of one variable depends on the value of another variable. Interaction effects are common in regression analysis, ANOVA, and designed experiments. In this blog post, I explain interaction effects, how to interpret them in statistical designs, and the problems you will face if you don’t include them in your model. [Read more…] about Understanding Interaction Effects in Statistics

## When Should I Use Regression Analysis?

Use regression analysis to describe the relationships between a set of independent variables and the dependent variable. Regression analysis produces a regression equation where the coefficients represent the relationship between each independent variable and the dependent variable. You can also use the equation to make predictions.

As a statistician, I should probably tell you that I love all statistical analyses equally—like parents with their kids. But, shhh, I have secret! Regression analysis is my favorite because it provides tremendous flexibility, which makes it useful in so many different circumstances. In fact, I’ve described regression analysis as taking correlation to the next level!

In this blog post, I explain the capabilities of regression analysis, the types of relationships it can assess, how it controls the variables, and generally why I love it! You’ll learn when you should consider using regression analysis. [Read more…] about When Should I Use Regression Analysis?

## Using Log-Log Plots to Determine Whether Size Matters

Log-log plots display data in two dimensions where both axes use logarithmic scales. When one variable changes as a constant power of another, a log-log graph shows the relationship as a straight line. In this post, I’ll show you why these graphs are valuable and how to interpret them. [Read more…] about Using Log-Log Plots to Determine Whether Size Matters

## When Do You Need to Standardize the Variables in a Regression Model?

Standardization is the process of putting different variables on the same scale. In regression analysis, there are some scenarios where it is crucial to standardize your independent variables or risk obtaining misleading results.

In this blog post, I show when and why you need to standardize your variables in regression analysis. Don’t worry, this process is simple and helps ensure that you can trust your results. In fact, standardizing your variables can reveal essential findings that you would otherwise miss! [Read more…] about When Do You Need to Standardize the Variables in a Regression Model?

## Why Are There No P Values in Nonlinear Regression?

Nonlinear regression analysis cannot calculate P values for the independent variables in your model. Why not? And, what do you use instead? Those are the topics of this blog post. [Read more…] about Why Are There No P Values in Nonlinear Regression?

## Five Regression Analysis Tips to Avoid Common Problems

Regression is a very powerful statistical analysis. It allows you to isolate and understand the effects of individual variables, model curvature and interactions, and make predictions. Regression analysis offers high flexibility but presents a variety of potential pitfalls. Great power requires great responsibility!

In this post, I offer five tips that will not only help you avoid common problems but also make the modeling process easier. I’ll close by showing you the difference between the modeling process that a top analyst uses versus the procedure of a less rigorous analyst. [Read more…] about Five Regression Analysis Tips to Avoid Common Problems

## Understand Precision in Predictive Analytics to Avoid Costly Mistakes

Precision in predictive analytics refers to how close the model’s predictions are to the observed values. The more precise the model, the closer the data points are to the predictions. When you have an imprecise model, the observations tend to be further away from the predictions, thereby reducing the usefulness of the predictions. If you have a model that is not sufficiently precise, you risk making costly mistakes! [Read more…] about Understand Precision in Predictive Analytics to Avoid Costly Mistakes

## Heteroscedasticity in Regression Analysis

Heteroscedasticity means unequal scatter. In regression analysis, we talk about heteroscedasticity in the context of the residuals or error term. Specifically, heteroscedasticity is a systematic change in the spread of the residuals over the range of measured values. Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that all residuals are drawn from a population that has a constant variance (homoscedasticity).

To satisfy the regression assumptions and be able to trust the results, the residuals should have a constant variance. In this blog post, I show you how to identify heteroscedasticity, explain what produces it, the problems it causes, and work through an example to show you several solutions. [Read more…] about Heteroscedasticity in Regression Analysis

## How to Choose Between Linear and Nonlinear Regression

As you fit regression models, you might need to make a choice between linear and nonlinear regression models. The field of statistics can be weird. Despite their names, both forms of regression can fit curvature in your data. So, how do you choose? In this blog post, I show you how to choose between linear and nonlinear regression models. [Read more…] about How to Choose Between Linear and Nonlinear Regression

## Model Specification: Choosing the Correct Regression Model

Model specification is the process of determining which independent variables to include and exclude from a regression equation. How do you choose the best regression model? The world is complicated, and trying to explain it with a small sample doesn’t help. In this post, I’ll show you how to select the correct model. I’ll cover statistical methods, difficulties that can arise, and provide practical suggestions for selecting your model. Often, the variable selection process is a mixture of statistics, theory, and practical knowledge. [Read more…] about Model Specification: Choosing the Correct Regression Model

## Comparing Regression Lines with Hypothesis Tests

How do you compare regression lines statistically? Imagine you are studying the relationship between height and weight and want to determine whether this relationship differs between basketball players and non-basketball players. You can graph the two regression lines to see if they look different. However, you should perform hypothesis tests to determine whether the visible differences are statistically significant. In this blog post, I show you how to determine whether the differences between coefficients and constants in different regression models are statistically significant. [Read more…] about Comparing Regression Lines with Hypothesis Tests

## Identifying the Most Important Independent Variables in Regression Models

You’ve settled on a regression model that contains independent variables that are statistically significant. By interpreting the statistical results, you can understand how changes in the independent variables are related to shifts in the dependent variable. At this point, it’s natural to wonder, “Which independent variable is the most important?” [Read more…] about Identifying the Most Important Independent Variables in Regression Models

## Using Data Mining to Select Regression Models Can Create Serious Problems

Data mining and regression seem to go together naturally. I’ve described regression as a seductive analysis because it is so tempting and so easy to add more variables in the pursuit of a larger R-squared. In this post, I’ll begin by illustrating the problems that data mining creates. To do this, I’ll show how data mining with regression analysis can take randomly generated data and produce a misleading model that appears to have significant variables and a good R-squared. Then, I’ll explain how data mining creates these deceptive results and how to avoid them. [Read more…] about Using Data Mining to Select Regression Models Can Create Serious Problems

## Five Reasons Why Your R-squared can be Too High

When your regression model has a high R-squared, you assume it’s a good thing. You want a high R-squared, right? However, as I’ll show in this post, a high R-squared can occasionally indicate that there is a problem with your model. I’ll explain five reasons why your R-squared can be too high and how to determine whether one of them affects your regression model. [Read more…] about Five Reasons Why Your R-squared can be Too High

## Overfitting Regression Models: Problems, Detection, and Avoidance

Overfitting a model is a condition where a statistical model begins to describe the random error in the data rather than the relationships between variables. This problem occurs when the model is too complex. In regression analysis, overfitting can produce misleading R-squared values, regression coefficients, and p-values. In this post, I explain how overfitting models is a problem and how you can identify and avoid it. [Read more…] about Overfitting Regression Models: Problems, Detection, and Avoidance