Mean squared error (MSE) measures the amount of error in statistical models. It assesses the average squared difference between the observed and predicted values. When a model has no error, the MSE equals zero. As model error increases, its value increases. The mean squared error is also known as the mean squared deviation (MSD). [Read more…] about Mean Squared Error (MSE)
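As a quick illustration of that definition, here is a minimal sketch (with made-up observed and predicted values) that computes MSE as the average squared difference:

```python
# Minimal sketch of computing MSE for made-up observed vs. predicted values.
def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared_errors) / len(squared_errors)

observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]

print(mean_squared_error(observed, predicted))  # 0.125
# A model with no error (predicted == observed) gives an MSE of exactly 0.
```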

# Regression

## Orthogonality

Orthogonality is a mathematical property that is beneficial for statistical models. It’s particularly helpful when performing factorial analysis of designed experiments. [Read more…] about Orthogonality

## Independent and Dependent Variables

Independent variables and dependent variables are the two fundamental types of variables in statistical modeling and experimental designs. Analysts use these variables to understand the relationships between them and to estimate effect sizes. What effect does one variable have on another?

In this post, learn the definitions of independent and dependent variables, how to identify each type, how they differ between types of studies, and see examples of them in use. [Read more…] about Independent and Dependent Variables

## Understanding Historians’ Rankings of U.S. Presidents using Regression Models

Historians rank the U.S. Presidents from best to worst using all the historical knowledge at their disposal. Frequently, groups, such as C-Span, ask these historians to rank the Presidents and average the results together to help reduce bias. The idea is to produce a set of rankings that incorporates a broad range of historians, a vast array of information, and a historical perspective. These rankings include informed assessments of each President’s effectiveness, leadership, moral authority, administrative skills, economic management, vision, and so on. [Read more…] about Understanding Historians’ Rankings of U.S. Presidents using Regression Models

## Proxy Variables: The Good Twin of Confounding Variables

Proxy variables are easily measurable variables that analysts include in a model in place of a variable that cannot be measured or is difficult to measure. A proxy variable is often of no great interest itself, but it correlates closely with the variable of interest. [Read more…] about Proxy Variables: The Good Twin of Confounding Variables

## Variance Inflation Factors (VIFs)

Variance Inflation Factors (VIFs) measure the correlation among independent variables in least squares regression models. Statisticians refer to this type of correlation as multicollinearity. Excessive multicollinearity can cause problems for regression models.

In this post, I focus on VIFs and how they detect multicollinearity, why they’re better than pairwise correlations, how to calculate VIFs yourself, and how to interpret them. If you need a refresher about the types of problems that multicollinearity causes and how to fix them, read my post: Multicollinearity: Problems, Detection, and Solutions. [Read more…] about Variance Inflation Factors (VIFs)

## How to Perform Regression Analysis using Excel

Excel can perform various statistical analyses, including regression analysis. It is a great option because nearly everyone can access Excel. This post is an excellent introduction to performing and interpreting regression analysis, even if Excel isn’t your primary statistical software package.

[Read more…] about How to Perform Regression Analysis using Excel

## New eBook Release! Regression Analysis: An Intuitive Guide

I’m thrilled to announce the release of my first ebook! *Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models*.

If you like the clear writing style I use on this website, you’ll love this book! The end of the post displays the entire table of contents! You can also download a Free Sample that includes the complete Table of Contents and the first two chapters. Go to My Store to download the ebook sample. [Read more…] about New eBook Release! Regression Analysis: An Intuitive Guide

## Confounding Variables Can Bias Your Results

In research studies, confounding variables influence both the cause and effect that the researchers are assessing. Consequently, if analysts do not include these confounders in their statistical model, the omission can exaggerate or mask the real relationship between two other variables. By omitting confounding variables, the statistical procedure is forced to attribute their effects to variables in the model, which biases the estimated effects and confounds the genuine relationship. Statisticians refer to this distortion as omitted variable bias.
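A tiny sketch with made-up numbers shows this distortion in action. Here y depends on both x and a confounder z, and z is correlated with x; regressing y on x alone forces the model to attribute part of z’s effect to x, inflating the slope:

```python
# Sketch of omitted variable bias with made-up numbers. The true direct
# effect of x on y is 2, but a simple regression of y on x alone absorbs
# part of the omitted confounder z's effect into the x coefficient.
def slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x = [1, 2, 3, 4, 5, 6]
z = [1, 2, 1, 2, 1, 2]                      # confounder, correlated with x
y = [2 * a + 3 * b for a, b in zip(x, z)]   # true direct effect of x is 2

print(slope(x, y))  # greater than 2: the omitted confounder biases the estimate
```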

[Read more…] about Confounding Variables Can Bias Your Results

## The Gauss-Markov Theorem and BLUE OLS Coefficient Estimates

The Gauss-Markov theorem states that if your linear regression model satisfies the first six classical assumptions, then ordinary least squares (OLS) regression produces unbiased estimates that have the smallest variance of all possible linear estimators. [Read more…] about The Gauss-Markov Theorem and BLUE OLS Coefficient Estimates

## 7 Classical Assumptions of Ordinary Least Squares (OLS) Linear Regression

Ordinary Least Squares (OLS) is the most common estimation method for linear models—and that’s true for a good reason. As long as your model satisfies the OLS assumptions for linear regression, you can rest easy knowing that you’re getting the best possible estimates. [Read more…] about 7 Classical Assumptions of Ordinary Least Squares (OLS) Linear Regression

## Regression Tutorial with Analysis Examples

Regression analysis mathematically describes the relationship between independent variables and the dependent variable. It also allows you to predict the mean value of the dependent variable when you specify values for the independent variables. In this regression tutorial, I gather together a wide range of posts that I’ve written about regression analysis. My tutorial helps you go through the regression content in a systematic and logical order. [Read more…] about Regression Tutorial with Analysis Examples

## Choosing the Correct Type of Regression Analysis

Regression analysis mathematically describes the relationship between a set of independent variables and a dependent variable. There are numerous types of regression models that you can use. This choice often depends on the kind of data you have for the dependent variable and the type of model that provides the best fit. In this post, I cover the more common types of regression analyses and how to decide which one is right for your data. [Read more…] about Choosing the Correct Type of Regression Analysis

## Understanding Interaction Effects in Statistics

Interaction effects occur when the effect of one variable depends on the value of another variable. Interaction effects are common in regression analysis, ANOVA, and designed experiments. In this blog post, I explain interaction effects, how to interpret them in statistical designs, and the problems you will face if you don’t include them in your model. [Read more…] about Understanding Interaction Effects in Statistics
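As a quick sketch of that idea, consider a regression model with made-up coefficients that includes an interaction term, b3 * x1 * x2. Because of that term, the effect of a one-unit increase in x1 changes depending on the value of x2:

```python
# Sketch with made-up coefficients: in a model containing an interaction
# term (b3 * x1 * x2), the effect of x1 on y depends on the value of x2.
def predict(x1, x2, b0=1.0, b1=2.0, b2=0.5, b3=-1.5):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Effect of a one-unit increase in x1, evaluated at two values of x2:
effect_at_x2_0 = predict(1, 0) - predict(0, 0)  # b1 = 2.0
effect_at_x2_1 = predict(1, 1) - predict(0, 1)  # b1 + b3 = 0.5

print(effect_at_x2_0, effect_at_x2_1)  # 2.0 0.5
```

If you omit the interaction term when one truly exists, the model reports a single averaged effect for x1 that is wrong at every value of x2.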

## When Should I Use Regression Analysis?

Use regression analysis to describe the relationships between a set of independent variables and the dependent variable. Regression analysis produces a regression equation where the coefficients represent the relationship between each independent variable and the dependent variable. You can also use the equation to make predictions.

As a statistician, I should probably tell you that I love all statistical analyses equally—like parents with their kids. But, shhh, I have a secret! Regression analysis is my favorite because it provides tremendous flexibility, which makes it useful in so many different circumstances. In fact, I’ve described regression analysis as taking correlation to the next level!

In this blog post, I explain the capabilities of regression analysis, the types of relationships it can assess, how it controls the variables, and generally why I love it! You’ll learn when you should consider using regression analysis. [Read more…] about When Should I Use Regression Analysis?

## Using Log-Log Plots to Determine Whether Size Matters

Log-log plots display data in two dimensions where both axes use logarithmic scales. When one variable changes as a constant power of another, a log-log graph shows the relationship as a straight line. In this post, I’ll show you why these graphs are valuable and how to interpret them. [Read more…] about Using Log-Log Plots to Determine Whether Size Matters
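The reason the line is straight is simple algebra: if y = a · x^b, then log y = log a + b · log x, so the log-log slope equals the power b. A minimal sketch with an assumed power of b = 1.5 verifies this:

```python
import math

# Sketch: when y = a * x^b, taking logs gives log y = log a + b * log x,
# so a log-log plot is a straight line whose slope equals b.
# The power b = 1.5 here is an assumed value for illustration.
a, b = 2.0, 1.5
xs = [1, 2, 4, 8, 16]
ys = [a * x ** b for x in xs]

log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]

# The slope between consecutive log-log points is constant and equals b.
slopes = [(log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i])
          for i in range(len(xs) - 1)]

print(slopes)  # every slope is 1.5
```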

## When Do You Need to Standardize the Variables in a Regression Model?

Standardization is the process of putting different variables on the same scale. In regression analysis, there are some scenarios where it is crucial to standardize your independent variables or risk obtaining misleading results.

In this blog post, I show when and why you need to standardize your variables in regression analysis. Don’t worry, this process is simple and helps ensure that you can trust your results. In fact, standardizing your variables can reveal essential findings that you would otherwise miss! [Read more…] about When Do You Need to Standardize the Variables in a Regression Model?
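For reference, the standardization itself is simple: subtract each variable’s mean and divide by its standard deviation, so every variable ends up centered at zero with a standard deviation of one. A minimal sketch with made-up height data:

```python
# Sketch: standardizing a variable subtracts its mean and divides by its
# sample standard deviation, putting variables measured in different units
# on the same scale (mean 0, standard deviation 1).
def standardize(values):
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

heights_cm = [150, 160, 170, 180, 190]  # made-up data
z_scores = standardize(heights_cm)

print(z_scores)  # centered on 0; the middle value (170 cm) maps to exactly 0
```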

## Why Are There No P Values in Nonlinear Regression?

Nonlinear regression analysis cannot calculate P values for the independent variables in your model. Why not? And, what do you use instead? Those are the topics of this blog post. [Read more…] about Why Are There No P Values in Nonlinear Regression?

## Five Regression Analysis Tips to Avoid Common Problems

Regression is a very powerful statistical analysis. It allows you to isolate and understand the effects of individual variables, model curvature and interactions, and make predictions. Regression analysis offers high flexibility but presents a variety of potential pitfalls. Great power requires great responsibility!

In this post, I offer five tips that will not only help you avoid common problems but also make the modeling process easier. I’ll close by showing you the difference between the modeling process that a top analyst uses versus the procedure of a less rigorous analyst. [Read more…] about Five Regression Analysis Tips to Avoid Common Problems

## Understand Precision in Predictive Analytics to Avoid Costly Mistakes

Precision in predictive analytics refers to how close the model’s predictions are to the observed values. The more precise the model, the closer the data points are to the predictions. When you have an imprecise model, the observations tend to be further away from the predictions, thereby reducing the usefulness of the predictions. If you have a model that is not sufficiently precise, you risk making costly mistakes! [Read more…] about Understand Precision in Predictive Analytics to Avoid Costly Mistakes