Precision in predictive analytics refers to how close the model’s predictions are to the observed values. The more precise the model, the closer the data points are to the predictions. When you have an imprecise model, the observations tend to be further away from the predictions, thereby reducing the usefulness of the predictions. If you have a model that is not sufficiently precise, you risk making costly mistakes! [Read more…] about Understand Precision in Predictive Analytics to Avoid Costly Mistakes
Regression
Heteroscedasticity in Regression Analysis
Heteroscedasticity means unequal scatter. In regression analysis, we talk about heteroscedasticity in the context of the residuals or error term. Specifically, heteroscedasticity is a systematic change in the spread of the residuals over the range of measured values. Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that all residuals are drawn from a population that has a constant variance (homoscedasticity).
To satisfy the regression assumptions and be able to trust the results, the residuals should have a constant variance. In this blog post, I show you how to identify heteroscedasticity, explain what produces it and the problems it causes, and work through an example that demonstrates several solutions.
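A rough way to see the "systematic change in the spread of the residuals" is to simulate it. The sketch below is only an illustration, not the example from the post: the data are simulated, the error standard deviation is assumed to grow in proportion to x, and the half-versus-half variance comparison is an informal check rather than a formal test.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def variance(values):
    """Sample variance with the usual n - 1 denominator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


random.seed(1)  # fixed seed so the simulation is repeatable

# Simulated data (an assumption for illustration): the error standard
# deviation is 0.5 * x, so the scatter fans out as x increases.
x = [float(i) for i in range(1, 201)]
y = [5.0 + 2.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Order the residuals by fitted value, then compare the spread in each half.
ordered = [r for _, r in sorted(zip(fitted, residuals))]
half = len(ordered) // 2
low_var = variance(ordered[:half])
high_var = variance(ordered[half:])
ratio = high_var / low_var

print(f"Residual variance at low fitted values:  {low_var:.1f}")
print(f"Residual variance at high fitted values: {high_var:.1f}")
print(f"Ratio (high / low): {ratio:.1f}")
```

Under constant variance the ratio would hover near 1. A ratio well above 1 corresponds to the fan shape you would see in a residuals versus fitted values plot.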
How to Choose Between Linear and Nonlinear Regression
As you fit regression models, you might need to make a choice between linear and nonlinear regression models. The field of statistics can be weird. Despite their names, both forms of regression can fit curvature in your data. So, how do you choose? In this blog post, I show you how to choose between linear and nonlinear regression models.
Model Specification: Choosing the Best Regression Model
Model specification is the process of determining which independent variables to include in and exclude from a regression equation. How do you choose the best regression model? The world is complicated, and trying to explain it with a small sample doesn’t help. In this post, I’ll show you how to decide on the model. I’ll cover statistical methods and the difficulties that can arise, and provide practical suggestions for selecting your model. Often, the variable selection process is a mixture of statistics, theory, and practical knowledge.
Comparing Regression Lines with Hypothesis Tests
How do you compare regression lines statistically? Imagine you are studying the relationship between height and weight and want to determine whether this relationship differs between basketball players and non-basketball players. You can graph the two regression lines to see if they look different. However, you should perform hypothesis tests to determine whether the visible differences are statistically significant. In this blog post, I show you how to determine whether the differences between coefficients and constants in different regression models are statistically significant.
Identifying the Most Important Independent Variables in Regression Models
You’ve settled on a regression model that contains independent variables that are statistically significant. By interpreting the statistical results, you can understand how changes in the independent variables are related to shifts in the dependent variable. At this point, it’s natural to wonder, “Which independent variable is the most important?”
Using Data Mining to Select Regression Models Can Create Serious Problems
Data mining and regression seem to go together naturally. I’ve described regression as a seductive analysis because it is so tempting and so easy to add more variables in the pursuit of a larger R-squared. In this post, I’ll begin by illustrating the problems that data mining creates. To do this, I’ll show how data mining with regression analysis can take randomly generated data and produce a misleading model that appears to have significant variables and a good R-squared. Then, I’ll explain how data mining creates these deceptive results and how to avoid them.
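The idea of mining random data can be sketched in a few lines. This is a deliberately simplified illustration, not the procedure from the post: the response and all candidate predictors are pure noise, and the "mining" step just keeps the single candidate with the highest R-squared. The sample size and number of candidates are arbitrary assumptions.

```python
import random


def r_squared_simple(x, y):
    """R-squared of a one-predictor OLS fit (the squared correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


random.seed(42)  # fixed seed so the run is repeatable

n_obs = 20          # small sample (assumed for illustration)
n_candidates = 100  # many candidate predictors to search through

# Every variable is independent random noise, so no real relationship exists.
y = [random.gauss(0.0, 1.0) for _ in range(n_obs)]
candidates = [
    [random.gauss(0.0, 1.0) for _ in range(n_obs)]
    for _ in range(n_candidates)
]

r2_values = sorted(r_squared_simple(c, y) for c in candidates)
median_r2 = r2_values[n_candidates // 2]
best_r2 = r2_values[-1]

print(f"Typical (median) R-squared: {100 * median_r2:.1f}%")
print(f"Best R-squared after searching {n_candidates} candidates: {100 * best_r2:.1f}%")
```

The best candidate looks far stronger than a typical one even though every relationship here is spurious. Searching over many variables and keeping the winner is what inflates the apparent fit.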
Five Reasons Why Your R-squared can be Too High
When your regression model has a high R-squared, you assume it’s a good thing. You want a high R-squared, right? However, as I’ll show in this post, a high R-squared can occasionally indicate that there is a problem with your model. I’ll explain five reasons why your R-squared can be too high and how to determine whether one of them affects your regression model.
Overfitting Regression Models: Problems, Detection, and Avoidance
Overfitting a model is a condition where a statistical model begins to describe the random error in the data rather than the relationships between variables. This problem occurs when the model is too complex. In regression analysis, overfitting can produce misleading R-squared values, regression coefficients, and p-values. In this post, I explain why overfitting is a problem and how you can identify and avoid it.
Guide to Stepwise Regression and Best Subsets Regression
Automatic variable selection procedures are algorithms that pick the variables to include in your regression model. Stepwise regression and Best Subsets regression are two of the more common variable selection methods. In this post, I compare how these methods work and examine which one provides better results.
How to Interpret Regression Models that have Significant Variables but a Low R-squared
Does your regression model have a low R-squared? That seems like a problem, but it might not be. Learn what a low R-squared does and does not mean for your model.
How High Does R-squared Need to Be?
How high does R-squared need to be in regression analysis? That seems to be an eternal question.
Making Predictions with Regression Analysis
If you were able to make predictions about something important to you, you’d probably love that, right? It’s even better if you know that your predictions are sound. In this post, I show how to use regression analysis to make predictions and determine whether they are both unbiased and precise.
Curve Fitting using Linear and Nonlinear Regression
In regression analysis, curve fitting is the process of specifying the model that provides the best fit to the specific curves in your dataset. Curved relationships between variables are not as straightforward to fit and interpret as linear relationships.
How To Interpret R-squared in Regression Analysis
R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0 – 100% scale.
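To make the "percentage of the variance explained" concrete, here is a minimal sketch that fits a one-predictor line and computes R-squared as 1 minus the ratio of the residual sum of squares to the total sum of squares. The five data points are made up for illustration, and the helper names are my own.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y, fitted):
    """Share of the variation in y around its mean that the fit accounts for."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # total
    return 1.0 - ss_res / ss_tot


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
r2 = r_squared(y, fitted)

print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"R-squared = {100 * r2:.1f}%")
```

For an OLS model that includes a constant, this value falls between 0 and 1, which is where the 0 – 100% scale comes from.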
How to Interpret P-values and Coefficients in Regression Analysis
P-values and coefficients in regression analysis work together to tell you which relationships in your model are statistically significant and the nature of those relationships. The coefficients describe the mathematical relationship between each independent variable and the dependent variable. The p-values for the coefficients indicate whether these relationships are statistically significant.
R-squared Is Not Valid for Nonlinear Regression
Nonlinear regression is an extremely flexible analysis that can fit almost any curve that is present in your data. R-squared seems like a very intuitive way to assess the goodness-of-fit for a regression model. Unfortunately, the two just don’t go together. R-squared is invalid for nonlinear regression.
How to Interpret Adjusted R-Squared and Predicted R-Squared in Regression Analysis
R-squared tends to reward you for including too many independent variables in a regression model, and it doesn’t provide any incentive to stop adding more. Adjusted R-squared and predicted R-squared use different approaches to help you fight that impulse to add too many. The protection that adjusted R-squared and predicted R-squared provide is critical because too many terms in a model can produce results that you can’t trust. These statistics help you include the correct number of independent variables in your regression model.
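The two statistics can be sketched with their usual textbook definitions, which I am assuming here: adjusted R-squared is 1 - (1 - R²)(n - 1)/(n - k - 1) for n observations and k predictors, and predicted R-squared is 1 - PRESS/SS_total, where PRESS sums the squared errors from predicting each observation with a model fit to all the others. The data are made up, and the model has a single predictor to keep the code short.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.3, 2.9, 4.8, 5.1, 7.4, 7.0, 9.6, 9.1, 11.8, 11.2]

n = len(y)
k = 1  # number of predictors in this simple model
mean_y = sum(y) / n
ss_tot = sum((yi - mean_y) ** 2 for yi in y)

# Ordinary R-squared from the full fit.
b0, b1 = fit_line(x, y)
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1.0 - ss_res / ss_tot

# Adjusted R-squared penalizes each additional predictor.
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Predicted R-squared: leave each point out, refit, and predict it.
press = 0.0
for i in range(n):
    a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    press += (y[i] - (a0 + a1 * x[i])) ** 2
pred_r2 = 1.0 - press / ss_tot

print(f"R-squared:           {100 * r2:.1f}%")
print(f"Adjusted R-squared:  {100 * adj_r2:.1f}%")
print(f"Predicted R-squared: {100 * pred_r2:.1f}%")
```

Both statistics come out below the ordinary R-squared. A predicted R-squared that falls far below R-squared is a common warning sign that the model has too many terms.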
How to Interpret the Constant (Y Intercept) in Regression Analysis
The constant term in regression analysis is the value at which the regression line crosses the y-axis. The constant is also known as the y-intercept. That sounds simple enough, right? Mathematically, the regression constant really is that simple. However, the difficulties begin when you try to interpret the meaning of the y-intercept in your regression output.
Check Your Residual Plots to Ensure Trustworthy Regression Results!
Use residual plots to check the assumptions of an OLS linear regression model. If you violate the assumptions, you risk producing results that you can’t trust. Residual plots display the residual values on the y-axis and fitted values, or another variable, on the x-axis. After you fit a regression model, it is crucial to check the residual plots. If your plots display unwanted patterns, you can’t trust the regression coefficients and other numeric results.
In this post, I explain the conceptual reasons why residual plots help ensure that your regression model is valid. I’ll also show you what to look for and how to fix the problems.
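As a small illustration of an "unwanted pattern", the sketch below fits a straight line to data that actually follow a curve and then prints the residuals against the fitted values as a crude text plot. The data are constructed for this purpose (y = x² with no noise), and the text plot stands in for a real residual plot.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Constructed data: a curved relationship that a straight line cannot capture.
x = [float(i) for i in range(11)]
y = [xi ** 2 for xi in x]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Crude text version of a residuals versus fitted values plot:
# one row per observation, bar length proportional to the residual size.
for f, r in zip(fitted, residuals):
    sign = "+" if r >= 0 else "-"
    bar = "#" * round(abs(r))
    print(f"fitted {f:7.1f}   residual {r:7.1f}   {sign}{bar}")
```

The residuals are positive at both ends and negative in the middle. That U shape is a systematic pattern rather than random scatter, and it suggests the model is missing curvature.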