Why Are There No P Values in Nonlinear Regression?
Nonlinear regression analysis cannot calculate P values for the independent variables in your model. Why not? And what do you use instead? Those are the topics of this blog post.
Five Regression Analysis Tips to Avoid Common Problems

In this post, I offer five tips that will not only help you avoid common problems but also make the modeling process easier. I’ll close by showing you the difference between the modeling process that a top analyst uses and the procedure of a less rigorous analyst.
What is the Relationship Between the Reproducibility of Experimental Results and P Values?
The ability to reproduce experimental results should be related to P values. After all, both of these statistical concepts have similar foundations.
- P values help you separate the signal of population level effects from the noise in sample data.
- Reproducible results support the notion that the findings can be generalized to the population rather than applying only to a specific sample.
So, P values are related to reproducibility in theory. But does this relationship exist in the real world? In this blog post, I present the findings of an exciting study that answers this question!
Understand Precision in Predictive Analytics to Avoid Costly Mistakes
Precision in predictive analytics refers to how close the model’s predictions are to the observed values. The more precise the model, the closer the data points are to the predictions. When you have an imprecise model, the observations tend to be farther away from the predictions, which reduces the usefulness of those predictions. If your model is not sufficiently precise, you risk making costly mistakes!
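One common way to put a number on this kind of precision is the standard error of the regression (S), the typical distance between the observed values and the fitted line. The sketch below is a minimal stdlib Python illustration under stated assumptions: a simple one-predictor linear model, made-up data, and hypothetical helper names (fit_line, standard_error_of_regression). It is not necessarily the method used in the full post.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def standard_error_of_regression(x, y):
    """S = sqrt(SSE / (n - 2)): the typical distance between the observed
    values and the fitted line, in the units of y. Smaller S = more precise."""
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

# Hypothetical data: same x values, two responses with different amounts of scatter.
x = [1, 2, 3, 4, 5, 6, 7, 8]
tight = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8]
loose = [1.0, 5.5, 4.8, 9.9, 8.2, 13.5, 12.0, 17.1]
print(f"S (tight) = {standard_error_of_regression(x, tight):.2f}")
print(f"S (loose) = {standard_error_of_regression(x, loose):.2f}")
```

Because S is expressed in the units of the response, you can compare it directly with how much prediction error your application can tolerate, which a unitless statistic such as R-squared does not let you do.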
Heteroscedasticity in Regression Analysis
Heteroscedasticity means unequal scatter. In regression analysis, we talk about heteroscedasticity in the context of the residuals or error term. Specifically, heteroscedasticity is a systematic change in the spread of the residuals over the range of measured values. Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that all residuals are drawn from a population that has a constant variance (homoscedasticity).
To satisfy the regression assumptions and be able to trust the results, the residuals should have a constant variance. In this blog post, I show you how to identify heteroscedasticity, explain what produces it and the problems it causes, and work through an example that demonstrates several solutions.
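To make "a systematic change in the spread of the residuals" concrete, here is a minimal stdlib Python sketch of a rough numeric check: fit one OLS line, sort the residuals by the predictor, and divide the residual variance in the upper half by that in the lower half. This is a simplified, Goldfeld-Quandt-style ratio chosen for illustration, not necessarily the diagnostic the full post uses; the data are simulated and the function names are hypothetical.

```python
import random
import statistics

def ols_residuals(x, y):
    """Residuals from an ordinary least squares fit of y = b0 + b1 * x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def spread_ratio(x, y):
    """Residual variance in the upper half of x divided by the lower half.
    Values near 1 are consistent with constant variance; values well above 1
    suggest the spread grows as x increases."""
    resid = ols_residuals(x, y)
    ordered = [r for _, r in sorted(zip(x, resid))]
    half = len(ordered) // 2
    return statistics.variance(ordered[half:]) / statistics.variance(ordered[:half])

rng = random.Random(1)
x = [i / 2 for i in range(1, 201)]  # 0.5, 1.0, ..., 100.0
# Simulated responses: constant error SD versus error SD proportional to x.
constant = [3 + 2 * xi + rng.gauss(0, 10) for xi in x]
fanning = [3 + 2 * xi + rng.gauss(0, 0.3 * xi) for xi in x]
print(f"constant variance: ratio = {spread_ratio(x, constant):.2f}")
print(f"fanning residuals: ratio = {spread_ratio(x, fanning):.2f}")
```

A ratio is only a rough screen; plotting residuals against fitted values remains the more informative first step, and formal tests need the appropriate F or chi-square reference distribution.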
How to Choose Between Linear and Nonlinear Regression
As you fit regression models, you might need to make a choice between linear and nonlinear regression models. The field of statistics can be weird. Despite their names, both forms of regression can fit curvature in your data. So, how do you choose? In this blog post, I show you how to choose between linear and nonlinear regression models.
Statistics, Old Love Letters, and Changing Times
Have you ever seen your present reflected in an object from the past? This summer I’ve discovered glimpses of my daily life working with statistical software in words written more than 70 years ago. Bear with me because this blog post takes the scenic route to arrive at modern statistics.
Lessons in Quality During a Long and Strange Journey Home
Back in January of 2014, I didn’t expect that our family trip to Florida would end with me driving a planeload of passengers nearly 200 miles to their homes, but it did.
Why Are P Values Misinterpreted So Frequently?
P values are commonly misinterpreted. It’s a very slippery concept that requires a lot of background knowledge to understand. Not surprisingly, I’ve received many questions about P values in statistical hypothesis testing over the years. However, one question stands out: Why are P value misinterpretations so prevalent? I answer that question in this blog post and help you avoid making the same mistakes.
Model Specification: Choosing the Best Regression Model
Model specification is the process of determining which independent variables to include and exclude from a regression equation. How do you choose the best regression model? The world is complicated, and trying to explain it with a small sample doesn’t help. In this post, I’ll show you how to decide on the model. I’ll cover statistical methods and the difficulties that can arise, and provide practical suggestions for selecting your model. Often, the variable selection process is a mixture of statistics, theory, and practical knowledge.
Comparing Regression Lines with Hypothesis Tests
How do you compare regression lines statistically? Imagine you are studying the relationship between height and weight and want to determine whether this relationship differs between basketball players and non-basketball players. You can graph the two regression lines to see if they look different. However, you should perform hypothesis tests to determine whether the visible differences are statistically significant. In this blog post, I show you how to determine whether the differences between coefficients and constants in different regression models are statistically significant.
Identifying the Most Important Independent Variables in Regression Models
You’ve settled on a regression model that contains independent variables that are statistically significant. By interpreting the statistical results, you can understand how changes in the independent variables are related to shifts in the dependent variable. At this point, it’s natural to wonder, “Which independent variable is the most important?”
Confidence Intervals vs Prediction Intervals vs Tolerance Intervals
Intervals are estimation methods in statistics that use sample data to produce ranges of values that are likely to contain the population value of interest. In contrast, point estimates are single-value estimates of a population value. Of the different types of statistical intervals, confidence intervals are the best known. However, certain kinds of analyses and situations call for other types of ranges that provide different information.
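As a sketch of how two of these intervals differ in practice, the stdlib Python code below computes, for a simple linear regression, a confidence interval for the mean response and a prediction interval for a single new observation at the same x value. The formulas are the standard ones for one-predictor OLS; the data and function name are hypothetical, and the t critical value is passed in by hand because Python’s standard library has no t distribution. Tolerance intervals require a different calculation and are not covered here.

```python
import math

def mean_and_prediction_intervals(x, y, x0, t_crit):
    """Return (fit, confidence interval for the mean response, prediction
    interval for one new observation) at x0 for y = b0 + b1 * x.
    t_crit is the two-sided t critical value with n - 2 degrees of freedom."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    fit = b0 + b1 * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(leverage)      # uncertainty in the mean only
    pi_half = t_crit * s * math.sqrt(1 + leverage)  # adds the scatter of one new point
    return fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)

# Hypothetical data with n = 10 observations.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 4.1, 6.2, 7.7, 10.4, 11.8, 14.5, 15.9, 18.2, 20.1]
# 2.306 is the 0.975 quantile of the t distribution with n - 2 = 8 degrees of freedom.
fit, ci, pi = mean_and_prediction_intervals(x, y, x0=5.5, t_crit=2.306)
print(f"fit = {fit:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), 95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

The only difference between the two half-widths is the extra 1 under the square root, which represents the scatter of an individual observation around the mean. That is why a prediction interval is always wider than the confidence interval at the same x value.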
As a Statistician, Can I Say Age is Just a Number?
My last birthday wasn’t one of those difficult ages that end with a zero. Thank goodness! However, the passage of another year got me thinking. At that point, I told myself that age is just a number. Can you do a mental double-take? I think I did one. Can a statistician say that age is just a number? After all, it’s through numbers that statisticians understand the world and how it works.
Using Data Mining to Select Regression Models Can Create Serious Problems
Data mining and regression seem to go together naturally. I’ve described regression as a seductive analysis because it is so tempting and so easy to add more variables in the pursuit of a larger R-squared. In this post, I’ll begin by illustrating the problems that data mining creates. To do this, I’ll show how data mining with regression analysis can take randomly generated data and produce a misleading model that appears to have significant variables and a good R-squared. Then, I’ll explain how data mining creates these deceptive results and how to avoid them.
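The full post’s own demonstration is not reproduced here. As a hypothetical illustration of the same idea, this stdlib Python sketch generates a response and 50 candidate predictors that are all pure noise, then screens each candidate one at a time with a simple regression, the kind of search that data mining automates. The screening rule, sample sizes, and seed are assumptions chosen for illustration; with a 0.05 significance level and 50 independent null candidates, you expect about 2.5 “significant” predictors on average even though none is real.

```python
import math
import random

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(42)
n, n_candidates = 30, 50
y = [rng.gauss(0, 1) for _ in range(n)]  # the response is pure noise
candidates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_candidates)]

# R-squared of each one-predictor regression equals the squared correlation.
r2 = [correlation(xj, y) ** 2 for xj in candidates]
# |t| for the slope in a one-predictor regression: |r| * sqrt((n - 2) / (1 - r^2)).
t_stats = [math.sqrt(r2j * (n - 2) / (1 - r2j)) for r2j in r2]
T_CRIT = 2.048  # two-sided 0.05 critical value of t with n - 2 = 28 degrees of freedom
false_hits = sum(t > T_CRIT for t in t_stats)
best = max(r2)
print(f"best R-squared among {n_candidates} random predictors: {best:.2f}")
print(f"random predictors 'significant' at 0.05: {false_hits}")
```

Picking the best of many random candidates guarantees that the winner looks better than a typical candidate, so its R-squared and P value overstate what you would see in new data.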
Five Reasons Why Your R-squared can be Too High
When your regression model has a high R-squared, you probably assume it’s a good thing because it measures goodness-of-fit. You want a high R-squared, right? However, as I’ll show in this post, a high R-squared can occasionally indicate that there is a problem with your model. I’ll explain five reasons why your R-squared can be too high and how to determine whether one of them affects your regression model.
Five P Value Tips to Avoid Being Fooled by False Positives and other Misleading Hypothesis Test Results
Despite the popular notion to the contrary, understanding the results of your statistical hypothesis test is not as simple as determining only whether your P value is less than your significance level. In this post, I present additional considerations that help you assess and minimize the possibility of being fooled by false positives and other misleading results.
Overfitting Regression Models: Problems, Detection, and Avoidance
Overfitting a model is a condition where a statistical model begins to describe the random error in the data rather than the relationships between variables. This problem occurs when the model is too complex. In regression analysis, overfitting can produce misleading R-squared values, regression coefficients, and p-values. In this post, I explain why overfitting is a problem and how you can identify and avoid it.
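A quick way to see overfitting is to make a model progressively more complex and compare how it fits the data used to build it with how it fits fresh data. The stdlib Python sketch below assumes a true straight-line relationship, fits polynomials of degree 1 through 5 to 12 simulated points via the normal equations, and reports R-squared on the training data and on 200 new points. Polynomial degree stands in for model complexity here; the data, seed, and helper names are hypothetical, and this is not necessarily the detection method described in the full post.

```python
import random

def solve(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        coef[r] = (aug[r][m] - sum(aug[r][c] * coef[c] for c in range(r + 1, m))) / aug[r][r]
    return coef

def polyfit(x, y, degree):
    """Least-squares polynomial coefficients (constant term first) via the normal equations."""
    cols = degree + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(cols)] for i in range(cols)]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(cols)]
    return solve(xtx, xty)

def predict(coef, xi):
    return sum(c * xi ** i for i, c in enumerate(coef))

def r_squared(x, y, coef):
    y_bar = sum(y) / len(y)
    sse = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

def truth(xi):
    return 1 + 2 * xi  # assumed true relationship: a straight line

rng = random.Random(7)
x_train = [i / 11 for i in range(12)]
y_train = [truth(xi) + rng.gauss(0, 0.3) for xi in x_train]
x_new = [rng.random() for _ in range(200)]
y_new = [truth(xi) + rng.gauss(0, 0.3) for xi in x_new]

train_r2, new_r2 = [], []
for degree in range(1, 6):
    coef = polyfit(x_train, y_train, degree)
    train_r2.append(r_squared(x_train, y_train, coef))
    new_r2.append(r_squared(x_new, y_new, coef))
    print(f"degree {degree}: training R2 = {train_r2[-1]:.3f}, new-data R2 = {new_r2[-1]:.3f}")
```

Because each higher-degree model contains the lower-degree ones, training R-squared can never decrease as terms are added, so a rising training R-squared by itself says nothing about whether the extra terms are real. The new-data column is the honest check.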
World Travel, Rough Roads, and Manually Adjusting Graph Scales!
As my family and I were being rattled around in a four-wheel-drive vehicle in the remote Osa Peninsula in Costa Rica, it struck me that traveling to exotic locations is just like manually adjusting the scales on graphs! That’s probably not what you were expecting, but let me explain! Unlike most of my statistical blog posts, this one gets a bit philosophical!
Guide to Stepwise Regression and Best Subsets Regression
Automatic variable selection procedures are algorithms that pick the variables to include in your regression model. Stepwise regression and Best Subsets regression are two of the more common variable selection methods. In this post, I compare how these methods work and which one provides better results.