Automatic variable selection procedures are algorithms that pick the variables to include in your regression model. Stepwise regression and Best Subsets regression are two of the more common variable selection methods. In this post, I compare how these methods work and which one provides better results. [Read more…] about Guide to Stepwise Regression and Best Subsets Regression
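To make the Best Subsets idea concrete, here is a minimal pure-Python sketch. It fits every combination of candidate predictors by ordinary least squares and keeps the combination with the highest adjusted R-squared. Several things here are assumptions for illustration: adjusted R-squared as the only ranking criterion, the helper names (`solve`, `r_squared_for`, `best_subsets`), and the small made-up dataset. Statistical software usually reports several criteria side by side and leaves the final choice to you. Stepwise regression, which adds or removes one variable at a time, is not shown.

```python
from itertools import combinations

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared_for(columns, y):
    """OLS fit of y on the given predictor columns plus a constant; returns R-squared."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    ybar = sum(y) / n
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def best_subsets(predictors, y):
    """Fit every non-empty subset of predictors; return (adjusted R-squared, subset) of the best."""
    n = len(y)
    best = None
    for size in range(1, len(predictors) + 1):
        for subset in combinations(predictors, size):
            r2 = r_squared_for([predictors[name] for name in subset], y)
            adj = 1 - (1 - r2) * (n - 1) / (n - size - 1)
            if best is None or adj > best[0]:
                best = (adj, subset)
    return best

# Made-up data: y is generated from x1 and x2 only; x3 plays no part in generating y
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
x3 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.1, -0.1, 0.05, -0.15, 0.1]
y = [5 + 2 * a - 3 * b + e for a, b, e in zip(x1, x2, noise)]

best = best_subsets({"x1": x1, "x2": x2, "x3": x3}, y)
print(best)
```

The number of subsets doubles with each added candidate, so an exhaustive search like this is only practical for a modest number of predictors.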
Discrete probability distributions are based on discrete variables, which have a finite or countable number of values. In this post, I show you how to perform goodness-of-fit tests to determine how well your data fit various discrete probability distributions. [Read more…] about Goodness-of-Fit Tests for Discrete Distributions
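The chi-square goodness-of-fit test is one common way to check a discrete distribution. Here is a minimal sketch of the calculation for a Poisson model, assuming lambda is estimated from the sample mean and the counts are grouped into bins 0, 1, 2, 3, and "4 or more". The function names and the tiny hypothetical dataset exist only to show the arithmetic. With real data, bins are usually merged so that every expected count is reasonably large; a common rule of thumb is at least 5. You would then compare the statistic with a chi-square distribution that has `df` degrees of freedom to get the p-value.

```python
import math

def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_chi_square(data, max_bin):
    """Chi-square goodness-of-fit statistic for a Poisson model.

    Bins are 0, 1, ..., max_bin - 1, plus a final 'max_bin or more' bin.
    """
    n = len(data)
    lam = sum(data) / n  # estimate lambda from the sample mean
    observed = [sum(1 for x in data if x == k) for k in range(max_bin)]
    observed.append(sum(1 for x in data if x >= max_bin))
    probs = [poisson_pmf(k, lam) for k in range(max_bin)]
    probs.append(1 - sum(probs))  # upper-tail probability for the last bin
    expected = [n * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1  # lose 1 df for the fixed total and 1 for estimating lambda
    return chi2, df, expected

# Hypothetical counts (e.g., defects per unit), for illustration only
data = [0, 1, 1, 2, 0, 3, 1, 2, 2, 1, 0, 4, 1, 2, 3, 1, 0, 2, 1, 5]
chi2, df, expected = poisson_chi_square(data, max_bin=4)
print(round(chi2, 3), df)
```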
Does your regression model have a low R-squared? That seems like a problem—but it might not be. Learn what a low R-squared does and does not mean for your model. [Read more…] about How to Interpret Regression Models that have Significant Variables but a Low R-squared
How high does R-squared need to be in regression analysis? That seems to be an eternal question. [Read more…] about How High Does R-squared Need to Be?
In my house, we love the Mythbusters TV show on the Discovery Channel. The Mythbusters conduct scientific investigations in their quest to test myths and urban legends. In the process, the show provides some fun examples of when and how you should use statistical hypothesis tests to analyze data. [Read more…] about Examples of Hypothesis Tests: Busting Myths about the Battle of the Sexes
If you were able to make predictions about something important to you, you’d probably love that, right? It’s even better if you know that your predictions are sound. In this post, I show how to use regression analysis to make predictions and determine whether they are both unbiased and precise. [Read more…] about Making Predictions with Regression Analysis
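Here is a minimal sketch of the prediction step for a simple linear regression, using hypothetical data and a hypothetical helper `fit_line`. For ordinary least squares with a constant, the residuals always average zero. That means checking for bias really comes down to looking for patterns in residual plots. The standard error of the regression, `s`, gives a rough sense of precision in the units of y. A proper prediction interval also accounts for uncertainty in the coefficients, so treat `s` as a rough guide only.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Hypothetical data for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
# Standard error of the regression: typical distance between observed and fitted values
s = math.sqrt(sum(r ** 2 for r in residuals) / (len(x) - 2))
prediction = b0 + b1 * 7.5  # predict inside the observed range of x
print(f"prediction at x=7.5: {prediction:.2f}, typical error about {s:.2f}")
```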
You’re probably familiar with data that follow the normal distribution. The normal distribution is that nice, familiar bell-shaped curve. Unfortunately, not all data are normally distributed or as intuitive to understand. You can picture the symmetric normal distribution, but what about the Weibull or Gamma distributions? This uncertainty might leave you feeling unsettled. In this post, I show you how to identify the probability distribution of your data. [Read more…] about How to Identify the Distribution of Your Data
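One simple numeric companion to a probability plot is the correlation between the sorted data and a candidate distribution's theoretical quantiles. The closer it is to 1, the straighter the plot. Here is a minimal sketch under a few assumptions: plotting positions `(i - 0.5) / n`, only two location-scale candidates (normal and exponential), and hypothetical helper names. This is a simplified stand-in, not the Anderson-Darling test that statistical packages typically report. Distributions such as the Weibull and Gamma also have shape parameters, which you would have to estimate first.

```python
import math
from statistics import NormalDist

def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def plot_correlation(data, quantile):
    """Correlation between sorted data and theoretical quantiles at positions (i - 0.5) / n."""
    n = len(data)
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return correlation(sorted(data), [quantile(p) for p in probs])

def exponential_quantile(p):
    """Quantile function of the standard exponential distribution."""
    return -math.log(1 - p)

normal_quantile = NormalDist().inv_cdf  # standard normal quantile function

# Hypothetical right-skewed sample: exact exponential quantiles, scale 2
n = 50
data = [2.0 * exponential_quantile((i - 0.5) / n) for i in range(1, n + 1)]

r_normal = plot_correlation(data, normal_quantile)
r_exponential = plot_correlation(data, exponential_quantile)
print(f"normal: {r_normal:.4f}, exponential: {r_exponential:.4f}")
```

Correlation ignores location and scale, so standard quantiles are enough to compare location-scale families like these two.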
In regression analysis, curve fitting is the process of specifying the model that provides the best fit to the specific curves in your dataset. Curved relationships between variables are not as straightforward to fit and interpret as linear relationships. [Read more…] about Curve Fitting using Linear and Nonlinear Regression
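One way to fit a curve with linear regression is to transform a variable so the relationship becomes a straight line. Here is a minimal sketch that fits `y = a * exp(b * x)` by regressing `ln(y)` on `x`, assuming all y values are positive. The helper name and data are hypothetical. This approach minimizes errors on the log scale and effectively assumes multiplicative errors. Nonlinear regression instead fits the curve directly on the original scale, and polynomial terms are another common option.

```python
import math

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by ordinary least squares on (x, ln y); requires y > 0."""
    n = len(x)
    ly = [math.log(v) for v in y]
    xbar, lbar = sum(x) / n, sum(ly) / n
    b = (sum((xi - xbar) * (li - lbar) for xi, li in zip(x, ly))
         / sum((xi - xbar) ** 2 for xi in x))
    a = math.exp(lbar - b * xbar)  # back-transform the intercept of the log-scale line
    return a, b

x = [0, 1, 2, 3, 4, 5]
y = [3 * math.exp(0.5 * xi) for xi in x]  # hypothetical data lying exactly on a curve
a, b = fit_exponential(x, y)
print(round(a, 3), round(b, 3))  # → 3.0 0.5
```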
What do P values mean? P values tell you whether your hypothesis test results are statistically significant. Statisticians use them all over the place. You’ll find P values in t-tests, distribution tests, ANOVA, and regression analysis. P values have become so important that they’ve taken on a life of their own. They can determine which studies are published, which projects receive funding, and which university faculty members become tenured!
Ironically, despite being so influential, P values are frequently misinterpreted. What is the correct interpretation of P values? What do P values really mean? That’s the topic of this post! [Read more…] about How to Interpret P values Correctly
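As a concrete anchor for the definition, here is a minimal sketch of a two-sided p-value for a one-sample z-test. It assumes the population standard deviation is known, and the function name and numbers are hypothetical. The p-value is the probability of getting a test statistic at least as extreme as the observed one, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true.

```python
import math
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: mean == mu0, assuming a known population sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability, if H0 were true, of a z at least this far from 0 in either direction
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = z_test_p_value(sample_mean=103.92, mu0=100, sigma=20, n=100)  # z = 1.96
print(round(p, 4))
```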
R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0 – 100% scale. [Read more…] about How To Interpret R-squared in Regression Analysis
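The usual formula is R-squared = 1 − SS_residual / SS_total. Here is a minimal sketch that computes it from observed values and a model's fitted values. The fitted values are hypothetical and simply stand in for the output of some linear model.

```python
def r_squared(observed, fitted):
    """R-squared = 1 - SS_residual / SS_total, as a fraction (multiply by 100 for %)."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

observed = [3.0, 5.0, 7.0, 9.0]
fitted = [3.5, 4.5, 7.5, 8.5]  # hypothetical fitted values
print(f"{100 * r_squared(observed, fitted):.1f}%")  # → 95.0%
```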
Hypothesis testing is a vital process in inferential statistics where the goal is to use sample data to draw conclusions about an entire population. In the testing process, you use significance levels and p-values to determine whether the test results are statistically significant.
You hear about results being statistically significant all of the time. But, what do significance levels, P values, and statistical significance actually represent? Why do we even need to use hypothesis tests in statistics? [Read more…] about How Hypothesis Tests Work: Significance Levels (Alpha) and P values
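A quick simulation shows what the significance level means in practice. When the null hypothesis is true, a test at alpha = 0.05 still rejects it about 5% of the time (the Type I error rate). This minimal sketch assumes a z-test with a known standard deviation, normally distributed data, and arbitrary settings (`n`, `trials`, `seed`, and the population values), all chosen for illustration.

```python
import math
import random
from statistics import NormalDist

def simulate_type_i_error(alpha=0.05, n=20, trials=20000, seed=1):
    """Fraction of z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    mu0, sigma = 50.0, 10.0  # samples are drawn with mean mu0, so H0 is true
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:  # equivalent to p-value < alpha
            rejections += 1
    return rejections / trials

rate = simulate_type_i_error()
print(rate)  # close to 0.05
```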
P-values and coefficients in regression analysis work together to tell you which relationships in your model are statistically significant and the nature of those relationships. The coefficients describe the mathematical relationship between each independent variable and the dependent variable. The p-values for the coefficients indicate whether these relationships are statistically significant. [Read more…] about How to Interpret P-values and Coefficients in Regression Analysis
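For a simple linear regression, the p-value for the slope comes from its t-value: the coefficient divided by its standard error, compared with a t-distribution that has n − 2 degrees of freedom. Here is a minimal sketch with a hypothetical helper and data. The Python standard library has no t-distribution, so the sketch stops at the t-value. If SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` would give the two-sided p-value.

```python
import math

def slope_t_value(x, y):
    """Return (slope, standard error, t-value, df) for a simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se = math.sqrt(ss_res / df) / math.sqrt(sxx)  # standard error of the slope
    return slope, se, slope / se, df

# Hypothetical data for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
slope, se, t, df = slope_t_value(x, y)
print(f"slope={slope:.3f}, SE={se:.4f}, t={t:.1f}, df={df}")
```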
Nonparametric tests don’t require that your data follow the normal distribution. They’re also known as distribution-free tests and can provide benefits in certain situations. Typically, people who perform statistical hypothesis tests are more comfortable with parametric tests than nonparametric tests.
You’ve probably heard it’s best to use nonparametric tests if your data are not normally distributed—or something along these lines. That seems like an easy way to choose, but there’s more to the decision than that. [Read more…] about Nonparametric Tests vs. Parametric Tests
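The sign test is one of the simplest nonparametric tests and shows why no normality assumption is needed. It tests whether the population median equals a hypothesized value by counting how many observations fall above and below that value. Under the null hypothesis, each observation is equally likely to land on either side, so the counts follow a Binomial(n, 0.5) distribution. Here is a minimal sketch with a hypothetical function name and data. Observations equal to the hypothesized median are dropped, which is the usual convention.

```python
from math import comb

def sign_test(data, median0):
    """Exact two-sided sign test of H0: population median == median0."""
    diffs = [x - median0 for x in data if x != median0]  # drop ties with the median
    n = len(diffs)
    k = min(sum(d > 0 for d in diffs), sum(d < 0 for d in diffs))
    # Under H0, the number of + signs follows Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

data = [12.1, 14.3, 15.0, 13.8, 16.2, 17.5, 14.9, 18.0, 15.6, 13.2]
print(sign_test(data, median0=12))  # → 0.001953125
```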
A confidence interval is calculated from a sample and provides a range of values that likely contains the unknown value of a population parameter. In this post, I demonstrate how confidence intervals and confidence levels work using graphs and concepts instead of formulas. In the process, you’ll see how confidence intervals are very similar to P values and significance levels. [Read more…] about How Hypothesis Tests Work: Confidence Intervals and Confidence Levels
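For readers who want to see the arithmetic anyway, here is a minimal sketch of a confidence interval for a mean: the sample mean plus or minus a critical value times the standard error. It uses a large-sample normal approximation, which is an assumption. For small samples, the standard interval uses a critical value from the t-distribution, which the Python standard library does not provide. The data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    m = mean(data)
    margin = z * stdev(data) / math.sqrt(len(data))
    return m - margin, m + margin

data = [50 + (i % 7) - 3 for i in range(40)]  # hypothetical measurements
low, high = mean_confidence_interval(data)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

A higher confidence level uses a larger critical value and therefore produces a wider interval.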
Nonlinear regression is an extremely flexible analysis that can fit almost any curve present in your data. R-squared seems like a very intuitive way to assess the goodness-of-fit for a regression model. Unfortunately, the two just don’t go together: R-squared is invalid for nonlinear regression. [Read more…] about R-squared Is Not Valid for Nonlinear Regression
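Here is the algebra behind that claim, shown as a general sketch rather than the post's own derivation. For ordinary least squares with a constant term, the total sum of squares splits exactly into a regression part and a residual part. That identity is what makes the two common R-squared formulas agree and lets R-squared be read as the proportion of variance explained. For a nonlinear model, the identity generally does not hold, so the two formulas can disagree and neither has that interpretation.

```latex
% Linear least squares with a constant: the total sum of squares splits exactly
SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2
  = \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{SS_{\text{reg}}}
  + \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{SS_{\text{res}}}
\quad\Longrightarrow\quad
R^2 = \frac{SS_{\text{reg}}}{SS_{\text{tot}}} = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
% Nonlinear regression: in general SS_tot != SS_reg + SS_res, so the two ratios differ
```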
R-squared tends to reward you for including too many independent variables in a regression model, and it doesn’t provide any incentive to stop adding more. Adjusted R-squared and predicted R-squared use different approaches to help you fight that impulse to add too many. The protection that adjusted R-squared and predicted R-squared provide is critical because too many terms in a model can produce results that you can’t trust. These statistics help you include the correct number of independent variables in your regression model. [Read more…] about How to Interpret Adjusted R-Squared and Predicted R-Squared in Regression Analysis
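Here is a minimal sketch of both statistics for a simple linear regression with one predictor, using hypothetical data and a hypothetical helper name. Adjusted R-squared applies a penalty based on the number of predictors, `p`. Predicted R-squared uses the PRESS statistic: the sum of squared leave-one-out prediction errors. For least squares, each of those errors equals `e_i / (1 - h_i)`, where `h_i` is the observation's leverage, so no refitting is needed. Both statistics are always less than or equal to R-squared.

```python
def r2_statistics(x, y):
    """R-squared, adjusted R-squared, and predicted R-squared (via PRESS) for one predictor."""
    n, p = len(x), 1  # p = number of predictors, excluding the constant
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in resid)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    # Leverage for simple regression; e / (1 - h) is the leave-one-out prediction error
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, leverage))
    pred_r2 = 1 - press / ss_tot
    return r2, adj_r2, pred_r2

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 5, 4, 8, 9, 8, 12, 13]  # hypothetical data
r2, adj_r2, pred_r2 = r2_statistics(x, y)
print(f"R2={r2:.3f}, adjusted={adj_r2:.3f}, predicted={pred_r2:.3f}")
```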
T-tests are statistical hypothesis tests that you use to analyze one or two sample means. Depending on the t-test that you use, you can compare a sample mean to a hypothesized value, the means of two independent samples, or the difference between paired samples. In this post, I show you how t-tests use t-values and t-distributions to calculate probabilities and test hypotheses.
As usual, I’ll provide clear explanations of t-values and t-distributions using concepts and graphs rather than formulas! If you need a primer on the basics, read my hypothesis testing overview. [Read more…] about How t-Tests Work: t-Values, t-Distributions, and Probabilities
T-tests are statistical hypothesis tests that analyze one or two sample means. When you analyze your data with any t-test, the procedure reduces your entire sample to a single value, the t-value. In this post, I describe how each type of t-test calculates the t-value. I don’t explain this just so you can understand the calculation; I describe it in a way that really helps you grasp how t-tests work. [Read more…] about How t-Tests Work: 1-sample, 2-sample, and Paired t-Tests
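Here is a minimal sketch of the three t-value calculations, with hypothetical function names and toy data. Each one is a difference in means divided by that difference's standard error. The two-sample version uses the pooled (equal-variance) formula, which is an assumption. Welch's t-test is a common alternative when the group variances differ. The paired t-test is simply a one-sample t-test on the within-pair differences.

```python
import math
from statistics import mean, stdev, variance

def one_sample_t(data, mu0):
    """t = (sample mean - hypothesized mean) / standard error of the mean."""
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(len(data)))

def two_sample_t(a, b):
    """Pooled (equal-variance) two-sample t: difference in means / its standard error."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def paired_t(before, after):
    """Paired t: a one-sample t on the within-pair differences, tested against 0."""
    return one_sample_t([y - x for x, y in zip(before, after)], 0)

print(one_sample_t([5, 7, 9], 5))
print(two_sample_t([1, 2, 3], [4, 5, 6]))
print(paired_t([1, 2, 3], [6, 9, 12]))
```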
The constant term in regression analysis is the value at which the regression line crosses the y-axis. The constant is also known as the y-intercept. That sounds simple enough, right? Mathematically, the regression constant really is that simple. However, the difficulties begin when you try to interpret the meaning of the y-intercept in your regression output. [Read more…] about How to Interpret the Constant (Y Intercept) in Regression Analysis
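One reason the constant is hard to interpret is that it is the fitted value at x = 0, which may lie far outside the data you actually observed. Here is a minimal sketch with hypothetical height and weight data and a hypothetical helper `fit_line`. The constant comes out as a negative "weight" at a height of zero, which has no physical meaning.

```python
def fit_line(x, y):
    """OLS constant and slope; the constant is the fitted y at x = 0."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    return ybar - slope * xbar, slope

# Hypothetical heights (cm) and weights (kg); no observed height is anywhere near 0
heights = [150, 160, 170, 180, 190]
weights = [50, 57, 66, 74, 83]
constant, slope = fit_line(heights, weights)
print(round(constant, 1), round(slope, 2))  # → -75.1 0.83
```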
Analysis of variance (ANOVA) uses F-tests to statistically assess the equality of means when you have three or more groups. In this post, I’ll answer several common questions about the F-test.
- How do F-tests work?
- Why do we analyze variances to test means?
I’ll use concepts and graphs to answer these questions about F-tests in the context of a one-way ANOVA example. I’ll use the same approach that I use to explain how t-tests work. If you need a primer on the basics, read my hypothesis testing overview. [Read more…] about How F-tests work in Analysis of Variance (ANOVA)
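This minimal sketch computes the F-statistic for a one-way ANOVA, using a hypothetical function name and toy data. It shows why analyzing variances tests means. The numerator estimates variance from how far the group means spread around the grand mean. The denominator estimates variance from the scatter within each group. When the group means differ, the numerator grows relative to the denominator and F increases.

```python
def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square, with its (df1, df2)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)  # df between = k - 1
    ms_within = ss_within / (n - k)    # df within = n - k
    return ms_between / ms_within, (k - 1, n - k)

f, dfs = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f, dfs)  # → 27.0 (2, 6)
```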