In my house, we love the Mythbusters TV show on the Discovery Channel. The Mythbusters conduct scientific investigations in their quest to test myths and urban legends. In the process, the show provides some fun examples of when and how you should use statistical hypothesis tests to analyze data. [Read more…] about Examples of Hypothesis Tests: Busting Myths about the Battle of the Sexes

# Blog

## Making Predictions with Regression Analysis

If you were able to make predictions about something important to you, you’d probably love that, right? It’s even better if you know that your predictions are sound. In this post, I show how to use regression analysis to make predictions and determine whether they are both unbiased and precise. [Read more…] about Making Predictions with Regression Analysis

## How to Identify the Distribution of Your Data

You’re probably familiar with data that follow the normal distribution. The normal distribution is that nice, familiar bell-shaped curve. Unfortunately, not all data are normally distributed or as intuitive to understand. You can picture the symmetric normal distribution, but what about the Weibull or Gamma distributions? This uncertainty might leave you feeling unsettled. In this post, I show you how to identify the distribution of your data. [Read more…] about How to Identify the Distribution of Your Data

## Curve Fitting using Linear and Nonlinear Regression

In regression analysis, curve fitting is the process of specifying the model that provides the best fit to the specific curves in your dataset. Curved relationships between variables are not as straightforward to fit and interpret as linear relationships. [Read more…] about Curve Fitting using Linear and Nonlinear Regression

## How to Interpret P values Correctly

What do P values mean? P values tell you whether your hypothesis test results are statistically significant. Statistics use them all over the place. You’ll find P values in t-tests, distribution tests, ANOVA, and regression analysis. P values have become so important that they’ve taken on a life of their own. They can determine which studies are published, which projects receive funding, and which university faculty members become tenured!

Ironically, despite being so influential, P values are misinterpreted very frequently. What *is* the correct interpretation of P values? What do P values *really* mean? That’s the topic of this post! [Read more…] about How to Interpret P values Correctly

## How To Interpret R-squared in Regression Analysis

R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0 – 100% scale. [Read more…] about How To Interpret R-squared in Regression Analysis

## How Hypothesis Tests Work: Significance Levels (Alpha) and P values

Hypothesis testing is a vital process in inferential statistics where the goal is to use sample data to draw conclusions about an entire population. In the testing process, you use significance levels and p-values to determine whether the test results are statistically significant.

You hear about results being statistically significant all of the time. But, what do significance levels, P values, and statistical significance actually represent? Why do we even need to use hypothesis tests in statistics? [Read more…] about How Hypothesis Tests Work: Significance Levels (Alpha) and P values

## How to Interpret P-values and Coefficients in Regression Analysis

P-values and coefficients in regression analysis work together to tell you which relationships in your model are statistically significant and the nature of those relationships. The coefficients describe the mathematical relationship between each independent variable and the dependent variable. The p-values for the coefficients indicate whether these relationships are statistically significant. [Read more…] about How to Interpret P-values and Coefficients in Regression Analysis

## Nonparametric Tests vs. Parametric Tests

Nonparametric tests don’t require that your data follow the normal distribution. They’re also known as distribution-free tests and can provide benefits in certain situations. Typically, people who perform statistical hypothesis tests are more comfortable with parametric tests than nonparametric tests.

You’ve probably heard it’s best to use nonparametric tests if your data are not normally distributed—or something along these lines. That seems like an easy way to choose, but there’s more to the decision than that. [Read more…] about Nonparametric Tests vs. Parametric Tests

## How Hypothesis Tests Work: Confidence Intervals and Confidence Levels

A confidence interval is calculated from a sample and provides a range of values that likely contains the unknown value of a population parameter. In this post, I demonstrate how confidence intervals and confidence levels work using graphs and concepts instead of formulas. In the process, you’ll see how confidence intervals are very similar to P values and significance levels. [Read more…] about How Hypothesis Tests Work: Confidence Intervals and Confidence Levels

## R-squared Is Not Valid for Nonlinear Regression

Nonlinear regression is an extremely flexible analysis that can fit most any curve that is present in your data. R-squared seems like a very intuitive way to assess the goodness-of-fit for a regression model. Unfortunately, the two just don’t go together. R-squared is invalid for nonlinear regression. [Read more…] about R-squared Is Not Valid for Nonlinear Regression

## How to Interpret Adjusted R-Squared and Predicted R-Squared in Regression Analysis

R-squared tends to reward you for including too many independent variables in a regression model, and it doesn’t provide any incentive to stop adding more. Adjusted R-squared and predicted R-squared use different approaches to help you fight that impulse to add too many. The protection that adjusted R-squared and predicted R-squared provide is critical because too many terms in a model can produce results that you can’t trust. These statistics help you include the correct number of independent variables in your regression model. [Read more…] about How to Interpret Adjusted R-Squared and Predicted R-Squared in Regression Analysis

## How t-Tests Work: t-Values, t-Distributions, and Probabilities

T-tests are statistical hypothesis tests that you use to analyze one or two sample means. Depending on the t-test that you use, you can compare a sample mean to a hypothesized value, the means of two independent samples, or the difference between paired samples. In this post, I show you how t-tests use t-values and t-distributions to calculate probabilities and test hypotheses.

As usual, I’ll provide clear explanations of t-values and t-distributions using concepts and graphs rather than formulas! If you need a primer on the basics, read my hypothesis testing overview. [Read more…] about How t-Tests Work: t-Values, t-Distributions, and Probabilities

## How t-Tests Work: 1-sample, 2-sample, and Paired t-Tests

T-tests are statistical hypothesis tests that analyze one or two sample means. When you analyze your data with any t-test, the procedure reduces your entire sample to a single value, the t-value. In this post, I describe how each type of t-test calculates the t-value. I don’t explain this just so you can understand the calculation, but I describe it in a way that really helps you grasp how t-tests work. [Read more…] about How t-Tests Work: 1-sample, 2-sample, and Paired t-Tests

## How to Interpret the Constant (Y Intercept) in Regression Analysis

The constant term in regression analysis is the value at which the regression line crosses the y-axis. The constant is also known as the y-intercept. That sounds simple enough, right? Mathematically, the regression constant really is that simple. However, the difficulties begin when you try to interpret the *meaning* of the y-intercept in your regression output. [Read more…] about How to Interpret the Constant (Y Intercept) in Regression Analysis

## How F-tests work in Analysis of Variance (ANOVA)

Analysis of variance (ANOVA) uses F-tests to statistically assess the equality of means when you have three or more groups. In this post, I’ll answer several common questions about the F-test.

- How do F-tests work?
- Why do we analyze
*variances*to test*means*?

I’ll use concepts and graphs to answer these questions about F-tests in the context of a one-way ANOVA example. I’ll use the same approach that I use to explain how t-tests work. If you need a primer on the basics, read my hypothesis testing overview. [Read more…] about How F-tests work in Analysis of Variance (ANOVA)

## Check Your Residual Plots to Ensure Trustworthy Regression Results!

Residual plots display the residual values on the y-axis and fitted values, or another variable, on the x-axis. After you fit a regression model, it is crucial to check the residual plots. If your plots display unwanted patterns, you can’t trust the regression coefficients and other numeric results. In this post, I explain the conceptual reasons why residual plots help ensure that your regression model is valid. I’ll also show you what to look for and how to fix the problems. [Read more…] about Check Your Residual Plots to Ensure Trustworthy Regression Results!

## How to Interpret the F-test of Overall Significance in Regression Analysis

The F-test of overall significance indicates whether your linear regression model provides a better fit to the data than a model that contains no independent variables. In this post, I look at how the F-test of overall significance fits in with other regression statistics, such as R-squared. R-squared tells you how well your model fits the data, and the F-test is related to it. [Read more…] about How to Interpret the F-test of Overall Significance in Regression Analysis

## Multicollinearity in Regression Analysis: Problems, Detection, and Solutions

Multicollinearity occurs when independent variables in a regression model are correlated. This correlation is a problem because independent variables should be *independent*. If the degree of correlation between variables is high enough, it can cause problems when you fit the model and interpret the results. [Read more…] about Multicollinearity in Regression Analysis: Problems, Detection, and Solutions

## Benefits of Welch’s ANOVA Compared to the Classic One-Way ANOVA

Welch’s ANOVA is an alternative to the traditional analysis of variance (ANOVA) and it offers some serious benefits. One-way ANOVA determines whether differences between the means of at least three groups are statistically significant. For decades, introductory statistics classes have taught the classic Fishers one-way ANOVA that uses the F-test. It’s a standard statistical analysis, and you might think it’s pretty much set in stone by now. Surprise, there’s a significant change occurring in the world of one-way ANOVA! [Read more…] about Benefits of Welch’s ANOVA Compared to the Classic One-Way ANOVA