In a previous blog post, I introduced the basic concepts of hypothesis testing and explained the need for performing these tests. In this post, I’ll build on that and compare various types of hypothesis tests that you can use with different types of data, explore some of the options, and explain how to interpret the results. Along the way, I’ll point out important planning considerations, related analyses, and pitfalls to avoid. [Read more…] about Comparing Hypothesis Tests for Continuous, Binary, and Count Data

# Hypothesis Testing

## Statistical Hypothesis Testing Overview

In this blog post, I explain why you need to use statistical hypothesis testing and help you navigate the essential terminology. Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables.

This post provides an overview of statistical hypothesis testing. If you need to perform hypothesis tests, consider getting my book, Hypothesis Testing: An Intuitive Guide.

[Read more…] about Statistical Hypothesis Testing Overview

## Flu Shots, How Effective Are They?

With the arrival of Fall in the Northern hemisphere, it’s flu season again.

Do you debate getting a flu shot every year? I do get flu shots every year. I realize that they’re not perfect, but I figure they’re a low-cost way to reduce my chances of a crummy week suffering from the flu.

The media report that flu shots have an effectiveness of approximately 68%. But what does that mean exactly? What is the absolute reduction in risk? Are there long-term benefits?

In this blog post, I explore the effectiveness of flu shots from a statistical viewpoint. We’ll statistically analyze the data ourselves to go beyond the simplified accounts that the media presents. I’ll also model the long-term outcomes you can expect with regular flu vaccinations. By the time you finish this post, you’ll have a crystal clear picture of flu shot effectiveness. Some of the results surprised me! [Read more…] about Flu Shots, How Effective Are They?

## Degrees of Freedom in Statistics

## What are Degrees of Freedom?

The degrees of freedom (DF) in statistics indicate the number of independent values that can vary in an analysis without breaking any constraints. It is an essential idea that appears in many contexts throughout statistics including hypothesis tests, probability distributions, and linear regression. Learn how this fundamental concept affects the power and precision of your analysis!

In this post, I bring this concept to life in an intuitive manner. You’ll learn the degrees of freedom definition and know how to find degrees of freedom for various analyses, such as linear regression, t-tests, and chi-square. I’ll start by defining degrees of freedom and providing the formula. However, I’ll quickly move on to practical examples in the context of various statistical analyses because they make this concept easier to understand.

[Read more…] about Degrees of Freedom in Statistics

## Use Control Charts with Hypothesis Tests

Typically, quality improvement analysts use control charts to assess business processes and don’t have hypothesis tests in mind. Do you know how control charts provide tremendous benefits in other settings and with hypothesis testing? Spoilers—control charts check an assumption that we often forget about for hypothesis tests! [Read more…] about Use Control Charts with Hypothesis Tests

## What is the Relationship Between the Reproducibility of Experimental Results and P Values?

The ability to reproduce experimental results should be related to P values. After all, both of these statistical concepts have similar foundations.

- P values help you separate the signal of population level effects from the noise in sample data.
- Reproducible results support the notion that the findings can be generalized to the population rather than applying only to a specific sample.

So, P values are related to reproducibility in theory. But, does this relationship exist in the real world? In this blog post, I present the findings of an exciting study that answers this question! [Read more…] about What is the Relationship Between the Reproducibility of Experimental Results and P Values?

## Why Are P Values Misinterpreted So Frequently?

P values are commonly misinterpreted. It’s a very slippery concept that requires a lot of background knowledge to understand. Not surprisingly, I’ve received many questions about P values in statistical hypothesis testing over the years. However, one question stands out. *Why* are P value misinterpretations so prevalent? I answer that question in this blog post, and help you avoid making the same mistakes. [Read more…] about Why Are P Values Misinterpreted So Frequently?

## Confidence Intervals vs Prediction Intervals vs Tolerance Intervals

Intervals are estimation methods in statistics that use sample data to produce ranges of values that are likely to contain the population value of interest. In contrast, point estimates are single value estimates of a population value. Of the different types of statistical intervals, confidence intervals are the most well-known. However, certain kinds of analyses and situations call for other types of ranges that provide different information. [Read more…] about Confidence Intervals vs Prediction Intervals vs Tolerance Intervals

## Five P Value Tips to Avoid Being Fooled by False Positives and other Misleading Hypothesis Test Results

Despite the popular notion to the contrary, understanding the results of your statistical hypothesis test is not as simple as determining only whether your P value is less than your significance level. In this post, I present additional considerations that help you assess and minimize the possibility of being fooled by false positives and other misleading results. [Read more…] about Five P Value Tips to Avoid Being Fooled by False Positives and other Misleading Hypothesis Test Results

## Goodness-of-Fit Tests for Discrete Distributions

Discrete probability distributions are based on discrete variables, which have a finite or countable number of values. In this post, I show you how to perform goodness-of-fit tests to determine how well your data fit various discrete probability distributions. [Read more…] about Goodness-of-Fit Tests for Discrete Distributions

## Examples of Hypothesis Tests: Busting Myths about the Battle of the Sexes

In my house, we love the Mythbusters TV show on the Discovery Channel. The Mythbusters conduct scientific investigations in their quest to test myths and urban legends. In the process, the show provides some fun examples of when and how you should use statistical hypothesis tests to analyze data. [Read more…] about Examples of Hypothesis Tests: Busting Myths about the Battle of the Sexes

## How to Identify the Distribution of Your Data

You’re probably familiar with data that follow the normal distribution. The normal distribution is that nice, familiar bell-shaped curve. Unfortunately, not all data are normally distributed or as intuitive to understand. You can picture the symmetric normal distribution, but what about the Weibull or Gamma distributions? This uncertainty might leave you feeling unsettled. In this post, I show you how to identify the probability distribution of your data. [Read more…] about How to Identify the Distribution of Your Data

## Interpreting P values

P values determine whether your hypothesis test results are statistically significant. Statistics use them all over the place. You’ll find P values in t-tests, distribution tests, ANOVA, and regression analysis. P values have become so important that they’ve taken on a life of their own. They can determine which studies are published, which projects receive funding, and which university faculty members become tenured!

Ironically, despite being so influential, P values are misinterpreted very frequently. What *is* the correct interpretation of P values? What do P values *really* mean? That’s the topic of this post! [Read more…] about Interpreting P values

## How Hypothesis Tests Work: Significance Levels (Alpha) and P values

Hypothesis testing is a vital process in inferential statistics where the goal is to use sample data to draw conclusions about an entire population. In the testing process, you use significance levels and p-values to determine whether the test results are statistically significant.

You hear about results being statistically significant all of the time. But, what do significance levels, P values, and statistical significance actually represent? Why do we even need to use hypothesis tests in statistics? [Read more…] about How Hypothesis Tests Work: Significance Levels (Alpha) and P values

## Nonparametric Tests vs. Parametric Tests

Nonparametric tests don’t require that your data follow the normal distribution. They’re also known as distribution-free tests and can provide benefits in certain situations. Typically, people who perform statistical hypothesis tests are more comfortable with parametric tests than nonparametric tests.

You’ve probably heard it’s best to use nonparametric tests if your data are not normally distributed—or something along these lines. That seems like an easy way to choose, but there’s more to the decision than that. [Read more…] about Nonparametric Tests vs. Parametric Tests

## Hypothesis Testing and Confidence Intervals

Confidence intervals and hypothesis testing are closely related because both methods use the same underlying methodology. Additionally, there is a close connection between significance levels and confidence levels. Indeed, there is such a strong link between them that hypothesis tests and the corresponding confidence intervals always agree about statistical significance.

A confidence interval is calculated from a sample and provides a range of values that likely contains the unknown value of a population parameter. To learn more about confidence intervals in general, how to interpret them, and how to calculate them, read my post about Understanding Confidence Intervals.

In this post, I demonstrate how confidence intervals work using graphs and concepts instead of formulas. In the process, I compare and contrast significance and confidence levels. You’ll learn how confidence intervals are similar to significance levels in hypothesis testing. You can even use confidence intervals to determine statistical significance.

Read the companion post for this one: How Hypothesis Tests Work: Significance Levels (Alpha) and P-values. In that post, I use the same graphical approach to illustrate why we need hypothesis tests, how significance levels and P-values can determine whether a result is statistically significant, and what that actually means.

**Significance Level vs. Confidence Level**

Let’s delve into how confidence intervals incorporate the margin of error. Like the previous post, I’ll use the same type of sampling distribution that showed us how hypothesis tests work. This sampling distribution is based on the t-distribution, our sample size, and the variability in our sample. Download the CSV data file: FuelsCosts.

There are two critical differences between the sampling distribution graphs for significance levels and confidence intervals–the value that the distribution centers on and the portion we shade.

The significance level chart centers on the null value, and we shade the outside 5% of the distribution.

Conversely, the confidence interval graph centers on the sample mean, and we shade the center 95% of the distribution.

The shaded range of sample means [267 394] covers 95% of this sampling distribution. This range is the 95% confidence interval for our sample data. We can be 95% confident that the population mean for fuel costs fall between 267 and 394.

## Confidence Intervals and the Inherent Uncertainty of Using Sample Data

The graph emphasizes the role of uncertainty around the point estimate. This graph centers on our sample mean. If the population mean equals our sample mean, random samples from this population (N=25) will fall within this range 95% of the time.

We don’t know whether our sample mean is near the population mean. However, we know that the sample mean is an unbiased estimate of the population mean. An unbiased estimate does not tend to be too high or too low. It’s correct on average. Confidence intervals are correct on average because they use sample estimates that are correct on average. Given what we know, the sample mean is the most likely value for the population mean.

Given the sampling distribution, it would not be unusual for other random samples drawn from the same population to have means that fall within the shaded area. In other words, given that we did, in fact, obtain the sample mean of 330.6, it would not be surprising to get other sample means within the shaded range.

If these other sample means would not be unusual, we must conclude that these other values are also plausible candidates for the population mean. There is inherent uncertainty when using sample data to make inferences about the entire population. Confidence intervals help gauge the degree of uncertainty, also known as the margin of error.

**Related post**: Sampling Distributions

**Confidence Intervals and Statistical Significance**

If you want to determine whether your hypothesis test results are statistically significant, you can use either P-values with significance levels or confidence intervals. These two approaches always agree.

The relationship between the confidence level and the significance level for a hypothesis test is as follows:

Confidence level = 1 – Significance level (alpha)

For example, if your significance level is 0.05, the equivalent confidence level is 95%.

Both of the following conditions represent statistically significant results:

- The P-value in a hypothesis test is smaller than the significance level.
- The confidence interval excludes the null hypothesis value.

Further, it is always true that when the P-value is less than your significance level, the interval excludes the value of the null hypothesis.

In the fuel cost example, our hypothesis test results are statistically significant because the P-value (0.03112) is less than the significance level (0.05). Likewise, the 95% confidence interval [267 394] excludes the null hypotheses value (260). Using either method, we draw the same conclusion.

**Hypothesis Testing and Confidence Intervals Always Agree**

The hypothesis testing and confidence interval results always agree. To understand the basis of this agreement, remember how confidence levels and significance levels function:

- A confidence level determines the distance between the sample mean and the confidence limits.
- A significance level determines the distance between the null hypothesis value and the critical regions.

Both of these concepts specify a distance from the mean to a limit. Surprise! These distances are precisely the same length.

A 1-sample t-test calculates this distance as follows:

The critical t-value * standard error of the mean

Interpreting these statistics goes beyond the scope of this article. But, using this equation, the distance for our fuel cost example is $63.57.

**P-value and significance level approach**: If the sample mean is more than $63.57 from the null hypothesis mean, the sample mean falls within the critical region, and the difference is statistically significant.

**Confidence interval approach**: If the null hypothesis mean is more than $63.57 from the sample mean, the interval does not contain this value, and the difference is statistically significant.

Of course, they always agree!

The two approaches always agree as long as the same hypothesis test generates the P-values and confidence intervals and uses equivalent confidence levels and significance levels.

**Related posts**: Standard Error of the Mean and Critical Values

## I Really Like Confidence Intervals!

In statistics, analysts often emphasize using hypothesis tests to determine statistical significance. Unfortunately, a statistically significant effect might not always be practically meaningful. For example, a significant effect can be too small to be important in the real world. Confidence intervals help you navigate this issue!

Similarly, the margin of error in a survey tells you how near you can expect the survey results to be to the correct population value.

Learn more about this distinction in my post about Practical vs. Statistical Significance.

Learn how to use confidence intervals to compare group means!

Finally, learn about bootstrapping in statistics to see an alternative to traditional confidence intervals that do not use probability distributions and test statistics. In that post, I create bootstrapped confidence intervals.

## Reference

Neyman, J. (1937). Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability. *Philosophical Transactions of the Royal Society A*. **236** (767): 333–380.

## How t-Tests Work: t-Values, t-Distributions, and Probabilities

T-tests are statistical hypothesis tests that you use to analyze one or two sample means. Depending on the t-test that you use, you can compare a sample mean to a hypothesized value, the means of two independent samples, or the difference between paired samples. In this post, I show you how t-tests use t-values and t-distributions to calculate probabilities and test hypotheses.

As usual, I’ll provide clear explanations of t-values and t-distributions using concepts and graphs rather than formulas! If you need a primer on the basics, read my hypothesis testing overview. [Read more…] about How t-Tests Work: t-Values, t-Distributions, and Probabilities

## How t-Tests Work: 1-sample, 2-sample, and Paired t-Tests

T-tests are statistical hypothesis tests that analyze one or two sample means. When you analyze your data with any t-test, the procedure reduces your entire sample to a single value, the t-value. In this post, I describe how each type of t-test calculates the t-value. I don’t explain this just so you can understand the calculation, but I describe it in a way that really helps you grasp how t-tests work. [Read more…] about How t-Tests Work: 1-sample, 2-sample, and Paired t-Tests

## How to Analyze Likert Scale Data

How do you analyze Likert scale data? Likert scales are the most broadly used method for scaling responses in survey studies. Survey questions that ask you to indicate your level of agreement, from strongly agree to strongly disagree, use the Likert scale. The data in the worksheet are five-point Likert scale data for two groups [Read more…] about How to Analyze Likert Scale Data

## Chi-Square Test of Independence and an Example

The Chi-square test of independence determines whether there is a statistically significant relationship between categorical variables. It is a hypothesis test that answers the question—do the values of one categorical variable depend on the value of other categorical variables? This test is also known as the chi-square test of association.

[Read more…] about Chi-Square Test of Independence and an Example