In this blog post, I explain why you need to use statistical hypothesis testing and help you navigate the essential terminology. Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables.
This post provides an overview of statistical hypothesis testing. If you need to perform hypothesis tests, consider getting my book, Hypothesis Testing: An Intuitive Guide.
Why You Should Perform Statistical Hypothesis Testing
Hypothesis testing is a form of inferential statistics that allows us to draw conclusions about an entire population based on a representative sample. You gain tremendous benefits by working with a sample. In most cases, it is simply impossible to observe the entire population to understand its properties. The only alternative is to collect a random sample and then use statistics to analyze it.
While samples are much more practical and less expensive to work with, there are trade-offs. When you estimate the properties of a population from a sample, the sample statistics are unlikely to equal the actual population value exactly. For instance, your sample mean is unlikely to equal the population mean. The difference between the sample statistic and the population value is the sample error.
Differences that researchers observe in samples might be due to sampling error rather than representing a true effect at the population level. If sampling error causes the observed difference, the next time someone performs the same experiment the results might be different. Hypothesis testing incorporates estimates of the sampling error to help you make the correct decision. Learn more about Sampling Error.
For example, if you are studying the proportion of defects produced by two manufacturing methods, any difference you observe between the two sample proportions might be sample error rather than a true difference. If the difference does not exist at the population level, you won’t obtain the benefits that you expect based on the sample statistics. That can be a costly mistake!
Let’s cover some basic hypothesis testing terms that you need to know.
Hypothesis testing is a statistical analysis that uses sample data to assess two mutually exclusive theories about the properties of a population. Statisticians call these theories the null hypothesis and the alternative hypothesis. A hypothesis test assesses your sample statistic and factors in an estimate of the sample error to determine which hypothesis the data support.
When you can reject the null hypothesis, the results are statistically significant, and your data support the theory that an effect exists at the population level.
The effect is the difference between the population value and the null hypothesis value. The effect is also known as population effect or the difference. For example, the mean difference between the health outcome for a treatment group and a control group is the effect.
Typically, you do not know the size of the actual effect. However, you can use a hypothesis test to help you determine whether an effect exists and to estimate its size. Hypothesis tests convert your sample effect into a test statistic, which it evaluates for statistical significance. Learn more about Test Statistics.
An effect can be statistically significant, but that doesn’t necessarily indicate that it is important in a real-world, practical sense. For more information, read my post about Statistical vs. Practical Significance.
The null hypothesis is one of two mutually exclusive theories about the properties of the population in hypothesis testing. Typically, the null hypothesis states that there is no effect (i.e., the effect size equals zero). The null is often signified by H0.
In all hypothesis testing, the researchers are testing an effect of some sort. The effect can be the effectiveness of a new vaccination, the durability of a new product, the proportion of defect in a manufacturing process, and so on. There is some benefit or difference that the researchers hope to identify.
However, it’s possible that there is no effect or no difference between the experimental groups. In statistics, we call this lack of an effect the null hypothesis. Therefore, if you can reject the null, you can favor the alternative hypothesis, which states that the effect exists (doesn’t equal zero) at the population level.
You can think of the null as the default theory that requires sufficiently strong evidence against in order to reject it.
For example, in a 2-sample t-test, the null often states that the difference between the two means equals zero.
When you can reject the null hypothesis, your results are statistically significant. Learn more about Statistical Significance: Definition & Meaning.
Related post: Understanding the Null Hypothesis in More Detail
The alternative hypothesis is the other theory about the properties of the population in hypothesis testing. Typically, the alternative hypothesis states that a population parameter does not equal the null hypothesis value. In other words, there is a non-zero effect. If your sample contains sufficient evidence, you can reject the null and favor the alternative hypothesis. The alternative is often identified with H1 or HA.
For example, in a 2-sample t-test, the alternative often states that the difference between the two means does not equal zero.
You can specify either a one- or two-tailed alternative hypothesis:
If you perform a two-tailed hypothesis test, the alternative states that the population parameter does not equal the null value. For example, when the alternative hypothesis is HA: μ ≠ 0, the test can detect differences both greater than and less than the null value.
A one-tailed alternative has more power to detect an effect but it can test for a difference in only one direction. For example, HA: μ > 0 can only test for differences that are greater than zero.
Related posts: Understanding T-tests and One-Tailed and Two-Tailed Hypothesis Tests Explained
P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null. You use P-values in conjunction with the significance level to determine whether your data favor the null or alternative hypothesis.
Related post: Interpreting P-values Correctly
Significance Level (Alpha)
The significance level, also known as alpha or α, is an evidentiary standard that researchers set before the study. It specifies how strongly the sample evidence must contradict the null hypothesis before you can reject the null for the entire population. This standard is defined by the probability of rejecting a null hypothesis that is true. In other words, it is the probability that you say there is an effect when there is no effect. Lower significance levels indicate that you require stronger evidence before you will reject the null.
For instance, a significance level of 0.05 signifies a 5% risk of deciding that an effect exists when it does not exist.
Use p-values and significance levels together to help you determine which hypothesis the data support. If the p-value is less than your significance level, you can reject the null and conclude that the effect is statistically significant. In other words, the evidence in your sample is strong enough to be able to reject the null hypothesis at the population level.
Types of Errors in Hypothesis Testing
Statistical hypothesis tests are not 100% accurate because they use a random sample to draw conclusions about entire populations. There are two types of errors related to drawing an incorrect conclusion.
- False positives: You reject a null that is true. Statisticians call this a Type I error. The Type I error rate equals your significance level or alpha (α).
- False negatives: You fail to reject a null that is false. Statisticians call this a Type II error. Generally, you do not know the Type II error rate. However, it is a larger risk when you have a small sample size, noisy data, or a small effect size. The type II error rate is also known as beta (β).
Statistical power is the probability that a hypothesis test correctly infers that a sample effect exists in the population. In other words, the test correctly rejects a false null hypothesis. Consequently, power is inversely related to a Type II error. Power = 1 – β. Learn more about Power in Statistics.
Which Type of Hypothesis Test is Right for You?
There are many different types of procedures you can use. The correct choice depends on your research goals and the data you collect. Do you need to understand the mean or the differences between means? Or, perhaps you need to assess proportions. You can even use hypothesis testing to determine whether the relationships between variables are statistically significant.
To choose the proper statistical procedure, you’ll need to assess your study objectives and collect the correct type of data. This background research is necessary before you begin a study.
Related Post: Hypothesis Tests for Continuous, Binary, and Count Data
Statistical tests are crucial when you want to use sample data to make conclusions about a population because these tests account for sample error. Using significance levels and p-values to determine when to reject the null hypothesis improves the probability that you will draw the correct conclusion.
To see an alternative approach to these traditional hypothesis testing methods, learn about bootstrapping in statistics!
If you want to see examples of hypothesis testing in action, I recommend the following posts that I have written:
- How Effective Are Flu Shots? This example shows how you can use statistics to test proportions.
- Fatality Rates in Star Trek. This example shows how to use hypothesis testing with categorical data.
- Busting Myths About the Battle of the Sexes. A fun example based on a Mythbusters episode that assess continuous data using several different tests.
- Are Yawns Contagious? Another fun example inspired by a Mythbusters episode.