What is Hypothesis Testing?
Hypothesis testing in statistics uses sample data to infer the properties of a whole population. These tests determine whether a random sample provides sufficient evidence to conclude an effect or relationship exists in the population. Researchers use them to help separate genuine population-level effects from false effects that random chance can create in samples. These methods are also known as significance testing.
Hypothesis tests are vital statistical analysis tools that evaluate the validity of new theories by comparing them to empirical data. They provide a structured approach to decision-making, emphasizing data-driven insights over personal biases or subjective opinions. This method allows researchers to determine if their data supports their hypotheses, helping to prevent inaccurate claims and conclusions.
For example, researchers are testing a new medication to see if it lowers blood pressure. They compare a group taking the drug to a control group taking a placebo. If their hypothesis test results are statistically significant, the medication’s effect of lowering blood pressure likely exists in the broader population, not just the sample studied.
Using Hypothesis Tests
A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement the sample data best supports. These two statements are called the null hypothesis and the alternative hypothesis. The following are typical examples:
- Null Hypothesis: The effect does not exist in the population.
- Alternative Hypothesis: The effect does exist in the population.
Hypothesis testing accounts for the inherent uncertainty of using a sample to draw conclusions about a population, which reduces the chances of false discoveries. These procedures determine whether the sample data are sufficiently inconsistent with the null hypothesis that you can reject it. If you can reject the null, your data favor the alternative statement that an effect exists in the population.
Statistical significance in hypothesis testing indicates that an effect you see in sample data also likely exists in the population after accounting for random sampling error, variability, and sample size. Your results are statistically significant when the p-value is less than your significance level or, equivalently, when your confidence interval excludes the null hypothesis value.
Conversely, non-significant results indicate that despite an apparent sample effect, you can’t be sure it exists in the population. It could be chance variation in the sample and not a genuine effect.
Learn more about Failing to Reject the Null.
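To see why the p-value rule and the confidence interval rule agree, here is a minimal Python sketch. It assumes a two-sided, one-sample z-test with a known population standard deviation, a simplification chosen for illustration and not a test used in this article. The function name `z_test_summary` and every number in it are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_summary(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test, assuming sigma is known (illustrative only)."""
    std_error = sigma / sqrt(n)
    z = (sample_mean - null_mean) / std_error

    # Two-sided p-value: chance of a |z| at least this large if the null is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # (1 - alpha) confidence interval centered on the sample mean.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (sample_mean - z_crit * std_error, sample_mean + z_crit * std_error)

    return {
        "p_value": p_value,
        "ci": ci,
        "p_below_alpha": p_value < alpha,
        "ci_excludes_null": not (ci[0] <= null_mean <= ci[1]),
    }


# Hypothetical numbers: null mean 100, known sigma 15, 100 observations.
significant = z_test_summary(sample_mean=103.0, null_mean=100.0, sigma=15.0, n=100)
not_significant = z_test_summary(sample_mean=102.0, null_mean=100.0, sigma=15.0, n=100)

print(significant)
print(not_significant)
```

With a sample mean of 103, the p-value is about 0.0455 and the 95% interval runs from roughly 100.06 to 105.94, so both criteria flag significance. With a sample mean of 102, the interval contains 100 and the p-value exceeds 0.05, so neither does.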
5 Steps of Significance Testing
Hypothesis testing involves five key steps, each critical to validating a research hypothesis using statistical methods:
- Formulate the Hypotheses: Write your research hypotheses as a null hypothesis (H0) and an alternative hypothesis (HA).
- Collect Data: Gather data specifically aimed at testing the hypothesis.
- Conduct a Test: Use a suitable statistical test to analyze your data.
- Make a Decision: Based on the statistical test results, decide whether to reject the null hypothesis or fail to reject it.
- Report the Results: Summarize and present the outcomes in your report’s results and discussion sections.
While the specifics of these steps can vary depending on the research context and the data type, the fundamental process of hypothesis testing remains consistent across different studies.
Let’s work through these steps in an example!
Hypothesis Testing Example
Researchers want to determine if a new educational program improves student performance on standardized tests. They randomly assign 30 students to a control group, which follows the standard curriculum, and another 30 students to a treatment group, which participates in the new educational program. After a semester, they compare the test scores of both groups.
Download the CSV data file to perform the hypothesis testing yourself: Hypothesis_Testing.
The researchers write their hypotheses. These statements apply to the population, so they use the mu (μ) symbol for the population mean parameter.
- Null Hypothesis (H0): The population means of the test scores for the two groups are equal (μ1 = μ2).
- Alternative Hypothesis (HA): The population means of the test scores for the two groups are unequal (μ1 ≠ μ2).
Choosing the correct hypothesis test depends on attributes such as data type and number of groups. Because they’re using continuous data and comparing two means, the researchers use a 2-sample t-test.
Here are the results.
The treatment group’s mean is 58.70, compared to the control group’s mean of 48.12. Based on these means, the difference is about 10.6 points (58.70 − 48.12 = 10.58). Use the test’s p-value and significance level to determine whether this difference is likely a product of random fluctuation in the sample or a genuine population effect.
Because the p-value (reported as 0.000, which means it is below 0.001 rather than exactly zero) is less than the standard significance level of 0.05, the results are statistically significant, and we can reject the null hypothesis. The sample data provide sufficient evidence to conclude that the new program’s effect exists in the population.
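If you want to see the mechanics behind a 2-sample t-test, the following Python sketch may help. It uses only the standard library. In practice you would more likely call a statistics package such as SciPy's `scipy.stats.ttest_ind`. Several things here are assumptions and not taken from the article: the scores are made up (they are not the CSV data), the test pools the variances (equal variances assumed), and the p-value comes from a simple Simpson's rule integration of the t density. The function names are hypothetical.

```python
from math import exp, lgamma, pi, sqrt
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density from 0 to |t|."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # probability between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


def pooled_t_test(group_a, group_b):
    """2-sample t-test assuming equal population variances (an assumption)."""
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_a) - mean(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = diff / std_error
    return diff, t_stat, df, two_sided_p(t_stat, df)


# Made-up scores for illustration only; NOT the article's data set.
control = [45, 52, 48, 50, 47, 44, 51, 49, 46, 53]
treatment = [58, 61, 55, 60, 57, 62, 56, 59, 54, 63]

diff, t_stat, df, p_value = pooled_t_test(treatment, control)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"difference={diff:.2f}, t={t_stat:.2f}, df={df}, p={p_value:.6f}: {decision}")
```

For these made-up scores, the mean difference is 10 points and the t-value is about 7.39 on 18 degrees of freedom, which gives a p-value far below 0.001. The decision follows the same rule as in the example above: reject the null hypothesis when the p-value is less than the significance level.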
Limitations
Hypothesis testing helps you make data-driven decisions. However, it is not 100% accurate because random samples occasionally produce unrepresentative results. Hypothesis tests can make two types of errors, both of which involve drawing an incorrect conclusion.
- Type I error: The test rejects a true null hypothesis—a false positive.
- Type II error: The test fails to reject a false null hypothesis—a false negative.
Learn more about Type I and Type II Errors.
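A small simulation can make both error types concrete. The sketch below is hypothetical throughout: it assumes normally distributed data with a known standard deviation of 1, a two-sided z-test at the 0.05 level, samples of 30, and a true effect of 0.3 when the null is false. It repeats the experiment many times and counts how often the test reaches the wrong conclusion.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def rejects_null(sample, null_mean=0.0, sigma=1.0, alpha=0.05):
    """Two-sided z-test with known sigma; True when the null is rejected."""
    z = (mean(sample) - null_mean) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(true_mean, n=30, trials=2000, seed=1):
    """Share of simulated experiments in which the test rejects the null."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = sum(
        rejects_null([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return rejections / trials


# Null is true (mean really is 0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# Null is false (mean is really 0.3): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.3)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

Under these assumptions, the Type I error rate should land near the 0.05 significance level, while the Type II error rate comes out around 0.6 because a sample of 30 has limited power to detect an effect this small.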
Our exploration of hypothesis testing using a practical example of an educational program reveals its powerful ability to guide decisions based on statistical evidence. Whether you’re a student, researcher, or professional, understanding and applying these procedures can open new doors to discovering insights and making informed decisions. Let this tool empower your analytical endeavors as you navigate through the vast seas of data.
Learn more about the Hypothesis Tests for Various Data Types.
Marty Shudak says
Thank you, Jim, for another helpful article; timely too since I have started reading your new book on hypothesis testing and, now that we are at the end of the school year, my district is asking me to perform a number of evaluations on instructional programs. This is where my question/concern comes in. You mention that hypothesis testing is all about testing samples. However, I use all the students in my district when I make these comparisons. Since I am using the entire “population” in my evaluations (I don’t select a sample of third grade students, for example, but I use all 700 third graders), am I somehow misusing the tests? Or can I rest assured that my district’s student population is only a sample of the universal population of students?
Jim Frost says
Hi Marty,
I hope you are finding the book helpful!
Yes, the purpose of hypothesis testing is to infer the properties of a population while accounting for random sampling error.
In your case, it comes down to how you want to use the results. Who do you want the results to apply to?
If you’re summarizing the sample, looking for trends and patterns, or evaluating those students and don’t plan to apply those results to other students, you don’t need hypothesis testing because there is no sampling error. They are the population and you can just use descriptive statistics. In this case, you’d only need to focus on the practical significance of the effect sizes.
On the other hand, if you want to apply the results from this group to other students, you’ll need hypothesis testing. However, there is the complicating issue of what population your sample of students represents. I’m sure your district has its own unique characteristics, demographics, etc. Your district’s students probably don’t adequately represent a universal population. At the very least, you’d need to recognize any special attributes of your district and how they could bias the results when trying to apply them outside the district. Or they might apply to similar districts in your region.
However, I’d imagine your 3rd graders probably adequately represent future classes of 3rd graders in your district. You need to be alert to changing demographics. At least in the short run I’d imagine they’d be representative of future classes.
Think about how these results will be used. Do they just apply to the students you measured? Then you don’t need hypothesis tests. However, if the results are being used to infer things about other students outside of the sample, you’ll need hypothesis testing along with considering how well your students represent the other students and how they differ.
I hope that helps!
Marty Shudak says
Thank you so much, Jim, for the suggestions in terms of what I need to think about and consider! You are always so clear in your explanations!!!!
Jim Frost says
You’re very welcome! Best of luck with your evaluations!