What is a Type 2 Error?
A Type 2 error (also known as a Type II error) occurs when you fail to reject a false null hypothesis in a hypothesis test. In other words, a statistically non-significant result suggests that a population effect does not exist when it actually does. A Type 2 error is a false negative: the effect exists in the population, but the test doesn’t detect it in the sample.
In hypothesis testing, the null hypothesis typically posits the absence of an effect in the population. Therefore, failing to reject the null suggests that the effect does not exist.
By failing to reject a false null hypothesis, you incorrectly conclude that the effect does not exist when it does. Unfortunately, you’re unaware of this error at the time. You’re simply interpreting the results of your hypothesis test.
Type 2 errors can have profound implications. For example, a false negative in medical testing might mean overlooking an effective treatment. Recognizing and controlling these errors is crucial for sound statistical findings.
Related post: Hypothesis Testing Overview
Type 2 Error Example
Let’s illustrate this concept with an example of a type 2 error in practice. For our scenario, we’ll assume the effect exists — a detail typically unknown in real-world situations, hence the need for the study.
Imagine we’re testing a new drug that genuinely is effective. We conduct a study, gather data, and carry out the hypothesis test.
The hypotheses for this study are:
- Null: The drug has no effect in the population.
- Alternative: The drug is effective in the population.
Our analysis yields a p-value of 0.08, above our alpha level of 0.05. The study is not statistically significant. Consequently, we fail to reject the null and conclude the drug is ineffective.
Regrettably, this conclusion is incorrect because the drug is effective. The non-significant results lead us to believe the medication doesn’t work even though it is effective. It’s a false negative. A type 2 error has occurred, and we’re none the wiser!
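To see how this plays out, here’s a small R simulation in the same spirit (the numbers are invented, not from a real drug trial). Every simulated study samples from a population where the drug truly works, yet many of them still fail to reach significance:

# Hypothetical setup: true mean improvement of 2 units, noisy data (SD = 6),
# and a small study (n = 15), tested one-sided at alpha = 0.05
set.seed(1)
pvals <- replicate(10000,
  t.test(rnorm(15, mean = 2, sd = 6), mu = 0,
         alternative = "greater")$p.value)
# Proportion of simulated studies that miss the real effect
mean(pvals > 0.05)

With these made-up numbers, roughly two-thirds of the simulated studies return p-values above 0.05. Each of those non-significant results is a Type 2 error, because the effect genuinely exists in the simulated population.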
Learn more about the Null Hypothesis.
Why Do They Occur?
Hypothesis tests employ sample data to make inferences about populations. Using random samples is beneficial as examining entire populations is often impractical.
However, relying on samples can introduce issues, including Type 2 errors. While random samples usually represent the population accurately, they can sometimes give a misleading picture and produce false negatives.
Consider flipping a coin. Occasionally, by sheer chance, you might get fewer heads than expected. Similarly, randomness can yield atypical samples that do not accurately portray the population.
However, unlike Type I errors, which primarily arise from random sampling error, Type 2 errors stem from various factors. These include sampling error but also small effects, small samples, and high data variability.
These conditions make it more difficult for a hypothesis test to use a sample to detect a population effect when one truly exists.
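As a rough illustration with base R’s power.t.test function (all numbers are invented for demonstration), you can see how a smaller effect or noisier data reduces a test’s power, raising the chance of a false negative:

# Baseline: moderate effect, moderate noise, 30 subjects per group
power.t.test(n = 30, delta = 5, sd = 10, sig.level = 0.05)$power
# Smaller effect size: lower power (higher Type 2 error rate)
power.t.test(n = 30, delta = 2, sd = 10, sig.level = 0.05)$power
# Higher variability: lower power as well
power.t.test(n = 30, delta = 5, sd = 20, sig.level = 0.05)$power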
Learn more about Representative Samples and Random Sampling.
Probability of a Type 2 Error
While it’s impossible to know when a particular study yields a false negative result, we can estimate how often they occur. Statisticians denote the probability of making a Type 2 error using the Greek letter beta (β). By designing your study effectively, you can minimize the false negative rate.
The Type 2 error rate is the probability of a false negative. Therefore, 1 – β is the probability of correctly detecting an effect that exists. Statisticians call this the power of a hypothesis test. Analysts typically estimate power rather than beta itself.
Unlike Type I errors, you can’t set the Type 2 error rate for your analysis. Instead, analysts estimate the properties of the alternative hypothesis and enter them into statistical software to approximate statistical power. This process is known as power analysis.
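The power.t.test function shown earlier performs this kind of power analysis for t-tests. The effect size (delta) and standard deviation below are illustrative values of the kind you’d estimate from subject-area knowledge or prior studies:

# Estimate power for a two-sample t-test: 40 subjects per group,
# an assumed true difference of 3 units, and a standard deviation of 8
power.t.test(n = 40, delta = 3, sd = 8, sig.level = 0.05)
# The reported power is 1 - beta, the probability of detecting the effect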
A crucial benefit of hypothesis testing is that researchers can design a study with a low false negative rate, meaning it has high statistical power. When an effect truly exists in the population, such a study is unlikely to miss it, which lends credibility to the results.
Related post: What is Power in Statistics?
Minimizing False Negatives
Analysts can’t wholly avoid Type 2 errors, but increasing statistical power can lessen their likelihood. However, augmenting power usually requires spending more time and resources on the study. It’s a matter of balancing false negatives with the resources available for the analysis.
Reduced variability and larger effect sizes can lower the Type 2 error rate. Unfortunately, these aspects are frequently challenging for researchers to control because they are properties inherent to the population under study.
Generally, the aspect researchers can influence the most is sample size, making it the primary factor in regulating false negatives. Keeping all other aspects constant, increasing the sample size leads to a lower Type 2 error rate (β) and, correspondingly, higher statistical power (1 – β). Learn how to calculate the sample size for statistical power.
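Using the same illustrative power.t.test inputs as above, you can watch power climb as the sample size grows, or solve directly for the sample size that achieves a target power:

# Power at several per-group sample sizes (same assumed effect and SD)
sapply(c(20, 40, 80, 160), function(n)
  power.t.test(n = n, delta = 3, sd = 8, sig.level = 0.05)$power)
# Solve for the per-group n needed to reach 80% power (beta = 0.20)
power.t.test(delta = 3, sd = 8, sig.level = 0.05, power = 0.80)$n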
In hypothesis testing, understanding Type 2 errors is essential. They are false negatives: cases where we fail to detect an effect that genuinely exists. By thoughtfully designing our studies, we can reduce the risk of these errors and make more informed statistical decisions.
Compare and contrast Type I vs. Type II Errors.
Yusuke says
Hi Jim, I have a question about Type 2 error computation.
In Type 2 error computation, why do we compute the standard deviation of the proportion using the value assumed under the null hypothesis, even though the alternative hypothesis supplies the possible true value?
Let’s say we have the following problem:
An airline claims that 92% of its flights leave on schedule, but an FAA investigator believes the true figure is lower. He decides that 125 flights will be checked at the 5% significance level. What is the probability of a Type II error if the true percentage is 90%?
SE using H0 = √((.92)(.08)/125) = 0.0243. With α = .05, the critical z-score is 1.645, and the critical proportion is .92 - 1.645(0.0243) = .880. If the true proportion of on-schedule flights is .90, then the z-score at .880 is (.880 - .90)/0.0243 = -0.82, and β = .5 + .2939 = .7939.
My question is: why do we use the SE based on H0 instead of the alternative hypothesis, i.e., the true percentage .90? If the alternative hypothesis were used to calculate the standard error, it would lead to a different value. Wouldn’t that fail to represent the probability of a Type 2 error for this test?
Thank you so much for your time!
Jim Frost says
Hi Yusuke,
When calculating the power of a hypothesis test, you primarily use the standard deviation under the null hypothesis to determine the critical values. Then, you use the standard deviation under the alternative hypothesis to calculate the probability that the test statistic falls within the non-rejection region when the alternative hypothesis is true.
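Here’s roughly what that two-step approach looks like in R for your airline example. Note that it uses the normal approximation throughout, so software that uses exact binomial methods can give different answers:

n <- 125; p0 <- 0.92; p1 <- 0.90; alpha <- 0.05
se0 <- sqrt(p0 * (1 - p0) / n)          # SE under the null: sets the critical value
p_crit <- p0 - qnorm(1 - alpha) * se0   # reject H0 when p-hat falls below this
se1 <- sqrt(p1 * (1 - p1) / n)          # SE under the alternative
beta <- pnorm((p_crit - p1) / se1, lower.tail = FALSE)  # P(p-hat > p_crit | p = p1)
beta  # about 0.77 with this normal approximation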
When I plug the values into my statistical software, I get a different answer than you. For N = 125, a hypothesized value of 0.9, a comparison proportion of 0.92, and alpha = 0.05, it reports a power of 0.09, which equates to a Type 2 error rate of 0.91.
I hope that helps!
Yusuke says
Thank you, Jim, for the detailed explanation 🙏 Really appreciate it.
About β, I got 79.4% using the R script below. Here is the output it prints:
# SE under H0: 0.0242652
# z critical: 1.644854
# critical proportion: 0.8800873
# z for true proportion -0.820628
# Type II error (Beta): 0.7940709
# Define parameters
n <- 125 # sample size
p0 <- 0.92 # assumed proportion under H0
p1 <- 0.90 # true proportion
alpha <- 0.05 # significance level
# Calculate standard error under H0
se0 <- sqrt(p0 * (1 - p0) / n)
cat("SE under H0:", se0, "\n")
# Find critical z-value and rejection region boundary
z_crit <- qnorm(alpha, lower.tail = FALSE)
cat("z critical:", z_crit, "\n")
p_hat_crit <- p0 - z_crit * se0
cat("critical proportion: ", p_hat_crit, "\n")
# Calculate z-value for the true proportion
z1 <- (p_hat_crit - p1) / se0
cat("z for true proportion", z1, "\n")
# Calculate the probability of Type II error
beta <- pnorm(z1, lower.tail = FALSE)
# Print the results
cat("Type II error (Beta):", beta, "\n")