A confidence interval is calculated from a sample and provides a range of values that likely contains the unknown value of a population parameter. In this post, I demonstrate how confidence intervals and confidence levels work using graphs and concepts instead of formulas. In the process, you’ll see how confidence intervals are very similar to P values and significance levels.
Read the companion post for this one: How Hypothesis Tests Work: Significance Levels (Alpha) and P-values. In that post, I use the same graphical approach to illustrate why we need hypothesis tests, how significance levels and P values can determine whether a result is statistically significant, and what that actually means.
How to Interpret Confidence Intervals
You can calculate a confidence interval from a sample to obtain a range for where the population parameter is likely to reside. For example, a confidence interval of [9 11] indicates that the population mean is likely to be between 9 and 11.
Different random samples drawn from the same population are liable to produce slightly different intervals. If you draw many random samples and calculate a confidence interval for each sample, a specific proportion of the intervals contain the population parameter. That percentage is the confidence level.
For example, a 95% confidence level means that if you draw 20 random samples from the same population, you’d expect about 19 of the 20 confidence intervals, on average, to include the population value.
The confidence interval procedure provides meaningful estimates because it produces ranges that usually contain the parameter.
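The repeated-sampling idea above can be checked with a small simulation. The sketch below is only an illustration: the population (normal, mean 100, standard deviation 15), the sample size of 25, and the helper names are assumptions chosen for the demo, not values from the fuel cost example. The critical t-value of 2.0639 is the tabulated two-sided 95% value for 24 degrees of freedom.

```python
import random
import statistics

# Hypothetical population and design, chosen only for illustration.
POP_MEAN = 100.0
POP_SD = 15.0
N = 25
T_CRIT_95_DF24 = 2.0639  # two-sided 95% critical t-value for df = N - 1 = 24


def t_interval(sample, t_crit):
    """Return (lower, upper) for a t-based confidence interval for the mean."""
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / len(sample) ** 0.5  # standard error of the mean
    margin = t_crit * sem
    return mean - margin, mean + margin


def coverage(trials, seed=1):
    """Fraction of simulated intervals that contain the true population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
        lower, upper = t_interval(sample, T_CRIT_95_DF24)
        if lower <= POP_MEAN <= upper:
            hits += 1
    return hits / trials


rate = coverage(4000)
print(f"Share of intervals containing the population mean: {rate:.3f}")
```

With a few thousand simulated samples, the printed share should land close to 0.95. Any single interval either contains the population mean or it doesn't; the 95% describes how often the procedure succeeds over many samples.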
We’ll create a confidence interval for the population mean using the fuel cost example that we’ve been developing. With other types of data, you can create intervals for proportions, frequencies, regression coefficients, and differences between populations.
Confidence Intervals Indicate the Precision of the Estimate
Confidence intervals include the point estimate for the sample with a margin of error around the point estimate. The point estimate is the most likely value of the parameter and equals the sample value. The margin of error accounts for the amount of doubt involved in estimating the population parameter. The more variability there is in the sample data, the less precise the estimate, which causes the margin of error to extend further out from the point estimate. Confidence intervals help you navigate the uncertainty of how well a sample estimates a value for an entire population.
With this in mind, confidence intervals can help you compare the precision of different estimates. Suppose two different samples estimate the same population parameter with 95% confidence intervals. One interval is [5 15] while the other is [9 11]. The latter confidence interval is narrower, which suggests that it is a more precise estimate.
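To see how variability drives the width of an interval, here is a minimal sketch. The two samples are hypothetical: both are assumed to have 25 observations and a mean of 10, and the standard deviations (12.0 and 2.4) are made up so that the results land near the [5 15] and [9 11] intervals mentioned above. The function name `margin_of_error` is also just an illustrative choice.

```python
T_CRIT_95_DF24 = 2.0639  # two-sided 95% critical t-value for 24 degrees of freedom


def margin_of_error(sample_sd, n, t_crit):
    """Half-width of a t-based confidence interval for a mean."""
    standard_error = sample_sd / n ** 0.5
    return t_crit * standard_error


# Two hypothetical samples of 25 observations with the same mean of 10.
noisy = margin_of_error(sample_sd=12.0, n=25, t_crit=T_CRIT_95_DF24)
steady = margin_of_error(sample_sd=2.4, n=25, t_crit=T_CRIT_95_DF24)

for label, margin in (("noisy", noisy), ("steady", steady)):
    print(f"{label}: 10 +/- {margin:.2f} -> [{10 - margin:.2f} {10 + margin:.2f}]")
```

With the sample size held fixed, the margin of error scales directly with the sample standard deviation, so the sample with five times the variability produces an interval five times as wide.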
Related post: Sample Statistics Are Always Wrong (to Some Extent)!
Creating Confidence Intervals Graphically
Let’s delve into how confidence intervals incorporate the margin of error. As in the previous posts, I’ll use the same type of sampling distribution that showed us how hypothesis tests work. This sampling distribution is based on the t-distribution, our sample size, and the variability in our sample. Download the CSV data file: FuelsCosts.
There are two key differences between the sampling distribution graphs for significance levels and confidence intervals. The significance level chart centers on the null value, and we shade the outside 5% of the distribution. Conversely, the confidence interval graph centers on the sample mean, and we shade the center 95% of the distribution.
The shaded range of sample means [267 394] covers 95% of this sampling distribution. This range is the 95% confidence interval for our sample data. We can be 95% confident that the population mean for fuel costs falls between 267 and 394.
Confidence Intervals and the Inherent Uncertainty of Using Sample Data
The graph emphasizes the role of uncertainty around the point estimate. This graph centers on our sample mean. If the population mean equals our sample mean, the means of random samples from this population (N=25) will fall within this range 95% of the time.
We don’t really know whether our sample mean is near the population mean. However, we know that the sample mean is an unbiased estimate of the population mean. An unbiased estimate is one that doesn’t tend to be too high or too low. It’s correct on average. Confidence intervals are correct on average because they use sample estimates that are correct on average. Given what we know, the sample mean is the most likely value for the population mean.
Given the sampling distribution, it would not be unusual for other random samples drawn from the same population to have means that fall within the shaded area. In other words, given that we did, in fact, obtain the sample mean of 330.6, it would not be surprising to get other sample means within the shaded range.
If these other sample means would not be unusual, then we must conclude that these other values are also likely candidates for the population mean. There is an inherent uncertainty when you use sample data to make inferences about the entire population. Confidence intervals help you gauge the amount of uncertainty in your sample.
Confidence Intervals and P Values Always Agree on Statistical Significance
If you want to determine whether your test results are statistically significant, you can use either P values with significance levels or confidence intervals. These two approaches always agree.
The relationship between the confidence level and the significance level for a hypothesis test is as follows:
Confidence level = 1 - significance level (alpha)
For example, if your significance level is 0.05, the equivalent confidence level is 95%.
Both of the following conditions represent a hypothesis test with statistically significant results:
- The P value is smaller than the significance level.
- The confidence interval excludes the null hypothesis value.
Further, it is always true that when the P value is less than your significance level, the interval excludes the value of the null hypothesis.
In the fuel cost example, our hypothesis test results are statistically significant because the P value (0.03112) is less than the significance level (0.05). Likewise, the 95% confidence interval [267 394] excludes the null hypothesis value (260). Using either method, we draw the same conclusion.
Why They Always Agree
The P value and confidence interval results always agree. To understand the basis of this agreement, we need to remember how confidence levels and significance levels function:
- A confidence level determines the distance between the sample mean and the confidence limits.
- A significance level determines the distance between the null hypothesis value and the critical regions.
Both of these concepts specify a distance from the mean to a limit. Surprise! These distances are precisely the same length.
A 1-sample t-test calculates this distance as follows:
The critical t-value * standard error of the mean
Interpreting these statistics goes beyond the scope of this article. But, using this equation, the distance for our fuel cost example is $63.57.
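To make the formula concrete, the sketch below works backward from the numbers quoted in this post. The critical t-value of 2.0639 is the tabulated two-sided 95% value for 24 degrees of freedom (N = 25). The standard error is not stated in the post, so the code infers it from the $63.57 distance; treat that inferred value as approximate.

```python
SAMPLE_MEAN = 330.6      # from the fuel cost example
MARGIN = 63.57           # distance quoted in the post
N = 25
T_CRIT_95_DF24 = 2.0639  # tabulated two-sided 95% critical t-value, df = N - 1

# distance = critical t-value * standard error, so the implied standard error is:
implied_sem = MARGIN / T_CRIT_95_DF24

# The confidence limits sit one distance on either side of the sample mean.
ci_lower = SAMPLE_MEAN - MARGIN
ci_upper = SAMPLE_MEAN + MARGIN

print(f"Implied standard error of the mean: {implied_sem:.2f}")
print(f"95% confidence interval: [{ci_lower:.0f} {ci_upper:.0f}]")
```

If you have SciPy available, `scipy.stats.t.ppf(0.975, df=24)` returns the same critical value without a lookup table.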
P value and significance level approach: If the sample mean is more than $63.57 from the null hypothesis mean, the sample mean falls within the critical region and the difference is statistically significant.
Confidence interval approach: If the null hypothesis mean is more than $63.57 from the sample mean, the interval does not contain this value, and the difference is statistically significant.
Of course, they always agree!
As long as the P values and confidence intervals are generated by the same hypothesis test, and you use an equivalent confidence level and significance level, the two approaches always agree.
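The agreement can be checked directly with the numbers from the fuel cost example. In the sketch below, the sample mean (330.6), the distance (63.57), and the null value of 260 come from this post; the other null values and the function names are hypothetical, added only to show the two rules side by side. Landing in the critical region is the same condition as having a P value below the significance level, so the first function stands in for the P value approach.

```python
SAMPLE_MEAN = 330.6  # from the fuel cost example
MARGIN = 63.57       # critical t-value * standard error, as quoted in the post


def significant_by_critical_region(null_value):
    """Test approach: is the sample mean more than MARGIN away from the null value?"""
    return abs(SAMPLE_MEAN - null_value) > MARGIN


def significant_by_interval(null_value):
    """Interval approach: does the 95% confidence interval exclude the null value?"""
    lower, upper = SAMPLE_MEAN - MARGIN, SAMPLE_MEAN + MARGIN
    return not (lower <= null_value <= upper)


# 260 is the null value from the post; the others are hypothetical alternatives.
for null_value in (260, 267, 268, 330, 394, 395):
    by_test = significant_by_critical_region(null_value)
    by_interval = significant_by_interval(null_value)
    print(f"null = {null_value}: test says {by_test}, interval says {by_interval}")
```

Both functions measure the same gap between the sample mean and the null value against the same distance, which is why the two columns match for every null value you try.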
I Really Like Confidence Intervals!
In statistics, more emphasis is placed on using P values to determine whether a result is statistically significant. Unfortunately, an effect that is statistically significant might not always be practically significant. For example, a significant effect can be too small to be of any importance in the real world.
You should always consider both the size and precision of the estimated effect. Ideally, an estimated effect is both large enough to be meaningful and sufficiently precise for you to trust. Confidence intervals allow you to assess both of these considerations! Learn more about this distinction in my post about Practical vs. Statistical Significance.
To see an alternative to traditional confidence intervals that does not use probability distributions and test statistics, learn about bootstrapping in statistics! In that post, I create bootstrapped confidence intervals.