Intervals are estimation methods in statistics that use sample data to produce ranges of values that are likely to contain the population value of interest. In contrast, point estimates are single value estimates of a population value. Of the different types of statistical intervals, confidence intervals are the most well-known. However, certain kinds of analyses and situations call for other types of ranges that provide different information.
In this post, I’ll compare confidence intervals, prediction intervals, and tolerance intervals, so you’ll know when to use each type. I’ll include an example of each type of range to make them easier to understand!
What are Confidence Intervals?
Confidence interval calculations take sample data and produce a range of values that likely contains the population parameter that you are interested in. For example, a confidence interval for the mean of [9, 11] suggests that the population mean is likely to be between 9 and 11.
Different random samples drawn from the same population are liable to produce slightly different confidence intervals. If you collect numerous random samples from the same population and calculate a confidence interval for each sample, a certain proportion of the ranges contain the population parameter. That percentage is the confidence level.
For example, a 95% confidence level indicates that if you draw 20 random samples from the same population, you’d expect about 19 of the 20 confidence intervals, on average, to include the population value. The confidence interval procedure is useful because it produces ranges that usually contain the parameter.
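To see the "19 out of 20" idea in action, here is a minimal simulation sketch. It is an illustration under stated assumptions, not a recipe from any particular software: the population mean (110), standard deviation (15), and sample size (30) are made-up values, and the population standard deviation is treated as known so that only the standard library is needed. In practice you would usually estimate it from the sample and use a t-based interval.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(n_samples=2000, n=30, mu=110.0, sigma=15.0, level=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean.

    mu, sigma, and n are hypothetical values chosen only for illustration.
    """
    rng = random.Random(seed)
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Simplification: sigma is treated as known, so every interval has the same width.
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = mean(sample)
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / n_samples

rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land close to the confidence level, although it will wobble a little from one seed to the next because the simulation itself is a random sample of intervals.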
Use confidence intervals to produce ranges for all types of population parameters. A confidence interval for a population mean is probably the most common type, but you can also use these ranges for the standard deviation, proportions, rates of occurrence, regression coefficients, and the differences between populations.
Example of a Confidence Interval
Suppose that you randomly sample a product and measure its strength, and the resulting 95% confidence interval for the mean is 100 – 120 units. You can be 95% confident that the mean strength of the entire population falls within this range. However, the 95% confidence level does not indicate that 95% of observations fall within this range. To draw that type of conclusion, we need to use a different kind of interval.
Here are some important considerations for confidence intervals.
- As you draw larger and larger random samples from the same population, the confidence intervals tend to become narrower.
- As you increase the confidence level for a given sample, say from 95% to 99%, the range becomes wider. At first, this fact might seem counter-intuitive, but think about it. To have greater confidence that an interval contains the parameter, it makes sense that the range must become wider. Conversely, a narrower range is less likely to include the parameter, which lowers your confidence.
- A confidence interval for the mean says nothing about the dispersion of values around the mean.
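The first two considerations above can be checked with a few lines of arithmetic. The sketch below is a rough illustration only: the function name ci_half_width and the standard deviation of 10 are hypothetical, and it uses a large-sample normal approximation, whereas statistical software would typically use the t-distribution.

```python
from statistics import NormalDist

def ci_half_width(sd, n, level=0.95):
    """Approximate half-width of a confidence interval for a mean.

    Large-sample normal approximation; an assumption made for simplicity.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sd / n ** 0.5

sd = 10.0  # hypothetical sample standard deviation

# Larger samples give narrower intervals at the same confidence level.
for n in (25, 100, 400):
    print(f"n={n:>3}, 95% half-width: {ci_half_width(sd, n):.2f}")

# Higher confidence levels give wider intervals for the same sample.
for level in (0.90, 0.95, 0.99):
    print(f"n=100, {level:.0%} half-width: {ci_half_width(sd, 100, level):.2f}")
```

Notice that the sample size enters through a square root, so quadrupling the sample only halves the width.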
For a graphical representation that makes these concepts more intuitive, please read my blog post: How Confidence Intervals and Confidence Levels Work.
What Are Prediction Intervals?
After you fit a regression model, you can obtain prediction intervals. These intervals predict the value of the dependent variable given specific settings of the independent variables. I’ll cover two types of prediction intervals that provide different types of predictions.
Confidence interval of the prediction
A confidence interval of the prediction is a range that likely contains the mean value of the dependent variable given specific values of the independent variables. Like regular confidence intervals, these intervals provide a range for the population average. In this case, it’s a particular population defined by the values of your independent variables. Similarly, these ranges don’t tell you anything about the spread of the individual data points around the population mean.
Going back to our product strength example, let’s assume it is a plastic product, and our independent variables are the plastic type (A or B) and the processing temperature. After we fit our model, the statistical software can produce the confidence interval of the prediction for specific settings.
We want to predict the mean strength for our product if we use plastic type A with a processing temperature of 125 degrees Celsius. The resulting confidence interval of the prediction is 140 – 150. These results indicate we can be 95% confident that the population defined by plastic type A and 125C has a mean that falls within this range. However, it provides no indication of the distribution of strength values for individual products.
Prediction interval
A prediction interval is a range that likely contains the value of the dependent variable for a single new observation given specific values of the independent variables. With this type of interval, we’re predicting ranges for individual observations rather than the mean value.
Let’s use the same model and the same values that we used above. The statistical software produces a prediction interval of 130 – 160. We can be 95% confident that the strength of the next individual item produced using our settings will fall within this range.
There is greater uncertainty when you predict an individual value rather than the mean value. Consequently, for the same settings and confidence level, a prediction interval is always wider than the confidence interval of the prediction.
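To show where the extra width comes from, here is a simplified sketch with a single predictor (temperature only). The plastic example in this post also includes plastic type, so this is not the same model. The data values are invented for illustration, the function name intervals_at is hypothetical, and the formulas are the standard textbook ones for simple linear regression under the usual assumptions (normally distributed errors with constant variance).

```python
from statistics import mean

# Hypothetical data: processing temperature (C) and measured strength.
temps = [100, 105, 110, 115, 120, 125, 130, 135, 140, 145]
strengths = [118, 124, 123, 131, 136, 134, 142, 147, 145, 153]

def intervals_at(x0, x, y, t_crit):
    """Return (fit, CI half-width, PI half-width) at x0 for a one-predictor model."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = (sse / (n - 2)) ** 0.5
    fit = intercept + slope * x0
    # Uncertainty in the estimated mean response at x0.
    mean_term = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s * mean_term ** 0.5
    # The extra "1 +" accounts for the scatter of one new item around the mean.
    pi_half = t_crit * s * (1 + mean_term) ** 0.5
    return fit, ci_half, pi_half

# 2.306 is the two-sided 95% t critical value for 10 - 2 = 8 degrees of freedom.
fit, ci_half, pi_half = intervals_at(125, temps, strengths, t_crit=2.306)
print(f"CI of the prediction: {fit - ci_half:.1f} to {fit + ci_half:.1f}")
print(f"Prediction interval:  {fit - pi_half:.1f} to {fit + pi_half:.1f}")
```

Both intervals share the same center and the same critical value. The only difference is the extra term under the square root, which is why the prediction interval cannot be narrower.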
We can predict the range for an individual observation, but we need a model. For more information, read my post about using regression to make predictions.
What Are Tolerance Intervals?
Use tolerance intervals to answer the question, “what range of values covers X% of the population?” If you want to know the range where most values fall, use a tolerance interval.
A tolerance interval is a range that likely contains a specific proportion of a population. For example, you might want to know where 99% of the population falls for a particular characteristic. With tolerance intervals, we are specifically dealing with the spread of individual values around the mean.
To create a tolerance interval, you need to specify both the confidence level and the proportion. The confidence level is required because we’re still working with samples and their inherent uncertainties.
For example, we want to create a tolerance interval where we’ll be 95% confident that the interval contains 99% of the population.
I think it’s a lot easier to understand tolerance intervals using an example!
Example of a tolerance interval
As the plastic manufacturer, we need to know the strength of our product. However, we need to know more than just the mean strength. It’s important to understand the distribution of the individual values around the average.
For instance, the mean strength can be higher than our minimum requirement, which sounds great. However, if the spread around the average is too broad, too many products can fall below the minimum required strength.
To create a tolerance interval, we’ll start by randomly sampling 100 plastic products and recording their strengths. Download the CSV data file: Strength. Here is the statistical output for tolerance intervals.
Tolerance intervals are sensitive to the distribution of the data. In the output, the normality test indicates that our plastic strength data are normally distributed. Therefore, we’ll use the Normal interval, which is 110 – 140 (rounded values). We can be 95% confident that at least 99% of all strength values for the product will be between 110 and 140.
How do we use these tolerance interval results? As the manufacturer, we need to compare the tolerance limits to our client’s requirements. If our tolerance interval extends beyond the client’s limits, we can’t be confident that 99% of our products meet the requirements, and the production process is likely producing too many defects.
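For readers who want to see roughly how a Normal tolerance interval is put together, here is a minimal sketch using only the standard library. It is not the method behind the output discussed above: it combines two approximations (Howe's formula for the two-sided tolerance factor and the Wilson-Hilferty approximation for the chi-square quantile), the function names are hypothetical, and the data are simulated stand-ins rather than the Strength file.

```python
import random
from statistics import NormalDist, mean, stdev

def chi2_quantile_wh(q, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(q)
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

def tolerance_factor(n, proportion=0.99, confidence=0.95):
    """Howe's approximate two-sided k factor for normally distributed data."""
    z = NormalDist().inv_cdf((1 + proportion) / 2)
    df = n - 1
    chi2 = chi2_quantile_wh(1 - confidence, df)  # lower-tail quantile
    return z * (df * (1 + 1 / n) / chi2) ** 0.5

def tolerance_interval(data, proportion=0.99, confidence=0.95):
    """Approximate interval: sample mean plus or minus k sample standard deviations."""
    k = tolerance_factor(len(data), proportion, confidence)
    m, s = mean(data), stdev(data)
    return m - k * s, m + k * s

# Simulated stand-in for 100 strength measurements (made-up mean and spread).
rng = random.Random(0)
sample = [rng.gauss(125, 5) for _ in range(100)]

k = tolerance_factor(len(sample))
lo, hi = tolerance_interval(sample)
print(f"k factor: {k:.3f}")
print(f"95%/99% tolerance interval: {lo:.1f} to {hi:.1f}")
```

For 100 observations, a 99% proportion, and 95% confidence, the factor comes out at roughly 2.93 standard deviations. If the mean and standard deviation were known exactly, about 2.58 would be enough, so the difference is the price of estimating them from a sample.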
Tolerance Intervals vs Confidence Intervals
To help distinguish confidence intervals from tolerance intervals, here are some key differences.
A confidence interval for the mean estimates only the mean, and the sampling error alone determines its width. As the sample size approaches the whole population, the sampling error decreases and the width of the CI approaches zero as it converges on the single value of the population mean.
A tolerance interval reflects the spread of values around the average. Both the sampling error and the dispersion of values in the entire population determine the widths of these ranges. As the sample size approaches the whole population, tolerance intervals don’t converge on a zero width. Instead, they converge on the actual width of the population associated with the percentage you specify.
The width is based on percentiles. For example, to determine where 99% of the population lies, the software determines the data values that correspond to the 99.5th percentile and the 0.5th percentile (99.5 – 0.5 = 99% of the population). Tolerance interval calculations factor in the sampling error associated with the sample estimates of the percentiles.
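The contrast in convergence can be sketched numerically. The snippet below expresses both half-widths as multiples of the standard deviation, assuming normally distributed data. As before, it is an approximation for illustration only: the function names are hypothetical, the confidence interval uses a large-sample normal critical value, and the tolerance factor uses Howe's formula with a Wilson-Hilferty chi-square quantile.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def ci_multiplier(n, level=0.95):
    """Half-width of a CI for the mean, in units of the standard deviation."""
    return STD_NORMAL.inv_cdf(0.5 + level / 2) / n ** 0.5

def tolerance_multiplier(n, proportion=0.99, confidence=0.95):
    """Approximate two-sided tolerance factor (Howe with Wilson-Hilferty)."""
    df = n - 1
    z_alpha = STD_NORMAL.inv_cdf(1 - confidence)
    chi2 = df * (1 - 2 / (9 * df) + z_alpha * (2 / (9 * df)) ** 0.5) ** 3
    z_p = STD_NORMAL.inv_cdf((1 + proportion) / 2)
    return z_p * (df * (1 + 1 / n) / chi2) ** 0.5

# Distance from the mean to the 99.5th percentile of a normal population.
limit = STD_NORMAL.inv_cdf(0.995)

for n in (30, 100, 1_000, 10_000, 1_000_000):
    print(f"n={n:>9,}  CI: {ci_multiplier(n):.4f}  tolerance: {tolerance_multiplier(n):.4f}")
print(f"Population percentile limit: {limit:.4f}")
```

As n grows, the confidence interval multiplier keeps shrinking toward zero, while the tolerance multiplier levels off near the population percentile limit instead of collapsing.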
Tolerance intervals can help you identify cases where excess variation can cause problems. Compare your requirements to the tolerance intervals to determine whether excessive variation is a problem for your study area.
Confidence intervals are the most well-known ranges in statistics. However, you might need to use a different type of range based on your specific needs.