What is the Empirical Rule?
The empirical rule in statistics, also known as the 68 95 99 rule, states that for normal distributions, 68% of observed data points will lie inside one standard deviation of the mean, 95% will fall within two standard deviations, and 99.7% will occur within three standard deviations.
Thanks to the empirical rule, the mean and standard deviation become extra valuable when you reasonably expect that your data approximate a normal distribution. Simply knowing these two statistics allows you to calculate probabilities and percentages for various outcomes.
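As a quick check on where the three percentages come from, the sketch below computes the share of a normal distribution that lies within k standard deviations of the mean. It relies on the standard identity P(|Z| < k) = erf(k / √2). The function name `share_within` is made up for this illustration.

```python
import math

def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the share for one, two, and three standard deviations.
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```

The empirical rule rounds these values to 68%, 95%, and 99.7%.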
The name of the empirical rule comes from empirical research, which uses observations and measurements of real-world outcomes rather than theory. In other words, empirical means it is grounded in practical reality. The empirical rule takes these recorded outcomes and lets you use them to make forecasts and calculate probabilities.
Statisticians also refer to the empirical rule as the three-sigma rule because nearly all observations occur within three standard deviations of the mean. Statistical control charts apply this rule by setting their upper and lower control limits at +/- three standard deviations. In general, this limit serves as a valuable way to identify outliers because 99.7% of all values should fall within it.
The empirical rule graph below displays the standard normal distribution with the ranges and percentages.
The graph makes it clear why it is also known as the 68 95 99 rule. Those numbers are the percentages that correspond to the standard deviation ranges.
In this post, learn how to use the empirical rule and the formula for calculating the data ranges, and work through examples to solve problems.
Related post: Understanding the Normal Distribution
How to Use the Empirical Rule
Analysts use the empirical rule to predict the probabilities and distributions of the outcomes that they’re studying. It’s a valuable tool because it lets you make predictions using just two easy-to-calculate statistics. First, verify that your data follow a normal distribution at least roughly. If they do, you can start making forecasts by calculating the mean and standard deviation.
Many organizations use the empirical rule as a quality control method because many variables approximately follow the normal distribution, and it’s easy to calculate the mean and standard deviation. Similarly, value-at-risk (VaR) financial risk assessments often assume that the probabilities for outcomes follow a normal distribution. In short, when the normality assumption holds, the empirical rule is a quick and easy prediction method that provides good results.
Additionally, the empirical rule is an easy way to identify outliers. Because 99.7% of all observations should be within three standard deviations of the mean, analysts frequently use the limit of three standard deviations to identify outliers. Investigate observations outside this limit as potential outliers.
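The three-sigma screen described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive method: the function name `three_sigma_outliers` and the sample measurements are made up, and it uses the sample standard deviation. Note that an extreme value inflates the standard deviation itself, so this screen needs a reasonably large sample to flag anything.

```python
import statistics

def three_sigma_outliers(values, k=3.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    lower, upper = mean - k * sd, mean + k * sd
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements: 19 values near 10 and one suspicious reading.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9,
                10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 25.0]

print(three_sigma_outliers(measurements))  # [25.0]
```

Treat flagged values as candidates to investigate, not as values to delete automatically.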
Learn how the empirical rule works behind the scenes of the range rule of thumb, which allows you to use the range to estimate the standard deviation.
Related post: Five Ways to Identify Outliers
The empirical rule is also a simple normality test. Based on the probabilities, you know that 99.7% of all observations should fall within three standard deviations from the mean. Therefore, only 100% – 99.7% = 0.3% should be outside the limit for a normal distribution (more precisely, about 0.27%). If too many values fall outside this limit, your data might not follow a normal distribution. Using the three-sigma limit of the empirical rule, you’d expect about 1 in every 370 observations to exceed the limit. Consequently, if you have 500 observations and 10 (2%) are outside the empirical rule limit, your data might not be normally distributed.
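To make the arithmetic in this example concrete, the sketch below compares the observed share outside the three-sigma limit with the share a normal distribution would produce. The names `EXPECTED_OUTSIDE` and `outside_share_report` are made up for this illustration, and the comparison is an informal screen, not a formal hypothesis test.

```python
import math

# Share of a normal distribution beyond three standard deviations (both tails).
EXPECTED_OUTSIDE = 1 - math.erf(3 / math.sqrt(2))  # about 0.0027, roughly 1 in 370

def outside_share_report(n_obs: int, n_outside: int):
    """Return the observed share outside the limit and the count expected under normality."""
    observed_share = n_outside / n_obs
    expected_count = n_obs * EXPECTED_OUTSIDE
    return observed_share, expected_count

# The example from the text: 500 observations, 10 outside the limit.
observed_share, expected_count = outside_share_report(500, 10)
print(f"observed share outside: {observed_share:.1%}")  # 2.0%
print(f"expected count outside: {expected_count:.2f}")  # 1.35
```

Seeing 10 values outside the limit when only one or two are expected is a signal to look more closely at the distribution.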
Outliers vs. Non-Normal Data
As an analyst using the empirical rule, you must distinguish between outliers and non-normal distributions. Both conditions can cause an unusual number of data points to lie outside the three-sigma limit. For example, the observations might be valid but follow a skewed distribution, which can create the appearance of outliers. To sort through this question, you’ll need to evaluate your data carefully, determine how it’s distributed, assess the data points in question, and apply a large amount of subject area knowledge.
Related post: How to Identify the Distribution of Your Data
Empirical Rule Formula (68 95 99 Rule)
To calculate the data ranges associated with the empirical rule percentages of 68%, 95%, and 99.7%, start by calculating the sample mean (x̅) and standard deviation (s). Then input those values into the formulas below to derive the ranges.
| Data range | Percentage of data in the range |
| --- | --- |
| x̅ − s to x̅ + s | 68% |
| x̅ − 2s to x̅ + 2s | 95% |
| x̅ − 3s to x̅ + 3s | 99.7% |
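
The formulas in the table translate directly into code. The sketch below is one way to compute the three ranges from raw data using the sample mean and sample standard deviation. The function name `empirical_rule_ranges` and the example values are made up for this illustration.

```python
import statistics

def empirical_rule_ranges(values):
    """Return the 68%, 95%, and 99.7% ranges as (lower, upper) tuples."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return {
        label: (mean - k * s, mean + k * s)
        for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }

# Hypothetical sample; replace with your own roughly normal data.
sample = [48.2, 51.5, 49.9, 50.4, 47.8, 52.1, 50.0, 49.3, 50.8, 51.0]

for label, (lower, upper) in empirical_rule_ranges(sample).items():
    print(f"{label}: {lower:.2f} to {upper:.2f}")
```

The ranges are only meaningful when the data approximately follow a normal distribution.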
Use my free online Empirical Rule Calculator to quickly find the solutions to the example problems below. The calculator also has a standalone page, which you might want to bookmark.
