Chebyshev’s Theorem gives the minimum proportion of observations that fall within a specified number of standard deviations of the mean. This theorem applies to a broad range of probability distributions. Chebyshev’s Theorem is also known as Chebyshev’s Inequality.
If you have a mean and standard deviation, you might need to know the proportion of values that lie within, say, plus and minus two standard deviations of the mean. If your data follow the normal distribution, that’s easy using the Empirical Rule! However, what if you don’t know the distribution of your data or you know that it doesn’t follow the normal distribution? In that case, Chebyshev’s Theorem can help you out!
In this post, learn why Chebyshev’s theorem is valuable and how to use it to solve problems. Additionally, I’ll compare the theorem to the Empirical Rule, which serves a similar purpose.
Equation for Chebyshev’s Theorem
Chebyshev’s Theorem helps you determine where most of your data fall within a distribution of values. This theorem provides helpful results when you have only the mean and standard deviation. You do not need to know the distribution your data follow.
There are two forms of the equation. One determines how close to the mean the data lie and the other calculates how far away from the mean they fall:
|$\frac{1}{k^2}$||Maximum proportion of observations that are more than k standard deviations from the mean|
|$1 - \frac{1}{k^2}$||Minimum proportion of observations that are within k standard deviations of the mean|
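Where k equals the number of standard deviations in which you are interested. The value of k must be greater than 1; for k of 1 or less, the bound allows up to 100% of the observations to fall outside the range, so it tells you nothing.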
As you can see, it’s a fairly straightforward equation.
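To make the two forms concrete, here is a minimal Python sketch of both. The function names are my own, chosen for illustration, and are not from any library. The bounds are 1/k² for the maximum proportion outside and 1 − 1/k² for the minimum proportion within.

```python
def max_proportion_outside(k: float) -> float:
    """Upper bound on the share of observations more than k SDs from the mean."""
    if k <= 1:
        # For k <= 1 the bound is 100% or more, so it carries no information.
        raise ValueError("k must be greater than 1 for an informative bound")
    return 1.0 / k**2


def min_proportion_within(k: float) -> float:
    """Lower bound on the share of observations within k SDs of the mean."""
    return 1.0 - max_proportion_outside(k)


# Print both bounds for a few values of k.
for k in (1.5, 2, 3):
    print(f"k={k}: at least {min_proportion_within(k):.0%} within, "
          f"at most {max_proportion_outside(k):.0%} outside")
```

The two functions always sum to 1 for the same k, which is why the two forms of the equation carry the same information.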
Using Chebyshev’s Theorem
By entering values for k into the equation, I’ve created the table below that displays proportions for various standard deviations.
|Standard Deviations||Minimum % within||Max % outside|
|1.41||50%||50%|
|1.5||56%||44%|
|2||75%||25%|
|3||89%||11%|
For example, if you’re interested in a range of three standard deviations around the mean, Chebyshev’s Theorem states that at least 89% of the observations fall inside that range, and no more than 11% fall outside that range.
A crucial point to notice is that Chebyshev’s Theorem produces minimum and maximum proportions. For example, at least 56% of the observations fall inside 1.5 standard deviations, and a maximum of 44% fall outside.
The theorem does not provide exact answers, but it places limits on the possible proportions. For the example above, more than 56% of the observations can lie within 1.5 standard deviations of the mean.
The minimum and maximum proportions arise due to uncertainties about the data’s distribution. The theorem is valuable because it applies to all distributions, but that generality also limits the precision of the results.
Suppose you know a dataset has a mean of 100 and a standard deviation of 10, and you’re interested in a range of ± 2 standard deviations. Two standard deviations equal 2 × 10 = 20. Consequently, Chebyshev’s Theorem tells you that at least 75% of the values fall between 100 ± 20, equating to a range of 80 – 120. Conversely, no more than 25% fall outside that range.
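The same arithmetic can be sketched in a few lines of Python. The helper name `chebyshev_interval` is hypothetical; it simply turns a mean, a standard deviation, and k into the range and the guaranteed minimum proportion.

```python
def chebyshev_interval(mean: float, sd: float, k: float):
    """Return (low, high, minimum proportion inside) for mean ± k standard deviations."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if k <= 1:
        raise ValueError("k must be greater than 1 for an informative bound")
    half_width = k * sd                      # e.g. 2 * 10 = 20
    return mean - half_width, mean + half_width, 1.0 - 1.0 / k**2


low, high, share = chebyshev_interval(mean=100, sd=10, k=2)
print(f"At least {share:.0%} of values fall between {low} and {high}")
```

With the numbers from the example, the sketch reproduces the range of 80 to 120 and the 75% minimum.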
An interesting range is ± 1.41 standard deviations, where 1.41 is approximately √2, the value of k at which 1 − 1/k² equals one half. With that range, you know that at least half the observations fall within it, and no more than half fall outside of it. If we use a mean of 100 and a standard deviation of 10 again, 1.41 standard deviations equal 14.1. Hence, at least half the values lie in the range 100 ± 14.1, or 85.9 – 114.1.
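You can also run the equation in reverse: pick the minimum proportion p you want and solve 1 − 1/k² = p for k, which gives k = 1/√(1 − p). The sketch below assumes that rearrangement; the function name is illustrative only.

```python
import math


def k_for_min_proportion(p: float) -> float:
    """Number of standard deviations needed to guarantee at least proportion p."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return 1.0 / math.sqrt(1.0 - p)


# Guaranteeing at least half the observations needs k = sqrt(2), about 1.41.
print(round(k_for_min_proportion(0.5), 2))
```

Asking for a higher guaranteed proportion pushes k up quickly, because the bound has to hold for every distribution.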
Suppose a class takes a test. The average score is 75 and the standard deviation is 5. What is the minimum proportion of scores that fall between 65 and 85?
The mean is 75. 65 is 10 points below the mean and 85 is 10 points above the mean. The standard deviation is 5. Consequently, you want to determine the proportion of scores that fall within 10 / 5 = 2 standard deviations of the mean. Using the table above, you know that at least 75% of the scores will fall within the range of 65 – 85.
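The steps in this problem can be sketched as a small Python function. One assumption goes beyond the post: when the limits are not the same distance from the mean, the sketch uses the nearer limit, which still gives a valid but more conservative minimum. For the symmetric range here, that choice makes no difference. The function name is hypothetical.

```python
def min_share_between(mean: float, sd: float, low: float, high: float) -> float:
    """Chebyshev lower bound on the proportion of values between low and high."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    # Distance from the mean to the nearer limit, in standard deviations.
    # Assumption: for asymmetric limits, the nearer one gives a conservative bound.
    k = min(mean - low, high - mean) / sd
    if k <= 1:
        return 0.0   # the theorem guarantees nothing at or below 1 SD
    return 1.0 - 1.0 / k**2


# Test scores: mean 75, SD 5, range 65 to 85 gives k = 10 / 5 = 2.
print(min_share_between(mean=75, sd=5, low=65, high=85))
```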
Chebyshev’s Theorem compared to The Empirical Rule
The Empirical Rule also describes the proportion of data that fall within a specified number of standard deviations from the mean. However, there are several crucial differences between Chebyshev’s Theorem and the Empirical Rule.
Chebyshev’s Theorem applies to all probability distributions where you can calculate the mean and standard deviation. On the other hand, the Empirical Rule applies only to the normal distribution.
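As you saw above, Chebyshev’s Theorem provides bounds on the proportions rather than exact values. Conversely, the Empirical Rule provides specific proportions (approximately 68%, 95%, and 99.7% within one, two, and three standard deviations) because the data are known to follow the normal distribution.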
The table below compares the results from both methods for the proportions of data falling within the specified number of standard deviations.
|Standard Deviations||Empirical Rule||Chebyshev’s Theorem|
|1||68%||N/A|
|2||95%||At least 75%|
|3||99.7%||At least 89%|
Again, notice that the Empirical Rule provides specific proportions for normally distributed data, while Chebyshev’s Theorem gives only a guaranteed minimum.
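If you’d like to check the comparison yourself, the sketch below computes both columns using only Python’s standard library. It assumes the standard result that a normal distribution places erf(k/√2) of its values within k standard deviations of the mean; the function names are my own.

```python
import math


def normal_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


def chebyshev_min_within(k: float) -> float:
    """Chebyshev lower bound; reported as 0 when k <= 1, where it is uninformative."""
    if k <= 1:
        return 0.0
    return 1.0 - 1.0 / k**2


for k in (1, 2, 3):
    print(f"k={k}: normal {normal_within(k):.1%}, "
          f"Chebyshev at least {chebyshev_min_within(k):.1%}")
```

The normal figures land close to the familiar 68%, 95%, and 99.7%, while the Chebyshev minimums sit well below them. That gap is the price of making no assumption about the distribution.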
If you know that your data follow the normal distribution, use the Empirical Rule. Otherwise, Chebyshev’s Theorem might be your best choice!
For more information about the Empirical Rule, read my post about The Normal Distribution, which discusses it.