What is the Range Rule of Thumb?
The range rule of thumb allows you to estimate the standard deviation of a dataset quickly. This process is not as accurate as the actual calculation for the standard deviation, but it’s so simple you can do it in your head.
Use this method when you need a rough estimate quickly or have a dataset summary that doesn’t provide enough information to calculate the actual standard deviation.
The range of a dataset is simply the maximum value minus the minimum value. So, you can estimate the StDev knowing only those two values.
The range and standard deviation are both measures of variability, but there is no precise mathematical relationship between the two. However, you can use the range to estimate the standard deviation.
Range Rule of Thumb Formula
The range rule of thumb formula is the following:
Subtract the smallest value in a dataset from the largest and divide the result by four to estimate the standard deviation.
In other words, the StDev is roughly ¼ the range of the data.
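Written as an equation, the rule looks like the following. The symbols are notation chosen here for illustration, not taken from the original: s is the estimated standard deviation, and x_max and x_min are the largest and smallest values in the dataset.

```latex
s \approx \frac{x_{\max} - x_{\min}}{4}
```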
Let’s apply the range rule of thumb formula to actual data. From a research study I helped run, I have a dataset containing 88 heights. Download the Excel data file: RangeRuleofThumb.
Excel’s descriptive statistics say the height data have the following properties:
- Mean: 1.51m
- Standard Deviation: 0.07m
- Maximum: 1.66m
- Minimum: 1.33m
Now suppose you’re reading a summary of the height dataset. While the summary includes the maximum and minimum values, pretend it doesn’t list the StDev. Let’s try it out!
(1.66 − 1.33) / 4 = 0.33 / 4 ≈ 0.08
Voila, you have estimated the StDev using two numbers from the dataset!
The range rule of thumb’s estimate (0.08) is close to the correct standard deviation of 0.07.
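If you'd rather script the calculation than do it in your head, a minimal Python sketch might look like this. The function name `range_rule_estimate` is made up for this example; the minimum, maximum, and reported standard deviation are the values from the height summary above.

```python
def range_rule_estimate(minimum, maximum):
    """Estimate the standard deviation as one quarter of the range."""
    return (maximum - minimum) / 4


# Values from the height summary above (in meters).
height_min = 1.33
height_max = 1.66
reported_stdev = 0.07

estimate = range_rule_estimate(height_min, height_max)
difference = estimate - reported_stdev

print(f"Range rule of thumb estimate: {estimate:.2f} m")
print(f"Reported standard deviation:  {reported_stdev:.2f} m")
print(f"Difference:                   {difference:+.2f} m")
```

Only the two extreme values go into the estimate, which is why it works from a summary that omits the rest of the data.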
How Does the Range Rule of Thumb Work?
It might seem odd that you can just divide the range by four to estimate the standard deviation. However, the range rule of thumb uses the properties of the normal distribution and the empirical rule.
When data follow the normal distribution, the empirical rule states that 95% of the values fall between the mean ± 2 StDevs.
Given this property, the vast majority of values in a sample fall within a four standard deviation spread that centers on the mean.
Therefore, the range of a dataset approximates the four StDev spread. Hence, dividing the range by four approximates one StDev.
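To check the 95% figure yourself, here is a small sketch using `NormalDist` from Python's standard `statistics` module. It assumes a perfectly normal distribution, which real samples only approximate.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of values that fall between the mean - 2 StDevs and the mean + 2 StDevs.
coverage = standard_normal.cdf(2.0) - standard_normal.cdf(-2.0)

# Width of that interval, measured in standard deviations.
spread_in_stdevs = 2.0 - (-2.0)

print(f"Share within mean ± 2 StDevs: {coverage:.4f}")
print(f"Width of that interval: {spread_in_stdevs:.0f} StDevs")
```

The interval is four standard deviations wide, which is where the divisor of four comes from.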
By understanding how it works, you can work out its limitations. The range rule of thumb:
- Works best with data that at least roughly follow a normal distribution.
- Is sensitive to outliers. One unusually high or low value can affect the estimate.
- Depends on the sample size. It tends to underestimate the StDev for very small samples and overestimate it for very large samples.
- Does not produce more precise estimates with larger sample sizes, unlike most estimates.
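To see the outlier sensitivity from the list above, here is a small sketch. The heights in it are invented for illustration and are not from the study dataset. It compares the rule of thumb with the sample standard deviation before and after adding one unusually large value.

```python
import statistics


def range_rule_estimate(values):
    """Estimate the standard deviation as one quarter of the range."""
    return (max(values) - min(values)) / 4


# Invented heights in meters, for illustration only.
heights = [1.45, 1.48, 1.50, 1.52, 1.55]
heights_with_outlier = heights + [1.90]  # one unusually tall value

estimate_before = range_rule_estimate(heights)
estimate_after = range_rule_estimate(heights_with_outlier)
stdev_before = statistics.stdev(heights)
stdev_after = statistics.stdev(heights_with_outlier)

print(f"Without outlier: rule of thumb = {estimate_before:.3f}, StDev = {stdev_before:.3f}")
print(f"With outlier:    rule of thumb = {estimate_after:.3f}, StDev = {stdev_after:.3f}")
```

Because the rule uses only the maximum and minimum, a single extreme value changes the estimate directly.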
What is the optimal size of the dataset to make this rule of thumb work best? Is it on the order of 100 sample points, as in the example?
Jim Frost says
I was curious about this myself, so I ran a simulation study. I drew random samples, ranging in size from 5 to 200, from a standard normal distribution. I calculated the actual standard deviation and the range rule of thumb estimate for each sample. Then I found the difference: Range Rule of Thumb – Actual StDev.
The Y-units are standard deviations. So, a difference of 1 indicates that the rule of thumb was higher than the actual standard deviation by 1 SD (or 100%), and -1 indicates it’s lower by 1 SD (-100%). Of course, no difference was as large as a full SD in either direction.
Below are the results.
There seems to be a sweet spot for sample sizes between 15 and 35. In this range, the rule of thumb is off by an overall average of 0.6%. The lower end of that range tends to underestimate by about 3%, while the upper end overestimates by about 4%. As your sample size moves further outside that range, smaller samples tend to underestimate by more and larger samples tend to overestimate by more.
Sample sizes around 100 (between 95 and 105) tend to overestimate by 23%. The largest samples in my study overestimated by an average of about 40%.
Also notice how larger sample sizes do not produce more precise estimates. They’re not only biased high (see above), but the vertical spread of estimates also increases as sample size increases. Usually with estimates, larger sample sizes will reduce the spread (greater precision).
The range rule of thumb only gives you a rough idea of the standard deviation. However, knowing the sample size can help you adjust its estimate. Also note that these results apply to data that follow a normal distribution. They’re likely to differ with other distributions.
My dataset has n=88. Based on the above, you’d expect the range rule of thumb estimate to be about 23% too high. In reality, it was 14.3% too high, but that’s well within the spread of values on the graph for a sample of that size.
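If you want to try a simulation like the one described in this reply, here is a rough sketch using only Python's standard library. It is not the code behind the graph. The sample sizes, number of trials, seed, and function names are all assumptions chosen for illustration, so the exact numbers will differ from those quoted above.

```python
import random
import statistics


def range_rule_estimate(values):
    """Estimate the standard deviation as one quarter of the range."""
    return (max(values) - min(values)) / 4


def mean_difference(sample_size, trials=1000, seed=0):
    """Average of (rule of thumb - sample StDev) over many normal samples."""
    rng = random.Random(seed)
    differences = []
    for _ in range(trials):
        # Draw one sample from a standard normal distribution.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        differences.append(range_rule_estimate(sample) - statistics.stdev(sample))
    return statistics.fmean(differences)


# Assumed sample sizes: small, near the sweet spot, around 100, and the largest.
results = {n: mean_difference(n) for n in (5, 25, 100, 200)}

for n, difference in results.items():
    print(f"n = {n:>3}: mean difference = {difference:+.3f} SD")
```

Because the samples come from a standard normal distribution, a mean difference of 0.1 here corresponds to roughly a 10% overestimate. Negative values indicate an underestimate.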