What is the Range Rule of Thumb?
The range rule of thumb allows you to estimate the standard deviation of a dataset quickly. This process is not as accurate as the actual calculation for the standard deviation, but it’s so simple you can do it in your head.
Use this method when you need a rough estimate quickly or have a dataset summary that doesn’t provide enough information to calculate the actual standard deviation.
The range of a dataset is simply the maximum value minus the minimum value. So, you can estimate the StDev knowing only those two values.
The range and standard deviation are both measures of variability, but there is no precise mathematical relationship between the two. However, you can use the range to estimate the standard deviation.
Range Rule of Thumb Formula
The range rule of thumb formula is the following:
Subtract the smallest value in a dataset from the largest and divide the result by four to estimate the standard deviation:
Estimated Standard Deviation = (Maximum − Minimum) / 4
In other words, the StDev is roughly ¼ the range of the data.
Example Calculations
Let’s apply the range rule of thumb formula to actual data. From a research study I helped run, I have a dataset containing 88 heights. Download the Excel data file: RangeRuleofThumb.
Excel’s descriptive statistics say the height data have the following properties:
- Mean: 1.51m
- Standard Deviation: 0.07m
- Maximum: 1.66m
- Minimum: 1.33m
Now suppose you’re reading a summary of the height dataset. While the summary includes the maximum and minimum values, pretend it doesn’t list the StDev. Let’s try it out!
Applying the range rule of thumb: (1.66 − 1.33) / 4 = 0.33 / 4 = 0.0825, which rounds to 0.08.
Voila, you have estimated the StDev using only two numbers from the dataset!
The range rule of thumb’s estimate (0.08) is close to the correct standard deviation of 0.07.
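If you’d like to verify this with code, here’s a minimal R sketch (my own, using only the summary values above) that applies the range rule of thumb to the height data’s maximum and minimum:

```r
# Summary statistics reported for the height data (in meters)
height_max <- 1.66
height_min <- 1.33

# Range rule of thumb: the StDev is roughly the range divided by four
range_rule_estimate <- (height_max - height_min) / 4
range_rule_estimate
#> [1] 0.0825
```

The result, 0.0825, rounds to the 0.08 estimate discussed above.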
How Does the Range Rule of Thumb Work?
It might seem odd that you can just divide the range by four to estimate the standard deviation. However, the range rule of thumb uses the properties of the normal distribution and the empirical rule.
When data follow the normal distribution, the empirical rule states that 95% of the values fall between the mean ± 2 StDevs.
Given these properties, virtually all values in a sample fall within a four standard deviation spread that centers on the mean.
Therefore, the range of a dataset approximates the four StDev spread. Hence, dividing the range by four approximates one StDev.
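As a quick illustration of this logic, here’s a small R sketch (mine, with hypothetical parameters chosen to resemble the height data) that draws a normal sample and compares the range-based estimate with the computed standard deviation:

```r
set.seed(1)  # for reproducibility

# Draw a modest normal sample (hypothetical mean and SD, roughly like the heights)
x <- rnorm(n = 25, mean = 1.5, sd = 0.07)

# Roughly 95% of values lie within mean +/- 2 SD, so the observed range spans
# about 4 SDs; dividing the range by 4 therefore approximates one SD.
c(range_rule = diff(range(x)) / 4, actual_sd = sd(x))
```

The two values won’t match exactly, and as the Limitations section and the comments below show, how close they are depends heavily on the sample size.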
Limitations
By understanding how it works, you can work out its limitations. The range rule of thumb:
- Works best with data that at least roughly follow a normal distribution.
- Is sensitive to outliers. One unusually high or low value can affect the estimate (see the sketch after this list).
- Depends on the sample size. Very small samples tend to underestimate, while very large samples overestimate.
- Does not produce more precise estimates with larger sample sizes, unlike most estimates.
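To see the outlier sensitivity in action, here’s a small R sketch (my own, with made-up numbers) that compares a well-behaved normal sample to the same sample with one unusually large value added:

```r
set.seed(2)  # for reproducibility

range_rule <- function(x) diff(range(x)) / 4  # range rule of thumb estimate

# A well-behaved normal sample (hypothetical parameters) and the same
# sample with one unusually large value appended
clean        <- rnorm(n = 30, mean = 1.5, sd = 0.07)
with_outlier <- c(clean, 1.95)

# Compare how the single unusual value affects the range rule estimate
# versus the computed standard deviation
rbind(clean        = c(range_rule = range_rule(clean),        actual_sd = sd(clean)),
      with_outlier = c(range_rule = range_rule(with_outlier), actual_sd = sd(with_outlier)))
```

Because the range depends entirely on the two most extreme values, a single unusual observation shows up directly in the estimate.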
Kenji says
Hello again,
The end of your answer to my comment piqued my curiosity, so I went and followed the other comment you referred me to and, while not being well versed in statistics, I tried to reproduce your instructions (while hoping I did everything correctly!).
So I dusted off my R-fu and, in my free time, generated 2 graphs (default and with the x-axis scaled to log10) of the same 3 plots with factors of 4 (blue), 6 (yellow), and 8 (red), but with the range of sample sizes extended from 5 up to a whopping 20,000!
I can tell from a quick glance that the rule of thumb is most accurate around 1,000 ± ~100 for the factor of 6 and around 20,000 for the factor of 8.
Kind regards.
Jim Frost says
Hi Kenji,
That’s awesome! The only way I could get your images to display was to include them in my comment, but thanks for doing that!
Kenji says
Hi Jim, very nice and comprehensive article!
I would like to know why this Rule of Thumb is generally tailored around 2 standard deviations (95% of values).
Can we use this Rule with a factor of 6 for the wider interval of ± 3 SD / 99.7%? If so, under which circumstances? If not, for which reasons?
Kind regards.
Jim Frost says
Hi Kenji,
That’s a great question! In a way, you could extend that thinking and say why not +/- 4 SDs, or +/- 5 SDs, and so on. As you increase the range, you’ll capture more and more of the observations.
The answer really comes down to sample size. While it’s unstated in the range rule of thumb, I did a simulation study in response to another question and posted it as a comment here. The range rule of thumb with +/- 2 SDs seems to work best with a sample size of between 15 and 35. Smaller samples underestimate the SD and larger samples overestimate it. As you increase the sample size, you’d need to increase the range to get an unbiased estimate. I haven’t done that simulation, but it makes sense. At any rate, click the link to see my conclusions!
Neil Higgs says
This is only a good rule of thumb for large values of n. For small values, use the Studentised Range distribution.
Jim Frost says
Hi Neil,
Actually, the range rule of thumb tends to overestimate the standard deviation for large sample sizes. For example, sample sizes around n = 100 tend to overestimate the standard deviation by an average of 23%. Look at my reply to Riccardo in the comments below for details and a graph. The sweet spot for the range rule of thumb’s accuracy seems to be right around n = 15 to 35.
Riccardo says
What is the optimal size of the dataset for this rule of thumb to work best? Is it on the order of 100 sample points, as in the example?
Jim Frost says
Hi Riccardo,
I was curious about this myself, so I ran a simulation study. I drew random samples from a standard normal distribution that ranged in size from 5 to 200. I calculated the actual standard deviation and the range rule of thumb estimate for each sample. Then I found the difference: Range Rule of Thumb – Actual StDev.
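For anyone who wants to try something similar, here’s a minimal R sketch of a simulation along these lines (a simplified sketch that draws a single sample per size, not the exact code behind the graph below):

```r
set.seed(3)  # for reproducibility

sizes <- 5:200  # sample sizes, as described above

# For each size, draw a standard normal sample and record the difference
# between the range rule of thumb estimate and the actual standard deviation
diffs <- sapply(sizes, function(n) {
  x <- rnorm(n)
  diff(range(x)) / 4 - sd(x)
})

plot(sizes, diffs,
     xlab = "Sample size",
     ylab = "Range rule estimate - actual StDev")
abline(h = 0, lty = 2)  # points above the dashed line are overestimates
```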
The Y-units are standard deviations. So, a difference of 1 indicates that the rule of thumb was higher than the actual standard deviation by 1 SD (or 100%). A value of -1 indicates it’s lower by 1 SD (-100%). Of course, no difference was as large as a full SD in either direction.
Below are the results.
There seems to be a sweet spot for samples between 15 and 35. In this range, the rule of thumb is off by an overall average of 0.6%. The lower end of that range tends to underestimate by about 3%, while the upper end overestimates by 4%. As your sample size moves further outside that range, smaller samples tend to underestimate by more and larger samples tend to overestimate by more.
Sample sizes around 100 (between 95 and 105) tend to overestimate by 23%. The largest samples in my study overestimated by an average of about 40%.
Also notice how larger sample sizes do not produce more precise estimates. They’re not only biased high (see above), but the vertical spread of estimates also increases as sample size increases. Usually with estimates, larger sample sizes will reduce the spread (greater precision).
The range rule of thumb only gives you a rough idea of the standard deviation. However, knowing the sample size can help you adjust its estimate. Also note that these results apply to data that follow a normal distribution. It’s likely to differ for other distributions.
My dataset has n=88. Based on the above, you’d expect the range rule of thumb estimate to be about 23% too high. In reality it was 14.3% too high, but it’s well within the spread of values on the graph for a sample of that size.