The range of a data set is the difference between the maximum and the minimum values. It measures variability using the same units as the data. Larger values represent greater variability.
The range is the easiest measure of dispersion to calculate and interpret in statistics, but it has some limitations. In this post, I’ll show you how to find the range mathematically and graphically, interpret it, explain its limitations, and clarify when to use it.
To find the range in statistics, take the largest value and subtract the smallest value from it.
Range = Highest value – Lowest value
The range cannot be negative because the formula always subtracts the smaller value from the larger one.
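As a minimal sketch, the formula translates directly into a few lines of Python using only the standard library. The function name `data_range` and the sample values are hypothetical, chosen for illustration.

```python
def data_range(values):
    """Return the range (highest value minus lowest value) of a non-empty sequence."""
    if not values:
        raise ValueError("the range is undefined for an empty dataset")
    # max() is never smaller than min(), so the result is never negative
    return max(values) - min(values)


# Hypothetical sample values, for illustration only
sample = [12, 7, 19, 3, 15]
print(data_range(sample))  # 19 - 3 = 16
```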
Related post: Measures of Variability
Example of Finding the Range
For example, in the worksheet below, Dataset 1 has a range of 38 – 20 = 18, while for Dataset 2 it is 52 – 11 = 41. Dataset 2 has a broader range and, therefore, is more variable than Dataset 1.
Conveniently, you can find the minimum, maximum, and range values in the descriptive statistics output from statistical software. Excel’s Descriptive Statistics function includes them, as shown below.
Related post: Descriptive Statistics in Excel
Finding the Range in Graphs
You can find data ranges in several types of graphs, including histograms, boxplots, and scatterplots. In the example graphs below, the red lines represent the ranges. The following graphical representations bring the concept to life. If you’re looking at a chart and don’t have the data, you’ll have to approximate the values visually.
In a histogram, the range is the width that the bars cover along the x-axis. These are approximate values because histograms display bin values rather than raw data values.
In these histograms, distribution A has an approximate range of 65 – 40 = 25 and for distribution C it is 90 – 20 = 70. Distribution C has a broader spread, and its extensive width in the graph illustrates this property.
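To see why a histogram gives only an approximate range, the sketch below bins hypothetical data and reads the range the way you would from the bars: from the left edge of the lowest occupied bin to the right edge of the highest one. The bin width of 5 and bins aligned on multiples of 5 are assumptions for this illustration; real software chooses its own bin edges.

```python
import random

# Hypothetical data: 200 draws from a normal distribution (mean 60, SD 8)
random.seed(1)
data = [random.gauss(60, 8) for _ in range(200)]

bin_width = 5  # assumed bin width for this illustration

# Left edge of the bin each value falls into (bins aligned on multiples of bin_width)
left_edges = {bin_width * (x // bin_width) for x in data}

# A histogram shows only which bins are occupied, so the visual estimate runs
# from the left edge of the lowest bar to the right edge of the highest bar.
approx_range = (max(left_edges) + bin_width) - min(left_edges)
exact_range = max(data) - min(data)

print(f"exact range:        {exact_range:.2f}")
print(f"histogram estimate: {approx_range:.2f}")
```

With these aligned bins, the estimate always overstates the exact range by less than two bin widths.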
Boxplots display the spread of each group within a dataset. In a boxplot, the range equals the distance from the end of the lower whisker to the end of the upper whisker. The whisker ends mark the minimum and maximum values unless there are outliers, which appear as individual points beyond the whiskers. Consequently, the whisker span in a boxplot excludes outliers.
In this boxplot, the scores for Method 3 spread from approximately 12 to 37, producing a range of 37 – 12 = 25. This group has the largest spread in the dataset. Conversely, Method 2 has the smallest spread: 30 – 20 = 10. Method 2 has an outlier (the asterisk), but the whiskers conveniently exclude it.
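The post doesn’t specify how the boxplot decides which points are outliers. Many packages use Tukey’s 1.5 × IQR rule, so the sketch below assumes that rule, plus the "inclusive" quartile method from Python’s `statistics.quantiles` (software differs in how it computes quartiles). The function name `whisker_span` and the scores are hypothetical.

```python
from statistics import quantiles

def whisker_span(values, k=1.5):
    """Return (low, high) whisker ends, assuming Tukey's k x IQR outlier rule."""
    # Quartiles via the "inclusive" interpolation method (an assumption)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers extend to the most extreme points that lie inside the fences
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)

# Hypothetical scores for one group, including one unusually high value
scores = [21, 22, 23, 25, 26, 27, 28, 31, 45]
low, high = whisker_span(scores)
print("full range:   ", max(scores) - min(scores))  # 24, includes the outlier
print("whisker range:", high - low)                 # 10, excludes it
```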
In scatterplots, you can find the ranges of two variables at once. For the y-axis variable, the range is the vertical height of the data cloud; for the x-axis variable, it is the horizontal width.
This scatterplot displays the heights and weights of preteen girls in a research study. For these data, weight has a range of approximately 90 – 31 = 59 kilograms and for height it is 1.67 – 1.33 = 0.34 meters.
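With the raw data in hand, each variable’s range is simply computed separately. The sketch below uses hypothetical (height, weight) pairs, not the study’s data, and assumes height is plotted on the x-axis and weight on the y-axis.

```python
# Hypothetical (height in meters, weight in kilograms) pairs, for illustration only
pairs = [(1.40, 35.0), (1.52, 44.5), (1.61, 58.0), (1.35, 31.5), (1.58, 72.0)]

heights = [h for h, _ in pairs]  # assumed x-axis variable
weights = [w for _, w in pairs]  # assumed y-axis variable

height_range = max(heights) - min(heights)  # horizontal width of the point cloud
weight_range = max(weights) - min(weights)  # vertical height of the point cloud

print(f"height range: {height_range:.2f} m")   # 0.26 m
print(f"weight range: {weight_range:.1f} kg")  # 40.5 kg
```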
Note: When you’re assessing mathematical functions rather than data values, the range of f(x) appears on the y-axis (outputs), and the domain is on the x-axis (inputs).
Related posts: Histograms, Boxplots, and Scatterplots
Limitations of Using the Range
The range is simple to understand but it has some limitations you need to consider.
Unfortunately, outliers can influence it considerably because it uses only the two most extreme values. If one value in the dataset is atypically low or high, it changes the entire range all by itself.
Let’s return to the first two datasets in this post. This time, I’ve changed the bottom number in Dataset 1 from 38 to 102. That single change increases the range from 18 to 82 (102 – 20). According to the new value, Dataset 1 appears to have more variability than Dataset 2 (range = 41). However, all values except the one outlier in Dataset 1 fall between 20 and 34.
The range is not a robust statistic. The standard deviation and, especially, the interquartile range are more robust to outliers.
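The worksheet’s actual values aren’t reproduced in the text, so this sketch uses hypothetical stand-in values with Dataset 1’s minimum (20) and maximum (38) and all other values between 20 and 34. It then swaps the maximum for 102 and compares how the range and the interquartile range respond. The quartile method ("inclusive") is an assumption; other software may compute quartiles slightly differently.

```python
from statistics import quantiles

def value_range(values):
    return max(values) - min(values)

def iqr(values):
    # Quartiles via the "inclusive" interpolation method (an assumption)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical stand-in for Dataset 1 (not the worksheet's actual values)
original = [20, 23, 25, 27, 28, 30, 31, 34, 38]
# The same data after the maximum is replaced by an outlier of 102
with_outlier = [20, 23, 25, 27, 28, 30, 31, 34, 102]

print("range:", value_range(original), "->", value_range(with_outlier))  # 18 -> 82
print("IQR:  ", iqr(original), "->", iqr(with_outlier))                  # 6.0 -> 6.0
```

One extreme value quadruples the range, while the IQR, which ignores the top and bottom quarters of the data, does not move at all.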
Related post: What are Robust Statistics?
Additionally, the sample size itself influences this statistic. As the sample size grows, the range tends to increase. Consequently, you can’t meaningfully compare ranges between samples of different sizes.
Why does this happen? Extreme values have low probabilities of occurring. However, as the sample size increases, they have more opportunities to appear. Consequently, the range tends to widen as the sample size increases.
If you need to compare the variability of different size datasets, use another measure, such as the standard deviation.
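A quick simulation illustrates the sample-size effect. This sketch assumes normally distributed data (mean 0, SD 1) and averages the sample range and sample standard deviation over repeated samples of sizes 5, 50, and 500. The function name `average_spread` and the number of repetitions are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the sketch is reproducible

def average_spread(n, reps=500):
    """Average sample range and sample SD over `reps` samples of size n from N(0, 1)."""
    ranges, sds = [], []
    for _ in range(reps):
        sample = [random.gauss(0, 1) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
        sds.append(stdev(sample))
    return mean(ranges), mean(sds)

for n in (5, 50, 500):
    avg_range, avg_sd = average_spread(n)
    print(f"n = {n:3d}: mean range = {avg_range:.2f}, mean SD = {avg_sd:.2f}")
```

The average range keeps climbing as n grows, while the average SD stays close to the true value of 1 (slightly below it for n = 5, because the sample SD is biased low in small samples).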
When to Use the Range?
Taking the weaknesses into consideration, when is the range a good measure of variability?
It can be an excellent measure when you need an intuitive statistic that indicates the degree to which the data are spread out. Everyone can understand the concept of the difference between the maximum and minimum data points. It’s also easy to calculate in your head using summary statistics when you need a quick assessment.
Use the range with small datasets, where extreme values have fewer opportunities to appear, and when you’re comparing samples of the same size.
It’s also a great statistic for detecting data entry errors. Because it is so susceptible to outliers, a single mistake can show up as a dramatically inflated range. You’re taking a weakness and using it for something positive! For example, if you find that the range of people’s heights in a sample is 2 meters, there’s an error!
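A minimal sketch of this kind of sanity check: compare the range against a plausibility limit you choose for the variable. The function name `range_looks_suspicious`, the 1-meter limit, and the heights (one mistakenly entered in centimeters) are all hypothetical.

```python
def range_looks_suspicious(values, max_plausible_range):
    """Flag a dataset whose range exceeds what is physically plausible."""
    return max(values) - min(values) > max_plausible_range

# Hypothetical heights in meters; one entry was mistakenly typed in centimeters
heights_m = [1.62, 1.75, 1.58, 1.81, 175, 1.69]

# Assumed plausibility limit for this illustration
MAX_PLAUSIBLE_RANGE_M = 1.0

if range_looks_suspicious(heights_m, MAX_PLAUSIBLE_RANGE_M):
    print("Implausible range; check the extremes:", min(heights_m), max(heights_m))
```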
Using It for Quality Control
Quality control analysts often use this particular measure of variability. For starters, if the range for a batch of products is larger than the distance between the upper and lower spec limits, they know that at least one part is out of spec!
For example, if the range of part lengths is 5 mm, but the spread between the spec limits is 3 mm, there must be parts out of spec.
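The logic is a simple contrapositive: if every part lay within the spec limits, the range could be no larger than the distance between them. The sketch below encodes that check; the function name `range_proves_out_of_spec`, the part lengths, and the spec limits are hypothetical.

```python
def range_proves_out_of_spec(lengths_mm, lsl_mm, usl_mm):
    """True when the batch range alone proves at least one part is out of spec.

    If every part lay within [lsl, usl], the range could be at most usl - lsl,
    so a larger range guarantees at least one out-of-spec part.
    """
    return max(lengths_mm) - min(lengths_mm) > usl_mm - lsl_mm

# Hypothetical batch (range = 5.0 mm) and spec limits 3.0 mm apart
batch = [48.0, 49.5, 50.2, 51.0, 53.0]
print(range_proves_out_of_spec(batch, lsl_mm=48.5, usl_mm=51.5))  # True
```

The converse does not hold: a batch with a small range can still sit entirely outside the limits if the process is off-center, so a `False` result does not prove every part is in spec.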
Quality control analysts also use R charts (range charts), a type of control chart. These graphs monitor the variation in a process by tracking the range over time. Analysts use R charts with small (n = 2–10), consistently sized batches of a product from a stable process, which avoids the pitfalls I mentioned earlier. These graphs quickly detect unstable variability in the process.
In an R chart, the data points represent the ranges for samples taken over time. When a sample value crosses the control limits (red lines), the process is out of statistical control. This process is in control.
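The post doesn’t give the formulas for the control limits. A common textbook approach (the Shewhart R chart) puts the center line at the average subgroup range R̄, the lower limit at D3 × R̄, and the upper limit at D4 × R̄, where D3 and D4 are tabulated constants that depend on the subgroup size. This sketch assumes that approach and hard-codes the standard constants for n = 2 to 10; the function name `r_chart` and the measurements are hypothetical, and statistical packages may differ in details.

```python
# Standard Shewhart R-chart constants D3 and D4, indexed by subgroup size n
D3 = {2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0.076, 8: 0.136, 9: 0.184, 10: 0.223}
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114, 6: 2.004,
      7: 1.924, 8: 1.864, 9: 1.816, 10: 1.777}

def r_chart(subgroups):
    """Return (subgroup ranges, center line, LCL, UCL) for equal-size subgroups."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups) or n not in D4:
        raise ValueError("subgroups must share one size between 2 and 10")
    ranges = [max(g) - min(g) for g in subgroups]
    r_bar = sum(ranges) / len(ranges)  # center line: the average range
    return ranges, r_bar, D3[n] * r_bar, D4[n] * r_bar

# Hypothetical subgroups of 5 measurements each
subgroups = [[10.1, 10.3, 9.9, 10.0, 10.2],
             [10.0, 10.4, 10.1, 9.8, 10.2],
             [9.9, 10.1, 10.0, 10.3, 10.2]]
ranges, center, lcl, ucl = r_chart(subgroups)

# Subgroups whose range falls outside the control limits
out = [i for i, r in enumerate(ranges) if not lcl <= r <= ucl]
print("out-of-control subgroups:", out)  # [] for these values
```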
Related post: Using Control Charts with Hypothesis Tests