
Statistics By Jim

Making statistics intuitive


Range of a Data Set

By Jim Frost

The range of a data set is the difference between the maximum and the minimum values. It measures variability using the same units as the data. Larger values represent greater variability.

The range is the easiest measure of dispersion to calculate and interpret in statistics, but it has some limitations. In this post, I’ll show you how to find the range mathematically and graphically, interpret it, explain its limitations, and clarify when to use it.

Formula

To find the range in statistics, take the largest value and subtract the smallest value from it.

Range = Highest value – Lowest value

The range cannot be negative because the formula always subtracts the smaller value from the larger one.
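As a minimal sketch, the formula translates directly into a short Python function. The name `data_range` is a hypothetical helper (chosen to avoid shadowing Python's built-in `range`), and the sample values are made up for illustration.

```python
def data_range(values):
    """Return the range of a data set: highest value minus lowest value."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)

# Hypothetical values, chosen only for illustration
sample = [27, 20, 35, 38, 24, 31]
print(data_range(sample))  # 38 - 20 = 18
```

Because `max(values)` is never smaller than `min(values)`, the result is zero for a single value and positive otherwise, matching the non-negativity argument above.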

Related post: Measures of Variability

Example of Finding the Range

For example, in the worksheet below, Dataset 1 has a range of 38 – 20 = 18, while for Dataset 2 it is 52 – 11 = 41. Dataset 2 has a broader range and, therefore, is more variable than Dataset 1.

Example dataset for finding the range.

Conveniently, you can find the minimum, maximum, and range values in the descriptive statistics output from statistical software. The output of Excel’s Descriptive Statistics tool (part of the Analysis ToolPak) includes them, as shown below.
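If you work outside Excel, a few lines of Python can produce a similar summary. This is a hypothetical helper that only borrows Excel's output labels (Mean, Minimum, Maximum, Range, Count); it is not Excel's implementation, and the data are made up.

```python
import statistics

def describe(values):
    """Return a few Excel-style summary statistics, including the range."""
    lo, hi = min(values), max(values)
    return {
        "Mean": statistics.mean(values),
        "Minimum": lo,
        "Maximum": hi,
        "Range": hi - lo,  # Maximum minus Minimum
        "Count": len(values),
    }

# Hypothetical data, not the worksheet values from this post
for label, value in describe([11, 25, 30, 42, 52]).items():
    print(f"{label}: {value}")
```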

Excel's descriptive statistics output displays the range.

Related post: Descriptive Statistics in Excel

Finding the Range in Graphs

You can find data ranges in several types of graphs, including histograms, boxplots, and scatterplots. In the example graphs below, the red lines represent the ranges. If you’re looking at a chart without the underlying data, you’ll have to approximate the minimum and maximum values visually.

Histograms

In a histogram, the range is the width that the bars cover along the x-axis. These are approximate values because histograms display bin values rather than raw data values.
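To see why the histogram reading is only approximate, the sketch below compares the exact range with the width covered by the occupied bars. It assumes equal-width bins anchored at `origin`; the function name and the data are hypothetical.

```python
import math

def histogram_range(values, bin_width, origin=0.0):
    """Estimate the range from the outer edges of the occupied histogram bins.

    Assumes equal-width bins starting at `origin`: a value x falls in the bin
    [origin + k*w, origin + (k+1)*w) with k = floor((x - origin) / w).
    """
    bins = [math.floor((x - origin) / bin_width) for x in values]
    left_edge = origin + min(bins) * bin_width
    right_edge = origin + (max(bins) + 1) * bin_width
    return right_edge - left_edge

data = [41.3, 47.8, 52.0, 55.6, 63.9]  # hypothetical raw values
print("exact range:", round(max(data) - min(data), 2))
print("bar-width estimate:", histogram_range(data, bin_width=5))
```

Under these assumptions, the bar-width estimate is never smaller than the true range and is less than two bin widths larger, which is why wider bins give rougher readings.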

Histograms that display the ranges of data for two datasets.

In these histograms, distribution A has an approximate range of 65 – 40 = 25 and for distribution C it is 90 – 20 = 70. Distribution C has a broader spread, and its extensive width in the graph illustrates this property.

Boxplots

Boxplots display data ranges for groups within a dataset. When a group has no outliers, its range equals the entire length of the whiskers, because the minimum and maximum values sit at the whisker ends. When a group has outliers, the whiskers stop at the most extreme non-outlier values and the outliers are plotted separately. Consequently, the whisker length in a boxplot excludes outliers.
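Many programs draw whiskers using the common 1.5 × IQR convention: a whisker extends to the most extreme data value within 1.5 interquartile ranges of the box. The sketch below assumes that convention and Python's default (`exclusive`) quartile method; other software may compute quartiles differently. The function name and scores are hypothetical.

```python
import statistics

def whisker_span(values, k=1.5):
    """Return (low_whisker, high_whisker) under the k x IQR convention.

    Whiskers end at the most extreme data values inside
    [Q1 - k*IQR, Q3 + k*IQR]; values beyond those fences are outliers.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return min(inside), max(inside)

scores = [20, 22, 24, 25, 27, 28, 30, 45]  # hypothetical scores; 45 is an outlier
low, high = whisker_span(scores)
print("whisker length:", high - low)
print("full range:", max(scores) - min(scores))
```

Here the whiskers span 10 points, while the full range, which includes the outlier, is 25.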

Boxplots display ranges for groups within a dataset.

In this boxplot, the scores for Method 3 spread from approximately 12 to 37, producing a range of 25. This group has the largest spread in the dataset. Conversely, Method 2 has the smallest spread: 30 – 20 = 10. Method 2 has an outlier (the asterisk), which the whiskers conveniently exclude.

Scatterplots

In scatterplots, you can find the ranges of two variables at once. The range of the y-axis variable is the vertical extent of the data points, while the range of the x-axis variable is their horizontal extent.
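With the underlying (x, y) pairs, you can compute both ranges directly instead of reading them off the axes. This is a minimal sketch; the function name and the height/weight pairs are hypothetical.

```python
def variable_ranges(points):
    """Given (x, y) pairs, return (range of x, range of y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return max(xs) - min(xs), max(ys) - min(ys)

# Hypothetical (height in m, weight in kg) pairs
pairs = [(1.40, 35.0), (1.52, 44.5), (1.61, 58.0), (1.47, 40.2)]
x_range, y_range = variable_ranges(pairs)
print(f"x range: {x_range:.2f} m, y range: {y_range:.1f} kg")
```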

Scatterplots display the data ranges of two variables.

This scatterplot displays the heights and weights of preteen girls in a research study. For these data, weight has a range of approximately 90 – 31 = 59 kilograms, and height has a range of 1.67 – 1.33 = 0.34 meters.

Note: When you’re assessing mathematical functions rather than data values, the range of f(x) appears on the y-axis (outputs), and the domain is on the x-axis (inputs).

Related posts: Histograms, Boxplots, and Scatterplots

Limitations of Using the Range

The range is simple to understand, but it has some limitations you need to consider.

Unfortunately, outliers can influence the range considerably because it uses only the two most extreme values. If a single value in the dataset is atypically low or high, that value alone can change the range dramatically.

Let’s return to the first two datasets in this post, but with one change: I’ve replaced the bottom number in Dataset 1, 38, with 102. That single change increases the range of Dataset 1 from 18 to 102 – 20 = 82. According to the new value, Dataset 1 appears to have more variability than Dataset 2 (range = 41). However, all values in Dataset 1 except the one outlier fall between 20 and 34.
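The effect of one extreme value is easy to reproduce. The values below are hypothetical and only mimic the scenario described in the text (most values between 20 and 34, a maximum of 38 that becomes 102); they are not the worksheet data.

```python
def data_range(values):
    """Range of a data set: highest value minus lowest value."""
    return max(values) - min(values)

# Hypothetical data: every value lies between 20 and 38
original = [20, 23, 26, 29, 31, 34, 38]
# Replace the maximum (38) with an atypical value of 102
with_outlier = [102 if x == 38 else x for x in original]

print(data_range(original))      # 18
print(data_range(with_outlier))  # 82
```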

Dataset with an outlier.

The range is not a robust statistic. The standard deviation and, especially, the interquartile range are more robust to outliers.
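A quick way to see the difference in robustness is to compute the range and the interquartile range (IQR) before and after introducing an outlier. This sketch uses hypothetical values and Python's default (`exclusive`) quartile method; other quartile methods give slightly different IQRs.

```python
import statistics

def range_and_iqr(values):
    """Return (range, interquartile range) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return max(values) - min(values), q3 - q1

clean = [20, 23, 26, 29, 31, 34, 38]   # hypothetical values
dirty = [20, 23, 26, 29, 31, 34, 102]  # same values, largest replaced by an outlier

for label, data in (("clean", clean), ("with outlier", dirty)):
    r, iqr = range_and_iqr(data)
    print(f"{label}: range = {r}, IQR = {iqr}")
```

The range jumps from 18 to 82, while the IQR stays at 11 because the quartiles do not depend on the single most extreme value.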

Related post: What are Robust Statistics?

Additionally, the sample size itself influences this statistic: as the sample size grows, the range tends to increase. Consequently, you can’t meaningfully compare ranges from samples of different sizes.

Why does this happen? Extreme values have low probabilities of occurring in any single observation. However, as the sample size increases, they have more opportunities to appear. Consequently, the range tends to widen as the sample size increases.
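A small simulation illustrates the effect. The sketch below assumes the data come from a standard normal distribution and averages the range over many random samples of each size; the function name, seed, and number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

def mean_sample_range(n, reps=2000, seed=1):
    """Average range of `reps` samples of size n drawn from a standard normal."""
    rng = random.Random(seed)  # fixed seed so the results are reproducible
    ranges = []
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return statistics.mean(ranges)

for n in (5, 10, 30, 100):
    print(n, round(mean_sample_range(n), 2))
```

Every sample comes from the same distribution, yet the average range grows steadily with n, which is why ranges from different-sized samples aren't comparable.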

If you need to compare the variability of different size datasets, use another measure, such as the standard deviation.

When to Use the Range?

Taking the weaknesses into consideration, when is the range a good measure of variability?

It can be an excellent measure when you need an intuitive statistic that indicates the degree to which the data are spread out. Everyone can understand the concept of the difference between the maximum and minimum data points. It’s also easy to calculate in your head using summary statistics when you need a quick assessment.

Use the range with small datasets that are free of outliers and when you’re comparing samples of the same size.

It’s also a great statistic for detecting data entry errors. Because the range is so sensitive to outliers, a single mistake can stand out. You’re taking a weakness and using it for something positive! For example, if you find that the range of people’s heights in a sample is 2 meters, there’s an error!
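As a minimal sketch, a data-entry check can compare the observed range with a plausibility bound that you choose for the variable. The function name, the 1-meter bound, and the heights (including the typo 17.2 for 1.72) are all hypothetical.

```python
def flag_suspicious_range(values, max_plausible_range):
    """Return True when the observed range exceeds what the variable could plausibly span."""
    return max(values) - min(values) > max_plausible_range

# Hypothetical heights in meters; 17.2 is a typo for 1.72
heights = [1.55, 1.68, 17.2, 1.81, 1.62]
# Assumed plausibility bound for this sample; choose one that fits your own data
if flag_suspicious_range(heights, max_plausible_range=1.0):
    print("Range is implausible; check min and max:", min(heights), max(heights))
```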

Using It for Quality Control

Quality control analysts often use this particular measure of variability. For starters, if the range for a batch of products is larger than the distance between the upper and lower specification limits, they know that at least one part is out of spec!

For example, if the range of part lengths is 5 mm but the spec limits are only 3 mm apart, at least one part must be out of spec.
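The check reduces to a single comparison. In this sketch, the function name, the spec limits, and the part lengths are hypothetical.

```python
def range_implies_out_of_spec(part_lengths, lower_spec, upper_spec):
    """True if the batch range exceeds the width of the spec limits.

    If max - min > USL - LSL, the parts cannot all fit inside [LSL, USL],
    so at least one part must be out of spec. A False result does NOT
    prove every part is in spec; the whole batch could be shifted.
    """
    return max(part_lengths) - min(part_lengths) > upper_spec - lower_spec

# Hypothetical batch (mm) checked against hypothetical limits of 99.0 to 102.0 mm (3 mm apart)
batch = [98.5, 100.2, 101.0, 103.5, 100.7]
print(range_implies_out_of_spec(batch, 99.0, 102.0))  # range 5.0 > 3.0
```

The test is one-directional: a range wider than the spec window guarantees a problem, but a narrow range still requires checking each part against the limits.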

Quality control analysts also use R charts (range charts), a type of control chart. These charts monitor the variation in a process by tracking the range over time. Analysts use R charts with small (n = 2–10), consistently sized samples of a product from a stable process, which avoids the pitfalls I mentioned earlier. These charts quickly detect unstable variability in the process.

Example of an R chart.

In an R chart, each data point represents the range of a sample taken at a particular time. When a sample range falls outside the control limits (red lines), the process is out of statistical control. The process in this chart is in control.
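The control limits on an R chart are usually computed from the average subgroup range (R-bar) and the standard tabulated constants D3 and D4: LCL = D3 × R-bar and UCL = D4 × R-bar. The sketch below uses those textbook constants for subgroup sizes 2 to 10. The function name and measurements are hypothetical. For simplicity, it estimates the limits from the same subgroups it checks, whereas in practice analysts typically estimate them from an in-control baseline period.

```python
# Standard R-chart constants D3 and D4 for subgroup sizes 2-10
D3 = {2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0.076, 8: 0.136, 9: 0.184, 10: 0.223}
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114, 6: 2.004,
      7: 1.924, 8: 1.864, 9: 1.816, 10: 1.777}

def r_chart(subgroups):
    """Return subgroup ranges, R-bar, LCL, UCL, and indices outside the limits."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups) or n not in D4:
        raise ValueError("subgroups must all have the same size, between 2 and 10")
    ranges = [max(g) - min(g) for g in subgroups]
    r_bar = sum(ranges) / len(ranges)  # center line
    lcl, ucl = D3[n] * r_bar, D4[n] * r_bar
    flagged = [i for i, r in enumerate(ranges) if r < lcl or r > ucl]
    return ranges, r_bar, lcl, ucl, flagged

subgroups = [  # hypothetical measurements, five parts per subgroup
    [12, 15, 11, 14, 13],
    [13, 12, 14, 16, 12],
    [11, 14, 13, 12, 15],
    [14, 13, 12, 15, 13],
    [12, 25, 13, 14, 11],
]
ranges, r_bar, lcl, ucl, flagged = r_chart(subgroups)
print("ranges:", ranges)
print(f"R-bar = {r_bar:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
print("out-of-control subgroups:", flagged)
```

In this made-up example, the last subgroup's range of 14 exceeds the upper control limit, so the sketch flags it.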

Related post: Using Control Charts with Hypothesis Tests

Copyright © 2023 · Jim Frost