A measure of variability is a summary statistic that represents the amount of dispersion in a dataset. How spread out are the values? While a measure of central tendency describes the typical value, measures of variability define how far away the data points tend to fall from the center. We talk about variability in the context of a distribution of values. A low dispersion indicates that the data points tend to be clustered tightly around the center. High dispersion signifies that they tend to fall further away.

In statistics, variability, dispersion, and spread are synonyms that denote the width of the distribution. Just as there are multiple measures of central tendency, there are several measures of variability. In this blog post, you’ll learn why understanding the variability of your data is critical. Then, I explore the most common measures of variability—the range, interquartile range, variance, and standard deviation. I’ll help you determine which one is best for your data.

The two plots below show the difference graphically for distributions with the same mean but more and less dispersion. The panel on the left shows a distribution that is tightly clustered around the average, while the distribution in the right panel is more spread out.

**Related post**: Measures of Central Tendency: Mean, Median, and Mode

## Why Understanding Variability is Important

Let’s take a step back and first get a handle on why understanding variability is so essential. Analysts frequently use the mean to summarize the center of a population or a process. While the mean is relevant, people often react to variability even more. When a distribution has lower variability, the values in a dataset are more consistent. However, when the variability is higher, the data points are more dissimilar and extreme values become more likely. Consequently, understanding variability helps you grasp the likelihood of unusual events.

In some situations, extreme values can cause problems! Have you seen a weather report where the meteorologist shows extreme heat and drought in one area and flooding in another? It would be nice to average those together! Frequently, we feel discomfort at the extremes more than at the mean. Understanding the variability around the mean provides critical information.

Variability is everywhere. Your commute time to work varies a bit every day. When you order a favorite dish at a restaurant repeatedly, it isn’t exactly the same each time. The parts that come off an assembly line might appear to be identical, but they have subtly different lengths and widths.

These are all examples of real-life variability. Some degree of variation is unavoidable. However, too much inconsistency can cause problems. If your morning commute takes much longer than the mean travel time, you will be late for work. If the restaurant dish is very different from how it usually is, you might not like it at all. And, if a manufactured part is too far out of spec, it won’t function as intended.

Some variation is inevitable, but problems occur at the extremes. Distributions with greater variability produce observations with unusually large and small values more frequently than distributions with less variability.

### Example of Different Amounts of Variability

Let’s take a look at two hypothetical pizza restaurants. They both advertise a mean delivery time of 20 minutes. When we’re ravenous, they both sound equally good! However, this equivalence can be deceptive! To determine the restaurant that you should order from when you’re hungry, we need to analyze their variability.

Suppose we study their delivery times, calculate the variability for each place, and determine that their variabilities are different. We’ve computed the standard deviations for both restaurants—which is a measure that we’ll come back to later in this post. How significant is this difference in getting pizza to their customers promptly?

The graphs below display the distribution of delivery times and provide the answer. The restaurant with more variable delivery times has the broader distribution curve. I’ve used the same scales in both graphs so you can visually compare the two distributions.

In these graphs, we consider a 30-minute wait or longer to be unacceptable. We’re hungry after all! The shaded area in each chart represents the proportion of delivery times that surpass 30 minutes. Nearly 16% of the deliveries for the high variability restaurant exceed 30 minutes. On the other hand, only 2% of the deliveries take too long with the low variability restaurant. They both have an average delivery time of 20 minutes, but I know where I’d place my order when I’m hungry!
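The shaded proportions can be approximated with a short calculation. The sketch below assumes that delivery times follow a normal distribution and that the two restaurants have standard deviations of 10 and 5 minutes. Those two values are assumptions chosen because they reproduce the roughly 16% and 2% figures quoted above.

```python
from statistics import NormalDist

MEAN_MINUTES = 20
LATE_THRESHOLD = 30

# Assumed standard deviations (illustrative, chosen to match the quoted percentages).
high_variability = NormalDist(mu=MEAN_MINUTES, sigma=10)
low_variability = NormalDist(mu=MEAN_MINUTES, sigma=5)


def proportion_late(dist, threshold=LATE_THRESHOLD):
    """Share of deliveries expected to take longer than the threshold."""
    return 1 - dist.cdf(threshold)


p_high = proportion_late(high_variability)
p_low = proportion_late(low_variability)

print(f"High variability: {p_high:.1%} of deliveries exceed {LATE_THRESHOLD} minutes")
print(f"Low variability:  {p_low:.1%} of deliveries exceed {LATE_THRESHOLD} minutes")
```

Both restaurants share the same mean, so the entire difference in late deliveries comes from the spread.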

As this example shows, the central tendency doesn’t provide complete information. We also need to understand the variability around the middle of the distribution to get the full picture. Now, let’s move on to the different ways of measuring variability!

## Range

Let’s start with the range because it is the most straightforward measure of variability to calculate and the simplest to understand. The range of a dataset is the difference between the largest and smallest values in that dataset. For example, in the two datasets below, dataset 1 has a range of 38 – 20 = 18 while dataset 2 has a range of 52 – 11 = 41. Dataset 2 has a wider range and, hence, more variability than dataset 1.

While the range is easy to understand, it is based on only the two most extreme values in the dataset, which makes it very susceptible to outliers. If one of those numbers is unusually high or low, it affects the entire range even if it is atypical.
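Here is a minimal sketch of the range calculation and its sensitivity to a single outlier. The values are hypothetical. Only the smallest and largest values in each list were chosen to match the ranges of 18 and 41 mentioned above.

```python
# Hypothetical datasets: only the extremes are taken from the text.
dataset_1 = [20, 21, 24, 27, 29, 31, 33, 35, 38]
dataset_2 = [11, 16, 22, 28, 30, 35, 41, 47, 52]


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


range_1 = value_range(dataset_1)
range_2 = value_range(dataset_2)
print(range_1, range_2)  # 18 41

# One unusual value changes the range completely,
# even though the other nine values are untouched.
range_with_outlier = value_range(dataset_1 + [95])
print(range_with_outlier)  # 75
```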

Additionally, the size of the dataset affects the range. Any single observation is unlikely to be an extreme value. However, as you increase the sample size, you have more opportunities to obtain these extreme values. Consequently, when you draw random samples from the same population, the range tends to increase as the sample size increases. For that reason, use the range to compare variability only when the sample sizes are similar.
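A small simulation illustrates the sample size effect. It is a sketch under assumed conditions: a normal population with a mean of 20 and a standard deviation of 5, with 500 random samples drawn at each sample size.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable


def average_range(sample_size, n_samples=500):
    """Mean range across many random samples drawn from one normal population."""
    ranges = []
    for _ in range(n_samples):
        sample = [random.gauss(20, 5) for _ in range(sample_size)]
        ranges.append(max(sample) - min(sample))
    return mean(ranges)


avg_ranges = {n: average_range(n) for n in (10, 100, 1000)}
for n, avg in avg_ranges.items():
    print(f"n = {n:>4}: average range = {avg:.1f}")
```

The population never changes, yet the average range grows with the sample size.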

## The Interquartile Range (IQR) . . . and other Percentiles

The interquartile range is the middle half of the data. To visualize it, think about the median value that splits the dataset in half. Similarly, you can divide the data into quarters. Statisticians refer to these quarters as quartiles and denote them from low to high as Q1, Q2, Q3, and Q4. The lowest quartile (Q1) contains the quarter of the dataset with the smallest values. The upper quartile (Q4) contains the quarter of the dataset with the highest values. The interquartile range is the middle half of the data that is in between the upper and lower quartiles. In other words, the interquartile range includes the 50% of data points that fall in Q2 and Q3. The IQR is the red area in the graph below.

The interquartile range is a robust measure of variability in the same way that the median is a robust measure of central tendency. Neither measure is influenced dramatically by outliers because they don’t depend on every value. Additionally, the interquartile range is excellent for skewed distributions, just like the median. As you’ll learn, when you have a normal distribution, the standard deviation tells you the percentage of observations that fall specific distances from the mean. However, this doesn’t work for skewed distributions, and the IQR is a great alternative.

I’ve divided the dataset below into quartiles. The interquartile range (IQR) extends from the low end of Q2 to the upper limit of Q3. For this dataset, the interquartile range extends from 21 to 39.
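As a rough illustration, the sketch below finds the quartile boundaries with Python’s `statistics.quantiles` and shows that an extreme value leaves the IQR unchanged. The dataset is hypothetical and is not the one from this post. Software packages use slightly different conventions for locating quartile boundaries, so other tools may return slightly different cut points.

```python
from statistics import quantiles

# Hypothetical dataset, listed in sorted order for readability.
data = [21, 23, 24, 26, 28, 30, 33, 35, 39, 41, 58]

# quantiles(..., n=4) returns the three cut points that separate the four quarters.
q1_cut, median, q3_cut = quantiles(data, n=4)
iqr = q3_cut - q1_cut
print(q1_cut, median, q3_cut, iqr)  # 24.0 30.0 39.0 15.0

# Replace the largest value with an extreme outlier.
with_outlier = data[:-1] + [580]
q1_out, _, q3_out = quantiles(with_outlier, n=4)
iqr_out = q3_out - q1_out

print(iqr_out)                                      # 15.0, unchanged
print(max(data) - min(data))                        # range before: 37
print(max(with_outlier) - min(with_outlier))        # range after: 559
```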

### Using other percentiles

When you have a skewed distribution, I find that reporting the median with the interquartile range is a particularly good combination. The interquartile range is equivalent to the region between the 75th and 25th percentile (75 – 25 = 50% of the data). You can also use other percentiles to determine the spread of different proportions. For example, the range between the 97.5th percentile and the 2.5th percentile covers 95% of the data. The broader these ranges, the higher the variability in your dataset.
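Other percentile ranges can be computed the same way. This sketch uses simulated data (an assumed normal population with a mean of 20 and a standard deviation of 5) and checks how much of the data each percentile range covers.

```python
import random
from statistics import quantiles

random.seed(7)  # fixed seed so the run is repeatable
data = [random.gauss(20, 5) for _ in range(10_000)]

# n=40 splits the data into 40 slices of 2.5% each, which gives 39 cut points.
cuts = quantiles(data, n=40)
p2_5, p25, p75, p97_5 = cuts[0], cuts[9], cuts[29], cuts[-1]


def share_between(values, low, high):
    """Proportion of values that fall between low and high (inclusive)."""
    return sum(low <= x <= high for x in values) / len(values)


middle_50 = share_between(data, p25, p75)
middle_95 = share_between(data, p2_5, p97_5)

print(f"25th to 75th percentile:    {p25:.1f} to {p75:.1f} covers {middle_50:.0%}")
print(f"2.5th to 97.5th percentile: {p2_5:.1f} to {p97_5:.1f} covers {middle_95:.0%}")
```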

## Variance

Variance is the average squared difference of the values from the mean. Unlike the previous measures of variability, the variance includes all values in the calculation by comparing each value to the mean. To calculate this statistic, you calculate a set of squared differences between the data points and the mean, sum them, and then divide by the number of observations. Hence, it’s the average squared difference.

There are two formulas for the variance depending on whether you are calculating the variance for an entire population or using a sample to estimate the population variance. The equations are below, and then I work through an example in a table to help bring it to life.

### Population variance

The formula for the variance of an entire population is the following:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

In the equation, σ² is the population parameter for the variance, X is each individual value, μ is the parameter for the population mean, and N is the number of data points, which should include the entire population.

### Sample variance

To use a sample to estimate the variance for a population, use the following formula. Using the previous equation with sample data tends to underestimate the variability. Because it’s usually impossible to measure an entire population, statisticians use the equation for sample variances much more frequently.

$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$

In the equation, s² is the sample variance, X is each value in the sample, and M is the sample mean. N – 1 in the denominator corrects for the tendency of a sample to underestimate the population variance.
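A simulation can show why the correction matters. The sketch below repeatedly draws small samples from an assumed normal population whose true variance is 100, then averages the results of both formulas. The population parameters, sample size, and number of samples are illustrative choices.

```python
import random
from statistics import mean, pvariance, variance

random.seed(42)  # fixed seed so the run is repeatable

TRUE_SD = 10       # assumed population standard deviation, so the true variance is 100
SAMPLE_SIZE = 5
N_SAMPLES = 10_000

divide_by_n = []          # population formula applied to sample data
divide_by_n_minus_1 = []  # sample formula

for _ in range(N_SAMPLES):
    sample = [random.gauss(50, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    divide_by_n.append(pvariance(sample))         # divides by N
    divide_by_n_minus_1.append(variance(sample))  # divides by N - 1

avg_n = mean(divide_by_n)
avg_n_minus_1 = mean(divide_by_n_minus_1)

print(f"True variance:                 {TRUE_SD ** 2}")
print(f"Average when dividing by N:     {avg_n:.1f}")
print(f"Average when dividing by N - 1: {avg_n_minus_1:.1f}")
```

On average, dividing by N lands well below the true variance, while dividing by N – 1 lands close to it.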

### Example of calculating the sample variance

I’ll work through an example using the formula for a sample on a dataset with 17 observations in the table below. The numbers in parentheses represent the corresponding table column number. The procedure involves taking each observation (1), subtracting the sample mean (2) to calculate the difference (3), and squaring that difference (4). Then, I sum the squared differences at the bottom of the table. Finally, I take the sum and divide by 16 because I’m using the sample variance equation with 17 observations (17 – 1 = 16). The variance for this dataset is 201.
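The same procedure can be sketched in a few lines of code. The five observations below are hypothetical and much smaller than the 17-observation table described above, but the steps are identical.

```python
from statistics import variance

observations = [3, 7, 8, 12, 15]  # hypothetical data, column (1)

n = len(observations)
sample_mean = sum(observations) / n                      # column (2): 9.0
differences = [x - sample_mean for x in observations]    # column (3)
squared_differences = [d ** 2 for d in differences]      # column (4)

sum_of_squares = sum(squared_differences)                # 86.0
sample_variance = sum_of_squares / (n - 1)               # 86 / 4 = 21.5

# Print the working table, one row per observation.
for x, d, sq in zip(observations, differences, squared_differences):
    print(f"{x:>4} {sample_mean:>6.1f} {d:>6.1f} {sq:>6.1f}")
print(f"Sum of squared differences: {sum_of_squares}")
print(f"Sample variance: {sample_variance}")

# Cross-check against the standard library, which also divides by n - 1.
assert sample_variance == variance(observations)
```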

Because the calculations use the squared differences, the variance is in squared units rather than the original units of the data. While higher values of the variance indicate greater variability, there is no intuitive interpretation for specific values. Despite this limitation, various statistical tests use the variance in their calculations. For an example, read my post about the F-test and ANOVA.

While it is difficult to interpret the variance itself, the standard deviation resolves this problem!

## Standard Deviation

The standard deviation is the standard or typical difference between each data point and the mean. When the values in a dataset are grouped closer together, you have a smaller standard deviation. On the other hand, when the values are spread out more, the standard deviation is larger because the standard distance is greater.

Conveniently, the standard deviation uses the original units of the data, which makes interpretation easier. Consequently, the standard deviation is the most widely used measure of variability. For example, in the pizza delivery example, a standard deviation of 5 indicates that the typical delivery time is plus or minus 5 minutes from the mean. It’s often reported along with the mean: 20 minutes (s.d. 5).

The standard deviation is just the square root of the variance. Recall that the variance is in squared units. Hence, the square root returns the value to the natural units. The symbol for the standard deviation as a population parameter is σ while s represents it as a sample estimate. To calculate the standard deviation, calculate the variance as shown above, and then take the square root of it. Voila! You have the standard deviation!

In the variance section, we calculated a variance of 201 in the table. Therefore, the standard deviation for that dataset is the square root of 201, which is 14.177.
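A quick check of that arithmetic, along with the general relationship between the two statistics. The second dataset is hypothetical and is included only to show that the relationship holds for any sample.

```python
import math
from statistics import stdev, variance

variance_from_table = 201  # the variance reported in the variance section
sd = math.sqrt(variance_from_table)
print(round(sd, 3))  # 14.177

# The same relationship holds for any sample (hypothetical values here).
sample = [3, 7, 8, 12, 15]
sd_from_variance = math.sqrt(variance(sample))
print(math.isclose(stdev(sample), sd_from_variance))  # True
```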

### The Empirical Rule for the Standard Deviation of a Normal Distribution

When you have normally distributed data, or approximately so, the standard deviation becomes particularly valuable. You can use it to determine the proportion of the values that fall within a specified number of standard deviations from the mean. For example, in a normal distribution, 68% of the values will fall within +/- 1 standard deviation from the mean. This property is part of the Empirical Rule. This rule describes the percentage of the data that fall within specific numbers of standard deviations from the mean for bell-shaped curves.

| Mean +/- standard deviations | Percentage of data contained |
|---|---|
| 1 | 68% |
| 2 | 95% |
| 3 | 99.7% |

Let’s take another look at the pizza delivery example where we have a mean delivery time of 20 minutes and a standard deviation of 5 minutes. Using the Empirical Rule, we can use the mean and standard deviation to determine that 68% of the delivery times will fall between 15-25 minutes (20 +/- 5) and 95% will fall between 10-30 minutes (20 +/- 2*5).
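The percentages in the Empirical Rule come straight from the normal distribution, as this sketch shows. It assumes that the delivery times are exactly normal with the mean of 20 and standard deviation of 5 used above.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Proportion of a normal distribution within +/- k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}

mean_minutes, sd_minutes = 20, 5
for k, share in shares.items():
    low = mean_minutes - k * sd_minutes
    high = mean_minutes + k * sd_minutes
    print(f"+/- {k} s.d. ({low} to {high} minutes): {share:.1%}")
```

The exact proportions are about 68.3%, 95.4%, and 99.7%, which the rule rounds to 68%, 95%, and 99.7%.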

## Which is Best—the Range, Interquartile Range, or Standard Deviation?

First off, you probably noticed that I didn’t include the variance as one of the options in the heading above. That’s because the variance is in squared units and doesn’t provide an intuitive interpretation. So, I’ve crossed that off the list. Let’s go over the other three measures of variability.

When you are comparing samples that are the same size, consider using the range as the measure of variability. It’s a reasonably intuitive statistic. Just be aware that a single outlier can throw the range off. The range is particularly suitable for small samples when you don’t have enough data to calculate the other measures reliably, and the likelihood of obtaining an outlier is also lower.

When you have a skewed distribution, the median is a better measure of central tendency, and it makes sense to pair it with either the interquartile range or other percentile-based ranges because all of these statistics divide the dataset into groups with specific proportions.

For normally distributed data, or even data that aren’t terribly skewed, the tried and true combination of reporting the mean and the standard deviation is the way to go. This combination is by far the most common. You can still supplement this approach with percentile-based ranges as you need.

Raja Wajahat says

Hi sir, Thanks a lot for your blogs, they are really awesome, literally i have no words to explain you how helpful your blogs are.

I have a one question:

We are squaring the differences between mean and observation values because we get a resultant value(sum of all differences) zero if we don’t square them!

so what we can interpret from that? why this resultant value gives zero? and what we can interpret from that?

Jim Frost says

Hi Raja, thank you so much! I’m glad to hear that my blog has been helpful!

The differences from the mean always sum to zero, for any dataset, because the positive and negative differences balance each other exactly. It’s easiest to see with a symmetric distribution, where you have as many values above the mean as below it, and at the same distances. So, imagine you have one distribution where many observations sit equally at +10 and -10. And, another distribution where many scores sit equally near +1 and -1. Clearly the first distribution is much more spread out. However, the differences in both distributions sum to zero. So, summing the differences doesn’t allow you to differentiate between these distributions. You want the variability score for the first distribution to be larger to accurately reflect the fact that it is more spread out.

That’s why we use the squared differences, because you can add them up without the plusses and minuses cancelling each other out.
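A small sketch of this point, using hypothetical values. The plain differences from the mean sum to zero for every dataset, including a skewed one, while the squared differences separate the wide dataset from the narrow one.

```python
from statistics import mean

# Hypothetical datasets for illustration.
wide = [-10, 10, -10, 10, -10, 10]
narrow = [-1, 1, -1, 1, -1, 1]
skewed = [1, 2, 2, 3, 12]  # not symmetric, mean is 4


def deviation_summaries(values):
    """Return (sum of differences from the mean, sum of squared differences)."""
    m = mean(values)
    differences = [x - m for x in values]
    return sum(differences), sum(d ** 2 for d in differences)


wide_sum, wide_squares = deviation_summaries(wide)
narrow_sum, narrow_squares = deviation_summaries(narrow)
skewed_sum, skewed_squares = deviation_summaries(skewed)

print(wide_sum, narrow_sum, skewed_sum)              # 0 0 0
print(wide_squares, narrow_squares, skewed_squares)  # 600 6 82
```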

Nitin Gaur says

Hi Jim, First of all, thanks a lot for taking time out to share your Statistical knowledge with the world.

I have a question about Variance vs. Standard Deviation. Why do we even have Variance as a measure of dispersion when we know that it gives squared values which are big and we have to use standard deviation as the easy and more interpretative measure of dispersion anyways?

Jim Frost says

Hi Nitin,

That’s a very good question! While variance really doesn’t mean much to us humans, it turns out that it is important in various statistical tests. ANOVA is, after all, the analysis of variance. The F-test assesses the ratio of variances to determine whether they are equal. Additionally, in linear models, we have the key notion of sums of squares, which is a similar concept to variances (being squared differences from the mean). So, it’s a useful measure behind the scenes for statistical tests. However, I can’t think of a real world situation where people would think that the actual value of the variance conveys anything meaningful.

jain says

thats too wonderful and lucid ! hope to clarify on many statistical confusions

Kai says

So beautifully explained! Students all around the world would really benefit from your teachings!

Jim Frost says

Thank you, Kai. That means so much to me!

Hiral Godhania says

Hello sir,

I am data science student. I have started following your articles . It gives me proper idea about statistics. It’s very beneficial to all non-statistical background people who really wants to learn proper statistics.

Thanks and regards,

Hiral Godhania

Jim Frost says

Hi Hiral, I’m so happy that you’ve found my articles to be helpful. Thank you so much for taking the time to write such a nice message!

Sundar says

First of all. Thank you for your time and help to spread stats in simple way

My question is I have a dataset with mean 2000 seconds and sd as 1950 seconds. What should I do, when I see such a big sd.

Jim Frost says

Hi Sundar, I’m glad you’ve found my blog to be helpful!

On to your question. Because all of your values are going to be greater than zero, it makes sense to compare the mean and standard deviation. If you could have negative values, then it doesn’t make sense.

You can say that your standard deviation is large compared to the mean. This indicates that while you have an estimate of the central tendency, you really can’t say for any given observation that it is likely to be near the mean. Your data have a lot of variability.

Additionally, I can virtually guarantee you that your data are skewed because you can’t have values less than zero seconds. You tend to get skewed data when you are near a limit. The limit here is zero seconds. And, how near you are to it is defined by the distance between the limit and the central tendency as measured by standard deviations, which is ~1 s.d. in your example. That’s close–so your data are skewed.

You can also think about it in terms of the Empirical Rule. For a bell-shaped curve, you’d expect 95% of the values to fall within +/- 2 standard deviations of the mean. However, that range includes negative values. The values that you’d expect to fall below zero must actually be greater than zero. Hence, your distribution is skewed.

You should graph your data! As for what else you should do, it depends on your goal.

I hope this helps!

Isaac says

Hi Jim,

I saw your post shared on social media by Carmen and noticed your at PSU! I’m a stats PhD student there. I found your blog interesting and intuitive and wanted to reach out to see if there are any resources you could share to help me improve my written and oral communication skills. I’m TAing for the first time this summer and want my class to be as interesting as your blog.

Jim Frost says

Hi Isaac! First, thanks so much for your kind words about my blog. That means a lot to me!

As for resources, I don’t have any about written and oral communications skills. I wish I had something helpful to point you towards. I’ve been explaining statistics for several decades and that’s helped me refine my approach.

For starters, the fact that you’re asking about it indicates that you’re already placing a value on clear communications, which is great! I always imagine someone trying to learn this material for the first time. Some of it is very complex. But, you can often find a simple way to explain it. When you find ways that work better at communicating a concept than others, make note and use them. Keep refining along the way.

I’ll often go out of my way to read material that teaches statistics and look for things that are missing or not clear, and it gives me an angle on my own writing. A deep understanding of statistics really helps this process. When I read something where I know a certain aspect is particularly important but perhaps the import isn’t clearly conveyed in the material, it stands out to me. Then, when I write about it, I’ll focus on that aspect more. I really try to hone it into something that a novice can grasp. It’s a process and involves refinement, trying new approaches, and seeing how others approach it (for better and worse).

I’m not at PSU any more. It’s been quite a while. Your class is lucky to have a TA who really values clear communications!

Thejas says

Hi,

Yes it helps a bit. Thank you for a detailed explanation.

Jim Frost says

You’re very welcome! Best of luck with your studies!

Thejas Iyer says

Hi,

I did not understand why we subtract ‘-1’ from the sample size in the formula for sample variance. Why ‘-1’ ?

Jim Frost says

Hi Thejas,

Statisticians have found that samples tend to underestimate the variance when you simply divide by n. It turns out that the data points in a sample tend to be closer to their own sample mean than they are to the population mean. Dividing by n-1, rather than n, solves this problem.

Reducing the denominator counteracts the tendency for underestimation. By dividing by a smaller number, the end result is a bit larger.

For example, let’s say that you have sum of squared differences of 100 and a sample size of 10. Dividing by n (10), you obtain a variance of 100/10 = 10. However, if you divide by n-1 (9), you obtain 100/9 = 11.1. Statisticians have found that using n tends to underestimate the variance (a biased estimator in statistical speak). However, n-1 is unbiased.

I hope this helps!

Mahmudul Hasan says

Hello Sir,

Thank you so much for this post. Last year i had faced a huge problem in my paper due to my lack of attention towards variation measurement. This post has cleared some of my confusion. Thank you so much. Keep up the good work. Your explanation is easier to understand for a non statistics/math student like me. I will explore your other blog posts.

Jim Frost says

Hi Mahmudul, you’re very welcome! I’m very glad that my blog posts have helped you. And, thank you for taking the time to write such a nice comment. It means a lot to me!

Ashwini says

Great, thank you Jim

I’ll look forward for that article.

roy hampton says

As usual Jim, very clear explanation. Thank you!

Jim Frost says

Thanks, Roy!

Ashwini says

Jim,

Thank you so much for taking time to post these awesome articles. I have never seen statistics article as intuitive as you post. Thank you for taking time to do this.

I have a request, could you please post an article on samples sizes and details around when do we need to take type-I error (alpha) and type-II error (beta) into consideration for both mean and proportions case. Please make sure to include formulas as well.

Thank you,

Ashwini

Jim Frost says

Hi Ashwini, thanks so much! I strive to make the articles as intuitive as possible.

That sounds like a great idea for an article. For the time being, I’ve written about some of the issues you mention but not all together in one place. Take a look at the following:

Comparing Hypothesis Tests–where I cover various tests including those for the mean and proportions and a little about the differences in required samples sizes.

As for your question about the significance level (alpha), I write about significance levels and p-values. Whenever you perform a hypothesis test, you need to worry about alpha. Beta is important too, but harder to quantify. I need to write about beta specifically at some point!

In regards to your question about sample sizes, sometime in March I’ll write about power and sample size analyses.

I hope this helps for now!

Jerry Tuttle says

Hi. You did explain why the (n-1) denominator in sample variance is used. My question is why do many social science and education textbooks not use the (n-1) in descriptive statistic calculations?

Jim Frost says

Hi Jerry, the sample variance formula is used when using a sample to estimate a population. With descriptive statistics, the goal is not to estimate a population but only to describe the data in hand. In a sense, you’re treating that sample as a population and not using it to estimate one. At least that’s what I’m guessing the textbook’s rationale is based on your description. However, once analysts decide they want to estimate the population parameter, they should use the sample variance equation.

Khursheed Ahmad Ganaie says

Thnk sir ….

Love u 2 much

I wll b waiting fr ur next article. ..nd u hve e helps me a lot

Jim Frost says

Thank you, Khursheed! I appreciate your kind words! And, I’m glad that my articles have been helpful for you!