The standard deviation (SD) is a single number that summarizes the variability in a dataset. It represents the typical distance between each data point and the mean. Smaller values indicate that the data points cluster closer to the mean—the values in the dataset are relatively consistent. Conversely, higher values signify that the values spread out further from the mean. Data values become more dissimilar, and extreme values become more likely.
The standard deviation uses the original data units, simplifying the interpretation. For this reason, it is the most widely used measure of variability. Suppose a pizza restaurant measures its delivery time in minutes and has an SD of 5. In that case, the interpretation is that the typical delivery occurs 5 minutes before or after the mean time. Statisticians often report the standard deviation with the mean: 20 minutes (StDev 5). If another pizza restaurant has a standard deviation of 10 minutes, we know that its delivery service is less consistent. We’ll assess this example more closely later on!
In this post, learn why the standard deviation is essential, work through an interpretation example, and learn how to calculate it by hand.
Why is the Standard Deviation Important?
Understanding the standard deviation is crucial. While the mean identifies a central value in the distribution, it does not indicate how far the data points fall from the center. Higher SD values signify that more data points are further away from the mean. In other words, extreme values occur more frequently.
Variability is everywhere. When you order a favorite meal at a restaurant, it isn’t exactly the same each time. Your drive time to work varies every day. Parts from an assembly line might seem identical, but they have subtly different lengths and widths.
When variability is high, you can expect to experience extreme values more frequently, which can cause problems! If the restaurant meal differs noticeably from the usual, you might not like it at all. When your morning commute takes much longer than the average travel time, you will be late. And, manufactured parts that are too far out of spec won’t perform correctly.
Frequently, the extremes distress us more than the mean does. The standard deviation helps you understand variability and provides vital information about the consistency of outcomes, or lack thereof!
The standard deviation can also help you assess the sample’s heterogeneity.
Related post: What is the Mean in Statistics?
Example of Using the Standard Deviation
Suppose two pizza restaurants advertise a 20-minute average delivery time. We’re starving and both look equally good! However, we know the mean does not tell the entire story!
Let’s assess their standard deviations to choose the restaurant. Imagine we obtain their delivery time data. One restaurant has an SD of 10 minutes, while the other has an SD of 5 minutes. How does this affect deliveries?
The graphs below incorporate the SDs to answer this question. The restaurant with the larger standard deviation (10 minutes) has more variable delivery times and a broader distribution curve.
In these charts, we’ll consider a 30-minute wait or longer to be unacceptable—we’re hungry! The shaded areas represent the percentage of delivery times exceeding 30 minutes. Almost 16% of deliveries for the high variability pizza joint exceed 30 minutes compared to only 2% for the low variability restaurant. They both have a mean delivery time of 20 minutes, but I know where I’d place my order when I’m hungry!
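If you’d like to check these percentages yourself, here’s a minimal Python sketch. Like the graphs, it assumes delivery times follow a normal distribution, and it uses the standard library’s statistics.NormalDist. The function name prob_at_least is just an illustrative choice.

```python
from statistics import NormalDist


def prob_at_least(threshold, mean, sd):
    """Probability that a normally distributed value is >= threshold."""
    # For a continuous distribution, P(X >= t) = 1 - P(X <= t) = 1 - CDF(t).
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(threshold)


MEAN = 20        # both restaurants average 20 minutes
THRESHOLD = 30   # waits of 30 minutes or longer are unacceptable

for sd in (10, 5):
    p = prob_at_least(THRESHOLD, MEAN, sd)
    print(f"SD = {sd:>2} minutes: P(wait >= {THRESHOLD}) = {p:.4f}")
```

The two results, about 0.1587 and 0.0228, match the shaded areas. A 30-minute wait is one SD above the mean for the high variability restaurant but two SDs above it for the low variability one.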
After calculating the standard deviation, you can use various methods to evaluate it. The graphs above incorporate the SD into the normal probability distribution. Alternatively, you can use the Empirical Rule or Chebyshev’s Theorem to assess how the standard deviation relates to the distribution of values. You can also calculate the coefficient of variation, which uses both the SD and the mean.
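Here’s a small Python sketch of the Chebyshev and coefficient of variation approaches applied to the pizza example. The function names are hypothetical. The coefficient of variation divides the SD by the mean. Chebyshev’s Theorem gives a minimum share of values within k standard deviations that holds for any distribution, while the Empirical Rule’s 68%, 95%, and 99.7% apply only to approximately normal data.

```python
def coefficient_of_variation(sd, mean):
    """SD relative to the mean; unitless, so it compares relative variability."""
    return sd / mean


def chebyshev_min_within(k):
    """Chebyshev's Theorem: at least 1 - 1/k**2 of values lie within k SDs
    of the mean, for any distribution (the bound is informative for k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


# Empirical Rule shares for (approximately) normal data.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

for sd in (10, 5):
    cv = coefficient_of_variation(sd, 20)
    print(f"SD = {sd:>2}, mean = 20: CV = {cv:.2f}")

for k in (2, 3):
    print(
        f"Within {k} SDs: ~{EMPIRICAL_RULE[k]:.1%} if normal, "
        f"at least {chebyshev_min_within(k):.1%} for any distribution"
    )
```

For the low variability restaurant, for example, at least 75% of deliveries fall within 20 ± 2 × 5 minutes (10 to 30 minutes) whatever the distribution’s shape, and roughly 95% do if the times are normal.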
I always recommend graphing your data in a histogram so you can see the variability. These charts really bring the SD to life!
Standard Deviation Formula
The formula for the standard deviation is below.

$$ s = \sqrt{\frac{\sum_{i=1}^{N} \left( X_i - \bar{x} \right)^2}{N - 1}} $$
- s = the sample StDev
- N = number of observations
- Xi = value of each observation
- x̄ = the sample mean
Statisticians refer to the numerator portion of the standard deviation formula as the sum of squares.
Technically, this formula is for the sample standard deviation. The population version uses N in the denominator. Read my post, Measures of Variability, to learn about the differences between the population and sample varieties.
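In Python, the standard library’s statistics module provides both versions: stdev divides the sum of squares by N – 1, and pstdev divides it by N. The sketch below uses a small made-up dataset, chosen only because its numbers are easy to check by hand, and also writes the sample formula out explicitly.

```python
import math
import statistics

# Made-up data for illustration: mean = 5, sum of squares = 32, N = 8.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = statistics.stdev(data)       # sqrt(32 / (8 - 1)), about 2.14
population_sd = statistics.pstdev(data)  # sqrt(32 / 8) = 2.0

# The sample formula written out by hand gives the same result as stdev.
mean = statistics.fmean(data)
sum_of_squares = sum((x - mean) ** 2 for x in data)
manual_sd = math.sqrt(sum_of_squares / (len(data) - 1))

print(f"Sample SD:     {sample_sd:.4f}")
print(f"Population SD: {population_sd:.4f}")
print(f"Manual sample: {manual_sd:.4f}")
```

The gap between the two versions shrinks as N grows, because dividing by N – 1 instead of N makes less difference in large samples.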
Step-by-Step Example of Calculating the Standard Deviation
Calculating the standard deviation involves the following steps. The numbers in parentheses correspond to the column numbers in the table.
The calculations take each observation (1), subtract the sample mean (2) to calculate the difference (3), and square that difference (4).
Then, at the bottom, sum the column of squared differences and divide that total by 16 (17 – 1 = 16, because there are 17 observations). The result, 201, is what statisticians call the variance.
Calculate the square root of the variance to derive the SD: √201 ≈ 14.18.
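The same steps translate directly into a short Python function. The post’s 17-observation table isn’t reproduced here, so this sketch uses hypothetical delivery times, and the function name sd_by_hand is also just for illustration. Each printed row mirrors columns (1) through (4).

```python
import math


def sd_by_hand(values):
    """Sample SD following the hand-calculation steps, returning the table too."""
    n = len(values)
    mean = sum(values) / n                        # (2) sample mean
    rows = []
    for x in values:                              # (1) each observation
        diff = x - mean                           # (3) difference from the mean
        rows.append((x, mean, diff, diff ** 2))   # (4) squared difference
    sum_of_squares = sum(row[3] for row in rows)  # total of column (4)
    variance = sum_of_squares / (n - 1)           # divide by N - 1
    return rows, variance, math.sqrt(variance)    # SD = square root of variance


# Hypothetical delivery times in minutes (not the post's data).
rows, variance, sd = sd_by_hand([12, 15, 18, 20, 21, 24, 30])
for x, mean, diff, squared in rows:
    print(f"{x:>4} {mean:>6.1f} {diff:>6.1f} {squared:>7.1f}")
print(f"Variance = {variance:.2f}, SD = {sd:.2f}")
```

The final step works the same way with the post’s own result: math.sqrt(201) is about 14.18.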
Learn how you can use the range of a dataset to estimate the standard deviation using the range rule of thumb.
The standard deviation is similar to the mean absolute deviation. Both statistics use the original data units, and they compare the data points to the mean to assess variability. However, there are differences. To learn more, read my post about the mean absolute deviation (MAD).
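One way to see a key difference is to compute both statistics on the same data. This minimal sketch uses made-up data, and the helper name mean_absolute_deviation is hypothetical. The SD squares each deviation before averaging, so large deviations carry extra weight. When both statistics divide by N, the SD is never smaller than the MAD.

```python
import statistics


def mean_absolute_deviation(values):
    """Average absolute distance between each value and the mean."""
    mean = statistics.fmean(values)
    return sum(abs(x - mean) for x in values) / len(values)


data = [12, 15, 18, 20, 21, 24, 30]
with_outlier = data + [60]  # one extreme delivery time

for label, values in (("original", data), ("with outlier", with_outlier)):
    sd = statistics.pstdev(values)          # divides by N, like the MAD
    mad = mean_absolute_deviation(values)
    print(f"{label:>12}: SD = {sd:.2f}, MAD = {mad:.2f}")
```

Adding the single extreme value raises the SD proportionally more than the MAD, which reflects the extra weight that squaring gives to large deviations.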
People frequently mix up standard deviations vs. standard errors. Both evaluate variability, but they have vastly different purposes. To learn more, read my post, The Standard Error of the Mean.
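As a quick illustration of the distinction, the SD describes how much individual values vary. The standard error of the mean describes how precisely the sample mean estimates the population mean, and it is commonly computed as the SD divided by the square root of the sample size. The sample sizes below are hypothetical.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


# A restaurant with a delivery-time SD of 5 minutes (from the example),
# measured over hypothetical numbers of deliveries.
for n in (25, 100, 400):
    se = standard_error_of_mean(5, n)
    print(f"n = {n:>3}: SD = 5 minutes, SE of the mean = {se:.2f} minutes")
```

The SD stays at 5 minutes regardless of sample size, while the standard error shrinks as n grows. That contrast reflects their different purposes.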
Ann says
Thank you so much! This helped with my IA.
cha says
When did you write this, sir?
Jim Frost says
Hi!
When citing online resources, you typically use an “Accessed” date rather than a publication date because online content can change over time. For more information, read Purdue University’s Citing Electronic Resources.
Marius Iacomi says
Hi Jim,
I am so happy I dared to send the second post. Now it is clear to me. I have also realized that I need to add much, much more knowledge to the minimum I have to fully understand the message behind the data. That was the main question running through my head after finishing the 6 Sigma Yellow Belt: I understand how to collect the data, but not how to use them to get the correct message out of them. Thank you very much again for taking the time to answer me! Have a nice day!
Jim Frost says
You’re very welcome! So glad to help!
Marius Iacomi says
Hi Jim, My name is Marius Iacomi. Statistics is absolutely new for me. Just for my understanding, in the example above, shouldn’t the graph of the restaurant with a 5-minute standard deviation show 25 minutes instead of 30 minutes?
Jim Frost says
Hi Marius,
The purpose of both graphs is to show how delivery times are affected when the mean is the same (20 minutes) but the standard deviations differ (5 vs. 10 minutes). Both graphs display 30 minutes because the example defines a 30-minute wait as unacceptable.
So, think of this example as a word problem in which you need to find the probability of waiting 30 minutes or longer for the two different distributions. That’s why they both show 30 minutes. The difference in results illustrates the effect of the larger standard deviation for the same threshold (≥ 30 minutes).
I hope that clarifies it! 🙂
Marius Iacomi says
Hi Jim,
Thank you very much for your very fast reply. I hope I am not going to annoy you with my questions. What is not clear to me is this: why can the waiting time be above 30 minutes if the standard deviation is just 5 for the faster restaurant, and the mean is 20 for both of them? I was expecting that the faster restaurant always delivers in a maximum of 25 minutes. Otherwise, what is the point of a standard deviation that is 5 minutes smaller than 10 minutes if the mean is 20 minutes? I have followed a 6 Sigma Yellow Belt training, and I am somewhat overwhelmed by the statistics needed for the data interpretation. In the end, the better restaurant was not faster at all. If what I say makes no sense, I am in trouble 🙂
Jim Frost says
The graphs show that the smaller standard deviation does in fact reduce long delivery times. Specifically, the probability of waiting more than 30 minutes drops from 0.1587 to 0.02275 thanks to the lower SD. To see that, take a closer look at both graphs and compare the probabilities. That’s a pretty sharp drop.
With the worse restaurant, you’ll wait at least 30 minutes for about 1 out of every 6 orders, while for the better restaurant it is only about 2 out of 100 orders.
However, you can’t say that the restaurant with the smaller standard deviation will always deliver within a maximum of 25 minutes. That is just one SD above the mean (Z-score = 1). Hence, about 84% of the deliveries will take less than 25 minutes, but 16% will still take longer.
So, yes, one restaurant is better than the other! Or at least more consistent in its delivery times. Also, keep in mind that it’s not accurate to say one restaurant is faster because they both have the same mean delivery time. Again, one is just more consistent than the other because its standard deviation is smaller.
Anders Kallner says
Dear Jim, The population standard deviation is underestimated, and thus biased, if the Bessel correction is not applied to small samples. However, navigating the Internet indicates that the population variance may not be biased if estimated from the common formula based on the sum of squares and number of observations. The bias in the SD is then blamed on Jensen’s inequality and the nonlinearity of calculating the square root of the variance. This does not make sense to me. Also, the “rule of thumb” that recommends reducing the denominator by SQRT(2), or approximately 1.5, rather than 1 gives me gray hair! Can you explain?
Jim Frost says
Hi Anders,
There are two sources of bias in this scenario.
A sample tends to underestimate the variance in the entire population. Variance calculated with N in the denominator is a biased estimator for this reason. Bessel’s correction of using N – 1 produces an unbiased estimate of the variance.
Sounds like the problem is solved, but not quite.
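The process of taking the square root of the variance to find the standard deviation introduces some bias. That’s because the square root is a nonlinear (concave) function, which is where Jensen’s inequality comes in: on average, the square root of the variance estimate comes out slightly below the true standard deviation. So, an unbiased variance estimate (using Bessel) can lead to a biased standard deviation estimate (due to taking the square root).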
Fortunately, the bias in the SD is small compared to the bias that Bessel’s correction fixes. It is often considered negligible in practice and gets smaller with larger samples.
Unbiasing the standard deviation is possible but involves more complex adjustments that are not as straightforward or universally applicable as Bessel’s correction for variance. The rule of thumb you mention is not standard as far as I’m aware.
The standard practice is to use Bessel’s correction for an unbiased variance estimate and take the square root of that for the standard deviation estimate.
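A quick simulation can make both sources of bias concrete. This is a minimal Python sketch that assumes normally distributed data with a true SD of 10 and samples of N = 5. The function name simulate_bias and these settings are arbitrary choices for illustration.

```python
import math
import random


def simulate_bias(true_sd=10.0, n=5, trials=100_000, seed=1):
    """Average three estimators over many random normal samples."""
    rng = random.Random(seed)
    total_var_n = total_var_bessel = total_sd_bessel = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
        mean = sum(sample) / n
        sum_of_squares = sum((x - mean) ** 2 for x in sample)
        var_bessel = sum_of_squares / (n - 1)     # Bessel's correction
        total_var_n += sum_of_squares / n         # divides by N: biased low
        total_var_bessel += var_bessel
        total_sd_bessel += math.sqrt(var_bessel)  # square root adds slight bias
    return (total_var_n / trials,
            total_var_bessel / trials,
            total_sd_bessel / trials)


var_n, var_bessel, sd_bessel = simulate_bias()
print(f"True variance = 100; average estimate with N:     {var_n:.1f}")
print(f"True variance = 100; average estimate with N - 1: {var_bessel:.1f}")
print(f"True SD = 10; average of sqrt(Bessel variance):   {sd_bessel:.2f}")
```

With these settings, the first average should land near 80 (that is, (N – 1)/N × 100), the second near 100, and the third a little below 10 (around 9.4 for N = 5). Increasing n shrinks the remaining SD bias, consistent with the points above.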
I hope that helps clarify it!
Anne Nelson says
Thank you, Jim. I hadn’t thought of percentiles. Your reply was really helpful. Thanks again, Anne.
Anne Nelson says
I have used fairly basic stats in the past, so I am used to variability, etc. However, my medical consultant has told me that so far I have survived more than two SDs longer than would be expected. I assume by expected he is using the mean. So what does that actually signify for me as a single subject? I’d appreciate any guidance.
Jim Frost says
Hi Anne,
If you assume that survival times are normally distributed, you can use the standard deviations to calculate your survival time percentile. If you survived 2 standard deviations more than average, your Z-score is 2. Using any online Z-score calculator, you can find that you’ve survived longer than 97.7% of those with the condition. Equivalently, you’re at the 97.7th percentile. Congratulations! May you continue to increase your survival Z-score! 🙂
Of course, I don’t actually know that the survival times follow a normal distribution. If they don’t, that value will be off somewhat. How much depends on the degree of skewness. But you’ve definitely survived much longer than average.
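For anyone who’d rather not use an online calculator, Python’s standard library gives the same figure. This sketch makes the same normality assumption discussed above, and the function name percentile_from_z is just illustrative.

```python
from statistics import NormalDist


def percentile_from_z(z):
    """Share of a normal distribution lying below the given Z-score."""
    return NormalDist().cdf(z)  # standard normal: mean 0, SD 1


for z in (1, 2):
    print(f"Z = {z}: {percentile_from_z(z):.1%} of values fall below")
```

A Z-score of 2 corresponds to roughly the 97.7th percentile, and a Z-score of 1 to roughly the 84th.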
Sonya says
I’m taking a stats class at McGill and I have no idea what the professor is talking about. I understand everything you write about! Thank goodness you’re here and I just bought your book!!
Jim Frost says
Hi Sonya,
Thanks so much for your kind words. They made my day because my goal is to make statistics understandable. I’m so glad my website and now books are helpful! 🙂
Towongo Godfrey says
For the first time, I grasped the concept of standard deviation. Thanks a lot, Jim.
Jim Frost says
Thanks so much, Towongo. So glad I could help!
Welbert Ockhuizen says
Very clear and easy to understand. Thank you.
Sierra says
A helpful article. It was well-explained and easy to understand. Thank you for this!
archana says
Such a clean and easy explanation. Thank you, Jim.
Aarinola Oladayo says
Beautiful, thank you Jim