The standard deviation (SD) is a single number that summarizes the variability in a dataset. It represents the typical distance between each data point and the mean. Smaller values indicate that the data points cluster closer to the mean—the values in the dataset are relatively consistent. Conversely, higher values signify that the values spread out further from the mean. Data values become more dissimilar, and extreme values become more likely.
The standard deviation uses the original data units, which simplifies interpretation. For this reason, it is the most widely used measure of variability. Suppose a pizza restaurant measures its delivery time in minutes and has an SD of 5. In that case, the typical delivery arrives about 5 minutes before or after the mean time. Statisticians often report the standard deviation with the mean: 20 minutes (StDev 5). If another pizza restaurant has a standard deviation of 10 minutes, we know that its delivery times are less consistent. We’ll assess this example more closely later on!
In this post, learn why the standard deviation is essential, work through an interpretation example, and learn how to calculate it by hand.
Why is the Standard Deviation Important?
Understanding the standard deviation is crucial. While the mean identifies a central value in the distribution, it does not indicate how far the data points fall from the center. Higher SD values signify that more data points are further away from the mean. In other words, extreme values occur more frequently.
Variability is everywhere. When you order a favorite meal at a restaurant, it isn’t exactly the same each time. Your drive time to work varies every day. Parts from an assembly line might seem identical, but they have subtly different lengths and widths.
When variability is high, you can expect to experience extreme values more frequently, which can cause problems! If the restaurant meal differs noticeably from the usual, you might not like it at all. When your morning commute takes much longer than the average travel time, you will be late. And manufactured parts that are too far out of spec won’t perform correctly.
Frequently, the extremes distress us more than the mean does. The standard deviation helps you understand the variability and provides vital information about the consistency of outcomes, or the lack thereof!
The standard deviation can also help you assess the sample’s heterogeneity.
Related post: What is the Mean in Statistics?
Example of Using the Standard Deviation
Suppose two pizza restaurants advertise a 20-minute average delivery time. We’re starving and both look equally good! However, we know the mean does not tell the entire story!
Let’s assess their standard deviations to choose the restaurant. Imagine we obtain their delivery time data. One restaurant has an SD of 10 minutes while the other has an SD of 5 minutes. How does this affect deliveries?
The graphs below incorporate the SDs to answer this question. The restaurant with the larger standard deviation (10 minutes) has more variable delivery times and a broader distribution curve.
In these charts, we’ll consider a 30-minute wait or longer to be unacceptable—we’re hungry! The shaded areas represent the percentage of delivery times exceeding 30 minutes. Almost 16% of deliveries for the high variability pizza joint exceed 30 minutes compared to only 2% for the low variability restaurant. They both have a mean delivery time of 20 minutes, but I know where I’d place my order when I’m hungry!
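Those shaded areas can be checked numerically. Here is a minimal sketch, assuming delivery times follow a normal distribution as the charts do. It uses `statistics.NormalDist` from Python's standard library; the function and variable names are only illustrative.

```python
from statistics import NormalDist

MEAN_MINUTES = 20    # both restaurants advertise the same mean
CUTOFF_MINUTES = 30  # waits longer than this count as unacceptable


def share_late(sd_minutes: float) -> float:
    """Proportion of deliveries exceeding the cutoff, assuming normality."""
    distribution = NormalDist(mu=MEAN_MINUTES, sigma=sd_minutes)
    return 1 - distribution.cdf(CUTOFF_MINUTES)


high_variability = share_late(10)  # cutoff is 1 SD above the mean
low_variability = share_late(5)    # cutoff is 2 SDs above the mean

print(f"SD 10: {high_variability:.1%}  SD 5: {low_variability:.1%}")
# prints: SD 10: 15.9%  SD 5: 2.3%
```

The cutoff sits one SD above the mean for the high variability restaurant but two SDs above it for the low variability one, which is why the tail areas differ so much.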
After calculating the standard deviation, you can use various methods to evaluate it. The graphs above incorporate the SD into the normal probability distribution. Alternatively, you can use the Empirical Rule or Chebyshev’s Theorem to assess how the standard deviation relates to the distribution of values. You can also calculate the coefficient of variation, which uses both the SD and the mean.
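To make these options concrete, here is a rough sketch of each one in Python. The helper names are my own, not from any library. The coefficient of variation is the SD divided by the mean, Chebyshev’s Theorem gives a minimum share of values within k SDs for any distribution, and the Empirical Rule comes from the share of a normal distribution within k SDs.

```python
from statistics import NormalDist


def coefficient_of_variation(sd: float, mean: float) -> float:
    """SD expressed as a fraction of the mean."""
    return sd / mean


def chebyshev_min_share(k: float) -> float:
    """Chebyshev's lower bound on the share of values within k SDs of the
    mean. It holds for any distribution and is informative when k > 1."""
    return 1 - 1 / k**2


def normal_share_within(k: float) -> float:
    """Share of a normal distribution within k SDs of the mean,
    which is the basis of the Empirical Rule."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


print(coefficient_of_variation(5, 20))   # 0.25
print(coefficient_of_variation(10, 20))  # 0.5
print(chebyshev_min_share(2))            # 0.75
print(round(normal_share_within(2), 3))  # 0.954
```

For the two pizza restaurants, the coefficient of variation doubles from 0.25 to 0.5 because the SD doubles while the mean stays at 20 minutes.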
I always recommend graphing your data in a histogram so you can see the variability. These charts really bring the SD to life!
Standard Deviation Formula
The formula for the standard deviation is below.
- s = the sample StDev
- N = number of observations
- Xi = value of each observation
- x̄ = the sample mean
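Written out from the definitions above, and assuming the N − 1 denominator that the sample version uses, the formula can be typeset as:

```latex
s = \sqrt{\frac{\sum_{i=1}^{N} \left( X_i - \bar{x} \right)^2}{N - 1}}
```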
Statisticians refer to the numerator portion of the standard deviation formula as the sum of squares.
Technically, this formula is for the sample standard deviation. The population version uses N in the denominator. Read my post, Measures of Variability, to learn about the differences between the population and sample varieties.
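As a quick illustration of that difference, Python’s standard library offers both versions: `statistics.stdev` divides by N − 1 and `statistics.pstdev` divides by N. The delivery times below are hypothetical values I made up for this sketch.

```python
import statistics

# Hypothetical delivery times in minutes, made up for illustration.
times = [14, 18, 20, 22, 26]

sample_sd = statistics.stdev(times)       # sum of squares divided by N - 1
population_sd = statistics.pstdev(times)  # sum of squares divided by N

print(round(sample_sd, 3), round(population_sd, 3))  # 4.472 4.0
```

The sum of squares here is 80, so the sample version takes the square root of 80 / 4 = 20 while the population version takes the square root of 80 / 5 = 16. The sample SD is always the larger of the two for the same data.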
Step-by-Step Example of Calculating the Standard Deviation
Calculating the standard deviation involves the following steps. The numbers correspond to the column numbers.
The calculations take each observation (1), subtract the sample mean (2) to calculate the difference (3), and square that difference (4).
Then, at the bottom, sum the column of squared differences and divide that sum by 16 (17 – 1 = 16). The result is 201. Statisticians call this value the variance.
Calculate the square root of the variance to derive the SD. Here, the square root of 201 is approximately 14.18.
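The same steps can be sketched in a few lines of Python. Note that the observations below are hypothetical values chosen to give round numbers. They are not the 17 observations from the worked example, so the results differ from the 201 above.

```python
import math

# Hypothetical observations, NOT the 17 values from the worked example.
observations = [2, 4, 4, 4, 5, 5, 5, 7, 9]

n = len(observations)
mean = sum(observations) / n                         # the sample mean
differences = [x - mean for x in observations]       # observation minus mean
squared_differences = [d ** 2 for d in differences]  # square each difference

sum_of_squares = sum(squared_differences)  # numerator of the formula
variance = sum_of_squares / (n - 1)        # divide by N - 1
sd = math.sqrt(variance)                   # square root of the variance

print(n, mean, sum_of_squares, variance, sd)  # 9 5.0 32.0 4.0 2.0
```

Each list mirrors one column of the hand calculation, which makes it easy to print the intermediate values and compare them with your own work.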
The standard deviation is similar to the mean absolute deviation. Both statistics use the original data units and they compare the data points to the mean to assess variability. However, there are differences. To learn more, read my post about the mean absolute deviation (MAD).
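For a side-by-side feel, here is a small sketch that computes both statistics on the same data. The dataset is hypothetical and `mean_absolute_deviation` is a helper I wrote for this example, not a standard-library function.

```python
import statistics


def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    center = statistics.fmean(values)
    return statistics.fmean([abs(x - center) for x in values])


# Hypothetical observations, made up for illustration.
data = [2, 4, 4, 4, 5, 5, 5, 7, 9]

mad = mean_absolute_deviation(data)
sd = statistics.stdev(data)

print(round(mad, 3), sd)  # 1.333 2.0
```

Both results are in the original data units. The SD comes out larger here because squaring the differences gives extra weight to the points farthest from the mean.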