A z-score measures the distance between a data point and the mean in units of standard deviations. Z-scores can be positive or negative. The sign tells you whether the observation is above or below the mean. For example, a z-score of +2 indicates that the data point falls two standard deviations above the mean, while a -2 signifies it is two standard deviations below the mean. A z-score of zero means the data point equals the mean. Statisticians also refer to z-scores as standard scores, and I’ll use those terms interchangeably.
Standardizing the raw data by transforming them into z-scores provides the following benefits:
- Understand where a data point fits into a distribution.
- Compare observations between dissimilar variables.
- Identify outliers.
- Calculate probabilities and percentiles using the standard normal distribution.
In this post, I cover all these uses for z-scores, show you how to use z-tables and z-score calculators, and explain how to do it all in Excel.
How to Find a Z-score
To calculate z-scores, take the raw measurements, subtract the mean, and divide by the standard deviation.
The formula for finding z-scores is the following:
Z = (X − μ) / σ
X represents the data point of interest. Mu and sigma represent the mean and standard deviation for the population from which you drew your sample. Alternatively, use the sample mean and standard deviation when you do not know the population values.
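The calculation described above can be sketched in a few lines of Python. The function name `z_score` and the example values (a mean of 50 and a standard deviation of 10) are my own illustrations, not part of any library or dataset in this post.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical values: mean of 50, standard deviation of 10.
print(z_score(70, 50, 10))   # 2.0  (two standard deviations above the mean)
print(z_score(50, 50, 10))   # 0.0  (exactly at the mean)
```

Pass either the population values or the sample estimates for `mean` and `sd`, depending on which you have.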
Z-scores retain the shape of the original data's distribution because standardizing only shifts and rescales the values. Consequently, when the original data follow the normal distribution, so do the corresponding z-scores. Specifically, the z-scores follow the standard normal distribution, which has a mean of 0 and a standard deviation of 1. However, skewed data will produce z-scores that are similarly skewed.
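To see that standardizing changes the scale but not the shape, here is a small sketch using a made-up, right-skewed dataset. The `skewness` helper is my own (it uses the simple moment-based definition), and the data values are hypothetical.

```python
from statistics import mean, stdev

def skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data (one large value pulls the tail out).
raw = [1, 2, 2, 3, 3, 3, 4, 20]

m, s = mean(raw), stdev(raw)
z = [(v - m) / s for v in raw]

print("mean of z-scores:", round(mean(z), 6))
print("SD of z-scores:  ", round(stdev(z), 6))
print("skewness raw vs z:", round(skewness(raw), 4), round(skewness(z), 4))
```

The z-scores have a mean of 0 and a standard deviation of 1 (up to rounding), but their skewness matches the raw data.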
In this post, I include graphs of z-scores using the standard normal distribution because they bring the concepts to life. Additionally, z-scores are most valuable when your data are normally distributed. However, be aware that when your data are nonnormal, the z-scores are also nonnormal, and the interpretations might not be valid.
Learn how Z-scores are an integral part of hypothesis testing with Z Tests!
Related posts: The Mean in Statistics and Standard Deviation
Using Z-scores to Understand How an Observation Fits into a Distribution
Z-scores help you understand where a specific observation falls within a distribution. Sometimes the raw test scores are not informative. For instance, SAT, ACT, and GRE scores do not have real-world interpretations on their own. An SAT score of 1340 is not fundamentally meaningful. Many psychological metrics are simply sums or averages of responses to a survey. For these cases, you need to know how an individual score compares to the entire distribution of scores. For example, if your standard score for any of these tests is a +2, that’s far above the mean. Now that’s helpful!
In other cases, the measurement units are meaningful, but you want to see the relative standing. For example, if a baby weighs five kilograms, you might wonder how her weight compares to others. For a one-month-old baby girl, that equates to a z-score of 0.74. She weighs more than average, but not by a full standard deviation. Now you understand where she fits in with her cohort!
In all these cases, you’re using standard scores to compare an observation to the average. You’re placing that value within an entire distribution.
When your data are normally distributed, you can graph z-scores on the standard normal distribution, which is a particular form of the normal distribution. The mean occurs at the peak with a z-score of zero. Above average z-scores are on the right half of the distribution and below average values are on the left. The graph below shows where the baby’s z-score of 0.74 fits in the population.
Analysts often convert standard scores to percentiles, which I cover later in this post.
Related post: Understanding the Normal Distribution
Using Standard Scores to Compare Different Types of Variables
Z-scores allow you to take data points drawn from populations with different means and standard deviations and place them on a common scale. This standard scale lets you compare observations for different types of variables that would otherwise be difficult to compare. That’s why z-scores are also known as standard scores, and the process of transforming raw data to z-scores is called standardization.
In other words, you can compare apples to oranges. Isn’t statistics grand!
Imagine we literally need to compare apples to oranges. Specifically, we’ll compare their weights. We have a 110-gram apple and a 100-gram orange.
By comparing the raw values, it’s easy to see the apple weighs slightly more than the orange. However, let’s compare their z-scores. To do this, we need to know the means and standard deviations for the populations of apples and oranges. Assume that apples and oranges follow a normal distribution with the following properties:
Statistic | Apples | Oranges |
Mean weight (grams) | 100 | 140 |
Standard deviation (grams) | 15 | 25 |
Let’s calculate the Z-scores for our apple and orange!
Apple = (110-100) / 15 = 0.667
Orange = (100-140) / 25 = -1.6
The apple’s positive z-score (0.667) signifies that it is heavier than the average apple. It’s not an extreme value, but it is above the mean. Conversely, the orange has a markedly negative Z-score (-1.6). It’s well below the mean weight for oranges. I’ve positioned these standard scores in the standard normal distribution below.
Our apple is a bit heavier than average, while the orange is puny! Using z-scores, we learned where each fruit falls within its distribution and how they compare.
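The apple and orange comparison can be reproduced with a short Python sketch. The dictionary layout and variable names are my own; the weights, means, and standard deviations are the ones assumed in this example.

```python
# Weights, means, and standard deviations (grams) from the example above.
fruits = {
    "apple": {"weight": 110, "mean": 100, "sd": 15},
    "orange": {"weight": 100, "mean": 140, "sd": 25},
}

# Standardize each fruit against its own population.
z = {name: (f["weight"] - f["mean"]) / f["sd"] for name, f in fruits.items()}

for name, value in z.items():
    print(f"{name}: z = {value:.3f}")
# apple: z = 0.667
# orange: z = -1.600
```

Each fruit is standardized against its own population, which is what puts the two values on a common scale.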
Using Z-scores to Detect Outliers
Z-scores can quantify the unusualness of an observation. Raw data values that are far from the average are unusual and potential outliers. Consequently, we’re looking for high absolute z-scores.
The standard cutoff values for finding outliers are z-scores of +/-3 or more extreme. The standard normal distribution plot below displays the distribution of z-scores. Z-scores beyond the cutoff are so unusual you can hardly see the shading under the curve.
In populations that follow a normal distribution, z-scores beyond +/-3 have a combined probability of 0.0027 (2 * 0.00135, one tail on each side), approximately 1 in 370 observations. However, if your data don’t follow a normal distribution, this approach might not be correct.
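If you want to check the tail probability yourself, here is a minimal sketch using `NormalDist` from Python's standard library (Python 3.8 or later). It assumes the data are normally distributed, as in the paragraph above.

```python
from statistics import NormalDist

std_normal = NormalDist()            # standard normal: mean 0, SD 1

one_tail = 1 - std_normal.cdf(3)     # area above z = +3
both_tails = 2 * one_tail            # symmetric, so double it for +/-3

print(round(one_tail, 5))      # 0.00135
print(round(both_tails, 4))    # 0.0027
print(round(1 / both_tails))   # 370
```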
For the example dataset, I display the raw data points and their z-scores. I circled an observation that is a potential outlier.
Caution: Z-scores can be misleading in small datasets because the maximum z-score is limited to (n−1) / √n, where n is the sample size.
Samples with ten or fewer data points cannot have Z-scores that exceed the cutoff value of +/-3.
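The limit is easy to tabulate. The sketch below assumes z-scores are computed with the sample mean and the sample standard deviation (n − 1 denominator); the function name `max_abs_z` and the extreme dataset are hypothetical illustrations.

```python
from math import sqrt
from statistics import mean, stdev

def max_abs_z(n):
    """Largest possible |z| in a sample of size n, using the sample SD."""
    return (n - 1) / sqrt(n)

for n in (5, 10, 11, 20):
    print(n, round(max_abs_z(n), 3))
# 5 1.789
# 10 2.846
# 11 3.015
# 20 4.249

# Hypothetical worst case: nine identical values and one extreme value.
data = [0] * 9 + [1000]
z_extreme = (data[-1] - mean(data)) / stdev(data)
print(round(z_extreme, 3))   # 2.846, the limit for n = 10
```

No matter how extreme the tenth value is, its z-score cannot pass the limit for n = 10, which is why the +/-3 cutoff needs at least 11 observations.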
Additionally, an outlier’s presence throws off the z-scores because it inflates the mean and standard deviation. Notice how all z-scores are negative except the outlier’s value. If we calculated Z-scores without the outlier, they’d be different! If your dataset contains outliers, z-values appear to be less extreme (i.e., closer to zero).
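Here is a rough sketch of that masking effect. The data values are hypothetical and are not the dataset pictured in this post; the helper name `z_scores` is my own.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using their own sample mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical measurements; the last value is the suspected outlier.
data = [10, 11, 12, 11, 10, 12, 11, 50]

with_outlier = z_scores(data)

# Score the suspect value against the mean and SD of the other points only.
clean = data[:-1]
outlier_vs_clean = (data[-1] - mean(clean)) / stdev(clean)

print([round(z, 2) for z in with_outlier])
print(round(outlier_vs_clean, 1))
```

With the outlier included, every other z-score is negative and the outlier's own z-score stays below 3. Scored against the remaining points, the same value is dozens of standard deviations from the mean.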
Related post: Five Ways to Find Outliers
Using Z-tables to Calculate Probabilities and Percentiles
The standard normal distribution is a probability distribution. Consequently, if you have only the mean and standard deviation, and you can reasonably assume your data follow the normal distribution (at least approximately), you can easily use z-scores to calculate probabilities and percentiles. Typically, you’ll use online calculators, Excel, or statistical software for these calculations. We’ll get to that.
But first I’ll show you the old-fashioned way of doing that by hand using z-tables.
Let’s go back to the z-score for our apple (0.667) from before. We’ll use it to calculate its weight percentile. A percentile is the proportion of a population that falls below a value. Consequently, we need to find the area under the standard normal distribution curve corresponding to the range of z-scores less than 0.667. In the portion of the z-table below, I’ll use the standard score that is closest to our apple, which is 0.65.
Click here for a full Z-table and illustrated instructions for using it!
Related post: Understanding Probability Distributions and Probability Fundamentals
The Nuts and Bolts of Using Z-tables
Using these tables to calculate probabilities requires that you understand the properties of the normal distribution. While the tables provide an answer, it might not be the answer you need. However, by applying your knowledge of the normal distribution, you can find your answer!
For example, the table indicates that the area of the curve between -0.65 and +0.65 is 48.43%. Unfortunately, that’s not what we want to know. We need to find the area that is less than a z-score of 0.65.
We know that the two halves of the normal distribution are symmetrical, which helps us solve our problem. The z-table tells us that the area for the range from -0.65 to +0.65 is 48.43%. Because of the symmetry, the interval from 0 to +0.65 must be half of that: 48.43/2 = 24.215%. Additionally, the area for all scores less than zero is half (50%) of the distribution.
Therefore, the area for all z-scores up to 0.65 = 50% + 24.215% = 74.215%
That’s how you convert standard scores to percentiles. Our apple is at approximately the 74th percentile.
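The symmetry argument can be mirrored step by step in Python. This sketch assumes the table reports the area between -z and +z, which for the standard normal distribution equals erf(z / √2); the function name `central_area` is my own.

```python
from math import erf, sqrt

def central_area(z):
    """Area under the standard normal curve between -z and +z."""
    return erf(z / sqrt(2))

z = 0.65
between = central_area(z)      # the value the z-table reports
upper_half = between / 2       # area from 0 to +z, by symmetry
percentile = 0.5 + upper_half  # add the half of the curve below zero

print(f"{between:.2%} {percentile:.2%}")   # 48.43% 74.22%
```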
If you want to calculate the probability of values falling between two standard scores, calculate the percentile for each z-score and then subtract the smaller percentile from the larger one.
For example, the probability of a z-score between 0.40 and 0.65 equals the difference between the percentiles for z = 0.65 and z = 0.40. We calculated the percentile for z = 0.65 above (74.215%). Using the same method, the percentile for z = 0.40 is 65.540%. Now we subtract the percentiles.
74.215% – 65.540% = 8.675%
The probability of an observation having a z-score between 0.40 and 0.65 is 8.675%.
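As a quick check on the subtraction, here is a sketch that computes the same interval probability from the cumulative distribution function. The helper name `prob_between` is hypothetical. Expect a small difference in the last digits because the z-table values are rounded.

```python
from statistics import NormalDist

def prob_between(z_low, z_high):
    """Probability that a standard normal value falls between two z-scores."""
    cdf = NormalDist().cdf
    return cdf(z_high) - cdf(z_low)

p = prob_between(0.40, 0.65)
print(f"{p:.2%}")   # roughly 8.67%
```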
Using only simple math and a z-table, you can easily find the probabilities that you need!
Alternatively, use the Empirical Rule to find probabilities for values in a normal distribution using ranges based on standard deviations.
Related post: Percentiles: Interpretations and Calculations
Using Z-score Calculators
In this day and age, you’ll probably use software and online z-score calculators for these probability calculations. Statistical software produced the probability distribution plot below. It displays the apple’s percentile with a graphical representation of the area under the standard normal distribution curve. Graphing is a great way to get an intuitive feel for what you’re calculating using standard scores.
The percentile is a tad different because we used the z-score of 0.65 in the table while the software uses the more precise value of 0.667.
Alternatively, you can enter z-scores into calculators, like this one.
If you enter the z-score value of 0.667, the left-tail p-value matches the shaded region in the probability plot above (0.7476). The right-tail value (0.2524) equals all values above our z-score, which is equivalent to the unshaded region in the graph. Unsurprisingly, those values add to 1 because you’re covering the entire distribution.
How to Find Z-scores in Excel
You can calculate z-scores and their probabilities in Excel. Let’s work through an example. We’ll return to our apple example and start by calculating standard scores for values in a dataset. I have all the data and formulas in this Excel file: Z-scores.
To find z-scores using Excel, you’ll need to either calculate the sample mean and standard deviation or use population reference values. In this example, I use the sample estimates. If you need to use population values supplied to you, enter them into the spreadsheet rather than calculating them.
My apple weight data are in cells A2:A21.
To calculate the mean and standard deviation, I use the following Excel functions:
- Mean: =AVERAGE(A2:A21)
- Standard deviation (sample): =STDEV.S(A2:A21)
Then, in column B, I use the following Excel formula to calculate the z-scores:
=(A2-A$24)/A$26
Cell A24 is where I have the mean, and A26 has the standard deviation. This formula takes a data value in column A, subtracts the mean, and then divides by the standard deviation.
I copied that formula for all rows from B2:B21 and Excel displays z-scores for all data points.
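If you prefer code to a spreadsheet, the same steps look roughly like this in Python. The weights below are hypothetical stand-ins, since the data in my Excel file are not reproduced here.

```python
from statistics import mean, stdev

# Hypothetical apple weights in grams (not the values from the Excel file).
weights = [92, 105, 110, 98, 101, 120, 87, 99, 104, 113]

avg = mean(weights)    # plays the role of =AVERAGE(A2:A21)
sd = stdev(weights)    # plays the role of =STDEV.S(A2:A21), the sample SD

# Plays the role of =(A2-A$24)/A$26 copied down the column.
z_scores = [(w - avg) / sd for w in weights]

for w, z in zip(weights, z_scores):
    print(f"{w:>4} {z:6.2f}")
```

Use `statistics.pstdev` instead of `stdev` if your values are the whole population rather than a sample.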
Using Excel to Calculate Probabilities for Standard Scores
Next, I use Excel’s NORM.S.DIST function to calculate the probabilities associated with z-scores. I work with the standard score from our apple example, 0.667.
The NORM.S.DIST (Z, Cumulative) function provides either the cumulative distribution function (TRUE) or the probability density function (FALSE) for the z-score you specify. Because the normal distribution is continuous, the FALSE option returns a density (the height of the curve) rather than a probability. It is the height value in the z-table earlier in this post, and it corresponds to the y-axis value on a probability distribution plot for the z-score. We’ll use the cumulative function, which calculates the cumulative probability for all z-scores less than the value we specify.
In the function, we need to specify the z-value (0.667) and use the TRUE parameter to obtain the cumulative probability.
I’ll enter the following:
= NORM.S.DIST(0.667,TRUE)
Excel displays 0.747613933, matching the output in the probability distribution plot above.
If you want to find the probability for values greater than the z-score, remember that the values above and below it must sum to 1. Therefore, subtract from 1 to calculate probabilities for larger values:
= 1 - NORM.S.DIST(0.667,TRUE)
Excel displays 0.252386067.
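For comparison, here is a minimal Python stand-in for the Excel function. The name `norm_s_dist` is my own; it wraps `NormalDist` from the standard library and is a sketch, not a reimplementation of Excel.

```python
from statistics import NormalDist

def norm_s_dist(z, cumulative):
    """Mimic NORM.S.DIST: cumulative probability if True, curve height if False."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(z) if cumulative else std_normal.pdf(z)

below = norm_s_dist(0.667, True)   # probability of a z-score below 0.667
above = 1 - below                  # probability of a z-score above 0.667

print(round(below, 4), round(above, 4))   # 0.7476 0.2524
print(round(norm_s_dist(0, False), 4))    # 0.3989 (curve height at the mean)
```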
Here’s what my spreadsheet looks like.
Ella says
Thanks for your work, your material makes it so easy to understand. I recommend his book Regression Analysis: An Intuitive Guide. It’s really helping.
Jim Frost says
Hi Ella, you’re very welcome and I’m so glad to hear that you’ve found it to be helpful!
Hussain says
Hi Jim. Can we use standardized factor scores (in principal component analysis) for regression analysis? If so, how should we interpret the results of regression analysis as standardized scores have mean = zero and SD = 1?
Thanks,
Jim Frost says
Hi, I think you might be mixing several things together.
Yes, you can use standardized scores as independent variables in regression. Click the link to learn more about that, including how you interpret the results. However, that is different than principal components.
But you can also perform regression on principal components in a process known as principal components regression. Partial least squares regression is a closely related method. Again, that’s a different procedure than using standardized scores. It’s a useful process when you have highly correlated predictors and/or more predictors than observations.
Vincent says
How to get (n−1) / √ n in the statement below? “Caution: Z-scores can be misleading in small datasets because the maximum z-score is limited to (n−1) / √ n.”
Jim Frost says
Hi Vincent,
In the outliers section of this post, click the link to the Five Ways to Find Outliers post. In it, I provide a reference for that.
Roberto says
Thanks! Really helpful article. Keep up the good work.
Peter says
Nice one
Gomathi P says
Fruitful content!