Statistics By Jim

Making statistics intuitive

Z-score: Definition, Formula, and Uses

By Jim Frost

A z-score measures the distance between a data point and the mean in units of standard deviations. Z-scores can be positive or negative. The sign tells you whether the observation is above or below the mean. For example, a z-score of +2 indicates that the data point falls two standard deviations above the mean, while a -2 signifies it is two standard deviations below the mean. A z-score of zero indicates that the data point equals the mean. Statisticians also refer to z-scores as standard scores, and I’ll use those terms interchangeably.

Standardizing the raw data by transforming them into z-scores provides the following benefits:

  • Understand where a data point fits into a distribution.
  • Compare observations between dissimilar variables.
  • Identify outliers.
  • Calculate probabilities and percentiles using the standard normal distribution.

In this post, I cover all these uses for z-scores, explain how to use z-tables and z-score calculators, and show you how to do it all in Excel.

How to Find a Z-score

To calculate z-scores, take the raw measurements, subtract the mean, and divide by the standard deviation.

The formula for finding z-scores is the following:

Z = \frac{X - \mu}{\sigma}

X represents the data point of interest. Mu and sigma represent the mean and standard deviation for the population from which you drew your sample. Alternatively, use the sample mean and standard deviation when you do not know the population values.
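The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `z_scores` and the three weights are my own assumptions, and the function falls back to the sample mean and sample standard deviation only when you don't supply population values.

```python
from statistics import mean, stdev

def z_scores(values, mu=None, sigma=None):
    """Return the z-score of each value.

    If mu or sigma is omitted, fall back to the sample mean and the
    sample standard deviation (n - 1 denominator) of `values`.
    """
    if mu is None:
        mu = mean(values)
    if sigma is None:
        sigma = stdev(values)
    # Subtract the mean, then divide by the standard deviation.
    return [(x - mu) / sigma for x in values]

# Hypothetical weights in grams, for illustration only.
weights = [90, 100, 110]

print(z_scores(weights))                    # uses sample estimates
print(z_scores(weights, mu=100, sigma=15))  # uses supplied population values
```

Notice that the same raw values produce different z-scores depending on which mean and standard deviation you use, so it pays to be explicit about the choice.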

Z-scores retain the shape of the original data’s distribution. Consequently, when the original data follow a normal distribution, so do the corresponding z-scores. Specifically, they follow the standard normal distribution, which has a mean of 0 and a standard deviation of 1. However, skewed data will produce z-scores that are similarly skewed.

In this post, I include graphs of z-scores using the standard normal distribution because they bring the concepts to life. Additionally, z-scores are most valuable when your data are normally distributed. However, be aware that when your data are nonnormal, the z-scores are also nonnormal, and the interpretations might not be valid.

Learn how Z-scores are an integral part of hypothesis testing with Z Tests!

Related posts: The Mean in Statistics and Standard Deviation

Using Z-scores to Understand How an Observation Fits into a Distribution

Z-scores help you understand where a specific observation falls within a distribution. Sometimes the raw test scores are not informative. For instance, SAT, ACT, and GRE scores do not have real-world interpretations on their own. An SAT score of 1340 is not fundamentally meaningful. Many psychological metrics are simply sums or averages of responses to a survey. For these cases, you need to know how an individual score compares to the entire distribution of scores. For example, if your standard score for any of these tests is a +2, that’s far above the mean. Now that’s helpful!

In other cases, the measurement units are meaningful, but you want to see the relative standing. For example, if a baby weighs five kilograms, you might wonder how her weight compares to others. For a one-month-old baby girl, that equates to a z-score of 0.74. She weighs more than average, but not by a full standard deviation. Now you understand where she fits in with her cohort!

In all these cases, you’re using standard scores to compare an observation to the average. You’re placing that value within an entire distribution.

When your data are normally distributed, you can graph z-scores on the standard normal distribution, which is a particular form of the normal distribution. The mean occurs at the peak with a z-score of zero. Above average z-scores are on the right half of the distribution and below average values are on the left. The graph below shows where the baby’s z-score of 0.74 fits in the population.

Image of the standard normal distribution.

Analysts often convert standard scores to percentiles, which I cover later in this post.

Related post: Understanding the Normal Distribution

Using Standard Scores to Compare Different Types of Variables

Z-scores allow you to take data points drawn from populations with different means and standard deviations and place them on a common scale. This standard scale lets you compare observations for different types of variables that would otherwise be difficult. That’s why z-scores are also known as standard scores, and the process of transforming raw data to z-scores is called standardization. It lets you compare data points across variables that have different distributions.

In other words, you can compare apples to oranges. Isn’t statistics grand!

Imagine we literally need to compare apples to oranges. Specifically, we’ll compare their weights. We have a 110-gram apple and a 100-gram orange.

By comparing the raw values, it’s easy to see the apple weighs slightly more than the orange. However, let’s compare their z-scores. To do this, we need to know the means and standard deviations for the populations of apples and oranges. Assume that apples and oranges follow a normal distribution with the following properties:

                      Apples    Oranges
Mean weight (grams)   100       140
Standard deviation    15        25

Let’s calculate the Z-scores for our apple and orange!

Apple = (110-100) / 15 = 0.667

Orange = (100-140) / 25 = -1.6

The apple’s positive z-score (0.667) signifies that it is heavier than the average apple. It’s not an extreme value, but it is above the mean. Conversely, the orange has a markedly negative Z-score (-1.6). It’s well below the mean weight for oranges. I’ve positioned these standard scores in the standard normal distribution below.

Graph of a standard normal distribution that compares apples to oranges using a Z-score.

Our apple is a bit heavier than average, while the orange is puny! Using z-scores, we learned where each fruit falls within its distribution and how they compare.
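If you'd like to reproduce the apples-to-oranges comparison in code, here is a small Python sketch. The weights, means, and standard deviations come from the example above; the dictionary layout and variable names are simply my own choices for illustration.

```python
# Population parameters and observed weights from the example (grams).
fruits = {
    "apple": {"weight": 110, "mean": 100, "sd": 15},
    "orange": {"weight": 100, "mean": 140, "sd": 25},
}

# Standardize each fruit against its own population.
z = {name: (f["weight"] - f["mean"]) / f["sd"] for name, f in fruits.items()}

for name, score in z.items():
    side = "above" if score > 0 else "below"
    print(f"{name}: z = {score:.3f} ({side} its population mean)")

# The fruit with the larger z-score is heavier relative to its own kind.
relatively_heavier = max(z, key=z.get)
print(relatively_heavier)  # apple
```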

Using Z-scores to Detect Outliers

Z-scores can quantify the unusualness of an observation. Raw data values that are far from the average are unusual and potential outliers. Consequently, we’re looking for high absolute z-scores.

The standard cutoff values for finding outliers are z-scores of +/-3 or more extreme. The standard normal distribution plot below displays the distribution of z-scores. Z-scores beyond the cutoff are so unusual you can hardly see the shading under the curve.

Distribution of Z-scores for finding outliers.

In populations that follow a normal distribution, Z-score values outside +/- 3 have a probability of 0.0027 (2 * 0.00135), approximately 1 in 370 observations. However, if your data don’t follow a normal distribution, this approach might not be correct.
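You can check those tail probabilities with Python's standard library. This is only a sketch that assumes the data are normally distributed; `NormalDist` is part of the `statistics` module in Python 3.8 and later, and the variable names are mine.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

one_tail = std_normal.cdf(-3)  # area below z = -3
both_tails = 2 * one_tail      # by symmetry, the area beyond +3 is the same

print(round(one_tail, 5))      # 0.00135
print(round(both_tails, 4))    # 0.0027
print(round(1 / both_tails))   # 370, i.e. about 1 in 370 observations
```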

For the example dataset, I display the raw data points and their z-scores. I circled an observation that is a potential outlier.

Datasheet that displays Z-scores to identify outliers.

Caution: Z-scores can be misleading in small datasets because, when you calculate them using the sample mean and sample standard deviation, the maximum possible z-score is limited to (n−1) / √n.

Samples with ten or fewer data points cannot have Z-scores that exceed the cutoff value of +/-3.
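To see that limit in action, the sketch below builds the most lopsided sample I can think of (every value identical except one extreme point) and compares its largest z-score to (n−1) / √n. The helper names and the value 1000 are hypothetical, and the z-scores use the sample standard deviation.

```python
from math import sqrt
from statistics import mean, stdev

def max_possible_z(n):
    """Upper limit on |z| in a sample of size n, using the sample SD."""
    return (n - 1) / sqrt(n)

def largest_z(values):
    """Largest absolute z-score actually observed in the sample."""
    mu, sigma = mean(values), stdev(values)
    return max(abs(x - mu) / sigma for x in values)

for n in (5, 10, 11):
    # n - 1 identical values plus one extreme point.
    sample = [0] * (n - 1) + [1000]
    print(n, round(max_possible_z(n), 3), round(largest_z(sample), 3))
```

With n = 10 the limit is about 2.846, so even this extreme point cannot reach the cutoff of 3. At n = 11 the limit rises to about 3.015.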

Additionally, an outlier’s presence throws off the z-scores because it inflates the mean and standard deviation. Notice how all z-scores are negative except the outlier’s value. If we calculated Z-scores without the outlier, they’d be different! If your dataset contains outliers, z-values appear to be less extreme (i.e., closer to zero).
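Here is a rough Python illustration of that masking effect. The six data values are made up for this sketch and are not the dataset shown in the datasheet above.

```python
from statistics import mean, stdev

def z_scores(values):
    """Z-scores based on the sample mean and sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical data: five typical values plus one outlier (50).
with_outlier = [10, 11, 12, 13, 14, 50]
without_outlier = with_outlier[:-1]

z_with = z_scores(with_outlier)
z_without = z_scores(without_outlier)

print([round(z, 2) for z in z_with])     # only the outlier is positive
print([round(z, 2) for z in z_without])  # the same points, outlier removed
```

In this made-up sample, the value 14 sits well above the mean once the outlier is removed, yet it has a small negative z-score when the outlier is included.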

Related post: Five Ways to Find Outliers

Using Z-tables to Calculate Probabilities and Percentiles

The standard normal distribution is a probability distribution. Consequently, if you have only the mean and standard deviation, and you can reasonably assume your data follow the normal distribution (at least approximately), you can easily use z-scores to calculate probabilities and percentiles. Typically, you’ll use online calculators, Excel, or statistical software for these calculations. We’ll get to that.

But first I’ll show you the old-fashioned way of doing that by hand using z-tables.

Let’s go back to the z-score for our apple (0.667) from before. We’ll use it to calculate its weight percentile. A percentile is the proportion of a population that falls below a value. Consequently, we need to find the area under the standard normal distribution curve corresponding to the range of z-scores less than 0.667. In the portion of the z-table below, I’ll use the standard score that is closest to our apple, which is 0.65.

Photograph shows a portion of a table of standard scores (Z-scores).

Click here for a full Z-table and illustrated instructions for using it!

Related post: Understanding Probability Distributions and Probability Fundamentals

The Nuts and Bolts of Using Z-tables

Using these tables to calculate probabilities requires that you understand the properties of the normal distribution. While the tables provide an answer, it might not be the answer you need. However, by applying your knowledge of the normal distribution, you can find your answer!

For example, the table indicates that the area under the curve between -0.65 and +0.65 is 48.43%. Unfortunately, that’s not what we want to know. We need to find the area that is less than a z-score of 0.65.

We know that the two halves of the normal distribution are symmetrical, which helps us solve our problem. The z-table tells us that the area for the range from -0.65 to +0.65 is 48.43%. Because of the symmetry, the interval from 0 to +0.65 must be half of that: 48.43/2 = 24.215%. Additionally, the area for all scores less than zero is half (50%) of the distribution.

Therefore, the area for all z-scores up to 0.65 = 50% + 24.215% = 74.215%

That’s how you convert standard scores to percentiles. Our apple is at approximately the 74th percentile.
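The symmetry argument above can be mirrored in Python. In this sketch I let the standard library's `NormalDist` stand in for the z-table; the variable names are my own.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
z = 0.65

# Area between -z and +z, which is what the z-table in this post reports.
central_area = std_normal.cdf(z) - std_normal.cdf(-z)

# Half of the distribution lies below zero; add half of the central area.
percentile = 0.5 + central_area / 2

print(round(central_area, 4))  # 0.4843
print(round(percentile, 4))    # 0.7422
```

The result agrees with asking for the cumulative probability directly, `std_normal.cdf(0.65)`, which is what software does behind the scenes.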

If you want to calculate the probability for values falling between ranges of standard scores, calculate the percentile for each z-score and then subtract them.

For example, the probability of a z-score between 0.40 and 0.65 equals the difference between the percentiles for z = 0.65 and z = 0.40. We calculated the percentile for z = 0.65 above (74.215%). Using the same method, the percentile for z = 0.40 is 65.540%. Now we subtract the percentiles.

74.215% – 65.540% = 8.675%

The probability of an observation having a z-score between 0.40 and 0.65 is 8.675%.
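The same subtraction takes only a couple of lines in Python. This is a sketch using the standard library rather than a z-table, so the result differs very slightly from the hand calculation because the table values are rounded.

```python
from statistics import NormalDist

std_normal = NormalDist()

lower, upper = 0.40, 0.65

# Probability of falling between two z-scores: subtract the cumulative
# probability at the lower z-score from the one at the upper z-score.
prob_between = std_normal.cdf(upper) - std_normal.cdf(lower)

print(round(std_normal.cdf(lower), 4))  # 0.6554
print(round(prob_between, 4))           # 0.0867
```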

Using only simple math and a z-table, you can easily find the probabilities that you need!

Alternatively, use the Empirical Rule to find probabilities for values in a normal distribution using ranges based on standard deviations.

Related post: Percentiles: Interpretations and Calculations

Using Z-score Calculators

In this day and age, you’ll probably use software and online z-score calculators for these probability calculations. Statistical software produced the probability distribution plot below. It displays the apple’s percentile with a graphical representation of the area under the standard normal distribution curve. Graphing is a great way to get an intuitive feel for what you’re calculating using standard scores.

The percentile is a tad different because we used the z-score of 0.65 in the table while the software uses the more precise value of 0.667.

A probability distribution plot that graphically displays a percentile using a Z-score.

Alternatively, you can enter z-scores into calculators, like this one.

If you enter the z-score value of 0.667, the left-tail p-value matches the shaded region in the probability plot above (0.7476). The right-tail value (0.2524) equals all values above our z-score, which is equivalent to the unshaded region in the graph. Unsurprisingly, those values add to 1 because you’re covering the entire distribution.

How to Find Z-scores in Excel

You can calculate z-scores and their probabilities in Excel. Let’s work through an example. We’ll return to our apple example and start by calculating standard scores for values in a dataset. I have all the data and formulas in this Excel file: Z-scores.

To find z-scores using Excel, you’ll need to either calculate the sample mean and standard deviation or use population reference values. In this example, I use the sample estimates. If you need to use population values supplied to you, enter them into the spreadsheet rather than calculating them.

My apple weight data are in cells A2:A21.

To calculate the mean and standard deviation, I use the following Excel functions:

  • Mean: =AVERAGE(A2:A21)
  • Standard deviation (sample): =STDEV.S(A2:A21)

Then, in column B, I use the following Excel formula to calculate the z-scores:

=(A2-A$24)/A$26

Cell A24 is where I have the mean, and A26 has the standard deviation. This formula takes a data value in column A, subtracts the mean, and then divides by the standard deviation.

I copied that formula for all rows from B2:B21 and Excel displays z-scores for all data points.
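For readers who prefer code to spreadsheets, here is a rough Python equivalent of those Excel steps. The ten weights are hypothetical stand-ins for the values in A2:A21, which aren't reproduced in this post. `statistics.stdev` uses the n − 1 denominator, like STDEV.S.

```python
from statistics import mean, stdev

# Hypothetical apple weights in grams, standing in for cells A2:A21.
weights = [92, 105, 110, 98, 101, 120, 87, 99, 104, 113]

sample_mean = mean(weights)  # like =AVERAGE(A2:A21)
sample_sd = stdev(weights)   # like =STDEV.S(A2:A21), n - 1 denominator

# One z-score per data value, like copying the column B formula down the rows.
z_column = [(w - sample_mean) / sample_sd for w in weights]

for w, z in zip(weights, z_column):
    print(f"{w:>4} {z:>7.3f}")
```

A quick sanity check: z-scores computed from a sample's own mean and standard deviation always average 0 and have a standard deviation of 1.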

Using Excel to Calculate Probabilities for Standard Scores

Next, I use Excel’s NORM.S.DIST function to calculate the probabilities associated with z-scores. I work with the standard score from our apple example, 0.667.

The NORM.S.DIST(Z, Cumulative) function provides either the cumulative distribution function (TRUE) or the probability density function (FALSE) for the z-score you specify. The probability density is the height value in the z-table earlier in this post, and it corresponds to the y-axis value on a probability distribution plot for the z-score. We’ll use the cumulative function, which calculates the cumulative probability for all z-scores less than the value we specify.

In the function, we need to specify the z-value (0.667) and use the TRUE parameter to obtain the cumulative probability.

I’ll enter the following:

= NORM.S.DIST(0.667,TRUE)

Excel displays 0.747613933, matching the output in the probability distribution plot above.

If you want to find the probability for values greater than the z-score, remember that the values above and below it must sum to 1. Therefore, subtract from 1 to calculate probabilities for larger values:

= 1 - NORM.S.DIST(0.667,TRUE)

Excel displays 0.252386067.
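If you're working outside Excel, Python's standard library offers a close counterpart. In this sketch, `cdf` plays the role of the TRUE option and `pdf` plays the role of the FALSE option (the height of the curve); the variable names are my own.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
z = 0.667

left_tail = std_normal.cdf(z)    # like NORM.S.DIST(0.667, TRUE)
right_tail = 1 - left_tail       # like 1 - NORM.S.DIST(0.667, TRUE)
curve_height = std_normal.pdf(z) # like NORM.S.DIST(0.667, FALSE)

print(round(left_tail, 4))   # 0.7476
print(round(right_tail, 4))  # 0.2524
print(curve_height)          # height of the curve at z, not a probability
```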

Here’s what my spreadsheet looks like.

Excel spreadsheet that calculates z-scores and uses them to find probabilities.

Filed Under: Basics Tagged With: conceptual, distributions, Excel, probability

Reader Interactions

Comments

  1. Ella says

    November 4, 2022 at 12:21 am

    Thanks for your work; your material makes it so easy to understand. I recommend this book: Regression Analysis: An Intuitive Guide. It’s really helping.

    • Jim Frost says

      November 4, 2022 at 12:50 am

      Hi Ella, you’re very welcome and I’m so glad to hear that you’ve found it to be helpful!

  2. Hussain says

    June 2, 2022 at 12:05 am

    Hi Jim. Can we use standardized factor scores (in principal component analysis) for regression analysis? If so, how should we interpret the results of regression analysis as standardized scores have mean = zero and SD = 1?
    Thanks,

    • Jim Frost says

      June 2, 2022 at 10:59 pm

      Hi, I think you might be mixing several things together.

      Yes, you can use standardized scores as independent variables in regression. Click the link to learn more about that, including how you interpret the results. However, that is different from principal components.

      But you can also perform regression on principal components in a process known as partial least squares regression. Again, that’s a different procedure than using standardized scores. It’s a useful process when you have highly correlated predictors and/or more predictors than observations.

  3. Vincent says

    May 12, 2022 at 4:19 pm

    How to get (n−1) / √ n in the statement below? “Caution: Z-scores can be misleading in small datasets because the maximum z-score is limited to (n−1) / √ n.”

    • Jim Frost says

      May 12, 2022 at 5:11 pm

      Hi Vincent,

      In the outliers section of this post, click the link to the Five Ways to Detect Outliers post. In it, I provide a reference for that.

  4. Roberto says

    April 2, 2022 at 12:57 pm

    Thanks! Really helpful article. Keep up the good work.

  5. Peter says

    September 13, 2021 at 7:39 am

    Nice one

  6. Gomathi P says

    September 13, 2021 at 1:55 am

    Fruitful content!


    Copyright © 2023 · Jim Frost · Privacy Policy