Statistics By Jim

Making statistics intuitive

Assessing Normality: Histograms vs. Normal Probability Plots

By Jim Frost 7 Comments

Because histograms display the shape and spread of distributions, you might think they’re the best type of graph for determining whether your data are normally distributed. However, I’ll show you how histograms can trick you! Normal probability plots are a better choice for this task and they are easy to use. Normal probability plots are also known as quantile-quantile plots, or Q-Q Plots for short!

Using Histograms to Graph Normal Distributions

First, let’s look at what you expect to see on a histogram when your data follow a normal distribution.

Histogram that displays data that follow the bell-shaped curve of the normal distribution.

I’ve added the fitted distribution, and it sure seems to fit the data well.

So, what’s wrong with using a histogram to assess normality? Histograms are particularly problematic when you have a small sample size because their appearance depends on the number of data points and the number of bars. When you have fewer than approximately 20 data points, the bars on the histogram don’t adequately display the distribution.

The histogram above uses 100 data points. However, the histograms below use datasets with only 15 observations in each. Can you tell which datasets follow the normal distribution? For comparison, I’ve included the normal distribution curve that provides the best fit for each dataset. Download the CSV dataset to check them yourself: normal_data_examples. The Cs in the graphs below correspond to the columns in the worksheet.

Histogram that displays data that appears to be nonnormal.
Histogram that appears to display nonnormal data.
Histogram that appears to display nonnormal data.
Histogram that appears to display nonnormal data.
Histogram that appears to display nonnormal data.
Histogram that appears to display nonnormal data.

Surprise! All of these datasets follow the normal distribution, but you can’t tell that from the histograms.
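
To see how unstable small-sample histograms are, here is a minimal sketch that draws 15 observations from a normal distribution and bins them several ways. The mean of 50, standard deviation of 10, the seed, and the bin counts are arbitrary assumptions for illustration, not the values behind the graphs above, and `bin_counts` is a hypothetical helper.

```python
import random


def bin_counts(values, n_bins):
    """Count observations in equal-width bins spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


rng = random.Random(1)  # arbitrary seed, used only for reproducibility
for dataset in range(1, 4):
    # Every dataset really is normal (assumed mean 50, SD 10).
    sample = [rng.gauss(50, 10) for _ in range(15)]
    for n_bins in (4, 6, 8):
        print(f"dataset {dataset}, {n_bins} bars: {bin_counts(sample, n_bins)}")
```

Running it a few times with different seeds typically shows lopsided, gappy, or multi-peaked bar counts even though every sample comes from a normal distribution.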

Related posts: Understanding the Normal Distribution and Using Histograms to Understand Your Data

Using Normal Probability Q-Q Plots to Graph Normal Distributions

Instead, graph these distributions using normal probability Q-Q plots, which are also known as normal plots. These plots are simple to use. All you need to do is visually assess whether the data points follow the straight line. If the points track the straight line, your data follow the normal distribution. It’s very straightforward!

I’ll graph the same datasets in the histograms above but use normal probability plots instead. For this type of graph, the best approach is the “fat pencil test.” If you place an imaginary fat pencil over the straight distribution fit line, does it cover the data points? If so, your data are normally distributed. In other words, the data points don’t have to fall right on the line but generally need to follow it.

Normal probability plot that displays data that are normally distributed.
Normal probability plot that displays data that are normally distributed.
Normal probability plot that displays data that are normally distributed.
Normal probability plot that displays data that are normally distributed.
Normal probability plot that displays data that are normally distributed.
Normal probability plot that displays data that are normally distributed.

These normal probability Q-Q plots show that all the datasets follow the normal distribution. This type of graph is also a great way to determine whether residuals from regression analysis are normally distributed.

The graph below shows how nonnormal data can appear in a normal plot. Notice the systematic departures from the straight line.

Normal plot that displays nonnormal data.

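Here are a few technical notes on how statistical software creates Q-Q plots. Your software sorts the observations, estimates each one’s cumulative probability from its rank (the empirical cumulative distribution function), and then displays each observation’s value by its estimated cumulative probability. The graph transforms the axes so that the distribution line is straight. For the normal distribution, that means rescaling the probability axis with the inverse of the normal cumulative distribution function. If your data follow the distribution, the points will follow that line.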
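
The construction described in these notes can be sketched in a few lines. This is a rough illustration, not how any particular statistics package does it: the plotting position `(i - 0.5) / n` is one common convention among several, and `normal_qq_points` and `qq_correlation` are hypothetical helper names. The correlation between the two axes is a simple numeric stand-in for eyeballing how closely the points track a straight line.

```python
import random
from statistics import NormalDist, mean


def normal_qq_points(values):
    """Pair each sorted observation with a standard-normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Estimated cumulative probability of the i-th smallest value
    # (assumed plotting position: (i - 0.5) / n).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(values):
    """Pearson correlation between theoretical quantiles and sorted data."""
    points = normal_qq_points(values)
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # arbitrary seed
normal_sample = [rng.gauss(50, 10) for _ in range(15)]
skewed_sample = [rng.expovariate(1 / 10) for _ in range(15)]
print("normal sample:", round(qq_correlation(normal_sample), 3))
print("skewed sample:", round(qq_correlation(skewed_sample), 3))
```

Plotting the pairs from `normal_qq_points` with the theoretical quantile on one axis and the observed value on the other gives the Q-Q plot. Points from normal data should fall near a straight line, which shows up as a correlation close to 1.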

Normal Probability Q-Q Plots can be Better Than Normality Tests

You can also use normality tests to determine whether your data follow a normal distribution. However, be aware that normality tests are like all other hypothesis tests. As you increase the sample size, their ability to detect small differences increases. With a large enough sample size, these tests can detect minuscule departures from the normal distribution that have no practical importance. In this scenario, you can end up with a test that rejects the notion that the data are normally distributed even when they follow the normal distribution closely enough for any practical purpose.

For example, the normal probability Q-Q plot below displays a dataset with 5000 observations along with the normality test results. The p-value for the test is 0.010, which indicates that the data do not follow the normal distribution. However, the points on the graph clearly follow the distribution fit line. These data follow the normal distribution despite the test results. This is a rare case where statisticians will say you can use the graph over the hypothesis test!

Normal probability plot that displays a dataset with 5000 observations that follow the normal distribution.
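
The same effect can be reproduced with simulated data. The sketch below is built on assumptions: the post does not say which normality test produced the p-value above, so the Jarque-Bera test is swapped in purely because its large-sample p-value has a simple closed form, and the slight skew (the `0.04` coefficient) is an invented departure from normality, not the dataset in the graph.

```python
import math
import random
from statistics import NormalDist


def jarque_bera(values):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # The survival function of a chi-square with 2 df is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def qq_correlation(values):
    """Correlation between sorted data and standard-normal quantiles."""
    ys = sorted(values)
    n = len(ys)
    xs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)  # arbitrary seed
# Normal data with a small, assumed skew added: z + 0.04 * (z^2 - 1).
sample = []
for _ in range(5000):
    z = rng.gauss(0, 1)
    sample.append(z + 0.04 * (z * z - 1))

jb_stat, p_value = jarque_bera(sample)
r = qq_correlation(sample)
print(f"Jarque-Bera p-value: {p_value:.4g}")
print(f"Q-Q correlation:     {r:.4f}")
```

With 5000 observations, the test flags the small skew as statistically significant, while the Q-Q correlation stays very close to 1, which is the numeric counterpart of points that hug the fit line.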

In this post, I’ve highlighted using normal probability Q-Q plots with small and large datasets. However, I prefer using them over histograms for datasets of all sizes. For my eyes at least, it is just easier to determine whether the data points follow a straight line than to compare bars on a histogram to a bell-shaped curve.

This post has been about using Q-Q plots to assess normality. However, you can use these plots to evaluate other distributions. To learn more about this, read my post: How to Identify the Distribution of Your Data.

Filed Under: Basics Tagged With: distributions, graphs

Reader Interactions

Comments

  1. Thomas says

    December 4, 2022 at 10:12 am

    Dear Jim,

    Thanks for very useful information on your home page!
    Some questions below if you have the time.
    If you sample from a distribution that is known to be normal (from the literature or from previous experience with larger sample sizes), but by chance your sample is not normal and the Grubbs test finds no outliers, can you still apply the standard statistics?
    Another topic: percentage data are often used, e.g., in analytical chemistry (% main peak), where you have a closed scale of 0-100%. Typically, the values fall between 80% and 100%. Is it appropriate to use the normal distribution?
    A further topic is the pH scale. If you work in a narrow range, let’s say pH 7.0 to 8.0, can the data be regarded as coming from a normal distribution?

    • Jim Frost says

      December 4, 2022 at 10:54 pm

      Hi Thomas,

      If your sample is non-normal but you know for a fact the population is normal, I’d give a very cautious OK for proceeding, particularly if you have a sample size of at least 30, because normality isn’t crucial for larger samples anyway. However, there are some major caveats to consider.

      If you know that your population follows a normal distribution, but your sample does not, particularly if your sample is strongly non-normal, then you know that your sample does not represent the population in at least some characteristics. That should give you pause if you’re using your sample to draw conclusions about the population. Is there some reason why the sample doesn’t look like the population? It could be random sampling error that occurred by chance. Or perhaps there was some error with your sampling, experimental, and/or measurement process?

      So, theoretically you might be ok proceeding, but you really should understand why your sample doesn’t look like the population. That’s a red warning flag that something might be amiss. It depends how different the sample looks from the population. If it’s only slightly non-normal, it might not be a big deal. But if it’s strongly non-normal and it should be normal, it becomes a bigger concern.

      As for working in narrow ranges, you’ll need to understand empirically what the data look like in those ranges. Technically, the normal distribution has no upper and lower limits. So, if your data have limits, there’s at least a small degree of non-normality right there. But several conditions can make it non-normal enough to be a problem. If those ranges are artificially constrained (e.g., you remove all values outside the range), chances are the data don’t follow a normal distribution. The more constrained they are, the more of a problem it becomes. Additionally, if the mean is closer to one end of the range than the other, the distribution likely skews away from that end, toward the other. For example, if the mean is near 95%, the data are probably left-skewed.
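
      As a rough illustration of that last point, the sketch below simulates percentage data with a mean near 95% on a closed 0-100% scale and computes the sample skewness. The Beta(19, 1) distribution, the seed, and the sample size are assumptions chosen only because they give a mean of 95%; real data such as % main peak may behave differently, so check your own values.

```python
import random


def sample_skewness(values):
    """Moment-based skewness: negative means a longer left tail."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(3)  # arbitrary seed
# Assumed model: Beta(19, 1) has mean 19 / 20 = 0.95 and cannot exceed 1.
purity = [100 * rng.betavariate(19, 1) for _ in range(2000)]

print("mean (%):", round(sum(purity) / len(purity), 2))
print("skewness:", round(sample_skewness(purity), 2))
```

      The values pile up against the 100% ceiling and trail off toward lower percentages, so the skewness comes out negative (left-skewed).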

      You should examine your data to assess the distribution directly. If they’re not normally distributed, you can either use a nonparametric method or simply collect a large enough sample so the central limit theorem kicks in and normality isn’t an issue. If you haven’t already, you should read the following posts because they go more in-depth into the issues I talk about above.

      Identifying the Distribution of Your Data
      Nonparametric vs Parametric Tests
      Central Limit Theorem

      I hope that helps!

  2. Jereesh K Elias says

    February 8, 2022 at 1:35 am

    I have a doubt regarding choosing a parametric and non-parametric test based on normality of data. I have been taught that if the variables are following a non-normal distribution, we should go for non-parametric test. My doubt is, in case our independent variable is normally distributed and dependent variable is non-normally distributed, broadly which test should we use? parametric or non-parametric?

  3. Madhu says

    January 18, 2021 at 2:47 pm

    Great post!! Hope you continue the great work!

  4. Suruchi says

    December 1, 2020 at 10:49 am

    Hello Jim
    Your posts are very helpful to me.
    Please guide me about smooth frequency, how to perform it.
    Thanks

  5. Anurag Chakraborty says

    June 29, 2019 at 7:29 pm

    How do I construct a normal probability plot ?

  6. Cédric ntata says

    September 10, 2018 at 11:22 am

    I am very happy to learn a little more and add to my knowledge of statistics.

    Copyright © 2023 · Jim Frost · Privacy Policy