Statistics By Jim

Making statistics intuitive

Assessing Normality: Histograms vs. Normal Probability Plots

By Jim Frost 5 Comments

Because histograms display the shape and spread of distributions, you might think they’re the best type of graph for determining whether your data are normally distributed. However, I’ll show you how histograms can trick you! Normal probability plots are a better choice for this task, and they are easy to use. A normal probability plot is a type of quantile-quantile plot, or Q-Q plot for short!

Using Histograms to Graph Normal Distributions

First, let’s look at what you expect to see on a histogram when your data follow a normal distribution.

Histogram that displays data that follow the bell-shaped curve of the normal distribution.

I’ve added the fitted distribution, and it sure seems to fit the data well.

So, what’s wrong with using a histogram to assess normality? Histograms are particularly problematic when you have a small sample size because their appearance depends on both the number of data points and the number of bars. When you have fewer than approximately 20 data points, the bars on the histogram don’t adequately display the distribution.

The histogram above uses 100 data points. However, the histograms below use datasets with only 15 observations in each. Can you tell which datasets follow the normal distribution? For comparison, I’ve included the normal distribution curve that provides the best fit for each dataset. Download the CSV dataset to check them yourself: normal_data_examples. The Cs in the graphs below correspond to the columns in the worksheet.

Six histograms, one for each 15-observation dataset, that all appear to display nonnormal data.

Surprise! All of these datasets follow the normal distribution, but you can’t tell that from the histograms.
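To see why small samples are so hard to read on a histogram, here is a minimal Python sketch. It is not the analysis behind the graphs above: the helper `histogram_counts`, the seed, and the mean and standard deviation are all assumptions made for illustration. It draws 15 values from a normal distribution and bins the same values three different ways.

```python
import random

def histogram_counts(data, n_bins):
    """Count observations in n_bins equal-width bins spanning the data range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Assumed settings: fixed seed for repeatability, mean 50, standard deviation 10.
rng = random.Random(1)
sample = [rng.gauss(50, 10) for _ in range(15)]  # 15 draws from a normal distribution

# The same 15 values, binned three different ways.
for n_bins in (4, 6, 8):
    counts = histogram_counts(sample, n_bins)
    bars = " ".join("#" * c or "." for c in counts)
    print(f"{n_bins} bins: {bars}")
```

Each printed row is a sideways histogram of the same data. With so few observations, the apparent shape can change noticeably as the number of bars changes, even though the underlying distribution is normal.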

Related posts: Understanding the Normal Distribution and Using Histograms to Understand Your Data

Using Normal Probability Q-Q Plots to Graph Normal Distributions

Instead, graph these distributions using normal probability Q-Q plots, which are also known as normal plots. These plots are simple to use. All you need to do is visually assess whether the data points follow the straight line. If the points track the straight line, your data follow the normal distribution. It’s very straightforward!

I’ll graph the same datasets shown in the histograms above but use normal probability plots instead. For this type of graph, the best approach is the “fat pencil test.” If you place an imaginary fat pencil over the straight distribution fit line, does it cover the data points? If so, your data are normally distributed. In other words, the data points don’t have to fall right on the line but generally need to follow it.

Six normal probability plots, one for each dataset, that all display data that are normally distributed.

These normal probability Q-Q plots show that all the datasets follow the normal distribution. This type of graph is also a great way to determine whether residuals from regression analysis are normally distributed.

The graph below shows how nonnormal data can appear in a normal plot. Notice the systematic departures from the straight line.

Normal plot that displays nonnormal data.

Here are a few technical notes on how statistical software creates Q-Q plots. Your software estimates each observation’s cumulative probability from its rank in the sorted data and then plots each observation’s value against that estimated cumulative probability. The graph rescales the probability axis so that the fitted distribution line is straight. If your data follow the distribution, they will follow that line.
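The technical notes above can be sketched in a few lines of Python using only the standard library. Treat this as an illustration under stated assumptions, not as what any particular package does. The plotting position `(i - 0.5) / n` is one common convention and software packages differ on it. The helper names and both small datasets are made up for this example, and the correlation is only a rough numeric stand-in for the fat pencil test.

```python
from statistics import NormalDist, mean

def normal_qq_points(data):
    """Return (theoretical z-score, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        # Estimated cumulative probability of the i-th smallest observation.
        # (i - 0.5) / n is an assumed convention; packages use various formulas.
        p = (i - 0.5) / n
        points.append((std_normal.inv_cdf(p), value))
    return points

def line_correlation(points):
    """Pearson correlation of the plotted points; 1.0 is a perfectly straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical, made-up measurements (not the datasets from this post).
roughly_symmetric = [42, 45, 47, 48, 49, 50, 50, 51, 52, 53, 55, 58]
right_skewed = [1, 1, 2, 2, 3, 3, 4, 5, 7, 10, 16, 30]

for name, data in [("symmetric", roughly_symmetric), ("skewed", right_skewed)]:
    pts = normal_qq_points(data)
    print(name, round(line_correlation(pts), 3))
```

Plotting the returned pairs with the z-scores on one axis and the observed values on the other gives the Q-Q plot. The skewed data produce a visibly curved pattern and a lower correlation than the symmetric data.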

Normal Probability Q-Q Plots Can Be Better Than Normality Tests

You can also use normality tests to determine whether your data follow a normal distribution. However, be aware that normality tests are like all other hypothesis tests. As you increase the sample size, their ability to detect small differences increases. With a large enough sample size, these tests can detect minuscule departures from the normal distribution that are meaningless. In this scenario, the test can reject normality even when the data follow the normal distribution closely enough for all practical purposes.

For example, the normal probability Q-Q plot below displays a dataset with 5000 observations along with the normality test results. The p-value for the test is 0.010, which indicates that the data do not follow the normal distribution. However, the points on the graph clearly follow the distribution fit line. These data follow the normal distribution despite the test results. This is a rare case where statisticians will say you can use the graph over the hypothesis test!

Normal probability plot that displays a dataset with 5000 observations that follow the normal distribution.
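The sample size effect can be illustrated with a small simulation. This sketch does not reproduce the graph above, and the post does not say which normality test produced its p-value. I am assuming the Jarque-Bera test here only because its large-sample p-value is easy to compute with the standard library. The helper names, the seed, and the size of the "bend" are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist

def jarque_bera_p(data):
    """Approximate p-value of the Jarque-Bera normality test (large-sample formula)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Under normality, JB is approximately chi-square with 2 degrees of freedom,
    # whose upper-tail probability is exp(-jb / 2). The approximation is rough
    # for small samples.
    return math.exp(-jb / 2)

def qq_correlation(data):
    """Correlation between sorted data and standard normal quantiles."""
    ys = sorted(data)
    n = len(ys)
    std_normal = NormalDist()
    xs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slightly_skewed_sample(n, rng, bend=0.02):
    """Normal draws with a small quadratic bend: a mild, hypothetical departure."""
    return [z + bend * z * z for z in (rng.gauss(0, 1) for _ in range(n))]

rng = random.Random(7)  # assumed seed, for repeatability only
for n in (50, 50_000):
    data = slightly_skewed_sample(n, rng)
    print(f"n={n}: p-value={jarque_bera_p(data):.4g}, "
          f"Q-Q correlation={qq_correlation(data):.4f}")
```

With tens of thousands of observations, the test has enough power to flag even this mild bend, while the Q-Q correlation stays very close to 1, meaning the points would still hug the line on the graph.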

In this post, I’ve highlighted using normal probability Q-Q plots with small and large datasets. However, I prefer using them over histograms for datasets of all sizes. For my eyes at least, it is just easier to determine whether the data points follow a straight line than to compare bars on a histogram to a bell-shaped curve.

This post has been about using Q-Q plots to assess normality. However, you can use these plots to evaluate other distributions. To learn more about this, read my post: How to Identify the Distribution of Your Data.

Filed Under: Basics Tagged With: distributions, graphs

Reader Interactions

Comments

  1. Jereesh K Elias says

    February 8, 2022 at 1:35 am

    I have a doubt regarding choosing a parametric and non-parametric test based on normality of data. I have been taught that if the variables are following a non-normal distribution, we should go for non-parametric test. My doubt is, in case our independent variable is normally distributed and dependent variable is non-normally distributed, broadly which test should we use? parametric or non-parametric?

  2. Madhu says

    January 18, 2021 at 2:47 pm

    Great post!! Hope you continue the great work!

  3. Suruchi says

    December 1, 2020 at 10:49 am

    Hello Jim
    Your posts are very helpful to me.
    Please guide me about smooth frequency, how to perform it.
    Thanks

  4. Anurag Chakraborty says

    June 29, 2019 at 7:29 pm

    How do I construct a normal probability plot ?

  5. Cédric ntata says

    September 10, 2018 at 11:22 am

    I am very happy to learn a little more to add to my statistics knowledge.



    Copyright © 2022 · Jim Frost · Privacy Policy