What is a QQ Plot?
A QQ plot, or Quantile-Quantile plot, is a visual tool that determines whether a sample:
- Was drawn from a population that follows a specific probability distribution, often a normal distribution.
- Follows the same distribution as another sample.
A QQ plot provides a powerful visual assessment, pinpointing deviations between distributions and identifying the data points responsible for them. When comparing a sample to a probability distribution, you’ll typically use this graph with a distribution test, such as a normality test, to verify statistical assumptions.
The most common use for a QQ plot is determining whether sample data follow a particular probability distribution. That distribution is frequently the normal distribution, and you’d use this plot with a normality test. However, it can use a different distribution, such as the lognormal, Weibull, or exponential distribution.
In this post, learn about QQ plots, how to interpret them, and the benefits they provide compared to using histograms and hypothesis tests to evaluate distributions.
Graphing Quantiles on a QQ Plot
Quantiles are like percentiles, indicating the percentage of values falling below them. For example, 30% of the data points fall below the 30th percentile, which is the same thing as the 0.30 quantile. The median is the 50th percentile (the 0.50 quantile), where half the data are below it. Learn more about Percentiles: Interpretations and Calculations.
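As a quick illustration of that idea, here is a minimal sketch using Python's standard `statistics` module. The dataset (the integers 1 to 100) is made up for the example, and other software may interpolate between data points slightly differently, so treat the exact cut points as approximate.

```python
import statistics

# Hypothetical data: the integers 1 through 100.
data = list(range(1, 101))

# n=10 splits the data into ten equal-probability groups and returns the
# nine cut points between them (the 10%, 20%, ..., 90% quantiles).
deciles = statistics.quantiles(data, n=10)
q30 = deciles[2]   # 30% quantile
q50 = deciles[4]   # 50% quantile, i.e. the median

# Share of observations that fall below each quantile.
share_below_q30 = sum(x < q30 for x in data) / len(data)
share_below_q50 = sum(x < q50 for x in data) / len(data)

print(f"30% quantile = {q30}, share below = {share_below_q30:.2f}")
print(f"50% quantile = {q50}, share below = {share_below_q50:.2f}")
```

The shares printed should match the quantile levels: 30% of the values sit below the 30% quantile and half sit below the median.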
A QQ plot compares the quantiles for two distributions. The distribution on the vertical axis (Y-axis) is your sample data. The nature of the horizontal axis (X-axis) depends on what you’re comparing your sample to. If you compare it to a probability distribution (e.g., Normal distribution), the X-axis reflects the theoretical quantiles for the probability distribution. Statisticians also refer to this type of QQ plot as a probability plot.
However, if you compare one sample to another, the X-axis displays quantiles for the second sample.
In either case, the X and Y quantiles are equal when the two distributions are the same. Because Y = X, the slope equals 1, and all the points fall on a 45-degree line. For example, when the value at the sample’s 30th percentile (Y) equals the value at the 30th percentile of the probability distribution (X), that data point falls right on the Y = X line. However, if the sample’s 30th percentile instead equals the distribution’s 50th percentile, its Y value is larger than its X value, and the point falls above the line.
Note that the axes scaling can change the angle of the line to something other than 45 degrees.
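To make the construction above concrete, here is a minimal sketch that computes the coordinates of a normal QQ plot with only Python's standard library. Several things here are my assumptions rather than a fixed standard: the function name `normal_qq_points` is illustrative, the sample is simulated, and the plotting position (i - 0.5) / n is just one common convention (statistical packages differ on this). The theoretical quantiles come from a normal distribution with the sample's own mean and standard deviation, so a perfect match would land on Y = X.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    ordered = sorted(sample)
    n = len(ordered)
    # Theoretical distribution: a normal with the sample's mean and SD,
    # so that a perfect match lands on the line Y = X.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n          # plotting position (one common convention)
        points.append((fitted.inv_cdf(p), value))
    return points

random.seed(1)  # fixed seed so the hypothetical sample is reproducible
sample = [random.gauss(50, 10) for _ in range(200)]
points = normal_qq_points(sample)

# Print one point from the lower tail, the middle, and the upper tail.
for x, y in (points[0], points[99], points[-1]):
    print(f"theoretical = {x:7.2f}   sample = {y:7.2f}")
```

Passing the X and Y values to any scatterplot tool and adding the line Y = X gives the graph described above. If you plot against standard Z-scores instead, the points still follow a straight line when the data are normal, but the line is no longer Y = X.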
For the remainder of this article, I only look at the normal probability plot form of the QQ plot because that is its most common usage.
How to Interpret a QQ Plot
Interpreting QQ plots is intuitive. When the dots generally follow the straight line Y = X, the sample distribution is similar to the theoretical one. The data points don’t have to fall right on the line. Instead, they only need to follow it in general, with random variability placing some points above the line and others below it.
I use the “fat pencil test.” Place an imaginary fat pencil over the straight line and see if it covers the points.
Conversely, a systematic departure from a straight line suggests your data don’t follow the distribution.
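If you want a rough number to go with the visual check, one option is the correlation between the theoretical and sample quantiles: the straighter the dots, the closer it is to 1. This is not the fat pencil test itself, just a numerical companion to it, and there is no universal cutoff. The sketch below assumes idealized, made-up samples placed exactly at the quantiles of a normal and an exponential distribution, and the function name `qq_correlation` is illustrative.

```python
from math import log, sqrt
from statistics import NormalDist, mean

def qq_correlation(sample):
    """Pearson correlation between normal scores and the sorted sample."""
    ys = sorted(sample)
    n = len(ys)
    xs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

n = 200
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Idealized, hypothetical samples placed exactly at each distribution's quantiles.
normal_like = [NormalDist(100, 15).inv_cdf(p) for p in probs]
right_skewed = [-log(1 - p) for p in probs]   # exponential quantiles

r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(right_skewed)
print(f"normal-shaped sample: r = {r_normal:.4f}")
print(f"right-skewed sample:  r = {r_skewed:.4f}")
```

The normal-shaped sample should give a correlation of essentially 1, while the skewed one comes out noticeably lower. Real samples include random variability, so expect values a little below 1 even when the data are normal.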
Below is a QQ plot where the data follow the normal distribution. The Y-axis displays the sample percentiles, while the X-axis shows the Z-scores for the theoretical quantile values.
Below is a QQ plot where the data clearly don’t follow the normal distribution because of the systematic deviations.
A QQ plot is a great way to determine whether residuals from regression analysis are normally distributed.
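Here is a minimal sketch of that residual check, assuming a simple one-predictor regression fit by ordinary least squares on simulated data. The data, the true coefficients, and the noise level are all made up for illustration, and the residuals are standardized only so they can be compared with Z-scores.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(7)  # reproducible hypothetical data

# Hypothetical data: y depends linearly on x plus normally distributed noise.
x = [i / 2 for i in range(60)]
y = [3.0 + 1.5 * xi + random.gauss(0, 2) for xi in x]

# Ordinary least squares fit of y = intercept + slope * x.
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Residuals: observed value minus fitted value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# QQ coordinates: standardized residuals (Y) against normal Z-scores (X).
# Dividing by the sample SD only rescales the axis; it does not change the shape.
n = len(residuals)
s = stdev(residuals)
observed = sorted(r / s for r in residuals)
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_points = list(zip(theoretical, observed))

for tx, ob in qq_points[::10]:
    print(f"z = {tx:6.2f}   standardized residual = {ob:6.2f}")
```

Because the simulated noise is normal, the standardized residuals should track the Z-scores closely, with the largest wobbles at the two ends.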
Given that only a limited number of data points reside in the highest and lowest quantiles, we are most likely to observe the effects of random fluctuations at these extreme ends.
Spotting Specific Deviations
Systematic divergences from the line in a QQ plot suggest discrepancies between the sample and theoretical distributions. By examining these deviations in QQ plots, we can gain deeper insights into the underlying characteristics of our data. Keep an eye out for these patterns:
- Dots that form a curve on a normal QQ plot indicate that your sample data are skewed.
- An S-shaped pattern, where the dots bend away from the line at both ends but stay linear in the middle, suggests the data have more extreme values (or outliers) in the tails than the normal distribution.
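The two patterns in the list above can be sketched numerically. The example below assumes idealized, made-up samples placed exactly at the quantiles of an exponential distribution (right-skewed) and a Laplace distribution (heavier tails than the normal). For each one, it reports how far the standardized sample quantile sits above (+) or below (-) the normal quantile at the low end, the middle, and the high end. The helper name `qq_gaps` is illustrative.

```python
from math import log
from statistics import NormalDist, mean, stdev

def qq_gaps(sample):
    """Standardize the sample and return (sample quantile - normal quantile)
    for each point, ordered from the lowest value to the highest."""
    ys = sorted(sample)
    n = len(ys)
    m, s = mean(ys), stdev(ys)
    return [(y - m) / s - NormalDist().inv_cdf((i - 0.5) / n)
            for i, y in enumerate(ys, start=1)]

n = 200
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Idealized, hypothetical samples placed exactly at each distribution's quantiles.
right_skewed = [-log(1 - p) for p in probs]                 # exponential
heavy_tailed = [log(2 * p) if p < 0.5 else -log(2 * (1 - p))
                for p in probs]                             # Laplace

for name, sample in (("right-skewed", right_skewed),
                     ("heavy-tailed", heavy_tailed)):
    gaps = qq_gaps(sample)
    print(f"{name}: low end {gaps[0]:+.2f}, "
          f"middle {gaps[n // 2]:+.2f}, high end {gaps[-1]:+.2f}")
```

For the skewed sample, both ends sit above the line and the middle sits below it, which is the single curve. For the heavy-tailed sample, the low end falls below the line, the high end rises above it, and the middle stays close to it, which is the S shape.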
QQ Plot Benefits vs. Other Distribution Assessment Tools
When assessing your data’s distribution, you have several standard tools to choose from: QQ plots, histograms, and distribution tests. Using a QQ plot is my preferred method. Let’s close by going over its benefits relative to the other tools.
For several reasons, it’s easier to use a QQ plot than a histogram to see if your data follow a distribution. For starters, you can judge whether dots follow a line more accurately than whether histogram bars fit a curve. Additionally, a histogram’s appearance depends on the sample size and the number of bars. With fewer than 20 data points, histograms don’t effectively represent the distribution.
In the examples below, the histograms with a fitted distribution curve make it hard to determine whether the data follow a normal distribution. However, the corresponding QQ plots of the same data make it clear that they are normally distributed.
Download the CSV dataset to check them yourself: normal_data_examples. The Cs in the graphs below correspond to the columns in the worksheet.
Related post: Using Histograms to Understand Your Data
Distribution tests are hypothesis tests that determine whether your sample data deviate from a probability distribution. They are valuable tools. However, a QQ plot has an advantage over them in some cases.
As the sample size increases, all hypothesis tests gain statistical power and can detect smaller and smaller differences. The same is true of distribution tests. With large sample sizes, they can detect minuscule, practically meaningless deviations from the probability distribution.
You can see that in action below.
The normality test is statistically significant, indicating the data don’t follow the normal distribution. However, the QQ plot shows that they do. The sample size is 5000, giving the test the power to detect trivial departures from the normal distribution.
Given the above information, you’d conclude that your data are normally distributed. This is a rare case where statisticians will trust graphical results more than the hypothesis test!
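The sample size effect can be sketched with a small experiment. I don't know which normality test produced the result above, so this example swaps in the Jarque-Bera test because its p-value has a simple closed form (a chi-square approximation with 2 degrees of freedom, which is rough for small samples). The data are made up: normal quantiles with a slight skew added through a hypothetical `bend` parameter, so both sample sizes have the same gentle departure from normality.

```python
from math import exp, sqrt
from statistics import NormalDist, mean

def jarque_bera_p(sample):
    """p-value of the Jarque-Bera normality test (chi-square, 2 df)."""
    n = len(sample)
    m = mean(sample)
    m2 = sum((v - m) ** 2 for v in sample) / n
    m3 = sum((v - m) ** 3 for v in sample) / n
    m4 = sum((v - m) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return exp(-jb / 2)   # chi-square survival function for 2 df

def qq_correlation(sample):
    """Pearson correlation between normal scores and the sorted sample."""
    ys = sorted(sample)
    n = len(ys)
    xs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def nearly_normal(n, bend=0.03):
    """Hypothetical sample: normal quantiles with a slight skew added."""
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return [z + bend * z * z for z in zs]

for n in (50, 5000):
    sample = nearly_normal(n)
    print(f"n = {n:5d}: p-value = {jarque_bera_p(sample):.4f}, "
          f"QQ correlation = {qq_correlation(sample):.4f}")
```

With 50 observations, the test should not flag the slight skew. With 5000, it should reject normality even though the QQ points remain almost perfectly straight, which is the same disagreement between test and graph described above.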
Learn how QQ plots play a vital role in Identifying the Distribution of Your Data. This article shows how to use distribution tests and QQ plots together to determine which probability distribution your data follow.
Learn more about how Normal QQ Plots are Better Than Histograms for Assessing Normality.