What is the Interquartile Range (IQR)?
The interquartile range (IQR) measures the spread of the middle half of your data. It is the range for the middle 50% of your sample. Use the IQR to assess the variability where most of your values lie. Larger values indicate that the central portion of your data spreads out further. Conversely, smaller values show that the middle values cluster more tightly.
In this post, learn what the interquartile range means and the many ways to use it! I’ll show you how to find the interquartile range, use it to measure variability, graph it in boxplots to assess distribution properties, use it to identify outliers, and test whether your data are normally distributed.
The interquartile range is one of several measures of variability. To learn about the others and how the IQR compares, read my post, Measures of Variability.
Interquartile Range Definition
To visualize the interquartile range, imagine dividing your data into quarters. Statisticians refer to these quarters as quartiles and label them from low to high as Q1, Q2, Q3, and Q4. The lowest quartile (Q1) covers the smallest quarter of values in your dataset. The upper quartile (Q4) comprises the highest quarter of values. The interquartile range is the middle half of the data that lies between the upper and lower quartiles. In other words, the interquartile range includes the 50% of data points that are above Q1 and below Q4. The IQR is the red area in the graph below, containing Q2 and Q3 (not labeled).
When measuring variability, statisticians prefer using the interquartile range instead of the full data range because extreme values and outliers affect it less. Typically, use the IQR with a measure of central tendency, such as the median, to understand your data’s center and spread. This combination creates a fuller picture of your data’s distribution.
Unlike the more familiar mean and standard deviation, the interquartile range and the median are robust measures. Outliers do not strongly influence either statistic because they don’t depend on every value. Additionally, like the median, the interquartile range is superb for skewed distributions. For normal distributions, you can use the standard deviation to determine the percentage of observations that fall specific distances from the mean. However, that doesn’t work for skewed distributions, and the IQR is an excellent alternative.
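To see that robustness in action, here’s a minimal Python sketch. It assumes Python 3.8 or later for statistics.quantiles, and both datasets are made up for illustration (they are not the example data used later in this post). The second list is the first with its largest value replaced by an extreme one.

```python
import statistics

def spread_summary(values):
    """Return the range, sample standard deviation, and IQR of values."""
    # quantiles() returns the three cut points between the four quarters.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),
        "iqr": q3 - q1,
    }

# Hypothetical data: the second list swaps the largest value for an extreme one.
clean = [2, 4, 6, 8, 10, 12, 14]
with_outlier = [2, 4, 6, 8, 10, 12, 140]

for name, values in [("clean", clean), ("with outlier", with_outlier)]:
    print(name, spread_summary(values))
```

In this made-up example, the range and standard deviation jump when the extreme value appears, while the IQR stays at 8 in both cases.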
Related posts: Quartiles: Definition, Finding, and Using, Median: Definition and Uses, and What are Robust Statistics?
How to Find the Interquartile Range (IQR) by Hand
The formula for finding the interquartile range takes the third quartile value and subtracts the first quartile value.
IQR = Q3 – Q1
Equivalently, the interquartile range is the region between the 75th and 25th percentile (75 – 25 = 50% of the data).
Using the IQR formula, we need to find the values for Q3 and Q1. To do that, simply order your data from low to high and split the values into four equal portions.
I’ve divided the dataset below into quartiles. The interquartile range extends from the Q1 value to the Q3 value. For this dataset, the interquartile range is 39 – 20 = 19.
Note that different methods and statistical software programs will find slightly different Q1 and Q3 values, which affects the interquartile range. These variations stem from alternate ways of finding percentiles. For details about that, read my post about Percentiles: Interpretations and Calculations.
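As a rough illustration of how the method changes the answer, the Python sketch below computes Q1, Q3, and the IQR two ways using the standard library’s statistics.quantiles (Python 3.8+). The seven-value dataset is hypothetical, and the helper name quartiles_and_iqr is my own. The "exclusive" and "inclusive" options are Python’s names for two common percentile conventions; I’m not claiming they match any particular software package exactly.

```python
import statistics

# Hypothetical dataset for illustration only.
data = [2, 4, 6, 8, 10, 12, 14]

def quartiles_and_iqr(values, method="exclusive"):
    """Return (Q1, Q3, IQR) using the chosen percentile convention."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method=method)
    return q1, q3, q3 - q1

print(quartiles_and_iqr(data, "exclusive"))  # (4.0, 12.0, 8.0)
print(quartiles_and_iqr(data, "inclusive"))  # (5.0, 11.0, 6.0)
```

With only seven values, the two conventions give IQRs of 8 and 6. The gap tends to shrink as the dataset grows.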
How to Find the Interquartile Range using Excel
All statistical software packages will identify the interquartile range as part of their descriptive statistics. Here, I’ll show you how to find it using Excel because most readers can access this application.
To follow along, download the Excel file: IQR. This dataset is the same as the one I use in the illustration above. This file also includes the interquartile range calculations for finding outliers and the IQR normality test described later in this post.
In Excel, you’ll need to use the QUARTILE.EXC function, which has the following arguments: QUARTILE.EXC(array, quart)
- Array: Cell range of numeric values.
- Quart: Quartile you want to find.
In my spreadsheet, the data are in cells A2:A20. Consequently, I’ll use the following syntax to find Q1 and Q3, respectively:
=QUARTILE.EXC(A2:A20, 1)
=QUARTILE.EXC(A2:A20, 3)
As with my example of finding the interquartile range by hand, Excel indicates that Q3 is 39 and Q1 is 20. IQR = 39 – 20 = 19
Related post: Descriptive Statistics in Excel
Using Boxplots to Graph the Interquartile Range
Boxplots are a great way to visualize interquartile ranges and their relation to the median and the overall distribution. These graphs display ranges of values based on quartiles and show asterisks for outliers that fall outside the whiskers. Boxplots work by splitting your data into quarters.
Let’s look at the boxplot anatomy before getting to the example. Notice how it divides your data into quartiles.
The box in the boxplot is your interquartile range! It contains 50% of your data. By comparing the size of these boxes, you can understand your data’s variability. More dispersed distributions have wider boxes.
Additionally, find where the median line falls within each interquartile box. If the median is closer to one side or the other of the box, it’s a skewed distribution. When the median is near the center of the interquartile range, your distribution is symmetric.
For example, in the boxplot below, method 3 has the highest variability in scores and is left-skewed. Conversely, method 2 has a tighter distribution that is symmetrical, although it also has an outlier—read the next section for more about that!
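If you want a number to go with that visual check, one option is to compute where the median sits inside the box. The sketch below is only an illustration: the function name median_position_in_box and both datasets are hypothetical. It returns 0.5 when the median is exactly centered in the box and moves toward 0 or 1 as the median shifts toward Q1 or Q3.

```python
import statistics

def median_position_in_box(values):
    """Return the median's position in the box: 0 = at Q1, 0.5 = centered, 1 = at Q3."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    # Note: this divides by the IQR, so it fails if Q1 equals Q3.
    return (median - q1) / (q3 - q1)

# Hypothetical datasets for illustration only.
symmetric = [2, 4, 6, 8, 10, 12, 14]
skewed = [1, 2, 3, 4, 10, 20, 40]

print(median_position_in_box(symmetric))         # 0.5
print(round(median_position_in_box(skewed), 2))  # 0.11
```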
Related post: Boxplots versus Individual Value Plots
Using the IQR to Find Outliers
The interquartile range can help you identify outliers. For other methods of finding outliers, the outliers themselves influence the calculations, potentially causing you to miss them. Fortunately, interquartile ranges are relatively robust against outlier influence and can avoid this problem. This method also does not assume the data follow the normal distribution or any other distribution. That’s why using the IQR to find outliers is one of my favorite methods!
To find outliers, you’ll need to know your data’s IQR, Q1, and Q3 values. Take these values and input them into the equations below. Statisticians call the result for each equation an outlier gate. I’ve included these calculations in the IQR example Excel file.
Q1 − 1.5 * IQR: Lower outlier gate.
Q3 + 1.5 * IQR: Upper outlier gate.
Using the same example dataset, I’ll calculate the two outlier gates. For that dataset, the interquartile range is 19, Q1 = 20, and Q3 = 39.
Lower outlier gate: 20 – 1.5 * 19 = -8.5
Upper outlier gate: 39 + 1.5 * 19 = 67.5
Then look for values in the dataset that are below the lower gate or above the upper gate. For the example dataset, there are no outliers. All values fall between these two gates.
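The same steps can be sketched in Python. The first part below just rechecks the arithmetic above using the Q1 and Q3 values from this post. The second part applies the gates to a made-up dataset, because the original data aren’t reproduced here. The function names and the multiplier parameter k are my own choices for the sketch.

```python
import statistics

def outlier_gates(q1, q3, k=1.5):
    """Return the (lower, upper) outlier gates from Q1 and Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values that fall below the lower gate or above the upper gate."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    lower, upper = outlier_gates(q1, q3, k)
    return [x for x in values if x < lower or x > upper]

# Recheck the worked example: Q1 = 20 and Q3 = 39.
print(outlier_gates(20, 39))  # (-8.5, 67.5)

# Hypothetical dataset with one extreme value.
sample = [2, 4, 6, 8, 10, 12, 140]
print(find_outliers(sample))  # [140]
```

For the hypothetical sample, Q1 is 4 and Q3 is 12, so the gates are -8 and 24, and only 140 falls outside them.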
Boxplots typically use this method to identify outliers and display asterisks when they exist. In the teaching method boxplot above, notice that the Method 2 group has an outlier. The researchers should investigate that value.
Related post: Five Ways to Find Outliers
Using the Interquartile Range to Test Normality
You can even use the interquartile range as a simple test to determine whether your data are normally distributed. When data follow a normal distribution, the interquartile range will have specific properties. The image below highlights these properties. Specifically, in our calculations below, we’ll use the standard deviations (σ) that correspond to the interquartile range, -0.67 and 0.67.
You can assess whether your IQR is consistent with a normal distribution. However, this test should not replace a formal normality hypothesis test.
To perform this test, you’ll need to know the sample standard deviation (s) and sample mean (x̅). Input these values into the formulas for Q1 and Q3 below.
- Q1 = x̅ − (s * 0.67)
- Q3 = x̅ + (s * 0.67)
Compare these calculated values to your data’s actual Q1 and Q3 values. If they are notably different, your data might not follow the normal distribution.
We’ll return to our example dataset from before. Our actual Q1 and Q3 are 20 and 39, respectively.
The sample average is 31.3, and its standard deviation is 14.1. I’ll input those values into the equations.
Q1 = 31.3 – (14.1 * 0.67) = 21.9
Q3 = 31.3 + (14.1 * 0.67) = 40.7
The calculated values are pretty close to the actual data values, suggesting that our data follow the normal distribution. I’ve included these calculations in the IQR example spreadsheet.
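Here’s a short Python sketch of the same check, using the mean and standard deviation quoted above. It also shows where 0.67 comes from: it’s the rounded z-score of the 75th percentile of the standard normal distribution. The function name expected_quartiles is hypothetical, and how close counts as “close enough” remains a judgment call rather than a formal test.

```python
from statistics import NormalDist

# z-score of the 75th percentile of the standard normal distribution (about 0.674).
Z_75 = NormalDist().inv_cdf(0.75)

def expected_quartiles(mean, sd, z=0.67):
    """Return the Q1 and Q3 you would expect if the data were normal."""
    return mean - sd * z, mean + sd * z

# Sample mean and standard deviation from the example dataset in this post.
expected_q1, expected_q3 = expected_quartiles(31.3, 14.1)
print(round(expected_q1, 1), round(expected_q3, 1))  # 21.9 40.7

# Compare with the actual quartiles from the data (20 and 39).
print(round(expected_q1 - 20, 1), round(expected_q3 - 39, 1))  # 1.9 1.7
```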
Related posts: Understanding the Normal Distribution and How to Identify the Distribution of Your Data
Beck G says
Why do you use exclusive and not inclusive in the excel formula? When would you use one or the other? thanks!
Jim Frost says
Oddly there’s no standardized way to calculate quartiles, which means there’s no standardized way to calculate the IQR. I don’t know of any recommendations for when to use one or the other. That decision will make the most difference in small datasets. But for larger datasets and for probability distributions, there’s not much of a difference.
Bill Lafferty says
Hi Jim, I am a Maths teacher and don’t understand why the semi inter-quartile range is often asked for?
Why would it be useful as we would already know the inter-quartile range?
Jim Frost says
This is anecdotal, but I find that the IQR is requested and used much more frequently than the semi-interquartile range, which is half the IQR. I’ve personally never used the semi-IQR or been asked for it. Perhaps it’s used in some fields? I’m not really sure.
As for its usefulness, I don’t think it’s more useful than the IQR. It seems like it would be equally useful, just a different form of the same thing really.
Thank you for this informative website. I have noticed that some researchers present IQR as the range between Q1 (25th percentile) to Q3 (75th percentile). Is that correct?
Why do we name IQR range if we present only a number (the difference between Q1 and Q3) and not a range?
Thank you in advance
Jim Frost says
That is confusing. I think it goes back to the statistical meaning of the word range versus the everyday use of the word. Statistically, a range is the difference between two values. For the interquartile range, that would be Q3 – Q1. However, in everyday speech, you’d say it ranges from Q1 to Q3. Personally, I like knowing both! Knowing the range of Q1 to Q3 is helpful for knowing where half the values fall. Knowing Q3 – Q1 helps you know how tightly they fall.
At any rate, I think that’s the origin of the different meanings.
H S Kim says
Hi, Jim. Thanks again for your explanation and I hope that you find the origin of 1.5IQR.
Let me correct my sentences.
As we can see in the diagram (by Jhguch at en.wikipedia, CC BY-SA 2.5), the probability that any number is larger than 2.698σ is 0.35% and the probability that any number is smaller than -2.698σ is 0.35%. The sum is 0.7%, which is a very low probability of occurrence. So Q3+1.5IQR and Q1-1.5IQR are the criteria for outliers. I know that this logic is proper only if the data follow the normal distribution.
When a box plot shows that the data distribution is skewed, I cannot trust the outlier marks of that box plot. Why? The probability of an outlier is not the same as in the above logic. I’d like to say: do not believe the outlier marks of a box plot if the box plot shows that the data are skewed.
I’d like to trust the outlier marks when the box plot shows that the length of Q2-Q1 is the same as Q3-Q2. Because we cannot get any information about normality from a box plot, I just want to accept the outliers if the box plot is symmetric.
I hope to get your opinion whether my opinion is proper or not.
Thank you very much.
H S Kim says
Hi, Jim. I always thank you for your explanations about statistics.
Today, I hope to understand the outlier fence of box plot.
When I draw a box plot with numbers generated by exponential distribution, the outliers are calculated by Q1-1.5IQR or Q3+1.5IQR. These formulations are based on the normal distribution. Under assumption that the data are following the normal distribution, the probability of outliers are lower than 99.3%. But in case of non-normal distribution, I think the probability of outliers are not same as the normal distribution. So I hope to know why we use Q3+1.5IQR even if the data do not follow the normal data.
If you give me an answer, I will be very happy.
Thanks a lot.
Jim Frost says
I’ll need to dig deeper to find the origins of multiplying by 1.5. However, keep in mind that you’re multiplying the interquartile range (IQR) by 1.5, and the IQR varies depending on the underlying distribution. Further, the values of Q1 and Q3 also depend on the underlying distribution. So, it’s inaccurate to say it is based on the normal distribution when Q1, Q3, and the IQR all depend on the actual distribution of the data rather than the normal distribution. It’s true that when the underlying distribution is normal, these results might be consistent with the normal distribution but that is very different than saying that underlying assumption is the normal distribution.
The procedure is designed to be one you can use in the exploratory stages when you don’t know the distribution of your data. As you gain more information, you might be able to use the distributional properties to find outliers.
Areej AF says
Thank you very much for this informative blog, I learned a lot.
I want to ask a question about percentiles. What if I want to calculate a 90% confidence interval for two percentiles (2.5th and 97.5th)? These percentiles are used to determine a reference range of a biological blood measure.
Jim Frost says
Those percentiles (2.5th and 97.5th) will cover 95% of the distribution.
However, I’m wondering if you’re thinking of tolerance intervals? These intervals have a confidence level that at least a specified percentage of the population will fall within the tolerance interval. For example, a quality engineer might like to find the interval where they can be 95% confident that at least 95% of future observations will fall within the interval.
I’m not sure exactly what you’re asking, but you might be thinking about tolerance intervals. If so, read my post where I compare tolerance intervals to other intervals.
Jefferson Ibhagui says
Thanks for this piece.
While reading, a thought crossed my mind and I wanted to get your perspective on it. Here is the “issue”:
I have spatial data (I read somewhere they are never normal) that I imported to Excel. I visualized it by plotting a boxplot. The plot really looked cool and presentable without the outliers, and the IQR attributes you discussed could be discerned. Whereas plotting with outliers gave the opposite effect – it kinda squashed the IQR (I wish I could upload the images).
The question now is: must outliers be presented in a boxplot?
Jim Frost says
Hi Jefferson, you might try a histogram and get a better look at the distribution if you have enough data points for that. It’s possible that you have a skewed distribution. If they are outliers, then the boxplot is doing a great job at making them noticeable. You should determine whether they’re truly outliers or possibly a skewed distribution.
As for which graph to use, the answer depends partially on whether it’s outliers or a skewed distribution. And, try both graphs and see which one portrays the reality of what’s going on with your data.
G J Shah says
You still made me (more) confused. You are right, there are four quarters and they are formed by three quartiles Q1, Q2 and Q3. A quarter is 25 percent of data (set of values) whereas a quartile is a locational value (single valued) that divide the data into these quarters. They are related but very different terms.
Jim Frost says
Q4 is the highest value in the dataset. It’s equivalent to the 100th percentile.
G J Shah says
I found this post very informative. However I found one problem (might be a print error). There are always three quartiles, but you have mentioned four in the first part of your post. Pls take this comment positively. Thank you.
Jim Frost says
Hi GJ, there are in fact four quartiles. Each quartile is a quarter of your data. By definition, there are four quarters and, hence, four quartiles. They are labeled Q1, Q2, Q3, and Q4.
Jim, I am looking forward to seeing one on error bars. Thank you
Ali Abbas Rizvi says
It was really helpful. Thank you for this insightful article. Here, I want to ask the possible reason behind the use of ‘1.5’ while calculating the outlier.