A z-score measures the distance between a data point and the mean using standard deviations. Z-scores can be positive or negative. The sign tells you whether the observation is above or below the mean. For example, a z-score of +2 indicates that the data point falls two standard deviations above the mean, while a -2 signifies it is two standard deviations below the mean. A z-score of zero equals the mean. Statisticians also refer to z-scores as standard scores, and I’ll use those terms interchangeably. [Read more…] about Z-score: Definition, Formula, and Uses
Venn diagrams visually represent relationships between concepts. They use circles to display similarities and differences between sets of ideas, traits, or items. Intersections indicate that the groups have common elements. Non-overlapping areas represent traits that are unique to one set. Venn diagrams are also known as logic diagrams and set diagrams. [Read more…] about Venn Diagrams: Uses, Examples, and Making
The interquartile range (IQR) measures the spread of the middle half of your data. It is the range for the middle 50% of your sample. Use the IQR to assess the variability where most of your values lie. Larger values indicate that the central portion of your data spread out further. Conversely, smaller values show that the middle values cluster more tightly.
In this post, learn what the interquartile range means and the many ways to use it! I’ll show you how to find the interquartile range, use it to measure variability, graph it in boxplots to assess distribution properties, use it to identify outliers, and test whether your data are normally distributed.
The interquartile range is one of several measures of variability. To learn about the others and how the IQR compares, read my post, Measures of Variability.
Interquartile Range Overview
To visualize the interquartile range, imagine dividing your data into quarters. Statisticians refer to these quarters as quartiles and label them from low to high as Q1, Q2, Q3, and Q4. The lowest quartile (Q1) covers the smallest quarter of values in your dataset. The upper quartile (Q4) comprises the highest quarter of values. The interquartile range is the middle half of the data that lies between the upper and lower quartiles. In other words, the interquartile range includes the 50% of data points that are above Q1 and below Q4. The IQR is the red area in the graph below, containing Q2 and Q3 (not labeled).
When measuring variability, statisticians prefer using the interquartile range instead of the full data range because extreme values and outliers affect it less. Typically, use the IQR with a measure of central tendency, such as the median, to understand your data’s center and spread. This combination creates a fuller picture of your data’s distribution.
Unlike the more familiar mean and standard deviation, the interquartile range and the median are robust measures. Outliers do not strongly influence either statistic because they don’t depend on every value. Additionally, like the median, the interquartile range is superb for skewed distributions. For normal distributions, you can use the standard deviation to determine the percentage of observations that fall specific distances from the mean. However, that doesn’t work for skewed distributions, and the IQR is an excellent alternative.
How to Find the IQR by Hand
The formula for calculating the interquartile range takes the third quartile value and subtracts the first quartile value.
IQR = Q3 – Q1
Equivalently, the interquartile range is the region between the 75th and 25th percentile (75 – 25 = 50% of the data).
Using the IQR formula, we need to find the values for Q3 and Q1. To do that, simply order your data from low to high and split the value into four equal portions.
I’ve divided the dataset below into quartiles. The interquartile range extends from the Q1 value to the Q3 value. For this dataset, the interquartile range is 39 – 20 = 19.
Note that different methods and statistical software programs will find slightly different Q1 and Q3 values, which affects the interquartile range. These variations stem from alternate ways of finding percentiles. For details about that, read my post about Percentiles: Interpretations and Calculations.
Finding the Interquartile Range using Excel
All statistical software packages will identify the interquartile range as part of their descriptive statistics. Here, I’ll show you how to find it using Excel because most readers can access this application.
To follow along, download the Excel file: IQR. This dataset is the same as the one I use in the illustration above. This file also includes the interquartile range calculations for finding outliers and the IQR normality test described later in this post.
In Excel, you’ll need to use the QUARTILE.EXC function, which has the following arguments: QUARTILE.EXC(array, quart)
- Array: Cell range of numeric values.
- Quart: Quartile you want to find.
In my spreadsheet, the data are in cells A2:A20. Consequently, I’ll use the following syntax to find Q1 and Q3, respectively:
As with my example of finding the interquartile range by hand, Excel indicates that Q3 is 39 and Q1 is 20. IQR = 39 – 20 = 19
Related post: Descriptive Statistics in Excel
Using Boxplots to Graph the Interquartile Range
Boxplots are a great way to visualize interquartile ranges and their relation to the median and the overall distribution. These graphs display ranges of values based on quartiles and show asterisks for outliers that fall outside the whiskers. Boxplots work by splitting your data into quarters.
Let’s look at the boxplot anatomy before getting to the example. Notice how it divides your data into quartiles.
The box in the boxplot is your interquartile range! It contains 50% of your data. By comparing the size of these boxes, you can understand your data’s variability. More dispersed distributions have wider boxes.
Additionally, find where the median line falls within each interquartile box. If the median is closer to one side or the other of the box, it’s a skewed distribution. When the median is near the center of the interquartile range, your distribution is symmetric.
For example, in the boxplot below, method 3 has the highest variability in scores and is left-skewed. Conversely, method 2 has a tighter distribution that is symmetrical, although it also has an outlier—read the next section for more about that!
Related post: Boxplots versus Individual Value Plots
Using the IQR to Find Outliers
The interquartile range can help you identify outliers. For other methods of finding outliers, the outliers themselves influence the calculations, potentially causing you to miss them. Fortunately, interquartile ranges are relatively robust against outlier influence and can avoid this problem. This method also does not assume the data follow the normal distribution or any other distribution. That’s why using the IQR to find outliers is one of my favorite methods!
To find outliers, you’ll need to know your data’s IQR, Q1, and Q3 values. Take these values and input them into the equations below. Statisticians call the result for each equation an outlier gate. I’ve included these calculations in the IQR example Excel file.
Q1 − 1.5 * IQR: Lower outlier gate.
Q3 + 1.5 * IQR: Upper outlier gate.
Using the same example dataset, I’ll calculate the two outlier gates. For that dataset, the interquartile range is 19, Q1 = 20, and Q3 = 39.
Lower outlier gate: 20 – 1.5 * 19 = -8.5
Upper outlier gate: 39 + 1.5 * 19 = 67.5
Then look for values in the dataset that are below the lower gate or above the upper gate. For the example dataset, there are no outliers. All values fall between these two gates.
Boxplots typically use this method to identify outliers and display asterisks when they exist. In the teaching method boxplot above, notice that the Method 2 group has an outlier. The researchers should investigate that value.
Related post: Five Ways to Find Outliers
Using the Interquartile Range to Test Normality
You can even use the interquartile range as a simple test to determine whether your data are normally distributed. When data follow a normal distribution, the interquartile range will have specific properties. The image below highlights these properties. Specifically, in our calculations below, we’ll use the standard deviations (σ) that correspond to the interquartile range, -0.67 and 0.67.
You can assess whether your IQR is consistent with a normal distribution. However, this test should not replace a formal normality hypothesis test.
To perform this test, you’ll need to know the sample standard deviation (s) and sample mean (x̅). Input these values into the formulas for Q1 and Q3 below.
- Q1 = x̅ − (s * 0.67)
- Q3 = x̅ + (s * 0.67)
Compare these calculated values to your data’s actual Q1 and Q3 values. If they are notably different, your data might not follow the normal distribution.
We’ll return to our example dataset from before. Our actual Q1 and Q3 are 20 and 39, respectively.
The sample average is 31.3, and its standard deviation is 14.1. I’ll input those values into the equations.
Q1 = 31.3 – (14.1 * 0.67) = 21.9
Q3 = 31.3 + (14.1 * 0.67) = 40.7
The calculated values are pretty close to the actual data values, suggesting that our data follow the normal distribution. I’ve included these calculations in the IQR example spreadsheet.
Excel can calculate correlation coefficients and a variety of other statistical analyses. Even if you don’t use Excel regularly, this post is an excellent introduction to calculating and interpreting correlation.
In this post, I provide step-by-step instructions for having Excel calculate Pearson’s correlation coefficient, and I’ll show you how to interpret the results. Additionally, I include links to relevant statistical resources I’ve written that provide intuitive explanations. Together, we’ll analyze and interpret an example dataset! [Read more…] about Using Excel to Calculate Correlation
Spearman’s correlation in statistics is a nonparametric alternative to Pearson’s correlation. Use Spearman’s correlation for data that follow curvilinear, monotonic relationships and for ordinal data. Statisticians also refer to Spearman’s rank order correlation coefficient as Spearman’s ρ (rho).
In this post, I’ll cover what all that means so you know when and why you should use Spearman’s correlation instead of the more common Pearson’s correlation. [Read more…] about Spearman’s Correlation Explained
Descriptive statistics summarize your dataset, painting a picture of its properties. These properties include various central tendency and variability measures, distribution properties, outlier detection, and other information. Unlike inferential statistics, descriptive statistics only describe your dataset’s characteristics and do not attempt to generalize from a sample to a population. [Read more…] about Descriptive Statistics in Excel
Excel can perform various statistical analyses, including regression analysis. It is a great option because nearly everyone can access Excel. This post is an excellent introduction to performing and interpreting regression analysis, even if Excel isn’t your primary statistical software package.
Moving averages can smooth time series data, reveal underlying trends, and identify components for use in statistical modeling. Smoothing is the process of removing random variations that appear as coarseness in a plot of raw time series data. It reduces the noise to emphasize the signal that can contain trends and cycles. Analysts also refer to the smoothing process as filtering the data. [Read more…] about Using Moving Averages to Smooth Time Series Data
The Birthday Problem in statistics asks, how many people do you need in a group to have a 50% chance that at least two people will share a birthday? Go ahead and think about that for a moment. The answer surprises many people. We’ll get to that shortly.
In this post, I’ll not only answer the birthday paradox, but I’ll also show you how to calculate the probabilities for any size group, run a computer simulation of it, and explain why the answer to the Birthday Problem is so surprising. [Read more…] about Answering the Birthday Problem in Statistics
Use a variances test to determine whether the variability of two groups differs. In this post, we’ll work through a two-sample variances test that Excel provides. Even if Excel isn’t your primary statistical software, this post provides an excellent introduction to variance tests. Excel refers to this analysis as F-Test Two-Sample for Variances. [Read more…] about How to Test Variances in Excel
Use two-way ANOVA to assess differences between the group means that are defined by two categorical factors. In this post, we’ll work through two-way ANOVA using Excel. Even if Excel isn’t your main statistical package, this post is an excellent introduction to two-way ANOVA. Excel refers to this analysis as two factor ANOVA. [Read more…] about How to do Two-Way ANOVA in Excel
Use one-way ANOVA to test whether the means of at least three groups are different. Excel refers to this test as Single Factor ANOVA. This post is an excellent introduction to performing and interpreting a one-way ANOVA test even if Excel isn’t your primary statistical software package. [Read more…] about How to do One-Way ANOVA in Excel
Excel can perform various statistical analyses, including t-tests. It is an excellent option because nearly everyone can access Excel. This post is a great introduction to performing and interpreting t-tests even if Excel isn’t your primary statistical software package.
In this post, I provide step-by-step instructions for using Excel to perform t-tests. Importantly, I also show you how to select the correct form of t-test, choose the right options, and interpret the results. I also include links to additional resources I’ve written, which present clear explanations of relevant t-test concepts that you won’t find in Excel’s documentation. And, I use an example dataset for us to work through and interpret together! [Read more…] about How to do t-Tests in Excel