
Relative Frequencies and Their Distributions

A relative frequency indicates how often a specific kind of event occurs within the total number of observations. It expresses frequencies as percentages, proportions, or fractions.

In this post, learn about relative frequencies, the relative frequency distribution, and its cumulative counterpart.
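To make the idea concrete, here is a minimal Python sketch that builds a relative frequency distribution and its cumulative counterpart from raw observations. The defect counts are hypothetical, and `relative_frequencies` is a hypothetical helper written for this illustration.

```python
from collections import Counter
from itertools import accumulate

def relative_frequencies(observations):
    """Hypothetical helper: (value, relative freq, cumulative relative freq) rows, sorted by value."""
    counts = Counter(observations)
    total = len(observations)
    values = sorted(counts)
    rel = [counts[v] / total for v in values]   # count of each value divided by the total
    cum = list(accumulate(rel))                 # running total of the relative frequencies
    return list(zip(values, rel, cum))

# Hypothetical data: number of defects found on each of 10 inspected parts
defects = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0]
for value, rel, cum in relative_frequencies(defects):
    print(f"{value} defects: {rel:.0%} (cumulative {cum:.0%})")
```

The cumulative column only makes sense when the values have a natural order, as counts do here.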

Venn Diagrams: Uses, Examples, and Making

By Jim Frost

Venn diagrams visually represent relationships between concepts. They use circles to display similarities and differences between sets of ideas, traits, or items. Intersections indicate that the groups have common elements. Non-overlapping areas represent traits that are unique to one set. Venn diagrams are also known as logic diagrams and set diagrams.

Empirical Rule: Definition & Formula

By Jim Frost

What is the Empirical Rule?

The empirical rule in statistics, also known as the 68-95-99.7 rule, states that for normal distributions, approximately 68% of observed data points will lie within one standard deviation of the mean, 95% will fall within two standard deviations, and 99.7% will occur within three standard deviations.
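As a quick check on where the 68, 95, and 99.7 figures come from, this Python sketch computes the exact share of a normal distribution that lies within k standard deviations of the mean, using the identity P(|Z| < k) = erf(k / √2). The function name `share_within` is a hypothetical helper.

```python
import math

def share_within(k):
    """Hypothetical helper: probability a normal variable falls within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The results round to 68.3%, 95.4%, and 99.7%, which is why the rule's percentages are approximations.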

Interquartile Range (IQR): How to Find and Use It

By Jim Frost

What is the Interquartile Range (IQR)?

The interquartile range (IQR) measures the spread of the middle half of your data. It is the range for the middle 50% of your sample. Use the IQR to assess the variability where most of your values lie. Larger values indicate that the central portion of your data spreads out further. Conversely, smaller values show that the middle values cluster more tightly.

In this post, learn what the interquartile range means and the many ways to use it! I’ll show you how to find the interquartile range, use it to measure variability, graph it in boxplots to assess distribution properties, use it to identify outliers, and test whether your data are normally distributed.

The interquartile range is one of several measures of variability. To learn about the others and how the IQR compares, read my post, Measures of Variability.

Interquartile Range Definition

To visualize the interquartile range, imagine dividing your ordered data into four equal quarters. Statisticians call the three values that mark the boundaries between these quarters the quartiles and label them from low to high as Q1, Q2, and Q3, where Q2 is the median. The lowest quarter of values lies below Q1, and the highest quarter lies above Q3. The interquartile range is the middle half of the data that lies between these two quartiles. In other words, the interquartile range includes the 50% of data points that fall above Q1 and below Q3. The IQR is the red area in the graph below, which contains the second and third quarters of the data.

When measuring variability, statisticians prefer using the interquartile range instead of the full data range because extreme values and outliers affect it less. Typically, use the IQR with a measure of central tendency, such as the median, to understand your data’s center and spread. This combination creates a fuller picture of your data’s distribution.

Unlike the more familiar mean and standard deviation, the interquartile range and the median are robust measures. Outliers do not strongly influence either statistic because they don’t depend on every value. Additionally, like the median, the interquartile range is superb for skewed distributions. For normal distributions, you can use the standard deviation to determine the percentage of observations that fall specific distances from the mean. However, that doesn’t work for skewed distributions, and the IQR is an excellent alternative.
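To see why the IQR resists outliers while the standard deviation does not, this Python sketch compares both statistics before and after a single extreme value is added. The data are hypothetical, `iqr` is a hypothetical helper, and the quartiles come from the standard library's `statistics.quantiles` with its default `'exclusive'` method.

```python
import statistics

def iqr(data):
    """Hypothetical helper: Q3 - Q1 using statistics.quantiles' default 'exclusive' method."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# Hypothetical sample, then the same sample with one extreme value appended
base = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35, 38]
with_outlier = base + [250]

for label, data in (("base", base), ("with outlier", with_outlier)):
    print(f"{label}: SD = {statistics.stdev(data):.1f}, IQR = {iqr(data):.1f}")
```

The standard deviation jumps several-fold because every value, including the outlier, enters its calculation, while the IQR barely moves because it depends only on the values near the quartiles.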

How to Find the Interquartile Range (IQR) by Hand

The formula for finding the interquartile range takes the third quartile value and subtracts the first quartile value.

IQR = Q3 – Q1

Equivalently, the interquartile range is the region between the 75th and 25th percentile (75 – 25 = 50% of the data).

Using the IQR formula, we need to find the values for Q3 and Q1. To do that, simply order your data from low to high and split the values into four equal portions.

I’ve divided the dataset below into quartiles. The interquartile range extends from the Q1 value to the Q3 value. For this dataset, the interquartile range is 39 – 20 = 19.
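The by-hand procedure can be sketched in Python. This version assumes one common convention: the quantile for proportion p sits at position p × (n + 1) in the sorted data, with linear interpolation between neighboring values when that position is fractional. The post's example values appear only in its image, so the 19-value dataset below is hypothetical, and `quartile_by_position` is a hypothetical helper.

```python
import math

def quartile_by_position(data, p):
    """Hypothetical helper: value at 1-based position p * (n + 1) of the sorted data, interpolated."""
    values = sorted(data)
    n = len(values)
    pos = p * (n + 1)
    if not 1 <= pos <= n:
        raise ValueError("position falls outside the data; sample too small for this p")
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return values[lower - 1]  # position lands exactly on a data value
    # Move the fractional distance from the lower neighbor toward the upper neighbor
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])

# Hypothetical 19-value dataset (same size as the post's example, different values)
data = [5, 8, 11, 14, 17, 19, 22, 24, 26, 28, 30, 33, 35, 37, 40, 43, 47, 52, 60]
q1 = quartile_by_position(data, 0.25)  # position 0.25 * 20 = 5, the 5th value
q3 = quartile_by_position(data, 0.75)  # position 0.75 * 20 = 15, the 15th value
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```

With 19 values, both positions land on whole numbers, so no interpolation is needed and the quartiles are simply the 5th and 15th ordered values.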

Note that different methods and statistical software programs will find slightly different Q1 and Q3 values, which affects the interquartile range. These variations stem from alternate ways of finding percentiles. For details about that, read my post about Percentiles: Interpretations and Calculations.
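To see how much the choice of method matters, this Python sketch computes quartiles for one hypothetical dataset with both methods offered by the standard library's `statistics.quantiles`.

```python
import statistics

# Hypothetical dataset; small samples show the differences most clearly
data = [3, 7, 8, 12, 13, 14, 18, 21, 23, 27]

for method in ("exclusive", "inclusive"):
    q1, _, q3 = statistics.quantiles(data, n=4, method=method)
    print(f"{method}: Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```

The 'exclusive' method uses the same p(n + 1) positioning as Excel's QUARTILE.EXC, and 'inclusive' corresponds to QUARTILE.INC. For this sample, the two IQRs differ by 2.5, so it is worth checking which convention your software uses before comparing results across programs.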

How to Find the Interquartile Range using Excel

Most statistical software packages report the interquartile range, or the quartiles you need to calculate it, as part of their descriptive statistics. Here, I’ll show you how to find it using Excel because most readers can access this application.

To follow along, download the Excel file: IQR. This dataset is the same as the one I use in the illustration above. This file also includes the interquartile range calculations for finding outliers and the IQR normality test described later in this post.

In Excel, you’ll need to use the QUARTILE.EXC function, which has the following arguments: QUARTILE.EXC(array, quart)

Array: Cell range of numeric values.
Quart: Quartile you want to find.

In my spreadsheet, the data are in cells A2:A20. Consequently, I’ll use the following syntax to find Q1 and Q3, respectively:

=QUARTILE.EXC(A2:A20,1)
=QUARTILE.EXC(A2:A20,3)

As with my example of finding the interquartile range by hand, Excel indicates that Q3 is 39 and Q1 is 20, so IQR = 39 – 20 = 19.

Related post: Descriptive Statistics in Excel

Using Boxplots to Graph the Interquartile Range

Boxplots are a great way to visualize interquartile ranges and their relation to the median and the overall distribution. These graphs display ranges of values based on quartiles and show asterisks for outliers that fall outside the whiskers. Boxplots work by splitting your data into quarters.

Let’s look at the boxplot anatomy before getting to the example. Notice how it divides your data into quartiles.

The box in the boxplot is your interquartile range! It contains 50% of your data. By comparing the size of these boxes, you can understand your data’s variability. More dispersed distributions have wider boxes.

Additionally, note where the median line falls within each interquartile box. If the median is closer to one side of the box than the other, the distribution is skewed. When the median is near the center of the interquartile range, your distribution is roughly symmetric.
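A rough numeric version of this visual check compares the distance from the median to each edge of the box. The Python sketch below is a simple heuristic, not a formal skewness test; `box_shape`, its `tolerance` cutoff, and the example data are all hypothetical.

```python
import statistics

def box_shape(data, tolerance=0.1):
    """Hypothetical helper: classify shape by where the median sits inside the IQR box.

    tolerance is a hypothetical cutoff: the fraction of the IQR by which the two
    halves of the box must differ before the distribution is called skewed.
    """
    q1, median, q3 = statistics.quantiles(data, n=4)
    lower_half = median - q1   # median to the bottom edge of the box
    upper_half = q3 - median   # median to the top edge of the box
    if abs(upper_half - lower_half) <= tolerance * (q3 - q1):
        return "roughly symmetric"
    # A median close to Q1 leaves a longer upper half, stretching toward the right tail
    return "right-skewed" if upper_half > lower_half else "left-skewed"

sample = [1, 2, 2, 3, 3, 3, 4, 5, 7, 10, 15, 22]  # hypothetical data with a long right tail
print(box_shape(sample))
```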

For example, in the boxplot below, method 3 has the highest variability in scores and is left-skewed. Conversely, method 2 has a tighter distribution that is symmetrical, although it also has an outlier—read the next section for more about that!

Related post: Box Plots Explained with Examples

Using the IQR to Find Outliers

The interquartile range can help you identify outliers. With some other methods of finding outliers, the outliers themselves influence the calculations, potentially causing you to miss them. Fortunately, the interquartile range is relatively robust against outlier influence and avoids this problem. This method also does not assume that the data follow the normal distribution or any other distribution. That’s why using the IQR to find outliers is one of my favorite methods!

To find outliers, you’ll need to know your data’s IQR, Q1, and Q3 values. Take these values and input them into the equations below. Statisticians call the result of each equation an outlier gate, also commonly known as a fence. I’ve included these calculations in the IQR example Excel file.

Q1 − 1.5 * IQR: Lower outlier gate.

Q3 + 1.5 * IQR: Upper outlier gate.

Using the same example dataset, I’ll calculate the two outlier gates. For that dataset, the interquartile range is 19, Q1 = 20, and Q3 = 39.

Lower outlier gate: 20 – 1.5 * 19 = -8.5

Upper outlier gate: 39 + 1.5 * 19 = 67.5

Then look for values in the dataset that are below the lower gate or above the upper gate. For the example dataset, there are no outliers. All values fall between these two gates.
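The gate calculation is easy to automate. This Python sketch takes Q1 and Q3 as inputs, so you can plug in quartiles from whichever method your software uses, and it flags any values outside the gates. The function names `outlier_gates` and `find_outliers` are hypothetical helpers.

```python
def outlier_gates(q1, q3, k=1.5):
    """Hypothetical helper: return the (lower, upper) gates Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    """Hypothetical helper: list the values below the lower gate or above the upper gate."""
    lower, upper = outlier_gates(q1, q3, k)
    return [x for x in data if x < lower or x > upper]

# Quartiles from the example dataset in this post
print(outlier_gates(20, 39))  # (-8.5, 67.5)
```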

Boxplots typically use this method to identify outliers and display asterisks when they exist. In the teaching method boxplot above, notice that the Method 2 group has an outlier. The researchers should investigate that value.

Related post: Five Ways to Find Outliers

Using the Interquartile Range to Test Normality

You can even use the interquartile range as a simple test to assess whether your data are normally distributed. When data follow a normal distribution, the interquartile range has specific properties. The image below highlights these properties. Specifically, in a normal distribution, Q1 and Q3 fall about 0.67 standard deviations (σ) below and above the mean, respectively (more precisely, ±0.6745σ). Our calculations below use these values of -0.67 and 0.67.

Image shows how the probability density function of a normal distribution relates to a boxplot. By Jhguch at en.wikipedia, CC BY-SA 2.5.

You can assess whether your IQR is consistent with a normal distribution. However, this test should not replace a formal normality hypothesis test.

To perform this test, you’ll need to know the sample standard deviation (s) and sample mean (x̅). Input these values into the formulas for Q1 and Q3 below.

Q1 = x̅ − (s * 0.67)
Q3 = x̅ + (s * 0.67)

Compare these calculated values to your data’s actual Q1 and Q3 values. If they are notably different, your data might not follow the normal distribution.

We’ll return to our example dataset from before. Our actual Q1 and Q3 are 20 and 39, respectively.

The sample average is 31.3, and the sample standard deviation is 14.1. I’ll input those values into the equations.

Q1 = 31.3 – (14.1 * 0.67) = 21.9

Q3 = 31.3 + (14.1 * 0.67) = 40.7

The calculated values are fairly close to the actual data values, suggesting that our data are consistent with the normal distribution. I’ve included these calculations in the IQR example spreadsheet.
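The same arithmetic in Python, using the example's sample mean and standard deviation. The 0.67 multiplier matches the one used above. For comparison, the sketch also prints the more precise value from `statistics.NormalDist().inv_cdf(0.75)`. The function name `expected_quartiles` is a hypothetical helper.

```python
from statistics import NormalDist

def expected_quartiles(mean, sd, z=0.67):
    """Hypothetical helper: Q1 and Q3 a normal distribution with this mean and SD would have."""
    return mean - z * sd, mean + z * sd

q1_exp, q3_exp = expected_quartiles(31.3, 14.1)
print(f"Expected Q1 = {q1_exp:.1f}, Q3 = {q3_exp:.1f}; actual Q1 = 20, Q3 = 39")

# Exact z-value for the 75th percentile of the standard normal distribution
print(round(NormalDist().inv_cdf(0.75), 4))  # 0.6745
```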

Median Definition and Uses

By Jim Frost

In statistics, the median is the value that splits an ordered list of data values in half. Half the values are below it and half are above, so it’s right in the middle of the dataset. The median is the same as the second quartile or the 50th percentile. It is one of several measures of central tendency.
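A minimal Python sketch of the definition. With an odd number of values, the median is the middle value; with an even number, the usual convention averages the two middle values. The example data are hypothetical, and `median` here is a hypothetical helper checked against the standard library's `statistics.median`.

```python
import statistics

def median(values):
    """Hypothetical helper: middle value of the ordered data, averaging the middle pair if n is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]), median([7, 1, 5, 3]))  # 5 4.0
print(statistics.median([7, 1, 5, 3]))          # 4.0, the standard library agrees
```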

Independent and Dependent Variables: Differences & Examples

By Jim Frost

Independent variables and dependent variables are the two fundamental types of variables in statistical modeling and experimental designs. Analysts use these methods to understand the relationships between the variables and estimate effect sizes. What effect does one variable have on another?

In this post, learn the definitions of independent and dependent variables, how to identify each type, how they differ across types of studies, and see examples of them in use.

Standard Deviation: Interpretations and Calculations

By Jim Frost

The standard deviation (SD) is a single number that summarizes the variability in a dataset. It represents the typical distance between each data point and the mean. Smaller values indicate that the data points cluster closer to the mean, meaning the values in the dataset are relatively consistent. Conversely, higher values signify that the values spread out further from the mean. Data values become more dissimilar, and extreme values become more likely.
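To make the "typical distance from the mean" idea concrete, this Python sketch computes the sample standard deviation step by step, dividing the summed squared deviations by n − 1, and checks the result against `statistics.stdev`. The data are hypothetical, and `sample_sd` is a hypothetical helper.

```python
import math
import statistics

def sample_sd(values):
    """Hypothetical helper: square root of summed squared deviations from the mean over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

data = [4, 8, 6, 5, 3, 7]  # hypothetical data
print(round(sample_sd(data), 4), round(statistics.stdev(data), 4))
```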

What is the Mean and How to Find It: Definition & Formula

By Jim Frost

What is the Mean?

The mean in math and statistics summarizes an entire dataset with a single number representing the data’s center point or typical value. It is also known as the arithmetic mean, and it is the most common measure of central tendency. It is frequently called the “average.”

Gamma Distribution: Uses, Parameters & Examples

By Jim Frost

What is the Gamma Distribution?

The gamma distribution is a continuous probability distribution that models right-skewed data. Statisticians have used this distribution to model cancer rates, insurance claims, and rainfall. Additionally, the gamma distribution is similar to the exponential distribution, and you can use it to model the same types of phenomena: failure times, wait times, service times, etc.
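One way to see the link between the gamma and exponential distributions: when the gamma shape parameter equals 1, its density reduces to the exponential density. The Python sketch below writes out both densities using a shape (k) and scale (θ) parameterization. That parameterization is an assumption, since this excerpt does not show which one the full post uses, and both function names are hypothetical helpers.

```python
import math

def gamma_pdf(x, shape, scale):
    """Hypothetical helper: gamma density x^(k-1) e^(-x/theta) / (Gamma(k) theta^k)."""
    if x < 0:
        return 0.0
    if x == 0:
        # Density at zero: infinite for k < 1, 1/theta for k = 1, zero for k > 1
        return math.inf if shape < 1 else (1 / scale if shape == 1 else 0.0)
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

def exponential_pdf(x, scale):
    """Hypothetical helper: exponential density with mean `scale`, e^(-x/scale) / scale."""
    return math.exp(-x / scale) / scale if x >= 0 else 0.0

for x in (0.5, 1.0, 3.0):
    print(x, gamma_pdf(x, shape=1, scale=2.0), exponential_pdf(x, scale=2.0))
```

Shape values above 1 move the peak away from zero while keeping the long right tail, which is what makes the gamma distribution more flexible than the exponential.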

Exponential Distribution: Uses, Parameters & Examples

By Jim Frost

What is the Exponential Distribution?

The exponential distribution is a right-skewed continuous probability distribution that models variables in which small values occur more frequently than higher values. It is a unimodal distribution where small values have relatively high probabilities, which consistently decline as data values increase. Statisticians use the exponential distribution to model the amount of change in people’s pockets, the length of phone calls, and sales totals for customers. In all these cases, small values are more likely than larger values.
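A quick way to see that small values dominate: the exponential CDF is F(x) = 1 − e^(−x/μ) for mean μ, so about 63% of values fall below the mean regardless of μ. This Python sketch checks that with the formula and with a simulated sample from `random.expovariate`, which takes the rate (1/μ) as its argument. The mean of 5, the seed, and the helper name `exponential_cdf` are arbitrary, hypothetical choices.

```python
import math
import random

def exponential_cdf(x, mean):
    """Hypothetical helper: P(X <= x) for an exponential distribution with the given mean."""
    return 1 - math.exp(-x / mean) if x >= 0 else 0.0

mean = 5.0
print(f"P(X < mean) = {exponential_cdf(mean, mean):.3f}")  # 1 - e^-1, about 0.632

# Simulation check; random.expovariate expects the rate, which is 1 / mean
rng = random.Random(42)
sample = [rng.expovariate(1 / mean) for _ in range(100_000)]
share_below_mean = sum(x < mean for x in sample) / len(sample)
print(f"Simulated share below the mean: {share_below_mean:.3f}")
```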

Weibull Distribution: Uses, Parameters & Examples

By Jim Frost

What is a Weibull Distribution?

The Weibull distribution is a continuous probability distribution that can fit an extensive range of distribution shapes. Like the normal distribution, the Weibull distribution is unimodal and describes probabilities associated with continuous data. However, unlike the normal distribution, it can also model skewed data. In fact, its extreme flexibility allows it to model both left- and right-skewed data.
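To illustrate the claim that the Weibull distribution can be skewed in either direction, this Python sketch computes its theoretical skewness from the gamma-function expressions for its moments, E[X^r] = λ^r Γ(1 + r/k) for shape k and scale λ. The skewness is positive for small shape values and turns negative once the shape exceeds roughly 3.6. The shape values below are examples, and `weibull_skewness` is a hypothetical helper.

```python
import math

def weibull_skewness(shape, scale=1.0):
    """Hypothetical helper: theoretical Weibull skewness (scale cancels, kept for clarity)."""
    def raw_moment(r):
        # E[X^r] for a Weibull distribution with this shape and scale
        return scale ** r * math.gamma(1 + r / shape)

    mu = raw_moment(1)
    var = raw_moment(2) - mu ** 2
    # Third central moment written as E[X^3] - 3*mu*var - mu^3, then standardized
    return (raw_moment(3) - 3 * mu * var - mu ** 3) / var ** 1.5

for shape in (1.5, 3.6, 5.0):
    print(f"shape = {shape}: skewness = {weibull_skewness(shape):+.3f}")
```

With shape 1, the Weibull distribution is the exponential distribution, and the function returns its known skewness of 2.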

Poisson Distribution: Definition & Uses

By Jim Frost

What is the Poisson Distribution?

The Poisson distribution is a discrete probability distribution that describes probabilities for counts of events that occur in a specified observation space. It is named after Siméon Denis Poisson.

In statistics, count data represent the number of events or characteristics over a given length of time, area, volume, etc. For example, you can count the number of cigarettes smoked per day, meteors seen per hour, the number of defects in a batch, and the occurrence of a particular crime by county.
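Poisson probabilities follow from the formula P(X = k) = λ^k e^(−λ) / k!, where λ is the average count for the observation space. Here is a Python sketch. The rate of 3 events per interval is a hypothetical value, and `poisson_pmf` is a hypothetical helper.

```python
import math

def poisson_pmf(k, lam):
    """Hypothetical helper: probability of exactly k events when the mean count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # hypothetical average count, e.g., defects per batch
for k in range(7):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.3f}")

# Sanity checks: the probabilities sum to (nearly) 1, and the mean count equals lam
ks = range(60)
total_prob = sum(poisson_pmf(k, lam) for k in ks)
mean_count = sum(k * poisson_pmf(k, lam) for k in ks)
print(round(total_prob, 6), round(mean_count, 6))
```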

Introduction to Statistics Using the R Programming Language

By Joachim Schork

The R programming language is a powerful and free statistical software tool that analysts use frequently.

The R programming language is open-source software that the R community develops and maintains, and anyone can download it for free.

Being open source provides many advantages, including the following:

New statistical methods are quickly available because the R community is vast and active.
The source code for each function is freely available and everybody can review it.
Using the R programming language is free! That’s a significant advantage over relatively expensive statistical tools, such as SAS, Stata, and SPSS.

In this article, I give you a brief introduction to the strengths of the R programming language by applying basic statistical concepts to a real dataset using R functions.

Scatterplots: Using, Examples, and Interpreting

By Jim Frost

Use scatterplots to show relationships between pairs of continuous variables. These graphs display symbols at the X, Y coordinates of the data points for the paired variables. Scatterplots are also known as scattergrams and scatter charts.

Pie Charts: Using, Examples, and Interpreting

By Jim Frost

Use pie charts to compare the sizes of categories to the entire dataset. To create a pie chart, you must have a categorical variable that divides your data into groups. These graphs consist of a circle (i.e., the pie) with slices representing subgroups. The size of each slice is proportional to the relative size of each category out of the whole.

Bar Charts: Using, Examples, and Interpreting

By Jim Frost

Use bar charts to compare categories when you have at least one categorical or discrete variable. Each bar represents a summary value for one discrete level, where longer bars indicate higher values. Types of summary values include counts, sums, means, and standard deviations. Bar charts are also known as bar graphs.

Line Charts: Using, Examples, and Interpreting

By Jim Frost

Use line charts to display a series of data points that are connected by lines. Analysts use line charts to emphasize changes in a metric on the vertical Y-axis across values of another variable on the horizontal X-axis. Often, the X-axis reflects time, but not always. Line charts are also known as line plots.

Dot Plots: Using, Examples, and Interpreting

By Jim Frost

Use dot plots to display the distribution of your sample data when you have continuous variables. These graphs stack dots along the horizontal X-axis to represent the frequencies of different values. More dots indicate greater frequency. Each dot represents a set number of observations.

Empirical Cumulative Distribution Function (CDF) Plots

By Jim Frost

Use an empirical cumulative distribution function plot to display the data points in your sample from lowest to highest against their percentiles. These graphs require continuous variables and allow you to derive percentiles and other distribution properties. This function is also known as the empirical CDF or ECDF.
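The values behind an ECDF plot are straightforward to compute: sort the data, then for any value x, report the proportion of observations less than or equal to x. Here is a minimal Python sketch with hypothetical data. `ecdf` is a hypothetical helper, and tied values share a single step.

```python
from bisect import bisect_right

def ecdf(data):
    """Hypothetical helper: return a function giving the proportion of observations <= x."""
    ordered = sorted(data)
    n = len(ordered)

    def proportion_at_or_below(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return proportion_at_or_below

sample = [2.1, 3.5, 3.5, 4.0, 5.2, 6.8, 7.1, 9.4]  # hypothetical data
F = ecdf(sample)
for x in sorted(set(sample)):
    print(f"F({x}) = {F(x):.3f}")
```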

Contour Plots: Using, Examples, and Interpreting

By Jim Frost

Use contour plots to display the relationship between two independent variables and a dependent variable. The graph shows values of the Z variable for combinations of the X and Y variables. The X and Y values are displayed along the X and Y-axes, while contour lines and bands represent the Z value. The contour lines connect combinations of the X and Y variables that produce equal values of Z.