Graphing your data before performing statistical analysis is a crucial step. Graphs bring your data to life in a way that statistical measures do not because they display the relationships and patterns. In this blog post, you’ll learn about using boxplots and individual value plots to compare distributions of continuous measurements between groups. You’ll also learn why you need to pair these plots with hypothesis tests when you want to make inferences about a population.

Use boxplots and individual value plots when you have a categorical grouping variable and a continuous outcome variable. The levels of the categorical variable form the groups in your data, and the researchers measure the continuous variable.

Both of these graphs allow you to compare the distribution of the continuous values between the groups in your sample data. You can assess properties such as the shapes of the distributions, central tendencies, and variability while looking out for outliers. These types of graphs are often precursors to hypothesis tests, such as 2-sample t-tests and ANOVA.

The sample datasheet below shows how researchers record data for both types of charts. Material is the categorical variable while Strength is the continuous variable. We’ll use this dataset for the individual value plot example that follows.

Note that the graphs in this post are best for comparing distributions between groups. When you need to assess a single continuous distribution, histograms and probability distribution plots are often better choices. For more information about histograms, see Using Histograms to Understand Your Data.

**Related posts**: Measures of Central Tendency, Measures of Variability, and Data Types

## Individual Value Plots

As the name suggests, individual value plots display the value of each observation. This graph is best when you have fewer than 50 data points per group. With more observations, the data points can become packed closely together, jumbled, and hard to evaluate.

- Assess the central tendency by noting the vertical position of each group’s center.
- Assess the variability by gauging the vertical range of data points within each group.

Let’s take a look at the example below. This chart displays the strengths of four different materials. To create this graph yourself, download the CSV data file: IndividualValuePlot. Material type is our categorical grouping variable and Strength is the continuous outcome variable that the researchers measured.

Several patterns stand out in this graph. First, we can compare the central tendencies of the groups: Material 1 has the highest central tendency of the four groups, while Material 3 has the lowest. Regarding variability, Material 3 has a broader range than the other groups.
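To put rough numbers on the two visual checks for individual value plots (where each group's center sits and how far its points spread vertically), you can summarize each group directly. This is a minimal Python sketch using only the standard library; the `strength` dictionary, the `summarize` helper, and all values are made up for illustration and are not the contents of the IndividualValuePlot CSV file.

```python
from statistics import median

# Hypothetical strength readings per material (illustrative values only,
# NOT the data from the post's CSV file).
strength = {
    "Material 1": [48, 50, 51, 53, 55],
    "Material 2": [40, 42, 43, 45, 46],
    "Material 3": [25, 30, 36, 41, 47],
}

def summarize(groups):
    """Return n, median (center), and range (spread) for each group."""
    summary = {}
    for name, values in groups.items():
        summary[name] = {
            "n": len(values),
            "median": median(values),
            "range": max(values) - min(values),
        }
    return summary

for name, s in summarize(strength).items():
    print(f"{name}: n={s['n']}, median={s['median']}, range={s['range']}")
```

The median stands in for the vertical position of each group's center, and the range for the vertical spread of its dots. Like the graph, these numbers describe only the sample.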

## Boxplots

As with individual value plots, you can use boxplots to compare the shapes of distributions, find central tendencies, assess variability, and identify outliers. Boxplots are also known as box and whisker diagrams. While boxplots have the same goals as individual value plots, they look very different.

Instead of displaying the raw data points, boxplots summarize your sample with ranges of values based on quartiles and mark outliers that fall outside the whiskers with asterisks. Because boxplots break your data down into quartiles, a sample that is too small can produce quartile estimates that are not meaningful. Consequently, these graphs work best when you have at least 20 data points per group.
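The quartile-and-whisker logic can be sketched in a few lines of Python. This sketch assumes the common Tukey convention of placing the outlier fences 1.5 × IQR beyond the box and uses `statistics.quantiles` with `method="inclusive"`. Software packages use several different quartile definitions, so your graphing tool may draw the box edges slightly differently. The `boxplot_stats` helper and the sample values are hypothetical.

```python
from statistics import quantiles

def boxplot_stats(values, k=1.5):
    """Quartiles, whisker ends, and outliers for one group (hypothetical helper).

    Assumes Tukey's convention: points more than k * IQR beyond the box
    are outliers, and each whisker ends at the most extreme non-outlier.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in values if x < low_fence or x > high_fence),
    }

# Made-up sample of 20 observations for one group.
group = [52, 55, 56, 58, 59, 60, 61, 61, 62, 63,
         64, 64, 65, 66, 67, 68, 70, 71, 73, 95]
print(boxplot_stats(group))
```

With these values, Q1 = 59.75, the median is 63.5, Q3 = 67.25, the whiskers run from 52 to 73, and 95 lies beyond the upper fence (78.5), so it would be drawn as an outlier.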

Let’s take a look at the anatomy of a boxplot before getting to an example. Notice how it divides your data into quarters, at least approximately, because the upper and lower whiskers do not include outliers, which the chart displays separately.

The image below shows how boxplots compare to the probability density function of a normal distribution. Notice how each whisker contains 24.65% of the distribution rather than exactly 25%. Boxplots treat the observations beyond the whiskers, roughly 0.35% in each tail of a normal distribution, as outliers.
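The 24.65% figure can be reproduced from the standard normal distribution, assuming whiskers that extend at most 1.5 × IQR beyond the box. A short sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # assumed 1.5 x IQR whisker limit

upper_whisker_share = z.cdf(upper_fence) - 0.75  # between Q3 and the fence
beyond_fence_share = 1 - z.cdf(upper_fence)       # plotted as outliers

print(f"{upper_whisker_share:.2%}")  # 24.65%
print(f"{beyond_fence_share:.2%}")   # 0.35%
```

The fence lands about 2.70 standard deviations from the mean, and by symmetry the lower whisker and lower tail match the upper ones.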

## Using Boxplots to Assess Distributions

If you’re assessing a single distribution, using a histogram or a probability distribution plot is probably better. However, for comparing multiple distributions, boxplots are a great method. I find that they’re easier to interpret than individual value plots when you have a sufficiently large sample size.

To compare central tendencies, use the median line in the boxes.

For the variability, remember that half your data for each group falls within the interquartile box. The longer the box and whiskers, the greater the variability of the distribution.

To determine whether a distribution is skewed, look at where the data fall compared to the median. For symmetric distributions, the length of the box and whiskers on both sides of the median should be approximately equal. If one side is noticeably longer, the distribution is skewed in that direction: a longer upper side suggests right skew, and a longer lower side suggests left skew.
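The side-by-side comparison can also be expressed numerically. This hypothetical sketch measures the half-box plus whisker on each side of the median (again assuming 1.5 × IQR whiskers) and calls the distribution skewed when one side is more than `tolerance` times the other. The 1.5 cutoff is an arbitrary assumption for illustration, not a standard rule; your eye on the graph remains the primary check.

```python
from statistics import quantiles

def side_lengths(values, k=1.5):
    """Length below and above the median: half-box plus whisker on each side.

    Whiskers stop at the most extreme points inside Tukey's k x IQR fences
    (an assumed convention), so outliers do not stretch either side.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inside = [x for x in values if q1 - k * iqr <= x <= q3 + k * iqr]
    return med - min(inside), max(inside) - med

def describe_skew(values, tolerance=1.5):
    """Hypothetical rule: call it skewed if one side is `tolerance` times longer."""
    below, above = side_lengths(values)
    if above > tolerance * below:
        return "right-skewed"
    if below > tolerance * above:
        return "left-skewed"
    return "roughly symmetric"

scores = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 10, 12, 14, 17, 20]
print(describe_skew(scores))  # right-skewed
```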

## Example of Using a Boxplot to Compare Groups

Suppose we have four groups of scores and we want to compare them by teaching method. To create this graph yourself, download the CSV data file: Boxplot. Teaching method is our categorical grouping variable and Score is the continuous outcome variable that the researchers measured.

Methods 1 and 2 have nearly identical medians, but Method 1 has somewhat more variability. The second method also has an outlier that we should investigate. Method 3 has the highest variability in scores and is potentially left-skewed. Method 4 has the highest median.

## Using Individual Value Plots and Boxplots in Conjunction with Hypothesis Tests

Graphing your data is an excellent way to obtain a more intuitive feel for the data. Are there differences between the groups? While these graphs can illustrate your data in this manner, you should use hypothesis tests in conjunction with these graphs if you want to go beyond just describing your sample and instead draw inferences about the population. If you go this route, you’ll also need to use a sampling method, such as random sampling, to obtain a sample that represents the population.

**Related posts**: Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

Do the differences in your sample represent differences in the population? You might see patterns in the graphs of your sample data that are just flukes based on random sampling rather than denoting a real relationship in the population. On boxplots and individual value plots, random error in your sample can produce apparent differences between the central tendency and variability of the groups. Additionally, arbitrary graph factors such as the scale of the Y-axis can exaggerate the appearance of differences.
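A quick simulation shows how random sampling alone can create apparent differences. In this hypothetical sketch, both groups are always drawn from the same normal population (mean 50, standard deviation 10, chosen arbitrarily), so any gap between their sample means is pure sampling error.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

def draw(n):
    """Draw n values from the same hypothetical population for every group."""
    return [random.gauss(50, 10) for _ in range(n)]

gaps = []
for _ in range(1000):
    a, b = draw(15), draw(15)  # two groups of 15 with no real difference
    gaps.append(abs(mean(a) - mean(b)))

print(f"typical gap between group means: {mean(gaps):.1f}")
print(f"largest gap seen: {max(gaps):.1f}")
```

Even though the population means are identical, groups of 15 routinely differ by a few units, and occasionally by much more. On a graph, such gaps can look like real effects.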

Hypothesis tests play a critical role in separating the signal (real effects in the population) from the noise (random sampling error). This protective function helps prevent you from mistaking random error for a real effect. If the appropriate hypothesis test is not statistically significant, your sample provides insufficient evidence for concluding that the pattern on your graph represents a real effect at the population level. In other words, you might be looking at noise in the sample.
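To illustrate how a test separates signal from noise, here is a permutation test for a difference in two group means. It is swapped in for illustration because it needs only the Python standard library; it is not one of the tests listed in the next section. The idea: if group membership doesn't matter in the population, randomly reshuffling the labels should often produce a gap as large as the observed one. The function name, default resample count, seed, and example scores are assumptions.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in two group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign group labels at random
        gap = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if gap >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed split itself
    return (hits + 1) / (n_resamples + 1)

method_a = [72, 75, 78, 80, 81, 83, 85, 88]  # hypothetical scores
method_b = [70, 71, 74, 76, 77, 79, 80, 82]
print(permutation_p_value(method_a, method_b))
```

A small p-value means that reshuffled labels rarely produce a gap this large, which is evidence of a real difference; a large one means the pattern could easily be noise.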

## Hypothesis Tests for Boxplots and Individual Value Plots

The following are hypothesis tests you can use in conjunction with these graphs:

- 2-sample t-test: Assess the equality of two group means.
- ANOVA: Test the equality of three or more group means.
- Mann-Whitney: Assess the equality of two group medians.
- Kruskal-Wallis and Mood’s Median: Test the equality of three or more group medians.
- Test of Equal Variances: Assess the equality of group variances or standard deviations.
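If you work in Python, SciPy's `scipy.stats` module has a function corresponding to each test above. A minimal sketch with made-up data for three groups; Levene's test is used here as one common test of equal variances, and Welch's version of the 2-sample t-test (`equal_var=False`) is chosen so the groups need not share a variance.

```python
from scipy import stats

# Made-up measurements for three hypothetical groups (illustration only).
g1 = [41, 44, 45, 47, 48, 50, 51, 53]
g2 = [38, 40, 41, 43, 44, 45, 47, 49]
g3 = [30, 35, 38, 42, 46, 50, 55, 60]

p_values = {
    # 2-sample t-test (Welch's version: does not assume equal variances)
    "2-sample t-test": stats.ttest_ind(g1, g2, equal_var=False).pvalue,
    # One-way ANOVA for three or more group means
    "ANOVA": stats.f_oneway(g1, g2, g3).pvalue,
    # Mann-Whitney U test for two groups
    "Mann-Whitney": stats.mannwhitneyu(g1, g2, alternative="two-sided").pvalue,
    # Kruskal-Wallis H test for three or more groups
    "Kruskal-Wallis": stats.kruskal(g1, g2, g3).pvalue,
    # Mood's median test; element [1] of the result is the p-value
    "Mood's median": stats.median_test(g1, g2, g3)[1],
    # Levene's test, one common test of equal variances
    "Equal variances (Levene)": stats.levene(g1, g2, g3).pvalue,
}

for test, p in p_values.items():
    print(f"{test}: p = {p:.3f}")
```

A p-value below your chosen significance level (commonly 0.05) suggests the pattern on the graph is unlikely to be sampling noise alone.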

Boxplots and individual value plots are great ways to explore your data. Just be sure to use a representative sampling methodology and an appropriate hypothesis test if you want to go beyond the sample and draw inferences about the entire population.

Mani says

Sir, I’m running a regression model where the dependent variable is continuous, some of the covariates are dummies, and some have 3, 4, or 5 categories. How can I plot my data to see the trend? There are 26,000 observations.

Jim Frost says

Hi Mani,

With categorical independent variables as you describe, you can’t plot the trend like you do when you have both continuous independent and dependent variables. Categorical variables represent groups in your data and you’re analyzing differences between group means. You can use boxplots or individual value plots (IVPs) to graph the differences between groups as I show in this post. With so many observations, I’d recommend boxplots because IVPs would look very cluttered with so many observations!

Jerry says

I understand that there are actually several versions of boxplots, and not all of them use the ~25%-of-the-data rule for whiskers.

Jim Frost says

Hi Jerry, there are a few variants. I’d say the version I present here is the most common. The whiskers are not quite 25% because they don’t include outliers.

VENU says

Wonderful explanation…

Mark B. says

Excellent. Appreciate your interpretation of these facts. Easier to comprehend.

psychnstatstutor Chart your course to success says

Wonderful! I’ll be printing this and adding it to my ‘grimoire’ textbook. I’m also making a video to share your post with my student network. A common question among them is: “What do I write about?” Your post can really help them with that, as well as help them engage more with their data story. Personally, I would have liked to see some citations, because my students are writing critical essays, research reports, and theses, and they need to back up the decisions they are making explicit, e.g., that boxplots work better when there are more than 20 participants per group.

Jim Frost says

Hi, thank you so much for your kind comments and for sharing my post with your students!

I would have liked to provide a reference as well for the minimum sample size per group for boxplots. I haven’t found a solid reference. However, given that boxplots break the data down by quartiles, there is some minimum number that produces meaningful quartile estimates. With 20 observations in a group, that’s only 5 observations per quartile. I don’t think you could go much lower before the quartile estimates lose meaning.

When you have more than 20 per group, I think it’s a matter of preference to some degree. Seeing the density of dots on individual value plots (IVPs) can also be meaningful. I personally like knowing that half the data falls within the box, one quarter above, and one quarter below. However, I’m not sure it’s inherently better than IVPs. It seems like a cleaner display to me. But you can probably obtain a similar impression from IVPs. At some point, individual value plots have a hard time displaying large numbers of observations per group, but that can be a judgment call you make by eye.

I don’t want to suggest that boxplots are inherently better than IVPs. I do like them, particularly when you want to understand the groups at a glance. With the dots on IVPs, I think you need to observe the graph a bit longer to absorb the same information. You’re mentally assessing the dot density and comparing it between groups. That’s not necessarily a bad thing by any means. But when you’re conveying information to others, and you’re not sure how much effort they’ll put into understanding the graphs, my sense is that boxplots convey the information more reliably.

One thing I do like better about IVPs for small sample sizes is that you see how small your dataset is. If you create a boxplot, it can hide a small sample size unless the analyst states the sample size clearly (which they should!). But if you look at the IVP in this post, it’s clear you’ve got a small sample, so tread carefully!