Graphing your data before performing statistical analysis is a crucial step. Graphs bring your data to life in a way that statistical measures do not because they display the relationships and patterns. In this blog post, you’ll learn about using boxplots and individual value plots to compare distributions of continuous measurements between groups. You’ll also learn why you need to pair these plots with hypothesis tests when you want to make inferences about a population.

Use boxplots and individual value plots when you have a categorical grouping variable and a continuous outcome variable. The levels of the categorical variable form the groups in your data, and the researchers measure the continuous variable.

Both of these graphs allow you to compare the distribution of the continuous values between the groups in your sample data. You can assess properties such as the shapes of the distributions, central tendencies, and variability while looking out for outliers. These types of graphs are often precursors to hypothesis tests, such as 2-sample t-tests and ANOVA.

The sample datasheet below shows how researchers record data for both types of charts. Material is the categorical variable while Strength is the continuous variable. We’ll use this dataset for the individual value plot example that follows.

Note that the graphs in this post are best for comparing distributions between groups. When you need to assess a single continuous distribution, histograms and probability distribution plots are often better choices. For more information about histograms, see Using Histograms to Understand Your Data.

**Related posts**: Measures of Central Tendency, Measures of Variability, and Data Types

## Individual Value Plots

As the name suggests, individual value plots display the value of each observation. This graph is best when you have fewer than 50 data points per group. With a larger number of samples, the data points can become packed close together, jumbled, and hard to evaluate.

- Assess the central tendency by noting the vertical position of each group’s center.
- Assess the variability by gauging the vertical range of data points within each group.

Let’s take a look at the example below. This chart displays the strengths of four different materials. To create this graph yourself, download the CSV data file: IndividualValuePlot. Material type is our categorical grouping variable and Strength is the continuous outcome variable that the researchers measured.

Several things stand out in this graph. We can compare the central tendencies of the groups. Material 1 has the highest central tendency of the four groups while Material 3 has the lowest. Regarding variability, Material 3 has a broader range than the other groups.
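To put rough numbers on what the eye picks up from an individual value plot, the sketch below groups (Material, Strength) records and reports each group's median and range. The values are made up for illustration and are not the data in the post's CSV file; the `summarize_groups` helper is a hypothetical name.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (material, strength) records; NOT the data from the post's CSV file.
records = [
    ("Material 1", 41.2), ("Material 1", 39.8), ("Material 1", 42.5), ("Material 1", 40.6),
    ("Material 2", 36.1), ("Material 2", 37.4), ("Material 2", 35.0), ("Material 2", 36.8),
    ("Material 3", 25.3), ("Material 3", 33.9), ("Material 3", 28.7), ("Material 3", 21.4),
    ("Material 4", 34.2), ("Material 4", 33.1), ("Material 4", 35.6), ("Material 4", 34.9),
]


def summarize_groups(rows):
    """Return the count, median (center), and range (spread) for each group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {
        group: {"n": len(values), "median": median(values), "range": max(values) - min(values)}
        for group, values in groups.items()
    }


summary = summarize_groups(records)
for name, stats in summary.items():
    print(f"{name}: n={stats['n']}, median={stats['median']:.2f}, range={stats['range']:.2f}")
```

The median stands in for the "vertical position of the center" and the range for the "vertical spread" of the dots. With only a handful of points per group, treat both as descriptions of the sample rather than estimates of the population.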

## Boxplots

Like individual value plots, use boxplots to compare the shapes of distributions, find central tendencies, assess variability, and identify outliers. Boxplots are also known as box and whisker diagrams. While boxplots have the same goals as individual value plots, they look very different.

Instead of displaying the raw data points, boxplots take your sample data and present ranges of values based on quartiles and display asterisks for outliers that fall outside the whiskers. Boxplots work by breaking your data down into quartiles. When your sample size is too small, the quartile estimates might not be meaningful. Consequently, these graphs work best when you have at least 20 data points per group.

Let’s take a look at the anatomy of a boxplot before getting to an example. Notice how it divides your data into quarters—at least approximately because the upper and lower whiskers do not include outliers, which the chart displays separately.
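As a rough sketch of that anatomy, the code below computes the pieces a typical boxplot draws for one group: the quartiles, the whisker ends, and the outliers. It assumes the common convention that whiskers stop at the most extreme observations within 1.5 × IQR of the box, which this post does not state explicitly, and it uses Python's "inclusive" quartile method, which is only one of several conventions that statistical software uses. The scores and the `box_stats` helper are hypothetical.

```python
from statistics import quantiles


def box_stats(values, k=1.5):
    """Summarize one group the way a typical boxplot does.

    The k * IQR fence rule is an assumption (a common convention), not
    something taken from the post.
    """
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside[0],    # smallest observation that is not an outlier
        "whisker_high": inside[-1],  # largest observation that is not an outlier
        "outliers": outliers,
    }


# Hypothetical scores for a single group of 20 observations.
scores = [61, 64, 66, 67, 68, 70, 71, 72, 73, 74,
          75, 76, 77, 78, 79, 80, 82, 84, 86, 120]
summary = box_stats(scores)
print(summary)
```

In this made-up group, the value 120 falls beyond the upper fence, so the upper whisker stops at 86 and the chart would mark 120 separately as an outlier.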

The image below shows how boxplots compare to the probability distribution function for a normal distribution. Notice how each whisker contains 24.65% of the distribution rather than an exact 25%. Boxplots consider the observations beyond the whiskers to be outliers.
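The 24.65% figure can be checked with a few lines of code. This sketch assumes the whiskers end at 1.5 × IQR beyond the quartiles, a common boxplot convention that is consistent with the percentage quoted here, and it uses Python's `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
iqr = q3 - q1

# Assumed whisker limits: 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

lower_whisker_share = z.cdf(q1) - z.cdf(lower_fence)  # area covered by the lower whisker
beyond_lower_share = z.cdf(lower_fence)               # area flagged as low outliers

print(f"lower fence at z = {lower_fence:.3f}")
print(f"share under the lower whisker: {lower_whisker_share:.4f}")
print(f"share beyond the lower whisker: {beyond_lower_share:.4f}")
```

Under these assumptions, each whisker covers about 0.2465 of the distribution and about 0.0035 lies beyond it, which is where the missing 0.35% on each side goes.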

Learn more about outliers, including how boxplots detect them, in my post 5 Ways to Find Outliers in Your Data.

## Using Boxplots to Assess Distributions

If you’re assessing a single distribution, using a histogram or a probability distribution plot is probably better. However, for comparing multiple distributions, boxplots are a great method. I find that they’re easier to interpret than individual value plots when you have a sufficiently large sample size.

To compare central tendencies, use the median line in the boxes.

For the variability, remember that half your data for each group falls within the interquartile box. The longer the box and whiskers, the greater the variability of the distribution.

To determine whether a distribution is skewed, look at where the data fall compared to the median. For symmetric distributions, the length of the box and whiskers on both sides of the median should be approximately equal. If the two sides are not roughly equivalent, your distribution is skewed.
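If you want a number to go with the visual check, one option is Bowley's quartile skewness coefficient, which compares the two halves of the box around the median. This is a swapped-in numeric stand-in for the visual comparison described above, not something this post uses; it ignores the whiskers, and the data below are hypothetical.

```python
from statistics import quantiles


def quartile_skewness(values):
    """Bowley's coefficient: (Q3 + Q1 - 2 * median) / (Q3 - Q1).

    Near 0 suggests the two halves of the box are about equal; positive
    values suggest right skew and negative values suggest left skew.
    """
    q1, med, q3 = quantiles(sorted(values), n=4, method="inclusive")
    return (q3 + q1 - 2 * med) / (q3 - q1)


# Hypothetical samples for illustration only.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20]

print(quartile_skewness(symmetric))
print(quartile_skewness(right_skewed))
```

A value near zero matches a box whose halves look equal, and a clearly positive value matches a box whose upper half is longer. With small samples, the coefficient is as noisy as the quartiles themselves.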

## Example of Using a Boxplot to Compare Groups

Suppose we have four groups of scores and we want to compare them by teaching method. To create this graph yourself, download the CSV data file: Boxplot. Teaching method is our categorical grouping variable and Score is the continuous outcome variable that the researchers measured.

Methods 1 and 2 have nearly identical medians, but Method 1 has somewhat more variability. The second method also has an outlier that we should investigate. Method 3 has the highest variability in scores and is potentially left-skewed. Method 4 has the highest median.

## Using Individual Value Plots and Boxplots in Conjunction with Hypothesis Tests

Graphing your data is an excellent way to obtain a more intuitive feel for the data. Are there differences between the groups? While these graphs can illustrate your data in this manner, you should use hypothesis tests in conjunction with these graphs if you want to go beyond just describing your sample and instead draw inferences about the population. If you go this route, you’ll also need to use a sampling method, such as random sampling, to obtain a sample that represents the population.

**Related posts**: Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

Do the differences in your sample represent differences in the population? You might see patterns in the graphs of your sample data that are just flukes based on random sampling rather than denoting a real relationship in the population. On boxplots and individual value plots, random error in your sample can produce apparent differences between the central tendency and variability of the groups. Additionally, arbitrary graph factors such as the scale of the Y-axis can exaggerate the appearance of differences.

Hypothesis tests play a critical role in separating the signal (real effects in the population) from the noise (random sampling error). This protective function helps prevent you from mistaking random error for a real effect. If the appropriate hypothesis test is not statistically significant, your sample provides insufficient evidence for concluding that the pattern on your graph represents a real effect at the population level. In other words, you might be looking at noise in the sample.
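To make the signal-versus-noise idea concrete, here is a minimal sketch of a permutation test for a difference in two group means. A permutation test is not one of the tests discussed in this post; it is used here only because it shows the logic with the standard library alone. The function name, the default number of shuffles, and the example data are all assumptions.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Approximate two-sided p-value for a difference in means.

    Shuffling the group labels simulates what the difference would look
    like if only random assignment (noise) were at work.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical scores: two clearly separated groups, then two identical groups.
clear_difference = permutation_p_value(list(range(50, 58)), list(range(70, 78)))
no_difference = permutation_p_value([5, 5, 5, 5], [5, 5, 5, 5])
print(clear_difference, no_difference)
```

A small p-value means shuffled labels rarely reproduce a gap as large as the one observed, so noise alone is an unlikely explanation. A large p-value means the gap you see on the graph is well within what random error can produce.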

## Hypothesis Tests for Boxplots and Individual Value Plots

The following are hypothesis tests you can use in conjunction with these graphs:

- 2-sample t-test: Assess the equality of two group means.
- ANOVA: Test the equality of three or more group means.
- Mann-Whitney: Assess the equality of two group medians.
- Kruskal-Wallis and Mood’s Median: Test the equality of three or more group medians.
- Test of Equal Variances: Assess the equality of group variances or standard deviations.

Boxplots and individual value plots are great ways to explore your data. Just be sure to use a representative sampling methodology and an appropriate hypothesis test if you want to go beyond the sample and draw inferences about the entire population.

Mko says

Hello dear Jim,

I have collected test data from a control group (n=54) and an experimental group (n=54) to determine the effectiveness of a teaching strategy. The data fail to meet the normality, outlier, and homogeneity of variance assumptions. It seems I have two choices: a nonparametric test or a data transformation. So, my question is which is more appropriate: a nonparametric test or a parametric analysis of the transformed data?

Regards

Marco says

Hi Jim, thank you very much. It is a clinical question. How can I use a boxplot to compare the result of an investigation on one patient against a boxplot of results from the same investigation in a control group?

Thank you in advance

Marco

Jim Frost says

Hi Marco,

On a boxplot that shows the control group distribution, I think you’d need to add a reference line on the Y-axis (assuming the typical orientation) that represents the individual’s outcome measure. That will show where the individual falls within the distribution of scores for the control group. Using a histogram might be more effective for a single distribution. For a histogram, you’d add a reference line on the X-axis (typical graph orientation).
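As a numeric companion to that reference line, the sketch below reports what fraction of the control group falls below the individual's value. The control values, the patient value, and the `fraction_below` helper are hypothetical.

```python
def fraction_below(control_values, individual_value):
    """Share of the control group with results below the individual's result."""
    below = sum(1 for x in control_values if x < individual_value)
    return below / len(control_values)


# Hypothetical control-group results and one patient's result.
control = [4.1, 4.4, 4.6, 4.8, 5.0, 5.1, 5.3, 5.5, 5.9, 6.4]
patient = 6.0

share = fraction_below(control, patient)
print(f"{share:.0%} of the control group falls below the patient's value")
```

This describes where the patient sits relative to the sampled controls. It is not a hypothesis test, and with a small control group the fraction is a coarse estimate.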

Nathaly Salas says

Receive greetings. I start in the world of statistics to be able to support my research. Looking for answers to my doubts I found your blog, excellent work. I have some doubts about hypothesis testing for boxplots.

1. First, boxplots indicate that there are apparently differences between samples (abundance of individuals per month).

2. However, the hypothesis test indicates that there are no significant differences. I used the Kruskal-Wallis hypothesis test (p = 0.2094). Previously, the Shapiro-Wilk test (p = 0.0020) indicated that the distribution was not normal, and Levene’s homoscedasticity test (p = 0.00005) indicated that the variances were not equal.

3. The sample size is 12 groups (months) with 3 data points each (3 sampling sites). Are the results of both the boxplot and Kruskal-Wallis affected by the sample size? Are both incorrect?

4. Using the same sample size with other variables, sometimes the statistical program indicates that it is not possible to perform the Shapiro-Wilk or Levene test.

5. Are the results of an ANOVA, applied to the sample size referred to above, valid?

6. In which cases should I use ANOVA, Kruskal-Wallis, or Welch’s ANOVA, according to the distribution of the data, homoscedasticity, and the sample size (number of groups and data points per group)?

First of all, Thanks.

Jim Frost says

Hi Nathaly,

Thank you for your kind words!

It sounds like you only have three observations per group? That’s a very small sample size. In fact, it’s probably too small to obtain meaningful results. You’re going to have very low power (low ability to detect differences) with that small of a sample size. That’s also why you can’t perform those other analyses as you describe. It’s just too few observations per group.

It’s not uncommon to see differences in a graph (like a boxplot) but not obtain significant results. I talk about that in this blog post. Read the section about using these graphs in conjunction with hypothesis tests. What you see in the graph can be random error in the data rather than a true difference between means. Also, given the very small sample size per group, it’s not surprising that the test was not statistically significant.

For such a small sample size and nonnormal data, you’d need to use Kruskal-Wallis. ANOVA results would not be valid for nonnormal distributions with such a small sample size. Read my post about parametric vs. nonparametric tests for more information about choosing between these tests.

Read those posts that I provide links for because they’ll answer your questions. Regarding the heteroscedasticity (unequal variances) between groups, normally I’d recommend that you use Welch’s ANOVA because that analysis can handle unequal variances. However, you do not have a large enough sample size to use it.

I hope this helps! Best of luck with your study!

Luíza says

Thank you very much, Jim!

That was very helpful!

Jim Frost says

You’re very welcome!

Luíza says

Hi, Jim!

Thank you very much for your reply!

But I am afraid I did not make my question clear! Sorry for that!

I didn’t mean the day with the highest number of births in a given day, I meant the day of the week with the highest number of *total* births in the whole data set.

I mean, suppose those plots refer to a given year (in which each day of the week has n=52 observations of number of births per day). In that year, which day of the week had more births (in absolute numbers)? Can I infer that from the highest median? As a rule for boxplots?

And if the samples were not the same size (meaning, not 52 observations for each day)?

Thank you again!!

Jim Frost says

Hi Luiza,

Oh, I got it now! Sorry for misunderstanding! It didn’t help that I was replying very late at night!

So, the highest total births is more complicated to obtain from just the boxplot. And, from the boxplots in the link you provided, you don’t have enough information to know for sure. I’ll walk through the issues to consider.

Median vs mean: If the plots displayed the mean and you knew the number of observations per group, you could determine the exact total for each day. You’d simply take each group’s mean and multiply it by the number of observations on that day. So, if Monday’s mean was 30 and you had 52 observations over a year, you know there were 1560 total births on Monday.

If all the groups had the same number of observations, you’d be multiplying the means for all groups by a constant value (e.g., 52). In that case, you wouldn’t need to do the multiplication. You could simply find the highest mean and know that the corresponding day had the highest total number of births. If you want to know that total value, you’d do the multiplication.

If you have an unequal number of observations, you’d need to do the multiplication to determine which group has the most births. You’d need to know the sample size per group and multiply that by the group’s mean. Although, in this case, you’d have to wonder about the validity of comparing groups of different sample sizes. Suppose Monday had 52 data points while Tuesday had only 45 for some reason or the other. If you find that Monday had more total births, that’s not too surprising or informative because you’re comparing Monday’s total over 52 weeks to Tuesday’s total over 45 weeks. You can get the totals, but comparing them might not be informative. For unequal sample sizes, using means and/or medians is probably more informative.
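The arithmetic is short enough to sketch. Monday's mean of 30 over 52 observations comes from the example above; Tuesday's mean of 33 is a made-up number chosen to show how an unequal sample size can flip the comparison.

```python
# day: (mean births per day, number of observed days)
# Monday's figures are from the example above; Tuesday's mean is hypothetical.
days = {
    "Monday": (30, 52),
    "Tuesday": (33, 45),
}

totals = {day: mean_births * n for day, (mean_births, n) in days.items()}

for day, total in totals.items():
    print(f"{day}: total births = {total}")
```

Here Tuesday has the higher mean, but Monday has the higher total simply because it has more observations. That is why comparing totals across unequal sample sizes can mislead.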

However, the boxplots in the link do not indicate the mean. Typically, boxplots display the median, but you can optionally add the mean. The median isn’t calculated in the same way as the mean. You can’t just multiply the median by the number of observations to get the total. However, the median and mean will be nearly equal in some cases. In those cases, you can use the median as an estimate of the mean and use the process I describe above to obtain an estimated total births for a day.

So, when are the mean and median nearly equal? When the distributions are symmetric, they’ll be close. So, we need to check the distributions and see if they’re symmetric. If so, you can at least get an estimate of the totals for each group. However, in the boxplots, some of the distributions are not symmetric (e.g., Monday, Wednesday, and Thursday). For asymmetric distributions, the longer tail will pull the mean away from the median. The median is no longer a good estimate of the mean, which means you can’t really produce a good estimate of the total births per day. Read my post about measures of central tendency to see how this works.
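A small sketch with hypothetical daily birth counts shows the effect. For the symmetric sample, median × n recovers the total; for the skewed sample, it misses.

```python
from statistics import mean, median

# Hypothetical daily birth counts, for illustration only.
symmetric_days = [20, 22, 24, 26, 28]
skewed_days = [20, 22, 23, 24, 25, 26, 60]  # one unusually busy day stretches the upper tail

for label, counts in [("symmetric", symmetric_days), ("skewed", skewed_days)]:
    n = len(counts)
    estimated_total = median(counts) * n  # using the median as a stand-in for the mean
    actual_total = sum(counts)
    print(f"{label}: mean={mean(counts):.2f}, median={median(counts)}, "
          f"estimated total={estimated_total}, actual total={actual_total}")
```

In the skewed sample, the long upper tail pulls the mean above the median, so the median-based estimate understates the true total.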

At best, you can look at the totality of the boxplots to get an idea of which day or days will have the highest totals. If you look at Thursday and Friday, they both have the highest medians, and their interquartile boxes are also the highest overall. Those IQ boxes indicate that half of the values for these two days fall in higher ranges than the other days. Collectively, I’d say Thursday and Friday are good candidates for having the highest total numbers. However, it would be difficult if not impossible to determine which of those two days is highest given the information in the graph. Using the same logic, you can probably exclude Saturday, Sunday, and possibly Monday from having the highest totals. Those are all educated guesses rather than exact answers, but that’s the best you can do from those graphs.

All of that is assuming equal sample sizes per group. If the sample sizes are different and we don’t know what they are, then there’s no way to know.

I hope this helps!

Charles says

Hi Jim, I find your explanation so interesting, simple and educative. I will let my students know about your ebook.

Jim Frost says

Hi Charles,

Thank you so much! That’s what I strive for! And, thanks for letting your students know about my ebook! 🙂

Luiza de Oliveira Rodrigues says

Dear Jim,

I have 2 questions on boxplots and I would really appreciate if you could help me!

1) When I have the same amount of observations, but data are differently skewed between groups, can I affirm that the highest median is the highest value of a given data? Example: number of births by week days in a given year. I have roughly 52 observations (n=52) for each day of the week, but different numbers of births on each given day, with different dispersion. When I look at the boxplots, can I infer that the day with the highest median is the day with more births? see plots here: http://www.people.vcu.edu/~pdattalo/BoxPots.html

2) When two groups have a different number of observations, but the same median and different dispersion of data, how do I infer the highest value? Example: on the link above, if I compare Wednesday and Friday (roughly the same median), if they did not have the same n, how could I infer the day with the greater number of births?

Thank you so much for your blog!!!!!!

Luíza

Jim Frost says

Hi Luiza,

Thanks for writing!

I looked at the boxplots on the link you included. For those plots, you’d actually look at the whiskers to see which day has the highest number of births. Thursday has the highest median, but the whiskers show the range of the data for each. And, the whisker for Monday extends to the highest value. Because there are no outliers in any of the groups, we can just look at upper limits of the whiskers, and Monday is the winner! Monday has one of the lower medians but it has the highest variability of all the days, and its upper whisker extends to the highest value. Monday’s interquartile box is about the same height as the other days, so it’s really that upper quartile that’s stretched out.

For your second question, you can again just look at the whiskers to see the highest values. Having different sample sizes for groups doesn’t change that aspect because you’re just looking for the largest value in the dataset. You can get all sorts of medians and spreads, but the whiskers will show you the limits of the data, unless there are outliers beyond. If there are outliers, you’ll need to look to those for the most extreme value–and maybe assess whether it’s valid or not!

You could also consider plotting these data in an individual value plot. In that case, you’d see the dot for the highest value on Monday right where its whisker ends in the boxplot. That makes the idea a bit more concrete because you’re seeing the data points.

I hope this helps!

Mani says

Sir, I’m running a regression model where the dependent variable is continuous, some of the covariates are dummies, and some have 3, 4, or 5 categories. How can I plot my data to see the trend? There are 26,000 observations.

Jim Frost says

Hi Mani,

With categorical independent variables as you describe, you can’t plot the trend like you do when you have both continuous independent and dependent variables. Categorical variables represent groups in your data and you’re analyzing differences between group means. You can use boxplots or individual value plots (IVPs) to graph the differences between groups as I show in this post. With so many observations, I’d recommend boxplots because IVPs would look very cluttered with so many observations!

Jerry says

I understand that there are actually several versions of boxplots — not all of them use the ~25% of-the-data rule for whiskers.

Jim Frost says

Hi Jerry, there are a few variants. I’d say the version I present here is the most common–the whiskers are not quite 25% because they don’t include outliers.

VENU says

Wonderful explanation…

Mark B. says

Excellent. Appreciate your interpretation of these facts. Easier to comprehend.

psychnstatstutor Chart your course to success says

Wonderful, will be printing and adding to my ‘grimoire’ textbook. Also am making a vid to share your post with my student network. A common question amongst them is: “What do I write about?”, and your post can really help them with that, as well as engaging with their data story more. Personally, I would have liked to see some citations, because my network are writing critical essays/research reports/theses they need to back up their decisions that they are making explicit. e.g., boxplots work better when there are more than 20 participants per group.

Jim Frost says

Hi, thank you so much for your kind comments and for sharing my post with your students!

I would have liked to provide a reference as well for the minimum sample size per group for boxplots. I haven’t found a solid reference. However, given that boxplots break the data down by quartiles, there is some minimum number that produces meaningful quartile estimates. With 20 observations in a group, that’s only 5 observations per quartile. I don’t think you could go much lower before the quartile estimates lose meaning.

When you have more than 20 per group, I think it’s a matter of preference to some degree. Seeing the density of dots on individual value plots (IVPs) can also be meaningful. I personally like knowing that half the data falls within the box, one quarter above, and one quarter below. However, I’m not sure it’s inherently better than IVPs. It seems like a cleaner display to me. But, you can probably obtain a similar impression from IVPs. At some point, the individual value plots have a hard time displaying large numbers of observations per group, but that can be a play it by eye decision.

I don’t want to suggest that boxplots are inherently better than IVPs. I do like them–particularly when you want to understand the groups at a glance. With the dots on IVPs, I think you need to observe the graph a bit longer to absorb the same information. You’re mentally assessing the dot density and comparing them between groups. That’s not necessarily a bad thing by any means. But, when you’re conveying information to others, and you’re not sure how much effort they’ll put into understanding the graphs, my sense is that boxplots convey the information more reliably.

One thing I do like better about IVPs for small sample sizes is that you see how small your dataset is. If you create a boxplot, it can hide a small sample size unless the analyst states the sample size clearly–which they should! But, if you look at the IVP in this post, it’s clear you’ve got a small sample, so tread carefully!