Statistics By Jim

Making statistics intuitive

Box Plot Explained with Examples

By Jim Frost 27 Comments

What is a Box Plot?

A box plot, sometimes called a box and whisker plot, provides a snapshot of your continuous variable’s distribution. Box plots particularly excel at comparing the distributions of groups within your dataset, and they display a ton of information in a simplified format. Analysts frequently use them during exploratory data analysis because they display your dataset’s central tendency, skewness, and spread, and they highlight outliers.

Box plots truly shine when comparing data distributions across different groups. Their compact design offers a neat summary of data, making it a breeze to compare distributional properties of the groups through the positioning of box and whisker markings.

Example of a box plot that displays scores by teaching method.

Use a box plot to compare distributions when you have a categorical grouping variable and a continuous outcome variable. The levels of the categorical variable form the groups in your data, and the researchers measure the continuous variable. These graphs are often precursors to hypothesis tests, such as 2-sample t-tests and ANOVA.

When you’re assessing one distribution, use a histogram because it offers a more detailed view. For more information, see Using Histograms to Understand Your Data.

Related post: Data Types

Anatomy of a Box and Whisker Plot

Instead of displaying the raw data points, a box and whisker plot takes your sample data and presents ranges of values based on quartiles using boxes and lines. Additionally, it uses asterisks to display outliers that fall outside the whiskers. Learn more about Quartiles: Definition, Finding & Using.

Box plots display the five-number summary. This summary includes five key data points:

  • The smallest number (minimum)
  • The first quartile (25% mark)
  • The middle number (median)
  • The third quartile (75% mark)
  • The largest number (maximum)

Together, these five values highlight the shape, spread, and central tendency of your data’s distribution. All these measures are nonparametric and do not make assumptions about the data distribution. This aspect makes box and whisker plots especially suitable for the early stages of analysis.
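To make the five-number summary concrete, here is a minimal Python sketch. The function name `five_number_summary` and the sample scores are made up for illustration, and it uses the standard library's "inclusive" quartile method, which is only one of several conventions. Your statistical software may report slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # method="inclusive" interpolates between the sample minimum and maximum;
    # other packages may use a different quartile convention.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical scores, made up purely for illustration.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(scores))  # (1, 3.0, 5.0, 7.0, 9)
```

The box in a box plot spans Q1 to Q3, and the line inside the box marks the median.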

This graph works by breaking your data down into quartiles. When your sample size is too small, the quartile estimates might not be meaningful. Consequently, these plots work best when you have at least 20 data points per group.

Let’s look at the anatomy of a box plot before getting to an example. Notice how it divides your data into quarters—at least approximately because the upper and lower whiskers do not include outliers, which the chart displays separately.

Diagram of a box and whiskers plot that describes the features.

The image below shows how a box and whisker plot compares to the probability density function for a normal distribution. The box itself is the interquartile range, which contains 50% of your data. Additionally, notice how each whisker contains 24.65% of the distribution rather than an exact 25% because the remaining 0.35% on each side falls beyond the whisker. Box plots consider the observations beyond the whiskers to be outliers.

Image shows how a probability distribution function relates to a boxplot, also known as a box and whiskers plot.
By Jhguch at en.wikipedia, CC BY-SA 2.5, Link
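If you want to check the 24.65% figure yourself, the sketch below reproduces it for a standard normal distribution. It assumes the whiskers can extend up to 1.5 times the interquartile range beyond the box, which is the common Tukey convention. This post does not spell out that rule, so treat the 1.5 multiplier as an assumption.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Quartiles of the standard normal distribution.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1

# Assumption: whiskers can reach 1.5 x IQR beyond the box (Tukey's rule).
lower_whisker_end = q1 - 1.5 * iqr
upper_whisker_end = q3 + 1.5 * iqr

# Share of the distribution inside the box, inside one whisker,
# and beyond one whisker (the outlier region).
box_share = std_normal.cdf(q3) - std_normal.cdf(q1)
lower_whisker_share = std_normal.cdf(q1) - std_normal.cdf(lower_whisker_end)
beyond_each_whisker = std_normal.cdf(lower_whisker_end)

print(f"box: {box_share:.2%}")
print(f"each whisker: {lower_whisker_share:.2%}")
print(f"beyond each whisker: {beyond_each_whisker:.2%}")
```

Under that assumption, the whisker ends land at roughly 2.7 standard deviations from the mean, which is why so little of a normal distribution counts as an outlier.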

Learn more about outliers, including how a box and whisker plot detects them, in my post 5 Ways to Find Outliers in Your Data.
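As a rough illustration of how a box plot can flag outliers, the sketch below applies the common 1.5 x IQR fence rule. The function name `whiskers_and_outliers`, the multiplier `k`, and the scores are assumptions for illustration. Check which rule and quartile method your own software uses.

```python
import statistics


def whiskers_and_outliers(values, k=1.5):
    """Return (lower whisker end, upper whisker end, outliers).

    Assumes the common rule: points more than k * IQR beyond the box
    are drawn separately as outliers.
    """
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    # Each whisker stops at the most extreme observation inside the fences.
    return inside[0], inside[-1], outliers


# Hypothetical test scores with one unusually high value.
scores = [52, 55, 57, 58, 60, 61, 63, 64, 66, 95]
print(whiskers_and_outliers(scores))  # (52, 66, [95])
```

Notice that the upper whisker stops at 66, the largest value inside the fence, and the score of 95 is shown on its own.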

How to Read a Box Plot

A box and whisker plot allows you to quickly assess a distribution’s central tendency, variability, and skewness. Let me show you how!

Central Tendency

To compare central tendencies in a box plot, use the median line and the overall vertical placement of the boxes.

In the graph below, Group A has a higher median line than Group B. Indeed, it’s easy to see that Group A’s entire distribution is shifted upwards relative to Group B. However, Group A’s lower quartile overlaps with Group B’s upper quartile.

Box plot that displays two groups with different medians.

Related posts: Measures of Central Tendency and Median: Definition and Uses

Variability

To assess variability in a box and whisker plot, remember that half your data for each group falls within the interquartile box. The longer the box and whiskers, the greater the variability of the distribution. The distance from the end of the lower whisker to the end of the upper whisker represents the range of the data, excluding any outliers.

In the plot below, Group 2 has more variability than Group 1 because it has a longer box and whiskers. Group 1 ranges from approximately 3 to 7 while Group 2 ranges from roughly 1.5 to 9.

Box and whisker plot that shows two groups with different variability.
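The same comparison can be made numerically. In this sketch, the two groups are made-up values chosen only to match the approximate ranges described above (3 to 7 and 1.5 to 9). They are not the data behind the graph, and `spread_summary` is a hypothetical helper.

```python
import statistics


def spread_summary(values):
    """Return (interquartile range, range) for a list of numbers."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1, data[-1] - data[0]


# Made-up values that only mimic the ranges described in the text.
group_1 = [3, 4, 4.5, 5, 5, 5.5, 6, 7]
group_2 = [1.5, 3, 4, 5, 5.5, 6.5, 8, 9]

print(spread_summary(group_1))  # (1.25, 4)
print(spread_summary(group_2))  # (3.125, 7.5)
```

Both the interquartile range (the length of the box) and the overall range are larger for the second group, which matches what a longer box and longer whiskers tell you visually.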

Learn more about Measures of Variability.

Skewness

To determine whether a distribution is skewed in a box plot, look at where the median line falls within the box and whiskers.

You have a symmetrical distribution when the box centers approximately on the median line, and the upper and lower whiskers are about equal length. If the two sides are not roughly equivalent, your distribution is skewed.

It’s a right-skewed distribution when the median is closer to the box’s lower values and the upper whisker is longer. Notice how the long tail extends into the higher values in the box and whisker plot below, making it positively skewed.

Boxplot displays right-skewed distribution.

It’s a left-skewed distribution when the median is closer to the box’s higher values, and the lower whisker is longer. Notice that the long tail extends towards the lower values, making it negatively skewed.

Boxplot of a left-skewed distribution.
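One way to put a number on where the median sits inside the box is the quartile (Bowley) skewness coefficient. This measure is not used in this post. I am adding it here only as an illustrative sketch, with made-up data. Positive values mean the median is closer to the bottom of the box (right skew), and negative values mean it is closer to the top (left skew).

```python
import statistics


def quartile_skewness(values):
    """Bowley's quartile skewness: ((Q3 - median) - (median - Q1)) / IQR.

    It ranges from -1 to 1 and only reflects the box, not the whiskers.
    """
    q1, median, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero, so quartile skewness is undefined")
    return ((q3 - median) - (median - q1)) / iqr


# Made-up samples for illustration only.
right_skewed = [1, 2, 2, 3, 3, 4, 6, 9, 15]
left_skewed = [1, 7, 10, 12, 13, 13, 14, 14, 15]
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(quartile_skewness(right_skewed))  # 0.5
print(quartile_skewness(left_skewed))   # -0.5
print(quartile_skewness(symmetric))     # 0.0
```

Because this coefficient ignores the whiskers, it covers only half of the visual check described above. You still need to compare the whisker lengths, and a histogram remains the better tool for a single distribution.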

Learn more about Skewed Distributions.

Box Plot Example: Comparing Groups

Let’s combine all we’ve learned about box plots and compare four groups in this example.

Suppose we have four groups of test scores and we want to compare them by teaching method. To create this graph yourself, download the CSV data file: Boxplot. Teaching method is our categorical grouping variable and Score is the continuous outcome variable that the researchers measured.

Example of a boxplot that displays scores by four groups.

Methods 1 and 2 have nearly identical medians, but Method 1 has somewhat more variability. The second method also has a high outlier that we should investigate. Method 3 has the highest variability in scores and is potentially left-skewed. Method 4 has the highest median.
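If you prefer to see the group comparison as numbers, the sketch below computes the median and interquartile range for each level of a grouping variable. The CSV layout (one `Method` column and one `Score` column), the group labels, and every value are assumptions made up for illustration. They are not the contents of the downloadable data file.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical stand-in data: column names and values are assumptions.
CSV_TEXT = """Method,Score
A,60
A,65
A,70
A,75
A,80
B,78
B,82
B,85
B,88
B,92
"""


def summarize_by_group(csv_text, group_col="Method", value_col="Score"):
    """Group a continuous outcome by a categorical column and summarize it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_col]].append(float(row[value_col]))

    summary = {}
    for name, values in groups.items():
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[name] = {"n": len(values), "median": median, "iqr": q3 - q1}
    return summary


for name, stats in summarize_by_group(CSV_TEXT).items():
    print(name, stats)
```

In this made-up example, group B has the higher median and the smaller interquartile range. As with the graph, a hypothesis test is still needed before concluding that such a difference exists in the population.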

 

Filed Under: Graphs Tagged With: choosing analysis, data types, distributions, graphs

Reader Interactions

Comments

  1. Wendy says

    November 28, 2024 at 5:45 am

    Hi Jim,
    My median is closer to the box’s higher values. However, the lower whisker is shorter than the upper whisker, which means the long tail extends towards the higher values. In this case, is it considered positively skewed?

    Also, in your example boxplot of scores for Method 4 (in your diagram above), is it positively or negatively skewed?

    • dindin says

      January 23, 2025 at 8:47 am

      hi wendy,
      i have the exact case like you, do you have the answer? if you please let me know too
      thank you

      • Jim Frost says

        January 23, 2025 at 1:44 pm

        Hi dindin, I just answered Wendy’s question. Please see that for my advice!

    • Jim Frost says

      January 23, 2025 at 1:36 pm

      Hi Wendy,

      I’d recommend graphing your data in a histogram to get the best sense of the distribution, including the skew. Boxplots are great for comparing different distributions at a glance. However, for getting the specifics of a single distribution, histograms are better.

      It sounds like your boxplot is giving you mixed signals about which direction the distribution is skewed. The median is closer to the higher values (suggesting left or negative skew) while the long tail is for the higher values (suggesting a right or positive skew). Overall, I’d give priority to the total length of the tail away from the median and favor your data being right/positive skewed. However, again, check in a histogram to be sure!

      For Method 4, because the lengths of both tails are about equal, it’s overall not skewed even though the median is a bit closer to the lower values. Here’s the histogram of Method 4 where it’s easier to see! It is mostly not skewed but you can see that it has a somewhat weird shape too with the 2nd spike. That affects what you see less clearly in the boxplot. Histograms are the better tool for that specific purpose.

      Histogram to compare to the boxplot.

      • dindin says

        January 28, 2025 at 2:47 pm

        Thank you so much for the explanation, I really appreciate it!! After making and checking the histogram, I found out that my data is a little right/positive skewed. Your website and answers really helped me in finishing my final assignment, thank you once again. Wishing you a happy and healthy life, Jim!

        • Jim Frost says

          January 28, 2025 at 3:50 pm

          You’re very welcome! And thanks so much!

  2. Funsho Olukade says

    October 22, 2024 at 3:14 pm

    Prof,
    Since you have a way of explaining complex statistical concepts in very simple ways, I was able to discover a simple way to know the skewness of the box plot, just by reading your post today, and this is it:
    Rotate the box plot 90 degrees to the right so that the median line is vertical. Then check which side of the median line has the larger area. If the larger area is on the right, then it’s right skewed. If the larger area is on the left, then it’s left skewed. If the boxes on either side are equal, then it’s symmetrical or non-skewed. Did I end up confusing my readers the more? I hope not.

    • Jim Frost says

      October 24, 2024 at 11:05 pm

      Hi Funsho,

      I think that’s a great way to look at it! 🙂

  3. Mirabel Balon says

    October 22, 2024 at 4:19 am

    I just want to say Thank You Jim, this is really useful and all the replies to comments made. I’ve never understood box plots better!

    • Jim Frost says

      October 22, 2024 at 1:34 pm

      Hi Mirabel,

      Thanks so much for your kind comment! It means so much to me and I’m glad that it was helpful for you. 🙂

  4. Nikos says

    October 15, 2024 at 11:07 am

    Hi Jim,

    Thank you for your great work in explaining things that tend to be very complicated. I want to prepare a few boxplots for an academic paper and the presence of the outlier points in the graph can be distracting. Is it a common practice to present box plots without outlier points in academic papers? Thank you!

    • Jim Frost says

      October 17, 2024 at 10:04 pm

      Hi Nikos,

      No! You shouldn’t present the boxplots without outliers if they are present in your data. That would be misrepresenting the data because it shows them as having less variability than they actually have.

  5. Karla Nicole Mesa says

    September 21, 2024 at 4:26 pm

    Hi Jim, thank you for the explanation. However, I have trouble interpreting the boxplot in the example. I mean, what does it mean in this example that the median is lower or higher than another method’s? And what does it mean that a method has a higher or lower variability, in terms of the score and teaching method?

    • Nebyu says

      October 15, 2024 at 1:22 pm

      I will try to answer your first question.

      Median, mean, and mode are termed as measures of central tendency. What that means is that they represent a value where most of your data in your dataset is clustered around. Therefore, measures of central tendency can be used to describe your data. More information on when to use them can be found here https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/.

      Using the above example, we can compare the median scores and say that Method 4 has a larger median. However, unless we have performed a hypothesis test like ANOVA in this case, we can’t say that the difference in medians is statistically significant.

      I hope this helps. Cheers!

      • Mirabel Balon says

        October 22, 2024 at 4:18 am

        Thanks for the explanation, Nebyu. So are you saying that unless we conduct an ANOVA test, we cannot conclude whether Method 4 is a good/bad teaching method, whichever way we’re looking at it?

        • Jim Frost says

          October 22, 2024 at 1:40 pm

          Hi Mirabel,

          Graphs are great for displaying data and patterns present in them. However, when you’re using samples to draw conclusions about a population, the patterns you see in a graph could be due to random chance in the sample rather than reflecting true relationships in the population. Hypothesis tests, such as ANOVA, help you separate those random chance cases from the true population effect cases. For this specific example, One-Way ANOVA followed up by a post hoc test can tell you if Method 4 is significantly better than the other groups in the entire population by accounting for the possibility of random chance in the sample.

          And thanks to Nebyu for explaining the other concepts! 🙂

          I hope that helps!

  6. Susie says

    July 6, 2024 at 4:06 am

    Hello and thanks! This is all beautifully explained. However…I’m having trouble producing the actual graphs with my data, which has counts of sea snails (continuous variables) on two types of seaweed (categorical variables).
    Do you have instructions for this (I’m using Excel in MS Office 365) or can you point me to smile instruction website/video?
    I will certainly be citing this site in my course report.

  7. Thales says

    July 2, 2024 at 6:43 am

    Hello, nice article. Since box plot is a non parametric test, must I not relate it to p value?
    This is a good content and I will be able to discuss better my results in my dissertation. Thanks.

    • Jim Frost says

      July 2, 2024 at 10:42 pm

      Hi Thales,

      Yes, if you want to test a pattern you see in any graph to see if it exists in a population, you need to use a hypothesis test (and have a representative sample). Boxplots primarily compare group medians, so one of those non-parametric tests would be a good one. Although, be aware that those tests only assess medians when the group distributions all have the same shape. An alternative would be to use a bootstrap method when they have different shapes.

      Even though you’re using a boxplot, you might still want to test means with ANOVA if the mean better represents the center of your distributions. You’re not locked into using non-parametric tests.

  8. Barry Lloyd says

    June 26, 2024 at 12:45 pm

    Hi Jim, I have come across a data set (N=72) that rejects the null hypothesis in a Minitab normality test, but the whiskers on its boxplot are of equal length (showing no skewness). Shouldn’t the data therefore follow a normal distribution?

    • Jim Frost says

      June 26, 2024 at 1:43 pm

      Hi Barry,

      There are several potential things going on here.

      First, not all symmetric distributions are normal. For example, the Cauchy and Laplace distributions are symmetric but non-normal. A distribution can be symmetric but have too many or too few observations in the tails to be normal. See Kurtosis for details.

      So, it’s possible that your data fit a different symmetrical distribution.

      Alternatively, your data might have trivial deviations from the normal distribution that the test has sufficient power to detect. You don’t have a huge sample size so that’s less of a problem.

      I’d recommend the following:

      Graph your data in a histogram to visualize its distribution. That’ll give you a better picture than the boxplot.

      In Minitab’s normality test, look at the Probability Plot and see if the data points generally follow the straight line. If so, your data follow a normal distribution despite the p-value. I write about this in my post about QQ Plots, which are the same idea as probability plots.

  9. Sheila says

    May 8, 2024 at 4:03 am

    Hi thanks for this , however I have an assignment and I am quite confused on this , I have data from a mop up vaccination for 25 person for 5 local government across 5 different primary health care
    Using summary statistics and appropriate presentation show the performance across the 5 phc

    Would you consider I use a box plot for this?
    However I always thought box plot was for continuous data.

    • Jim Frost says

      May 8, 2024 at 8:36 pm

      Hi Sheila,

      Yes, for boxplots you should have a continuous outcome variable divided into groups. I can’t tell what your data are like so I’m not sure if a boxplot is a good choice. A boxplot would allow you to compare the 5 primary health care providers but you’d need to be comparing some type of continuous variable.

  10. Badmus Yetunde says

    April 12, 2024 at 5:15 am

    Hi
    The last example given above is not clear regarding teaching method 4, why does method 4 have the highest median value

    • Jim Frost says

      April 12, 2024 at 1:49 pm

      Hi,

      If you refer to the Anatomy of a Boxplot section in this post, you’ll see that the horizontal line within the box represents the median. In the final example, that horizontal line is the furthest up on the Y-axis for Method 4. Hence, Method 4 has the highest median.

  11. WKBN Prame says

    February 12, 2024 at 7:04 am

    Can I get better results by transforming data to normality before making the Tukey’s Box Plot and then back calculating?

    • Jim Frost says

      February 12, 2024 at 6:10 pm

      Hi,

      It’s important to stay close to your original data to really understand its distribution. I recommend creating a regular box plot with untransformed data. It’s a great tool for the initial exploration of your data. See it as it is. A regular box plot also finds outliers using an approach that doesn’t depend on the normal distribution.


    Copyright © 2026 · Jim Frost · Privacy Policy
