
Statistics By Jim

Making statistics intuitive


Box Plot Explained with Examples

By Jim Frost

What is a Box Plot?

A box plot, sometimes called a box and whisker plot, provides a snapshot of your continuous variable’s distribution. Box plots particularly excel at comparing the distributions of groups within your dataset. They display a lot of information in a simplified format. Analysts frequently use them during exploratory data analysis because they show your dataset’s central tendency, skewness, and spread, and they highlight outliers.

Box plots truly shine when comparing data distributions across different groups. Their compact design offers a neat summary of data, making it a breeze to compare distributional properties of the groups through the positioning of box and whisker markings.

Example of a box plot that displays scores by teaching method.

Use a box plot to compare distributions when you have a categorical grouping variable and a continuous outcome variable. The levels of the categorical variable form the groups in your data, and the researchers measure the continuous variable. These graphs are often precursors to hypothesis tests, such as 2-sample t-tests and ANOVA.

When you’re assessing one distribution, use a histogram because it offers a more detailed view. For more information, see Using Histograms to Understand Your Data.

Related post: Data Types

Anatomy of a Box and Whisker Plot

Instead of displaying the raw data points, a box and whisker plot takes your sample data and presents ranges of values based on quartiles using boxes and lines. Additionally, it uses asterisks to mark outliers, which fall outside the whiskers. Learn more about Quartiles: Definition, Finding & Using.

Box plots display the five-number summary. This summary includes five key data points:

  • The smallest number (minimum)
  • The first quartile (25% mark)
  • The middle number (median)
  • The third quartile (75% mark)
  • The largest number (maximum)

Together, these five values highlight the shape, spread, and central tendency of your data’s distribution. All these measures are nonparametric and do not make assumptions about the data distribution. This aspect makes box and whisker plots especially suitable for the early stages of analysis.
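To make the five-number summary concrete, here is a minimal Python sketch that computes it with the standard library. The function name `five_number_summary` and the scores are made up for illustration, and the "inclusive" quartile method is just one of several conventions. Statistical packages can report slightly different quartiles for the same data.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # The "inclusive" method interpolates between the observed values.
    # Other quartile conventions can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Made-up scores, for illustration only.
scores = [62, 65, 68, 70, 71, 73, 75, 78, 80, 84, 91]
print(five_number_summary(scores))
```

The box spans Q1 to Q3, the line inside the box is the median, and the minimum and maximum are the extremes of the data.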

This graph works by breaking your data down into quartiles. When your sample size is too small, the quartile estimates might not be meaningful. Consequently, these plots work best when you have at least 20 data points per group.

Let’s look at the anatomy of a box plot before getting to an example. Notice how it divides your data into quarters, at least approximately, because the upper and lower whiskers do not include outliers, which the chart displays separately.

Diagram of a box and whiskers plot that describes the features.
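If you want to see where the whiskers end and which points get flagged, the sketch below may help. It assumes the common 1.5 × IQR rule (Tukey's fences), which this post does not spell out, and your software may use a different multiplier or quartile method. The function name and the scores are hypothetical.

```python
import statistics

def whiskers_and_outliers(values, k=1.5):
    """Return (lower whisker end, upper whisker end, outliers).

    Assumes Tukey's rule: points farther than k * IQR from the box
    are treated as outliers. k=1.5 is a convention, not a law.
    """
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    # Whiskers stop at the most extreme observations inside the fences.
    return inside[0], inside[-1], outliers

# Made-up scores with one unusually high value.
scores = [62, 65, 68, 70, 71, 73, 75, 78, 80, 84, 120]
print(whiskers_and_outliers(scores))
```

Here the upper whisker stops at 84, and the value of 120 is plotted separately as an outlier.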

The image below shows how a box and whisker plot compares to the probability density function for a normal distribution. The box itself is the interquartile range, which contains 50% of your data. Additionally, notice how each whisker contains 24.65% of the distribution rather than an exact 25%. That figure applies when the whiskers extend up to 1.5 times the interquartile range beyond the box, which is the usual convention. Box plots consider the observations beyond the whiskers to be outliers.

Image shows how a probability density function relates to a boxplot, also known as a box and whiskers plot.
By Jhguch at en.wikipedia, CC BY-SA 2.5, Link
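You can check the 24.65% figure yourself. The sketch below uses Python's built-in `NormalDist` and assumes whiskers that reach 1.5 × IQR beyond the box, which is the convention used in the image. The variable names are my own.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution (mean 0, SD 1)

# Quartiles of the standard normal and the resulting fences.
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

box_share = z.cdf(q3) - z.cdf(q1)               # inside the box
whisker_share = z.cdf(q1) - z.cdf(lower_fence)  # one whisker
outlier_share = z.cdf(lower_fence) + (1 - z.cdf(upper_fence))  # both tails

print(f"Box: {box_share:.2%}")
print(f"Each whisker: {whisker_share:.2%}")
print(f"Beyond the whiskers: {outlier_share:.2%}")
```

For normally distributed data, roughly 0.7% of observations fall beyond the whiskers, which is why each whisker covers slightly less than 25%.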

Learn more about outliers, including how a box and whisker plot detects them, in my post 5 Ways to Find Outliers in Your Data.

How to Read a Box Plot

A box and whisker plot allows you to quickly assess a distribution’s central tendency, variability, and skewness. Let me show you how!

Central Tendency

To compare central tendencies in a box plot, use the median line and the overall vertical placement of the boxes.

In the graph below, Group A has a higher median line than Group B. Indeed, it’s easy to see that Group A’s entire distribution is shifted upwards relative to Group B. However, Group A’s lower quartile overlaps with Group B’s upper quartile.

Box plot that displays two groups with different medians.

Related posts: Measures of Central Tendency and Median: Definition and Uses

Variability

To assess variability in a box and whisker plot, remember that half your data for each group falls within the interquartile box. The longer the box and whiskers, the greater the variability of the distribution. The distance from the tip of the lower whisker to the tip of the upper whisker represents the range of the data, excluding any outliers.

In the plot below, Group 2 has more variability than Group 1 because it has a longer box and whiskers. Group 1 ranges from approximately 3 to 7 while Group 2 ranges from roughly 1.5 to 9.

Box and whisker plot that shows two groups with different variability.
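The box length and whisker span correspond to two numbers you can calculate directly: the interquartile range and the range. Here is a small sketch. The two groups are made-up values chosen only to mimic the spans described above, not the data behind the plot, and `spread` is a hypothetical helper name.

```python
import statistics

def spread(values):
    """Return the IQR (box length) and range (min to max) of the values."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"iqr": q3 - q1, "range": data[-1] - data[0]}

# Illustrative values only; not the dataset used for the plot.
group_1 = [3.0, 4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 6.5, 7.0]
group_2 = [1.5, 3.0, 4.0, 5.0, 5.5, 6.5, 7.5, 8.0, 9.0]

print("Group 1:", spread(group_1))
print("Group 2:", spread(group_2))
```

A larger IQR means a longer box, and a larger range means the whiskers (plus any outliers) cover more of the axis.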

Learn more about Measures of Variability.

Skewness

To determine whether a distribution is skewed in a box plot, look at where the median line falls within the box and whiskers.

You have a symmetrical distribution when the box centers approximately on the median line, and the upper and lower whiskers are about equal length. If the two sides are not roughly equivalent, your distribution is skewed.

It’s a right-skewed distribution when the median is closer to the box’s lower values and the upper whisker is longer. Notice how the long tail extends into the higher values in the box and whisker plot below, making it positively skewed.

Boxplot displays right-skewed distribution.

It’s a left-skewed distribution when the median is closer to the box’s higher values, and the lower whisker is longer. Notice that the long tail extends towards the lower values, making it negatively skewed.

Boxplot of a left-skewed distribution.
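If you'd like a number to go with the visual check, one option is the quartile (Bowley) skewness coefficient. It is not something a box plot displays, and it only captures where the median sits inside the box, not the whisker lengths. Treat this sketch as a rough companion to the graph. The datasets are made up.

```python
import statistics

def quartile_skewness(values):
    """Bowley's quartile skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1).

    Positive when the median sits nearer the bottom of the box
    (right skew), negative when nearer the top (left skew).
    Undefined when Q1 equals Q3, because the box has no length.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Made-up data for illustration.
right_skewed = [1, 2, 2, 3, 3, 4, 6, 9, 15]
left_skewed = [1, 7, 10, 12, 13, 13, 14, 14, 15]
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, quartile_skewness(data))
```

Values near zero suggest a roughly symmetric box, while values toward +1 or -1 suggest right or left skew.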

Learn more about Skewed Distributions.

Box Plot Example: Comparing Groups

Let’s combine all we’ve learned about box plots and compare four groups in this example.

Suppose we have four groups of test scores and we want to compare them by teaching method. To create this graph yourself, download the CSV data file: Boxplot. Teaching method is our categorical grouping variable and Score is the continuous outcome variable that the researchers measured.

Example of a boxplot that displays scores by four groups.

Methods 1 and 2 have nearly identical medians, but Method 1 has somewhat more variability. The second method also has a high outlier that we should investigate. Method 3 has the highest variability in scores and is potentially left-skewed. Method 4 has the highest median.
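To back up a comparison like this with numbers, you can summarize each group's median and IQR. The sketch below is a minimal version that assumes the CSV has columns named `Method` and `Score`. Those names are my guess, so adjust them to match the actual file. The embedded sample rows are made up and are not the data from the download.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical stand-in for the data file; rows and column names are assumptions.
SAMPLE_CSV = """Method,Score
Method 1,60
Method 1,70
Method 1,75
Method 1,80
Method 1,90
Method 2,72
Method 2,74
Method 2,75
Method 2,77
Method 2,79
"""

def summarize_by_group(csv_text, group_col="Method", value_col="Score"):
    """Return {group: {"median": ..., "iqr": ...}} for CSV text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_col]].append(float(row[value_col]))
    summary = {}
    for name, values in groups.items():
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[name] = {"median": median, "iqr": q3 - q1}
    return summary

for method, stats in summarize_by_group(SAMPLE_CSV).items():
    print(method, stats)
```

In this made-up sample, both methods share the same median but Method 1 has a much larger IQR, which would show up as a longer box.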

 


    Copyright © 2023 · Jim Frost · Privacy Policy