Statistics By Jim

Making statistics intuitive

Scatterplots: Using, Examples, and Interpreting

By Jim Frost 4 Comments

Use scatterplots to show relationships between pairs of continuous variables. These graphs display symbols at the X, Y coordinates of the data points for the paired variables. Scatterplots are also known as scattergrams and scatter charts.

Scatterplot that displays the negative relationship between flash recovery time and battery voltage.

The pattern of dots on a scatterplot allows you to determine whether a relationship or correlation exists between two continuous variables. If a relationship exists, the scatterplot indicates its direction and whether it is a linear or curved relationship.

Fitted line plots are a special type of scatterplot that displays the data points along with a fitted line for a simple regression model. This graph allows you to evaluate how well the model fits the data.

Use scatterplots to assess the following features of your dataset:

  • Examine the relationship between two variables.
  • Check for outliers and unusual observations.
  • Create a time series plot with irregular time-dependent data.
  • Evaluate the fit of a regression model.

At a minimum, scatterplots require two continuous variables. To learn about other graphs, read my Guide to Data Types and How to Graph Them.

Example Scatterplot

During an experiment, I measured the Body Mass Index (BMI) and body fat percentage of adolescent girls. I graphed these two variables in a scatterplot to assess the relationship between them.

Fitted line plot that fits the curved relationship between BMI and body fat percentage.

Scatterplots typically contain the following elements:

  • X-axis representing values of a continuous variable. By custom, this is the independent variable when you can classify one of the variables as such.
  • Y-axis representing values of a continuous variable. Traditionally, this is the dependent variable.
  • Symbols plotted at the (X, Y) coordinates of your data. Optionally, the graph can use different colored/shaped symbols to represent separate groups on the same chart.
  • Optionally, you can overlay fit lines to determine how well a model fits the data.

For the BMI and body fat data, the scatterplot displays a moderately strong, positive relationship. As BMI increases, the body fat percentage also tends to increase. The relationship appears to curve slightly because it flattens out for higher BMI values. To model the curvature, the regression model includes a squared term. The fitted line follows the curvature of the data, indicating a good fit.
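As a rough illustration of modeling curvature with a squared term, here is a minimal Python sketch. The BMI and body fat numbers are made up for the example (they are not the study data), and the helper names `solve_linear_system`, `fit_quadratic`, and `r_squared` are hypothetical. It fits body fat = b0 + b1*BMI + b2*BMI^2 by ordinary least squares using only the standard library.

```python
def solve_linear_system(a, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef


def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    rows = [[1.0, xi, xi * xi] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def r_squared(y, fitted):
    """Proportion of the variation in y that the fitted values account for."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: body fat rises with BMI but flattens at higher values.
bmi = [16, 18, 20, 22, 24, 26, 28, 30, 32]
body_fat = [15.1, 18.2, 22.5, 24.6, 27.9, 29.5, 31.9, 32.8, 34.1]

b0, b1, b2 = fit_quadratic(bmi, body_fat)
fitted = [b0 + b1 * x + b2 * x * x for x in bmi]
fit_r2 = r_squared(body_fat, fitted)

print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.4f}, R-squared={fit_r2:.3f}")
```

A negative coefficient on the squared term corresponds to a curve that flattens out as BMI increases, which is the pattern described above.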

Learn more about the X and Y Axis.

Interpreting Scatterplots and Assessing Relationships between Variables

Scatterplots display the direction, strength, and linearity of the relationship between two variables.

Positive and Negative Correlation and Relationships

Values tending to rise together indicate a positive correlation. For instance, height and weight have a positive correlation.

This scatterplot displays a positive correlation between height and weight.

However, if one variable increases as the other decreases, it’s a negative correlation, as shown below.

Scatterplot that displays the negative relationship between flash recovery time and battery voltage.

Strength of Relationships

Stronger relationships produce a tighter clustering of data points. Be aware that changes in scaling can change the apparent strength of the relationship. Correlation coefficients provide an objective assessment of strength independent of graph scaling.

In the two graphs below, the data points in the top graph cluster more tightly than the data points in the bottom graph. Consequently, the first dataset displays a stronger relationship.

Fitted line plot for a model with a high R-squared and low variability data.

Fitted line plot for a model with a low R-squared and high variability data.

Stronger relationships produce correlation coefficients closer to -1 or +1 and regression models that have higher R-squared values.
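To make the point about scaling concrete, here is a small Python sketch that computes Pearson's correlation coefficient by hand. The height and weight values are hypothetical, and `pearson_r` is just an illustrative helper name. Changing the units changes how the graph looks, but it should not change the correlation. For a simple linear regression with one predictor, R-squared equals the square of this coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements, not data from this post.
height_cm = [150, 155, 160, 165, 170, 175, 180]
weight_kg = [52, 58, 57, 65, 68, 72, 80]

r = pearson_r(height_cm, weight_kg)

# Rescale the same data: centimeters to meters, kilograms to grams.
height_m = [h / 100 for h in height_cm]
weight_g = [w * 1000 for w in weight_kg]
r_rescaled = pearson_r(height_m, weight_g)

# For simple linear regression, R-squared is the squared correlation.
simple_r_squared = r ** 2

print(f"r={r:.4f}, r after rescaling={r_rescaled:.4f}, R-squared={simple_r_squared:.4f}")
```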

Related post: Interpreting Correlation Coefficients

Linear and Curved Relationships

Determine whether your data have a linear or curved relationship. When a relationship between two variables is curved, it affects the type of correlation you can use to assess its strength and how you can model it using regression analysis.

An example regression model to illustrate when to use regression.

Adding a fit line highlights how well the model fits your data. When a relationship exists, you might want to model it using regression analysis.
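Here is one way to see why curvature affects the type of correlation you use. In this sketch, the data are hypothetical and follow a perfectly consistent but curved (cubic) pattern. Pearson's correlation measures linear association, so it understates the relationship. Spearman's rank correlation, computed here as Pearson's correlation on the ranks, captures any consistently increasing or decreasing pattern. The helper names are illustrative, and the `ranks` function assumes there are no tied values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical curved relationship: y always increases with x, but not linearly.
x = list(range(1, 11))
y = [value ** 3 for value in x]

linear_strength = pearson_r(x, y)
monotonic_strength = spearman_rho(x, y)

print(f"Pearson r={linear_strength:.3f}, Spearman rho={monotonic_strength:.3f}")
```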

Related post: Modeling Curvature Using Regression

Determine Whether the Relationship Changes between Groups

When your data have groups, you can determine whether the relationship between two variables differs between the groups. To make these comparisons, you’ll need a categorical variable that defines the groups. All groups must use the same X and Y measurements.

In this scatterplot, the slope of the relationship is the same for the two groups, but the output values of group B are consistently higher for any given input value.

Scatterplot for comparing whether the constants are different.

In this scatterplot, the slope for group B is steeper than for group A. As the input value increases, the output for group B increases more quickly than the output for group A.

Scatterplot for comparing whether two regression models are different.

Use indicator variables and interaction terms in a regression model to test the statistical significance of these differences. Click the link below for details.
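The following Python sketch shows what the indicator and interaction coefficients represent. It uses hypothetical data and a hypothetical helper, `fit_line`, to fit a separate straight line to each group. In a model that includes an indicator variable for group B plus its interaction with the input, the indicator coefficient equals the difference between the two intercepts, and the interaction coefficient equals the difference between the two slopes. This sketch gives only the point estimates. Testing whether the differences are statistically significant requires the standard errors that regression software reports.

```python
def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical input and output values for two groups measured the same way.
input_values = [1, 2, 3, 4, 5, 6]
output_a = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
output_b = [4.9, 9.1, 12.8, 17.2, 20.9, 25.1]

intercept_a, slope_a = fit_line(input_values, output_a)
intercept_b, slope_b = fit_line(input_values, output_b)

# Coefficients of the equivalent single model with group A as the baseline:
# output = b0 + b1*input + b2*is_group_b + b3*(input * is_group_b)
indicator_coef = intercept_b - intercept_a    # b2: shift in the constant
interaction_coef = slope_b - slope_a          # b3: change in the slope

print(f"Group A: intercept={intercept_a:.3f}, slope={slope_a:.3f}")
print(f"Group B: intercept={intercept_b:.3f}, slope={slope_b:.3f}")
print(f"Indicator coefficient={indicator_coef:.3f}, interaction={interaction_coef:.3f}")
```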

Related post: Comparing Regression Lines with Hypothesis Tests

Find Outliers and Unusual Observations with Scatterplots

Scatterplots can help you find multiple types of outliers.

Some outliers have extreme values. These outliers are distanced from other data points, as shown below.

Scatterplot that displays an outlier.

Unusual observations have values that are not necessarily extreme, but they do not fit the observed relationship. In the scatterplot below, the circled point has X and Y values that are not unusual. However, the combination of the two values clearly does not fit the overall relationship.

Scatterplot that displays an unusual value that does not fit the relationship.
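To illustrate this second type of unusual observation, here is a minimal Python sketch with made-up data. One point has X and Y values that are well within the range of the others, but it sits far from the fitted line. The sketch flags points whose residual is more than two residual standard deviations from the line. That cutoff of 2 is a rule-of-thumb assumption for this example, not a formal test, and `fit_line` is a hypothetical helper.

```python
import math


def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: y is roughly 2*x, except for the point at x = 5.
x = list(range(1, 11))
y = [2.2, 3.9, 6.1, 8.0, 17.0, 12.1, 13.8, 16.2, 18.1, 19.9]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Residual standard deviation with n - 2 degrees of freedom.
residual_sd = math.sqrt(sum(e * e for e in residuals) / (len(x) - 2))

threshold = 2.0  # assumed rule-of-thumb cutoff
unusual = [
    (xi, yi)
    for xi, yi, e in zip(x, y, residuals)
    if abs(e / residual_sd) > threshold
]

print("Points that do not fit the relationship:", unusual)
```

Note that neither 5 nor 17.0 would stand out if you looked at each variable on its own. Only the combination is unusual.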

Related post: Five Ways to Find Outliers in Your Data

Trends Over Time

Typically, analysts use time series plots to display data over time. However, you can also use scatterplots for this purpose. Scatterplots are a perfect choice for time-related data when your observations occur at irregular intervals. When creating a scatterplot for time data, be sure to add a connect line between the data points!

Use Scatterplots with the Appropriate Hypothesis Tests

You can use scatterplots to display the relationships between continuous variables. However, if you plan to use your sample to infer the characteristics of an entire population, be sure to perform the necessary hypothesis tests and assess statistical significance.

Related post: Descriptive versus Inferential Statistics

Graphs can be subjective because your software lets you edit their properties, such as the graph’s scaling. Altering these settings can change the appearance of scatterplots and the conclusions you draw from them. On the other hand, hypothesis tests present an objective evaluation of statistical significance. They also account for the possibility of random error explaining the observed patterns and differences.

Correlation and regression analysis are the primary methods for statistically assessing relationships between continuous data.
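As one hedged example of pairing a scatterplot with a hypothesis test, the Python sketch below runs a permutation test on a correlation. This is a substitute technique chosen because it needs only the standard library. Statistical software typically reports a p-value based on the t-distribution instead. The voltage and recovery time values are hypothetical, as are the helper names, the number of permutations, and the random seed.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of no correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any real pairing between x and y.
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # Add one to the numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical data: recovery time falls as voltage rises.
voltage = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
recovery_time = [5.8, 5.1, 4.9, 4.2, 3.9, 3.1, 2.8, 2.2]

r_observed = pearson_r(voltage, recovery_time)
p_value = permutation_p_value(voltage, recovery_time)

print(f"r={r_observed:.3f}, permutation p-value={p_value:.4f}")
```

A small p-value suggests that random error alone is unlikely to explain a correlation this strong.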


Reader Interactions

Comments

  1. Shari Rossino says

    October 21, 2022 at 10:18 am

    My scatterplot results look like a perfect tic tac toe board. This does not seem like an appropriate response. Any thoughts Jim? I appreciate your feedback.

    • Jim Frost says

      October 23, 2022 at 4:48 pm

      Hi Shari,

      Do you mean the data points make a grid pattern? The appropriateness of any pattern (or lack thereof) depends on the nature of the variables. That pattern might make complete sense for a pair of variables. I can’t tell without that context. But understand the variables and see if the pattern makes sense.

  2. Michelle Weston says

    April 18, 2022 at 1:22 pm

    Can the data in a scatterplot be considered right/left skewed?

    • Jim Frost says

      April 21, 2022 at 2:32 am

      Hi Michelle,

      When you’re looking at pairs of values as you’re doing in a scatterplot, terms like skew of distribution don’t make sense. Scatterplots highlight relationships between pairs of variables. The skew of a distribution relates to the distribution of a single variable, and you should use a histogram for that.

      However, you can assess the distribution of values for individual variables in the context of a scatterplot by using a marginal plot. This type of plot simply graphs the distribution of each of the variables in a scatterplot separately in the margins, as shown in the example below.

      Example of a marginal plot.

      In this graph, you can see that the distribution of the variable on the X axis (horizontal) is right skewed while the distribution for the variable on the Y axis (vertical) is fairly symmetrical. However, you only get that type of information for the individual variables in the separate histograms and not the scatterplot itself. The scatterplot indicates that there is a negative correlation between the two.

