Statistics By Jim

Making statistics intuitive

Empirical Cumulative Distribution Function (CDF) Plots

By Jim Frost

Use an empirical cumulative distribution function plot to display the data points in your sample from lowest to highest against their percentiles. These graphs require continuous variables and allow you to derive percentiles and other distribution properties. This function is also known as the empirical CDF or ECDF.

Empirical CDF plot of the strength of aluminum castings.

If you measure the same characteristic in multiple samples, you can use empirical CDF plots to compare the sample distributions.

Optionally, your software can display the fitted cumulative distribution function so you can compare how well the empirical distribution follows the fitted distribution. The fitted distribution uses parameters estimated from your data. Unlike a Q-Q plot, your statistical software does not transform the axes to create a straight line for a cumulative distribution function. Learn more about these Fitted Cumulative Distribution Functions.

Use an empirical CDF plot to do the following:

  • Find percentiles and proportions for data ranges.
  • Identify where most values occur.
  • Assess the range of your data.
  • Compare sample distributions.
  • Determine how well your data follow a fitted distribution.

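The empirical CDF is a step function that rises from 0 to 1 on the vertical Y-axis. It equals 0 below the smallest observation and reaches 1 at the largest, rather than approaching those limits asymptotically as many fitted CDFs do. It’s empirical because it represents your observed values and the corresponding data percentiles. The step function increases by 1/N at each observation in your dataset of N observations, so tied values produce a single, taller step.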
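To make the step-function idea concrete, here is a minimal sketch in Python that uses only the standard library. The helper name `ecdf` and the five measurements are made up for illustration; statistical software will typically do this for you.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F where F(x) is the proportion of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return F


# Hypothetical measurements, for illustration only (note the tied 0.7 values)
values = [0.5, 0.3, 0.9, 0.7, 0.7]
F = ecdf(values)

for x in [0.2, 0.3, 0.6, 0.7, 0.9, 1.5]:
    print(x, F(x))
```

With N = 5, each distinct observation raises the function by 1/5, and the two tied values at 0.7 raise it by 2/5 in one step. The function is 0 below the smallest value and 1 from the largest value onward.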

At a minimum, empirical CDF plots require one continuous variable. To learn about other graphs, read my Guide to Data Types and How to Graph Them.

Related post: Percentiles: Interpretations and Calculations

Example Empirical CDF Plot

A manufacturer measures the strength of a random sample of aluminum castings.

Empirical CDF plot of the strength of aluminum castings.

The blue stepped line is the empirical CDF and the red curve is the fitted CDF for the normal distribution.

Empirical CDF plots typically contain the following elements:

  • Y-axis representing a percentile scale.
  • X-axis representing the data values.
  • Stepped function displaying the cumulative distribution observed in the sample.
  • Optionally, statistical software can display a fitted cumulative distribution based on parameters estimated from the sample.

Continue reading to learn how to obtain more information from this graph!

Interpreting Empirical CDF Plots to Assess Distributions

Data Range

To determine the range of the data, look for the first and last steps in the step function.

For the aluminum casting data, strength values range from about 0.3 (the first step) to approximately 1.2 (the last step).

Related post: Measures of Variability

Most Common Values

To determine where the most common values occur, look for the steeper portions of the step function. Conversely, flatter portions indicate ranges with fewer observations.

The steeper portion of the ECDF indicates that most values occur between 0.4 and 0.8.
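Steepness over a range is just the share of observations that fall in that range, which is the rise of the step function across it. The sketch below counts that share directly. The function name `proportion_between` and the strength values are hypothetical and are not the casting data from the graph.

```python
from bisect import bisect_left, bisect_right


def proportion_between(sample, low, high):
    """Proportion of observations with low <= x <= high."""
    ordered = sorted(sample)
    # Observations <= high, minus observations strictly below low
    count = bisect_right(ordered, high) - bisect_left(ordered, low)
    return count / len(ordered)


# Hypothetical strength values, for illustration only
strengths = [0.31, 0.42, 0.48, 0.55, 0.58, 0.63, 0.66, 0.71, 0.77, 1.18]

share_steep = proportion_between(strengths, 0.4, 0.8)  # steep stretch of the plot
share_flat = proportion_between(strengths, 0.8, 1.1)   # flat stretch of the plot
print(share_steep, share_flat)
```

For these made-up values, 8 of the 10 observations fall between 0.4 and 0.8, and none fall between 0.8 and 1.1, which is what a steep stretch followed by a flat stretch looks like numerically.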

Related post: Measures of Central Tendency

Percentiles

To find the data percentile for an observation, locate its value on the horizontal X-axis, move up to the step function, and read the corresponding percentile on the vertical Y-axis. Alternatively, use the fitted CDF to determine the percentile using the fitted distribution. Be sure that the probability distribution provides a good fit for your data!

For example, a strength of 0.8 is at approximately the 70th percentile (72.7, to be precise). In other words, 72.7% of the castings in the sample have strength measurements of 0.8 or less.
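Both ways of reading a percentile can be sketched in a few lines of Python. The data below are hypothetical, so the results will not match the casting graph. The fitted curve is assumed to be a normal distribution with the mean and standard deviation estimated from the sample, using `statistics.NormalDist` from the standard library.

```python
from bisect import bisect_right
from statistics import NormalDist

# Hypothetical strength values, for illustration only
strengths = [0.31, 0.42, 0.48, 0.55, 0.58, 0.63, 0.66, 0.71, 0.77, 1.18]
x = 0.8

# Data percentile: share of observations at or below x
ordered = sorted(strengths)
empirical_pct = 100 * bisect_right(ordered, x) / len(ordered)

# Fitted percentile: normal CDF with mean and SD estimated from the same data
fitted = NormalDist.from_samples(strengths)
fitted_pct = 100 * fitted.cdf(x)

print(f"empirical: {empirical_pct:.1f}%  fitted normal: {fitted_pct:.1f}%")
```

When the two numbers differ noticeably, that is a hint that the fitted distribution may not describe the data well near that value.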

Assessing the Fit of a Probability Distribution

Compare the empirical CDF to the fitted CDF to determine how well your data fit the distribution. When your data follow the fitted distribution, you can use percentiles based on that distribution instead of the data percentiles.

For the casting data, it appears that the strength measurements follow the normal distribution. However, it’s easier to use Q-Q plots to determine how well your data fit a distribution. Alternatively, use a distribution test to identify the distribution of your data.
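If you want a rough number to go with the visual comparison, one option is the largest vertical gap between the stepped line and the fitted curve. This is the quantity behind the Kolmogorov-Smirnov statistic, used here only as a descriptive measure and not as a formal test, because the fitted parameters are estimated from the same data. The helper name `max_cdf_gap` and the data are assumptions for illustration.

```python
from statistics import NormalDist


def max_cdf_gap(sample, cdf):
    """Largest vertical distance between the sample's ECDF and a fitted CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered, start=1):
        fitted_height = cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x, so check both sides of the step
        gap = max(gap, i / n - fitted_height, fitted_height - (i - 1) / n)
    return gap


# Hypothetical strength values, for illustration only
strengths = [0.31, 0.42, 0.48, 0.55, 0.58, 0.63, 0.66, 0.71, 0.77, 1.18]
fitted = NormalDist.from_samples(strengths)
gap = max_cdf_gap(strengths, fitted.cdf)
print(f"largest ECDF-to-fitted gap: {gap:.3f}")
```

A gap near 0 means the stepped line hugs the fitted curve everywhere. Larger gaps point to a region where the two disagree.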

Related post: Identifying the Distribution of Your Data

Using Empirical CDF Plots to Compare Multiple Samples

For these data, a manufacturer assessed the burn resistance of untreated and treated fabric by holding samples over a flame for a set amount of time and measuring the burn length. The manufacturer tests untreated material, Coating A, and Coating B. Lower values represent less burning and, hence, greater flame resistance.

Empirical CDF plot that compares the flame retardance of three samples.

The green empirical cumulative distribution function for Coating B is shifted left the furthest towards lower values, indicating that it provides the most burn protection.

Additionally, the overall slope of the Coating B stepped function is steeper than the other two. Steeper slopes indicate a tighter range of values and, therefore, lower variability.

You can also assess the mean and standard deviation values in the legend to derive similar conclusions. However, you should perform the appropriate hypothesis tests to determine statistical significance.

Using this empirical CDF plot, you can quickly find the burn lengths for each sample that correspond to a particular percentile. For instance, by drawing a horizontal line at 80%, you’ll find that the 80th percentile corresponds to burn lengths of approximately 2.9, 3.4, and 3.9 cm for Coating A, Coating B, and untreated fabric, respectively.
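Drawing a horizontal line at a percentile and reading down to the X-axis amounts to finding the first observation where the step function reaches that height. Here is a minimal sketch of that lookup for several samples. The function name `ecdf_quantile`, the sample labels, and the burn lengths are all invented for illustration and do not reproduce the fabric data.

```python
def ecdf_quantile(sample, p):
    """Smallest observed value at which the ECDF reaches height p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(sample)
    n = len(ordered)
    for i, x in enumerate(ordered, start=1):
        # After the i-th sorted value, the ECDF height is i/n
        if i / n >= p:
            return x


# Hypothetical burn lengths in cm, for illustration only
burn_lengths = {
    "sample_1": [3.0, 3.3, 3.5, 3.7, 4.1],
    "sample_2": [2.5, 2.8, 3.1, 3.3, 3.6],
    "sample_3": [2.2, 2.4, 2.6, 2.7, 2.9],
}

p80 = {name: ecdf_quantile(values, 0.80) for name, values in burn_lengths.items()}
print(p80)
```

Software packages often interpolate between observations when reporting percentiles, so their values can differ slightly from this step-function reading.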
