What is a Cumulative Distribution Function?
A cumulative distribution function (CDF) describes the probabilities of a random variable having values less than or equal to x. It is a cumulative function because it sums the total likelihood up to that point. Its output always ranges between 0 and 1.
CDFs have the following definition:
CDF(x) = P(X ≤ x)
Where X is the random variable, and x is a specific value. The CDF gives us the probability that the random variable X is less than or equal to x. These functions are non-decreasing. As x increases, the likelihood can either increase or stay constant, but it can’t decrease.
Both probability density functions (PDFs) and cumulative distribution functions provide likelihoods for random variables. However, a PDF gives the probability density at x, while a CDF gives the probability that the variable is ≤ x. Learn about Probability Density Functions.
Cumulative distribution functions exist for both continuous and discrete variables. For continuous variables, the CDF integrates the PDF up to x. For discrete variables, it sums the probabilities of all discrete values less than or equal to x. The discrete counterpart of the PDF, which gives the probability of each individual value, is called a Probability Mass Function (PMF).
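Here's a minimal sketch in Python (standard library only) that contrasts the two cases. For the discrete case, it sums the PMF of a fair six-sided die, a hypothetical example that isn't from this article. For the continuous case, it numerically integrates a standard normal PDF and compares the result to the closed-form normal CDF from `statistics.NormalDist`. The helper names `die_cdf` and `integrate_pdf` are illustrative only.

```python
from fractions import Fraction
from statistics import NormalDist


def die_cdf(x: float) -> Fraction:
    """Discrete CDF of a fair six-sided die: sum the PMF over faces <= x."""
    pmf = {face: Fraction(1, 6) for face in range(1, 7)}
    return sum((p for face, p in pmf.items() if face <= x), Fraction(0))


def integrate_pdf(pdf, lower: float, upper: float, steps: int = 10_000) -> float:
    """Approximate the integral of pdf from lower to upper (trapezoidal rule)."""
    h = (upper - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(upper))
    total += sum(pdf(lower + i * h) for i in range(1, steps))
    return total * h


std_normal = NormalDist(mu=0, sigma=1)

# Discrete: the CDF jumps at each face and stays flat in between.
print([str(die_cdf(x)) for x in (0, 1, 2.5, 3, 6, 10)])
# ['0', '1/6', '1/3', '1/2', '1', '1']

# Continuous: integrating the PDF up to x reproduces the CDF.
# A lower limit of -10 stands in for negative infinity; the omitted tail is negligible.
x = 1.0
print(round(integrate_pdf(std_normal.pdf, -10, x), 6), round(std_normal.cdf(x), 6))
```

Notice that the discrete CDF never decreases and tops out at 1, just like the continuous one.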
Read on to learn why you’d use a cumulative distribution function, how to graph one, and how a CDF differs from a PDF.
Learn more about Cumulative Frequencies: Finding & Interpreting.
Using Cumulative Distribution Functions
Cumulative distribution functions are excellent for providing probabilities that the next observation will be less than or equal to the value you specify. This ability can help you make decisions that incorporate uncertainty.
Additionally, these cumulative probabilities are equivalent to percentiles. A cumulative probability of 0.80 is the same as the 80th percentile. So, CDFs are great for finding percentiles. Learn more about Percentiles: Interpretations and Calculations.
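To see the percentile connection in code, you can run a CDF forward and backward. This sketch uses Python’s standard-library `statistics.NormalDist`, which provides `cdf()` and `inv_cdf()`; any statistics package with a normal CDF and inverse CDF would work the same way. The standard normal distribution here is an assumed example chosen only for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)  # standard normal, chosen only for illustration

p80 = dist.inv_cdf(0.80)          # the value at the 80th percentile
print(round(p80, 4))              # 0.8416
print(round(dist.cdf(p80), 4))    # 0.8  (the CDF maps the percentile back to 0.80)
```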
For example, consider the height of an adult male in the United States. We can use the cumulative distribution function to find the probability that an adult male is 6 feet tall or shorter.
For CDFs, we need to specify the type of distribution (e.g., normal, Weibull, or binomial) and its parameters, just as we do for PDFs.
Adult males in the U.S. have heights that follow a normal distribution with a mean of 69.2 inches and a standard deviation of 2.66 inches. Consequently, we’ll need to use a normal CDF with these parameters to answer our question. Because we’re working in inches, I’ll enter 72 inches for 6 feet.
The typical CDF statistical output from your software or online calculator will look like the following:
The probability that an adult male will be 6 feet tall or shorter is 0.853745. Equivalently, you can say that a 6’ tall adult male is at the 85.4th percentile.
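If you’d rather compute this yourself, here’s a minimal Python sketch using the standard-library `statistics.NormalDist` class with the mean and standard deviation given above. It’s one way to get the number, not the specific software that produced the output described here.

```python
from statistics import NormalDist

# Parameters from the text: U.S. adult male heights in inches.
men = NormalDist(mu=69.2, sigma=2.66)

p = men.cdf(72)  # P(height <= 72 inches), i.e., 6 feet or shorter
print(round(p, 6))                     # 0.853745
print(f"{p * 100:.1f}th percentile")   # 85.4th percentile
```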
Related post: Normal Distribution
Comparing Distributions
Cumulative distribution functions are fantastic for comparing two distributions. By comparing the CDFs of two random variables, we can see if one is more likely to be less than or equal to a specific value than the other. That helps us make decisions about whether one is more likely to have a particular property.
Imagine we’re a clothing manufacturer and want to compare how common it is for men and women to be taller than 6’.
Next, we’ll use the normal CDF to find the probability that an adult woman will be 6’ tall or less. Women’s heights follow a normal distribution with a mean of 64.3 inches and a standard deviation of 2.58 inches.
The statistical output for the normal CDF indicates that women have a probability of 0.99858 for being ≤ 6’. That’s equivalent to the 99.9th percentile.
About 85.4% of men and 99.9% of women are 6’ or shorter. Dividing the complementary probabilities (1 – p) for men by those for women shows that a man is about 103 times more likely than a woman to be taller than 6 feet. As a clothing manufacturer, knowing that is helpful. A woman taller than 6 feet is a rarity!
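Here’s a short Python sketch of that comparison, again assuming the standard-library `statistics.NormalDist` and the parameters from the text. The key step is taking the complement, 1 – CDF(72), to get the probability of being taller than 6 feet before dividing.

```python
from statistics import NormalDist

# Parameters from the text (inches).
men = NormalDist(mu=69.2, sigma=2.66)
women = NormalDist(mu=64.3, sigma=2.58)

six_feet = 72
p_men = men.cdf(six_feet)      # P(man <= 6')
p_women = women.cdf(six_feet)  # P(woman <= 6')

# Complementary probabilities: P(taller than 6') = 1 - CDF(72)
tall_men = 1 - p_men
tall_women = 1 - p_women

print(round(p_women, 5))               # 0.99858
print(round(tall_men / tall_women))    # 103
```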
Graphing Normal CDFs
I always think graphs bring statistical concepts to life. So, let’s graph a cumulative distribution function to see it. We’ll return to the normal CDF for men’s heights.
On a cumulative distribution function plot, the horizontal axis displays the x values, while the vertical axis displays cumulative probabilities or percentiles. The curve represents corresponding pairs of x values and cumulative probabilities. For normal CDFs, the function accumulates probability from negative infinity up to the value of x, which is (-∞, x] in interval notation. Continuous variables produce a smooth curve, like below, while discrete variables produce a stepped function.
On the CDF graph for men’s heights, I’ve added a reference line at 6’ (i.e., 72”) to show the corresponding probability of 0.854, matching the earlier answer after rounding. Using these graphs, you can easily find probabilities and percentiles for other values. For instance, 70 inches (5′ 10″) is roughly the 62nd percentile.
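A plotting library isn’t required to see what the graph encodes. This Python sketch (standard library only, men’s parameters from the text) tabulates the (x, cumulative probability) pairs that the curve connects and reads off the value at 70 inches. Feed the pairs to any charting tool if you want the picture itself.

```python
from statistics import NormalDist

men = NormalDist(mu=69.2, sigma=2.66)  # parameters from the text, in inches

# The (x, CDF(x)) pairs that the plotted curve connects.
for height in range(62, 78, 2):
    print(f"{height} in  {men.cdf(height):.3f}")

# Reading the curve at 70 inches (5' 10")
print(round(men.cdf(70), 3))  # 0.618
```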
For comparison, the women’s chart is below. While the graph ends at 72 inches, the distribution actually extends to infinity in both directions.
A height of 6 feet is in the tail of the distribution.
CDF vs PDF
A cumulative distribution function (CDF) and a probability density function (PDF) are two statistical tools describing a random variable’s distribution. Both functions display the same probability information but in different ways. In simple terms, the PDF represents the shape of the distribution, while the CDF represents the accumulation of probabilities as the value of the random variable increases. Learn more about Probability Distribution: Definition & Calculations.
PDFs can find cumulative probabilities by calculating the area under the curve up to a particular value. The PDF below shows the probability for the shaded area representing male heights up to 6’ (72”).
It finds the same probability as the CDF, showing how they present the same underlying information in a different format.
Now, imagine starting with the shaded area at the left side of the PDF and systematically moving its right edge to the right while recording the cumulative probabilities. That produces the CDF!
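Here’s that thought experiment as a Python sketch. It approximates the shaded area under the men’s height PDF with the trapezoidal rule (my choice of numerical method, not something the article specifies) and slides the right edge across several heights. At each stop, the recorded area matches the normal CDF. The helper `shaded_area` is hypothetical, and the lower limit of the mean minus 10 standard deviations stands in for negative infinity.

```python
from statistics import NormalDist

men = NormalDist(mu=69.2, sigma=2.66)  # parameters from the text, in inches


def shaded_area(dist: NormalDist, x: float, steps: int = 20_000) -> float:
    """Area under dist's PDF from far in the left tail up to x (trapezoidal rule).

    The lower limit mu - 10*sigma stands in for negative infinity.
    """
    lower = dist.mean - 10 * dist.stdev
    h = (x - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(x))
    total += sum(dist.pdf(lower + i * h) for i in range(1, steps))
    return total * h


# Slide the right edge of the shaded region and record the running area.
for x in (64, 68, 72, 76):
    print(x, round(shaded_area(men, x), 4), round(men.cdf(x), 4))
```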
The PDF gives the probability density, which indicates how likely the random variable is to fall close to a value. In comparison, the cumulative distribution function accumulates (integrates) the probability densities leading up to each value.
In this manner, the probability density on a PDF is the rate of change for the CDF. Consequently, the ranges where the PDF curve has relatively high probability densities correspond to areas on the CDF curve with steeper slopes. Lower PDF densities correspond to shallower CDF slopes. As the PDF’s curve approaches its peak at the mean, the CDF’s slope increases to its maximum steepness. After the PDF’s peak, the CDF slope flattens.
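You can check the rate-of-change relationship numerically. This Python sketch estimates the CDF’s slope with a central finite difference (an assumed numerical technique, with a hypothetical helper `cdf_slope`) and compares it to the PDF at several heights. It then searches a grid of heights for the steepest point, which lands at the mean of 69.2 inches, where the PDF peaks.

```python
from statistics import NormalDist

men = NormalDist(mu=69.2, sigma=2.66)  # parameters from the text, in inches


def cdf_slope(dist: NormalDist, x: float, h: float = 1e-4) -> float:
    """Central-difference estimate of the CDF's rate of change at x."""
    return (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)


# The CDF's slope matches the PDF's density at each height.
for x in (63.0, 66.0, 69.2, 72.0, 75.0):
    print(x, round(cdf_slope(men, x), 5), round(men.pdf(x), 5))

# The steepest point on a grid of heights (60 to 78 inches) is at the mean.
grid = [60 + 0.1 * i for i in range(181)]
steepest = max(grid, key=lambda x: cdf_slope(men, x))
print(round(steepest, 1))  # 69.2
```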
Learn more about Empirical Cumulative Distribution Function Plots. These graphs help you compare an observed cumulative distribution to a fitted distribution.
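As a rough sketch of what such a plot compares, the Python code below builds an empirical CDF from a simulated sample and measures the largest vertical gap between it and a fitted normal CDF. The sample comes from `NormalDist.samples()` with an arbitrary seed, so it is not real height data. Fitting by the sample mean and standard deviation is an assumption made for illustration.

```python
from statistics import NormalDist, fmean, stdev

# Simulated sample standing in for observed data (NOT real height measurements).
true_dist = NormalDist(mu=69.2, sigma=2.66)
sample = sorted(true_dist.samples(500, seed=42))
n = len(sample)

# Fit a normal distribution to the sample.
fitted = NormalDist(mu=fmean(sample), sigma=stdev(sample))


def ecdf(x: float) -> float:
    """Empirical CDF: fraction of observations less than or equal to x."""
    return sum(1 for value in sample if value <= x) / n


# Compare the step-shaped ECDF to the smooth fitted CDF at 6 feet.
print(round(ecdf(72), 3), round(fitted.cdf(72), 3))

# Largest vertical gap, checked just before and at each jump of the ECDF.
max_gap = max(
    max(abs((i + 1) / n - fitted.cdf(v)), abs(i / n - fitted.cdf(v)))
    for i, v in enumerate(sample)
)
print(round(max_gap, 4))
```

A small maximum gap suggests the fitted distribution tracks the observed data closely.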