What is a Probability Density Function (PDF)?
A probability density function describes the probability distribution of a continuous random variable. Use a probability density function to find the chances that the value of a random variable will fall within a range of values that you specify. More specifically, a PDF is a function whose integral over an interval gives the probability of a value occurring in that interval. For example, what are the chances that the next IQ score you measure will fall between 120 and 140? In statistics, PDF stands for probability density function.
Don’t worry! I’ll break all that down to simpler terms!
Unlike distributions for discrete random variables where specific values can have non-zero probabilities, the likelihood for a single value is always zero for a continuous variable. Consequently, the probability density function provides the chances of a value falling within a specified range for continuous variables. For a PDF in statistics, probability density refers to probability per unit of length: for a narrow interval, the probability is approximately the density multiplied by the interval's width. A density is not itself a probability and can be greater than 1.
In short, probability density functions can find non-zero likelihoods for a continuous random variable X falling within the interval from a to b. Or, in statistical notation: P(a < X < b).
Learn more about Random Variables: Discrete & Continuous.
If you need to find likelihoods for a discrete variable, use a Probability Mass Function (PMF) instead.
Obtaining a Probability from a PDF
Graphing a probability density function gives you a probability density plot. These graphs are great for understanding how a PDF in statistics calculates probabilities. The chart below displays the PDF for IQ scores, which is a probability density function of a normal distribution.
On a probability density plot, the total area under the curve equals one, representing the total probability of 1 for the full range of possible values in the distribution. In other words, the likelihood of a value falling anywhere in the complete distribution curve is 1. Finding the chances for a smaller range of interest involves finding the portion of the area under the curve corresponding to that range. In our example, we need to find the percentage of the area that falls between 120 and 140.
In the example above, my statistical software finds that P(120 < IQ < 140) = 0.08738. The probability of a value falling within the range of 120 to 140 is 0.08738.
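If you'd like to check that number yourself, here is a minimal sketch using only Python's standard library. It assumes IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15. Those parameter values are my assumption for this illustration, and `interval_probability` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumption: IQ scores are normally distributed with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)


def interval_probability(dist, low, high):
    """Area under the PDF between low and high, found as CDF(high) - CDF(low)."""
    return dist.cdf(high) - dist.cdf(low)


p_120_140 = interval_probability(iq, 120, 140)
print(f"P(120 < IQ < 140) = {p_120_140:.5f}")
```

Under those assumed parameters, the result should land very close to the 0.08738 that my statistical software reports.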
These graphs also illustrate why probability density functions find a zero likelihood for an individual value. Consider that the probability for a PDF in statistics equals an area. For a non-zero area, you must have both a non-zero height and a non-zero width because Height × Width = Area. In this context, the height is the curve’s height on the graph, while the width relates to the range of values. When you have a single value, you have zero width, which produces zero area. Hence, a zero chance!
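You can watch the area disappear as the width shrinks. The sketch below again assumes a normal IQ distribution with mean 100 and standard deviation 15 (my assumption), and it narrows an interval centered on 120 until its width is zero.

```python
from statistics import NormalDist

# Assumption: IQ scores are normally distributed with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)

center = 120
widths = [10, 1, 0.1, 0.001, 0]
probabilities = []

for width in widths:
    low, high = center - width / 2, center + width / 2
    area = iq.cdf(high) - iq.cdf(low)  # probability of the interval
    probabilities.append(area)
    print(f"width {width:>6}: curve height = {iq.pdf(center):.5f}, area = {area:.8f}")
```

The curve's height at 120 never changes and is never zero, yet the area falls with the width and reaches exactly zero for the single value.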
Mathematics Behind a Probability Density Function
In mathematical terms, you or your software will use integrals with a probability density function to find probabilities. Among other uses, the integration process finds areas for mathematical functions that describe shapes. That’s perfect for finding likelihoods for intervals in a continuous distribution curve based on a PDF. You’ll usually use statistical software to find PDF probabilities, so I won’t discuss integrals here. For more information, read the Wikipedia article: Integrals.
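For the curious, here is a rough sketch of what software does behind the scenes. It approximates the integral of the PDF by adding up the areas of many thin rectangles (the midpoint rule, which is just one simple numerical method and not necessarily what any particular package uses). The IQ parameters (mean 100, SD 15) and the helper name `integrate_pdf` are assumptions for illustration.

```python
from statistics import NormalDist

# Assumption: IQ scores are normally distributed with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)


def integrate_pdf(pdf, low, high, steps=10_000):
    """Midpoint rule: sum the areas of `steps` thin rectangles under the curve."""
    width = (high - low) / steps
    heights = (pdf(low + (i + 0.5) * width) for i in range(steps))
    return sum(heights) * width


# Area for the range of interest.
area_120_140 = integrate_pdf(iq.pdf, 120, 140)

# Area across (practically) the whole curve: 10 standard deviations each way.
total_area = integrate_pdf(iq.pdf, 100 - 10 * 15, 100 + 10 * 15)

print(f"Area between 120 and 140: {area_120_140:.5f}")
print(f"Total area under the curve: {total_area:.5f}")
```

The first area should agree with the probability for 120 to 140, and the second should come out essentially equal to 1, the total probability.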
Examples of Probability Density Functions
While the normal distribution is the most common, it can fit only symmetrical distributions. Fitting skewed distributions requires using other probability density functions. Learn more about the Normal Distribution and Skewed Distributions.
There are a variety of other probability density functions that correspond with distributions of different shapes and properties. Each PDF typically has between one and three parameters that define its shape. Before using a PDF to find a probability, you must identify the correct function and parameter values for the population you are studying.
To accomplish that, you’ll typically gather a random sample from that population and use a combination of parameter estimation techniques and hypothesis tests to identify the distribution and parameters that most likely produced your sample.
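As a small illustration of the parameter estimation step only, the sketch below draws a simulated sample and estimates the mean and standard deviation of a normal distribution from it. Everything here is hypothetical: the "population" is simulated with an assumed mean of 100 and SD of 15, and the sketch takes for granted that a normal distribution is the right family, which in practice is the part that the hypothesis tests help you judge.

```python
from statistics import NormalDist

# Hypothetical population used only to simulate a random sample.
population = NormalDist(mu=100, sigma=15)
sample = population.samples(5000, seed=1)

# Estimate the normal distribution's parameters from the sample.
fitted = NormalDist.from_samples(sample)
print(f"Estimated mean = {fitted.mean:.2f}, estimated SD = {fitted.stdev:.2f}")

# Use the fitted PDF to estimate a probability for a range of interest.
estimated_probability = fitted.cdf(140) - fitted.cdf(120)
print(f"Estimated P(120 < X < 140) = {estimated_probability:.4f}")
```

Because the estimates come from a sample, they will be close to the population values but not equal to them.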
For more information about that process, read my post about Identifying the Distribution of Your Data.
Let’s look at examples of the lognormal and Weibull distributions, which can fit skewed data.
Lognormal Probability Density Function
A lognormal PDF models right-skewed distributions. Below is an example of the probability density function for a lognormal distribution that displays the distribution of body fat percentages for teenage girls. The data are from a study I performed.
Notice how it shows a right-skewed distribution. Additionally, the probability distribution plot indicates that 18.64% of the population will fall between 20% and 24% body fat. The highest probability densities occur near the peak at 26%.
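Here is a hedged sketch of how a lognormal probability like this can be calculated. A variable is lognormal when its natural log follows a normal distribution, so the lognormal CDF can be built from the standard normal CDF. The location and scale values below are assumptions that this post does not state. I picked them for illustration because they put the peak near 26%, and the function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

# Assumed parameters on the log scale (not given in this post).
LOCATION = 3.32317
SCALE = 0.24188

_standard_normal = NormalDist()


def lognormal_cdf(x, location, scale):
    """P(X <= x) when ln(X) is normal with the given location and scale."""
    if x <= 0:
        return 0.0
    return _standard_normal.cdf((log(x) - location) / scale)


def lognormal_pdf(x, location, scale):
    """Probability density of the lognormal distribution at x."""
    if x <= 0:
        return 0.0
    return _standard_normal.pdf((log(x) - location) / scale) / (x * scale)


p_20_24 = lognormal_cdf(24, LOCATION, SCALE) - lognormal_cdf(20, LOCATION, SCALE)
mode = exp(LOCATION - SCALE ** 2)  # x-value where the density peaks

print(f"P(20 < body fat % < 24) = {p_20_24:.4f}")
print(f"Peak of the curve at about {mode:.1f}%")
```

With these assumed parameters, the interval probability should come out close to the 18.64% shown on the plot.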
Weibull Probability Density Function
The Weibull distribution is highly flexible, and its PDF can model a wide variety of shapes. The example below displays a left-skewed distribution.
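To see that flexibility in numbers, the sketch below computes the skewness of a Weibull distribution from its shape parameter and finds an interval probability from its CDF. The shape and scale values are hypothetical choices for illustration and are not the parameters behind the example graph.

```python
from math import exp, gamma


def weibull_cdf(x, shape, scale):
    """P(X <= x) for a Weibull distribution."""
    if x <= 0:
        return 0.0
    return 1.0 - exp(-((x / scale) ** shape))


def weibull_skewness(shape):
    """Skewness of a Weibull distribution (it does not depend on the scale)."""
    g1 = gamma(1 + 1 / shape)
    g2 = gamma(1 + 2 / shape)
    g3 = gamma(1 + 3 / shape)
    variance = g2 - g1 ** 2
    return (g3 - 3 * g1 * g2 + 2 * g1 ** 3) / variance ** 1.5


# Hypothetical shape values: a small shape skews right, a large shape skews left.
for shape in (1.5, 3.6, 10):
    print(f"shape = {shape:>4}: skewness = {weibull_skewness(shape):+.3f}")

# Interval probability for a hypothetical left-skewed case (shape 10, scale 100).
p_80_100 = weibull_cdf(100, 10, 100) - weibull_cdf(80, 10, 100)
print(f"P(80 < X < 100) = {p_80_100:.4f}")
```

A negative skewness indicates a left-skewed curve, a positive value indicates a right-skewed curve, and a value near zero indicates a roughly symmetric shape similar to the normal distribution.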
Examples of Other Probability Density Functions
There are a variety of PDFs for other distributions. They include the following:
- Weibull distribution: A remarkably versatile distribution that analysts use in many settings. Can model skewed distributions in both directions and approximate the normal distribution.
- Lognormal distribution: Models right-skewed distributions, especially in cases where growth rates are unconnected to size.
- Exponential distribution: Models distributions where small values occur more frequently than large values. Use to model the amount of time between independent events.
- Gamma distribution: Models right-skewed distributions. Use to model the time until the kth event.
- Uniform distribution: Models symmetric distributions where all equal-length ranges have the same probability.
- Beta distribution: Models distributions with all values falling within a finite range.
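Two of the distributions in this list have areas that are easy to calculate by hand, which makes them handy for a quick sketch. The rates, ranges, and function names below are hypothetical values chosen for illustration.

```python
from math import exp


def exponential_interval(rate, low, high):
    """P(low < T < high) for the time T between events; assumes 0 <= low <= high."""
    return exp(-rate * low) - exp(-rate * high)


def uniform_interval(a, b, low, high):
    """P(low < X < high) for X uniform on [a, b]: overlap length / total length."""
    overlap = max(0.0, min(high, b) - max(low, a))
    return overlap / (b - a)


# Exponential: with 0.5 events per hour on average, short waits are more likely.
p_first_hour = exponential_interval(0.5, 0, 1)
p_fifth_hour = exponential_interval(0.5, 4, 5)
print(f"P(wait is 0 to 1 hours) = {p_first_hour:.4f}")
print(f"P(wait is 4 to 5 hours) = {p_fifth_hour:.4f}")

# Uniform on [0, 10]: equal-length ranges have equal probabilities.
p_range_a = uniform_interval(0, 10, 1, 3)
p_range_b = uniform_interval(0, 10, 6, 8)
print(f"P(1 < X < 3) = {p_range_a:.2f}, P(6 < X < 8) = {p_range_b:.2f}")
```

In both cases, the probability is still an area under the PDF. These shapes simply have areas with simple formulas.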
Cumulative distribution functions show the same type of information but in a different way. Instead of displaying probability densities at each x-value, they display the cumulative probability of obtaining a value ≤ x. Learn more about Cumulative Distribution Functions: Uses, Graphs and vs. PDFs.
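The sketch below puts the two functions side by side for the IQ example, assuming a normal distribution with mean 100 and SD 15 (my assumption). It also checks numerically that the slope of the CDF at a point approximately equals the PDF's height there, which is how the two functions are linked.

```python
from statistics import NormalDist

# Assumption: IQ scores are normally distributed with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)

for x in (85, 100, 120, 140):
    print(f"x = {x:>3}: density = {iq.pdf(x):.5f}, P(X <= x) = {iq.cdf(x):.5f}")

# The slope of the CDF at x approximates the PDF's height at x.
x, h = 120, 1e-3
cdf_slope = (iq.cdf(x + h) - iq.cdf(x - h)) / (2 * h)
print(f"CDF slope at {x} = {cdf_slope:.5f}, PDF height at {x} = {iq.pdf(x):.5f}")
```

Subtracting two CDF values gives the same interval probability that you would get from the area under the PDF.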