What is a Probability Density Function (PDF)?
A probability density function describes a probability distribution for a random, continuous variable. Use a probability density function to find the chances that the value of a random variable will occur within a range of values that you specify. More specifically, a PDF is a function whose integral over an interval gives the probability of a value occurring in that interval. For example, what are the chances that the next IQ score you measure will fall between 120 and 140? In statistics, PDF stands for probability density function.
Don’t worry! I’ll break all that down into simpler terms!
Unlike distributions for discrete random variables, where specific values can have non-zero probabilities, the likelihood of any single value is always zero for a continuous variable. Consequently, the probability density function provides the chances of a value falling within a specified range for continuous variables. For a PDF in statistics, probability density refers to the likelihood of a value occurring within an interval one unit wide.
In short, probability density functions can find non-zero likelihoods for a continuous random variable X falling within the interval [a, b]. Or, in statistical notation: P(a < X < b).
Statisticians refer to the mean of a probability density function as its expected value. Learn more about Expected Values: Definition, Formula & Finding.
Learn more about Random Variables: Discrete & Continuous.
If you need to find likelihoods for a discrete variable, use a Probability Mass Function (PMF) instead.
Related posts: Discrete vs Continuous Variables and Probability Distributions
Obtaining a Probability from a PDF
Graphing a probability density function gives you a probability density plot. These graphs are great for understanding how a PDF in statistics calculates probabilities. The chart below displays the PDF for IQ scores, which is a probability density function of a normal distribution.
On a probability density plot, the total area under the curve equals one, representing the total probability of 1 for the full range of possible values in the distribution. In other words, the likelihood of a value falling anywhere in the complete distribution curve is 1. Finding the chances for a smaller range of interest involves finding the portion of the area under the curve corresponding to that range. In our example, we need to find the percentage of the area that falls between 120 and 140.
In the example above, my statistical software finds that P(120 < X < 140) = 0.08738 for IQ scores. The probability of a value falling within the range of 120 to 140 is 0.08738.
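If you want to check that value yourself, here is a minimal sketch using SciPy, assuming the usual IQ parameterization of a normal distribution with a mean of 100 and a standard deviation of 15 (the software behind the plot isn’t specified, so treat this as an independent check rather than the exact calculation used above):

```python
from scipy import stats

# P(120 < X < 140) for a normal distribution with mean 100 and SD 15.
p = stats.norm.cdf(140, loc=100, scale=15) - stats.norm.cdf(120, loc=100, scale=15)
print(round(p, 5))  # 0.08738
```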
These graphs also illustrate why probability density functions find a zero likelihood for an individual value. Consider that the probability for a PDF in statistics equals an area. For a non-zero area, you must have both a non-zero height and a non-zero width because Height × Width = Area. In this context, the height is the curve’s height on the graph, while the width relates to the range of values. When you have a single value, you have zero width, which produces zero area. Hence, a zero chance!
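You can see that zero-width argument numerically. In the sketch below (again assuming the IQ distribution with mean 100 and SD 15), the probability of an interval starting at 120 shrinks toward zero as the interval’s width shrinks, even though the curve’s height at 120 stays fixed:

```python
from scipy import stats

# Shrink the interval's width: the area (probability) shrinks toward zero,
# while the density (curve height) at x = 120 does not change.
for width in (10, 1, 0.1, 0.001):
    p = stats.norm.cdf(120 + width, loc=100, scale=15) - stats.norm.cdf(120, loc=100, scale=15)
    print(f"P(120 < X < {120 + width}) = {p:.6f}")

print("Density at 120:", stats.norm.pdf(120, loc=100, scale=15))
```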
Mathematics Behind a Probability Density Function
In mathematical terms, you or your software will use integrals with a probability density function to find probabilities. Among other uses, the integration process finds areas for mathematical functions that describe shapes. That’s perfect for finding likelihoods for intervals in a continuous distribution curve based on a PDF. You’ll usually use statistical software to find PDF probabilities, so I won’t discuss integration in depth here. For more details, read the Wikipedia article on Integrals.
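If you would like to see the integral at work without doing it by hand, here is a sketch that numerically integrates the normal PDF (same assumed mean 100 and SD 15 as above). It reproduces the interval probability and confirms that the full curve integrates to 1:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Area under the PDF between 120 and 140: the interval probability.
area, _ = quad(lambda x: stats.norm.pdf(x, loc=100, scale=15), 120, 140)
print(round(area, 5))  # ~0.08738, matching the CDF-based result

# Area under the entire curve: the total probability, which equals 1.
total, _ = quad(lambda x: stats.norm.pdf(x, loc=100, scale=15), -np.inf, np.inf)
print(round(total, 5))  # 1.0
```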
Examples of Probability Density Functions
While the normal distribution is the most common, it can fit only symmetrical distributions. Fitting skewed distributions requires using other probability density functions. Learn more about the Normal Distribution and Skewed Distributions.
There are a variety of other probability density functions that correspond with distributions of different shapes and properties. Each PDF has one to three parameters that define its shape. Before using a PDF to find a probability, you must identify the correct function and parameter values for the population you are studying.
To accomplish that, you’ll typically gather a random sample from that population and use a combination of parameter estimation techniques and hypothesis tests to identify the distribution and parameters that most likely produced your sample.
For more information about that process, read my post about Identifying the Distribution of Your Data.
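Here is a rough sketch of what that looks like in code, using a simulated right-skewed sample as a stand-in for real data. It fits a few candidate distributions by maximum likelihood and compares the fits with a Kolmogorov-Smirnov test, which is just one of several possible goodness-of-fit checks:

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed sample standing in for real data.
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=3.3, sigma=0.25, size=200)

# Fit candidate distributions by maximum likelihood, then check each fit.
for dist in (stats.lognorm, stats.gamma, stats.norm):
    params = dist.fit(sample)
    ks = stats.kstest(sample, dist.name, args=params)
    print(f"{dist.name}: KS p-value = {ks.pvalue:.3f}")
```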
Let’s look at examples of the lognormal and Weibull distributions, which can fit skewed data.
Lognormal Probability Density Function
A lognormal PDF models right-skewed distributions. Below is an example of the probability density function for a lognormal distribution that displays the distribution of body fat percentages for teenage girls. The data are from a study I performed.
Notice how it shows a right-skewed distribution. Additionally, the probability distribution plot indicates that 18.64% of the population will fall between 20% and 24% body fat. The highest probability densities occur near the peak at 26%.
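If you want to try the same kind of interval calculation, here is a sketch with a lognormal distribution. The parameters below are purely illustrative (they are not the fitted values from the body fat study, which aren’t listed here), so the result won’t match the 18.64% in the plot; the point is simply how you compute P(20 < X < 24) once you have a fitted lognormal. Note that SciPy’s lognorm uses s for the underlying normal’s sigma and scale = exp(mu):

```python
import numpy as np
from scipy import stats

# Illustrative (hypothetical) lognormal parameters -- not the study's fitted values.
mu, sigma = np.log(27), 0.23
body_fat = stats.lognorm(s=sigma, scale=np.exp(mu))

# Probability of falling between 20% and 24% body fat for these parameters.
print(body_fat.cdf(24) - body_fat.cdf(20))
```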
Weibull PDF
The Weibull PDF is a highly flexible distribution that can model a wide variety of shapes. The example below displays a left-skewed distribution.
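One quick way to see that flexibility, as a sketch: the Weibull’s skewness flips sign as its shape parameter grows, so the same family can be right-skewed, roughly symmetric, or left-skewed (the shape values here are just examples):

```python
from scipy import stats

# Skewness of the Weibull distribution for several shape parameters.
# Shapes above roughly 3.6 give a left-skewed (negatively skewed) PDF.
for c in (1.5, 3.6, 7.0):
    skew = float(stats.weibull_min.stats(c, moments='s'))
    print(f"shape={c}: skewness={skew:.2f}")
```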
Examples of Other Probability Density Functions
There are a variety of PDFs for other distributions. They include the following:
- Weibull distribution: A remarkably versatile distribution that analysts use in many settings. Can model skewed distributions in both directions and approximate the normal distribution.
- Lognormal distribution: Models right-skewed distributions, especially in cases where growth rates are independent of size.
- Exponential distribution: Models distributions where small values occur more frequently than large values. Use to model the amount of time between independent events.
- Gamma distribution: Models right-skewed distributions. Use to model the time until the kth event.
- Uniform distribution: Models symmetric distributions where all intervals of equal length have the same probability.
- Beta distribution: Models distributions with all values falling within a finite range.
Cumulative distribution functions show the same type of information but in a different way. Instead of displaying probability densities for x-values, they display the cumulative probability P(X ≤ x). Learn more about Cumulative Distribution Functions: Uses, Graphs and vs. PDFs.
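To make the contrast concrete, here is a two-line sketch evaluating both functions at the same x (using the same assumed IQ distribution as earlier). The PDF returns a density, which is the curve’s height rather than a probability, while the CDF returns the cumulative probability up to x:

```python
from scipy import stats

x = 120
print("pdf(120):", stats.norm.pdf(x, loc=100, scale=15))  # density (curve height) at x
print("cdf(120):", stats.norm.cdf(x, loc=100, scale=15))  # cumulative probability P(X <= 120)
```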
What is the range of a PDF?
Hi Sajid,
The range of values for a PDF depends on which distribution it is. For example, for a normal distribution, the values range from negative to positive infinity. However, a three-parameter Weibull distribution can range from a specific value based on its threshold parameter to positive infinity. For a beta distribution, all values fall within a finite range set by its parameters; the range [0, 1] is the most common.
So, you really need to know which PDF and, in some cases, the parameters.
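If you’re working in software, you can usually ask the distribution for its range directly. For example, in SciPy (assuming a fairly recent version, which provides a support() method on frozen distributions):

```python
from scipy import stats

# The range (support) depends on the distribution and its parameters.
print(stats.norm(loc=100, scale=15).support())   # (-inf, inf)
print(stats.weibull_min(1.5, loc=10).support())  # (10.0, inf): loc acts as the threshold
print(stats.beta(2, 5).support())                # (0.0, 1.0)
```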
Thank you Prof for the wonderful explanation. It brings meaning into the formulae and all the statistical tables involved for the various distribution types. Can’t wait for your data science class.
Thanks Jim,
It’s very clear now. I understand better why you used probability and likelihood interchangeably because in the context of the discussion there should not be any confusion. Perhaps it is the fact that my native language is not English that I saw the word “likelihood” more as a precise mathematical term and not the use of a synonym.
Thanks Jim!
It becomes clearer.
In the normal distribution, Maximum Likelihood consists in finding the parameters mu and variance that will maximize the product of the different probability density values. The confusion comes from my side. I expressed myself badly. I was talking about “likelihood” instead of “probability density”. Thanks again for your support!
Hi Stéphane,
No worries at all! That’s why I have this website. I aim to help!
Just for a bit of clarification. Maximum likelihood estimation (MLE) is the process of taking a random sample and estimating the population parameters. For a normal distribution, that’s the mean (mu) and standard deviation (sigma).
Remember that population parameters are never known exactly and that we can only estimate them from samples.
The “likelihood” in MLE specifically refers to the fact that the process finds the population parameter values that are the most likely (i.e., have the maximum likelihood) to have produced the statistics you observe in your sample.
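As a small sketch of MLE in practice (using SciPy, with a simulated sample so the true parameters are known), norm.fit returns the maximum likelihood estimates of mu and sigma:

```python
import numpy as np
from scipy import stats

# Simulate a sample from a normal population with known parameters...
rng = np.random.default_rng(0)
sample = rng.normal(loc=100, scale=15, size=500)

# ...then recover them by maximum likelihood. Note that the sigma estimate
# uses the 1/n formula rather than the 1/(n-1) sample standard deviation.
mu_hat, sigma_hat = stats.norm.fit(sample)
print(mu_hat, sigma_hat)
```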
Hello Jim,
Thank you for allowing this discussion in all transparency, but the mathematical definition of the different notions is not yet clear to me. For the beauty of the text, I understand that sometimes another word is used, but it brings confusion when this other word can represent something else mathematically. We often see in texts that “probability” and “likelihood” are two totally different notions. I think that a concrete use case (with a numerical example) would be interesting to show the meaning of each of these terms. This would make it possible to answer a question like: What do you call the y-value on a graph (e.g., normal distribution) for a value x chosen for the random variable? This cannot be the probability since the probability of a specific value is zero.
On a probability density plot, the y-value is the probability density. I define probability density early in this post.
The x-value relates to the value of a random, continuous variable (X).
I will add something about both points in the section with the graphs to make them clearer.
Probability and likelihood are closely tied together. Indeed, probability is frequently defined as the relative likelihood for an outcome!
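To put a number on the y-value point above, here is a quick sketch (same assumed IQ distribution): the y-value at x = 120 is the density, and multiplying that density by a very small width approximates the probability of a tiny interval around 120, which is how density and probability connect:

```python
from scipy import stats

x, width = 120, 0.01
density = stats.norm.pdf(x, loc=100, scale=15)   # the y-value on the plot at x = 120
approx_p = density * width                       # density times a tiny width ~ probability
exact_p = stats.norm.cdf(x + width, loc=100, scale=15) - stats.norm.cdf(x, loc=100, scale=15)
print(density, approx_p, exact_p)
```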
Hi Jim, you write:
“Unlike distributions for discrete variables where specific values can have non-zero probabilities, the likelihood for a single value is always zero in a continuous probability distribution function.”
=> Shouldn’t the term ‘probability’ be used instead of ‘likelihood’ in this text?
I see in many courses and documents differences regarding the concepts of:
=> Probability Density Function
=> Probability
=> Likelihood
If I understand correctly:
[Probability Density Function] is the general form of the distribution: for instance, for the Normal Distribution: N(mu, var), and not the value for a specific value of the random variable.
[Probability] is the area under the bell curve and must only be considered for a range [a, b] in which the random variable falls. For a continuous random variable, the probability of one specific value (for instance: temperature is 23.567435 °C) is null. For the full range, the probability integrates to 1.
[Likelihood] is the y value for a specific point on the bell curve belonging to a specific value of the random variable (Maximum Likelihood being the ‘MEAN’ Value). Thus, although the probability of the value 23.567435 °C is null, the likelihood can be obtained by applying the probability density function to a specific value of the random variable for fixed parameters mu and var, so the likelihood of the precise value 23.567435 °C can be computed.
Is my analysis correct?
Thanks in advance!
Hi Stéphane,
Most of what you write is correct. But here are a few clarifications.
Probability and likelihood are synonyms in this context. The problem with writing about this stuff is that reusing the word “probability” over and over gets repetitive. Use the probability density function to find a probability. Add in discussions about probability density and probability density plots, and the word is used over and over. Hence, synonyms, such as likelihood and chance! Take them all to mean the same thing in this context. And, indeed, you can’t define a word using the word itself, so for a variety of reasons it’s good to use synonyms.
A probability density function is a function for a continuous variable whose integral provides the probability for an interval. It doesn’t have to be the bell curve (i.e., normal distribution), but it can be.
Likelihood is not a specific point on a curve (bell curve or otherwise). Again, as I point out in this post, the probability (or likelihood) for a single point of a continuous variable is always zero. You might be thinking of maximum likelihood estimation. But that’s a process used to estimate the distribution parameters. It finds the population parameters most likely to have produced your observed random sample. You need those parameter estimates to plug into the PDF.
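One concrete way to see the distinction, as a small sketch: the likelihood treats your observed sample as fixed and the parameters as the thing that varies. Below, the log-likelihood of a simulated sample is evaluated under several candidate means (sigma is held fixed just to keep the sketch short), and the candidate nearest the sample mean scores highest:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=100, scale=15, size=200)

# Log-likelihood of the whole sample under several candidate values of mu.
for mu in (90, 95, 100, 105):
    loglik = stats.norm.logpdf(sample, loc=mu, scale=15).sum()
    print(f"mu={mu}: log-likelihood={loglik:.1f}")
# The candidate closest to the sample mean gives the highest log-likelihood.
```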
I hope that helps!
Your recent letter about PDF raises some concerns of the nomenclature you use. As always, a term represents what you decide it shall describe. The decision must be unanimously understood and accepted. If you do not know what a spade is there is no use to call a spade a spade, you may call the item anything! Not only globalization makes it important that critical terms are properly described. In metrology the standard publication is the BIPM/VIM (www.bipm.org/vim). On page XII “range” and “interval” are discussed. “Range” is defined by a single number with no place on the number line whereas “interval” is represented by the values of the lower and upper limits and thus fixed on the number line. Therefore, your statement that in a PDF of a continuous variable the probability cannot be given for a single value, only for a range raises an eyebrow or two! No doubt my comment may be regarded as a de lana caprina rixari but particularly for us aliens the Anglo-Saxon laissez-faire attitude to linguistic standardization often is a source of confusion, particularly when the colloquial meaning of a scientific term is implied rather than that defined. I suppose this is partly because of the dominance of the “English language” in science.
There are many more examples; those which you may come across are, for instance, “accuracy” and “sensitivity”. See VIM!
Hi Anders,
I am sure that your comment is meant to be humorous, but it raises my eyebrow (to use your expression). Yes, there are cases where “range” is used in the way you suggest. However, that’s not the only acceptable use. All languages have their quirks!
Let me introduce you to the Cambridge dictionary definition for “range.” Two of the definitions are the following:
“[Noun] The amount, number, or type of something between an upper and a lower limit:
* The price range is from $100 to $500.
* The product is aimed at young people in the 18–25 age range.”
“[Verb] To have an upper and a lower limit in amount, number, etc.:
* Dress sizes range from petite to extra large.
* Prices range between $50 and $250.”
In my IQ example in this post, I talk about IQ scores that range [verb] from 120 to 140. Or, you can say that the IQ range [noun] is 120 to 140. Interval is also correct.