What is a Probability Density Function (PDF)?
A probability density function describes the probability distribution of a continuous random variable. Use a probability density function to find the chances that the value of a random variable will fall within a range of values that you specify. More specifically, a PDF is a function whose integral over an interval gives the probability of a value occurring in that interval. For example, what are the chances that the next IQ score you measure will fall between 120 and 140? In statistics, PDF stands for probability density function.
Don’t worry! I’ll break all that down into simpler terms!
Unlike distributions for discrete random variables where specific values can have non-zero probabilities, the likelihood for a single value is always zero for a continuous variable. Consequently, the probability density function provides the chances of a value falling within a specified range for continuous variables. For a PDF in statistics, probability density is probability per unit of the variable: over a narrow interval, the probability is approximately the density multiplied by the interval’s width. When the density is roughly constant across an interval one unit long, it approximates the likelihood of a value occurring within that interval.
In short, probability density functions can find non-zero likelihoods for a continuous random variable X falling within the interval [a, b]. Or, in statistical notation: P(a < X < b).
Statisticians refer to the mean of a probability density function as its expected value. Learn more about Expected Values: Definition, Formula & Finding.
Learn more about Random Variables: Discrete & Continuous.
If you need to find likelihoods for a discrete variable, use a Probability Mass Function (PMF) instead.
Related posts: Discrete vs Continuous Variables and Probability Distributions
Obtaining a Probability from a PDF
Graphing a probability density function gives you a probability density plot. These graphs are great for understanding how a PDF in statistics calculates probabilities. The chart below displays the PDF for IQ scores, which is a probability density function of a normal distribution.
On a probability density plot, the total area under the curve equals one, representing the total probability of 1 for the full range of possible values in the distribution. In other words, the likelihood of a value falling anywhere in the complete distribution curve is 1. Finding the chances for a smaller range of interest involves finding the portion of the area under the curve corresponding to that range. In our example, we need to find the percentage of the area that falls between 120 and 140.
In the example above, my statistical software finds that P(120 < IQ < 140) = 0.08738. The probability of a value falling within the range of 120 to 140 is 0.08738.
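Here is a minimal sketch of how software might arrive at that number. It assumes IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15; the post does not state those parameters, so treat them as an assumption. It uses `NormalDist` from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(mean=100, sd=15). These parameters are not given in the post.
iq = NormalDist(mu=100, sigma=15)

# The area under the PDF between 120 and 140 equals the difference
# between the cumulative probabilities at the two endpoints.
p_120_140 = iq.cdf(140) - iq.cdf(120)

print(f"P(120 < IQ < 140) = {p_120_140:.5f}")
```

Under those assumed parameters, the result should land very close to the 0.08738 reported above.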
These graphs also illustrate why probability density functions find a zero likelihood for an individual value. Consider that the probability for a PDF in statistics equals an area. For a non-zero area, you must have both a non-zero height and a non-zero width because height × width = area. In this context, the height is the curve’s height on the graph, while the width relates to the range of values. When you have a single value, you have zero width, which produces zero area. Hence, a zero chance!
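The shrinking-width argument can be sketched numerically. The example below again assumes a Normal(100, 15) distribution for IQ (an assumption, not something the post specifies) and uses a hypothetical helper, `interval_probability`, to show the area collapsing toward zero as the interval around 120 narrows.

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(100, 15), used only as a stand-in for the IQ example.
iq = NormalDist(mu=100, sigma=15)

def interval_probability(center, width):
    """Area under the PDF for an interval of `width` centered on `center`."""
    half = width / 2
    return iq.cdf(center + half) - iq.cdf(center - half)

# The curve's height at 120 stays fixed while the width shrinks to zero.
height = iq.pdf(120)
widths = [10, 1, 0.1, 0.01, 0]
probabilities = [interval_probability(120, w) for w in widths]

for w, p in zip(widths, probabilities):
    print(f"width={w:<5} height*width={height * w:.6f}  area={p:.6f}")
```

For narrow intervals, height × width tracks the true area closely, and at a width of zero the area is exactly zero even though the height is not.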
Mathematics Behind a Probability Density Function
In mathematical terms, you or your software will use integrals with a probability density function to find probabilities. Among other uses, integration finds areas under the curves that mathematical functions describe. That’s perfect for finding likelihoods for intervals in a continuous distribution curve based on a PDF. You’ll usually use statistical software to find PDF probabilities, so I won’t discuss integrals here. For more details, read the Wikipedia article: Integrals.
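For readers curious about what the software does behind the scenes, here is a rough sketch of numerical integration using the midpoint rule. The Normal(100, 15) parameters and the helper name `integrate_pdf` are assumptions for illustration; real statistical packages generally use more refined methods.

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(100, 15).
iq = NormalDist(mu=100, sigma=15)

def integrate_pdf(pdf, a, b, n=10_000):
    """Approximate the integral of `pdf` from a to b with the midpoint rule."""
    step = (b - a) / n
    # Sum the curve's height at the middle of each thin slice, times the slice width.
    return sum(pdf(a + (i + 0.5) * step) for i in range(n)) * step

area = integrate_pdf(iq.pdf, 120, 140)    # area for the interval of interest
exact = iq.cdf(140) - iq.cdf(120)         # same probability from the CDF
total = integrate_pdf(iq.pdf, 0, 200)     # nearly the whole curve

print(f"midpoint rule: {area:.5f}  CDF difference: {exact:.5f}  total area: {total:.5f}")
```

Summing many thin height × width slices reproduces the CDF-based probability, and integrating across nearly the full range gives an area close to 1.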
Examples of Probability Density Functions
While the normal distribution is the most common, it can fit only symmetrical distributions. Fitting skewed distributions requires using other probability density functions. Learn more about the Normal Distribution and Skewed Distributions.
There are a variety of other probability density functions that correspond with distributions of different shapes and properties. Each PDF typically has one to three parameters that define its shape. Before using a PDF to find a probability, you must identify the correct function and parameter values for the population you are studying.
To accomplish that, you’ll typically gather a random sample from that population and use a combination of parameter estimation techniques and hypothesis tests to identify the distribution and parameters that most likely produced your sample.
For more information about that process, read my post about Identifying the Distribution of Your Data.
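As a rough illustration of the parameter estimation step, the sketch below fits a normal distribution to a small sample by maximum likelihood. The sample values are made up for the example, and `log_likelihood` is a hypothetical helper. For a normal distribution, the maximum likelihood estimates are the sample mean and the standard deviation computed with n in the denominator.

```python
import math
from statistics import NormalDist, fmean, pstdev

# Hypothetical sample, invented purely for illustration.
sample = [96, 104, 88, 112, 101, 93, 107, 99, 118, 85]

# Maximum likelihood estimates for a normal distribution.
mu_hat = fmean(sample)       # sample mean
sigma_hat = pstdev(sample)   # standard deviation with n (not n - 1) in the denominator

def log_likelihood(data, mu, sigma):
    """Sum of log densities of the data under Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return sum(math.log(dist.pdf(x)) for x in data)

best = log_likelihood(sample, mu_hat, sigma_hat)
shifted = log_likelihood(sample, mu_hat + 5, sigma_hat)
wider = log_likelihood(sample, mu_hat, sigma_hat * 1.5)

print(f"mu_hat={mu_hat:.2f}, sigma_hat={sigma_hat:.2f}")
print(f"log-likelihood at estimates: {best:.3f}; shifted mean: {shifted:.3f}; wider sd: {wider:.3f}")
```

Moving either parameter away from its estimate lowers the log-likelihood, which is what "most likely to have produced your sample" means in practice. Choosing between candidate distributions is a separate step that this sketch does not cover.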
Let’s look at examples of the lognormal and Weibull distributions, which can fit skewed data.
Lognormal Probability Density Function
A lognormal PDF models right-skewed distributions. Below is an example of the probability density function for a lognormal distribution that displays the distribution of body fat percentages for teenage girls. The data are from a study I performed.
Notice how it shows a right-skewed distribution. Additionally, the probability distribution plot indicates that 18.64% of the population will fall between 20% and 24% body fat. The highest probability densities occur near the peak at 26%.
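To show how an interval probability comes out of a lognormal PDF, here is a hedged sketch. The location and scale values are hypothetical and are not the fitted parameters from the body fat study, which the post does not report, so the output will not reproduce the 18.64% figure. The sketch relies on the fact that a variable is lognormal when its natural log is normally distributed.

```python
import math
from statistics import NormalDist

# Hypothetical parameters for illustration only; NOT the fitted values
# from the body fat study.
mu, sigma = 3.3, 0.25          # location and scale of ln(X)
log_scale = NormalDist(mu=mu, sigma=sigma)

def lognormal_cdf(x):
    """P(X <= x) when ln(X) follows Normal(mu, sigma)."""
    return log_scale.cdf(math.log(x)) if x > 0 else 0.0

p_20_24 = lognormal_cdf(24) - lognormal_cdf(20)

# In a right-skewed lognormal distribution, the mean exceeds the median.
median = math.exp(mu)
mean = math.exp(mu + sigma**2 / 2)

print(f"P(20 < X < 24) = {p_20_24:.4f}")
print(f"median = {median:.2f}, mean = {mean:.2f}")
```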
Weibull PDF
The Weibull distribution is highly flexible, and its PDF can model a wide variety of shapes. The example below displays a left-skewed distribution.
Examples of Other Probability Density Functions
There are a variety of PDFs for other distributions. They include the following:
- Weibull distribution: A remarkably versatile distribution that analysts use in many settings. Can model skewed distributions in both directions and approximate the normal distribution.
- Lognormal distribution: Models right-skewed distributions, especially in cases where growth rates are unconnected to size.
- Exponential distribution: Models distributions where small values occur more frequently than large values. Use to model the amount of time between independent events.
- Gamma distribution: Models right-skewed distributions. Use to model the time until the kth event.
- Uniform distribution: Models symmetric distributions where all equal length ranges have the same probability.
- Beta distribution: Models distributions with all values falling within a finite range.
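Two of the distributions in the list above have simple closed-form probabilities, which makes them easy to sketch. The event rate, the range limits, and both helper function names below are hypothetical choices for illustration.

```python
import math

def exponential_cdf(t, rate):
    """P(T <= t) for an exponential distribution with the given event rate."""
    return 1 - math.exp(-rate * t) if t > 0 else 0.0

def uniform_interval_probability(a, b, low, high):
    """P(a < X < b) for X uniform on [low, high]; the interval is clipped to the support."""
    a, b = max(a, low), min(b, high)
    return max(b - a, 0) / (high - low)

# Exponential: hypothetical rate of 0.5 events per minute.
rate = 0.5
p_short = exponential_cdf(1, rate) - exponential_cdf(0, rate)   # wait of 0 to 1 minute
p_long = exponential_cdf(2, rate) - exponential_cdf(1, rate)    # wait of 1 to 2 minutes

# Uniform on a hypothetical range of 0 to 10: equal-length ranges, equal probability.
p_left = uniform_interval_probability(0, 2, 0, 10)
p_right = uniform_interval_probability(7, 9, 0, 10)

print(f"exponential: P(0-1 min) = {p_short:.4f}, P(1-2 min) = {p_long:.4f}")
print(f"uniform: P(0-2) = {p_left:.2f}, P(7-9) = {p_right:.2f}")
```

The exponential intervals have the same length, yet the one covering smaller values carries more probability, while the two uniform intervals match.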
Cumulative distribution functions show the same type of information but in a different way. Instead of displaying probabilities for x-values, they display probabilities for ≤ x. Learn more about Cumulative Distribution Functions: Uses, Graphs and vs. PDFs.
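A short sketch of what "probabilities for ≤ x" looks like in practice, again assuming IQ follows a Normal(100, 15) distribution (an assumption, not stated in the post):

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(100, 15).
iq = NormalDist(mu=100, sigma=15)

# The CDF reports the probability of a value at or below each score.
scores = [70, 85, 100, 115, 130]
cumulative = [iq.cdf(x) for x in scores]

for x, p in zip(scores, cumulative):
    print(f"P(IQ <= {x}) = {p:.4f}")

# The complement gives the probability of exceeding a score.
p_above_130 = 1 - iq.cdf(130)
print(f"P(IQ > 130) = {p_above_130:.4f}")
```

The cumulative probabilities never decrease as x grows, and they pass through 0.5 at the mean of this symmetric distribution.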
John Taylor says
Perfect Jim!
Thanks a lot, that has really cleared it up.
John Taylor says
Hi Jim,
Thanks for the answer!
I understand that the interval when looking at density between two values is chosen by the researcher, but what I am confused about is the actual value of density at a given point. For example, if density is 0.02 at a certain x value, what does this mean? I understand it is relatively meaningless and not really necessary, but it does confuse me.
So when we integrate, what are we actually summing up beneath the curve? Surely it must be values of density for an individual value, but what does this density mean per value?
Some have said it is the value of density per unit. I don’t quite understand this. Others have presented an example of it being density over a very small window, so x and x + dx, and so when we integrate we sum up all the base × heights (density times dx) over an interval of our choice. If this is the case, why is it the density for an interval size of dx?
In other words, I am relatively okay with the concept of density in terms of finding the area beneath the curve and looking for probability with continuous variables. However, I am struggling to understand what density at a single point means, and how to interpret it. If it is just a figure that tells us relatively how likely our random variable is to be near that x value, how near? I struggle to see how density can have a value for a certain x value when there is an infinite amount of x values along the x-axis!
Thank you in advance!
Jim Frost says
Hi John,
There’s actually a very specific meaning to density at point X (although it’s still really an interval). X is a random variable. There are basically two forms of density that you’ll see, but they’re really the same thing. You can look at the density of a specific x-value or for a range that is of interest to the analyst. But both cases are ranges.
The density at a specific x-value is the probability of an observation falling within a 1-unit length interval that centers on your specific x-value. The unit is literally the value of 1 and it depends on what your variable measures. If the variable is IQ points, a unit is 1 IQ point. If it’s kilograms, it’s 1 kg. Etc. Hence, the density depends on both the value of x and the probability distribution function that describes your random variable. That’s the density you see on the Y-axis of PDF plots or when you calculate it for a specific point. This context for density uses an interval length of 1, half below and half above the x-value, or x ± 0.5.
In the other context, the analyst chooses the interval.
As for what you’re summing, I discuss this in more detail in the post. Reread the section where I discuss the area under the curve and show some examples. Essentially, you set the total area under the curve to equal 1. Then you sum the area beneath the curve for your interval of interest. The proportion of the area under the curve that falls within your interval is the probability density for that range. That’s the density for your interval.
So, in either case (1-unit interval or analyst chosen interval), the density is the proportion of the area under the curve that falls within the interval. The two cases only differ in how you choose the interval.
John Taylor says
Hi Jim,
I do just have one final question just to really clear things up, after thinking about it a bit more.
The final thing I am just a bit confused about (I understand everything you have explained to me so far), is how the rate of change derived from the CDF at a certain point gives us the probability density. You explained this to me already and I understand how over a longer interval, summing up the rates of change would work.
However, the rate of change at a single point on the CDF is the height of the curve at that single point on the PDF. That is also the probability density. So the value of the probability density for a single x value is also the rate of change of the accumulated probability for the CDF at that point.
So how is this rate of change for a single x value on the CDF equivalent to the area under the pdf for a one unit interval around this x value? In other words, how does integrating over an interval of + / – 0.5 around this x value give us the rate of change for that specific point on the CDF?
It just doesn’t really make sense intuitively to me. Density for a single point x (so the y-axis values) is the probability a random x value will fall within a one unit interval around x. And we get the density out of this PDF function when we sub in an x value. This lets us plot the curve as it gives us the height etc.
However, that value we get out of the PDF function is not only what we have outlined (the probability that a random x value will fall within a one unit interval around x) but also the instantaneous rate of change for that single point within the CDF. And this is what I am confused about – how the rate of change for a single point on the CDF is also the probability over an interval within the PDF.
Thanks Jim, I understand I’ve asked a few questions recently so I appreciate you taking the time to reply with some great answers. You have cleared up a lot for me – this is just the last little bit I can’t wrap my head around.
Jim Frost says
A CDF just sums those changes. Hence, cumulative distribution function. The two distribution functions are directly tied together. You have to realize that it’s all about ranges for PDFs and CDFs. Don’t fixate on the single point aspect. It’s often discussed like that but it’s really a one-unit range.
John Taylor says
Hi Jim,
I know this is an old article and I’m not sure if you will see this.
I don’t really understand your definition of probability density.
I understand that probability density arises from it being the gradient of a single point on the CDF (because the PDF is the derivative of the CDF). However, I don’t understand what you mean by the likelihood of the variable falling within an interval of one unit. I don’t understand how big this one unit is. If my x values are in cm’s, then is it the probability that the x value will fall within an interval of 1cm? This doesn’t seem right to me. In my head, I am thinking of the unit being an infinitesimal, but overall I am incredibly confused on a few things.
1. How big is the interval / unit when we are talking about density?
2. How is the size of this interval (whatever it is) determined? If it is 1cm, then how do we get the density for this 1cm from a derivative? If it is a very very very small interval, do we get this interval size from the differentiation?
3. And finally, I am not sure of the relationship between the CDF and PDF in terms of density. Why does the density arise from the rate of change of a point on the CDF? Why does this rate of change tell us the likelihood of a value being close to a point? How close? I have seen some explanations that described density using an interval of x and the point x + dx, and if you multiply the difference (dx) by the density, you get the probability x is in that interval… this seems to match up with the explanation I commonly see that density is probability per unit, as if probability = density x length (dx), then density = probability / length. But this would therefore mean that the unit (length) is dx in this case. But if this is the case, how is this interval determined? Why does the density obtained from the gradient of the CDF give us this magical probability / length concept? Is this probability / unit concept even correct?
Thank you a lot in advance, I hope this reaches you. I know this is a bit of a question dump. Your articles are a life-saver.
Jim Frost says
Hi John,
Whew! That’s a bunch of questions. I’ll tackle some of the main ones.
The first thing to realize is that for a continuous variable, the probability of any individual value is zero. You need to calculate probability for a range of values to get a non-zero answer. While probability in a more traditional sense refers to the likelihood of an event happening within a discrete set of outcomes, density refers to how “concentrated” the probability is around a particular point in a continuous space.
The size of the interval isn’t fixed. You, the analyst, set the size to suit your needs. For example, you might need to know the probability of finding someone with an IQ between 110 and 120. In this case, the interval is 10. The units are whatever you’re measuring, which are IQ points in this example.
When we talk about the probability of a variable falling within a certain interval, say 1 cm, we integrate the PDF over that interval. The size of this interval can be anything we choose, not just 1 cm. It’s a range within which we want to calculate the probability. Again, the interval size is not determined by the PDF. It’s determined by the question we’re asking.
The relationship with the derivative comes in because the PDF is the derivative of the CDF. The CDF gives the probability that a variable is less than or equal to a certain value. The derivative of this function at any point gives the rate at which this probability is changing at that point – which is the density.
The reason density arises from the rate of change of the CDF is that density is about how rapidly the accumulated probability is increasing at a particular point. A high rate of change (steep slope on the CDF) means high density, suggesting a higher likelihood of the random variable falling near that value.
Hopefully that answers your questions. If you have any more, please let me know!
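The derivative relationship discussed in this exchange can be checked numerically. The sketch below is not part of the original reply. It assumes a Normal(100, 15) distribution for IQ and compares the slope of the CDF at a point with the height of the PDF there, then compares density × dx with the exact probability for a small interval.

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(100, 15).
iq = NormalDist(mu=100, sigma=15)
x = 120

# Slope of the CDF at x, estimated with a central difference.
h = 1e-4
slope = (iq.cdf(x + h) - iq.cdf(x - h)) / (2 * h)

# Height of the PDF at x.
density = iq.pdf(x)

# For a small interval [x, x + dx], probability is approximately density * dx.
dx = 0.01
approx = density * dx
exact = iq.cdf(x + dx) - iq.cdf(x)

print(f"CDF slope at {x}: {slope:.6f}   PDF height at {x}: {density:.6f}")
print(f"density * dx: {approx:.8f}   exact probability: {exact:.8f}")
```

The CDF's slope matches the PDF's height, and multiplying that height by a small width closely approximates the probability for the interval.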
Sajid says
What is the range of pdf?
Jim Frost says
Hi Sajid,
The range of values for a PDF depends on which distribution it is. For example, for a normal distribution, the values range from negative to positive infinity. However, for a three-parameter Weibull distribution, it can range from a specific value based on its threshold parameter to positive infinity. For a beta distribution, all values fall within a finite range set by its parameters, although the range of [0, 1] is the most common.
So, you really need to know which PDF and, in some cases, the parameters.
Funsho Olukade says
Thank you Prof for the wonderful explanation. It brings meaning into the formulae and all the statistical tables involved for the various distribution types. Can’t wait for your data science class.
Stéphane Degeye says
Thanks Jim,
It’s very clear now. I understand better why you used probability and likelihood interchangeably because in the context of the discussion there should not be any confusion. Perhaps it is the fact that my native language is not English that I saw the word “likelihood” more as a precise mathematical term and not the use of a synonym.
Stéphane Degeye says
Thanks Jim!
It becomes clearer.
In the normal distribution, Maximum Likelihood consists in finding the parameters mu and variance that will maximize the product of the different probability density values. The confusion comes from my side. I expressed myself badly. I was talking about “likelihood” instead of “probability density”. Thanks again for your support!
Jim Frost says
Hi Stéphane,
No worries at all! That’s why I have this website. I aim to help!
Just for a bit of clarification. Maximum likelihood estimation (MLE) is the process of taking a random sample and estimating the population parameters. For a normal distribution, that’s the mean (mu) and standard deviation (sigma).
Remember that population parameters are never known exactly and that we can only estimate them from samples.
The “likelihood” in MLE specifically refers to the fact that the process finds the population parameters that are the most likely (i.e., have the maximum likelihood) of producing the sample statistics you observe in your sample.
Stéphane Degeye says
Hello Jim,
Thank you for allowing this discussion in all transparency but the mathematical definition of the different notions is not yet clear to me. For the beauty of the text, I understand that sometimes another word is used but it brings confusion when this other word can represent something else mathematically. We often see in the texts that “probability” and “likelihood” are two totally different notions. I think that a concrete use case (with a numerical example) would be interesting to show the meaning of each of these terms. This would allow to answer a question like: How do you call the y-value on a graph (e.g. Normal distribution) for a value x chosen for the random value? This cannot be the probability since the probability of a specific value is zero.
Jim Frost says
On a probability density plot, the y-value is the probability density. I define probability density early in this post.
The x-value relates to the value of a random, continuous variable (X).
I will add something about both points in the section with the graphs to make them clearer.
Probability and likelihood are closely tied together. Indeed, probability is frequently defined as the relative likelihood for an outcome!
Stéphane Degeye says
Hi Jim, you write :
“Unlike distributions for discrete variables where specific values can have non-zero probabilities, the likelihood for a single value is always zero in a continuous probability distribution function.”
=> Shouldn’t the term ‘probability’ be used instead of ‘likelihood’ in this text?
I see in many courses and documents differences regarding the concepts of :
=> Probability Density Function
=> Probability
=> Likelihood
If I understand correctly:
[Probability Density Function] is the general form of the distribution : for instance for Normal Distribution : N (mu, var) and not the value for a specific value of the random variable.
[Probability] is the area under the bell curve and must only be considered for a range in which the random variable falls [a,b]. For continuous random variable, the probability of one specific value (for instance : temperature is 23.567435 °c) is null. For the full range, Probability integrates to 1.
[Likelihood] is the y value for a specific point on the bell curve belonging to a specific value of the random variable (Maximum Likelihood being the ‘MEAN’ Value). Thus, although the probability of the value 23.567435 °c is null, likelihood can be obtained by considering the application of the probability density function for a specific value of the random value for fixed parameters mu and var, so the likelihood of the precise value 23.567435 °c can be computed.
Is my analysis correct?
Thanks in advance!
Jim Frost says
Hi Stéphane,
Most of what you write is correct. But here are a few clarifications.
Probability and likelihood are synonyms in this context. The problem with writing about this stuff is reusing the word “probability” over and over gets repetitive. Use the probability density function to find a probability. Add in discussions about probability density and probability density plots, and the word is used over and over. Hence, synonyms, such as likelihood and chance! Take them all to mean the same in this context. And, indeed, you can’t define a word using the word itself, so for a variety of reasons it’s good to use synonyms.
Probability density function is a function for a continuous variable whose integral provides the probability for an interval. It doesn’t have to be the bell curve (i.e., normal distribution), but it can be.
Likelihood is not a specific point on a curve (bell curve or otherwise). Again, as I point out in this post, the probability (or likelihood) for a single point of a continuous variable is always zero. There is a maximum likelihood function that you might be thinking of. But that’s a process that’s used to estimate the distribution parameters. It finds the population parameters most likely to have produced your observed random sample. You need those parameter estimates to plug into the PDF.
I hope that helps!
Anders Kallner says
Your recent letter about PDF raises some concerns of the nomenclature you use. As always, a term represents what you decide it shall describe. The decision must be unanimously understood and accepted. If you do not know what a spade is there is no use to call a spade a spade, you may call the item anything! Not only globalization makes it important that critical terms are properly described. In metrology the standard publication is the BIPM/VIM (www.bipm.org/vim). On page XII “range” and “interval” are discussed. “Range” is defined by a single number with no place on the number line whereas “interval” is represented by the values of the lower and upper limits and thus fixed on the number line. Therefore, your statement that in a PDF of a continuous variable the probability cannot be given for a single value, only for a range raises an eye-brow or two! No doubt my comment may be regarded as a de lana caprina rixari but particularly for us aliens the Anglo-Saxon laissez-faire attitude to linguistic standardization often is a source of confusion, particularly when the colloquial meaning of a scientific term is implied rather than that defined. I suppose this is partly because of the dominance of the “English language” in science.
There are many more examples; those which you may come across are, for instance, “accuracy” and “sensitivity”. See VIM!
Jim Frost says
Hi Anders,
I am sure that your comment is meant as humorous, but it raises my eyebrow (to use your expression). Yes, there are cases where “range” is used in the way you suggest. However, that’s not the only acceptable use. All languages have their quirks!
Let me introduce you to Cambridge dictionary definition for “range.” Two of the definitions are the following:
“[Noun] The amount, number, or type of something between an upper and a lower limit:
* The price range is from $100 to $500.
* The product is aimed at young people in the 18–25 age range.”
“[Verb] To have an upper and a lower limit in amount, number, etc.:
* Dress sizes range from petite to extra large.
* Prices range between $50 and $250.”
In my IQ example in this post, I talk about IQ scores that range [verb] from 120 to 140. Or, you can say that the IQ range [noun] is 120 to 140. Interval is also correct.