What is the Lognormal Distribution?
Statisticians use the lognormal distribution to model growth rates that are independent of size, a pattern that occurs frequently in biology and finance. It also models time to failure in reliability studies, rainfall amounts, species abundance, and the number of moves in chess games. Read my post to see how it models global income distributions. In my post about how to identify the distribution of your data, I find that the lognormal distribution provides the best fit to my data on the body fat percentages of middle school girls.
As the name implies, the lognormal distribution is related to logs and the normal distribution. Let’s see how that works!
If your data follow a lognormal distribution and you transform them by taking the natural log of all values, the new values will fit a normal distribution. In other words, when your variable X follows a lognormal distribution, ln(X) fits a normal distribution. Hence, you take the logs and get a normal distribution . . . lognormal.
The reverse also works. If a variable Y follows a normal distribution, exponentiating it, exp(Y), produces a lognormal distribution. In this manner, you can transform back and forth between pairs of related lognormal and normal distributions.
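Here is a minimal sketch of that round trip using only Python's standard library. The location and scale values (1.0 and 0.5) are hypothetical numbers I picked for illustration, not values from any dataset in this post.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

mu, sigma = 1.0, 0.5  # hypothetical location and scale, for illustration only

# Draw from a normal distribution, then exponentiate to get lognormal values.
normal_values = [random.gauss(mu, sigma) for _ in range(50_000)]
lognormal_values = [math.exp(v) for v in normal_values]

# Taking natural logs transforms the lognormal values back to normal ones.
logged_values = [math.log(x) for x in lognormal_values]

print(f"mean of ln(X):  {statistics.fmean(logged_values):.3f} (target {mu})")
print(f"stdev of ln(X): {statistics.stdev(logged_values):.3f} (target {sigma})")
```

The mean and standard deviation of the logged values should land close to the location and scale you started with, apart from sampling noise.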
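The sum of many independent and identically distributed (IID) variables frequently produces a normal distribution. However, the product of many positive IID variables tends to produce a lognormal distribution. Consider the following to understand why:
If y = x1 × x2 × x3, then ln(y) = ln(x1) + ln(x2) + ln(x3)
Taking logs turns the product into a sum, and a sum of many IID terms is approximately normal. Therefore, ln(y) is approximately normal, which makes y approximately lognormal.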
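To see the product rule in action, here is a small simulation sketch. The choice of 50 factors, each uniform between 0.5 and 1.5, is an arbitrary assumption for illustration. Any positive IID factors should behave similarly. The skewness helper is my own simple moment-based calculation, not a library function.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

def product_of_factors(n_factors):
    """Multiply n_factors IID positive factors, each uniform on [0.5, 1.5]."""
    product = 1.0
    for _ in range(n_factors):
        product *= random.uniform(0.5, 1.5)
    return product

def skewness(values):
    """Moment-based sample skewness (near 0 for symmetric data)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

products = [product_of_factors(50) for _ in range(20_000)]
log_products = [math.log(p) for p in products]

raw_skew = skewness(products)      # strongly right-skewed, like a lognormal
log_skew = skewness(log_products)  # close to zero, like a normal

print(f"skewness of the products:     {raw_skew:.2f}")
print(f"skewness of ln(the products): {log_skew:.2f}")
```

The raw products have a long right tail, while their logs are roughly symmetric, which is what you would expect if the products are approximately lognormal.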
Lognormal Distribution Parameters
There are several ways to parameterize the lognormal distribution. I’ll use the location, scale, and threshold parameters. The values of the location and scale parameters relate to the normal distribution that the log-transformed data follow, which statisticians also refer to as the logged distribution.
Specifically, when the log-transformed data follow a normal distribution with a mean of µ and a standard deviation of σ, the lognormal distribution uses these values as its location and scale parameters, respectively.
The threshold parameter defines the minimum value in a lognormal distribution. All values must be greater than the threshold. Therefore, a negative threshold lets the distribution handle both positive and negative values, while a threshold of zero restricts the distribution to positive values.
When you hold the location and scale parameters constant, the threshold shifts the distribution left and right, as shown below.
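The shift is easy to check numerically. The sketch below assumes the common three-parameter form, X = threshold + exp(Z), where Z is normal with the location as its mean and the scale as its standard deviation. The helper name sample_lognormal and the parameter values are hypothetical choices for illustration.

```python
import random

def sample_lognormal(location, scale, threshold, n, seed=0):
    """Draw n values from a three-parameter lognormal: threshold + exp(Z)."""
    rng = random.Random(seed)  # same seed gives the same underlying draws
    return [threshold + rng.lognormvariate(location, scale) for _ in range(n)]

location, scale = 0.0, 0.5  # held constant

base = sample_lognormal(location, scale, threshold=0.0, n=10_000)
shifted = sample_lognormal(location, scale, threshold=-2.0, n=10_000)

print(f"threshold  0: min = {min(base):.3f}, max = {max(base):.3f}")
print(f"threshold -2: min = {min(shifted):.3f}, max = {max(shifted):.3f}")
```

Because both samples use the same seed, every value in the second sample is just the matching value in the first sample moved two units to the left. The shape is unchanged, but the second sample now includes negative values.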
Lognormal Location Parameter (µ)
The location represents the peak (mean, median, and mode) of the normally distributed logged data. To find the median of the lognormal distribution when the threshold is zero, raise e to the power of the location value (e^location).
In the graph below, I hold the threshold and scale parameters constant to highlight the effect of changing the location parameter.
The plot below is from my post where I use these distributions to model global incomes. It displays the probability density function for this lognormal distribution and illustrates how the location parameter determines the median. Learn more about Probability Density Functions.
I’ve shaded 50% of the distribution, which corresponds to the median value of 28,788. You can also obtain this value by raising e to the power of the location value. In this case, e^10.2677 ≈ 28,788.
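You can reproduce that calculation and check it against simulated data with a few lines of Python. The location value comes from the income example above. The scale of 1.0 is a hypothetical value I chose only so the simulation has something to work with, and the sketch assumes a threshold of zero.

```python
import math
import random

location = 10.2677       # location from the global income example
hypothetical_scale = 1.0  # assumed for illustration; not from the example

# Median of a zero-threshold lognormal: e raised to the location.
median_from_location = math.exp(location)
print(f"e^{location} = {median_from_location:,.1f}")

# Simulate draws and count how many fall below that median.
rng = random.Random(123)
draws = [rng.lognormvariate(location, hypothetical_scale) for _ in range(50_000)]
share_below = sum(d < median_from_location for d in draws) / len(draws)
print(f"share of simulated values below the median: {share_below:.3f}")
```

The share below the calculated median should be close to 0.5 whatever scale you choose, because the scale changes the spread but not the median.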
Scale Parameter (σ)
The scale represents the standard deviation of the normally distributed data.
In the chart below, I hold the threshold and location parameters constant to emphasize the effect of changing the scale parameter.
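To put numbers on the scale's effect, the sketch below uses the standard closed-form expressions for a zero-threshold lognormal distribution: median = e^µ, mean = e^(µ + σ²/2), and mode = e^(µ − σ²). These formulas are textbook results rather than something derived in this post, and the helper name and the scale values are hypothetical.

```python
import math

def lognormal_summary(location, scale):
    """Closed-form median, mean, and mode for a zero-threshold lognormal."""
    return {
        "median": math.exp(location),
        "mean": math.exp(location + scale**2 / 2),
        "mode": math.exp(location - scale**2),
    }

location = 0.0  # held constant
summaries = {scale: lognormal_summary(location, scale) for scale in (0.25, 0.5, 1.0)}

for scale, s in summaries.items():
    print(f"scale {scale}: mode = {s['mode']:.3f}, "
          f"median = {s['median']:.3f}, mean = {s['mean']:.3f}")
```

As the scale increases, the median stays put while the mode moves left and the mean moves right. That growing gap is the longer right tail you see when the scale gets larger.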