## What is the Lognormal Distribution?

The lognormal distribution is a continuous probability distribution that models right-skewed data. The shape of the lognormal distribution is comparable to the Weibull and loglogistic distributions.

Statisticians use this distribution to model growth rates that are independent of size, which frequently occur in biology and finance. It also models time to failure in reliability studies, rainfall amounts, species abundance, and the number of moves in chess games. Read my post to see how it models global income distributions. In my post about how to identify the distribution of your data, I find that the lognormal distribution provides the best fit for my data on the body fat percentages of middle school girls.

As the name implies, the lognormal distribution is related to logs and the normal distribution. Let’s see how that works!

If your data follows a lognormal distribution and you transform it by taking the natural log of all values, the new values will fit a normal distribution. In other words, when your variable X follows a lognormal distribution, ln(X) follows a normal distribution. Hence, you take the logs and get a normal distribution . . . lognormal.

Conversely, if X follows a normal distribution, exponentiating it (exp(X)) produces a lognormal distribution. In this manner, you can transform back and forth between pairs of related lognormal and normal distributions.
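Here's a minimal Python sketch of that round trip using only the standard library. It draws a few normal values with `random.gauss`, exponentiates them to get lognormal values, and then takes natural logs to recover the originals. The parameters (µ = 0, σ = 1) and the seed are arbitrary choices for illustration.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

mu, sigma = 0.0, 1.0  # illustrative parameters of the normal distribution

# Draw normally distributed values
normal_values = [random.gauss(mu, sigma) for _ in range(5)]

# Exponentiate: exp(X) follows a lognormal distribution
lognormal_values = [math.exp(x) for x in normal_values]

# Take natural logs: ln(exp(X)) recovers the original normal values
recovered = [math.log(y) for y in lognormal_values]

for x, y, back in zip(normal_values, lognormal_values, recovered):
    print(f"normal {x:+.4f} -> lognormal {y:.4f} -> ln {back:+.4f}")
```

Notice that every exponentiated value is positive even when the normal value is negative, which is why a lognormal distribution with a threshold of zero contains only positive values.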

The sum of many independent and identically distributed (IID) variables tends toward a normal distribution (the central limit theorem). Similarly, the product of many positive IID variables tends toward a lognormal distribution. Consider the following to understand why:

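$$y = x_1 x_2 x_3 \quad\Rightarrow\quad \ln(y) = \ln(x_1) + \ln(x_2) + \ln(x_3)$$

Taking logs turns the product into a sum. When many positive IID factors are multiplied together, the sum of their logs tends toward a normal distribution, so the product itself tends toward a lognormal distribution.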
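To see this numerically, the Python sketch below multiplies many positive IID factors and compares the skewness of the products with the skewness of their logs. The uniform(0.8, 1.2) factors, the 50 factors per product, and the 10,000 repetitions are arbitrary choices for illustration; other positive IID factors whose logs have finite variance should behave similarly.

```python
import math
import random
import statistics

random.seed(42)  # reproducible illustration

def skewness(values):
    """Sample skewness: the average cubed z-score (population formula)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

n_factors = 50       # number of IID factors multiplied together
n_products = 10_000  # number of simulated products

products = []
for _ in range(n_products):
    y = 1.0
    for _ in range(n_factors):
        y *= random.uniform(0.8, 1.2)  # positive IID factor
    products.append(y)

logs = [math.log(y) for y in products]  # ln(y) is a sum of ln(x_i)

print(f"skewness of products:     {skewness(products):.2f}")  # right-skewed
print(f"skewness of ln(products): {skewness(logs):.2f}")      # near 0, like a normal
```

The products are clearly right-skewed, while their logs are nearly symmetric, which is the pattern you'd expect if the products are approximately lognormal.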

Because of the multiplication process behind lognormal distributions, the geometric mean can be a better measure of central tendency than the arithmetic mean for this distribution.
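The Python sketch below illustrates the point with simulated data. It draws lognormal values with `random.lognormvariate`, using an assumed location of 2 and scale of 0.8, and compares the arithmetic mean, the geometric mean, and the median. For lognormal data, the geometric mean estimates e^µ, which is also the median, while the long right tail pulls the arithmetic mean higher.

```python
import math
import random
import statistics

random.seed(7)  # reproducible illustration

mu, sigma = 2.0, 0.8  # assumed location and scale for this example
data = [random.lognormvariate(mu, sigma) for _ in range(50_000)]

am = statistics.fmean(data)           # arithmetic mean, pulled up by the right tail
gm = statistics.geometric_mean(data)  # exp(mean of the logs)
med = statistics.median(data)

print(f"arithmetic mean: {am:.3f}")
print(f"geometric mean:  {gm:.3f}")
print(f"median:          {med:.3f}")
print(f"e^mu:            {math.exp(mu):.3f}")
```

The geometric mean and the median both land near e^µ, whereas the arithmetic mean sits noticeably higher.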

## Lognormal Distribution Parameters

There are several ways to parameterize the lognormal distribution. I’ll use the location, scale, and threshold parameters. The values of the location and scale parameters relate to the normal distribution that the log-transformed data follow, which statisticians also refer to as the logged distribution.

Specifically, when the logged data follow a normal distribution with mean µ and standard deviation σ, the lognormal distribution uses these values as its location and scale parameters, respectively.
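A quick way to see this parameterization in Python is with the standard library's `random.lognormvariate(mu, sigma)`, whose documentation defines `mu` and `sigma` as the mean and standard deviation of the logged normal distribution. The sketch below, with assumed values µ = 1.5 and σ = 0.4, draws a sample and checks that the mean and standard deviation of the logs land close to those parameters. It does not model a threshold, so the threshold is effectively zero.

```python
import math
import random
import statistics

random.seed(3)  # reproducible illustration

mu, sigma = 1.5, 0.4  # assumed location and scale for illustration

# random.lognormvariate(mu, sigma) draws values whose natural logs are
# normally distributed with mean mu and standard deviation sigma
sample = [random.lognormvariate(mu, sigma) for _ in range(100_000)]
logged = [math.log(x) for x in sample]

print(f"mean of ln(x):  {statistics.fmean(logged):.3f}  (location mu = {mu})")
print(f"stdev of ln(x): {statistics.stdev(logged):.3f}  (scale sigma = {sigma})")
```

If you work with SciPy instead, `scipy.stats.lognorm` expresses the same three parameters as `s` (σ), `loc` (the threshold), and `scale` (e^µ).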

### Threshold Parameter

The threshold parameter sets the lower bound of a lognormal distribution: all values must be greater than the threshold. Therefore, a negative threshold lets the distribution include both negative and positive values, while a threshold of zero restricts it to positive values.

When you hold the location and scale parameters constant, the threshold shifts the distribution left and right, as shown below.
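To make the threshold's role concrete, here's a sketch of the three-parameter lognormal density written from its standard formula. The function name `lognormal_pdf` is hypothetical, not from any particular library. Values at or below the threshold get zero density, and changing the threshold slides the whole curve left or right without changing its shape.

```python
import math

def lognormal_pdf(x, mu, sigma, threshold=0.0):
    """Density of a three-parameter lognormal distribution (hypothetical helper).

    Values at or below the threshold have zero density; above it,
    ln(x - threshold) is normal with mean mu and standard deviation sigma.
    """
    if x <= threshold:
        return 0.0
    shifted = x - threshold
    z = (math.log(shifted) - mu) / sigma
    return math.exp(-0.5 * z * z) / (shifted * sigma * math.sqrt(2 * math.pi))

# Same location and scale, different thresholds: the curve shifts sideways
for theta in (-2.0, 0.0, 2.0):
    densities = [round(lognormal_pdf(x, 0.0, 0.5, theta), 4) for x in (-1.0, 1.0, 3.0)]
    print(f"threshold={theta:+.1f}: {densities}")
```

With a threshold of -2, the value -1 has positive density; with a threshold of 0, it has none.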

### Lognormal Location Parameter (µ)

The location represents the peak (mean, median, and mode) of the normally distributed logged data. In the lognormal distribution, raise e to the power of the location value (e^{location}) to find the median, assuming a threshold of zero. With a nonzero threshold, add the threshold to that value.

In the graph below, I hold the threshold and scale parameters constant to highlight the effect of changing the location parameter.

The plot below is from my post where I use these distributions to model global incomes. It illustrates how the location parameter determines the median of this distribution. The graph displays the probability density function (PDF) for this lognormal distribution. Learn more about Probability Density Functions.

I’ve shaded 50% of the distribution, which corresponds to the median value of 28,788. You can also obtain this value by raising e to the power of the location value. In this case, e^{10.2677} ≈ 28,788.
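The median calculation for the income example is easy to reproduce in Python. This sketch uses the location value 10.2677 from that example and assumes a threshold of zero, which is consistent with e^{10.2677} ≈ 28,788. The scale value is hypothetical because the median does not depend on it; the CDF check uses the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

location = 10.2677  # location parameter from the income example
threshold = 0.0     # assumed zero, consistent with e^10.2677 = 28,788
scale = 1.0         # hypothetical: the scale does not affect the median

median = threshold + math.exp(location)
print(f"median = {median:,.0f}")

# Check: the CDF at the median is 0.5, because ln(median - threshold)
# equals the location, which is the median of the logged normal data
cdf_at_median = NormalDist(location, scale).cdf(math.log(median - threshold))
print(f"CDF at median = {cdf_at_median:.3f}")
```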

### Scale Parameter (σ)

The scale represents the standard deviation of the normally distributed logged data. Larger scale values make the lognormal distribution more right-skewed.

In the chart below, I hold the threshold and location parameters constant to emphasize the effect of changing the scale parameter.
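A numeric way to see the scale parameter's effect is to compare percentiles. This sketch holds the location and threshold at 0, varies σ, and computes the 5th, 50th, and 95th percentiles by exponentiating normal quantiles from `statistics.NormalDist`. The helper `lognormal_quantile` is a hypothetical name for illustration. The median stays at e^0 = 1 for every σ, while larger scale values stretch the upper tail much more than the lower one.

```python
import math
from statistics import NormalDist

def lognormal_quantile(p, mu, sigma, threshold=0.0):
    """Quantile of a lognormal distribution (hypothetical helper).

    exp() is monotone, so the lognormal quantile is the threshold plus
    exp() of the matching quantile of the logged normal distribution.
    """
    return threshold + math.exp(NormalDist(mu, sigma).inv_cdf(p))

mu = 0.0  # hold the location (and a zero threshold) constant
for sigma in (0.25, 0.5, 1.0):
    q05, q50, q95 = (lognormal_quantile(p, mu, sigma) for p in (0.05, 0.5, 0.95))
    # The median stays at e^mu, but the upper tail stretches more than the lower one
    print(f"sigma={sigma}: 5th={q05:.3f}  median={q50:.3f}  95th={q95:.3f}")
```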

Andrew C says

G’day Jim – I discovered your website yesterday and love it.

I have a gnarly business question for you.

This is about supply, returns and (constrained) sales. Consider the case of a company that makes fresh product daily and delivers it daily to client outlets (call this “supply”). It has a one-day shelf life. The company reps go into the client stores to supply the product; they also collect “returns” (product the client store did not sell to its consumers) from the previous day. So net sales are considered to be “supply less returns”. If returns > 0, then we can be sure that sales equal the real consumer demand. The problem is, if returns = 0, then we don’t know whether we constrained sales because our initial supply was too low.

My question: is there a way of inferring what the sales would have been (if unconstrained) by using a statistical approach? I can eyeball a sales frequency histogram and see clearly that sales have been cut off – like truncating the right tail of a normal (or lognormal) distribution. Is there a way to estimate the “shape” of the missing data? (PS. I can confirm that the highest sales records are also the ones with 0 returns.) Have you ever had to deal with this kind of problem?!

vanceh says

Hi Jim,

I’m interested in forecasting expected values of processes that have lognormal distributions. The literature seems to always use the arithmetic mean (AM) for these forecasts, but my understanding is that the probability of meeting or exceeding forecasts based on the probability-weighted average drops as the variance increases. For example, if I roll a die 10K times and multiply the values from the rolls together, it appears to me that the standard methods for this problem forecast the “Expected Value” as E[X]^10K, where E[X] = 3.5. This seems absurd to me, because the Law of Large Numbers would insist that the distribution of those rolls would be quite close to uniform. This would give a result close to the geometric mean of a standard die’s values to the Nth power, ~2.9938^10K. By my calculations, the 3.5^10K result would require a roughly 26-sigma event (!) to meet or exceed, hardly something one should “expect”. Where am I going wrong here? Thanks, Vance