What is the Lognormal Distribution?
The lognormal distribution is a continuous probability distribution that models right-skewed data. The unimodal shape of the lognormal distribution is comparable to the Weibull and loglogistic distributions.
Statisticians use this distribution to model growth rates that are independent of size, which frequently occur in biology and finance. It also models time to failure in reliability studies, rainfall amounts, species abundance, and the number of moves in chess games. Read my post to see how it models global income distributions. In my post about how to identify the distribution of your data, I found that the lognormal distribution provides the best fit to my data for the body fat percentages of middle school girls.
As the name implies, the lognormal distribution is related to logs and the normal distribution. Let’s see how that works!
If your data follow a lognormal distribution and you transform them by taking the natural log of all values, the new values will fit a normal distribution. In other words, when your variable X follows a lognormal distribution, ln(X) follows a normal distribution. Hence, you take the logs and get a normal distribution . . . lognormal.
Conversely, you can exponentiate normally distributed values to obtain a lognormal distribution: if Y follows a normal distribution, exp(Y) follows a lognormal distribution. In this manner, you can transform back and forth between pairs of related lognormal and normal distributions.
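To make the back-and-forth transformation concrete, here is a minimal Python sketch using only the standard library. The parameter values (a mean of 1.0 and a standard deviation of 0.5 for the underlying normal distribution) and the sample size are assumptions chosen purely for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed parameters of the underlying normal distribution (illustrative only).
mu, sigma = 1.0, 0.5

# Draw lognormal values, then take the natural log of each one.
x = [random.lognormvariate(mu, sigma) for _ in range(50_000)]
log_x = [math.log(v) for v in x]

# The logged values should have a mean near mu and a standard deviation near sigma.
log_mean = statistics.fmean(log_x)
log_sd = statistics.stdev(log_x)
print(round(log_mean, 2), round(log_sd, 2))

# Exponentiating the logged values takes you back to the original data.
round_trip = [math.exp(v) for v in log_x]
max_rel_error = max(abs(a - b) / a for a, b in zip(x, round_trip))
print(max_rel_error)
```

The logged values recover the normal parameters to within sampling error, and exponentiating them returns the original lognormal values up to floating-point rounding.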
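The sum of many independent and identically distributed (IID) variables frequently produces an approximately normal distribution. Similarly, the product of many positive IID variables tends to produce a lognormal distribution. Consider the following to understand why:
If y = x1 × x2 × x3, then ln(y) = ln(x1) + ln(x2) + ln(x3)
Taking logs turns the product into a sum, and sums tend toward normality. Consequently, ln(y) is approximately normal, which makes y approximately lognormal.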
Because of the multiplication process behind lognormal distributions, the geometric mean can be a better measure of central tendency than the arithmetic mean for this distribution.
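The multiplication idea can be checked with a small simulation. The sketch below multiplies 50 random growth factors together many times. The uniform range of 0.8 to 1.3 for each factor is a hypothetical choice, not something taken from real data. It then compares the arithmetic mean, geometric mean, and median of the products.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def product_of_factors(n_factors=50):
    """Multiply n_factors IID positive growth factors together."""
    y = 1.0
    for _ in range(n_factors):
        y *= random.uniform(0.8, 1.3)  # hypothetical growth factor
    return y

products = [product_of_factors() for _ in range(20_000)]

# The products are right-skewed, so the arithmetic mean sits above the median.
arith_mean = statistics.fmean(products)
median = statistics.median(products)

# The geometric mean is exp(mean of the logs), which lands near the median.
geo_mean = statistics.geometric_mean(products)
log_mean = statistics.fmean(math.log(p) for p in products)

print(arith_mean, geo_mean, median)
```

In this simulation, the long right tail pulls the arithmetic mean well above the median, while the geometric mean stays close to it.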
Lognormal Distribution Parameters
There are several ways to parameterize the lognormal distribution. I’ll use the location, scale, and threshold parameters. The values of the location and scale parameters relate to the normal distribution that the log-transformed data follow, which statisticians also refer to as the logged distribution.
Specifically, when you have a normal distribution with the mean of µ and a standard deviation of σ, the lognormal distribution uses these values as its location and scale parameters, respectively.
Threshold Parameter
The threshold parameter defines the lower bound of a lognormal distribution. All values must be greater than the threshold. Therefore, a negative threshold lets the distribution include both positive and negative values, while a threshold of zero restricts it to positive values.
When you hold the location and scale parameters constant, the threshold shifts the distribution left and right, as shown below.
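As a rough numerical illustration of the shift, the sketch below generates a three-parameter lognormal sample as threshold + exp(normal value). The function name, the parameter values, and the three thresholds are all assumptions for demonstration. Reusing the same seed for each threshold means only the shift differs between the samples.

```python
import random
import statistics

def shifted_lognormal(mu, sigma, threshold, n):
    """Sample threshold + exp(N(mu, sigma)). The name is illustrative."""
    return [threshold + random.lognormvariate(mu, sigma) for _ in range(n)]

mu, sigma = 0.0, 0.5  # location and scale held constant (assumed values)

results = {}
for threshold in (-2.0, 0.0, 3.0):
    random.seed(2)  # same underlying draws, so only the threshold changes
    sample = shifted_lognormal(mu, sigma, threshold, 10_000)
    results[threshold] = (min(sample), statistics.median(sample))
    print(threshold, results[threshold])
```

Every value stays above its threshold, and the median moves by exactly the amount the threshold changes, which is the left and right shift described above.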
Lognormal Location Parameter (µ)
The location represents the peak (mean, median, and mode) of the normally distributed data. To find the median of the lognormal distribution, raise e to the power of the location value (e^location). If the threshold is not zero, add it to this value.
In the graph below, I hold the threshold and scale parameters constant to highlight the effect of changing the location parameter.
The plot below is from my post where I use these distributions to model global incomes. It illustrates how the location parameter determines the median of this distribution. The graph displays the probability density function for this lognormal distribution. Learn more about Probability Density Functions.
I’ve shaded 50% of the distribution, which corresponds to the median value of 28,788. You can also obtain this value by raising e to the power of the location value. In this case, e^10.2677 ≈ 28,788.
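You can verify this arithmetic in a couple of lines of Python. The location value of 10.2677 is the one quoted above. The scale value in the sketch is a hypothetical placeholder, because the check at the median does not depend on it.

```python
import math
from statistics import NormalDist

location = 10.2677  # location parameter quoted above
scale = 1.0         # hypothetical scale; the median does not depend on it

# Median of the lognormal distribution (threshold of zero assumed).
median = math.exp(location)
print(round(median))

# Half of the underlying normal distribution lies below ln(median).
prob_below = NormalDist(location, scale).cdf(math.log(median))
print(prob_below)
```

Because ln(median) equals the location, exactly half of the underlying normal distribution falls below it, which matches the 50% shaded region.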
Scale Parameter (σ)
The scale represents the standard deviation of the normally distributed data.
In the chart below, I hold the threshold and location parameters constant to emphasize the effect of changing the scale parameter.
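One way to see the effect of the scale parameter numerically is to compare the mean and median as σ grows. With a threshold of zero, the median stays at e^µ while the mean equals e^(µ + σ²/2), so a larger scale stretches the right tail. The sketch below uses assumed values (µ = 0 and three illustrative scales) and checks the simulated means against that formula.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

mu = 0.0  # location held constant (assumed value)

summary = {}
for sigma in (0.25, 0.5, 1.0):  # illustrative scale values
    sample = [random.lognormvariate(mu, sigma) for _ in range(100_000)]
    summary[sigma] = {
        "median": statistics.median(sample),
        "mean": statistics.fmean(sample),
        "theoretical_mean": math.exp(mu + sigma**2 / 2),
    }
    print(sigma, summary[sigma])
```

The medians all sit near e^0 = 1, while the means climb as the scale increases.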
Katherine says
Hello Jim,
Thank you very much Jim, this helps a lot. Since I do not have the raw data, I do not know the shift. However, because my variable is a bio-accumulation factor (BAF), I know that the minimum value should be greater than zero. I wonder if I can make an assumption about the shift parameter. Could it be assumed to be a small positive value such as 0.01?
Katherine Parakal says
Hello Jim,
Thank you for the information on the lognormal distribution. It really helped me understand more about it. I have a question. I only have the arithmetic mean and standard deviation of a data set. I do not have the raw data. However, according to the literature, these data are likely to be lognormally distributed. Is there a way to calculate the location parameter and shift parameter using just the arithmetic mean and standard deviation?
Jim Frost says
Hi Katherine
Thank you for your question and for finding my explanation of the log-normal distribution helpful.
Yes, it is possible to estimate the parameters of a log-normal distribution using only the arithmetic mean and standard deviation of your data, assuming the data follow a log-normal distribution with no shift. Here’s how you can do it:
To estimate the location parameter (mu) and scale parameter (sigma) of the log-normal distribution, we use the following transformations:
1. Location Parameter (mu): mu = ln((M^2) / sqrt(SD^2 + M^2))
2. Scale Parameter (sigma): sigma = sqrt(ln((SD^2 / M^2) + 1))
Where:
– M is the arithmetic mean of your data.
– SD is the standard deviation of your data.
– ln is the natural logarithm.
– sqrt is the square root function.
These equations assume that the data fits a pure log-normal distribution without any shift. If you suspect that the data might be a shifted log-normal distribution (i.e., it doesn’t start at zero), additional information about the shift would be necessary.
This method provides a reasonable estimate of the parameters under the assumption that the given mean and standard deviation accurately represent the underlying log-normal distribution.
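The two formulas above can be sketched in Python as follows. The function name and the example mean and standard deviation (10 and 5) are hypothetical, and the sketch assumes a log-normal distribution with no shift. The final lines convert the estimates back to a mean and standard deviation as a sanity check.

```python
import math

def lognormal_params_from_moments(mean, sd):
    """Estimate (mu, sigma) from an arithmetic mean and SD, assuming no shift."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must both be positive")
    mu = math.log(mean**2 / math.sqrt(sd**2 + mean**2))
    sigma = math.sqrt(math.log(sd**2 / mean**2 + 1))
    return mu, sigma

# Hypothetical summary statistics, for illustration only.
mu, sigma = lognormal_params_from_moments(mean=10.0, sd=5.0)
print(mu, sigma)

# Sanity check: convert the parameters back to the arithmetic mean and SD.
mean_back = math.exp(mu + sigma**2 / 2)
sd_back = math.sqrt((math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2))
print(mean_back, sd_back)
```

If the round trip returns the mean and standard deviation you started with, the parameters are consistent with those summary statistics.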
I hope this helps!
Greg says
This info is not terribly useful without how to calculate the parameters from your data.
It’s not often that anyone constructs a distribution from scratch as opposed to fitting a distribution to observed data.
Jim Frost says
I’m sure there’s a nicer way to phrase your question. ๐
(I almost didn’t approve the comment because of your tone.)
At any rate, read How to Identify the Distribution of your Data for more information on that topic.
Raymond A Sirianne says
You give a location parameter of 10.2677 for the graph of US Income in 2006, but you don’t say where that number came from. Could you clarify, please?
Jim Frost says
Hi Raymond,
I include the reference right on the graph. I include that article in the reference section of the following blog post: A Statistical Thanksgiving: Global Income Distributions. They conducted a rigorous income study for the numerous countries they include in their article and I used their parameter estimates. Please go to my other blog post to get the reference for the original study and read that to see how they estimated the parameters. It was quite an extensive process and far too lengthy to cover here.
diana kornbrot says
I need to find the shift parameter of a SHIFTED lognormal distribution for a variable t. The distribution is such that log(t - t0) is normally distributed. I need t0 and the mean of the distribution. My design has 20 people in 4 conditions, so there are 80 distributions. I do not want to do this by hand! I have access to SPSS. Any help gratefully received.
Andrew C says
G’day Jim – I discovered your website yesterday and love it.
I have a gnarly business question for you.
This is about supply, returns and (constrained) sales. Consider the case of a company that makes fresh product daily, and delivers it daily to client outlets (call this “supply”). It has a one day shelf life. The company reps go into the client stores to supply the product; they also collect “returns” (product that the client store did not sell to its consumers) from the previous day. So net sales are considered to be “supply less returns”. If returns > 0, then we can be sure that sales are equal to the real consumer demand. The problem is, if returns = 0, then we don’t know whether we constrained the sales because our initial supply was too low.
My question: is there a way of inferring what the sales would have been (if unconstrained) by using a statistical approach? I can eyeball a sales frequency histogram and see clearly that sales have been cut off – like truncating the right side of the tail of a normal distribution (or lognormal). Is there a way to estimate the “shape” of the missing data? (PS. I can confirm that the highest sales records are also the ones with 0 returns). Have you ever had to deal with this kind of problem?!
vanceh says
Hi Jim,
I’m interested in forecasting expected values of processes that have lognormal distributions. The literature seems to always use the arithmetic mean (AM) for these forecasts, but my understanding is that the probability of meeting or exceeding forecasts based on the probability-weighted mean drops with increased variance. For example, if I throw a die 10K times and multiply the values from the throws together, it appears to me that the standard methods for this problem forecast the “Expected Value” as E[X]^10K, where E[X] = 3.5. This seems absurd to me, because the Law of Large Numbers would insist that the distribution of those throws would be quite close to uniform. This would give a result close to the geometric mean of a standard die’s values to the Nth power ~ 2.9938^10K. The 3.5^10K result by my calculations would require a 26+ sigma event (!) to meet or exceed, which is hardly something one should “expect”. Where am I going wrong here? Thanks, Vance