Lognormal distribution

The lognormal distribution models variables that are always positive and right-skewed, where the logarithm of the variable follows a normal distribution. It appears naturally whenever a quantity is the product of many independent random factors.

Definition

A random variable \(X\) follows a lognormal distribution with parameters \(\mu \in \mathbb{R}\) and \(\sigma > 0\), written \(X \sim \text{Lognormal}(\mu, \sigma)\), if \(Y = \ln(X) \sim N(\mu, \sigma)\).

Its probability density function is:

\[f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0\]

⚠️ μ and σ are parameters of ln(X), not of X itself

This is the most common source of confusion. In \(X \sim \text{Lognormal}(\mu, \sigma)\):

  • \(\mu\) is the mean of \(\ln(X)\), not of \(X\).
  • \(\sigma\) is the standard deviation of \(\ln(X)\), not of \(X\).

The actual mean of \(X\) is \(e^{\mu + \sigma^2/2}\), which is always larger than \(e^\mu\). If someone tells you “this variable has a lognormal distribution with mean 0 and SD 1”, they almost certainly mean that \(\ln(X) \sim N(0,1)\), not that \(E(X) = 0\).

The CDF has a clean form in terms of the standard normal CDF \(\Phi\):

\[F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)\]

This makes probability calculations straightforward: standardize \(\ln(x)\) and look up the standard normal table.

PDF and CDF

PDF and CDF of the lognormal distribution for different parameters

Properties

For \(X \sim \text{Lognormal}(\mu, \sigma)\):

  1. Expected Value (Mean)

\[E(X) = e^{\mu + \sigma^2/2}\]

  1. Variance

\[\text{Var}(X) = \left(e^{\sigma^2} - 1\right)e^{2\mu + \sigma^2}\]

  1. Median

\[\text{Median}(X) = e^{\mu}\]

The median is always less than the mean (since \(e^{\mu} < e^{\mu + \sigma^2/2}\)), reflecting the right skew.

  1. Mode

\[\text{Mode}(X) = e^{\mu - \sigma^2}\]

The mode is the smallest of the three: Mode \(<\) Median \(<\) Mean, a hallmark of right-skewed distributions.

  1. Skewness

\[\text{Skewness} = \left(e^{\sigma^2} + 2\right)\sqrt{e^{\sigma^2} - 1}\]

Always positive. For \(\sigma = 1\): skewness \(\approx 6.18\).

  1. Kurtosis

\[g_2 = e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 6\]

For \(\sigma = 1\): \(g_2 \approx 110.9\), extremely leptokurtic. The lognormal has very heavy tails for large \(\sigma\).

  1. Quantile Function

\[Q(p) = e^{\mu + \sigma\,\Phi^{-1}(p)}\]

where \(\Phi^{-1}\) is the standard normal quantile function.

Why the lognormal appears in practice

The lognormal arises naturally whenever a quantity is the product of many independent positive random factors. By the Central Limit Theorem applied to logarithms:

\[\ln(X) = \ln(X_1 \cdot X_2 \cdots X_n) = \sum_{i=1}^n \ln(X_i) \xrightarrow{} N(\mu, \sigma)\]

So \(X\) is lognormal. This is why it describes:

  • Stock prices: today’s price is yesterday’s price multiplied by a random return factor.
  • Income distributions: salaries result from multiplicative adjustments over a career.
  • Biological sizes: cell growth is multiplicative at each division.
  • Repair times: each step in a repair process multiplies the previous duration.

💡 Log-transform and work in normal scale

The key practical advantage of the lognormal: if \(X\) is lognormal, then \(\ln(X)\) is normal, and all normal distribution tools apply. To find \(P(X \leq x)\), compute \(P(\ln X \leq \ln x) = \Phi((\ln x - \mu)/\sigma)\). To find the 90th percentile of \(X\), find the 90th percentile of \(\ln(X)\) and exponentiate. Working in log scale turns a lognormal problem into a normal problem.

Step-by-step example

The daily returns of a stock are approximately normal, so its price follows a lognormal process. Suppose the log-price changes have \(\mu = 0.001\) (daily drift) and \(\sigma = 0.02\) (daily volatility). The stock currently trades at 100 USD.

After one trading day, the price \(P \sim 100 \cdot \text{Lognormal}(0.001, 0.02)\).

Expected price tomorrow:

\[E(P) = 100 \cdot e^{0.001 + 0.02^2/2} = 100 \cdot e^{0.0012} \approx 100.12 \text{ USD}\]

Probability the price drops below 98 USD:

Standardize: \(z = (\ln(98/100) - 0.001)/0.02 = (\ln(0.98) - 0.001)/0.02 \approx (-0.0202 - 0.001)/0.02 \approx -1.06\)

\[P(P < 98) = \Phi(-1.06) \approx 0.145\]

About 14.5% chance of a drop below 98 USD in one day.

90th percentile of tomorrow’s price:

\[Q(0.90) = 100 \cdot e^{0.001 + 0.02 \times 1.282} \approx 100 \cdot e^{0.02664} \approx 102.70 \text{ USD}\]

More lognormal examples

  • Income: household incomes in a country follow approximately \(\text{Lognormal}(10.5, 0.8)\) (in local currency units). Mean income: \(e^{10.5 + 0.32} \approx e^{10.82} \approx 50{,}000\). Median income: \(e^{10.5} \approx 36{,}000\). The mean is 39% higher than the median, reflecting the long right tail of high earners.

  • Website response time: request latency often follows a lognormal distribution. If \(\mu = 2\) and \(\sigma = 0.5\) (in log-milliseconds), the median latency is \(e^2 \approx 7.4\) ms and the 99th percentile is \(e^{2 + 0.5 \times 2.326} \approx e^{3.163} \approx 23.6\) ms.

Example icon

💡 Relationship with other distributions

  • Normal: \(\ln(X) \sim N(\mu, \sigma)\) by definition.
  • Exponential: a special case of lognormal is not exact, but both model positive right-skewed data. Use exponential for memoryless processes, lognormal for multiplicative ones.
  • Pareto: for very heavy tails (power-law behavior), the Pareto distribution is more appropriate than the lognormal.
  • Log-transformation in regression: log-linear regression models assume the response is lognormal given the predictors.