Introduction to Log-Normal Distribution

The log-normal distribution is a continuous probability distribution of a random variable whose logarithm follows a normal distribution. It is particularly useful for modeling positive-valued, right-skewed phenomena with a long upper tail, such as incomes, particle sizes, survival times, and asset prices. For this reason it is widely used in finance, engineering, environmental science, and insurance.

Definition of Log-Normal Distribution

A continuous random variable $X$ is said to have a log-normal distribution if the random variable $Y = \ln(X)$ has a normal distribution with mean $\mu$ and standard deviation $\sigma$.

The probability density function (PDF) of $X$ is:

$$ f(x;\mu,\sigma) = \begin{cases} \frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{1}{2\sigma^2}(\ln x -\mu)^2}, & x > 0 \\ 0, & x \leq 0 \end{cases} $$

where:

  • $\mu$ is the location parameter (mean of $\ln X$)
  • $\sigma$ is the scale parameter (standard deviation of $\ln X$, $\sigma > 0$)

Notation: $X \sim LN(\mu, \sigma^2)$ or $X \sim \text{LogNormal}(\mu, \sigma)$

Relationship to Normal Distribution: If $Y \sim N(\mu, \sigma^2)$, then $X = e^Y \sim LN(\mu, \sigma^2)$
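This relationship can be checked by simulation. The following is a minimal Python sketch using only the standard library; the helper name `sample_lognormal` is illustrative, and the parameters $\mu = 1.2$, $\sigma = 0.5$ are arbitrary. It draws $Y \sim N(\mu, \sigma^2)$, sets $X = e^Y$, and checks that $\ln X$ recovers $\mu$ and $\sigma$ and that the sample median of $X$ is close to $e^{\mu}$. (Python's `random.lognormvariate(mu, sigma)` draws $e^Y$ directly.)

```python
import math
import random
import statistics

def sample_lognormal(mu, sigma, n, seed=0):
    """Draw n log-normal values as X = e^Y with Y ~ N(mu, sigma^2)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]

mu, sigma = 1.2, 0.5
xs = sample_lognormal(mu, sigma, n=100_000)
logs = [math.log(x) for x in xs]

# ln X should look like N(mu, sigma^2): sample mean ~ mu, sample SD ~ sigma.
print(f"mean of ln X = {statistics.fmean(logs):.3f}, sd of ln X = {statistics.pstdev(logs):.3f}")
# Exponentiation preserves order, so the median of X is about e^mu.
print(f"median of X = {statistics.median(xs):.3f}, e^mu = {math.exp(mu):.3f}")
```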

Standard Log-Normal Distribution

The standard log-normal distribution is obtained by setting $\mu = 0$ and $\sigma = 1$:

$$ f(x) = \frac{1}{\sqrt{2\pi}x}e^{-\frac{1}{2}(\ln x)^2}, \quad x > 0 $$

Probability Density Function (PDF)

The PDF of the log-normal distribution is:

$$f(x) = \frac{1}{\sqrt{2\pi}\sigma x}\exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0$$
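For computation, the density can be coded directly from this formula. A minimal sketch in Python (standard library, Python 3.8+ for `statistics.NormalDist`); `lognormal_pdf` is a hypothetical helper name. The last line cross-checks it against the normal density of $\ln x$ divided by $x$.

```python
import math
from statistics import NormalDist

def lognormal_pdf(x, mu, sigma):
    """Log-normal density f(x; mu, sigma); returns 0 outside the support x > 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma * x)

# Cross-check: f(x) is the N(mu, sigma^2) density evaluated at ln x, divided by x
# (1/x is the Jacobian of the change of variables y = ln x).
mu, sigma, x = 1.2, 0.5, 3.0
print(lognormal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(math.log(x)) / x)
```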

Properties of the PDF

  • Support: All positive real numbers $(0, \infty)$
  • Shape: Always right-skewed (positively skewed)
  • Flexibility: $e^{\mu}$ sets the scale (it is the median), and $\sigma$ controls both the spread and the amount of skewness
  • Maximum: Occurs at $x = e^{\mu - \sigma^2}$ (the mode)

Cumulative Distribution Function (CDF)

For $x > 0$, the cumulative distribution function (CDF) is:

$$F(x) = P(X \leq x) = P(\ln X \leq \ln x) = P(Y \leq \ln x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$$

where $\Phi$ is the CDF of the standard normal distribution. For $x \leq 0$, $F(x) = 0$.

This allows us to compute log-normal probabilities using standard normal tables.
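In software no table is needed: the same identity can be evaluated with any standard normal CDF. A sketch assuming Python 3.8+ (`statistics.NormalDist`); `lognormal_cdf` is an illustrative name, not a library function.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # Z ~ N(0, 1)

def lognormal_cdf(x, mu, sigma):
    """F(x) = Phi((ln x - mu) / sigma) for x > 0, and 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return STANDARD_NORMAL.cdf((math.log(x) - mu) / sigma)

# P(X <= x) for mu = 1.2, sigma = 0.5 at a few points.
for x in (1, 3, 4, 5):
    print(x, round(lognormal_cdf(x, 1.2, 0.5), 4))
```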

Key Properties of Log-Normal Distribution

Mean of Log-Normal Distribution

The mean (expected value) of the log-normal distribution is:

$$E(X) = e^{\mu + \frac{\sigma^2}{2}}$$

Proof

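Since $Y = \ln X \sim N(\mu, \sigma^2)$, we have $X^r = e^{rY}$, so the $r^{th}$ raw moment is the normal moment generating function $M_Y(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$ evaluated at $t = r$:

$$ \begin{aligned} \mu_r^\prime &= E(X^r) = E(e^{rY}) \\ &= M_Y(r) \\ &= e^{\mu r + \frac{1}{2}r^2\sigma^2} \end{aligned} $$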

For $r = 1$: $$E(X) = e^{\mu + \frac{\sigma^2}{2}}$$

Variance of Log-Normal Distribution

The variance of the log-normal distribution is:

$$V(X) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$$

Proof

First, find $E(X^2)$. Using $r = 2$:

$$E(X^2) = e^{2\mu + 2\sigma^2}$$

Then:

$$ \begin{aligned} V(X) &= E(X^2) - [E(X)]^2 \\ &= e^{2\mu + 2\sigma^2} - \left(e^{\mu + \frac{\sigma^2}{2}}\right)^2 \\ &= e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2} \\ &= e^{2\mu + \sigma^2}(e^{\sigma^2} - 1) \end{aligned} $$

Standard Deviation

$$\sigma_X = \sqrt{V(X)} = e^{\mu + \frac{\sigma^2}{2}}\sqrt{e^{\sigma^2} - 1}$$

Coefficient of Variation

The coefficient of variation is:

$$CV = \frac{\sigma_X}{E(X)} = \sqrt{e^{\sigma^2} - 1}$$

Note that the CV depends only on the scale parameter $\sigma$, not on $\mu$.
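The closed forms above translate directly into code. This sketch (hypothetical helper `lognormal_summary`, standard library only) uses `math.expm1` for $e^{\sigma^2} - 1$, which stays accurate when $\sigma$ is small, and confirms numerically that changing $\mu$ leaves the CV unchanged.

```python
import math

def lognormal_summary(mu, sigma):
    """Mean, variance, SD and CV of X ~ LN(mu, sigma^2) from the closed-form formulas."""
    s2 = sigma ** 2
    mean = math.exp(mu + s2 / 2)
    var = math.exp(2 * mu + s2) * math.expm1(s2)  # expm1(s2) = e^{s2} - 1
    return {"mean": mean, "var": var, "sd": math.sqrt(var), "cv": math.sqrt(math.expm1(s2))}

a = lognormal_summary(1.2, 0.5)
b = lognormal_summary(3.0, 0.5)  # same sigma, different mu
print(a)
print(a["cv"] == b["cv"])  # the CV does not depend on mu
```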

Median of Log-Normal Distribution

The median of the log-normal distribution is:

$$\text{Median} = e^{\mu}$$

This is because $P(\ln X \leq \mu) = 0.5$, which means $P(X \leq e^{\mu}) = 0.5$.

Quartiles of Log-Normal Distribution

The quartiles of the log-normal distribution are:

  • First Quartile: $Q_1 = e^{\mu - 0.675\sigma}$
  • Second Quartile (Median): $Q_2 = e^{\mu}$
  • Third Quartile: $Q_3 = e^{\mu + 0.675\sigma}$

where $0.675$ is the rounded upper quartile of the standard normal distribution, $z_{0.75} \approx 0.6745$; by symmetry, $z_{0.25} = -z_{0.75}$.

Proof

For the $i^{th}$ quartile:

$$P(X \leq Q_i) = \frac{i}{4}$$

$$P\left(\frac{\ln X - \mu}{\sigma} \leq \frac{\ln Q_i - \mu}{\sigma}\right) = \frac{i}{4}$$

$$P\left(Z \leq \frac{\ln Q_i - \mu}{\sigma}\right) = \frac{i}{4}$$

where $Z \sim N(0,1)$. Let $z_p$ denote the standard normal quantile, defined by $P(Z \leq z_p) = p$.

Therefore: $$\frac{\ln Q_i - \mu}{\sigma} = z_{i/4}$$

$$Q_i = e^{\mu + \sigma z_{i/4}}$$
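The general formula $Q(p) = e^{\mu + \sigma z_p}$ works for any percentile, not only quartiles. A minimal sketch, assuming Python 3.8+; `lognormal_quantile` is a hypothetical name. It uses the exact normal quantile from `NormalDist().inv_cdf`, so results differ slightly from those computed with the rounded value 0.675.

```python
import math
from statistics import NormalDist

def lognormal_quantile(p, mu, sigma):
    """Q(p) = exp(mu + sigma * z_p), where z_p is the standard normal p-quantile."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

mu, sigma = 3.5, 0.8
q1, q2, q3 = (lognormal_quantile(p, mu, sigma) for p in (0.25, 0.5, 0.75))
print(f"Q1={q1:.3f}  median={q2:.3f}  Q3={q3:.3f}")
```

Because $z_{0.25} = -z_{0.75}$, the quartiles satisfy $Q_1 Q_3 = e^{2\mu} = Q_2^2$, which gives a quick sanity check on any implementation.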

Mode of Log-Normal Distribution

The mode of the log-normal distribution is:

$$\text{Mode} = e^{\mu - \sigma^2}$$

Proof

Taking the derivative of the PDF and setting it to zero:

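$$f'(x) = 0 \Rightarrow \frac{d}{dx}\left[\frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{1}{2\sigma^2}(\ln x - \mu)^2}\right] = 0$$

Applying the product and chain rules gives:

$$f'(x) = -\frac{f(x)}{x}\left[\frac{\ln x - \mu}{\sigma^2} + 1\right]$$

Since $f(x)/x > 0$ for every $x > 0$, the derivative is zero only when the bracket vanishes:

$$\ln x = \mu - \sigma^2$$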

$$x = e^{\mu - \sigma^2}$$
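As an independent check of the closed form, a brute-force grid search over the density should land on $e^{\mu - \sigma^2}$. This is a sketch only (the grid range and step are arbitrary choices), with $\mu = 1.2$, $\sigma = 0.5$.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Log-normal density for x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma * x)

mu, sigma = 1.2, 0.5
mode = math.exp(mu - sigma ** 2)  # closed form: e^0.95

# Brute-force search on the grid 0.0001, 0.0002, ..., 10.0
grid = [i / 10_000 for i in range(1, 100_001)]
best = max(grid, key=lambda x: lognormal_pdf(x, mu, sigma))
print(f"closed form: {mode:.4f}, grid search: {best:.4f}")
```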

Raw Moments

The $r^{th}$ raw moment of the log-normal distribution is:

$$\mu_r^\prime = E(X^r) = e^{\mu r + \frac{1}{2}r^2\sigma^2}$$

This gives us:

  • $\mu_1^\prime = E(X) = e^{\mu + \frac{\sigma^2}{2}}$
  • $\mu_2^\prime = E(X^2) = e^{2\mu + 2\sigma^2}$
  • $\mu_3^\prime = E(X^3) = e^{3\mu + \frac{9\sigma^2}{2}}$
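The moment formula can be checked against direct numerical integration of $E(e^{rY})$ over the normal density of $Y$. The sketch below uses a plain trapezoidal rule; the helper names, the integration range of $\pm 12\sigma$, and the step count are illustrative assumptions, not tuned values.

```python
import math

def raw_moment(r, mu, sigma):
    """Closed form: E(X^r) = exp(r*mu + r^2*sigma^2/2)."""
    return math.exp(r * mu + 0.5 * r * r * sigma * sigma)

def raw_moment_numeric(r, mu, sigma, n=20_000, width=12.0):
    """Trapezoidal approximation of E(e^{rY}) for Y ~ N(mu, sigma^2) on mu +/- width*sigma."""
    lo = mu - width * sigma
    h = 2 * width * sigma / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        density = math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += weight * math.exp(r * y) * density
    return total * h

for r in (1, 2, 3):
    print(r, raw_moment(r, 1.2, 0.5), raw_moment_numeric(r, 1.2, 0.5))
```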

Moment Generating Function

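The moment generating function $M_X(t) = E(e^{tX})$ of the log-normal distribution is infinite for every $t > 0$, so it does not exist in the usual sense (for $t \leq 0$ it is finite but has no closed form). Moments are obtained instead from the MGF of $Y = \ln X \sim N(\mu, \sigma^2)$, since $E(X^r) = E(e^{rY}) = M_Y(r)$.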

Characteristic Function

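The characteristic function $\phi_X(t) = E(e^{itX})$ exists for every real $t$, because $|e^{itX}| = 1$, but it has no closed-form expression. Expanding $e^{itX}$ and substituting the raw moments gives the formal series:

$$\sum_{n=0}^{\infty} \frac{(it)^n}{n!} E(X^n) = \sum_{n=0}^{\infty} \frac{(it)^n}{n!} e^{\mu n + \frac{\sigma^2 n^2}{2}}$$

This series diverges for every $t \neq 0$, because $e^{\sigma^2 n^2/2}$ grows faster than $n!$, so it cannot be used to represent $\phi_X(t)$. Relatedly, the log-normal distribution is not uniquely determined by its moments.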
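The growth of the series terms can be seen numerically by working with their logarithms, which avoids overflow. The magnitude of the $n$th term is $\frac{|t|^n}{n!} e^{\mu n + \sigma^2 n^2/2}$; in this sketch (`log_abs_term` is a hypothetical helper), even a small $t = 0.01$ gives terms that eventually grow without bound.

```python
import math

def log_abs_term(n, t, mu, sigma):
    """Natural log of |(i t)^n / n! * E(X^n)| = n ln|t| - ln n! + mu*n + sigma^2 n^2 / 2."""
    return n * math.log(abs(t)) - math.lgamma(n + 1) + mu * n + 0.5 * sigma ** 2 * n ** 2

t, mu, sigma = 0.01, 0.0, 1.0
for n in (10, 20, 40, 80):
    print(n, round(log_abs_term(n, t, mu, sigma), 1))
```

The quadratic term $\sigma^2 n^2/2$ eventually dominates $\ln n! \approx n \ln n$ for any $t \neq 0$, which is why the terms never shrink to zero.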

Skewness and Kurtosis

Skewness

The coefficient of skewness is:

$$\gamma_1 = (e^{\sigma^2} + 2)\sqrt{e^{\sigma^2} - 1}$$

The log-normal distribution is always right-skewed (positively skewed). Skewness increases with $\sigma$.

Kurtosis

The coefficient of kurtosis is:

$$\beta_2 = e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 3$$

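The log-normal distribution is always leptokurtic (heavy-tailed): $\beta_2 > 3$ for every $\sigma > 0$, so the excess kurtosis $\beta_2 - 3$ is positive, and it grows with $\sigma$.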
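Both shape coefficients can be verified from the raw moments $\mu_r^\prime = e^{\mu r + r^2\sigma^2/2}$ by converting them to central moments. The sketch below (hypothetical helper names) compares that route with the closed-form expressions; $\mu$ only rescales $X$, so it does not affect either coefficient.

```python
import math

def raw_moment(r, mu, sigma):
    return math.exp(r * mu + 0.5 * r * r * sigma * sigma)

def shape_from_moments(mu, sigma):
    """Skewness and kurtosis (not excess) via central moments built from raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, mu, sigma) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                      # E[(X - m1)^3]
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # E[(X - m1)^4]
    return c3 / var ** 1.5, c4 / var ** 2

def shape_closed_form(sigma):
    w = math.exp(sigma ** 2)
    return (w + 2) * math.sqrt(w - 1), w ** 4 + 2 * w ** 3 + 3 * w ** 2 - 3

skew_m, kurt_m = shape_from_moments(1.2, 0.5)
skew_c, kurt_c = shape_closed_form(0.5)
print(skew_m, skew_c)
print(kurt_m, kurt_c)
```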

Properties Summary Table

| Property | Formula |
|---|---|
| PDF | $f(x) = \frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}$ |
| CDF | $F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$ |
| Support | $(0, \infty)$ |
| Mean | $e^{\mu + \sigma^2/2}$ |
| Median | $e^{\mu}$ |
| Mode | $e^{\mu - \sigma^2}$ |
| Variance | $e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$ |
| Std. Deviation | $e^{\mu + \sigma^2/2}\sqrt{e^{\sigma^2} - 1}$ |
| Coeff. of Variation | $\sqrt{e^{\sigma^2} - 1}$ |
| Skewness | $(e^{\sigma^2} + 2)\sqrt{e^{\sigma^2} - 1}$ |
| $Q_1$ | $e^{\mu - 0.675\sigma}$ |
| $Q_3$ | $e^{\mu + 0.675\sigma}$ |

Relationship to Normal Distribution

The log-normal distribution has a fundamental relationship with the normal distribution:

  • If $Y \sim N(\mu, \sigma^2)$, then $X = e^Y \sim LN(\mu, \sigma^2)$
  • If $X \sim LN(\mu, \sigma^2)$, then $Y = \ln X \sim N(\mu, \sigma^2)$

This relationship allows us to compute probabilities for log-normal random variables using standard normal tables.

Examples with Solutions

Example 1: Electronic Component Lifetime

Problem: The lifetime (in days) of a certain electronic component that operates in a high-temperature environment is log-normally distributed with $\mu = 1.2$ and $\sigma = 0.5$.

a. Find the mean and variance of the component lifetime.
b. Find the probability that the component lasts less than 4 days.
c. Find the probability that the component lasts more than 5 days.
d. Find the probability that the component lasts between 3 and 5 days.

Solution:

Given: $X \sim LN(\mu = 1.2, \sigma = 0.5)$

This means $Y = \ln X \sim N(1.2, 0.25)$

Part (a): Mean and Variance

$$E(X) = e^{\mu + \sigma^2/2} = e^{1.2 + 0.5^2/2} = e^{1.2 + 0.125} = e^{1.325} \approx 3.762 \text{ days}$$

$$V(X) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1) = e^{2(1.2) + 0.5^2}(e^{0.5^2} - 1)$$ $$= e^{2.65}(e^{0.25} - 1) = 14.154 \times 0.284 \approx 4.020$$

$$\sigma_X = \sqrt{4.020} \approx 2.005 \text{ days}$$

Part (b): $P(X < 4)$

Convert to standard normal:

$$Z = \frac{\ln X - \mu}{\sigma} = \frac{\ln 4 - 1.2}{0.5} = \frac{1.38629 - 1.2}{0.5} = \frac{0.18629}{0.5} \approx 0.3726$$

$$P(X < 4) = P(Z < 0.3726) \approx 0.6453$$

The probability that the component lasts less than 4 days is approximately 64.53%.

Part (c): $P(X > 5)$

$$Z = \frac{\ln 5 - 1.2}{0.5} = \frac{1.60944 - 1.2}{0.5} = \frac{0.40944}{0.5} \approx 0.8189$$

$$P(X > 5) = 1 - P(X \leq 5) = 1 - P(Z \leq 0.8189)$$ $$\approx 1 - 0.7936 = 0.2064$$

The probability that the component lasts more than 5 days is approximately 20.64%.

Part (d): $P(3 < X < 5)$

For $X = 3$: $$Z_1 = \frac{\ln 3 - 1.2}{0.5} = \frac{1.09861 - 1.2}{0.5} = \frac{-0.10139}{0.5} \approx -0.2028$$

For $X = 5$: $$Z_2 \approx 0.8189$$ (from part c)

$$P(3 < X < 5) = P(Z_1 < Z < Z_2)$$ $$= P(Z < 0.8189) - P(Z < -0.2028)$$ $$\approx 0.7936 - 0.4197 = 0.3739$$

The probability that the component lasts between 3 and 5 days is approximately 37.39%.
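The hand calculations in this example can be reproduced without normal tables. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; any small differences from table-based answers come only from rounding $z$.

```python
import math
from statistics import NormalDist

mu, sigma = 1.2, 0.5  # parameters of ln X
Z = NormalDist()      # standard normal

def cdf(x):
    """P(X <= x) for the component lifetime X ~ LN(1.2, 0.5^2)."""
    return Z.cdf((math.log(x) - mu) / sigma)

mean = math.exp(mu + sigma ** 2 / 2)
var = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)
p_b = cdf(4)           # P(X < 4)
p_c = 1 - cdf(5)       # P(X > 5)
p_d = cdf(5) - cdf(3)  # P(3 < X < 5)

print(f"E(X)={mean:.3f}  V(X)={var:.3f}  SD={math.sqrt(var):.3f}")
print(f"P(X<4)={p_b:.4f}  P(X>5)={p_c:.4f}  P(3<X<5)={p_d:.4f}")
```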

Example 2: Income Distribution

Problem: Income (in thousands of dollars) in a certain population follows a log-normal distribution with parameters $\mu = 3.5$ and $\sigma = 0.8$.

a. Find the median income.
b. Find the mean income.
c. What percentage of the population earns less than $30,000?
d. Below what income level does 75% of the population fall?

Solution:

Given: $X \sim LN(\mu = 3.5, \sigma = 0.8)$ (in thousands)

Part (a): Median Income

$$\text{Median} = e^{\mu} = e^{3.5} \approx 33.115 \text{ thousand dollars} = \$33{,}115$$

Part (b): Mean Income

$$E(X) = e^{\mu + \sigma^2/2} = e^{3.5 + 0.8^2/2} = e^{3.5 + 0.32} = e^{3.82}$$ $$\approx 45.604 \text{ thousand dollars} = \$45{,}604$$

Part (c): $P(X < 30)$

$$Z = \frac{\ln 30 - 3.5}{0.8} = \frac{3.401 - 3.5}{0.8} = \frac{-0.099}{0.8} \approx -0.124$$

$$P(X < 30) = P(Z < -0.124) \approx 0.4507$$

Approximately 45.07% of the population earns less than $30,000.

Part (d): 75th Percentile (Third Quartile)

$$Q_3 = e^{\mu + 0.675\sigma} = e^{3.5 + 0.675(0.8)} = e^{3.5 + 0.54} = e^{4.04}$$ $$\approx 56.826 \text{ thousand dollars} = \$56{,}826$$

75% of the population earns less than approximately $56,826.
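A sketch reproducing this example under the same assumptions (Python 3.8+, standard library only). It computes the third quartile twice: once with the rounded $z = 0.675$ used above and once with the exact normal quantile $z_{0.75} \approx 0.6745$, which gives a slightly lower value. Likewise, evaluating $P(X < 30)$ without rounding $z$ changes only the fourth decimal.

```python
import math
from statistics import NormalDist

mu, sigma = 3.5, 0.8  # income in thousands of dollars
Z = NormalDist()

median = math.exp(mu)
mean = math.exp(mu + sigma ** 2 / 2)
p_below_30 = Z.cdf((math.log(30) - mu) / sigma)    # P(X < 30)
q3_rounded = math.exp(mu + 0.675 * sigma)          # z rounded to 0.675, as in the text
q3_exact = math.exp(mu + sigma * Z.inv_cdf(0.75))  # exact z_0.75 = 0.67449...

print(f"median={median:.3f}  mean={mean:.3f}  P(X<30)={p_below_30:.4f}")
print(f"Q3 (z=0.675)={q3_rounded:.3f}  Q3 (exact z)={q3_exact:.3f}")
```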

Example 3: Particle Size Distribution

Problem: Particle sizes (in micrometers) in an industrial process follow a log-normal distribution with mean $\mu_X = 50$ and variance $\sigma_X^2 = 400$.

Find the parameters $\mu$ and $\sigma$ of the log-normal distribution.

Solution:

Given: $E(X) = 50$ and $V(X) = 400$

From the mean formula: $$E(X) = e^{\mu + \sigma^2/2} = 50$$

From the variance formula: $$V(X) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1) = 400$$

Let $\alpha = e^{\sigma^2}$. Then: $$E(X)^2 = e^{2\mu + \sigma^2} = 2500$$

From the variance formula: $$e^{2\mu + \sigma^2}(\alpha - 1) = 400$$ $$2500(\alpha - 1) = 400$$ $$\alpha - 1 = 0.16$$ $$\alpha = 1.16$$ $$e^{\sigma^2} = 1.16$$ $$\sigma^2 = \ln(1.16) = 0.148$$ $$\sigma = \sqrt{0.148} \approx 0.385$$

From the mean formula: $$e^{\mu + \sigma^2/2} = 50$$ $$\mu + 0.148/2 = \ln 50$$ $$\mu = 3.912 - 0.074 = 3.838$$

Therefore: $\mu \approx 3.838$ and $\sigma \approx 0.385$
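The steps above amount to the closed-form inversion $\sigma^2 = \ln\left(1 + V(X)/E(X)^2\right)$ and $\mu = \ln E(X) - \sigma^2/2$. A sketch of that inversion (hypothetical helper `lognormal_params_from_moments`, standard library only):

```python
import math

def lognormal_params_from_moments(mean, var):
    """Return (mu, sigma) such that LN(mu, sigma^2) has the given mean and variance."""
    if mean <= 0 or var <= 0:
        raise ValueError("mean and variance must both be positive")
    s2 = math.log1p(var / mean ** 2)  # sigma^2 = ln(1 + V / E^2)
    return math.log(mean) - s2 / 2, math.sqrt(s2)

mu, sigma = lognormal_params_from_moments(50, 400)
print(f"mu={mu:.3f}  sigma={sigma:.3f}")
# Round trip: the fitted parameters reproduce the target mean and variance.
print(math.exp(mu + sigma ** 2 / 2), math.exp(2 * mu + sigma ** 2) * math.expm1(sigma ** 2))
```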

When to Use Log-Normal Distribution

The log-normal distribution is appropriate when:

  1. Right-Skewed Data: Data with positive skew and no negative values
  2. Multiplicative Processes: Products of independent factors
  3. Income/Wealth: Income distributions and wealth concentrations
  4. Particle Sizes: In materials science and geology
  5. Survival Times: In medical and biological studies
  6. Financial Returns: Asset prices and returns (with caveats)
  7. Environmental Data: Pollution concentrations, species abundance
  8. Droplet/Bubble Sizes: In fluid mechanics and multiphase flow

Applications

Finance

  • Stock prices and returns (under certain conditions)
  • Asset valuations
  • Option pricing

Environmental Science

  • Pollutant concentrations
  • Species abundance
  • Soil properties

Engineering and Manufacturing

  • Particle size distributions
  • Material properties
  • Component lifetimes

Medicine and Biology

  • Survival times
  • Tumor sizes
  • Bacterial cell counts
  • Disease incubation periods

Economics

  • Income distributions
  • Household expenditures
  • Firm size distributions

Advantages and Disadvantages

Advantages

  • Right Skewness: Naturally models right-skewed positive data
  • Multiplicative Property: Appropriate for processes involving multiplication
  • Normal Equivalence: Uses well-known normal distribution theory
  • Flexibility: Two parameters provide good fit to varied data
  • Tractable CDF: Probabilities reduce to the standard normal CDF $\Phi$, so standard normal tables or software apply directly

Disadvantages

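  • No Usable MGF: $E(e^{tX})$ is infinite for every $t > 0$, which complicates some theoretical work
  • Interpretation: $\mu$ and $\sigma$ describe $\ln X$ rather than $X$, so they are less intuitive than the mean and variance of $X$
  • Parameter Estimation: Parameters are estimated on the log scale, and transforming estimates back (for example, estimating $E(X)$ by $e^{\hat\mu + \hat\sigma^2/2}$) introduces bias in small samples
  • Outlier Sensitivity: Sensitive to extreme values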
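On the estimation point, one standard approach is maximum likelihood, which for the two-parameter log-normal has a closed form: $\hat\mu$ is the mean of $\ln x_i$ and $\hat\sigma$ is their standard deviation with divisor $n$. A sketch on simulated data (the helper name, seed, and sample size are illustrative):

```python
import math
import random
import statistics

def fit_lognormal_mle(data):
    """Maximum-likelihood (mu, sigma): mean and divisor-n standard deviation of ln(data)."""
    if any(x <= 0 for x in data):
        raise ValueError("log-normal data must be strictly positive")
    logs = [math.log(x) for x in data]
    return statistics.fmean(logs), statistics.pstdev(logs)

rng = random.Random(42)  # simulated data from LN(1.2, 0.5^2)
sample = [rng.lognormvariate(1.2, 0.5) for _ in range(50_000)]
mu_hat, sigma_hat = fit_lognormal_mle(sample)
print(f"mu_hat={mu_hat:.3f}  sigma_hat={sigma_hat:.3f}")
```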

Connection to Other Distributions

  • Normal Distribution: Fundamental relationship: $\ln X \sim N(\mu, \sigma^2)$
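  • Exponential: Both model positive lifetimes, but the exponential distribution has a constant hazard rate, whereas the log-normal hazard rate rises to a peak and then decreases toward zero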
  • Weibull: Both model positive values with flexibility
  • Gamma: Both used for positive-valued data
  • Exponential Family: The log-normal is a two-parameter exponential family, with sufficient statistics $\ln x$ and $(\ln x)^2$

Conclusion

The log-normal distribution is an essential tool for statisticians and researchers working with positive-valued, right-skewed data. Its fundamental relationship to the normal distribution makes it theoretically tractable, while its flexibility makes it practically valuable. Applications range from income distributions to environmental science to financial modeling. Transforming log-normal problems into normal ones through the logarithm provides both computational convenience and theoretical insight.

Understanding when and how to use the log-normal distribution is crucial for accurate data analysis in fields as diverse as economics, environmental science, engineering, and medicine.