Introduction to Log-Normal Distribution
The log-normal distribution is a continuous probability distribution of a random variable whose logarithm follows a normal distribution. It is particularly useful for modeling positive-valued phenomena that are right-skewed, often with a long upper tail, such as income distributions, particle sizes, survival times, and asset prices. Because of this flexibility in modeling right-skewed data, the log-normal distribution is widely used in finance, engineering, environmental science, and insurance.
Definition of Log-Normal Distribution
A continuous random variable $X$ is said to have a log-normal distribution if the random variable $Y = \ln(X)$ has a normal distribution with mean $\mu$ and standard deviation $\sigma$.
The probability density function (PDF) of $X$ is:
$$ f(x;\mu,\sigma) = \begin{cases} \frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{1}{2\sigma^2}(\ln x -\mu)^2}, & x > 0 \\ 0, & x \leq 0 \end{cases} $$
where:
- $\mu$ is the location parameter (mean of $\ln X$)
- $\sigma$ is the scale parameter (standard deviation of $\ln X$, $\sigma > 0$)
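Notation: $X \sim LN(\mu, \sigma^2)$ or $X \sim \text{LogNormal}(\mu, \sigma)$; the two conventions differ only in whether the second argument is the variance or the standard deviation of $\ln X$.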
Relationship to Normal Distribution: If $Y \sim N(\mu, \sigma^2)$, then $X = e^Y \sim LN(\mu, \sigma^2)$
Standard Log-Normal Distribution
The standard log-normal distribution is obtained by setting $\mu = 0$ and $\sigma = 1$:
$$ f(x) = \frac{1}{\sqrt{2\pi}x}e^{-\frac{1}{2}(\ln x)^2}, \quad x > 0 $$
Probability Density Function (PDF)
The PDF of the log-normal distribution is:
$$f(x) = \frac{1}{\sqrt{2\pi}\sigma x}\exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0$$
Properties of the PDF
- Support: All positive real numbers $(0, \infty)$
- Shape: Always right-skewed (positively skewed)
- Flexibility: $e^{\mu}$ sets the scale (it is the median), while $\sigma$ controls the spread and the degree of skewness
- Maximum: Occurs at $x = e^{\mu - \sigma^2}$ (the mode)
Cumulative Distribution Function (CDF)
The cumulative distribution function (CDF) is:
$$F(x) = P(X \leq x) = P(\ln X \leq \ln x) = P(Y \leq \ln x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$$
where $\Phi$ is the CDF of the standard normal distribution.
This allows us to compute log-normal probabilities using standard normal tables.
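As a minimal sketch, assuming only Python's standard library (the function names are illustrative, not from any particular package), the PDF can be coded directly and the CDF obtained through $\Phi$, which `math.erf` supplies via $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$:

```python
import math


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of LN(mu, sigma^2); zero outside the support x > 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma * x)


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """F(x) = Phi((ln x - mu) / sigma) for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    return std_normal_cdf((math.log(x) - mu) / sigma)


print(lognormal_pdf(1.0, 0.0, 1.0))            # standard log-normal at x = 1: 1/sqrt(2*pi)
print(lognormal_cdf(math.exp(1.2), 1.2, 0.5))  # F(e^mu) = Phi(0), i.e. about 0.5
```

Computing $\Phi$ through `math.erf` replaces a printed table lookup, so no rounding of $z$ to two decimals is needed.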
Key Properties of Log-Normal Distribution
Mean of Log-Normal Distribution
The mean (expected value) of the log-normal distribution is:
$$E(X) = e^{\mu + \frac{\sigma^2}{2}}$$
Proof
Since $Y = \ln X \sim N(\mu, \sigma^2)$ has moment generating function $M_Y(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$, the $r^{th}$ raw moment is:
$$ \begin{eqnarray*} \mu_r^\prime &=& E(X^r) = E(e^{rY}) \\ &=& M_Y(r) \\ &=& e^{\mu r + \frac{1}{2}r^2\sigma^2} \end{eqnarray*} $$
For $r = 1$: $$E(X) = e^{\mu + \frac{\sigma^2}{2}}$$
Variance of Log-Normal Distribution
The variance of the log-normal distribution is:
$$V(X) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$$
Proof
First, find $E(X^2)$. Using $r = 2$:
$$E(X^2) = e^{2\mu + 2\sigma^2}$$
Then:
$$ \begin{eqnarray*} V(X) &=& E(X^2) - [E(X)]^2 \\ &=& e^{2\mu + 2\sigma^2} - (e^{\mu + \frac{\sigma^2}{2}})^2 \\ &=& e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2} \\ &=& e^{2\mu + \sigma^2}(e^{\sigma^2} - 1) \end{eqnarray*} $$
Standard Deviation
$$\sigma_X = \sqrt{V(X)} = e^{\mu + \frac{\sigma^2}{2}}\sqrt{e^{\sigma^2} - 1}$$
Coefficient of Variation
The coefficient of variation is:
$$CV = \frac{\sigma_X}{E(X)} = \sqrt{e^{\sigma^2} - 1}$$
Note that the CV depends only on the scale parameter $\sigma$, not on $\mu$.
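The mean, variance, standard deviation, and CV formulas above can be checked numerically. The sketch below (the helper name `lognormal_moments` is hypothetical) evaluates the closed forms for $\mu = 1.2$, $\sigma = 0.5$ and compares them with a Monte Carlo sample drawn with Python's standard `random.lognormvariate(mu, sigma)`, whose `sigma` argument is the standard deviation of $\ln X$:

```python
import math
import random
import statistics


def lognormal_moments(mu, sigma):
    """Return (mean, variance, standard deviation, CV) of LN(mu, sigma^2)."""
    mean = math.exp(mu + sigma**2 / 2)
    var = math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)
    sd = math.sqrt(var)
    cv = math.sqrt(math.exp(sigma**2) - 1)  # uses sigma only, never mu
    return mean, var, sd, cv


mu, sigma = 1.2, 0.5
mean, var, sd, cv = lognormal_moments(mu, sigma)

# Monte Carlo check: random.lognormvariate(mu, sigma) returns exp(N(mu, sigma^2)).
random.seed(0)
sample = [random.lognormvariate(mu, sigma) for _ in range(100_000)]

print(f"mean: formula {mean:.3f}, simulated {statistics.fmean(sample):.3f}")
print(f"var:  formula {var:.3f}, simulated {statistics.variance(sample):.3f}")
print(f"sd/mean = {sd / mean:.4f}, CV = {cv:.4f}")
```

The simulated values only approach the formulas as the sample grows; the heavy right tail makes the sample variance converge noticeably more slowly than the sample mean.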
Median of Log-Normal Distribution
The median of the log-normal distribution is:
$$\text{Median} = e^{\mu}$$
This is because $P(\ln X \leq \mu) = 0.5$, which means $P(X \leq e^{\mu}) = 0.5$.
Quartiles of Log-Normal Distribution
The quartiles of the log-normal distribution are:
- First Quartile: $Q_1 = e^{\mu - 0.675\sigma}$
- Second Quartile (Median): $Q_2 = e^{\mu}$
- Third Quartile: $Q_3 = e^{\mu + 0.675\sigma}$
where $0.675 \approx z_{0.75} = -z_{0.25}$ is the upper quartile of the standard normal distribution (more precisely, $0.6745$).
Proof
For the $i^{th}$ quartile:
$$P(X \leq Q_i) = \frac{i}{4}$$
$$P\left(\frac{\ln X - \mu}{\sigma} \leq \frac{\ln Q_i - \mu}{\sigma}\right) = \frac{i}{4}$$
$$P\left(Z \leq \frac{\ln Q_i - \mu}{\sigma}\right) = \frac{i}{4}$$
where $Z \sim N(0,1)$. Writing $z_p$ for the standard normal quantile, defined by $\Phi(z_p) = p$:
Therefore: $$\frac{\ln Q_i - \mu}{\sigma} = z_{i/4}$$
$$Q_i = e^{\mu + \sigma z_{i/4}}$$
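The same argument works for any probability $p$, giving $Q_p = e^{\mu + \sigma z_p}$. A minimal sketch using the standard library's `statistics.NormalDist.inv_cdf` (Python 3.8+) for $z_p$; the helper name is illustrative, and $\mu = 3.5$, $\sigma = 0.8$ are chosen only as sample values:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ N(0, 1)


def lognormal_quantile(p, mu, sigma):
    """Q_p = exp(mu + sigma * z_p), where Phi(z_p) = p and 0 < p < 1."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))


mu, sigma = 3.5, 0.8
q1, q2, q3 = (lognormal_quantile(p, mu, sigma) for p in (0.25, 0.5, 0.75))

print(f"z_0.75 = {STD_NORMAL.inv_cdf(0.75):.4f}")  # 0.6745
print(f"Q1 = {q1:.3f}, Q2 = {q2:.3f}, Q3 = {q3:.3f}")
```

Because $z_{0.25} = -z_{0.75}$, the quartiles satisfy $Q_1 Q_3 = e^{2\mu} = Q_2^2$: they sit symmetrically about the median on a logarithmic scale.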
Mode of Log-Normal Distribution
The mode of the log-normal distribution is:
$$\text{Mode} = e^{\mu - \sigma^2}$$
Proof
Taking the derivative of the PDF and setting it to zero:
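$$f'(x) = 0 \Rightarrow \frac{d}{dx}\left[\frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{1}{2\sigma^2}(\ln x - \mu)^2}\right] = 0$$
By the product rule, the derivative factors as:
$$f'(x) = -\frac{f(x)}{x}\left[\frac{\ln x - \mu}{\sigma^2} + 1\right] = 0$$
Since $f(x)/x > 0$ for $x > 0$, $f'(x)$ vanishes only where the bracket does (and changes sign from positive to negative there, so this point is a maximum):
$$\ln x = \mu - \sigma^2$$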
$$x = e^{\mu - \sigma^2}$$
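A quick numerical check of this derivation: a crude grid search (an illustrative brute-force approach, not an efficient optimiser) for the maximiser of the PDF should land on $e^{\mu - \sigma^2}$. The sketch, with sample values $\mu = 1.2$, $\sigma = 0.5$, also confirms the ordering mode < median < mean, which follows from $\mu - \sigma^2 < \mu < \mu + \sigma^2/2$:

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Log-normal density for x > 0."""
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma**2)) / (
        math.sqrt(2 * math.pi) * sigma * x
    )


mu, sigma = 1.2, 0.5
mode = math.exp(mu - sigma**2)
median = math.exp(mu)
mean = math.exp(mu + sigma**2 / 2)

# Crude grid search over (0, 20] with step 1e-4 for the maximiser of the PDF.
grid = [k / 10_000 for k in range(1, 200_001)]
x_star = max(grid, key=lambda x: lognormal_pdf(x, mu, sigma))

print(f"mode formula = {mode:.4f}, grid argmax = {x_star:.4f}")
print(mode < median < mean)  # True for every sigma > 0
```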
Raw Moments
The $r^{th}$ raw moment of the log-normal distribution is:
$$\mu_r^\prime = E(X^r) = e^{\mu r + \frac{1}{2}r^2\sigma^2}$$
This gives us:
- $\mu_1^\prime = E(X) = e^{\mu + \frac{\sigma^2}{2}}$
- $\mu_2^\prime = E(X^2) = e^{2\mu + 2\sigma^2}$
- $\mu_3^\prime = E(X^3) = e^{3\mu + \frac{9\sigma^2}{2}}$
Moment Generating Function
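The moment generating function $M_X(t) = E(e^{tX})$ of the log-normal distribution does not exist: the defining integral diverges for every $t > 0$, so $M_X$ is not finite on any open interval around $0$. All raw moments are nevertheless finite, and because $E(X^r) = E(e^{rY}) = M_Y(r)$ with $Y = \ln X \sim N(\mu, \sigma^2)$, the MGF of the normal distribution can be used to derive them.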
Characteristic Function
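The characteristic function $\phi_X(t) = E(e^{itX})$ exists for every real $t$, but it has no closed-form expression. Substituting the raw moments into the formal series
$$\sum_{n=0}^{\infty} \frac{(it)^n}{n!} e^{\mu n + \frac{\sigma^2 n^2}{2}}$$
does not give $\phi_X(t)$: the moments grow like $e^{\sigma^2 n^2/2}$, much faster than $n!$, so this series diverges for every $t \neq 0$.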
Skewness and Kurtosis
Skewness
The coefficient of skewness is:
$$\gamma_1 = (e^{\sigma^2} + 2)\sqrt{e^{\sigma^2} - 1}$$
The log-normal distribution is always right-skewed (positively skewed). Skewness increases with $\sigma$.
Kurtosis
The coefficient of kurtosis is:
$$\beta_2 = e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 3$$
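Since $\beta_2 > 3$ for every $\sigma > 0$ (equivalently, the excess kurtosis $\beta_2 - 3$ is positive), the log-normal distribution is always leptokurtic, with heavier tails than the normal distribution.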
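The skewness and kurtosis formulas can be cross-checked against the raw moments $E(X^r) = e^{\mu r + r^2\sigma^2/2}$ by forming the central moments $\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2\mu_1'^3$ and $\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6\mu_1'^2\mu_2' - 3\mu_1'^4$. A minimal Python sketch (the helper names are hypothetical):

```python
import math


def raw_moment(r, mu, sigma):
    """E(X^r) = exp(r*mu + r^2 * sigma^2 / 2) for X ~ LN(mu, sigma^2)."""
    return math.exp(r * mu + 0.5 * r**2 * sigma**2)


def shape_from_raw_moments(mu, sigma):
    """Skewness and kurtosis assembled from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, mu, sigma) for r in (1, 2, 3, 4))
    var = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3                   # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4  # fourth central moment
    return mu3 / var**1.5, mu4 / var**2


def shape_closed_form(sigma):
    """Closed-form skewness and kurtosis formulas from the text."""
    w = math.exp(sigma**2)
    return (w + 2) * math.sqrt(w - 1), w**4 + 2 * w**3 + 3 * w**2 - 3


for sigma in (0.25, 0.5, 1.0):
    skew_m, kurt_m = shape_from_raw_moments(0.0, sigma)
    skew_c, kurt_c = shape_closed_form(sigma)
    print(f"sigma={sigma}: skewness {skew_m:.6f} vs {skew_c:.6f}, "
          f"kurtosis {kurt_m:.6f} vs {kurt_c:.6f}")
```

Setting $\mu = 0$ loses nothing: both shape measures are free of $\mu$, which the test of a nonzero $\mu$ below also confirms.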
Properties Summary Table
| Property | Formula |
|---|---|
| PDF | $f(x) = \frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}$ |
| CDF | $F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$ |
| Support | $(0, \infty)$ |
| Mean | $e^{\mu + \sigma^2/2}$ |
| Median | $e^{\mu}$ |
| Mode | $e^{\mu - \sigma^2}$ |
| Variance | $e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$ |
| Std. Deviation | $e^{\mu + \sigma^2/2}\sqrt{e^{\sigma^2} - 1}$ |
| Coeff. of Variation | $\sqrt{e^{\sigma^2} - 1}$ |
| Skewness | $(e^{\sigma^2} + 2)\sqrt{e^{\sigma^2} - 1}$ |
| $Q_1$ | $e^{\mu - 0.675\sigma}$ |
| $Q_3$ | $e^{\mu + 0.675\sigma}$ |
Relationship to Normal Distribution
The log-normal distribution has a fundamental relationship with the normal distribution:
- If $Y \sim N(\mu, \sigma^2)$, then $X = e^Y \sim LN(\mu, \sigma^2)$
- If $X \sim LN(\mu, \sigma^2)$, then $Y = \ln X \sim N(\mu, \sigma^2)$
This relationship allows us to compute probabilities for log-normal random variables using standard normal tables.
Examples with Solutions
Example 1: Electronic Component Lifetime
Problem: The lifetime (in days) of a certain electronic component that operates in a high-temperature environment is log-normally distributed with $\mu = 1.2$ and $\sigma = 0.5$.
a. Find the mean and variance of the component lifetime.
b. Find the probability that the component lasts less than 4 days.
c. Find the probability that the component lasts more than 5 days.
d. Find the probability that the component lasts between 3 and 5 days.
Solution:
Given: $X \sim LN(\mu = 1.2, \sigma = 0.5)$
This means $Y = \ln X \sim N(1.2, 0.25)$
Part (a): Mean and Variance
$$E(X) = e^{\mu + \sigma^2/2} = e^{1.2 + 0.5^2/2} = e^{1.2 + 0.125} = e^{1.325} \approx 3.762 \text{ days}$$
$$V(X) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1) = e^{2(1.2) + 0.5^2}(e^{0.5^2} - 1)$$ $$= e^{2.65}(e^{0.25} - 1) = 14.154 \times 0.284 \approx 4.020$$
$$\sigma_X = \sqrt{4.020} \approx 2.005 \text{ days}$$
Part (b): $P(X < 4)$
Convert to standard normal:
$$Z = \frac{\ln X - \mu}{\sigma} = \frac{\ln 4 - 1.2}{0.5} = \frac{1.38629 - 1.2}{0.5} = \frac{0.18629}{0.5} \approx 0.3726$$
$$P(X < 4) = P(Z < 0.3726) \approx 0.6453$$
The probability that the component lasts less than 4 days is approximately 64.53%.
Part (c): $P(X > 5)$
$$Z = \frac{\ln 5 - 1.2}{0.5} = \frac{1.60944 - 1.2}{0.5} = \frac{0.40944}{0.5} \approx 0.8189$$
$$P(X > 5) = 1 - P(X < 5) = 1 - P(Z < 0.8189)$$ $$\approx 1 - 0.7936 = 0.2064$$
The probability that the component lasts more than 5 days is approximately 20.64%.
Part (d): $P(3 < X < 5)$
For $X = 3$: $$Z_1 = \frac{\ln 3 - 1.2}{0.5} = \frac{1.09861 - 1.2}{0.5} = \frac{-0.10139}{0.5} \approx -0.2028$$
For $X = 5$: $$Z_2 = 0.8189$$ (from part c)
$$P(3 < X < 5) = P(Z_1 < Z < Z_2)$$ $$= P(Z < 0.8189) - P(Z < -0.2028)$$ $$\approx 0.7936 - 0.4197 = 0.3739$$
The probability that the component lasts between 3 and 5 days is approximately 37.39%.
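The answers to Example 1 can be checked numerically with the standard library's `statistics.NormalDist` (Python 3.8+), since $P(X < x) = P(\ln X < \ln x)$. This is a minimal sketch; it evaluates $\Phi$ directly rather than reading a printed table, so its last digits may differ slightly from hand calculations that round $z$ before the lookup:

```python
import math
from statistics import NormalDist

mu, sigma = 1.2, 0.5
log_lifetime = NormalDist(mu, sigma)  # distribution of Y = ln X


def p_less(x):
    """P(X < x) = P(ln X < ln x) for the log-normal lifetime X."""
    return log_lifetime.cdf(math.log(x))


mean = math.exp(mu + sigma**2 / 2)
var = math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)

print(f"(a) E(X) = {mean:.3f}, V(X) = {var:.3f}, sd = {math.sqrt(var):.3f}")
print(f"(b) P(X < 4)     = {p_less(4):.4f}")
print(f"(c) P(X > 5)     = {1 - p_less(5):.4f}")
print(f"(d) P(3 < X < 5) = {p_less(5) - p_less(3):.4f}")
```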
Example 2: Income Distribution
Problem: Income (in thousands of dollars) in a certain population follows a log-normal distribution with parameters $\mu = 3.5$ and $\sigma = 0.8$.
a. Find the median income.
b. Find the mean income.
c. What percentage of the population earns less than $30,000?
d. Find the income level below which 75% of the population falls.
Solution:
Given: $X \sim LN(\mu = 3.5, \sigma = 0.8)$ (in thousands)
Part (a): Median Income
$$\text{Median} = e^{\mu} = e^{3.5} \approx 33.115 \text{ thousand dollars} = \$33{,}115$$
Part (b): Mean Income
$$E(X) = e^{\mu + \sigma^2/2} = e^{3.5 + 0.8^2/2} = e^{3.5 + 0.32} = e^{3.82}$$ $$\approx 45.604 \text{ thousand dollars} = \$45{,}604$$
Part (c): $P(X < 30)$
$$Z = \frac{\ln 30 - 3.5}{0.8} = \frac{3.4012 - 3.5}{0.8} = \frac{-0.0988}{0.8} = -0.1235$$
$$P(X < 30) = P(Z < -0.1235) \approx 0.4509$$
Approximately 45.09% of the population earns less than $30,000.
Part (d): 75th Percentile (Third Quartile)
$$Q_3 = e^{\mu + 0.675\sigma} = e^{3.5 + 0.675(0.8)} = e^{3.5 + 0.54} = e^{4.04}$$ $$\approx 56.826 \text{ thousand dollars} = \$56{,}826$$
75% of the population earns less than approximately $56,826.
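The Example 2 quantities can be checked the same way with `statistics.NormalDist` (Python 3.8+). In this minimal sketch the third quartile is computed twice: once with the rounded $z_{0.75} \approx 0.675$ used in the text, and once with the exact normal quantile via `inv_cdf`:

```python
import math
from statistics import NormalDist

mu, sigma = 3.5, 0.8                # income in thousands of dollars
log_income = NormalDist(mu, sigma)  # distribution of ln X

median = math.exp(mu)
mean = math.exp(mu + sigma**2 / 2)
p_below_30 = log_income.cdf(math.log(30))
q3_rounded = math.exp(mu + 0.675 * sigma)        # z_0.75 rounded as in the text
q3_exact = math.exp(log_income.inv_cdf(0.75))    # exp(mu + sigma * 0.67449...)

print(f"median = {median:.3f}, mean = {mean:.3f}")
print(f"P(X < 30) = {p_below_30:.4f}")
print(f"Q3 = {q3_rounded:.3f} (z = 0.675), {q3_exact:.3f} (exact z)")
```

The two $Q_3$ values differ by only about 0.02 thousand dollars, so the rounded quantile is adequate for hand calculation.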
Example 3: Particle Size Distribution
Problem: Particle sizes (in micrometers) in an industrial process follow a log-normal distribution with mean $\mu_X = 50$ and variance $\sigma_X^2 = 400$.
Find the parameters $\mu$ and $\sigma$ of the log-normal distribution.
Solution:
Given: $E(X) = 50$ and $V(X) = 400$
From the mean formula: $$E(X) = e^{\mu + \sigma^2/2} = 50$$
From the variance formula: $$V(X) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1) = 400$$
Squaring the mean gives $$[E(X)]^2 = e^{2\mu + \sigma^2} = 2500$$ and we write $\alpha = e^{\sigma^2}$.
From the variance formula: $$e^{2\mu + \sigma^2}(\alpha - 1) = 400$$ $$2500(\alpha - 1) = 400$$ $$\alpha - 1 = 0.16$$ $$\alpha = 1.16$$ $$e^{\sigma^2} = 1.16$$ $$\sigma^2 = \ln(1.16) = 0.148$$ $$\sigma = \sqrt{0.148} \approx 0.385$$
From the mean formula: $$e^{\mu + \sigma^2/2} = 50$$ $$\mu + 0.148/2 = \ln 50$$ $$\mu = 3.912 - 0.074 = 3.838$$
Therefore: $\mu \approx 3.838$ and $\sigma \approx 0.385$
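The steps above amount to inverting the mean and variance formulas: $\sigma^2 = \ln\left(1 + V(X)/E(X)^2\right)$ and $\mu = \ln E(X) - \sigma^2/2$. A minimal sketch (the function name is hypothetical):

```python
import math


def lognormal_params_from_moments(mean, var):
    """Solve E(X) = mean and V(X) = var for (mu, sigma) of LN(mu, sigma^2)."""
    sigma2 = math.log(1 + var / mean**2)  # e^{sigma^2} = 1 + V(X) / E(X)^2
    mu = math.log(mean) - sigma2 / 2
    return mu, math.sqrt(sigma2)


mu, sigma = lognormal_params_from_moments(50.0, 400.0)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")  # mu = 3.838, sigma = 0.385

# Round trip: plug the parameters back into the mean and variance formulas.
print(math.exp(mu + sigma**2 / 2))
print(math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1))
```

Note that $V(X)/E(X)^2$ is the squared coefficient of variation, so $\sigma$ depends only on the CV, consistent with the CV formula given earlier.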
When to Use Log-Normal Distribution
The log-normal distribution is appropriate when:
- Right-Skewed Data: Strictly positive data with a long right tail
- Multiplicative Processes: Products of many independent positive factors, whose logarithm is a sum and therefore tends toward normality
- Income/Wealth: Income distributions and wealth concentrations
- Particle Sizes: In materials science and geology
- Survival Times: In medical and biological studies
- Financial Data: Asset prices and gross returns (price ratios), with caveats
- Environmental Data: Pollution concentrations, species abundance
- Droplet/Bubble Sizes: In fluid mechanics and multiphase flow
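The multiplicative mechanism can be illustrated with a small simulation. If a quantity is a product of many independent positive factors, its logarithm is a sum of independent terms, which the central limit theorem pushes toward normality. The sketch below assumes, purely for illustration, 100 factors drawn uniformly from $[0.9, 1.1]$ (the helper `sample_skewness` is hypothetical):

```python
import math
import random
import statistics


def sample_skewness(data):
    """Moment-based sample skewness: mean cubed deviation divided by sd^3."""
    m = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return statistics.fmean([(x - m) ** 3 for x in data]) / sd**3


random.seed(1)
n_factors, n_samples = 100, 20_000
products = [
    math.prod(random.uniform(0.9, 1.1) for _ in range(n_factors))
    for _ in range(n_samples)
]
logs = [math.log(p) for p in products]

print(f"skewness of products: {sample_skewness(products):.2f}")  # clearly positive
print(f"skewness of logs:     {sample_skewness(logs):.2f}")      # close to 0
```

The products come out clearly right-skewed while their logarithms are nearly symmetric, which is the pattern a log-normal model captures.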
Applications
Finance
- Stock prices and returns (under certain conditions)
- Asset valuations
- Option pricing
Environmental Science
- Pollutant concentrations
- Species abundance
- Soil properties
Engineering and Manufacturing
- Particle size distributions
- Material properties
- Component lifetimes
Medicine and Biology
- Survival times
- Tumor sizes
- Bacterial cell counts
- Disease incubation periods
Economics
- Income distributions
- Household expenditures
- Firm size distributions
Advantages and Disadvantages
Advantages
- Right Skewness: Naturally models right-skewed positive data
- Multiplicative Property: Appropriate for processes involving multiplication
- Normal Equivalence: Uses well-known normal distribution theory
- Flexibility: Two parameters provide good fit to varied data
- Normal-Based CDF: Although the CDF has no elementary closed form, it is computed directly from the standard normal CDF $\Phi$
Disadvantages
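- No MGF: The moment generating function is infinite for every $t > 0$, which complicates some theoretical work
- Interpretation: $\mu$ and $\sigma$ describe $\ln X$, not $X$, so they are less intuitive than the mean and variance of $X$
- Parameter Estimation: Fitting is done on the log scale, and back-transforming (e.g., $e^{\hat{\mu}}$) estimates the median, not the mean, of $X$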
- Outlier Sensitivity: Sensitive to extreme values
Connection to Other Distributions
- Normal Distribution: Fundamental relationship: $\ln X \sim N(\mu, \sigma^2)$
- Exponential Transformation: $X = e^{Y}$ is the exponential of a normal random variable $Y$
- Weibull: Both model positive values with flexibility
- Gamma: Both used for positive-valued data
- Exponential Family: With both parameters unknown, the log-normal is a two-parameter exponential family with sufficient statistics $\ln x$ and $(\ln x)^2$
Conclusion
The log-normal distribution is an essential tool for statisticians and researchers working with positive-valued, right-skewed data. Its fundamental relationship to the normal distribution makes it theoretically tractable, while its flexibility makes it practically valuable. Applications range from income distributions to environmental science to financial modeling. Transforming problems involving log-normal variables into normal distribution problems, through the logarithmic transformation, provides both computational convenience and theoretical insight.
Understanding when and how to use the log-normal distribution is crucial for accurate data analysis in fields as diverse as economics, environmental science, engineering, and medicine.