Empirical Rule for ungrouped data
The empirical rule is a rule of thumb that applies to bell-shaped (symmetric, approximately normal) distributions. It can be stated as follows:
- about $68$% of the data fall within one standard deviation of the mean,
- about $95$% of the data fall within two standard deviations of the mean,
- about $99.7$% of the data fall within three standard deviations of the mean.
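
The three percentages are the coverage probabilities of a normal distribution. As a minimal sketch (the function name `normal_coverage` is illustrative, not from the text), the probability of lying within $k$ standard deviations of the mean can be computed as $\operatorname{erf}(k/\sqrt{2})$:

```python
import math

def normal_coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973.
    print(f"within {k} sd: {normal_coverage(k):.4f}")
```

The rounded values 68%, 95% and 99.7% come from these probabilities, so the rule is only approximate for real data.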
Formula
Let $x_1,x_2,\cdots, x_n$ be $n$ sample observations. If the distribution of $x$ is approximately bell-shaped, then
- about $68$% of the data fall in $\overline{x}\pm 1 s_x$,
- about $95$% of the data fall in $\overline{x}\pm 2 s_x$,
- about $99.7$% of the data fall in $\overline{x}\pm 3 s_x$,
where,
$\overline{x}=\dfrac{1}{n}\sum_{i=1}^{n}x_i$ is the sample mean, and

$s_x =\sqrt{\dfrac{1}{n-1}\bigg(\sum_{i=1}^{n}x_i^2-\dfrac{\big(\sum_{i=1}^n x_i\big)^2}{n}\bigg)}$ is the sample standard deviation.
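
The formulas above can be sketched in Python as follows. The function names `mean_and_sd` and `empirical_intervals` and the list `sample` are hypothetical, chosen only for illustration; the standard deviation uses the same shortcut formula with divisor $n-1$.

```python
import math

def mean_and_sd(data):
    """Return the sample mean and sample standard deviation (divisor n - 1)."""
    n = len(data)
    total = sum(data)                      # sum of x_i
    total_sq = sum(x * x for x in data)    # sum of x_i^2
    mean = total / n
    variance = (total_sq - total ** 2 / n) / (n - 1)
    return mean, math.sqrt(variance)

def empirical_intervals(data):
    """Return the intervals mean +/- k*sd for k = 1, 2, 3."""
    mean, sd = mean_and_sd(data)
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

# Hypothetical data, used only to exercise the functions.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
sample_mean, sample_sd = mean_and_sd(sample)
for k, (low, high) in empirical_intervals(sample).items():
    print(f"k={k}: ({low:.4f}, {high:.4f})")
```

The shortcut formula is convenient for hand calculation; for data with very large values, a library routine such as `statistics.stdev` may be numerically safer.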
Example
The following data give the hourly wage rates (in dollars) of 10 employees of a company.
20, 21, 24, 25, 18, 22, 24, 22, 20, 22.
Check the empirical rule for the given data.
Solution
$i$ | $x_i$ | $x_i^2$ |
---|---|---|
1 | 20 | 400 |
2 | 21 | 441 |
3 | 24 | 576 |
4 | 25 | 625 |
5 | 18 | 324 |
6 | 22 | 484 |
7 | 24 | 576 |
8 | 22 | 484 |
9 | 20 | 400 |
10 | 22 | 484 |
Total | 218 | 4794 |
Sample mean
The sample mean of $X$ is
$$ \begin{aligned} \overline{x} &=\frac{1}{n}\sum_{i=1}^n x_i\\ &=\frac{218}{10}\\ &=21.8\text{ dollars} \end{aligned} $$
The average hourly wage rate is $21.8$ dollars.
Sample variance
The sample variance of $X$ is
$$ \begin{aligned} s_x^2 &=\dfrac{1}{n-1}\bigg(\sum_{i=1}^{n}x_i^2-\frac{\big(\sum_{i=1}^n x_i\big)^2}{n}\bigg)\\ &=\dfrac{1}{9}\bigg(4794-\frac{(218)^2}{10}\bigg)\\ &=\dfrac{1}{9}\big(4794-\frac{47524}{10}\big)\\ &=\dfrac{1}{9}\big(4794-4752.4\big)\\ &= \frac{41.6}{9}\\ &=4.6222 \end{aligned} $$
Sample standard deviation
The standard deviation is the positive square root of the variance.
The sample standard deviation is
$$ \begin{aligned} s_x &=\sqrt{s_x^2}\\ &=\sqrt{4.6222}\\ &=2.1499 \text{ dollars} \end{aligned} $$
Thus the sample standard deviation of the hourly wage rate is $2.1499$ dollars.
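
As a cross-check of the hand calculation, the same quantities can be computed with Python's standard `statistics` module, whose `variance` and `stdev` functions use the divisor $n-1$. The variable names are illustrative.

```python
import statistics

# Hourly wage rates (in dollars) from the example.
wages = [20, 21, 24, 25, 18, 22, 24, 22, 20, 22]

mean = statistics.mean(wages)          # sample mean
variance = statistics.variance(wages)  # sample variance, divisor n - 1
sd = statistics.stdev(wages)           # sample standard deviation

print(f"mean = {mean:.1f}, variance = {variance:.4f}, sd = {sd:.4f}")
```

The printed values should agree with $21.8$, $4.6222$ and $2.1499$ obtained above.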
Empirical Rule
According to the empirical rule, about $68$% of the data should fall in $\overline{x}\pm 1 s_x$, i.e., in $21.8\pm 1\times 2.1499$, which is the interval $(19.6501, 23.9499)$. Here $6$ of the $10$ observations ($60$%) lie in this interval.

About $95$% of the data should fall in $\overline{x}\pm 2 s_x$, i.e., in $21.8\pm 2\times 2.1499$, which is the interval $(17.5002, 26.0998)$. Here all $10$ observations ($100$%) lie in this interval.

About $99.7$% of the data should fall in $\overline{x}\pm 3 s_x$, i.e., in $21.8\pm 3\times 2.1499$, which is the interval $(15.3503, 28.2497)$. Here all $10$ observations ($100$%) lie in this interval.

With only $10$ observations, the observed percentages ($60$%, $100$%, $100$%) agree only roughly with the empirical rule.
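
To check the rule against the data, one can count how many observations actually fall in each interval. The sketch below is one way to do this; the helper `coverage` is a hypothetical name, and the mean and standard deviation are recomputed with the standard `statistics` module rather than taken from the rounded values above.

```python
import statistics

# Hourly wage rates (in dollars) from the example.
wages = [20, 21, 24, 25, 18, 22, 24, 22, 20, 22]
mean = statistics.mean(wages)
sd = statistics.stdev(wages)  # sample standard deviation, divisor n - 1

def coverage(data, center, spread, k):
    """Fraction of observations lying within k * spread of center."""
    low, high = center - k * spread, center + k * spread
    inside = sum(1 for x in data if low <= x <= high)
    return inside / len(data)

observed = {k: coverage(wages, mean, sd, k) for k in (1, 2, 3)}
for k, expected in zip((1, 2, 3), (0.68, 0.95, 0.997)):
    print(f"within {k} sd: observed {observed[k]:.0%}, rule says about {expected:.1%}")
```

For a sample this small, the observed fractions can differ noticeably from the nominal percentages even when the underlying distribution is roughly bell-shaped.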