## Empirical Rule for ungrouped data

The empirical rule is a rule of thumb that applies to bell-shaped (symmetric, approximately normal) distributions. It can be stated as:

• about $68$% of the data fall within one standard deviation of the mean,
• about $95$% of the data fall within two standard deviations of the mean,
• about $99.7$% of the data fall within three standard deviations of the mean.
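The three percentages are rounded probabilities for a normal distribution. As a minimal sketch (the function name `normal_coverage` is illustrative and not part of the text), they can be reproduced with the error function from Python's `math` module, using the fact that a normal variable lies within $k$ standard deviations of its mean with probability $\operatorname{erf}(k/\sqrt{2})$:

```python
import math

def normal_coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# Coverage for one, two and three standard deviations.
coverage = {k: normal_coverage(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

The printed values round to the familiar $68$%, $95$% and $99.7$%.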

## Formula

Let $x_1,x_2,\cdots, x_n$ be $n$ sample observations. If the distribution of $x$ is approximately bell-shaped (symmetric), then approximately

$68$% of the data falls in $\overline{x}\pm 1 s_x$

$95$% of the data falls in $\overline{x}\pm 2 s_x$

$99.7$% of the data falls in $\overline{x}\pm 3 s_x$

where,

• $\overline{x}=\dfrac{1}{n}\sum_{i=1}^{n}x_i$ is the sample mean,
• $s_x =\sqrt{\dfrac{1}{n-1}\bigg(\sum_{i=1}^{n}x_i^2-\dfrac{\big(\sum_{i=1}^n x_i\big)^2}{n}\bigg)}$ is the sample standard deviation.
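The formulas above can be sketched in Python as follows. The function names and the small data set `sample` are hypothetical, chosen only for illustration; the standard deviation uses the same computational formula as $s_x$ above.

```python
import math

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Sample standard deviation via the computational (sum of squares) formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

def empirical_intervals(xs):
    """Return {k: (mean - k*s, mean + k*s)} for k = 1, 2, 3."""
    m, s = sample_mean(xs), sample_sd(xs)
    return {k: (m - k * s, m + k * s) for k in (1, 2, 3)}

# Hypothetical data, used only to show the calls.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
intervals = empirical_intervals(sample)
for k, (lo, hi) in intervals.items():
    print(f"mean ± {k}s: ({lo:.4f}, {hi:.4f})")
```

The computational formula is convenient for hand calculation; for data with very large values, a two-pass formula based on deviations from the mean is numerically safer.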

## Example

The following data give the hourly wage rates (in dollars) of 10 employees of a company.

20, 21, 24, 25, 18, 22, 24, 22, 20, 22.

Check the empirical rule for the given data.

### Solution

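| | $x_i$ | $x_i^2$ |
|---|---|---|
| | 20 | 400 |
| | 21 | 441 |
| | 24 | 576 |
| | 25 | 625 |
| | 18 | 324 |
| | 22 | 484 |
| | 24 | 576 |
| | 22 | 484 |
| | 20 | 400 |
| | 22 | 484 |
| Total | 218 | 4794 |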

#### Sample mean

The sample mean of $x$ is

$$
\begin{aligned} \overline{x} &=\frac{1}{n}\sum_{i=1}^n x_i\\ &=\frac{218}{10}\\ &=21.8\text{ dollars} \end{aligned}
$$

The average hourly wage rate is $21.8$ dollars.

#### Sample variance

The sample variance of $x$ is

$$
\begin{aligned} s_x^2 &=\dfrac{1}{n-1}\bigg(\sum_{i=1}^{n}x_i^2-\frac{\big(\sum_{i=1}^n x_i\big)^2}{n}\bigg)\\ &=\dfrac{1}{9}\bigg(4794-\frac{(218)^2}{10}\bigg)\\ &=\dfrac{1}{9}\bigg(4794-\frac{47524}{10}\bigg)\\ &=\dfrac{1}{9}\big(4794-4752.4\big)\\ &= \frac{41.6}{9}\\ &=4.6222 \end{aligned}
$$

#### Sample standard deviation

The standard deviation is the positive square root of the variance, so the sample standard deviation is

$$
\begin{aligned} s_x &=\sqrt{s_x^2}\\ &=\sqrt{4.6222}\\ &=2.1499 \text{ dollars} \end{aligned}
$$

Thus, the standard deviation of the hourly wage rate is $2.1499$ dollars.
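As a cross-check of the hand calculation, the same quantities can be computed with Python's standard `statistics` module, whose `variance` and `stdev` functions use the $n-1$ divisor. The variable names are illustrative.

```python
import statistics

# Hourly wage rates (in dollars) from the example.
wages = [20, 21, 24, 25, 18, 22, 24, 22, 20, 22]

mean = statistics.mean(wages)          # sample mean
variance = statistics.variance(wages)  # sample variance (divides by n - 1)
sd = statistics.stdev(wages)           # sample standard deviation

print(f"{mean:.1f} {variance:.4f} {sd:.4f}")  # prints "21.8 4.6222 2.1499"
```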

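#### Empirical rule

According to the empirical rule, approximately $68$% of the data should fall in $\overline{x}\pm 1 s_x$,

i.e., ($21.8\pm 1\times 2.1499$) $=$ ($19.6501, 23.9499$).

$\Rightarrow$ In the given data, $6$ of the $10$ observations ($20, 20, 21, 22, 22, 22$) lie in this interval, i.e., $60$% of the data values.

Approximately $95$% of the data should fall in $\overline{x}\pm 2 s_x$,

i.e., ($21.8\pm 2\times 2.1499$) $=$ ($17.5002, 26.0998$).

$\Rightarrow$ All $10$ observations lie in this interval, i.e., $100$% of the data values.

Approximately $99.7$% of the data should fall in $\overline{x}\pm 3 s_x$,

i.e., ($21.8\pm 3\times 2.1499$) $=$ ($15.3503, 28.2497$).

$\Rightarrow$ All $10$ observations lie in this interval, i.e., $100$% of the data values.

The observed proportions ($60$%, $100$%, $100$%) are only roughly in line with the empirical rule; exact agreement should not be expected from a sample of $10$ observations.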
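One way to carry out this check programmatically is to count how many observations fall inside each interval. The following is a minimal sketch using Python's `statistics` module; the variable names are illustrative.

```python
import statistics

# Hourly wage rates (in dollars) from the example.
wages = [20, 21, 24, 25, 18, 22, 24, 22, 20, 22]

mean = statistics.mean(wages)
sd = statistics.stdev(wages)  # sample standard deviation (n - 1 divisor)

observed = {}
for k in (1, 2, 3):
    lo, hi = mean - k * sd, mean + k * sd
    # Observations lying inside mean ± k standard deviations.
    inside = [w for w in wages if lo <= w <= hi]
    observed[k] = len(inside) / len(wages)
    print(f"k={k}: ({lo:.4f}, {hi:.4f}) holds {len(inside)} of {len(wages)} values")
```

The intervals are built from the unrounded standard deviation, so the endpoints can differ from the hand-computed ones in the last decimal place.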