Empirical rule for grouped data

The empirical rule is a rule of thumb that applies to bell-shaped (approximately normal, symmetrical) distributions. It can be stated as follows:

  • about $68$% of the data fall within one standard deviation of the mean,
  • about $95$% of the data fall within two standard deviations of the mean,
  • about $99.7$% of the data fall within three standard deviations of the mean.
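The percentages in the rule are rounded values of the probability that a normally distributed variable lies within $k$ standard deviations of its mean, which equals $\operatorname{erf}(k/\sqrt{2})$. A minimal Python sketch using only the standard library reproduces them (the helper name `normal_coverage` is ours, chosen for illustration):

```python
import math


def normal_coverage(k: float) -> float:
    """P(|Z| <= k) for a standard normal variable Z, i.e. erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {normal_coverage(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```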

Let $(x_i,f_i), i=1,2, \cdots , n$ be a given frequency distribution with total frequency $N=\sum_{i=1}^{n} f_i$. If the distribution of $x$ is approximately bell-shaped (symmetrical), then

about $68$% of the data fall in $\overline{x}\pm 1 s_x$,

about $95$% of the data fall in $\overline{x}\pm 2 s_x$,

about $99.7$% of the data fall in $\overline{x}\pm 3 s_x$,

where,

  • $\overline{x}=\dfrac{1}{N}\sum_{i=1}^{n}f_ix_i$ is the sample mean,
  • $s_x =\sqrt{\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\dfrac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)}$ is the sample standard deviation.
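The two formulas above translate directly into code. The following is a minimal Python sketch, not a library API: `grouped_mean_sd` and `empirical_intervals` are hypothetical helper names, and the standard deviation uses the same shortcut formula with the $N-1$ denominator.

```python
import math


def grouped_mean_sd(x, f):
    """Return (mean, sample sd) of a frequency distribution (x_i, f_i).

    x holds the values x_i (or class mid-values); f holds the frequencies f_i.
    The standard deviation uses the shortcut formula with an N - 1 denominator.
    """
    if len(x) != len(f):
        raise ValueError("x and f must have the same length")
    n_total = sum(f)  # N = sum of the frequencies
    if n_total < 2:
        raise ValueError("need a total frequency of at least 2")
    sum_fx = sum(fi * xi for xi, fi in zip(x, f))        # sum of f_i * x_i
    sum_fx2 = sum(fi * xi ** 2 for xi, fi in zip(x, f))  # sum of f_i * x_i^2
    mean = sum_fx / n_total
    variance = (sum_fx2 - sum_fx ** 2 / n_total) / (n_total - 1)
    return mean, math.sqrt(variance)


def empirical_intervals(mean, sd):
    """Map k = 1, 2, 3 to the interval (mean - k*sd, mean + k*sd)."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


# Usage with the frequency table of Example 1.
mean, sd = grouped_mean_sd([30, 32, 35, 38, 40], [8, 12, 20, 10, 5])
print(round(mean, 4), round(sd, 4))  # 34.6182 3.0882
```

For grouped data, pass the class mid-values as `x`.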

Example 1

A librarian keeps records of the amount of time (in minutes) that college students spend in a library. The data are as follows:

| Time spent (minutes) | 30 | 32 | 35 | 38 | 40 |
|---|---|---|---|---|---|
| No. of students | 8 | 12 | 20 | 10 | 5 |

Check the empirical rule for the above frequency distribution.

Solution

| $x_i$ | $f_i$ | $f_ix_i$ | $f_ix_i^2$ |
|---|---|---|---|
| 30 | 8 | 240 | 7200 |
| 32 | 12 | 384 | 12288 |
| 35 | 20 | 700 | 24500 |
| 38 | 10 | 380 | 14440 |
| 40 | 5 | 200 | 8000 |
| Total | 55 | 1904 | 66428 |
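The $f_ix_i$ and $f_ix_i^2$ columns and their totals can be reproduced with a short Python sketch; the variable names `times`, `students`, `fx` and `fx2` are illustrative choices, not part of the original solution.

```python
# Example 1: time spent in the library (minutes) and number of students.
times = [30, 32, 35, 38, 40]
students = [8, 12, 20, 10, 5]

fx = [f * x for x, f in zip(times, students)]       # f_i * x_i column
fx2 = [f * x * x for x, f in zip(times, students)]  # f_i * x_i^2 column

for x, f, a, b in zip(times, students, fx, fx2):
    print(x, f, a, b)
print("Total", sum(students), sum(fx), sum(fx2))  # Total 55 1904 66428
```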

Sample mean

Let $X$ denote the time spent in the library. The sample mean of $X$ is

$$ \begin{aligned} \overline{x} &=\frac{1}{N}\sum_{i=1}^n f_ix_i\\ &=\frac{1904}{55}\\ &=34.6182\text{ minutes} \end{aligned} $$

The average time spent in the library is $34.6182$ minutes.

Sample variance

The sample variance of $X$ is

$$ \begin{aligned} s_x^2 &=\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\frac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)\\ &=\dfrac{1}{54}\bigg(66428-\frac{(1904)^2}{55}\bigg)\\ &=\dfrac{1}{54}\big(66428-\frac{3625216}{55}\big)\\ &=\dfrac{1}{54}\big(66428-65913.01818\big)\\ &= \frac{514.98182}{54}\\ &=9.5367 \end{aligned} $$
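As a cross-check on the arithmetic, the shortcut formula is algebraically equal to the definitional form $s_x^2=\frac{1}{N-1}\sum_{i=1}^{n} f_i(x_i-\overline{x})^2$. The following Python sketch is an illustration only; it uses exact `fractions.Fraction` arithmetic so that no intermediate value is rounded, and evaluates both forms for the data of this example.

```python
from fractions import Fraction

# Example 1: time spent in the library (minutes) and number of students.
times = [30, 32, 35, 38, 40]
students = [8, 12, 20, 10, 5]

n_total = sum(students)                                    # N = 55
sum_fx = sum(f * x for x, f in zip(times, students))       # 1904
sum_fx2 = sum(f * x * x for x, f in zip(times, students))  # 66428

# Shortcut formula, evaluated exactly (no intermediate rounding).
shortcut = (sum_fx2 - Fraction(sum_fx ** 2, n_total)) / (n_total - 1)

# Definitional formula: sum of f_i * (x_i - mean)^2, divided by N - 1.
mean = Fraction(sum_fx, n_total)
definitional = sum(f * (x - mean) ** 2 for x, f in zip(times, students)) / (n_total - 1)

print(shortcut == definitional)   # True
print(round(float(shortcut), 4))  # 9.5367
```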

Sample standard deviation

The standard deviation is the positive square root of the variance.

The sample standard deviation is

$$ \begin{aligned} s_x &=\sqrt{s_x^2}\\ &=\sqrt{9.5367}\\ &=3.0882 \text{ minutes} \end{aligned} $$

Thus, the sample standard deviation of the time spent in the library is $3.0882$ minutes.
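Because this frequency table lists exact values rather than classes, $s_x$ can also be verified by expanding the table into one value per student and applying the standard library's `statistics.stdev`, which uses the same $N-1$ denominator. A minimal Python sketch of that check:

```python
import statistics

# Example 1: time spent in the library (minutes) and number of students.
times = [30, 32, 35, 38, 40]
students = [8, 12, 20, 10, 5]

# Rebuild the raw data: value x_i repeated f_i times (one entry per student).
raw = [x for x, f in zip(times, students) for _ in range(f)]

# statistics.stdev divides by N - 1, the same denominator used for s_x.
print(len(raw), round(statistics.mean(raw), 4), round(statistics.stdev(raw), 4))
# 55 34.6182 3.0882
```

For grouped data such as class intervals, the same expansion would treat every mid-value as exact, so it only reproduces the grouped-data approximation.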

Empirical Rule

Approximately $68$% of the students spent time in the library between

$$ \begin{aligned} & \overline{x}- 1 s_x \text{ and } \overline{x}+ 1 s_x \text{ minutes}\\ \Rightarrow & 34.6182 - 1* 3.0882 \text{ and } 34.6182 + 1* 3.0882 \text{ minutes}\\ \Rightarrow & 31.53 \text{ and } 37.7064 \text{ minutes}\\ \end{aligned} $$

Approximately $95$% of the students spent time in the library between

$$ \begin{aligned} & \overline{x}- 2 s_x \text{ and } \overline{x}+ 2 s_x \text{ minutes}\\ \Rightarrow & 34.6182 - 2* 3.0882 \text{ and } 34.6182 + 2* 3.0882 \text{ minutes}\\ \Rightarrow & 28.4418 \text{ and } 40.7946 \text{ minutes}\\ \end{aligned} $$

Approximately $99.7$% of the students spent time in the library between

$$ \begin{aligned} & \overline{x}- 3 s_x \text{ and } \overline{x}+ 3 s_x \text{ minutes}\\ \Rightarrow & 34.6182 - 3* 3.0882 \text{ and } 34.6182 + 3* 3.0882 \text{ minutes}\\ \Rightarrow & 25.3536 \text{ and } 43.8828 \text{ minutes}\\ \end{aligned} $$
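The intervals above are what the rule predicts. To compare them with the data, one can count how many students actually fall inside each interval. A minimal Python sketch of that comparison (`observed_share` is a hypothetical helper name used only here):

```python
import math

# Example 1: time spent in the library (minutes) and number of students.
times = [30, 32, 35, 38, 40]
students = [8, 12, 20, 10, 5]

n_total = sum(students)
sum_fx = sum(f * x for x, f in zip(times, students))
sum_fx2 = sum(f * x * x for x, f in zip(times, students))
mean = sum_fx / n_total
sd = math.sqrt((sum_fx2 - sum_fx ** 2 / n_total) / (n_total - 1))


def observed_share(k):
    """Share of students whose time lies within mean +/- k * sd (hypothetical helper)."""
    lo, hi = mean - k * sd, mean + k * sd
    inside = sum(f for x, f in zip(times, students) if lo <= x <= hi)
    return inside / n_total


for k, rule in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    print(k, round(observed_share(k), 4), rule)
# 1 0.5818 0.68
# 2 1.0 0.95
# 3 1.0 0.997
```

For these data, 32 of the 55 students (about 58%) lie within one standard deviation and all 55 lie within two and three standard deviations, so the observed share within one standard deviation is noticeably below the 68% suggested by the rule.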

Example 2

The following table gives the amount of time (in minutes) spent on the internet each evening by a group of 56 students.

| Time spent on internet ($x$, minutes) | 10-12 | 13-15 | 16-18 | 19-21 | 22-24 |
|---|---|---|---|---|---|
| No. of students ($f$) | 3 | 12 | 15 | 24 | 2 |

Check the empirical rule for the above frequency distribution.

Solution

Let $X$ denote the time spent on the internet.

Here the classes are of the inclusive type. To convert them to the exclusive type, subtract $0.5$ from the lower limit and add $0.5$ to the upper limit of each class. The mid-values $x_i$ are unchanged by this adjustment.
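The boundary adjustment can be sketched in Python as follows. The sketch assumes, as in this table, that consecutive classes are separated by a constant gap (here 1 minute), so the correction applied to each limit is half that gap, $0.5$.

```python
# Example 2: inclusive classes as (lower limit, upper limit) pairs.
classes = [(10, 12), (13, 15), (16, 18), (19, 21), (22, 24)]

# Gap between one class's upper limit and the next class's lower limit (13 - 12 = 1).
# Half of it is subtracted from each lower limit and added to each upper limit.
gap = classes[1][0] - classes[0][1]
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in classes]
midpoints = [(lo + hi) / 2 for lo, hi in classes]

print(boundaries)  # [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5), (21.5, 24.5)]
print(midpoints)   # [11.0, 14.0, 17.0, 20.0, 23.0]
```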

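| Class interval | Class boundaries | Mid-value ($x_i$) | Frequency ($f_i$) | $f_ix_i$ | $f_ix_i^2$ |
|---|---|---|---|---|---|
| 10-12 | 9.5-12.5 | 11 | 3 | 33 | 363 |
| 13-15 | 12.5-15.5 | 14 | 12 | 168 | 2352 |
| 16-18 | 15.5-18.5 | 17 | 15 | 255 | 4335 |
| 19-21 | 18.5-21.5 | 20 | 24 | 480 | 9600 |
| 22-24 | 21.5-24.5 | 23 | 2 | 46 | 1058 |
| Total | | | 56 | 982 | 17708 |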

Sample mean

The sample mean of $X$ is

$$ \begin{aligned} \overline{x} &=\frac{1}{N}\sum_{i=1}^n f_ix_i\\ &=\frac{982}{56}\\ &=17.5357\text{ minutes} \end{aligned} $$

The average time spent on the internet is $17.5357$ minutes.

Sample variance

The sample variance of $X$ is

$$ \begin{aligned} s_x^2 &=\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\frac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)\\ &=\dfrac{1}{55}\bigg(17708-\frac{(982)^2}{56}\bigg)\\ &=\dfrac{1}{55}\big(17708-\frac{964324}{56}\big)\\ &=\dfrac{1}{55}\big(17708-17220.07143\big)\\ &= \frac{487.92857}{55}\\ &=8.8714 \end{aligned} $$
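In the hand calculation above, $964324/56$ is rounded to $17220.07143$. Repeating the computation with exact rational arithmetic confirms that this rounding does not affect the result. A Python sketch using `fractions.Fraction` (an illustrative check, not part of the original method):

```python
from fractions import Fraction

# Example 2: class mid-values and frequencies (time on the internet, minutes).
mid = [11, 14, 17, 20, 23]
freq = [3, 12, 15, 24, 2]

n_total = sum(freq)                                  # N = 56
sum_fx = sum(f * x for x, f in zip(mid, freq))       # 982
sum_fx2 = sum(f * x * x for x, f in zip(mid, freq))  # 17708

# Shortcut formula for s_x^2, kept as an exact fraction.
variance = (sum_fx2 - Fraction(sum_fx ** 2, n_total)) / (n_total - 1)
print(variance, round(float(variance), 4))  # 621/70 8.8714
```

The exact variance is $621/70 \approx 8.8714$, in agreement with the value above.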

Sample standard deviation

The standard deviation is the positive square root of the variance.

The sample standard deviation is

$$ \begin{aligned} s_x &=\sqrt{s_x^2}\\ &=\sqrt{8.8714}\\ &=2.9785 \text{ minutes} \end{aligned} $$

Thus, the sample standard deviation of the time spent on the internet is $2.9785$ minutes.

Empirical Rule

Approximately $68$% of the students spent time on the internet between

$$ \begin{aligned} & \overline{x}- 1 s_x \text{ and } \overline{x}+ 1 s_x \text{ minutes}\\ \Rightarrow & 17.5357 - 1* 2.9785 \text{ and } 17.5357 + 1* 2.9785 \text{ minutes}\\ \Rightarrow & 14.5572 \text{ and } 20.5142 \text{ minutes}\\ \end{aligned} $$

Approximately $95$% of the students spent time on the internet between

$$ \begin{aligned} & \overline{x}- 2 s_x \text{ and } \overline{x}+ 2 s_x \text{ minutes}\\ \Rightarrow & 17.5357 - 2* 2.9785 \text{ and } 17.5357 + 2* 2.9785 \text{ minutes}\\ \Rightarrow & 11.5787 \text{ and } 23.4927 \text{ minutes}\\ \end{aligned} $$

Approximately $99.7$% of the students spent time on the internet between

$$ \begin{aligned} & \overline{x}- 3 s_x \text{ and } \overline{x}+ 3 s_x \text{ minutes}\\ \Rightarrow & 17.5357 - 3* 2.9785 \text{ and } 17.5357 + 3* 2.9785 \text{ minutes}\\ \Rightarrow & 8.6002 \text{ and } 26.4712 \text{ minutes}\\ \end{aligned} $$
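With grouped data the individual times are unknown, so any count of students inside these intervals needs an assumption. The Python sketch below assumes that every student in a class sits at the class mid-value, so a whole class is counted inside an interval when its mid-value is. Under that assumption it compares the observed shares with the rule:

```python
import math

# Example 2: class mid-values and frequencies (time on the internet, minutes).
mid = [11, 14, 17, 20, 23]
freq = [3, 12, 15, 24, 2]

n_total = sum(freq)
sum_fx = sum(f * x for x, f in zip(mid, freq))
sum_fx2 = sum(f * x * x for x, f in zip(mid, freq))
mean = sum_fx / n_total
sd = math.sqrt((sum_fx2 - sum_fx ** 2 / n_total) / (n_total - 1))

# Assumption: each class is represented by its mid-value.
shares = {}
for k in (1, 2, 3):
    lo, hi = mean - k * sd, mean + k * sd
    inside = sum(f for x, f in zip(mid, freq) if lo <= x <= hi)
    shares[k] = inside / n_total
    print(k, inside, round(shares[k], 4))
# 1 39 0.6964
# 2 53 0.9464
# 3 56 1.0
```

Under this midpoint assumption, about 69.6%, 94.6% and 100% of the students fall within one, two and three standard deviations, close to the 68%, 95% and 99.7% given by the empirical rule.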
