Empirical rule for grouped data
The empirical rule is a rule of thumb that applies to bell-shaped (symmetrical) distributions. It states that, approximately:
- $68$% of the data will fall within one standard deviation of the mean,
- $95$% of the data will fall within two standard deviations of the mean,
- $99.7$% of the data will fall within three standard deviations of the mean.
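The three percentages are rounded probabilities from the normal distribution. As a rough check, assuming a perfectly normal distribution, they can be recovered in Python from the error function: for a standard normal $Z$, $P(|Z|<k)=\operatorname{erf}(k/\sqrt{2})$. The helper name `within_k_sd` is only illustrative.

```python
import math

def within_k_sd(k: float) -> float:
    """P(|Z| < k) for a standard normal Z, computed with the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3.
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```

For real data that are only roughly bell-shaped, the observed shares will differ from these values.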
Let $(x_i,f_i),\ i=1,2,\cdots,n$ be a given frequency distribution. If the distribution of $x$ is approximately symmetrical (bell-shaped), then approximately
- $68$% of the data fall in $\overline{x}\pm 1 s_x$,
- $95$% of the data fall in $\overline{x}\pm 2 s_x$,
- $99.7$% of the data fall in $\overline{x}\pm 3 s_x$,
where $N=\sum_{i=1}^{n} f_i$ is the total frequency,
$$\overline{x}=\dfrac{1}{N}\sum_{i=1}^{n}f_ix_i$$
is the sample mean, and
$$s_x =\sqrt{\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\dfrac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)}$$
is the sample standard deviation.
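The two formulas above translate directly into code. The following is a minimal Python sketch, assuming the data come as two equal-length lists of values $x_i$ and frequencies $f_i$; the function names `grouped_mean_sd` and `empirical_intervals` are hypothetical helpers, not part of any library.

```python
import math

def grouped_mean_sd(xs, fs):
    """Sample mean and sample SD of a frequency distribution (x_i, f_i).

    Uses the shortcut formula with divisor N - 1, where N = sum of f_i.
    """
    if len(xs) != len(fs):
        raise ValueError("xs and fs must have the same length")
    n_total = sum(fs)                                  # N
    sum_fx = sum(f * x for x, f in zip(xs, fs))        # sum of f_i * x_i
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))   # sum of f_i * x_i^2
    mean = sum_fx / n_total
    variance = (sum_fx2 - sum_fx ** 2 / n_total) / (n_total - 1)
    return mean, math.sqrt(variance)

def empirical_intervals(mean, sd):
    """Intervals mean +/- k*sd for k = 1, 2, 3 (the 68/95/99.7 ranges)."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
```

For grouped (class) data, `xs` would hold the class mid-values.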
Example 1
A librarian keeps records of the amount of time (in minutes) that college students spend in the library. The data are as follows:
Time spent (minutes) | 30 | 32 | 35 | 38 | 40 |
---|---|---|---|---|---|
No. of students | 8 | 12 | 20 | 10 | 5 |
Check the empirical rule for the above frequency distribution.
Solution
$x_i$ | $f_i$ | $f_ix_i$ | $f_ix_i^2$ |
---|---|---|---|
30 | 8 | 240 | 7200 |
32 | 12 | 384 | 12288 |
35 | 20 | 700 | 24500 |
38 | 10 | 380 | 14440 |
40 | 5 | 200 | 8000 |
Total | 55 | 1904 | 66428 |
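The working columns and their totals can be reproduced mechanically. A small Python sketch using the Example 1 data (the variable names are illustrative):

```python
# Example 1: time spent (minutes) and number of students.
times = [30, 32, 35, 38, 40]
students = [8, 12, 20, 10, 5]

# One row per value: (x_i, f_i, f_i*x_i, f_i*x_i^2).
rows = [(x, f, f * x, f * x * x) for x, f in zip(times, students)]
for x, f, fx, fx2 in rows:
    print(f"{x} | {f} | {fx} | {fx2}")

# Column totals: N, sum of f_i*x_i, sum of f_i*x_i^2.
totals = tuple(sum(r[i] for r in rows) for i in (1, 2, 3))
print(f"Total | {totals[0]} | {totals[1]} | {totals[2]}")  # Total | 55 | 1904 | 66428
```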
Sample mean
Let $X$ denote the time spent in the library. The sample mean of $X$ is
$$ \begin{aligned} \overline{x} &=\frac{1}{N}\sum_{i=1}^n f_ix_i\\ &=\frac{1904}{55}\\ &=34.6182\text{ minutes} \end{aligned} $$
The average time spent in the library is $34.6182$ minutes.
Sample variance
The sample variance of $X$ is
$$ \begin{aligned} s_x^2 &=\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\frac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)\\ &=\dfrac{1}{54}\bigg(66428-\frac{(1904)^2}{55}\bigg)\\ &=\dfrac{1}{54}\bigg(66428-\frac{3625216}{55}\bigg)\\ &=\dfrac{1}{54}\big(66428-65913.01818\big)\\ &= \frac{514.98182}{54}\\ &=9.5367 \end{aligned} $$
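The shortcut formula used above is algebraically equal to the definitional form $s_x^2=\frac{1}{N-1}\sum_{i=1}^n f_i(x_i-\overline{x})^2$. A minimal Python check with exact rational arithmetic (`fractions.Fraction`), so that no rounding is involved:

```python
from fractions import Fraction

xs = [30, 32, 35, 38, 40]
fs = [8, 12, 20, 10, 5]
N = sum(fs)

sum_fx = sum(f * x for x, f in zip(xs, fs))
mean = Fraction(sum_fx, N)  # exactly 1904/55

# Shortcut (computational) form, as used in the derivation above.
shortcut = (sum(f * x * x for x, f in zip(xs, fs)) - Fraction(sum_fx ** 2, N)) / (N - 1)

# Definitional form: sum of f_i * (x_i - mean)^2, divided by N - 1.
definitional = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / (N - 1)

assert shortcut == definitional  # identical as exact fractions
print(shortcut, round(float(shortcut), 4))  # 14162/1485 9.5367
```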
Sample standard deviation
The standard deviation is the positive square root of the variance.
The sample standard deviation is
$$ \begin{aligned} s_x &=\sqrt{s_x^2}\\ &=\sqrt{9.5367}\\ &=3.0882 \text{ minutes} \end{aligned} $$
Thus the standard deviation of the time spent in the library is $3.0882$ minutes.
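Because this table lists exact values rather than classes, the results can be cross-checked against Python's standard `statistics` module by expanding the table back into the 55 individual observations. `statistics.stdev` computes the sample standard deviation with divisor $N-1$, matching the formula above.

```python
import statistics

times = [30, 32, 35, 38, 40]
students = [8, 12, 20, 10, 5]

# Repeat each value x_i exactly f_i times to recover the raw observations.
raw = [x for x, f in zip(times, students) for _ in range(f)]

print(len(raw))                          # 55
print(round(statistics.mean(raw), 4))    # 34.6182
print(round(statistics.stdev(raw), 4))   # 3.0882
```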
Empirical Rule
By the empirical rule, approximately $68$% of the students spent time in the library between
$$ \begin{aligned} & \overline{x}- 1 s_x \text{ and } \overline{x}+ 1 s_x \text{ minutes}\\ \Rightarrow & 34.6182 - 1\times 3.0882 \text{ and } 34.6182 + 1\times 3.0882 \text{ minutes}\\ \Rightarrow & 31.5300 \text{ and } 37.7064 \text{ minutes} \end{aligned} $$
approximately $95$% of the students spent time in the library between
$$ \begin{aligned} & \overline{x}- 2 s_x \text{ and } \overline{x}+ 2 s_x \text{ minutes}\\ \Rightarrow & 34.6182 - 2\times 3.0882 \text{ and } 34.6182 + 2\times 3.0882 \text{ minutes}\\ \Rightarrow & 28.4418 \text{ and } 40.7946 \text{ minutes} \end{aligned} $$
and approximately $99.7$% of the students spent time in the library between
$$ \begin{aligned} & \overline{x}- 3 s_x \text{ and } \overline{x}+ 3 s_x \text{ minutes}\\ \Rightarrow & 34.6182 - 3\times 3.0882 \text{ and } 34.6182 + 3\times 3.0882 \text{ minutes}\\ \Rightarrow & 25.3536 \text{ and } 43.8828 \text{ minutes} \end{aligned} $$
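Since the raw values are known here, one way to check the rule is to count how many students actually fall inside each interval and compare with the nominal percentage. A Python sketch of that count (the names `observed` and `nominal` are illustrative):

```python
import math

times = [30, 32, 35, 38, 40]
students = [8, 12, 20, 10, 5]
N = sum(students)

mean = sum(f * x for x, f in zip(times, students)) / N
# N * mean^2 equals (sum f_i x_i)^2 / N, so this is the shortcut formula.
sd = math.sqrt((sum(f * x * x for x, f in zip(times, students)) - N * mean ** 2) / (N - 1))

observed = {}
for k, nominal in ((1, 68), (2, 95), (3, 99.7)):
    lo, hi = mean - k * sd, mean + k * sd
    count = sum(f for x, f in zip(times, students) if lo <= x <= hi)
    observed[k] = count / N
    print(f"k={k}: [{lo:.4f}, {hi:.4f}] holds {count}/{N} = {100 * count / N:.1f}% (rule: {nominal}%)")
```

With these data, 32 of the 55 students (about 58.2%) lie within one standard deviation, while all 55 lie within two and within three standard deviations. The code computes the bounds from unrounded values, so the last printed digit may differ slightly from the hand calculation.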
Example 2
The following table gives the amount of time (in minutes) spent on the internet each evening by a group of 56 students.
Time spent on Internet ($x$) | 10-12 | 13-15 | 16-18 | 19-21 | 22-24 |
---|---|---|---|---|---|
No. of students ($f$) | 3 | 12 | 15 | 24 | 2 |
Check the empirical rule for the above frequency distribution.
Solution
Let $X$ denote the time spent on the internet.
Here the classes are of inclusive type. To convert them to exclusive type (class boundaries), subtract 0.5 from the lower limit and add 0.5 to the upper limit of each class.
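The conversion to class boundaries and the mid-values can be sketched in Python as follows. This assumes, as in this table, equally spaced classes with a gap of 1 between one upper limit and the next lower limit (for example 12 and 13), so half the gap is 0.5.

```python
# Inclusive classes for Example 2 as (lower limit, upper limit).
classes = [(10, 12), (13, 15), (16, 18), (19, 21), (22, 24)]

# Gap between consecutive classes (13 - 12 = 1); half of it is 0.5.
gap = classes[1][0] - classes[0][1]
half = gap / 2

boundaries = [(lo - half, hi + half) for lo, hi in classes]
midpoints = [(lo + hi) / 2 for lo, hi in classes]  # unchanged by the conversion

for (lo, hi), (blo, bhi), m in zip(classes, boundaries, midpoints):
    print(f"{lo}-{hi} -> {blo}-{bhi}, mid-value {m}")
```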
Class Interval | Class Boundaries | Mid-value ($x_i$) | Freq. ($f_i$) | $f_ix_i$ | $f_ix_i^2$ |
---|---|---|---|---|---|
10-12 | 9.5-12.5 | 11 | 3 | 33 | 363 |
13-15 | 12.5-15.5 | 14 | 12 | 168 | 2352 |
16-18 | 15.5-18.5 | 17 | 15 | 255 | 4335 |
19-21 | 18.5-21.5 | 20 | 24 | 480 | 9600 |
22-24 | 21.5-24.5 | 23 | 2 | 46 | 1058 |
Total | | | 56 | 982 | 17708 |
Sample mean
The sample mean of $X$ is
$$ \begin{aligned} \overline{x} &=\frac{1}{N}\sum_{i=1}^n f_ix_i\\ &=\frac{982}{56}\\ &=17.5357\text{ minutes} \end{aligned} $$
The average time spent on the internet is $17.5357$ minutes.
Sample variance
The sample variance of $X$ is
$$ \begin{aligned} s_x^2 &=\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\frac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)\\ &=\dfrac{1}{55}\bigg(17708-\frac{(982)^2}{56}\bigg)\\ &=\dfrac{1}{55}\big(17708-\frac{964324}{56}\big)\\ &=\dfrac{1}{55}\big(17708-17220.07143\big)\\ &= \frac{487.92857}{55}\\ &=8.8714 \end{aligned} $$
Sample standard deviation
The standard deviation is the positive square root of the variance.
The sample standard deviation is
$$ \begin{aligned} s_x &=\sqrt{s_x^2}\\ &=\sqrt{8.8714}\\ &=2.9785 \text{ minutes} \end{aligned} $$
Thus the standard deviation of the time spent on the internet is $2.9785$ minutes.
Empirical Rule
By the empirical rule, approximately $68$% of the students spent time on the internet between
$$ \begin{aligned} & \overline{x}- 1 s_x \text{ and } \overline{x}+ 1 s_x \text{ minutes}\\ \Rightarrow & 17.5357 - 1\times 2.9785 \text{ and } 17.5357 + 1\times 2.9785 \text{ minutes}\\ \Rightarrow & 14.5572 \text{ and } 20.5142 \text{ minutes} \end{aligned} $$
approximately $95$% of the students spent time on the internet between
$$ \begin{aligned} & \overline{x}- 2 s_x \text{ and } \overline{x}+ 2 s_x \text{ minutes}\\ \Rightarrow & 17.5357 - 2\times 2.9785 \text{ and } 17.5357 + 2\times 2.9785 \text{ minutes}\\ \Rightarrow & 11.5787 \text{ and } 23.4927 \text{ minutes} \end{aligned} $$
and approximately $99.7$% of the students spent time on the internet between
$$ \begin{aligned} & \overline{x}- 3 s_x \text{ and } \overline{x}+ 3 s_x \text{ minutes}\\ \Rightarrow & 17.5357 - 3\times 2.9785 \text{ and } 17.5357 + 3\times 2.9785 \text{ minutes}\\ \Rightarrow & 8.6002 \text{ and } 26.4712 \text{ minutes} \end{aligned} $$
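For grouped data the individual times are unknown, so the share of students inside each interval can only be estimated. One common approach, swapped in here as an assumption rather than taken from the text above, is to treat the students in each class as spread uniformly between the class boundaries and count the overlapping fraction of each class. A Python sketch (`share_within` is a hypothetical helper):

```python
import math

# Example 2: class boundaries and frequencies.
bounds = [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5), (21.5, 24.5)]
freqs = [3, 12, 15, 24, 2]
N = sum(freqs)

mids = [(lo + hi) / 2 for lo, hi in bounds]
mean = sum(f * m for m, f in zip(mids, freqs)) / N
var = (sum(f * m * m for m, f in zip(mids, freqs)) - N * mean ** 2) / (N - 1)
sd = math.sqrt(var)

def share_within(a, b):
    """Estimated share of students in [a, b], ASSUMING values are spread
    uniformly inside each class (the raw times are not available)."""
    total = 0.0
    for (lo, hi), f in zip(bounds, freqs):
        overlap = max(0.0, min(b, hi) - max(a, lo))  # length of [a, b] inside the class
        total += f * overlap / (hi - lo)
    return total / N

shares = {k: share_within(mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
for k, s in shares.items():
    print(f"within {k} SD: about {100 * s:.1f}%")
```

Under this uniform-within-class assumption the estimates come out at roughly 62.3%, 95.1% and 100.0%, to be compared with the nominal 68%, 95% and 99.7%.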