Five number summary for grouped data
A five number summary is a quick and easy way to describe the center and the spread of a data set and to spot possible outliers (if any).
The five number summary consists of the following five values:
- minimum value ($\min$),
- first quartile ($Q_1$),
- median ($Q_2$),
- third quartile ($Q_3$),
- maximum value ($\max$).
Formula
$\min$ = lower limit of the first class,
$\max$ = upper limit of the last class,
$Q_i=l + \bigg(\dfrac{\dfrac{iN}{4} - F_<}{f}\bigg)\times h, \quad i=1,2,3$ (with $Q_2$ being the median)
where
- $l$ is the lower limit of the $i^{th}$ quartile class, i.e. the first class whose cumulative frequency is greater than or equal to $\dfrac{iN}{4}$
- $N=\sum f$ is the total number of observations
- $f$ is the frequency of the $i^{th}$ quartile class
- $F_<$ is the cumulative frequency of the class preceding the $i^{th}$ quartile class
- $h$ is the class width
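The interpolation formula translates directly into a small function. The following is a minimal Python sketch that assumes the quartile class has already been identified; the function name `grouped_quartile` and its parameter names are our own, chosen to mirror the symbols $l$, $N$, $F_<$, $f$ and $h$ above. The sample call uses the first quartile class of Example 2 further down.

```python
def grouped_quartile(i: int, N: int, l: float, F_prev: float, f: float, h: float) -> float:
    """Interpolate the i-th quartile (i = 1, 2, 3) inside its quartile class.

    l      -- lower limit (boundary) of the quartile class
    N      -- total number of observations (sum of all frequencies)
    F_prev -- cumulative frequency of the class preceding the quartile class
    f      -- frequency of the quartile class
    h      -- class width
    """
    if i not in (1, 2, 3):
        raise ValueError("i must be 1, 2 or 3")
    position = i * N / 4              # the (iN/4)-th observation
    return l + (position - F_prev) / f * h


# First quartile class of Example 2: l = 12.5, N = 56, F< = 3, f = 12, h = 3
print(round(grouped_quartile(1, N=56, l=12.5, F_prev=3, f=12, h=3), 2))  # 15.25
```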
Example 1
A class teacher has the following data on the number of days absent for the 35 students of a class. Compute the five number summary for this frequency distribution.
No. of days ($x$) | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|
No. of Students ($f$) | 1 | 15 | 10 | 5 | 4 |
Solution
$x_i$ | $f_i$ | $cf$ |
---|---|---|
2 | 1 | 1 |
3 | 15 | 16 |
4 | 10 | 26 |
5 | 5 | 31 |
6 | 4 | 35 |
Total | 35 | |
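The $cf$ column is just a running total of the frequencies. As a small illustrative sketch (the variable names are our own), Python's `itertools.accumulate` reproduces it for Example 1:

```python
from itertools import accumulate

# Example 1: number of days absent (x) and number of students (f)
x = [2, 3, 4, 5, 6]
f = [1, 15, 10, 5, 4]

cf = list(accumulate(f))   # running total of the frequencies
N = cf[-1]                 # the last cumulative frequency equals N

print(cf)  # [1, 16, 26, 31, 35]
print(N)   # 35
```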
Minimum Value
The minimum number of days absent is $\min = 2$.
Maximum Value
The maximum number of days absent is $\max = 6$.
For a discrete frequency distribution, the $i^{th}$ quartile is the
$Q_i =\bigg(\dfrac{i(N)}{4}\bigg)^{th}$ value, $i=1,2,3$
where $N$ is the total number of observations.
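Locating the $\big(\frac{iN}{4}\big)^{th}$ value amounts to finding the first value of $x$ whose cumulative frequency is greater than or equal to $\frac{iN}{4}$. A minimal sketch under that rule, using `bisect_left` on the cumulative frequencies; the helper name `discrete_quartile` is hypothetical:

```python
from bisect import bisect_left
from itertools import accumulate

def discrete_quartile(i, x, f):
    """Return the (iN/4)-th value of a discrete frequency distribution.

    The answer is the first x whose cumulative frequency is greater than
    or equal to iN/4 (i = 1, 2, 3).
    """
    cf = list(accumulate(f))
    position = i * cf[-1] / 4
    return x[bisect_left(cf, position)]   # first index with cf >= position

x = [2, 3, 4, 5, 6]
f = [1, 15, 10, 5, 4]
print([discrete_quartile(i, x, f) for i in (1, 2, 3)])  # [3, 4, 5]
```

`bisect_left` returns the leftmost insertion point, so a cumulative frequency exactly equal to $\frac{iN}{4}$ is selected, matching the "greater than or equal to" rule.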
First Quartile $Q_1$
$$ \begin{aligned} Q_{1} &=\bigg(\dfrac{1(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{1(35)}{4}\bigg)^{th}\text{ value}\\ &=\big(8.75\big)^{th}\text{ value} \end{aligned} $$
The cumulative frequency just greater than or equal to $8.75$ is $16$. The corresponding value of $x$ is the $1^{st}$ quartile. That is, $Q_1 = 3$ days.
Thus, at least $25\%$ of the students had absences of $3$ days or fewer.
Median $M$
$$ \begin{aligned} M &=\bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{35}{2}\bigg)^{th}\text{ value}\\ &=\big(17.5\big)^{th}\text{ value} \end{aligned} $$
The cumulative frequency just greater than or equal to $17.5$ is $26$. The corresponding value of $x$ is the median. That is, $M = 4$ days.
Thus, at least $50\%$ of the students had absences of $4$ days or fewer.
Third Quartile $Q_3$
$$ \begin{aligned} Q_{3} &=\bigg(\dfrac{3(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{3(35)}{4}\bigg)^{th}\text{ value}\\ &=\big(26.25\big)^{th}\text{ value} \end{aligned} $$
The cumulative frequency just greater than or equal to $26.25$ is $31$. The corresponding value of $x$ is the $3^{rd}$ quartile. That is, $Q_3 = 5$ days.
Thus, at least $75\%$ of the students had absences of $5$ days or fewer.
Thus, the five number summary of the given data set is
$\min = 2$ days, $Q_1 = 3$ days, $\text{median }=4$ days, $Q_3=5$ days and $\max = 6$ days.
Example 2
The following table gives the amount of time (in minutes) spent on the internet each evening by a group of 56 students. Compute the five number summary for this frequency distribution.
Time spent on Internet ($x$) | 10-12 | 13-15 | 16-18 | 19-21 | 22-24 |
---|---|---|---|---|---|
No. of students ($f$) | 3 | 12 | 15 | 24 | 2 |
Solution
The classes are of inclusive type. To convert them to exclusive type, subtract 0.5 from the lower limit and add 0.5 to the upper limit of each class.
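The conversion to class boundaries is mechanical. A minimal sketch, assuming integer class limits so that consecutive classes are separated by a gap of 1 (hence the 0.5 adjustment); the function name `to_class_boundaries` and the `gap` parameter are hypothetical:

```python
def to_class_boundaries(limits, gap=1.0):
    """Convert inclusive class limits into exclusive class boundaries.

    Assumes consecutive classes are separated by `gap` (1 for integer limits),
    so half the gap is subtracted from each lower limit and added to each
    upper limit.
    """
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in limits]

limits = [(10, 12), (13, 15), (16, 18), (19, 21), (22, 24)]
print(to_class_boundaries(limits))
# [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5), (21.5, 24.5)]
```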
Class Interval | Class Boundaries | $f_i$ | $cf$ |
---|---|---|---|
10-12 | 9.5-12.5 | 3 | 3 |
13-15 | 12.5-15.5 | 12 | 15 |
16-18 | 15.5-18.5 | 15 | 30 |
19-21 | 18.5-21.5 | 24 | 54 |
22-24 | 21.5-24.5 | 2 | 56 |
Total | | 56 | |
Minimum Value
The minimum time spent on the internet is $\min = 9.5$ minutes.
Maximum Value
The maximum time spent on the internet is $\max = 24.5$ minutes.
Quartiles
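To locate the $i^{th}$ quartile class, first find the position of the $i^{th}$ quartile, the
$\bigg(\dfrac{i(N)}{4}\bigg)^{th}$ value, $i=1,2,3$
where $N$ is the total number of observations. The class containing this value is the $i^{th}$ quartile class, and $Q_i$ is then computed with the interpolation formula given at the beginning.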
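Finding the quartile class and the quantities $l$, $F_<$ and $f$ needed by the formula can be automated as well. A minimal sketch, assuming the class boundaries are already exclusive and listed in increasing order; the helper name `quartile_class` is hypothetical:

```python
from bisect import bisect_left
from itertools import accumulate

def quartile_class(i, boundaries, freqs):
    """Locate the i-th quartile class of a grouped frequency distribution.

    Returns (index, l, F_prev, f): the class position, its lower boundary,
    the cumulative frequency of the preceding class, and its own frequency.
    """
    cf = list(accumulate(freqs))
    position = i * cf[-1] / 4
    k = bisect_left(cf, position)           # first class with cf >= iN/4
    F_prev = cf[k - 1] if k > 0 else 0      # no preceding class -> 0
    return k, boundaries[k][0], F_prev, freqs[k]

boundaries = [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5), (21.5, 24.5)]
freqs = [3, 12, 15, 24, 2]
for i in (1, 2, 3):
    print(i, quartile_class(i, boundaries, freqs))
# 1 (1, 12.5, 3, 12)
# 2 (2, 15.5, 15, 15)
# 3 (3, 18.5, 30, 24)
```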
First Quartile $Q_1$
$$ \begin{aligned} Q_{1} &=\bigg(\dfrac{1(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{1(56)}{4}\bigg)^{th}\text{ value}\\ &=\big(14\big)^{th}\text{ value} \end{aligned} $$
The cumulative frequency just greater than or equal to $14$ is $15$. The corresponding class $12.5-15.5$ is the $1^{st}$ quartile class.
Thus
- $l = 12.5$, the lower limit of the $1^{st}$ quartile class
- $N=56$, total number of observations
- $f =12$, frequency of the $1^{st}$ quartile class
- $F_< = 3$, cumulative frequency of the class previous to $1^{st}$ quartile class
- $h =3$, the class width
The first quartile $Q_1$ can be computed as follows:
$$ \begin{aligned} Q_1 &= l + \bigg(\frac{\frac{1(N)}{4} - F_<}{f}\bigg)\times h\\ &= 12.5 + \bigg(\frac{\frac{1(56)}{4} - 3}{12}\bigg)\times 3\\ &= 12.5 + \bigg(\frac{14 - 3}{12}\bigg)\times 3\\ &= 12.5 + \big(0.9167\big)\times 3\\ &= 12.5 + 2.75\\ &= 15.25 \text{ minutes} \end{aligned} $$
Thus, $25$ % of the students spent less than or equal to $15.25$ minutes on the internet.
Median
$$ \begin{aligned} M &=\bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{56}{2}\bigg)^{th}\text{ value}\\ &=\big(28\big)^{th}\text{ value} \end{aligned} $$
The cumulative frequency just greater than or equal to $28$ is $30$. The corresponding class $15.5-18.5$ is the median class.
Thus
- $l = 15.5$, the lower limit of the median class
- $N=56$, total number of observations
- $f =15$, frequency of the median class
- $F_< = 15$, cumulative frequency of the class previous to median class
- $h =3$, the class width
The median $M$ can be computed as follows:
$$ \begin{aligned} M &= l + \bigg(\frac{\frac{N}{2} - F_<}{f}\bigg)\times h\\ &= 15.5 + \bigg(\frac{\frac{56}{2} - 15}{15}\bigg)\times 3\\ &= 15.5 + \bigg(\frac{28 - 15}{15}\bigg)\times 3\\ &= 15.5 + \big(0.8667\big)\times 3\\ &= 15.5 + 2.6\\ &= 18.1 \text{ minutes} \end{aligned} $$
Thus, $50$ % of the students spent less than or equal to $18.1$ minutes on the internet.
Third Quartile $Q_3$
$$ \begin{aligned} Q_{3} &=\bigg(\dfrac{3(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{3(56)}{4}\bigg)^{th}\text{ value}\\ &=\big(42\big)^{th}\text{ value} \end{aligned} $$
The cumulative frequency just greater than or equal to $42$ is $54$. The corresponding class $18.5-21.5$ is the $3^{rd}$ quartile class.
Thus
- $l = 18.5$, the lower limit of the $3^{rd}$ quartile class
- $N=56$, total number of observations
- $f =24$, frequency of the $3^{rd}$ quartile class
- $F_< = 30$, cumulative frequency of the class previous to $3^{rd}$ quartile class
- $h =3$, the class width
The third quartile $Q_3$ can be computed as follows:
$$ \begin{aligned} Q_3 &= l + \bigg(\frac{\frac{3(N)}{4} - F_<}{f}\bigg)\times h\\ &= 18.5 + \bigg(\frac{\frac{3(56)}{4} - 30}{24}\bigg)\times 3\\ &= 18.5 + \bigg(\frac{42 - 30}{24}\bigg)\times 3\\ &= 18.5 + \big(0.5\big)\times 3\\ &= 18.5 + 1.5\\ &= 20 \text{ minutes} \end{aligned} $$
Thus, $75$ % of the students spent less than or equal to $20$ minutes on the internet.
Thus, the five number summary of the time spent on the internet is
$\min = 9.5$ minutes, $Q_1 = 15.25$ minutes, $\text{median }=18.1$ minutes, $Q_3=20$ minutes and $\max = 24.5$ minutes.
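Putting the steps together, the whole summary for Example 2 can be checked in a few lines. This is a sketch under stated assumptions (exclusive class boundaries in increasing order and a common class width $h$); the function name `five_number_summary_grouped` is hypothetical, and it follows the same "first class with $cf \ge \frac{iN}{4}$" rule and interpolation formula used above.

```python
from bisect import bisect_left
from itertools import accumulate

def five_number_summary_grouped(boundaries, freqs):
    """Five number summary of a continuous (exclusive-type) frequency distribution.

    boundaries -- list of (lower, upper) class boundaries, equal widths assumed
    freqs      -- class frequencies
    """
    cf = list(accumulate(freqs))
    N = cf[-1]
    h = boundaries[0][1] - boundaries[0][0]     # common class width

    def quartile(i):
        position = i * N / 4
        k = bisect_left(cf, position)           # i-th quartile class
        F_prev = cf[k - 1] if k > 0 else 0
        l = boundaries[k][0]
        return l + (position - F_prev) / freqs[k] * h

    # (min, Q1, median, Q3, max)
    return (boundaries[0][0], quartile(1), quartile(2), quartile(3), boundaries[-1][1])

boundaries = [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5), (21.5, 24.5)]
freqs = [3, 12, 15, 24, 2]
print([round(v, 2) for v in five_number_summary_grouped(boundaries, freqs)])
# [9.5, 15.25, 18.1, 20.0, 24.5]
```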