## Five number summary for grouped data

A five number summary is a quick way to describe the center and spread of a data set and to flag possible outliers.

The five number summary consists of five values:

- minimum value ($\min$),
- first quartile ($Q_1$),
- median ($Q_2$),
- third quartile ($Q_3$),
- maximum value ($\max$).

## Formula

$\min$ = lower limit of the first class,

$\max$ = upper limit of the last class,

`$Q_i=l + \bigg(\dfrac{\dfrac{iN}{4} - F_<}{f}\bigg)\times h$`, `$i=1,2,3$`

where

- $l$ is the lower limit of the $i^{th}$ quartile class,
- $N=\sum f$ is the total number of observations,
- $f$ is the frequency of the $i^{th}$ quartile class,
- $F_<$ is the cumulative frequency of the class preceding the $i^{th}$ quartile class,
- $h$ is the class width.
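The quartile formula can be sketched in Python as follows. This is a minimal illustration rather than a library routine: the function name `grouped_quartile` and its parameter names are hypothetical and simply mirror the symbols $l$, $N$, $F_<$, $f$ and $h$, and the numbers in the usage line are made up for the sketch.

```python
def grouped_quartile(i, l, n, cf_prev, f, h):
    """Return Q_i = l + ((i*n/4 - cf_prev) / f) * h.

    i       -- quartile index (1, 2 or 3)
    l       -- lower limit of the i-th quartile class
    n       -- total number of observations (N)
    cf_prev -- cumulative frequency of the class before the quartile class (F_<)
    f       -- frequency of the quartile class
    h       -- class width
    """
    if i not in (1, 2, 3):
        raise ValueError("i must be 1, 2 or 3")
    return l + ((i * n / 4 - cf_prev) / f) * h


# Illustrative values only (not taken from the examples): a quartile class
# starting at 20 with width 10 and frequency 10, preceded by 5 observations,
# out of 40 observations in total.
q1 = grouped_quartile(1, l=20, n=40, cf_prev=5, f=10, h=10)
print(q1)  # 25.0
```

Here $N/4 = 10$, so the quartile lies $10 - 5 = 5$ observations into a class of 10, that is, halfway through the class.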

## Example 1

A class teacher has the following data about the number of absences of 35 students of a class. Compute the five number summary for the following frequency distribution.

No. of days ($x$) | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|
No. of Students ($f$) | 1 | 15 | 10 | 5 | 4 |

### Solution

$x_i$ | $f_i$ | $cf$ |
---|---|---|
2 | 1 | 1 |
3 | 15 | 16 |
4 | 10 | 26 |
5 | 5 | 31 |
6 | 4 | 35 |
Total | 35 | |

**Minimum Value**

The minimum number of absent days is `$\min = 2$`.

**Maximum Value**

The maximum number of absent days is `$\max = 6$`.

Because the values of $x$ are discrete, the $i^{th}$ quartile is located by its position in the ordered data:

$Q_i =\bigg(\dfrac{i(N)}{4}\bigg)^{th}$ value, $i=1,2,3$

where $N$ is the total number of observations.

**First Quartile $Q_1$**

```
$$
\begin{aligned}
Q_{1} &=\bigg(\dfrac{1(N)}{4}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{1(35)}{4}\bigg)^{th}\text{ value}\\
&=\big(8.75\big)^{th}\text{ value}
\end{aligned}
$$
```

The cumulative frequency just greater than or equal to $8.75$ is $16$. The corresponding value of $X$ is the $1^{st}$ quartile. That is, $Q_1 =3$ days.

Thus, at least $25\%$ of the students had $3$ or fewer days of absence.

**Median $M$**

```
$$
\begin{aligned}
M &=\bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{35}{2}\bigg)^{th}\text{ value}\\
&=\big(17.5\big)^{th}\text{ value}
\end{aligned}
$$
```

The cumulative frequency just greater than or equal to $17.5$ is $26$. The corresponding value of $X$ is the median. That is, $M =4$ days.

Thus, at least $50\%$ of the students had $4$ or fewer days of absence.

**Third Quartile $Q_3$**

```
$$
\begin{aligned}
Q_{3} &=\bigg(\dfrac{3(N)}{4}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{3(35)}{4}\bigg)^{th}\text{ value}\\
&=\big(26.25\big)^{th}\text{ value}
\end{aligned}
$$
```

The cumulative frequency just greater than or equal to $26.25$ is $31$. The corresponding value of $X$ is the $3^{rd}$ quartile. That is, $Q_3 =5$ days.

Thus, at least $75\%$ of the students had $5$ or fewer days of absence.

Thus, the five number summary of the given data set is

$\min = 2$ days, $Q_1 = 3$ days, $\text{median }=4$ days, $Q_3=5$ days and $\max = 6$ days.
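The lookup used in this example (find the first value whose cumulative frequency reaches $iN/4$) can be checked with a short Python sketch. The variable and function names are illustrative choices, not part of the original solution.

```python
from itertools import accumulate

days = [2, 3, 4, 5, 6]          # number of absent days (x)
students = [1, 15, 10, 5, 4]    # number of students (f)

cum_freq = list(accumulate(students))   # [1, 16, 26, 31, 35]
n = cum_freq[-1]                        # total number of observations, 35


def quartile(i):
    """Value of x whose cumulative frequency first reaches i*n/4."""
    position = i * n / 4
    for x, cf in zip(days, cum_freq):
        if cf >= position:
            return x
    raise ValueError("position exceeds total frequency")


summary = (days[0], quartile(1), quartile(2), quartile(3), days[-1])
print(summary)  # (2, 3, 4, 5, 6)
```

The positions are $8.75$, $17.5$ and $26.25$, which fall under the cumulative frequencies $16$, $26$ and $31$ respectively, matching the values found by hand.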

## Example 2

The following table gives the amount of time (in minutes) spent on the internet each evening by a group of 56 students. Compute the five number summary for the following frequency distribution.

Time spent on internet in minutes ($x$) | 10-12 | 13-15 | 16-18 | 19-21 | 22-24 |
---|---|---|---|---|---|
No. of students ($f$) | 3 | 12 | 15 | 24 | 2 |

### Solution

The classes are of inclusive type. To convert them to exclusive type (continuous class boundaries), subtract 0.5 from the lower limit and add 0.5 to the upper limit of each class.
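As a small sketch of this conversion, the adjustment of 0.5 is half the gap between the upper limit of one class and the lower limit of the next. The variable names below are illustrative.

```python
# Inclusive class limits from the frequency distribution.
limits = [(10, 12), (13, 15), (16, 18), (19, 21), (22, 24)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = limits[1][0] - limits[0][1]      # 13 - 12 = 1
half_gap = gap / 2                     # 0.5

# Widen every class by half the gap on each side.
boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in limits]
print(boundaries)
# [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5), (21.5, 24.5)]

# Class width h, measured between boundaries.
width = boundaries[0][1] - boundaries[0][0]
print(width)  # 3.0
```

After the conversion each upper boundary coincides with the next lower boundary, so the classes cover the range without gaps.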

Class Interval | Class Boundaries | $f_i$ | $cf$ |
---|---|---|---|
10-12 | 9.5-12.5 | 3 | 3 |
13-15 | 12.5-15.5 | 12 | 15 |
16-18 | 15.5-18.5 | 15 | 30 |
19-21 | 18.5-21.5 | 24 | 54 |
22-24 | 21.5-24.5 | 2 | 56 |
Total | | 56 | |

**Minimum Value**

The minimum time spent on the internet is `$\min = 9.5 \text{ minutes}$`, the lower boundary of the first class.

**Maximum Value**

The maximum time spent on the internet is `$\max = 24.5 \text{ minutes}$`, the upper boundary of the last class.

**Quartiles**

The position of the $i^{th}$ quartile is given by

$Q_i =\bigg(\dfrac{i(N)}{4}\bigg)^{th}$ value, $i=1,2,3$

where $N$ is the total number of observations.

**First Quartile $Q_1$**

```
$$
\begin{aligned}
Q_{1} &=\bigg(\dfrac{1(N)}{4}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{1(56)}{4}\bigg)^{th}\text{ value}\\
&=\big(14\big)^{th}\text{ value}
\end{aligned}
$$
```

The cumulative frequency just greater than or equal to $14$ is $15$. The corresponding class $12.5-15.5$ is the $1^{st}$ quartile class.

Thus

- $l = 12.5$, the lower limit of the $1^{st}$ quartile class
- $N=56$, total number of observations
- $f =12$, frequency of the $1^{st}$ quartile class
- $F_< = 3$, cumulative frequency of the class previous to $1^{st}$ quartile class
- $h =3$, the class width
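Locating the quartile class and reading off $l$, $f$ and $F_<$ can be sketched in Python as below. The helper name `quartile_class` is hypothetical; the data are the class boundaries and frequencies from the table above.

```python
from itertools import accumulate

lower_boundaries = [9.5, 12.5, 15.5, 18.5, 21.5]
freqs = [3, 12, 15, 24, 2]

cum_freq = list(accumulate(freqs))   # [3, 15, 30, 54, 56]
n = cum_freq[-1]                     # N = 56


def quartile_class(i):
    """Return (l, f, F_<) for the i-th quartile class."""
    position = i * n / 4
    # First class whose cumulative frequency is >= the position.
    k = next(idx for idx, cf in enumerate(cum_freq) if cf >= position)
    # F_< is 0 when the quartile class is the very first class.
    cf_prev = cum_freq[k - 1] if k > 0 else 0
    return lower_boundaries[k], freqs[k], cf_prev


print(quartile_class(1))  # (12.5, 12, 3)
```

The same call with `i = 2` or `i = 3` returns the corresponding quantities for the median class and the third quartile class.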

The first quartile $Q_1$ can be computed as follows:

```
$$
\begin{aligned}
Q_1 &= l + \bigg(\frac{\frac{1(N)}{4} - F_<}{f}\bigg)\times h\\
&= 12.5 + \bigg(\frac{\frac{1\times 56}{4} - 3}{12}\bigg)\times 3\\
&= 12.5 + \bigg(\frac{14 - 3}{12}\bigg)\times 3\\
&= 12.5 + \big(0.9167\big)\times 3\\
&= 12.5 + 2.75\\
&= 15.25 \text{ minutes}
\end{aligned}
$$
```

Thus, about $25\%$ of the students spent $15.25$ minutes or less on the internet.

**Median**

```
$$
\begin{aligned}
M &=\bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{56}{2}\bigg)^{th}\text{ value}\\
&=\big(28\big)^{th}\text{ value}
\end{aligned}
$$
```

The cumulative frequency just greater than or equal to $28$ is $30$. The corresponding class $15.5-18.5$ is the median class.

Thus

- $l = 15.5$, the lower limit of the median class
- $N=56$, total number of observations
- $f =15$, frequency of the median class
- $F_< = 15$, cumulative frequency of the class previous to median class
- $h =3$, the class width

The median $M$ can be computed as follows:

```
$$
\begin{aligned}
M &= l + \bigg(\frac{\frac{N}{2} - F_<}{f}\bigg)\times h\\
&= 15.5 + \bigg(\frac{\frac{56}{2} - 15}{15}\bigg)\times 3\\
&= 15.5 + \bigg(\frac{28 - 15}{15}\bigg)\times 3\\
&= 15.5 + \big(0.8667\big)\times 3\\
&= 15.5 + 2.6\\
&= 18.1 \text{ minutes}
\end{aligned}
$$
```

Thus, about $50\%$ of the students spent $18.1$ minutes or less on the internet.

**Third Quartile $Q_3$**

```
$$
\begin{aligned}
Q_{3} &=\bigg(\dfrac{3(N)}{4}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{3(56)}{4}\bigg)^{th}\text{ value}\\
&=\big(42\big)^{th}\text{ value}
\end{aligned}
$$
```

The cumulative frequency just greater than or equal to $42$ is $54$. The corresponding class $18.5-21.5$ is the $3^{rd}$ quartile class.

Thus

- $l = 18.5$, the lower limit of the $3^{rd}$ quartile class
- $N=56$, total number of observations
- $f =24$, frequency of the $3^{rd}$ quartile class
- $F_< = 30$, cumulative frequency of the class previous to $3^{rd}$ quartile class
- $h =3$, the class width

The third quartile $Q_3$ can be computed as follows:

```
$$
\begin{aligned}
Q_3 &= l + \bigg(\frac{\frac{3(N)}{4} - F_<}{f}\bigg)\times h\\
&= 18.5 + \bigg(\frac{\frac{3\times 56}{4} - 30}{24}\bigg)\times 3\\
&= 18.5 + \bigg(\frac{42 - 30}{24}\bigg)\times 3\\
&= 18.5 + \big(0.5\big)\times 3\\
&= 18.5 + 1.5\\
&= 20 \text{ minutes}
\end{aligned}
$$
```

Thus, about $75\%$ of the students spent $20$ minutes or less on the internet.

Thus, the five number summary of the time spent on the internet is

$\min = 9.5$ minutes, $Q_1 = 15.25$ minutes, $\text{median }=18.1$ minutes, $Q_3=20$ minutes and $\max = 24.5$ minutes.
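As a cross-check of the whole calculation, the steps can be combined into one Python sketch. The function `five_number_summary` is a hypothetical helper written for this example, not a standard library routine; it takes the class width of each class from its own boundaries.

```python
from bisect import bisect_left
from itertools import accumulate


def five_number_summary(boundaries, freqs):
    """Five number summary of a grouped frequency distribution.

    boundaries -- list of (lower, upper) class boundaries, in ascending order
    freqs      -- class frequencies, in the same order
    """
    cum_freq = list(accumulate(freqs))
    n = cum_freq[-1]

    def quartile(i):
        position = i * n / 4
        k = bisect_left(cum_freq, position)          # index of the quartile class
        lower, upper = boundaries[k]
        cf_prev = cum_freq[k - 1] if k > 0 else 0    # F_<
        return lower + (position - cf_prev) / freqs[k] * (upper - lower)

    return (boundaries[0][0], quartile(1), quartile(2), quartile(3),
            boundaries[-1][1])


boundaries = [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5), (21.5, 24.5)]
freqs = [3, 12, 15, 24, 2]
print([round(v, 2) for v in five_number_summary(boundaries, freqs)])
# [9.5, 15.25, 18.1, 20.0, 24.5]
```

`bisect_left` works here because cumulative frequencies are non-decreasing, so the first index with a cumulative frequency greater than or equal to the position is exactly the quartile class.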