Five number summary for grouped data

A five number summary is a quick and easy way to describe the center, the spread, and the outliers (if any) of a data set.

The five number summary consists of five values, namely,

  • minimum value ($\min$),
  • first quartile ($Q_1$),
  • median ($Q_2$),
  • third quartile ($Q_3$),
  • maximum value ($\max$).

Formula

$\min$ = lower limit of the first class,

$\max$ = upper limit of the last class,

$Q_i=l + \bigg(\dfrac{\dfrac{iN}{4} - F_<}{f}\bigg)\times h$ ; $i=1,2,3$

where

  • $l$ is the lower limit of the $i^{th}$ quartile class
  • $N=\sum f$ is the total number of observations
  • $f$ is the frequency of the $i^{th}$ quartile class
  • $F_<$ is the cumulative frequency of the class preceding the $i^{th}$ quartile class
  • $h$ is the class width
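
The quartile formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `grouped_quartile` and the small data set are hypothetical, and the classes are assumed to be given as a list of class boundaries with one more entry than the list of frequencies.

```python
def grouped_quartile(boundaries, freqs, i):
    """Return Q_i (i = 1, 2, 3) for grouped data.

    boundaries: class boundaries in ascending order (len(freqs) + 1 entries)
    freqs:      frequency of each class
    """
    n = sum(freqs)              # N, total number of observations
    target = i * n / 4          # the position iN/4
    cum_before = 0              # F_<, cumulative frequency before the class
    for k, f in enumerate(freqs):
        # The quartile class is the first class whose cumulative
        # frequency is greater than or equal to iN/4.
        if cum_before + f >= target:
            l = boundaries[k]                       # lower limit of the class
            h = boundaries[k + 1] - boundaries[k]   # class width
            return l + (target - cum_before) / f * h
        cum_before += f
    raise ValueError("no quartile class found; check i and freqs")


# Hypothetical data (not from the examples): classes 0-10, 10-20, 20-30
boundaries = [0, 10, 20, 30]
freqs = [2, 5, 3]
quartiles = [grouped_quartile(boundaries, freqs, i) for i in (1, 2, 3)]
print(quartiles)
```

For this made-up data, $N=10$, so $Q_2$ falls in the class 10-20 with $F_< = 2$ and $f = 5$, giving $10 + \frac{5-2}{5}\times 10 = 16$.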

Example 1

A class teacher has the following data about the number of absences of 35 students of a class. Compute the five number summary for the following frequency distribution.

| No. of days ($x$) | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| No. of students ($f$) | 1 | 15 | 10 | 5 | 4 |

Solution

| $x_i$ | $f_i$ | $cf$ |
|---|---|---|
| 2 | 1 | 1 |
| 3 | 15 | 16 |
| 4 | 10 | 26 |
| 5 | 5 | 31 |
| 6 | 4 | 35 |
| Total | 35 | |

Minimum Value

The minimum number of absent days $\min = 2$.

Maximum Value

The maximum number of absent days $\max = 6$.

The formula for the $i^{th}$ quartile is

$Q_i =\bigg(\dfrac{i(N)}{4}\bigg)^{th}$ value, $i=1,2,3$

where $N$ is the total number of observations.

First Quartile $Q_1$

$$ \begin{aligned} Q_{1} &=\bigg(\dfrac{1(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{1(35)}{4}\bigg)^{th}\text{ value}\\ &=\big(8.75\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $8.75$ is $16$. The corresponding value of $x$ is the $1^{st}$ quartile. That is, $Q_1 =3$ days.

Thus, $25$ % of the students had absences less than or equal to $3$ days.

Median $M$

$$ \begin{aligned} M &=\bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{35}{2}\bigg)^{th}\text{ value}\\ &=\big(17.5\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $17.5$ is $26$. The corresponding value of $x$ is the median. That is, $M =4$ days.

Thus, $50$ % of the students had absences less than or equal to $4$ days.

Third Quartile $Q_3$

$$ \begin{aligned} Q_{3} &=\bigg(\dfrac{3(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{3(35)}{4}\bigg)^{th}\text{ value}\\ &=\big(26.25\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $26.25$ is $31$. The corresponding value of $x$ is the $3^{rd}$ quartile. That is, $Q_3 =5$ days.

Thus, $75$ % of the students had absences less than or equal to $5$ days.

Thus the five number summary of given data set is

$\min = 2$ days, $Q_1 = 3$ days, $\text{median }=4$ days, $Q_3=5$ days and $\max = 6$ days.
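
The steps of Example 1 can be checked with a short Python sketch. The function name `discrete_five_number_summary` is hypothetical, and the sketch assumes the values are sorted in ascending order and every listed value has a frequency of at least one. Each quartile is taken as the first value whose cumulative frequency is greater than or equal to $iN/4$, as in the solution above.

```python
def discrete_five_number_summary(values, freqs):
    """Five number summary of an ungrouped frequency distribution.

    values: distinct observed values, sorted ascending
    freqs:  frequency of each value (same order as values)
    """
    n = sum(freqs)  # N, total number of observations

    def value_at(position):
        # Return the first value whose cumulative frequency
        # is greater than or equal to the requested position.
        cum = 0
        for x, f in zip(values, freqs):
            cum += f
            if cum >= position:
                return x
        return values[-1]

    q1, median, q3 = (value_at(i * n / 4) for i in (1, 2, 3))
    return values[0], q1, median, q3, values[-1]


days = [2, 3, 4, 5, 6]          # number of days absent (x)
students = [1, 15, 10, 5, 4]    # number of students (f)
print(discrete_five_number_summary(days, students))  # (2, 3, 4, 5, 6)
```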

Example 2

The following table gives the amount of time (in minutes) spent on the internet each evening by a group of 56 students. Compute the five number summary for the following frequency distribution.

| Time spent on Internet ($x$) | 10-12 | 13-15 | 16-18 | 19-21 | 22-24 |
|---|---|---|---|---|---|
| No. of students ($f$) | 3 | 12 | 15 | 24 | 2 |

Solution

The classes are of the inclusive type. To convert them to the exclusive type, subtract 0.5 from the lower limit and add 0.5 to the upper limit of each class.

| Class Interval | Class Boundaries | $f_i$ | $cf$ |
|---|---|---|---|
| 10-12 | 9.5-12.5 | 3 | 3 |
| 13-15 | 12.5-15.5 | 12 | 15 |
| 16-18 | 15.5-18.5 | 15 | 30 |
| 19-21 | 18.5-21.5 | 24 | 54 |
| 22-24 | 21.5-24.5 | 2 | 56 |
| Total | | 56 | |
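
The class boundaries and cumulative frequencies in this table can be reproduced with a small Python sketch. The helper name `class_boundaries` and its `gap` parameter are assumptions made for illustration: `gap` is the distance between the upper limit of one class and the lower limit of the next (here $13 - 12 = 1$), and half of it is moved to each side.

```python
from itertools import accumulate


def class_boundaries(limits, gap=1):
    """Convert inclusive (lower, upper) class limits to class boundaries."""
    half = gap / 2  # 0.5 when consecutive classes are 1 unit apart
    return [(lower - half, upper + half) for lower, upper in limits]


limits = [(10, 12), (13, 15), (16, 18), (19, 21), (22, 24)]
freqs = [3, 12, 15, 24, 2]

print(class_boundaries(limits))
# [(9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5), (21.5, 24.5)]

print(list(accumulate(freqs)))  # cumulative frequencies
# [3, 15, 30, 54, 56]
```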

Minimum Value

The minimum time spent on the internet is $\min = 9.5 \text{ minutes}$.

Maximum Value

The maximum time spent on the internet is $\max = 24.5 \text{ minutes}$.

Quartiles

The formula for the $i^{th}$ quartile is

$Q_i =\bigg(\dfrac{i(N)}{4}\bigg)^{th}$ value, $i=1,2,3$

where $N$ is the total number of observations.

First Quartile $Q_1$

$$ \begin{aligned} Q_{1} &=\bigg(\dfrac{1(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{1(56)}{4}\bigg)^{th}\text{ value}\\ &=\big(14\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $14$ is $15$. The corresponding class $12.5-15.5$ is the $1^{st}$ quartile class.

Thus

  • $l = 12.5$, the lower limit of the $1^{st}$ quartile class
  • $N=56$, total number of observations
  • $f =12$, frequency of the $1^{st}$ quartile class
  • $F_< = 3$, cumulative frequency of the class previous to $1^{st}$ quartile class
  • $h =3$, the class width

The first quartile $Q_1$ can be computed as follows:

$$ \begin{aligned} Q_1 &= l + \bigg(\frac{\frac{1(N)}{4} - F_<}{f}\bigg)\times h\\ &= 12.5 + \bigg(\frac{\frac{1\times 56}{4} - 3}{12}\bigg)\times 3\\ &= 12.5 + \bigg(\frac{14 - 3}{12}\bigg)\times 3\\ &= 12.5 + \big(0.9167\big)\times 3\\ &= 12.5 + 2.75\\ &= 15.25 \text{ minutes} \end{aligned} $$

Thus, $25$ % of the students spent less than or equal to $15.25$ minutes on the internet.

Median

$$ \begin{aligned} M &=\bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{56}{2}\bigg)^{th}\text{ value}\\ &=\big(28\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $28$ is $30$. The corresponding class $15.5-18.5$ is the median class.

Thus

  • $l = 15.5$, the lower limit of the median class
  • $N=56$, total number of observations
  • $f =15$, frequency of the median class
  • $F_< = 15$, cumulative frequency of the class previous to median class
  • $h =3$, the class width

The median $M$ can be computed as follows:

$$ \begin{aligned} M &= l + \bigg(\frac{\frac{N}{2} - F_<}{f}\bigg)\times h\\ &= 15.5 + \bigg(\frac{\frac{56}{2} - 15}{15}\bigg)\times 3\\ &= 15.5 + \bigg(\frac{28 - 15}{15}\bigg)\times 3\\ &= 15.5 + \big(0.8667\big)\times 3\\ &= 15.5 + 2.6\\ &= 18.1 \text{ minutes} \end{aligned} $$

Thus, $50$ % of the students spent less than or equal to $18.1$ minutes on the internet.

Third Quartile $Q_3$

$$ \begin{aligned} Q_{3} &=\bigg(\dfrac{3(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{3(56)}{4}\bigg)^{th}\text{ value}\\ &=\big(42\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $42$ is $54$. The corresponding class $18.5-21.5$ is the $3^{rd}$ quartile class.

Thus

  • $l = 18.5$, the lower limit of the $3^{rd}$ quartile class
  • $N=56$, total number of observations
  • $f =24$, frequency of the $3^{rd}$ quartile class
  • $F_< = 30$, cumulative frequency of the class previous to $3^{rd}$ quartile class
  • $h =3$, the class width

The third quartile $Q_3$ can be computed as follows:

$$ \begin{aligned} Q_3 &= l + \bigg(\frac{\frac{3(N)}{4} - F_<}{f}\bigg)\times h\\ &= 18.5 + \bigg(\frac{\frac{3\times 56}{4} - 30}{24}\bigg)\times 3\\ &= 18.5 + \bigg(\frac{42 - 30}{24}\bigg)\times 3\\ &= 18.5 + \big(0.5\big)\times 3\\ &= 18.5 + 1.5\\ &= 20 \text{ minutes} \end{aligned} $$

Thus, $75$ % of the students spent less than or equal to $20$ minutes on the internet.

Thus the five number summary of time spent on the internet is

$\min = 9.5$ minutes, $Q_1 = 15.25$ minutes, $\text{median }=18.1$ minutes, $Q_3=20$ minutes and $\max = 24.5$ minutes.
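
As a cross-check of Example 2, the whole summary can be computed in one pass with the Python standard library. This is a sketch under stated assumptions: the function name `grouped_five_number_summary` is hypothetical, the classes are passed as a flat list of boundaries, and `bisect_left` on the cumulative frequencies is used to find the first class whose cumulative frequency is greater than or equal to $iN/4$.

```python
from bisect import bisect_left
from itertools import accumulate


def grouped_five_number_summary(boundaries, freqs):
    """Return (min, Q1, median, Q3, max) for grouped data."""
    n = sum(freqs)                   # N, total number of observations
    cf = list(accumulate(freqs))     # cumulative frequencies

    def quartile(i):
        target = i * n / 4
        k = bisect_left(cf, target)          # index of the quartile class
        below = cf[k - 1] if k > 0 else 0    # F_<
        lower, upper = boundaries[k], boundaries[k + 1]
        return lower + (target - below) / freqs[k] * (upper - lower)

    # min and max are the outermost class boundaries
    return (boundaries[0], quartile(1), quartile(2), quartile(3), boundaries[-1])


boundaries = [9.5, 12.5, 15.5, 18.5, 21.5, 24.5]
freqs = [3, 12, 15, 24, 2]
summary = grouped_five_number_summary(boundaries, freqs)
print([round(v, 2) for v in summary])  # [9.5, 15.25, 18.1, 20.0, 24.5]
```

The values are rounded only for display, since floating-point arithmetic may leave tiny errors in the last digits.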
