## Summary statistics for grouped data

Summary statistics summarize a sample of data with a few key values: the minimum value, the first quartile ($Q_1$), the median ($Q_2$), the mean ($\overline{x}$), the third quartile ($Q_3$) and the maximum value. For grouped data, these values are computed from a frequency distribution instead of from the individual observations.

Summary statistics include

- minimum value ($\min$),
- first quartile ($Q_1$),
- $\text{median }$ ($Q_2$),
- sample mean ($\overline{x}$),
- third quartile ($Q_3$),
- maximum value ($\max$).

## Formula

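The minimum ($\min$) and maximum ($\max$) are the smallest and largest values in the data. The mean ($\overline{x}$) and the quartiles ($Q_1$, median and $Q_3$) are computed with the formulas below.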

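The mean of $X$ is denoted by $\overline{x}$. For a frequency distribution with values (or class midpoints) $x_i$ and frequencies $f_i$, $i = 1, 2, \ldots, n$, it is given by

$$\overline{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i,$$

where $N = \sum_{i=1}^{n} f_i$ is the total number of observations.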
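The weighted-sum formula can be sketched in Python as follows. This is a minimal illustration, not a library routine; the function name `grouped_mean` is hypothetical. Each value $x_i$ (or class midpoint, for class intervals) is multiplied by its frequency $f_i$, and the total is divided by $N = \sum f_i$.

```python
def grouped_mean(values, freqs):
    """Mean of a frequency distribution (hypothetical helper).

    values: the observed values x_i, or class midpoints for class intervals
    freqs:  the matching frequencies f_i
    Returns sum(f_i * x_i) / N, where N = sum(f_i).
    """
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency N must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / n
```

The length and positivity checks guard against mismatched columns and an empty table, which would otherwise give a silently wrong result or a division by zero.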

### Quartiles

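The $i^{th}$ quartile is located at the position

$$\bigg(\dfrac{iN}{4}\bigg)^{th} \text{ observation}, \quad i = 1, 2, 3,$$

where $N$ is the total number of observations. In a discrete frequency distribution, $Q_i$ is the value of $X$ whose cumulative frequency is the first to be greater than or equal to $\frac{iN}{4}$. In a continuous (class-interval) distribution, the same rule identifies the $i^{th}$ quartile class, and $Q_i$ is then interpolated within that class:

$$Q_i = l + \bigg(\frac{\frac{iN}{4} - F_<}{f}\bigg)\times h,$$

where $l$ is the lower limit of the quartile class, $f$ is its frequency, $F_<$ is the cumulative frequency of the preceding class and $h$ is the class width.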

## Example 1

A librarian keeps records of the time (in minutes) that college students spend in the library. The data are as follows:

Time spent (minutes) | 30 | 32 | 35 | 38 | 40 |
---|---|---|---|---|---|
No. of students | 8 | 12 | 20 | 10 | 5 |

Compute summary statistics for the above frequency distribution.

### Solution

$x_i$ | $f_i$ | $f_i x_i$ | $cf$ |
---|---|---|---|
30 | 8 | 240 | 8 |
32 | 12 | 384 | 20 |
35 | 20 | 700 | 40 |
38 | 10 | 380 | 50 |
40 | 5 | 200 | 55 |
Total | 55 | 1904 | |
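The $f_i x_i$ and cumulative frequency columns of this table can be reproduced with a short Python sketch. The variable names are illustrative only; `itertools.accumulate` produces the running totals that form the $cf$ column.

```python
from itertools import accumulate

x = [30, 32, 35, 38, 40]   # time spent (minutes)
f = [8, 12, 20, 10, 5]     # number of students

fx = [fi * xi for xi, fi in zip(x, f)]  # f_i * x_i column
cf = list(accumulate(f))                # cumulative frequency column

for row in zip(x, f, fx, cf):
    print(*row)
print("Total", sum(f), sum(fx))         # Total 55 1904
```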

**Minimum Value**

The minimum time spent in the library by a college student is $\min = 30$ minutes.

**Maximum Value**

The maximum time spent in the library by a college student is $\max = 40$ minutes.

**Sample mean**

The sample mean of $X$ is

$$
\begin{aligned}
\overline{x} &= \frac{1}{N}\sum_{i=1}^{n} f_i x_i\\
&= \frac{1904}{55}\\
&= 34.6182 \text{ minutes}
\end{aligned}
$$

On average, a college student spends $34.6182$ minutes in the library.
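As an arithmetic check, the mean can be recomputed in Python directly from the frequency table (a minimal sketch; the variable names are illustrative).

```python
x = [30, 32, 35, 38, 40]   # time spent (minutes)
f = [8, 12, 20, 10, 5]     # number of students

N = sum(f)                                       # 55
mean = sum(fi * xi for xi, fi in zip(x, f)) / N  # 1904 / 55
print(round(mean, 4))                            # 34.6182
```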

**Quartiles**

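For a discrete frequency distribution, the $i^{th}$ quartile is

$Q_i =$ value of the $\bigg(\dfrac{iN}{4}\bigg)^{th}$ observation, $i=1,2,3$

where $N$ is the total number of observations. It is the value of $X$ whose cumulative frequency is the first to be greater than or equal to $\frac{iN}{4}$.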
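The lookup rule for a discrete frequency distribution can be written as a small Python function. This is a sketch under the assumption that the values are sorted in ascending order; the name `discrete_quartile` is hypothetical. `bisect.bisect_left` on the cumulative frequencies returns the first index whose cumulative frequency is greater than or equal to $\frac{iN}{4}$.

```python
from bisect import bisect_left
from itertools import accumulate

def discrete_quartile(values, freqs, i):
    """i-th quartile (i = 1, 2, 3) of a discrete frequency distribution.

    values must be sorted in ascending order. Returns the value whose
    cumulative frequency is the first to be >= i * N / 4.
    """
    if i not in (1, 2, 3):
        raise ValueError("i must be 1, 2 or 3")
    cf = list(accumulate(freqs))   # cumulative frequencies
    pos = i * cf[-1] / 4           # position i*N/4
    return values[bisect_left(cf, pos)]
```

For the library data, `[discrete_quartile([30, 32, 35, 38, 40], [8, 12, 20, 10, 5], i) for i in (1, 2, 3)]` gives `[32, 35, 38]`.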

**First Quartile $Q_1$**

$$
\begin{aligned}
Q_{1} &= \bigg(\dfrac{N}{4}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{55}{4}\bigg)^{th}\text{ value}\\
&= \big(13.75\big)^{th}\text{ value}
\end{aligned}
$$

The cumulative frequency just greater than or equal to $13.75$ is $20$. The corresponding value of $X$ is the $1^{st}$ quartile. That is, $Q_1 =32$ minutes.
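The $Q_1$ lookup can be traced step by step in Python (a minimal sketch; variable names are illustrative). `bisect_left` finds the first cumulative frequency that is greater than or equal to $13.75$.

```python
from bisect import bisect_left
from itertools import accumulate

x = [30, 32, 35, 38, 40]          # time spent (minutes)
f = [8, 12, 20, 10, 5]            # number of students
cf = list(accumulate(f))          # [8, 20, 40, 50, 55]
N = cf[-1]                        # 55

pos = 1 * N / 4                   # 13.75
k = bisect_left(cf, pos)          # first index with cf[k] >= 13.75
q1 = x[k]
print(pos, cf[k], q1)             # 13.75 20 32
```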

**Median $M$**

$$
\begin{aligned}
M &= \bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{55}{2}\bigg)^{th}\text{ value}\\
&= \big(27.5\big)^{th}\text{ value}
\end{aligned}
$$

The cumulative frequency just greater than or equal to $27.5$ is $40$. The corresponding value of $X$ is the median. That is, $M =35$ minutes.
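The same lookup at position $\frac{N}{2}$ gives the median. A minimal Python sketch with illustrative variable names:

```python
from bisect import bisect_left
from itertools import accumulate

x = [30, 32, 35, 38, 40]          # time spent (minutes)
f = [8, 12, 20, 10, 5]            # number of students
cf = list(accumulate(f))          # [8, 20, 40, 50, 55]
N = cf[-1]                        # 55

pos = N / 2                       # 27.5
k = bisect_left(cf, pos)          # first index with cf[k] >= 27.5
median = x[k]
print(pos, cf[k], median)         # 27.5 40 35
```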

**Third Quartile $Q_3$**

$$
\begin{aligned}
Q_{3} &= \bigg(\dfrac{3N}{4}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{3 \times 55}{4}\bigg)^{th}\text{ value}\\
&= \big(41.25\big)^{th}\text{ value}
\end{aligned}
$$

The cumulative frequency just greater than or equal to $41.25$ is $50$. The corresponding value of $X$ is the $3^{rd}$ quartile. That is, $Q_3 =38$ minutes.
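For $Q_3$ the position is $\frac{3N}{4}$. A minimal Python sketch of this lookup, with illustrative variable names:

```python
from bisect import bisect_left
from itertools import accumulate

x = [30, 32, 35, 38, 40]          # time spent (minutes)
f = [8, 12, 20, 10, 5]            # number of students
cf = list(accumulate(f))          # [8, 20, 40, 50, 55]
N = cf[-1]                        # 55

pos = 3 * N / 4                   # 41.25
k = bisect_left(cf, pos)          # first index with cf[k] >= 41.25
q3 = x[k]
print(pos, cf[k], q3)             # 41.25 50 38
```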

Thus the summary statistics for the time spent in the library by college students are

$\min = 30$ minutes, $Q_1 = 32$ minutes, $\text{median }=35$ minutes, $\overline{x}=34.6182$ minutes, $Q_3=38$ minutes and $\max = 40$ minutes.

## Example 2

The following table gives the distribution of weight (in pounds) of 100 newborn babies at a certain hospital in 2012.

Weight (in pounds) | 3-5 | 5-7 | 7-9 | 9-11 | 11-13 |
---|---|---|---|---|---|
No. of babies | 10 | 30 | 28 | 18 | 14 |

Compute summary statistics for the above frequency distribution.

### Solution

Class interval | $x_i$ (midpoint) | $f_i$ | $f_i x_i$ | $cf$ |
---|---|---|---|---|
3-5 | 4 | 10 | 40 | 10 |
5-7 | 6 | 30 | 180 | 40 |
7-9 | 8 | 28 | 224 | 68 |
9-11 | 10 | 18 | 180 | 86 |
11-13 | 12 | 14 | 168 | 100 |
Total | | 100 | 792 | |
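The midpoint, $f_i x_i$ and cumulative frequency columns can be reproduced with a short Python sketch (illustrative variable names). Each class mark $x_i$ is the average of the class limits.

```python
from itertools import accumulate

lower = [3, 5, 7, 9, 11]          # lower class limits (pounds)
upper = [5, 7, 9, 11, 13]         # upper class limits (pounds)
f = [10, 30, 28, 18, 14]          # number of babies

mid = [(a + b) / 2 for a, b in zip(lower, upper)]  # class marks x_i
fx = [fi * xi for xi, fi in zip(mid, f)]           # f_i * x_i column
cf = list(accumulate(f))                           # cumulative frequencies

for a, b, xi, fi, p, c in zip(lower, upper, mid, f, fx, cf):
    print(f"{a}-{b}", xi, fi, p, c)
print("Total", sum(f), sum(fx))
```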

**Minimum Value**

Since the individual weights are not recorded, the minimum weight of newborn babies is taken as the lower limit of the first class: $\min = 3$ pounds.

**Maximum Value**

Since the individual weights are not recorded, the maximum weight of newborn babies is taken as the upper limit of the last class: $\max = 13$ pounds.

**Sample mean**

The sample mean of $X$ is

$$
\begin{aligned}
\overline{x} &= \frac{1}{N}\sum_{i=1}^{n} f_i x_i\\
&= \frac{792}{100}\\
&= 7.92\text{ pounds}
\end{aligned}
$$

The average weight of newborn babies is $7.92$ pounds.
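As a quick check, the mean can be recomputed in Python from the class marks and frequencies (a minimal sketch; variable names are illustrative).

```python
mid = [4, 6, 8, 10, 12]   # class marks of 3-5, 5-7, 7-9, 9-11, 11-13
f = [10, 30, 28, 18, 14]  # number of babies

N = sum(f)                                         # 100
mean = sum(fi * xi for xi, fi in zip(mid, f)) / N  # 792 / 100
print(mean)                                        # 7.92
```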

**Quartiles**

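For a continuous frequency distribution, the $i^{th}$ quartile class is the class containing the

$\bigg(\dfrac{iN}{4}\bigg)^{th}$ value, $i=1,2,3$,

that is, the first class whose cumulative frequency is greater than or equal to $\frac{iN}{4}$. Here $N$ is the total number of observations.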

**First Quartile $Q_1$**

$$
\begin{aligned}
Q_{1} &= \bigg(\dfrac{N}{4}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{100}{4}\bigg)^{th}\text{ value}\\
&= \big(25\big)^{th}\text{ value}
\end{aligned}
$$

The cumulative frequency just greater than or equal to $25$ is $40$. The corresponding class $5-7$ is the $1^{st}$ quartile class.

Thus

- $l = 5$, the lower limit of the $1^{st}$ quartile class
- $N=100$, total number of observations
- $f =30$, frequency of the $1^{st}$ quartile class
- $F_< = 10$, cumulative frequency of the class preceding the $1^{st}$ quartile class
- $h =2$, the class width

The first quartile $Q_1$ can be computed as follows:

$$
\begin{aligned}
Q_1 &= l + \bigg(\frac{\frac{N}{4} - F_<}{f}\bigg)\times h\\
&= 5 + \bigg(\frac{\frac{100}{4} - 10}{30}\bigg)\times 2\\
&= 5 + \bigg(\frac{25 - 10}{30}\bigg)\times 2\\
&= 5 + \big(0.5\big)\times 2\\
&= 5 + 1\\
&= 6 \text{ pounds}
\end{aligned}
$$

Thus, about $25\%$ of newborn babies weigh $6$ pounds or less.
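The interpolation for $Q_1$ can be traced in Python (a minimal sketch with illustrative names). `bisect_left` picks the quartile class, and `F_prev` plays the role of $F_<$.

```python
from bisect import bisect_left
from itertools import accumulate

lower = [3, 5, 7, 9, 11]        # lower class limits (pounds)
f = [10, 30, 28, 18, 14]        # number of babies per class
h = 2                           # class width
cf = list(accumulate(f))        # [10, 40, 68, 86, 100]
N = cf[-1]                      # 100

pos = 1 * N / 4                           # 25.0
k = bisect_left(cf, pos)                  # 1st quartile class: 5-7
F_prev = cf[k - 1] if k > 0 else 0        # F_< = 10
q1 = lower[k] + (pos - F_prev) / f[k] * h
print(q1)                                 # 6.0
```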

**Median**

$$
\begin{aligned}
M &= \bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{100}{2}\bigg)^{th}\text{ value}\\
&= \big(50\big)^{th}\text{ value}
\end{aligned}
$$

The cumulative frequency just greater than or equal to $50$ is $68$. The corresponding class $7-9$ is the median class.

Thus

- $l = 7$, the lower limit of the median class
- $N=100$, total number of observations
- $f =28$, frequency of the median class
- $F_< = 40$, cumulative frequency of the class preceding the median class
- $h =2$, the class width

The median $M$ can be computed as follows:

$$
\begin{aligned}
M &= l + \bigg(\frac{\frac{N}{2} - F_<}{f}\bigg)\times h\\
&= 7 + \bigg(\frac{\frac{100}{2} - 40}{28}\bigg)\times 2\\
&= 7 + \bigg(\frac{50 - 40}{28}\bigg)\times 2\\
&= 7 + \big(0.3571\big)\times 2\\
&= 7 + 0.7143\\
&= 7.7143 \text{ pounds}
\end{aligned}
$$

Thus, about $50\%$ of newborn babies weigh $7.7143$ pounds or less.
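The median uses the same interpolation at position $\frac{N}{2}$. A minimal Python sketch with illustrative names:

```python
from bisect import bisect_left
from itertools import accumulate

lower = [3, 5, 7, 9, 11]        # lower class limits (pounds)
f = [10, 30, 28, 18, 14]        # number of babies per class
h = 2                           # class width
cf = list(accumulate(f))        # [10, 40, 68, 86, 100]
N = cf[-1]                      # 100

pos = N / 2                               # 50.0
k = bisect_left(cf, pos)                  # median class: 7-9
F_prev = cf[k - 1] if k > 0 else 0        # F_< = 40
median = lower[k] + (pos - F_prev) / f[k] * h
print(round(median, 4))                   # 7.7143
```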

**Third Quartile $Q_3$**

$$
\begin{aligned}
Q_{3} &= \bigg(\dfrac{3N}{4}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{3 \times 100}{4}\bigg)^{th}\text{ value}\\
&= \big(75\big)^{th}\text{ value}
\end{aligned}
$$

The cumulative frequency just greater than or equal to $75$ is $86$. The corresponding class $9-11$ is the $3^{rd}$ quartile class.

Thus

- $l = 9$, the lower limit of the $3^{rd}$ quartile class
- $N=100$, total number of observations
- $f =18$, frequency of the $3^{rd}$ quartile class
- $F_< = 68$, cumulative frequency of the class preceding the $3^{rd}$ quartile class
- $h =2$, the class width

The third quartile $Q_3$ can be computed as follows:

$$
\begin{aligned}
Q_3 &= l + \bigg(\frac{\frac{3N}{4} - F_<}{f}\bigg)\times h\\
&= 9 + \bigg(\frac{\frac{3 \times 100}{4} - 68}{18}\bigg)\times 2\\
&= 9 + \bigg(\frac{75 - 68}{18}\bigg)\times 2\\
&= 9 + \big(0.3889\big)\times 2\\
&= 9 + 0.7778\\
&= 9.7778 \text{ pounds}
\end{aligned}
$$

Thus, about $75\%$ of newborn babies weigh $9.7778$ pounds or less.
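For $Q_3$ the position is $\frac{3N}{4}$. A minimal Python sketch of the interpolation, with illustrative names:

```python
from bisect import bisect_left
from itertools import accumulate

lower = [3, 5, 7, 9, 11]        # lower class limits (pounds)
f = [10, 30, 28, 18, 14]        # number of babies per class
h = 2                           # class width
cf = list(accumulate(f))        # [10, 40, 68, 86, 100]
N = cf[-1]                      # 100

pos = 3 * N / 4                           # 75.0
k = bisect_left(cf, pos)                  # 3rd quartile class: 9-11
F_prev = cf[k - 1] if k > 0 else 0        # F_< = 68
q3 = lower[k] + (pos - F_prev) / f[k] * h
print(round(q3, 4))                       # 9.7778
```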

Thus the summary statistics for the weight of newborn babies are

$\min = 3$ pounds, $Q_1 = 6$ pounds, $\text{median }=7.7143$ pounds, $\overline{x}=7.92$ pounds, $Q_3=9.7778$ pounds and $\max = 13$ pounds.
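The whole procedure for a continuous frequency distribution can be collected into one Python function. This is a sketch, not a library API; the name `grouped_summary` is hypothetical. As in this example, it assumes the classes are sorted and contiguous, takes $\min$ and $\max$ from the outer class limits, and uses each class's own width as $h$.

```python
from bisect import bisect_left
from itertools import accumulate

def grouped_summary(lower, upper, freqs):
    """Summary statistics of a continuous frequency distribution.

    lower/upper: class limits in ascending order; freqs: class frequencies.
    min/max are the first lower limit and the last upper limit, since the
    individual observations are not available.
    """
    cf = list(accumulate(freqs))
    n = cf[-1]

    def quartile(i):
        pos = i * n / 4
        k = bisect_left(cf, pos)             # i-th quartile class
        prev = cf[k - 1] if k > 0 else 0     # F_<
        width = upper[k] - lower[k]          # h
        return lower[k] + (pos - prev) / freqs[k] * width

    mids = [(a + b) / 2 for a, b in zip(lower, upper)]
    mean = sum(m * f for m, f in zip(mids, freqs)) / n
    return {"min": lower[0], "Q1": quartile(1), "median": quartile(2),
            "mean": mean, "Q3": quartile(3), "max": upper[-1]}

summary = grouped_summary([3, 5, 7, 9, 11], [5, 7, 9, 11, 13], [10, 30, 28, 18, 14])
print({key: round(value, 4) for key, value in summary.items()})
```

Applied to the newborn weight data, it reproduces the six values listed above.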
