Summary statistics for grouped data

Summary statistics summarize and provide information about the sample data. They include the minimum value of the data, the first quartile ($Q_1$), the median (i.e., $Q_2$), the mean ($\overline{x}$), the third quartile ($Q_3$) and the maximum value of the data.

Summary statistics include

  • minimum value ($\min$),
  • first quartile ($Q_1$),
  • median ($Q_2$),
  • sample mean ($\overline{x}$),
  • third quartile ($Q_3$),
  • maximum value ($\max$).

Formula

$\min$, $Q_1$, $\text{median}$, $\overline{x}$, $Q_3$ and $\max$

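The mean of $X$ is denoted by $\overline{x}$. For a frequency distribution with values (or class midpoints) $x_i$ and frequencies $f_i$, it is given by

$\overline{x} =\dfrac{1}{N}\sum_{i=1}^{n}f_ix_i$

where $N=\sum_{i=1}^{n}f_i$.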

Quartiles

The formula for the $i^{th}$ quartile is

$Q_i =$ Value of $\bigg(\dfrac{i(N)}{4}\bigg)^{th}$ observation, $i=1,2,3$

where $N$ is the total number of observations.

Example 1

A librarian keeps records of the amount of time (in minutes) that college students spend in a library. The data are as follows:

| Time spent (minutes) | 30 | 32 | 35 | 38 | 40 |
|---|---|---|---|---|---|
| No. of students | 8 | 12 | 20 | 10 | 5 |

Compute summary statistics for the above frequency distribution.

Solution

| $x_i$ | $f_i$ | $f_ix_i$ | $cf$ |
|---|---|---|---|
| 30 | 8 | 240 | 8 |
| 32 | 12 | 384 | 20 |
| 35 | 20 | 700 | 40 |
| 38 | 10 | 380 | 50 |
| 40 | 5 | 200 | 55 |
| Total | 55 | 1904 | |
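The $f_ix_i$ and $cf$ columns of this table can be reproduced with a short script. This is only an illustrative sketch; the variable names (`times`, `students`, `fx`, `cf`) are assumptions made for the example.

```python
from itertools import accumulate

# Data from Example 1 (names are illustrative, not from the original text)
times = [30, 32, 35, 38, 40]     # x_i, time spent in minutes
students = [8, 12, 20, 10, 5]    # f_i, number of students

fx = [f * x for x, f in zip(times, students)]  # f_i * x_i column
cf = list(accumulate(students))                # cumulative frequency column

# Print the table rows followed by the totals row
for row in zip(times, students, fx, cf):
    print(*row)
print("Total", sum(students), sum(fx))
```

The last cumulative frequency equals $N = 55$, which is a quick check that the column was built correctly.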

Minimum Value

The minimum amount of time spent in the library by college students is $\min = 30$ minutes.

Maximum Value

The maximum amount of time spent in the library by college students is $\max = 40$ minutes.

Sample mean

The sample mean of $X$ is

$$ \begin{aligned} \overline{x} &=\frac{1}{N}\sum_{i=1}^n f_ix_i\\ &=\frac{1904}{55}\\ &=34.6182 \text{ minutes} \end{aligned} $$

The average amount of time spent in the library by college students is $34.6182$ minutes.
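As a cross-check of the arithmetic, the frequency-weighted mean can be computed directly. The helper name `grouped_mean` is hypothetical and the snippet is a minimal sketch, not a reference implementation.

```python
def grouped_mean(values, freqs):
    """Frequency-weighted mean: sum(f_i * x_i) / sum(f_i)."""
    n = sum(freqs)  # N, total number of observations
    return sum(f * x for x, f in zip(values, freqs)) / n

# Example 1 data: time spent (minutes) and number of students
mean_time = grouped_mean([30, 32, 35, 38, 40], [8, 12, 20, 10, 5])
print(round(mean_time, 4))  # 34.6182
```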

Quartiles

The formula for the $i^{th}$ quartile is

$Q_i =\bigg(\dfrac{i(N)}{4}\bigg)^{th}$ value, $i=1,2,3$

where $N$ is the total number of observations.

First Quartile $Q_1$

$$ \begin{aligned} Q_{1} &=\bigg(\dfrac{1(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{1(55)}{4}\bigg)^{th}\text{ value}\\ &=\big(13.75\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $13.75$ is $20$. The corresponding value of $X$ is the $1^{st}$ quartile. That is, $Q_1 =32$ minutes.

Median $M$

$$ \begin{aligned} M &=\bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{55}{2}\bigg)^{th}\text{ value}\\ &=\big(27.5\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $27.5$ is $40$. The corresponding value of $X$ is the median. That is, $M =35$ minutes.

Third Quartile $Q_3$

$$ \begin{aligned} Q_{3} &=\bigg(\dfrac{3(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{3(55)}{4}\bigg)^{th}\text{ value}\\ &=\big(41.25\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $41.25$ is $50$. The corresponding value of $X$ is the $3^{rd}$ quartile. That is, $Q_3 =38$ minutes.
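The cumulative-frequency lookup used for $Q_1$, $M$ and $Q_3$ can be sketched as a small function. The name `discrete_quartile` is an assumption for this example, and the sketch assumes the values are listed in ascending order.

```python
from itertools import accumulate

def discrete_quartile(values, freqs, i):
    """Return the value whose cumulative frequency first reaches i*N/4.

    Assumes `values` is sorted in ascending order and i is 1, 2 or 3.
    """
    n = sum(freqs)
    position = i * n / 4  # the (iN/4)-th value
    for x, c in zip(values, accumulate(freqs)):
        if c >= position:  # first cf >= position
            return x
    raise ValueError("position exceeds the total frequency")

times = [30, 32, 35, 38, 40]
students = [8, 12, 20, 10, 5]

q1 = discrete_quartile(times, students, 1)      # position 13.75
median = discrete_quartile(times, students, 2)  # position 27.5
q3 = discrete_quartile(times, students, 3)      # position 41.25
print(q1, median, q3)  # 32 35 38
```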

Thus the summary statistics for the amount of time spent in the library by college students are

$\min = 30$ minutes, $Q_1 = 32$ minutes, $\text{median}=35$ minutes, $\overline{x}=34.6182$ minutes, $Q_3=38$ minutes and $\max = 40$ minutes.

Example 2

The following table gives the distribution of weight (in pounds) of 100 newborn babies at a certain hospital in 2012.

| Weight (in pounds) | 3-5 | 5-7 | 7-9 | 9-11 | 11-13 |
|---|---|---|---|---|---|
| No. of babies | 10 | 30 | 28 | 18 | 14 |

Compute summary statistics for the above frequency distribution.

Solution

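| Class Interval | $x_i$ | $f_i$ | $f_ix_i$ | $cf$ |
|---|---|---|---|---|
| 3-5 | 4 | 10 | 40 | 10 |
| 5-7 | 6 | 30 | 180 | 40 |
| 7-9 | 8 | 28 | 224 | 68 |
| 9-11 | 10 | 18 | 180 | 86 |
| 11-13 | 12 | 14 | 168 | 100 |
| Total | | 100 | 792 | |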

Minimum Value

The minimum weight of newborn babies is $\min = 3 \text{ pounds}$.

Maximum Value

The maximum weight of newborn babies is $\max = 13 \text{ pounds}$.

Sample mean

The sample mean of $X$ is

$$ \begin{aligned} \overline{x} &=\frac{1}{N}\sum_{i=1}^n f_ix_i\\ &=\frac{792}{100}\\ &=7.92\text{ pounds} \end{aligned} $$

The average weight of newborn babies is $7.92$ pounds.
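For class intervals, $x_i$ is taken as the class midpoint before applying the mean formula. A minimal sketch of that step follows; the names `classes`, `babies` and `midpoints` are chosen for illustration only.

```python
# Example 2 data: class limits (pounds) and number of babies per class
classes = [(3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
babies = [10, 30, 28, 18, 14]

# Class midpoints x_i = (lower + upper) / 2
midpoints = [(lo + hi) / 2 for lo, hi in classes]

total = sum(babies)                                   # N = 100
weighted_sum = sum(f * x for x, f in zip(midpoints, babies))
mean_weight = weighted_sum / total
print(midpoints)     # [4.0, 6.0, 8.0, 10.0, 12.0]
print(mean_weight)   # 7.92
```

Using midpoints treats every observation in a class as if it sat at the class center, so the result is an approximation to the mean of the raw data.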

Quartiles

The formula for the $i^{th}$ quartile is

$Q_i =\bigg(\dfrac{i(N)}{4}\bigg)^{th}$ value, $i=1,2,3$

where $N$ is the total number of observations.

First Quartile $Q_1$

$$ \begin{aligned} Q_{1} &=\bigg(\dfrac{1(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{1(100)}{4}\bigg)^{th}\text{ value}\\ &=\big(25\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $25$ is $40$. The corresponding class $5-7$ is the $1^{st}$ quartile class.

Thus

  • $l = 5$, the lower limit of the $1^{st}$ quartile class
  • $N=100$, total number of observations
  • $f =30$, frequency of the $1^{st}$ quartile class
  • $F_< = 10$, cumulative frequency of the class previous to $1^{st}$ quartile class
  • $h =2$, the class width

The first quartile $Q_1$ can be computed as follows:

$$ \begin{aligned} Q_1 &= l + \bigg(\frac{\frac{1(N)}{4} - F_<}{f}\bigg)\times h\\ &= 5 + \bigg(\frac{\frac{1*100}{4} - 10}{30}\bigg)\times 2\\ &= 5 + \bigg(\frac{25 - 10}{30}\bigg)\times 2\\ &= 5 + \big(0.5\big)\times 2\\ &= 5 + 1\\ &= 6 \text{ pounds} \end{aligned} $$

Thus, $25\%$ of the newborn babies weigh $6$ pounds or less.

Median

$$ \begin{aligned} M &=\bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{100}{2}\bigg)^{th}\text{ value}\\ &=\big(50\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $50$ is $68$. The corresponding class $7-9$ is the median class.

Thus

  • $l = 7$, the lower limit of the median class
  • $N=100$, total number of observations
  • $f =28$, frequency of the median class
  • $F_< = 40$, cumulative frequency of the class previous to median class
  • $h =2$, the class width

The median $M$ can be computed as follows:

$$ \begin{aligned} M &= l + \bigg(\frac{\frac{N}{2} - F_<}{f}\bigg)\times h\\ &= 7 + \bigg(\frac{\frac{100}{2} - 40}{28}\bigg)\times 2\\ &= 7 + \bigg(\frac{50 - 40}{28}\bigg)\times 2\\ &= 7 + \big(0.3571\big)\times 2\\ &= 7 + 0.7143\\ &= 7.7143 \text{ pounds} \end{aligned} $$

Thus, $50\%$ of the newborn babies weigh $7.7143$ pounds or less.

Third Quartile $Q_3$

$$ \begin{aligned} Q_{3} &=\bigg(\dfrac{3(N)}{4}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{3(100)}{4}\bigg)^{th}\text{ value}\\ &=\big(75\big)^{th}\text{ value} \end{aligned} $$

The cumulative frequency just greater than or equal to $75$ is $86$. The corresponding class $9-11$ is the $3^{rd}$ quartile class.

Thus

  • $l = 9$, the lower limit of the $3^{rd}$ quartile class
  • $N=100$, total number of observations
  • $f =18$, frequency of the $3^{rd}$ quartile class
  • $F_< = 68$, cumulative frequency of the class previous to $3^{rd}$ quartile class
  • $h =2$, the class width

The third quartile $Q_3$ can be computed as follows:

$$ \begin{aligned} Q_3 &= l + \bigg(\frac{\frac{3(N)}{4} - F_<}{f}\bigg)\times h\\ &= 9 + \bigg(\frac{\frac{3*100}{4} - 68}{18}\bigg)\times 2\\ &= 9 + \bigg(\frac{75 - 68}{18}\bigg)\times 2\\ &= 9 + \big(0.3889\big)\times 2\\ &= 9 + 0.7778\\ &= 9.7778 \text{ pounds} \end{aligned} $$

Thus, $75\%$ of the newborn babies weigh $9.7778$ pounds or less.
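The interpolation formula $l + \big(\frac{iN/4 - F_<}{f}\big)\times h$ used for all three quartiles in this example can be written as one function. This is a sketch under stated assumptions: the name `grouped_quartile` is hypothetical, all classes are assumed to have the same width `h`, and the quartile class is assumed to have a nonzero frequency.

```python
from itertools import accumulate

def grouped_quartile(lower_limits, freqs, width, i):
    """Interpolated i-th quartile (i = 1, 2, 3) for equal-width classes."""
    n = sum(freqs)
    position = i * n / 4                 # iN/4
    cum = list(accumulate(freqs))        # cumulative frequencies
    for k, c in enumerate(cum):
        if c >= position:                # k is the quartile class
            below = cum[k - 1] if k > 0 else 0   # F_<, cf of the previous class
            return lower_limits[k] + (position - below) / freqs[k] * width
    raise ValueError("position exceeds the total frequency")

lower_limits = [3, 5, 7, 9, 11]   # lower class limits in pounds
babies = [10, 30, 28, 18, 14]

q1 = grouped_quartile(lower_limits, babies, 2, 1)
median = grouped_quartile(lower_limits, babies, 2, 2)
q3 = grouped_quartile(lower_limits, babies, 2, 3)
print(round(q1, 4), round(median, 4), round(q3, 4))  # 6.0 7.7143 9.7778
```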

Thus the summary statistics of the weight of newborn babies are

$\min = 3$ pounds, $Q_1 = 6$ pounds, $\text{median}=7.7143$ pounds, $\overline{x}=7.92$ pounds, $Q_3=9.7778$ pounds and $\max = 13$ pounds.
