Karl Pearson coefficient of skewness for grouped data

Let $(x_i,f_i), i=1,2, \cdots , n$ be a given frequency distribution, where $f_i$ is the frequency of the value (or class mid-value) $x_i$.

Formula

The Karl Pearson’s coefficient of skewness is given by

$S_k =\dfrac{\text{Mean}-\text{Mode}}{sd}=\dfrac{\overline{x}-\text{Mode}}{s_x}$

OR

$S_k =\dfrac{3(\text{Mean}-\text{Median})}{sd}=\dfrac{3(\overline{x}-M)}{s_x}$

where,

  • $\overline{x}$ is the sample mean,
  • $M$ is the median,
  • $s_x$ is the sample standard deviation.
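The two forms of the coefficient can be written as a pair of small helpers. This is a minimal sketch: the function names `pearson_skewness_mode` and `pearson_skewness_median` are illustrative choices, not part of any library, and the inputs are assumed to be already computed summary statistics.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Mode-based form: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Median-based form: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


# A negative value indicates negative skew, a positive value positive skew.
print(pearson_skewness_median(2.75, 3, 1.3732))
print(pearson_skewness_mode(7.92, 6.8182, 2.4107))
```

The sample inputs are the rounded mean, median or mode, and standard deviation obtained in the two worked examples of this article.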

Sample mean

The sample mean $\overline{x}$ is given by

$$ \begin{aligned} \overline{x} &=\frac{1}{N}\sum_{i=1}^{n}f_ix_i \end{aligned} $$

where $N=\sum_{i=1}^{n}f_i$ is the total number of observations.
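As a quick illustration, the frequency-weighted mean can be computed as below. This is a sketch only; `grouped_mean` is a hypothetical helper name, and `values` holds the $x_i$ (or class mid-values) while `freqs` holds the matching $f_i$.

```python
def grouped_mean(values, freqs):
    """Frequency-weighted mean: sum(f_i * x_i) / N, where N = sum(f_i)."""
    n_total = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n_total


# Absence data from Example 1: sum(f*x) = 165 and N = 60.
print(grouped_mean([0, 1, 2, 3, 4, 5, 6], [3, 6, 18, 18, 8, 5, 2]))
```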

Sample median

The median is given by

$\text{Median } = l + \bigg(\dfrac{\frac{N}{2} - F_<}{f}\bigg)\times h$

where,

  • $N$, total number of observations
  • $l$, the lower limit of the median class
  • $f$, frequency of the median class
  • $F_<$, cumulative frequency of the class preceding the median class
  • $h$, the class width
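The interpolation formula above can be sketched in code as follows. The helper name `grouped_median` is hypothetical, and the sketch assumes contiguous classes of equal width listed in increasing order, with the median class taken as the first class whose cumulative frequency reaches $N/2$.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = l + ((N/2 - F_<) / f) * h for equal-width classes."""
    half = sum(freqs) / 2
    cum_before = 0  # F_<: cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= half:
            # This is the median class: interpolate inside it.
            return lower + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("frequencies must contain at least one observation")


# Weight classes 3-5, 5-7, ..., 11-13 (the data of Example 2):
# N/2 = 50, median class 7-9, F_< = 40, f = 28.
print(grouped_median([3, 5, 7, 9, 11], [10, 30, 28, 18, 14], 2))
```

For these classes the result is $7 + \frac{10}{28}\times 2 \approx 7.7143$.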

Sample mode

The mode of the distribution is given by

$\text{Mode } = l + \bigg(\dfrac{f_m - f_1}{2f_m-f_1-f_2}\bigg)\times h$

where,

  • $l$, the lower limit of the modal class
  • $f_m$, frequency of the modal class
  • $f_1$, frequency of the pre-modal class
  • $f_2$, frequency of the post-modal class
  • $h$, the class width
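A minimal sketch of the mode formula follows. The name `grouped_mode` is hypothetical. Two conventions in the code are assumptions not stated in the text: a missing neighbouring class (when the modal class is first or last) is given frequency 0, and if several classes share the maximum frequency the first one is used.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = l + ((f_m - f_1) / (2*f_m - f_1 - f_2)) * h."""
    m = freqs.index(max(freqs))  # index of the (first) modal class
    f_m = freqs[m]
    f_1 = freqs[m - 1] if m > 0 else 0               # pre-modal frequency
    f_2 = freqs[m + 1] if m < len(freqs) - 1 else 0  # post-modal frequency
    return lower_limits[m] + (f_m - f_1) / (2 * f_m - f_1 - f_2) * width


# Weight classes of Example 2: modal class 5-7, f_m = 30, f_1 = 10, f_2 = 28.
print(grouped_mode([3, 5, 7, 9, 11], [10, 30, 28, 18, 14], 2))
```

Note that the denominator is zero when the modal class and both neighbours have equal frequencies, so the formula (and this sketch) does not apply in that case.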

Sample Standard deviation

Sample standard deviation is given by

$$ \begin{aligned} s_x &=\sqrt{s_x^2}\\ &=\sqrt{\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\frac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)} \end{aligned} $$
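The computational form of the standard deviation translates directly into code. This is a sketch under the same notation; `grouped_sample_sd` is a hypothetical helper name.

```python
import math


def grouped_sample_sd(values, freqs):
    """Sample sd from a frequency table, using the N - 1 divisor."""
    n_total = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    variance = (sum_fx2 - sum_fx ** 2 / n_total) / (n_total - 1)
    return math.sqrt(variance)


# Absence data from Example 1: sum(f*x) = 165, sum(f*x^2) = 565, N = 60.
print(grouped_sample_sd([0, 1, 2, 3, 4, 5, 6], [3, 6, 18, 18, 8, 5, 2]))
```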

Example 1

The number of students absent in a class was recorded every day for 60 days and the information is given in the following frequency distribution.

No. of students absent (x) 0 1 2 3 4 5 6
No. of days (f) 3 6 18 18 8 5 2

Find the Karl Pearson’s coefficient of skewness.

Solution

$x_i$ $f_i$ $f_i*x_i$ $f_i*x_i^2$ $cf$
0 3 0 0 3
1 6 6 6 9
2 18 36 72 27
3 18 54 162 45
4 8 32 128 53
5 5 25 125 58
6 2 12 72 60
Total 60 165 565

Sample mean

The sample mean of $X$ is

$$ \begin{aligned} \overline{x} &=\frac{1}{N}\sum_{i=1}^n f_ix_i\\ &=\frac{165}{60}\\ &=2.75 \end{aligned} $$

The average number of students absent per day is $2.75$.

Since the given frequency distribution is bimodal (both $x=2$ and $x=3$ have the maximum frequency $18$), the mode is not unique, so we use the empirical formula to calculate Karl Pearson’s coefficient of skewness.

For a moderately asymmetric distribution, the empirical relation is

$$ \begin{aligned} \text{Mean} - \text{Mode} &\approx 3(\text{Mean} - \text{Median}) \end{aligned} $$

Thus, Karl Pearson’s coefficient of skewness is given by

$$ \begin{aligned} S_k &=\dfrac{3(\text{Mean}-\text{Median})}{sd}\\ &=\dfrac{3(\overline{x}-M)}{s_x} \end{aligned} $$

Sample Median

Median no. of students absent is

$$ \begin{aligned} \text{Median} &=\bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\ &= \bigg(\dfrac{60}{2}\bigg)^{th}\text{ value}\\ &=\big(30\big)^{th}\text{ value} \end{aligned} $$

The smallest cumulative frequency greater than or equal to $30$ is $45$, and the corresponding value of $x$ is the median.

Thus, the median number of students absent is $M = 3$.
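The cumulative-frequency lookup used for this median can be sketched as follows. The helper name `discrete_median` is hypothetical, and it follows the rule used in this example: return the first value whose cumulative frequency is at least $N/2$.

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """First x whose cumulative frequency reaches N/2 (values sorted ascending)."""
    half = sum(freqs) / 2
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= half:
            return x
    raise ValueError("frequencies must contain at least one observation")


values = [0, 1, 2, 3, 4, 5, 6]
freqs = [3, 6, 18, 18, 8, 5, 2]
print(list(accumulate(freqs)))          # cumulative frequencies: 3, 9, 27, 45, 53, 58, 60
print(discrete_median(values, freqs))   # 45 is the first cf >= 30, so x = 3
```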

Sample variance

Sample variance of $X$ is

$$ \begin{aligned} s_x^2 &=\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\frac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)\\ &=\dfrac{1}{59}\bigg(565-\frac{(165)^2}{60}\bigg)\\ &=\dfrac{1}{59}\big(565-\frac{27225}{60}\big)\\ &=\dfrac{1}{59}\big(565-453.75\big)\\ &= \frac{111.25}{59}\\ &=1.8856 \end{aligned} $$

Sample standard deviation

The standard deviation is the positive square root of the variance.

The sample standard deviation is

$$ \begin{aligned} s_x &=\sqrt{s_x^2}\\ &=\sqrt{1.8856}\\ &=1.3732 \end{aligned} $$

Thus the standard deviation of the number of students absent is $1.3732$ students.

Karl Pearson’s coefficient of skewness

The Karl Pearson’s coefficient of skewness is

$$ \begin{aligned} S_k &=\frac{3(\text{Mean}-\text{Median})}{sd}\\ &=\frac{3\times(2.75-3)}{1.3732}\\ &= -0.5462 \end{aligned} $$

As the value of $S_k < 0$, the data is $\text{negatively skewed}$.
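The whole of Example 1 can be re-checked with a short script. This is a sketch that simply repeats the steps of the solution (totals, mean, variance, median by cumulative frequency, then the median-based coefficient); the variable names are illustrative.

```python
import math

values = [0, 1, 2, 3, 4, 5, 6]      # students absent (x)
freqs = [3, 6, 18, 18, 8, 5, 2]     # number of days (f)

n_total = sum(freqs)                                      # N = 60
sum_fx = sum(f * x for x, f in zip(values, freqs))        # 165
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))   # 565

mean = sum_fx / n_total
variance = (sum_fx2 - sum_fx ** 2 / n_total) / (n_total - 1)
sd = math.sqrt(variance)

# Median: first value whose cumulative frequency reaches N/2.
cum = 0
median = None
for x, f in zip(values, freqs):
    cum += f
    if cum >= n_total / 2:
        median = x
        break

skewness = 3 * (mean - median) / sd
print(round(mean, 4), median, round(sd, 4), round(skewness, 4))
```

Rounded to four decimal places, the script gives a mean of 2.75, a median of 3, a standard deviation of 1.3732 and a coefficient of -0.5462.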

Example 2

The following table gives the distribution of weight (in pounds) of 100 newborn babies at a certain hospital in 2012.

Weight (in pounds) 3-5 5-7 7-9 9-11 11-13
No. of babies 10 30 28 18 14

Compute Karl Pearson’s coefficient of skewness.

Solution

Class Interval mid-value ($x$) $f$ $f*x$ $f*x^2$
3-5 4 10 40 160
5-7 6 30 180 1080
7-9 8 28 224 1792
9-11 10 18 180 1800
11-13 12 14 168 2016
Total 100 792 6848

Mean

The mean weight of babies is

$$ \begin{aligned} \overline{x} &=\frac{1}{N}\sum_{i=1}^n f_ix_i\\ &=\frac{792}{100}\\ &=7.92 \text{ pounds} \end{aligned} $$

Sample Mode

The maximum frequency is $30$, so the corresponding class $5-7$ is the modal class.

The mode of the given frequency distribution is

$$ \begin{aligned} \text{Mode } &= l + \bigg(\frac{f_m - f_1}{2f_m-f_1-f_2}\bigg)\times h \end{aligned} $$

where,

  • $l = 5$, the lower limit of the modal class
  • $f_m =30$, frequency of the modal class
  • $f_1 = 10$, frequency of the pre-modal class
  • $f_2 = 28$, frequency of the post-modal class
  • $h =2$, the class width

Thus the mode of the frequency distribution is

$$ \begin{aligned} \text{Mode } &= l + \bigg(\frac{f_m - f_1}{2f_m-f_1-f_2}\bigg)\times h\\ &= 5 + \bigg(\frac{30 - 10}{2\times30 - 10 - 28}\bigg)\times 2\\ &= 5 + \bigg(\frac{20}{22}\bigg)\times 2\\ &= 5 + \big(0.9091\big)\times 2\\ &= 5 + \big(1.8182\big)\\ &= 6.8182 \text{ pounds} \end{aligned} $$

Sample variance

Sample variance of $X$ is

$$ \begin{aligned} s_x^2 &=\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\frac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)\\ &=\dfrac{1}{99}\bigg(6848-\frac{(792)^2}{100}\bigg)\\ &=\dfrac{1}{99}\big(6848-\frac{627264}{100}\big)\\ &=\dfrac{1}{99}\big(6848-6272.64\big)\\ &= \frac{575.36}{99}\\ &=5.8117 \end{aligned} $$

Sample standard deviation

The sample standard deviation is

$$ \begin{aligned} s_x &=\sqrt{s_x^2}\\ &=\sqrt{5.8117}\\ &=2.4107 \text{ pounds} \end{aligned} $$

Thus the standard deviation of the weight of babies is $2.4107$ pounds.

Karl Pearson’s coefficient of skewness

The Karl Pearson’s coefficient of skewness is

$$ \begin{aligned} S_k &=\frac{\text{Mean}-\text{Mode}}{sd}\\ &=\frac{7.92-6.8182}{2.4107}\\ &= 0.457 \end{aligned} $$

As the value of $S_k > 0$, the data is $\text{positively skewed}$.
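Example 2 can be re-checked in the same way. The script below is a sketch that repeats the steps of the solution using class mid-values and the mode formula; the variable names are illustrative, and it assumes the modal class is neither the first nor the last class (true for this data).

```python
import math

lower_limits = [3, 5, 7, 9, 11]     # lower class limits (pounds)
width = 2                           # class width h
freqs = [10, 30, 28, 18, 14]        # number of babies (f)

mids = [low + width / 2 for low in lower_limits]          # 4, 6, 8, 10, 12
n_total = sum(freqs)                                      # N = 100
sum_fx = sum(f * x for x, f in zip(mids, freqs))          # 792
sum_fx2 = sum(f * x * x for x, f in zip(mids, freqs))     # 6848

mean = sum_fx / n_total
variance = (sum_fx2 - sum_fx ** 2 / n_total) / (n_total - 1)
sd = math.sqrt(variance)

# Mode: interpolate inside the class with the largest frequency.
m = freqs.index(max(freqs))
f_m, f_1, f_2 = freqs[m], freqs[m - 1], freqs[m + 1]
mode = lower_limits[m] + (f_m - f_1) / (2 * f_m - f_1 - f_2) * width

skewness = (mean - mode) / sd
print(round(mean, 2), round(mode, 4), round(skewness, 3))
```

Rounded, the script gives a mean of 7.92, a mode of 6.8182 and a coefficient of 0.457. The standard deviation comes out at about 2.41075, in line with the value of 2.4107 obtained from the rounded variance.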
