## Karl Pearson coefficient of skewness for grouped data

Let `$(x_i,f_i), i=1,2, \cdots , n$`

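be a given frequency distribution, where $x_i$ is a value (or class mid-value), $f_i$ is its frequency, and $N=\sum_{i=1}^{n}f_i$ is the total number of observations.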

## Formula

Karl Pearson’s coefficient of skewness is given by

`$S_k =\dfrac{\text{Mean}-\text{Mode}}{\text{sd}}=\dfrac{\overline{x}-\text{Mode}}{s_x}$`

OR

`$S_k =\dfrac{3(\text{Mean}-\text{Median})}{\text{sd}}=\dfrac{3(\overline{x}-M)}{s_x}$`

where,

- $\overline{x}$ is the sample mean,
- $M$ is the median,
- $s_x$ is the sample standard deviation.
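The two forms of the coefficient can be sketched as a pair of small Python functions. The function names and the input numbers are illustrative assumptions, not part of the original text.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Mode-based form: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Median-based form: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


# Made-up inputs, chosen only to show the arithmetic
print(pearson_skewness_mode(10.0, 8.0, 4.0))    # 0.5
print(pearson_skewness_median(10.0, 9.0, 4.0))  # 0.75
```

A positive result indicates positive skew and a negative result indicates negative skew.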

## Sample mean

The sample mean $\overline{x}$ is given by

```
$$
\begin{aligned}
\overline{x} &=\frac{1}{N}\sum_{i=1}^{n}f_ix_i
\end{aligned}
$$
```
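As a minimal sketch, the frequency-weighted mean can be computed as follows. The function name `grouped_mean` and the sample data are assumptions made for illustration.

```python
def grouped_mean(values, freqs):
    """Frequency-weighted mean: sum(f_i * x_i) / N, with N = sum(f_i)."""
    total = sum(freqs)  # N, the total number of observations
    return sum(f * x for x, f in zip(values, freqs)) / total


# Made-up data: values 1, 2, 3 with frequencies 1, 2, 1
print(grouped_mean([1, 2, 3], [1, 2, 1]))  # 2.0
```

For a distribution with class intervals, `values` would hold the class mid-values.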

## Sample median

The median is given by

`$\text{Median } = l + \bigg(\dfrac{\frac{N}{2} - F_<}{f}\bigg)\times h$`

where,

- $N$, total number of observations
- $l$, the lower limit of the median class
- $f$, frequency of the median class
- $F_<$, cumulative frequency of the class preceding the median class
- $h$, the class width
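The median formula can be sketched in Python as below, assuming classes of equal width listed in ascending order. The function name `grouped_median` and the sample classes are hypothetical.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of a grouped distribution: l + ((N/2 - F_<) / f) * h.

    Assumes equal-width classes in ascending order (an assumption of
    this sketch, not a general-purpose implementation).
    """
    half = sum(freqs) / 2      # N / 2
    cum = 0                    # F_<, cumulative frequency before the class
    for l, f in zip(lower_limits, freqs):
        if cum + f >= half:    # first class whose cumulative frequency reaches N/2
            return l + (half - cum) / f * width
        cum += f
    raise ValueError("frequencies must not be empty")


# Made-up classes 0-10, 10-20, 20-30, 30-40 with frequencies 1, 2, 4, 1
print(grouped_median([0, 10, 20, 30], [1, 2, 4, 1], 10))  # 22.5
```

Here $N/2 = 4$, the median class is 20-30 with $F_< = 3$ and $f = 4$, so the median is $20 + (1/4)\times 10 = 22.5$.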

## Sample mode

The mode of the distribution is given by

`$\text{Mode } = l + \bigg(\dfrac{f_m - f_1}{2f_m-f_1-f_2}\bigg)\times h$`

where,

- $l$, the lower limit of the modal class
- $f_m$, frequency of the modal class
- $f_1$, frequency of the class preceding the modal class
- $f_2$, frequency of the class following the modal class
- $h$, the class width
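A minimal Python sketch of the mode formula follows. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch that the text above does not specify.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of a grouped distribution: l + ((f_m - f_1) / (2 f_m - f_1 - f_2)) * h."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])  # first class with max frequency
    f_m = freqs[m]
    # Assumption: a missing neighbouring class counts as frequency 0
    f_1 = freqs[m - 1] if m > 0 else 0
    f_2 = freqs[m + 1] if m < len(freqs) - 1 else 0
    denom = 2 * f_m - f_1 - f_2
    if denom == 0:
        raise ValueError("formula undefined: 2*f_m - f_1 - f_2 is zero")
    return lower_limits[m] + (f_m - f_1) / denom * width


# Made-up classes 0-10, 10-20, 20-30 with frequencies 2, 6, 2
print(grouped_mode([0, 10, 20], [2, 6, 2], 10))  # 15.0
```

If two classes share the maximum frequency, this sketch silently picks the first one, so a bimodal distribution should be handled with the median-based form of the coefficient instead.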

## Sample standard deviation

The sample standard deviation is given by

```
$$
\begin{aligned}
s_x &=\sqrt{s_x^2}\\
&=\sqrt{\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\frac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)}
\end{aligned}
$$
```
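The computational formula above translates directly into Python. This is a sketch under the assumption that `values` holds the $x_i$ (or class mid-values) and `freqs` the matching $f_i$; the function name and sample data are illustrative.

```python
import math


def grouped_sample_sd(values, freqs):
    """Sample standard deviation of a frequency distribution (divisor N - 1)."""
    n_total = sum(freqs)                                       # N
    sum_fx = sum(f * x for x, f in zip(values, freqs))         # sum of f_i * x_i
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))    # sum of f_i * x_i^2
    variance = (sum_fx2 - sum_fx ** 2 / n_total) / (n_total - 1)
    return math.sqrt(variance)


# Made-up data: values 0, 2, 4 with frequencies 2, 1, 2
print(grouped_sample_sd([0, 2, 4], [2, 1, 2]))  # 2.0
```

In this example $\sum f_ix_i = 10$, $\sum f_ix_i^2 = 36$ and $N = 5$, so the variance is $(36 - 20)/4 = 4$.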

## Example 1

The number of students absent in a class was recorded every day for 60 days and the information is given in the following frequency distribution.

No. of students absent ($x$) | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
No. of days ($f$) | 3 | 6 | 18 | 18 | 8 | 5 | 2 |

Find Karl Pearson’s coefficient of skewness.

### Solution

$x_i$ | $f_i$ | $f_ix_i$ | $f_ix_i^2$ | $cf$ |
---|---|---|---|---|
0 | 3 | 0 | 0 | 3 |
1 | 6 | 6 | 6 | 9 |
2 | 18 | 36 | 72 | 27 |
3 | 18 | 54 | 162 | 45 |
4 | 8 | 32 | 128 | 53 |
5 | 5 | 25 | 125 | 58 |
6 | 2 | 12 | 72 | 60 |
Total | 60 | 165 | 565 | |

**Sample mean**

The sample mean of $X$ is

```
$$
\begin{aligned}
\overline{x} &=\frac{1}{N}\sum_{i=1}^n f_ix_i\\
&=\frac{165}{60}\\
&=2.75
\end{aligned}
$$
```

The average number of students absent is $2.75$ per day.

Since the given frequency distribution is bimodal ($x=2$ and $x=3$ both have the maximum frequency $18$), the mode is not unique, so we use the empirical relation between mean, median and mode to calculate Karl Pearson’s coefficient of skewness.

For a moderately asymmetric distribution,

```
$$
\begin{aligned}
\text{Mean} - \text{Mode} &\approx 3(\text{Mean} - \text{Median})
\end{aligned}
$$
```

Thus, Karl Pearson’s coefficient of skewness is given by

```
$$
\begin{aligned}
S_k &=\dfrac{3(Mean-Median)}{sd}\\
&=\dfrac{3(\overline{x}-M)}{s_x}
\end{aligned}
$$
```

**Sample Median**

The median number of students absent is
```
$$
\begin{aligned}
\text{Median} &=\bigg(\dfrac{N}{2}\bigg)^{th}\text{ value}\\
&= \bigg(\dfrac{60}{2}\bigg)^{th}\text{ value}\\
&=\big(30\big)^{th}\text{ value}
\end{aligned}
$$
```

The smallest cumulative frequency greater than or equal to $30$ is $45$, and the corresponding value of $x$ is the median.

Thus, the median number of students absent is $M = 3$.

**Sample variance**

Sample variance of $X$ is

```
$$
\begin{aligned}
s_x^2 &=\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\frac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)\\
&=\dfrac{1}{59}\bigg(565-\frac{(165)^2}{60}\bigg)\\
&=\dfrac{1}{59}\big(565-\frac{27225}{60}\big)\\
&=\dfrac{1}{59}\big(565-453.75\big)\\
&= \frac{111.25}{59}\\
&=1.8856
\end{aligned}
$$
```

**Sample standard deviation**

The standard deviation is the positive square root of the variance.

The sample standard deviation is

```
$$
\begin{aligned}
s_x &=\sqrt{s_x^2}\\
&=\sqrt{1.8856}\\
&=1.3732
\end{aligned}
$$
```

Thus the standard deviation of the number of students absent is $1.3732$ students.

**Karl Pearson’s coefficient of skewness**

Karl Pearson’s coefficient of skewness is

```
$$
\begin{aligned}
S_k &=\frac{3(\text{Mean}-\text{Median})}{\text{sd}}\\
&=\frac{3\times(2.75-3)}{1.3732}\\
&= -0.5462
\end{aligned}
$$
```

As the value of $S_k < 0$, the data is negatively skewed.
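The arithmetic of Example 1 can be re-checked with a short Python script. It is only a verification sketch; the variable names are illustrative, and the data are taken from the table in the example.

```python
import math

x = [0, 1, 2, 3, 4, 5, 6]          # number of students absent
f = [3, 6, 18, 18, 8, 5, 2]        # number of days

n = sum(f)                                          # N = 60
sum_fx = sum(fi * xi for xi, fi in zip(x, f))       # 165
sum_fx2 = sum(fi * xi ** 2 for xi, fi in zip(x, f)) # 565

mean = sum_fx / n

# Median: first x whose cumulative frequency reaches N/2
cum = 0
for xi, fi in zip(x, f):
    cum += fi
    if cum >= n / 2:
        median = xi
        break

sd = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
sk = 3 * (mean - median) / sd       # median-based form, since the data is bimodal

print(round(mean, 2), median, round(sd, 4), round(sk, 4))
# 2.75 3 1.3732 -0.5462
```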

## Example 2

The following table gives the distribution of weight (in pounds) of 100 newborn babies at a certain hospital in 2012.

Weight (in pounds) | 3-5 | 5-7 | 7-9 | 9-11 | 11-13 |
---|---|---|---|---|---|
No. of babies | 10 | 30 | 28 | 18 | 14 |

Compute Karl Pearson’s coefficient of skewness.

### Solution

Class interval | Mid-value ($x$) | $f$ | $fx$ | $fx^2$ |
---|---|---|---|---|
3-5 | 4 | 10 | 40 | 160 |
5-7 | 6 | 30 | 180 | 1080 |
7-9 | 8 | 28 | 224 | 1792 |
9-11 | 10 | 18 | 180 | 1800 |
11-13 | 12 | 14 | 168 | 2016 |
Total | | 100 | 792 | 6848 |

**Mean**

The mean weight of babies is

```
$$
\begin{aligned}
\overline{x} &=\frac{1}{N}\sum_{i=1}^n f_ix_i\\
&=\frac{792}{100}\\
&=7.92 \text{ pounds}
\end{aligned}
$$
```

**Sample Mode**

The maximum frequency is $30$, so the corresponding class $5-7$ is the modal class.

The mode of the given frequency distribution is
```
$$
\begin{aligned}
\text{Mode } &= l + \bigg(\frac{f_m - f_1}{2f_m-f_1-f_2}\bigg)\times h\\
\end{aligned}
$$
```

where,

- $l = 5$, the lower limit of the modal class
- $f_m =30$, frequency of the modal class
- $f_1 = 10$, frequency of the pre-modal class
- $f_2 = 28$, frequency of the post-modal class
- $h =2$, the class width

Thus the mode of the frequency distribution is

```
$$
\begin{aligned}
\text{Mode } &= l + \bigg(\frac{f_m - f_1}{2f_m-f_1-f_2}\bigg)\times h\\
&= 5 + \bigg(\frac{30 - 10}{2\times30 - 10 - 28}\bigg)\times 2\\
&= 5 + \bigg(\frac{20}{22}\bigg)\times 2\\
&= 5 + \big(0.9091\big)\times 2\\
&= 5 + \big(1.8182\big)\\
&= 6.8182 \text{ pounds}
\end{aligned}
$$
```

**Sample variance**

Sample variance of $X$ is

```
$$
\begin{aligned}
s_x^2 &=\dfrac{1}{N-1}\bigg(\sum_{i=1}^{n}f_ix_i^2-\frac{\big(\sum_{i=1}^n f_ix_i\big)^2}{N}\bigg)\\
&=\dfrac{1}{99}\bigg(6848-\frac{(792)^2}{100}\bigg)\\
&=\dfrac{1}{99}\big(6848-\frac{627264}{100}\big)\\
&=\dfrac{1}{99}\big(6848-6272.64\big)\\
&= \frac{575.36}{99}\\
&=5.8117
\end{aligned}
$$
```

**Sample standard deviation**

The sample standard deviation is

```
$$
\begin{aligned}
s_x &=\sqrt{s_x^2}\\
&=\sqrt{5.8117}\\
&=2.4107 \text{ pounds}
\end{aligned}
$$
```

Thus the standard deviation of the weight of babies is $2.4107$ pounds.

**Karl Pearson’s coefficient of skewness**

Karl Pearson’s coefficient of skewness is

```
$$
\begin{aligned}
S_k &=\frac{\text{Mean}-\text{Mode}}{\text{sd}}\\
&=\frac{7.92-6.8182}{2.4107}\\
&= 0.457
\end{aligned}
$$
```

As the value of $S_k > 0$, the data is positively skewed.
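The figures in Example 2 can be re-checked with the following Python sketch. Variable names are illustrative, and the script assumes the modal class is an interior class, which holds for this table. Because it carries full precision through every step, the last digit of the standard deviation may differ slightly from the hand calculation, which takes the square root of the rounded variance.

```python
import math

lower = [3, 5, 7, 9, 11]        # lower class limits from the table
width = 2                       # common class width h
f = [10, 30, 28, 18, 14]        # number of babies per class

mid = [l + width / 2 for l in lower]                    # class mid-values
n = sum(f)                                              # N = 100
sum_fx = sum(fi * xi for xi, fi in zip(mid, f))         # 792
sum_fx2 = sum(fi * xi ** 2 for xi, fi in zip(mid, f))   # 6848

mean = sum_fx / n
sd = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))

m = f.index(max(f))             # index of the modal class (interior class assumed)
mode = lower[m] + (f[m] - f[m - 1]) / (2 * f[m] - f[m - 1] - f[m + 1]) * width

sk = (mean - mode) / sd         # mode-based form
print(round(mean, 2), round(mode, 4), round(sd, 3), round(sk, 3))
# 7.92 6.8182 2.411 0.457
```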