Testing Correlation Coefficient

In this tutorial, we discuss step-by-step solutions to numerical problems on testing whether the population correlation coefficient $\rho$ equals a specified value $\rho_0$.

Example 1

The medical records show that the correlation between the age of the mother and the birth weight of her first child is less than $-0.34$. A random sample of 8 mothers’ ages and the birth weights of their first children is as follows:

Age of mother 35 24 28 29 26 30 34 32
Birth weight of child 2.85 3.50 3.25 3.00 3.25 2.75 2.90 3.00

Test at the 5% level of significance whether the medical records provide true information.

Solution

Let $x$ denote the age of the mother and $y$ the birth weight of her first child.

The number of pairs $n= 8$.

No. $x$ $y$ $x^2$ $y^2$ $xy$
1 35 2.85 1225 8.123 99.75
2 24 3.50 576 12.250 84.00
3 28 3.25 784 10.562 91.00
4 29 3.00 841 9.000 87.00
5 26 3.25 676 10.562 84.50
6 30 2.75 900 7.562 82.50
7 34 2.90 1156 8.410 98.60
8 32 3.00 1024 9.000 96.00
Total 238 24.50 7182 75.470 723.35
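The column totals of the worksheet can be checked with a short Python sketch. Python is our choice here (the tutorial itself uses no code), and the variable names are illustrative; the lists `x` and `y` hold the sample data from the table.

```python
# Sample data from Example 1: x = age of mother, y = birth weight of first child
x = [35, 24, 28, 29, 26, 30, 34, 32]
y = [2.85, 3.50, 3.25, 3.00, 3.25, 2.75, 2.90, 3.00]

# Column totals of the worksheet: sum x, sum y, sum x^2, sum y^2, sum xy
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

print(sum_x, round(sum_y, 2), sum_x2, round(sum_y2, 4), round(sum_xy, 2))
# prints: 238 24.5 7182 75.47 723.35
```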

The sample variance of $x$ is

\begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(7182-\frac{(238)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(7182-\frac{56644}{8}\bigg)\\ &= \frac{1}{7}\bigg(7182-7080.5\bigg)\\ &= \frac{101.5}{7}\\ &= 14.5. \end{aligned}

The sample variance of $y$ is

\begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(75.47-\frac{(24.5)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(75.47-\frac{600.25}{8}\bigg)\\ &= \frac{1}{7}\bigg(75.47-75.0312\bigg)\\ &= \frac{0.4387}{7}\\ &= 0.0627. \end{aligned}
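A minimal Python sketch of the shortcut variance formula used above, assuming the same data lists. The helper `shortcut_variance` is a hypothetical name introduced for illustration; the standard library's `statistics.variance` also divides by $n-1$, so it serves as a cross-check.

```python
import statistics

x = [35, 24, 28, 29, 26, 30, 34, 32]
y = [2.85, 3.50, 3.25, 3.00, 3.25, 2.75, 2.90, 3.00]

def shortcut_variance(v):
    """Sample variance via (sum v^2 - (sum v)^2 / n) / (n - 1)."""
    n = len(v)
    return (sum(vi ** 2 for vi in v) - sum(v) ** 2 / n) / (n - 1)

s2_x = shortcut_variance(x)
s2_y = shortcut_variance(y)
print(round(s2_x, 4), round(s2_y, 4))  # 14.5 0.0627

# statistics.variance uses the same n - 1 divisor, so the results should agree
assert abs(s2_x - statistics.variance(x)) < 1e-9
assert abs(s2_y - statistics.variance(y)) < 1e-9
```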

The sample covariance between $x$ and $y$ is

\begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(723.35-\frac{(238)(24.5)}{8}\bigg)\\ &= \frac{1}{7}\bigg(723.35-\frac{5831}{8}\bigg)\\ &= \frac{1}{7}\bigg(723.35-728.875\bigg)\\ &= \frac{-5.525}{7}\\ &= -0.7893. \end{aligned}

Karl Pearson’s sample correlation coefficient between the age of the mother and the birth weight of the first child is

\begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{-0.7893}{\sqrt{14.5\times 0.0627}}\\ &=\frac{-0.7893}{\sqrt{0.9092}}\\ &=-0.828. \end{aligned}

Thus the sample correlation coefficient between the age of the mother and the birth weight of the first child is $r = -0.828$.

Step 1 Hypothesis Testing Problem

The hypothesis testing problem is $H_0 : \rho = -0.34$ against $H_1 : \rho < -0.34$ ($\text{left-tailed}$)

Step 2 Test Statistic

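The test statistic for the above hypothesis testing problem is \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}} \end{aligned} where \begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg) \end{aligned} and \begin{aligned} \xi & =\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg) \end{aligned} Here $U$ is Fisher’s $z$-transformation of $r$. For a random sample from a bivariate normal population, $U$ is approximately normally distributed with mean $\xi$ and variance $\frac{1}{n-3}$, so under the null hypothesis the test statistic $Z$ approximately follows the standard normal $N(0,1)$ distribution.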

Step 3 Significance Level

The significance level is $\alpha = 0.05$.

Step 4 Critical Value(s)

As the alternative hypothesis is left-tailed, the critical value of $Z$ is $-1.64$ (from the standard normal table).

The rejection region (i.e., critical region) is $Z < -1.64$.
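Instead of a printed normal table, the critical value can be computed with Python’s `statistics.NormalDist` (Python 3.8+). This sketch shows both the left-tailed value used here and the right-tailed value used in Example 2; the exact value $-1.6449$ rounds to the table value $-1.64$.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

z_left = std_normal.inv_cdf(alpha)       # left-tailed: reject H0 if Z < z_left
z_right = std_normal.inv_cdf(1 - alpha)  # right-tailed: reject H0 if Z > z_right
print(round(z_left, 4), round(z_right, 4))  # -1.6449 1.6449
```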

Step 5 Computation

\begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+(-0.828)}{1-(-0.828)}\bigg)\\ &=0.5\times \log_e\big(0.0941\big)\\ &=0.5\times (-2.3635)\\ &= -1.1817 \end{aligned}

and

\begin{aligned} \xi&=\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+(-0.34)}{1-(-0.34)}\bigg)\\ &=0.5\times \log_e\big(0.4925\big)\\ &=0.5\times (-0.7082)\\ &= -0.3541 \end{aligned}

The test statistic under the null hypothesis is

\begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}}\\ &=\dfrac{-1.1817-(-0.3541)}{\sqrt{\frac{1}{8-3}}}\\ &=\dfrac{-0.8276}{\sqrt{\frac{1}{5}}}\\ &=-1.8507 \end{aligned}
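The computation in Step 5 can be reproduced as follows, taking the rounded sample correlation $r = -0.828$ as input. The variable names are illustrative; `math.atanh(r)` would give the same value as the explicit log formula for $U$.

```python
import math

r, rho0, n = -0.828, -0.34, 8  # sample correlation, hypothesised value, sample size

u = 0.5 * math.log((1 + r) / (1 - r))         # Fisher transform of r
xi = 0.5 * math.log((1 + rho0) / (1 - rho0))  # Fisher transform of rho0
z_obs = (u - xi) / math.sqrt(1 / (n - 3))     # test statistic

print(round(u, 4), round(xi, 4), round(z_obs, 4))  # -1.1817 -0.3541 -1.8507
```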

Since the test statistic $Z_{obs} = -1.851$ falls inside the critical region, we reject the null hypothesis at the $\alpha = 0.05$ level of significance.

OR

Step 6 Decision ($p$-value Approach)

This is a left-tailed test, so the p-value is the area to the left of the test statistic $Z_{obs} = -1.851$: p-value $= P(Z < -1.851) = 0.0321$.

Since the p-value $0.0321$ is less than the significance level $\alpha = 0.05$, we reject the null hypothesis at the $\alpha = 0.05$ level of significance.
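A sketch of the left-tailed p-value calculation with `statistics.NormalDist`, using the observed statistic $-1.8507$ from Step 5 and the standard normal approximation assumed above.

```python
from statistics import NormalDist

alpha = 0.05
z_obs = -1.8507  # observed statistic from Step 5

# Left-tailed test: p-value = P(Z < z_obs) under N(0, 1)
p_value = NormalDist().cdf(z_obs)
print(round(p_value, 4))  # 0.0321
print("reject H0" if p_value < alpha else "fail to reject H0")  # reject H0
```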

Interpretation

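There is sufficient evidence at the $0.05$ level of significance to conclude that the population correlation between the age of the mother and the birth weight of her first child is less than $-0.34$; that is, the data support the claim in the medical records.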

Example 2

The correlation between scores on a traditional aptitude test and scores on a final test is known to be approximately 0.6. A new aptitude test has been developed and is tried on a random sample of 100 students, resulting in a correlation of 0.65. Does this result imply that the new test is better?

Solution

Let $X$ denote the score on the new aptitude test and $Y$ the score on the final test. We are given a sample correlation of $r = 0.65$ between $X$ and $Y$ for a sample of $n = 100$ pairs of observations.

Step 1 Hypothesis Testing Problem

The hypothesis testing problem is $H_0 : \rho = 0.6$ against $H_1 : \rho > 0.6$ ($\text{right-tailed}$)

Step 2 Test Statistic

The test statistic for the above hypothesis testing problem is \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}} \end{aligned} where \begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg) \end{aligned} and \begin{aligned} \xi & =\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg) \end{aligned} Under the null hypothesis, the test statistic $Z$ approximately follows the standard normal $N(0,1)$ distribution.

Step 3 Significance Level

The problem does not specify a significance level, so we use $\alpha = 0.05$.

Step 4 Critical Value(s)

As the alternative hypothesis is right-tailed, the critical value of $Z$ is $1.64$ (from the standard normal table).

The rejection region (i.e., critical region) is $Z > 1.64$.

Step 5 Computation

\begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+0.65}{1-0.65}\bigg)\\ &=0.5\times \log_e\big(4.7143\big)\\ &=0.5\times 1.5506\\ &= 0.7753 \end{aligned} and \begin{aligned} \xi&=\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+0.6}{1-0.6}\bigg)\\ &=0.5\times \log_e\big(4\big)\\ &=0.5\times 1.3863\\ &= 0.6931 \end{aligned} The test statistic under the null hypothesis is \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}}\\ &=\dfrac{0.7753-0.6931}{\sqrt{\frac{1}{100-3}}}\\ &=\dfrac{0.0822}{\sqrt{\frac{1}{97}}}\\ &=0.8091 \end{aligned}
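The same arithmetic for Example 2, sketched in Python. Here `math.atanh` replaces the explicit log formula, since $\tanh^{-1}(r) = \frac{1}{2}\log_e\frac{1+r}{1-r}$ for $|r| < 1$; the variable names are illustrative.

```python
import math

r, rho0, n = 0.65, 0.6, 100  # sample correlation, hypothesised value, sample size

u = math.atanh(r)     # equals 0.5 * ln((1 + r) / (1 - r))
xi = math.atanh(rho0)
# Dividing by sqrt(1 / (n - 3)) is the same as multiplying by sqrt(n - 3)
z_obs = (u - xi) * math.sqrt(n - 3)

print(round(u, 4), round(xi, 4), round(z_obs, 4))  # 0.7753 0.6931 0.8091
```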

Since the test statistic $Z_{obs} = 0.809$ falls outside the critical region, we fail to reject the null hypothesis at the $\alpha = 0.05$ level of significance.

OR

Step 6 Decision ($p$-value Approach)

This is a right-tailed test, so the p-value is the area to the right of the test statistic $Z_{obs} = 0.809$: p-value $= P(Z > 0.809) = 0.2092$.

Since the p-value $0.2092$ is greater than the significance level $\alpha = 0.05$, we fail to reject the null hypothesis at the $\alpha = 0.05$ level of significance.
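For the right-tailed test, the p-value is the upper-tail area. A sketch using the observed statistic $0.8091$ from Step 5 of Example 2 and the same standard normal approximation.

```python
from statistics import NormalDist

alpha = 0.05
z_obs = 0.8091  # observed statistic from Step 5 of Example 2

# Right-tailed test: p-value = P(Z > z_obs) under N(0, 1)
p_value = 1 - NormalDist().cdf(z_obs)
print(round(p_value, 4))  # 0.2092
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```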

Interpretation

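There is insufficient evidence at the $0.05$ level of significance to conclude that the population correlation between scores on the new aptitude test and scores on the final test exceeds $0.6$, so this result does not show that the new test is better.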