Testing Correlation Coefficient
In this tutorial we work step by step through numerical problems on testing whether the population correlation coefficient equals a specified value $\rho_0$.
Example 1
The medical records show that the correlation between the age of the mother and the birth weight of her first child is less than -0.34. A random sample of 8 mothers gives the following ages and birth weights of the first child:
Age of mother | 35 | 24 | 28 | 29 | 26 | 30 | 34 | 32 |
---|---|---|---|---|---|---|---|---|
Birth weight of child | 2.85 | 3.50 | 3.25 | 3.00 | 3.25 | 2.75 | 2.90 | 3.00 |
Test, at the 5% level of significance, whether the medical records provide true information.
Solution
Let $x$ denote the age of mother and $y$ denote the birth weight of first child.
The number of pairs $n= 8$.
Pair | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 35 | 2.85 | 1225 | 8.123 | 99.75 |
2 | 24 | 3.50 | 576 | 12.250 | 84.00 |
3 | 28 | 3.25 | 784 | 10.562 | 91.00 |
4 | 29 | 3.00 | 841 | 9.000 | 87.00 |
5 | 26 | 3.25 | 676 | 10.562 | 84.50 |
6 | 30 | 2.75 | 900 | 7.562 | 82.50 |
7 | 34 | 2.90 | 1156 | 8.410 | 98.60 |
8 | 32 | 3.00 | 1024 | 9.000 | 96.00 |
Total | 238 | 24.50 | 7182 | 75.470 | 723.35 |
The sample variance of $x$ is
$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(7182-\frac{(238)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(7182-\frac{56644}{8}\bigg)\\ &= \frac{1}{7}\bigg(7182-7080.5\bigg)\\ &= \frac{101.5}{7}\\ &= 14.5. \end{aligned} $$
The sample variance of $y$ is
$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(75.47-\frac{(24.5)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(75.47-\frac{600.25}{8}\bigg)\\ &= \frac{1}{7}\bigg(75.47-75.0312\bigg)\\ &= \frac{0.4387}{7}\\ &= 0.0627. \end{aligned} $$
The sample covariance between $x$ and $y$ is
$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(723.35-\frac{(238)(24.5)}{8}\bigg)\\ &= \frac{1}{7}\bigg(723.35-\frac{5831}{8}\bigg)\\ &= \frac{1}{7}\bigg(723.35-728.875\bigg)\\ &= \frac{-5.525}{7}\\ &= -0.7893. \end{aligned} $$
Karl Pearson's sample correlation coefficient between the age of the mother and the birth weight of the first child is
$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{-0.7893}{\sqrt{14.5\times 0.0627}}\\ &=\frac{-0.7893}{\sqrt{0.9092}}\\ &=-0.828. \end{aligned} $$
The correlation coefficient between age of mother and birth weight of first child is $-0.828$.
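The hand computation above can be cross-checked with a short script. This is a minimal sketch using only the Python standard library; the variable names (`var_x`, `cov_xy`, and so on) are illustrative choices, and the formulas are the same computational formulas used in the solution.

```python
import math

# Data from Example 1: mother's age (x) and birth weight of first child (y).
x = [35, 24, 28, 29, 26, 30, 34, 32]
y = [2.85, 3.50, 3.25, 3.00, 3.25, 2.75, 2.90, 3.00]
n = len(x)

# Column totals, as in the last row of the table.
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Sample variances and covariance with divisor n - 1.
var_x = (sum_x2 - sum_x ** 2 / n) / (n - 1)
var_y = (sum_y2 - sum_y ** 2 / n) / (n - 1)
cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

# Pearson's sample correlation coefficient.
r = cov_xy / math.sqrt(var_x * var_y)

print(f"s_x^2 = {var_x:.4f}, s_y^2 = {var_y:.4f}, s_xy = {cov_xy:.4f}, r = {r:.3f}")
```

The printed values should agree with the hand-computed ones up to rounding.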
Step 1 Hypothesis Testing Problem
The hypothesis testing problem is $H_0 : \rho = -0.34$ against $H_1 : \rho < -0.34$ (left-tailed).
Step 2 Test Statistic
The test statistic for this hypothesis testing problem is
$$ \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}} \end{aligned} $$
where
$$ \begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg) \end{aligned} $$
and
$$ \begin{aligned} \xi & =\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg) \end{aligned} $$
Under the null hypothesis, the test statistic $Z$ approximately follows the $N(0,1)$ distribution.
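The statistic defined in this step can be written as a small helper. This is a sketch, not a library routine: the function name `fisher_z_statistic` is hypothetical, and it relies on the identity $\frac{1}{2}\log_e\big(\frac{1+r}{1-r}\big) = \tanh^{-1}(r)$, which is available in Python as `math.atanh`.

```python
import math

def fisher_z_statistic(r, rho0, n):
    """Return Z = (U - xi) / sqrt(1 / (n - 3)) for sample correlation r,
    hypothesised correlation rho0, and n pairs of observations."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    u = math.atanh(r)      # U  = 0.5 * log((1 + r) / (1 - r))
    xi = math.atanh(rho0)  # xi = 0.5 * log((1 + rho0) / (1 - rho0))
    return (u - xi) * math.sqrt(n - 3)

# Usage with the values of this example: r = -0.828, rho0 = -0.34, n = 8.
z = fisher_z_statistic(-0.828, -0.34, 8)
print(f"Z = {z:.4f}")
```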
Step 3 Significance Level
The significance level is $\alpha = 0.05$.
Step 4 Critical Value(s)
As the alternative hypothesis is left-tailed, the critical value of $Z$ is $-1.64$ (from the standard normal table).
The rejection region (i.e. critical region) is $Z < -1.64$.
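Instead of reading the critical value from a table, it can be obtained from the inverse of the standard normal distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. The result is the table value $-1.64$ carried to more decimal places.

```python
from statistics import NormalDist

alpha = 0.05

# Left-tailed test: the critical value is the alpha-quantile of N(0, 1).
z_crit = NormalDist().inv_cdf(alpha)

print(f"Reject H0 if Z < {z_crit:.4f}")
```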
Step 5 Computation
$$ \begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+(-0.828)}{1-(-0.828)}\bigg)\\ &=0.5\times \log_e\big(0.0941\big)\\ &=0.5\times -2.3635\\ &= -1.1817 \end{aligned} $$
and
$$ \begin{aligned} \xi&=\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+(-0.34)}{1-(-0.34)}\bigg)\\ &=0.5\times \log_e\big(0.4925\big)\\ &=0.5\times -0.7082\\ &= -0.3541 \end{aligned} $$
The test statistic under the null hypothesis is
$$ \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}}\\ &=\dfrac{-1.1817-(-0.3541)}{\sqrt{\frac{1}{8-3}}}\\ &=\dfrac{-0.8276}{\sqrt{\frac{1}{5}}}\\ &=-1.8507 \end{aligned} $$
Step 6 Decision (Traditional Approach)
The test statistic is $Z_{obs} = -1.851$, which falls inside the critical region, so we reject the null hypothesis at the $\alpha = 0.05$ level of significance.
OR
Step 6 Decision ($p$-value Approach)
This is a left-tailed test, so the p-value is the area to the left of the test statistic ($Z_{obs} = -1.851$), which gives p-value = $0.0321$.
The p-value $0.0321$ is less than the significance level $\alpha = 0.05$, so we reject the null hypothesis at the $\alpha = 0.05$ level of significance.
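Steps 5 and 6 for this example can be reproduced in a few lines. This is a minimal sketch under the same assumptions as the hand computation (rounded $r = -0.828$, normal approximation); the variable names are illustrative.

```python
import math
from statistics import NormalDist

r, rho0, n = -0.828, -0.34, 8
alpha = 0.05

u = math.atanh(r)      # U  = 0.5 * log((1 + r) / (1 - r))
xi = math.atanh(rho0)  # xi = 0.5 * log((1 + rho0) / (1 - rho0))
z_obs = (u - xi) * math.sqrt(n - 3)

# Left-tailed test: the p-value is the area to the left of z_obs.
p_value = NormalDist().cdf(z_obs)
reject_h0 = p_value < alpha

print(f"U = {u:.4f}, xi = {xi:.4f}, Z = {z_obs:.4f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Both decision rules lead to the same conclusion here, as they must: the p-value is below $\alpha$ exactly when $Z_{obs}$ lies in the critical region.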
Interpretation
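At the $0.05$ level of significance, there is enough evidence to conclude that the correlation between the age of the mother and the birth weight of her first child is less than $-0.34$, i.e. that the medical records provide true information.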
Example 2
The correlation between scores on a traditional aptitude test and scores on a final test is known to be approximately 0.6. A new aptitude test has been developed and is tried on a random sample of 100 students, resulting in a correlation of 0.65. Does this result imply that the new test is better?
Solution
We are given that the sample correlation between $X$ and $Y$ is $r = 0.65$ for a sample of $n = 100$ pairs of observations.
Step 1 Hypothesis Testing Problem
The hypothesis testing problem is $H_0 : \rho = 0.6$ against $H_1 : \rho > 0.6$ (right-tailed).
Step 2 Test Statistic
The test statistic for this hypothesis testing problem is
$$ \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}} \end{aligned} $$
where
$$ \begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg) \end{aligned} $$
and
$$ \begin{aligned} \xi & =\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg) \end{aligned} $$
Under the null hypothesis, the test statistic $Z$ approximately follows the $N(0,1)$ distribution.
Step 3 Significance Level
The significance level is $\alpha = 0.05$.
Step 4 Critical Value(s)
As the alternative hypothesis is right-tailed, the critical value of $Z$ is $1.64$ (from the standard normal table).
The rejection region (i.e. critical region) is $Z > 1.64$.
Step 5 Computation
$$ \begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+0.65}{1-0.65}\bigg)\\ &=0.5\times \log_e\big(4.7143\big)\\ &=0.5\times 1.5506\\ &= 0.7753 \end{aligned} $$
and
$$ \begin{aligned} \xi&=\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+0.6}{1-0.6}\bigg)\\ &=0.5\times \log_e\big(4\big)\\ &=0.5\times 1.3863\\ &= 0.6931 \end{aligned} $$
The test statistic under the null hypothesis is
$$ \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}}\\ &=\dfrac{0.7753-0.6931}{\sqrt{\frac{1}{100-3}}}\\ &=\dfrac{0.0822}{\sqrt{\frac{1}{97}}}\\ &=0.8091 \end{aligned} $$
Step 6 Decision (Traditional Approach)
The test statistic is $Z_{obs} = 0.809$, which falls outside the critical region, so we fail to reject the null hypothesis at the $\alpha = 0.05$ level of significance.
OR
Step 6 Decision ($p$-value Approach)
This is a right-tailed test, so the p-value is the area to the right of the test statistic ($Z_{obs} = 0.809$), which gives p-value = $0.2092$.
The p-value $0.2092$ is greater than the significance level $\alpha = 0.05$, so we fail to reject the null hypothesis at the $\alpha = 0.05$ level of significance.
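The computation for this example can be checked the same way. The sketch below assumes the values given in the problem ($r = 0.65$, $\rho_0 = 0.6$, $n = 100$); because the test is right-tailed, the p-value is the upper-tail area. The variable names are illustrative.

```python
import math
from statistics import NormalDist

r, rho0, n = 0.65, 0.6, 100
alpha = 0.05

u = math.atanh(r)      # U  = 0.5 * log((1 + r) / (1 - r))
xi = math.atanh(rho0)  # xi = 0.5 * log((1 + rho0) / (1 - rho0))
z_obs = (u - xi) * math.sqrt(n - 3)

# Right-tailed test: the p-value is the area to the right of z_obs.
p_value = 1 - NormalDist().cdf(z_obs)
reject_h0 = p_value < alpha

print(f"U = {u:.4f}, xi = {xi:.4f}, Z = {z_obs:.4f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```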
Interpretation
There is insufficient evidence to conclude that the new test is better.