Testing Correlation Coefficient Part 2 examples

Feb 20, 2024 by Dr. Raju Chaudhari

Testing Correlation Coefficient

In this tutorial we work through step-by-step solutions of numerical problems on testing whether the population correlation coefficient equals a specified value $\rho_0$.

Example 1

The medical records show that the correlation between the age of the mother and the birth weight of her first child is less than -0.34. The ages of a random sample of 8 mothers and the birth weights of their first children are as follows:

Age of mother	35	24	28	29	26	30	34	32
Birth weight of child	2.85	3.50	3.25	3.00	3.25	2.75	2.90	3.00

Test, at the 5% level of significance, whether the medical records provide true information.

Solution

Let $x$ denote the age of the mother and $y$ denote the birth weight of the first child.

The number of pairs $n= 8$.

	$x$	$y$	$x^2$	$y^2$	$xy$
1	35	2.85	1225	8.123	99.75
2	24	3.50	576	12.250	84.00
3	28	3.25	784	10.562	91.00
4	29	3.00	841	9.000	87.00
5	26	3.25	676	10.562	84.50
6	30	2.75	900	7.562	82.50
7	34	2.90	1156	8.410	98.60
8	32	3.00	1024	9.000	96.00
Total	238	24.50	7182	75.470	723.35

The sample variance of $x$ is

$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(7182-\frac{(238)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(7182-\frac{56644}{8}\bigg)\\ &= \frac{1}{7}\bigg(7182-7080.5\bigg)\\ &= \frac{101.5}{7}\\ &= 14.5. \end{aligned} $$ The sample variance of $y$ is

$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(75.47-\frac{(24.5)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(75.47-\frac{600.25}{8}\bigg)\\ &= \frac{1}{7}\bigg(75.47-75.0312\bigg)\\ &= \frac{0.4387}{7}\\ &= 0.0627. \end{aligned} $$

The sample covariance between $x$ and $y$ is

$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(723.35-\frac{(238)(24.5)}{8}\bigg)\\ &= \frac{1}{7}\bigg(723.35-\frac{5831}{8}\bigg)\\ &= \frac{1}{7}\bigg(723.35-728.875\bigg)\\ &= \frac{-5.525}{7}\\ &= -0.7893. \end{aligned} $$ Karl Pearson’s sample correlation coefficient between the age of the mother and the birth weight of the first child is

$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{-0.7893}{\sqrt{14.5\times 0.0627}}\\ &=\frac{-0.7893}{\sqrt{0.9092}}\\ &=-0.828. \end{aligned} $$ The correlation coefficient between age of mother and birth weight of first child is $-0.828$.
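The hand computation above can be cross-checked with a short script. This is a minimal sketch using only the Python standard library; the helper name `sample_correlation` is an illustrative choice, not something defined in the tutorial.

```python
import math

# Data from Example 1: mother's age (x) and birth weight of first child (y).
ages = [35, 24, 28, 29, 26, 30, 34, 32]
weights = [2.85, 3.50, 3.25, 3.00, 3.25, 2.75, 2.90, 3.00]


def sample_correlation(x, y):
    """Pearson correlation via the same sum-based formulas used in the text."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Sample variances and covariance with the (n - 1) divisor.
    var_x = (sum(v * v for v in x) - sum_x ** 2 / n) / (n - 1)
    var_y = (sum(v * v for v in y) - sum_y ** 2 / n) / (n - 1)
    cov_xy = (sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n) / (n - 1)
    return cov_xy / math.sqrt(var_x * var_y)


r = sample_correlation(ages, weights)
print(round(r, 3))  # -0.828
```

Because the $(n-1)$ divisors cancel in the ratio, the same value of $r$ results if they are omitted.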

Step 1 Hypothesis Testing Problem

The hypothesis testing problem is $H_0 : \rho = -0.34$ against $H_1 : \rho < -0.34$ (left-tailed).

Step 2 Test Statistic

The test statistic for testing the above hypothesis is $$ \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}} \end{aligned} $$ where $$ \begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg) \end{aligned} $$ and $$ \begin{aligned} \xi & =\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg) \end{aligned} $$ Under the null hypothesis, the test statistic $Z$ approximately follows the $N(0,1)$ distribution.
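The statistic can be wrapped in a small reusable function. The sketch below relies on the identity $\frac{1}{2}\log_e\big(\frac{1+r}{1-r}\big) = \tanh^{-1}(r)$, so `math.atanh` gives $U$ and $\xi$ directly. The function name `fisher_z_statistic` and the input values in the usage line are illustrative assumptions, not part of the examples in this tutorial.

```python
import math


def fisher_z_statistic(r, rho0, n):
    """Z = (U - xi) / sqrt(1 / (n - 3)), with U = atanh(r) and xi = atanh(rho0)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    u = math.atanh(r)      # transformed sample correlation
    xi = math.atanh(rho0)  # transformed hypothesised correlation
    return (u - xi) * math.sqrt(n - 3)


# Illustrative inputs only: r = 0.5, rho0 = 0.3, n = 28 pairs.
print(fisher_z_statistic(0.5, 0.3, 28))
```

Dividing by $\sqrt{1/(n-3)}$ is the same as multiplying by $\sqrt{n-3}$, which is the form used in the code.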

Step 3 Significance Level

The significance level is $\alpha = 0.05$.

Step 4 Critical Value(s)

As the alternative hypothesis is left-tailed, the critical value of $Z$ is $-1.64$ (from the standard normal table).
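For readers without a table at hand, the critical value can also be obtained numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8). It returns the quantile to more decimals than the table value quoted above, so the result is roughly $\mp 1.645$ rather than $\mp 1.64$.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Left-tailed test: reject when Z falls below the alpha quantile.
z_left = std_normal.inv_cdf(alpha)
# Right-tailed test: reject when Z exceeds the (1 - alpha) quantile.
z_right = std_normal.inv_cdf(1 - alpha)

print(round(z_left, 4), round(z_right, 4))
```

The small difference between $1.64$ and $1.645$ does not change the decision in either example of this tutorial.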

The rejection region (i.e. critical region) is $Z < -1.64$.

Step 5 Computation

$$ \begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+(-0.828)}{1-(-0.828)}\bigg)\\ &=0.5\times \log_e\big(0.0941\big)\\ &=0.5\times -2.3635\\ &= -1.1817 \end{aligned} $$ and $$ \begin{aligned} \xi&=\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+(-0.34)}{1-(-0.34)}\bigg)\\ &=0.5\times \log_e\big(0.4925\big)\\ &=0.5\times -0.7082\\ &= -0.3541 \end{aligned} $$ The test statistic under the null hypothesis is $$ \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}}\\ &=\dfrac{-1.1817-(-0.3541)}{\sqrt{\frac{1}{8-3}}}\\ &=\dfrac{-0.8276}{\sqrt{\frac{1}{5}}}\\ &=-1.8507 \end{aligned} $$
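The arithmetic in this step can be reproduced line by line. The sketch below takes the rounded value $r = -0.828$ used in the computation above, together with $\rho_0 = -0.34$ and $n = 8$; the variable names are illustrative.

```python
import math

r, rho0, n = -0.828, -0.34, 8

# Fisher transformation of the sample and hypothesised correlations.
u = 0.5 * math.log((1 + r) / (1 - r))
xi = 0.5 * math.log((1 + rho0) / (1 - rho0))

# Standardise using the standard error sqrt(1 / (n - 3)).
z = (u - xi) / math.sqrt(1 / (n - 3))

print(round(u, 4), round(xi, 4), round(z, 4))
```

The printed values should agree with $U$, $\xi$ and $Z$ above up to rounding in the last decimal place.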

Step 6 Decision (Traditional Approach)

The test statistic is $Z_{obs} = -1.851$, which falls inside the critical region, so we reject the null hypothesis at the $\alpha = 0.05$ level of significance.

OR

Step 6 Decision ($p$-value Approach)

This is a left-tailed test, so the p-value is the area under the standard normal curve to the left of the test statistic ($Z_{obs}=-1.851$), which gives p-value = $0.0321$.

The p-value is $0.0321$, which is less than the significance level $\alpha = 0.05$, so we reject the null hypothesis at the $\alpha = 0.05$ level of significance.
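The left-tail area can be checked without a table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the observed statistic is typed in from Step 5 and the variable names are illustrative.

```python
from statistics import NormalDist

z_obs = -1.8507  # test statistic from Step 5
alpha = 0.05

# Left-tailed test: p-value is P(Z <= z_obs) under N(0, 1).
p_value = NormalDist().cdf(z_obs)
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)
```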

Interpretation

At the $0.05$ level of significance, there is enough evidence to conclude that the population correlation is less than $-0.34$, that is, the medical records provide true information.

Example 2

The correlation between scores on a traditional aptitude test and scores on a final test is known to be approximately 0.6. A new aptitude test has been developed and is tried on a random sample of 100 students, resulting in a correlation of 0.65. Does this result imply that the new test is better?

Solution

We are given that the sample correlation between $X$ and $Y$ is $r = 0.65$ for a sample of $n = 100$ pairs of observations.

Step 1 Hypothesis Testing Problem

The hypothesis testing problem is $H_0 : \rho = 0.6$ against $H_1 : \rho > 0.6$ (right-tailed).

Step 2 Test Statistic

The test statistic for testing the above hypothesis is $$ \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}} \end{aligned} $$ where $$ \begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg) \end{aligned} $$ and $$ \begin{aligned} \xi & =\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg) \end{aligned} $$ Under the null hypothesis, the test statistic $Z$ approximately follows the $N(0,1)$ distribution.

Step 3 Significance Level

The problem does not state a significance level, so we use $\alpha = 0.05$.

Step 4 Critical Value(s)

As the alternative hypothesis is right-tailed, the critical value of $Z$ is $1.64$ (from the standard normal table).

The rejection region (i.e. critical region) is $Z > 1.64$.

Step 5 Computation

$$ \begin{aligned} U&=\frac{1}{2}\log_e \bigg(\frac{1+r}{1-r}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+0.65}{1-0.65}\bigg)\\ &=0.5\times \log_e\big(4.7143\big)\\ &=0.5\times 1.5506\\ &= 0.7753 \end{aligned} $$ and $$ \begin{aligned} \xi&=\frac{1}{2}\log_e \bigg(\frac{1+\rho_0}{1-\rho_0}\bigg)\\ &=0.5\times \log_e\bigg(\frac{1+0.6}{1-0.6}\bigg)\\ &=0.5\times \log_e\big(4\big)\\ &=0.5\times 1.3863\\ &= 0.6931 \end{aligned} $$ The test statistic under the null hypothesis is $$ \begin{aligned} Z&=\dfrac{U-\xi}{\sqrt{\frac{1}{n-3}}}\\ &=\dfrac{0.7753-0.6931}{\sqrt{\frac{1}{100-3}}}\\ &=\dfrac{0.0822}{\sqrt{\frac{1}{97}}}\\ &=0.8091 \end{aligned} $$

Step 6 Decision (Traditional Approach)

The test statistic is $Z_{obs} = 0.809$, which falls outside the critical region, so we fail to reject the null hypothesis at the $\alpha = 0.05$ level of significance.

OR

Step 6 Decision ($p$-value Approach)

This is a right-tailed test, so the p-value is the area under the standard normal curve to the right of the test statistic ($Z_{obs}=0.809$), which gives p-value = $0.2092$.

The p-value is $0.2092$, which is greater than the significance level $\alpha = 0.05$, so we fail to reject the null hypothesis at the $\alpha = 0.05$ level of significance.
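The whole right-tailed test for this example fits in a few lines. The following is a sketch under the inputs used in the solution ($r = 0.65$, $\rho_0 = 0.6$, $n = 100$, $\alpha = 0.05$), using only the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

r, rho0, n, alpha = 0.65, 0.6, 100, 0.05

# Fisher-transformed difference, scaled by sqrt(n - 3).
z_obs = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)

# Right-tailed test: p-value is the area to the right of z_obs.
p_value = 1 - NormalDist().cdf(z_obs)
critical_value = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z_obs > critical_value
print(round(z_obs, 4), round(p_value, 4), reject_h0)
```

Both decision rules agree here: the statistic is below the critical value and the p-value exceeds $\alpha$, so the null hypothesis is not rejected.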

Interpretation

At the $0.05$ level of significance, there is insufficient evidence to conclude that the new test is better.
