Testing Correlation Coefficient Examples

Feb 20, 2024 by Dr. Raju Chaudhari

Testing Significance of Linear Relationship

A test of significance for a linear relationship between the variables $x$ and $y$ can be performed using the sample correlation coefficient $r_{xy}$.

Example 1

For a sample of eight bears, researchers measured the distances around the bears’ chests and weighed the bears. The sample correlation coefficient between chest size and weight is $r=0.744$ for $n=8$ bears. Using $\alpha=0.05$, determine whether there is a positive linear correlation between chest size and weight.

Solution

Given: $n = 8$ pairs of observations and a sample correlation coefficient of $r = 0.744$.

Step 1 Hypothesis Testing Problem

The hypothesis testing problem is $H_0 : \rho = 0$ against $H_1 : \rho > 0$ ($\text{right-tailed}$)

Step 2 Test Statistic

The test statistic is $$ \begin{aligned} t& =\frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \end{aligned} $$ which, under $H_0$, follows a $t$ distribution with $n-2$ degrees of freedom.

Step 3 Significance Level

The significance level is $\alpha = 0.05$.

Step 4 Critical Value(s)

As the alternative hypothesis is right-tailed, the critical value of $t$ with $n-2=6$ degrees of freedom is $1.943$.

The rejection region (i.e. critical region) is $t > 1.943$.

Step 5 Computation

The test statistic under the null hypothesis is $$ \begin{aligned} t&=\frac{r}{\sqrt{1-r^2}}\sqrt{n-2}\\ &= \frac{0.744}{\sqrt{1-0.744^2}}\sqrt{8 -2}\\ &= 2.727 \end{aligned} $$
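As a quick check of the arithmetic, the computation can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `t_statistic` is chosen here for illustration and does not come from any package.

```python
import math

def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), the statistic for testing H0: rho = 0."""
    if n <= 2:
        raise ValueError("need at least 3 pairs so that n - 2 > 0")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Example 1: r = 0.744 from n = 8 bears
t = t_statistic(0.744, 8)
print(round(t, 3))  # 2.727
```

The guard clauses reflect the formula's domain: the statistic is undefined for $n \le 2$ and for $|r| = 1$.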

Step 6 Decision (Traditional Approach)

The test statistic $t = 2.727$ falls inside the critical region, so we reject the null hypothesis.
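Equivalently, the rejection rule can be stated on the scale of $r$ itself. Solving $t = r\sqrt{n-2}/\sqrt{1-r^2}$ for $r$ (a side calculation, not part of the original solution) gives the critical correlation $$ \begin{aligned} r_c &= \frac{t_c}{\sqrt{t_c^2+n-2}}\\ &= \frac{1.943}{\sqrt{1.943^2+6}}\\ &= \frac{1.943}{\sqrt{9.7752}}\\ &\approx 0.621, \end{aligned} $$ and the observed $r = 0.744$ exceeds $0.621$, which leads to the same decision.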

OR

Step 6 Decision ($p$-value Approach)

This is a right-tailed test, so the p-value is the area to the right of the test statistic ($t=2.727$) under the $t$ distribution with $6$ degrees of freedom, which is $0.0172$.

The p-value $0.0172$ is less than the significance level $\alpha = 0.05$, so we reject the null hypothesis.
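The p-value can also be approximated without a $t$ table. The sketch below integrates the Student's $t$ density numerically with Simpson's rule; the helper names `t_pdf` and `right_tail_area` and the step count are illustrative choices, and in practice a statistics library would normally be used instead.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def right_tail_area(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t), from Simpson's rule on [0, |t|] plus symmetry about zero.

    `steps` must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3      # P(0 < T < |t|)
    upper = 0.5 - central        # P(T > |t|)
    return upper if t >= 0 else 1.0 - upper

# Example 1: t = 2.727 with n - 2 = 6 degrees of freedom
p = right_tail_area(2.727, 6)
print(round(p, 4))  # 0.0172
```

For a left-tailed test the p-value is `1 - right_tail_area(t, df)`, and for a two-tailed test it is `2 * right_tail_area(abs(t), df)`.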

Interpretation

There is sufficient evidence to conclude that there is a significant positive linear relationship between chest size and weight of bears.

Example 2

The following data give the demand for and the price of a commodity over 8 periods.

Demand	16	20	18	21	13	15	17	22
Price	10	8	12	6	13	9	11	7

A linear regression of demand on price was to be estimated.

Test whether there is a significant negative linear relationship between the price of and the demand for the commodity. Use $\alpha=0.05$.

Solution

Let $x$ denote the price of a commodity and $y$ denote the demand of a commodity.

The number of pairs $n= 8$.

	$x$	$y$	$x^2$	$y^2$	$xy$
1	10	16	100	256	160
2	8	20	64	400	160
3	12	18	144	324	216
4	6	21	36	441	126
5	13	13	169	169	169
6	9	15	81	225	135
7	11	17	121	289	187
8	7	22	49	484	154
Total	76	142	764	2588	1307

The sample variance of $x$ is

$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(764-\frac{(76)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(764-\frac{5776}{8}\bigg)\\ &= \frac{1}{7}\bigg(764-722\bigg)\\ &= \frac{42}{7}\\ &= 6. \end{aligned} $$

The sample variance of $y$ is

$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(2588-\frac{(142)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(2588-\frac{20164}{8}\bigg)\\ &= \frac{1}{7}\bigg(2588-2520.5\bigg)\\ &= \frac{67.5}{7}\\ &= 9.6429. \end{aligned} $$

The sample covariance between $x$ and $y$ is

$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(1307-\frac{(76)(142)}{8}\bigg)\\ &= \frac{1}{7}\bigg(1307-\frac{10792}{8}\bigg)\\ &= \frac{1}{7}\bigg(1307-1349\bigg)\\ &= \frac{-42}{7}\\ &= -6. \end{aligned} $$

Karl Pearson’s sample correlation coefficient between the price of and the demand for the commodity is

$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{-6}{\sqrt{6\times 9.6429}}\\ &=\frac{-6}{\sqrt{57.8574}}\\ &=-0.789. \end{aligned} $$

The correlation coefficient between the price of and the demand for the commodity is $-0.789$.
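The column totals and the shortcut formulas above can be verified directly from the raw data. The following is a minimal sketch in plain Python; the variable names are illustrative, and the formulas are the same computational forms used in the worked solution.

```python
import math

price = [10, 8, 12, 6, 13, 9, 11, 7]        # x
demand = [16, 20, 18, 21, 13, 15, 17, 22]   # y
n = len(price)

# Column totals of the working table
sum_x, sum_y = sum(price), sum(demand)
sum_x2 = sum(x * x for x in price)
sum_y2 = sum(y * y for y in demand)
sum_xy = sum(x * y for x, y in zip(price, demand))

# Shortcut (computational) forms of the sample variances and covariance
var_x = (sum_x2 - sum_x ** 2 / n) / (n - 1)
var_y = (sum_y2 - sum_y ** 2 / n) / (n - 1)
cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

r = cov_xy / math.sqrt(var_x * var_y)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)   # 76 142 764 2588 1307
print(round(var_x, 4), round(var_y, 4), round(cov_xy, 4), round(r, 3))
# 6.0 9.6429 -6.0 -0.789
```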

Step 1 Hypothesis Testing Problem

The hypothesis testing problem is $H_0 : \rho = 0$ against $H_1 : \rho < 0$ ($\text{left-tailed}$)

Step 2 Test Statistic

The test statistic is $$ \begin{aligned} t& =\frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \end{aligned} $$ which, under $H_0$, follows a $t$ distribution with $n-2$ degrees of freedom.

Step 3 Significance Level

The significance level is $\alpha = 0.05$.

Step 4 Critical Value(s)

As the alternative hypothesis is left-tailed, the critical value of $t$ with $n-2=6$ degrees of freedom is $-1.943$.

The rejection region (i.e. critical region) is $t < -1.943$.

Step 5 Computation

The test statistic under the null hypothesis is $$ \begin{aligned} t&=\frac{r}{\sqrt{1-r^2}}\sqrt{n-2}\\ &= \frac{-0.789}{\sqrt{1-(-0.789)^2}}\sqrt{8 -2}\\ &= -3.146 \end{aligned} $$

Step 6 Decision (Traditional Approach)

The test statistic $t = -3.146$ falls inside the critical region, so we reject the null hypothesis.

OR

Step 6 Decision ($p$-value Approach)

This is a left-tailed test, so the p-value is the area to the left of the test statistic ($t=-3.146$) under the $t$ distribution with $6$ degrees of freedom, which is approximately $0.01$.

The p-value of approximately $0.01$ is less than the significance level $\alpha = 0.05$, so we reject the null hypothesis.

Interpretation

There is sufficient evidence to conclude that there is a significant negative linear relationship between the price of and the demand for the commodity.

Example 3

The following data give the exam scores of 11 randomly selected students and the number of hours they studied for the exam.

Hours studied	4	5	6	9	10	8	10	7	3	8	5
Exam score	68	65	85	84	62	86	83	76	67	74	69

Test whether there is a significant correlation between hours studied and examination score. Use $\alpha=0.01$.

Solution

Let $x$ denote the hours studied and $y$ denote the exam score.

The number of pairs $n= 11$.

	$x$	$y$	$x^2$	$y^2$	$xy$
1	4	68	16	4624	272
2	5	65	25	4225	325
3	6	85	36	7225	510
4	9	84	81	7056	756
5	10	62	100	3844	620
6	8	86	64	7396	688
7	10	83	100	6889	830
8	7	76	49	5776	532
9	3	67	9	4489	201
10	8	74	64	5476	592
11	5	69	25	4761	345
Total	75	819	569	61761	5671

The sample variance of $x$ is

$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{11-1}\bigg(569-\frac{(75)^2}{11}\bigg)\\ &= \frac{1}{10}\bigg(569-\frac{5625}{11}\bigg)\\ &= \frac{1}{10}\bigg(569-511.3636\bigg)\\ &= \frac{57.6364}{10}\\ &= 5.7636. \end{aligned} $$

The sample variance of $y$ is

$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{11-1}\bigg(61761-\frac{(819)^2}{11}\bigg)\\ &= \frac{1}{10}\bigg(61761-\frac{670761}{11}\bigg)\\ &= \frac{1}{10}\bigg(61761-60978.2727\bigg)\\ &= \frac{782.7273}{10}\\ &= 78.2727. \end{aligned} $$

The sample covariance between $x$ and $y$ is

$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{11-1}\bigg(5671-\frac{(75)(819)}{11}\bigg)\\ &= \frac{1}{10}\bigg(5671-\frac{61425}{11}\bigg)\\ &= \frac{1}{10}\bigg(5671-5584.0909\bigg)\\ &= \frac{86.9091}{10}\\ &= 8.6909. \end{aligned} $$

Karl Pearson’s sample correlation coefficient between hours studied and exam score is

$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{8.6909}{\sqrt{5.7636\times 78.2727}}\\ &=\frac{8.6909}{\sqrt{451.1325}}\\ &=0.409. \end{aligned} $$

The sample correlation coefficient between hours studied and exam score is $0.409$.
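As a cross-check on the shortcut formulas, the same coefficient can be obtained from the definitional (mean-centred) sums of squares and products. The sketch below uses the 11 pairs listed in the working table; the names `sxx`, `syy` and `sxy` are illustrative, and the common divisor $n-1$ is omitted because it cancels in the ratio.

```python
import math

hours = [4, 5, 6, 9, 10, 8, 10, 7, 3, 8, 5]            # x, rows 1-11 of the table
score = [68, 65, 85, 84, 62, 86, 83, 76, 67, 74, 69]   # y, rows 1-11 of the table
n = len(hours)

mean_x = sum(hours) / n
mean_y = sum(score) / n

# Mean-centred sums of squares and cross-products
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in score)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, score))

# The (n - 1) divisors of the variances and covariance cancel here
r = sxy / math.sqrt(sxx * syy)

print(round(sxx, 4), round(syy, 4), round(sxy, 4))  # 57.6364 782.7273 86.9091
print(round(r, 3))                                  # 0.409
```

The three sums match the numerators $57.6364$, $782.7273$ and $86.9091$ obtained in the shortcut computations.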

Step 1 Hypothesis Testing Problem

The hypothesis testing problem is $H_0 : \rho = 0$ against $H_1 : \rho \neq 0$ ($\text{two-tailed}$)

Step 2 Test Statistic

The test statistic is $$ \begin{aligned} t& =\frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \end{aligned} $$ which, under $H_0$, follows a $t$ distribution with $n-2$ degrees of freedom.

Step 3 Significance Level

The significance level is $\alpha = 0.01$.

Step 4 Critical Value(s)

As the alternative hypothesis is two-tailed, the critical values of $t$ with $n-2=9$ degrees of freedom are $-3.25$ and $3.25$.

The rejection region (i.e. critical region) is $t < -3.25$ or $t > 3.25$.
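For a two-tailed test, $\alpha$ is split equally between the two tails, so the critical value is the point with upper-tail area $\alpha/2$. The sketch below recovers it by bisection on a numerically integrated $t$ density; it is an illustrative standard-library approximation with hypothetical helper names, not a replacement for a $t$ table or a statistics package.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_value(tail_area: float, df: int) -> float:
    """Find c >= 0 with P(T > c) = tail_area, assuming 0 < tail_area < 0.5."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > tail_area:
            lo = mid    # tail still too large: move right
        else:
            hi = mid    # tail too small: move left
    return (lo + hi) / 2

# Example 3: alpha = 0.01, two-tailed, n = 11 pairs
alpha, n = 0.01, 11
c = critical_value(alpha / 2, n - 2)
print(round(c, 2))  # 3.25
```

Passing the full $\alpha$ instead of $\alpha/2$ gives the one-tailed critical value, for example `critical_value(0.05, 6)` for the right-tailed test with 6 degrees of freedom.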

Step 5 Computation

The test statistic under the null hypothesis is $$ \begin{aligned} t&=\frac{r}{\sqrt{1-r^2}}\sqrt{n-2}\\ &= \frac{0.409}{\sqrt{1-0.409^2}}\sqrt{11 -2}\\ &= 1.345 \end{aligned} $$

Step 6 Decision (Traditional Approach)

The test statistic $t = 1.345$ falls outside the critical region, so we fail to reject the null hypothesis.

OR

Step 6 Decision ($p$-value Approach)

This is a two-tailed test, so the p-value is twice the area to the right of the test statistic ($t=1.345$) under the $t$ distribution with $9$ degrees of freedom, which is $0.2117$.

The p-value $0.2117$ is greater than the significance level $\alpha = 0.01$, so we fail to reject the null hypothesis.

Interpretation

At the $1\%$ level of significance, there is insufficient evidence to conclude that hours studied and examination score are linearly related: the sample correlation coefficient between $x$ and $y$ is not significantly different from zero.

