## Testing Significance of Linear Relationship

A test of significance for a linear relationship between the variables $x$ and $y$ can be performed using the sample correlation coefficient $r_{xy}$.

## Example 1

For a sample of eight bears, researchers measured the distances around the bears’ chests and weighed the bears. The sample correlation coefficient between chest size and weight is $r=0.744$ for $n=8$ bears. Using $\alpha=0.05$, determine whether there is a positive linear correlation between chest size and weight.

### Solution

Given $n = 8$ pairs of observations, the sample correlation coefficient is $r = 0.744$.

### Step 1 Hypothesis Testing Problem

The hypothesis testing problem is $H_0 : \rho = 0$ against $H_1 : \rho > 0$ ($\text{right-tailed}$)

### Step 2 Test Statistic

The test statistic is
`$$ \begin{aligned} t& =\frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \end{aligned} $$`

which follows a $t$ distribution with $n-2$ degrees of freedom when $H_0$ is true.

### Step 3 Significance Level

The significance level is $\alpha = 0.05$.

### Step 4 Critical Value(s)

As the alternative hypothesis is right-tailed, the critical value of $t$ with $n-2 = 6$ degrees of freedom is $1.943$.

The rejection region (i.e. critical region) is $t > 1.943$.

### Step 5 Computation

The test statistic under the null hypothesis is
`$$ \begin{aligned} t&=\frac{r}{\sqrt{1-r^2}}\sqrt{n-2}\\ &= \frac{0.744}{\sqrt{1-0.744^2}}\sqrt{8 -2}\\ &= 2.727 \end{aligned} $$`
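The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `t_statistic` is an illustrative choice and not part of the original example.

```python
import math

def t_statistic(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) for H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Example 1: r = 0.744 from n = 8 bears
t = t_statistic(0.744, 8)
print(round(t, 3))  # 2.727
```

The same function applies unchanged to a negative $r$, in which case the statistic is negative as well.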

### Step 6 Decision (Traditional Approach)

The test statistic is $t = 2.727$, which falls inside the critical region, so we reject the null hypothesis.

OR

### Step 6 Decision ($p$-value Approach)

This is a right-tailed test, so the p-value is the area to the right of the test statistic ($t=2.727$) under the $t$ distribution with $6$ degrees of freedom, which is $0.0172$.

The p-value of $0.0172$ is less than the significance level $\alpha = 0.05$, so we reject the null hypothesis.
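The p-value is usually obtained from software or a $t$ table. As a rough cross-check, the right-tail area can be approximated by numerically integrating the Student $t$ density. The sketch below uses only the Python standard library and Simpson's rule; the helper names `t_pdf` and `right_tail_p` are illustrative assumptions, and a statistics library's survival function would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def right_tail_p(t, df, steps=2000):
    """Approximate P(T > t) using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # P(0 < T < |t|)
    # The density is symmetric about 0, so half the mass lies on each side.
    return 0.5 - area if t >= 0 else 0.5 + area

# Example 1: right-tailed test with t = 2.727 and df = 8 - 2 = 6
p_value = right_tail_p(2.727, 6)
print(round(p_value, 3))
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

For a left-tailed test the p-value would be `1 - right_tail_p(t, df)`, since the two tail areas sum to one.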

### Interpretation

There is sufficient evidence to conclude that there is a significant positive linear relationship between chest size and weight of bears.

## Example 2

The following data give the demand and price of a commodity for 8 periods.

Demand | 16 | 20 | 18 | 21 | 13 | 15 | 17 | 22 |
---|---|---|---|---|---|---|---|---|
Price | 10 | 8 | 12 | 6 | 13 | 9 | 11 | 7 |

A linear regression of demand on price is to be estimated.

Test whether there is a significant negative linear relationship between the price and the demand of the commodity. Use $\alpha = 0.05$.

### Solution

Let $x$ denote the price of a commodity and $y$ denote the demand of a commodity.

The number of pairs is $n = 8$.

Pair | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 10 | 16 | 100 | 256 | 160 |
2 | 8 | 20 | 64 | 400 | 160 |
3 | 12 | 18 | 144 | 324 | 216 |
4 | 6 | 21 | 36 | 441 | 126 |
5 | 13 | 13 | 169 | 169 | 169 |
6 | 9 | 15 | 81 | 225 | 135 |
7 | 11 | 17 | 121 | 289 | 187 |
8 | 7 | 22 | 49 | 484 | 154 |
Total | 76 | 142 | 764 | 2588 | 1307 |

The sample variance of $x$ is

`$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(764-\frac{(76)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(764-\frac{5776}{8}\bigg)\\ &= \frac{1}{7}\bigg(764-722\bigg)\\ &= \frac{42}{7}\\ &= 6. \end{aligned} $$`

The sample variance of $y$ is

`$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(2588-\frac{(142)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(2588-\frac{20164}{8}\bigg)\\ &= \frac{1}{7}\bigg(2588-2520.5\bigg)\\ &= \frac{67.5}{7}\\ &= 9.6429. \end{aligned} $$`

The sample covariance between $x$ and $y$ is

`$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(1307-\frac{(76)(142)}{8}\bigg)\\ &= \frac{1}{7}\bigg(1307-\frac{10792}{8}\bigg)\\ &= \frac{1}{7}\bigg(1307-1349\bigg)\\ &= \frac{-42}{7}\\ &= -6. \end{aligned} $$`

Karl Pearson’s sample correlation coefficient between **price of a commodity** and **demand of a commodity** is

`$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{-6}{\sqrt{6\times 9.6429}}\\ &=\frac{-6}{\sqrt{57.8574}}\\ &=-0.789. \end{aligned} $$`

The correlation coefficient between **price of a commodity** and **demand of a commodity** is $-0.789$.
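The hand computation above can be checked with a short script. This is a minimal sketch in plain Python that mirrors the computational formulas used in the solution; the variable names are illustrative.

```python
import math

price = [10, 8, 12, 6, 13, 9, 11, 7]        # x
demand = [16, 20, 18, 21, 13, 15, 17, 22]   # y
n = len(price)

# Column totals from the working table
sum_x, sum_y = sum(price), sum(demand)
sum_x2 = sum(x * x for x in price)
sum_y2 = sum(y * y for y in demand)
sum_xy = sum(x * y for x, y in zip(price, demand))

# Sample variances and covariance (divisor n - 1)
var_x = (sum_x2 - sum_x ** 2 / n) / (n - 1)
var_y = (sum_y2 - sum_y ** 2 / n) / (n - 1)
cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

# Pearson's sample correlation coefficient
r = cov_xy / math.sqrt(var_x * var_y)

print(var_x, round(var_y, 4), cov_xy, round(r, 3))  # 6.0 9.6429 -6.0 -0.789
```

The divisor $n-1$ cancels in the ratio, so $r$ would be the same if the variances and covariance were computed with divisor $n$.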

### Step 1 Hypothesis Testing Problem

The hypothesis testing problem is $H_0 : \rho = 0$ against $H_1 : \rho < 0$ ($\text{left-tailed}$)

### Step 2 Test Statistic

The test statistic is
`$$ \begin{aligned} t& =\frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \end{aligned} $$`

which follows a $t$ distribution with $n-2$ degrees of freedom when $H_0$ is true.

### Step 3 Significance Level

The significance level is $\alpha = 0.05$.

### Step 4 Critical Value(s)

As the alternative hypothesis is left-tailed, the critical value of $t$ with $n-2 = 6$ degrees of freedom is $-1.943$.

The rejection region (i.e. critical region) is $t < -1.943$.

### Step 5 Computation

The test statistic under the null hypothesis is
`$$ \begin{aligned} t&=\frac{r}{\sqrt{1-r^2}}\sqrt{n-2}\\ &= \frac{-0.789}{\sqrt{1-(-0.789)^2}}\sqrt{8 -2}\\ &= -3.146 \end{aligned} $$`

### Step 6 Decision (Traditional Approach)

The test statistic is $t = -3.146$, which falls inside the critical region, so we reject the null hypothesis.

OR

### Step 6 Decision ($p$-value Approach)

This is a left-tailed test, so the p-value is the area to the left of the test statistic ($t=-3.146$) under the $t$ distribution with $6$ degrees of freedom, which is $0.01$.

The p-value of $0.01$ is less than the significance level $\alpha = 0.05$, so we reject the null hypothesis.

### Interpretation

There is sufficient evidence to conclude that there is a significant negative linear relationship between demand and price of a commodity.

## Example 3

The following data give the exam scores of 11 randomly selected students and the number of hours they studied for the exam.

Hours studied | 4 | 5 | 6 | 9 | 10 | 8 | 10 | 7 | 3 | 8 | 5 |
---|---|---|---|---|---|---|---|---|---|---|---|
Exam score | 68 | 65 | 85 | 84 | 62 | 86 | 83 | 76 | 67 | 74 | 69 |

Test whether there is a significant correlation between hours studied and examination score. Use $\alpha=0.01$.

### Solution

Let $x$ denote the hours studied and $y$ denote the exam score.

The number of pairs is $n = 11$.

Pair | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 4 | 68 | 16 | 4624 | 272 |
2 | 5 | 65 | 25 | 4225 | 325 |
3 | 6 | 85 | 36 | 7225 | 510 |
4 | 9 | 84 | 81 | 7056 | 756 |
5 | 10 | 62 | 100 | 3844 | 620 |
6 | 8 | 86 | 64 | 7396 | 688 |
7 | 10 | 83 | 100 | 6889 | 830 |
8 | 7 | 76 | 49 | 5776 | 532 |
9 | 3 | 67 | 9 | 4489 | 201 |
10 | 8 | 74 | 64 | 5476 | 592 |
11 | 5 | 69 | 25 | 4761 | 345 |
Total | 75 | 819 | 569 | 61761 | 5671 |

The sample variance of $x$ is

`$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{11-1}\bigg(569-\frac{(75)^2}{11}\bigg)\\ &= \frac{1}{10}\bigg(569-\frac{5625}{11}\bigg)\\ &= \frac{1}{10}\bigg(569-511.3636\bigg)\\ &= \frac{57.6364}{10}\\ &= 5.7636. \end{aligned} $$`

The sample variance of $y$ is

`$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{11-1}\bigg(61761-\frac{(819)^2}{11}\bigg)\\ &= \frac{1}{10}\bigg(61761-\frac{670761}{11}\bigg)\\ &= \frac{1}{10}\bigg(61761-60978.2727\bigg)\\ &= \frac{782.7273}{10}\\ &= 78.2727. \end{aligned} $$`

The sample covariance between $x$ and $y$ is

`$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{11-1}\bigg(5671-\frac{(75)(819)}{11}\bigg)\\ &= \frac{1}{10}\bigg(5671-\frac{61425}{11}\bigg)\\ &= \frac{1}{10}\bigg(5671-5584.0909\bigg)\\ &= \frac{86.9091}{10}\\ &= 8.6909. \end{aligned} $$`

Karl Pearson’s sample correlation coefficient between **hours studied** and **exam score** is

`$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{8.6909}{\sqrt{5.7636\times 78.2727}}\\ &=\frac{8.6909}{\sqrt{451.1325}}\\ &=0.409. \end{aligned} $$`

The sample correlation coefficient between **hours studied** and **exam score** is $0.409$.

### Step 1 Hypothesis Testing Problem

The hypothesis testing problem is $H_0 : \rho = 0$ against $H_1 : \rho \neq 0$ ($\text{two-tailed}$)

### Step 2 Test Statistic

The test statistic is
`$$ \begin{aligned} t& =\frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \end{aligned} $$`

which follows a $t$ distribution with $n-2$ degrees of freedom when $H_0$ is true.

### Step 3 Significance Level

The significance level is $\alpha = 0.01$.

### Step 4 Critical Value(s)

As the alternative hypothesis is two-tailed, the critical values of $t$ with $n-2 = 9$ degrees of freedom are $-3.25$ and $3.25$.

The rejection region (i.e. critical region) is $t < -3.25$ or $t > 3.25$.
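For a two-tailed test the significance level is split equally between the two tails, so each critical value cuts off an area of $\alpha/2 = 0.005$. The sketch below approximates that quantile with the Python standard library only: it integrates the Student $t$ density with Simpson's rule and locates the cut-off by bisection. The helper names (`t_pdf`, `upper_tail`, `critical_value`) are illustrative assumptions; a $t$ table or a statistics library's quantile function would normally be used.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 via Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_value(tail_area, df, lo=0.0, hi=50.0):
    """Positive value c with P(T > c) = tail_area, found by bisection."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > tail_area:
            lo = mid   # tail still too large: the cut-off lies further right
        else:
            hi = mid
    return (lo + hi) / 2

# Example 3: two-tailed test, alpha = 0.01, df = 11 - 2 = 9
alpha, df = 0.01, 9
c = critical_value(alpha / 2, df)
print(f"reject H0 if t < {-c:.2f} or t > {c:.2f}")
```

For a one-tailed test the full $\alpha$ goes into a single tail, so `critical_value(0.05, 6)` gives the cut-off used in the first two examples.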

### Step 5 Computation

The test statistic under the null hypothesis is
`$$ \begin{aligned} t&=\frac{r}{\sqrt{1-r^2}}\sqrt{n-2}\\ &= \frac{0.409}{\sqrt{1-0.409^2}}\sqrt{11 -2}\\ &= 1.345 \end{aligned} $$`

### Step 6 Decision (Traditional Approach)

The test statistic is $t = 1.345$, which falls outside the critical region, so we fail to reject the null hypothesis.

OR

### Step 6 Decision ($p$-value Approach)

This is a two-tailed test, so the p-value is twice the area to the right of the test statistic ($t=1.345$) under the $t$ distribution with $9$ degrees of freedom, which is $0.2117$.

The p-value of $0.2117$ is greater than the significance level $\alpha = 0.01$, so we fail to reject the null hypothesis.
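The whole procedure for this example can be sketched end to end in Python, starting from the 11 pairs listed in the solution table. The code uses only the standard library; the helper names `t_pdf`, `upper_tail` and `pearson_r` are illustrative assumptions, and the tail area is approximated with Simpson's rule. Because the script carries $r$ at full precision instead of rounding to three decimals, its p-value may differ slightly in the last digit from the worked value.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 via Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def pearson_r(x, y):
    """Pearson's sample correlation coefficient from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours = [4, 5, 6, 9, 10, 8, 10, 7, 3, 8, 5]
score = [68, 65, 85, 84, 62, 86, 83, 76, 67, 74, 69]
n = len(hours)
alpha = 0.01

r = pearson_r(hours, score)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p_two_sided = 2 * upper_tail(abs(t), n - 2)   # both tails count for H1: rho != 0

decision = "reject H0" if p_two_sided < alpha else "fail to reject H0"
print(round(r, 3), round(t, 3), decision)
```

Doubling the upper-tail area of $|t|$ is what makes the test two-sided; for a one-sided alternative only the tail in the direction of $H_1$ would be used.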

### Interpretation

There is insufficient evidence to conclude that there is a significant linear relationship between hours studied and examination score because the correlation coefficient between $x$ and $y$ is not significantly different from zero.