Testing Significance of Linear Relationship
A test of significance for a linear relationship between the variables $x$ and $y$ can be performed using the sample correlation coefficient $r_{xy}$.
Example 1
For a sample of eight bears, researchers measured the distance around each bear's chest and weighed the bear. The sample correlation coefficient between chest size and weight is $r=0.744$ for the $n=8$ bears. Using $\alpha=0.05$, determine whether there is a positive linear correlation between chest size and weight.
Solution
We are given $n = 8$ pairs of observations and a sample correlation coefficient of $r = 0.744$.
Step 1 Hypothesis Testing Problem
The hypothesis testing problem is $H_0 : \rho = 0$ against $H_1 : \rho > 0$ (right-tailed).
Step 2 Test Statistic
The test statistic is
$$ \begin{aligned} t& =\frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \end{aligned} $$
which, under $H_0$, follows a $t$ distribution with $n-2$ degrees of freedom.
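The formula above can be sketched as a small Python function. This is a minimal illustration using only the standard library; the function name `correlation_t_statistic` is a hypothetical choice and not part of the original text.

```python
import math

def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, given sample correlation r from n pairs."""
    if n <= 2:
        raise ValueError("need at least 3 pairs so that n - 2 > 0")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Values from Example 1: r = 0.744, n = 8
t_stat = correlation_t_statistic(0.744, 8)
print(round(t_stat, 3))
```

The sign of the result follows the sign of $r$, so the same function serves right-tailed, left-tailed, and two-tailed tests.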
Step 3 Significance Level
The significance level is $\alpha = 0.05$.
Step 4 Critical Value(s)
As the alternative hypothesis is right-tailed, the critical value of $t$ at $\alpha = 0.05$ with $n-2 = 6$ degrees of freedom is $1.943$.
The rejection region (i.e. critical region) is $t > 1.943$.
Step 5 Computation
The test statistic under the null hypothesis is
$$ \begin{aligned} t&=\frac{r}{\sqrt{1-r^2}}\sqrt{n-2}\\ &= \frac{0.744}{\sqrt{1-0.744^2}}\sqrt{8 -2}\\ &= 2.727 \end{aligned} $$
Step 6 Decision (Traditional Approach)
The test statistic $t = 2.727$ falls inside the critical region, so we reject the null hypothesis.
OR
Step 6 Decision ($p$-value Approach)
This is a right-tailed test, so the p-value is the area to the right of the test statistic ($t=2.727$) under the $t$ distribution with $6$ degrees of freedom, which is $0.0172$.
Because the p-value $0.0172$ is less than the significance level $\alpha = 0.05$, we reject the null hypothesis.
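Both decision rules rest on tail areas of the $t$ distribution, which are usually read from a table or obtained from a statistics library. As a rough, standard-library-only sketch, the tail area can be approximated by integrating the $t$ density with Simpson's rule, and the critical value found by bisection. The helper names `t_pdf`, `upper_tail`, and `t_critical` are illustrative assumptions, not functions from any library.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(c: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T > c) for c >= 0 with composite Simpson's rule on [0, c]."""
    h = c / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, the area to the right of 0 is exactly 0.5
    return 0.5 - total * h / 3

def t_critical(tail_area: float, df: int) -> float:
    """Positive c with P(T > c) = tail_area (0 < tail_area < 0.5), by bisection."""
    lo, hi = 0.0, 1.0
    while upper_tail(hi, df) > tail_area:  # widen the bracket until it contains c
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

df = 8 - 2                         # Example 1: n = 8 pairs
crit = t_critical(0.05, df)        # right-tailed critical value at alpha = 0.05
p_value = upper_tail(2.727, df)    # area to the right of the observed t
print(round(crit, 3), round(p_value, 4))
```

For a left-tailed test the symmetry of the $t$ distribution gives the critical value as `-t_critical(alpha, df)` and the p-value as `upper_tail(abs(t), df)`.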
Interpretation
There is sufficient evidence to conclude that there is a significant positive linear relationship between chest size and weight of bears.
Example 2
The following data give the demand and price of a commodity for 8 periods.
Demand | 16 | 20 | 18 | 21 | 13 | 15 | 17 | 22 |
---|---|---|---|---|---|---|---|---|
Price | 10 | 8 | 12 | 6 | 13 | 9 | 11 | 7 |
A linear regression relating the demand and price of the commodity is to be estimated.
Test whether there is a significant negative linear relationship between the price and the demand of the commodity. Use $\alpha=0.05$.
Solution
Let $x$ denote the price of a commodity and $y$ denote the demand of a commodity.
The number of pairs $n= 8$.
Period | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 10 | 16 | 100 | 256 | 160 |
2 | 8 | 20 | 64 | 400 | 160 |
3 | 12 | 18 | 144 | 324 | 216 |
4 | 6 | 21 | 36 | 441 | 126 |
5 | 13 | 13 | 169 | 169 | 169 |
6 | 9 | 15 | 81 | 225 | 135 |
7 | 11 | 17 | 121 | 289 | 187 |
8 | 7 | 22 | 49 | 484 | 154 |
Total | 76 | 142 | 764 | 2588 | 1307 |
The sample variance of $x$ is
$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(764-\frac{(76)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(764-\frac{5776}{8}\bigg)\\ &= \frac{1}{7}\bigg(764-722\bigg)\\ &= \frac{42}{7}\\ &= 6. \end{aligned} $$
The sample variance of $y$ is
$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(2588-\frac{(142)^2}{8}\bigg)\\ &= \frac{1}{7}\bigg(2588-\frac{20164}{8}\bigg)\\ &= \frac{1}{7}\bigg(2588-2520.5\bigg)\\ &= \frac{67.5}{7}\\ &= 9.6429. \end{aligned} $$
The sample covariance between $x$ and $y$ is
$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{8-1}\bigg(1307-\frac{(76)(142)}{8}\bigg)\\ &= \frac{1}{7}\bigg(1307-\frac{10792}{8}\bigg)\\ &= \frac{1}{7}\bigg(1307-1349\bigg)\\ &= \frac{-42}{7}\\ &= -6. \end{aligned} $$
Karl Pearson's sample correlation coefficient between the price and the demand of the commodity is
$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{-6}{\sqrt{6\times 9.6429}}\\ &=\frac{-6}{\sqrt{57.8574}}\\ &=-0.789. \end{aligned} $$
The correlation coefficient between price of a commodity and demand of a commodity is $-0.789$.
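The column totals, variances, covariance, and correlation above can be reproduced with a few lines of Python. This is a sketch that applies the same computational formulas to the Example 2 data; the variable names are illustrative choices.

```python
import math

price = [10, 8, 12, 6, 13, 9, 11, 7]        # x
demand = [16, 20, 18, 21, 13, 15, 17, 22]   # y
n = len(price)

# Column totals of the working table
sum_x, sum_y = sum(price), sum(demand)
sum_x2 = sum(x * x for x in price)
sum_y2 = sum(y * y for y in demand)
sum_xy = sum(x * y for x, y in zip(price, demand))

# Sample variances and covariance (divisor n - 1)
var_x = (sum_x2 - sum_x ** 2 / n) / (n - 1)
var_y = (sum_y2 - sum_y ** 2 / n) / (n - 1)
cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

# Pearson's sample correlation coefficient
r = cov_xy / math.sqrt(var_x * var_y)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(var_x, round(var_y, 4), cov_xy, round(r, 3))
```

Because the divisor $n-1$ appears in the covariance and in both variances, it cancels in $r$; the same correlation results if all three are computed with divisor $n$.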
Step 1 Hypothesis Testing Problem
The hypothesis testing problem is $H_0 : \rho = 0$ against $H_1 : \rho < 0$ (left-tailed).
Step 2 Test Statistic
The test statistic is
$$ \begin{aligned} t& =\frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \end{aligned} $$
which, under $H_0$, follows a $t$ distribution with $n-2$ degrees of freedom.
Step 3 Significance Level
The significance level is $\alpha = 0.05$.
Step 4 Critical Value(s)
As the alternative hypothesis is left-tailed, the critical value of $t$ at $\alpha = 0.05$ with $n-2 = 6$ degrees of freedom is $-1.943$.
The rejection region (i.e. critical region) is $t < -1.943$.
Step 5 Computation
The test statistic under the null hypothesis is
$$ \begin{aligned} t&=\frac{r}{\sqrt{1-r^2}}\sqrt{n-2}\\ &= \frac{-0.789}{\sqrt{1-(-0.789)^2}}\sqrt{8 -2}\\ &= -3.146 \end{aligned} $$
Step 6 Decision (Traditional Approach)
The test statistic $t = -3.146$ falls inside the critical region, so we reject the null hypothesis.
OR
Step 6 Decision ($p$-value Approach)
This is a left-tailed test, so the p-value is the area to the left of the test statistic ($t=-3.146$) under the $t$ distribution with $6$ degrees of freedom, which is approximately $0.01$.
Because the p-value $0.01$ is less than the significance level $\alpha = 0.05$, we reject the null hypothesis.
Interpretation
There is sufficient evidence to conclude that there is a significant negative linear relationship between demand and price of a commodity.
Example 3
The following data give the exam scores of 11 randomly selected students and the number of hours they studied for the exam.
Hours studied | 4 | 5 | 6 | 9 | 10 | 8 | 10 | 7 | 3 | 8 | 5 |
---|---|---|---|---|---|---|---|---|---|---|---|
Exam score | 68 | 65 | 85 | 84 | 62 | 86 | 83 | 76 | 67 | 74 | 69 |
Test whether there is a significant correlation between hours studied and examination score. Use $\alpha=0.01$.
Solution
Let $x$ denote the hours studied and $y$ denote the exam score.
The number of pairs $n= 11$.
Student | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 4 | 68 | 16 | 4624 | 272 |
2 | 5 | 65 | 25 | 4225 | 325 |
3 | 6 | 85 | 36 | 7225 | 510 |
4 | 9 | 84 | 81 | 7056 | 756 |
5 | 10 | 62 | 100 | 3844 | 620 |
6 | 8 | 86 | 64 | 7396 | 688 |
7 | 10 | 83 | 100 | 6889 | 830 |
8 | 7 | 76 | 49 | 5776 | 532 |
9 | 3 | 67 | 9 | 4489 | 201 |
10 | 8 | 74 | 64 | 5476 | 592 |
11 | 5 | 69 | 25 | 4761 | 345 |
Total | 75 | 819 | 569 | 61761 | 5671 |
The sample variance of $x$ is
$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{11-1}\bigg(569-\frac{(75)^2}{11}\bigg)\\ &= \frac{1}{10}\bigg(569-\frac{5625}{11}\bigg)\\ &= \frac{1}{10}\bigg(569-511.3636\bigg)\\ &= \frac{57.6364}{10}\\ &= 5.7636. \end{aligned} $$
The sample variance of $y$ is
$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{11-1}\bigg(61761-\frac{(819)^2}{11}\bigg)\\ &= \frac{1}{10}\bigg(61761-\frac{670761}{11}\bigg)\\ &= \frac{1}{10}\bigg(61761-60978.2727\bigg)\\ &= \frac{782.7273}{10}\\ &= 78.2727. \end{aligned} $$
The sample covariance between $x$ and $y$ is
$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{11-1}\bigg(5671-\frac{(75)(819)}{11}\bigg)\\ &= \frac{1}{10}\bigg(5671-\frac{61425}{11}\bigg)\\ &= \frac{1}{10}\bigg(5671-5584.0909\bigg)\\ &= \frac{86.9091}{10}\\ &= 8.6909. \end{aligned} $$
Karl Pearson's sample correlation coefficient between hours studied and exam score is
$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{8.6909}{\sqrt{5.7636\times 78.2727}}\\ &=\frac{8.6909}{\sqrt{451.1325}}\\ &=0.409. \end{aligned} $$
The sample correlation coefficient between hours studied and exam score is $0.409$.
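As a cross-check on the arithmetic, the same correlation can be computed from deviations about the means instead of the raw-sum formulas used above. The sketch below takes the 11 $(x, y)$ pairs from the working table; the variable names `sxx`, `syy`, and `sxy` are illustrative.

```python
import math

hours = [4, 5, 6, 9, 10, 8, 10, 7, 3, 8, 5]           # x, from the working table
score = [68, 65, 85, 84, 62, 86, 83, 76, 67, 74, 69]   # y, from the working table
n = len(hours)

mean_x = sum(hours) / n
mean_y = sum(score) / n

# Sums of squared deviations and of cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in score)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, score))

r = sxy / math.sqrt(sxx * syy)

# Dividing by n - 1 recovers the sample variances and covariance
print(round(sxx / (n - 1), 4), round(syy / (n - 1), 4), round(sxy / (n - 1), 4))
print(round(r, 3))
```

The deviation form is algebraically identical to the raw-sum form, and it tends to be less sensitive to rounding when the values are large relative to their spread.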
Step 1 Hypothesis Testing Problem
The hypothesis testing problem is $H_0 : \rho = 0$ against $H_1 : \rho \neq 0$ (two-tailed).
Step 2 Test Statistic
The test statistic is
$$ \begin{aligned} t& =\frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \end{aligned} $$
which, under $H_0$, follows a $t$ distribution with $n-2$ degrees of freedom.
Step 3 Significance Level
The significance level is $\alpha = 0.01$.
Step 4 Critical Value(s)
As the alternative hypothesis is two-tailed, the critical values of $t$ at $\alpha = 0.01$ with $n-2 = 9$ degrees of freedom are $-3.25$ and $3.25$.
The rejection region (i.e. critical region) is $t < -3.25$ or $t > 3.25$.
Step 5 Computation
The test statistic under the null hypothesis is
$$ \begin{aligned} t&=\frac{r}{\sqrt{1-r^2}}\sqrt{n-2}\\ &= \frac{0.409}{\sqrt{1-0.409^2}}\sqrt{11 -2}\\ &= 1.345 \end{aligned} $$
Step 6 Decision (Traditional Approach)
The test statistic $t = 1.345$ falls outside the critical region, so we fail to reject the null hypothesis.
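The same decision can be stated on the scale of $r$ itself. Solving $t = \frac{r}{\sqrt{1-r^2}}\sqrt{n-2}$ for $r$ gives a critical correlation $r_c = t_c / \sqrt{t_c^2 + n - 2}$, and $|r| > r_c$ holds exactly when $|t| > t_c$. This restatement is not part of the original solution; the sketch below, with the hypothetical helper `critical_r`, only illustrates the equivalence for Example 3.

```python
import math

def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| that reaches the critical t value for n pairs."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)

# Example 3: two-tailed test at alpha = 0.01 with n = 11, critical t = 3.25
r_crit = critical_r(3.25, 11)
r = 0.409

print(round(r_crit, 3))
print(abs(r) > r_crit)   # True would mean "reject H0"
```

With only 11 pairs, a sample correlation needs to be quite large in absolute value before it is significant at the 1% level, which is why $r = 0.409$ does not lead to rejection.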
OR
Step 6 Decision ($p$-value Approach)
This is a two-tailed test, so the p-value is twice the area to the right of the test statistic ($t=1.345$) under the $t$ distribution with $9$ degrees of freedom, which is $0.2117$.
Because the p-value $0.2117$ is greater than the significance level $\alpha = 0.01$, we fail to reject the null hypothesis.
Interpretation
There is insufficient evidence to conclude that there is a significant linear relationship between hours studied and examination score because the correlation coefficient between $x$ and $y$ is not significantly different from zero.