## Two sample proportion test

Suppose we want to compare two distinct populations $A$ and $B$ with respect to the possession of a certain attribute among their members. Suppose we take samples of sizes $n_1$ and $n_2$ from populations $A$ and $B$ respectively.

Let $X_1$ and $X_2$ be the observed numbers of successes, i.e., the numbers of units possessing the attribute, in the two samples respectively.

Then `$\hat{p}_1=\frac{X_1}{n_1}$` is the observed proportion of successes in the sample from population $A$, and `$\hat{p}_2=\frac{X_2}{n_2}$` is the observed proportion of successes in the sample from population $B$.

The pooled estimate of the sample proportion is `$\hat{p} =\dfrac{X_1 +X_2}{n_1 + n_2}$`.
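As a quick illustration, the three estimates can be computed directly from the counts. This is a minimal Python sketch; the function name `sample_proportions` and the counts passed to it are hypothetical and chosen only for illustration.

```python
def sample_proportions(x1, n1, x2, n2):
    """Return (p1_hat, p2_hat, pooled p_hat) for two samples.

    x1, x2: observed numbers of successes; n1, n2: sample sizes.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    return p1_hat, p2_hat, p_pooled


# Hypothetical counts, for illustration only
p1_hat, p2_hat, p_pooled = sample_proportions(x1=60, n1=200, x2=45, n2=250)
print(p1_hat, p2_hat, round(p_pooled, 4))
```

The pooled estimate is a weighted average of the two sample proportions, so it always lies between them.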

## Assumptions

Assumptions for testing the difference between two proportions are as follows:

a. The samples are random samples.

b. The sample data are independent of one another.

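c. The sample sizes are large enough for the sampling distribution of $\hat{p}_1-\hat{p}_2$ to be approximately normal. A common rule of thumb is that each sample should contain at least 5 successes and at least 5 failures (some texts require 10).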

## Step by Step Procedure

We wish to test the null hypothesis $H_0 : p_1 = p_2$, i.e., the two population proportions are equal.

The standard error of difference between two proportions is
```
$$
\begin{aligned}
SE(\hat{p}_1-\hat{p}_2) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}}
\end{aligned}
$$
```

where $\hat{p} =\dfrac{X_1 +X_2}{n_1 + n_2}$ is the pooled estimate of sample proportion.
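The standard error formula can be sketched in Python as follows. The function name `pooled_standard_error` and the example counts are assumptions made for illustration, not part of the procedure itself.

```python
import math


def pooled_standard_error(x1, n1, x2, n2):
    """Standard error of p1_hat - p2_hat based on the pooled estimate."""
    p_pooled = (x1 + x2) / (n1 + n2)
    # Same quantity as sqrt(p(1-p)/n1 + p(1-p)/n2), with p(1-p) factored out
    return math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))


# Hypothetical counts, for illustration only
se = pooled_standard_error(x1=60, n1=200, x2=45, n2=250)
print(round(se, 6))
```

Factoring out $\hat{p}(1-\hat{p})$ is only an algebraic convenience; it gives the same value as the two-term expression above.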

The step by step hypothesis testing procedure is as follows:

## Step 1 State the hypothesis testing problem

The hypothesis testing problem can be structured in any one of the three situations as follows:

Situation | Hypothesis Testing Problem |
---|---|
Situation A : | $H_0: p_1=p_2$ against $H_a : p_1 < p_2$ (Left-tailed) |
Situation B : | $H_0: p_1=p_2$ against $H_a : p_1 > p_2$ (Right-tailed) |
Situation C : | $H_0: p_1=p_2$ against $H_a : p_1 \neq p_2$ (Two-tailed) |

## Step 2 Define the test statistic

The test statistic for testing above hypothesis is
```
$$
\begin{aligned}
Z & = \frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{SE(\hat{p}_1-\hat{p}_2)}\\
& = \frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}}}
\end{aligned}
$$
```

Under $H_0$ and for large samples, the test statistic $Z$ approximately follows the standard normal distribution $N(0,1)$.

## Step 3 Specify the level of significance $\alpha$

## Step 4 Determine the critical values

For the specified value of $\alpha$ determine the critical region depending upon the alternative hypothesis.

- For **left-tailed** alternative hypothesis: Find the $Z$-critical value $-Z_\alpha$ using `$$ \begin{aligned} P(Z<-Z_\alpha) &= \alpha. \end{aligned} $$`

- For **right-tailed** alternative hypothesis: Find the $Z$-critical value $Z_\alpha$ using `$$ \begin{aligned} P(Z>Z_\alpha) &= \alpha. \end{aligned} $$`

- For **two-tailed** alternative hypothesis: Find the $Z$-critical value $Z_{\alpha/2}$ using `$$ \begin{aligned} P(|Z|> Z_{\alpha/2}) &= \alpha. \end{aligned} $$`
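The critical values can be looked up in a standard normal table or computed numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `critical_value` and its `tail` labels are hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha, tail):
    """Return the positive critical value Z_alpha or Z_{alpha/2}.

    tail: "left", "right" or "two". For a left-tailed test the
    rejection boundary is the negative of the returned value.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail in ("left", "right"):
        return std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right' or 'two'")


print(round(critical_value(0.05, "right"), 3))
print(round(critical_value(0.05, "two"), 3))
```

For $\alpha = 0.05$ this reproduces the familiar table values of about 1.645 (one-tailed) and 1.96 (two-tailed).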

## Step 5 Computation

Compute the test statistic under the null hypothesis $H_0$ using equation
```
$$
\begin{aligned}
Z_{obs} & = \frac{\hat{p}_1-\hat{p}_2-0}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}}}
\end{aligned}
$$
```
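A minimal Python sketch of this computation is shown here. The function name `two_proportion_z` and the counts are hypothetical; the code assumes the pooled estimate is strictly between 0 and 1, otherwise the standard error is zero and the statistic is undefined.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Observed Z statistic under H0: p1 = p2, using the pooled estimate."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(
        p_pooled * (1 - p_pooled) / n1 + p_pooled * (1 - p_pooled) / n2
    )
    # Under H0 the hypothesised difference p1 - p2 is 0
    return (p1_hat - p2_hat - 0) / se


# Hypothetical counts, for illustration only
z_obs = two_proportion_z(x1=60, n1=200, x2=45, n2=250)
print(round(z_obs, 4))
```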

## Step 6 Decision (Traditional Approach)

It is based on the critical values.

- For **left-tailed** alternative hypothesis: Reject $H_0$ if `$Z_{obs}\leq -Z_\alpha$`.

- For **right-tailed** alternative hypothesis: Reject $H_0$ if `$Z_{obs}\geq Z_\alpha$`.

- For **two-tailed** alternative hypothesis: Reject $H_0$ if `$|Z_{obs}|\geq Z_{\alpha/2}$`.
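The three rejection rules can be expressed as a single function. This is an illustrative sketch using `statistics.NormalDist` from the Python standard library; the name `reject_h0`, the `tail` labels, and the example value of $Z_{obs}$ are assumptions.

```python
from statistics import NormalDist


def reject_h0(z_obs, alpha, tail):
    """Apply the critical-value decision rule; return True to reject H0."""
    std_normal = NormalDist()
    if tail == "left":
        return z_obs <= -std_normal.inv_cdf(1 - alpha)
    if tail == "right":
        return z_obs >= std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        return abs(z_obs) >= std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right' or 'two'")


# Hypothetical observed statistic, for illustration only
for tail in ("left", "right", "two"):
    print(tail, reject_h0(z_obs=2.99, alpha=0.05, tail=tail))
```

A large positive $Z_{obs}$ leads to rejection in the right-tailed and two-tailed tests but not in the left-tailed test, which is why the alternative hypothesis must be fixed before looking at the data.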

**OR**

## Step 6 Decision ($p$-value Approach)

It is based on the $p$-value.

Alternative Hypothesis | Type of Hypothesis | $p$-value |
---|---|---|
$H_a: p_1 < p_2$ | Left-tailed | $p$-value `$= P(Z\leq Z_{obs})$` |
$H_a: p_1>p_2$ | Right-tailed | $p$-value `$= P(Z\geq Z_{obs})$` |
$H_a: p_1\neq p_2$ | Two-tailed | $p$-value `$= 2P(Z\geq \lvert Z_{obs}\rvert)$` |

If the $p$-value is less than $\alpha$, reject the null hypothesis $H_0$ at the $\alpha$ level of significance; otherwise, fail to reject $H_0$ at the $\alpha$ level of significance.
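The $p$-values in the table can be computed from the standard normal CDF. The following sketch relies on `statistics.NormalDist` from the Python standard library; the function name `p_value`, the `tail` labels, and the example value of $Z_{obs}$ are hypothetical.

```python
from statistics import NormalDist


def p_value(z_obs, tail):
    """p-value of the observed Z statistic for the given alternative."""
    cdf = NormalDist().cdf  # standard normal CDF
    if tail == "left":
        return cdf(z_obs)                   # P(Z <= z_obs)
    if tail == "right":
        return 1 - cdf(z_obs)               # P(Z >= z_obs)
    if tail == "two":
        return 2 * (1 - cdf(abs(z_obs)))    # 2 P(Z >= |z_obs|)
    raise ValueError("tail must be 'left', 'right' or 'two'")


# Hypothetical observed statistic and significance level
alpha = 0.05
p = p_value(z_obs=2.99, tail="two")
print(round(p, 4), "reject H0" if p < alpha else "fail to reject H0")
```

Both decision approaches lead to the same conclusion for a given $\alpha$; the $p$-value additionally shows how strong the evidence against $H_0$ is.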