Two sample proportion test
Suppose we want to compare two distinct populations $A$ and $B$ with respect to the possession of a certain attribute among their members. Suppose we take samples of sizes $n_1$ and $n_2$ from populations $A$ and $B$ respectively.
Let $X_1$ and $X_2$ be the observed numbers of successes, i.e., the numbers of units possessing the attribute, in the two samples respectively.
Then $\hat{p}_1=\frac{X_1}{n_1}$ is the observed proportion of successes in the sample from population $A$, and $\hat{p}_2=\frac{X_2}{n_2}$ is the observed proportion of successes in the sample from population $B$. The pooled estimate of the sample proportion is $\hat{p} =\dfrac{X_1 +X_2}{n_1 + n_2}$.
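As a small illustration of these definitions, the sketch below computes the two sample proportions and the pooled estimate. The counts `x1`, `n1`, `x2`, `n2` are hypothetical values chosen only so that the arithmetic is easy to follow; they do not come from any real data set.

```python
# Hypothetical counts, chosen only for illustration.
x1, n1 = 45, 150   # successes and sample size for the sample from population A
x2, n2 = 30, 150   # successes and sample size for the sample from population B

p1_hat = x1 / n1                   # observed proportion of successes in sample A
p2_hat = x2 / n2                   # observed proportion of successes in sample B
p_pooled = (x1 + x2) / (n1 + n2)   # pooled estimate: all successes over all units

print(p1_hat, p2_hat, p_pooled)
```

Note that the pooled estimate is not the simple average of $\hat{p}_1$ and $\hat{p}_2$ unless $n_1 = n_2$; it weights each sample by its size.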
Assumptions
Assumptions for testing the difference between two proportions are as follows:
a. The samples are random samples.
b. The two samples are independent of one another.
c. The sample sizes are large enough for the normal approximation to hold. A common rule of thumb is that each sample contains at least 5 successes and at least 5 failures (some texts require 10).
Step by Step Procedure
We wish to test the null hypothesis $H_0 : p_1 = p_2$, i.e., the two population proportions do not differ.
The standard error of the difference between the two sample proportions, estimated under $H_0$, is
$$ \begin{aligned} SE(\hat{p}_1-\hat{p}_2) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}} \end{aligned} $$
where $\hat{p} =\dfrac{X_1 +X_2}{n_1 + n_2}$ is the pooled estimate of sample proportion.
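The standard error formula can be sketched as a short function. The function name `pooled_standard_error` and the example counts are hypothetical, introduced here only for illustration.

```python
import math

def pooled_standard_error(x1, n1, x2, n2):
    """Standard error of p1_hat - p2_hat using the pooled proportion."""
    p = (x1 + x2) / (n1 + n2)   # pooled estimate of the proportion
    return math.sqrt(p * (1 - p) / n1 + p * (1 - p) / n2)

# Hypothetical counts: 45 of 150 and 30 of 150 successes.
se = pooled_standard_error(45, 150, 30, 150)
print(se)
```

With these counts the pooled proportion is 0.25, so the quantity under the square root is $0.25 \times 0.75 \times (1/150 + 1/150) = 0.0025$ and the standard error is about 0.05.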
The step by step hypothesis testing procedure is as follows:
Step 1 State the hypothesis testing problem
The hypothesis testing problem can be structured in any one of the three situations as follows:
Situation | Hypothesis Testing Problem |
---|---|
Situation A : | $H_0: p_1=p_2$ against $H_a : p_1 < p_2$ (Left-tailed) |
Situation B : | $H_0: p_1=p_2$ against $H_a : p_1 > p_2$ (Right-tailed) |
Situation C : | $H_0: p_1=p_2$ against $H_a : p_1 \neq p_2$ (Two-tailed) |
Step 2 Define the test statistic
The test statistic for testing the above hypothesis is
$$ \begin{aligned} Z & = \frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{SE(\hat{p}_1-\hat{p}_2)}\\ & = \frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}}} \end{aligned} $$
For large samples, the test statistic $Z$ approximately follows the standard normal distribution $N(0,1)$ under $H_0$.
Step 3 Specify the level of significance $\alpha$
Step 4 Determine the critical values
For the specified value of $\alpha$ determine the critical region depending upon the alternative hypothesis.
- For left-tailed alternative hypothesis: Find the critical value $-Z_\alpha$ using
$$ \begin{aligned} P(Z<-Z_\alpha) &= \alpha. \end{aligned} $$
- For right-tailed alternative hypothesis: Find the critical value $Z_\alpha$ using
$$ \begin{aligned} P(Z>Z_\alpha) &= \alpha. \end{aligned} $$
- For two-tailed alternative hypothesis: Find the critical value $Z_{\alpha/2}$ using
$$ \begin{aligned} P(|Z|> Z_{\alpha/2}) &= \alpha. \end{aligned} $$
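The critical values above can be looked up in a standard normal table or computed numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `critical_value` and the `tail` labels are assumptions made for this example.

```python
from statistics import NormalDist

def critical_value(alpha, tail):
    """Critical value of the standard normal for a given alpha and tail."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.inv_cdf(alpha)           # this is -Z_alpha
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)       # this is Z_alpha
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)   # this is Z_{alpha/2}
    raise ValueError("tail must be 'left', 'right' or 'two'")

for tail in ("left", "right", "two"):
    print(tail, round(critical_value(0.05, tail), 4))
```

For $\alpha = 0.05$ this gives approximately $-1.645$, $1.645$ and $1.960$, matching the usual table values.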
Step 5 Computation
Compute the observed value of the test statistic under the null hypothesis $H_0$ (so that $p_1 - p_2 = 0$) using the equation
$$ \begin{aligned} Z_{obs} & = \frac{\hat{p}_1-\hat{p}_2-0}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}}} \end{aligned} $$
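A minimal sketch of this computation is given below. The function name `z_observed` and the counts passed to it are hypothetical; the guard against a zero standard error is an addition for robustness and is not part of the procedure described here.

```python
import math

def z_observed(x1, n1, x2, n2):
    """Observed Z statistic for H0: p1 = p2, using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p = (x1 + x2) / (n1 + n2)   # pooled estimate
    se = math.sqrt(p * (1 - p) / n1 + p * (1 - p) / n2)
    if se == 0:
        # Happens only when the pooled proportion is exactly 0 or 1.
        raise ValueError("standard error is zero; Z is undefined")
    return (p1_hat - p2_hat - 0) / se

# Hypothetical counts: 45 of 150 and 30 of 150 successes.
z_obs = z_observed(45, 150, 30, 150)
print(z_obs)
```

With these counts, $\hat{p}_1 - \hat{p}_2 = 0.1$ and the standard error is about 0.05, so $Z_{obs}$ is approximately 2.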
Step 6 Decision (Traditional Approach)
It is based on the critical values.
- For left-tailed alternative hypothesis: Reject $H_0$ if $Z_{obs}\leq -Z_\alpha$.
- For right-tailed alternative hypothesis: Reject $H_0$ if $Z_{obs}\geq Z_\alpha$.
- For two-tailed alternative hypothesis: Reject $H_0$ if $|Z_{obs}|\geq Z_{\alpha/2}$.
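The three rejection rules can be sketched as one function. The name `reject_h0` and the `tail` labels are hypothetical, and `statistics.NormalDist` (Python 3.8 or later) is used here in place of a printed normal table.

```python
from statistics import NormalDist

def reject_h0(z_obs, alpha, tail):
    """Return True if H0 is rejected by the critical-value rule."""
    std_normal = NormalDist()
    if tail == "left":
        z_alpha = std_normal.inv_cdf(1 - alpha)
        return z_obs <= -z_alpha
    if tail == "right":
        z_alpha = std_normal.inv_cdf(1 - alpha)
        return z_obs >= z_alpha
    if tail == "two":
        z_half = std_normal.inv_cdf(1 - alpha / 2)
        return abs(z_obs) >= z_half
    raise ValueError("tail must be 'left', 'right' or 'two'")

# Hypothetical observed statistic Z_obs = 2.0.
print(reject_h0(2.0, 0.05, "two"))   # compares |2.0| with Z_{0.025}
print(reject_h0(2.0, 0.01, "two"))   # compares |2.0| with Z_{0.005}
```

The same observed value can lead to different decisions at different significance levels, which is why $\alpha$ is fixed in Step 3 before the computation.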
OR
Step 6 Decision ($p$-value Approach)
It is based on the $p$-value.
Alternative Hypothesis | Type of Hypothesis | $p$-value |
---|---|---|
$H_a: p_1 < p_2$ | Left-tailed | $p$-value $= P(Z\leq Z_{obs})$ |
$H_a: p_1>p_2$ | Right-tailed | $p$-value $= P(Z\geq Z_{obs})$ |
$H_a: p_1\neq p_2$ | Two-tailed | $p$-value $= 2P(Z\geq \lvert Z_{obs}\rvert)$ |
If the $p$-value is less than $\alpha$, reject the null hypothesis $H_0$ at the $\alpha$ level of significance; otherwise, fail to reject $H_0$ at the $\alpha$ level of significance.
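The $p$-value approach can be sketched in the same way. The function name `p_value`, the `tail` labels, and the observed statistic used in the example are hypothetical; the standard normal CDF comes from `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

def p_value(z_obs, tail):
    """p-value of the Z test for the given alternative hypothesis."""
    phi = NormalDist().cdf   # standard normal CDF, P(Z <= z)
    if tail == "left":
        return phi(z_obs)                      # P(Z <= Z_obs)
    if tail == "right":
        return 1 - phi(z_obs)                  # P(Z >= Z_obs)
    if tail == "two":
        return 2 * (1 - phi(abs(z_obs)))       # 2 P(Z >= |Z_obs|)
    raise ValueError("tail must be 'left', 'right' or 'two'")

alpha = 0.05
p = p_value(2.0, "two")   # hypothetical observed statistic Z_obs = 2.0
decision = "reject H0" if p < alpha else "fail to reject H0"
print(round(p, 4), decision)
```

For $Z_{obs} = 2$ the two-tailed $p$-value is about 0.0455, so $H_0$ is rejected at $\alpha = 0.05$, in agreement with the critical-value rule $|Z_{obs}| \geq 1.96$.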