Introduction to Chi-Square Tests
Chi-square tests (χ²) are used for analyzing categorical (nominal and ordinal) data. They test whether observed frequencies in categories differ significantly from expected frequencies. Chi-square tests are widely used in:
- Quality control (product defects by category)
- Market research (customer preferences by demographic)
- Medical research (disease prevalence by treatment group)
- Social sciences (behavior patterns across populations)
- Genetics (phenotype ratios in offspring)
Key Characteristics
- Tests categorical/nominal data (not continuous)
- Based on frequency counts, not raw data
- Uses chi-square distribution (χ²)
- Non-parametric test (no normality assumption)
- Always right-tailed test
Section 1: Chi-Square Distribution
Properties of Chi-Square Distribution
The chi-square distribution is a probability distribution with:
- Only positive values: χ² ≥ 0 (never negative)
- Right-skewed: Long tail to the right
- One parameter: Degrees of freedom (df)
- Approaches normal: As df increases, distribution becomes more symmetric
Degrees of Freedom (df)
- Goodness of Fit: df = k - 1 - p
  - k = number of categories
  - p = number of parameters estimated from the data
  - Often df = k - 1 (when no parameters are estimated)
- Test of Independence: df = (r - 1)(c - 1)
  - r = number of rows
  - c = number of columns
Critical Values
Common critical values at α = 0.05:
| df | χ² Critical Value |
|---|---|
| 1 | 3.841 |
| 2 | 5.991 |
| 3 | 7.815 |
| 4 | 9.488 |
| 5 | 11.070 |
| 6 | 12.592 |
| 7 | 14.067 |
| 10 | 18.307 |
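The table above can be reproduced programmatically. A minimal sketch, assuming SciPy is installed, using `scipy.stats.chi2.ppf` (the inverse CDF) to find each right-tail critical value:

```python
from scipy.stats import chi2

# Right-tail critical values at alpha = 0.05 (the 95th percentile of chi-square)
alpha = 0.05
for df in [1, 2, 3, 4, 5, 6, 7, 10]:
    critical = chi2.ppf(1 - alpha, df)  # inverse CDF (percent-point function)
    print(f"df = {df:2d}: chi-square critical value = {critical:.3f}")
```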
Section 2: Chi-Square Goodness of Fit Test
Purpose
Tests whether categorical data fit an expected distribution. Compares observed frequencies (from data) to expected frequencies (from hypothesis).
Hypotheses
- H₀: The data follow the hypothesized distribution
- H₁: The data do NOT follow the hypothesized distribution
Assumptions
- Random sample: Data randomly collected
- Independent observations: Each observation independent
- Expected frequencies:
  - At least 80% of categories have expected frequency ≥ 5 (no more than 20% below 5)
  - No category has expected frequency < 1
Formula
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Where:
- O_i = Observed frequency in category i
- E_i = Expected frequency in category i
- k = Number of categories
- df = k - 1 (or k - 1 - p if estimating parameters)
Calculation Steps
Step 1: State Hypotheses
- H₀: Data follow hypothesized distribution
- H₁: Data don’t follow hypothesized distribution
- Set α (typically 0.05)
Step 2: Calculate Expected Frequencies
For a uniform distribution: $$E_i = \frac{n}{k}$$
For a hypothesized distribution with category probabilities p_i: $$E_i = n \times p_i$$
Step 3: Check Assumptions
- Verify at least 80% of expected frequencies ≥ 5
- No expected frequency < 1
Step 4: Calculate Chi-Square Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Step 5: Find Critical Value
- Use χ² table with df = k - 1
- Compare test statistic to critical value
Step 6: Decision
- If χ² > χ²_critical: Reject H₀
- If χ² ≤ χ²_critical: Fail to reject H₀
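The steps above translate directly into code. A minimal sketch, assuming NumPy and SciPy are available; the `goodness_of_fit` helper and the example counts are illustrative, not part of any library:

```python
import numpy as np
from scipy.stats import chi2

def goodness_of_fit(observed, expected, alpha=0.05):
    """Chi-square goodness-of-fit test; returns (statistic, df, critical value, p-value)."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)

    # Step 3: rough assumption check on expected frequencies
    if (expected < 1).any() or (expected < 5).mean() > 0.20:
        print("Warning: expected-frequency assumption may be violated")

    stat = ((observed - expected) ** 2 / expected).sum()  # Step 4: sum of (O - E)^2 / E
    df = len(observed) - 1                                # k - 1, no estimated parameters
    critical = chi2.ppf(1 - alpha, df)                    # Step 5: critical value
    p_value = chi2.sf(stat, df)                           # right-tail probability
    return stat, df, critical, p_value

# Hypothetical counts over four equally likely categories (Step 2: uniform expected = n / k)
obs = [30, 22, 25, 23]
exp = [sum(obs) / 4] * 4
stat, df, crit, p = goodness_of_fit(obs, exp)
print(f"chi2 = {stat:.2f}, df = {df}, critical = {crit:.3f}, p = {p:.3f}")
print("Reject H0" if stat > crit else "Fail to reject H0")  # Step 6: decision
```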
Example 1: Fairness of a Die
Problem: Is a die fair? Roll it 600 times. Expected: 100 per face.
Observed Results:
| Face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed | 85 | 112 | 98 | 105 | 106 | 94 |
| Expected | 100 | 100 | 100 | 100 | 100 | 100 |
Solution:
Calculate chi-square: $$\chi^2 = \frac{(85-100)^2}{100} + \frac{(112-100)^2}{100} + \frac{(98-100)^2}{100}$$ $$+ \frac{(105-100)^2}{100} + \frac{(106-100)^2}{100} + \frac{(94-100)^2}{100}$$
$$= \frac{225 + 144 + 4 + 25 + 36 + 36}{100} = \frac{470}{100} = 4.70$$
Degrees of freedom: df = 6 - 1 = 5
Critical value at α = 0.05: χ²₀.₀₅(5) = 11.070
Decision: χ² = 4.70 < 11.070, fail to reject H₀
Conclusion: Insufficient evidence that the die is unfair (p ≈ 0.45). The observed frequencies are consistent with a fair die.
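For comparison, SciPy's built-in goodness-of-fit routine gives the same statistic and an exact p-value (a sketch assuming SciPy is installed):

```python
from scipy.stats import chisquare

observed = [85, 112, 98, 105, 106, 94]
expected = [100] * 6  # fair die: 600 rolls / 6 faces

stat, p_value = chisquare(observed, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")  # chi2 = 4.70, p ≈ 0.454
```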
Example 2: Customer Preference
Problem: A store manager thinks customers prefer products equally. Survey 400 customers:
| Product | A | B | C | D |
|---|---|---|---|---|
| Preferred | 120 | 85 | 95 | 100 |
| Expected | 100 | 100 | 100 | 100 |
Solution:
$$\chi^2 = \frac{(120-100)^2}{100} + \frac{(85-100)^2}{100} + \frac{(95-100)^2}{100} + \frac{(100-100)^2}{100}$$
$$= \frac{400 + 225 + 25 + 0}{100} = 6.50$$
df = 4 - 1 = 3
Critical value: χ²₀.₀₅(3) = 7.815
Decision: χ² = 6.50 < 7.815, fail to reject H₀
Conclusion: Insufficient evidence that preferences differ (p ≈ 0.09). Data consistent with equal preference assumption.
Section 3: Chi-Square Test of Independence
Purpose
Tests whether two categorical variables are independent (unrelated). Uses data arranged in a contingency table.
Hypotheses
- H₀: Variables A and B are independent
- H₁: Variables A and B are NOT independent (associated)
Contingency Tables
A contingency table displays joint frequencies for two categorical variables.
2×2 Contingency Table Example:
| | Outcome Yes | Outcome No | Total |
|---|---|---|---|
| Treatment | a | b | a+b |
| Control | c | d | c+d |
| Total | a+c | b+d | n |
General r × c Table: Rows and columns for any number of categories
Formula
$$\chi^2 = \sum \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Where:
- O_ij = Observed frequency in cell (i,j)
- E_ij = Expected frequency in cell (i,j)
- Summation over all rows (i) and columns (j)
Expected Frequencies
$$E_{ij} = \frac{(\text{Row } i \text{ total}) \times (\text{Column } j \text{ total})}{\text{Grand total}}$$
Calculation Steps
Step 1: Create Contingency Table
- Arrange observed frequencies
- Calculate row and column totals
Step 2: Calculate Expected Frequencies
$$E_{ij} = \frac{R_i \times C_j}{n}$$
Step 3: Check Assumptions
- All expected frequencies ≥ 5 (or at least 80% ≥ 5)
Step 4: Calculate Chi-Square
$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Step 5: Find df and Critical Value
- df = (rows - 1)(columns - 1)
- Look up critical value
Step 6: Decision
- If χ² > χ²_critical: Reject H₀ (evidence that the variables are associated)
- If χ² ≤ χ²_critical: Fail to reject H₀ (no evidence of association)
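The procedure above can be sketched in a few lines of NumPy; the `chi_square_independence` helper and the 2×3 table below are hypothetical illustrations, not library code:

```python
import numpy as np
from scipy.stats import chi2

def chi_square_independence(observed, alpha=0.05):
    """Chi-square test of independence on an r x c table of observed counts."""
    observed = np.asarray(observed, dtype=float)
    n = observed.sum()

    # Expected counts: E_ij = (row i total) * (column j total) / grand total
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

    stat = ((observed - expected) ** 2 / expected).sum()
    df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    critical = chi2.ppf(1 - alpha, df)
    p_value = chi2.sf(stat, df)
    return stat, df, critical, p_value, expected

# Hypothetical 2 x 3 table: rows = group, columns = response category
table = [[20, 30, 50],
         [30, 35, 35]]
stat, df, crit, p, exp = chi_square_independence(table)
print(f"chi2 = {stat:.2f}, df = {df}, critical = {crit:.3f}, p = {p:.3f}")
```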
Example 1: Smoking and Lung Disease
Question: Is smoking status related to lung disease?
Contingency Table (Observed):
| | Lung Disease | No Disease | Total |
|---|---|---|---|
| Smoker | 150 | 250 | 400 |
| Non-Smoker | 80 | 420 | 500 |
| Total | 230 | 670 | 900 |
Solution:
Calculate expected frequencies:
$$E_{11} = \frac{400 \times 230}{900} = 102.22$$ $$E_{12} = \frac{400 \times 670}{900} = 297.78$$ $$E_{21} = \frac{500 \times 230}{900} = 127.78$$ $$E_{22} = \frac{500 \times 670}{900} = 372.22$$
Expected Table:
| | Disease | No Disease | Total |
|---|---|---|---|
| Smoker | 102.22 | 297.78 | 400 |
| Non-Smoker | 127.78 | 372.22 | 500 |
| Total | 230 | 670 | 900 |
Calculate chi-square: $$\chi^2 = \frac{(150-102.22)^2}{102.22} + \frac{(250-297.78)^2}{297.78}$$ $$+ \frac{(80-127.78)^2}{127.78} + \frac{(420-372.22)^2}{372.22}$$
$$= 22.33 + 7.67 + 17.87 + 6.13 = 54.00$$
Degrees of freedom: df = (2-1)(2-1) = 1
Critical value at α = 0.05: χ²₀.₀₅(1) = 3.841
Decision: χ² = 54.00 > 3.841, reject H₀
Conclusion: Strong evidence that smoking and lung disease are related (p < 0.001). Smokers have significantly higher disease prevalence.
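SciPy's `chi2_contingency` reproduces this calculation (a sketch assuming SciPy is installed). Yates' continuity correction, which SciPy applies by default to 2×2 tables, is turned off here so the statistic matches the hand computation:

```python
from scipy.stats import chi2_contingency

observed = [[150, 250],   # Smoker:     disease, no disease
            [80, 420]]    # Non-smoker: disease, no disease

# correction=False disables Yates' continuity correction so the result
# matches the hand calculation above (chi2 = 54.00 with df = 1)
stat, p_value, df, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.1e}")
print(expected)  # same expected table as computed above
```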
Example 2: Product Quality by Manufacturer
Question: Does defect rate differ by manufacturer?
Contingency Table (Observed):
| | Defective | Acceptable | Total |
|---|---|---|---|
| Manufacturer A | 12 | 138 | 150 |
| Manufacturer B | 8 | 142 | 150 |
| Manufacturer C | 15 | 135 | 150 |
| Total | 35 | 405 | 440 |
Expected frequencies (the same for every manufacturer, since each row total is 150): $$E_{\text{defective}} = \frac{150 \times 35}{440} ≈ 11.93, \quad E_{\text{acceptable}} = \frac{150 \times 405}{440} ≈ 138.07$$
Calculate chi-square: $$\chi^2 ≈ 0.00 + 0.00 + 1.30 + 0.11 + 0.79 + 0.07 ≈ 2.27$$
df = (3-1)(2-1) = 2
Critical value: χ²₀.₀₅(2) = 5.991
Decision: χ² = 2.27 < 5.991, fail to reject H₀
Conclusion: No significant difference in defect rates among manufacturers (p ≈ 0.32).
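The same SciPy routine handles the 3×2 table directly; no continuity correction is applied when df > 1:

```python
from scipy.stats import chi2_contingency

observed = [[12, 138],   # Manufacturer A: defective, acceptable
            [8, 142],    # Manufacturer B
            [15, 135]]   # Manufacturer C

stat, p_value, df, expected = chi2_contingency(observed)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.2f}")  # chi2 ≈ 2.27, df = 2, p ≈ 0.32
```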
Section 4: Effect Sizes for Chi-Square Tests
Phi Coefficient (φ)
For 2×2 tables only:
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
Interpretation (Cohen’s guidelines):
- φ ≈ 0.1: Small effect
- φ ≈ 0.3: Medium effect
- φ ≈ 0.5: Large effect
Cramér’s V
For r × c tables (any size):
$$V = \sqrt{\frac{\chi^2}{n(k-1)}}$$
Where k = min(number of rows, number of columns)
Interpretation:
- V ≈ 0.1: Small effect
- V ≈ 0.3: Medium effect
- V ≈ 0.5: Large effect
Example: Effect Size for Smoking Study
$$V = \sqrt{\frac{54.00}{900 \times (2-1)}} = \sqrt{0.0600} = 0.245$$
This is a small to medium effect: the association is statistically significant and practically meaningful.
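Both effect sizes are simple functions of the chi-square statistic. A minimal sketch, with k = min(r, c) as defined above; the helper functions are illustrative, not library code:

```python
import math

def phi_coefficient(chi2_stat, n):
    """Phi coefficient for a 2 x 2 table."""
    return math.sqrt(chi2_stat / n)

def cramers_v(chi2_stat, n, n_rows, n_cols):
    """Cramer's V for an r x c table, with k = min(r, c)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2_stat / (n * (k - 1)))

# Smoking example: chi2 = 54.00, n = 900, 2 x 2 table
print(f"phi        = {phi_coefficient(54.00, 900):.3f}")   # ≈ 0.245
print(f"Cramer's V = {cramers_v(54.00, 900, 2, 2):.3f}")   # ≈ 0.245 (equals phi for 2 x 2)
```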
Section 5: Assumptions and Conditions
Required Assumptions
- Independence: Observations are independent
  - No observation counted twice
  - Random sampling
- Expected frequency requirement:
  - Minimum: at least 80% of cells have expected frequency ≥ 5
  - Strict rule: no cell has expected frequency < 5
  - For 2×2 tables: all cells should have expected frequency ≥ 5
- Adequate sample size:
  - Generally n ≥ 20 for 2×2 tables
  - Larger samples needed for larger tables
What if Assumptions Violated?
| Violation | Solution |
|---|---|
| Low expected frequencies | Combine categories, increase sample size |
| Dependent observations | Use different test accounting for dependence |
| Small 2×2 table (n < 20) | Use Fisher’s exact test |
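When the expected-frequency assumption fails in a small 2×2 table, Fisher's exact test is the usual fallback. A sketch using `scipy.stats.fisher_exact` on a hypothetical table (the counts are made up for illustration):

```python
from scipy.stats import fisher_exact

# Hypothetical small 2 x 2 table: rows = treatment/control, columns = success/failure
table = [[8, 2],
         [3, 7]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```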
Section 6: McNemar Test
Purpose
Tests for change in related/paired categorical outcomes; the categorical analogue of the paired t-test.
Common Applications:
- Before-after binary outcomes
- Matched case-control studies
- Paired dichotomous measurements
Data Structure
| | After: Yes | After: No | Total |
|---|---|---|---|
| Before: Yes | a | b | a+b |
| Before: No | c | d | c+d |
| Total | a+c | b+d | n |
Only b and c (discordant pairs) matter for the test.
Formula
$$\chi^2 = \frac{(b-c)^2}{b+c}$$
With df = 1
Example
Question: Did an intervention change behavior?
| | After: Yes | After: No |
|---|---|---|
| Before: Yes | 45 | 12 |
| Before: No | 18 | 25 |
$$\chi^2 = \frac{(12-18)^2}{12+18} = \frac{36}{30} = 1.20$$
Critical value χ²₀.₀₅(1) = 3.841
Decision: χ² = 1.20 < 3.841, fail to reject H₀
Conclusion: No significant change in behavior (p ≈ 0.27).
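A minimal sketch of this calculation, assuming SciPy: only the discordant counts b and c enter the formula, and the p-value comes from the right tail of the χ² distribution with df = 1.

```python
from scipy.stats import chi2

b, c = 12, 18                    # discordant pairs: Yes->No and No->Yes
stat = (b - c) ** 2 / (b + c)    # McNemar chi-square statistic
p_value = chi2.sf(stat, df=1)    # right-tail p-value with df = 1

print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")  # chi2 = 1.20, p ≈ 0.273
```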
Section 7: Residuals and Standardized Residuals
Understanding Residuals
Residuals show which cells contribute most to the chi-square statistic.
$$\text{Residual} = O - E$$
$$\text{Standardized Residual} = \frac{O - E}{\sqrt{E}}$$
Interpretation
Standardized residuals with an absolute value greater than 2 indicate cells that deviate substantially from what independence would predict.
Example: In smoking study
- Smoker + Disease: Standardized residual = 4.73 (large positive)
  - More disease cases among smokers than expected
- Non-Smoker + Disease: Standardized residual = -4.23 (large negative)
  - Fewer disease cases among non-smokers than expected
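Standardized residuals can be computed in one step once the expected table is known; a sketch for the smoking example, assuming NumPy:

```python
import numpy as np

observed = np.array([[150, 250],
                     [80, 420]], dtype=float)
n = observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

std_residuals = (observed - expected) / np.sqrt(expected)
print(np.round(std_residuals, 2))
# [[ 4.73 -2.77]
#  [-4.23  2.48]]
```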
Section 8: Reporting Chi-Square Results
Standard Format
“A chi-square test of independence was performed to examine the relationship between smoking status and lung disease. The results indicated a significant association, χ²(1) = 54.00, p < 0.001, V = 0.24 (small to medium effect).”
Components
- Test name: Chi-square test of independence/goodness of fit
- Degrees of freedom: (df)
- Test statistic: χ² value
- P-value: p = value
- Effect size: φ or V value
- Interpretation: What does it mean?
Full Example Report
“To test whether customer satisfaction depends on income level, a chi-square test of independence was conducted on a sample of 500 customers. The contingency table revealed that 72% of high-income customers were satisfied compared to 48% of low-income customers. This difference was statistically significant, χ²(1) = 24.18, p < 0.001, V = 0.22, suggesting a small to medium association between income and satisfaction.”
Section 9: Limitations and Alternatives
Limitations of Chi-Square
- Only tests independence/fit: Doesn’t measure strength directly (need effect size)
- Nominal data only: Loses information from ordinal/continuous data
- Sample size dependent: Large n can show significance with small effects
- Requires categories: Can’t analyze raw continuous data
- Group-level data: Works on aggregated counts, so individual-level patterns cannot be examined
Alternatives
| Situation | Alternative |
|---|---|
| Paired binary outcomes | McNemar test |
| Small 2×2 table (n < 20) | Fisher’s exact test |
| Many zero cells | Likelihood ratio test (G-test) |
| Ordered categories | Ordinal association tests |
| Multiple variables | Logistic regression |
Section 10: Common Mistakes and Pitfalls
Mistake 1: Using with Continuous Data
Problem: Applying chi-square to continuous measurements
Solution: Either categorize the data (if appropriate) or use a test designed for continuous data
Mistake 2: Ignoring Low Expected Frequencies
Problem: Violating the assumption about minimum expected frequencies
Solution: Check all expected ≥ 5; combine categories if needed
Mistake 3: Confusing Independence with Causation
Problem: Concluding causation from significant independence test
Solution: Remember: association ≠ causation
Mistake 4: Over-interpretation of Large p-Values
Problem: Claiming “no relationship” from non-significant result
Solution: State “insufficient evidence”; consider power
Mistake 5: Using Wrong Effect Size
Problem: Using Phi coefficient for non-2×2 table
Solution: Use Cramér’s V for tables larger than 2×2
Section 11: Decision Tree for Categorical Data
Analyzing categorical data?
│
├─→ Testing ONE categorical variable against expected?
│ └─→ Chi-Square Goodness of Fit Test
│
└─→ Testing TWO categorical variables for independence?
├─→ 2×2 table with n < 20?
│ └─→ Fisher's Exact Test
│
├─→ Paired/matched outcomes?
│ └─→ McNemar Test
│
└─→ Standard contingency table?
└─→ Chi-Square Test of Independence
Key Formulas Cheat Sheet
Chi-Square Test Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Expected Frequency (Independence)
$$E_{ij} = \frac{R_i \times C_j}{n}$$
Degrees of Freedom
$$\text{Goodness of Fit: } df = k - 1$$ $$\text{Independence: } df = (r-1)(c-1)$$
Phi Coefficient (2×2 only)
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
Cramér’s V (any table)
$$V = \sqrt{\frac{\chi^2}{n(k-1)}} \text{ where } k = \min(r, c)$$
McNemar Test
$$\chi^2 = \frac{(b-c)^2}{b+c}, \quad df = 1$$
Summary Table: Chi-Square Tests
| Test | Purpose | Data | df Formula | Effect Size |
|---|---|---|---|---|
| Goodness of Fit | Fit to distribution | 1 categorical | k - 1 | (not typical) |
| Independence | Association | 2 categorical | (r-1)(c-1) | V or φ |
| McNemar | Paired change | Paired binary | 1 | - |
Related Tests
Alternatives When Outcomes Are Continuous:
- ANOVA - For continuous outcomes across groups
- Variance Tests - For equality of variance
Foundational Concepts:
- Hypothesis Testing Guide - Core principles
- Statistical Significance - Understanding p-values
- Type I and Type II Errors - Statistical errors explained
Advanced Topics:
- Logistic Regression - For multiple categorical predictors
- Non-Parametric Tests - Alternatives to parametric tests
Next Steps
After mastering chi-square tests:
- ANOVA: Compare 3+ continuous groups
- Logistic Regression: Model categorical outcomes
- Non-Parametric Tests: Rank-based alternatives
- Multivariate Analysis: Multiple categorical variables