Introduction to Chi-Square Tests

Chi-square tests (χ²) are used for analyzing categorical (nominal and ordinal) data. They test whether observed frequencies in categories differ significantly from expected frequencies. Chi-square tests are widely used in:

  • Quality control (product defects by category)
  • Market research (customer preferences by demographic)
  • Medical research (disease prevalence by treatment group)
  • Social sciences (behavior patterns across populations)
  • Genetics (phenotype ratios in offspring)

Key Characteristics

  • Tests categorical/nominal data (not continuous)
  • Based on frequency counts, not raw data
  • Uses chi-square distribution (χ²)
  • Non-parametric test (no normality assumption)
  • Always right-tailed test

Section 1: Chi-Square Distribution

Properties of Chi-Square Distribution

The chi-square distribution is a probability distribution with:

  1. Only positive values: χ² ≥ 0 (never negative)
  2. Right-skewed: Long tail to the right
  3. One parameter: Degrees of freedom (df)
  4. Approaches normal: As df increases, distribution becomes more symmetric

Degrees of Freedom (df)

  • Goodness of Fit: df = k - 1 - p

    • k = number of categories
    • p = number of parameters estimated from data
    • Often: df = k - 1 (if no parameters estimated)
  • Test of Independence: df = (r - 1)(c - 1)

    • r = number of rows
    • c = number of columns

Critical Values

Common critical values at α = 0.05:

| df | χ² critical value |
|----|-------------------|
| 1  | 3.841  |
| 2  | 5.991  |
| 3  | 7.815  |
| 4  | 9.488  |
| 5  | 11.070 |
| 6  | 12.592 |
| 7  | 14.067 |
| 10 | 18.307 |
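These critical values can be reproduced programmatically. A minimal sketch, assuming SciPy is available, using `scipy.stats.chi2.ppf` (the inverse CDF) to find the right-tail cutoff:

```python
from scipy.stats import chi2

alpha = 0.05
# Right-tail critical value: the chi-square quantile at 1 - alpha
for df in [1, 2, 3, 4, 5, 6, 7, 10]:
    print(f"df={df:2d}  critical value = {chi2.ppf(1 - alpha, df):.3f}")
```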

Section 2: Chi-Square Goodness of Fit Test

Purpose

Tests whether categorical data fit an expected distribution. Compares observed frequencies (from data) to expected frequencies (from hypothesis).

Hypotheses

  • H₀: The data follow the hypothesized distribution
  • H₁: The data do NOT follow the hypothesized distribution

Assumptions

  1. Random sample: Data randomly collected
  2. Independent observations: Each observation independent
  3. Expected frequencies: At least 80% of categories have expected frequency ≥ 5
    • Equivalently, no more than 20% of categories have expected frequency < 5
    • No category has expected frequency < 1

Formula

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Where:

  • O_i = Observed frequency in category i
  • E_i = Expected frequency in category i
  • k = Number of categories
  • df = k - 1 (or k - 1 - p if estimating parameters)

Calculation Steps

Step 1: State Hypotheses

  • H₀: Data follow hypothesized distribution
  • H₁: Data don’t follow hypothesized distribution
  • Set α (typically 0.05)

Step 2: Calculate Expected Frequencies. For a uniform (equal-probability) distribution: $$E_i = \frac{n}{k}$$

For hypothesized category proportions p_i (e.g., a multinomial model): $$E_i = n \times p_i$$

Step 3: Check Assumptions

  • Verify at least 80% of expected frequencies ≥ 5
  • No expected frequency < 1

Step 4: Calculate Chi-Square Statistic $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Step 5: Find Critical Value

  • Use χ² table with df = k - 1
  • Compare test statistic to critical value

Step 6: Decision

  • If χ² > χ²_critical: Reject H₀
  • If χ² ≤ χ²_critical: Fail to reject H₀

Example 1: Fairness of a Die

Problem: Is a die fair? Roll it 600 times. Expected: 100 per face.

Observed Results:

| Face     | 1   | 2   | 3   | 4   | 5   | 6   |
|----------|-----|-----|-----|-----|-----|-----|
| Observed | 85  | 112 | 98  | 105 | 106 | 94  |
| Expected | 100 | 100 | 100 | 100 | 100 | 100 |

Solution:

Calculate chi-square: $$\chi^2 = \frac{(85-100)^2}{100} + \frac{(112-100)^2}{100} + \frac{(98-100)^2}{100}$$ $$+ \frac{(105-100)^2}{100} + \frac{(106-100)^2}{100} + \frac{(94-100)^2}{100}$$

$$= \frac{225 + 144 + 4 + 25 + 36 + 36}{100} = \frac{470}{100} = 4.70$$

Degrees of freedom: df = 6 - 1 = 5

Critical value at α = 0.05: χ²₀.₀₅(5) = 11.070

Decision: χ² = 4.70 < 11.070, fail to reject H₀

Conclusion: No evidence that the die is unfair (p ≈ 0.45). The observed frequencies are consistent with a fair die.
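As a check, the same goodness-of-fit test can be run in a few lines. A minimal sketch assuming SciPy is installed; `scipy.stats.chisquare` defaults to equal expected frequencies, which matches the fair-die hypothesis:

```python
from scipy.stats import chisquare

observed = [85, 112, 98, 105, 106, 94]   # counts from 600 rolls
# Expected frequencies default to the total spread evenly (100 per face)
result = chisquare(observed)
print(result.statistic)  # 4.70
print(result.pvalue)     # ~0.45, consistent with a fair die
```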

Example 2: Customer Preference

Problem: A store manager thinks customers prefer products equally. Survey 400 customers:

| Product   | A   | B   | C   | D   |
|-----------|-----|-----|-----|-----|
| Preferred | 120 | 85  | 95  | 100 |
| Expected  | 100 | 100 | 100 | 100 |

Solution:

$$\chi^2 = \frac{(120-100)^2}{100} + \frac{(85-100)^2}{100} + \frac{(95-100)^2}{100} + \frac{(100-100)^2}{100}$$

$$= \frac{400 + 225 + 25 + 0}{100} = 6.50$$

df = 4 - 1 = 3; critical value: χ²₀.₀₅(3) = 7.815

Decision: χ² = 6.50 < 7.815, fail to reject H₀

Conclusion: Insufficient evidence that preferences differ (p ≈ 0.09). Data consistent with equal preference assumption.


Section 3: Chi-Square Test of Independence

Purpose

Tests whether two categorical variables are independent (unrelated). Uses data arranged in a contingency table.

Hypotheses

  • H₀: Variables A and B are independent
  • H₁: Variables A and B are NOT independent (associated)

Contingency Tables

A contingency table displays joint frequencies for two categorical variables.

2×2 Contingency Table Example:

|           | Outcome: Yes | Outcome: No | Total |
|-----------|--------------|-------------|-------|
| Treatment | a            | b           | a+b   |
| Control   | c            | d           | c+d   |
| Total     | a+c          | b+d         | n     |

General r × c Table: Rows and columns for any number of categories

Formula

$$\chi^2 = \sum \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Where:

  • O_ij = Observed frequency in cell (i,j)
  • E_ij = Expected frequency in cell (i,j)
  • Summation over all rows (i) and columns (j)

Expected Frequencies

$$E_{ij} = \frac{(\text{Row } i \text{ total}) \times (\text{Column } j \text{ total})}{\text{Grand total}}$$

Calculation Steps

Step 1: Create Contingency Table

  • Arrange observed frequencies
  • Calculate row and column totals

Step 2: Calculate Expected Frequencies $$E_{ij} = \frac{R_i \times C_j}{n}$$

Step 3: Check Assumptions

  • All expected frequencies ≥ 5 (or at least 80% ≥ 5)

Step 4: Calculate Chi-Square $$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Step 5: Find df and Critical Value

  • df = (rows - 1)(columns - 1)
  • Look up critical value

Step 6: Decision

  • If χ² > χ²_critical: Reject H₀ (variables related)
  • If χ² ≤ χ²_critical: Fail to reject H₀ (variables independent)

Example 1: Smoking and Lung Disease

Question: Is smoking status related to lung disease?

Contingency Table (Observed):

|            | Lung Disease | No Disease | Total |
|------------|--------------|------------|-------|
| Smoker     | 150          | 250        | 400   |
| Non-Smoker | 80           | 420        | 500   |
| Total      | 230          | 670        | 900   |

Solution:

Calculate expected frequencies:

$$E_{11} = \frac{400 \times 230}{900} = 102.22$$ $$E_{12} = \frac{400 \times 670}{900} = 297.78$$ $$E_{21} = \frac{500 \times 230}{900} = 127.78$$ $$E_{22} = \frac{500 \times 670}{900} = 372.22$$

Expected Table:

|            | Disease | No Disease | Total |
|------------|---------|------------|-------|
| Smoker     | 102.22  | 297.78     | 400   |
| Non-Smoker | 127.78  | 372.22     | 500   |
| Total      | 230     | 670        | 900   |

Calculate chi-square: $$\chi^2 = \frac{(150-102.22)^2}{102.22} + \frac{(250-297.78)^2}{297.78}$$ $$+ \frac{(80-127.78)^2}{127.78} + \frac{(420-372.22)^2}{372.22}$$

$$= 22.33 + 7.67 + 17.86 + 6.13 = 53.99$$

Degrees of freedom: df = (2-1)(2-1) = 1

Critical value at α = 0.05: χ²₀.₀₅(1) = 3.841

Decision: χ² = 53.99 > 3.841, reject H₀

Conclusion: Strong evidence that smoking and lung disease are related (p < 0.001). Smokers have significantly higher disease prevalence.
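The same test of independence can be reproduced with `scipy.stats.chi2_contingency`, a sketch assuming SciPy is installed; `correction=False` disables the Yates continuity correction so the result matches the hand calculation:

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[150, 250],    # Smoker: disease, no disease
                     [ 80, 420]])   # Non-smoker: disease, no disease

chi2_stat, p_value, df, expected = chi2_contingency(observed, correction=False)
print(chi2_stat, df)   # ≈ 53.99 with df = 1
print(p_value)         # far below 0.001
print(expected)        # matches the expected-frequency table above
```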

Example 2: Product Quality by Manufacturer

Question: Does defect rate differ by manufacturer?

Contingency Table (Observed):

|                | Defective | Acceptable | Total |
|----------------|-----------|------------|-------|
| Manufacturer A | 12        | 138        | 150   |
| Manufacturer B | 8         | 142        | 150   |
| Manufacturer C | 15        | 135        | 150   |
| Total          | 35        | 405        | 440   |

Expected frequencies: Every manufacturer has the same row total (150), so the expected counts are identical for each row: $$E_{\text{defective}} = \frac{150 \times 35}{440} \approx 11.93, \qquad E_{\text{acceptable}} = \frac{150 \times 405}{440} \approx 138.07$$

Calculate chi-square: $$\chi^2 \approx 0.00 + 0.00 + 1.30 + 0.11 + 0.79 + 0.07 \approx 2.27$$

df = (3-1)(2-1) = 2; critical value: χ²₀.₀₅(2) = 5.991

Decision: χ² = 2.27 < 5.991, fail to reject H₀

Conclusion: No significant difference in defect rates among manufacturers (p ≈ 0.32).
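A sketch of the same comparison with SciPy (no continuity correction is applied for tables larger than 2×2):

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[12, 138],   # Manufacturer A: defective, acceptable
                     [ 8, 142],   # Manufacturer B
                     [15, 135]])  # Manufacturer C

chi2_stat, p_value, df, expected = chi2_contingency(observed)
print(chi2_stat, df)  # ≈ 2.27 with df = 2
print(p_value)        # ≈ 0.32, so no evidence of different defect rates
```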


Section 4: Effect Sizes for Chi-Square Tests

Phi Coefficient (φ)

For 2×2 tables only:

$$\phi = \sqrt{\frac{\chi^2}{n}}$$

Interpretation (Cohen’s guidelines):

  • φ ≈ 0.1: Small effect
  • φ ≈ 0.3: Medium effect
  • φ ≈ 0.5: Large effect

Cramér’s V

For r × c tables (any size):

$$V = \sqrt{\frac{\chi^2}{n(k-1)}}$$

Where k = min(r, c), the smaller of the number of rows and columns

Interpretation:

  • V ≈ 0.1: Small effect
  • V ≈ 0.3: Medium effect
  • V ≈ 0.5: Large effect

Example: Effect Size for Smoking Study

$$V = \sqrt{\frac{53.99}{900 \times (2-1)}} = \sqrt{\frac{53.99}{900}} = \sqrt{0.0600} = 0.245$$

This is a small to medium effect - the association is statistically significant and practically meaningful.
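A small helper that computes this effect size directly from a contingency table; a sketch assuming SciPy and NumPy are available (the function name is illustrative, not from any particular library):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V for an r x c contingency table (equals phi for a 2x2 table)."""
    table = np.asarray(table, dtype=float)
    chi2_stat, _, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape)               # k = min(r, c)
    return np.sqrt(chi2_stat / (n * (k - 1)))

smoking = [[150, 250], [80, 420]]
print(cramers_v(smoking))  # ≈ 0.245 for the smoking study
```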


Section 5: Assumptions and Conditions

Required Assumptions

  1. Independence: Observations are independent

    • No observation counted twice
    • Random sampling
  2. Expected Frequency Requirement:

    • Minimum: At least 80% of cells have expected ≥ 5
    • Strict rule: No cell has expected < 5
    • For 2×2: All cells should have expected ≥ 5
  3. Adequate Sample Size:

    • Generally n ≥ 20 for 2×2 tables
    • Larger samples needed for larger tables

What if Assumptions Violated?

| Violation | Solution |
|-----------|----------|
| Low expected frequencies | Combine categories or increase sample size |
| Dependent observations | Use a test that accounts for dependence (e.g., McNemar test for paired data) |
| Small 2×2 table (n < 20) | Use Fisher's exact test (see the sketch below) |
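For small 2×2 tables, Fisher's exact test avoids the large-sample chi-square approximation. A minimal sketch with SciPy; the counts below are hypothetical, purely for illustration:

```python
from scipy.stats import fisher_exact

# Hypothetical small study: rows = treatment/control, columns = success/failure
table = [[8, 2],
         [3, 7]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(odds_ratio, p_value)  # exact p-value, valid even with tiny expected counts
```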

Section 6: McNemar Test

Purpose

Tests whether a paired (related) binary outcome changes between two conditions, for example before and after an intervention. It is the categorical analogue of the paired t-test.

Common Applications:

  • Before-after binary outcomes
  • Matched case-control studies
  • Paired dichotomous measurements

Data Structure

|             | After: Yes | After: No | Total |
|-------------|------------|-----------|-------|
| Before: Yes | a          | b         | a+b   |
| Before: No  | c          | d         | c+d   |
| Total       | a+c        | b+d       | n     |

Only b and c (discordant pairs) matter for the test.

Formula

$$\chi^2 = \frac{(b-c)^2}{b+c}$$

With df = 1

Example

Question: Did an intervention change behavior?

|             | After: Yes | After: No |
|-------------|------------|-----------|
| Before: Yes | 45         | 12        |
| Before: No  | 18         | 25        |

$$\chi^2 = \frac{(12-18)^2}{12+18} = \frac{36}{30} = 1.20$$

Critical value χ²₀.₀₅(1) = 3.841

Decision: χ² = 1.20 < 3.841, fail to reject H₀

Conclusion: No significant change in behavior (p ≈ 0.27).
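A minimal sketch of this McNemar calculation, using only the discordant counts and the chi-square survival function from SciPy:

```python
from scipy.stats import chi2

b, c = 12, 18                        # discordant pairs: Yes->No and No->Yes
statistic = (b - c) ** 2 / (b + c)   # McNemar chi-square statistic
p_value = chi2.sf(statistic, df=1)   # right-tail p-value
print(statistic, p_value)            # 1.20, ~0.27
```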


Section 7: Residuals and Standardized Residuals

Understanding Residuals

Residuals show which cells contribute most to the chi-square statistic.

$$\text{Residual} = O - E$$

$$\text{Standardized Residual} = \frac{O - E}{\sqrt{E}}$$

Interpretation

Cells with |standardized residual| > 2 deviate substantially from what independence would predict.

Example: In the smoking study

  • Smoker + Disease: standardized residual = 4.73 (large positive)
    • More disease cases among smokers than expected under independence
  • Non-Smoker + Disease: standardized residual = -4.23 (large negative)
    • Fewer disease cases among non-smokers than expected under independence
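A sketch that computes the standardized residuals for the smoking table directly from the row and column margins (assuming NumPy is available):

```python
import numpy as np

observed = np.array([[150.0, 250.0],   # Smoker: disease, no disease
                     [ 80.0, 420.0]])  # Non-smoker: disease, no disease

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()   # E_ij = R_i * C_j / n

std_residuals = (observed - expected) / np.sqrt(expected)
print(std_residuals)   # ≈ [[ 4.73, -2.77], [-4.23,  2.48]]
```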

Section 8: Reporting Chi-Square Results

Standard Format

“A chi-square test of independence was performed to examine the relationship between smoking status and lung disease. The results indicated a significant association, χ²(1) = 53.99, p < 0.001, V = 0.24 (small to medium effect).”

Components

  1. Test name: Chi-square test of independence/goodness of fit
  2. Degrees of freedom: (df)
  3. Test statistic: χ² value
  4. P-value: p = value
  5. Effect size: φ or V value
  6. Interpretation: What does it mean?

Full Example Report

“To test whether customer satisfaction depends on income level, a chi-square test of independence was conducted on a sample of 500 customers. The contingency table revealed that 72% of high-income customers were satisfied compared to 48% of low-income customers. This difference was statistically significant, χ²(1) = 24.18, p < 0.001, V = 0.22, suggesting a small to medium association between income and satisfaction.”


Section 9: Limitations and Alternatives

Limitations of Chi-Square

  1. Only tests independence/fit: Doesn’t measure strength directly (need effect size)
  2. Nominal data only: Loses information from ordinal/continuous data
  3. Sample size dependent: Large n can show significance with small effects
  4. Requires categories: Can’t analyze raw continuous data
  5. Aggregated counts only: Works on group-level frequencies, so individual-level patterns and covariates cannot be modeled

Alternatives

| Situation | Alternative |
|-----------|-------------|
| Paired binary outcomes | McNemar test |
| Small 2×2 table (n < 20) | Fisher's exact test |
| Many zero cells | Likelihood ratio test (G-test) |
| Ordered categories | Ordinal association tests |
| Multiple variables | Logistic regression |

Section 10: Common Mistakes and Pitfalls

Mistake 1: Using with Continuous Data

Problem: Applying chi-square to continuous measurements.
Solution: Either categorize the variable (if appropriate) or use a test for continuous data.

Mistake 2: Ignoring Low Expected Frequencies

Problem: Violating the assumption about minimum expected frequencies.
Solution: Check that all expected frequencies are ≥ 5; combine categories if needed.

Mistake 3: Confusing Independence with Causation

Problem: Concluding causation from a significant independence test.
Solution: Remember: association ≠ causation.

Mistake 4: Over-interpretation of Large p-Values

Problem: Claiming “no relationship” from a non-significant result.
Solution: State “insufficient evidence”; consider statistical power.

Mistake 5: Using Wrong Effect Size

Problem: Using the phi coefficient for a non-2×2 table.
Solution: Use Cramér’s V for tables larger than 2×2.


Section 11: Decision Tree for Categorical Data

Analyzing categorical data?
│
├─→ Testing ONE categorical variable against expected?
│   └─→ Chi-Square Goodness of Fit Test
│
└─→ Testing TWO categorical variables for independence?
    ├─→ 2×2 table with n < 20?
    │   └─→ Fisher's Exact Test
    │
    ├─→ Paired/matched outcomes?
    │   └─→ McNemar Test
    │
    └─→ Standard contingency table?
        └─→ Chi-Square Test of Independence


Key Formulas Cheat Sheet

Chi-Square Test Statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Expected Frequency (Independence)

$$E_{ij} = \frac{R_i \times C_j}{n}$$

Degrees of Freedom

$$\text{Goodness of Fit: } df = k - 1$$ $$\text{Independence: } df = (r-1)(c-1)$$

Phi Coefficient (2×2 only)

$$\phi = \sqrt{\frac{\chi^2}{n}}$$

Cramér’s V (any table)

$$V = \sqrt{\frac{\chi^2}{n(k-1)}} \text{ where } k = \min(r, c)$$

McNemar Test

$$\chi^2 = \frac{(b-c)^2}{b+c}, \quad df = 1$$


Summary Table: Chi-Square Tests

| Test | Purpose | Data | df | Formula | Effect Size |
|------|---------|------|----|---------|-------------|
| Goodness of Fit | Fit to a hypothesized distribution | 1 categorical variable | k - 1 | Σ(O - E)²/E | (not typical) |
| Independence | Association between two variables | 2 categorical variables | (r-1)(c-1) | Σ(O - E)²/E over all cells | V or φ |
| McNemar | Change in paired outcomes | Paired binary | 1 | (b - c)²/(b + c) | - |

Related Parametric Tests:

  • Z-Tests - For proportions with large samples
  • T-Tests - For comparing means



Next Steps

After mastering chi-square tests:

  1. ANOVA: Compare 3+ continuous groups
  2. Logistic Regression: Model categorical outcomes
  3. Non-Parametric Tests: Rank-based alternatives
  4. Multivariate Analysis: Multiple categorical variables
