Introduction to ANOVA

ANOVA (Analysis of Variance) is a statistical method for comparing means across three or more independent groups. It extends the t-test (which compares two groups) to handle multiple group comparisons.

Why ANOVA Instead of Multiple T-Tests?

Problem with Multiple T-Tests:

  • Comparing 4 groups requires 6 t-tests: (4 choose 2) = 6 pairs
  • Each t-test has α = 0.05 (5% Type I error)
  • With 6 tests, the family-wise error rate is 1 - (0.95)^6 ≈ 26%: roughly a one-in-four chance of at least one false positive!
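
A quick sanity check of that arithmetic (a minimal Python sketch; alpha and m are just our variable names):

```python
# Chance of at least one false positive across m independent tests,
# each run at significance level alpha.
alpha, m = 0.05, 6
familywise_error = 1 - (1 - alpha) ** m
print(f"{familywise_error:.3f}")  # 0.265 -> about a 26% chance
```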

Solution: ANOVA

  • Single test for all groups simultaneously
  • Controls family-wise error rate at α = 0.05
  • Typically more powerful than Bonferroni-corrected pairwise t-tests
  • Accounts for multiple comparisons

ANOVA Key Features

  • Parametric test: Assumes normality and equal variances
  • Uses F-statistic: Ratio of between-group to within-group variance
  • Based on F-distribution: Different from t-distribution
  • Non-directional: Tests if ANY means differ (not which direction)

When to Use ANOVA

Use when:

  • Comparing 3+ independent groups
  • Continuous outcome variable
  • Approximately normal distributions
  • Equal variances across groups
  • Independent observations

Don’t use when:

  • Only 2 groups (use t-test instead)
  • Categorical outcome (use chi-square)
  • Severely non-normal data (use Kruskal-Wallis)
  • Dependent/paired observations (use repeated measures ANOVA)

Section 1: One-Way ANOVA

Purpose and Hypothesis

Tests whether means differ across multiple levels of a single factor.

Hypotheses:

  • H₀: μ₁ = μ₂ = … = μₖ (All group means are equal)
  • H₁: At least one group mean differs

Assumptions

  1. Independence: Observations within and across groups are independent
  2. Normality: Outcome variable is normally distributed within each group
  3. Homogeneity of Variance: Groups have equal population variances
  4. Continuous data: Outcome is continuous (not categorical)

ANOVA Table

The ANOVA table summarizes variance decomposition:

| Source | Sum of Squares (SS) | df | Mean Square (MS) | F-Statistic |
|---|---|---|---|---|
| Between Groups | SS_B | k - 1 | MS_B = SS_B / df_B | F = MS_B / MS_W |
| Within Groups | SS_W | n - k | MS_W = SS_W / df_W | - |
| Total | SS_T | n - 1 | - | - |

Formulas

Total Sum of Squares: $$SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \overline{x}_{..})^2$$

Between-Group Sum of Squares: $$SS_B = \sum_{i=1}^{k} n_i (\overline{x}_i - \overline{x}_{..})^2$$

Within-Group Sum of Squares: $$SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \overline{x}_i)^2$$

F-Statistic: $$F = \frac{MS_B}{MS_W} = \frac{SS_B/(k-1)}{SS_W/(n-k)}$$

Where:

  • k = number of groups
  • n = total sample size
  • $\overline{x}_i$ = mean of group i
  • $\overline{x}_{..}$ = grand mean (overall mean)
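
The decomposition is easy to verify numerically. A minimal NumPy sketch (the three arrays are made-up data, not from any example in this article):

```python
import numpy as np

# Made-up raw data for k = 3 groups.
groups = [np.array([4.1, 5.0, 5.9, 4.7]),
          np.array([6.2, 7.1, 6.8, 7.5]),
          np.array([5.5, 5.9, 6.4, 6.1])]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_t = ((np.concatenate(groups) - grand_mean) ** 2).sum()
assert np.isclose(ss_t, ss_b + ss_w)  # SS_T = SS_B + SS_W

f_stat = (ss_b / (k - 1)) / (ss_w / (n - k))
print(f"F = {f_stat:.3f}")
```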

Step-by-Step Procedure

Step 1: State Hypotheses

  • H₀: All group means equal
  • H₁: At least one mean differs
  • α = 0.05

Step 2: Check Assumptions

  • Independence verified?
  • Normality (Q-Q plot or Shapiro-Wilk)?
  • Levene’s test for equal variances?

Step 3: Calculate Summary Statistics

  • Group means: $\overline{x}_i$
  • Group variances: s_i²
  • Grand mean: $\overline{x}_{..}$

Step 4: Calculate Sums of Squares

  • SS_B, SS_W, SS_T
  • Complete ANOVA table

Step 5: Calculate F-Statistic

  • F = MS_B / MS_W

Step 6: Find P-Value

  • Use F-distribution with df₁ = k-1, df₂ = n-k
  • P-value = P(F > F_calc)

Step 7: Decision

  • If p ≤ α: Reject H₀ (means differ significantly)
  • If p > α: Fail to reject H₀ (insufficient evidence)
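
In practice, steps 3-6 are usually delegated to software. A sketch of the whole procedure using scipy (the three arrays are placeholder data):

```python
import numpy as np
from scipy import stats

# Placeholder data for three independent groups.
g1 = np.array([23, 25, 21, 27, 24])
g2 = np.array([30, 28, 32, 29, 31])
g3 = np.array([26, 24, 28, 25, 27])

# Step 2: assumption checks (normality per group, equal variances).
for g in (g1, g2, g3):
    print("Shapiro-Wilk p =", round(stats.shapiro(g).pvalue, 3))
print("Levene p =", round(stats.levene(g1, g2, g3).pvalue, 3))

# Steps 5-6: F-statistic and p-value in one call.
f_stat, p_value = stats.f_oneway(g1, g2, g3)

# Step 7: decision at alpha = 0.05.
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value <= 0.05 else "Fail to reject H0")
```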

Example: One-Way ANOVA

Problem: Compare average salaries across three departments.

Data:

  • Engineering: n=8, mean=$85k, SD=$8k
  • Marketing: n=10, mean=$72k, SD=$6k
  • Sales: n=12, mean=$78k, SD=$7k
  • Overall: n=30, grand mean = $77.87k (weighted by group size: 2336/30)

Calculate Sums of Squares:

Within groups: $$SS_W = (8-1) \times 8^2 + (10-1) \times 6^2 + (12-1) \times 7^2 = 448 + 324 + 539 = 1311$$

Between groups: $$SS_B = 8(85-77.87)^2 + 10(72-77.87)^2 + 12(78-77.87)^2$$ $$= 8(7.13)^2 + 10(-5.87)^2 + 12(0.13)^2 \approx 406.7 + 344.6 + 0.2 = 751.5$$

Total: SS_T = SS_B + SS_W = 751.5 + 1311 = 2062.5

ANOVA Table:

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 751.5 | 2 | 375.75 | 7.74 |
| Within | 1311 | 27 | 48.56 | - |
| Total | 2062.5 | 29 | - | - |

Decision:

  • F = 7.74 with df₁ = 2, df₂ = 27
  • P-value ≈ 0.002
  • Since p < 0.05, reject H₀

Conclusion: Department salaries differ significantly (p ≈ 0.002). Follow up with post-hoc tests to identify which departments differ.
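
The example can be reproduced from the summary statistics alone; a minimal sketch using scipy's F-distribution:

```python
from scipy import stats

# Summary statistics from the example: (n, mean in $k, SD in $k).
groups = [(8, 85.0, 8.0), (10, 72.0, 6.0), (12, 78.0, 7.0)]

k = len(groups)
n = sum(ni for ni, _, _ in groups)
grand_mean = sum(ni * mi for ni, mi, _ in groups) / n  # ≈ 77.87

ss_b = sum(ni * (mi - grand_mean) ** 2 for ni, mi, _ in groups)
ss_w = sum((ni - 1) * si ** 2 for ni, _, si in groups)

f_stat = (ss_b / (k - 1)) / (ss_w / (n - k))
p_value = stats.f.sf(f_stat, k - 1, n - k)  # upper-tail area of F(2, 27)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # F ≈ 7.74, p ≈ 0.002
```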


Section 2: Post-Hoc Tests

Purpose

ANOVA tells you IF means differ, but not WHICH means differ. Post-hoc tests make pairwise comparisons while controlling family-wise error rate.

Tukey’s Honest Significant Difference (HSD)

Most common post-hoc test

$$HSD = q_\alpha \times \sqrt{\frac{MS_W}{2}} \times \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

Where:

  • q_α = critical value from Studentized range distribution
  • MS_W = within-group mean square from ANOVA

Decision Rule: If |mean_i - mean_j| > HSD, the two means differ significantly.

Other Post-Hoc Tests

| Test | When to Use | Characteristics |
|---|---|---|
| Tukey HSD | Equal sample sizes, general use | Good balance of power and error control; most widely used |
| Scheffé | Unequal sample sizes, flexible comparisons | More conservative; safe for any contrast |
| Bonferroni | Few planned comparisons | Simple correction; very conservative |
| Dunnett | Comparing to a control only | More powerful for control comparisons |

Example: Tukey Post-Hoc

From salary example:

  • Engineering: $85k
  • Marketing: $72k
  • Sales: $78k

Pairwise Comparisons:

  1. Engineering vs Marketing: $85k - $72k = $13k
  2. Engineering vs Sales: $85k - $78k = $7k
  3. Sales vs Marketing: $78k - $72k = $6k

Tukey HSD (assuming equal n = 10): $$HSD = 3.51 \times \sqrt{\frac{48.56}{2}} \times \sqrt{\frac{2}{10}} = 3.51 \times 4.93 \times 0.447 \approx 7.73$$ (i.e., about $7.73k)

Conclusions:

  • Eng vs Mkt: $13k > $7.73k → Significantly different ✓
  • Eng vs Sales: $7k < $7.73k → Not significantly different
  • Mkt vs Sales: $6k < $7.73k → Not significantly different
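
A sketch of this calculation (scipy ≥ 1.7 ships the Studentized range distribution; the numbers come from the worked example above):

```python
import numpy as np
from scipy.stats import studentized_range

k, df_w, ms_w = 3, 27, 48.56   # number of groups, error df, MS_W
n_i = n_j = 10                 # assumed equal group sizes

# Critical value q for alpha = 0.05 with k groups and df_W error df.
q_crit = studentized_range.ppf(0.95, k, df_w)  # ≈ 3.51

hsd = q_crit * np.sqrt((ms_w / 2) * (1 / n_i + 1 / n_j))
print(f"HSD ≈ {hsd:.2f}")  # ≈ 7.73 (in $k)

for pair, diff in [("Eng vs Mkt", 13.0), ("Eng vs Sales", 7.0),
                   ("Mkt vs Sales", 6.0)]:
    print(pair, "significant" if diff > hsd else "not significant")
```

For raw data, statsmodels' pairwise_tukeyhsd function runs all pairwise comparisons and reports adjusted p-values directly.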

Section 3: Two-Way ANOVA

Purpose

Tests effects of TWO factors (independent variables) on one outcome.

Research Questions:

  • Do factor A and factor B independently affect the outcome?
  • Is there an interaction (synergistic effect) between factors?

Example Design

Study: Comparing teaching methods (3 levels) and student gender (2 levels) on test scores

  • Factor A (Teaching Method): Traditional, Online, Hybrid (3 levels)
  • Factor B (Gender): Male, Female (2 levels)
  • Outcome: Test score (continuous)

Hypotheses

  • H₀(A): Factor A main effect = 0 (levels of A don’t differ)
  • H₀(B): Factor B main effect = 0 (levels of B don’t differ)
  • H₀(AB): Interaction effect = 0 (factors don’t interact)

Two-Way ANOVA Table

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Factor A | SS_A | a - 1 | MS_A | F_A = MS_A / MS_E |
| Factor B | SS_B | b - 1 | MS_B | F_B = MS_B / MS_E |
| Interaction (A×B) | SS_AB | (a-1)(b-1) | MS_AB | F_AB = MS_AB / MS_E |
| Error (Within) | SS_E | n - ab | MS_E | - |
| Total | SS_T | n - 1 | - | - |

Where:

  • a = levels of factor A
  • b = levels of factor B
  • n = total sample size
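
A sketch of producing this table with statsmodels (the data frame, scores, and column names are invented for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Invented test scores: 2 observations per method x gender cell.
df = pd.DataFrame({
    "score":  [78, 82, 75, 88, 91, 85, 70, 74, 69, 83, 86, 80],
    "method": ["Traditional", "Traditional", "Online", "Online",
               "Hybrid", "Hybrid"] * 2,
    "gender": ["M"] * 6 + ["F"] * 6,
})

# 'C(method) * C(gender)' expands to both main effects plus the interaction.
model = smf.ols("score ~ C(method) * C(gender)", data=df).fit()
print(anova_lm(model, typ=2))  # SS, df, F, and p for A, B, and A:B
```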

Interpreting Interactions

Main Effects Only (No Interaction):

  • Effect of A same at all levels of B
  • Effect of B same at all levels of A
  • Parallel lines in interaction plot

Interaction Present:

  • Effect of A differs by level of B
  • Non-parallel lines in interaction plot
  • Can’t interpret main effects alone

When Interaction is Significant

  • Analyze simple effects (effect of A at each level of B)
  • Don’t interpret main effects in isolation
  • Graph the interaction to understand pattern

Section 4: Repeated Measures ANOVA

Purpose

Tests the effect of a factor when the same subjects are measured multiple times.

Common Scenarios:

  • Pretest, posttest, follow-up on same subjects
  • Multiple treatments applied to same subjects
  • Growth measurements over time

Advantages

  • Subjects are their own control (reduces individual differences)
  • More powerful than between-subjects design
  • Requires smaller sample size

Sphericity Assumption

Sphericity: The variances of differences between repeated measures are equal across all pairs.

Mauchly’s Test: Tests sphericity

  • p > 0.05: Assume sphericity (use standard F)
  • p ≤ 0.05: Sphericity violated (use an adjusted F, e.g., Greenhouse-Geisser)

Adjustments for Sphericity Violation

If sphericity is violated, keep the F-statistic as computed but shrink its degrees of freedom:

$$df_{\text{adjusted}} = \varepsilon \times df_{\text{original}}$$

Where ε (epsilon, a correction factor ≤ 1) comes from the Greenhouse-Geisser or Huynh-Feldt method.
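
A sketch using statsmodels' AnovaRM (the long-format data frame is invented; note that AnovaRM reports the uncorrected F, so ε-corrections must be applied separately or via a package such as pingouin):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Invented long-format data: four subjects, each measured three times.
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "time":    ["pre", "post", "follow"] * 4,
    "score":   [10, 14, 13, 8, 12, 11, 12, 15, 16, 9, 11, 12],
})

# One within-subject factor; each subject serves as their own control.
print(AnovaRM(df, depvar="score", subject="subject", within=["time"]).fit())
```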


Section 5: Assumptions and Diagnostics

Checking Normality

Methods:

  • Q-Q plot (points near diagonal = normal)
  • Histogram (bell-shaped = normal)
  • Shapiro-Wilk test (p > 0.05: no evidence of non-normality)
  • Residuals plot (randomly scattered = good)
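
A sketch of the first two checks (the sample x is simulated):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=50, scale=5, size=40)  # simulated sample

# Shapiro-Wilk: small p-values signal departure from normality.
w, p = stats.shapiro(x)
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.3f}")

# Q-Q plot: points near the diagonal suggest normality.
stats.probplot(x, dist="norm", plot=plt)
plt.show()
```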

Checking Homogeneity of Variance

Levene’s Test:

  • H₀: Variances are equal
  • p > 0.05: Assume equal variances ✓
  • p ≤ 0.05: Variances unequal; use Welch’s ANOVA
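
A sketch with scipy (the groups are simulated, with the third deliberately more variable):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1, g2, g3 = (rng.normal(0, s, 15) for s in (1.0, 1.1, 3.0))  # unequal spreads

# H0: equal variances. Median centering (Brown-Forsythe) is scipy's default.
w, p = stats.levene(g1, g2, g3, center="median")
print(f"Levene: W = {w:.2f}, p = {p:.4f}")
# p <= 0.05 here points to Welch's ANOVA rather than the standard F-test.
```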

Checking Independence

Verification:

  • Different subjects in each group? ✓
  • No repeated measures? ✓
  • Random assignment/sampling? ✓
  • Residuals not autocorrelated? ✓

Residuals Analysis

Plot residuals vs fitted values:

  • No pattern → Assumptions met ✓
  • Funnel shape → Heterogeneity of variance ✗
  • Curved pattern → Non-linearity ✗
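
A sketch of that plot for a fitted one-way model (the data are invented; any fitted ANOVA model works the same way):

```python
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.formula.api as smf

# Invented one-factor data.
df = pd.DataFrame({
    "y":     [5.1, 4.8, 5.5, 7.2, 7.9, 6.8, 6.0, 6.3, 5.7],
    "group": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
})
model = smf.ols("y ~ C(group)", data=df).fit()

# Residuals vs fitted: look for no pattern (good), a funnel, or curvature.
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```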

Section 6: Effect Sizes

Eta-Squared (η²)

Proportion of variance in outcome explained by factor(s):

$$\eta^2 = \frac{SS_{\text{factor}}}{SS_{\text{total}}}$$

Interpretation:

  • η² ≈ 0.01: Small effect
  • η² ≈ 0.06: Medium effect
  • η² ≈ 0.14: Large effect

Omega-Squared (ω²)

A less biased estimate than η², especially in small samples:

$$\omega^2 = \frac{SS_{\text{factor}} - (df_{\text{factor}}) \times MS_E}{SS_{\text{total}} + MS_E}$$

Partial Eta-Squared (Partial η²)

Used in multi-factor designs:

$$\text{Partial } \eta^2 = \frac{SS_{\text{factor}}}{SS_{\text{factor}} + SS_E}$$
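
All three measures can be read directly off an ANOVA table; a minimal sketch using the salary example's numbers:

```python
# Values from the one-way salary ANOVA table above.
ss_factor, ss_total, df_factor, ms_error = 751.5, 2062.5, 2, 48.56
ss_error = ss_total - ss_factor

eta_sq = ss_factor / ss_total                                          # ≈ 0.36
omega_sq = (ss_factor - df_factor * ms_error) / (ss_total + ms_error)  # ≈ 0.31
partial_eta_sq = ss_factor / (ss_factor + ss_error)  # = eta² in one-way designs

print(f"eta² = {eta_sq:.2f}, omega² = {omega_sq:.2f}, "
      f"partial eta² = {partial_eta_sq:.2f}")
```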


Section 7: Violations and Alternatives

Normality Violated

ANOVA is robust to moderate non-normality if:

  • Large sample (n > 20 per group)
  • Distributions similar shape
  • Equal sample sizes

Use Kruskal-Wallis test if:

  • Severely non-normal
  • Small samples
  • Ordinal data
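
A sketch with scipy (the samples are placeholders):

```python
from scipy import stats

# Placeholder skewed samples.
g1 = [1, 2, 2, 3, 5]
g2 = [4, 4, 5, 6, 8]
g3 = [2, 3, 3, 4, 5]

# Rank-based analogue of one-way ANOVA; no normality assumption.
h_stat, p_value = stats.kruskal(g1, g2, g3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```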

Homogeneity Violated

Standard ANOVA unreliable

Use Welch’s ANOVA:

  • More robust to unequal variances
  • Still requires normality
  • Adjusted F-statistic
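
scipy has no built-in Welch ANOVA (pingouin's welch_anova is one packaged option); a minimal sketch implementing Welch's (1951) statistic directly from the textbook formulas:

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA for groups with unequal variances."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])

    w = n / v                              # precision weights
    mw = np.sum(w * m) / np.sum(w)         # weighted grand mean
    a = np.sum(w * (m - mw) ** 2) / (k - 1)
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    f = a / b
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return f, stats.f.sf(f, df1, df2)

f, p = welch_anova([3.1, 2.8, 3.5, 3.0],
                   [4.2, 5.9, 3.1, 6.4],
                   [2.2, 2.4, 2.1, 2.5])
print(f"Welch F = {f:.2f}, p = {p:.4f}")
```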

Independence Violated

If same subjects measured multiple times:

  • Use Repeated Measures ANOVA

If clusters/grouping in data:

  • Use Mixed Models
  • Account for clustering structure
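
A sketch with statsmodels' MixedLM, fitting a random intercept per cluster (the data and column names are invented):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented data: students (rows) clustered within schools.
df = pd.DataFrame({
    "score":  [70, 74, 68, 76, 85, 82, 88, 80, 60, 63, 66, 61],
    "method": ["A", "B", "A", "B"] * 3,
    "school": ["s1"] * 4 + ["s2"] * 4 + ["s3"] * 4,
})

# Random intercept per school accounts for the clustering structure.
model = smf.mixedlm("score ~ method", data=df, groups=df["school"]).fit()
print(model.summary())
```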

Section 8: Common Mistakes

Mistake 1: Multiple T-Tests Without Correction

Problem: Inflates the family-wise error rate.
Solution: Use ANOVA followed by post-hoc tests.

Mistake 2: Interpreting Main Effects with Interaction

Problem: Main effects are misleading if an interaction is present.
Solution: Examine the interaction first; interpret main effects only if there is no interaction.

Mistake 3: Not Checking Assumptions

Problem: Violating assumptions invalidates results.
Solution: Always check normality and homogeneity of variance.

Mistake 4: Using ANOVA with 2 Groups

Problem: ANOVA is redundant; a t-test is simpler.
Solution: Use a t-test for 2 groups.

Mistake 5: Ignoring Effect Size

Problem: A p-value doesn’t indicate practical significance.
Solution: Report η² or ω² along with the p-value.


Section 9: Reporting ANOVA Results

Standard Format

“A one-way ANOVA was conducted to examine the effect of [factor] on [outcome]. Results indicated a significant effect, F(df_between, df_within) = F_value, p = probability, η² = effect_size.”

Full Example Report

“A one-way ANOVA was conducted to test whether salaries differ across three departments. Results showed a significant difference in mean salaries, F(2, 27) = 7.74, p = .002, η² = 0.36 (large effect). Tukey HSD comparisons indicated that Engineering salaries (M = $85k) were significantly higher than Marketing salaries (M = $72k), p < .05. Sales salaries (M = $78k) did not differ significantly from either group.”

Components

  1. Test name: One-way ANOVA, Two-way ANOVA, etc.
  2. Degrees of freedom: Between and within
  3. F-statistic value: F = XXX
  4. P-value: p = value
  5. Effect size: η² = value
  6. Post-hoc results: Which groups differ?
  7. Practical interpretation: What does it mean?

Decision Tree: When to Use What

Comparing group means?
│
├─→ 2 groups?
│   └─→ T-Test
│
└─→ 3+ groups?
    ├─→ Same subjects measured multiple times?
    │   └─→ Repeated Measures ANOVA
    │
    └─→ Different independent groups?
        ├─→ Normality violated, small n?
        │   └─→ Kruskal-Wallis Test
        │
        ├─→ Unequal variances?
        │   └─→ Welch's ANOVA
        │
        └─→ Meets assumptions?
            └─→ Standard One-Way ANOVA


Key Formulas Cheat Sheet

One-Way ANOVA F-Statistic

$$F = \frac{MS_B}{MS_W} = \frac{SS_B/(k-1)}{SS_W/(n-k)}$$

Tukey HSD

$$HSD = q_\alpha \sqrt{\frac{MS_W}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Eta-Squared

$$\eta^2 = \frac{SS_{\text{factor}}}{SS_{\text{total}}}$$

Omega-Squared

$$\omega^2 = \frac{SS_{\text{factor}} - df_{\text{factor}} \times MS_E}{SS_{\text{total}} + MS_E}$$


Summary Table: Types of ANOVA

| ANOVA Type | Factors | Samples | When to Use |
|---|---|---|---|
| One-Way | 1 factor | Independent | 3+ independent groups |
| Two-Way | 2 factors | Independent | Two factors; interactions of interest |
| Repeated Measures | 1 factor | Dependent | Same subjects measured repeatedly |
| Mixed | 2+ factors | Mix | Combination designs |


Next Steps

After mastering ANOVA:

  1. Post-Hoc Tests: Detailed comparisons (Tukey, Dunnett, Scheffé)
  2. Regression Analysis: More flexible modeling of groups
  3. Non-Parametric Tests: Alternatives when assumptions violated
  4. Mixed Models: Complex designs with multiple factors
