Introduction to ANOVA
ANOVA (Analysis of Variance) is a statistical method for comparing means across three or more independent groups. It extends the t-test (which compares two groups) to handle multiple group comparisons.
Why ANOVA Instead of Multiple T-Tests?
Problem with Multiple T-Tests:
- Comparing 4 groups requires 6 t-tests: (4 choose 2) = 6 pairs
- Each t-test has α = 0.05 (5% Type I error)
- With 6 tests: Family-wise error = 1 - (0.95)^6 ≈ 26% chance of at least one false positive (verified in the sketch below)!
Solution: ANOVA
- Single test for all groups simultaneously
- Controls family-wise error rate at α = 0.05
- More powerful than Bonferroni-corrected pairwise t-tests
- Accounts for multiple comparisons
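As a quick check of the error-rate arithmetic above, a few lines of Python (standard library only) reproduce the 26% figure for any number of groups:

```python
from math import comb

alpha = 0.05                 # per-test Type I error rate
k = 4                        # number of groups
m = comb(k, 2)               # pairwise t-tests needed: C(4, 2) = 6
fwer = 1 - (1 - alpha) ** m  # P(at least one false positive)

print(f"{m} tests -> family-wise error rate = {fwer:.3f}")  # 0.265
```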
ANOVA Key Features
- Parametric test: Assumes normality and equal variances
- Uses F-statistic: Ratio of between-group to within-group variance
- Based on F-distribution: Different from t-distribution
- Non-directional: Tests if ANY means differ (not which direction)
When to Use ANOVA
✓ Use when:
- Comparing 3+ independent groups
- Continuous outcome variable
- Approximately normal distributions
- Equal variances across groups
- Independent observations
✗ Don’t use when:
- Only 2 groups (use t-test instead)
- Categorical outcome (use chi-square)
- Severely non-normal data (use Kruskal-Wallis)
- Dependent/paired observations (use repeated measures ANOVA)
Section 1: One-Way ANOVA
Purpose and Hypothesis
Tests whether means differ across multiple levels of a single factor.
Hypotheses:
- H₀: μ₁ = μ₂ = … = μₖ (All group means are equal)
- H₁: At least one group mean differs
Assumptions
- Independence: Observations within and across groups are independent
- Normality: Outcome variable is normally distributed within each group
- Homogeneity of Variance: Groups have equal population variances
- Continuous data: Outcome is continuous (not categorical)
ANOVA Table
The ANOVA table summarizes variance decomposition:
| Source | Sum of Squares (SS) | df | Mean Square (MS) | F-Statistic |
|---|---|---|---|---|
| Between Groups | SS_B | k - 1 | MS_B = SS_B/df_B | F = MS_B/MS_W |
| Within Groups | SS_W | n - k | MS_W = SS_W/df_W | - |
| Total | SS_T | n - 1 | - | - |
Formulas
Total Sum of Squares: $$SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \overline{x}_{..})^2$$
Between-Group Sum of Squares: $$SS_B = \sum_{i=1}^{k} n_i (\overline{x}_i - \overline{x}_{..})^2$$
Within-Group Sum of Squares: $$SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \overline{x}_i)^2$$
F-Statistic: $$F = \frac{MS_B}{MS_W} = \frac{SS_B/(k-1)}{SS_W/(n-k)}$$
Where:
- k = number of groups
- n = total sample size
- $\overline{x}_i$ = mean of group i
- $\overline{x}_{..}$ = grand mean (overall mean)
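These formulas map directly onto code. Below is a minimal sketch with NumPy/SciPy; the `one_way_anova` helper and the synthetic data are illustrative, and SciPy's built-in `f_oneway` is printed alongside as a cross-check:

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """Compute F and p from raw data using the SS decomposition above."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()

    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    p_value = stats.f.sf(f_stat, k - 1, n - k)  # P(F > F_calc)
    return f_stat, p_value

# Synthetic data; scipy's built-in f_oneway should give identical results
rng = np.random.default_rng(0)
a, b, c = (rng.normal(mu, 1.0, 20) for mu in (0.0, 0.5, 1.0))
print(one_way_anova(a, b, c))
print(stats.f_oneway(a, b, c))
```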
Step-by-Step Procedure
Step 1: State Hypotheses
- H₀: All group means equal
- H₁: At least one mean differs
- α = 0.05
Step 2: Check Assumptions
- Independence verified?
- Normality (Q-Q plot or Shapiro-Wilk)?
- Levene’s test for equal variances?
Step 3: Calculate Summary Statistics
- Group means: $\overline{x}_i$
- Group variances: s_i²
- Grand mean: $\overline{x}_{..}$
Step 4: Calculate Sums of Squares
- SS_B, SS_W, SS_T
- Complete ANOVA table
Step 5: Calculate F-Statistic
- F = MS_B / MS_W
Step 6: Find P-Value
- Use F-distribution with df₁ = k-1, df₂ = n-k
- P-value = P(F > F_calc)
Step 7: Decision
- If p ≤ α: Reject H₀ (means differ significantly)
- If p > α: Fail to reject H₀ (insufficient evidence)
Example: One-Way ANOVA
Problem: Compare average salaries across three departments.
Data:
- Engineering: n=8, mean=$85k, SD=$8k
- Marketing: n=10, mean=$72k, SD=$6k
- Sales: n=12, mean=$78k, SD=$7k
- Overall: n=30, grand mean ≈ $77.87k (the size-weighted mean of the group means)
Calculate Sums of Squares:
Within groups: $$SS_W = (8-1) \times 8^2 + (10-1) \times 6^2 + (12-1) \times 7^2 = 448 + 324 + 539 = 1311$$
Between groups: $$SS_B = 8(85-77.87)^2 + 10(72-77.87)^2 + 12(78-77.87)^2$$ $$= 8(7.13)^2 + 10(-5.87)^2 + 12(0.13)^2 \approx 407.1 + 344.2 + 0.2 = 751.5$$
Total: SS_T = SS_B + SS_W = 751.5 + 1311 = 2062.5
ANOVA Table:
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 751.5 | 2 | 375.75 | 7.74 |
| Within | 1311 | 27 | 48.56 | - |
| Total | 2062.5 | 29 | - | - |
Decision:
- F = 7.74 with df₁ = 2, df₂ = 27
- P-value ≈ 0.002
- Since p < 0.05, reject H₀
Conclusion: Department salaries differ significantly (p ≈ 0.002). Follow up with post-hoc tests to identify which departments differ.
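Since only summary statistics (n, mean, SD per group) are given, the whole table can be reproduced in a few lines of Python; the lists below simply restate the data above:

```python
from scipy import stats

# Summary statistics from the example above (salaries in $k)
ns, means, sds = [8, 10, 12], [85.0, 72.0, 78.0], [8.0, 6.0, 7.0]
n, k = sum(ns), len(ns)

grand_mean = sum(ni * mi for ni, mi in zip(ns, means)) / n            # ~77.87
ss_b = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(ns, means))  # ~751.5
ss_w = sum((ni - 1) * si ** 2 for ni, si in zip(ns, sds))             # 1311
f_stat = (ss_b / (k - 1)) / (ss_w / (n - k))                          # ~7.74
p_value = stats.f.sf(f_stat, k - 1, n - k)                            # ~0.002

print(f"SS_B={ss_b:.1f}, SS_W={ss_w:.1f}, F={f_stat:.2f}, p={p_value:.4f}")
```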
Section 2: Post-Hoc Tests
Purpose
ANOVA tells you IF means differ, but not WHICH means differ. Post-hoc tests make pairwise comparisons while controlling family-wise error rate.
Tukey’s Honest Significant Difference (HSD)
Most common post-hoc test
$$HSD = q_\alpha \times \sqrt{\frac{MS_W}{2}} \times \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$
Where:
- q_α = critical value from the Studentized range distribution (depends on the number of groups k and df_W)
- MS_W = within-group mean square from ANOVA
Decision Rule: If |mean_i - mean_j| > HSD, the two means differ significantly.
Other Post-Hoc Tests
| Test | When to Use | Characteristics |
|---|---|---|
| Tukey HSD | Equal sample sizes, general use | Balanced; most powerful |
| Scheffé | Unequal sample sizes, flexible | More conservative; safe |
| Bonferroni | Few planned comparisons | Simple correction; very conservative |
| Dunnett | Comparing to control only | More powerful for control comparison |
Example: Tukey Post-Hoc
From salary example:
- Engineering: $85k
- Marketing: $72k
- Sales: $78k
Pairwise Comparisons:
- Engineering vs Marketing: $85k - $72k = $13k
- Engineering vs Sales: $85k - $78k = $7k
- Sales vs Marketing: $78k - $72k = $6k
Tukey HSD (assuming equal group sizes of n = 10 for illustration; with the actual unbalanced groups, each pair's n_i and n_j would enter the formula; the sketch below reproduces this computation): $$HSD = 3.51 \times \sqrt{\frac{48.56}{2}} \times \sqrt{\frac{2}{10}} = 3.51 \times 4.93 \times 0.447 ≈ 7.73k$$
Conclusions:
- Eng vs Mkt: $13k > $7.73k → Significantly different ✓
- Eng vs Sales: $7k < $7.73k → Not significantly different
- Mkt vs Sales: $6k < $7.73k → Not significantly different
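A short SciPy sketch reproduces the HSD and the three verdicts (it assumes SciPy ≥ 1.7 for `studentized_range`, and keeps the illustrative equal n = 10):

```python
import math
from scipy.stats import studentized_range  # available in SciPy >= 1.7

ms_w, df_w, k, n = 48.56, 27, 3, 10        # n = 10 is the illustrative equal size
q_crit = studentized_range.ppf(0.95, k, df_w)          # ~3.51

hsd = q_crit * math.sqrt(ms_w / 2 * (1 / n + 1 / n))   # ~7.73
print(f"q = {q_crit:.2f}, HSD = {hsd:.2f}")

for pair, diff in [("Eng vs Mkt", 13), ("Eng vs Sales", 7), ("Sales vs Mkt", 6)]:
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{pair}: |diff| = {diff} -> {verdict}")
```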
Section 3: Two-Way ANOVA
Purpose
Tests effects of TWO factors (independent variables) on one outcome.
Research Questions:
- Do factor A and factor B independently affect the outcome?
- Is there an interaction (synergistic effect) between factors?
Example Design
Study: Comparing teaching methods (3 levels) and student gender (2 levels) on test scores
- Factor A (Teaching Method): Traditional, Online, Hybrid (3 levels)
- Factor B (Gender): Male, Female (2 levels)
- Outcome: Test score (continuous)
Hypotheses
- H₀(A): Factor A main effect = 0 (levels of A don’t differ)
- H₀(B): Factor B main effect = 0 (levels of B don’t differ)
- H₀(AB): Interaction effect = 0 (factors don’t interact)
Two-Way ANOVA Table
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Factor A | SS_A | a-1 | MS_A | F_A |
| Factor B | SS_B | b-1 | MS_B | F_B |
| Interaction (AB) | SS_AB | (a-1)(b-1) | MS_AB | F_AB |
| Error (Within) | SS_E | n-ab | MS_E | - |
| Total | SS_T | n-1 | - | - |
Where:
- a = levels of factor A
- b = levels of factor B
- n = total sample size
Interpreting Interactions
Main Effects Only (No Interaction):
- Effect of A same at all levels of B
- Effect of B same at all levels of A
- Parallel lines in interaction plot
Interaction Present:
- Effect of A differs by level of B
- Non-parallel lines in interaction plot
- Can’t interpret main effects alone
When Interaction is Significant
- Analyze simple effects (effect of A at each level of B)
- Don’t interpret main effects in isolation
- Graph the interaction to understand pattern
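In practice, two-way ANOVA is computed with software rather than by hand. A minimal sketch using `statsmodels` (the 12-row data frame is invented for illustration) yields a table of exactly this form:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Invented long-format data: one row per student
df = pd.DataFrame({
    "score":  [72, 75, 78, 80, 85, 88, 70, 74, 79, 83, 84, 90],
    "method": ["Traditional", "Traditional", "Online", "Online",
               "Hybrid", "Hybrid"] * 2,
    "gender": ["M"] * 6 + ["F"] * 6,
})

# C(...) marks categorical factors; '*' expands to main effects + interaction
model = smf.ols("score ~ C(method) * C(gender)", data=df).fit()
print(anova_lm(model, typ=2))  # rows for method, gender, method:gender, residual
```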
Section 4: Repeated Measures ANOVA
Purpose
Tests the effect of a factor when the same subjects are measured multiple times.
Common Scenarios:
- Pretest, posttest, follow-up on same subjects
- Multiple treatments applied to same subjects
- Growth measurements over time
Advantages
- Subjects serve as their own controls (reduces error from individual differences)
- More powerful than a between-subjects design
- Requires a smaller sample size for the same power
Sphericity Assumption
Sphericity: The variances of differences between repeated measures are equal across all pairs.
Mauchly’s Test: Tests sphericity
- p > 0.05: Assume sphericity (use standard F)
- p ≤ 0.05: Sphericity violated (use an adjusted F, e.g., Greenhouse-Geisser)
Adjustments for Sphericity Violation
If sphericity is violated, the F statistic itself is unchanged; instead, both degrees of freedom are multiplied by ε before the p-value is looked up:
$$df_1^{\text{adj}} = \varepsilon \times df_1, \qquad df_2^{\text{adj}} = \varepsilon \times df_2$$
Where ε (epsilon) comes from Greenhouse-Geisser or Huynh-Feldt correction.
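A small sketch of the adjustment; the F, df, and ε values are invented purely for illustration:

```python
from scipy import stats

f_stat = 5.20          # hypothetical repeated-measures F
df1, df2 = 3, 27       # unadjusted df: (k-1) and (k-1)(n-1)
epsilon = 0.72         # hypothetical Greenhouse-Geisser estimate

p_raw = stats.f.sf(f_stat, df1, df2)                      # sphericity assumed
p_adj = stats.f.sf(f_stat, epsilon * df1, epsilon * df2)  # same F, shrunken df
print(f"p (unadjusted) = {p_raw:.4f}, p (G-G adjusted) = {p_adj:.4f}")
```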
Section 5: Assumptions and Diagnostics
Checking Normality
Methods:
- Q-Q plot (points near diagonal = normal)
- Histogram (bell-shaped = normal)
- Shapiro-Wilk test (p > 0.05: no evidence against normality)
- Residuals plot (randomly scattered = good)
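For example, Shapiro-Wilk can be run per group with SciPy (synthetic data shown):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = {"A": rng.normal(50, 5, 25), "B": rng.normal(55, 5, 25)}

for name, g in groups.items():
    w, p = stats.shapiro(g)  # H0: the sample comes from a normal distribution
    print(f"Group {name}: W = {w:.3f}, p = {p:.3f}")
```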
Checking Homogeneity of Variance
Levene’s Test:
- H₀: Variances are equal
- p > 0.05: Assume equal variances ✓
- p ≤ 0.05: Variances unequal; use Welch’s ANOVA
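A runnable sketch with synthetic data, where the third group is deliberately given a larger spread:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1, g2 = rng.normal(0, 1, 30), rng.normal(0, 1, 30)
g3 = rng.normal(0, 3, 30)  # deliberately larger spread

stat, p = stats.levene(g1, g2, g3, center="median")  # median-centered (Brown-Forsythe)
print(f"Levene W = {stat:.2f}, p = {p:.4f}")
if p <= 0.05:
    print("Unequal variances -> consider Welch's ANOVA")
```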
Checking Independence
Verification:
- Different subjects in each group? ✓
- No repeated measures? ✓
- Random assignment/sampling? ✓
- Residuals not autocorrelated? ✓
Residuals Analysis
Plot residuals vs fitted values:
- No pattern → Assumptions met ✓
- Funnel shape → Heterogeneity of variance ✗
- Curved pattern → Non-linearity ✗
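For a one-way design the fitted value of each observation is simply its group mean, so the plot takes only a few lines (synthetic data, matplotlib assumed):

```python
import numpy as np
import matplotlib.pyplot as plt

# In one-way ANOVA the fitted value of each observation is its group mean
rng = np.random.default_rng(3)
groups = [rng.normal(mu, 1.0, 25) for mu in (10, 12, 15)]

fitted = np.concatenate([np.full(len(g), g.mean()) for g in groups])
residuals = np.concatenate([g - g.mean() for g in groups])

plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values (group means)")
plt.ylabel("Residuals")
plt.show()  # random scatter = good; a funnel suggests unequal variances
```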
Section 6: Effect Sizes
Eta-Squared (η²)
Proportion of variance in outcome explained by factor(s):
$$\eta^2 = \frac{SS_{\text{factor}}}{SS_{\text{total}}}$$
Interpretation:
- η² ≈ 0.01: Small effect
- η² ≈ 0.06: Medium effect
- η² ≈ 0.14: Large effect
Omega-Squared (ω²)
A less biased estimate than η², especially in small samples:
$$\omega^2 = \frac{SS_{\text{factor}} - (df_{\text{factor}}) \times MS_E}{SS_{\text{total}} + MS_E}$$
Partial Eta-Squared (Partial η²)
Used in multi-factor designs:
$$\text{Partial } \eta^2 = \frac{SS_{\text{factor}}}{SS_{\text{factor}} + SS_E}$$
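All three measures follow mechanically from an ANOVA table; the sketch below plugs in the salary example's numbers:

```python
# Numbers from the salary example's ANOVA table
ss_factor, ss_error, ss_total = 751.5, 1311.0, 2062.5
df_factor, ms_error = 2, 48.56

eta_sq = ss_factor / ss_total                                          # ~0.36
omega_sq = (ss_factor - df_factor * ms_error) / (ss_total + ms_error)  # ~0.31
partial_eta_sq = ss_factor / (ss_factor + ss_error)  # equals eta^2 in a one-way design

print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}, "
      f"partial eta^2 = {partial_eta_sq:.3f}")
```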
Section 7: Violations and Alternatives
Normality Violated
✓ ANOVA is robust to moderate non-normality if:
- Large sample (n > 20 per group)
- Distributions similar shape
- Equal sample sizes
✗ Use Kruskal-Wallis test if:
- Severely non-normal
- Small samples
- Ordinal data
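SciPy's `kruskal` implements this rank-based alternative; the exponential draws below are fabricated to mimic skewed samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Exponential draws mimic skewed, non-normal samples
g1, g2, g3 = (rng.exponential(scale, 15) for scale in (1.0, 1.5, 2.0))

h, p = stats.kruskal(g1, g2, g3)  # rank-based omnibus test
print(f"H = {h:.2f}, p = {p:.4f}")
```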
Homogeneity Violated
✗ Standard ANOVA unreliable
✓ Use Welch’s ANOVA:
- More robust to unequal variances
- Still requires normality
- Adjusted F-statistic
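SciPy's `f_oneway` assumes equal variances; one common choice for Welch's ANOVA in Python is the third-party `pingouin` package (an extra dependency, so treat this as a sketch), shown here on data simulated to resemble the salary example:

```python
import numpy as np
import pandas as pd
import pingouin as pg  # third-party: pip install pingouin

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "salary": np.concatenate([rng.normal(85, 8, 8),
                              rng.normal(72, 6, 10),
                              rng.normal(78, 7, 12)]),
    "dept": ["Eng"] * 8 + ["Mkt"] * 10 + ["Sales"] * 12,
})

print(pg.welch_anova(data=df, dv="salary", between="dept"))  # adjusted F, df, p
```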
Independence Violated
If same subjects measured multiple times:
- Use Repeated Measures ANOVA
If clusters/grouping in data:
- Use Mixed Models
- Account for clustering structure
Section 8: Common Mistakes
Mistake 1: Multiple T-Tests Without Correction
Problem: Inflates the family-wise error rate.
Solution: Use ANOVA + post-hoc tests.
Mistake 2: Interpreting Main Effects with Interaction
Problem: Main effects are misleading if an interaction is present.
Solution: Examine the interaction first; interpret main effects only if there is no interaction.
Mistake 3: Not Checking Assumptions
Problem: Violating assumptions invalidates results.
Solution: Always check normality and homogeneity.
Mistake 4: Using ANOVA with 2 Groups
Problem: ANOVA is redundant; the t-test is simpler.
Solution: Use a t-test for 2 groups.
Mistake 5: Ignoring Effect Size
Problem: A p-value doesn't indicate practical significance.
Solution: Report η² or ω² along with the p-value.
Section 9: Reporting ANOVA Results
Standard Format
“A one-way ANOVA was conducted to examine the effect of [factor] on [outcome]. Results indicated a significant effect, F(df_between, df_within) = F_value, p = probability, η² = effect_size.”
Full Example Report
“A one-way ANOVA was conducted to test whether salaries differ across three departments. Results showed a significant difference in mean salaries, F(2, 27) = 7.74, p = .002, η² = 0.36 (large effect). Tukey HSD comparisons showed that Engineering salaries (M = $85k) were significantly higher than Marketing salaries (M = $72k), p < .05. Sales salaries (M = $78k) did not differ significantly from either group.”
Components
- Test name: One-way ANOVA, Two-way ANOVA, etc.
- Degrees of freedom: Between and within
- F-statistic value: F = XXX
- P-value: p = value
- Effect size: η² = value
- Post-hoc results: Which groups differ?
- Practical interpretation: What does it mean?
Decision Tree: When to Use What
Comparing group means?
│
├─→ 2 groups?
│ └─→ T-Test
│
└─→ 3+ groups?
├─→ Same subjects measured multiple times?
│ └─→ Repeated Measures ANOVA
│
└─→ Different independent groups?
├─→ Normality violated, small n?
│ └─→ Kruskal-Wallis Test
│
├─→ Unequal variances?
│ └─→ Welch's ANOVA
│
└─→ Meets assumptions?
└─→ Standard One-Way ANOVA
Key Formulas Cheat Sheet
One-Way ANOVA F-Statistic
$$F = \frac{MS_B}{MS_W} = \frac{SS_B/(k-1)}{SS_W/(n-k)}$$
Tukey HSD
$$HSD = q_\alpha \sqrt{\frac{MS_W}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
Eta-Squared
$$\eta^2 = \frac{SS_{\text{factor}}}{SS_{\text{total}}}$$
Omega-Squared
$$\omega^2 = \frac{SS_{\text{factor}} - df_{\text{factor}} \times MS_E}{SS_{\text{total}} + MS_E}$$
Summary Table: Types of ANOVA
| ANOVA Type | Factors | Samples | When to Use |
|---|---|---|---|
| One-Way | 1 factor | Independent | 3+ independent groups |
| Two-Way | 2 factors | Independent | 2 factors; interactions |
| Repeated Measures | 1 factor | Dependent | Same subjects, repeated |
| Mixed | 2+ factors | Mix | Combination designs |
Related Tests
Alternative Comparison Tests:
- T-Tests - For 2-group comparisons
- Z-Tests - For large samples
- Kruskal-Wallis Test - Non-parametric alternative for non-normal data
- Chi-Square Tests - For categorical outcomes
Foundational Concepts:
- Hypothesis Testing Guide - Core principles
- Variance Tests - Testing homogeneity of variance
- Statistical Significance - P-values and error rates
Advanced:
- Regression Analysis - More flexible approach
- Mixed Models - Repeated measures generalization
Next Steps
After mastering ANOVA:
- Post-Hoc Tests: Detailed comparisons (Tukey, Dunnett, Scheffé)
- Regression Analysis: More flexible modeling of groups
- Non-Parametric Tests: Alternatives when assumptions violated
- Mixed Models: Complex designs with multiple factors