Hypothesis testing is the foundation of statistical inference. It allows you to make decisions about populations based on sample data, answer research questions, and determine whether observed differences are statistically significant or due to chance. Mastering hypothesis testing in R is essential for data analysis, research, and evidence-based decision-making.
This comprehensive guide covers all major hypothesis tests with practical R implementations and interpretations.
Foundations of Hypothesis Testing
Hypothesis testing follows a structured framework:
- Null Hypothesis (H₀): No effect or difference exists
- Alternative Hypothesis (H₁): An effect or difference exists
- Test Statistic: Calculated from sample data
- P-value: Probability of obtaining results at least as extreme as those observed, assuming H₀ is true
- Significance Level (α): Threshold for decision (typically 0.05)
- Decision: Reject or fail to reject H₀
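The steps above can be sketched end to end with a hand-rolled one-sample z-test (all numbers below are illustrative, and the population standard deviation is assumed known):

```r
# Illustrative one-sample z-test walking through the framework
x <- c(5.1, 4.9, 5.3, 5.0, 5.2)   # hypothetical sample
mu0 <- 5.0                        # H0: population mean is 5.0
sigma <- 0.2                      # assumed known population SD
alpha <- 0.05                     # significance level

z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))  # test statistic
p_value <- 2 * pnorm(-abs(z))                     # two-sided p-value

decision <- if (p_value < alpha) "Reject H0" else "Fail to reject H0"
print(paste("z =", round(z, 3), "p =", round(p_value, 3), "->", decision))
```

In practice you would rarely know σ and would use `t.test()` instead, as shown throughout this guide; the sketch just makes each framework step explicit.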
Key Concepts
# Understanding p-values and significance
alpha <- 0.05 # Significance level
# If p-value < alpha: Reject H₀ (statistically significant)
# If p-value >= alpha: Fail to reject H₀ (not significant)
# Type I Error (False Positive): Reject H₀ when it's true
# Type II Error (False Negative): Fail to reject H₀ when it's false
# Power: 1 - Type II Error rate (ability to detect true effects)
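Power can be explored directly in base R with `power.t.test()` (the effect size and standard deviation below are illustrative):

```r
# Sample size needed to detect a mean difference of 5 (sd = 10)
# with 80% power at alpha = 0.05, for a two-sample t-test
pwr <- power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80)
print(pwr$n)  # required n per group

# Power achieved with only 30 subjects per group
pwr_30 <- power.t.test(n = 30, delta = 5, sd = 10, sig.level = 0.05)
print(pwr_30$power)
```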
One-Sample Tests
One-Sample T-Test
Tests if sample mean differs from a hypothesized population mean.
# One-sample t-test
data <- c(23, 25, 24, 26, 25, 23, 24, 25, 26, 24)
# Test if mean differs from 25
result <- t.test(data, mu = 25)
print(result)
# t ≈ -1.464, df = 9, p-value ≈ 0.177
# 95 percent confidence interval: [23.73, 25.27]
# sample estimates: mean of x = 24.5
# Interpretation
if (result$p.value < 0.05) {
  print("Statistically significant difference from 25")
} else {
  print("No statistically significant difference from 25")
}
# One-tailed test (mean > 25)
result_greater <- t.test(data, mu = 25, alternative = "greater")
# One-tailed test (mean < 25)
result_less <- t.test(data, mu = 25, alternative = "less")
One-Sample Wilcoxon Test
Non-parametric alternative to t-test (doesn’t assume normality).
# Wilcoxon signed-rank test
result <- wilcox.test(data, mu = 25)
print(result)
# V = 6, p-value ≈ 0.19 (R labels the one-sample statistic V; zero differences
# and ties trigger a normal approximation with a warning)
# Extract components
p_value <- result$p.value
statistic <- result$statistic
Two-Sample Tests
Two-Sample T-Test
Compares means between two independent groups.
# Create two independent samples
group1 <- c(23, 25, 24, 26, 25)
group2 <- c(20, 22, 21, 19, 23)
# Two-sample t-test (default is Welch's test, which does not assume equal variances)
result <- t.test(group1, group2)
print(result)
# Pooled (Student's) t-test, assuming equal variances
result_pooled <- t.test(group1, group2, var.equal = TRUE)
# One-tailed test
result_one_tail <- t.test(group1, group2, alternative = "greater")
Paired T-Test
Compares means between paired (dependent) observations.
# Paired measurements (before/after)
before <- c(120, 125, 130, 118, 122)
after <- c(115, 120, 128, 116, 119)
# Paired t-test
result <- t.test(before, after, paired = TRUE)
print(result)
# Calculate mean difference
mean_diff <- mean(before - after)
print(paste("Mean difference:", mean_diff))
Two-Sample Wilcoxon Test
Non-parametric alternative for comparing two groups.
# Mann-Whitney U test (Wilcoxon rank-sum test)
result <- wilcox.test(group1, group2)
print(result)
# W = 24.5, p-value ≈ 0.021 (ties trigger a normal approximation with a warning)
ANOVA (Analysis of Variance)
One-Way ANOVA
Compares means across three or more groups.
# Create data with multiple groups
control <- c(20, 22, 21, 23, 19)
treatment_a <- c(25, 27, 26, 28, 24)
treatment_b <- c(30, 32, 31, 33, 29)
# Combine into data frame
data <- data.frame(
value = c(control, treatment_a, treatment_b),
group = rep(c("Control", "Treatment_A", "Treatment_B"), each = 5)
)
# One-way ANOVA
result <- aov(value ~ group, data = data)
print(summary(result))
# Extract F-statistic and p-value
f_stat <- summary(result)[[1]]$`F value`[1]
p_value <- summary(result)[[1]]$`Pr(>F)`[1]
print(paste("F-statistic:", f_stat, "P-value:", p_value))
Post-Hoc Tests (Multiple Comparisons)
When ANOVA is significant, determine which groups differ.
# Tukey HSD (Honestly Significant Difference)
tukey_result <- TukeyHSD(result)
print(tukey_result)
# Plot Tukey results
plot(tukey_result)
# Pairwise t-tests with Bonferroni correction
pairwise.t.test(data$value, data$group, p.adjust.method = "bonferroni")
Kruskal-Wallis Test
Non-parametric alternative to ANOVA.
# Kruskal-Wallis test (non-parametric ANOVA)
result_kw <- kruskal.test(value ~ group, data = data)
print(result_kw)
# Kruskal-Wallis chi-squared = 12.5, df = 2, p-value ≈ 0.00193
Chi-Square Test
Tests association between categorical variables.
# Create contingency table
contingency <- matrix(c(10, 15, 20, 25), nrow = 2, ncol = 2,
                      dimnames = list(c("Yes", "No"), c("Success", "Failure")))
print(contingency)
# Chi-square test (for 2x2 tables, chisq.test applies Yates' continuity correction by default)
result <- chisq.test(contingency)
print(result)
# Extract components
chi_stat <- result$statistic
p_value <- result$p.value
print(paste("Chi-square statistic:", chi_stat, "P-value:", p_value))
# Expected frequencies
print(result$expected)
Correlation Tests
Pearson Correlation Test
Tests linear relationship between two continuous variables.
# Two variables
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
# Pearson correlation test
result <- cor.test(x, y, method = "pearson")
print(result)
# Extract correlation coefficient and p-value
correlation <- result$estimate
p_value <- result$p.value
print(paste("Correlation:", round(correlation, 3), "P-value:", p_value))
Spearman Correlation Test
Non-parametric correlation test.
# Spearman correlation test (rank-based; with ties, R warns and uses an approximate p-value)
result_spearman <- cor.test(x, y, method = "spearman")
print(result_spearman)
Other Important Tests
Fisher’s Exact Test
Tests association for 2×2 contingency tables (small samples).
# Fisher's exact test
contingency <- matrix(c(8, 2, 1, 9), nrow = 2, ncol = 2)
result <- fisher.test(contingency)
print(result)
# Extract odds ratio and p-value
odds_ratio <- result$estimate
p_value <- result$p.value
Effect Sizes
Quantify practical significance beyond p-values.
# Cohen's d (effect size for t-tests)
cohens_d <- function(x1, x2) {
  n1 <- length(x1)
  n2 <- length(x2)
  var1 <- var(x1)
  var2 <- var(x2)
  pooled_sd <- sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
  (mean(x1) - mean(x2)) / pooled_sd
}
d <- cohens_d(group1, group2)
print(paste("Cohen's d:", round(d, 3)))
# Conventional benchmarks: |d| ≈ 0.2 (small), 0.5 (medium), 0.8 (large)
Assumptions Testing
Normality Tests
# Shapiro-Wilk test for normality (requires a numeric vector;
# `data` is a data frame here, so test the value column)
result <- shapiro.test(data$value)
print(result)
# Visual inspection
qqnorm(data$value)
qqline(data$value)
# Histogram
hist(data$value, main = "Histogram of value")
Homogeneity of Variance
# Levene's test (robust to departures from normality; requires the car package)
library(car)
result <- leveneTest(value ~ group, data = data)
print(result)
# Bartlett's test (more powerful under normality, but sensitive to non-normal data)
result_bartlett <- bartlett.test(value ~ group, data = data)
print(result_bartlett)
Complete Workflow Example
# Complete hypothesis testing workflow
library(dplyr)
# 1. Load and explore data
data(mtcars)
head(mtcars)
# 2. Define hypothesis
# H₀: Mean MPG for automatic and manual transmissions are equal
# H₁: Means differ
# 3. Check assumptions
# Normality
with(mtcars, {
print(shapiro.test(mpg[am == 0])) # Automatic
print(shapiro.test(mpg[am == 1])) # Manual
})
# Homogeneity of variance
with(mtcars, bartlett.test(mpg ~ am))
# 4. Perform test
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
result <- t.test(auto, manual)
# 5. Interpret results
print(result)
print(paste("Mean automatic:", round(mean(auto), 2)))
print(paste("Mean manual:", round(mean(manual), 2)))
print(paste("P-value:", round(result$p.value, 4)))
print(paste("Conclusion:", ifelse(result$p.value < 0.05,
"Significant difference",
"No significant difference")))
# 6. Calculate effect size
d <- cohens_d(auto, manual)
print(paste("Cohen's d:", round(d, 3)))
Best Practices
- State hypotheses first - Before seeing data
- Check assumptions - Verify test requirements
- Choose appropriate test - Match to data type and design
- Report effect sizes - Not just p-values
- Set α before testing - 0.05 is the standard convention; justify any other choice
- Adjust for multiple comparisons - When doing many tests
- Interpret in context - Statistical vs practical significance
- Visualize results - Plots aid interpretation
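For the multiple-comparisons point above, base R's `p.adjust()` implements the common corrections (the p-values below are made up for illustration):

```r
# Adjusting a set of raw p-values for multiple testing
raw_p <- c(0.01, 0.04, 0.03, 0.20)  # hypothetical p-values from 4 tests

p.adjust(raw_p, method = "bonferroni")  # multiply by number of tests (capped at 1)
p.adjust(raw_p, method = "holm")        # step-down method, less conservative
p.adjust(raw_p, method = "BH")          # Benjamini-Hochberg, controls the FDR
```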
Common Questions
Q: What if my data isn’t normal? A: Use non-parametric tests (Wilcoxon, Kruskal-Wallis) or transform data
Q: What’s a p-value? A: The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. Lower = stronger evidence against H₀
Q: Should I report p-values or effect sizes? A: Report both. P-values show significance, effect sizes show practical importance
Q: How many comparisons can I do? A: Use corrections (Bonferroni) for multiple tests to control error rate
Q: What’s the difference between statistical and practical significance? A: Statistical significance (p < 0.05) vs practical significance (meaningful effect size)
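The last point can be demonstrated with a quick simulation: with a very large sample, even a trivially small difference yields a very small p-value (simulated data with a fixed seed):

```r
# Large n: statistically significant but practically negligible
set.seed(42)
a <- rnorm(100000, mean = 100.0, sd = 15)
b <- rnorm(100000, mean = 100.5, sd = 15)  # true difference: 0.5 (d ~ 0.03)

result <- t.test(a, b)
print(result$p.value)            # very small p-value despite...
print((mean(a) - mean(b)) / 15)  # ...a negligible standardized effect
```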
Related Topics
Build on hypothesis testing:
- R Descriptive Statistics - Complete Guide - Understand data before testing
- R Data Visualization - Complete Guide - Visualize test results
- R Distance Metrics - Complete Guide - Measure differences
Download R Script
Get all code examples from this tutorial: hypothesis-testing-examples.R