Tables are fundamental tools for data exploration and analysis, providing concise summaries of categorical variables and their relationships. Whether creating simple frequency distributions, analyzing relationships between multiple variables, or computing proportions and margins, R provides comprehensive functions for table construction and analysis.
This guide covers frequency tables, two-way and higher-dimensional contingency tables, margins, proportions, chi-square tests, and basic plots, with practical examples.
Creating Frequency Tables
Basic Frequency Tables
# Create frequency table
colors <- c("red", "blue", "red", "green", "blue", "red", "blue", "green")
freq_table <- table(colors)
print(freq_table)
# colors
# blue green red
# 3 2 3
# From a numeric vector
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)
table(x)
# x
# 1 2 3 4
# 1 2 3 4
# Named frequency table
favorite_fruits <- c("apple", "banana", "apple", "orange", "banana", "apple")
table(favorite_fruits)
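Two follow-ups often come up with one-way tables: showing categories that have zero observations, and ordering categories by frequency. The sketch below is illustrative only; the vector `answers` and the extra level "purple" are made-up data, not part of the examples above.

```r
# Hypothetical survey answers; "purple" is a valid option nobody chose
answers <- c("red", "blue", "red", "green", "blue", "red")
answers_f <- factor(answers, levels = c("red", "blue", "green", "purple"))

# table() on a factor keeps every level, including those with zero counts
counts <- table(answers_f)
print(counts)

# Sort categories from most to least frequent
sorted_counts <- sort(counts, decreasing = TRUE)
print(sorted_counts)
```

Converting to a factor first is what makes the empty category appear; on a plain character vector, table() only reports values that actually occur.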
Two-Way Tables (Contingency Tables)
# Create 2x2 table
gender <- c("M", "M", "F", "F", "M", "F", "M", "F", "M", "F")
purchased <- c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")
two_way <- table(gender, purchased)
print(two_way)
# purchased
# gender No Yes
# F 1 4
# M 3 2
# Access elements
two_way["M", "Yes"] # [1] 2
two_way[1, 2] # [1] 4 (row 1 is "F", column 2 is "Yes")
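The printed header can be relabelled with the `dnn` argument of table(), and leaving an index empty extracts a whole row or column. This is a small sketch using the same gender and purchased vectors; the names `labelled`, `female_row`, and `yes_col` are chosen here for illustration.

```r
gender <- c("M", "M", "F", "F", "M", "F", "M", "F", "M", "F")
purchased <- c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")

# dnn supplies the dimension names shown in the printed header
labelled <- table(gender, purchased, dnn = c("Gender", "Purchased"))
print(labelled)

# Leaving an index empty returns the whole row or column
female_row <- labelled["F", ]
yes_col <- labelled[, "Yes"]
print(female_row)
print(yes_col)
```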
Three-Way and Higher Tables
# Create three-way table
age_group <- c("Young", "Young", "Old", "Old", "Young", "Old", "Young", "Old", "Young", "Old")
gender <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
purchased <- c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")
three_way <- table(age_group, gender, purchased)
print(three_way)
# Access layers
three_way[, , "Yes"] # Slice for Yes
three_way[, , "No"] # Slice for No
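Printing a three-way table produces one layer per level of the third variable, which can be hard to scan. One option, sketched below with the same three vectors, is ftable() for a flat layout and margin.table() to collapse over a dimension. The names `flat` and `age_by_purchase` are illustrative.

```r
age_group <- c("Young", "Young", "Old", "Old", "Young", "Old", "Young", "Old", "Young", "Old")
gender <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
purchased <- c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")
three_way <- table(age_group, gender, purchased)

# ftable() prints all three dimensions in a single flat layout
flat <- ftable(three_way, row.vars = c("age_group", "gender"))
print(flat)

# Collapse over gender to get the age_group x purchased two-way table
age_by_purchase <- margin.table(three_way, c(1, 3))
print(age_by_purchase)
```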
Table Operations and Analysis
Margins and Totals
# Add row and column margins (totals)
two_way_with_margins <- addmargins(two_way)
print(two_way_with_margins)
# purchased
# gender No Yes Sum
# F 1 4 5
# M 3 2 5
# Sum 4 6 10
# Add a Sum row only (column totals, summing over dimension 1)
addmargins(two_way, margin = 1)
# Add a Sum column only (row totals, summing over dimension 2)
addmargins(two_way, margin = 2)
# Specific margin function
margin.table(two_way, 1) # Row margins
margin.table(two_way, 2) # Column margins
margin.table(two_way) # Grand total
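For a two-way table, the margins can also be obtained with rowSums() and colSums(), which return plain named vectors rather than one-dimensional tables. The sketch below rebuilds the same table and compares the two approaches; `row_totals` and `col_totals` are illustrative names.

```r
gender <- c("M", "M", "F", "F", "M", "F", "M", "F", "M", "F")
purchased <- c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")
two_way <- table(gender, purchased)

# Margins as one-dimensional tables
row_totals <- margin.table(two_way, 1)
col_totals <- margin.table(two_way, 2)

# rowSums()/colSums() give the same numbers as named numeric vectors
print(rowSums(two_way))
print(colSums(two_way))
```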
Proportions and Percentages
# Calculate proportions
proportions <- prop.table(two_way)
print(proportions)
# All entries sum to 1
# Row proportions (sum to 1 across rows)
row_props <- prop.table(two_way, margin = 1)
print(row_props)
# Each row sums to 1
# Column proportions (sum to 1 down columns)
col_props <- prop.table(two_way, margin = 2)
print(col_props)
# Each column sums to 1
# Convert to percentages
percentages <- prop.table(two_way) * 100
round(percentages, 2)
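Row percentages are often the easiest way to compare groups, for instance the share of each gender that purchased. A minimal sketch with the same data follows; `row_pct` and `row_check` are names introduced here.

```r
gender <- c("M", "M", "F", "F", "M", "F", "M", "F", "M", "F")
purchased <- c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")
two_way <- table(gender, purchased)

# Row percentages: within each gender, what share purchased?
row_pct <- round(prop.table(two_way, margin = 1) * 100, 1)
print(row_pct)

# Sanity check: each row of the unrounded proportions sums to 1
row_check <- rowSums(prop.table(two_way, margin = 1))
print(row_check)
```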
Flattening Tables
# Convert table to data frame
df <- as.data.frame(two_way)
print(df)
# gender purchased Freq
# 1 F No 1
# 2 M No 3
# 3 F Yes 4
# 4 M Yes 2
# Access frequency counts
df$Freq
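The conversion also works in the other direction. As a sketch, xtabs() with the count column on the left-hand side of the formula rebuilds the table from the long-format data frame; `rebuilt` is an illustrative name.

```r
gender <- c("M", "M", "F", "F", "M", "F", "M", "F", "M", "F")
purchased <- c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")
two_way <- table(gender, purchased)
df <- as.data.frame(two_way)

# Freq on the left-hand side tells xtabs() to sum counts instead of rows
rebuilt <- xtabs(Freq ~ gender + purchased, data = df)
print(rebuilt)
```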
Using table() with Data Frames
Creating Tables from Data
# Create sample data
students <- data.frame(
name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
grade = c("A", "B", "A", "C", "B"),
gender = c("F", "M", "M", "F", "F")
)
# Single variable table
table(students$grade)
# Two-way table
table(students$grade, students$gender)
# Using xtabs() for formula-based tables
xtabs(~ grade + gender, data = students)
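Two further ways to tabulate data frame columns avoid repeating `students$` and keep the variable names as dimension labels. This sketch reuses the students data; `by_df` and `by_with` are names chosen for illustration.

```r
students <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  grade = c("A", "B", "A", "C", "B"),
  gender = c("F", "M", "M", "F", "F")
)

# Passing a data frame cross-tabulates all of its columns
by_df <- table(students[, c("grade", "gender")])
print(by_df)

# with() evaluates table() inside the data frame
by_with <- with(students, table(grade, gender))
print(by_with)
```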
Working with Pre-Counted Data
# Data with frequencies already counted
data <- data.frame(
color = c("red", "blue", "green"),
count = c(10, 15, 8)
)
# Expand back to individual records
expanded <- rep(data$color, data$count)
freq_table <- table(expanded)
print(freq_table)
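Expanding to individual records is not strictly necessary. As an alternative sketch, xtabs() can sum an existing count column directly; the data frame is renamed `color_counts` here to avoid masking the base function data().

```r
# Same pre-counted data as above, under a different (illustrative) name
color_counts <- data.frame(
  color = c("red", "blue", "green"),
  count = c(10, 15, 8)
)

# The count column on the left-hand side is summed within each color
weighted <- xtabs(count ~ color, data = color_counts)
print(weighted)
```

This avoids creating a long intermediate vector, which matters when the counts are large.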
Statistical Tests on Tables
Chi-Square Test
# Chi-square test for independence
chi_sq <- chisq.test(two_way)
print(chi_sq)
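# X-squared = 0.41667, df = 1, p-value = 0.5186 (Yates' continuity correction is applied to 2x2 tables)
# R also warns that the chi-squared approximation may be incorrect, because the expected counts are below 5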
# Access components
chi_sq$statistic # Test statistic
chi_sq$p.value # P-value
chi_sq$expected # Expected frequencies
Expected Frequencies
# Calculate expected frequencies under independence
expected <- chisq.test(two_way)$expected
print(expected)
# Check the chi-square rule of thumb
# All expected frequencies should be ≥ 5
all(expected >= 5) # FALSE here: the expected counts are 2 and 3
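When the rule of thumb fails, one possible fallback is Fisher's exact test. The sketch below picks the test based on the expected counts; the selection logic and the name `result` are illustrative, not a prescribed procedure.

```r
gender <- c("M", "M", "F", "F", "M", "F", "M", "F", "M", "F")
purchased <- c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")
two_way <- table(gender, purchased)

# Expected counts under independence (warning suppressed; we only need the counts)
expected <- suppressWarnings(chisq.test(two_way)$expected)

# Fall back to Fisher's exact test when any expected count is below 5
if (any(expected < 5)) {
  result <- fisher.test(two_way)
} else {
  result <- chisq.test(two_way)
}
print(result$method)
print(result$p.value)
```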
Visualization
# Bar plot from table
barplot(table(colors), main = "Color Distribution")
# Stacked bar plot from two-way table
barplot(two_way, beside = FALSE, legend.text = TRUE)
# Grouped bar plot
barplot(two_way, beside = TRUE, legend.text = TRUE)
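# Heatmap for two-way table
# heatmap() needs a numeric matrix; as.numeric() would drop the dimensions
heatmap(unclass(two_way), Rowv = NA, Colv = NA, scale = "none",
labRow = rownames(two_way), labCol = colnames(two_way))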
Complete Example
# Survey data
responses <- data.frame(
age = c("18-25", "26-35", "18-25", "26-35", "35+", "18-25", "35+", "26-35"),
satisfaction = c("High", "Low", "High", "High", "Low", "Low", "High", "High"),
product = c("A", "A", "B", "B", "A", "B", "A", "B")
)
# Create 3-way table
table(responses$age, responses$satisfaction, responses$product)
# Calculate proportions
prop.table(table(responses$age, responses$satisfaction))
# Chi-square test
chisq.test(table(responses$age, responses$satisfaction))
Best Practices
- Check frequencies - Ensure adequate counts in each cell
- Use margins - Add totals for context
- Calculate proportions - More interpretable than raw counts
- Verify assumptions - Chi-square needs expected frequencies ≥ 5
- Document sources - Note data collection and coding
- Visualize - Use barplots to show relationships
- Interpret carefully - Tables show association, not causation
Common Questions
Q: What’s the difference between table() and xtabs()?
A: Both create contingency tables. xtabs() uses a formula interface and can sum a count or weight column given on the left-hand side of the formula.
Q: Should I use row or column proportions?
A: Use the direction that matches your research question. Usually this means conditioning on the explanatory variable, so that proportions sum to 1 within each of its levels.
Q: How do I handle missing values in tables?
A: Use useNA = "ifany" in table() to include a category for missing values.
Q: What if my chi-square test shows low expected frequencies?
A: Consider combining categories or using Fisher’s exact test (fisher.test()), which is well suited to 2x2 tables.
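As a minimal sketch of the missing-value answer above, the example below compares the default behavior of table() with useNA = "ifany". The vector `answers` is made-up data.

```r
# Hypothetical responses with two missing values
answers <- c("Yes", "No", NA, "Yes", NA, "No", "Yes")

# By default, NA values are silently dropped
default_counts <- table(answers)
print(default_counts)

# useNA = "ifany" adds an <NA> category only when missing values exist
with_na <- table(answers, useNA = "ifany")
print(with_na)
```

Comparing sum(default_counts) with length(answers) is a quick way to spot dropped observations.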
Related Topics
- R Hypothesis Testing - Complete Guide - Statistical tests for tables
- R Descriptive Statistics - Complete Guide - Table-based summaries
- R Data Visualization - Complete Guide - Visualize table data