Introduction
A picture is worth a thousand words, and in data analysis that is often literally true. A well-designed visualization can reveal patterns, tell stories, and convince stakeholders in ways that tables of numbers rarely can.
Data visualization is one of the most powerful skills in data science. Whether you’re exploring your own data, communicating findings to your team, or publishing research, the ability to create compelling visualizations is essential.
R is one of the most popular languages for data visualization, with several powerful tools at your disposal. From basic plots to interactive dashboards, R covers the full range.
By the end of this guide, you’ll understand:
- How to create basic plots with base R
- How to build publication-quality graphics with ggplot2
- When to use each visualization type
- How to customize and enhance your plots
- How to create interactive visualizations
- Best practices for clear, effective communication
- How to save and export your work
This guide is for data analysts, researchers, business professionals, and anyone learning R visualization. We’ll use R’s built-in mtcars dataset and practical examples throughout.
What you need:
- Basic R knowledge (vectors, data frames)
- R 4.0 or higher installed
- Curiosity about turning data into insights
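The examples in this guide load a handful of add-on packages (ggplot2, dplyr, corrplot, plotly, and viridis). Here is a minimal sketch for checking which of them are missing before you start. The variable names are illustrative, and the install line is left commented out so nothing is installed by accident.

```r
# Packages loaded with library() somewhere in this guide
required <- c("ggplot2", "dplyr", "corrplot", "plotly", "viridis")

# Compare against what is already installed on this machine
not_installed <- setdiff(required, rownames(installed.packages()))

if (length(not_installed) > 0) {
  message("Install these first: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
} else {
  message("All packages used in this guide are installed.")
}
```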
Here’s what we’re covering:
- Visualization fundamentals
- Base R plotting
- ggplot2 essentials
- Statistical plots
- Customization & enhancement
- Interactive graphics
- Best practices & troubleshooting
Let’s create some beautiful visualizations.
Why Visualization Matters
Before diving into code, let’s understand why visualization is so important.
The Anscombe Quartet Example
Anscombe’s Quartet is a famous collection of four small datasets that share nearly identical summary statistics:
- Mean of x (9.0) and mean of y (about 7.5)
- Variance of x (11.0) and variance of y (about 4.12)
- Correlation between x and y (about 0.816)
- Fitted regression line (approximately y = 3 + 0.5x)
But when you visualize them? Completely different! One is roughly linear, one is curved, one is a perfect line except for a single outlier, and one has the same x value for every point except one extreme, high-leverage point.
The lesson: Always visualize your data. Summary statistics can be misleading.
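You can check this yourself, because the quartet ships with R as the `anscombe` data frame (columns x1 to x4 and y1 to y4). The sketch below computes the summary statistics for each pair and then draws the four panels with their fitted lines. The axis limits are chosen by hand for this example.

```r
# anscombe ships with R: columns x1..x4 and y1..y4
data(anscombe)

# Summary statistics for each of the four x/y pairs
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), var_y = var(y),
    cor_xy = cor(x, y))
})
colnames(stats) <- paste("Set", 1:4)
print(round(stats, 2))

# Plot the four datasets in a 2 x 2 grid, each with its regression line
op <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, pch = 16, col = "blue", main = paste("Set", i),
       xlim = c(3, 20), ylim = c(2, 14))
  abline(lm(y ~ x), col = "red", lwd = 2)
}
par(op)  # restore the previous plotting layout
```

The printed table is almost the same in every column, while the four panels look nothing alike.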
What Makes a Good Visualization?
A good visualization:
- Answers a question - Know what story you’re telling
- Is accurate - Don’t distort the data
- Is clear - No confusion about meaning
- Is accessible - Works for color-blind users, different screen sizes
- Avoids clutter - Remove unnecessary decoration (chartjunk)
- Uses color strategically - Not as decoration, but to enhance understanding
Visualization Fundamentals
Choosing the Right Plot Type
Different data needs different plots:
For comparing values across categories:
- Bar chart (most common)
- Lollipop chart (modern alternative)
- Dot plot (multiple categories)
For showing relationships:
- Scatter plot (continuous vs continuous)
- Bubble chart (add third variable with size)
- Line plot (emphasize trend over time)
For showing distributions:
- Histogram (single variable)
- Density plot (smoother distribution view)
- Box plot (compare distributions across groups)
- Violin plot (show full distribution shape)
For showing compositions:
- Pie chart (simple percentages only)
- Stacked bar chart (multiple categories)
- Treemap (hierarchical compositions)
For showing relationships between many variables:
- Heatmap (color intensity)
- Correlation plot (strength of relationships)
- Faceted plots (subsets by category)
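As a rough illustration of the list above, the choice often comes down to whether each variable is continuous or categorical. The helper below is hypothetical (`suggest_plot()` is not part of any package) and covers only the simplest one- and two-variable cases. Treat it as a starting point rather than a rule.

```r
# Hypothetical helper: suggest a starting plot type from the variable types
suggest_plot <- function(x, y = NULL) {
  x_num <- is.numeric(x)
  if (is.null(y)) {
    # One variable: distribution (numeric) or counts (categorical)
    return(if (x_num) "histogram or density plot" else "bar chart")
  }
  y_num <- is.numeric(y)
  if (x_num && y_num) {
    "scatter plot"
  } else if (x_num || y_num) {
    "box plot or violin plot"
  } else {
    "stacked bar chart or heatmap of counts"
  }
}

suggest_plot(mtcars$mpg)                      # one continuous variable
suggest_plot(mtcars$wt, mtcars$mpg)           # two continuous variables
suggest_plot(factor(mtcars$cyl), mtcars$mpg)  # categorical vs continuous
```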
Base R Plotting
Base R comes built-in with plotting functions. They’re fast, simple, and perfect for exploratory data analysis.
Example 1: Basic Scatter Plot
# Create scatter plot
data(mtcars)
plot(mtcars$wt, mtcars$mpg,
main = "Weight vs Fuel Efficiency",
xlab = "Weight (1000 lbs)",
ylab = "Miles Per Gallon",
pch = 16, # solid circle points
col = "blue", # blue color
cex = 1.5) # larger size
# Add a trend line
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red", lwd = 2)
Output: A scatter plot showing the negative relationship between car weight and fuel efficiency.
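To put a number on the trend the plot shows, you can read the correlation and the slope from the same fitted model. A short sketch using only base R:

```r
data(mtcars)

# Same model as the trend line in the scatter plot
fit <- lm(mpg ~ wt, data = mtcars)

r <- cor(mtcars$wt, mtcars$mpg)   # strength and direction of the linear relationship
slope <- unname(coef(fit)["wt"])  # change in MPG per extra 1000 lbs

cat("Correlation:", round(r, 2), "\n")
cat("Change in MPG per extra 1000 lbs:", round(slope, 2), "\n")
```

Both values are negative, which matches the downward-sloping red line.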
Example 2: Histogram
# Create histogram of MPG distribution
hist(mtcars$mpg,
main = "Distribution of Fuel Efficiency",
xlab = "Miles Per Gallon",
ylab = "Frequency",
col = "lightblue",
border = "black",
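breaks = 15) # suggested number of bins; R may adjust it to get round break points
Output: The largest cluster of cars gets roughly 15-20 MPG, with a tail extending to the right toward 30+ MPG.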
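In `hist()`, a single number passed to `breaks` is only a suggestion. R picks "pretty" break points close to that count. To see the bins that were actually used, compute the histogram without drawing it. The `bins` data frame is just a convenient way to display the result.

```r
# Compute the histogram without plotting it
h <- hist(mtcars$mpg, breaks = 15, plot = FALSE)

# One row per bin: lower edge, upper edge, number of cars
bins <- data.frame(
  from  = head(h$breaks, -1),
  to    = tail(h$breaks, -1),
  count = h$counts
)
print(bins)
```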
Example 3: Boxplot
# Compare MPG by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
main = "MPG by Engine Type",
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon",
col = c("lightblue", "lightgreen", "lightcoral"))
Output: Shows that cars with fewer cylinders get better fuel efficiency.
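The thick line inside each box is the group median, so the same comparison can be read off numerically. A minimal sketch:

```r
# Median MPG for each cylinder count (the center line of each box)
medians <- tapply(mtcars$mpg, mtcars$cyl, median)
print(medians)

# Number of cars behind each box, useful for judging how much to trust it
print(table(mtcars$cyl))
```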
Example 4: Bar Chart
# Count cars by cylinder
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts,
main = "Number of Cars by Engine Type",
xlab = "Number of Cylinders",
ylab = "Count",
col = "steelblue",
names.arg = c("4 Cyl", "6 Cyl", "8 Cyl"))
Output: Simple count of how many cars have each engine type.
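Hard-coded labels in `names.arg` go out of sync if the data gains or loses a category. One way around that, sketched here, is to build the labels from the table itself. `bar_labels` is an illustrative name.

```r
cyl_counts <- table(mtcars$cyl)

# Derive the labels from the table so they always match the bars
bar_labels <- paste(names(cyl_counts), "Cyl")

barplot(cyl_counts,
        main = "Number of Cars by Engine Type",
        xlab = "Number of Cylinders",
        ylab = "Count",
        col = "steelblue",
        names.arg = bar_labels)
```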
Key Base R Parameters
| Parameter | Purpose | Example |
|---|---|---|
| main | Plot title | main = "My Title" |
| xlab, ylab | Axis labels | xlab = "X Axis" |
| col | Color | col = "blue" |
| pch | Point shape (0-25) | pch = 16 (solid circle) |
| cex | Point size multiplier | cex = 1.5 (50% larger) |
| lwd | Line width | lwd = 2 |
| xlim, ylim | Axis limits | xlim = c(0, 10) |
| type | Plot type | type = "l" (lines) or "b" (both points and lines) |
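The earlier examples do not use `type`, `xlim`, or `ylim`, so here is a small sketch that combines them. It uses `pressure`, another dataset that ships with R, and the axis limits are picked by hand to leave some room around the data.

```r
data(pressure)  # built-in: temperature (deg C) and vapor pressure of mercury (mm Hg)

plot(pressure$temperature, pressure$pressure,
     type = "b",                 # both points and connecting lines
     pch = 16, cex = 0.8,        # small solid circles
     lwd = 2, col = "steelblue",
     xlim = c(0, 400),           # fixed axis limits
     ylim = c(0, 900),
     main = "Vapor Pressure of Mercury",
     xlab = "Temperature (deg C)",
     ylab = "Pressure (mm Hg)")
```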
ggplot2: The Grammar of Graphics
ggplot2 creates publication-quality graphics with a consistent, intuitive syntax. It follows the “grammar of graphics” - building plots layer by layer.
Core ggplot2 Concept
Every ggplot has this structure:
ggplot(data = dataset, aes(x = var1, y = var2)) +
geom_point() +
labs(title = "Title", x = "X Label", y = "Y Label") +
theme_minimal()
Breaking it down:
- ggplot() - Start the plot with data
- aes() - Aesthetic mappings (which variables go where)
- geom_*() - Geometric objects (points, bars, lines)
- labs() - Labels and titles
- theme_*() - Overall look and feel
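Because a ggplot is an ordinary R object, the layer-by-layer idea can be made concrete by storing the plot and adding one piece at a time. This is a sketch. Inspecting `p$layers` is only a convenient way to count layers, not something you normally need.

```r
library(ggplot2)

# Step 1: data and aesthetic mappings only (draws an empty panel)
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Step 2: add geometric layers one at a time
p <- p + geom_point()
p <- p + geom_smooth(method = "lm", formula = y ~ x)

length(p$layers)  # number of geom layers added so far

# Step 3: labels and theme do not add layers, they adjust the whole plot
p <- p +
  labs(title = "Weight vs Fuel Efficiency", x = "Weight (1000 lbs)", y = "Miles Per Gallon") +
  theme_minimal()

print(p)
```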
Example 5: Simple ggplot2 Scatter Plot
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point(color = "blue", size = 3) +
geom_smooth(method = "lm", se = TRUE, color = "red") +
labs(
title = "Weight vs Fuel Efficiency",
x = "Weight (1000 lbs)",
y = "Miles Per Gallon"
) +
theme_minimal()
Output: Professional scatter plot with regression line and confidence interval.
Example 6: Bar Chart with ggplot2
# Prepare data (dplyr provides %>%, group_by(), and summarize())
library(dplyr)
cyl_data <- mtcars %>%
group_by(cyl) %>%
summarize(avg_mpg = mean(mpg), .groups = 'drop')
# Create plot
ggplot(cyl_data, aes(x = factor(cyl), y = avg_mpg)) +
geom_col(fill = "steelblue", color = "black") +
geom_text(aes(label = round(avg_mpg, 1)),
vjust = -0.5, size = 4) +
labs(
title = "Average MPG by Engine Type",
x = "Number of Cylinders",
y = "Average MPG"
) +
theme_minimal()
Output: Bar chart with values labeled on top.
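If you would rather not load another package for the aggregation step, base R can produce the same summary table. The sketch below uses `aggregate()` and renames the result column to `avg_mpg` so it can be passed to the same `ggplot()` call.

```r
# Mean MPG per cylinder count, using only base R
cyl_data <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Rename the summarized column to match the plotting code
names(cyl_data)[names(cyl_data) == "mpg"] <- "avg_mpg"

print(cyl_data)
```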
Example 7: Histogram with ggplot2
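ggplot(mtcars, aes(x = mpg)) +
# Put the histogram on the density scale so the bars and the curve share one y-axis
geom_histogram(aes(y = after_stat(density)), binwidth = 2, fill = "lightblue", color = "black") +
geom_density(color = "red", linewidth = 1) + # linewidth requires ggplot2 3.4.0 or later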
labs(
title = "Distribution of Fuel Efficiency",
x = "Miles Per Gallon",
y = "Density"
) +
theme_minimal()
Output: Histogram with density curve overlay.
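A density curve only lines up with histogram bars when both are on the density scale, where the total area of the bars is 1. On the count scale the area is the number of observations times the bin width, so the curve would sit almost flat along the bottom. The sketch below checks both areas with base R, using the same bin width of 2.

```r
# Same bin width as the ggplot2 histogram, computed without plotting
h <- hist(mtcars$mpg, breaks = seq(10, 34, by = 2), plot = FALSE)
bin_width <- diff(h$breaks)

# Area under the bars on each scale
total_count_area   <- sum(h$counts * bin_width)   # observations x bin width
total_density_area <- sum(h$density * bin_width)  # always 1 on the density scale

cat("Area on count scale:  ", total_count_area, "\n")
cat("Area on density scale:", total_density_area, "\n")
```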
Example 8: Boxplot Comparison
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
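geom_boxplot(alpha = 0.7, outlier.shape = NA) + # hide boxplot outliers; the jittered points already show them
geom_jitter(width = 0.2, height = 0, alpha = 0.3) + # jitter sideways only so MPG values stay exact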
labs(
title = "MPG Distribution by Engine Type",
x = "Number of Cylinders",
y = "Miles Per Gallon",
fill = "Cylinders"
) +
theme_minimal() +
theme(legend.position = "bottom")
Output: Boxplots with individual points overlaid, color-coded by group.
Key ggplot2 Geoms
| Geom | Use | Code |
|---|---|---|
| geom_point | Scatter plot | geom_point() |
| geom_col/geom_bar | Bar chart | geom_col() |
| geom_line | Line plot | geom_line() |
| geom_histogram | Histogram | geom_histogram() |
| geom_boxplot | Boxplot | geom_boxplot() |
| geom_density | Density plot | geom_density() |
| geom_smooth | Trend line | geom_smooth() |
| geom_tile | Heatmap | geom_tile() |
| geom_text | Text labels | geom_text() |
Statistical Visualizations
Example 9: Correlation Plot (Heatmap)
library(corrplot)
# Calculate correlations
cor_matrix <- cor(mtcars[, c("mpg", "wt", "hp", "cyl")])
# Visualize
corrplot(cor_matrix, method = "circle",
type = "upper",
addCoef.col = "black",
tl.col = "black")
Output: Circle heatmap showing correlation strength between variables.
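The same correlation matrix can also be drawn as a tile heatmap with ggplot2 alone, which is what `geom_tile()` is for. This is a sketch. The column names `var1`, `var2`, and `correlation` are chosen here, and the blue-white-red scale is one reasonable option among many.

```r
library(ggplot2)

cor_matrix <- cor(mtcars[, c("mpg", "wt", "hp", "cyl")])

# Reshape the matrix to long format: one row per pair of variables
cor_long <- as.data.frame(as.table(cor_matrix))
names(cor_long) <- c("var1", "var2", "correlation")

p <- ggplot(cor_long, aes(x = var1, y = var2, fill = correlation)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(correlation, 2)), size = 4) +
  scale_fill_gradient2(low = "#2166ac", mid = "white", high = "#b2182b",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(title = "Correlation Heatmap", x = NULL, y = NULL, fill = "r") +
  theme_minimal()

print(p)
```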
Example 10: Violin Plot (Distribution Details)
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
geom_violin(alpha = 0.6) +
geom_boxplot(width = 0.1, alpha = 0.8) +
labs(
title = "MPG Distribution - Violin Plot",
x = "Number of Cylinders",
y = "Miles Per Gallon"
) +
theme_minimal() +
theme(legend.position = "none")
Output: Shows full distribution shape compared to simple boxplot.
Example 11: Faceted Plots (Multiple Subplots)
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", color = "red", se = FALSE) +
facet_wrap(~cyl, labeller = labeller(cyl = c("4" = "4 Cylinders", "6" = "6 Cylinders", "8" = "8 Cylinders"))) +
labs(
title = "Weight vs MPG by Engine Type",
x = "Weight (1000 lbs)",
y = "Miles Per Gallon"
) +
theme_minimal()
Output: Separate plot for each cylinder count, showing different patterns.
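Each facet gets its own regression line, so the "different patterns" can be checked numerically by fitting one model per group. A sketch in base R:

```r
# Fit mpg ~ wt separately within each cylinder group and keep the slope
slopes <- sapply(split(mtcars, mtcars$cyl), function(group) {
  unname(coef(lm(mpg ~ wt, data = group))["wt"])
})

# One slope per facet: change in MPG per extra 1000 lbs
print(round(slopes, 2))
```

The further a slope is from zero, the steeper the red line in that panel.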
Customization & Enhancement
Example 12: Custom Colors & Themes
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
geom_boxplot(alpha = 0.7) +
scale_fill_manual(
values = c("4" = "#1f77b4", "6" = "#ff7f0e", "8" = "#d62728"),
labels = c("4 Cylinders", "6 Cylinders", "8 Cylinders")
) +
labs(
title = "Fuel Efficiency by Engine Type",
subtitle = "Real-world car data",
x = "Number of Cylinders",
y = "Miles Per Gallon",
fill = "Engine Type",
caption = "Data source: mtcars"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
axis.text = element_text(size = 11),
legend.position = "bottom"
)
Output: Professional-looking plot with custom colors, improved typography.
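If several plots should share the same look, the `theme()` settings can be wrapped in a function and reused. The sketch below assumes a hypothetical house style called `theme_report()`. The name and the specific sizes are illustrative.

```r
library(ggplot2)

# Hypothetical reusable theme: theme_minimal() plus a few typography tweaks
theme_report <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold", size = base_size + 2),
      axis.text = element_text(size = base_size - 1),
      legend.position = "bottom"
    )
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Fuel Efficiency by Engine Type",
       x = "Number of Cylinders", y = "Miles Per Gallon", fill = "Engine Type") +
  theme_report()

print(p)
```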
Example 13: Adding Annotations
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point(size = 3, color = "blue") +
geom_smooth(method = "lm", color = "red", se = FALSE) +
annotate("text", x = 5.5, y = 30,
label = "Strong negative\nrelationship",
size = 4, color = "red") +
annotate("rect", xmin = 5, xmax = 6, ymin = 25, ymax = 32,
alpha = 0.1, fill = "yellow") +
labs(
title = "Weight vs Fuel Efficiency",
x = "Weight (1000 lbs)",
y = "Miles Per Gallon"
) +
theme_minimal()
Output: Plot with text and shape annotations highlighting key insight.
Example 14: Saving Plots
# Create plot
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
theme_minimal()
# Save as PNG
ggsave("plot.png", plot = p, width = 8, height = 6, dpi = 300)
# Save as PDF
ggsave("plot.pdf", plot = p, width = 8, height = 6)
# Save multiple plots to PDF
pdf("multiple_plots.pdf", width = 8, height = 6)
plot(mtcars$wt, mtcars$mpg)
hist(mtcars$mpg)
dev.off()
Output: High-quality images ready for presentation or publication.
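With base graphics, a forgotten `dev.off()` leaves the file open and incomplete. One defensive pattern, sketched here, is to wrap the device in a function and close it with `on.exit()`, so it is closed even if the plotting code fails. `save_base_plot()` is a hypothetical helper, and the file goes to a temporary directory for the sake of the example.

```r
# Hypothetical helper: open a PDF device, run the drawing code, always close the device
save_base_plot <- function(path, draw, width = 8, height = 6) {
  pdf(path, width = width, height = height)
  on.exit(dev.off(), add = TRUE)  # runs even if draw() throws an error
  draw()
  invisible(path)
}

out_file <- file.path(tempdir(), "mtcars_plots.pdf")

save_base_plot(out_file, function() {
  plot(mtcars$wt, mtcars$mpg, main = "Weight vs MPG")  # page 1
  hist(mtcars$mpg, main = "MPG Distribution")          # page 2
})

file.exists(out_file)
```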
Interactive Graphics
Example 15: Interactive Plots with plotly
library(plotly)
# Create ggplot
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), size = hp)) +
geom_point(alpha = 0.6) +
labs(title = "Car Performance", x = "Weight", y = "MPG") +
theme_minimal()
# Convert to interactive
ggplotly(p)
Output: Hover over points to see exact values. Zoom, pan, and toggle series on/off.
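By default the hover box lists every mapped aesthetic. To control what it shows, a common approach is to map a `text` aesthetic in ggplot2 and ask `ggplotly()` to display only that. The sketch below assumes you want the car model and horsepower in the tooltip. The `model` column is added here for the example.

```r
library(ggplot2)
library(plotly)

# Copy the data and turn the row names into a regular column
cars <- mtcars
cars$model <- rownames(mtcars)

p <- ggplot(cars, aes(x = wt, y = mpg, color = factor(cyl),
                      text = paste0(model, "<br>HP: ", hp))) +
  geom_point(size = 3, alpha = 0.7) +
  labs(title = "Car Performance", x = "Weight (1000 lbs)", y = "MPG", color = "Cylinders") +
  theme_minimal()

# Show only the custom text when hovering
interactive_plot <- ggplotly(p, tooltip = "text")
interactive_plot
```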
Example 16: Plotly Directly
library(plotly)
plot_ly(mtcars, x = ~wt, y = ~mpg,
color = ~factor(cyl), size = ~hp,
type = "scatter", mode = "markers",
hovertemplate = "<b>%{customdata}</b><br>Weight: %{x}<br>MPG: %{y}<extra></extra>",
customdata = rownames(mtcars)) %>%
layout(
title = "Interactive Car Performance",
xaxis = list(title = "Weight (1000 lbs)"),
yaxis = list(title = "Miles Per Gallon"),
hovermode = "closest"
)
Output: Fully interactive scatter plot with hover information.
Best Practices for Data Visualization
1. Start with Questions, Not Tools
Before opening R:
- What question are you answering?
- Who is your audience?
- What story do you want to tell?
2. Choose the Right Visualization
Different data needs different plots. Refer to the “Choosing the Right Plot Type” section above.
3. Make It Easy to Read
# ✓ GOOD: Clear, accessible
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + # factor() gives one box per cylinder count
geom_boxplot(fill = "lightblue") +
labs(title = "Fuel Efficiency by Engine Type",
x = "Number of Cylinders",
y = "Miles Per Gallon (MPG)") +
theme_minimal() +
theme(text = element_text(size = 12))
# ✗ BAD: No title, and cryptic default axis labels
plot(mtcars$cyl, mtcars$mpg) # axes are labeled "mtcars$cyl" and "mtcars$mpg"
4. Use Color Strategically
# ✓ GOOD: Color serves a purpose
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(am))) +
geom_point(size = 3) +
scale_color_manual(
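values = c("0" = "#0072B2", "1" = "#E69F00"), # blue/orange is easier to tell apart than red/green
labels = c("0" = "Automatic", "1" = "Manual")
) +
labs(color = "Transmission") + # readable legend title instead of "factor(am)"
theme_minimal()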
# ✗ BAD: Too many colors, no purpose
# (Colors everywhere but no clear meaning)
5. Avoid Chartjunk
Chartjunk is decorative elements that don’t add information:
- Excessive 3D effects
- Gridlines when not needed
- Decorative backgrounds
- Unnecessary legends
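In ggplot2, most of this cleanup happens in `theme()`. The sketch below removes the gridlines that carry no information in a bar chart (vertical and minor ones) and keeps the horizontal major lines that help read the counts. Which elements count as clutter is a judgment call for each plot.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(fill = "steelblue") +
  labs(title = "Number of Cars by Engine Type",
       x = "Number of Cylinders", y = "Count") +
  theme_minimal() +
  theme(
    panel.grid.major.x = element_blank(),  # vertical lines add nothing to a bar chart
    panel.grid.minor = element_blank()     # minor gridlines are rarely needed
  )

print(p)
```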
6. Ensure Accessibility
# Use color-blind friendly palettes
# scale_color_viridis_d() ships with ggplot2, so loading the separate viridis package is optional
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
geom_point(size = 3) +
scale_color_viridis_d() + # Colorblind-friendly
theme_minimal()
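Base R (4.0 and later) also ships a color-blind friendly palette, Okabe-Ito, through `palette.colors()`. Here is a sketch of the same plot in base graphics. Dropping the first color (black) is a choice made for this example.

```r
# Take four Okabe-Ito colors and drop the first one (black)
okabe_ito <- unname(palette.colors(n = 4, palette = "Okabe-Ito")[-1])

cyl_groups <- factor(mtcars$cyl)

plot(mtcars$wt, mtcars$mpg,
     pch = 16, cex = 1.5,
     col = okabe_ito[as.integer(cyl_groups)],  # one color per cylinder group
     main = "Weight vs Fuel Efficiency",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles Per Gallon")

legend("topright", legend = levels(cyl_groups),
       col = okabe_ito, pch = 16, title = "Cylinders")
```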
Troubleshooting Common Issues
Issue #1: “Plot is too crowded”
Solution: Use faceting or filter data
# Instead of:
# ggplot(huge_dataset, aes(x, y)) + geom_point()
# Do this (huge_dataset and category are placeholders; filter() and %>% come from dplyr):
ggplot(huge_dataset %>% filter(category %in% c("A", "B")),
aes(x, y)) +
geom_point() +
facet_wrap(~category)
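Two more options for crowded scatter plots are plotting a random sample and making the points semi-transparent. The sketch below runs on simulated data, since the guide's `huge_dataset` is only a placeholder. The sizes and names are arbitrary.

```r
set.seed(42)  # reproducible simulated data

# Simulate a large dataset with four categories
n <- 20000
big <- data.frame(
  x = rnorm(n),
  category = sample(c("A", "B", "C", "D"), n, replace = TRUE)
)
big$y <- big$x + rnorm(n)

# Step 1: keep only the categories of interest
subset_big <- big[big$category %in% c("A", "B"), ]

# Step 2: plot a random sample instead of every row
sampled <- subset_big[sample(nrow(subset_big), 1000), ]

# Step 3: semi-transparent points reveal where the data is dense
plot(sampled$x, sampled$y,
     pch = 16, col = adjustcolor("steelblue", alpha.f = 0.3),
     main = "Random sample of 1000 points", xlab = "x", ylab = "y")
```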
Issue #2: “Colors look terrible printed in black & white”
Solution: Don’t rely on color alone; add a second encoding such as point shape
# Instead of relying on color:
ggplot(data, aes(x = var1, y = var2, color = group)) +
geom_point()
# Combine color with shape:
ggplot(data, aes(x = var1, y = var2, color = group, shape = group)) +
geom_point(size = 3)
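A quick way to predict how colors will print is to convert them to an approximate gray level. The helper below is hypothetical and uses the common Rec. 601 luma weights (0.299, 0.587, 0.114) on the RGB values. It is a rough approximation, not a full print simulation. Colors with similar gray levels will be hard to tell apart on paper.

```r
# Hypothetical helper: approximate gray level (0 = black, 255 = white) of a color
gray_level <- function(colors) {
  rgb_values <- col2rgb(colors)  # 3 x n matrix with rows red, green, blue
  as.numeric(c(0.299, 0.587, 0.114) %*% rgb_values)
}

group_colors <- c(red = "#d62728", green = "#2ca02c", blue = "#1f77b4")
levels_gray <- gray_level(group_colors)
names(levels_gray) <- names(group_colors)

print(round(levels_gray))

# Smallest gap between any two colors; a small value means they may look alike in print
print(min(dist(levels_gray)))
```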
Issue #3: “Legend is too big”
Solution: Adjust legend position and size
ggplot(data, aes(x, y, color = group)) +
geom_point() +
theme(
legend.position = "right",
legend.title = element_text(size = 8),
legend.text = element_text(size = 7)
)
Frequently Asked Questions
Q: Should I use base R or ggplot2? A: For quick exploration, base R is faster. For publication-quality graphics, ggplot2 is better. Learn both - they’re complementary.
Q: What’s the difference between geom_col() and geom_bar()?
A: geom_col() uses values directly. geom_bar() counts occurrences. Use geom_col() with pre-aggregated data, geom_bar() with raw data.
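A small sketch makes the difference concrete: both plots below end up with the same bar heights, one counted by ggplot2 and one counted beforehand with `table()`. `layer_data()` is used only to read back the heights ggplot2 computed.

```r
library(ggplot2)

# geom_bar(): raw data in, ggplot2 counts the rows per category
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# geom_col(): counts computed beforehand, heights taken as given
counts <- as.data.frame(table(cyl = mtcars$cyl))
p_col <- ggplot(counts, aes(x = cyl, y = Freq)) +
  geom_col()

# Bar heights computed for each plot
layer_data(p_bar)$count
layer_data(p_col)$y
```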
Q: How do I make plots publication-ready? A: Use ggplot2 with custom themes, save at 300 DPI, choose a sans-serif font, use color-blind friendly palettes, and remove unnecessary elements.
Q: What’s the best color palette? A: Viridis, RColorBrewer, or colorblind-friendly palettes. Avoid rainbow colors - they’re hard to distinguish and problematic for colorblind viewers.
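Q: Can I combine base R and ggplot2 plots? A: Not easily - base R draws with the graphics package, while ggplot2 is built on grid. Packages such as gridGraphics, ggplotify, or cowplot can bridge the two, but it is usually simpler to pick one system per figure.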
Q: How do I handle overlapping points?
A: Use geom_jitter() to add random noise, alpha for transparency, or geom_density2d() for density contours.
Q: What’s the best way to compare many groups?
A: Use faceting (facet_wrap() or facet_grid()) to draw small multiples, or interactive plots (plotly) where viewers can toggle groups on and off.
Q: How do I add a horizontal/vertical line?
A: Use geom_hline() or geom_vline():
ggplot(data, aes(x, y)) +
geom_point() +
geom_hline(yintercept = mean(data$y), color = "red")
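The snippet above uses placeholder names. Here is a runnable version on mtcars that marks the mean of each axis with dashed lines. The colors and line types are arbitrary choices.

```r
library(ggplot2)

mean_mpg <- mean(mtcars$mpg)
mean_wt  <- mean(mtcars$wt)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_hline(yintercept = mean_mpg, color = "red", linetype = "dashed") +   # average MPG
  geom_vline(xintercept = mean_wt, color = "blue", linetype = "dashed") +   # average weight
  labs(x = "Weight (1000 lbs)", y = "Miles Per Gallon") +
  theme_minimal()

print(p)
```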
Summary & Next Steps
You now know:
- ✅ Why visualization is crucial for data analysis
- ✅ How to create basic plots with base R
- ✅ How to build professional graphics with ggplot2
- ✅ When to use each visualization type
- ✅ How to customize and enhance your plots
- ✅ How to create interactive visualizations
- ✅ Best practices for effective communication
- ✅ How to troubleshoot common issues
Data visualization is both an art and a science. The more you practice, the better you’ll become at telling stories with your data. Keep visualizing!