Importing and exporting data is one of the first and most important tasks in data analysis. Whether you’re working with CSV files, Excel spreadsheets, text files, or compressed archives, mastering data I/O in R is essential for an efficient workflow.
This guide covers the most common methods for importing and exporting data in R, with practical examples for each scenario.
Importing CSV Files
CSV (Comma-Separated Values) is the most common data format. R provides multiple methods to read CSV files.
Using read.csv()
# Basic CSV import
data <- read.csv("data.csv")
print(head(data))
# Specify separator (comma is default)
data <- read.csv("data.csv", sep = ",")
# Skip rows
data <- read.csv("data.csv", skip = 2)
# Read a file that has no header row (first line is treated as data)
data <- read.csv("data.csv", header = FALSE)
# Set column names manually
data <- read.csv("data.csv", col.names = c("ID", "Name", "Age", "Score"))
# Specify data types
data <- read.csv("data.csv",
                 colClasses = c("integer", "character", "numeric", "logical"))
# Handle missing values
data <- read.csv("data.csv", na.strings = c("", "NA", "N/A", "."))
# Maximum rows to read
data <- read.csv("data.csv", nrows = 1000)
# Preview just the first few rows
data <- read.csv("data.csv", nrows = 5)
Using read_csv() from readr Package
# Modern tidyverse approach
library(readr)
# Basic import
data <- read_csv("data.csv")
# Specify column types explicitly
data <- read_csv("data.csv",
                 col_types = cols(
                   ID = col_integer(),
                   Name = col_character(),
                   Age = col_integer(),
                   Score = col_double()
                 ))
# Skip rows
data <- read_csv("data.csv", skip = 2)
# Read specific columns only
data <- read_csv("data.csv",
                 col_select = c("ID", "Name", "Score"))
# Handle encoding issues
data <- read_csv("data.csv", locale = locale(encoding = "UTF-8"))
# Suppress the column specification message printed on import
data <- read_csv("data.csv", show_col_types = FALSE)
Importing Specific Rows
# Read CSV file line by line
lines <- readLines("data.csv")
# Find rows matching a pattern and parse them (plus the header) back into a data frame
relevant_lines <- grep("pattern", lines)
data <- read.csv(text = c(lines[1], lines[relevant_lines]))
# Read a block of rows with skip and nrows (header = FALSE because the header line is skipped too)
data <- read.csv("data.csv", skip = 5, nrows = 10, header = FALSE)
# Using data.table for fast specific row reading
library(data.table)
data <- fread("data.csv", skip = 5, nrows = 10)
Importing Other Text Formats
Delimited Files (TSV, pipe-delimited, etc.)
# Read tab-separated values
data_tsv <- read.delim("data.tsv")
# Equivalent to:
data_tsv <- read.csv("data.tsv", sep = "\t")
# Read pipe-delimited file
data_pipe <- read.delim("data.txt", sep = "|")
# Read with custom delimiter
data_custom <- read.delim("data.txt", sep = ";")
# Using read_delim() from readr
library(readr)
data_tsv <- read_delim("data.tsv", delim = "\t")
data_pipe <- read_delim("data.txt", delim = "|")
# Specify skip and nrows
data <- read_delim("data.txt", delim = "|", skip = 1, n_max = 100)
Text Files
# Simple text file reading
content <- readLines("file.txt")
print(content)
# Read with connection
con <- file("file.txt", "r")
lines <- readLines(con, n = 10) # Read first 10 lines
close(con)
# Read entire file as single string
full_content <- paste(readLines("file.txt"), collapse = "\n")
# Read with line numbers
file_lines <- readLines("file.txt")
lines_with_numbers <- data.frame(
  line_num = seq_along(file_lines),
  content = file_lines
)
Using scan() for Flexible Reading
# Basic scan
data <- scan("file.txt")
# Scan with what parameter (specify type)
data <- scan("file.txt", what = numeric())
# Scan character data
text_data <- scan("file.txt", what = character())
# Scan with custom separator
data <- scan("file.txt", sep = ",")
# Scan with skip and n (number of items)
data <- scan("file.txt", skip = 5, n = 100)
# Quiet mode (suppress the message about how many items were read)
data <- scan("file.txt", quiet = TRUE)
Importing Excel Files
Using readxl Package
# Install package if needed
# install.packages("readxl")
library(readxl)
# Read Excel file (first sheet by default)
data <- read_excel("data.xlsx")
# Read specific sheet
data <- read_excel("data.xlsx", sheet = "Sheet2")
data <- read_excel("data.xlsx", sheet = 2) # By position
# Read with skip and column names
data <- read_excel("data.xlsx", skip = 1, col_names = TRUE)
# Specify data types
data <- read_excel("data.xlsx",
                   col_types = c("numeric", "text", "date", "numeric"))
# Read specific range
data <- read_excel("data.xlsx", range = "A1:D100")
# List all sheets
sheets <- excel_sheets("data.xlsx")
Using openxlsx Package (reading and writing)
# Install package
# install.packages("openxlsx")
library(openxlsx)
# Read Excel file
data <- read.xlsx("data.xlsx")
# Read specific sheet
data <- read.xlsx("data.xlsx", sheet = "Sheet1")
# Read only selected columns (faster when you need a subset)
data <- read.xlsx("data.xlsx", sheet = 1, cols = 1:10)
# Fill values from merged cells and start reading at a given row
data <- read.xlsx("data.xlsx", startRow = 1, fillMergedCells = TRUE)
Importing Compressed Files
Reading from ZIP Archives
# List files in ZIP
files <- unzip("archive.zip", list = TRUE)
print(files$Name)
# Extract and read CSV from ZIP
data <- read.csv(unzip("archive.zip", "data.csv"))
# Extract specific file
unzip("archive.zip", files = "data.csv", exdir = "extracted/")
data <- read.csv("extracted/data.csv")
# Process multiple files from ZIP (assumes every member is a CSV)
zip_files <- unzip("archive.zip", list = TRUE)$Name
all_data <- lapply(zip_files, function(file) {
  read.csv(unzip("archive.zip", file))
})
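If you prefer not to extract files to disk, base R's unz() connection can read a single member directly from the archive. A minimal sketch, reusing the hypothetical archive.zip and data.csv names from above:
# Read a member file straight from the ZIP without extracting it to disk
data <- read.csv(unz("archive.zip", "data.csv"))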
Fast Data Import with data.table
Using fread() for Large Files
# Fast reading with fread
library(data.table)
data <- fread("large_data.csv")
# Much faster than read.csv for large files
system.time(read.csv("large_data.csv")) # Slower
system.time(fread("large_data.csv")) # Faster
# Specify separator
data <- fread("data.tsv", sep = "\t")
# Skip and read specific rows
data <- fread("data.csv", skip = 100, nrows = 1000)
# Select specific columns
data <- fread("data.csv", select = c("ID", "Name", "Score"))
# Handle missing values
data <- fread("data.csv", na.strings = c("", "NA", "null"))
# Read without header
data <- fread("data.csv", header = FALSE)
# Return as data frame instead of data.table
data <- fread("data.csv", data.table = FALSE)
Exporting Data
Exporting to CSV
# Basic CSV export
write.csv(data, "output.csv")
# Without row names
write.csv(data, "output.csv", row.names = FALSE)
# Custom separator (write.csv() always uses a comma; use write.table() instead)
write.table(data, "output.txt", sep = "\t", row.names = FALSE)
# Control quoting (write.csv() also fixes dec = "."; use write.csv2() for comma decimals)
write.csv(data, "output.csv",
          quote = TRUE,
          row.names = FALSE)
# Using readr (tidyverse)
library(readr)
write_csv(data, "output.csv")
write_delim(data, "output.txt", delim = "\t")
# Using data.table (faster for large data)
library(data.table)
fwrite(data, "output.csv")
Exporting to Excel
# Using openxlsx
library(openxlsx)
# Export single sheet
write.xlsx(data, "output.xlsx", sheetName = "Sheet1")
# Export multiple sheets
data_list <- list(
  employees = emp_data,
  sales = sales_data,
  summary = summary_data
)
write.xlsx(data_list, "output.xlsx")
# With formatting
write.xlsx(data, "output.xlsx",
           sheetName = "Data",
           startCol = 1,
           startRow = 1)
# Using writexl (simpler alternative)
library(writexl)
write_xlsx(data, "output.xlsx")
write_xlsx(data_list, "output.xlsx")
Exporting to Text Files
# Write to text file
write.table(data, "output.txt", row.names = FALSE, quote = FALSE)
# With specific separator
write.table(data, "output.txt", sep = "\t", row.names = FALSE)
# Using writeLines for character data
writeLines(text_vector, "output.txt")
# Append to an existing file (col.names = FALSE avoids repeating the header)
write.table(data, "output.txt", append = TRUE, row.names = FALSE, col.names = FALSE)
# Using data.table
library(data.table)
fwrite(data, "output.txt", sep = "\t")
Exporting Lists and Complex Objects
# Export list to text
data_list <- list(
  name = "Analysis",
  results = data.frame(x = 1:5, y = 6:10),
  summary = c(mean = 5, sd = 2)
)
# Write readable format
sink("output.txt")
str(data_list)
print(data_list)
sink()
# Save as R object (binary format)
saveRDS(data_list, "data_list.rds")
loaded_list <- readRDS("data_list.rds")
# Save multiple objects
save(data1, data2, data3, file = "objects.RData")
load("objects.RData")
File Management Operations
Common File Operations
# List files in directory
files <- list.files("data/")
csv_files <- list.files("data/", pattern = "\\.csv$")
# Check if file exists
file.exists("data.csv")
# Get file path info
file_path <- "data/myfile.csv"
dir_name <- dirname(file_path) # "data"
base_name <- basename(file_path) # "myfile.csv"
# File size
file.size("data.csv")
# File modification time
file.mtime("data.csv")
# Create directory
dir.create("output", showWarnings = FALSE)
# Remove files
file.remove("temp.csv")
# Copy files
file.copy("original.csv", "backup.csv")
# Rename files
file.rename("old_name.csv", "new_name.csv")
Complete Practical Example
# Workflow: Import, process, and export
# Import from CSV
library(readr)
raw_data <- read_csv("sales_data.csv")
# Clean and transform
library(dplyr)
cleaned_data <- raw_data %>%
  filter(!is.na(amount)) %>%
  mutate(
    date = as.Date(date),
    category = as.factor(category),
    amount_usd = amount * 1.2  # assumed exchange rate for illustration
  ) %>%
  group_by(category) %>%
  summarize(
    total_sales = sum(amount_usd),
    avg_sale = mean(amount_usd),
    n_transactions = n()
  )
# Export to multiple formats
# CSV
write_csv(cleaned_data, "summary_sales.csv")
# Excel with formatting
library(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, "Summary")
writeData(wb, "Summary", cleaned_data)
saveWorkbook(wb, "summary_sales.xlsx", overwrite = TRUE)
# Text file
write.table(cleaned_data, "summary_sales.txt", sep = "\t", row.names = FALSE)
Best Practices
- Always specify encoding - Prevents character encoding issues
- Check data types after import - Use str() to verify
- Use fast methods for large files - fread() vs read.csv()
- Specify column types - More reliable than guessing
- Handle missing values explicitly - Define what represents NA
- Skip metadata rows - Use the skip parameter for headers/descriptions
- Use relative paths - Better for reproducibility
- Backup original data - Never overwrite source files
- Document transformations - Comment why data is modified
- Test with small subset first - Use nrows to test import settings (see the sketch after this list)
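A minimal sketch of the test-first workflow referenced above (the file name and NA codes are assumptions):
# Import a small preview to check the settings
preview <- read.csv("data.csv", nrows = 20, na.strings = c("", "NA"))
str(preview)  # verify that each column was parsed with the expected type
# Reuse the same settings for the full import once they look right
full_data <- read.csv("data.csv", na.strings = c("", "NA"))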
Common Questions
Q: How do I import a large CSV file quickly?
A: Use fread() from the data.table package - it is typically many times faster than read.csv()
Q: How do I handle different column types?
A: Specify with colClasses in read.csv() or col_types in read_csv()
Q: How do I import only specific columns?
A: Use col_select in read_csv() or select parameter in fread()
Q: How do I read multiple sheets from an Excel file?
A: Use lapply() with excel_sheets() to get all sheet names and read each
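A minimal sketch of that approach with readxl, assuming a workbook named data.xlsx:
library(readxl)
# Read every sheet of the workbook into a named list of data frames
sheet_names <- excel_sheets("data.xlsx")
all_sheets <- lapply(sheet_names, function(s) read_excel("data.xlsx", sheet = s))
names(all_sheets) <- sheet_names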
Q: What’s the best format to export data for sharing?
A: CSV for universal compatibility, Excel for formatting, RDS for R objects
Q: How do I handle encoding issues?
A: Specify encoding with locale() in readr or encoding in base R functions
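A minimal sketch of both options, assuming a Latin-1 encoded file named data.csv:
library(readr)
# readr: declare the file's encoding via locale()
data <- read_csv("data.csv", locale = locale(encoding = "latin1"))
# Base R: fileEncoding re-encodes the file as it is read
data <- read.csv("data.csv", fileEncoding = "latin1")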
Q: How do I import data with inconsistent formats?
A: Use fread() with na.strings to handle multiple missing value representations
Related Topics
Build on your import/export skills:
- R Data Transformation - Complete Guide - Clean imported data
- R Data Frames - Complete Guide - Work with imported data frames
- R Descriptive Statistics - Complete Guide - Analyze imported data
Download R Script
Get all code examples from this tutorial: data-import-export-examples.R