When you load data into R, understanding the data types of each column is crucial. If a column should be numeric but R thinks it’s character, your calculations will fail. If a date column is stored as character, sorting and filtering won’t work correctly.

In this guide, I’ll show you:

  • How to check data types for a single column
  • How to check all columns at once
  • When to use class() vs str() vs typeof()
  • How to convert between data types
  • Common type problems and solutions

By the end, you’ll know exactly what’s in your data frame and how to fix type issues before they break your analysis.

Prerequisites

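  • R 4.0 or higher (base R only, apart from the optional dplyr section). The outputs shown assume the R 4.0+ default of stringsAsFactors = FALSE; on older versions, text columns created by data.frame() show up as factor.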
  • Basic familiarity with data frames
  • RStudio (recommended but not required)

Why Data Types Matter

Here’s what happens when types are wrong:

# Example: Wrong data type causes problems
prices <- data.frame(
  item = c("apple", "banana", "orange"),
  price = c("1.50", "0.75", "2.00")  # Stored as character, not numeric!
)

# Try to calculate total
sum(prices$price)

# Error in sum(prices$price) : invalid 'type' (character) of argument

See? The sum() function won’t work because the prices are stored as text, not numbers. That’s why checking types first is essential.
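A quick class() check would have flagged the problem before the sum() call. Here is a minimal sketch that reuses the prices data frame from above. The name price_num is introduced only for illustration, and stringsAsFactors = FALSE is spelled out so the result does not depend on the R version:

```r
prices <- data.frame(
  item = c("apple", "banana", "orange"),
  price = c("1.50", "0.75", "2.00"),
  stringsAsFactors = FALSE  # explicit, so older R versions behave the same
)

# Inspect the type before doing arithmetic
class(prices$price)  # [1] "character"

# Convert a copy to numeric; now the sum works
price_num <- as.numeric(prices$price)
sum(price_num)       # [1] 4.25
```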

Understanding R Data Types

R has several basic data types you’ll encounter:

  • numeric: Numbers (integers and decimals) - 5, 3.14
  • integer: Whole numbers only - 5L
  • character: Text strings - "apple", "hello"
  • logical: TRUE/FALSE boolean values
  • Date: Calendar dates - 2024-01-15
  • factor: Categorical data with levels
  • complex: Complex numbers (rare)

Each requires different functions and behaves differently. Let’s learn how to identify them.
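As a quick preview, the sketch below builds one value of each type from the list above and asks R what it is. The list name examples is hypothetical. The list elements are named after the class you should expect back:

```r
# One example value per type; each element is named after its expected class
examples <- list(
  numeric   = 3.14,
  integer   = 5L,
  character = "apple",
  logical   = TRUE,
  Date      = as.Date("2024-01-15"),
  factor    = factor(c("low", "high")),
  complex   = 2+3i
)

# Returns a named character vector; each value matches its name
sapply(examples, class)

# A plain 5 (no L suffix) is numeric, not integer
class(5)   # [1] "numeric"
class(5L)  # [1] "integer"
```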

Method 1: class() - The Most Common Approach

The class() function tells you the basic type of an object. This is what I use 95% of the time:

# Create sample data frame
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(28, 35, 42),
  hire_date = as.Date(c("2020-01-15", "2019-06-20", "2021-03-10")),
  is_manager = c(TRUE, FALSE, TRUE)
)

# Check class of individual columns
class(df$name)       # [1] "character"
class(df$age)        # [1] "numeric"
class(df$hire_date)  # [1] "Date"
class(df$is_manager) # [1] "logical"

Simple and straightforward. One function, one clear answer.

Method 2: sapply() - Check All Columns at Once

When you have many columns, checking each one individually gets tedious. Use sapply() to check them all:

# Check all columns with sapply()
sapply(df, class)

# Output:
#         name          age    hire_date   is_manager
#  "character"    "numeric"       "Date"    "logical"

This is much better. One line shows all types. I use this constantly.
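One caveat: some columns carry more than one class (date-times stored as POSIXct are the usual case), and then sapply(df, class) returns a list instead of a tidy character vector. A minimal sketch of one way around that, using a hypothetical events data frame and collapsing each column's classes into a single string:

```r
events <- data.frame(
  id = 1:2,
  logged_at = as.POSIXct(c("2024-01-15 09:30:00", "2024-01-16 14:00:00"), tz = "UTC")
)

# logged_at has two classes, so the result is a list, not a character vector
sapply(events, class)

# Collapse the classes of each column into one string per column
col_types <- vapply(
  events,
  function(col) paste(class(col), collapse = "/"),
  character(1)
)
col_types[["id"]]         # [1] "integer"
col_types[["logged_at"]]  # [1] "POSIXct/POSIXt"
```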

Storing Results for Inspection

# Store in a data frame for better viewing
type_summary <- data.frame(
  column_name = names(df),
  data_type = sapply(df, class),
  row.names = NULL
)

print(type_summary)

# Output:
#  column_name data_type
# 1        name character
# 2         age   numeric
# 3  hire_date       Date
# 4 is_manager   logical

Much cleaner when you have 20 columns.

Method 3: str() - The Comprehensive Overview

The str() function (structure) gives you everything at once:

str(df)

# Output:
# 'data.frame': 3 obs. of 4 variables:
#  $ name       : chr  "Alice" "Bob" "Charlie"
#  $ age        : num  28 35 42
#  $ hire_date  : Date, format: "2020-01-15" "2019-06-20" "2021-03-10"
#  $ is_manager : logi  TRUE FALSE TRUE

Notice str() shows:

  • Total rows and columns
  • Each column’s type
  • Sample values
  • Format information (especially useful for dates)

When to Use str()

Use str() when you first load data to understand structure. I often run str(df) immediately after loading to understand what I’m working with.

# Load a CSV and immediately check structure
# (stored under a new name so the example df above is not overwritten)
sales <- read.csv("sales_data.csv")
str(sales)  # Instantly see what's what

Method 4: typeof() - The Technical Approach

typeof() is more technical than class(). Most beginners should use class() instead:

typeof(df$age)    # [1] "double"
typeof(df$name)   # [1] "character"

# Compare with class()
class(df$age)     # [1] "numeric"

See the difference? typeof() says “double” (a technical term for floating-point numbers), while class() says “numeric” (more user-friendly).

Use typeof() only if you need technical details about storage. For everyday work, stick with class().
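The gap between the two functions is widest for factors and dates, where class() reports what the object means and typeof() reports how it is stored. A short sketch with illustrative variable names:

```r
f <- factor(c("low", "high", "low"))
class(f)   # [1] "factor"
typeof(f)  # [1] "integer"  (factors are stored as integer codes)

d <- as.Date("2024-01-15")
class(d)   # [1] "Date"
typeof(d)  # [1] "double"   (stored as days since 1970-01-01)

n <- 5L
class(n)   # [1] "integer"
typeof(n)  # [1] "integer"  (here the two agree)
```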

Method 5: dplyr::glimpse() - Modern Alternative

If you’re using tidyverse packages, glimpse() is like str() but more readable:

library(dplyr)

glimpse(df)

# Output:
# Rows: 3
# Columns: 4
# $ name       <chr> "Alice", "Bob", "Charlie"
# $ age        <dbl> 28, 35, 42
# $ hire_date  <date> 2020-01-15, 2019-06-20, 2021-03-10
# $ is_manager <lgl> TRUE, FALSE, TRUE

Beautiful formatting. If you’re already using dplyr, this is my recommendation.

Practical Example: Real-World Data Frame

Let’s check a realistic data frame with mixed types:

# Create realistic data frame
sales_data <- data.frame(
  transaction_id = c(1001, 1002, 1003, 1004, 1005),
  customer_name = c("John Smith", "Jane Doe", "Bob Wilson", "Alice Brown", "Charlie Davis"),
  purchase_date = as.Date(c("2024-01-10", "2024-01-15", "2024-02-05", "2024-02-12", "2024-03-01")),
  amount = c(150.50, 200.00, 75.25, 320.99, 145.00),
  is_returning_customer = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  notes = c("Standard order", NA, "Rush order", "VIP customer", "Bulk discount applied")
)

# Check types
sapply(sales_data, class)

# Output:
#        transaction_id         customer_name         purchase_date
#             "numeric"           "character"                "Date"
#                amount is_returning_customer                 notes
#             "numeric"             "logical"           "character"

Perfect. Now you understand exactly what’s in this data frame.

Advanced: Check for Specific Types

Sometimes you want to find which columns are numeric, which are character, etc:

# Find all numeric columns
numeric_cols <- names(df)[sapply(df, is.numeric)]
print(numeric_cols)

# Output: [1] "age"

# Find all character columns
char_cols <- names(df)[sapply(df, is.character)]
print(char_cols)

# Output: [1] "name"

# Find all logical columns
logical_cols <- names(df)[sapply(df, is.logical)]
print(logical_cols)

# Output: [1] "is_manager"

Useful when you need to process columns by type.
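Once you have the column names for a type, you can apply an operation to just those columns. The sketch below rebuilds the small employee data frame so it runs on its own; averaging the numeric columns and upper-casing the text columns are just example operations:

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(28, 35, 42),
  hire_date = as.Date(c("2020-01-15", "2019-06-20", "2021-03-10")),
  is_manager = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Average every numeric column (only age qualifies here)
numeric_cols <- names(df)[sapply(df, is.numeric)]
col_means <- colMeans(df[numeric_cols])
col_means[["age"]]  # [1] 35

# Upper-case every character column in place
char_cols <- names(df)[sapply(df, is.character)]
df[char_cols] <- lapply(df[char_cols], toupper)
df$name  # [1] "ALICE"   "BOB"     "CHARLIE"
```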

Converting Data Types

Now that you know how to check types, here’s how to fix them when they’re wrong:

Convert Character to Numeric

# Problem: prices stored as character
prices <- c("19.99", "29.50", "15.00")
class(prices)  # [1] "character"

# Solution: convert to numeric
prices_numeric <- as.numeric(prices)
class(prices_numeric)  # [1] "numeric"
sum(prices_numeric)    # [1] 64.49
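Real-world price columns often contain currency symbols or thousands separators, and as.numeric() cannot parse those directly. A minimal sketch, assuming the only extra characters are dollar signs and commas (raw_prices and prices_clean are illustrative names):

```r
raw_prices <- c("$1,299.00", "$29.50", "15")

# Direct conversion fails for the decorated values (warning suppressed here)
suppressWarnings(as.numeric(raw_prices))  # [1] NA NA 15

# Strip "$" and "," first, then convert
prices_clean <- as.numeric(gsub("[$,]", "", raw_prices))
sum(prices_clean)  # [1] 1343.5
```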

Convert Character to Date

# Problem: dates stored as character
dates <- c("2024-01-15", "2024-01-20", "2024-01-25")
class(dates)  # [1] "character"

# Solution: convert to Date
dates_proper <- as.Date(dates)
class(dates_proper)  # [1] "Date"

# Now date operations work
dates_proper[1] + 30  # Add 30 days

Convert to Factor (for categories)

# Product categories
categories <- c("Electronics", "Clothing", "Electronics", "Food", "Clothing")
class(categories)  # [1] "character"

# Convert to factor for analysis
categories_factor <- as.factor(categories)
class(categories_factor)  # [1] "factor"

# Now you can see levels
levels(categories_factor)  # [1] "Clothing" "Electronics" "Food"

Troubleshooting Common Type Problems

Problem 1: Numeric Column Stored as Character

df <- data.frame(
  values = c("100", "200", "300")  # Should be numeric!
)

# This happens when read.csv() finds non-numeric text in a column
# (for example "N/A" or "$100"), so it keeps the whole column as character
# Solution:
df$values <- as.numeric(df$values)
class(df$values)  # Now "numeric"

Problem 2: Date Column Stored as Character

df <- data.frame(
  dates = c("2024-01-15", "2024-01-20")
)

# This is character, but should be Date
df$dates <- as.Date(df$dates, format = "%Y-%m-%d")
class(df$dates)  # Now "Date"

Note the format parameter - tell R how to interpret the characters as dates.
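The format string matters most when dates are not in year-month-day order. As a sketch, here are US-style dates parsed with the matching format, followed by what happens when the format does not match (us_dates is an illustrative name):

```r
us_dates <- c("01/15/2024", "02/20/2024")

# Matching format: month/day/four-digit year
parsed <- as.Date(us_dates, format = "%m/%d/%Y")
format(parsed)  # [1] "2024-01-15" "2024-02-20"

# Mismatched format: no error is raised, the values just become NA
wrong <- as.Date(us_dates, format = "%d/%m/%Y")
is.na(wrong)    # [1] TRUE TRUE
```

Because a wrong format produces NA rather than an error, it is worth checking for NAs right after converting.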

Problem 3: Factor vs Character - When to Use Each

# Character: unlimited unique values
countries <- c("USA", "Canada", "USA", "Mexico", "Brazil")
countries <- as.character(countries)  # Good for open-ended data

# Factor: limited categories (better for storage)
product_types <- c("A", "B", "A", "C", "B")
product_types <- as.factor(product_types)  # 3 levels: A, B, C

Use factors when you know the possible values. Use character when values are open-ended.
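When you do know the possible values, you can state them explicitly instead of letting as.factor() sort them alphabetically. A small sketch with made-up size categories:

```r
sizes <- c("medium", "small", "large", "small")

# Declare the allowed values and their order up front
size_levels <- c("small", "medium", "large")
sizes_factor <- factor(sizes, levels = size_levels)
levels(sizes_factor)      # [1] "small"  "medium" "large"
as.integer(sizes_factor)  # [1] 2 1 3 1

# A value outside the declared levels becomes NA
unexpected <- factor(c("small", "huge"), levels = size_levels)
is.na(unexpected)         # [1] FALSE  TRUE
```

Declaring levels this way also surfaces typos in the data, since anything not on the list turns into NA.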

FAQ

Q: What’s the difference between class() and typeof()? A: class() is user-friendly and what you’ll use 99% of the time. typeof() is more technical and tells you how R stores it internally. Beginners should use class().

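Q: Why is my numeric column showing as character? A: Usually because the CSV had non-numeric characters mixed in (like currency symbols or commas). Inspect the values with head(df$column_name), strip the extra characters with gsub(), for example gsub("[$,]", "", df$column_name), and then convert the result with as.numeric().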

Q: How do I check if a column contains NA values? A: Use sum(is.na(df$column_name)) to count NAs, or any(is.na(df$column_name)) for TRUE/FALSE.
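To run the same NA check on every column at once, colSums() over is.na() works on the whole data frame. A minimal sketch with a hypothetical sales data frame:

```r
sales <- data.frame(
  amount = c(150.5, NA, 75.25),
  notes = c("Standard order", NA, NA),
  stringsAsFactors = FALSE
)

# Count missing values per column
na_counts <- colSums(is.na(sales))
na_counts
# amount  notes
#      1      2
```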

Q: Can I convert all character columns to numeric automatically? A: You can, but be careful - non-numeric characters will become NA. Better to check first: sapply(df, class) to see what needs converting.
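One cautious way to automate this is to convert a character column only when every non-missing value parses as a number. The helper below, convert_if_numeric(), is a hypothetical function written for this sketch, not part of base R:

```r
# Convert a character column to numeric only if no value is lost
convert_if_numeric <- function(col) {
  if (!is.character(col)) return(col)
  converted <- suppressWarnings(as.numeric(col))
  # A new NA means some value did not parse, so keep the original
  if (any(is.na(converted) & !is.na(col))) col else converted
}

raw <- data.frame(
  id = c("1", "2", "3"),
  city = c("Oslo", "Lima", "Pune"),
  score = c("9.5", NA, "7"),
  stringsAsFactors = FALSE
)

# raw[] keeps the data frame structure while replacing every column
raw[] <- lapply(raw, convert_if_numeric)
sapply(raw, class)
#          id        city       score
#   "numeric" "character"   "numeric"
```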

Q: What if I have a Date stored as a weird format? A: Use as.Date() with the format parameter. For example: as.Date("01/15/2024", format = "%m/%d/%Y")

Q: Is there a way to change data types for multiple columns at once? A: Yes, with tidyverse: df %>% mutate(across(c(col1, col2), as.numeric))

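Q: Why does R sometimes convert my numbers to factors? A: In R versions before 4.0, read.csv() and data.frame() defaulted to stringsAsFactors = TRUE, so any column read as text (including a numeric column polluted by stray characters) became a factor. R 4.0 and later default to FALSE. On older versions, use read.csv(..., stringsAsFactors = FALSE) to prevent this.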
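You can also tell read.csv() which type each column should have, instead of fixing types afterwards. The sketch below reads a tiny inline CSV through the text argument so it runs without a file; the column names and values are made up:

```r
csv_text <- "id,zip,amount\n1,02134,10.5\n2,90210,20\n"

# Automatic detection treats zip as a number and drops the leading zero
auto <- read.csv(text = csv_text)
class(auto$zip)  # [1] "integer"
auto$zip         # [1]  2134 90210

# colClasses declares one class per column, in column order
typed <- read.csv(
  text = csv_text,
  colClasses = c("integer", "character", "numeric")
)
typed$zip        # [1] "02134" "90210"
```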

Best Practices

  1. Always check str() after loading data - catches type issues immediately
  2. Use class() for checking individual columns - simpler and clearer
  3. Convert types right after loading - don’t wait until analysis breaks
  4. Be careful with automatic conversions - as.numeric() turns values it cannot parse into NA and only emits a warning
  5. Use factors for categorical data - saves memory and clarifies intent
  6. Document type requirements in scripts - “This script requires numeric prices in column 5”
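Type requirements can also be enforced in code, not only documented. A minimal sketch using stopifnot() at the top of a script, with a cut-down version of the sales data from earlier (the bad copy exists only to show a failing check):

```r
sales_data <- data.frame(
  purchase_date = as.Date(c("2024-01-10", "2024-01-15")),
  amount = c(150.50, 200.00),
  is_returning_customer = c(FALSE, TRUE)
)

# Stop immediately if any column has an unexpected type
stopifnot(
  inherits(sales_data$purchase_date, "Date"),
  is.numeric(sales_data$amount),
  is.logical(sales_data$is_returning_customer)
)

# The same kind of check fails once a column has the wrong type
bad <- sales_data
bad$amount <- as.character(bad$amount)
check <- tryCatch(
  {
    stopifnot(is.numeric(bad$amount))
    "ok"
  },
  error = function(e) conditionMessage(e)
)
check  # [1] "is.numeric(bad$amount) is not TRUE"
```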

Download R Script

Get all code examples from this tutorial: data-type-checking-examples.R