Rolling correlation calculates correlation coefficients across a moving window of data. It’s invaluable when analyzing time series data where the relationship between variables changes over time. Unlike a single correlation coefficient for your entire dataset, rolling correlation shows you how that relationship evolves.

I use rolling correlation frequently when analyzing financial data, climate patterns, and any time series where the relationship between variables isn’t constant. It reveals trends and structural breaks you’d miss with a single correlation.

When to Use Rolling Correlation

You’ll use rolling correlation when:

  • Analyzing time series data with changing relationships
  • Detecting regime changes in financial data
  • Understanding evolving relationships between variables
  • Smoothing noisy correlation estimates
  • Studying temporal dynamics in paired variables

Basic Concepts

Rolling Window: A fixed-size subset of data that “slides” across your dataset

  • Window size = 3: calculates correlation for rows [1,2,3], then [2,3,4], then [3,4,5], etc.
  • Smaller windows = more detail but noisier
  • Larger windows = smoother but less responsive to changes
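To make the sliding concrete, here is a minimal base R sketch that lists the row indices each window covers. The names `n`, `width`, `starts`, and `windows` are illustrative only and do not come from any package.

```r
# Illustrative values: a series of 5 rows and a window of 3
n <- 5
width <- 3

# Each window starts at row s and covers `width` consecutive rows
starts <- seq_len(n - width + 1)
windows <- lapply(starts, function(s) s:(s + width - 1))

# Print the row indices covered by each window
for (w in windows) cat("rows:", w, "\n")
```

A series of n rows yields n - width + 1 windows, which is why larger windows produce fewer results.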

Syntax and Methods

Method 1: Using rollapply() from the zoo package

library(zoo)
rollapply(data, width=3, function(x) cor(x[,2],x[,3]), by.column=FALSE)

Method 2: Using runCor() from the TTR package

library(TTR)
runCor(x, y, n=20)  # 20-period rolling correlation

Use rollapply() to Calculate Rolling Correlation

Let’s calculate the rolling correlation between two columns of a data frame:

# Load library
library(zoo)

# Create data frame
df <- data.frame(month=1:8,
                 Pressure=c(12.39,11.25,12.15,13.48,13.78,12.89,12.21,12.58),
                 Temperature=c(78,89,85,84,81,79,77,85))

# Calculate rolling correlation
rollapply(df, width=3, function(x) cor(x[,2],x[,3]), by.column=FALSE)

Output:

[1] -0.8875682 -0.9028927 -0.8075214  0.5601691  0.9970314  0.2892685

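The output shows the rolling correlation between the Pressure and Temperature columns with a window size of 3. With 8 rows and a width of 3 there are 8 - 3 + 1 = 6 windows. Notice how the correlation shifts from strongly negative in the first three windows (-0.89 to -0.81) to positive in the last three (0.56, 0.997, 0.29), revealing the changing relationship over time. With only three observations per window these values are very noisy, so treat the pattern as illustrative.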
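As a sanity check, each rolling value is just an ordinary cor() call on the rows of one window. The sketch below uses base R only and recomputes the first and last windows by hand; the variable names `first_window` and `last_window` are illustrative.

```r
# Same data as in the example above
df <- data.frame(month = 1:8,
                 Pressure = c(12.39, 11.25, 12.15, 13.48, 13.78, 12.89, 12.21, 12.58),
                 Temperature = c(78, 89, 85, 84, 81, 79, 77, 85))

# Correlation for the first window (rows 1 to 3)
first_window <- cor(df$Pressure[1:3], df$Temperature[1:3])

# Correlation for the last window (rows 6 to 8)
last_window <- cor(df$Pressure[6:8], df$Temperature[6:8])

print(c(first_window, last_window))
```

These should match the first and last entries of the rollapply() output (-0.8875682 and 0.2892685).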

Common Mistakes to Avoid

Mistake 1: Wrong window size interpretation

# ❌ CONFUSION - Assuming width=3 means a lag of 3 or every third row
rollapply(df, width=3, FUN)  # Calculates on rows [1-3], [2-4], [3-5]...

# ✅ UNDERSTAND - Window size is the NUMBER OF OBSERVATIONS in each calculation
# width=3 = use 3 rows for each correlation
# width=5 = use 5 rows for each correlation (smoother but fewer results: n - width + 1)

Mistake 2: Forgetting to specify by.column=FALSE

# ❌ WRONG - Default by.column=TRUE passes each column to the function as a plain vector
rollapply(df, width=3, function(x) cor(x[,2],x[,3]))  # x[,2] fails: "incorrect number of dimensions"

# ✅ CORRECT - by.column=FALSE passes entire window as rows
rollapply(df, width=3, function(x) cor(x[,2],x[,3]), by.column=FALSE)

Mistake 3: Not handling missing values (NA)

# ❌ PROBLEM - NA values break correlation calculation
df_with_na <- df
df_with_na[3,2] <- NA
rollapply(df_with_na, width=3, function(x) cor(x[,2],x[,3]), by.column=FALSE)
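# Returns NA for every window that contains row 3

# ✅ SOLUTION - use="complete.obs" drops incomplete rows within each window
# (with width=3 that leaves only 2 pairs, so those windows return exactly +1 or -1; prefer a wider window)
rollapply(df_with_na, width=3, function(x) cor(x[,2],x[,3], use="complete.obs"), by.column=FALSE)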
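One way to avoid correlations built on too few pairs is to require a minimum number of complete observations per window. The sketch below is only an illustration: `safe_cor` and its `min_obs` argument are hypothetical names introduced here, not part of zoo, and the threshold of 4 is an arbitrary assumption.

```r
library(zoo)

# Same data as above, with one missing Pressure value in row 3
df_with_na <- data.frame(month = 1:8,
                         Pressure = c(12.39, 11.25, NA, 13.48, 13.78, 12.89, 12.21, 12.58),
                         Temperature = c(78, 89, 85, 84, 81, 79, 77, 85))

# Hypothetical helper: correlate only when enough complete pairs remain
safe_cor <- function(x, min_obs = 4) {
  p <- as.numeric(x[, 2])
  t <- as.numeric(x[, 3])
  ok <- complete.cases(p, t)          # TRUE where both values are present
  if (sum(ok) < min_obs) return(NA_real_)
  cor(p[ok], t[ok])
}

# width = 5 leaves at least 4 complete pairs in every window here
res <- rollapply(df_with_na, width = 5, safe_cor, by.column = FALSE)
print(res)
```

Windows that fall below the threshold return NA instead of a misleading value, which keeps gaps visible in a later plot.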

Mistake 4: Window too small gives unreliable correlations

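# ❌ UNRELIABLE - With width=2 every correlation is exactly +1 or -1 (or NA); width=3 is barely better
rollapply(df, width=2, function(x) cor(x[,2],x[,3]), by.column=FALSE)

# ✅ BETTER - Use a larger window for stability (roughly 10-20 observations or more)
# Note: the series must have at least `width` rows; the 8-row df above is too short for width=10
rollapply(df, width=10, function(x) cor(x[,2],x[,3]), by.column=FALSE)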
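The effect of window size is easier to see on a longer series. The sketch below uses simulated data (two unrelated random series, so the true correlation is zero) rather than real measurements; the seed, the series length of 200, and the helper name `roll_cor` are arbitrary choices made for illustration.

```r
library(zoo)

set.seed(42)  # arbitrary seed, for reproducibility only

# Simulated, unrelated series: the true correlation is 0
sim <- data.frame(t = 1:200, x = rnorm(200), y = rnorm(200))

# Illustrative helper wrapping the rollapply() call used throughout
roll_cor <- function(d, width) {
  rollapply(d, width = width, function(w) cor(w[, 2], w[, 3]), by.column = FALSE)
}

narrow <- roll_cor(sim, 3)   # 198 estimates from 3 observations each
wide   <- roll_cor(sim, 20)  # 181 estimates from 20 observations each

# Spread of the estimates around the true value of 0
print(c(sd_width3 = sd(narrow), sd_width20 = sd(wide)))
```

The narrow window should show a much larger spread, even though nothing about the underlying relationship changes.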

Pro Tips

  1. Visualize rolling correlation - Plot it to see the trend:

    rolling_cor <- rollapply(df, width=5, function(x) cor(x[,2],x[,3]), by.column=FALSE)
    plot(rolling_cor, type="l")
    
  2. Compare different window sizes - To understand trade-offs (the series needs at least as many rows as the largest width):

    plot(rollapply(df, width=3, ...), main="width=3 (responsive)")
    plot(rollapply(df, width=10, ...), main="width=10 (smooth)")
    
  3. Use TTR package for convenience:

    library(TTR)
    runCor(df$Pressure, df$Temperature, n=5)  # Simpler syntax
    
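  4. Handle edge cases - runCor() returns NA for the first (n-1) positions; rollapply() drops incomplete windows by default, so add fill=NA (with align="right") if you need a result the same length as the input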
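If the rolling correlation should sit next to the original rows, rollapply() can pad the result. This is a minimal sketch using the `fill` and `align` arguments of rollapply(); the column name `rolling_cor` is an illustrative choice.

```r
library(zoo)

df <- data.frame(month = 1:8,
                 Pressure = c(12.39, 11.25, 12.15, 13.48, 13.78, 12.89, 12.21, 12.58),
                 Temperature = c(78, 89, 85, 84, 81, 79, 77, 85))

# fill = NA pads the result to nrow(df);
# align = "right" places each value on the last row of its window
df$rolling_cor <- rollapply(df, width = 3,
                            function(x) cor(x[, 2], x[, 3]),
                            by.column = FALSE, fill = NA, align = "right")

print(df)
```

With right alignment, the value in row 3 describes rows 1 to 3, so each correlation only uses data available up to that month.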

See Also