Bivariate analysis in R examines the relationship between two variables. Common techniques include computing a correlation coefficient, fitting a regression model, and visualizing the data with a scatter plot.
To perform bivariate analysis, follow the steps below:
Step 1: Load required libraries
library(tidyverse)
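Here we load the tidyverse package. The steps below use only base R functions (data.frame(), cor(), lm() and plot()), so this step is optional unless you also want tidyverse tools such as dplyr or ggplot2.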
Step 2: Load dataset
You can load a built-in dataset or create a data frame to analyze:
# Create data frame
df <- data.frame(Machine_name=c("A","B","C","D","E","F","G","H"),
Pressure=c(12.39,11.25,12.15,13.48,13.78,12.89,12.21,12.58),
Temperature=c(78,89,85,84,81,79,77,85),
Status=c(TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE))
print(df)
Output:
  Machine_name Pressure Temperature Status
1            A    12.39          78   TRUE
2            B    11.25          89   TRUE
3            C    12.15          85  FALSE
4            D    13.48          84   TRUE
5            E    13.78          81  FALSE
6            F    12.89          79  FALSE
7            G    12.21          77   TRUE
8            H    12.58          85  FALSE
The output shows the data frame created by the code above.
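If you would rather start from a built-in dataset than a hand-made data frame, a minimal sketch could look like the following. The choice of mtcars and of its mpg and wt columns is an assumption made purely for illustration; the remaining steps continue with the df created above.

```r
# Load the built-in mtcars dataset (ships with R in the datasets package)
data("mtcars")

# Keep two numeric columns for a bivariate analysis (illustrative choice)
cars <- mtcars[, c("mpg", "wt")]

# Inspect the first rows and the column types
print(head(cars))
str(cars)
```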
Step 3: Correlation Analysis
Let’s find the correlation between two columns of the data frame using the cor() function:
# Calculate the correlation coefficient (named r to avoid masking base R's c())
r <- cor(df$Pressure, df$Temperature)
# Display the correlation coefficient
print(r)
Output:
[1] -0.3579008
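The output shows the Pearson correlation coefficient (the default method of cor()) between the Pressure and Temperature columns. A value of about -0.36 indicates a weak negative linear relationship.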
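A correlation coefficient alone does not say whether the relationship could be due to chance. As a hedged sketch, cor.test() from base R reports a p-value and confidence interval for the same coefficient, and cor() with method = "spearman" gives a rank-based alternative. The vectors below simply repeat the Pressure and Temperature values from Step 2 so that the snippet runs on its own.

```r
# Recreate the two numeric columns from Step 2
Pressure    <- c(12.39, 11.25, 12.15, 13.48, 13.78, 12.89, 12.21, 12.58)
Temperature <- c(78, 89, 85, 84, 81, 79, 77, 85)

# Pearson correlation with a significance test and 95% confidence interval
ct <- cor.test(Pressure, Temperature, method = "pearson")
print(ct$estimate)   # correlation coefficient
print(ct$p.value)    # p-value for the null hypothesis of zero correlation
print(ct$conf.int)   # confidence interval for the coefficient

# Rank-based (Spearman) correlation, less sensitive to outliers
rho <- cor(Pressure, Temperature, method = "spearman")
print(rho)
```

With only eight observations the confidence interval is wide, so the sign of the coefficient should be read with caution.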
Step 4: Regression Analysis
To perform linear regression analysis, you can use the lm() function:
# Fit simple linear regression model
l <- lm(Pressure ~ Temperature,data=df)
# Calculate summary of linear regression model
s <- summary(l)
# Display summary of linear regression model
print(s)
Output:
Call:
lm(formula = Pressure ~ Temperature, data = df)
Residuals:
     Min       1Q   Median       3Q      Max
-0.87778 -0.55523 -0.08842  0.38541  1.10292
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.23874    6.02198   3.029   0.0231 *
Temperature -0.06866    0.07313  -0.939   0.3840
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8061 on 6 degrees of freedom
Multiple R-squared:  0.1281, Adjusted R-squared:  -0.01722
F-statistic: 0.8815 on 1 and 6 DF,  p-value: 0.384
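Here we fit a simple linear regression of Pressure on Temperature. The estimated slope is -0.06866, but its p-value (0.384) is well above 0.05, so these eight observations show no statistically significant linear relationship. The model explains only about 12.8% of the variance in Pressure (Multiple R-squared: 0.1281).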
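Beyond printing the summary, you can pull individual pieces out of the fitted model. The sketch below uses coef(), confint() and predict() from base R. The temperature value of 80 is a hypothetical input chosen for illustration, and the data frame repeats the numeric columns from Step 2 so the snippet is self-contained.

```r
# Recreate the numeric columns from Step 2
df <- data.frame(
  Pressure    = c(12.39, 11.25, 12.15, 13.48, 13.78, 12.89, 12.21, 12.58),
  Temperature = c(78, 89, 85, 84, 81, 79, 77, 85)
)

# Fit the same simple linear regression as above
fit <- lm(Pressure ~ Temperature, data = df)

# Extract the intercept and slope as a named numeric vector
print(coef(fit))

# 95% confidence intervals for the coefficients
print(confint(fit))

# Predict Pressure for a new temperature (80 is a hypothetical value)
new_data <- data.frame(Temperature = 80)
pred <- predict(fit, newdata = new_data)
print(pred)
```

Because the slope is not statistically significant here, treat such predictions as a demonstration of the API rather than a reliable estimate.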
Step 5: Visualization
To visualize the relationship between two columns of the data frame, you can draw a scatter plot using the plot() function:
# Create scatter plot of Pressure vs. Temperature
plot(df$Pressure, df$Temperature, pch=16, col='steelblue',
main='Pressure vs. Temperature',
xlab='Pressure', ylab='Temperature')
Output:
The resulting scatter plot shows the relationship between the Pressure and Temperature columns of the data frame.
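To connect the plot with the regression from Step 4, you can overlay the fitted line using abline(). This is a sketch rather than part of the original example. It places Temperature on the x-axis and Pressure on the y-axis (the reverse of the plot above) so that the axes match the model formula Pressure ~ Temperature, and the red line color is an arbitrary choice.

```r
# Recreate the numeric columns from Step 2
df <- data.frame(
  Pressure    = c(12.39, 11.25, 12.15, 13.48, 13.78, 12.89, 12.21, 12.58),
  Temperature = c(78, 89, 85, 84, 81, 79, 77, 85)
)

# Fit the regression of Pressure on Temperature
fit <- lm(Pressure ~ Temperature, data = df)

# Scatter plot with the predictor on the x-axis to match the model formula
plot(df$Temperature, df$Pressure, pch = 16, col = "steelblue",
     main = "Pressure vs. Temperature with fitted line",
     xlab = "Temperature", ylab = "Pressure")

# Overlay the fitted regression line (intercept and slope taken from fit)
abline(fit, col = "red", lwd = 2)
```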