Univariate analysis means summarizing and visualizing a single variable in a dataset. It typically involves calculating summary statistics, creating a frequency table, and plotting charts.
The following methods show the basic syntax for each of these tasks in R.
Method 1: Calculate Statistical Values
# Calculate mean
mean(df$column)
# Calculate median
median(df$column)
# Calculate range (difference between max and min values)
max(df$column)-min(df$column)
# Calculate IQR
IQR(df$column)
# Calculate standard deviation
sd(df$column)
Method 2: Create Frequency Table
table(df$column)
Method 3: Plot Charts
# Create boxplot
boxplot(df$column)
# Create histogram
hist(df$column)
# Create density curve
plot(density(df$column))
The following examples show how to perform univariate analysis on a dataset in R.
Calculate Statistical Values
Let’s calculate summary statistics for one column of a data frame using several base R functions:
# Create data frame
df <- data.frame(Machine_name=c("A","B","C","D","E","F","G","H"),
Pressure=c(12.39,11.25,12.15,13.48,13.78,12.89,12.21,12.58),
Temperature=c(78,89,85,84,81,79,77,85),
Status=c(TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE))
# Calculate mean
mean(df$Pressure)
# Calculate median
median(df$Pressure)
# Calculate range (difference between max and min values)
max(df$Pressure)-min(df$Pressure)
# Calculate IQR
IQR(df$Pressure)
# Calculate standard deviation
sd(df$Pressure)
Output:
[1] 12.59125
[1] 12.485
[1] 2.53
[1] 0.8425
[1] 0.7992753
The output shows, in order, the mean, median, range, interquartile range, and standard deviation of the Pressure column of the data frame.
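As a compact alternative, base R's summary() and quantile() functions report several of these values in one call. The following is a minimal sketch that assumes the same example data frame; the variable names stats and quartiles are illustrative only.

```r
# Recreate the example data frame so the block runs on its own
df <- data.frame(Machine_name=c("A","B","C","D","E","F","G","H"),
                 Pressure=c(12.39,11.25,12.15,13.48,13.78,12.89,12.21,12.58),
                 Temperature=c(78,89,85,84,81,79,77,85),
                 Status=c(TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE))

# Minimum, quartiles, median, mean and maximum in one call
stats <- summary(df$Pressure)
print(stats)

# First and third quartiles; their difference is the interquartile range
quartiles <- quantile(df$Pressure, probs = c(0.25, 0.75))
print(unname(quartiles[2] - quartiles[1]))  # same value as IQR(df$Pressure)
```

Note that summary() rounds the values it prints, so use the individual functions when full precision is needed.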
Create Frequency Table
To create a frequency table, use the table() function. It counts how many times each distinct value occurs in a particular column of the data frame.
# Create data frame
df <- data.frame(Machine_name=c("A","B","C","D","E","F","G","H"),
Pressure=c(12.39,11.25,12.15,13.48,13.78,12.89,12.21,12.58),
Temperature=c(78,89,85,84,81,79,77,85),
Status=c(TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE))
# Create frequency table
table(df$Temperature)
Output:
77 78 79 81 84 85 89
 1  1  1  1  1  2  1
The first row of the output lists each distinct value in the Temperature column and the second row gives its count. Here, 85 occurs twice and every other value occurs once.
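Counts are often easier to compare as proportions. The sketch below, which assumes the same example data frame, passes the frequency table to base R's prop.table() to get relative frequencies, and also tabulates the logical Status column. The variable names counts, proportions and status_counts are illustrative only.

```r
# Recreate the example data frame so the block runs on its own
df <- data.frame(Machine_name=c("A","B","C","D","E","F","G","H"),
                 Pressure=c(12.39,11.25,12.15,13.48,13.78,12.89,12.21,12.58),
                 Temperature=c(78,89,85,84,81,79,77,85),
                 Status=c(TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE))

# Absolute counts of each distinct Temperature value
counts <- table(df$Temperature)

# Share of all observations taken by each distinct value
proportions <- prop.table(counts)
print(round(proportions, 3))

# Logical (and character) columns can be tabulated the same way
status_counts <- table(df$Status)
print(status_counts)
```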
Plotting Charts
You can create different types of charts for the analysis, such as a boxplot, a histogram, or a density curve.
# Create data frame
df <- data.frame(Machine_name=c("A","B","C","D","E","F","G","H"),
Pressure=c(12.39,11.25,12.15,13.48,13.78,12.89,12.21,12.58),
Temperature=c(78,89,85,84,81,79,77,85),
Status=c(TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE))
# Create boxplot
boxplot(df$Pressure)
# Create histogram
hist(df$Pressure)
# Create density curve
plot(density(df$Pressure))
Output:
Each call draws a separate chart of the Pressure column: a boxplot, a histogram, and a density curve.
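To compare the three charts at a glance, they can be drawn side by side in a single plotting window. The following is a minimal sketch using base R's par(mfrow = ...) layout setting, again assuming the example data frame; the titles, axis labels, and the names old_par, b, h and d are illustrative choices.

```r
# Recreate the example data frame so the block runs on its own
df <- data.frame(Machine_name=c("A","B","C","D","E","F","G","H"),
                 Pressure=c(12.39,11.25,12.15,13.48,13.78,12.89,12.21,12.58),
                 Temperature=c(78,89,85,84,81,79,77,85),
                 Status=c(TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE))

# Split the plotting area into 1 row and 3 columns, saving the old layout
old_par <- par(mfrow = c(1, 3))

# Boxplot: median, hinges and whiskers of Pressure
b <- boxplot(df$Pressure, main = "Boxplot", ylab = "Pressure")

# Histogram: number of observations per bin
h <- hist(df$Pressure, main = "Histogram", xlab = "Pressure")

# Density curve: smoothed estimate of the distribution
d <- density(df$Pressure)
plot(d, main = "Density")

# Restore the previous layout
par(old_par)
```

The objects returned by boxplot(), hist() and density() also hold the numbers behind each chart (for example h$counts for the bin counts), which can be useful for checking what the plot shows.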