R offers two common ways to calculate summary statistics: the base summary() function and the summarize() function from the dplyr package.
The syntax for each method is shown below.
Method 1: Use summary() Function
summary(data)
Method 2: Use summarize() Function from dplyr Package
library(dplyr)
# Store the result under a name that does not mask the base summary() function
summary_stats <- df %>%
  summarize(
    Mean = mean(column1),
    Median = median(column1),
    Min = min(column1),
    Max = max(column1),
    StdDev = sd(column1),
    Variance = var(column1)
  )
The following examples show how to use these methods to calculate summary statistics in R.
Use summary() Function
Let’s see how to use the summary() function to calculate summary statistics for a dataframe.
# Create dataframe
df <- data.frame(
  Start_date = as.Date(c("2000-05-21", "2000-05-22", "2000-05-23",
                         "2000-05-24", "2000-05-25", "2000-05-26")),
  Machine_name = c("Machine1", "Machine2", "Machine1", "Machine3", "Machine2", "Machine3"),
  Value = c(108, 120, 135, 95, 98, 105),
  Reading = c(110, 97, 91, 89, 80, 85)
)
# Calculate summary statistics of dataframe
d <- summary(df)
# Show summary statistics of dataframe
print(d)
Output:
   Start_date         Machine_name           Value           Reading
 Min.   :2000-05-21   Length:6           Min.   : 95.00   Min.   : 80.0
 1st Qu.:2000-05-22   Class :character   1st Qu.: 99.75   1st Qu.: 86.0
 Median :2000-05-23   Mode  :character   Median :106.50   Median : 90.0
 Mean   :2000-05-23                      Mean   :110.17   Mean   : 92.0
 3rd Qu.:2000-05-24                      3rd Qu.:117.00   3rd Qu.: 95.5
 Max.   :2000-05-26                      Max.   :135.00   Max.   :110.0
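The output shows the minimum, quartiles, median, mean, and maximum for the date column (Start_date) and the numeric columns (Value and Reading). For the character column Machine_name, summary() reports only the length, class, and mode.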
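When only one column is of interest, summary() can also be applied to a single vector rather than the whole dataframe. The following is a minimal sketch that reuses the Value data from the example above; the names value_stats, mean_value, and median_value are illustrative and not part of the original example.

```r
# Same Value data as in the example dataframe
df <- data.frame(Value = c(108, 120, 135, 95, 98, 105))

# summary() on a numeric vector returns a named set of six statistics
value_stats <- summary(df$Value)
print(value_stats)

# Individual statistics can be extracted by name
mean_value <- value_stats[["Mean"]]
median_value <- value_stats[["Median"]]
print(median_value)
```

Extracting values by name this way can be handy when a single statistic needs to be reused later in a script instead of only being printed.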
Use summarize() Function from dplyr
Let’s see how to use the summarize() function from the dplyr package to calculate summary statistics:
# Import library
library(dplyr)
# Create dataframe
df <- data.frame(
  Start_date = as.Date(c("2000-05-21", "2000-05-22", "2000-05-23",
                         "2000-05-24", "2000-05-25", "2000-05-26")),
  Machine_name = c("Machine1", "Machine2", "Machine1", "Machine3", "Machine2", "Machine3"),
  Value = c(108, 120, 135, 95, 98, 105),
  Reading = c(110, 97, 91, 89, 80, 85)
)
# Get statistical values
summary_reading <- df %>%
  summarize(
    Mean_reading = mean(Reading),
    Median_reading = median(Reading),
    Min_reading = min(Reading),
    Max_reading = max(Reading),
    StdDev_reading = sd(Reading),
    Variance_reading = var(Reading)
  )
# Print statistical values
print(summary_reading)
Output:
  Mean_reading Median_reading Min_reading Max_reading StdDev_reading Variance_reading
1           92             90          80         110       10.50714            110.4
The output shows the mean, median, minimum, maximum, standard deviation, and variance of the Reading column of the dataframe.
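The same summarize() call can be combined with dplyr's group_by() to compute the statistics separately for each machine. The following is a minimal sketch using the Machine_name and Reading columns from the example above; the name reading_by_machine is illustrative and not part of the original example.

```r
library(dplyr)

# Same Machine_name and Reading data as in the example dataframe
df <- data.frame(
  Machine_name = c("Machine1", "Machine2", "Machine1", "Machine3", "Machine2", "Machine3"),
  Reading = c(110, 97, 91, 89, 80, 85)
)

# Group the rows by machine, then compute statistics within each group
reading_by_machine <- df %>%
  group_by(Machine_name) %>%
  summarize(
    Mean_reading = mean(Reading),
    Min_reading = min(Reading),
    Max_reading = max(Reading)
  )

print(reading_by_machine)
```

The result has one row per machine instead of a single row for the whole dataframe, which is often more useful when the data contains a categorical column such as Machine_name.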