To calculate summary statistics by group in R, you can use the tapply() function from base R, or build the summary manually with the group_by() and summarise() functions from the dplyr package.

The following methods show the syntax for each approach.

Method 1: Use tapply() Function

tapply(df$column2, df$column1, summary)
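
As a rough illustration of the argument order, tapply() takes the values to summarize first, the grouping vector second, and the function to apply third. The vectors values and groups below are made up for this sketch and are not part of the examples that follow.

```r
# Hypothetical toy data, used only to illustrate the argument order
values <- c(1, 2, 3, 4)
groups <- c("a", "a", "b", "b")

# tapply(X, INDEX, FUN): apply FUN to the values of X within each level of INDEX
group_means <- tapply(values, groups, mean)

# Print the mean of each group
print(group_means)
```

Here the result holds one mean per group: 1.5 for group a and 3.5 for group b.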


Method 2: Create Function Manually

library(dplyr)

d <- df %>%
  group_by(column1) %>%
  summarize(min = min(column2),
            q1 = quantile(column2, 0.25),
            median = median(column2),
            mean = mean(column2),
            q3 = quantile(column2, 0.75),
            max = max(column2))


The following examples show how to calculate summary statistics by group in R.

Use tapply() to Calculate Summary Statistics

Let’s see how to calculate summary statistics by group using the tapply() function:

# Create data frame
df <- data.frame(Machine_name=c("A","B","C","D","A","B","C","D"),
                 Pressure=c(78.2, 78.2, 71.7, 80.21, 80.21, 82.56, 72.12, 73.85),
                 Temperature=c(35, 36, 36, 38, 32, 32, 31, 34))

# Calculate summary statistics of 'Pressure' grouped by 'Machine_name'
s <- tapply(df$Pressure, df$Machine_name, summary)

# Print summary statistics
print(s)


Output:

$A
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  78.20   78.70   79.20   79.20   79.71   80.21

$B
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  78.20   79.29   80.38   80.38   81.47   82.56

$C
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  71.70   71.81   71.91   71.91   72.02   72.12

$D
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  73.85   75.44   77.03   77.03   78.62   80.21


The output shows the summary statistics of the Pressure column, grouped by the Machine_name column of the data frame.
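
Because summary() returns several values per group, tapply() gives back a list here, with one element per machine. If a single table is easier to read, one possible approach (a sketch, not the only way) is to stack the list elements row-wise with do.call() and rbind(). The name summary_table is chosen for this illustration only.

```r
# Recreate the data frame from the example above
df <- data.frame(Machine_name=c("A","B","C","D","A","B","C","D"),
                 Pressure=c(78.2, 78.2, 71.7, 80.21, 80.21, 82.56, 72.12, 73.85),
                 Temperature=c(35, 36, 36, 38, 32, 32, 31, 34))

# One summary per machine, returned as a list
s <- tapply(df$Pressure, df$Machine_name, summary)

# Stack the per-machine summaries into a matrix: one row per machine,
# one column per statistic (summary_table is a hypothetical name)
summary_table <- do.call(rbind, s)

# Print the combined table
print(summary_table)
```

Individual values can then be looked up by row and column name, for example summary_table["A", "Max."].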

Create Function to Calculate Summary Statistics by Group

Let’s see how to use the group_by() and summarize() functions from the dplyr package to calculate summary statistics by group:

# Import library
library(dplyr)

# Create data frame
df <- data.frame(Machine_name=c("A","B","C","D","A","B","C","D"),
                 Pressure=c(78.2, 78.2, 71.7, 80.21, 80.21, 82.56, 72.12, 73.85),
                 Temperature=c(35, 36, 36, 38, 32, 32, 31, 34))

# Calculate summary statistics of 'Temperature' grouped by 'Machine_name'
d <- df %>%
  group_by(Machine_name) %>%
  summarize(min = min(Temperature),
            q1 = quantile(Temperature, 0.25),
            median = median(Temperature),
            mean = mean(Temperature),
            q3 = quantile(Temperature, 0.75),
            max = max(Temperature))

# Print summary statistics (as a plain data frame so every digit is shown)
print(as.data.frame(d))


Output:

  Machine_name min    q1 median mean    q3 max
1            A  32 32.75   33.5 33.5 34.25  35
2            B  32 33.00   34.0 34.0 35.00  36
3            C  31 32.25   33.5 33.5 34.75  36
4            D  34 35.00   36.0 36.0 37.00  38


The output shows the summary statistics of the Temperature column, grouped by the Machine_name column of the data frame.
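
The heading of this section mentions creating a function, so here is one way the pipeline could be wrapped into a reusable helper. This is a minimal sketch, assuming dplyr 1.0 or later so that the {{ }} (embrace) operator is available. The function name summarize_by_group, its argument names, and the choice of na.rm = TRUE to skip missing values are assumptions made for this illustration.

```r
# Import library
library(dplyr)

# Hypothetical helper: summarize one numeric column within each group.
# {{ }} forwards the unquoted column names into the dplyr verbs.
summarize_by_group <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarize(min    = min({{ value_col }}, na.rm = TRUE),
              q1     = quantile({{ value_col }}, 0.25, na.rm = TRUE, names = FALSE),
              median = median({{ value_col }}, na.rm = TRUE),
              mean   = mean({{ value_col }}, na.rm = TRUE),
              q3     = quantile({{ value_col }}, 0.75, na.rm = TRUE, names = FALSE),
              max    = max({{ value_col }}, na.rm = TRUE))
}

# Recreate the data frame from the example above
df <- data.frame(Machine_name=c("A","B","C","D","A","B","C","D"),
                 Pressure=c(78.2, 78.2, 71.7, 80.21, 80.21, 82.56, 72.12, 73.85),
                 Temperature=c(35, 36, 36, 38, 32, 32, 31, 34))

# Summarize 'Temperature' grouped by 'Machine_name'
temperature_stats <- summarize_by_group(df, Machine_name, Temperature)
print(temperature_stats)
```

The same helper can be pointed at another column, for example summarize_by_group(df, Machine_name, Pressure), without repeating the summarize() call.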