To calculate summary statistics by group in R, you can use the tapply() function from base R or the group_by() and summarize() functions from the dplyr package.

The following methods show the basic syntax of each approach.

Method 1: Use tapply() Function

tapply(df$column2, df$column1, summary)

Method 2: Use group_by() and summarize() Functions

library(dplyr)

d <- df %>%
  group_by(column1) %>% 
  summarize(min = min(column2),
            q1 = quantile(column2, 0.25),
            median = median(column2),
            mean = mean(column2),
            q3 = quantile(column2, 0.75),
            max = max(column2)) 

The following examples show how to calculate summary statistics by group in R.

Use tapply() to Calculate Summary Statistics

Let’s see how we can calculate summary statistics by group using the tapply() function:

# Create data frame
df <- data.frame(Machine_name=c("A","B","C","D","A","B","C","D"),
                 Pressure=c(78.2, 78.2, 71.7, 80.21, 80.21, 82.56, 72.12, 73.85),
                 Temperature=c(35, 36, 36, 38, 32, 32, 31, 34))
                 
# Calculate summary statistics of 'Pressure' grouped by 'Machine_name'
s <- tapply(df$Pressure, df$Machine_name, summary)  

# Print summary statistics
print(s)

Output:

$A
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  78.20   78.70   79.20   79.20   79.71   80.21 

$B
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  78.20   79.29   80.38   80.38   81.47   82.56 

$C
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  71.70   71.81   71.91   71.91   72.02   72.12 

$D
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  73.85   75.44   77.03   77.03   78.62   80.21 

The output shows the summary statistics of the Pressure column, grouped by the Machine_name column of the data frame.
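Because tapply() returns one summary per group, it can be convenient to stack those summaries into a single table. The following is a minimal sketch under that assumption; the name s_table is introduced here purely for illustration and does not appear in the original example.

```r
# Recreate the example data frame
df <- data.frame(Machine_name=c("A","B","C","D","A","B","C","D"),
                 Pressure=c(78.2, 78.2, 71.7, 80.21, 80.21, 82.56, 72.12, 73.85),
                 Temperature=c(35, 36, 36, 38, 32, 32, 31, 34))

# One summary per machine, returned as a list-like object
s <- tapply(df$Pressure, df$Machine_name, summary)

# Stack the per-group summaries into a matrix (one row per machine)
# s_table is a hypothetical name used only in this sketch
s_table <- do.call(rbind, s)
print(s_table)

# Pull out a single statistic, here the mean Pressure of machine A
print(s_table["A", "Mean"])
```

With the statistics in a matrix, a single value can be looked up by row (group) and column (statistic) name instead of indexing into the list.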

Use group_by() and summarize() to Calculate Summary Statistics by Group

Let’s see how we can use the group_by() and summarize() functions from the dplyr package to calculate summary statistics by group:

# Import library
library(dplyr)

# Create data frame
df <- data.frame(Machine_name=c("A","B","C","D","A","B","C","D"),
                 Pressure=c(78.2, 78.2, 71.7, 80.21, 80.21, 82.56, 72.12, 73.85),
                 Temperature=c(35, 36, 36, 38, 32, 32, 31, 34))

# Calculate summary statistics of 'Temperature' grouped by 'Machine_name'
d <- df %>%
  group_by(Machine_name) %>% 
  summarize(min = min(Temperature),
            q1 = quantile(Temperature, 0.25),
            median = median(Temperature),
            mean = mean(Temperature),
            q3 = quantile(Temperature, 0.75),
            max = max(Temperature))      

# Print summary statistics (as a plain data frame so all decimals are shown)
print(as.data.frame(d))

Output:

  Machine_name min    q1 median mean    q3 max
1            A  32 32.75   33.5 33.5 34.25  35
2            B  32 33.00   34.0 34.0 35.00  36
3            C  31 32.25   33.5 33.5 34.75  36
4            D  34 35.00   36.0 36.0 37.00  38

The output shows the summary statistics of the Temperature column, grouped by the Machine_name column of the data frame.
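If the same set of statistics is needed for several columns, the pipeline can be wrapped in a reusable function. This is only a sketch, not part of the original example: the helper name summary_by_group and its arguments are hypothetical, and it assumes dplyr 1.0.0 or later, where the {{ }} operator passes unquoted column names into group_by() and summarize().

```r
library(dplyr)

# Hypothetical helper: summary statistics of value_col for each level of group_col
summary_by_group <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarize(min = min({{ value_col }}),
              q1 = quantile({{ value_col }}, 0.25),
              median = median({{ value_col }}),
              mean = mean({{ value_col }}),
              q3 = quantile({{ value_col }}, 0.75),
              max = max({{ value_col }}))
}

# Recreate the example data frame
df <- data.frame(Machine_name=c("A","B","C","D","A","B","C","D"),
                 Pressure=c(78.2, 78.2, 71.7, 80.21, 80.21, 82.56, 72.12, 73.85),
                 Temperature=c(35, 36, 36, 38, 32, 32, 31, 34))

# Reuse the same helper for two different columns
temp_stats <- summary_by_group(df, Machine_name, Temperature)
press_stats <- summary_by_group(df, Machine_name, Pressure)

print(as.data.frame(temp_stats))
print(as.data.frame(press_stats))
```

Writing the statistics once inside the helper keeps the definitions consistent across columns, so changing a statistic only requires an edit in one place.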