You can use the aggregate() function in R to summarize the columns of a data frame. aggregate() splits the rows into groups defined by one or more columns and computes a summary statistic, such as the mean, of one or more other columns for each group.
The basic syntax is shown below.
Method: Use aggregate() Function
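aggregate(column1 ~ column2, data = df, FUN = mean)
column1: the column to summarize (the function given in FUN is applied to it).
column2: the column that defines the groups.
data: the data frame that contains both columns.
FUN: the summary function applied to each group, here mean().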
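The formula above is one of two ways to call aggregate(). As a sketch, the code below runs the Status example from the next section both ways: with the formula, and with the data-frame form, where x holds the column(s) to summarize and by holds the grouping column(s). Both calls should return the same table. Selecting columns with single brackets (df["Pressure"], df["Status"]) keeps their names in the result.

```r
# Same data as in the examples that follow
df <- data.frame(
  Machine_name = c("A","B","C","D","E","F","G","H","A","B","A","C","D","B","E","H"),
  Pressure = c(12.39,11.25,12.15,13.48,13.78,11.12,12.21,12.58,9.6,8.85,7.89,9.63,12.36,11.45,9.47,8.12),
  Status = c("OK","Suspect","OK","OK","Suspect","Suspect","Suspect","OK","OK","OK","OK","OK","Suspect","Suspect","Suspect","OK")
)

# Formula interface: column1 ~ column2
by_formula <- aggregate(Pressure ~ Status, data = df, FUN = mean)

# Data-frame interface: x = columns to summarize, by = grouping columns (a list)
by_list <- aggregate(x = df["Pressure"], by = df["Status"], FUN = mean)

print(by_formula)
print(by_list)
```

The formula form is usually easier to read. The data-frame form is handy when the column names are stored in variables.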
The following examples show how to use the aggregate() function in practice.
Summarize One Column & Group By One Column
First, create a data frame in R:
# Create data frame
df <- data.frame(Machine_name=c("A","B","C","D","E","F","G","H","A","B","A","C","D","B","E","H"),
Pressure=c(12.39,11.25,12.15,13.48,13.78,11.12,12.21,12.58,9.6,8.85,7.89,9.63,12.36,11.45,9.47,8.12),
Status=c("OK","Suspect","OK","OK","Suspect","Suspect","Suspect","OK","OK","OK","OK","OK","Suspect","Suspect","Suspect","OK"))
# Print data frame
print(df)
Output:
Machine_name Pressure Status
1 A 12.39 OK
2 B 11.25 Suspect
3 C 12.15 OK
4 D 13.48 OK
5 E 13.78 Suspect
6 F 11.12 Suspect
7 G 12.21 Suspect
8 H 12.58 OK
9 A 9.60 OK
10 B 8.85 OK
11 A 7.89 OK
12 C 9.63 OK
13 D 12.36 Suspect
14 B 11.45 Suspect
15 E 9.47 Suspect
16 H 8.12 OK
The data frame contains two character columns (Machine_name and Status) and one numeric column (Pressure).
Now apply aggregate() to calculate the mean of the Pressure column for each value of the Status column.
# Apply aggregate function
a <- aggregate(Pressure ~ Status, data = df, FUN = mean)
# Print output
print(a)
Output:
Status Pressure
1 OK 10.52111
2 Suspect 11.66286
The output shows the mean Pressure of the rows in each Status group (OK and Suspect).
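To see what the grouping does, you can reproduce these means with base R's tapply(), which applies mean() to the Pressure values within each Status group. This is only a cross-check, not a replacement for aggregate().

```r
df <- data.frame(
  Machine_name = c("A","B","C","D","E","F","G","H","A","B","A","C","D","B","E","H"),
  Pressure = c(12.39,11.25,12.15,13.48,13.78,11.12,12.21,12.58,9.6,8.85,7.89,9.63,12.36,11.45,9.47,8.12),
  Status = c("OK","Suspect","OK","OK","Suspect","Suspect","Suspect","OK","OK","OK","OK","OK","Suspect","Suspect","Suspect","OK")
)

a <- aggregate(Pressure ~ Status, data = df, FUN = mean)

# tapply() splits Pressure by Status and applies mean() to each piece
check <- tapply(df$Pressure, df$Status, mean)
print(check)

# Line the tapply() results up with aggregate()'s rows and compare them
print(all.equal(as.vector(check[a$Status]), a$Pressure))
```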
Summarize One Column & Group By Multiple Columns
You can also summarize a single column grouped by several columns by adding the grouping columns to the right-hand side of the formula with +. For example:
# Create data frame
df <- data.frame(Machine_name=c("A","B","C","D","E","F","G","H","A","B","A","C","D","B","E","H"),
Pressure=c(12.39,11.25,12.15,13.48,13.78,11.12,12.21,12.58,9.6,8.85,7.89,9.63,12.36,11.45,9.47,8.12),
Status=c("OK","Suspect","OK","OK","Suspect","Suspect","Suspect","OK","OK","OK","OK","OK","Suspect","Suspect","Suspect","OK"))
# Apply aggregate function
a <- aggregate(Pressure ~ Status + Machine_name, data = df, FUN = mean)
# Print output
print(a)
Output:
Status Machine_name Pressure
1 OK A 9.960
2 OK B 8.850
3 Suspect B 11.350
4 OK C 10.890
5 OK D 13.480
6 Suspect D 12.360
7 Suspect E 11.625
8 Suspect F 11.120
9 Suspect G 12.210
10 OK H 10.350
The output shows the mean Pressure for each combination of Status and Machine_name that occurs in the data frame.
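Each mean above is based on a different number of readings, and only the 10 Status and Machine_name combinations present in the data (out of 2 × 8 = 16 possible ones) appear in the result. A minimal sketch for checking the group sizes: run the same formula with FUN = length, then join the two results with merge(). The column name n_readings is a hypothetical label chosen for this example.

```r
df <- data.frame(
  Machine_name = c("A","B","C","D","E","F","G","H","A","B","A","C","D","B","E","H"),
  Pressure = c(12.39,11.25,12.15,13.48,13.78,11.12,12.21,12.58,9.6,8.85,7.89,9.63,12.36,11.45,9.47,8.12),
  Status = c("OK","Suspect","OK","OK","Suspect","Suspect","Suspect","OK","OK","OK","OK","OK","Suspect","Suspect","Suspect","OK")
)

# Mean Pressure per Status and Machine_name combination (as above)
a <- aggregate(Pressure ~ Status + Machine_name, data = df, FUN = mean)

# Same grouping, but FUN = length counts the readings in each group
n <- aggregate(Pressure ~ Status + Machine_name, data = df, FUN = length)
names(n)[names(n) == "Pressure"] <- "n_readings"  # hypothetical column name

# Join the two summaries on the shared grouping columns
summary_tbl <- merge(a, n, by = c("Status", "Machine_name"))
print(summary_tbl)
```

merge() sorts the joined rows by the by columns, so the row order differs from the aggregate() output above.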
Note: the column being summarized must be numeric for a function such as mean(), while the grouping columns are typically categorical (character or factor) variables.
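The data frame in this article has only one numeric column, so the examples summarize a single column. To summarize several numeric columns at once, the formula interface accepts cbind() on the left-hand side. The sketch below uses R's built-in mtcars dataset (not the article's data) and computes the mean of the numeric columns mpg and hp for each value of cyl.

```r
# mtcars ships with base R (datasets package)
# cbind(mpg, hp) on the left-hand side summarizes both columns per cyl group
res <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
print(res)
```

The result has one row per cyl value and one column for each summarized variable.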