The subset() function in R returns the part of a data frame that meets a given condition. Subsetting a data frame extracts the variables (columns) and observations (rows) that satisfy the specified condition.
For data frames, subset() takes three main arguments: x, the data frame to be subsetted; subset, a logical expression indicating which rows to keep; and select, an expression indicating which columns to keep.
subset(x, subset, select, drop = FALSE, ...)
In this tutorial, we will discuss how to subset a data frame by column name and by column value in R, with examples.
R Subset Data Frame by Column Name
To subset a data frame by column, use the select argument of the subset() function.
Let’s consider an example to understand the subsetting of a data frame in R.
In the following R code, we create a data frame with the columns name, age, gender, and marks and store it in student_info.
# Create a data frame
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)
# Print the data frame
student_info
In the following example, we select the columns name and gender from the data frame.
subset(student_info, select = c('name', 'gender'))
Here the first argument is the data frame to be subsetted, and the select argument lists the columns to keep.
The output of the above R code is:
   name gender
1   Tom      M
2   Kim      F
3   Sam      M
4 Julie      F
5 Emily      F
6 Chris      M
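When the columns you want sit next to each other, select also accepts a range of unquoted column names. The sketch below rebuilds the tutorial's student_info so that it runs on its own; the variable name first_three is only illustrative.

```r
# Rebuild the tutorial's data frame so this example is self-contained
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# Select the contiguous columns from name through gender (name, age, gender)
first_three <- subset(student_info, select = name:gender)
names(first_three)
```

Inside select, column names stand for their positions, so name:gender behaves like 1:3 here.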
R Subset Data Frame by Column Value
The subset argument of the subset() function takes a logical expression that determines which rows to keep.
Let’s use the above student data frame.
We can select only the female candidates from the data frame using the following R code.
# Display all columns for female candidates
subset(student_info, gender == "F")
In the above R code, the condition keeps only the rows where the gender column has the value "F".
The output of the above R code is:
   name age gender marks
2   Kim  21      F    77
4 Julie  20      F    80
5 Emily  21      F    85
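The same approach works for numeric columns and for matching against several values at once. The following is a minimal sketch using the tutorial's data; the variable names high_scorers and selected_ages are assumptions made for illustration.

```r
# Rebuild the tutorial's data frame so this example is self-contained
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# Numeric comparison: keep the rows with marks of 80 or more
# (Julie, Emily, and Chris)
high_scorers <- subset(student_info, marks >= 80)

# Match against a set of values with %in%: keep ages 19 and 22
# (Sam and Chris)
selected_ages <- subset(student_info, age %in% c(19, 22))
```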
Subsetting Data Frame and Select Multiple Columns
Let’s use the student data frame to select the name and marks of the female candidates only.
The following subset() call combines a row condition with the select argument.
# Use select to choose the columns
# Display only the name and marks columns for female candidates
subset(student_info, gender == 'F', select = c(name, marks))
The select argument lists the columns to return; several columns are combined with c().
The output of the above R code is:
   name marks
2   Kim    77
4 Julie    80
5 Emily    85
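For comparison, the same rows and columns can also be extracted with base R bracket indexing. This is only a sketch to show what the subset() call above corresponds to; the names via_subset and via_brackets are illustrative.

```r
# Rebuild the tutorial's data frame so this example is self-contained
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# subset(): column names can be used directly inside the call
via_subset <- subset(student_info, gender == "F", select = c(name, marks))

# Bracket indexing: the data frame name must be repeated for each column
via_brackets <- student_info[student_info$gender == "F", c("name", "marks")]

# Both approaches return the same rows and columns for this data
all.equal(via_subset, via_brackets)
```

subset() is the shorter form for interactive use, since the column names do not need the student_info$ prefix.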
Subset Data Frame by Multiple Conditions
Using the subset() function in R, you can subset a data frame by multiple conditions.
Let’s use the above student data frame to get the rows where gender is "F" and marks is greater than 80.
# Get female students having marks greater than 80
subset(student_info, marks > 80 & gender == 'F')
In the above R code, the subset argument combines two conditions with the & (and) operator: marks > 80 & gender == 'F'.
The output of the above R code is:
   name age gender marks
5 Emily  21      F    85
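Conditions can also be combined with | (or), which keeps a row when at least one condition holds. A minimal sketch on the same data, with either_match as an illustrative variable name:

```r
# Rebuild the tutorial's data frame so this example is self-contained
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# Keep students who are female OR have marks greater than 80
# (Kim, Julie, Emily, and Chris)
either_match <- subset(student_info, marks > 80 | gender == "F")
```

Use the vectorized operators & and | here rather than && and ||, because the condition is evaluated for every row.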
R Subset Data Frame Excluding a Specified Column
Using the select argument in the subset() function in R, you can either select columns or exclude them.
Let’s use the above student_info data frame to get all columns except the age column for female candidates.
# Display all columns except the age column for female candidates
subset(student_info, gender == "F", select = -age)
In the above R code, the condition extracts the data for female candidates only.
In the select argument, the age column is prefixed with a minus sign to exclude it from the output.
The output of the above R code is:
   name gender marks
2   Kim      F    77
4 Julie      F    80
5 Emily      F    85
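To exclude more than one column, the column names can be wrapped in c() and negated together. The sketch below assumes the same student_info data; the variable name without_age_marks is illustrative.

```r
# Rebuild the tutorial's data frame so this example is self-contained
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# Drop both age and marks for female candidates; name and gender remain
without_age_marks <- subset(student_info, gender == "F", select = -c(age, marks))
names(without_age_marks)
```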
Conclusion
I hope the above article on how to subset a data frame in R using the subset() function is helpful to you.
Using the subset() function, you can specify multiple conditions to extract data from a data frame. Use the select argument to select or exclude variables from the data frame.