Selecting columns from a data frame in R is an essential step when analyzing large data sets. Using the select() function of the dplyr package, we can select specific columns by column name or column index.
Using base R, we can select specific columns from a data frame by their column name or column index, as shown below:
# Select column 1 and column 2 from data frame by column name
df[c('col1','col2')]
# Select column 2 and column 3 from data frame by column index
df[c(2,3)]
In this tutorial, we will discuss different ways of selecting columns from a data frame in R. We will cover base R subsetting as well as the select() and pull() functions of dplyr.
Selecting Columns in R using base R
In base R, we can select specific columns from a data frame by column name or column index using square brackets.
Let’s practice the selection of columns with an example.
# Create a data frame
student_info <- data.frame(
name = c("Tom","Kim","Sam","Julie","Emily","Chris"),
age = c(20,21,19,20,21,22),
gender = c('M','F','M','F','F','M'),
marks = c(72,77,65,80,85,87)
)
# Print the data frame
student_info
In the above R code, we have created a student_info data frame. It has columns name, age, gender, and marks.
To select specific columns by column name using base R, use the following code.
# Select the columns by column name
student_info[c('name','age')]
The output of the above R code to select columns from the data frame is:
   name age
1   Tom  20
2   Kim  21
3   Sam  19
4 Julie  20
5 Emily  21
6 Chris  22
To select columns by column index in R, use the following code.
# Select the columns by column index
student_info[c(1,2)]
The output of the above R code for selecting columns by an index is:
   name age
1   Tom  20
2   Kim  21
3   Sam  19
4 Julie  20
5 Emily  21
6 Chris  22
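As a minimal sketch of how the bracket forms differ, the example below rebuilds the student_info data frame from this tutorial and compares selection by name, selection by index, and extraction with double brackets. The variable names by_name, by_index, ages, and age_df are only illustrative and are not part of the original example.

```r
# Recreate the tutorial's data frame so the sketch runs on its own
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# Single brackets with one argument return a data frame
by_name <- student_info[c("name", "age")]
by_index <- student_info[c(1, 2)]
identical(by_name, by_index)  # TRUE: both pick the same two columns

# Double brackets return the column itself as a vector
ages <- student_info[["age"]]
is.numeric(ages)              # TRUE

# With the [rows, columns] form, a single column is simplified to a vector
# unless drop = FALSE is given
age_df <- student_info[, "age", drop = FALSE]
is.data.frame(age_df)         # TRUE
```

Single brackets are usually the safer choice when later code expects a data frame, since the result type does not depend on how many columns are selected.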
Selecting Columns in R using dplyr select()
Using the select() function of the dplyr package, we can select specific columns by their column name or index.
Let’s practice with an example. Use the above student_info data frame.
Select the columns in R by column name using dplyr
We can select columns from a data frame in R by passing the column names to the select() function.
# Load the dplyr package
library(dplyr)
# Select the columns by column name using the select function
student_info %>% select(name, marks)
The above R code uses the %>% pipe operator to pass the data frame to select() as its first argument.
In the select() function, we have specified the names of the columns to return from the data frame.
The output of the above R code is:
   name marks
1   Tom    72
2   Kim    77
3   Sam    65
4 Julie    80
5 Emily    85
6 Chris    87
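To make the role of the pipe concrete, here is a small sketch, assuming dplyr is installed, that compares the piped call with a direct call to select(). The names piped, direct, and reordered are illustrative only.

```r
library(dplyr)

# Recreate the tutorial's data frame so the sketch runs on its own
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# The pipe passes student_info as the first argument of select()
piped <- student_info %>% select(name, marks)
direct <- select(student_info, name, marks)
identical(piped, direct)  # TRUE: both calls are equivalent

# Columns come back in the order they are listed in select()
reordered <- student_info %>% select(marks, name)
names(reordered)          # "marks" "name"
```

The piped form tends to read better once several steps are chained, while the direct form is fine for a single call.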
Select columns in R by column index using dplyr
Using the select() function of dplyr, we can also select columns by index. Use the following code to return specific columns from a data frame.
# Select column 1 and column 3 from the data frame by index
student_info %>% select(1, 3)
In the above R code, the select() function takes column indexes as input and returns column 1 and column 3 from the data frame.
The output of the above R code is:
   name gender
1   Tom      M
2   Kim      F
3   Sam      M
4 Julie      F
5 Emily      F
6 Chris      M
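Because an index only has meaning relative to the column order, it can help to check which names the positions refer to. The sketch below, assuming dplyr is installed, shows that selecting positions 1 and 3 gives the same result as selecting name and gender. The variable names col_order, gender_pos, by_index, and by_name are illustrative.

```r
library(dplyr)

# Recreate the tutorial's data frame so the sketch runs on its own
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# The order of names() defines the column positions
col_order <- names(student_info)               # "name" "age" "gender" "marks"

# Look up the position of a column from its name
gender_pos <- match("gender", col_order)       # 3

# Selecting by position and by name returns the same data frame
by_index <- student_info %>% select(1, 3)
by_name <- student_info %>% select(name, gender)
identical(by_index, by_name)                   # TRUE
```

Selecting by name is generally more robust, since an index silently points at a different column if the data frame is later reordered.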
Selecting columns by position in R using select
To select the first three columns from the data frame, pass a range of positions to the select() function of the dplyr package.
student_info %>% select(1:3)
In the above R code, we have specified the range 1:3 in the select() function. It selects the first three columns from the data frame and returns them:
   name age gender
1   Tom  20      M
2   Kim  21      F
3   Sam  19      M
4 Julie  20      F
5 Emily  21      F
6 Chris  22      M
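The same range can also be written with column names, or expressed by dropping the column that is not wanted. The sketch below, assuming dplyr is installed, compares the three forms. The variable names by_position, by_name_range, and by_exclusion are illustrative.

```r
library(dplyr)

# Recreate the tutorial's data frame so the sketch runs on its own
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# Range of positions, as in the example above
by_position <- student_info %>% select(1:3)

# Range of column names: every column from name through gender
by_name_range <- student_info %>% select(name:gender)

# Exclude the marks column with a minus sign, keeping the rest
by_exclusion <- student_info %>% select(-marks)

identical(by_position, by_name_range)  # TRUE
identical(by_position, by_exclusion)   # TRUE
```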
Select the column values as a vector in R
Using the pull() function of the dplyr package, we can extract a single column from the data frame as a vector.
Use the student_info data frame created above to see how pull() works. Unlike select(), which returns a data frame, pull() returns the column values as a vector.
# Select the column by name and return its values as a vector
student_info %>% pull(name)
The above R code returns the column values from the data frame as a vector.
[1] "Tom" "Kim" "Sam" "Julie" "Emily" "Chris"
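To illustrate the difference between pull() and select(), here is a short sketch, assuming dplyr is installed, that extracts the marks column both ways. The marks column is used here because it is numeric in every R version. The variable names marks_vec and marks_df are illustrative.

```r
library(dplyr)

# Recreate the tutorial's data frame so the sketch runs on its own
student_info <- data.frame(
  name = c("Tom", "Kim", "Sam", "Julie", "Emily", "Chris"),
  age = c(20, 21, 19, 20, 21, 22),
  gender = c("M", "F", "M", "F", "F", "M"),
  marks = c(72, 77, 65, 80, 85, 87)
)

# pull() returns the column values as a plain vector
marks_vec <- student_info %>% pull(marks)
is.numeric(marks_vec)                      # TRUE

# select() keeps the result as a one-column data frame
marks_df <- student_info %>% select(marks)
is.data.frame(marks_df)                    # TRUE

# pull() matches the base R $ operator
identical(marks_vec, student_info$marks)   # TRUE

# A vector can be passed straight to functions such as mean()
mean(marks_vec)
```

pull() is convenient at the end of a pipeline when the next step needs a vector rather than a data frame.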
Conclusion
I hope the above article on selecting columns in R using base R and the select() function of the dplyr package is helpful to you.
We can select columns from a data frame by column name, by column index, or by a range of positions.