Merging data frames is useful when related data is stored in different places. Combining data frames in R lets you analyze that data as a single dataset. In R, we can combine data frames using the merge() and rbind() functions.
merge() is a built-in R function that merges data frames by one or more common column names. It merges the two data frames horizontally.
The rbind() function in R combines data frames vertically, stacking the rows of one below the rows of the other. Both data frames should have the same variables. If a variable from one data frame is not available in the second data frame, either add that variable to the second data frame and set it to NA (missing), or remove it from the first data frame.
In this tutorial, we will discuss how to merge data frames in R using the merge() function and combine two data frames using the rbind() function.
Merge Data Frames in R using merge()
Using the built-in merge() function, we can combine two data frames by a common key variable.
Let’s consider an example to merge two data frames in R.
# Create a data frame
student_info <- data.frame(
id = c(1,2,3,4,5,6),
name = c("Tom","Kim","Sam","Julie","Emily","Chris"),
age = c(20,21,19,20,21,22),
gender = c('M','F','M','F','F','M'),
marks = c(72,77,65,80,85,87)
)
# Print the data frame
student_info
# Create a second data frame that shares the id column
library_info <- data.frame(
id = c(1,2,4,6,3,5),
book_name = c("Statistics","R-Programming","Algebra","Python","Geometry","AP"),
book_isbn = c(978,829,129,233,120,23)
)
library_info
In the above R code, we have created two data frames using data.frame().
Both data frames have a common key variable, id.
We can merge the two data frames by this common key column.
# Merge two data frames in R using merge()
student_book <- merge(student_info, library_info, by = "id")
# Print the merged data set
student_book
The output of the merged data frame is:
  id  name age gender marks     book_name book_isbn
1  1   Tom  20      M    72    Statistics       978
2  2   Kim  21      F    77 R-Programming       829
3  3   Sam  19      M    65      Geometry       120
4  4 Julie  20      F    80       Algebra       129
5  5 Emily  21      F    85            AP        23
6  6 Chris  22      M    87        Python       233
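In the example above every id appears in both data frames, so no rows are lost. By default, merge() keeps only the rows whose key is present in both inputs. The sketch below uses two small hypothetical data frames (students and books are illustrative names, not part of the example above) to show that default and how the all.x argument keeps unmatched rows from the first data frame.

```r
# Hypothetical data: student 3 has no book, and book id 4 has no student
students <- data.frame(id = c(1, 2, 3), name = c("Tom", "Kim", "Sam"))
books <- data.frame(id = c(1, 2, 4), book_name = c("Statistics", "Algebra", "Python"))

# Default behavior: only ids found in both data frames are kept (ids 1 and 2)
inner <- merge(students, books, by = "id")

# all.x = TRUE keeps every row of students; the missing book_name becomes NA
left <- merge(students, books, by = "id", all.x = TRUE)

inner
left
```

Setting all = TRUE instead would keep the unmatched rows from both data frames.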
Combine Data Frames in R using rbind()
Using the rbind() R function, we can combine two data frames in R.
To combine two data frames with rbind(), the following conditions must be met:
- Both data frames should have the same variables (columns).
- If a variable is not available in one of the data frames, either add it to that data frame and set it to NA (missing), or delete the extra column from the other data frame.
Let’s consider an example to demonstrate combining two data frames using rbind() in R.
Create two data frames using the R code below.
# Create a data frame as account1 using data.frame()
account1 <- data.frame(Name =c("Tom","Aroy","Kim"), BankName=c("Citi", "HSBC", "HSBC"), Balance=c(3550, 4500, 2800))
# Create data frame as account2
account2 <- data.frame(Name=c("Keory","Elon"), Balance=c(2500, 8000))
In the above R code, we have created two data frames.
The account1 and account2 data frames share the Name and Balance columns.
However, account2 doesn’t have the BankName column.
If we try to combine the two data frames using rbind(), we get the following error:
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
Calls: rbind -> rbind
There are two options to deal with the problem.
Either delete the BankName column from the account1 data frame, or add the BankName column to the account2 data frame and set its values to NA (missing).
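For completeness, the first option can be sketched as follows. It assumes you are willing to lose the BankName values; account1_trimmed and merge_account_trimmed are illustrative names introduced only for this sketch.

```r
# Recreate the two data frames from the example
account1 <- data.frame(Name = c("Tom", "Aroy", "Kim"), BankName = c("Citi", "HSBC", "HSBC"), Balance = c(3550, 4500, 2800))
account2 <- data.frame(Name = c("Keory", "Elon"), Balance = c(2500, 8000))

# Keep only the columns that account2 also has (drops BankName)
account1_trimmed <- account1[, names(account2)]

# Both data frames now have the same columns, so rbind() succeeds
merge_account_trimmed <- rbind(account1_trimmed, account2)
merge_account_trimmed
```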
Let’s implement the second option: add the missing BankName variable to the second data frame and set it to NA.
account2$BankName <- NA
Now that both data frames have the same variables, we can use the rbind() function in R to join them.
It combines the data frames vertically and matches columns by name, so the column order of the second data frame does not matter.
# Create a data frame as account1 using data.frame()
account1 <- data.frame(Name = c("Tom", "Aroy", "Kim"), BankName = c("Citi", "HSBC", "HSBC"), Balance = c(3550, 4500, 2800))
# Create data frame as account2
account2 <- data.frame(Name = c("Keory", "Elon"), Balance = c(2500, 8000))
# Add the missing variable and assign it the NA value (the missing-value constant, not the string "NA")
account2$BankName <- NA
# Use rbind() in R to combine the two data frames
merge_account <- rbind(account1, account2)
# Print the combined account data set
merge_account
The output of the above R code to join two data frames is shown below. Missing values in a character column are printed as <NA>.
   Name BankName Balance
1   Tom     Citi    3550
2  Aroy     HSBC    4500
3   Kim     HSBC    2800
4 Keory     <NA>    2500
5  Elon     <NA>    8000
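When more than one column is missing, adding each one by hand gets tedious. The sketch below generalizes the manual step with a small helper; rbind_fill_na is a hypothetical name written for this illustration, not a built-in R function. It adds every column that one data frame lacks as NA and then calls rbind().

```r
# Hypothetical helper: fill columns missing from either data frame with NA, then bind rows
rbind_fill_na <- function(df1, df2) {
  # Add columns that exist only in df2 to df1
  for (col in setdiff(names(df2), names(df1))) {
    df1[[col]] <- NA
  }
  # Add columns that exist only in df1 to df2
  for (col in setdiff(names(df1), names(df2))) {
    df2[[col]] <- NA
  }
  # rbind() matches data frame columns by name
  rbind(df1, df2)
}

account1 <- data.frame(Name = c("Tom", "Aroy", "Kim"), BankName = c("Citi", "HSBC", "HSBC"), Balance = c(3550, 4500, 2800))
account2 <- data.frame(Name = c("Keory", "Elon"), Balance = c(2500, 8000))

combined <- rbind_fill_na(account1, account2)
combined
```

The helper assumes that columns with the same name hold compatible types in both data frames.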
Conclusion
I hope the above article on how to merge data frames in R is useful to you. The merge() function joins data frames horizontally by a common key variable.
You can use the rbind() function in R to bind two data frames. It combines the data frames vertically and requires both data frames to have the same variables.