How to join (merge) data frames (inner, outer, left, right) in R

We will create two dataframe df1 and df2 to illustrate joins in R.


df1
= data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3))) df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

#df1

CustomerId  Product

1         Toaster

2         Toaster

3                    Toaster

4                    Radio

5                    Radio

6                   Radio

 

#df2

 CustomerId    State

2                  Alabama

4                  Alabama

6                  Ohio

 

By using the merge function and its optional parameters:

Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)

 

The same can also be achieved using sqldf package in R, But using Merge is much more easier and understandable.

 

Leave a Reply

Close Menu