How to join (merge) data frames (inner, outer, left, right) in R

We will create two dataframe df1 and df2 to illustrate joins in R.

= data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3))) df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))


CustomerId  Product

1         Toaster

2         Toaster

3                    Toaster

4                    Radio

5                    Radio

6                   Radio



 CustomerId    State

2                  Alabama

4                  Alabama

6                  Ohio


By using the merge function and its optional parameters:

Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)


The same can also be achieved using sqldf package in R, But using Merge is much more easier and understandable.


Leave a Reply

Close Menu