We will create two dataframe df1 and df2 to illustrate joins in R.
df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3))) df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))
By using the
merge function and its optional parameters:
merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify
merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the
by.y parameters if the matching variables have different names in the different data frames.
merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)
merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)
merge(x = df1, y = df2, by = NULL)
The same can also be achieved using sqldf package in R, But using Merge is much more easier and understandable.