1

I have two datasets both with different columns however both with names. I wanted to match the names and concatenate the rows to each other

df1 <- data.frame (Name = c("Smith, Carl", "Jones, Allan", "Waterman, Josh", "Woods, Alex"),
 number1 = c(1, 2, 3, 4))

df2 <- data.frame(fullname = c("Carl Smith", "Allan Jones", "Josh Waterman", "Alex Woods"),
research = c("genetics", "genetics", "mathematics", "plant"))

I would like to combine the dataset into

Name Number1 Research fullname
Smith, Carl 1 genetics Carl Smith
Jones, Allan 2 genetics Allan Jones
Waterman, Josh 3 mathematics Josh Waterman
Woods, Alex 4 plant Alex Woods

I have tried merge(df1, df2, by.x = "Name", by.y = "fullname", all = TRUE) but it doesn't match the partial names, it gives:

Name Number1 Research fullname
Smith, Carl 1 - -
Jones, Allan 2 - -
Waterman, Josh 3 - -
Woods, Alex 4 - -
- - genetics Carl Smith
- - genetics Allan Jones
- - mathematics Josh Waterman
- - plant Alex Woods
2
  • Why would you expect merge() to match Smith, Carl to Carl Smith? Is it truly "partial string" or do you need to match X, Y to Y X exactly? Commented Nov 21, 2023 at 16:02
  • 1
    Do a search on "fuzzy join" Commented Nov 21, 2023 at 16:05

1 Answer 1

0

This could be solved by a fuzzy (=not needing perfectly matching conditions) join of both datasets. This is different from merge as it joins two tables based on fuzzy string matching. Beware that this does not always work perfectly and may produce incorrectly joined rows. If the join is producing errors, try manipulating the max_dist argument.

library(dplyr)
library(fuzzyjoin)

df1 %>%
rename(fullname = Name) %>% #fuzzyjoin expects equal colnames
fuzzyjoin::stringdist_inner_join(df2,by = "fullname", method = "jaccard", max_dist=0.31)
Sign up to request clarification or add additional context in comments.

1 Comment

Please try to explain the solution

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.