I am looking for a function (let's call it scramblematch) that can do the following.
query='one five six'
target1='one six two six three four five '
target2=' two five six'
scramblematch(query, target1) returns TRUE and
scramblematch(query, target2) returns FALSE
The stringdist package might be what I need, but I don't know how to use it.
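For reference, the behaviour described above can be sketched in base R without stringdist. This is only a minimal sketch, assuming that "match" means every whitespace-separated word of query occurs as a whole word in the target, in any order; the helper name split_words is made up for illustration.

```r
# Hypothetical helper: split a string on runs of whitespace and
# drop the empty token that a leading space produces.
split_words <- function(x) {
  w <- strsplit(x, "[[:space:]]+")[[1]]
  w[nzchar(w)]
}

# Assumed semantics: TRUE when every word of `query` is also a word of `target`.
scramblematch <- function(query, target) {
  all(split_words(query) %in% split_words(target))
}

query   <- 'one five six'
target1 <- 'one six two six three four five '
target2 <- ' two five six'

scramblematch(query, target1)
scramblematch(query, target2)
```

With the example strings, the first call should give TRUE and the second FALSE, because 'one' is missing from target2.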
Update1
Use case for the function I am looking for: I have a dataset with data entered gradually over the years. The values of one text field (textfield) are not standardized, so people entered the same thing in different ways. Now I want to clean up this data by using a standardized set of values for textfield. All values that describe the same thing in different wordings are to be replaced by the standardized value. For example (I am making this up):
In my standardized choices of values (let's call this lookupfactors), I have lookupfactors=c('liver disease', 'and more').
In textfield I have the following rows:
liver cancer disease
some other thing
male, liver fibrosis disease
yet another thing
failure of liver, disease
In the final result, I want rows 1, 3, and 5 (because they contain both 'liver' and 'disease') to be replaced by liver disease. Here I assume that the people who entered the data did not know the precise term, but did know the keywords to include. Therefore the words in each value of lookupfactors are a subset of the words in the matching textfield values.
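To make the use case concrete, here is a rough sketch in base R of the replacement described above. It is not a definitive solution: the function names tokens and standardize are invented for illustration, and it assumes that matching should ignore case and punctuation and that, if several lookup values match one row, the last one wins.

```r
textfield <- c('liver cancer disease',
               'some other thing',
               'male, liver fibrosis disease',
               'yet another thing',
               'failure of liver, disease')
lookupfactors <- c('liver disease', 'and more')

# Hypothetical helper: lower-case, split on anything that is not a
# letter or digit, and drop empty tokens.
tokens <- function(x) {
  w <- strsplit(tolower(x), "[^[:alnum:]]+")[[1]]
  w[nzchar(w)]
}

# Hypothetical helper: replace every row that contains all words of a
# lookup value with that lookup value; other rows are left unchanged.
standardize <- function(textfield, lookupfactors) {
  out <- textfield
  for (f in lookupfactors) {
    hit <- vapply(textfield,
                  function(t) all(tokens(f) %in% tokens(t)),
                  logical(1), USE.NAMES = FALSE)
    out[hit] <- f
  }
  out
}

cleaned <- standardize(textfield, lookupfactors)
cleaned
```

With the made-up data, rows 1, 3, and 5 should become 'liver disease' and rows 2 and 4 should stay as they are.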
query appears in the target string? If so, you can try Reduce("&",lapply(strsplit(query," ")[[1]],grepl,c(target1,target2))).
Reduce. Using Reduce like so is what I need: Reduce("&",lapply(strsplit(query," ")[[1]],grepl,target1)) and Reduce("&",lapply(strsplit(query," ")[[1]],grepl,target2)). Thank you. But I wonder if there is any other faster method.
Reduce is not slow. It is just my intuition when looking at the usage of lapply and grepl. The set query with %in% in the answer by docendo below may be faster.
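One thing worth checking when choosing between the two approaches mentioned in these comments is that they do not mean quite the same thing: grepl tests for substrings, while %in% on split words tests for whole words. The sketch below illustrates this; the names via_grepl and via_in, the extra string target3, and the use of fixed = TRUE are additions for illustration, not part of the original comments.

```r
query   <- 'one five six'
target1 <- 'one six two six three four five '
target2 <- ' two five six'

# Substring approach from the comments, vectorised over several targets.
# fixed = TRUE (an addition here) treats each word as a literal string.
via_grepl <- function(query, targets) {
  words <- strsplit(query, " ")[[1]]
  Reduce("&", lapply(words, grepl, targets, fixed = TRUE))
}

# Whole-word approach: every query word must equal some word of the target.
via_in <- function(query, targets) {
  q <- strsplit(query, " ")[[1]]
  vapply(strsplit(targets, " "),
         function(w) all(q %in% w),
         logical(1))
}

via_grepl(query, c(target1, target2))  # TRUE FALSE
via_in(query, c(target1, target2))     # TRUE FALSE

# The two differ when a query word is only part of a longer word.
target3 <- 'someone five six'
via_grepl(query, target3)  # TRUE, because 'one' is a substring of 'someone'
via_in(query, target3)     # FALSE, because 'one' is not a whole word here
```

Which behaviour is right depends on the data, so any speed comparison between the two is only meaningful once the intended matching rule is settled.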