I have some strings and I'd like to convert each string in a number, so I'd like to use a regular expression. Each string is one of the following:

["star"]
["near-star"]
["shared"]
["near-shared"]
["complete"]
["near-complete"]
["null"]
["near-null"]

My problem is that both of these calls return TRUE:

> grepl("star", "[\"near-star\"]")
[1] TRUE
> grepl("near-star", "[\"near-star\"]")
[1] TRUE

The same applies to the other labels. Any advice on how to write a pattern that matches each label exactly is much appreciated.

best regards, Simone

  • It seems to me that this question calls for a regex tutorial, not a simple answer. Commented Feb 5, 2014 at 18:17
  • What do you mean by "convert each string in a number"? I don't see any numbers... If you want to convert N different strings to the numbers 1 to N then you can go via factors... Commented Feb 5, 2014 at 18:19
  • You can test for absolute string equality with ==. "star" == "near-star" returns FALSE. Commented Feb 5, 2014 at 18:22
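
The exact-comparison suggestion in the last comment can be sketched roughly as follows. This is only an illustration: the names `labels`, `x`, `is_star` and `codes` are made up here, and the order of `labels` (which decides the resulting numbers) is an arbitrary assumption.

```r
# The eight labels from the question, in an order chosen here for illustration.
labels <- c('["star"]', '["near-star"]', '["shared"]', '["near-shared"]',
            '["complete"]', '["near-complete"]', '["null"]', '["near-null"]')

# A few example inputs (hypothetical data).
x <- c('["near-star"]', '["star"]', '["null"]')

# Exact comparison: TRUE only where the whole string equals '["star"]'.
is_star <- x == '["star"]'

# match() gives the position of each element of x in labels (NA if absent),
# which is one way to turn each label into a number.
codes <- match(x, labels)

print(is_star)
print(codes)
```

Because both `==` and `match()` compare whole strings, "star" can never be confused with "near-star" here.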

3 Answers

Trying to answer what I think might be your real problem (convert each string "to" a number)...

Given data:

> strings = c('["star"]', '["near-star"]', '["shared"]', '["near-shared"]')
> data = sample(strings,20,TRUE)

such that:

> head(data)
[1] "[\"near-star\"]"   "[\"star\"]"        "[\"near-shared\"]"
[4] "[\"near-shared\"]" "[\"shared\"]"      "[\"star\"]"       

Simply do:

> dataf=factor(data)
> as.numeric(dataf)
 [1] 2 4 1 1 3 4 1 2 2 1 2 3 4 4 3 4 4 1 1 4

the mapping being given by:

> levels(dataf)
[1] "[\"near-shared\"]" "[\"near-star\"]"   "[\"shared\"]"     
[4] "[\"star\"]"       
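
By default `factor()` sorts the levels alphabetically, which is why the mapping above looks the way it does. If a specific numbering is wanted, one option is to pass the levels explicitly. A minimal sketch, assuming the label order in `labels` below is the one you want (that order, and the sample `data`, are made up for illustration):

```r
# Desired label order: position in this vector becomes the numeric code.
labels <- c('["star"]', '["near-star"]', '["shared"]', '["near-shared"]')

# Hypothetical input data.
data <- c('["near-star"]', '["star"]', '["near-shared"]', '["shared"]')

# Explicit levels override the default alphabetical ordering.
dataf <- factor(data, levels = labels)
codes <- as.integer(dataf)

print(codes)
print(levels(dataf))
```

Any string that is not listed in `labels` becomes `NA`, which can help to spot unexpected input.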

Others have mentioned just using factors or the fixed argument (either of which will work fine for your stated question). More generally, if you want to match a string or pattern only when it is not preceded by a given string, you can use a negative lookbehind, which is available in Perl-style regular expressions (perl=TRUE):

> test <- c('star','near-star')
> grepl('(?<!near-)star', test, perl=TRUE )
[1]  TRUE FALSE

The regular expression here says to match the string "star", but only if it is not preceded by the string "near-". The help page ?regexp has details (you need to scroll almost all the way to the bottom).
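
As a rough sketch, here is the same lookbehind applied to strings in the bracketed form from the question, next to an anchored pattern that matches the whole label instead. The vector `x` and the result names are made up for illustration.

```r
# Labels in the bracketed, quoted form used in the question.
x <- c('["star"]', '["near-star"]', '["shared"]', '["near-shared"]')

# Negative lookbehind: "star" not immediately preceded by "near-".
only_star <- grepl('(?<!near-)star', x, perl = TRUE)

# Same idea for another label.
only_shared <- grepl('(?<!near-)shared', x, perl = TRUE)

# Alternative without lookbehind: anchor the pattern to the whole string,
# escaping the square brackets so they are taken literally.
anchored_star <- grepl('^\\["star"\\]$', x)

print(only_star)
print(only_shared)
print(anchored_star)
```

The anchored form needs no perl=TRUE, but it only works when the string contains nothing besides the label.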

You can include the square brackets and quotes in your pattern. In that case, use fixed = TRUE so the pattern is matched as is; otherwise the square brackets are interpreted as a regular expression character class.

> grepl("[\"star\"]", "[\"near-star\"]", fixed = TRUE)
[1] FALSE
> grepl("[\"star\"]", "[\"star\"]", fixed = TRUE)
[1] TRUE
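
To illustrate why fixed = TRUE matters here, the sketch below compares the same pattern with and without it, plus an escaped regular expression as an alternative. The vector `x` and the result names are made up for illustration.

```r
# A few labels in the form used in the question.
x <- c('["star"]', '["near-star"]', '["shared"]')

# As a regex, ["star"] is a character class: any single one of " s t a r.
# Every string contains a double quote, so every element matches.
as_regex <- grepl('["star"]', x)

# With fixed = TRUE the pattern is a literal substring.
as_fixed <- grepl('["star"]', x, fixed = TRUE)

# Alternative: keep regex matching but escape the square brackets.
escaped <- grepl('\\["star"\\]', x)

print(as_regex)
print(as_fixed)
print(escaped)
```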
