
I want to split one column based on another, as explained below.
Here is part of my data:

brand    products
APPLE    IPHONE6SPlus_16G
APPLE    IPHONE6S_64G
APPLE    IPHONE6S_16G
APPLE    IPhone6_32G
APPLE    iPadAir2_64G
APPLE    iPadmini2_16G
APPLE    iPadmini4_64G
HTC      ONEX
Samsung  SamsungGalaxy

I want to split brand based on products. Here is what I actually want:

brand       products
iPhone6S    IPHONE6SPlus_16G
iPhone6S    IPHONE6S_64G
iPhone6S    IPHONE6S_16G
iPhone6     IPhone6_32G
APPLE       iPadAir2_64G
APPLE       iPadmini2_16G
APPLE       iPadmini4_64G
HTC         ONEX
Samsung     SamsungGalaxy

I just want to split APPLE into three brands (APPLE, iPhone6S, iPhone6) based on products. If the name in products contains IPHONE6SPlus or IPHONE6S, change brand to iPhone6S. If the name in products contains IPhone6, change brand to iPhone6. The remaining rows should not change.

I think I can use ifelse for this, but the product names include sizes (e.g. 16G, 64G).
How can I ignore these sizes and split the data?

  • You can use grepl to match the pattern, i.e. df1$brand[grepl("IPHONE6S", df1$products)] <- "IPHONE6S" Commented Apr 21, 2017 at 8:48
  • It is not clear whether you have only a single pattern. Try v1 <- sub("^(.)(.)(.{5})(.).*", "\\L\\1\\U\\2\\L\\3\\U\\4", df1$products, perl = TRUE); df1$brand[v1 == "iPhone6S"] <- v1[v1 == "iPhone6S"] Commented Apr 21, 2017 at 9:00
  • Can you explain v1 <- sub("^(.)(.)(.{5})(.).*", "\\L\\1\\U\\2\\L\\3\\U\\4", df1$products, perl = TRUE)? Thanks. Commented Apr 21, 2017 at 9:02
  • I posted a solution with some description. Commented Apr 21, 2017 at 9:07

2 Answers


We can do this in a couple of ways. Here is one with sub and ==:

v1 <- sub("^(.)(.)(.{5})(.).*", "\\L\\1\\U\\2\\L\\3\\U\\4", df1$products, perl = TRUE)
df1$brand[v1=="iPhone6S"] <- v1[v1 == "iPhone6S"]
df1
#     brand         products
#1 iPhone6S IPHONE6SPlus_16G
#2 iPhone6S     IPHONE6S_64G
#3 iPhone6S     IPHONE6S_16G
#4    APPLE      IPhone6_32G
#5    APPLE     iPadAir2_64G
#6    APPLE    iPadmini2_16G
#7    APPLE    iPadmini4_64G
#8      HTC             ONEX
#9  Samsung    SamsungGalaxy

sub matches the pattern from the beginning of the string (^): the first character is captured as a group ((.)), the second character as another group, the next five characters as a third group ((.{5})), and one more character as a fourth group. The rest of the string (.*) is matched but not captured, so it is dropped. In the replacement, each backreference (\\1 to \\4) is preceded by \\L or \\U, which converts it to lower or upper case, so IPHONE6SPlus_16G becomes iPhone6S. Note that IPhone6_32G becomes iPhone6_ rather than iPhone6S, which is why row 4 keeps APPLE.
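To make the group-by-group behaviour easier to follow, here is a small sketch that runs the same sub call on three product names copied from the question. The vector name products is only an illustration and is not part of the original code.

```r
# Three product names taken from the question's data (illustrative vector)
products <- c("IPHONE6SPlus_16G", "IPhone6_32G", "iPadAir2_64G")

# \\1 = 1st character, \\2 = 2nd character,
# \\3 = the next five characters, \\4 = the 8th character.
# With perl = TRUE, \\L and \\U change the case of whatever follows
# in the replacement until the next case switch.
v1 <- sub("^(.)(.)(.{5})(.).*", "\\L\\1\\U\\2\\L\\3\\U\\4",
          products, perl = TRUE)

# Only elements that were rewritten to exactly "iPhone6S" are selected
is_6s <- v1 == "iPhone6S"
print(data.frame(products, v1, is_6s))
```

Only the first element is rewritten to iPhone6S here; the other two produce different strings, so the == comparison leaves their brand untouched.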


An easier option is grepl:

df1$brand[grepl("IPHONE6S", df1$products)] <- "iPhone6S"

If the column has both lower-case and upper-case characters, it can be converted to a single case with tolower or toupper before matching:

df1$brand[grepl("IPHONE6S", toupper(df1$products))] <- "iPhone6S"
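As an alternative sketch, grepl also accepts ignore.case = TRUE, which avoids the explicit toupper call. The small data frame below is made up for illustration (the lower-case iphone6s_64G row is not in the question's data).

```r
# Illustrative data with mixed-case product names (not the full data set)
df1 <- data.frame(
  brand    = c("APPLE", "APPLE", "APPLE", "HTC"),
  products = c("IPHONE6SPlus_16G", "iphone6s_64G", "IPhone6_32G", "ONEX"),
  stringsAsFactors = FALSE
)

# Case-insensitive match, so "IPHONE6S" and "iphone6s" are treated alike
df1$brand[grepl("iphone6s", df1$products, ignore.case = TRUE)] <- "iPhone6S"
print(df1)
```

Rows whose product name does not contain the pattern in any case, such as IPhone6_32G, keep their original brand.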

If we want to change multiple elements, this can be done with a loop:

nm1 <- c("IPAD", "IPHONE", "SAMSUNG")
for(j in nm1) df1$brand[grepl(j, toupper(df1$products))] <- j
df1
#   brand         products
#1  IPHONE IPHONE6SPlus_16G
#2  IPHONE     IPHONE6S_64G
#3  IPHONE     IPHONE6S_16G
#4  IPHONE      IPhone6_32G
#5    IPAD     iPadAir2_64G
#6    IPAD    iPadmini2_16G
#7    IPAD    iPadmini4_64G
#8     HTC             ONEX
#9 SAMSUNG    SamsungGalaxy
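The same loop can be adapted to the exact labels asked for in the question. The sketch below assumes a named vector, here called rules (a hypothetical name), that maps an upper-case pattern to the new brand. The more general pattern comes first so that the more specific one overwrites it.

```r
# Data as given in the question
df1 <- data.frame(
  brand    = c(rep("APPLE", 7), "HTC", "Samsung"),
  products = c("IPHONE6SPlus_16G", "IPHONE6S_64G", "IPHONE6S_16G",
               "IPhone6_32G", "iPadAir2_64G", "iPadmini2_16G",
               "iPadmini4_64G", "ONEX", "SamsungGalaxy"),
  stringsAsFactors = FALSE
)

# pattern (upper case) -> new brand; order matters, because "IPHONE6"
# also matches the 6S rows, which the second rule then overwrites
rules <- c(IPHONE6 = "iPhone6", IPHONE6S = "iPhone6S")

for (pat in names(rules)) {
  hit <- grepl(pat, toupper(df1$products), fixed = TRUE)
  df1$brand[hit] <- rules[[pat]]
}
print(df1)
```

Rows that match no pattern (the iPads, HTC and Samsung) keep their original brand.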

8 Comments

For example, IPHONE6SPlus and iphone6splus are the same
@PeterChen You can remove all the spaces with gsub("\\s+", "", df1$products) or if we wanted to ignore it
@PeterChen You can do this by converting them to either lower or upper case, i.e. tolower(df1$products) or toupper(df1$products)
maybe u can add these to ur answer. Thanks. it's awesome.
@PeterChen If you want to do this for each unique element, you can loop through the elements and do the grepl i.e. nm1 <- c("APPLE", "IPHONE6S"); for(j in nm1) df1$brand[grepl(j, toupper(df1$products))] <- j

'Dirty' solution but I hope it helps :)

# Patterns that identify the iPhone 6S rows
x <- c('IPHONE6SPlus', 'IPHONE6S')
b$brand[grepl(paste(x, collapse = "|"), b$products)] <- "Iphone6S"
# Pattern that identifies the iPhone 6 row (the match is case-sensitive)
y <- c('IPhone6')
b$brand[grepl(paste(y, collapse = "|"), b$products)] <- "Iphone6"
b

     brand         products
1 Iphone6S IPHONE6SPlus_16G
2 Iphone6S     IPHONE6S_64G
3 Iphone6S     IPHONE6S_16G
4  Iphone6      IPhone6_32G
5    APPLE     iPadAir2_64G
6    APPLE    iPadmini2_16G
7    APPLE    iPadmini4_64G
8      HTC             ONEX
9  Samsung    SamsungGalaxy
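Since the question mentions ifelse and the size suffix, here is a rough sketch of the same replacement written that way. It assumes the size is always separated from the model name by an underscore, as in the sample data, and the helper vector model is a name introduced only for this example.

```r
# Data as given in the question
b <- data.frame(
  brand    = c(rep("APPLE", 7), "HTC", "Samsung"),
  products = c("IPHONE6SPlus_16G", "IPHONE6S_64G", "IPHONE6S_16G",
               "IPhone6_32G", "iPadAir2_64G", "iPadmini2_16G",
               "iPadmini4_64G", "ONEX", "SamsungGalaxy"),
  stringsAsFactors = FALSE
)

# Drop everything from the first underscore on (the size), then upper-case
# so that the comparison ignores case
model <- toupper(sub("_.*$", "", b$products))

# Nested ifelse: 6S variants first, then plain iPhone 6, else keep the brand
b$brand <- ifelse(startsWith(model, "IPHONE6S"), "Iphone6S",
           ifelse(model == "IPHONE6", "Iphone6", b$brand))
print(b)
```

Names without an underscore, such as ONEX, pass through sub unchanged and keep their brand.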
