Splitting strings within a dataframe in R

Question

I've imported my CSV file into R as a data frame and assigned my column of interest (AtollInService$ANTENNA) as a factor.

The original CSV file has the following format in my column of interest:

COM_CVV65BSX-M\COM_CVV65BSX-M_2100_T02

As you'll notice, there's a single "\" in my original file, yet after importing with read.csv and calling head(), R seems to have doubled the backslash, as shown below:

> head(AtollInService$ANTENNA)
[1] COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 
    COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 
    COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02
[4] COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 
    COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 
    COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02

The prefix is duplicated at the beginning of each string in the file, and I wish to retain everything after the \ (printed by R as \\), such that COM is the manufacturer, CVV65BSX-M is the model, 2100 is the band and T02 is the tilt.

I tried incorporating Hadley Wickham's colsplit function within a transform function but I kept being prompted for additional information which I couldn't crack.

If anyone has a suggestion as to how I can split this particular column in my original data frame I would be delighted to hear from you. Attached is a link to a sample of the data that I am using and wish to split, in particular column "P" is of interest.

This is the data frame I am working with right now:

AtollInService <- with(Atoll, Atoll[!grepl("[_()]", NOMINAL_ID) & grepl("InService", MILESTONE) & grepl("^[A-Z][A-Z][0-9]{4}$", NOMINAL_ID) & !grepl("[L18]+[L08]", THREE_G_CELL_ID), ])

Could I incorporate a string split function just after the closing square bracket and the closing parenthesis?

Sample Data

You don't have duplicate backslashes in your imported data. R prints a single backslash as \\ because a backslash introduces escape sequences such as a line break \n or a tab \t.
– thelatemail, Commented Sep 5, 2016 at 22:27
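
One way to check this yourself is sketched below. It uses a hypothetical one-element vector x rather than the asker's data: print() shows the escaped form, while cat() and nchar() reflect the actual contents of the string.

```r
# Hypothetical example string; "\\" in R source code is ONE backslash character.
x <- "COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02"

print(x)      # escaped display: the backslash is shown as \\
cat(x, "\n")  # raw display: the backslash appears once
nchar(x)      # the backslash is counted as a single character

# Count the backslashes directly; fixed = TRUE treats the pattern literally.
n_backslashes <- lengths(regmatches(x, gregexpr("\\", x, fixed = TRUE)))
n_backslashes
```

If n_backslashes comes back as 1 for your own values, the data was imported correctly and only the printed representation differs.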

akuiper · Accepted Answer · 2016-09-06 01:03:09Z

Suppose this is your original data frame (simplified to two columns to illustrate):

AtollInservice
#                                   ANTENNA MILESTONE
# 1 COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 InService
# 2 COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 InService
# 3 COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 InService
# 4 COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 InService
# 5 COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 InService
# 6 COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 NoService

Here is an option with the data.table package, where you can filter and create new columns quite easily. This assumes the ANTENNA column always has the same format. You can use tstrsplit with the regular expression _|\\\\ (as written in an R string), which splits on either _ or a single literal backslash (printed as \\), and then take the last four elements as columns:

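library(data.table)
# Keep only the InService rows, then split ANTENNA on "_" or a literal
# backslash and assign pieces 3 to 6 to the four new columns.
(setDT(AtollInservice)[grepl("InService", MILESTONE)]
                     [, c("manufacturer", "model", "band", "tilt") := 
                        tstrsplit(ANTENNA, "_|\\\\")[3:6]][])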

#                                   ANTENNA MILESTONE manufacturer      model band tilt
#1: COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 InService          COM CVV65BSX-M 2100  T02
#2: COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 InService          COM CVV65BSX-M 2100  T02
#3: COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 InService          COM CVV65BSX-M 2100  T02
#4: COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 InService          COM CVV65BSX-M 2100  T02
#5: COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02 InService          COM CVV65BSX-M 2100  T02
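
To see why elements 3 to 6 are the ones kept, here is a minimal sketch using base R's strsplit on a single hypothetical value, followed by a base R variant for a whole column. The names x, pieces, fields, df and new_cols are illustrative only and do not come from the original data; the approach assumes every value has the same six-piece format.

```r
# Hypothetical single value in the same format as the ANTENNA column.
x <- "COM_CVV65BSX-M\\COM_CVV65BSX-M_2100_T02"

# The regex "_|\\\\" matches an underscore or one literal backslash.
pieces <- strsplit(x, "_|\\\\")[[1]]
pieces  # six pieces: COM, CVV65BSX-M, COM, CVV65BSX-M, 2100, T02

# Pieces 1-2 are the duplicated prefix; pieces 3-6 are the fields of interest.
fields <- setNames(pieces[3:6], c("manufacturer", "model", "band", "tilt"))

# Base R variant for a whole column (toy data frame, ANTENNA stored as a factor).
df <- data.frame(ANTENNA = factor(rep(x, 2)))
parts <- do.call(rbind, strsplit(as.character(df$ANTENNA), "_|\\\\"))
new_cols <- c("manufacturer", "model", "band", "tilt")
df[new_cols] <- parts[, 3:6]
df
```

The as.character() call matters when the column is a factor, as in the question, because strsplit only accepts character input. If some rows had a different number of pieces, do.call(rbind, ...) would no longer give a clean matrix, so the fixed-format assumption should be checked first.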
