R creating new column based on split column name

Question

I ran into a problem while trying to rearrange my data frame into long format. My table looks like this:

x <- data.frame("Accession" = c("AGI1", "AGI2", "AGI3", "AGI4", "AGI5", "AGI6"),
                "wt_rep_1" = c(1, 2, 3, 4, 4, 5),
                "wt_rep_2" = c(1, 2, 3, 4, 8, 9),
                "mutant1_rep_1" = c(1, 1, 0, 0, 5, 3),
                "mutant2_rep_1" = c(1, 7, 0, 0, 1, 5),
                "mutant2_rep_2" = c(1, 1, 4, 0, 1, 8))

> x
  Accession wt_rep_1 wt_rep_2 mutant1_rep_1 mutant2_rep_1 mutant2_rep_2
1      AGI1        1        1             1             1             1
2      AGI2        2        2             1             7             1
3      AGI3        3        3             0             0             4
4      AGI4        4        4             0             0             0
5      AGI5        4        8             5             1             1
6      AGI6        5        9             3             5             8

I need to create a column named "genotype" that contains the first part of each column name, i.e. everything before the first "_". How can I use strsplit(names(x), "_") for that, preferably in a loop? Any help would be appreciated.

Try with sub, i.e. sub("_.*", "", names(x))

– akrun, Commented Jul 29, 2017 at 17:25

Rui Barradas · Accepted Answer · 2017-07-29 17:28:47Z

2

I'll extract the part of the column names of x before the first _ in two instructions. Note that it can be done in just one line, but I'm posting like this for clarity.

sp <- strsplit(names(x), "_")
sapply(sp[-1], `[`, 1)

Now, how can this be a new column in data.frame x? There are only five elements in the resulting vector and x has six rows.
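For reference, the one-line form mentioned above might look like the sketch below, which also makes the length mismatch explicit. The data frame is copied from the question; `genotype` is a name chosen here for illustration only.

```r
# Data frame from the question
x <- data.frame(
  Accession     = c("AGI1", "AGI2", "AGI3", "AGI4", "AGI5", "AGI6"),
  wt_rep_1      = c(1, 2, 3, 4, 4, 5),
  wt_rep_2      = c(1, 2, 3, 4, 8, 9),
  mutant1_rep_1 = c(1, 1, 0, 0, 5, 3),
  mutant2_rep_1 = c(1, 7, 0, 0, 1, 5),
  mutant2_rep_2 = c(1, 1, 4, 0, 1, 8)
)

# One-line form: drop "Accession", split on "_", keep the first piece of each name
genotype <- sapply(strsplit(names(x)[-1], "_"), `[`, 1)
genotype
# [1] "wt"      "wt"      "mutant1" "mutant2" "mutant2"

# There is one label per measurement column, not one per row
length(genotype)  # 5
nrow(x)           # 6
```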


Mónica Zamudio · Accepted Answer · 2017-07-29 18:17:02Z

0

I agree with Rui Barradas: I don't see how this vector could become part of your original data frame. Could you please clarify?

William Doane's response to this question suggests that using regular expressions might do the trick. I like this approach because I find it elegant and fast:

  > gsub("(_.*)$", "", names(x))[-1]
  [1] "wt"      "wt"      "mutant1" "mutant2" "mutant2"
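Since the pattern is anchored at the end of the string, it can match at most once per name, so `sub` should give the same result as `gsub` here. The same idea can also pull out the replicate number. The sketch below assumes every name follows the genotype_rep_number pattern from the question; `cols`, `genotype` and `rep_id` are illustrative names.

```r
# Measurement column names from the question (Accession already dropped)
cols <- c("wt_rep_1", "wt_rep_2", "mutant1_rep_1", "mutant2_rep_1", "mutant2_rep_2")

# Remove everything from the first underscore to the end of the name
genotype <- sub("_.*$", "", cols)
genotype
# [1] "wt"      "wt"      "mutant1" "mutant2" "mutant2"

# Remove everything up to and including the last underscore
rep_id <- sub("^.*_", "", cols)
rep_id
# [1] "1" "2" "1" "1" "2"
```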


2 Comments

tralala Over a year ago

> x_long
  Accession genotype replicate value
1      AGI1       wt      rep1     1
2      AGI1       wt      rep2     2
3      AGI1  mutant1      rep1     3
4      AGI1  mutant1      rep2     4

Long format is what I finally want to achieve. Thank you very much for your tips! What I meant is that in long format it is still the same table, but transposed so that it is simpler to navigate and use later. More suggestions would be very welcome.

tralala Over a year ago

Sorry, I had a formatting problem:

x_long <- data.frame("Accession" = c("AGI1", "AGI1", "AGI1", "AGI1"),
                     "genotype" = c("wt", "wt", "mutant1", "mutant1"),
                     "replicate" = c("rep1", "rep2", "rep1", "rep2"),
                     "value" = c(1, 2, 3, 4))
> x_long
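A minimal base-R sketch of the long table described in these comments, assuming every measurement column is named genotype_rep_number. It uses `stack()` to pile up the measurement columns and `strsplit()` to split the former column names; `long` and `parts` are illustrative names. The values come from x in the question, so they differ from the made-up numbers in the example table, and mutant1 has only one replicate.

```r
# Data frame from the question
x <- data.frame(
  Accession     = c("AGI1", "AGI2", "AGI3", "AGI4", "AGI5", "AGI6"),
  wt_rep_1      = c(1, 2, 3, 4, 4, 5),
  wt_rep_2      = c(1, 2, 3, 4, 8, 9),
  mutant1_rep_1 = c(1, 1, 0, 0, 5, 3),
  mutant2_rep_1 = c(1, 7, 0, 0, 1, 5),
  mutant2_rep_2 = c(1, 1, 4, 0, 1, 8)
)

# Stack the measurement columns: "values" holds the numbers,
# "ind" holds the name of the column each number came from
long <- stack(x[-1])

# Split the former column names into their pieces, e.g. "wt", "rep", "1"
parts <- strsplit(as.character(long$ind), "_")

x_long <- data.frame(
  Accession = rep(x$Accession, times = ncol(x) - 1),  # repeated once per stacked column
  genotype  = sapply(parts, `[`, 1),                  # first piece of each name
  replicate = paste0("rep", sapply(parts, `[`, 3)),   # third piece, prefixed with "rep"
  value     = long$values
)

# Group the rows of each accession together and reset the row names
x_long <- x_long[order(x_long$Accession), ]
rownames(x_long) <- NULL
```

Calling `head(x_long)` afterwards is a quick way to check that each accession now has one row per genotype and replicate.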
