I ran into a problem while trying to rearrange my data frame into long format. My table looks like this:

x <- data.frame("Accession"=c("AGI1","AGI2","AGI3","AGI4","AGI5","AGI6"),"wt_rep_1"=c(1,2,3,4,4,5), "wt_rep_2" = c(1,2,3,4,8,9), "mutant1_rep_1"=c(1,1,0,0,5,3), "mutant2_rep_1" = c(1,7,0,0,1,5), "mutant2_rep_2" = c(1,1,4,0,1,8) )

> x
  Accession wt_rep_1 wt_rep_2 mutant1_rep_1 mutant2_rep_1 mutant2_rep_2
1      AGI1        1        1             1             1             1
2      AGI2        2        2             1             7             1
3      AGI3        3        3             0             0             4
4      AGI4        4        4             0             0             0
5      AGI5        4        8             5             1             1
6      AGI6        5        9             3             5             8

I need to create a column named "genotype" that contains the first part of each column name, i.e. the part before the first "_". How can I use strsplit(names(x), "_") for that, preferably in a loop? Any help would be appreciated.

  • Try with sub, i.e. sub("_.*", "", names(x)). Commented Jul 29, 2017 at 17:25

2 Answers

I'll extract the part of the column names of x before the first _ in two instructions. Note that it can be done in just one line, but I'm posting like this for clarity.

sp <- strsplit(names(x), "_")
sapply(sp[-1], `[`, 1)
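
As a sketch of the one-line form mentioned above (one possible way to write it, not necessarily the exact line the answer had in mind), the two steps can be nested. The data frame x is repeated from the question so the snippet runs on its own; genotype_names is a name chosen here for illustration.

```r
# Example data from the question
x <- data.frame("Accession" = c("AGI1", "AGI2", "AGI3", "AGI4", "AGI5", "AGI6"),
                "wt_rep_1" = c(1, 2, 3, 4, 4, 5),
                "wt_rep_2" = c(1, 2, 3, 4, 8, 9),
                "mutant1_rep_1" = c(1, 1, 0, 0, 5, 3),
                "mutant2_rep_1" = c(1, 7, 0, 0, 1, 5),
                "mutant2_rep_2" = c(1, 1, 4, 0, 1, 8))

# Split every column name except the first ("Accession") at "_"
# and keep the first piece of each split
genotype_names <- sapply(strsplit(names(x)[-1], "_"), `[`, 1)
genotype_names
```

Dropping "Accession" before splitting, rather than after, gives the same five elements.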

Now, how can this be a new column in data.frame x? There are only five elements in the resulting vector and x has six rows.

I agree with Ruy Barradas: I don't get how this vector could be a part of your original dataframe. Could you please clarify?

William Doane's response to this question suggests that using regular expressions might do the trick. I like this approach because I find it elegant and fast:

  > gsub("(_.*)$", "", names(x))[-1]
  [1] "wt"      "wt"      "mutant1" "mutant2" "mutant2"

2 Comments

Long format is what I ultimately want to achieve:

> x_long
  Accession genotype replicate value
1      AGI1       wt      rep1     1
2      AGI1       wt      rep2     2
3      AGI1  mutant1      rep1     3
4      AGI1  mutant1      rep2     4

Thank you very much for your tips! What I meant is that in long format it is still the same table, just transposed so that it is easier to navigate and use later. More suggestions would be very welcome.

Sorry, I had a formatting problem. Here is the code for that example:

x_long <- data.frame("Accession" = c("AGI1", "AGI1", "AGI1", "AGI1"), "genotype" = c("wt", "wt", "mutant1", "mutant1"), "replicate" = c("rep1", "rep2", "rep1", "rep2"), "value" = c(1, 2, 3, 4))
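
Given the target layout described in the comments, a minimal base-R sketch of the reshaping might look like the following. It assumes every value column is named genotype_rep_number, as in the example data; value_cols is a helper name introduced here, and the stringsAsFactors = FALSE setting is a choice made for this sketch, not something from the thread.

```r
# Example data from the question
x <- data.frame("Accession" = c("AGI1", "AGI2", "AGI3", "AGI4", "AGI5", "AGI6"),
                "wt_rep_1" = c(1, 2, 3, 4, 4, 5),
                "wt_rep_2" = c(1, 2, 3, 4, 8, 9),
                "mutant1_rep_1" = c(1, 1, 0, 0, 5, 3),
                "mutant2_rep_1" = c(1, 7, 0, 0, 1, 5),
                "mutant2_rep_2" = c(1, 1, 4, 0, 1, 8),
                stringsAsFactors = FALSE)

# All columns except "Accession" hold measurements
value_cols <- names(x)[-1]

x_long <- data.frame(
  # Repeat the full accession vector once per value column
  Accession = rep(x$Accession, times = length(value_cols)),
  # Part of the column name before the first "_", repeated for each row of x
  genotype = rep(sub("_.*", "", value_cols), each = nrow(x)),
  # Turn e.g. "wt_rep_1" into "rep1"
  replicate = rep(sub("^[^_]*_rep_", "rep", value_cols), each = nrow(x)),
  # Concatenate the value columns one after another, in column order
  value = unlist(x[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

head(x_long)
```

Each of the six accessions then appears once per genotype and replicate combination, so the result has 30 rows. Packages such as tidyr offer more compact ways to do this, but the version above uses only base R.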
