1

I have a data frame with two columns: one is strings, the other one is integers.

> rnames = sapply(1:20, FUN=function(x) paste("item", x, sep="."))
> x <- sample(c(1:5), 20, replace = TRUE)
> df <- data.frame(x, rnames)
> df
   x  rnames
1  5  item.1
2  3  item.2
3  5  item.3
4  3  item.4
5  1  item.5
6  3  item.6
7  4  item.7
8  5  item.8
9  4  item.9
10 5 item.10
11 5 item.11
12 2 item.12
13 2 item.13
14 1 item.14
15 3 item.15
16 4 item.16
17 5 item.17
18 4 item.18
19 1 item.19
20 1 item.20

I'm trying to aggregate the strings into lists or vectors of strings (characters) with the 'c' or the 'list' function, but I'm getting weird results:

> aggregate(rnames ~ x, df, c)
  x             rnames
1 1      16, 6, 11, 13
2 2               4, 5
3 3      12, 15, 17, 7
4 4      18, 20, 8, 10
5 5 1, 14, 19, 2, 3, 9

When I use 'paste' instead of 'c', I can see that the aggregate is working correctly - but the result is not what I'm looking for.

> aggregate(rnames ~ x, df, paste)
  x                                            rnames
1 1                 item.5, item.14, item.19, item.20
2 2                                  item.12, item.13
3 3                   item.2, item.4, item.6, item.15
4 4                  item.7, item.9, item.16, item.18
5 5 item.1, item.3, item.8, item.10, item.11, item.17

What I'm looking for is for every aggregated group to be presented as a vector or a list (hence the use of c), as opposed to the single string I'm getting with 'paste'. Something along the lines of the following (which in reality doesn't work):

> aggregate(rnames ~ x, df, c)
  x                                            rnames
1 1                 item.5, item.14, item.19, item.20
2 2                                  item.12, item.13
3 3                   item.2, item.4, item.6, item.15
4 4                  item.7, item.9, item.16, item.18
5 5 item.1, item.3, item.8, item.10, item.11, item.17

Any help would be appreciated.

2 Answers

5

You fell into the usual data.frame trap: your character column is not a character column, it is a factor column! Hence the numbers instead of the strings in your result:

> rnames = sapply(1:20, FUN=function(x) paste("item", x, sep="."))
> x <- sample(c(1:5), 20, replace = TRUE)
> df <- data.frame(x, rnames)
> str(df)
'data.frame':   20 obs. of  2 variables:
 $ x     : int  2 5 5 5 5 4 3 3 2 4 ...
 $ rnames: Factor w/ 20 levels "item.1","item.10",..: 1 12 14 15 16 17 18 19 20 2 ...
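
To see where the numbers come from, here is a minimal sketch (the vector f is made up for illustration) showing that a factor stores integer codes that index into its sorted levels. Once the factor class is lost, only those codes remain. Note that since R 4.0.0, data.frame defaults to stringsAsFactors=FALSE, so this trap mostly affects older versions or code that sets the argument to TRUE.

```r
# A factor keeps its labels in levels() and stores integer codes internally.
f <- factor(c("item.2", "item.10", "item.1"))

levels(f)                  # the labels, sorted alphabetically
codes <- as.integer(f)     # the underlying integer codes
labels <- as.character(f)  # the original strings, recovered from the levels

# Each code is simply a position in levels(f)
levels(f)[codes]
```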

To prevent the conversion to factors, use the argument stringsAsFactors=FALSE in your call to data.frame:

> df <- data.frame(x, rnames,stringsAsFactors=FALSE)
> str(df)
'data.frame':   20 obs. of  2 variables:
 $ x     : int  5 5 3 5 5 3 2 5 1 5 ...
 $ rnames: chr  "item.1" "item.2" "item.3" "item.4" ...
> aggregate(rnames ~ x, df, c)
  x                                                                              rnames
1 1                                                            item.9, item.13, item.17
2 2                                                                              item.7
3 3                                                             item.3, item.6, item.19
4 4                                                           item.12, item.15, item.16
5 5 item.1, item.2, item.4, item.5, item.8, item.10, item.11, item.14, item.18, item.20
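
If you want each group stored as a genuine character vector in a list column regardless of group sizes, one option is to pass simplify = FALSE through to aggregate. This is a minimal sketch with small made-up data (the values of x are fixed here rather than sampled, so the result is reproducible):

```r
x <- c(1L, 2L, 1L, 3L, 2L, 1L)
rnames <- paste("item", seq_along(x), sep = ".")
df <- data.frame(x, rnames, stringsAsFactors = FALSE)

# simplify = FALSE keeps each group's result as its own list element,
# even when all groups happen to have the same length
res <- aggregate(rnames ~ x, data = df, FUN = c, simplify = FALSE)

res$rnames[[1]]      # character vector of the items with x == 1
lengths(res$rnames)  # number of items in each group
```

Without simplify = FALSE, groups of equal size would be combined into a matrix column instead of a list.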

Another way to avoid the conversion to factor is the function I():

> df <- data.frame(x, I(rnames))
> str(df)
'data.frame':   20 obs. of  2 variables:
 $ x     : int  3 5 4 5 4 5 3 3 1 1 ...
 $ rnames:Class 'AsIs'  chr [1:20] "item.1" "item.2" "item.3" "item.4" ...

Excerpt from ?I:

In function data.frame. Protecting an object by enclosing it in I() in a call to data.frame inhibits the conversion of character vectors to factors and the dropping of names, and ensures that matrices are inserted as single columns. I can also be used to protect objects which are to be added to a data frame, or converted to a data frame via as.data.frame.

It achieves this by prepending the class "AsIs" to the object's classes. Class "AsIs" has a few of its own methods, including for [, as.data.frame, print and format.

2

I'm not sure exactly what you are looking for, so perhaps some reference output would be good to give us an idea of what we are aiming at?

But, since your last bit of code seems to be close to what you are after, maybe a solution like the following would work:

> library(plyr)
> ddply(df, .(x), summarize, rnames = paste(rnames, collapse = "|"))
  x                                         rnames
1 1                         item.9|item.11|item.20
2 2                  item.1|item.2|item.15|item.16
3 3                                  item.7|item.8
4 4           item.4|item.5|item.6|item.12|item.13
5 5 item.3|item.10|item.14|item.17|item.18|item.19

You can vary how the individual elements are stuck together by changing the collapse argument to paste().
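
The same collapsed-string result can also be sketched in base R without plyr, since aggregate forwards extra arguments such as collapse to the function it applies. The data below is made up, with fixed values of x so that the output is reproducible:

```r
x <- c(1L, 2L, 1L, 3L, 2L, 1L)
rnames <- paste("item", seq_along(x), sep = ".")
df <- data.frame(x, rnames, stringsAsFactors = FALSE)

# collapse = "|" is passed on to paste(), giving one string per group
res <- aggregate(rnames ~ x, data = df, FUN = paste, collapse = "|")
res
```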

Alternatively, if you just want each of the groups as a vector, you could use this:

> df$rnames = as.character(df$rnames)
> L = dlply(df, .(x), function(df) {df$rnames})
> L
$`1`
[1] "item.9"  "item.11" "item.20"

$`2`
[1] "item.1"  "item.2"  "item.15" "item.16"

$`3`
[1] "item.7" "item.8"

$`4`
[1] "item.4"  "item.5"  "item.6"  "item.12" "item.13"

$`5`
[1] "item.3"  "item.10" "item.14" "item.17" "item.18" "item.19"

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  x
1 1
2 2
3 3
4 4
5 5

This gives you a list of vectors, which is what you were after. And each group can be indexed out of the resulting list:

> L[[1]]
[1] "item.9"  "item.11" "item.20"
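
If you would rather not depend on plyr, base R's split produces a similar named list of character vectors, without the extra split_type and split_labels attributes. This is a swapped-in technique rather than what the code above does; a minimal sketch with made-up, fixed data:

```r
x <- c(1L, 2L, 1L, 3L, 2L, 1L)
rnames <- paste("item", seq_along(x), sep = ".")
df <- data.frame(x, rnames, stringsAsFactors = FALSE)

# split() groups the strings by x and names each element after its group
L <- split(df$rnames, df$x)

L[["1"]]  # the items belonging to group 1
```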

1 Comment

I edited the question. What I'm trying to get is that each aggregated group would be returned as a vector / list, as opposed to a single string which is what I'm getting with 'paste'.
