2

I have a data table test:

id key
1 2365
1 2365
1 3709
2 6734
2 1908
2 4523

I want to aggregate unique key values by id into vector using data.table package.

Expected output:

id key_array
1 "2365", "3709"
2 "6734", "1908", "4523"

So, this should work like array_agg sql function.

I tried:
res <- test[, list(key_array = paste(unique(key), collapse = ", ")), by = "id"], but I get just a string. But I need to have opportunity to find the length of each vector and operate with its certain elements (find the intersection of two vectors for example).

1 Answer 1

3

1. Base R

This an aggregate one-liner.

x <- 'id    key
1   2365
1   2365
1   3709
2   6734
2   1908
2   4523'
test <- read.table(textConnection(x), header = TRUE)

aggregate(key ~ id, test, \(x) c(unique(x)))
#>   id              key
#> 1  1       2365, 3709
#> 2  2 6734, 1908, 4523

Created on 2022-06-14 by the reprex package (v2.0.1)

But if user @Chris's comment is right then the right solution as follows.

aggregate(key ~ id, test, \(x) paste(unique(x), collapse = ", "))

Note that both c(unique(x)) and as.character(c(unique(x))) will output a list column, so the latter solution is right anyway.


2. Package data.table

Once again a one-liner.

The output is a list column, with each list member an integer vector. To keep as integers use

list(unique(key))

instead.

suppressPackageStartupMessages(library(data.table))

res <- setDT(test)[, .(key_array = list(as.character(unique(key)))), by = id]
res
#>    id      key_array
#> 1:  1      2365,3709
#> 2:  2 6734,1908,4523

str(res)
#> Classes 'data.table' and 'data.frame':   2 obs. of  2 variables:
#>  $ id       : int  1 2
#>  $ key_array:List of 2
#>   ..$ : chr  "2365" "3709"
#>   ..$ : chr  "6734" "1908" "4523"
#>  - attr(*, ".internal.selfref")=<externalptr>

Created on 2022-06-14 by the reprex package (v2.0.1)

Then, in order to access the vectors use two extractors, one to extract the column and the other one to extract the vectors.

res$key_array[[1]]
#> [1] "2365" "3709"
res$key_array[[2]]
#> [1] "6734" "1908" "4523"

Created on 2022-06-14 by the reprex package (v2.0.1)


3. dplyr solution

Group by id and collapse the unique strings into one only.

suppressPackageStartupMessages(library(dplyr))

test %>%
  group_by(id) %>%
  summarise(key_array = paste(unique(key), collapse = ", "))
#> # A tibble: 2 × 2
#>      id key_array  
#>   <int> <chr>           
#> 1     1 2365, 3709      
#> 2     2 6734, 1908, 4523

Created on 2022-06-14 by the reprex package (v2.0.1)

Sign up to request clarification or add additional context in comments.

5 Comments

For whatever reason, it appears as.character(c(unique(x)))) is desired output. Just speaking to base::aggregate here.
Thanks a lot, Rui! I'm interested in a data.table approach and your solution setDT(test)[, .(key_array = paste(unique(key), collapse = ", ")), by = id] is the same as mine, but unfortunately it gives a string instead of a vector
@Hilary Changed, see if this is what you want. Also, should I remove the dplyr solution, since it is not asked for?
It is not necessary to remove the dplyr solution, maybe it will be useful to someone else
@Hilary Got it, will keep all solutions.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.