R nested map through columns

Question

I have a function that was worked out in an earlier question (solved here).
It takes a column of annotations and a grouping column, and propagates each group's annotations to the rows of that group where the annotation is missing.

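library(dplyr)    # provides %>%
library(stringr)  # provides str_split()

f1 <- function(data, group_col, expand_col){
  data %>%
    dplyr::group_by({{group_col}}) %>%
    dplyr::mutate(
      {{expand_col}} := dplyr::case_when(
        # keep existing annotations
        !is.na({{expand_col}}) ~
          {{expand_col}},
        # fill missing rows with the group's unique annotation tokens
        any(!is.na({{expand_col}})) & is.na({{expand_col}}) ~
          paste(unique(unlist(str_split(na.omit({{expand_col}}), " "))),
                collapse = " "),
        # groups without any annotation stay NA
        TRUE ~
          NA_character_
      )) %>%
    dplyr::ungroup()
}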

Now I would like to apply it across several grouping columns (group_col) and annotation columns (expand_col).
So if I have this data frame:

t <- tibble(a = c("a", "b", "c", "d", "e", "f", "g", "h"), 
            b = c(  1,   1,   1,   1,   2,   2,   2,   2),
            c = c(  1,   1,   2,   2,   3,   3,   4,   4),
            d = c( NA,  NA,  NA, "D", "E",  NA,  NA,  NA),
            e = c("A",  NA, "C",  NA,  NA,  NA, "G", "H")
            )

I can apply it like this:

> t %>%
+   f1(c,e) %>%
+   f1(b,e) %>%
+   f1(c,d) %>%
+   f1(b,d)
# A tibble: 8 x 5
  a         b     c d     e    
  <chr> <dbl> <dbl> <chr> <chr>
1 a         1     1 D     A    
2 b         1     1 D     A    
3 c         1     2 D     C    
4 d         1     2 D     C    
5 e         2     3 E     G H  
6 f         2     3 E     G H  
7 g         2     4 E     G    
8 h         2     4 E     H

So I have three sets of columns: the ids (column 1), the grouping columns (2:3), and the annotation columns (4:5).
Since I call the function many times, I'd like to know how to use map to pass the column indices and apply the function as in the example above.

I tried something like this

3:2 %>% 
  map(
    function(x) 4:5 %>% 
      map(
        function(y) f1(
          t, 
          !!(colnames(t)[x]) , 
          !!(colnames(t)[y])
        ) 
      )
  )

But the result is a mess and not what I want.

Thanks in advance

Do you need to pass c, e as quoted strings? Also, is this a sequential application of the function?
– akrun, Commented Oct 18, 2019 at 18:04
Yes, this is a sequential application of the function. I think it should be quoted because t %>% f1(3, 5) doesn't work. I'd like to pass the column index, like columns 3 and 5, but I think dplyr does not accept it.
– Aureliano Guedes, Commented Oct 18, 2019 at 18:33
Actually, I figured out that this works: t %>% f1( !!(colnames(t)[3]) , !!(colnames(t)[5]) ). So I just need to understand how to do it sequentially with map (or maybe apply).
– Aureliano Guedes, Commented Oct 18, 2019 at 18:44
The issue with map is that it is not sequential; you may have to do a coalesce afterwards. I would check ?compose
– akrun, Commented Oct 18, 2019 at 19:17
I think a for loop makes it easier. i1 <- 3:2; i2 <- 4:5; for(i in seq_along(i1)) t <- f1(!! rlang::sym(names(t)[.x]), !! rlang::sym(names(t)[.y])
– akrun, Commented Oct 18, 2019 at 19:41
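
The remark that map is not sequential can be illustrated with a small sketch. The two toy functions in steps are hypothetical and stand in for successive f1() calls; nothing here comes from the original post. map() applies every step to the same starting value, whereas reduce() feeds each result into the next step.

```r
library(purrr)

# Hypothetical toy steps standing in for successive f1() calls
steps <- list(
  function(x) x + 1,
  function(x) x * 10
)

# map(): every step starts from the original input, results are not chained
mapped <- map(steps, function(f) f(1))

# reduce(): the output of one step becomes the input of the next
chained <- reduce(steps, function(acc, f) f(acc), .init = 1)

print(mapped)   # a list holding 2 and 10
print(chained)  # 20, i.e. (1 + 1) * 10
```

This is why the nested map attempt returns a nested list of independently modified tibbles instead of one tibble with all fills applied.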

Artem Sokolov · Accepted Answer · 2019-10-18 20:07:33Z

1

Since f1 accepts column names, you need to first convert your indices to symbols:

v1 <- rlang::syms( colnames(t)[3:2] )
v2 <- rlang::syms( colnames(t)[4:5] )

Now, you can use tidyr::crossing() to get all possible pairs of your symbols, and purrr::reduce2() to sequentially apply f1() with those symbols:

V <- tidyr::crossing( v1, v2 )
Res <- purrr::reduce2( V$v1, V$v2, f1, .init=t )

# Validation
Res2 <- t %>% f1(c,e) %>% f1(b,e) %>% f1(c,d) %>% f1(b,d)
identical(Res, Res2)   # TRUE
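
To make the order of the sequential calls visible, here is a minimal sketch that swaps f1() for a hypothetical stand-in, record_call(), which only logs the pair it receives. It also builds the pairs with base R's expand.grid() rather than tidyr::crossing(), and uses column names as plain strings; these are assumptions made for illustration, not part of the answer above.

```r
library(purrr)

# Hypothetical stand-in for f1(): instead of transforming a tibble, it
# appends a description of the call to the accumulator.
record_call <- function(acc, group_col, expand_col) {
  c(acc, paste0("f1(", group_col, ",", expand_col, ")"))
}

groups <- c("c", "b")  # grouping columns, in the order they should be applied
annots <- c("e", "d")  # annotation columns

# expand.grid() varies its first argument fastest
pairs <- expand.grid(group = groups, annot = annots, stringsAsFactors = FALSE)

# reduce2() calls record_call(acc, group, annot) once per row of pairs,
# passing the previous result along as acc
calls <- reduce2(pairs$group, pairs$annot, record_call, .init = character())
print(calls)
# "f1(c,e)" "f1(b,e)" "f1(c,d)" "f1(b,d)"
```

With f1 in place of record_call and .init = t, the same threading yields the chained result, since the accumulator is then the tibble returned by the previous call.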

edited Oct 18, 2019 at 20:07

answered Oct 18, 2019 at 19:59

Artem Sokolov


akrun · Accepted Answer · 2019-10-18 20:17:24Z

1

This can be done easily in a for loop

i1 <- rep(names(t)[3:2], 2)
i2 <- rep(names(t)[4:5], each = 2)
for(i in seq_along(i1))
  t <- f1(t, !! rlang::sym(i1[i]), !! rlang::sym(i2[i]))
t
# A tibble: 8 x 5
#  a         b     c d     e    
#  <chr> <dbl> <dbl> <chr> <chr>
#1 a         1     1 D     A    
#2 b         1     1 D     A    
#3 c         1     2 D     C    
#4 d         1     2 D     C    
#5 e         2     3 E     G H  
#6 f         2     3 E     G H  
#7 g         2     4 E     G    
#8 h         2     4 E     H
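
If overwriting t at the top level is a concern, the loop can be wrapped in a function so that the reassignment only touches a local copy. The sketch below is one possible way to do this; apply_sequentially() and fill_first() are hypothetical names, and fill_first() is a simplified stand-in for f1() (it fills a missing value with the group's first non-missing value) so that the example runs on its own.

```r
library(dplyr)

# Hypothetical, simplified stand-in for f1(): within each group, replace NA
# in expand_col with the group's first non-missing value (NA if none exists).
fill_first <- function(data, group_col, expand_col) {
  data %>%
    group_by({{ group_col }}) %>%
    mutate({{ expand_col }} := coalesce(
      {{ expand_col }},
      {{ expand_col }}[!is.na({{ expand_col }})][1]
    )) %>%
    ungroup()
}

# Hypothetical wrapper: data is reassigned only inside the function,
# so the caller's object is left unchanged.
apply_sequentially <- function(data, group_cols, expand_cols, f) {
  for (e in expand_cols) {
    for (g in group_cols) {
      data <- f(data, !!rlang::sym(g), !!rlang::sym(e))
    }
  }
  data
}

df <- tibble(
  b = c(1, 1, 2, 2),
  c = c(1, 2, 3, 3),
  d = c(NA, "D", "E", NA)
)

out <- apply_sequentially(df, group_cols = c("c", "b"), expand_cols = "d",
                          f = fill_first)
out$d  # "D" "D" "E" "E"
df$d   # still contains the original NA values
```

Passing f1 as f, with names(t)[3:2] and names(t)[4:5] as the column vectors, would follow the same pattern.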

edited Oct 18, 2019 at 20:17

answered Oct 18, 2019 at 19:43

akrun


6 Comments

Aureliano Guedes Over a year ago

I don't know why, but it did not work. Also, I'd like to avoid the side effect. Anyway, I tried it this way: for(i in 4:5){ for(j in 3:2){ t <- f1(t, !!(colnames(t)[j]), !!(colnames(t)[i])) } }

akrun Over a year ago

@AurelianoGuedes. I forgot to add a closing bracket

Aureliano Guedes Over a year ago

Error in !rlang::sym(names(t)[i]) : invalid argument type  4. eval(lhs, parent, parent)  3. eval(lhs, parent, parent)  2. data %>% dplyr::group_by({     {         group_col     } ...  1. f1(!!rlang::sym(names(t)[i]), !!rlang::sym(names(t)[i]))

Aureliano Guedes Over a year ago

It is updated > packageVersion('rlang'); packageVersion('dplyr'); [1] ‘0.4.0’ [1] ‘0.8.3’

akrun Over a year ago

@AurelianoGuedes. Sorry, forgot about the 'i1' and 'i2'
