I have a function whose implementation was worked out here.
It takes a column of annotations and a grouping column, and propagates each group's annotations to the rows in that group where the annotation is missing.

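# Assumes dplyr is attached (for %>%) and stringr is installed.
f1 <- function(data, group_col, expand_col){
  data %>%
    dplyr::group_by({{group_col}}) %>%
    dplyr::mutate(
      {{expand_col}} := dplyr::case_when(
        # keep annotations that are already present
        !is.na({{expand_col}}) ~
          {{expand_col}},
        # fill missing values with the unique annotations found in the group
        any(!is.na({{expand_col}})) & is.na({{expand_col}}) ~
          paste(unique(unlist(stringr::str_split(na.omit({{expand_col}}), " "))),
                collapse = " "),
        # no annotation anywhere in the group: leave as NA
        TRUE ~
          NA_character_
      )) %>%
    dplyr::ungroup()
}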

Now I would like to apply it across several grouping columns (group_col) and several annotation columns (expand_col).
So if I have this df:

t <- tibble(a = c("a", "b", "c", "d", "e", "f", "g", "h"), 
            b = c(  1,   1,   1,   1,   2,   2,   2,   2),
            c = c(  1,   1,   2,   2,   3,   3,   4,   4),
            d = c( NA,  NA,  NA, "D", "E",  NA,  NA,  NA),
            e = c("A",  NA, "C",  NA,  NA,  NA, "G", "H")
            )

I can apply it like this:

> t %>%
+   f1(c,e) %>%
+   f1(b,e) %>%
+   f1(c,d) %>%
+   f1(b,d)
# A tibble: 8 x 5
  a         b     c d     e    
  <chr> <dbl> <dbl> <chr> <chr>
1 a         1     1 D     A    
2 b         1     1 D     A    
3 c         1     2 D     C    
4 d         1     2 D     C    
5 e         2     3 E     G H  
6 f         2     3 E     G H  
7 g         2     4 E     G    
8 h         2     4 E     H    

So, I have 3 groups of columns, the ids, the grouping columns (2:3), and annotation columns (4:5).
Since I call the function many times, I'd like to know how to use the map function to pass the column indexes to apply the function like in the example above.

I tried something like this

3:2 %>% 
  map(
    function(x) 4:5 %>% 
      map(
        function(y) f1(
          t, 
          !!(colnames(t)[x]) , 
          !!(colnames(t)[y])
        ) 
      )
  )

But the result is a mess: a nested list rather than a single tibble.

Thanks in advance

  • Do you need to pass c, e as quoted strings? Also, is this a sequential application of the function? Commented Oct 18, 2019 at 18:04
  • Yes, this is a sequential application of the function. I think it should be quoted because t %>% f1(3, 5) doesn't work. I'd like to pass the column index, like columns 3 and 5, but I think dplyr does not accept it. Commented Oct 18, 2019 at 18:33
  • Actually, I figured out that this works: t %>% f1( !!(colnames(t)[3]) , !!(colnames(t)[5]) ). So I just need to understand how to do it sequentially with map (or maybe apply). Commented Oct 18, 2019 at 18:44
  • The issue with map is that it is not sequential; you may have to do a coalesce afterwards. I would check ?compose. Commented Oct 18, 2019 at 19:17
  • I think a for loop makes it easier. i1 <- 3:2; i2 <- 4:5; for(i in seq_along(i1)) t <- f1(!! rlang::sym(names(t)[.x]), !! rlang::sym(names(t)[.y]) Commented Oct 18, 2019 at 19:41

2 Answers

Since f1 accepts column names, you need to first convert your indices to symbols:

v1 <- rlang::syms( colnames(t)[3:2] )
v2 <- rlang::syms( colnames(t)[4:5] )

Now, you can use tidyr::crossing() to get all possible pairs of your symbols, and purrr::reduce2() to sequentially apply f1() with those symbols:

V <- tidyr::crossing( v1, v2 )
Res <- purrr::reduce2( V$v1, V$v2, f1, .init=t )

# Validation
Res2 <- t %>% f1(c,e) %>% f1(b,e) %>% f1(c,d) %>% f1(b,d)
identical(Res, Res2)   # TRUE
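
If the pairs need to be applied in an explicitly controlled order, one option is to build them with base R's expand.grid(), which varies its first argument fastest and does not reorder its inputs. The following is only a sketch: the helper name apply_pairs and its arguments (group_idx, expand_idx, .f) are made up for illustration and are not part of any package. It assumes .f has the same interface as f1 (data first, then two unquoted column arguments).

```r
# Hypothetical helper (not from the original answer): applies .f sequentially
# over every (grouping column, annotation column) pair, given column indices.
# .f is assumed to behave like f1: .f(data, group_col, expand_col).
apply_pairs <- function(data, group_idx, expand_idx, .f) {
  # All combinations; the grouping column varies fastest, so each annotation
  # column is filled through every grouping column before moving to the next.
  pairs <- expand.grid(
    group  = names(data)[group_idx],
    expand = names(data)[expand_idx],
    stringsAsFactors = FALSE
  )

  # Thread the accumulated data frame through each call, unquoting the
  # symbols with !! so that .f sees bare column names.
  purrr::reduce2(
    rlang::syms(pairs$group),
    rlang::syms(pairs$expand),
    function(acc, g, e) .f(acc, !!g, !!e),
    .init = data
  )
}
```

With f1 and t as defined in the question, apply_pairs(t, 3:2, 4:5, f1) should correspond to t %>% f1(c,d) %>% f1(b,d) %>% f1(c,e) %>% f1(b,e). Because each call only modifies its own annotation column, the order of the annotation columns should not affect the result; only the order of the grouping columns within one annotation column matters.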

This can be done easily in a for loop

i1 <- rep(names(t)[3:2], 2)
i2 <- rep(names(t)[4:5], each = 2)
for(i in seq_along(i1))
  t <- f1(t, !! rlang::sym(i1[i]), !! rlang::sym(i2[i]))
t
# A tibble: 8 x 5
#  a         b     c d     e    
#  <chr> <dbl> <dbl> <chr> <chr>
#1 a         1     1 D     A    
#2 b         1     1 D     A    
#3 c         1     2 D     C    
#4 d         1     2 D     C    
#5 e         2     3 E     G H  
#6 f         2     3 E     G H  
#7 g         2     4 E     G    
#8 h         2     4 E     H    
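
To keep the loop from overwriting t in the calling environment, the same idea can be wrapped in a function so that only a local copy is reassigned. This is a minimal sketch: fill_by_loop and its arguments (group_cols, expand_cols, fill_fun) are hypothetical names, and fill_fun is assumed to have the same interface as f1.

```r
# Hypothetical wrapper (not from the original answer): runs the nested loop on
# a local copy, so the caller's data frame is left untouched.
# fill_fun is assumed to behave like f1: fill_fun(data, group_col, expand_col).
fill_by_loop <- function(data, group_cols, expand_cols, fill_fun) {
  for (e in expand_cols) {      # each annotation column in turn
    for (g in group_cols) {     # filled through every grouping column, in order
      # Convert the column names to symbols and unquote them with !!
      data <- fill_fun(data, !!rlang::sym(g), !!rlang::sym(e))
    }
  }
  data
}
```

Usage, assuming f1 and t from the question: filled <- fill_by_loop(t, names(t)[3:2], names(t)[4:5], f1). The original t keeps its missing values, and filled holds the propagated annotations.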

6 Comments

I don't know why, but it did not work. Also, I'd like to avoid the side effect. Anyway, I tried it this way: for(i in 4:5){ for(j in 3:2){ t <- f1(t, !!(colnames(t)[j]), !!(colnames(t)[i])) } }
@AurelianoGuedes. I forgot to add a closing bracket.
Error in !rlang::sym(names(t)[i]) : invalid argument type 4. eval(lhs, parent, parent) 3. eval(lhs, parent, parent) 2. data %>% dplyr::group_by({ { group_col } ... 1. f1(!!rlang::sym(names(t)[i]), !!rlang::sym(names(t)[i]))
My packages are up to date: > packageVersion('rlang'); packageVersion('dplyr'); [1] ‘0.4.0’ [1] ‘0.8.3’
@AurelianoGuedes. Sorry, I forgot about the 'i1' and 'i2'.
