Use map with nested lists

Question

I am struggling with using map from the library purrr correctly. I would like to calculate the weighted mean of my sample by nesting a common observations in a list and then using map(). (I know this would also work with group_by)

MWE: Suppose I have observed 3 different subjects (indicated by 'id'), I have their sample weights ('weights') and corresponding observations ('obs').

df <- tibble(id = c(1, 1, 2, 2, 3,3), weights = c(0.3,0.7,0.25,0.75,0.14,0.86), obs = 6:1)
df
# A tibble: 6 x 3
     id weights   obs
  <dbl>   <dbl> <int>
1     1    0.3      6
2     1    0.7      5
3     2    0.25     4
4     2    0.75     3
5     3    0.14     2
6     3    0.86     1

I would like to calculate the weighted average in each subject.Therefore, I nest the weights and observations.

df %>% nest(data = c(weights, obs))
# A tibble: 3 x 2
     id data            
  <dbl> <list>          
1     1 <tibble [2 x 2]>
2     2 <tibble [2 x 2]>
3     3 <tibble [2 x 2]>

Now I would like to use map to apply a function to each element of data. More precisely, I try to solve it as following

df %>% nest(data = c(weights, obs)) %>% map(data, ~ (.x$weights*.x$obs)/sum(.x$weights))

Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘~(.x$weights * .x$obs)/sum(.x$weights)’ not found
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘~(.x$weights * .x$obs)/sum(.x$weights)’ not found

As you can see this results in a lot of error messages. In order to better understand map I tried to multiply the weights vector of each ID with 2.

df %>% nest(data = c(weights, obs)) %>% map(data, ~ .x$weights*2)
$id
[1] ".x[[i]]"         "~.x$weights * 2"

$data
[1] ".x[[i]]"         "~.x$weights * 2"

Warning messages:
1: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
2: In .f(.x[[i]], ...) : data set ‘~.x$weights * 2’ not found
3: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
4: In .f(.x[[i]], ...) : data set ‘~.x$weights * 2’ not found

and

df %>% nest(data = c(weights, obs)) %>% map(data, function(x) x$weights*2)
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘function(x) x$weights * 2’ not found
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘function(x) x$weights * 2’ not found
$id
[1] ".x[[i]]"                   "function(x) x$weights * 2"

$data
[1] ".x[[i]]"                   "function(x) x$weights * 2"

So I also get error messages here. I am quite lost even after reading the documentation of map. I do not see my error. I am happy about any insights!

Thanks a lot!

akrun · Accepted Answer · 2022-02-08 16:15:35Z

3

We may pass the map within mutate because the data column is not accessible outside the data, unless we use .$data

library(dplyr)
library(purrr)
df %>%
   nest(data = c(weights, obs)) %>%
    mutate(wtd_mean = map_dbl(data, ~ sum(.x$weights*.x$obs)/sum(.x$weights)))

-output

# A tibble: 3 × 3
     id data             wtd_mean
  <dbl> <list>              <dbl>
1     1 <tibble [2 × 2]>     5.3 
2     2 <tibble [2 × 2]>     3.25
3     3 <tibble [2 × 2]>     1.14

There is also weighted.mean function from stats (base R)

df %>% 
   nest(data = c(weights, obs)) %>% 
   mutate(wtd_mean = map_dbl(data, ~ weighted.mean(.x$obs, .x$weights)))
# A tibble: 3 × 3
     id data             wtd_mean
  <dbl> <list>              <dbl>
1     1 <tibble [2 × 2]>     5.3 
2     2 <tibble [2 × 2]>     3.25
3     3 <tibble [2 × 2]>     1.14

edited Feb 8, 2022 at 16:15

answered Feb 8, 2022 at 16:05

akrun

891k38 gold badges590 silver badges700 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Baraliuh · Accepted Answer · 2022-02-08 16:14:44Z

2

Could do split-map_dbl for a named vector as return or map_df if a tibble is desired:

tibble(id = c(1, 1, 2, 2, 3,3), weights = c(0.3,0.7,0.25,0.75,0.14,0.86), obs = 6:1) %>% 
    split.data.frame(.$id) %>% 
    map_dbl(
        ~sum(.x$weights * .x$obs)/sum(.x$weights)
    )
   1    2    3 
5.30 3.25 1.14 
tibble(id = c(1, 1, 2, 2, 3,3), weights = c(0.3,0.7,0.25,0.75,0.14,0.86), obs = 6:1) %>% 
    split.data.frame(.$id) %>% 
    map_df(
        ~sum(.x$weights * .x$obs)/sum(.x$weights)
    )
    # A tibble: 1 x 3
    `1`   `2`   `3`
  <dbl> <dbl> <dbl>
1   5.3  3.25  1.14

answered Feb 8, 2022 at 16:14

Baraliuh

2,1617 silver badges14 bronze badges

2 Comments

Methamortix Over a year ago

Thanks! Is there an advantage (e.g. in terms of speed) to use split.data.frame versus nest or group_by? I could also use ``` df %>% group_by(id) %>% summarize(weighted_avg = sum(obs*weight)/sum(weight))```. As far as I see now it comes down to personal preference or how the output should look like.

Baraliuh Over a year ago

Yeah, split-map can be faster for larger datasets. At least that was the case a few R-versions ago.

Collectives™ on Stack Overflow

Use map with nested lists

2 Answers 2

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related