2

I am struggling with using map from the library purrr correctly. I would like to calculate the weighted mean of my sample by nesting a common observations in a list and then using map(). (I know this would also work with group_by)

MWE: Suppose I have observed 3 different subjects (indicated by 'id'), I have their sample weights ('weights') and corresponding observations ('obs').

df <- tibble(id = c(1, 1, 2, 2, 3,3), weights = c(0.3,0.7,0.25,0.75,0.14,0.86), obs = 6:1)
df
# A tibble: 6 x 3
     id weights   obs
  <dbl>   <dbl> <int>
1     1    0.3      6
2     1    0.7      5
3     2    0.25     4
4     2    0.75     3
5     3    0.14     2
6     3    0.86     1

I would like to calculate the weighted average in each subject.Therefore, I nest the weights and observations.

df %>% nest(data = c(weights, obs))
# A tibble: 3 x 2
     id data            
  <dbl> <list>          
1     1 <tibble [2 x 2]>
2     2 <tibble [2 x 2]>
3     3 <tibble [2 x 2]>

Now I would like to use map to apply a function to each element of data. More precisely, I try to solve it as following

df %>% nest(data = c(weights, obs)) %>% map(data, ~ (.x$weights*.x$obs)/sum(.x$weights))

Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘~(.x$weights * .x$obs)/sum(.x$weights)’ not found
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘~(.x$weights * .x$obs)/sum(.x$weights)’ not found

As you can see this results in a lot of error messages. In order to better understand map I tried to multiply the weights vector of each ID with 2.

df %>% nest(data = c(weights, obs)) %>% map(data, ~ .x$weights*2)
$id
[1] ".x[[i]]"         "~.x$weights * 2"

$data
[1] ".x[[i]]"         "~.x$weights * 2"

Warning messages:
1: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
2: In .f(.x[[i]], ...) : data set ‘~.x$weights * 2’ not found
3: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
4: In .f(.x[[i]], ...) : data set ‘~.x$weights * 2’ not found

and

df %>% nest(data = c(weights, obs)) %>% map(data, function(x) x$weights*2)
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘function(x) x$weights * 2’ not found
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘function(x) x$weights * 2’ not found
$id
[1] ".x[[i]]"                   "function(x) x$weights * 2"

$data
[1] ".x[[i]]"                   "function(x) x$weights * 2"

So I also get error messages here. I am quite lost even after reading the documentation of map. I do not see my error. I am happy about any insights!

Thanks a lot!

2 Answers 2

3

We may pass the map within mutate because the data column is not accessible outside the data, unless we use .$data

library(dplyr)
library(purrr)
df %>%
   nest(data = c(weights, obs)) %>%
    mutate(wtd_mean = map_dbl(data, ~ sum(.x$weights*.x$obs)/sum(.x$weights)))

-output

# A tibble: 3 × 3
     id data             wtd_mean
  <dbl> <list>              <dbl>
1     1 <tibble [2 × 2]>     5.3 
2     2 <tibble [2 × 2]>     3.25
3     3 <tibble [2 × 2]>     1.14

There is also weighted.mean function from stats (base R)

df %>% 
   nest(data = c(weights, obs)) %>% 
   mutate(wtd_mean = map_dbl(data, ~ weighted.mean(.x$obs, .x$weights)))
# A tibble: 3 × 3
     id data             wtd_mean
  <dbl> <list>              <dbl>
1     1 <tibble [2 × 2]>     5.3 
2     2 <tibble [2 × 2]>     3.25
3     3 <tibble [2 × 2]>     1.14
Sign up to request clarification or add additional context in comments.

Comments

2

Could do split-map_dbl for a named vector as return or map_df if a tibble is desired:

tibble(id = c(1, 1, 2, 2, 3,3), weights = c(0.3,0.7,0.25,0.75,0.14,0.86), obs = 6:1) %>% 
    split.data.frame(.$id) %>% 
    map_dbl(
        ~sum(.x$weights * .x$obs)/sum(.x$weights)
    )
   1    2    3 
5.30 3.25 1.14 
tibble(id = c(1, 1, 2, 2, 3,3), weights = c(0.3,0.7,0.25,0.75,0.14,0.86), obs = 6:1) %>% 
    split.data.frame(.$id) %>% 
    map_df(
        ~sum(.x$weights * .x$obs)/sum(.x$weights)
    )
    # A tibble: 1 x 3
    `1`   `2`   `3`
  <dbl> <dbl> <dbl>
1   5.3  3.25  1.14

2 Comments

Thanks! Is there an advantage (e.g. in terms of speed) to use split.data.frame versus nest or group_by? I could also use ``` df %>% group_by(id) %>% summarize(weighted_avg = sum(obs*weight)/sum(weight))```. As far as I see now it comes down to personal preference or how the output should look like.
Yeah, split-map can be faster for larger datasets. At least that was the case a few R-versions ago.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.