
My aim is to join a dataframe to the dataframes held within a nested list-column, e.g.:

data(mtcars)
library(tidyr)
library(purrr)

mtcars_nest <- mtcars %>% rownames_to_column() %>% rename(rowname_1 = rowname) %>% select(-mpg) %>% group_by(cyl) %>% nest()
mtcars_mpg <- mtcars %>% rownames_to_column() %>% rename(rowname_2 = rowname) %>% select(rowname_2, mpg)

join_df <- function(df_nest, df_other) {
  df_all <- df_nest %>% inner_join(df_other, by = c("rowname_1" = "rowname_2"))
}

join_df <- mtcars_nest %>%
  mutate(new_mpg = map_df(data, join_df(., mtcars_mpg)))

This returns the following error:

# Error in mutate_impl(.data, dots) : Evaluation error: `by` can't contain join column `rowname_1` which is missing from LHS.

So the dataframe that map_* receives from the nested input doesn't seem to offer the column name (i.e. rowname_1) needed for the join, and I can't work out why. I'm passing the data column, which contains the dataframes from the nested dataframe. I want a dataframe output that can be added as a new column to the input nested dataframe, e.g.:

| rowname_1 | cyl | disp |...|mpg|
|:----------|:----|:-----|:--|:--|
Missing library(dplyr); library(tibble). Commented May 1, 2018 at 23:11

1 Answer


A couple things:

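  • you should use the tilde (~) to turn the function argument to map* into a purrr lambda; without it, join_df(., mtcars_mpg) is evaluated immediately, with . bound to the whole of mtcars_nest (the left-hand side of the pipe), which has no rowname_1 column, hence the error; and
  • I think you should be using map instead of map_df: map_df row-binds the per-group results into a single data frame, whereas a list-column needs one data frame per row, which is what map returns.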
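To make the map versus map_df difference concrete, here is a minimal sketch. It rebuilds the question's mtcars_nest and mtcars_mpg so it can run on its own; the names as_list and as_one are only illustrative, and .x is used in the lambda, which purrr treats the same as . here.

```r
library(dplyr)
library(tibble)
library(tidyr)
library(purrr)

# Rebuild the question's inputs so the sketch is self-contained.
mtcars_nest <- mtcars %>%
  rownames_to_column("rowname_1") %>%
  select(-mpg) %>%
  group_by(cyl) %>%
  nest()
mtcars_mpg <- mtcars %>%
  rownames_to_column("rowname_2") %>%
  select(rowname_2, mpg)

join_df <- function(df_nest, df_other) {
  inner_join(df_nest, df_other, by = c("rowname_1" = "rowname_2"))
}

# map(): one joined data frame per group, returned as a list,
# which is the shape a list-column needs.
as_list <- map(mtcars_nest$data, ~ join_df(.x, mtcars_mpg))
length(as_list)  # 3, one element per cyl group

# map_df(): the same per-group results, row-bound into ONE data frame,
# so it no longer lines up with the three rows of mtcars_nest.
as_one <- map_df(mtcars_nest$data, ~ join_df(.x, mtcars_mpg))
nrow(as_one)     # 32, every car in a single frame
```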

Minor point:

  • you assign to df_all within join_df(), and it only works because an assignment as the last expression invisibly returns the assigned value; I suggest being explicit: either follow up with return(df_all) or skip the assignment and end with inner_join(...).
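The invisible-return behaviour can be seen in base R alone. This is a small sketch with two made-up functions, f_assign and f_value, which are not part of the question:

```r
# Last expression is an assignment: the value is returned, but invisibly.
f_assign <- function(x) {
  y <- x + 1
}

# Last expression is the value itself: returned visibly.
f_value <- function(x) {
  x + 1
}

f_assign(1)       # prints nothing at the console
f_value(1)        # prints [1] 2
z <- f_assign(1)  # the value is still there when captured

# withVisible() reports whether a result would auto-print.
withVisible(f_assign(1))$visible  # FALSE
withVisible(f_value(1))$visible   # TRUE
```

Both forms hand back the same value, so the join works either way; the explicit form just avoids surprising anyone who calls the function at the console and sees no output.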

Try this:

library(tibble) # rownames_to_column
library(dplyr)
library(tidyr)  # nest
library(purrr)

join_df <- function(df_nest, df_other) {
  df_all <- inner_join(df_nest, df_other, by = c("rowname_1" = "rowname_2"))
  return(df_all)
}

mtcars_nest %>%
  mutate(new_mpg = map(data, ~ join_df(., mtcars_mpg)))
# # A tibble: 3 x 3
#     cyl data               new_mpg           
#   <dbl> <list>             <list>            
# 1    6. <tibble [7 x 10]>  <tibble [7 x 11]> 
# 2    4. <tibble [11 x 10]> <tibble [11 x 11]>
# 3    8. <tibble [14 x 10]> <tibble [14 x 11]>

new_mpg is effectively the data column with one additional column. Since the two are fully redundant, you can overwrite (or remove) data instead:

mtcars_nest %>%
  mutate(data = map(data, ~ join_df(., mtcars_mpg)))
# # A tibble: 3 x 2
#     cyl data              
#   <dbl> <list>            
# 1    6. <tibble [7 x 11]> 
# 2    4. <tibble [11 x 11]>
# 3    8. <tibble [14 x 11]>

and get your nested and now augmented frames.
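If you want to check the result against the flat layout shown in the question, one option is to unnest the augmented column. This is a sketch that assumes tidyr 1.0 or later (for the cols argument of unnest); the names augmented and flat are illustrative only.

```r
library(dplyr)
library(tibble)
library(tidyr)
library(purrr)

# Rebuild the question's inputs so the sketch is self-contained.
mtcars_nest <- mtcars %>%
  rownames_to_column("rowname_1") %>%
  select(-mpg) %>%
  group_by(cyl) %>%
  nest()
mtcars_mpg <- mtcars %>%
  rownames_to_column("rowname_2") %>%
  select(rowname_2, mpg)

# Join mpg back into each nested frame, overwriting data.
augmented <- mtcars_nest %>%
  mutate(data = map(data, ~ inner_join(.x, mtcars_mpg,
                                       by = c("rowname_1" = "rowname_2"))))

# Flatten again: one row per car, with cyl, the nested columns, and mpg.
flat <- augmented %>%
  unnest(cols = data) %>%
  ungroup()

# Sanity check: the joined mpg values match the original mtcars.
all.equal(flat$mpg[match(rownames(mtcars), flat$rowname_1)], mtcars$mpg)
```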


1 Comment

Thanks @r2evans. Good tip about ~. Hadn't realised the effect of ~ on evaluation. For others, see this post stackoverflow.com/a/44834671/2802810
