I'm using purrr::map() to iterate over several columns and tidy the results. As a short example, consider the following code:

library(tidymodels)
library(broom)

> penguins %>% 
+   select(where(is.numeric)) %>% 
+   map(\(x) lm(x ~ penguins$species, .)) %>% 
+   map_df(broom::tidy, .id = "var")
# A tibble: 12 × 6
   var               term                       estimate std.error statistic   p.value
   <chr>             <chr>                         <dbl>     <dbl>     <dbl>     <dbl>
 1 bill_length_mm    (Intercept)                 38.8       0.241    161.    2.47e-322
 2 bill_length_mm    penguins$speciesChinstrap   10.0       0.432     23.2   4.23e- 72
 3 bill_length_mm    penguins$speciesGentoo       8.71      0.360     24.2   5.33e- 76
 4 bill_depth_mm     (Intercept)                 18.3       0.0912   201.    0        
 5 bill_depth_mm     penguins$speciesChinstrap    0.0742    0.164      0.453 6.50e-  1
 6 bill_depth_mm     penguins$speciesGentoo      -3.36      0.136    -24.7   7.93e- 78
 7 flipper_length_mm (Intercept)                190.        0.540    351.    0        
 8 flipper_length_mm penguins$speciesChinstrap    5.87      0.970      6.05  3.79e-  9
 9 flipper_length_mm penguins$speciesGentoo      27.2       0.807     33.8   1.84e-110
10 body_mass_g       (Intercept)               3701.       37.6       98.4   2.49e-251
11 body_mass_g       penguins$speciesChinstrap   32.4      67.5        0.480 6.31e-  1
12 body_mass_g       penguins$speciesGentoo    1375.       56.1       24.5   5.42e- 77

This works as expected.

However, when I map functions that take additional arguments, I usually use an anonymous function, as suggested in the docs. When I try that here, changing only the last line of the previous code, I still get the tidy table with all the regression results, but without the "var" column that tells me which variable each regression used.

> penguins %>% 
+   select(where(is.numeric)) %>% 
+   map(\(x) lm(x ~ penguins$species, .)) %>% 
+   map_df(\(x) broom::tidy(x, .id = "var"))
# A tibble: 12 × 5
   term                       estimate std.error statistic   p.value
   <chr>                         <dbl>     <dbl>     <dbl>     <dbl>
 1 (Intercept)                 38.8       0.241    161.    2.47e-322
 2 penguins$speciesChinstrap   10.0       0.432     23.2   4.23e- 72
 3 penguins$speciesGentoo       8.71      0.360     24.2   5.33e- 76
 4 (Intercept)                 18.3       0.0912   201.    0        
 5 penguins$speciesChinstrap    0.0742    0.164      0.453 6.50e-  1
 6 penguins$speciesGentoo      -3.36      0.136    -24.7   7.93e- 78
 7 (Intercept)                190.        0.540    351.    0        
 8 penguins$speciesChinstrap    5.87      0.970      6.05  3.79e-  9
 9 penguins$speciesGentoo      27.2       0.807     33.8   1.84e-110
10 (Intercept)               3701.       37.6       98.4   2.49e-251
11 penguins$speciesChinstrap   32.4      67.5        0.480 6.31e-  1
12 penguins$speciesGentoo    1375.       56.1       24.5   5.42e- 77

The formula shorthand gives the same result:

> penguins %>% 
+   select(where(is.numeric)) %>% 
+   map(\(x) lm(x ~ penguins$species, .)) %>% 
+   map_df(~ broom::tidy(.x, .id = "var"))
# A tibble: 12 × 5
   term                       estimate std.error statistic   p.value
   <chr>                         <dbl>     <dbl>     <dbl>     <dbl>
 1 (Intercept)                 38.8       0.241    161.    2.47e-322
 2 penguins$speciesChinstrap   10.0       0.432     23.2   4.23e- 72
 3 penguins$speciesGentoo       8.71      0.360     24.2   5.33e- 76
 4 (Intercept)                 18.3       0.0912   201.    0        
 5 penguins$speciesChinstrap    0.0742    0.164      0.453 6.50e-  1
 6 penguins$speciesGentoo      -3.36      0.136    -24.7   7.93e- 78
 7 (Intercept)                190.        0.540    351.    0        
 8 penguins$speciesChinstrap    5.87      0.970      6.05  3.79e-  9
 9 penguins$speciesGentoo      27.2       0.807     33.8   1.84e-110
10 (Intercept)               3701.       37.6       98.4   2.49e-251
11 penguins$speciesChinstrap   32.4      67.5        0.480 6.31e-  1
12 penguins$speciesGentoo    1375.       56.1       24.5   5.42e- 77

What is the reason for this behavior?

tidy doesn't have an .id argument; .id belongs to map_df itself. You can swap the last line for map_df(coef, .id = "var") and see that it still works. Commented Jul 14, 2024 at 20:47

1 Answer

The problem is that .id = "var" is not an argument of broom::tidy() but of purrr::map_df(). Under the hood, purrr::map_df() works like purrr::map() and builds a list, then calls dplyr::bind_rows() on that list to create a data frame, passing .id along. When bind_rows() receives .id, it turns the names of the list into a column whose name is the value of .id. In your second and third versions, .id sits inside the anonymous function, so it goes to broom::tidy() instead, which silently discards it unless the tidying method has such an argument. map_df() therefore never sees .id, and this is why your column is missing.
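
If you want to keep the anonymous function, one option is to move .id outside it so that map_df() receives it. Below is a minimal sketch, assuming penguins is the modeldata version (the one attached by tidymodels); the object names models, res_map_df and res_bind are only illustrative, and the unused data argument to lm() is dropped for brevity.

```r
library(dplyr)
library(purrr)
library(broom)

# Assumption: the question's penguins is modeldata::penguins (attached by tidymodels)
penguins <- modeldata::penguins

# One lm per numeric column; the list keeps the column names
models <- penguins %>%
  select(where(is.numeric)) %>%
  map(\(x) lm(x ~ penguins$species))

# .id sits outside the anonymous function, so map_df() receives it
res_map_df <- map_df(models, \(m) broom::tidy(m), .id = "var")

# Roughly what map_df() does internally: map(), then bind_rows() with .id
res_bind <- models %>%
  map(\(m) broom::tidy(m)) %>%
  bind_rows(.id = "var")
```

Both results should carry the list names in a leading "var" column, because in both cases .id ends up with bind_rows() rather than tidy().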
