Row-wise operations on column subsets in dplyr

Question

I have a dataset with nine cities. I trained and tested four different machine learning models for each city. The results are in the tibble below:

set.seed(1)

result <- 
  tibble::tibble(city = letters[1:9],
                 m1_train = runif(9),
                 m1_test = runif(9),
                 m2_train = runif(9),
                 m2_test = runif(9),
                 m3_train = runif(9),
                 m3_test = runif(9),
                 m4_train = runif(9),
                 m4_test = runif(9))

result
#> # A tibble: 9 × 9
#>   city  m1_train m1_test m2_train m2_test m3_train m3_test m4_train m4_test
#>   <chr>    <dbl>   <dbl>    <dbl>   <dbl>    <dbl>   <dbl>    <dbl>   <dbl>
#> 1 a        0.266  0.0618   0.380    0.382    0.794  0.789    0.0707  0.332 
#> 2 b        0.372  0.206    0.777    0.870    0.108  0.0233   0.0995  0.651 
#> 3 c        0.573  0.177    0.935    0.340    0.724  0.477    0.316   0.258 
#> 4 d        0.908  0.687    0.212    0.482    0.411  0.732    0.519   0.479 
#> 5 e        0.202  0.384    0.652    0.600    0.821  0.693    0.662   0.766 
#> 6 f        0.898  0.770    0.126    0.494    0.647  0.478    0.407   0.0842
#> 7 g        0.945  0.498    0.267    0.186    0.783  0.861    0.913   0.875 
#> 8 h        0.661  0.718    0.386    0.827    0.553  0.438    0.294   0.339 
#> 9 i        0.629  0.992    0.0134   0.668    0.530  0.245    0.459   0.839

In this tibble, m1_train is the RMSE obtained by model 1 on the train set, m1_test is the RMSE obtained by model 1 on the test set, and so on.

I'd like to create two new columns in my tibble:

- min_train: the minimum RMSE across only the columns that end with _train
- min_test: the minimum RMSE across only the columns that end with _test
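Both targets are defined purely by column-name suffix. As a quick sanity check, here is a small base R sketch that lists which columns each suffix selects. It rebuilds the question's data as a plain data.frame so it runs without extra packages:

```r
set.seed(1)
# Same data as the question, as a plain data.frame (base R only)
result <- data.frame(city = letters[1:9],
                     m1_train = runif(9), m1_test = runif(9),
                     m2_train = runif(9), m2_test = runif(9),
                     m3_train = runif(9), m3_test = runif(9),
                     m4_train = runif(9), m4_test = runif(9))

# Column names picked up by each suffix
train_cols <- names(result)[endsWith(names(result), "_train")]
test_cols  <- names(result)[endsWith(names(result), "_test")]
train_cols
#> [1] "m1_train" "m2_train" "m3_train" "m4_train"
test_cols
#> [1] "m1_test" "m2_test" "m3_test" "m4_test"
```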

I've tried many different approaches (rowwise(), mutate(vars(ends_with("_train"))) and others) without success.

How can I approach this problem?

e.g. min_train = do.call(pmin, result[endsWith(names(result), 'train')]); min_test = do.call(pmin, result[endsWith(names(result), 'test')]). This task does not require any external libraries.
– Friede, Commented Oct 28 at 14:36
Highly inexperienced and inexperienced users tend to prefer dplyr (due to syntax) no matter the cost. I recommend never using rowwise(); when going with dplyr::mutate I would opt for mutate(min_train = Rfast::rowMins(as.matrix(across(ends_with('train'))), value=TRUE), ...)
– Friede, Commented Oct 28 at 15:04
Note that the docs for rowwise() state "[...] This is most useful when a vectorised function doesn't exist. [...]".
– Friede, Commented Oct 28 at 15:10
base R, tidyverse (add column selection either with ends_with or grepl), or data.table
– lailaps, Commented Oct 28 at 15:32

r2evans · Accepted Answer · 2025-10-30 14:54:19Z

4

For the record, combining Friede's pmin with dplyr is straightforward.

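library(dplyr)  # pick() requires dplyr >= 1.1.0
result |>
  mutate(
    # pick() returns the selected columns as a tibble; do.call() passes
    # each column to pmin() as a separate argument
    min_train = do.call(pmin, pick(ends_with("train"))),
    min_test = do.call(pmin, pick(ends_with("test")))
  )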
# # A tibble: 9 × 11
#   city  m1_train m1_test m2_train m2_test m3_train m3_test m4_train m4_test min_train min_test
#   <chr>    <dbl>   <dbl>    <dbl>   <dbl>    <dbl>   <dbl>    <dbl>   <dbl>     <dbl>    <dbl>
# 1 a        0.266  0.0618   0.380    0.382    0.794  0.789    0.0707  0.332     0.0707   0.0618
# 2 b        0.372  0.206    0.777    0.870    0.108  0.0233   0.0995  0.651     0.0995   0.0233
# 3 c        0.573  0.177    0.935    0.340    0.724  0.477    0.316   0.258     0.316    0.177 
# 4 d        0.908  0.687    0.212    0.482    0.411  0.732    0.519   0.479     0.212    0.479 
# 5 e        0.202  0.384    0.652    0.600    0.821  0.693    0.662   0.766     0.202    0.384 
# 6 f        0.898  0.770    0.126    0.494    0.647  0.478    0.407   0.0842    0.126    0.0842
# 7 g        0.945  0.498    0.267    0.186    0.783  0.861    0.913   0.875     0.267    0.186 
# 8 h        0.661  0.718    0.386    0.827    0.553  0.438    0.294   0.339     0.294    0.339 
# 9 i        0.629  0.992    0.0134   0.668    0.530  0.245    0.459   0.839     0.0134   0.245
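pmin() returns NA for any row where one of its inputs is NA. If some model runs could be missing (an assumption; the question's data has no NAs), na.rm = TRUE can be passed through do.call() by appending it to the list of columns. A minimal sketch, with one value deliberately set to NA for illustration:

```r
library(dplyr)

set.seed(1)
result <- tibble::tibble(city = letters[1:9],
                         m1_train = runif(9), m1_test = runif(9),
                         m2_train = runif(9), m2_test = runif(9),
                         m3_train = runif(9), m3_test = runif(9),
                         m4_train = runif(9), m4_test = runif(9))
result$m4_train[1] <- NA  # hypothetical missing RMSE, for illustration only

out <- result |>
  mutate(
    # c() turns the picked columns into a plain list and appends na.rm = TRUE
    min_train = do.call(pmin, c(pick(ends_with("train")), na.rm = TRUE)),
    min_test  = do.call(pmin, c(pick(ends_with("test")),  na.rm = TRUE))
  )
out$min_train[1]  # minimum of the non-missing train RMSEs for city "a"
```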

rowwise() makes some things easier and is perfectly fine for small data. As the data grows it can become significantly slower, though I wouldn't worry about it until you have many more rows. For example, if result has 10,000 rows, the rowwise() method takes over 3 seconds, while this code is nearly instantaneous.
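To compare the two approaches yourself, here is a rough timing sketch on a synthetic 10,000-row table with the same column layout as the question. The data is randomly generated, and the timings you see will depend on your machine; they are not the figures quoted above:

```r
library(dplyr)

set.seed(1)
n <- 10000
# Synthetic data: n cities, four models, train and test RMSE for each
big <- tibble::tibble(city = paste0("c", seq_len(n)))
for (m in 1:4) {
  big[[paste0("m", m, "_train")]] <- runif(n)
  big[[paste0("m", m, "_test")]]  <- runif(n)
}

# Row by row: min() is called once per row
t_rowwise <- system.time(
  by_row <- big |>
    rowwise() |>
    mutate(min_train = min(c_across(ends_with("train")))) |>
    ungroup()
)

# Vectorised: pmin() is called once for the whole column set
t_pmin <- system.time(
  vectorised <- big |>
    mutate(min_train = do.call(pmin, pick(ends_with("train"))))
)

c(rowwise = t_rowwise[["elapsed"]], pmin = t_pmin[["elapsed"]])
```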

If you don't like do.call and really want to stick with tidyverse functions, you can replace it with purrr::invoke for the same results. The runtime is still fast, though with 10K-row data do.call is almost 50% faster than invoke (not sure why); both are still much faster than rowwise(). Edit: invoke is deprecated in favor of rlang::exec.
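Another option that stays within the tidyverse is to fold pmin() over the selected columns with purrr::reduce(). This is a swapped-in technique, not the invoke()/exec() call described above; because pmin() is vectorised, each step still compares whole columns rather than looping over rows:

```r
library(dplyr)

set.seed(1)
result <- tibble::tibble(city = letters[1:9],
                         m1_train = runif(9), m1_test = runif(9),
                         m2_train = runif(9), m2_test = runif(9),
                         m3_train = runif(9), m3_test = runif(9),
                         m4_train = runif(9), m4_test = runif(9))

out <- result |>
  mutate(
    # reduce() walks the picked columns left to right:
    # pmin(pmin(pmin(col1, col2), col3), col4)
    min_train = purrr::reduce(pick(ends_with("train")), pmin),
    min_test  = purrr::reduce(pick(ends_with("test")), pmin)
  )
out
```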

edited Oct 30 at 14:54

answered Oct 28 at 15:29

r2evans



6 Comments

r2evans Oct 28 at 17:11

Friede, perhaps instead of "perfectly fine" I could use "acceptable", but more so as a distinction between "always implement best practices and the fastest code" and "welcome to R, this is one way".

r2evans Oct 28 at 20:08

Friede, I don't understand your issue here. Are you taking so much issue with my use of "perfectly fine" that you are soap-boxing base-R on a question tagged with dplyr? Or are you defending against perceived criticism? For the latter, I think nobody is suggesting anything from your comment or answer. For the former, there is definitely value in showing different "dialects" in answers, but I do think that the original requested dialect should be addressed.

r2evans Oct 28 at 21:11

@Friede, I apologize for misinterpreting your objection. Other than the potential for hyperbole in "perfectly fine", is there another point?

Onyambu Oct 30 at 14:06

Just curious: weren't invoke and lift deprecated in favor of exec?

r2evans Oct 30 at 14:10

Yes it was, thanks @Onyambu


joran · Accepted Answer · 2025-10-28 16:01:28Z

3

The various other answers are good. Since you were initially trying rowwise(), and I find people often stumble over getting the syntax right with it, here is a version that works using c_across():

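library(dplyr)

result |> 
  rowwise() |> 
  mutate(
    # c_across() gathers the current row's values from the selected columns
    min_train = min(c_across(ends_with("train"))),
    min_test = min(c_across(ends_with("test")))
  ) |> 
  # drop the row-wise grouping so later verbs work on whole columns again
  ungroup()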
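Because the rowwise() version and the vectorised pmin() version compute the same minima, one way to convince yourself is to run both on the question's data and compare the new columns (a sketch):

```r
library(dplyr)

set.seed(1)
result <- tibble::tibble(city = letters[1:9],
                         m1_train = runif(9), m1_test = runif(9),
                         m2_train = runif(9), m2_test = runif(9),
                         m3_train = runif(9), m3_test = runif(9),
                         m4_train = runif(9), m4_test = runif(9))

# Row by row with c_across()
by_row <- result |>
  rowwise() |>
  mutate(min_train = min(c_across(ends_with("train"))),
         min_test  = min(c_across(ends_with("test")))) |>
  ungroup()

# Vectorised with pmin()
vectorised <- result |>
  mutate(min_train = do.call(pmin, pick(ends_with("train"))),
         min_test  = do.call(pmin, pick(ends_with("test"))))

identical(by_row$min_train, vectorised$min_train) &&
  identical(by_row$min_test, vectorised$min_test)
```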

edited Oct 28 at 16:01

answered Oct 28 at 14:43

joran



Friede · Accepted Answer · 2025-10-28 18:21:37Z

Sticking to your preferred library: sometimes what we are really looking for is a long format, which also shows which model achieved each minimum:

result |>
  tidyr::pivot_longer(cols=-city, names_to=c('mod', 'set'),                 
                      names_pattern='(m\\d+)_(train|test)', values_to='val') |>
  dplyr::filter(val==min(val), .by=c(city, set))

Output:

# A tibble: 18 × 4
   city  mod   set      val
   <chr> <chr> <chr>  <dbl>
 1 a     m1    test  0.0618
 2 a     m4    train 0.0707
 3 b     m3    test  0.0233
 4 b     m4    train 0.0995
 5 c     m1    test  0.177 
 6 c     m4    train 0.316 
 7 d     m2    train 0.212 
 8 d     m4    test  0.479 
 9 e     m1    train 0.202 
10 e     m1    test  0.384 
11 f     m2    train 0.126 
12 f     m4    test  0.0842
13 g     m2    train 0.267 
14 g     m2    test  0.186 
15 h     m4    train 0.294 
16 h     m4    test  0.339 
17 i     m2    train 0.0134
18 i     m3    test  0.245
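If you also want the two requested columns back in wide form, one way (a sketch, not part of the answer above) is to summarise per city and set, pivot wider, and join the minima onto the original table. Unlike filter(val == min(val)), summarise() returns exactly one row per group even when two models tie:

```r
library(dplyr)
library(tidyr)

set.seed(1)
result <- tibble::tibble(city = letters[1:9],
                         m1_train = runif(9), m1_test = runif(9),
                         m2_train = runif(9), m2_test = runif(9),
                         m3_train = runif(9), m3_test = runif(9),
                         m4_train = runif(9), m4_test = runif(9))

mins <- result |>
  pivot_longer(cols = -city, names_to = c("mod", "set"),
               names_pattern = "(m\\d+)_(train|test)", values_to = "val") |>
  # one minimum per city and set
  summarise(rmse = min(val), .by = c(city, set)) |>
  # columns become min_train and min_test
  pivot_wider(names_from = set, values_from = rmse, names_prefix = "min_")

# Attach the minima to the original wide table
with_mins <- result |> left_join(mins, by = "city")
with_mins
```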

EDIT: adding the base R suggestion from my comment below the question:

result |> 
  transform(
    min_train = do.call('pmin', result[endsWith(names(result), 'train')]),    
    min_test = do.call('pmin', result[endsWith(names(result), 'test')])
    )
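If the same pattern is needed for more suffixes, the base R version can be wrapped in a small helper. `row_min_by_suffix()` below is a hypothetical name, not an existing function, and the `na.rm` argument is an optional extra:

```r
# Hypothetical helper: row-wise minimum over all columns whose name ends in `suffix`
row_min_by_suffix <- function(df, suffix, na.rm = FALSE) {
  cols <- df[endsWith(names(df), suffix)]
  if (length(cols) == 0L) stop("no columns end with '", suffix, "'")
  # pmin() is vectorised: one call per suffix, not one per row
  do.call(pmin, c(unname(as.list(cols)), na.rm = na.rm))
}

set.seed(1)
result <- data.frame(city = letters[1:9],
                     m1_train = runif(9), m1_test = runif(9),
                     m2_train = runif(9), m2_test = runif(9),
                     m3_train = runif(9), m3_test = runif(9),
                     m4_train = runif(9), m4_test = runif(9))

# Compute both before adding columns, since "min_train" itself ends with "_train"
min_train <- row_min_by_suffix(result, "_train")
min_test  <- row_min_by_suffix(result, "_test")
result <- transform(result, min_train = min_train, min_test = min_test)
head(result[c("city", "min_train", "min_test")], 3)
```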
