R: minimumifs with a condition

Question

Here is some sample data:

library(data.table)

# c() coerces the mix of numbers and strings, so cake is a character column
df <- data.table(
  cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138)
)

I wish to add a column final to this data.table that equals walk when the corresponding value in cake is unique. When the value isn't unique (i.e. it occurs in several rows), final should take the minimum of walk over all of those rows and show it only on the first (top) row of the group; the remaining rows can be set to zero.

For example (cake: final): 1: 183, 2: 789, ..., c1: 264, c3: 138, and so on.

Ideally, this would be the final column:

final = c(183, 789, 753, 130, 126, 44, 325, 264, 0, 0, 708, 0, 138, 0, 0, 0)

I have tried the following code, but it gives the wrong result:

df[, is_unique := !duplicated(cake)]
df[, cake_count := .N, by = cake]
df[, min_walk := ifelse(duplicated(cake), min(walk), walk)]
df[, final := ifelse(is_unique, min_walk, 0)]
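The attempt above goes wrong in two places: min(walk) is evaluated without by, so on repeated rows it is the minimum of the whole column (44 here), and the first row of each group keeps its own walk value instead of the group minimum. A minimal sketch of the same idea with the minimum computed per group (the helper column min_walk is kept from the attempt and dropped at the end):

```r
library(data.table)

df <- data.table(
  cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138)
)

# Per-group minimum: by = cake makes min() see only the rows of each cake value
df[, min_walk := min(walk), by = cake]

# Keep the minimum on the first occurrence of each cake value, 0 elsewhere.
# duplicated() flags every occurrence after the first, so the rows need not be sorted.
df[, final := fifelse(!duplicated(cake), min_walk, 0)]

# Drop the helper column
df[, min_walk := NULL]
df
```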

I would appreciate a solution using the data.table package, as I believe data.table works better with very large datasets.

The column cake is ordered here, but sometimes it isn't, and it can contain numbers, characters, or a mix of both.

The column walk always contains numeric values.

Please also provide code for the case where, in future, I need to repeat the minimum value on every row of a non-unique group rather than setting those rows to zero.
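For that follow-up case, a grouped assignment without any zeroing should be enough: every row receives its group's minimum, and unique cake values simply keep their own walk. A minimal sketch (the column name final_all is hypothetical):

```r
library(data.table)

df <- data.table(
  cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138)
)

# Repeat the group minimum on every row of the group (single-row groups keep walk)
df[, final_all := min(walk), by = cake]
df
```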

I need to apply this to a very large dataset of around a million rows, so the code needs to be very efficient.
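One approach that is often fast on large tables is to aggregate once and write the result back with an update join, instead of building a vector inside every group. This is a sketch under that assumption, not benchmarked here; mins and min_walk are hypothetical names. rowid() from data.table numbers the rows within each cake value in their existing order, so the input does not need to be sorted:

```r
library(data.table)

df <- data.table(
  cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138)
)

# 1. One aggregated row per cake value holding its minimum walk
mins <- df[, .(min_walk = min(walk)), by = cake]

# 2. Update join: copy each group's minimum onto all of its rows, by reference
df[mins, on = "cake", min_walk := i.min_walk]

# 3. Minimum on the first row of each cake value, 0 elsewhere.
#    Use final := min_walk instead to repeat the minimum on every row.
df[, final := fifelse(rowid(cake) == 1L, min_walk, 0)]

df[, min_walk := NULL]
df
```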

Gregor Thomas · Accepted Answer · 2023-10-26 15:21:51Z

4

df[, goal := c(183, 789, 753, 130, 126, 44, 325, 264, 0, 0, 708, 0, 138, 0, 0, 0)]

# minimum on each group's first row, followed by .N - 1 zeros
df[, result := c(min(walk), rep(0, .N - 1)), by = cake]
df
#     cake walk goal result
#  1:    1  183  183    183
#  2:    2  789  789    789
#  3:    3  753  753    753
#  4:    4  130  130    130
#  5:    5  126  126    126
#  6:    6   44   44     44
#  7:    7  325  325    325
#  8:   c1  710  264    264
#  9:   c1  307    0      0
# 10:   c1  264    0      0
# 11:   c2  708  708    708
# 12:   c2  769    0      0
# 13:   c3  742  138    138
# 14:   c3  559    0      0
# 15:   c3  181    0      0
# 16:   c3  138    0      0

Or for more control over the placement:

## put the min in the 2nd row of each multi-row group; single-row groups keep their value
df[, result := fifelse(.N == 1 | seq_len(.N) == 2, min(walk), 0), by = cake]

edited Oct 26, 2023 at 15:21

answered Oct 26, 2023 at 14:48

Gregor Thomas

16 Comments

r2evans Over a year ago

You don't need the conditional, as long as you accept the small inefficiency of calling min on a single value, where it is just the identity. df[, result := c(min(walk), rep(0, .N - 1)), by = cake]. (rep(0, .N - 1) safely returns a length-0 vector when .N is 1.)

Gregor Thomas Over a year ago

@r2evans thanks - just realized that as you were posting. My guess is min() is pretty darn quick on a length-1 input, probably comparable to the if(.N == 1) check.

Gregor Thomas Over a year ago

df[, result := fifelse(walk == min(walk), min(walk), 0), by = cake]
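With the comparison walk == min(walk), every row equal to its group's minimum gets the value, so tied rows all receive it. If exactly one row per group should carry the minimum, at the position where it first occurs, one option is base R's which.min(), which returns the index of the first minimum. A sketch on a small hypothetical table with a tie (the table ties and the columns by_compare and by_which are invented for illustration):

```r
library(data.table)

# Hypothetical data: group "a" has two rows tied at the minimum (5)
ties <- data.table(cake = c("a", "a", "a", "b"), walk = c(5, 9, 5, 7))

# Comparison approach: both tied rows get the minimum
ties[, by_compare := fifelse(walk == min(walk), min(walk), 0), by = cake]

# which.min() approach: only the first row holding the minimum gets it
ties[, by_which := replace(numeric(.N), which.min(walk), min(walk)), by = cake]
ties
```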

Mohit Over a year ago

I have now tested it, and it solves my problem. However, the run time bothers me: my data is around 1M rows. Earlier I was aggregating and then taking the sum or min, but I prefer not to aggregate, so that all the rows are kept. Aggregation with data.table was far more efficient than what I am doing now.

Mohit Over a year ago

Can anything be done to make the code run quicker?

Waldi · Accepted Answer · 2023-10-26 14:52:18Z

2

One possible solution:

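df[, minwalk := min(walk), by = "cake"]   # per-group minimum in a helper column
df[minwalk != walk, walk := 0]            # zero every non-minimum walk (overwrites walk in place)
df[, minwalk := NULL]                     # drop the helper column
df[order(cake, -walk)]                    # print sorted; df itself keeps its row order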
#>     cake walk
#>  1:    1  183
#>  2:    2  789
#>  3:    3  753
#>  4:    4  130
#>  5:    5  126
#>  6:    6   44
#>  7:    7  325
#>  8:   c1  264
#>  9:   c1    0
#> 10:   c1    0
#> 11:   c2  708
#> 12:   c2    0
#> 13:   c3  138
#> 14:   c3    0
#> 15:   c3    0
#> 16:   c3    0

edited Oct 26, 2023 at 14:52

answered Oct 26, 2023 at 14:46

Waldi

1 Comment

Mohit Over a year ago

I would like the code to work without ordering.
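In the answer above, the final order(cake, -walk) step only sorts the printed copy; df itself is updated by reference in its original row order, so the grouped computation does not need sorted input. A small sketch checking this on shuffled rows with the c(min(walk), rep(0, .N - 1)) rule from the first answer, written to a new column rather than overwriting walk (the seed and the shuffled and check tables are arbitrary choices for illustration):

```r
library(data.table)

df <- data.table(
  cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138)
)

set.seed(1)
shuffled <- df[sample(.N)]  # same rows in a random order

# Group minimum on the first row (in the current order) of each cake value, 0 elsewhere
shuffled[, final := c(min(walk), rep(0, .N - 1)), by = cake]

# Per group: exactly one non-zero entry, sitting on the first row and equal to the minimum
check <- shuffled[, .(n_nonzero = sum(final != 0),
                      first_is_min = final[1L] == min(walk)), by = cake]
print(check)
```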

Onyambu · Accepted Answer · 2023-10-26 15:18:15Z

2

You could multiply the group minimum by a vector of 0s and 1s (1 only on each group's first row) to get the final output:

df[,final := min(walk)*(seq_len(.N)==1), cake][]

    cake walk final
 1:    1  183   183
 2:    2  789   789
 3:    3  753   753
 4:    4  130   130
 5:    5  126   126
 6:    6   44    44
 7:    7  325   325
 8:   c1  710   264
 9:   c1  307     0
10:   c1  264     0
11:   c2  708   708
12:   c2  769     0
13:   c3  742   138
14:   c3  559     0
15:   c3  181     0
16:   c3  138     0

answered Oct 26, 2023 at 15:18

Onyambu
