Here is some sample data:

library(data.table)

df <- data.table(cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138))

I wish to add a column final to this data.table. It should equal column walk when the corresponding entry in column cake is unique. If the entry is not unique, i.e. several rows share the same cake, take the minimum of all their walk values and show it only on the first of those rows; the rest can be set to zero.

E.g. the cake:final pairs would be 1:183, 2:789, ..., c1:264, c3:138, etc.

Ideally, this would be the final column:

final=c(183, 789, 753, 130, 126, 44, 325, 264, 0, 0, 708, 0, 181, 0, 0, 0)

I have tried the following code, but it gives the wrong result.

df[, is_unique := !duplicated(cake)]
df[, cake_count := .N, by = cake]
df[, min_walk := ifelse(duplicated(cake), min(walk), walk)]
df[, final := ifelse(is_unique, min_walk, 0)]

I would appreciate it if this could be done using the data.table package. I believe data.table works better with very large datasets.

The column cake

  1. is ordered here, but it isn't ordered sometimes.
  2. has both numbers and characters or mix of both.

The column walk

  1. is always numeric.

Please also give me code for the case where, in future, I need to repeat the minimum value on all the non-unique entries rather than setting them to zero.

I need to apply this to a very large dataset of around a million rows, so the code needs to be very efficient.

3 Answers

df$goal = c(183, 789, 753, 130, 126, 44, 325, 264, 0, 0, 708, 0, 181, 0, 0, 0)

df[, result := c(min(walk), rep(0, .N - 1)), by = cake]
df
#     cake walk goal result
#  1:    1  183  183    183
#  2:    2  789  789    789
#  3:    3  753  753    753
#  4:    4  130  130    130
#  5:    5  126  126    126
#  6:    6   44   44     44
#  7:    7  325  325    325
#  8:   c1  710  264    264
#  9:   c1  307    0      0
# 10:   c1  264    0      0
# 11:   c2  708  708    708
# 12:   c2  769    0      0
# 13:   c3  742  181    138
# 14:   c3  559    0      0
# 15:   c3  181    0      0
# 16:   c3  138    0      0

Or for more control over the placement:

## put the min in the 2nd row of each multi-row group (single-row groups keep their value)
df[, result := fifelse(.N == 1 | seq_len(.N) == 2, min(walk), 0), by = cake]
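
For the follow-up in the question (repeating the minimum on every row of a group instead of zero-filling), a minimal sketch is to drop the zero padding and let `:=` recycle the group minimum. The table `demo` and the column name `final_rep` are made up here for illustration and are not from the question.

```r
library(data.table)

# Hypothetical cut-down version of the question's data
demo <- data.table(
  cake = c("1", "c1", "c1", "c2", "c2"),
  walk = c(183, 710, 264, 708, 769)
)

# min(walk) has length 1 per group, so := recycles it to every row of that group
demo[, final_rep := min(walk), by = cake]

demo$final_rep
# 183 264 264 708 708
```

Single-row groups simply get their own walk value back, so no special case is needed.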

16 Comments

You don't need the conditional, as long as you accept the inefficiency that min of a single value is an identity. df[, result := c(min(walk), rep(0, .N - 1)), by = cake]. (The rep(0, .N-1) is safe enough to return length-0 when .N is 1.)
@r2evans thanks - just realized that as you were posting. My guess is min() is pretty darn quick on a length-1 input, probably comparable to the if(.N == 1) check.
df[, result := fifelse(walk == min(walk), min(walk), 0), by = cake]
I have now tested it and it does what I need. However, the run time is bothering me: my data is around 1m rows. Earlier I was aggregating and then taking the sum or min, but I would prefer not to aggregate and to keep all the rows. Aggregation using data.table was far more efficient than what I have done now.
Can anything be done to make the code run quicker?
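
Since plain aggregation was reported as fast, one option that might be worth benchmarking is to aggregate once and then write the minima back onto the first row of each group by index. This is only a sketch under that assumption: `mins` and `first_rows` are names introduced here, and it has not been timed on a million rows. It relies on `by =` returning groups in order of first appearance, which matches the order of `!duplicated()`.

```r
library(data.table)

df <- data.table(
  cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138)
)

# One aggregated row per cake; groups come back in order of first appearance
mins <- df[, .(min_walk = min(walk)), by = cake]

# Row index of the first occurrence of each cake, in the same order as `mins`
first_rows <- which(!duplicated(df$cake))

df[, final := 0]                        # zero everywhere by default
df[first_rows, final := mins$min_walk]  # overwrite only the first row per cake
```

All rows are kept and nothing is reordered, so this should also work when cake is not sorted.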

One possible solution:

df[,minwalk:=min(walk),by="cake"][minwalk!=walk,walk:=0][,minwalk:=NULL][order(cake,-walk)]
#>     cake walk
#>  1:    1  183
#>  2:    2  789
#>  3:    3  753
#>  4:    4  130
#>  5:    5  126
#>  6:    6   44
#>  7:    7  325
#>  8:   c1  264
#>  9:   c1    0
#> 10:   c1    0
#> 11:   c2  708
#> 12:   c2    0
#> 13:   c3  138
#> 14:   c3    0
#> 15:   c3    0
#> 16:   c3    0

1 Comment

I would like the code to work without ordering.
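
A possible variant that leaves both the row order and the walk column untouched is sketched below. It assumes `rowid()` from data.table is acceptable for finding the first occurrence of each cake; the table `shuffled` and the helper column `minwalk` are illustrative names, not part of the original answer.

```r
library(data.table)

# Hypothetical unordered sample: rows of the same cake are not adjacent
shuffled <- data.table(
  cake = c("c2", "c1", "c2", "7", "c1"),
  walk = c(769, 307, 708, 325, 264)
)

# Group minimum on every row (grouping is by value, so adjacency does not matter)
shuffled[, minwalk := min(walk), by = cake]

# rowid(cake) numbers the occurrences of each cake 1, 2, ... in row order;
# keep the minimum only on the first occurrence, zero elsewhere
shuffled[, final := fifelse(rowid(cake) == 1L, minwalk, 0)]

# Drop the helper column
shuffled[, minwalk := NULL]

shuffled$final
# 708 264 0 325 0
```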

You could multiply the group minimum by a vector that is 1 for the first row of each group and 0 for the rest, which leaves the desired output:

df[,final := min(walk)*(seq_len(.N)==1), cake][]

    cake walk final
 1:    1  183   183
 2:    2  789   789
 3:    3  753   753
 4:    4  130   130
 5:    5  126   126
 6:    6   44    44
 7:    7  325   325
 8:   c1  710   264
 9:   c1  307     0
10:   c1  264     0
11:   c2  708   708
12:   c2  769     0
13:   c3  742   138
14:   c3  559     0
15:   c3  181     0
16:   c3  138     0
