Here is some sample data:
library(data.table)
df <- data.table(cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"), walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138))
I want to add a column final to this data.table. Where the value in column cake is unique, final should equal walk. Where the same cake value appears in several rows, final should hold the minimum of walk over those rows, shown only on the first of them; the remaining rows can be set to zero.
For example (cake: final): 1: 183, 2: 789, ..., c1: 264, c3: 138, and so on.
Ideally, this would be the final column:
final=c(183, 789, 753, 130, 126, 44, 325, 264, 0, 0, 708, 0, 138, 0, 0, 0)
I have tried the following code, but it gives the wrong result.
df[, is_unique := !duplicated(cake)]
df[, cake_count := .N, by = cake]
df[, min_walk := ifelse(duplicated(cake), min(walk), walk)]
df[, final := ifelse(is_unique, min_walk, 0)]
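One likely reason the attempt above goes wrong: min(walk) is called without by = cake, so it returns the minimum of the whole column, and the ifelse() assigns it to the duplicated rows rather than to the first row of each group. The first occurrences therefore keep their own walk value. A minimal sketch of a per-group version, assuming "top one" means the first occurrence of each cake value in the current row order:

```r
library(data.table)

df <- data.table(
  cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138)
)

# Step 1: write the per-group minimum of walk onto every row of the group.
df[, final := min(walk), by = cake]

# Step 2: zero out every row that is not the first occurrence of its cake value.
df[duplicated(cake), final := 0]

print(df)
```

Because by = cake groups on values rather than on position, this sketch does not rely on cake being sorted. Note that mixing numbers and strings inside c() makes cake a character column.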
I would appreciate a solution that uses the data.table package, since I believe data.table works better with very large datasets.
The column cake
- is ordered here, but it isn't always ordered.
- can contain numbers, characters, or a mix of both.
The column walk
- is always numeric.
Please also show the code for the case where, in future, I need to repeat the minimum value for all the non-unique entries rather than setting them to zero.
I need to apply this to a very large dataset of around a million rows, so the code needs to be very efficient.
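For the repeated-minimum variant, a minimal sketch is a single grouped assignment by reference. The column name final_rep is hypothetical, chosen only to keep it apart from final. Whether this is fast enough on a million rows is an assumption that should be checked by timing it on the real data:

```r
library(data.table)

df <- data.table(
  cake = c(1, 2, 3, 4, 5, 6, 7, "c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  walk = c(183, 789, 753, 130, 126, 44, 325, 710, 307, 264, 708, 769, 742, 559, 181, 138)
)

# The per-group minimum is recycled to every row of its group;
# := adds the column in place without copying the table.
df[, final_rep := min(walk), by = cake]

print(df)
```

Rows with a unique cake value form a group of size one, so their minimum is simply their own walk value and they need no special handling.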