
I have a data frame with a grouping variable "id" and a string variable "id_c". Within each group, one of the 'id_c' values may end in one or more trailing > characters.

example_df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5),
  id_c = c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>"))

   id id_c
1   1    1 #
2   1    1 #
3   1   1> # one trailing > in group 1
4   2    2 
5   2    2  
6   3    3  
7   3    3 
8   4    4  #
9   4    4  #
10  4    4  #
11  4  4>>  # two trailing > in group 4 
12  5    5  #
13  5   5>  # one trailing > in group 5

For each 'id', if any 'id_c' value has a trailing > or >>, I want to append the same > or >> to the remaining rows of that group (i.e. those originally lacking >). This is a little hard to describe in words, so here is my desired output:

   id id_c 
1   1   1> 
2   1   1> 
3   1   1> 
4   2    2 
5   2    2 
6   3    3 
7   3    3 
8   4  4>> 
9   4  4>> 
10  4  4>> 
11  4  4>> 
12  5   5> 
13  5   5> 

2 Answers

## the initial version of your question used vectors
id <- c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5)
id_c <- c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>")

A base R approach, using look-up:

## rows with ">"
rowid <- grep(">", id_c)
## look-up index
lookup <- match(id, id[rowid], nomatch = 0L)
## replacement using look-up
repl <- id_c[rowid][lookup]
## fill-in
id_c[lookup > 0L] <- repl

id_c
# [1] "1>"  "1>"  "1>"  "2"   "2"   "3"   "3"   "4>>" "4>>" "4>>" "4>>" "5>" 
#[13] "5>" 

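The idea is not that transparent, but the code is vectorized and involves no type conversion or string manipulation. Note that match() returns the first matching position, so if a group had several rows containing ">", the first of them would be used as the replacement.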
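
The look-up steps can also be wrapped into a small function and applied to the data frame columns directly. This is only a sketch: the name fill_marked is made up for illustration, and it assumes each group has at most one distinct marked value.

```r
# Hypothetical helper (name not from the answer) wrapping the look-up steps.
fill_marked <- function(id, id_c) {
  rowid  <- grep(">", id_c, fixed = TRUE)        # rows that carry a ">"
  lookup <- match(id, id[rowid], nomatch = 0L)   # 0 for groups without a marked row
  id_c[lookup > 0L] <- id_c[rowid][lookup]       # zero indices are dropped when subsetting
  id_c
}

example_df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5),
  id_c = c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>"),
  stringsAsFactors = FALSE
)

example_df$id_c <- fill_marked(example_df$id, example_df$id_c)
example_df
```

Because the replacement is taken from the marked row itself, the position of that row within its group does not matter.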


Here's a dplyr approach. We first group_by the id column and extract the run of ">" symbols from whichever record has one. We also flag the records that originally lack the ">" symbol, so that the suffix is appended only to those; otherwise the record that already has it would receive an additional ">".

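library(dplyr)
library(tidyr)
library(stringr)  # provides str_extract()

example_df %>% 
  group_by(id) %>% 
  # extract the run of ">" (NA if absent) and flag rows that lack one
  mutate(new_id_c = str_extract(id_c, ">+"),
         flag = is.na(new_id_c)) %>% 
  # spread the suffix to every row of the group, wherever the marked row sits
  fill(new_id_c, .direction = "downup") %>% 
  # append the suffix only to rows that originally lacked it
  mutate(new_id_c = ifelse(flag & !is.na(new_id_c), paste0(id_c, new_id_c), id_c), .keep = "unused")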

# A tibble: 13 × 2
# Groups:   id [5]
      id new_id_c
   <dbl> <chr>   
 1     1 1>      
 2     1 1>      
 3     1 1>      
 4     2 2       
 5     2 2       
 6     3 3       
 7     3 3       
 8     4 4>>     
 9     4 4>>     
10     4 4>>     
11     4 4>>     
12     5 5>      
13     5 5>      
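
For readers without the tidyverse, roughly the same per-group logic (extract the suffix, then append it only to rows that lack one) can be sketched in base R with ave(). The helper name add_suffix is an assumption made for illustration, and the sketch takes the suffix from the first marked row of each group.

```r
example_df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5),
  id_c = c("1", "1", "1>", "2", "2", "3", "3", "4", "4", "4", "4>>", "5", "5>"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append the group's trailing ">" run to rows lacking one.
add_suffix <- function(x) {
  marked <- grepl(">+$", x)
  if (!any(marked)) return(x)                      # group has no marked row
  first_marked <- x[marked][1]
  suffix <- regmatches(first_marked, regexpr(">+$", first_marked))  # e.g. ">>"
  x[!marked] <- paste0(x[!marked], suffix)
  x
}

# ave() applies add_suffix within each id group and keeps the row order.
example_df$new_id_c <- ave(example_df$id_c, example_df$id, FUN = add_suffix)
example_df
```

Unlike the fill() step, this does not depend on where the marked row sits within its group.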
