Replacing values based on conditions

Question

I have a dataframe one of the cols is id and some of the values have been messed up during the recording of the data.

here's an example of the type of data

dput(df)
structure(list(Id = c("'110171786'", "'1103fbfd5'", "'0700edf6dc'", 
"'1103fad09'", "'01103fc9bb'", "''", "''", "0000fba2b'", "'01103fb169'", 
"'01103fd723'", "'01103f9c34'", "''", "''", "''", "'01103fc088'", 
"'01103fa6d8'", "'01103fb374'", "'01103fce8c'", "'01103f955d'", 
"'011016e633'", "'01103fa0da'", "''", "''", "''", "'01103fa4bd'", 
"'01103fb5c4'", "'01103fd0d7'", "'01103f9e2e'", "'01103fc657'", 
"'01103fd4d1'", "'011016e78e'", "'01103fbda2'", "'01103fbae7'", 
"'011016ee23'", "'01103fc847'", "'01103fbfbb'", "''", "'01103fb8bb'", 
"'01103fc853'", "''", "'01103fbcd5'", "'011016e690'", "'01103fb253'", 
"'01103fcb19'", "'01103fb446'", "'01103fa4fa'", "'011016cfbd'", 
"'01103fd250'", "'01103fac7d'", "'011016a86e'"), Weight = c(11.5, 
11.3, 11.3, 10.6, 10.6, 8.9, 18.7, 10.9, 11.3, 18.9, 18.9, 8.6, 
8.8, 8.4, 11, 10.4, 10.4, 10.8, 11.2, 11, 10.3, 9.5, 8.1, 9.3, 
10.2, 10.5, 11.2, 21.9, 18, 17.8, 11.3, 11.5, 10.8, 10.5, 12.8, 
10.9, 8.9, 10.3, 10.8, 8.9, 10.9, 9.9, 19, 11.6, 11.3, 11.7, 
10.9, 12.1, 11.3, 10.6)), class = "data.frame", row.names = c(NA, 
-50L))
>

What I would like to do is search through the id column and replace the following mistakes

some of the values have a zero missing off the front, all of these would start with a 1 now instead which makes finding them easily. So basically anything that has a character length of 9 and starts with a 1 needs a 0 as the first character.
some of the values are less than 10 characters long, these need to be removed.
some have more than one leading 0 and these need to be removed.

Why are your Id's double quoted eg "'110171786'" instead of "110171786"? Just curious — Jonny Phelps
– Jonny Phelps, Commented Dec 16, 2021 at 15:36
I think that just initially to stop excel from treating them like numbers and removing the zeros (which didn't work) and some of the IDs have an "E" in the middle and excel turns that into scientific notation. Relic of an old database system — McMahok
– McMahok, Commented Dec 17, 2021 at 7:53

r2evans · Accepted Answer · 2021-12-16 14:12:06Z

df$Id <- gsub("^('?)(1.{8}')$", "\\10\\2", df$Id)
df[ !grepl("^'?(00|'$)", df$Id),]
#              Id Weight
# 1  '0110171786'   11.5
# 2  '01103fbfd5'   11.3
# 3  '0700edf6dc'   11.3
# 4  '01103fad09'   10.6
# 5  '01103fc9bb'   10.6
# 9  '01103fb169'   11.3
# 10 '01103fd723'   18.9
# 11 '01103f9c34'   18.9
# 15 '01103fc088'   11.0
# 16 '01103fa6d8'   10.4
# 17 '01103fb374'   10.4
# 18 '01103fce8c'   10.8
# 19 '01103f955d'   11.2
# 20 '011016e633'   11.0
# 21 '01103fa0da'   10.3
# 25 '01103fa4bd'   10.2
# 26 '01103fb5c4'   10.5
# 27 '01103fd0d7'   11.2
# 28 '01103f9e2e'   21.9
# 29 '01103fc657'   18.0
# 30 '01103fd4d1'   17.8
# 31 '011016e78e'   11.3
# 32 '01103fbda2'   11.5
# 33 '01103fbae7'   10.8
# 34 '011016ee23'   10.5
# 35 '01103fc847'   12.8
# 36 '01103fbfbb'   10.9
# 38 '01103fb8bb'   10.3
# 39 '01103fc853'   10.8
# 41 '01103fbcd5'   10.9
# 42 '011016e690'    9.9
# 43 '01103fb253'   19.0
# 44 '01103fcb19'   11.6
# 45 '01103fb446'   11.3
# 46 '01103fa4fa'   11.7
# 47 '011016cfbd'   10.9
# 48 '01103fd250'   12.1
# 49 '01103fac7d'   11.3
# 50 '011016a86e'   10.6

Collectives™ on Stack Overflow

Replacing values based on conditions

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related