
I have a data frame dataf with a column playerValue:

         PlayerName           playerValue
1     Michy Batshuayi        40,00 Mill. €  
2     Tiemoué Bakayoko       35,00 Mill. €  
3     Kurt Zouma             20,00 Mill. €  
4     Kenedy                 10,00 Mill. €  
5     Tammy Abraham          10,00 Mill. €  
6     Abdul Rahman Baba      8,00 Mill. €  
7     Mario Pasalic          8,00 Mill. €  
8     Lewis Baker            5,50 Mill. €  
9     Ola Aina               4,00 Mill. €  
10    Tomas Kalas            4,00 Mill. €  

I would like to keep just the number in that column (and, if possible, replace the decimal comma with a decimal point), like this:

         PlayerName           playerValue
1     Michy Batshuayi           40,00 # 40.00, if possible
2     Tiemoué Bakayoko          35,00  
3     Kurt Zouma                20,00  
4     Kenedy                    10,00  
5     Tammy Abraham             10,00   
6     Abdul Rahman Baba         8,00   
7     Mario Pasalic             8,00  
8     Lewis Baker               5,50  
9     Ola Aina                  4,00   
10    Tomas Kalas               4,00   
  • If the column is always in the "##,## Mill. €" format, you can simply strip the non-numeric part and swap the comma for a decimal point, as in: library(stringr); x <- str_replace(x, fixed(" Mill. €"), ""); x <- str_replace(x, fixed(","), "."). To cover more complicated cases, use regular expressions (also supported by the stringr functions) to extract only the numeric part. Commented Dec 11, 2018 at 20:00

2 Answers


This will do the trick for a single value:

playerValue <- "40,00 Mill. € "
as.numeric(gsub("^(\\d+?)\\,(\\d+?)\\s.*", "\\1.\\2", playerValue, perl = TRUE))
# returns
40

Short explanation of the regex:

  • ^ anchors the match at the start of the string
  • (\\d+?)\\,(\\d+?) matches two sequences of digits separated by a comma; the parentheses capture each sequence as a group
  • \\s.* matches a whitespace character after the second sequence, followed by anything (including nothing)
  • \\1.\\2 in the replacement refers to the two captured groups, joined by a dot so that the result can be converted to numeric
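
The snippet above handles a single string, but the same pattern can be applied to a whole column because sub() is vectorised. The following is a minimal sketch under that assumption; the vector player_value and its three entries are a hypothetical sample in the format shown in the question, not the asker's actual data.

```r
# Hypothetical sample in the "##,## Mill. €" format from the question
player_value <- c("40,00 Mill. €", "8,00 Mill. €", "5,50 Mill. €")

# Capture the digits before and after the comma, drop the rest of the
# string, and join the two groups with a decimal point
cleaned <- sub("^(\\d+),(\\d+)\\s.*$", "\\1.\\2", player_value, perl = TRUE)

# Convert the cleaned strings to numbers
numeric_value <- as.numeric(cleaned)
print(numeric_value)
```

Values that do not match the pattern are left unchanged by sub(), so as.numeric() would turn them into NA with a warning; that makes unexpected formats easy to spot.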

Use gsub to remove everything from the first space onward, and then replace the , with a ., like this:

data$playerValue <- gsub(",", ".", gsub("[[:space:]].*", "", data$playerValue))

It will give you this output:

#         PlayerName           playerValue
#1     Michy Batshuayi               40.00
#2     Tiemoué Bakayoko              35.00  
#3     Kurt Zouma                    20.00  
#4     Kenedy                        10.00  
#5     Tammy Abraham                 10.00   
#6     Abdul Rahman Baba              8.00   
#7     Mario Pasalic                  8.00  
#8     Lewis Baker                    5.50  
#9     Ola Aina                       4.00   
#10    Tomas Kalas                    4.00   

Then, if you want to convert it to a number, you can do so as follows:

data$playerValue <- as.numeric(data$playerValue)
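
To check the two steps end to end, here is a small self-contained sketch. It assumes a three-row data frame named dataf (the name used in the question) with made-up rows in the same format, and it uses sub() with fixed = TRUE for the comma because each value contains only one comma.

```r
# Hypothetical three-row sample mirroring the question's layout
dataf <- data.frame(
  PlayerName  = c("Michy Batshuayi", "Lewis Baker", "Ola Aina"),
  playerValue = c("40,00 Mill. €", "5,50 Mill. €", "4,00 Mill. €"),
  stringsAsFactors = FALSE
)

# Step 1: drop everything from the first whitespace character onward
stripped <- sub("[[:space:]].*", "", dataf$playerValue)

# Step 2: swap the decimal comma for a point and convert to numeric
dataf$playerValue <- as.numeric(sub(",", ".", stripped, fixed = TRUE))

# The column is now numeric and can be used in arithmetic
stopifnot(is.numeric(dataf$playerValue))
print(dataf)
```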

Hope it helps.
