I have a column in a pandas df of type object that I want to parse to get the first number in the string, and create a new column containing that number as an int.

For example:

Existing df

    col
    'foo 12 bar 8'
    'bar 3 foo'
    'bar 32bar 98'

Desired df

    col               col1
    'foo 12 bar 8'    12
    'bar 3 foo'       3
    'bar 32bar 98'    32

I have code that works on any individual cell in the column:

int(re.search(r'\d+', df.iloc[0]['col']).group())

The above code works fine and returns 12 as it should. But when I try to create a new column using the whole series:

df['col1'] = int(re.search(r'\d+', df['col']).group())

I get the following error:

TypeError: expected string or bytes-like object

I tried wrapping df['col'] in str(), which got rid of the error but yielded all 0s in col1.

I've also tried converting col to a list of strings and iterating through the list, which only yields the same error. Does anyone know what I'm doing wrong? Help would be much appreciated.
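
For what it's worth, both symptoms can be reproduced with a small sketch, assuming the three sample rows above and a default integer index (the variable names here are only illustrative). re.search accepts one string at a time, so passing the Series raises the TypeError, and str() on a Series returns its printed form, which begins with the index label 0, so the first number found is that 0 rather than anything from the data:

```python
import re

import pandas as pd

# Sample frame built from the example rows in the question
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# re.search works on a single string, so a whole Series is rejected
try:
    re.search(r"\d+", df["col"])
except TypeError as exc:
    error_message = str(exc)

# str() turns the Series into its printed representation,
# which starts with the index label "0" of the first row
as_text = str(df["col"])
first_match = re.search(r"\d+", as_text).group()
```

Assigning that single value to df['col1'] would then broadcast it to every row, which matches the column of 0s described above.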

  • Check out the DataFrame.apply() method. Your computation is probably too complex for a simple assignment. Commented Sep 21, 2017 at 18:18
  • You might try df['col'].str.extract(r'(\d+)') Commented Sep 21, 2017 at 18:18
  • @WiktorStribiżew, I'd also add expand=False... Commented Sep 21, 2017 at 18:20
  • @WiktorStribiżew That worked perfectly, thanks! Commented Sep 21, 2017 at 18:29
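
The suggestion in the comments above could look roughly like the following sketch. It assumes the sample rows from the question and that every row contains at least one digit; the final astype(int) would fail on rows where nothing matched, because str.extract leaves those as missing values.

```python
import pandas as pd

# Sample frame built from the example rows in the question
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# The capture group (\d+) grabs the first run of digits in each string.
# expand=False returns a Series instead of a one-column DataFrame.
first_digits = df["col"].str.extract(r"(\d+)", expand=False)

# The extracted values are strings, so convert them to integers
df["col1"] = first_digits.astype(int)
```

This applies the regex to the whole column in one call instead of looping over the rows in Python.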

1 Answer

This will do the trick:

import re

new_column = []
for value in df['col']:
    # take the first run of digits in each string and convert it to an int
    new_column.append(int(re.search(r'\d+', value).group()))

df['col1'] = new_column

The output looks like this:

            col    col1
0  foo 12 bar 8      12
1     bar 3 foo       3
2  bar 32bar 98      32
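
One caveat with the loop: re.search returns None when a string contains no digits, so calling .group() on that row raises an AttributeError. A minimal sketch of one way to guard against that is shown below; first_int is a hypothetical helper name, and the second sample row is made up to show the no-match case.

```python
import re

import pandas as pd


def first_int(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# The middle row is an invented example with no digits in it
df = pd.DataFrame({"col": ["foo 12 bar 8", "no digits here", "bar 32bar 98"]})

# Apply the helper to each string in the column
df["col1"] = df["col"].apply(first_int)
```

Rows without a number end up as missing values in col1 rather than stopping the whole computation.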

2 Comments

The list has the same name as the regex method (search), which I find a bit confusing. mylist = [], mylist.append... would imho make it a clearer example.
stackoverflow.com/questions/58973981/… this approach is much better. Using a for loop in a dataframe is an instant red flag
