Strip Numbers From String in Python [duplicate]

Question

Is there an efficient way to strip the numbers out of a string in Python, using either nltk or base Python?

Thanks, Ben

If I have a string, for example x = "I have 3 dogs", I'd want a way to turn x into "I have dogs".
– ben890, Commented May 19, 2015 at 0:51
Or... "It's the 1st road on your left, then take the 2nd road on the right, then the company you're after is called TRG1 it's about 100m up the road - if you're lazy - you can catch a bus for £2.50"?
– Jon Clements, Commented May 19, 2015 at 0:56
Check out other good answers here: stackoverflow.com/questions/12851791/…
– tommy.carstensen, Commented Dec 19, 2019 at 22:07

Martin Konecny · Accepted Answer · 2016-12-22 17:10:46Z

38

Yes, you can use a regular expression for this:

import re
output = re.sub(r'\d+', '', '123hello 456world')
print(output)  # 'hello world' (the parenthesised form runs on both Python 2 and 3)
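
As a minimal sketch of how this could be wrapped for reuse, the snippet below pre-compiles the pattern and optionally collapses the gap left behind, which matters for the "I have 3 dogs" example from the question. The function name strip_digits and the collapse_spaces flag are hypothetical, not part of the answer above.

```python
import re

# Compile once so the pattern can be reused across many strings.
DIGITS_RE = re.compile(r'\d+')
SPACES_RE = re.compile(r'\s+')


def strip_digits(text, collapse_spaces=False):
    """Remove runs of digits; optionally squeeze leftover whitespace.

    `strip_digits` and `collapse_spaces` are illustrative names only.
    """
    result = DIGITS_RE.sub('', text)
    if collapse_spaces:
        # Removing "3" from "have 3 dogs" leaves two spaces; squeeze them.
        result = SPACES_RE.sub(' ', result).strip()
    return result


print(strip_digits('123hello 456world'))                    # hello world
print(strip_digits('I have 3 dogs', collapse_spaces=True))  # I have dogs
```

Without collapse_spaces, the second call would keep the double space between "have" and "dogs".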

edited Dec 22, 2016 at 17:10

answered May 19, 2015 at 0:51

Martin Konecny


3 Comments

ben890 Over a year ago

This is perfect! Thanks Martin

Alex Huszagh Over a year ago

Can't go wrong with the regex solution, since it also translates very well to other cases (say he wants to remove letters next).

Kareem Khaleel Over a year ago

The best answer. Works like a charm.

Robᵩ · Answer · 2015-05-19 00:58:52Z

15

str.translate should be efficient. (The two-argument form shown here is Python 2 only.)

In [7]: 'hello467'.translate(None, '0123456789')
Out[7]: 'hello'

To compare str.translate against re.sub:

In [13]: %%timeit r=re.compile(r'\d')
output = r.sub('', my_str)
   ....: 
100000 loops, best of 3: 5.46 µs per loop

In [16]: %%timeit pass
output = my_str.translate(None, '0123456789')
   ....: 
1000000 loops, best of 3: 713 ns per loop
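
For Python 3, where str.translate takes a single table argument, a rough equivalent can be sketched with str.maketrans. The names DIGIT_TABLE and strip_digits are made up for this example, and the timings above were not re-measured for this form.

```python
import string

# Third argument of str.maketrans lists characters to delete.
DIGIT_TABLE = str.maketrans('', '', string.digits)


def strip_digits(text):
    """Delete ASCII digits 0-9 using a prebuilt translation table."""
    return text.translate(DIGIT_TABLE)


print(strip_digits('hello467'))  # hello
```

Building the table once and reusing it avoids rebuilding the mapping on every call.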

edited May 19, 2015 at 0:58

answered May 19, 2015 at 0:53

Robᵩ


5 Comments

Jon Clements Over a year ago

The problem is: str.translate is a bit difficult to make both 2.x/3.x compatible :(

Jon Clements Over a year ago

So you'd need my_str.translate({ord(ch): None for ch in '0123456789'}) in 3.x

Ross Over a year ago

I wonder how long r.sub() takes? Say, under conditions where you want to do this over multiple strings and you've pre-compiled the regex.

Robᵩ Over a year ago

@Ross - Judging from the code I put in my answer, 5.46µs.

Ross Over a year ago

@Rob - Ah right, I missed that the first line is the setup line. Looking at some best and worst cases, translate seems to perform much better in the worst-case scenarios. Using 'python -m timeit', I got the following speedups in favour of translate:
'123hello 456world' - x5.0
'1234567890987654321012345678909876543210' - x17.0
'5a$%&^@)9lhk45g08j%Gmj3g09jSDGjg0034k' - x9.0
'hello world im your boss' - x1.8
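
A comparison along these lines could be reproduced on Python 3 with the timeit module, roughly as sketched below. The helper names and the loop count are assumptions for illustration, and the ratios will vary by machine and Python version, so no expected numbers are shown.

```python
import re
import timeit

DIGITS_RE = re.compile(r'\d')
DIGIT_TABLE = str.maketrans('', '', '0123456789')

# The four inputs quoted in the comment above.
SAMPLES = [
    '123hello 456world',
    '1234567890987654321012345678909876543210',
    '5a$%&^@)9lhk45g08j%Gmj3g09jSDGjg0034k',
    'hello world im your boss',
]


def with_regex(text):
    return DIGITS_RE.sub('', text)


def with_translate(text):
    return text.translate(DIGIT_TABLE)


def compare(text, number=10000):
    """Return (regex_seconds, translate_seconds) for `number` calls each."""
    t_regex = timeit.timeit(lambda: with_regex(text), number=number)
    t_translate = timeit.timeit(lambda: with_translate(text), number=number)
    return t_regex, t_translate


for sample in SAMPLES:
    t_regex, t_translate = compare(sample)
    print('%r: regex/translate = x%.1f' % (sample, t_regex / t_translate))
```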

Deacon · Answer · 2015-05-19 14:00:17Z

1

Here's a method using str.join(), str.isnumeric(), and a generator expression which will work in 3.x:

>>> my_str = '123Hello, World!4567'
>>> output = ''.join(c for c in my_str if not c.isnumeric())
>>> print(output)
Hello, World!
>>>

This will also work in 2.x, if you use a unicode string:

>>> my_str = u'123Hello, World!4567'
>>> output = ''.join(c for c in my_str if not c.isnumeric())
>>> print(output)
Hello, World!
>>>

Hmm. Throw in a paperclip and we'd have an episode of MacGyver.

Update

I know that this has been closed out as a duplicate, but here's a method that works for both Python 2 and Python 3:

>>> my_str = '123Hello, World!4567'
>>> output = ''.join(map(lambda c: '' if c in '0123456789' else c, my_str))
>>> print(output)
Hello, World!
>>>
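
One difference between the isnumeric() version and the explicit '0123456789' version shows up on non-ASCII input: str.isnumeric() is also true for characters such as fractions and digits from other scripts. The sketch below (Python 3, with function names chosen only for illustration) shows both side by side.

```python
ASCII_DIGITS = set('0123456789')


def strip_numeric(text):
    """Drop every character that str.isnumeric() reports as numeric."""
    return ''.join(c for c in text if not c.isnumeric())


def strip_ascii_digits(text):
    """Drop only the ten ASCII digits 0-9."""
    return ''.join(c for c in text if c not in ASCII_DIGITS)


# Contains an ASCII digit, a vulgar fraction, and an Arabic-Indic digit.
sample = 'a1b\u00bdc\u0663d'

print(strip_numeric(sample))       # abcd
print(strip_ascii_digits(sample))  # keeps the fraction and the Arabic-Indic digit
```

Which behaviour is wanted depends on whether "numbers" in the input means only 0-9 or any numeric character.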

edited May 19, 2015 at 14:00

answered May 19, 2015 at 1:20

Deacon
