12

Is there an efficient way to strip out numbers from a string in Python, using NLTK or base Python?

Thanks, Ben

5
  • Can you provide an example of what you want to do? Commented May 19, 2015 at 0:50
  • If I have a string, say x = "I have 3 dogs", I'd want a way to turn x into "I have dogs". Commented May 19, 2015 at 0:51
  • What would "I have 3x as many dogs as 2 cats" be? Commented May 19, 2015 at 0:53
  • Or... "It's the 1st road on your left, then take the 2nd road on the right, then the company you're after is called TRG1 it's about 100m up the road - if you're lazy - you can catch a bus for £2.50"? Commented May 19, 2015 at 0:56
  • Check out other good answers here: stackoverflow.com/questions/12851791/… Commented Dec 19, 2019 at 22:07

3 Answers

38

Yes, you can use a regular expression for this:

import re
output = re.sub(r'\d+', '', '123hello 456world')
print(output)  # hello world
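
Applied to the asker's example, removing the digits alone leaves a doubled space ("I have  dogs"). A minimal sketch of one way to tidy that up follows; `strip_numbers` is a hypothetical helper name, not something from the answer, and it assumes you are happy to collapse every run of whitespace (including tabs and newlines) into a single space:

```python
import re

def strip_numbers(text):
    """Remove runs of digits, then collapse the whitespace left behind."""
    without_digits = re.sub(r'\d+', '', text)
    # split() with no arguments splits on any whitespace run and drops
    # empty pieces, so joining with ' ' leaves single spaces only.
    return ' '.join(without_digits.split())

print(strip_numbers('I have 3 dogs'))  # I have dogs
```

If the original spacing matters, skip the `split`/`join` step and keep the plain `re.sub` call.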

3 Comments

This is perfect! Thanks Martin
Can't go wrong with the regex solution, since it also translates very well to other cases (say he wants to remove letters next).
The best answer. Works like a charm.
15

str.translate should be efficient. With a Python 2 str:

In [7]: 'hello467'.translate(None, '0123456789')
Out[7]: 'hello'

To compare str.translate against re.sub:

In [13]: %%timeit r=re.compile(r'\d')
output = r.sub('', my_str)
   ....: 
100000 loops, best of 3: 5.46 µs per loop

In [16]: %%timeit pass
output = my_str.translate(None, '0123456789')
   ....: 
1000000 loops, best of 3: 713 ns per loop
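
The two-argument `translate(None, ...)` call above only works on a Python 2 `str`. As a sketch of the Python 3 equivalent, `str.maketrans` with three arguments builds a table that deletes the characters in its third argument; `DIGIT_TABLE` and `remove_digits` are names chosen here for illustration:

```python
import string

# str.maketrans('', '', chars) maps every character in chars to None,
# which tells str.translate to delete it.
DIGIT_TABLE = str.maketrans('', '', string.digits)

def remove_digits(text):
    """Delete the ASCII digits 0-9 from text (Python 3)."""
    return text.translate(DIGIT_TABLE)

print(remove_digits('hello467'))  # hello
```

Building the table once and reusing it keeps the per-call cost down to the `translate` call itself.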

5 Comments

The problem is that str.translate is a bit difficult to make compatible with both 2.x and 3.x :(
So you'd need my_str.translate({ord(ch): None for ch in '0123456789'}) in 3.x
I wonder how long r.sub() takes? Say, under conditions where you want to do this over multiple strings and you've pre-compiled the regex.
@Ross - Judging from the code I put in my answer, 5.46µs.
@Rob - Ah right, I missed that the first line is the setup line. Looking at some best and worst cases, translate seems to perform much better in the worst-case scenarios. Using 'python -m timeit', I got the following speedups in favour of translate:
    '123hello 456world' - x5.0
    '1234567890987654321012345678909876543210' - x17.0
    '5a$%&^@)9lhk45g08j%Gmj3g09jSDGjg0034k' - x9.0
    'hello world im your boss' - x1.8
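
For anyone wanting to repeat the comparison from the comments on Python 3, here is a rough sketch using the `timeit` module on the same four strings. The helper names and the repetition count are arbitrary choices made here, and the timings will vary by machine and Python version, so no numbers are shown:

```python
import re
import string
import timeit

DIGIT_RE = re.compile(r'\d')                        # pre-compiled, as in the answer
DIGIT_TABLE = str.maketrans('', '', string.digits)  # Python 3 deletion table

def with_regex(text):
    return DIGIT_RE.sub('', text)

def with_translate(text):
    return text.translate(DIGIT_TABLE)

SAMPLES = [
    '123hello 456world',
    '1234567890987654321012345678909876543210',
    '5a$%&^@)9lhk45g08j%Gmj3g09jSDGjg0034k',
    'hello world im your boss',
]

for sample in SAMPLES:
    # Both approaches must agree before their speed is worth comparing.
    assert with_regex(sample) == with_translate(sample)
    t_regex = timeit.timeit(lambda: with_regex(sample), number=20000)
    t_trans = timeit.timeit(lambda: with_translate(sample), number=20000)
    print('%r: regex %.4fs, translate %.4fs' % (sample, t_regex, t_trans))
```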
1

Here's a method using str.join(), str.isnumeric(), and a generator expression which will work in 3.x:

>>> my_str = '123Hello, World!4567'
>>> output = ''.join(c for c in my_str if not c.isnumeric())
>>> print(output)
Hello, World!
>>> 

This will also work in 2.x, if you use a unicode string:

>>> my_str = u'123Hello, World!4567'
>>> output = ''.join(c for c in my_str if not c.isnumeric())
>>> print(output)
Hello, World!
>>> 

Hmm. Throw in a paperclip and we'd have an episode of MacGyver.

Update

I know that this has been closed out as a duplicate, but here's a method that works for both Python 2 and Python 3:

>>> my_str = '123Hello, World!4567'
>>> output = ''.join(map(lambda c: '' if c in '0123456789' else c, my_str))
>>> print(output)
Hello, World!
>>>
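
One thing to be aware of when choosing between these approaches: they do not all agree on what counts as a number once non-ASCII text is involved. The sketch below (Python 3; `classify` is a hypothetical helper) compares `str.isdecimal()`, `str.isdigit()`, `str.isnumeric()`, and the regex `\d` on a few characters:

```python
import re

def classify(ch):
    """Return (isdecimal, isdigit, isnumeric, matches \\d) for one character."""
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric(),
            re.match(r'\d', ch) is not None)

print(classify('7'))       # (True, True, True, True)      ASCII digit
print(classify('\u00b2'))  # (False, True, True, False)    superscript two
print(classify('\u00bd'))  # (False, False, True, False)   fraction one half
print(classify('\u0663'))  # (True, True, True, True)      Arabic-Indic digit three
```

So the `isnumeric()` version also strips fractions and superscripts, while the explicit `'0123456789'` membership test removes only ASCII digits. In Python 3, passing `flags=re.ASCII` restricts `\d` to `[0-9]` as well.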
