Is there an efficient way to strip numbers out of a string in Python, using NLTK or base Python?
Thanks, Ben
Yes, you can use a regular expression for this:
import re
output = re.sub(r'\d+', '', '123hello 456world')
print(output)  # hello world
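A minimal sketch of the same regex idea for Python 3, assuming only ASCII digits should be removed. In Python 3, `\d` in a str pattern also matches other Unicode decimal digits (Arabic-Indic digits, for example), so this version uses `[0-9]` instead. The name `strip_digits` is illustrative, not from the answer above.

```python
import re

# Precompile once; [0-9] restricts the match to ASCII digits only.
_ASCII_DIGITS = re.compile(r'[0-9]+')

def strip_digits(text):
    """Return text with every run of ASCII digits removed."""
    return _ASCII_DIGITS.sub('', text)

print(strip_digits('123hello 456world'))  # hello world
```

If non-ASCII digits should be stripped as well, the original `r'\d+'` pattern is the one to keep.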
str.translate should be efficient (Python 2 str signature shown here):
In [7]: 'hello467'.translate(None, '0123456789')
Out[7]: 'hello'
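The two-argument call above only works on Python 2 byte strings. A rough Python 3 equivalent, sketched here with `str.maketrans`, whose third argument lists the characters to delete; `DELETE_DIGITS` and `remove_digits` are names made up for this example.

```python
import string

# Three-argument maketrans: characters in the third argument map to None,
# so translate() drops them.
DELETE_DIGITS = str.maketrans('', '', string.digits)

def remove_digits(text):
    """Return text without the ASCII digits 0-9."""
    return text.translate(DELETE_DIGITS)

print(remove_digits('hello467'))  # hello
```

Building the table once and reusing it keeps the per-call cost down to the translate itself.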
To compare str.translate against re.sub:
In [13]: %%timeit r=re.compile(r'\d')
output = r.sub('', my_str)
....:
100000 loops, best of 3: 5.46 µs per loop
In [16]: %%timeit pass
output = my_str.translate(None, '0123456789')
....:
1000000 loops, best of 3: 713 ns per loop
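For anyone wanting to repeat the comparison outside IPython, here is a hedged sketch using the standard `timeit` module on Python 3. The sample string is chosen for illustration (the original `my_str` is not shown), and the timings will differ by machine and Python version, so no figures are quoted.

```python
import re
import string
import timeit

my_str = '123hello 456world'  # sample input, assumed for this sketch
pattern = re.compile(r'\d')
table = str.maketrans('', '', string.digits)

def with_regex():
    return pattern.sub('', my_str)

def with_translate():
    return my_str.translate(table)

# Both approaches should agree before their speed is compared.
assert with_regex() == with_translate() == 'hello world'

for func in (with_regex, with_translate):
    seconds = timeit.timeit(func, number=100000)
    print('%s: %.3f s per 100000 calls' % (func.__name__, seconds))
```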
str.translate is a bit difficult to make both 2.x/3.x compatible :(
my_str.translate({ord(ch): None for ch in '0123456789'}) in 3.x
'123hello 456world' - x5.0
'1234567890987654321012345678909876543210' - x17.0
'5a$%&^@)9lhk45g08j%Gmj3g09jSDGjg0034k' - x9.0
'hello world im your boss' - x1.8
Here's a method using str.join(), str.isnumeric(), and a generator expression which will work in 3.x:
>>> my_str = '123Hello, World!4567'
>>> output = ''.join(c for c in my_str if not c.isnumeric())
>>> print(output)
Hello, World!
>>>
This will also work in 2.x, if you use a unicode string:
>>> my_str = u'123Hello, World!4567'
>>> output = ''.join(c for c in my_str if not c.isnumeric())
>>> print(output)
Hello, World!
>>>
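One thing worth knowing about the approach above: `str.isnumeric()` is broader than "0 through 9". The sketch below (Python 3) prints how `isdecimal()`, `isdigit()` and `isnumeric()` classify a few characters, then shows an ASCII-only filter for comparison. The helper name `strip_ascii_digits` is made up for this example.

```python
# '5', superscript two, vulgar fraction one half, CJK numeral five
samples = ['5', '\u00b2', '\u00bd', '\u4e94']

for ch in samples:
    # Columns: character, isdecimal, isdigit, isnumeric
    print(repr(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())

def strip_ascii_digits(text):
    """Remove only the characters 0-9, leaving other numerics alone."""
    return ''.join(c for c in text if c not in '0123456789')

print(strip_ascii_digits('123Hello, World!4567'))  # Hello, World!
```

So the `isnumeric()` filter would also remove characters such as the fraction one half, which may or may not be what is wanted.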
Hmm. Throw in a paperclip and we'd have an episode of MacGyver.
I know that this has been closed out as a duplicate, but here's a method that works for both Python 2 and Python 3:
>>> my_str = '123Hello, World!4567'
>>> output = ''.join(map(lambda c: '' if c in '0123456789' else c, my_str))
>>> print(output)
Hello, World!
>>>
What should the output for 'I have 3x as many dogs as 2 cats' be?