diff roundup/backends/indexer_dbm.py @ 5963:4c7662c86a36
Fixed the dbm indexer test for unicode under Python 2.
Replaced str(text).upper() with text.upper(). The text variable is
already a string or unicode. Also changed the final line in the method
from:
re.findall(pat,text)
to
re.findall(pat,text,re.UNICODE).
Otherwise it was turning u'Spr\xfcnge' into a wordlist of two "words"
['Spr', 'cnge'] or some such. So those two "words" were in the index
and didn't match the search for u'Spr\xfcnge'.
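
The effect described above can be reproduced with a minimal sketch. It is not the indexer code itself: `MINLENGTH`, `MAXLENGTH` and `split_words` are hypothetical stand-ins for `self.minlength`, `self.maxlength` and the method being patched, and the length limits are assumed values. Because `str` patterns are Unicode-aware by default under Python 3, `re.ASCII` is used here to approximate what Python 2 does when `re.UNICODE` is omitted.

```python
import re

# Hypothetical stand-ins for the indexer's self.minlength / self.maxlength.
MINLENGTH, MAXLENGTH = 2, 25
PATTERN = r'\b\w{%d,%d}\b' % (MINLENGTH, MAXLENGTH)


def split_words(text, flags):
    """Upper-case the text, then split it into words with the given flags."""
    return re.findall(PATTERN, text.upper(), flags)


text = u'Spr\xfcnge'

# re.ASCII approximates Python 2 without re.UNICODE: \w matches only
# [a-zA-Z0-9_], so the upper-cased u-umlaut acts as a word separator.
ascii_words = split_words(text, re.ASCII)

# With re.UNICODE, \w matches any Unicode word character and the word
# stays in one piece.
unicode_words = split_words(text, re.UNICODE)

print(ascii_words)    # ['SPR', 'NGE']
print(unicode_words)  # ['SPRÜNGE']
```

In the first case the index holds two fragments, so a search for the whole word finds nothing. In the second case the indexed word and the search term agree.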
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Wed, 30 Oct 2019 17:48:48 -0400 |
| parents | e2baa4e6ed6d |
| children | 8e4c5db44fde |
--- a/roundup/backends/indexer_dbm.py	Tue Oct 29 22:13:49 2019 -0400
+++ b/roundup/backends/indexer_dbm.py	Wed Oct 30 17:48:48 2019 -0400
@@ -132,11 +132,11 @@
         """Split text/plain string into a list of words
         """
         # case insensitive
-        text = str(text).upper()
+        text = text.upper()
 
         # Split the raw text
         return re.findall(r'\b\w{%d,%d}\b' % (self.minlength, self.maxlength),
-                          text)
+                          text, re.UNICODE)
 
     # we override this to ignore too short and too long words
     # and also to fix a bug - the (fail) case.
