diff roundup/backends/indexer_dbm.py @ 5963:4c7662c86a36

Fixed the dbm indexer test for unicode under Python 2. Replaced str(text).upper() with text.upper(); the text variable is already a str or unicode object. Also changed the final line of the method from re.findall(pat, text) to re.findall(pat, text, re.UNICODE). Without the flag, u'Spr\xfcnge' was split into a wordlist of two "words", ['Spr', 'cnge'] or some such, so those two fragments went into the index and did not match the search for u'Spr\xfcnge'.
author John Rouillard <rouilj@ieee.org>
date Wed, 30 Oct 2019 17:48:48 -0400
parents e2baa4e6ed6d
children 8e4c5db44fde
--- a/roundup/backends/indexer_dbm.py	Tue Oct 29 22:13:49 2019 -0400
+++ b/roundup/backends/indexer_dbm.py	Wed Oct 30 17:48:48 2019 -0400
@@ -132,11 +132,11 @@
         """Split text/plain string into a list of words
         """
         # case insensitive
-        text = str(text).upper()
+        text = text.upper()
 
         # Split the raw text
         return re.findall(r'\b\w{%d,%d}\b' % (self.minlength, self.maxlength),
-                          text)
+                          text, re.UNICODE)
 
     # we override this to ignore too short and too long words
     # and also to fix a bug - the (fail) case.
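
The effect of the flag can be reproduced outside Roundup. The following is a minimal sketch, not the indexer's actual method: split_words is a hypothetical stand-alone helper, and the 2 and 25 length limits are illustrative values assumed here. It is written for Python 3, where str patterns already match Unicode by default, so re.ASCII is used to approximate the old Python 2 behaviour without re.UNICODE.

```python
import re


def split_words(text, minlength=2, maxlength=25, flags=re.UNICODE):
    """Hypothetical stand-in for the indexer's word splitter.

    minlength/maxlength are illustrative defaults, not taken from the diff.
    """
    # case insensitive
    text = text.upper()
    pattern = r'\b\w{%d,%d}\b' % (minlength, maxlength)
    return re.findall(pattern, text, flags)


# With Unicode-aware matching, the umlaut counts as a word character,
# so the word stays in one piece.
unicode_words = split_words(u'Spr\xfcnge')

# With ASCII-only matching (roughly what Python 2 did without re.UNICODE),
# the umlaut acts as a separator and the word is broken in two.
ascii_words = split_words(u'Spr\xfcnge', flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```

Under ASCII-only matching the index would hold two fragments instead of the whole word, which is why a search for the full word found nothing.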

Roundup Issue Tracker: http://roundup-tracker.org/