diff roundup/backends/indexer_rdbms.py @ 5416:56c9bcdea47f
Python 3 preparation: unicode.
This patch introduces roundup/anypy/strings.py, which has a comment
explaining the string representations generally used and common
functions to handle the required conversions. Places in the code that
explicitly reference the "unicode" type / built-in function are
generally changed to use the new functions (or, in a few places where
those new functions don't seem to fit well, other approaches such as
references to type(u'') or use of the codecs module). This patch does
not generally attempt to address text conversions in any places not
currently referencing the "unicode" type (although
scripts/import_sf.py is made to use binary I/O in places as fixing the
"unicode" reference didn't seem coherent otherwise).
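The message names the new conversion helpers but does not show them. The following is a minimal sketch of how `us2u` and `u2s` could behave. It assumes the internal string format is UTF-8 encoded `bytes` on Python 2 and native `str` on Python 3. It is an illustration, not the actual contents of `roundup/anypy/strings.py`, and the `_PY3` flag is a hypothetical name.

```python
import sys

# Hypothetical flag: True when running under Python 3.
_PY3 = sys.version_info[0] >= 3


def us2u(s, errors="strict"):
    """Return s as Unicode text.

    Byte strings (Python 2 'str', Python 3 'bytes') are decoded as UTF-8
    using the given error handler; text that is already Unicode is
    returned unchanged.
    """
    if isinstance(s, bytes):  # 'bytes' is an alias of 'str' on Python 2
        return s.decode("utf-8", errors)
    return s


def u2s(u):
    """Convert Unicode text to the assumed internal string format.

    On Python 3 the native str is already Unicode, so it is returned as-is;
    on Python 2 the text is encoded to a UTF-8 byte string.
    """
    if _PY3:
        return u
    return u.encode("utf-8")
```

With helpers shaped like this, the two-line `isinstance`/`unicode(...)` check in the indexer collapses into a single `us2u(text, "replace")` call that behaves the same on both Python versions.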
| field | value |
|---|---|
| author | Joseph Myers <jsm@polyomino.org.uk> |
| date | Wed, 25 Jul 2018 09:05:58 +0000 |
| parents | a391a071d045 |
| children | 8bda74ee7070 |
```diff
--- a/roundup/backends/indexer_rdbms.py	Wed Jul 25 00:40:26 2018 +0000
+++ b/roundup/backends/indexer_rdbms.py	Wed Jul 25 09:05:58 2018 +0000
@@ -5,6 +5,7 @@
 import re
 
 from roundup.backends.indexer_common import Indexer as IndexerBase
+from roundup.anypy.strings import us2u, u2s
 
 class Indexer(IndexerBase):
     def __init__(self, db):
@@ -61,10 +62,9 @@
         self.db.cursor.execute(sql, (id, ))
 
         # ok, find all the unique words in the text
-        if not isinstance(text, unicode):
-            text = unicode(text, "utf-8", "replace")
+        text = us2u(text, "replace")
         text = text.upper()
-        wordlist = [w.encode("utf-8")
+        wordlist = [u2s(w)
                     for w in re.findall(r'(?u)\b\w{%d,%d}\b'
                                         % (self.minlength, self.maxlength), text)]
         words = set()
```
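To show what the patched hunk computes, here is a standalone sketch of the word-extraction step on Python 3. `unique_words` is a hypothetical name. It inlines the assumed behaviour of `us2u(text, "replace")` and omits the database writes that follow `words = set()` in the real method.

```python
import re


def unique_words(text, minlength, maxlength):
    """Return the set of unique upper-cased words in text whose length
    lies between minlength and maxlength (inclusive)."""
    # Assumed us2u(text, "replace") behaviour: decode UTF-8 bytes,
    # substituting U+FFFD for invalid sequences; leave str unchanged.
    if isinstance(text, bytes):
        text = text.decode("utf-8", "replace")
    text = text.upper()
    # (?u) is redundant for str patterns on Python 3 but kept to match
    # the original regular expression.
    pattern = r'(?u)\b\w{%d,%d}\b' % (minlength, maxlength)
    return set(re.findall(pattern, text))
```

For example, `unique_words(b"caf\xc3\xa9 au lait x", 2, 25)` yields `{"CAFÉ", "AU", "LAIT"}`. Because the pattern has `\b` on both sides, a run of word characters longer than `maxlength` is skipped entirely rather than truncated.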
