diff 2to3-done.txt @ 5244:09235a01250a
Some porting advice from Joseph Myers.
| author | Eric S. Raymond <esr@thyrsus.com> |
|---|---|
| date | Thu, 24 Aug 2017 14:41:00 -0400 |
| parents | a86b0c02940d |
| children | bc250b4fb4c5 |
--- a/2to3-done.txt Thu Aug 24 11:26:46 2017 -0400
+++ b/2to3-done.txt Thu Aug 24 14:41:00 2017 -0400
@@ -150,3 +150,61 @@
 ./roundup/cgi/__init__.py
 ./roundup/cgi/apache.py
 ./roundup/cgi/client.py
+
+Joseph S. Myers notes:
+>The key difficulty is undoubtedly dealing with the changes to string types
+>- combined with how the extensibility of Roundup means people will have
+>Python code in their instances (detectors, etc.), both directly and
+>embedded in HTML - which passes strings to Roundup interfaces and gets
+>strings from Roundup interfaces.
+>
+>Roundup makes heavy use of string objects that really are text strings -
+>logically, sequences of Unicode code points. Right now, those strings,
+>with Python 2, are str objects, encoded in UTF-8. This means that
+>people's Python code in their instances, running under Python 2, will
+>expect str objects encoded in UTF-8 (and if their code is e.g. generating
+>HTML text encoded in UTF-8 to be sent to the user, it never actually has
+>to deal with the encoding explicitly, just passes the text through).
+>(The experimental Jinja2 templating engine then explicitly converts those
+>UTF-8 encoded str objects to unicode objects because that's what Jinja2
+>expects to deal with.)
+>
+>It's quite plausible people's code in their instances will work fine with
+>Python 3 if it gets str objects for both Python 2 and Python 3 (UTF-8
+>encoded str for Python 2, ordinary Unicode string objects for Python 3).
+>It's more likely to break if it gets Python 2 unicode objects, although
+>using such objects in Python 2 seems to be how a lot of people do their
+>porting to Python 3. And certainly if when an instance is running with
+>Python 3, it gets an object that's not a native sequence of Unicode code
+>points, but has each UTF-8 byte as a separate element of the str object,
+>things will break.
+>
+>(I have an instance that uses Unicode collation via PyICU on data from
+>Roundup, for example. That works fine with UTF-8 str objects in Python 2,
+>would work fine with Python 2 unicode objects though I don't use those,
+>works fine with Python 3 str objects when used in their native way - the
+>same code has a large part also used outside of Roundup that works with
+>both Python 2 and Python 3. Actually, I'd like to have a way to make
+>Roundup's built-in sorting of database objects use Unicode collation, or
+>otherwise have a way of computing a sort key that isn't simply naming a
+>particular property as the sort key, but that's another matter.)
+>
+>But Roundup *also* has strings that are sequences of bytes - String()
+>database fields, which can be both. Many are data displayed directly on
+>web pages and edited there by the user - those are ordinary strings (UTF-8
+>at present). But FileClass objects have a String() content property which
+>is arbitrary binary data such as an attached file - which logically should
+>appear to the user as a bytes object in Python 3. Except that some
+>FileClass objects use that data to store text (e.g. the msg class in the
+>classic scheme). So you definitely need a Bytes() alternative to String()
+>fields, for binary data, and may or may not also need separate text and
+>binary variants of FileClass.
+>
+>I've found that for text-heavy code, always using str objects for text and
+>having them be normal Unicode strings in Python 3 but UTF-8-encoded in
+>Python 2 works well with the vast bulk of code being encoding-agnostic and
+>just passing the strings around. Obviously things are different for the
+>sort of code that mixes text and binary data - that is, the sort of thing
+>you describe as systems programs in your porting HOWTO. I don't think
+>Roundup really is such a systems program, except in limited areas such as
+>dealing with attached files.
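
The "native str" convention in the note above (UTF-8-encoded `str` on Python 2, ordinary Unicode `str` on Python 3) could be sketched roughly as follows. The helper names `to_native_str` and `to_bytes` are hypothetical and are not taken from the Roundup code base; the sketch only illustrates the convention and the mis-decoding failure the note warns about.

```python
import sys

PY2 = sys.version_info[0] == 2


def to_native_str(value, encoding="utf-8"):
    """Return `value` as the native str type (hypothetical helper).

    Python 2: unicode is encoded to a UTF-8 str; str passes through.
    Python 3: bytes are decoded to str; str passes through.
    """
    if PY2:
        if isinstance(value, str):
            return value
        return value.encode(encoding)  # a Python 2 unicode object
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return `value` as a byte string for I/O (hypothetical helper)."""
    if isinstance(value, bytes):  # also matches a Python 2 str
        return value
    return value.encode(encoding)


raw = b"caf\xc3\xa9"              # UTF-8 bytes for "cafe" with an acute e
text = to_native_str(raw)
assert to_bytes(text) == raw      # round-trips on both versions

# The failure mode from the note: on Python 3, decoding with the wrong
# codec leaves each UTF-8 byte as its own element of the str object.
if not PY2:
    broken = raw.decode("latin-1")
    assert len(text) == 4 and len(broken) == 5
```

Instance code that only passes such strings through never needs to call either helper; conversions would be confined to the points where bytes enter or leave (database, HTTP, files).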

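
The text/binary split suggested for FileClass content might look something like the sketch below on Python 3. `load_content` and its `is_text` flag are assumptions made purely for illustration; Roundup's actual interface for this is not shown in the diff, and this is not a proposal for how `Bytes()` should be implemented.

```python
def load_content(raw, is_text, encoding="utf-8"):
    """Hypothetical accessor; `raw` is the bytes as stored on disk.

    Text content (e.g. a message body) comes back as str; anything else
    (e.g. an attached image) comes back unchanged as bytes.
    """
    if not isinstance(raw, bytes):
        raise TypeError("stored content must be bytes")
    return raw.decode(encoding) if is_text else raw


message = load_content("Gr\u00fc\u00dfe".encode("utf-8"), is_text=True)
attachment = load_content(b"\x89PNG\r\n\x1a\n", is_text=False)

assert isinstance(message, str)       # text is handed out as str
assert isinstance(attachment, bytes)  # binary data stays bytes
```

Keeping the decision in one accessor means callers never guess whether a given content value is text or binary.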