bpo-46659: Update the test on the mbcs codec alias #31168
vstinner merged 2 commits into python:main from vstinner:mbcs_alias
The encodings module registers the _alias_mbcs() codec search function before the search_function() codec search function. Previously, _alias_mbcs() was never used.
Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code page, not a fake ANSI code page number.
Remove the test_site.test_aliasing_mbcs() test: the alias is now implemented in the encodings module, no longer in the site module.
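The registration order matters because codecs.lookup() consults search functions in the order they were registered and stops at the first one that returns a codec. The following is a minimal sketch of that behaviour, not the encodings module's actual code: alias_search(), fallback_search() and the codec name "demo_ansi_alias" are hypothetical names made up for the illustration.

```python
import codecs

calls = []  # records which search function saw which name


def alias_search(name):
    # Hypothetical alias: resolve "demo_ansi_alias" to the UTF-8 codec.
    calls.append(("alias", name))
    if name == "demo_ansi_alias":
        return codecs.lookup("utf-8")
    return None


def fallback_search(name):
    # Registered later, so it is only reached if alias_search returns None.
    calls.append(("fallback", name))
    return None


codecs.register(alias_search)
codecs.register(fallback_search)

info = codecs.lookup("demo_ansi_alias")
print(info.name)
print(calls)
```

Because alias_search() answers first, fallback_search() is never asked about "demo_ansi_alias". A search function that is registered after one that already resolves the name stays unused in the same way.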
# The encodings module create a "mbcs" alias to the ANSI code page
codec = codecs.lookup(encoding)
self.assertEqual(codec.name, "mbcs")
This was never true before. With 1252 as my ANSI code page, I checked codecs.lookup('cp1252') in 2.7, 3.4, 3.5, 3.6, 3.9, and 3.10, and none of them return the "mbcs" encoding. It's not equivalent, and not supposed to be. The implementation of "cp1252" should be cross-platform, regardless of whether we're on a Windows system with 1252 as the ANSI code page, as opposed to a Windows system with some other ANSI code page, or a Linux or macOS system.
The differences are that "mbcs" maps every byte, whereas our code-page encodings do not map undefined bytes, and the "replace" handler of "mbcs" uses a best-fit mapping (e.g. "α" -> "a") when encoding text, instead of mapping all undefined characters to "?".
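The cross-platform half of this difference can be checked on any system. The sketch below assumes only the standard cp1252 codec shipped with Python; the "mbcs" calls are guarded because that codec exists only on Windows, and their results depend on the machine's ANSI code page, so no output is claimed for them.

```python
import sys

# cp1252 leaves byte 0x81 undefined, so strict decoding fails on every platform.
try:
    b"\x81".decode("cp1252")
    undefined_byte_rejected = False
except UnicodeDecodeError:
    undefined_byte_rejected = True

# With "replace", an unencodable character becomes "?", not a best-fit letter.
replaced = "α".encode("cp1252", "replace")

print(undefined_byte_rejected)  # True
print(replaced)                 # b'?'

if sys.platform == "win32":
    # Windows only: results vary with the current ANSI code page.
    print(repr(b"\x81".decode("mbcs", "replace")))
    print(repr("α".encode("mbcs", "replace")))
```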
This issue is worse than I expected; I created https://bugs.python.org/issue46668 to discuss it.
# On Windows, the encoding name must be the ANSI code page
encoding = locale.getpreferredencoding(False)
self.assertTrue(encoding.startswith('cp'), encoding)
This will fail if PYTHONUTF8 is set in the environment, because it overrides getpreferredencoding(False) and _get_locale_encoding().
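The override can be reproduced by running a child interpreter with PYTHONUTF8=1. This is a rough sketch, assuming Python 3.7 or later and a usable sys.executable; the exact spelling of the reported name ("UTF-8" or "utf-8") differs between Python versions, so the check is case-insensitive.

```python
import os
import subprocess
import sys

# Run a child interpreter with UTF-8 mode enabled through the environment.
env = dict(os.environ, PYTHONUTF8="1")
result = subprocess.run(
    [sys.executable, "-c",
     "import locale; print(locale.getpreferredencoding(False))"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

reported = result.stdout.strip()
print(reported)                   # a spelling of UTF-8, not a "cpXXXX" name
print(reported.startswith("cp"))  # False, so the startswith('cp') assertion fails
```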
Move the test on the "mbcs" codec alias from test_site to
test_codecs. Moreover, the test now uses
locale.getpreferredencoding(False) rather than
locale.getdefaultlocale() to get the ANSI code page.
https://bugs.python.org/issue46659