Commit c2141a7
Apply titlecase mapping in str.title() for uppercase digraphs (RustPython#7748)
The uppercase/titlecase branch of PyStr::title() pushed characters
unchanged when starting a new word, which left Latin Extended-B
digraphs (U+01F1 'DZ', U+01C4 'DŽ', etc.) in their uppercase form
instead of mapping them to their distinct titlecase counterparts
(U+01F2 'Dz', U+01C5 'Dž'). For ASCII letters and characters where
to_titlecase is identity this had no effect, hiding the bug for the
common case.
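
For illustration only (this is not code from the commit), the three case forms of these digraphs can be inspected with CPython's `unicodedata` module. The `DIGRAPHS` list and the `describe` helper are names made up for this sketch.

```python
import unicodedata

# (uppercase, titlecase, lowercase) code points for two digraph families.
DIGRAPHS = [
    ("\u01c4", "\u01c5", "\u01c6"),  # LATIN ... LETTER DZ WITH CARON family
    ("\u01f1", "\u01f2", "\u01f3"),  # LATIN ... LETTER DZ family
]


def describe(ch):
    """Return the code point label and Unicode general category of ch."""
    return f"U+{ord(ch):04X}", unicodedata.category(ch)


for upper, title, lower in DIGRAPHS:
    # Lu = uppercase letter, Lt = titlecase letter, Ll = lowercase letter
    print(describe(upper), describe(title), describe(lower))
```

ASCII letters have no separate titlecase code point, which is why the bug only shows up for characters like these.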
Mirror the lowercase branch — which already calls to_titlecase()
when starting a new word — so both branches symmetrically apply
the titlecase mapping. char::to_titlecase is identity for already-
titlecase and ASCII-uppercase characters, so existing cases stay
correct.
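
The two branches can be modeled roughly in Python as follows. This is a simplified sketch, not the Rust code in PyStr::title(): `title_sketch`, its `fixed` flag, `to_titlecase`, and the `TITLECASE` table are hypothetical names, and the table covers only two digraph families, falling back to `str.upper()` for everything else.

```python
# Hypothetical, deliberately tiny titlecase table: every member of a
# digraph family maps to that family's titlecase code point.
TITLECASE = {}
for upper, title, lower in [
    ("\u01c4", "\u01c5", "\u01c6"),
    ("\u01f1", "\u01f2", "\u01f3"),
]:
    for ch in (upper, title, lower):
        TITLECASE[ch] = title


def to_titlecase(ch):
    # Assumption: outside the table, uppercasing is a good enough stand-in.
    return TITLECASE.get(ch, ch.upper())


def title_sketch(s, fixed=True):
    """Model of str.title(); fixed=False reproduces the old behavior."""
    out = []
    previous_is_cased = False
    for ch in s:
        if ch.islower():
            # Lowercase branch: titlecase only at the start of a word.
            out.append(ch if previous_is_cased else to_titlecase(ch))
            previous_is_cased = True
        elif ch.isupper() or ch.istitle():
            if previous_is_cased:
                out.append(ch.lower())
            elif fixed:
                # After the fix: mirror the lowercase branch.
                out.append(to_titlecase(ch))
            else:
                # Before the fix: character pushed unchanged.
                out.append(ch)
            previous_is_cased = True
        else:
            out.append(ch)
            previous_is_cased = False
    return "".join(out)
```

With `fixed=False` the model leaves U+01C4 unchanged at the start of a word; with `fixed=True` it yields U+01C5, while plain ASCII input such as "HELLO world" gives "Hello World" either way.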
Also unmasks test_unicodedata.UnicodeMiscTest.test_bug_4971, which
asserts exactly this behavior (`'DŽ'.title() == 'Dž'` etc.)
and was marked expectedFailure with reason `+ Dž`.
Closes RustPython#7527 (the only example from that issue still failing on
3.14.4; the other four examples already pass on current main).
1 parent dd1cbac commit c2141a7
2 files changed
Lines changed: 5 additions & 2 deletions
[Diff contents not preserved; only line numbers survive.]
First file (name not shown): 1 deletion, at original line 362.
Second file (name not shown): original line 1056 replaced by new line 1056 (1 deletion, 1 addition); 4 lines added as new lines 2664-2667.