Commit c2141a7

Apply titlecase mapping in str.title() for uppercase digraphs (RustPython#7748)
The uppercase/titlecase branch of PyStr::title() pushed characters through unchanged when starting a new word. As a result, Latin Extended-B digraphs (U+01F1 'DZ', U+01C4 'DŽ', etc.) stayed in their uppercase form instead of being mapped to their distinct titlecase counterparts (U+01F2 'Dz', U+01C5 'Dž'). For ASCII letters, and for any character whose to_titlecase mapping is the identity, this had no effect, which hid the bug in the common case.

This change mirrors the lowercase branch, which already calls to_titlecase() when starting a new word, so both branches now apply the titlecase mapping symmetrically. char::to_titlecase is the identity for characters that are already titlecase and for ASCII uppercase letters, so existing cases stay correct.

It also unmasks test_unicodedata.UnicodeMiscTest.test_bug_4971, which asserts exactly this behavior (`'DŽ'.title() == 'Dž'` etc.) and was previously marked expectedFailure with the reason `+ Dž`.

Closes RustPython#7527 (the only example from that issue still failing on 3.14.4; the other four examples already pass on current main).
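The loop described above can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not RustPython's code: std's `char` has neither a titlecase mapping nor a titlecase predicate, so the hypothetical helpers `to_titlecase` and `is_titlecase` below cover only the Latin Extended-B digraphs. For every other character the sketch falls back to `to_uppercase`, which only approximates true titlecase (the two differ for characters such as 'ß').

```rust
/// Hypothetical helper: titlecase mapping restricted to the Latin Extended-B
/// digraphs (U+01C4..=U+01CC, U+01F1..=U+01F3). Any other character falls back
/// to its uppercase mapping, an approximation of the full Unicode table.
fn to_titlecase(c: char) -> Vec<char> {
    match c {
        '\u{01C4}'..='\u{01C6}' => vec!['\u{01C5}'], // D with caron triple -> titlecase form
        '\u{01C7}'..='\u{01C9}' => vec!['\u{01C8}'], // LJ triple -> Lj
        '\u{01CA}'..='\u{01CC}' => vec!['\u{01CB}'], // NJ triple -> Nj
        '\u{01F1}'..='\u{01F3}' => vec!['\u{01F2}'], // DZ triple -> Dz
        _ => c.to_uppercase().collect(),
    }
}

/// Hypothetical helper: true only for the four digraph titlecase letters;
/// the real Unicode Lt category contains more characters than these.
fn is_titlecase(c: char) -> bool {
    matches!(c, '\u{01C5}' | '\u{01C8}' | '\u{01CB}' | '\u{01F2}')
}

/// Python-style str.title(): the first cased character of each word is
/// titlecased and the remaining cased characters are lowercased.
fn title(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut previous_is_cased = false;
    for c in s.chars() {
        if c.is_lowercase() {
            if previous_is_cased {
                out.push(c);
            } else {
                out.extend(to_titlecase(c));
            }
            previous_is_cased = true;
        } else if c.is_uppercase() || is_titlecase(c) {
            if previous_is_cased {
                out.extend(c.to_lowercase());
            } else {
                // The fix: apply the titlecase mapping here as well, instead of
                // pushing `c` unchanged, so U+01C4 becomes U+01C5.
                out.extend(to_titlecase(c));
            }
            previous_is_cased = true;
        } else {
            // Uncased character: copy it and start a new word.
            out.push(c);
            previous_is_cased = false;
        }
    }
    out
}
```

With this sketch, `title("\u{01C4}")` returns `"\u{01C5}"` and `title("hello WORLD")` returns `"Hello World"`. Pushing `c` unchanged in the uppercase branch, as the pre-fix code did, would instead return `"\u{01C4}"` for the first input.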
1 parent dd1cbac commit c2141a7

2 files changed

Lines changed: 5 additions & 2 deletions

Lib/test/test_unicodedata.py

Lines changed: 0 additions & 1 deletion
```diff
@@ -359,7 +359,6 @@ def test_bug_5828(self):
             [0]
         )
 
-    @unittest.expectedFailure # TODO: RUSTPYTHON; + Dž
     def test_bug_4971(self):
         # LETTER DZ WITH CARON: DZ, Dz, dz
         self.assertEqual("\u01c4".title(), "\u01c5")
```
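The test above exercises one of four Latin Extended-B digraph triples, each with an uppercase, a titlecase and a lowercase code point (U+01C4, U+01C5 and U+01C6 for D with caron). The small sketch below, which is not part of the commit, lists the four triples and checks that std's `to_uppercase` and `to_lowercase`, applied to the titlecase member, recover the other two. The names `DIGRAPHS` and `triple_is_consistent` are illustrative only.

```rust
/// The four Latin Extended-B digraph triples as (uppercase, titlecase, lowercase).
const DIGRAPHS: [(char, char, char); 4] = [
    ('\u{01C4}', '\u{01C5}', '\u{01C6}'), // D with caron
    ('\u{01C7}', '\u{01C8}', '\u{01C9}'), // LJ, Lj, lj
    ('\u{01CA}', '\u{01CB}', '\u{01CC}'), // NJ, Nj, nj
    ('\u{01F1}', '\u{01F2}', '\u{01F3}'), // DZ, Dz, dz
];

/// Returns true if std's uppercase and lowercase mappings of `title`
/// land exactly on `upper` and `lower`.
fn triple_is_consistent(upper: char, title: char, lower: char) -> bool {
    let up: String = title.to_uppercase().collect();
    let low: String = title.to_lowercase().collect();
    up == upper.to_string() && low == lower.to_string()
}
```

Because std offers no titlecase mapping, the step from the uppercase form to the titlecase form, which is the step this commit fixes, has to come from a table outside std.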

crates/vm/src/builtins/str.rs

Lines changed: 5 additions & 1 deletion
```diff
@@ -1053,7 +1053,7 @@ impl PyStr {
                 if previous_is_cased {
                     title.extend(c.to_lowercase());
                 } else {
-                    title.push_char(c);
+                    title.extend(c.to_titlecase());
                 }
                 previous_is_cased = true;
             } else {
@@ -2661,6 +2661,10 @@ mod tests {
             ("Greek Ωppercases ...", "greek ωppercases ..."),
             // spell-checker:disable-next-line
             ("Greek ῼitlecases ...", "greek ῳitlecases ..."),
+            // Latin Extended-B digraphs: uppercase forms map to titlecase forms
+            // (e.g. U+01F1 'DZ' -> U+01F2 'Dz', U+01C4 'DŽ' -> U+01C5 'Dž').
+            ("\u{01F2}", "\u{01F1}"),
+            ("\u{01C5}", "\u{01C4}"),
         ];
         for (title, input) in tests {
             assert_eq!(PyStr::from(input).title().as_str(), Ok(title));
```
