Commit da04480

Fix unicodedata; unmask isprintable() test
Python bundles an old version of Unicode (3.2.0) for compatibility. RustPython tries to mimic support for that old version by checking the age of individual chars. This is a problem for a few reasons. First, the age check adds an extra lookup to every char query against the Unicode data. Second, the check is outdated, because the `unic-ucd-age` crate is several versions behind the current Unicode version, so it rejects valid chars. Third, the check is subtly wrong: on the 3.2.0 path it returns Unicode 16.0.0 properties while gating on age data from a Unicode 10.0.0 database.

Unfortunately, there isn't a crate that can help us here. `icu4x` targets modern Unicode versions, and writing an `icu4x` data provider for Unicode 3.2.0 is a lot of work for a legacy path. I opted to parse the Unicode 3.2.0 data myself and (mostly) skip `icu4x`, writing small lookup tables instead.

As of this commit, Unicode names are still wrong for 3.2.0. Luckily, the crate RustPython uses for names is fast and robust for modern Unicode.
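
The "small lookup tables" idea in the message above can be sketched roughly as below. This is a minimal illustration, not the code added by this commit: the table name `ASSIGNED_3_2_0`, the function `is_assigned_3_2_0`, and the two ranges are hypothetical, and the table is deliberately incomplete rather than real Unicode 3.2.0 data. It assumes the generated data is a sorted list of inclusive code point ranges that can be binary-searched.

```rust
use std::cmp::Ordering;

/// Sorted, non-overlapping, inclusive code point ranges.
/// Hypothetical excerpt for illustration only; a real table would be
/// generated from the Unicode 3.2.0 data files and be far larger.
const ASSIGNED_3_2_0: &[(u32, u32)] = &[
    (0x0000, 0x00FF), // Basic Latin and Latin-1 Supplement
    (0x4E00, 0x9FA5), // CJK Unified Ideographs (early extent)
];

/// Returns true if `c` falls inside one of the table's ranges.
/// One binary search over the ranges replaces a per-char age query
/// against a separate crate.
fn is_assigned_3_2_0(c: char) -> bool {
    let cp = c as u32;
    ASSIGNED_3_2_0
        .binary_search_by(|&(start, end)| {
            if end < cp {
                Ordering::Less // range lies entirely below cp
            } else if start > cp {
                Ordering::Greater // range lies entirely above cp
            } else {
                Ordering::Equal // cp is inside this range
            }
        })
        .is_ok()
}
```

With this sketch, `is_assigned_3_2_0('A')` is true, while `is_assigned_3_2_0('\u{1F46F}')` is false because that emoji was added to Unicode long after 3.2.0. The lookup depends only on a static table, so it needs no runtime dependency.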
1 parent dec9942 commit da04480

18 files changed

Lines changed: 51535 additions & 193 deletions

Cargo.lock

Lines changed: 0 additions & 49 deletions

Cargo.toml

Lines changed: 0 additions & 2 deletions
@@ -318,8 +318,6 @@ icu_locale = "2"
 icu_properties = "2"
 icu_normalizer = "2"
 uuid = "1.23.1"
-ucd = "0.1.1"
-unic-ucd-age = "0.9.0"
 unicode_names2 = "2.0.0"
 widestring = "1.2.0"
 windows-sys = "0.61.2"

Lib/test/test_str.py

Lines changed: 0 additions & 1 deletion
@@ -853,7 +853,6 @@ def test_isprintable(self):
         self.assertTrue('\U0001F46F'.isprintable())
         self.assertFalse('\U000E0020'.isprintable())
 
-    @unittest.expectedFailure # TODO: RUSTPYTHON
     @support.requires_resource('cpu')
     def test_isprintable_invariant(self):
         for codepoint in range(sys.maxunicode + 1):

crates/stdlib/Cargo.toml

Lines changed: 3 additions & 2 deletions
@@ -81,8 +81,6 @@ unicode_names2 = { workspace = true }
 # update version all at the same time
 icu_properties = { workspace = true }
 icu_normalizer = { workspace = true }
-unic-ucd-age = { workspace = true }
-ucd = { workspace = true }
 
 # compression
 adler32 = { workspace = true }
@@ -143,6 +141,9 @@ system-configuration = { workspace = true }
 insta = { workspace = true }
 rustpython-pylib = { workspace = true, features = [ "freeze-stdlib" ] }
 
+[build-dependencies]
+icu_normalizer = { workspace = true }
+icu_properties = { workspace = true }
 
 [lints]
 workspace = true
