Add more unicode functions to c-api #8044

Merged
youknowone merged 2 commits into RustPython:main from bschoenmaeckers:c-api-more-str on Jun 10, 2026

bschoenmaeckers (Contributor) commented on Jun 4, 2026

Summary by CodeRabbit

  • New Features

    • Added four Python C-API Unicode entry points (PyUnicode_AsUTF8String, PyUnicode_DecodeFSDefaultAndSize, PyUnicode_EncodeFSDefault, PyUnicode_FromEncodedObject) that give C extensions UTF-8 encoding, filesystem-encoding conversion, and decoding of encoded objects
  • Tests

    • Added tests for Unicode operations: string interning, UTF-8 encoding via the FFI, decoding from encoded objects, and filesystem-default encoding round-trips (including non-UTF-8 filename cases)
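The filesystem-default round-trips mentioned above, especially the non-UTF-8 filename cases, only work if decoding is lossless. The sketch below is a standalone illustration, not RustPython's code: it assumes the POSIX behaviour of CPython, where the filesystem encoding is UTF-8 and the error handler is `surrogateescape` (PEP 383), so each undecodable byte 0xXX becomes the lone surrogate U+DCXX and encoding maps it back. The function names are hypothetical, and code points are kept as `u32` because Rust's `char` cannot hold surrogates.

```rust
/// Hypothetical sketch of PEP 383 "surrogateescape" decoding of UTF-8 bytes.
/// Returns code points as `u32`, since Rust's `char` cannot represent the
/// lone surrogates U+DC80..=U+DCFF used to carry undecodable bytes.
fn decode_surrogateescape(bytes: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut rest = bytes;
    while !rest.is_empty() {
        match std::str::from_utf8(rest) {
            Ok(s) => {
                // Everything left is valid UTF-8.
                out.extend(s.chars().map(|c| c as u32));
                break;
            }
            Err(e) => {
                let valid = e.valid_up_to();
                // The prefix up to `valid` is guaranteed to be valid UTF-8.
                let prefix = std::str::from_utf8(&rest[..valid]).expect("valid prefix");
                out.extend(prefix.chars().map(|c| c as u32));
                // `error_len()` is None only for a sequence truncated by the end
                // of input; in that case escape all remaining bytes.
                let bad = e.error_len().unwrap_or(rest.len() - valid);
                for &b in &rest[valid..valid + bad] {
                    // Undecodable bytes are always >= 0x80 and map to U+DC00 + byte.
                    out.push(0xDC00 + u32::from(b));
                }
                rest = &rest[valid + bad..];
            }
        }
    }
    out
}

/// Inverse of `decode_surrogateescape`: U+DC80..=U+DCFF turn back into the raw
/// byte, every other code point is UTF-8 encoded. Returns None for code points
/// that cannot be encoded (other surrogates or values above U+10FFFF).
fn encode_surrogateescape(code_points: &[u32]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for &cp in code_points {
        if (0xDC80..=0xDCFF).contains(&cp) {
            out.push((cp - 0xDC00) as u8);
        } else {
            // char::from_u32 rejects surrogates and out-of-range values.
            let c = char::from_u32(cp)?;
            let mut buf = [0u8; 4];
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
        }
    }
    Some(out)
}
```

For example, the Latin-1 filename bytes `caf\xe9.txt` decode to `c a f U+DCE9 . t x t` and encode back to the same eight bytes, while valid UTF-8 such as `café.txt` passes through unchanged in both directions.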

coderabbitai (bot, Contributor) commented on Jun 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 8ba77755-8543-4cae-89d2-bc2800ebb7d5

📥 Commits

Reviewing files that changed from the base of the PR and between 66105ff and c832e32.

📒 Files selected for processing (1)
  • crates/capi/src/unicodeobject.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/capi/src/unicodeobject.rs

📝 Walkthrough

This PR extends the RustPython C-API layer with four new Unicode FFI functions that expose UTF-8 encoding, filesystem-default codec operations, and bytes-to-string decoding. Import statements are consolidated at the top of the file, and new test cases (currently disabled) cover string interning, UTF-8 encoding through the new wrapper, decoding from bytes, and filesystem-codec round-trips.

Changes

Unicode C-API FFI Extensions

  • Unicode FFI functions and imports (crates/capi/src/unicodeobject.rs): Four new public C-API functions are added. PyUnicode_AsUTF8String encodes a string to UTF-8; PyUnicode_DecodeFSDefaultAndSize and PyUnicode_EncodeFSDefault handle filesystem-default codec operations; and PyUnicode_FromEncodedObject decodes from bytes-like objects. All functions handle null inputs, validate parameters, downcast to PyStr where needed, and delegate encoding and decoding to the codec registry. Imports are consolidated at the top of the file.
  • Unicode encoding and decoding tests (crates/capi/src/unicodeobject.rs): A test suite (currently disabled) validates string interning, UTF-8 encoding via the wrapper, decoding from encoded bytes objects, and round-trip filesystem-default encoding and decoding on Unix for both non-UTF-8 and UTF-8 filenames.
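Entry points such as PyUnicode_DecodeFSDefaultAndSize receive a raw `(const char *, Py_ssize_t)` pair from C, so the null handling and parameter validation described above have to happen before any Rust slice is built. The helper below is a hypothetical sketch of that guard: its name, its error strings, and the choice to reject a NULL pointer even when the size is 0 are assumptions rather than RustPython's actual behaviour, and `isize` stands in for `Py_ssize_t`.

```rust
use std::os::raw::c_char;
use std::slice;

/// Hypothetical guard for a C-API entry point taking `(const char *s, Py_ssize_t size)`.
/// `isize` plays the role of `Py_ssize_t`.
///
/// # Safety
/// If `ptr` is non-null and `size >= 0`, `ptr` must point to at least `size`
/// readable bytes that stay valid for the lifetime `'a`.
unsafe fn bytes_from_raw<'a>(ptr: *const c_char, size: isize) -> Result<&'a [u8], &'static str> {
    if ptr.is_null() {
        return Err("NULL pointer");
    }
    if size < 0 {
        // Casting a negative isize to usize would wrap to a huge length.
        return Err("negative size");
    }
    // SAFETY: non-null pointer and non-negative size were checked above;
    // the caller guarantees the remaining validity requirements.
    Ok(unsafe { slice::from_raw_parts(ptr.cast::<u8>(), size as usize) })
}
```

Rejecting a negative size up front matters because `size as usize` would otherwise wrap around, and passing that length to `slice::from_raw_parts` would be undefined behaviour.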

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • RustPython/RustPython#7904: Modifies the same Unicode FFI module to add PyUnicode_AsEncodedString and other C-API encoding functions with similar null-handling and codec-registry delegation patterns.

Suggested reviewers

  • youknowone

Poem

🐰 A rabbit hops through Unicode streams,
Encoding strings in filesystem dreams—
UTF-8 paths and bytes set free,
Four new functions, tested with glee! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Add more unicode functions to c-api' accurately describes the main change: adding new Unicode-related FFI entrypoints (PyUnicode_AsUTF8String, PyUnicode_DecodeFSDefaultAndSize, PyUnicode_EncodeFSDefault, PyUnicode_FromEncodedObject) to the C-API module.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: f513962a-ed5b-43d8-9cba-68f7157fb11d

📥 Commits

Reviewing files that changed from the base of the PR and between 51c97b9 and 66105ff.

📒 Files selected for processing (1)
  • crates/capi/src/unicodeobject.rs

Comment thread on crates/capi/src/unicodeobject.rs (outdated)
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
youknowone merged commit ee4e720 into RustPython:main on Jun 10, 2026
26 checks passed
bschoenmaeckers deleted the c-api-more-str branch on June 10, 2026 at 07:10

Labels

None yet

Projects

None yet

2 participants