Tighten CPython parity for str format spec, %-format, and str() constructor by changjoon-park · Pull Request #7769 · RustPython/RustPython

changjoon-park · 2026-05-03T16:36:52Z

Summary

Eight CPython parity gaps in str formatting and construction, grouped into three themes.

Group 1 — `str()` constructor decode mode

CPython's unicode_new_impl (Objects/unicodeobject.c) enters decode mode when encoding OR errors is provided, with separate early-reject paths for str input and non-bytes input. RustPython only checked encoding, and the non-bytes path surfaced a generic "a bytes-like object is required" error.

>>> str(b'foo', errors='strict')
# CPython 3.14.4: 'foo'
# RustPython main: "b'foo'"  ❌

>>> str('hello', errors='strict')
# CPython 3.14.4: TypeError: decoding str is not supported
# RustPython main: TypeError: a bytes-like object is required, not 'str'  ❌

>>> str(123, errors='strict')
# CPython 3.14.4: TypeError: decoding to str: need a bytes-like object, int found
# RustPython main: TypeError: a bytes-like object is required, not 'int'  ❌

Group 2 — `str.format()` spec engine

CPython's format spec engine (Python/formatter_unicode.c) truncates to precision before applying width pad. RustPython applied precision after, so the pad got truncated off. Separately, '{}'.format() with insufficient args wrapped Rust's tuple-index error rather than emitting CPython's domain-specific message.

>>> '{0:3.2s}'.format('abc')
# CPython 3.14.4: 'ab '
# RustPython main: 'ab'  ❌

>>> '{}'.format()
# CPython 3.14.4: IndexError: Replacement index 0 out of range for positional args tuple
# RustPython main: IndexError: tuple index out of range  ❌

The IndexError wording change covers both auto-numbered and explicit-index paths.

Group 3 — `%`-format integer dispatch and wording

CPython's %-format path (Objects/unicodeobject.c::formatlong and adjacent) dispatches integer conversions through PyNumber_Index (the __index__ protocol), and uses "a real number" / "an int or a unicode character" wording. RustPython only tried __int__ for %d / %i / %u, never tried __index__ for the integer-conversion error path on %x / %o / %X / %c, and used divergent wording.

>>> class P:
...     def __index__(self): return 42

>>> '%x' % P()
# CPython 3.14.4: '2a'
# RustPython main: TypeError: %x format: an integer is required, not P  ❌

>>> '%d' % P()
# CPython 3.14.4: '42'
# RustPython main: TypeError: %d format: a number is required, not P  ❌

>>> '%c' % 'foo'
# CPython 3.14.4: TypeError: %c requires an int or a unicode character, not str
# RustPython main: TypeError: %c requires int or char  ❌

%d / %i / %u now prefer __index__ over __int__ (matching CPython), and the "a number" wording becomes "a real number" for the complex/float-as-decimal mismatch. Applied to both bytes and str %-format paths.

Verification (CPython 3.14.4)

Probe set	Cases	Match
str format spec width × precision × content	252	252/252
%-format integer dispatch ({fmt} × {int, bool, P})	42	42/42
str() constructor combinations + edge cases	29	29/29
`'{}'.format()` IndexError variations	6	6/6
Non-ASCII format (`CharLen for String`)	4	4/4
% format with %s / %r / %a	33	33/33
% format with floats	49	49/49
% format with mappings	4	4/4
`str.format()` keyword + attr access	5	5/5
`format()` builtin	51	51/51
Total	475	475/475

cargo run --release -- -m test test_str test_bytes test_format test_codecs test_io test_unicode_identifiers — 1,429 tests pass, 0 regressions. All 188 extra_tests/snippets/*.py pass under the CI feature set. cargo test rustpython-common cformat — 11 unit tests pass (including non-ASCII).

Tests unmasked

Lib/test/test_str.py::test_constructor_keyword_args
Lib/test/test_str.py::test_constructor_defaults
Lib/test/test_descr.py::ClassPropertiesAndMethods.test_keywords — basic-type constructor parity aggregator; str(object=500) and str(object=b'abc', errors='strict') paths now pass.

Out of scope (markers retained, with updated reasons)

test_format — '{0:08s}'.format('result') treats 0 as fill+left-align in CPython but as numeric zero-pad in RustPython's spec parser. Separate format-spec parser concern.
test_formatting — %c error message expects fully-qualified module.qualname; RustPython returns the bare class name. Broader tp_name qualification concern.

coderabbitai · 2026-05-03T16:37:05Z

📝 Walkthrough

Walkthrough

This PR makes three orthogonal changes: string formatting now truncates to precision before alignment; the str() constructor’s fast-path and decode semantics are aligned with CPython (encoding/errors rules and errors behavior); and format specifiers plus positional field errors were extended to accept __index__ and produce clearer index-out-of-range messages.

Changes

String Formatting Precision Handling

Layer / File(s)	Summary
Precision Logic `crates/common/src/format.rs`	`FormatSpec::format_string` now truncates the input string to `precision` characters before calling `format_sign_and_align` (previously truncation happened after alignment).
Utility Implementation `crates/common/src/format.rs`	Added `impl CharLen for String` that returns `self.chars().count()` so alignment computes character length correctly.

String Constructor Behavior

Layer / File(s)	Summary
Fast-Path Condition `crates/vm/src/builtins/str.rs`	`PyStr::slot_new` fast-path (returning the exact `str` object) now requires both `encoding` and `errors` to be missing rather than only `encoding`.
Decode Logic / Allocation `crates/vm/src/builtins/str.rs`	`py_new` now enters decode mode when either `encoding` or `errors` is present; rejects `str` inputs in decode mode, rejects non-bytes-like inputs with a bytes-like `TypeError`, and defaults `encoding` to `"utf-8"` when only `errors` is provided. If neither is present, falls back to `input.str(vm)` as before.

Format Conversion CPython Parity

Layer / File(s)	Summary
Integer Conversion (`%d`/`%i`/`%u`) — bytes `crates/vm/src/cformat.rs`	`spec_format_bytes` accepts objects implementing `__index__` via `try_index_opt` in addition to `PyInt`, matching CPython behavior.
Integer Conversion (`%x`/`%o`/`%X`) — bytes `crates/vm/src/cformat.rs`	`spec_format_bytes` extended to accept `__index__` objects for non-decimal integer formats.
Character `%c` — bytes `crates/vm/src/cformat.rs`	`%c` accepts `PyInt` or any `__index__` integer (range-checked 0–255) and improves error messages when types mismatch.
Integer Conversion (`%d`/`%i`/`%u`) — strings `crates/vm/src/cformat.rs`	`spec_format_string` accepts `__index__` objects in addition to `PyInt` for decimal conversions.
Integer Conversion (`%x`/`%o`/`%X`) — strings `crates/vm/src/cformat.rs`	`spec_format_string` accepts `__index__` objects for hexadecimal/octal conversions.
Character `%c` — strings `crates/vm/src/cformat.rs`	`%c` now prefers a single-character `str` first; otherwise accepts `PyInt` or `__index__` integers convertible to Unicode code points (0x0..=0x10FFFF); error messages updated.
Field Resolution Errors `crates/vm/src/format.rs`	`format()` now raises index-specific `new_index_error` messages for `FieldType::Auto` and `FieldType::Index` when positional arguments are out of range, e.g., `"Replacement index {idx} out of range for positional args tuple"`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Preserve str subclass type returned by __str__ / __repr__ #7701: Overlaps changes to the str constructor fast-path and encoding/errors handling in crates/vm/src/builtins/str.rs.
Reject format spec with width above i32::MAX #7707: Related edits to crates/common/src/format.rs around FormatSpec behavior and parsing.

Suggested reviewers

ShaharNaveh
youknowone

Poem

🐰
I nibbled bytes, I trimmed the line,
Precision first, then pad just fine.
Decode when errors or encodings show,
Index-aware formats now steal the show.
Hoppity parity—rabbit says, "Bravo!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and clearly summarizes the main changes: CPython parity improvements for str formatting (format spec, %-format) and str() constructor behavior.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-03T16:37:45Z

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] test: cpython/Lib/test/test_str.py (TODO: 10)
[x] test: cpython/Lib/test/test_fstring.py (TODO: 19)
[x] test: cpython/Lib/test/test_string_literals.py (TODO: 4)

dependencies:

dependent tests: (no tests depend on str)

[ ] test: cpython/Lib/test/test_descr.py (TODO: 36)
[ ] test: cpython/Lib/test/test_descrtut.py (TODO: 3)

dependencies:

dependent tests: (no tests depend on descr)

Legend:

[+] path exists in CPython
[x] up-to-date, [ ] outdated

…ructor Five related CPython parity gaps in `str` formatting and construction: 1. **`str(bytes, errors=...)` triggers decode mode.** Previously, only `encoding=` triggered decode; passing only `errors=` fell back to `repr()`. CPython's behavior: presence of `encoding` OR `errors` triggers decode mode (default UTF-8 when only `errors` is given). 2. **`'{...}'.format() IndexError wording.** Generic Rust "tuple index out of range" replaced with CPython's "Replacement index N out of range for positional args tuple". 3. **`{0:3.2s}.format('abc')` → 'ab '.** String format spec applied precision after width padding; CPython truncates BEFORE padding. Reorder the operations. 4. **`%x` / `%o` / `%X` / `%c` accept `__index__` objects.** Previously only `PyInt` downcast was attempted. Mirror CPython's PyNumber_Index dispatch via `try_index_opt`. 5. **`%d` / `%u` / `%i` error wording.** "a number is required" → "a real number is required" (matches CPython). Also adds `not <type>` suffix to `%c` error messages so the type is visible in TypeError text (matches CPython structure even without fully-qualified names). Verified byte-identical with CPython 3.14.4 across 25+ probes covering the format/spec/constructor combinations. Unmasks `test_str.test_constructor_keyword_args` and `test_str.test_constructor_defaults`. test_str/test_bytes/test_format/ test_codecs/test_io/test_unicode_identifiers — 1,429 tests pass, 0 regressions. All 188 `extra_tests/snippets/*.py` pass under the CI feature set. `test_str.test_format` and `test_str.test_formatting` markers retained: `test_format` still trips on `'{0:08s}'.format('result')` (numeric zero-pad treated as fill+left-align by CPython for str type — separate format-spec parser concern). `test_formatting` still trips on `%c` error message expecting fully qualified `module.qualname` (RP returns bare class name — separate broader concern).

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/vm/src/cformat.rs (1)
74-85: ⚠️ Potential issue | 🟠 Major

Surface __int__ protocol failures instead of the generic error.

When __int__ returns a non-int value, Lines 74-80 and 192-198 fall through to the generic %d format: a real number is required error. CPython 3.14 instead raises TypeError: int returned non-int (type <T>) when __int__ violates the protocol. Raise the specific protocol error to match CPython's behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/cformat.rs` around lines 74 - 85, When calling the __int__
protocol (via vm.get_method(..., identifier!(vm, __int__)) and
method?.call(...)) detect when the returned value is not a PyInt and raise a
protocol-specific TypeError instead of falling through to the generic "%d
format" error; in the block around vm.get_method / identifier!(vm, __int__) (and
the analogous block at lines ~192-198) return vm.new_type_error with a message
like "int returned non-int (type <T>)" using the returned object's class name
rather than proceeding to check_int_to_str_digits or spec.format_number so the
violation is surfaced per CPython 3.14 behavior.

🧹 Nitpick comments (1)

crates/vm/src/cformat.rs (1)
67-97: 🏗️ Heavy lift

Consider extracting the new coercion paths into shared helpers.

The __index__ / __int__ dispatch and %c coercion logic is now duplicated in both spec_format_bytes() and spec_format_string(). These are parity-sensitive branches, so keeping one shared implementation would reduce the chance of the two paths drifting again on the next wording or precedence tweak.

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

Also applies to: 120-145, 185-215, 229-251
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/cformat.rs` around lines 67 - 97, Extract the duplicated
`__index__`/`__int__` coercion and `%c` handling into shared helper(s) and call
them from spec_format_bytes() and spec_format_string(): create one helper (e.g.,
coerce_to_index_or_int(...)) that encapsulates the try_index_opt(vm) branch, the
vm.get_method(... identifier!(vm, __int__)) call, downcast to PyInt,
check_int_to_str_digits, and returning the BigInt (or an error), and another
small helper for the `%c`-specific validation if needed; then replace the
duplicated blocks in spec_format_bytes() and spec_format_string() with calls to
these helpers and use the returned BigInt with spec.format_number(...) (or the
char logic) so both functions share the same parity-sensitive dispatch logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@crates/vm/src/cformat.rs`:
- Around line 74-85: When calling the __int__ protocol (via vm.get_method(...,
identifier!(vm, __int__)) and method?.call(...)) detect when the returned value
is not a PyInt and raise a protocol-specific TypeError instead of falling
through to the generic "%d format" error; in the block around vm.get_method /
identifier!(vm, __int__) (and the analogous block at lines ~192-198) return
vm.new_type_error with a message like "int returned non-int (type <T>)" using
the returned object's class name rather than proceeding to
check_int_to_str_digits or spec.format_number so the violation is surfaced per
CPython 3.14 behavior.

---

Nitpick comments:
In `@crates/vm/src/cformat.rs`:
- Around line 67-97: Extract the duplicated `__index__`/`__int__` coercion and
`%c` handling into shared helper(s) and call them from spec_format_bytes() and
spec_format_string(): create one helper (e.g., coerce_to_index_or_int(...)) that
encapsulates the try_index_opt(vm) branch, the vm.get_method(... identifier!(vm,
__int__)) call, downcast to PyInt, check_int_to_str_digits, and returning the
BigInt (or an error), and another small helper for the `%c`-specific validation
if needed; then replace the duplicated blocks in spec_format_bytes() and
spec_format_string() with calls to these helpers and use the returned BigInt
with spec.format_number(...) (or the char logic) so both functions share the
same parity-sensitive dispatch logic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 28b0e721-5781-41f4-b9f1-77fca72a50b0

📥 Commits

Reviewing files that changed from the base of the PR and between 62a1279 and e4c9aaf.

⛔ Files ignored due to path filters (2)

Lib/test/test_descr.py is excluded by !Lib/**
Lib/test/test_str.py is excluded by !Lib/**

📒 Files selected for processing (4)

crates/common/src/format.rs
crates/vm/src/builtins/str.rs
crates/vm/src/cformat.rs
crates/vm/src/format.rs

✅ Files skipped from review due to trivial changes (1)

crates/vm/src/format.rs

🚧 Files skipped from review as they are similar to previous changes (1)

crates/vm/src/builtins/str.rs

changjoon-park force-pushed the fix-str-format-constructor-cformat branch from 62a1279 to e4c9aaf Compare May 3, 2026 17:58

coderabbitai Bot reviewed May 3, 2026

View reviewed changes

youknowone approved these changes May 4, 2026

View reviewed changes

youknowone merged commit 83002d7 into RustPython:main May 4, 2026
22 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tighten CPython parity for str format spec, %-format, and str() constructor#7769

Tighten CPython parity for str format spec, %-format, and str() constructor#7769
youknowone merged 1 commit into
RustPython:mainfrom
changjoon-park:fix-str-format-constructor-cformat

changjoon-park commented May 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 3, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

changjoon-park commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Group 1 — str() constructor decode mode

Group 2 — str.format() spec engine

Group 3 — %-format integer dispatch and wording

Verification (CPython 3.14.4)

Tests unmasked

Out of scope (markers retained, with updated reasons)

Uh oh!

coderabbitai Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📦 Library Dependencies

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

changjoon-park commented May 3, 2026 •

edited

Loading

Group 1 — `str()` constructor decode mode

Group 2 — `str.format()` spec engine

Group 3 — `%`-format integer dispatch and wording

coderabbitai Bot commented May 3, 2026 •

edited

Loading

github-actions Bot commented May 3, 2026 •

edited

Loading