Skip to content

Bytecode parity#7536

Merged
youknowone merged 4 commits intoRustPython:mainfrom
youknowone:bytecode-parity
Mar 30, 2026
Merged

Bytecode parity#7536
youknowone merged 4 commits intoRustPython:mainfrom
youknowone:bytecode-parity

Conversation

@youknowone
Copy link
Copy Markdown
Member

@youknowone youknowone commented Mar 30, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Fixed assertion message compilation behavior
    • Improved error propagation in constant folding
    • Enhanced exception handler control flow
  • Refactor

    • Optimized f-string compilation with merged literal handling
    • Improved chained comparison bytecode generation
    • Enhanced stack depth and control flow analysis
  • Tests

    • Added bytecode structure and control flow validation tests

… genexpr StopIteration wrapper

- split_blocks_at_jumps: split blocks at branch points so each has one exit
- jump_threading: thread jumps through single-jump blocks (flowgraph.c jump_thread)
- Backward conditional jump normalization: invert and create NOT_TAKEN+JUMP block
- Follow empty blocks in jump-to-return optimization (next_nonempty_block)
- Add PEP 479 StopIteration handler to compile_comprehension for generators
- Add Slice variant to ConstantData and BorrowedConstant
- Fold constant slices (x[:3], x[1:4]) into LOAD_CONST(slice(...))
- Marshal serialization/deserialization for Slice type
- Box::leak in borrow_obj_constant for PySlice roundtrip
Prepare infrastructure for exit block duplication optimization.
Currently disabled pending stackdepth integration.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Walkthrough

This PR refactors bytecode compilation in the code generator, optimizing assertion message handling, rewriting f-string compilation to merge adjacent literals, improving boolean jump code generation for chained comparisons, strengthening constant folding with error propagation, guarding super-call optimizations for extended-call syntax, and revising IR-level control flow normalization for line-number resolution and small-block inlining.

Changes

Cohort / File(s) Summary
Assertion and Call Compilation
crates/codegen/src/compile.rs
Optimized assertion message compilation to emit PushNull and argument count selectively based on message presence; added call_uses_ex_call guard to disable super-call optimization when arguments resemble CALL_FUNCTION_EX usage.
Boolean Jump and Comparison Compilation
crates/codegen/src/compile.rs
Enhanced compile_jump_if with source-range tracking; introduced compile_jump_if_compare for bytecode generation of chained comparisons using conditional jumps; refactored match arms to ensure consistent CompileResult returns and source-range restoration.
Slice Constant Folding
crates/codegen/src/compile.rs
Refactored try_fold_constant_slice to return CompileResult<bool>, propagating integer-conversion errors and updating call sites to use error operator (?).
F-String Compilation Rewrite
crates/codegen/src/compile.rs
Replaced element-count/build-string strategy with "pending literal merging" where adjacent literal segments accumulate into a single Wtf8Buf, emitted when interpolation occurs or completion; added compile_fstring_literal_value, compile_fstring_part_literal_value, compile_fstring_part_into, compile_fstring_elements_into, and finish_fstring helpers to implement new control flow and preserve surrogate behavior.
Constant Folding Refactor and Tests
crates/codegen/src/compile.rs
Added try_fold_constant_expr helper for expression-level constant folding; expanded test coverage for scope-exit line numbers, CALL_FUNCTION_EX attribute loading, assertion call shaping, chained comparison cleanup, constant slice bounds, exception cleanup inlining, and merged f-string literals.
IR Line-Number Resolution and Block Inlining
crates/codegen/src/ir.rs
Introduced inline_small_or_no_lineno_blocks and resolve_line_numbers phases to inline small blocks and propagate line numbers post-jump-normalization; refactored max_stackdepth stack-start computation and jump-normalization return inlining conditions; adjusted implicit-return duplication to compute fallthrough via next_nonempty_block.
Test Enum Extensions
crates/jit/tests/common.rs
Extended StackValue enum with Slice and Frozenset variants; updated From<ConstantData> conversion to destructure slice elements and convert frozenset element collections.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh

Poem

🐰 Hops with glee through bytecode streams,
F-strings merge in pending dreams,
Jumps grow wise with comparisons chained,
Constants fold as errors explained—
Line numbers bloom where blocks align!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 79.66% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Bytecode parity' is vague and generic, using a non-descriptive term that does not clearly convey the specific changes made in the changeset. Consider a more specific title that highlights the main changes, such as 'Improve bytecode compilation for assertions, f-strings, and jumps' or 'Refactor bytecode compiler with assertion/f-string optimizations and jump improvements'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] test: cpython/Lib/test/test_compile.py (TODO: 27)
[x] test: cpython/Lib/test/test_compiler_assemble.py
[x] test: cpython/Lib/test/test_compiler_codegen.py
[x] test: cpython/Lib/test/test_peepholer.py (TODO: 24)

dependencies:

dependent tests: (no tests depend on compile)

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@youknowone youknowone marked this pull request as ready for review March 30, 2026 07:09
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/codegen/src/compile.rs (1)

6786-6867: Extract the condition-dependent jump emission.

Lines 6804-6818 and 6837-6851 choose the same PopJumpIf* opcode twice. A tiny helper keeps the bytecode shape in one place and makes future parity tweaks less error-prone.

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/codegen/src/compile.rs` around lines 6786 - 6867, The
compile_jump_if_compare function duplicates emission of PopJumpIfTrue vs
PopJumpIfFalse in two places; refactor by computing the appropriate instruction
once based on the boolean condition (e.g., determine an opcode or small helper
like emit_pop_jump_if(condition, delta)) and then call emit! with that chosen
instruction and target_block in both spots (replace the two conditional emit!
blocks around the final compare and the single-mid-cleanup branch). Ensure the
helper/variable is used in both locations inside compile_jump_if_compare so the
bytecode shape is controlled from one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/codegen/src/compile.rs`:
- Around line 9052-9059: The current compile_fstring_part_literal_value reparses
the entire ast::StringLiteral range (via source_file.slice(string.range)) which
can span multiple adjacent literal tokens and thus misdecode in the
surrogate-preservation path; change this to reparse each constituent literal
token individually and concatenate their parsed results: when
string.value.contains(char::REPLACEMENT_CHARACTER) iterate the literal tokens
that form this ast::StringLiteral (use each token's range / slice from
source_file.slice(token.range)), call crate::string_parser::parse_string_literal
on each token slice with string.flags.into(), collect/concatenate the resulting
buffers, and return the combined Wtf8Buf instead of reparsing the whole range at
once.

---

Nitpick comments:
In `@crates/codegen/src/compile.rs`:
- Around line 6786-6867: The compile_jump_if_compare function duplicates
emission of PopJumpIfTrue vs PopJumpIfFalse in two places; refactor by computing
the appropriate instruction once based on the boolean condition (e.g., determine
an opcode or small helper like emit_pop_jump_if(condition, delta)) and then call
emit! with that chosen instruction and target_block in both spots (replace the
two conditional emit! blocks around the final compare and the single-mid-cleanup
branch). Ensure the helper/variable is used in both locations inside
compile_jump_if_compare so the bytecode shape is controlled from one place.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 9842e9fe-d3c7-43f6-91ff-60550c5cff91

📥 Commits

Reviewing files that changed from the base of the PR and between 3706c53 and c717101.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_peepholer.py is excluded by !Lib/**
📒 Files selected for processing (3)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
  • crates/jit/tests/common.rs

Comment on lines +9052 to +9059
fn compile_fstring_part_literal_value(&self, string: &ast::StringLiteral) -> Wtf8Buf {
if string.value.contains(char::REPLACEMENT_CHARACTER) {
let source = self.source_file.slice(string.range);
crate::string_parser::parse_string_literal(source, string.flags.into()).into()
} else {
string.value.to_string().into()
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Reparse each constituent literal token here.

Line 9054 reparses the combined ast::StringLiteral range with one flag set, but this node can span multiple adjacent literals. In the surrogate-preservation path that can misdecode or drop later pieces, so split plain-string parts next to an f-string can produce the wrong merged literal.

💡 Proposed fix
 fn compile_fstring_part_literal_value(&self, string: &ast::StringLiteral) -> Wtf8Buf {
     if string.value.contains(char::REPLACEMENT_CHARACTER) {
-        let source = self.source_file.slice(string.range);
-        crate::string_parser::parse_string_literal(source, string.flags.into()).into()
+        string
+            .value
+            .iter()
+            .map(|lit| {
+                let source = self.source_file.slice(lit.range);
+                crate::string_parser::parse_string_literal(source, lit.flags.into())
+            })
+            .collect()
     } else {
         string.value.to_string().into()
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/codegen/src/compile.rs` around lines 9052 - 9059, The current
compile_fstring_part_literal_value reparses the entire ast::StringLiteral range
(via source_file.slice(string.range)) which can span multiple adjacent literal
tokens and thus misdecode in the surrogate-preservation path; change this to
reparse each constituent literal token individually and concatenate their parsed
results: when string.value.contains(char::REPLACEMENT_CHARACTER) iterate the
literal tokens that form this ast::StringLiteral (use each token's range / slice
from source_file.slice(token.range)), call
crate::string_parser::parse_string_literal on each token slice with
string.flags.into(), collect/concatenate the resulting buffers, and return the
combined Wtf8Buf instead of reparsing the whole range at once.

@youknowone youknowone merged commit 1c39fdb into RustPython:main Mar 30, 2026
16 checks passed
@youknowone youknowone deleted the bytecode-parity branch March 30, 2026 09:51
@coderabbitai coderabbitai bot mentioned this pull request Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant