Skip to content

Bytecode parity#7535

Merged
youknowone merged 4 commits intoRustPython:mainfrom
youknowone:bytecode-parity
Mar 30, 2026
Merged

Bytecode parity#7535
youknowone merged 4 commits intoRustPython:mainfrom
youknowone:bytecode-parity

Conversation

@youknowone
Copy link
Copy Markdown
Member

@youknowone youknowone commented Mar 30, 2026

Summary by CodeRabbit

  • New Features

    • Added support for slice and frozenset constants.
    • Enabled compile-time folding for constant slice expressions.
  • Bug Fixes

    • Improved generator and coroutine handling to properly manage StopIteration exceptions.
  • Performance

    • Optimized control flow graph with block splitting, jump threading, and unreachable code elimination.

… genexpr StopIteration wrapper

- split_blocks_at_jumps: split blocks at branch points so each has one exit
- jump_threading: thread jumps through single-jump blocks (flowgraph.c jump_thread)
- Backward conditional jump normalization: invert and create NOT_TAKEN+JUMP block
- Follow empty blocks in jump-to-return optimization (next_nonempty_block)
- Add PEP 479 StopIteration handler to compile_comprehension for generators
- Add Slice variant to ConstantData and BorrowedConstant
- Fold constant slices (x[:3], x[1:4]) into LOAD_CONST(slice(...))
- Marshal serialization/deserialization for Slice type
- Box::leak in borrow_obj_constant for PySlice roundtrip
Add Frozenset to ConstantData, BorrowedConstant, and marshal support.
Actual frozenset folding (BUILD_SET + CONTAINS_OP → LOAD_CONST frozenset)
requires VirtualMachine for element hashing and is deferred.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Walkthrough

This PR extends the compiler's constant handling to support slice and frozenset literals at compile-time, implements PEP 479-style StopIteration exception handling for generators and coroutines, and optimizes the control flow graph with new passes for block splitting, jump threading, and unreachable code elimination.

Changes

Cohort / File(s) Summary
Constant Slice and Frozenset Support
crates/compiler-core/src/bytecode.rs, crates/compiler-core/src/marshal.rs, crates/vm/src/builtins/code.rs
Extended ConstantData and BorrowedConstant with Slice and Frozenset variants; implemented marshaling and runtime borrowing for these new constant types to enable compile-time constant folding of slice and frozenset objects.
Slice Constant Folding
crates/codegen/src/compile.rs
Added optimization to emit a single LOAD_CONST instruction for slices with compile-time-constant lower, upper, and step values, falling back to BuildSlice for dynamic cases.
Generator/Coroutine StopIteration Handling
crates/codegen/src/compile.rs
Wrapped generator and coroutine code generation with PEP 479-style SETUP_CLEANUP exception handlers that convert StopIteration to StopIterationError, including suppression of line number updates in handler blocks via set_no_location().
Control Flow Graph Optimization Passes
crates/codegen/src/ir.rs
Implemented new CFG transformations including block splitting at jumps (split_blocks_at_jumps), jump threading to eliminate intermediate blocks, unreachable code removal, and nop elimination; refactored normalize_jumps to operate on mutable vectors and simplified conditional-jump handling; added support for chained empty-block traversal during return inlining.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh

Poem

🐰 A slice in hand, compile-time folded clean,
Generators now catch what StopIter means,
Blocks are split and jumps take flight,
CFG shines bright—constants, code, and might! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 48.39% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Bytecode parity' is vague and generic, using non-descriptive terminology that doesn't clearly convey what specific changes are being made in the changeset. Use a more specific title that describes the main changes, such as 'Add slice and frozenset constant folding with CFG optimization' or 'Implement constant slice/frozenset representation and optimize bytecode generation'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Prepare infrastructure for exit block duplication optimization.
Currently disabled pending stackdepth integration.
@youknowone youknowone marked this pull request as ready for review March 30, 2026 02:58
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
crates/codegen/src/compile.rs (1)

8416-8431: Extract the StopIteration wrapper into a shared helper.

This setup/teardown sequence is now effectively duplicated from Compiler::compile_function_body at Lines 3877-3892 and Lines 3924-3939. Any later tweak to handler depth, location suppression, or block layout now has to stay in sync in two places.

♻️ Suggested shape
-        let stop_iteration_block = if is_gen_scope {
-            let handler_block = self.new_block();
-            emit!(
-                self,
-                PseudoInstruction::SetupCleanup {
-                    delta: handler_block
-                }
-            );
-            self.set_no_location();
-            self.push_fblock(FBlockType::StopIteration, handler_block, handler_block)?;
-            Some(handler_block)
-        } else {
-            None
-        };
+        let stop_iteration_block = self.enter_stop_iteration_wrapper(is_gen_scope)?;
...
-        if let Some(handler_block) = stop_iteration_block {
-            emit!(self, PseudoInstruction::PopBlock);
-            self.set_no_location();
-            self.pop_fblock(FBlockType::StopIteration);
-            self.switch_to_block(handler_block);
-            emit!(
-                self,
-                Instruction::CallIntrinsic1 {
-                    func: oparg::IntrinsicFunction1::StopIterationError
-                }
-            );
-            self.set_no_location();
-            emit!(self, Instruction::Reraise { depth: 1u32 });
-            self.set_no_location();
-        }
+        self.exit_stop_iteration_wrapper(stop_iteration_block);

Also applies to: 8522-8538

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/codegen/src/compile.rs` around lines 8416 - 8431, Extract the
duplicated StopIteration setup/teardown into a shared helper (e.g., a method on
the compiler like setup_stop_iteration_handler or wrap_with_stopiteration) that
encapsulates the logic currently using is_gen_scope, self.new_block(), emitting
PseudoInstruction::SetupCleanup { delta: handler_block },
self.set_no_location(), and self.push_fblock(FBlockType::StopIteration,
handler_block, handler_block) and returns Option<handler_block>; then replace
the duplicated blocks that create stop_iteration_block (the sequences in
compile_function_body and the other location around is_gen_scope/is_async) with
a call to that helper so both sites share the same behavior and return value.
crates/codegen/src/ir.rs (1)

228-229: Consider adding a TODO comment explaining why duplicate_exits_without_lineno is disabled.

The comment "pending stackdepth fix" is helpful, but linking to a tracking issue would make it easier to follow up.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/codegen/src/ir.rs` around lines 228 - 229, Update the existing TODO
comment above the disabled call to duplicate_exits_without_lineno(&mut
self.blocks) to include a reference to the tracking issue or bug ID and a short
timestamp/owner tag (e.g., "pending stackdepth fix - see ISSUE-1234
`@2026-03-30`") so future readers can find the follow-up; leave the call commented
out but expand the comment to mention the exact reason ("stackdepth calculation
bug") and the issue link/number and optionally who filed it.
crates/vm/src/builtins/code.rs (1)

319-326: The unimplemented! panic is appropriately documented.

The comment clearly explains that PyObjBag lacks VirtualMachine access needed for frozenset hashing, and that PyMarshalBag handles this case. Consider adding a link to where PyMarshalBag implements this, or logging a warning if this path is ever hit unexpectedly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/code.rs` around lines 319 - 326, In the
BorrowedConstant::Frozenset arm inside PyObjBag::make_constant, replace the bare
unimplemented!() with a logged warning that clearly states this path should be
handled by PyMarshalBag (reference PyMarshalBag's frozenset handling in the
message) and include a link or code reference to where PyMarshalBag implements
it; after logging, either return an appropriate Err or keep the unimplemented
panic if you want to preserve current behavior, but ensure the warning is
emitted first so unexpected hits are observable.
crates/compiler-core/src/bytecode.rs (1)

934-935: Note: Frozenset hash implementation doesn't account for set semantics.

The current hash implementation hashes elements in order, but frozensets are unordered. If two Frozenset constants have the same elements in different orders, they would have different hashes but should compare equal. However, since this is used for constant pool deduplication and constants are typically created in a deterministic order, this may be acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 934 - 935, The Frozenset
arm currently calls elements.hash(state) which is order-dependent; change it to
an order-independent hashing strategy so frozensets with the same elements in
different orders produce identical hashes. In the Frozenset match arm (the
Frozenset { elements } branch inside the Hash impl for constants), compute each
element's hash into a temporary value, sort those element-hash values (or
otherwise combine them with a commutative operation like XOR/add on u64 hashes),
then feed the sorted/combined result into the provided state so hashing is
deterministic and order-independent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/codegen/src/compile.rs`:
- Around line 7577-7581: try_fold_constant_slice currently only recognizes
numbers, booleans and None and limits integer folding to i64, which is narrower
than ExprExt::is_constant_slice; update try_fold_constant_slice to accept the
same constant kinds that ExprExt::is_constant_slice treats constant (include
string/bytes/ellipsis bounds and arbitrary-size integers, not just i64) so that
patterns like obj["a":"z"] and large integer literals are folded to
LOAD_CONST(slice) and hit the BINARY_SLICE fast path; make the same expansion in
the adjacent constant-folding block that has the i64 restriction so both places
have parity with ExprExt::is_constant_slice.

In `@crates/compiler-core/src/bytecode.rs`:
- Around line 854-887: The From<ConstantData> for StackValue implementation
currently uses a wildcard arm and will panic for the new ConstantData::Slice and
ConstantData::Frozenset variants; update the match in From<ConstantData> for
StackValue to explicitly handle ConstantData::Slice (map it to an appropriate
StackValue variant representing a slice object, constructing start/stop/step
from the elements array) and ConstantData::Frozenset (convert elements
Vec<ConstantData> into the corresponding StackValue frozenset representation),
ensuring you pattern-match on ConstantData::Slice and ConstantData::Frozenset
instead of relying on the unimplemented wildcard arm.

In `@crates/compiler-core/src/marshal.rs`:
- Around line 471-478: The make_frozenset implementation collects elements into
a Vec and passes them straight to make_constant, which preserves duplicates;
update make_frozenset to deduplicate elements before constructing the
BorrowedConstant::Frozenset so its behavior matches CPython. Concretely, in
make_frozenset use a set (e.g., HashSet or BTreeSet) to track seen values and
build a new Vec of unique elements (preserving a deterministic order such as
first-seen or sorted if needed), then call
self.make_constant::<Bag::Constant>(BorrowedConstant::Frozenset { elements:
&unique_elements }) instead of passing the original elements Vec. Ensure you
reference the existing function make_frozenset and the constant constructor
BorrowedConstant::Frozenset when applying the change.

---

Nitpick comments:
In `@crates/codegen/src/compile.rs`:
- Around line 8416-8431: Extract the duplicated StopIteration setup/teardown
into a shared helper (e.g., a method on the compiler like
setup_stop_iteration_handler or wrap_with_stopiteration) that encapsulates the
logic currently using is_gen_scope, self.new_block(), emitting
PseudoInstruction::SetupCleanup { delta: handler_block },
self.set_no_location(), and self.push_fblock(FBlockType::StopIteration,
handler_block, handler_block) and returns Option<handler_block>; then replace
the duplicated blocks that create stop_iteration_block (the sequences in
compile_function_body and the other location around is_gen_scope/is_async) with
a call to that helper so both sites share the same behavior and return value.

In `@crates/codegen/src/ir.rs`:
- Around line 228-229: Update the existing TODO comment above the disabled call
to duplicate_exits_without_lineno(&mut self.blocks) to include a reference to
the tracking issue or bug ID and a short timestamp/owner tag (e.g., "pending
stackdepth fix - see ISSUE-1234 `@2026-03-30`") so future readers can find the
follow-up; leave the call commented out but expand the comment to mention the
exact reason ("stackdepth calculation bug") and the issue link/number and
optionally who filed it.

In `@crates/compiler-core/src/bytecode.rs`:
- Around line 934-935: The Frozenset arm currently calls elements.hash(state)
which is order-dependent; change it to an order-independent hashing strategy so
frozensets with the same elements in different orders produce identical hashes.
In the Frozenset match arm (the Frozenset { elements } branch inside the Hash
impl for constants), compute each element's hash into a temporary value, sort
those element-hash values (or otherwise combine them with a commutative
operation like XOR/add on u64 hashes), then feed the sorted/combined result into
the provided state so hashing is deterministic and order-independent.

In `@crates/vm/src/builtins/code.rs`:
- Around line 319-326: In the BorrowedConstant::Frozenset arm inside
PyObjBag::make_constant, replace the bare unimplemented!() with a logged warning
that clearly states this path should be handled by PyMarshalBag (reference
PyMarshalBag's frozenset handling in the message) and include a link or code
reference to where PyMarshalBag implements it; after logging, either return an
appropriate Err or keep the unimplemented panic if you want to preserve current
behavior, but ensure the warning is emitted first so unexpected hits are
observable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 54e2036a-fe74-4e78-9275-4b95f948e8d1

📥 Commits

Reviewing files that changed from the base of the PR and between e6bcd64 and 42b24cb.

⛔ Files ignored due to path filters (1)
  • crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snap is excluded by !**/*.snap
📒 Files selected for processing (5)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/marshal.rs
  • crates/vm/src/builtins/code.rs

Comment on lines +7577 to +7581
// Try constant slice folding first
if self.try_fold_constant_slice(lower.as_deref(), upper.as_deref(), step.as_deref())
{
return Ok(());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

try_fold_constant_slice is narrower than the rest of the compiler’s constant logic.

Lines 8999-9016 only fold numbers, booleans, and None, and Line 9005 also limits integer folding to i64. But ExprExt::is_constant_slice() already treats string/bytes/ellipsis bounds as constant, so cases like obj["a":"z"] skip the BINARY_SLICE fast path and still never reach the new LOAD_CONST(slice) path. Large integer literals have the same gap. That leaves valid constant slices on the old bytecode path and misses the parity target for this feature.

💡 Suggested fix
-                if self.try_fold_constant_slice(lower.as_deref(), upper.as_deref(), step.as_deref())
-                {
+                if self.try_fold_constant_slice(
+                    lower.as_deref(),
+                    upper.as_deref(),
+                    step.as_deref(),
+                )? {
                     return Ok(());
                 }
-    fn try_fold_constant_slice(
-        &mut self,
-        lower: Option<&ast::Expr>,
-        upper: Option<&ast::Expr>,
-        step: Option<&ast::Expr>,
-    ) -> bool {
-        fn to_const(expr: Option<&ast::Expr>) -> Option<ConstantData> {
-            match expr {
-                None | Some(ast::Expr::NoneLiteral(_)) => Some(ConstantData::None),
-                Some(ast::Expr::NumberLiteral(ast::ExprNumberLiteral { value, .. })) => match value
-                {
-                    ast::Number::Int(i) => Some(ConstantData::Integer {
-                        value: i.as_i64()?.into(),
-                    }),
-                    ast::Number::Float(f) => Some(ConstantData::Float { value: *f }),
-                    ast::Number::Complex { real, imag } => Some(ConstantData::Complex {
-                        value: num_complex::Complex64::new(*real, *imag),
-                    }),
-                },
-                Some(ast::Expr::BooleanLiteral(ast::ExprBooleanLiteral { value, .. })) => {
-                    Some(ConstantData::Boolean { value: *value })
-                }
-                // Only match Constant_kind nodes (no UnaryOp)
-                _ => None,
-            }
-        }
+    fn try_fold_constant_slice(
+        &mut self,
+        lower: Option<&ast::Expr>,
+        upper: Option<&ast::Expr>,
+        step: Option<&ast::Expr>,
+    ) -> CompileResult<bool> {
+        let mut to_const = |expr: Option<&ast::Expr>| -> CompileResult<Option<ConstantData>> {
+            Ok(match expr {
+                None | Some(ast::Expr::NoneLiteral(_)) => Some(ConstantData::None),
+                Some(ast::Expr::NumberLiteral(ast::ExprNumberLiteral { value, .. })) => {
+                    Some(match value {
+                        ast::Number::Int(i) => ConstantData::Integer {
+                            value: ruff_int_to_bigint(i).map_err(|e| self.error(e))?,
+                        },
+                        ast::Number::Float(f) => ConstantData::Float { value: *f },
+                        ast::Number::Complex { real, imag } => ConstantData::Complex {
+                            value: Complex::new(*real, *imag),
+                        },
+                    })
+                }
+                Some(ast::Expr::StringLiteral(s)) => Some(ConstantData::Str {
+                    value: self.compile_string_value(s),
+                }),
+                Some(ast::Expr::BytesLiteral(b)) => Some(ConstantData::Bytes {
+                    value: b.value.bytes().collect(),
+                }),
+                Some(ast::Expr::BooleanLiteral(ast::ExprBooleanLiteral { value, .. })) => {
+                    Some(ConstantData::Boolean { value: *value })
+                }
+                Some(ast::Expr::EllipsisLiteral(_)) => Some(ConstantData::Ellipsis),
+                _ => None,
+            })
+        };
 
-        let (Some(start), Some(stop), Some(step_val)) =
-            (to_const(lower), to_const(upper), to_const(step))
+        let (Some(start), Some(stop), Some(step_val)) =
+            (to_const(lower)?, to_const(upper)?, to_const(step)?)
         else {
-            return false;
+            return Ok(false);
         };
 
         self.emit_load_const(ConstantData::Slice {
             elements: Box::new([start, stop, step_val]),
         });
-        true
+        Ok(true)
     }

Also applies to: 8992-9030

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/codegen/src/compile.rs` around lines 7577 - 7581,
try_fold_constant_slice currently only recognizes numbers, booleans and None and
limits integer folding to i64, which is narrower than
ExprExt::is_constant_slice; update try_fold_constant_slice to accept the same
constant kinds that ExprExt::is_constant_slice treats constant (include
string/bytes/ellipsis bounds and arbitrary-size integers, not just i64) so that
patterns like obj["a":"z"] and large integer literals are folded to
LOAD_CONST(slice) and hit the BINARY_SLICE fast path; make the same expansion in
the adjacent constant-folding block that has the i64 restriction so both places
have parity with ExprExt::is_constant_slice.

Comment on lines +854 to 887
Tuple {
elements: Vec<ConstantData>,
},
Integer {
value: BigInt,
},
Float {
value: f64,
},
Complex {
value: Complex64,
},
Boolean {
value: bool,
},
Str {
value: Wtf8Buf,
},
Bytes {
value: Vec<u8>,
},
Code {
code: Box<CodeObject>,
},
/// Constant slice(start, stop, step)
Slice {
elements: Box<[ConstantData; 3]>,
},
Frozenset {
elements: Vec<ConstantData>,
},
None,
Ellipsis,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if there are other match statements on ConstantData that may need updating
rg -n "match.*ConstantData" --type=rust -A10 | head -100

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

cat crates/jit/tests/common.rs

Repository: RustPython/RustPython

Length of output: 14312


🏁 Script executed:

rg "ConstantData::" --type=rust -B2 -A5 | head -150

Repository: RustPython/RustPython

Length of output: 9976


🏁 Script executed:

fd . crates/jit/tests --type f -name "*.rs"

Repository: RustPython/RustPython

Length of output: 295


🏁 Script executed:

rg "match.*\{" --type=rust -B3 -A15 | grep -A15 "ConstantData"

Repository: RustPython/RustPython

Length of output: 41022


🏁 Script executed:

rg "Slice\s*\{|Frozenset\s*\{" --type=rust -B2 -A2

Repository: RustPython/RustPython

Length of output: 16928


🏁 Script executed:

fd . crates/jit/tests --type f -name "*.rs" -exec grep -l "slice\|frozenset" {} \;

Repository: RustPython/RustPython

Length of output: 295


Update crates/jit/tests/common.rs to handle ConstantData::Slice and ConstantData::Frozenset variants.

The From<ConstantData> for StackValue implementation uses a wildcard match arm that will panic at runtime if these new constant types are encountered. Since the compiler actively generates ConstantData::Slice constants, any JIT test using slice expressions will fail with an unimplemented! panic. Add handling for both variants in the match statement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 854 - 887, The
From<ConstantData> for StackValue implementation currently uses a wildcard arm
and will panic for the new ConstantData::Slice and ConstantData::Frozenset
variants; update the match in From<ConstantData> for StackValue to explicitly
handle ConstantData::Slice (map it to an appropriate StackValue variant
representing a slice object, constructing start/stop/step from the elements
array) and ConstantData::Frozenset (convert elements Vec<ConstantData> into the
corresponding StackValue frozenset representation), ensuring you pattern-match
on ConstantData::Slice and ConstantData::Frozenset instead of relying on the
unimplemented wildcard arm.

Comment on lines +471 to 478
fn make_frozenset(&self, it: impl Iterator<Item = Self::Value>) -> Result<Self::Value> {
let elements: Vec<Self::Value> = it.collect();
Ok(
self.make_constant::<Bag::Constant>(BorrowedConstant::Frozenset {
elements: &elements,
}),
)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n crates/compiler-core/src/marshal.rs | sed -n '460,490p'

Repository: RustPython/RustPython

Length of output: 1125


🏁 Script executed:

# Search for context around frozenset handling
rg -A 20 -B 5 "make_frozenset" crates/compiler-core/src/marshal.rs

Repository: RustPython/RustPython

Length of output: 2488


🏁 Script executed:

# Check if there's decoding logic that calls make_frozenset
rg -B 10 "TYPE_FROZENSET|frozenset" crates/compiler-core/src/marshal.rs | head -100

Repository: RustPython/RustPython

Length of output: 1368


🏁 Script executed:

# Find BorrowedConstant definition and Frozenset variant
rg -A 3 "enum BorrowedConstant|Frozenset {" crates/compiler-core/src/marshal.rs | head -50

Repository: RustPython/RustPython

Length of output: 194


🏁 Script executed:

# Search for BorrowedConstant definition in related files
fd -e rs | xargs rg "BorrowedConstant" | grep -E "(enum|struct)" | head -20

Repository: RustPython/RustPython

Length of output: 1031


🏁 Script executed:

# Check what makes up a constant and how frozensets are handled in constant bag
rg -B 5 -A 10 "struct.*Constant|enum.*Constant" crates/compiler-core/src/ | grep -A 10 -B 5 frozenset

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Find the BorrowedConstant enum definition
rg -A 40 "pub enum BorrowedConstant" crates/compiler-core/src/

Repository: RustPython/RustPython

Length of output: 3104


🏁 Script executed:

# Check if there's any deduplication logic in set/frozenset handling
rg -i "dedup\|duplicate\|hashset" crates/compiler-core/src/marshal.rs

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Look at the make_constant implementation to understand how frozensets are stored
rg -B 5 -A 15 "fn make_constant" crates/compiler-core/src/marshal.rs

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Search for the make_constant method implementation more broadly
fd -e rs -path "*/compiler-core/src/*" | xargs rg -A 10 "fn make_constant"

Repository: RustPython/RustPython

Length of output: 2908


🏁 Script executed:

# Look at how frozensets are used in the codebase
rg "Frozenset" crates/compiler-core/src/ -A 2 -B 2

Repository: RustPython/RustPython

Length of output: 4297


🏁 Script executed:

# Check if there's a ConstantBag trait that make_constant is part of
rg -B 5 -A 20 "trait ConstantBag" crates/compiler-core/src/

Repository: RustPython/RustPython

Length of output: 1662


🏁 Script executed:

# Check the marshal deserialization code for frozenset more carefully
rg -B 10 -A 5 "Type::FrozenSet" crates/compiler-core/src/marshal.rs

Repository: RustPython/RustPython

Length of output: 1290


🏁 Script executed:

# Look at how frozensets are handled in VM when executing constants
rg "Frozenset" crates/vm/src/ -A 3 -B 3 | head -80

Repository: RustPython/RustPython

Length of output: 3098


🏁 Script executed:

# Check if there are any tests for frozenset marshaling
fd -e rs -path "*/tests/*" -o -path "*test*" | xargs rg -l "frozenset\|Frozenset" | head -10

Repository: RustPython/RustPython

Length of output: 233


🏁 Script executed:

# Find test files for marshal
find crates -name "*test*" -o -name "*tests*" | grep -E "\.rs$" | xargs rg -l "frozenset\|Frozenset" 2>/dev/null | head -10

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Check the VM's marshal implementation for frozenset handling
rg -B 5 -A 20 "PyMarshalBag\|impl.*MarshalBag" crates/vm/src/stdlib/marshal.rs | head -100

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Look for how frozenset constants are created in the VM context
rg -B 3 -A 10 "make_frozenset" crates/vm/src/stdlib/marshal.rs

Repository: RustPython/RustPython

Length of output: 501


🏁 Script executed:

# Find PyFrozenSet::from_iter implementation
rg -B 5 -A 20 "impl.*PyFrozenSet|fn from_iter" crates/vm/src/builtins/frozenset.rs | head -150

Repository: RustPython/RustPython

Length of output: 140


🏁 Script executed:

# Look at PyFrozenSet implementation
cat crates/vm/src/builtins/frozenset.rs | head -150

Repository: RustPython/RustPython

Length of output: 132


🏁 Script executed:

# Find PyFrozenSet definition
fd -e rs | xargs rg "struct PyFrozenSet|impl PyFrozenSet" | head -20

Repository: RustPython/RustPython

Length of output: 335


🏁 Script executed:

# Search for frozenset implementation in builtins
fd -e rs crates/vm/src/builtins/ | xargs rg -l "PyFrozenSet\|frozenset"

Repository: RustPython/RustPython

Length of output: 464


🏁 Script executed:

# Check PyFrozenSet implementation and from_iter method
rg -B 5 -A 30 "impl PyFrozenSet" crates/vm/src/builtins/set.rs | head -200

Repository: RustPython/RustPython

Length of output: 1897


🏁 Script executed:

# Search for from_iter implementation
rg -B 3 -A 15 "from_iter" crates/vm/src/builtins/set.rs

Repository: RustPython/RustPython

Length of output: 3361


Deduplicate frozenset elements before constructing the constant.

When deserializing TYPE_FROZENSET, the marshal code collects items directly into a Vec without deduplication, preserving duplicates from the marshal data. CPython's equivalent (PySet_Add in marshal.c) automatically deduplicates. This means crafted marshal data with duplicate elements will produce a different constant representation in the bytecode than CPython does, affecting constant equality and hashing. Normalize the elements before passing to make_constant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/marshal.rs` around lines 471 - 478, The
make_frozenset implementation collects elements into a Vec and passes them
straight to make_constant, which preserves duplicates; update make_frozenset to
deduplicate elements before constructing the BorrowedConstant::Frozenset so its
behavior matches CPython. Concretely, in make_frozenset use a set (e.g., HashSet
or BTreeSet) to track seen values and build a new Vec of unique elements
(preserving a deterministic order such as first-seen or sorted if needed), then
call self.make_constant::<Bag::Constant>(BorrowedConstant::Frozenset { elements:
&unique_elements }) instead of passing the original elements Vec. Ensure you
reference the existing function make_frozenset and the constant constructor
BorrowedConstant::Frozenset when applying the change.

@youknowone youknowone merged commit 3706c53 into RustPython:main Mar 30, 2026
15 checks passed
@youknowone youknowone deleted the bytecode-parity branch March 30, 2026 03:52
@coderabbitai coderabbitai bot mentioned this pull request Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant