-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Add GC infrastructure: object tracking, tp_clear, and helper methods #6994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughAdds GC-aware finalization and weakref-clearing APIs on PyObject, changes weakref clearing to collect callbacks for deferred invocation, and updates GC state to enforce enablement and reset gen0 counters; also adds a spell-check dictionary token. Changes
Sequence Diagram(s)sequenceDiagram
participant GC as Garbage Collector
participant Obj as PyObject
participant WRL as WeakRefList
participant CB as Callback
GC->>Obj: gc_clear_weakrefs_collect_callbacks()
Obj->>WRL: clear_for_gc_collect_callbacks()
WRL->>WRL: Unlink weakrefs, collect (weakref, callback) pairs
WRL-->>Obj: return pairs
Obj-->>GC: Vec<(PyWeak, PyObjectRef)>
GC->>Obj: try_call_finalizer()
Obj->>CB: invoke __del__
CB-->>Obj: returns (object may be resurrected)
Obj-->>GC: finalizer result
GC->>Obj: gc_clear()
Obj->>Obj: tp_clear / clear member slots (extract refs)
Obj-->>GC: extracted references
GC->>CB: invoke deferred callbacks (outside locks)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
a1a8488 to
2766770
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@crates/vm/src/object/core.rs`:
- Around line 1151-1170: In try_call_finalizer, guard against
double-finalization by checking the FINALIZED flag before marking and invoking
__del__: if self.is_gc_finalized() return early; otherwise call
self.set_gc_finalized() and proceed to call the del slot (class().slots.del) and
handle errors with vm.run_unraisable as before; this ensures repeated calls to
try_call_finalizer won't invoke __del__ twice.
- Around line 1217-1224: In gc_clear_raw, the slot-clearing loop uses
non-blocking slot.try_write() which may skip clearing slots; replace try_write()
with the blocking write() (or otherwise unwrap/panic on failure) when iterating
obj.0.slots so each slot is deterministically cleared (matching the set_slot
pattern); ensure you still take() the guard and push the value into result, and
keep the exclusive-access assumption from gc_clear_raw intact.
Object tracking (CPython _PyObject_GC_TRACK/UNTRACK equivalent): - PyRef::new_ref() now calls gc_state().track_object() for traceable objects - track_object() sets GcBits::TRACKED flag via set_gc_tracked() - Static types call clear_gc_tracked() since they are immortal - is_gc_tracked() now checks the TRACKED bit flag tp_clear infrastructure for breaking reference cycles: - Add clear_for_gc_collect_callbacks() to WeakRefList (CPython: handle_weakrefs) - Add try_call_finalizer() for GC to call __del__ (CPython: PyObject_CallFinalizerFromDealloc) - Add gc_clear_raw()/gc_clear() for tp_clear (CPython: subtype_clear) - Add gc_get_referent_ptrs() for raw pointer traversal - Add gc_has_clear() to check clear capability GC state improvements: - maybe_collect() now checks gen0 threshold (CPython: _PyObject_GC_Alloc) - collect()/collect_force() reset gen0 count WeakRefList::clear() cleanup: - Drop lock before invoking callbacks to avoid deadlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/vm/src/object/core.rs`:
- Around line 1210-1247: gc_clear_raw currently clears vtable-held refs and
slots but never clears the instance dict even though gc_has_clear() returns true
when self.0.dict.is_some(); add a dict-clearing step in gc_clear_raw that takes
the dict out of obj.0.dict (e.g. use take() or replace(None) on obj.0.dict),
push the removed dict into the result Vec as a PyObjectRef, and thereby break
cycles involving instance dict entries; alternatively, if dicts should not be
cleared here, remove the dict.is_some() check from gc_has_clear() instead.
🧹 Nitpick comments (1)
crates/vm/src/object/core.rs (1)
437-479: Consider extracting shared logic withclear()to reduce duplication.The body of
clear_for_gc_collect_callbacksis nearly identical toclear()(lines 389-425). Both iterate the list, mark weakrefs as dead, unlink nodes, and collect callbacks. The only difference is whether callbacks are invoked immediately or returned.A helper method could consolidate the common traversal logic:
fn clear_and_collect_callbacks(&self, obj: &PyObject) -> Vec<(PyRef<PyWeak>, PyObjectRef)> { // ... shared traversal logic ... }Then
clear()would call this helper and invoke callbacks, whileclear_for_gc_collect_callbacks()would just return them.This is optional since the current implementation is correct and maintainable.
| pub unsafe fn gc_clear_raw(ptr: *mut PyObject) -> Vec<PyObjectRef> { | ||
| let mut result = Vec::new(); | ||
| let obj = unsafe { &*ptr }; | ||
|
|
||
| // 1. Clear payload-specific references (vtable.clear / tp_clear) | ||
| if let Some(clear_fn) = obj.0.vtable.clear { | ||
| unsafe { clear_fn(ptr, &mut result) }; | ||
| } | ||
|
|
||
| // 2. Clear member slots (subtype_clear) | ||
| for slot in obj.0.slots.iter() { | ||
| if let Some(val) = slot.write().take() { | ||
| result.push(val); | ||
| } | ||
| } | ||
|
|
||
| result | ||
| } | ||
|
|
||
| /// Clear this object for cycle breaking (tp_clear). | ||
| /// This version takes &self but should only be called during GC | ||
| /// when exclusive access is guaranteed. | ||
| /// | ||
| /// # Safety | ||
| /// - The caller must guarantee exclusive access (no other references exist) | ||
| /// - This is only safe during GC when the object is unreachable | ||
| pub unsafe fn gc_clear(&self) -> Vec<PyObjectRef> { | ||
| // SAFETY: During GC collection, this object is unreachable (gc_refs == 0), | ||
| // meaning no other code has a reference to it. The only references are | ||
| // internal cycle references which we're about to break. | ||
| unsafe { Self::gc_clear_raw(self as *const _ as *mut PyObject) } | ||
| } | ||
|
|
||
| /// Check if this object has clear capability (tp_clear) | ||
| // Py_TPFLAGS_HAVE_GC types have tp_clear | ||
| pub fn gc_has_clear(&self) -> bool { | ||
| self.0.vtable.clear.is_some() || self.0.dict.is_some() || !self.0.slots.is_empty() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if dict is traversed (would be counted in GC) but not cleared
# Look for Traverse implementation on PyInner or dict handling
echo "=== Checking Traverse implementation for dict ==="
rg -n "impl.*Traverse.*for.*PyInner" crates/vm/src/ -A 20
echo ""
echo "=== Checking if InstanceDict implements clear ==="
rg -n "InstanceDict" crates/vm/src/object/ -A 5 -B 2
echo ""
echo "=== Checking CPython-style subtype_clear patterns ==="
rg -n "subtype_clear|tp_clear.*dict" crates/vm/src/Repository: RustPython/RustPython
Length of output: 7211
🏁 Script executed:
#!/bin/bash
# Need to check if PyDict type itself has GC clearing
echo "=== Check PyDict type and its clear/tp_clear implementation ==="
rg -n "PyDict|tp_clear" crates/vm/src/builtins/dict.rs -A 3 -B 1 | head -100
echo ""
echo "=== Search for any dict.clear in gc_clear context ==="
rg -n "gc_clear" crates/vm/src/object/core.rs -B 2 -A 10
echo ""
echo "=== Check if dict entries are cleared as part of PyDict's vtable.clear ==="
rg -n "vtable.*clear" crates/vm/src/ -A 5 | grep -A 5 "dict"
echo ""
echo "=== Check try_clear_obj function to see how clearing works ==="
rg -n "try_clear_obj" crates/vm/src/ -A 15Repository: RustPython/RustPython
Length of output: 9356
🏁 Script executed:
#!/bin/bash
# Check if PyDict implements try_clear or MaybeClear
echo "=== Search for try_clear impl for PyDict ==="
rg -n "impl.*try_clear.*PyDict|impl.*MaybeClear.*PyDict" crates/vm/src/ -A 10
echo ""
echo "=== Check MaybeClear trait definition ==="
rg -n "trait MaybeClear|fn try_clear" crates/vm/src/object/core.rs -A 8
echo ""
echo "=== Verify: does PyDict instance have GC clearing? Search for PyPayload impl ==="
rg -n "impl PyPayload for PyDict" crates/vm/src/ -A 20
echo ""
echo "=== Check what happens when an instance dict field exists ==="
rg -n "instance.*dict|dict.*instance" crates/vm/src/object/core.rs -B 2 -A 2 | head -60Repository: RustPython/RustPython
Length of output: 3017
🏁 Script executed:
#!/bin/bash
# Check default try_clear implementation
echo "=== Search for default try_clear implementation ==="
rg -n "fn try_clear|impl.*try_clear" crates/vm/src/object/core.rs -B 2 -A 5
echo ""
echo "=== Search for PyPayload trait try_clear method ==="
rg -n "trait PyPayload" crates/vm/src/object/payload.rs -A 30
echo ""
echo "=== Verify dict is not cleared by looking at gc_clear_raw in detail ==="
sed -n '1210,1230p' crates/vm/src/object/core.rs
echo ""
echo "=== Check if there's any dict clearing elsewhere in gc flow ==="
rg -n "dict.*clear\|clear.*dict" crates/vm/src/gc.rsRepository: RustPython/RustPython
Length of output: 2410
🏁 Script executed:
#!/bin/bash
# Find try_clear trait definition more broadly
echo "=== Search for try_clear trait definition ==="
rg -n "try_clear" crates/vm/src/object/ -B 3 -A 3 | head -80
echo ""
echo "=== Find GC collection code ==="
fd "*.rs" crates/vm/src/ | xargs grep -l "gc_clear\|GC\|garbage" | head -10
echo ""
echo "=== Search for where gc_clear_raw is called ==="
rg -n "gc_clear_raw\|gc_clear\(" crates/vm/src/ -B 2 -A 2Repository: RustPython/RustPython
Length of output: 2770
🏁 Script executed:
#!/bin/bash
# Search for where gc_clear_raw is called with proper escaping
echo "=== Search for where gc_clear_raw is called ==="
rg "gc_clear_raw|gc_clear" crates/vm/src/ -B 1 -A 1
echo ""
echo "=== Check HAS_CLEAR definition ==="
rg "HAS_CLEAR" crates/vm/src/object/ -B 3 -A 3
echo ""
echo "=== Check if MaybeClear trait has HAS_CLEAR ==="
rg "trait MaybeClear|HAS_CLEAR" crates/vm/src/object/ -B 2 -A 5Repository: RustPython/RustPython
Length of output: 3657
🏁 Script executed:
#!/bin/bash
# Search more specifically for where gc_clear is called in the codebase
echo "=== Search for calls to gc_clear (not definitions) ==="
rg "\.gc_clear\(|gc_clear_raw\(" crates/vm/src/ -B 2 -A 2
echo ""
echo "=== Search in all files for gc_clear usage ==="
git ls-files crates/vm/src/ | xargs grep -l "gc_clear" | head -20
echo ""
echo "=== Check the actual file that calls gc_clear ==="
git ls-files crates/vm/src/ | xargs grep "\.gc_clear\(" -B 3 -A 3Repository: RustPython/RustPython
Length of output: 1099
🏁 Script executed:
#!/bin/bash
# Fix the grep search
echo "=== Search for calls to gc_clear ==="
rg "gc_clear" crates/vm/src/ | grep -v "pub unsafe fn\|pub fn\|Clear this object\|SAFETY"
echo ""
echo "=== Check what files call gc_clear_raw or gc_clear ==="
rg "gc_clear" crates/ --type rust | grep -v "pub unsafe\|pub fn\|//\|Clear this"
echo ""
echo "=== Look for GC-related module ==="
fd "gc\|collect" crates/vm/src/ --type f
echo ""
echo "=== Check imports of gc_clear in traverse_object or related files ==="
rg "use.*gc_clear|from.*gc_clear" crates/vm/src/Repository: RustPython/RustPython
Length of output: 468
🏁 Script executed:
#!/bin/bash
# Search for GC collection, garbage collection, or cycle-related code
echo "=== Search for garbage collection or cycle handling ==="
rg "garbage|collection|cycle|dealloc" crates/vm/src/ --type rust | head -50
echo ""
echo "=== Check if gc_clear is public API that's exported ==="
rg "pub.*gc_clear|pub use.*gc_clear" crates/vm/src/
echo ""
echo "=== Search broader for deallocation and cleanup ==="
rg "dealloc|__del__|finalize|cleanup" crates/vm/src/object/ -B 1 -A 3 | head -60
echo ""
echo "=== Check what gc_has_clear is used for ==="
rg "gc_has_clear" crates/vm/src/ -B 2 -A 2Repository: RustPython/RustPython
Length of output: 9664
🏁 Script executed:
#!/bin/bash
# Check default_dealloc implementation fully
echo "=== Full default_dealloc implementation ==="
rg "pub\(super\) unsafe fn default_dealloc" crates/vm/src/object/core.rs -A 30
echo ""
echo "=== Check drop_slow_inner implementation ==="
rg "fn drop_slow_inner" crates/vm/src/object/core.rs -A 20
echo ""
echo "=== Check stdlib/gc.rs for actual GC collection logic ==="
head -200 crates/vm/src/stdlib/gc.rs | tail -100Repository: RustPython/RustPython
Length of output: 5769
🏁 Script executed:
#!/bin/bash
# Now check if gc_has_clear is actually used for GC collection decisions
echo "=== Full gc.rs file to understand GC collection ==="
cat crates/vm/src/stdlib/gc.rs
echo ""
echo "=== Search for gc_has_clear usage ==="
rg "gc_has_clear" crates/ -B 3 -A 3Repository: RustPython/RustPython
Length of output: 8982
🏁 Script executed:
#!/bin/bash
# Search for all usages of gc_has_clear across the entire codebase
echo "=== Search ALL usages of gc_has_clear ==="
rg "gc_has_clear" crates/ -r '$0' | head -20
echo ""
echo "=== Check gc_state module for actual collection logic ==="
fd "gc_state" crates/ --type f
echo ""
echo "=== Now let's verify: does dict actually participate in cycle detection? ==="
echo "Looking at how dict entries are part of GC traversal..."
rg "impl.*Traverse.*for.*InstanceDict|impl.*Traverse.*for.*PyDict" crates/vm/src/ -A 5Repository: RustPython/RustPython
Length of output: 1163
🏁 Script executed:
#!/bin/bash
# Get the complete clear implementation for PyDict
echo "=== Full clear() implementation for PyDict ==="
rg "impl Traverse for PyDict" crates/vm/src/builtins/dict.rs -A 20
echo ""
echo "=== Verify: does PyDict set HAS_CLEAR? ==="
rg "impl.*PyPayload.*for.*PyDict" crates/vm/src/builtins/dict.rs -A 10
echo ""
echo "=== Check: MaybeTraverse implementation for PyDict ==="
rg "impl.*MaybeTraverse.*for.*PyDict|HAS_TRAVERSE.*PyDict\|HAS_CLEAR.*PyDict" crates/vm/src/ -B 5 -A 10Repository: RustPython/RustPython
Length of output: 1108
🏁 Script executed:
#!/bin/bash
# Check if PyDict implements MaybeTraverse
echo "=== Check if PyDict has MaybeTraverse impl ==="
rg "impl.*MaybeTraverse.*for.*PyDict|impl.*Traverse.*for.*PyDict" crates/vm/src/builtins/dict.rs -B 2 -A 15
echo ""
echo "=== Check if Traverse impl for PyDict means HAS_CLEAR is true ==="
echo "Looking at how Traverse.clear maps to HAS_CLEAR..."
rg "Traverse.*clear\|clear.*Traverse" crates/vm/src/object/ -B 3 -A 3
echo ""
echo "=== Now the key question: when an object has dict field, is its clear() called? ==="
echo "Look at gc_clear_raw again and how it calls vtable.clear"
sed -n '1210,1230p' crates/vm/src/object/core.rs
echo ""
echo "=== Check if dict.clear() is invoked somewhere else ==="
rg "dict.*clear\(\)|\.clear\(\).*dict" crates/vm/src/object/ -B 2 -A 2Repository: RustPython/RustPython
Length of output: 1658
🏁 Script executed:
#!/bin/bash
# Let me trace through the logic one more time to be absolutely sure
echo "=== Confirm: instance dict field is NOT cleared in gc_clear_raw ==="
sed -n '1210,1230p' crates/vm/src/object/core.rs
echo ""
echo "=== Instance dict is a field in PyInner struct ==="
sed -n '160,170p' crates/vm/src/object/core.rs
echo ""
echo "=== InstanceDict has a clear capability? Check Traverse impl for InstanceDict ==="
rg "impl.*Traverse.*for.*InstanceDict" crates/vm/src/ -A 10
echo ""
echo "=== The KEY test: if an object instance has dict, should dict entries be cleared? ==="
echo "Let's check how the slots are cleared to understand the pattern..."
sed -n '1219,1230p' crates/vm/src/object/core.rs
echo ""
echo "=== Compare with default_dealloc to see if it clears dict ==="
sed -n '1140,1165p' crates/vm/src/object/core.rsRepository: RustPython/RustPython
Length of output: 4000
🏁 Script executed:
#!/bin/bash
# Final verification: check InstanceDict methods and implementation
echo "=== Full InstanceDict implementation ==="
sed -n '643,700p' crates/vm/src/object/core.rs
echo ""
echo "=== Check if InstanceDict has any clear-like method ==="
rg "impl.*InstanceDict" crates/vm/src/object/core.rs -A 30
echo ""
echo "=== Check PyDict drain_entries method ==="
rg "drain_entries" crates/vm/src/builtins/dict.rs -B 3 -A 10
echo ""
echo "=== Final confirmation: slots.iter() in gc_clear_raw ==="
echo "Slots are cleared by iterating and taking ownership. Dict is NOT."
echo "This confirms the bug: dict field is not cleared while slots are."Repository: RustPython/RustPython
Length of output: 3358
Instance dict entries are not cleared during GC cycle breaking, but gc_has_clear() indicates they should be.
gc_has_clear() returns true when self.0.dict.is_some() (line 1246), signaling that this object can clear its references. However, gc_clear_raw() only clears:
- Payload-specific references via
vtable.clear - Member slots (lines 1219–1228)
The instance dict field (self.0.dict) is never cleared. Since dict entries are traversed by GC (traverse_object.rs lines 68, 90), they participate in cycle detection; if they form cycles, those cycles will not be broken during GC collection.
Add dict clearing to gc_clear_raw:
Suggested fix
// 3. Clear instance dict
if let Some(ref instance_dict) = obj.0.dict {
let old_dict = instance_dict.replace(vm.ctx.new_dict());
result.push(old_dict.into());
}Or remove dict.is_some() from gc_has_clear() if dict clearing is intentionally deferred.
🤖 Prompt for AI Agents
In `@crates/vm/src/object/core.rs` around lines 1210 - 1247, gc_clear_raw
currently clears vtable-held refs and slots but never clears the instance dict
even though gc_has_clear() returns true when self.0.dict.is_some(); add a
dict-clearing step in gc_clear_raw that takes the dict out of obj.0.dict (e.g.
use take() or replace(None) on obj.0.dict), push the removed dict into the
result Vec as a PyObjectRef, and thereby break cycles involving instance dict
entries; alternatively, if dicts should not be cleared here, remove the
dict.is_some() check from gc_has_clear() instead.
Summary by CodeRabbit
Bug Fixes
Refactor
Chores