Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Feb 8, 2026

Add source_path: PyMutex<&'static PyStrInterned> field to PyCode for interior mutability, replacing the UB-inducing #[allow(invalid_reference_casting)] + write_volatile pattern in update_code_filenames. Update all 12 read sites across the codebase to use the new source_path() method.

Summary by CodeRabbit

  • Refactor
    • Improved thread-safety and safer handling of code object source paths across the runtime by replacing direct internal field access and in-place mutations with controlled accessors and setters, reducing unsafe mutations and standardizing path retrieval in traceback, import, and warning/reporting flows.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 8, 2026

📝 Walkthrough

Walkthrough

Encapsulate PyCode.source_path behind a PyMutex with getter/setter and replace direct field accesses across the VM to use the new methods; PyCode construction updated to use PyCode::new.

Changes

Cohort / File(s) Summary
PyCode core
crates/vm/src/builtins/code.rs
Added PyMutex-wrapped source_path, PyCode::new, and source_path() / set_source_path() accessors; updated repr, co_filename, and ReplaceArgs usage to call the new accessor/setter.
Traceback & error reporting
crates/vm/src/builtins/traceback.rs, crates/vm/src/exceptions.rs, crates/vm/src/frame.rs, crates/stdlib/src/faulthandler.rs
Replaced direct source_path field access with source_path() calls when serializing or printing filenames in tracebacks and faulthandler output.
Import & stdlib updates
crates/vm/src/import.rs, crates/vm/src/stdlib/imp.rs, crates/vm/src/stdlib/io.rs
Switched from field access to getter for __file__/filename handling; in imp.rs replaced unsafe direct mutation with set_source_path(new_name) and adjusted update traversal to use the public API.
VM construction & context
crates/vm/src/vm/context.rs
Changed PyCode instantiation from literal struct to PyCode::new(code) in Context::new_code.
Warnings / runtime setup
crates/vm/src/warn.rs
Replaced direct source_path field access with source_path() when setting up filenames for warnings.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh

Poem

🐰 A tiny mutex hugs the path so neat,
No tumble, no race, no hurried feet,
Getters whisper where raw fields slept,
Safe hops through code the rabbit kept. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.42% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarizes the main objective of the PR: replacing unsafe pointer casting with PyMutex for interior mutability of PyCode's source_path field.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review February 9, 2026 00:26
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/vm/src/builtins/code.rs`:
- Around line 337-351: Replace direct reads of the CodeObject field
"frame.code.source_path" with the PyCode accessor "frame.code.source_path()" so
the mutex-protected value from PyCode::source_path is used; search for
occurrences of "frame.code.source_path" in faulthandler.rs and update them to
call the accessor (leveraging the existing PyCode::source_path and
set_source_path methods).

Add `source_path: PyMutex<&'static PyStrInterned>` field to
`PyCode` for interior mutability, replacing the UB-inducing
`#[allow(invalid_reference_casting)]` + `write_volatile` pattern
in `update_code_filenames`. Update all read sites across the
codebase to use the new `source_path()` accessor method.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/faulthandler.rs`:
- Around line 254-256: dump_frame_from_raw is called from the signal handler
faulthandler_fatal_error but calls frame.code.source_path() which acquires a
PyMutex and can deadlock; make this signal-safe by avoiding any mutex
acquisition: either add a signal-safe cached string pointer on PyCode (e.g., an
atomic/AtomicPtr holding a nul-terminated C string updated outside signal
handlers and read here) and read that in dump_frame_from_raw, or replace the
call to source_path() with a non-blocking attempt (call try_lock() on the
PyCode/source_path mutex and if it fails print a fallback like "<unknown>" or
the raw pointer/address), updating references to Frame/ PyCode/ source_path
accordingly so dump_frame_from_raw never blocks in a signal context.

Comment on lines 254 to 256
fn dump_frame_from_raw(fd: i32, frame: &Frame) {
let filename = frame.code.source_path.as_str();
let filename = frame.code.source_path().as_str();
let funcname = frame.code.obj_name.as_str();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find PyMutex definition
echo "=== PyMutex Definition ==="
rg -n 'type PyMutex|struct PyMutex' --type rust -A3 -B1

echo ""
echo "=== source_path method signature and impl ==="
# Search for source_path method definition
rg -n 'fn source_path' --type rust -A8

echo ""
echo "=== Check signal handler context ==="
# Find faulthandler_fatal_error to confirm it's a signal handler
rg -n 'faulthandler_fatal_error|extern "C"' crates/stdlib/src/faulthandler.rs -B2 -A5

Repository: RustPython/RustPython

Length of output: 3633


🏁 Script executed:

#!/bin/bash
# Let's check the specific file to understand the context better
echo "=== Checking faulthandler.rs structure ==="
cat -n crates/stdlib/src/faulthandler.rs | head -300 | tail -100

Repository: RustPython/RustPython

Length of output: 4032


🏁 Script executed:

#!/bin/bash
# Check what the Frame structure looks like and where source_path comes from
echo "=== Frame structure and code field ==="
rg -n 'struct Frame|pub struct Frame' --type rust -A10

echo ""
echo "=== PyCode structure related to source_path ==="
rg -n 'pub struct PyCode|source_path' --type rust -B2 -A8

Repository: RustPython/RustPython

Length of output: 48965


source_path() acquires a mutex inside a signal handler — potential deadlock.

dump_frame_from_raw (line 254) is reachable from faulthandler_fatal_error, which is a signal handler. Calling source_path() at line 255 locks a PyMutex. If the signal fires while the same PyCode's source_path mutex is held elsewhere, this causes a deadlock.

The comment on line 252 explicitly documents this function as "(signal-safe)", which is now violated. Consider either:

  • Storing a separate signal-safe copy (e.g., AtomicPtr or atomic reference) for signal-handler reads, or
  • Using try_lock() with a fallback (e.g., print "<unknown>" if the lock is contended).
🤖 Prompt for AI Agents
In `@crates/stdlib/src/faulthandler.rs` around lines 254 - 256,
dump_frame_from_raw is called from the signal handler faulthandler_fatal_error
but calls frame.code.source_path() which acquires a PyMutex and can deadlock;
make this signal-safe by avoiding any mutex acquisition: either add a
signal-safe cached string pointer on PyCode (e.g., an atomic/AtomicPtr holding a
nul-terminated C string updated outside signal handlers and read here) and read
that in dump_frame_from_raw, or replace the call to source_path() with a
non-blocking attempt (call try_lock() on the PyCode/source_path mutex and if it
fails print a fallback like "<unknown>" or the raw pointer/address), updating
references to Frame/ PyCode/ source_path accordingly so dump_frame_from_raw
never blocks in a signal context.

@youknowone youknowone merged commit dc2d235 into RustPython:main Feb 9, 2026
13 checks passed
@youknowone youknowone deleted the pycode-path branch February 9, 2026 05:01
@jaens
Copy link

jaens commented Feb 9, 2026

@youknowone doesn't PyMutex add quite a bit of memory overhead? The mutex functionality is not really even used here, as there are no non-atomic operations done inside the lock...

Since the lifetime is static anyway, I think something like atomic_ref should work (or AtomicPtr from std, though that would still require a bit of unsafe).

What the update code does is actually an atomic compare and exchange, I think.

@youknowone
Copy link
Member Author

@jaens Thanks! That sounds right

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants