Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Dec 29, 2025

Summary by CodeRabbit

  • New Features

    • Added Unix statvfs/fstatvfs support for querying filesystem statistics.
    • RLock now exposes recursion depth tracking.
  • Bug Fixes

    • HMAC initialization now raises a specific error for unsupported digest modules to enable correct fallback behavior.
    • Memory-mapped buffers correctly reflect read-only semantics when opened in read mode.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 29, 2025

📝 Walkthrough

Walkthrough

Adds Unix statvfs support, refactors HMAC initialization to raise a dedicated UnsupportedDigestmodError, makes mmap buffer readonly when opened read-only, and adds an atomic recursion counter plus accessor to RLock.

Changes

Cohort / File(s) Summary
HMAC error handling and API refactor
crates/stdlib/src/hashlib.rs
Added public UnsupportedDigestmodError(PyValueError). Replaced NewHMACHashArgs fields (name/data -> key/msg, digestmod -> OptionalArg<PyObjectRef>). hmac_new now accepts NewHMACHashArgs and raises UnsupportedDigestmodError to signal fallback to pure-Python.
Mmap buffer semantics
crates/stdlib/src/mmap.rs
PyMmap::as_buffer computes readonly from zelf.access and uses it in BufferDescriptor::simple(...), making read-mode mmaps expose a readonly buffer.
Unix statvfs / fstatvfs
crates/vm/src/stdlib/os.rs
Added unix-gated constants ST_RDONLY, ST_NOSUID; new StatvfsResultData and PyStatvfsResult (pystruct_sequence) and slot constructor; added statvfs (alias fstatvfs) accepting path or fd and converting libc::statvfs -> StatvfsResultData.
RLock recursion tracking
crates/vm/src/stdlib/thread.rs
Added count: AtomicUsize to RLock, initialize/reset on fork, increment on acquire and decrement on release, and exposed _recursion_count() to return current recursion depth for the owning thread.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • ShaharNaveh

Poem

🐰 I hopped through bytes and syscalls near and far,
Unsupported digests sent me to raise the bar.
Mmaps stay gentle, read-only where they roam,
RLock counts each cuddle, returning safe to home. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title claims to fix subprocess/multiprocessing, but the actual changes add statvfs support, HMAC error handling, mmap buffer semantics, and RLock recursion tracking—none directly related to subprocess or multiprocessing. Update the title to accurately reflect the main changes, such as: 'Add statvfs support and improve error handling in hashlib, mmap, and threading'
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the support2 branch 4 times, most recently from 1bcc263 to be5631e Compare December 29, 2025 12:52
@youknowone youknowone marked this pull request as ready for review December 29, 2025 13:29
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
crates/vm/src/stdlib/thread.rs (1)

231-234: Potential underflow if release() is called more times than acquire().

The count is decremented before unlocking, but there's no guard against underflow. If release() is somehow called more times than acquire() succeeded (e.g., due to a bug elsewhere), fetch_sub(1) on 0 would wrap around.

While the is_locked() check at line 228 should prevent this in normal usage, consider adding a debug assertion or checking the count value for defensive coding:

🔎 Proposed defensive check
         if !self.mu.is_locked() {
             return Err(vm.new_runtime_error("release unlocked lock"));
         }
+        debug_assert!(
+            self.count.load(std::sync::atomic::Ordering::Relaxed) > 0,
+            "RLock count underflow"
+        );
         self.count
             .fetch_sub(1, std::sync::atomic::Ordering::Relaxed);
         unsafe { self.mu.unlock() };
crates/vm/src/stdlib/os.rs (2)

1750-1769: The f_fsid extraction approach is fragile and platform-dependent.

The pointer-casting approach to extract f_fsid works but is somewhat brittle:

  1. The std::slice::from_raw_parts call requires the pointer to be valid and properly aligned for reading bytes, which should be fine for u8.
  2. However, the byte-order interpretation via from_ne_bytes produces different values on big-endian vs little-endian systems.

Consider documenting this behavior or using a simpler approach if the exact value isn't critical:

🔎 Alternative: Document the platform-specific behavior
             let f_fsid = {
-                // Use pointer cast to get raw bytes from fsid struct
+                // f_fsid is a struct on some platforms (e.g., Linux fsid_t) and a scalar on others.
+                // We extract raw bytes and interpret as a native-endian integer.
+                // Note: The value may differ across architectures due to endianness.
                 let ptr = std::ptr::addr_of!(st.f_fsid) as *const u8;
                 let size = std::mem::size_of_val(&st.f_fsid);

1787-1804: Error handling could include the filename for better diagnostics.

When statvfs fails with a path argument, the error doesn't include the filename, making debugging harder. Consider using OSErrorBuilder::with_filename like other path-based functions in this file:

🔎 Proposed fix to include filename in error
     #[pyfunction]
     #[pyfunction(name = "fstatvfs")]
     fn statvfs(path: OsPathOrFd<'_>, vm: &VirtualMachine) -> PyResult {
         let mut st: libc::statvfs = unsafe { std::mem::zeroed() };
         let ret = match path {
             OsPathOrFd::Path(path) => {
-                let cpath = path.into_cstring(vm)?;
-                unsafe { libc::statvfs(cpath.as_ptr(), &mut st) }
+                let cpath = path.clone().into_cstring(vm)?;
+                let ret = unsafe { libc::statvfs(cpath.as_ptr(), &mut st) };
+                if ret != 0 {
+                    return Err(OSErrorBuilder::with_filename(&io::Error::last_os_error(), path, vm));
+                }
+                ret
+            }
+            OsPathOrFd::Fd(fd) => unsafe { libc::fstatvfs(fd.as_raw(), &mut st) },
+        };
+        if ret != 0 {
+            return Err(io::Error::last_os_error().into_pyexception(vm));
             }
-            OsPathOrFd::Fd(fd) => unsafe { libc::fstatvfs(fd.as_raw(), &mut st) },
-        };
-        if ret != 0 {
-            return Err(io::Error::last_os_error().into_pyexception(vm));
-        }
         Ok(StatvfsResultData::from_statvfs(st).to_pyobject(vm))
     }

Alternatively, a cleaner refactor matching the pattern used in stat:

     fn statvfs(path: OsPathOrFd<'_>, vm: &VirtualMachine) -> PyResult {
         let mut st: libc::statvfs = unsafe { std::mem::zeroed() };
-        let ret = match path {
-            OsPathOrFd::Path(path) => {
-                let cpath = path.into_cstring(vm)?;
+        let ret = match &path {
+            OsPathOrFd::Path(p) => {
+                let cpath = p.clone().into_cstring(vm)?;
                 unsafe { libc::statvfs(cpath.as_ptr(), &mut st) }
             }
             OsPathOrFd::Fd(fd) => unsafe { libc::fstatvfs(fd.as_raw(), &mut st) },
         };
         if ret != 0 {
-            return Err(io::Error::last_os_error().into_pyexception(vm));
+            return Err(OSErrorBuilder::with_filename(&io::Error::last_os_error(), path, vm));
         }
         Ok(StatvfsResultData::from_statvfs(st).to_pyobject(vm))
     }
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between feb5066 and be5631e.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_hmac.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • crates/stdlib/src/hashlib.rs
  • crates/stdlib/src/mmap.rs
  • crates/vm/src/stdlib/os.rs
  • crates/vm/src/stdlib/thread.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/mmap.rs
  • crates/vm/src/stdlib/os.rs
  • crates/stdlib/src/hashlib.rs
  • crates/vm/src/stdlib/thread.rs
🧠 Learnings (4)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/stdlib/src/mmap.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/os.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/os.rs
📚 Learning: 2025-12-27T14:03:49.011Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.011Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/stdlib/src/hashlib.rs
🧬 Code graph analysis (2)
crates/stdlib/src/mmap.rs (1)
crates/vm/src/protocol/buffer.rs (1)
  • simple (176-184)
crates/stdlib/src/hashlib.rs (1)
crates/vm/src/class.rs (1)
  • static_type (71-79)
🔇 Additional comments (9)
crates/stdlib/src/mmap.rs (1)

654-657: Consider whether AccessMode::Copy should also produce a readonly buffer.

The current logic only sets readonly = true for AccessMode::Read. However, AccessMode::Copy creates a copy-on-write mapping where writes don't affect the underlying file. In CPython, ACCESS_COPY mmaps are still considered writable buffers since the user can modify the in-memory copy.

The implementation looks correct for matching CPython behavior, where only ACCESS_READ produces a truly readonly buffer.

crates/vm/src/stdlib/thread.rs (3)

258-265: LGTM!

The _recursion_count method correctly returns the acquisition depth when owned by the current thread, and 0 otherwise. This matches CPython's RLock._recursion_count() behavior and is useful for debugging and introspection.


218-223: LGTM!

The count is correctly incremented only when acquisition succeeds. The Relaxed ordering is appropriate here since the mutex operations provide the necessary synchronization.


244-244: LGTM!

Resetting the count to 0 during _at_fork_reinit is correct—after a fork, the child process should see a clean lock state.

crates/stdlib/src/hashlib.rs (2)

25-29: LGTM!

The UnsupportedDigestmodError exception follows the RustPython macro pattern for custom exceptions. Deriving from PyValueError is appropriate, and the #[repr(transparent)] ensures correct memory layout. This enables Python's hmac module to catch this exception and fall back to a pure-Python HMAC implementation.

Based on learnings, the macro system (pyexception) is correctly used for implementing Python functionality in Rust.


343-361: No changes needed—exception is properly initialized.

The #[pyattr] decorator on UnsupportedDigestmodError ensures it is automatically registered as a module attribute when the _hashlib module loads. This is the standard RustPython pattern, consistent with other exception definitions throughout the codebase (e.g., in crates/stdlib/src/ssl/error.rs). The static_type() call in hmac_new() will not panic because the exception is guaranteed to be initialized before any module functions can be called.

crates/vm/src/stdlib/os.rs (3)

205-212: LGTM!

The ST_RDONLY and ST_NOSUID constants are correctly guarded with #[cfg(all(unix, not(target_os = "redox")))], following the platform-specific constant pattern used throughout the codebase. Based on retrieved learnings, this matches CPython's approach of optional declaration for platform-specific constants.


1713-1731: LGTM!

The StatvfsResultData struct correctly mirrors Python's os.statvfs_result named tuple. The #[pystruct_sequence(skip)] annotation on f_fsid ensures it's accessible as an attribute but not included in the tuple indices, matching CPython's behavior.


1740-1746: LGTM!

The slot_new implementation follows the standard pattern for struct sequences, enabling Python-side construction of statvfs_result objects.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/vm/src/stdlib/thread.rs (1)

193-193: Public field allows external modification.

The count field is public, which means external code could directly access and modify it, potentially breaking the recursion depth invariant. Consider making this field private or add documentation warning against direct modification.

🔎 Suggested refactor to make the field private
 struct RLock {
     mu: RawRMutex,
-    count: std::sync::atomic::AtomicUsize,
+    pub(crate) count: std::sync::atomic::AtomicUsize,
 }
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be5631e and def1fb9.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_hmac.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • crates/stdlib/src/hashlib.rs
  • crates/stdlib/src/mmap.rs
  • crates/vm/src/stdlib/os.rs
  • crates/vm/src/stdlib/thread.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/stdlib/src/mmap.rs
  • crates/vm/src/stdlib/os.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/hashlib.rs
  • crates/vm/src/stdlib/thread.rs
🧠 Learnings (1)
📚 Learning: 2025-12-27T14:03:49.011Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.011Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/stdlib/src/hashlib.rs
🧬 Code graph analysis (1)
crates/stdlib/src/hashlib.rs (1)
crates/vm/src/class.rs (1)
  • static_type (71-79)
🔇 Additional comments (6)
crates/vm/src/stdlib/thread.rs (3)

218-223: Verify memory ordering is sufficient.

The use of Ordering::Relaxed for the atomic counter operations should be safe here because the underlying RawRMutex provides the necessary synchronization barriers. However, it's worth confirming that this is the intended behavior and that no stronger ordering guarantees are needed.

The mutex's lock/unlock operations should provide sufficient memory barriers, making Relaxed ordering appropriate for this counter. Can you confirm this is the intended design?


231-236: Verify assertion placement and release order.

The debug_assert! checks that count > 0 before decrementing, which is correct. However, note that the decrement happens before the actual unlock() call. This ordering appears correct since we want to update our bookkeeping before releasing the lock, but ensure this is intentional.


262-269: LGTM: Correctly checks ownership before returning count.

The _recursion_count method correctly returns 0 when the current thread doesn't own the lock, and returns the actual count only when owned. This prevents exposing recursion depth of other threads.

crates/stdlib/src/hashlib.rs (3)

25-29: LGTM: Exception type properly defined.

The UnsupportedDigestmodError exception is correctly defined using the pyexception macro, inheriting from PyValueError. This follows Rust best practices for the macro system.

Based on coding guidelines, this follows the pattern of using the macro system when implementing Python functionality in Rust.


345-352: LGTM: HMAC parameter structure updated appropriately.

The refactored NewHMACHashArgs correctly reflects standard HMAC parameters:

  • key: the cryptographic key (required)
  • msg: optional message to hash
  • digestmod: optional digest algorithm

Field renames improve clarity over the previous name/data naming.


355-362: Address why native HMAC support is intentionally disabled.

The hmac_new function unconditionally raises UnsupportedDigestmodError to delegate to Python's pure-Python HMAC implementation added in this commit. This design is intentional—the exception pattern is valid and allows hmac.py to fall back to its native implementation.

However, clarify:

  • Why is native Rust HMAC support being intentionally disabled in favor of Python's pure-Python implementation? This appears unrelated to the PR's subprocess/multiprocessing goals.
  • Is this permanent or a placeholder for future native HMAC implementation?

The concerns about incomplete implementation and StaticType initialization safety are unfounded—the let _ = args pattern is intentional (no argument processing needed when unconditionally raising), and the #[pyexception] macro ensures proper exception type initialization.

@youknowone youknowone merged commit 4ce6827 into RustPython:main Dec 29, 2025
13 checks passed
@youknowone youknowone deleted the support2 branch December 29, 2025 15:24
@coderabbitai coderabbitai bot mentioned this pull request Feb 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant