Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Dec 16, 2025

close #2364

Summary by CodeRabbit

  • New Features

    • Major ctypes revamp: richer simple/array/pointer types, dynamic type creation, improved call/casting/interop, byref/out-parameter and COM/cdecl support; array/pointer slicing and buffer introspection added.
    • Support for extended Python struct format specifiers (PEP 3118).
  • Bug Fixes

    • Buffer/descriptor validation now correctly allows scalar (ndim=0) shapes and improves format error checks.
  • Refactor

    • Simplified Python-visible bindings and renamed several dunder/accessor exposures; removed legacy ctypes thunk/callback surface.
  • Chores

    • Spellchecker vocabulary updated with new terms.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 16, 2025

Walkthrough

Major multi-module refactor: spell/config updates; standardized dunder/default binding attributes; PEP3118-aware format parsing and buffer validation; Py upcast helpers; and a large, modular redesign of the _ctypes subsystem (new Initializer/SetAttr lifecycle, array/pointer/structure/simple/union, libffi call plumbing, visibility/signature changes).

Changes

Cohort / File(s) Summary
Spell-check & config
\.cspell.dict/python-more.txt, \.cspell.json
Updated cspell dictionary and config: added tokens (needsfree, PYTHONAPI, oparg, pytype, sighandler, etc.), replaced pyargpyargs, removed unconstructible.
Dunder/default binding renames
crates/vm/src/builtins/str.rs, crates/vm/src/builtins/function.rs, crates/vm/src/builtins/builtin_func.rs, crates/vm/src/builtins/object.rs, crates/vm/src/builtins/type.rs, crates/vm/src/exceptions.rs, crates/vm/src/stdlib/functools.rs, crates/vm/src/stdlib/ast/python.rs
Simplified #[pymethod]/#[pygetset]/#[pyattr] usage and aligned Rust identifiers with Python-visible dunders/attributes (e.g., is_asciiisascii, get_class__class__, get_mro__mro__, get_suppress_context__suppress_context__, fields_fields).
Buffer parsing, validation & helpers
crates/stdlib/src/pystruct.rs, crates/vm/src/buffer.rs, crates/vm/src/protocol/buffer.rs, crates/vm/src/types/structseq.rs, crates/vm/src/object/core.rs
Replaced is_ascii()isascii() calls; enhanced FormatCode::parse to skip/consume PEP3118 constructs (T{...}, X{...}, (...), :name:) and report unmatched-construct errors; BufferDescriptor::validate now supports scalar ndim()==0; added Py<T>::to_base and upcast_ref.
_ctypes modularization & major rework
crates/vm/src/stdlib/ctypes.rs, crates/vm/src/stdlib/ctypes/*
(e.g. array.rs, pointer.rs, function.rs, structure.rs, union.rs, simple.rs, library.rs, util.rs, thunk.rs)
Large redesign: split module into submodules; new Initializer/SetAttr lifecycles; many types now wrap PyCData; added array-type caching, wchar support, pointer contents/indexing and slice semantics; expanded PyCFuncPtr/libffi conversion and call pipeline; removed public StgInfo and deleted thunk module; adjusted visibilities, signatures, and public exports; added numerous helpers and re-exports.
Minor codec/operator/structseq tweaks
crates/vm/src/stdlib/codecs.rs, crates/vm/src/stdlib/operator.rs, crates/vm/src/types/structseq.rs
Replaced is_ascii()isascii() in encode/compare paths; adjusted struct-sequence repr closure to handle PyObject values correctly.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Py as Python caller
    participant VM as Rust VM (_ctypes)
    participant FFI as libffi / Native
    participant Mem as Native memory

    Note right of Py: create / call ctypes objects
    Py->>VM: create type or instantiate Array/Struct/Pointer/FuncPtr
    VM->>VM: Initializer::init / process_fields / build metadata (StgInfo/type_data)
    VM->>Mem: allocate native buffer (size/align)
    Mem-->>VM: return buffer pointer
    VM-->>Py: return ctypes instance

    Note right of Py: call into native code
    Py->>VM: call PyCFuncPtr with Python args
    VM->>VM: resolve argtypes/restype, build_callargs (Python → FFI)
    VM->>FFI: invoke native function via libffi with prepared args
    FFI->>Mem: writes result/out-params to buffer
    FFI-->>VM: raw result/status
    VM->>VM: convert_raw_result (FFI → Python), extract out-parameters
    VM-->>Py: return Python result (and out-parameters)
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas needing extra attention:

  • crates/vm/src/stdlib/ctypes/function.rs — FFI conversion, call pipeline, PyCFuncPtr state and errcheck/HRESULT/COM logic.
  • crates/vm/src/stdlib/ctypes/{array.rs,pointer.rs,structure.rs,union.rs,simple.rs} — Initializer/SetAttr lifecycle, buffer layout/packing/alignment, pointer indexing and slice semantics, ownership/safety.
  • Removal/migration of StgInfo and deletion of thunk.rs — verify all consumers updated.
  • crates/vm/src/buffer.rs — PEP3118 nested parsing, unmatched-construct error paths.
  • crates/vm/src/object/core.rs — unsafe transmute/upcast helpers and invariants.

Possibly related PRs

Suggested reviewers

  • arihant2math
  • ShaharNaveh

Poem

"🐰 I hopped through braces, bytes, and names,

Dunders snugged into their rightful frames,
I parsed the shapes PEP3118 drew,
Wrapped C memory with a careful chew,
A carrot cheer for every FFI claim!"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning Most changes focus on ctypes implementation [ctypes/.rs], but includes unrelated refactoring in configuration files (.cspell.) and method renames across multiple modules (is_ascii→isascii, get_→____) that appear tangential to core ctypes functionality. Separate ctypes-unrelated changes into independent PRs: spell-check dictionary updates, method renaming refactors, and buffer/protocol updates to maintain clear scope alignment with issue #2364.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'ctypes overhaul' directly and specifically describes the main change in the PR, which is a comprehensive refactoring and implementation of the ctypes module across multiple files.
Linked Issues check ✅ Passed The PR successfully implements a comprehensive ctypes module for RustPython with Linux platform support as required by issue #2364, introducing extensive public API surface, FFI interop, buffer management, and architecture supporting future multi-platform extension.
Docstring Coverage ✅ Passed Docstring coverage is 88.93% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

Code has been automatically formatted

The code in this PR has been formatted using cargo fmt --all.
Please pull the latest changes before pushing again:

git pull origin ctypes-overhaul

@youknowone youknowone force-pushed the ctypes-overhaul branch 14 times, most recently from ee0efb7 to ca7743c Compare December 18, 2025 13:55
@youknowone youknowone marked this pull request as ready for review December 18, 2025 13:55
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
crates/vm/src/types/structseq.rs (1)

202-211: Fix type inconsistency in single-field case.

The closure format_field expects (&PyObject, _), and the multi-field case (line 215) explicitly converts using .as_ref(). However, the single-field case passes value directly from zelf.first().unwrap(), which returns &PyObjectRef, not &PyObject.

🔎 Apply this diff to fix the type inconsistency:
         let (body, suffix) =
             if let Some(_guard) = rustpython_vm::recursion::ReprGuard::enter(vm, zelf.as_ref()) {
                 if field_names.len() == 1 {
                     let value = zelf.first().unwrap();
-                    let formatted = format_field((value, field_names[0]))?;
+                    let formatted = format_field((value.as_ref(), field_names[0]))?;
                     (formatted, ",")
crates/vm/src/stdlib/ctypes/array.rs (1)

315-353: Fix incomplete overflow handling in PyCArrayType::multiply

The current overflow check only returns an error when checked_mul succeeds and the result exceeds isize::MAX. If checked_mul returns None (indicating overflow), the check is silently skipped and array_type_from_ctype is still called with a huge n value, creating nonsensical array types:

if let Some(new_size) = current_size.checked_mul(n as usize)
    && new_size > isize::MAX as usize
{
    return Err(vm.new_overflow_error("array too large"));
}

Handle both overflow cases:

-                if let Some(new_size) = current_size.checked_mul(n as usize)
-                    && new_size > isize::MAX as usize
-                {
-                    return Err(vm.new_overflow_error("array too large"));
-                }
+                if current_size.checked_mul(n as usize)
+                    .is_some_and(|new_size| new_size > isize::MAX as usize)
+                    .not()
+                {
+                    return Err(vm.new_overflow_error("array too large"));
+                }

Or more explicitly:

-                if let Some(new_size) = current_size.checked_mul(n as usize)
-                    && new_size > isize::MAX as usize
-                {
-                    return Err(vm.new_overflow_error("array too large"));
-                }
+                match current_size.checked_mul(n as usize) {
+                    Some(new_size) if new_size > isize::MAX as usize => {
+                        return Err(vm.new_overflow_error("array too large"));
+                    }
+                    None => {
+                        return Err(vm.new_overflow_error("array too large"));
+                    }
+                    _ => {}
+                }
crates/vm/src/stdlib/ctypes/function.rs (1)

333-407: c_bool ('?') is not correctly marshalled in callbacks and certain conversion paths

Two functions still lack handling for type code "?":

  • ffi_to_python (lines 1625-1681): C callbacks returning c_bool currently fall through to the default case and produce None instead of True/False.
  • python_to_ffi (lines 1683-1752): Python callbacks with restype = c_bool don't write a proper boolean value back into the C result slot.

Note: ReturnType::from_ffi_type already correctly handles "?", so regular function return types are not affected.

Add the missing "?" cases:

  1. In ffi_to_python, add before the final _ => return vm.ctx.none():
             Some("P") => vm.ctx.new_int(*(ptr as *const usize)).into(),
+            Some("?") => vm
+                .ctx
+                .new_bool(*(ptr as *const u8) != 0)
+                .into(),
             _ => return vm.ctx.none(),
  1. In python_to_ffi, add after the Some("P") | Some("z") | Some("Z") block:
             Some("P") | Some("z") | Some("Z") => {
                 if let Ok(i) = obj.try_int(vm) {
                     *(result as *mut usize) = i.as_bigint().to_usize().unwrap_or(0);
                 }
             }
+            Some("?") => {
+                if let Ok(b) = obj.try_to_bool(vm) {
+                    *(result as *mut u8) = if b { 1 } else { 0 };
+                }
+            }
             _ => {}
🧹 Nitpick comments (9)
crates/vm/src/stdlib/ctypes/structure.rs (1)

13-29: Silent error swallowing in calculate_struct_size.

Line 16 uses unwrap_or_default() which silently returns an empty vector if try_to_value fails. This could mask type errors in _fields_. Consider propagating the error or at least logging it.

🔎 Suggested improvement:
-        let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default();
+        let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm)?;
crates/vm/src/stdlib/ctypes/union.rs (2)

11-28: Same silent error swallowing as in calculate_struct_size.

Line 14 uses unwrap_or_default() which silently returns an empty vector on error. Consider propagating the error for consistency with proper error handling.


358-413: SetAttr may call process_fields twice for _fields_ assignment.

When setting _fields_:

  1. If a descriptor exists (lines 370-381), it calls the descriptor then process_fields at line 378
  2. Otherwise, it stores the attribute (lines 385-402), then calls process_fields again at line 408

This means process_fields is called after every _fields_ assignment regardless of path, but in the descriptor path it's called twice (once at 378, and the function returns at 380 so actually only once there).

Wait, looking again: line 380 has return Ok(()) so the second call at 408 is not reached in the descriptor case. This is correct but the control flow is complex. Consider restructuring for clarity.

🔎 Suggested restructure for clarity:
 impl SetAttr for PyCUnionType {
     fn setattro(
         zelf: &Py<Self>,
         attr_name: &Py<PyStr>,
         value: PySetterValue,
         vm: &VirtualMachine,
     ) -> PyResult<()> {
         let pytype: &Py<PyType> = zelf.to_base();
         let attr_name_interned = vm.ctx.intern_str(attr_name.as_str());
+        let is_fields = attr_name.as_str() == "_fields_";
 
         // 1. First, do PyType's setattro (PyType_Type.tp_setattro first)
         // Check for data descriptor first
         if let Some(attr) = pytype.get_class_attr(attr_name_interned) {
             let descr_set = attr.class().mro_find_map(|cls| cls.slots.descr_set.load());
             if let Some(descriptor) = descr_set {
                 descriptor(&attr, pytype.to_owned().into(), value.clone(), vm)?;
-                // After successful setattro, check if _fields_ and call process_fields
-                if attr_name.as_str() == "_fields_"
-                    && let PySetterValue::Assign(fields_value) = value
-                {
+                if is_fields && let PySetterValue::Assign(fields_value) = value {
                     PyCUnionType::process_fields(pytype, fields_value, vm)?;
                 }
                 return Ok(());
             }
         }
 
         // Store in type's attributes dict
         match &value {
             PySetterValue::Assign(v) => {
                 pytype
                     .attributes
                     .write()
                     .insert(attr_name_interned, v.clone());
+                if is_fields {
+                    PyCUnionType::process_fields(pytype, v.clone(), vm)?;
+                }
             }
             PySetterValue::Delete => {
                 let prev = pytype.attributes.write().shift_remove(attr_name_interned);
                 if prev.is_none() {
                     return Err(vm.new_attribute_error(format!(
                         "type object '{}' has no attribute '{}'",
                         pytype.name(),
                         attr_name.as_str(),
                     )));
                 }
             }
         }
-
-        // 2. If _fields_, call process_fields (which checks FINAL internally)
-        if attr_name.as_str() == "_fields_"
-            && let PySetterValue::Assign(fields_value) = value
-        {
-            PyCUnionType::process_fields(pytype, fields_value, vm)?;
-        }
 
         Ok(())
     }
 }
crates/vm/src/buffer.rs (1)

264-281: The implicit fallthrough behavior for standalone T/X is correct but could be more explicit.

When T or X appears without a following {, the code falls through to FormatType::try_from(c), which returns an error since T and X are not valid FormatType variants. This is the intended behavior (standalone T/X are invalid), but the code would be clearer with an explicit check and error message explaining that these characters require the {} syntax when used.

Consider adding an explicit guard:

if (c == b'T' || c == b'X') && chars.peek() != Some(&b'{') {
    return Err("'T' and 'X' format specifiers require '{}'".to_owned());
}

This would make the constraint explicit rather than relying on FormatType::try_from to catch it implicitly.

crates/vm/src/stdlib/ctypes/simple.rs (1)

943-963: is_cwchar_array_or_pointer panics if StgInfo is missing

is_cwchar_array_or_pointer uses .expect("array has StgInfo") / .expect("pointer has StgInfo"). If a user manages to construct a malformed PyCArray/PyCPointer subclass without proper StgInfo, this will panic the VM instead of raising a Python exception.

This is low‑probability but avoidable.

Suggested defensive change
-    if let Some(arr) = value.downcast_ref::<PyCArray>() {
-        let info = arr.class().stg_info_opt().expect("array has StgInfo");
-        let elem_type = info.element_type.as_ref().expect("array has element_type");
+    if let Some(arr) = value.downcast_ref::<PyCArray>() {
+        let info = if let Some(info) = arr.class().stg_info_opt() {
+            info
+        } else {
+            return false;
+        };
+        let elem_type = if let Some(elem_type) = info.element_type.as_ref() {
+            elem_type
+        } else {
+            return false;
+        };

Same pattern for the pointer branch.

crates/vm/src/stdlib/ctypes/array.rs (1)

871-910: Minor: slice ops ignore base buffer for arrays backed by external memory

getitem_by_slice and wchar_array_set_value always operate on zelf.0.buffer and ignore base / base_offset the way getitem_by_index does. For arrays created via from_buffer / views into other ctypes objects, slicing and the wchar setters will read/write the local buffer instead of the base buffer.

This is subtle and probably rare, but it makes slice semantics inconsistent with single‑index semantics on such views.

Direction for aligning slice ops with base-backed arrays
  • Mirror the logic used in getitem_by_index:

    let base_obj = zelf.0.base.read().clone();
    let (buffer_lock, offset0) = if let Some(cdata) = base_obj
        .as_ref()
        .and_then(|b| b.downcast_ref::<super::PyCData>())
    {
        (&cdata.buffer, zelf.0.base_offset.load())
    } else {
        (&zelf.0.buffer, 0)
    };
  • Then compute offsets relative to offset0 in both the c and u slice branches and in wchar setters.

Also applies to: 1141-1169

crates/vm/src/stdlib/ctypes.rs (1)

486-547: sizeof fallback behavior for non‑StgInfo ctypes types is a bit ad-hoc

The sizeof implementation tries several cases, then for simple types uses _type_ and get_size, but finally returns size_of::<usize>() or errors for many other cases. For some ctypes scenarios (custom simple subclasses, mis‑initialized types) this may yield surprising sizes instead of a clean error.

Not a blocker, but worth tightening once the rest of the ctypes semantics stabilize.

Possible cleanup direction
  • Prefer consistently using StgInfo.size for any type that has it, including simple types, rather than recomputing from _type_.
  • For types without StgInfo and without _type_/known metaclasses, consider always raising TypeError("this type has no size") instead of falling back to size_of::<usize>().
crates/vm/src/stdlib/ctypes/function.rs (1)

153-176: Known leak in Windows conv_param for str → wide pointer

The Windows branch of conv_param for PyStr intentionally allocates a Vec<u16>, leaks it with mem::forget, and returns its pointer:

let wide: Vec<u16> = s.as_str().encode_utf16().chain(std::iter::once(0)).collect();
let addr = wide.as_ptr() as usize;
std::mem::forget(wide);
return Ok((Type::pointer(), FfiArgValue::Pointer(addr)));

The comment notes this FIXME/leak. Given typical usage, this is probably acceptable for now (and mirrors CPython’s tendency to tie such allocations to _objects), but it is a real leak and will be hit for every Windows oledll string parameter converted this way.

Future improvement

Once the _objects / keep‑alive story for parameters is fully wired, this code should store the wide buffer in an owned object (e.g., a PyBytes/capsule referenced from the argument’s _objects) instead of leaking the Vec<u16>.

crates/vm/src/stdlib/ctypes/pointer.rs (1)

523-572: Negative pointer indices are stored in _objects under huge usize keys

setitem_by_index accepts index: isize (allowing negative indices, as C pointer arithmetic does), but when calling keep_ref it uses index as usize:

let offset = index * element_size as isize;
let addr = (ptr_value as isize + offset) as usize;
...
zelf.0.keep_ref(index as usize, value, vm)

For negative index, this wraps to a very large usize. Since _objects is only used to keep referents alive and not looked up by index later, this is mostly an internal bookkeeping quirk rather than a functional bug, but it’s confusing and may make debugging harder.

Possible clean‑up
  • Either forbid negative indices in __setitem__ (matching typical ctypes usage) and guard before calling setitem_by_index.
  • Or change keep_ref to accept a signed index (if feasible) so the stored key matches the logical index used in pointer arithmetic.
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e43966 and ca7743c.

📒 Files selected for processing (27)
  • .cspell.dict/python-more.txt (2 hunks)
  • .cspell.json (3 hunks)
  • crates/stdlib/src/pystruct.rs (1 hunks)
  • crates/vm/src/buffer.rs (1 hunks)
  • crates/vm/src/builtins/builtin_func.rs (2 hunks)
  • crates/vm/src/builtins/function.rs (1 hunks)
  • crates/vm/src/builtins/object.rs (1 hunks)
  • crates/vm/src/builtins/str.rs (2 hunks)
  • crates/vm/src/builtins/type.rs (1 hunks)
  • crates/vm/src/exceptions.rs (3 hunks)
  • crates/vm/src/object/core.rs (1 hunks)
  • crates/vm/src/protocol/buffer.rs (1 hunks)
  • crates/vm/src/stdlib/ast/python.rs (1 hunks)
  • crates/vm/src/stdlib/codecs.rs (2 hunks)
  • crates/vm/src/stdlib/ctypes.rs (14 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (8 hunks)
  • crates/vm/src/stdlib/ctypes/library.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/simple.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (0 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (4 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (0 hunks)
  • crates/vm/src/stdlib/functools.rs (1 hunks)
  • crates/vm/src/stdlib/operator.rs (1 hunks)
  • crates/vm/src/types/structseq.rs (2 hunks)
💤 Files with no reviewable changes (2)
  • crates/vm/src/stdlib/ctypes/util.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/pystruct.rs
  • crates/vm/src/buffer.rs
  • crates/vm/src/stdlib/operator.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/protocol/buffer.rs
  • crates/vm/src/types/structseq.rs
  • crates/vm/src/exceptions.rs
  • crates/vm/src/stdlib/functools.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/stdlib/codecs.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/stdlib/ctypes/library.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ast/python.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/builtins/builtin_func.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/simple.rs
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.

Applied to files:

  • crates/vm/src/exceptions.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • .cspell.json
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations

Applied to files:

  • crates/vm/src/stdlib/ctypes/library.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
🧬 Code graph analysis (8)
crates/stdlib/src/pystruct.rs (1)
crates/vm/src/builtins/str.rs (3)
  • s (1783-1788)
  • s (1814-1818)
  • isascii (630-632)
crates/vm/src/exceptions.rs (1)
crates/vm/src/stdlib/os.rs (1)
  • strerror (1544-1548)
crates/vm/src/stdlib/functools.rs (4)
crates/vm/src/builtins/builtin_func.rs (2)
  • __reduce__ (123-126)
  • __reduce__ (169-182)
crates/vm/src/builtins/function.rs (1)
  • __reduce__ (804-812)
crates/vm/src/builtins/object.rs (1)
  • __reduce__ (503-505)
crates/vm/src/stdlib/operator.rs (3)
  • __reduce__ (360-365)
  • __reduce__ (458-461)
  • __reduce__ (527-544)
crates/vm/src/builtins/function.rs (1)
crates/vm/src/types/slot.rs (1)
  • _cls_is (1083-1085)
crates/vm/src/builtins/str.rs (3)
crates/vm/src/builtins/bytearray.rs (1)
  • isascii (271-273)
crates/vm/src/builtins/bytes.rs (1)
  • isascii (262-264)
crates/vm/src/bytes_inner.rs (1)
  • isascii (326-328)
crates/vm/src/stdlib/ctypes/union.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (33)
  • fields (2225-2225)
  • fields (2227-2227)
  • field_type (304-304)
  • field_type (2143-2143)
  • field_type (2158-2158)
  • field_type (2191-2191)
  • field_size (1183-1185)
  • default (129-145)
  • obj (725-725)
  • obj (749-749)
  • obj (771-771)
  • obj (786-786)
  • obj (1547-1547)
  • obj (1549-1549)
  • obj (1658-1659)
  • obj (1686-1687)
  • obj (1707-1708)
  • obj (1732-1732)
  • get_usize_attr (2081-2098)
  • std (2103-2103)
  • std (2186-2186)
  • size (626-628)
  • size (1628-1630)
  • get_field_size (2157-2187)
  • get_field_align (2190-2211)
  • is_big_endian (334-340)
  • make_anon_fields (2275-2316)
  • proto (291-293)
  • proto (874-874)
  • proto (939-939)
  • proto (1497-1497)
  • proto (1514-1514)
  • attr (895-895)
crates/vm/src/stdlib/ctypes.rs (5)
extra_tests/snippets/stdlib_ctypes.py (1)
  • c_double (124-125)
crates/vm/src/suggestion.rs (1)
  • name (25-25)
crates/vm/src/builtins/genericalias.rs (1)
  • args (102-102)
crates/vm/src/stdlib/ast/other.rs (1)
  • object (39-39)
crates/vm/src/stdlib/ctypes/library.rs (1)
  • null (31-31)
crates/vm/src/stdlib/ctypes/pointer.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (49)
  • obj (725-725)
  • obj (749-749)
  • obj (771-771)
  • obj (786-786)
  • obj (1547-1547)
  • obj (1549-1549)
  • obj (1658-1659)
  • obj (1686-1687)
  • obj (1707-1708)
  • obj (1732-1732)
  • mem (1912-1912)
  • mem (1921-1921)
  • mem (1930-1930)
  • mem (1939-1939)
  • mem (1948-1948)
  • mem (1957-1957)
  • mem (1966-1966)
  • mem (1975-1975)
  • mem (1984-1984)
  • mem (1993-1993)
  • mem (2004-2004)
  • mem (2018-2018)
  • mem (2059-2059)
  • mem (2060-2060)
  • value (808-809)
  • value (817-817)
  • value (817-817)
  • value (839-839)
  • value (853-853)
  • value (868-868)
  • value (878-878)
  • value (911-911)
  • value (1309-1309)
  • value (1317-1317)
  • value (1325-1325)
  • value (1373-1373)
  • value (1388-1388)
  • value (1402-1402)
  • value (1416-1416)
  • value (1442-1442)
  • from_bytes (399-408)
  • std (2103-2103)
  • std (2186-2186)
  • buffer (347-347)
  • buffer (354-354)
  • read_ptr_from_buffer (2102-2109)
  • at_address (432-444)
  • size (626-628)
  • size (1628-1630)
🪛 Biome (2.1.2)
.cspell.json

[error] 78-78: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 78-78: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 110-110: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 110-110: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 134-134: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 134-134: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run tests under miri
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (windows-latest)
🔇 Additional comments (28)
crates/vm/src/stdlib/ast/python.rs (1)

50-51: LGTM! Clean simplification of attribute naming.

The change removes redundant name mapping by directly naming the Rust method _fields instead of mapping fields to _fields. The Python API remains unchanged and consistent with CPython's AST specification.

.cspell.dict/python-more.txt (1)

151-151: LGTM! Dictionary entries are appropriate for ctypes work.

The added terms "needsfree" and "PYTHONAPI" are legitimate technical identifiers related to Python's C API and memory management, which align well with the ctypes overhaul objectives.

Also applies to: 172-172

.cspell.json (1)

78-78: LGTM! New dictionary terms are appropriate.

The added terms "oparg", "pytype", "summands", and "sighandler" are legitimate technical identifiers that align with the ctypes overhaul.

Note: The Biome parse errors are false positives. This file uses JSONC (JSON with Comments), which is the standard format for cspell configuration files. The syntax is correct.

Also applies to: 110-110, 116-116, 134-134

crates/vm/src/builtins/function.rs (1)

632-636: LGTM! Clean refactoring.

The change simplifies the return path by wrapping the if-else expression directly in Ok(), eliminating an intermediate variable assignment. The descriptor protocol logic remains correct and the behavior is unchanged.

crates/vm/src/types/structseq.rs (1)

213-219: LGTM! Correct type conversion in multi-field case.

The explicit conversion via .map(|value| value.as_ref()) correctly transforms PyObjectRef to &PyObject before pairing with field names and passing to the closure. This aligns with the updated closure signature.

crates/vm/src/stdlib/functools.rs (1)

76-77: LGTM! Naming convention now consistent with codebase patterns.

The change from explicit name mapping (#[pymethod(name = "__reduce__")] with fn reduce) to implicit name-based binding (#[pymethod] with fn __reduce__) aligns with the established pattern used across the codebase (as seen in builtins/object.rs, builtins/function.rs, and stdlib/operator.rs). This improves readability and maintains consistency with the broader method naming simplification effort mentioned in the PR objectives.

crates/vm/src/stdlib/codecs.rs (2)

191-196: LGTM! Consistent with the latin_1_encode change.

The method rename from is_ascii() to isascii() is applied consistently with the earlier change in latin_1_encode. The ASCII optimization path remains functional.


178-183: Method rename is correct and complete.

The change from is_ascii() to isascii() aligns with Python's naming convention. PyStr defines isascii() as a public const method, and both occurrences in latin_1_encode and ascii_encode correctly use it. No remaining is_ascii() calls exist on PyStr types in the codebase.

crates/vm/src/stdlib/operator.rs (1)

326-326: Approved: Method rename aligns with Python naming conventions.

The change from is_ascii() to isascii() correctly implements Python's str.isascii() method name on PyStrRef. The method is properly defined and all related wrapper types (PyStr, PyBytes, PyBytearray) have been systematically updated. No clippy warnings.

crates/vm/src/builtins/type.rs (1)

1448-1452: mro getter correctly wired to mro_map_collect

The new #[pygetset] __mro__ accessor cleanly exposes the existing MRO vector as a tuple, matching the prior behavior while aligning with the new naming convention.

crates/vm/src/builtins/builtin_func.rs (2)

117-120: PyNativeFunction.self now uses default pygetset binding

Switching to bare #[pygetset] keeps the getter behavior (always returning None here) while aligning with the refactored attribute exposure pattern.


184-187: PyNativeMethod.self getter remains correct under new exposure style

The __self__ property still returns the bound receiver from zelf.func.zelf, with only the pygetset metadata simplified to the default form.

crates/vm/src/exceptions.rs (3)

627-635: Expose suppress_context via suppress_context pygetset

Renaming the getter to __suppress_context__ and annotating it with #[pygetset] makes the Python-visible attribute explicit and pairs cleanly with set_suppress_context’s setter mapping, without changing the underlying AtomicCell logic.


1115-1116: Include suppress_context in serialized exception state

Serializing the suppress_context field via exc.__suppress_context__() ensures the chained-exception suppression flag is preserved in serde output, matching the BaseException state.


1860-1868: PyOSError.strerror getter now aligned with property name

Using #[pygetset] fn strerror(&self) with the existing set_strerror setter (named "strerror") keeps the Python attribute intact while simplifying the Rust-side naming.

crates/vm/src/builtins/object.rs (1)

453-456: class getter renamed to match exposed attribute

The new #[pygetset] fn __class__ still returns obj.class().to_owned(), preserving behavior while following the unified “method name == Python attribute name” convention.

crates/vm/src/builtins/str.rs (2)

628-632: isascii now uses StrKind for an O(1) check

Implementing isascii as matches!(self.kind(), StrKind::Ascii) is a nice fast-path, assuming StrData/StrKind maintain the invariant that ASCII-only strings are tagged as Ascii.

Please double-check that all constructors and transformations on PyStr correctly promote/demote StrKind so that ASCII-only results are always marked Ascii (e.g., slicing, joins, replacements), to keep isascii fully compatible with CPython.


963-984: format exposure via default #[pymethod] keeps semantics intact

Switching to #[pymethod] fn __format__ only affects how the method is registered; the logic for empty vs non-empty spec and the UTF-8-only formatting path is unchanged and remains consistent with the rest of the formatting machinery.

crates/stdlib/src/pystruct.rs (1)

30-41: Struct format ASCII validation updated to use PyStr.isascii

Using s.isascii() here matches the new string API and keeps the semantics of “format must be ASCII-only” the same for str inputs, with bytes still validated via AsciiStr::from_ascii.

crates/vm/src/buffer.rs (2)

297-307: Missing error message for unmatched : in field name.

The error message at line 302 says "unmatched ':' in format" which is correct but slightly inconsistent with PEP3118 terminology. Field names use the pattern :name:, so running out of input before finding the closing : is indeed an unmatched delimiter.

The implementation correctly handles this edge case.


309-312: Standalone braces are silently skipped.

This handles pointer target syntax (& followed by type in braces). The silent skip is appropriate since these are extended specifiers that don't contribute to the basic format parsing.

crates/vm/src/protocol/buffer.rs (1)

202-218: LGTM! Proper 0-dimensional buffer support added.

The validation now correctly handles scalar types (ndim=0) where itemsize == len, which is needed for ctypes Structure types. The branching logic is clear and the assertions are appropriate for debug builds.

crates/vm/src/stdlib/ctypes/library.rs (2)

62-86: Consider handling the race condition in get_or_insert_lib.

The current logic creates a new library, gets its pointer as a key, then checks if that key already exists. However, if the library already exists and is open, the newly created new_lib will be dropped (closing the handle) while the existing one is returned. This is mostly fine, but there's a subtle issue:

If l.is_closed() is true, the code inserts the new library. But the key comes from new_lib.get_pointer(), which might differ from the original closed library's key if library addresses can be reused.

This is likely safe in practice since libloading typically returns consistent handles, but worth noting.


93-98: LGTM! Clean static initialization pattern.

Moving the static_cell! macro invocation inside the function scope is a cleaner pattern and the pub(super) visibility appropriately restricts access to the ctypes module.

crates/vm/src/stdlib/ctypes/structure.rs (3)

66-129: LGTM! Well-structured initialization with proper base class handling.

The Initializer implementation correctly:

  1. Handles _abstract_ types
  2. Checks for already-initialized state
  3. Processes direct _fields_ or inherits from base
  4. Properly clears DICTFLAG_FINAL in subclasses while marking bases as final
  5. Uses separate scopes to avoid holding locks during mutable operations

The lock release pattern (lines 91-98, 100-118) is correct for avoiding deadlocks.


427-491: SetAttr implementation correctly guards _fields_ modification.

The implementation:

  1. Releases the read lock before calling process_fields (avoiding deadlock)
  2. Prevents deletion of _fields_
  3. Properly delegates non-_fields_ attributes to type's setattro logic

However, there's a potential TOCTOU issue: between checking is_final (line 440-445) and calling process_fields (line 456), another thread could modify the state. This is likely acceptable given Python's GIL semantics, but worth noting.


611-635: LGTM! Correct 0-dimensional buffer for Structure.

The AsBuffer implementation correctly creates a scalar buffer with:

  • ndim=0 (empty dim_desc)
  • itemsize equal to total buffer length
  • Format string from StgInfo with fallback to "B"

This aligns with the BufferDescriptor::validate changes reviewed earlier.

crates/vm/src/stdlib/ctypes/union.rs (1)

455-518: LGTM! Correct recursive field initialization.

The init_pos_args method properly:

  1. Recursively processes base class fields first
  2. Handles kwargs duplicate detection
  3. Returns the count of consumed arguments

The lock release pattern at lines 470-480 correctly avoids holding the lock during recursion.

Comment on lines +1105 to +1125
impl<T: crate::class::PySubclass> Py<T> {
/// Converts `&Py<T>` to `&Py<T::Base>`.
#[inline]
pub fn to_base(&self) -> &Py<T::Base> {
debug_assert!(self.as_object().downcast_ref::<T::Base>().is_some());
// SAFETY: T is #[repr(transparent)] over T::Base,
// so Py<T> and Py<T::Base> have the same layout.
unsafe { &*(self as *const Py<T> as *const Py<T::Base>) }
}

/// Converts `&Py<T>` to `&Py<U>` where U is an ancestor type.
#[inline]
pub fn upcast_ref<U: PyPayload + StaticType>(&self) -> &Py<U>
where
T: StaticType,
{
debug_assert!(T::static_type().is_subtype(U::static_type()));
// SAFETY: T is a subtype of U, so Py<T> can be viewed as Py<U>.
unsafe { &*(self as *const Py<T> as *const Py<U>) }
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for PySubclass trait definition
echo "=== Searching for PySubclass trait definition ==="
rg -n "trait PySubclass" --type=rust | head -20

echo ""
echo "=== Searching for impl PySubclass ==="
rg -n "impl.*PySubclass" --type=rust | head -20

echo ""
echo "=== Searching for Py<T> type definition ==="
rg -n "^pub struct Py<" --type=rust | head -10

echo ""
echo "=== Checking file structure around lines 900-1130 in core.rs ==="
wc -l crates/vm/src/object/core.rs

Repository: RustPython/RustPython

Length of output: 900


🏁 Script executed:

#!/bin/bash
# Find and display the relevant trait and type definitions
echo "=== Finding PySubclass trait ==="
ast-grep --pattern 'trait PySubclass { $$$ }' 2>/dev/null || rg -A 10 "trait PySubclass" --type=rust

echo ""
echo "=== Finding Py struct definition ==="
rg -B 2 -A 8 "pub struct Py<T>" --type=rust

echo ""
echo "=== Finding PyInner struct definition ==="
rg -B 2 -A 8 "pub struct PyInner<T>" --type=rust

echo ""
echo "=== Checking PySubclass context ==="
rg -B 5 -A 15 "pub fn to_base\|pub fn upcast_ref" crates/vm/src/object/core.rs

Repository: RustPython/RustPython

Length of output: 776


🏁 Script executed:

#!/bin/bash
# Get PySubclass trait definition
echo "=== PySubclass trait definition ==="
sed -n '193,220p' crates/vm/src/class.rs

echo ""
echo "=== PyInner struct definition ==="
rg -B 2 -A 15 "pub struct PyInner<T>" --type=rust crates/vm/src/object/core.rs

echo ""
echo "=== Full context of to_base and upcast_ref methods ==="
sed -n '1105,1125p' crates/vm/src/object/core.rs

echo ""
echo "=== Check PySubclass::Base associated type ==="
rg -A 5 "type Base" crates/vm/src/class.rs | head -15

Repository: RustPython/RustPython

Length of output: 1475


🏁 Script executed:

#!/bin/bash
# Look at PyInner definition
echo "=== PyInner struct definition ==="
rg -B 5 -A 20 "^pub struct PyInner" crates/vm/src/object/core.rs

echo ""
echo "=== Check existing PySubclass implementations ==="
# Get a few real implementations
rg -B 2 -A 10 "impl.*PySubclass" crates/vm/src/exceptions.rs | head -30

echo ""
echo "=== Look for repr attributes on PySubclass implementors ==="
rg -B 5 "impl.*PySubclass" crates/vm/src/class.rs

echo ""
echo "=== Check if PyInner has repr attributes ==="
rg -B 5 "struct PyInner" crates/vm/src/object/core.rs | grep -A 5 "repr"

Repository: RustPython/RustPython

Length of output: 663


🏁 Script executed:

#!/bin/bash
# Get full PyInner definition
echo "=== Full PyInner struct ==="
sed -n '104,130p' crates/vm/src/object/core.rs

echo ""
echo "=== Look at the actual PyOSError structure ==="
rg -B 10 "impl.*PySubclass for PyOSError" crates/vm/src/exceptions.rs | grep -E "struct|pub"

echo ""
echo "=== Search for PyOSError struct definition ==="
rg -B 2 -A 6 "struct PyOSError" crates/vm/src/exceptions.rs

echo ""
echo "=== Check how PyPayload relates to memory layout ==="
rg -A 10 "pub trait PyPayload" crates/vm/src/object/mod.rs

Repository: RustPython/RustPython

Length of output: 1551


🏁 Script executed:

#!/bin/bash
# Check PyException structure
echo "=== PyException definition ==="
rg -B 5 -A 15 "^pub struct PyException" crates/vm/src/exceptions.rs

echo ""
echo "=== Check if PyPayload trait has repr requirements ==="
rg -B 2 -A 20 "pub trait PyPayload" crates/vm/src/

echo ""
echo "=== Check comparisons with PyRef methods mentioned in review ==="
sed -n '1076,1103p' crates/vm/src/object/core.rs

echo ""
echo "=== Look for any repr attribute documentation on PySubclass implementors ==="
rg -B 10 "pub struct PyOSError" crates/vm/src/exceptions.rs | head -20

Repository: RustPython/RustPython

Length of output: 3014


The PySubclass trait doesn't enforce required layout guarantees for unsafe pointer casts.

The to_base() and upcast_ref() methods use unsafe pointer casts based on layout assumptions that are not enforced by the trait definition. Specifically:

  • to_base() assumes "T is #[repr(transparent)] over T::Base", but PySubclass only requires that T::Base is a PyPayload with an as_base() implementation. For example, PyOSError implements PySubclass yet has no #[repr(transparent)] attribute—it's a regular struct with a base field.

  • upcast_ref() assumes Python subtyping implies layout compatibility, which is not guaranteed by Rust type layout rules.

  • debug_assert!() only validates type relationships at runtime, not memory layout, so casts could unsafely reinterpret different layouts.

By contrast, PyRef<T>::into_base() and PyRef<T>::upcast() (lines 1076–1102) safely handle similar conversions by going through PyObjectRef and using type-safe downcasting.

Either enforce #[repr(transparent)] on all PySubclass implementors, use safe downcasting like PyRef, or provide explicit layout guarantees in the trait documentation.

🤖 Prompt for AI Agents
In crates/vm/src/object/core.rs around lines 1105 to 1125, the unsafe pointer
casts in to_base() and upcast_ref() rely on unguaranteed layout assumptions from
PySubclass and must be fixed: either enforce/layout-safety in the trait
(require/document #[repr(transparent)] on all PySubclass implementors) or remove
the unsafe casts and implement these methods using the safe object-level
conversions already used by PyRef (go through self.as_object() and use
downcast_ref/downcast into the appropriate Py<U> or convert via PyObjectRef as
PyRef::into_base()/upcast() do). Update the code to use the safe downcast path
(or add the explicit repr requirement to the trait and tests) and remove the
unsafe transmute-style pointer casts.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/vm/src/stdlib/ctypes/structure.rs (1)

514-562: Add validation to require fields before instance creation.

Each subclass must define a fields attribute. Currently, slot_new retrieves _fields_ via MRO lookup but doesn't validate it exists before creating instances. When _fields_ is absent, the field_names vector remains empty, preventing positional and keyword arguments from initializing any structure fields. This violates ctypes' expected behavior where positional arguments are used to initialize the fields in the same order as they appear in the fields definition, named arguments are used to initialize the fields with the corresponding name.

Add a check in slot_new to ensure _fields_ is defined on the class (directly or via inheritance) before allowing instance creation, returning an appropriate error if absent.

crates/vm/src/stdlib/ctypes/array.rs (1)

318-343: checked_mul result is ignored, so overflow is silently allowed

In PyCArrayType::as_number you compute:

if let Some(stg_info) = zelf_type.stg_info_opt() {
    let current_size = stg_info.size;
    if let Some(new_size) = current_size.checked_mul(n as usize)
        && new_size > isize::MAX as usize
    {
        return Err(vm.new_overflow_error("array too large"));
    }
}

If checked_mul returns None (overflow), this branch is skipped and you proceed to construct the array type anyway, which will then overflow again when computing its total size.

Treat None from checked_mul as an overflow condition and raise OverflowError instead of silently continuing.

🧹 Nitpick comments (5)
crates/vm/src/types/structseq.rs (1)

208-211: Apply .as_ref() explicitly for consistency.

In the multi-field case (line 215), there's an explicit .map(|value| value.as_ref()) to convert PyObjectRef to &PyObject. However, in the single-field case, value from zelf.first().unwrap() (which is &PyObjectRef) is passed directly to format_field. While implicit deref coercion should work, this creates an inconsistency that reduces code clarity.

🔎 Apply this diff for consistency:
         if field_names.len() == 1 {
-            let value = zelf.first().unwrap();
+            let value = zelf.first().unwrap().as_ref();
             let formatted = format_field((value, field_names[0]))?;
             (formatted, ",")
crates/vm/src/stdlib/ctypes.rs (2)

232-235: Clarify that get_align is only valid for simple types.

The comment states "for C types, alignment equals size" which isn't universally true (e.g., structs). Consider renaming or documenting that this only applies to the simple type codes handled by type_info().

🔎 Suggested documentation improvement:
-/// Get alignment for a simple type - for C types, alignment equals size
+/// Get alignment for a simple type code (b, B, h, H, i, I, etc.)
+/// For these primitive C types, natural alignment equals size.
 fn get_align(ty: &str) -> usize {
     get_size(ty)
 }

1216-1230: Dangerous COM interop—consider additional safeguards.

The vtable access and direct memory write are inherently unsafe. While this is necessary for COM interop, consider:

  1. Validate vtable pointer: Ensure *iunknown (the vtable pointer) is non-null before dereferencing
  2. Document safety requirements: The caller must ensure src_ptr is a valid IUnknown pointer
🔎 Add null check for vtable:
         if src_ptr != 0 {
             unsafe {
                 // IUnknown vtable: [QueryInterface, AddRef, Release, ...]
                 let iunknown = src_ptr as *mut *const usize;
                 let vtable = *iunknown;
+                if vtable.is_null() {
+                    return Ok(E_POINTER);
+                }
                 let addref_fn: extern "system" fn(*mut std::ffi::c_void) -> u32 =
                     std::mem::transmute(*vtable.add(1)); // AddRef is index 1
                 addref_fn(src_ptr as *mut std::ffi::c_void);
             }
         }
crates/vm/src/stdlib/ctypes/library.rs (1)

26-33: Minor: Unused return value from closed library.

When the library is closed (None), get_pointer returns a null pointer. This is used as a HashMap key in get_or_insert_lib. Consider adding a brief comment explaining that the key remains valid for identification purposes even after the library is closed.

crates/vm/src/stdlib/ctypes/structure.rs (1)

69-74: Consider simplifying the type conversion.

The conversion from PyRef<Self> to PyTypeRef via PyObjectRef intermediate is verbose. Since PyCStructType is #[repr(transparent)] over PyType, you could potentially use upcast or to_base directly.

Alternative approach:
// Using the new to_base() method since PyCStructType wraps PyType
let new_type: &Py<PyType> = zelf.to_base();

Note: This would require adjusting subsequent code that expects ownership.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e43966 and ca7743c.

📒 Files selected for processing (27)
  • .cspell.dict/python-more.txt (2 hunks)
  • .cspell.json (3 hunks)
  • crates/stdlib/src/pystruct.rs (1 hunks)
  • crates/vm/src/buffer.rs (1 hunks)
  • crates/vm/src/builtins/builtin_func.rs (2 hunks)
  • crates/vm/src/builtins/function.rs (1 hunks)
  • crates/vm/src/builtins/object.rs (1 hunks)
  • crates/vm/src/builtins/str.rs (2 hunks)
  • crates/vm/src/builtins/type.rs (1 hunks)
  • crates/vm/src/exceptions.rs (3 hunks)
  • crates/vm/src/object/core.rs (1 hunks)
  • crates/vm/src/protocol/buffer.rs (1 hunks)
  • crates/vm/src/stdlib/ast/python.rs (1 hunks)
  • crates/vm/src/stdlib/codecs.rs (2 hunks)
  • crates/vm/src/stdlib/ctypes.rs (14 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (8 hunks)
  • crates/vm/src/stdlib/ctypes/library.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/simple.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (0 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (4 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (0 hunks)
  • crates/vm/src/stdlib/functools.rs (1 hunks)
  • crates/vm/src/stdlib/operator.rs (1 hunks)
  • crates/vm/src/types/structseq.rs (2 hunks)
💤 Files with no reviewable changes (2)
  • crates/vm/src/stdlib/ctypes/util.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/codecs.rs
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/stdlib/operator.rs
  • crates/vm/src/object/core.rs
  • crates/stdlib/src/pystruct.rs
  • crates/vm/src/protocol/buffer.rs
  • crates/vm/src/stdlib/functools.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/exceptions.rs
  • crates/vm/src/builtins/builtin_func.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/buffer.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/types/structseq.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/library.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ast/python.rs
  • crates/vm/src/stdlib/ctypes/simple.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/function.rs
🧠 Learnings (7)
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • .cspell.json
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.

Applied to files:

  • crates/vm/src/exceptions.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations

Applied to files:

  • crates/vm/src/stdlib/ctypes/library.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/stdlib/ctypes/simple.rs
🧬 Code graph analysis (6)
crates/vm/src/protocol/buffer.rs (6)
crates/stdlib/src/array.rs (1)
  • itemsize (738-740)
crates/vm/src/builtins/memory.rs (2)
  • itemsize (584-586)
  • shape (594-603)
crates/vm/src/builtins/str.rs (1)
  • len (1855-1857)
crates/vm/src/builtins/dict.rs (2)
  • len (716-718)
  • len (747-749)
crates/vm/src/builtins/tuple.rs (1)
  • len (193-195)
crates/vm/src/builtins/set.rs (1)
  • len (205-207)
crates/vm/src/builtins/function.rs (1)
crates/vm/src/types/slot.rs (1)
  • _cls_is (1083-1085)
crates/vm/src/exceptions.rs (1)
crates/vm/src/stdlib/os.rs (1)
  • strerror (1544-1548)
crates/vm/src/builtins/str.rs (3)
crates/vm/src/builtins/bytes.rs (1)
  • isascii (262-264)
crates/vm/src/builtins/bytearray.rs (1)
  • isascii (271-273)
crates/vm/src/bytes_inner.rs (1)
  • isascii (326-328)
crates/vm/src/stdlib/ctypes/structure.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (34)
  • std (2103-2103)
  • std (2186-2186)
  • default (129-145)
  • obj (725-725)
  • obj (749-749)
  • obj (771-771)
  • obj (786-786)
  • obj (1547-1547)
  • obj (1549-1549)
  • obj (1658-1659)
  • obj (1686-1687)
  • obj (1707-1708)
  • obj (1732-1732)
  • new (149-165)
  • new (1194-1202)
  • value (808-809)
  • value (817-817)
  • value (817-817)
  • value (839-839)
  • value (853-853)
  • value (868-868)
  • value (878-878)
  • value (911-911)
  • value (1309-1309)
  • value (1317-1317)
  • value (1325-1325)
  • value (1373-1373)
  • value (1388-1388)
  • value (1402-1402)
  • value (1416-1416)
  • get_usize_attr (2081-2098)
  • size (626-628)
  • size (1628-1630)
  • attr (895-895)
crates/vm/src/stdlib/ctypes/function.rs (2)
crates/vm/src/stdlib/ctypes.rs (15)
  • base (61-61)
  • type_info (143-225)
  • std (528-528)
  • std (532-532)
  • std (543-543)
  • std (708-708)
  • std (793-793)
  • std (1202-1202)
  • std (1204-1204)
  • std (1206-1206)
  • t (55-55)
  • pointer (675-684)
  • obj (493-493)
  • obj (539-539)
  • src (1199-1199)
crates/vm/src/stdlib/ctypes/base.rs (35)
  • std (2103-2103)
  • std (2186-2186)
  • value (808-809)
  • value (817-817)
  • value (817-817)
  • value (839-839)
  • value (853-853)
  • value (868-868)
  • value (878-878)
  • value (911-911)
  • value (1309-1309)
  • value (1317-1317)
  • value (1325-1325)
  • value (1373-1373)
  • value (1388-1388)
  • value (1402-1402)
  • value (1416-1416)
  • value (1442-1442)
  • obj (725-725)
  • obj (749-749)
  • obj (771-771)
  • obj (786-786)
  • obj (1547-1547)
  • obj (1549-1549)
  • obj (1658-1659)
  • obj (1686-1687)
  • obj (1707-1708)
  • obj (1732-1732)
  • buffer (347-347)
  • buffer (354-354)
  • read_ptr_from_buffer (2102-2109)
  • call_paramfunc (1639-1652)
  • size (626-628)
  • size (1628-1630)
  • default (129-145)
🪛 Biome (2.1.2)
.cspell.json

[error] 78-78: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 78-78: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 110-110: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 110-110: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 134-134: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 134-134: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
🔇 Additional comments (35)
crates/vm/src/stdlib/functools.rs (1)

76-77: LGTM! Clean refactoring to use default naming convention.

The change from explicit name = "__reduce__" to using the function name directly (__reduce__) is a clean refactoring that aligns with idiomatic usage of the pymethod macro. The Python-facing API remains unchanged, and this approach is more maintainable.

crates/vm/src/builtins/function.rs (1)

631-637: LGTM! Clean refactoring that improves code clarity.

The inlining of the conditional expression directly into Ok() removes an unnecessary intermediate variable while preserving the exact same behavior. This is idiomatic Rust and makes the descriptor logic more concise.

crates/vm/src/protocol/buffer.rs (1)

205-216: LGTM! Proper scalar buffer validation.

The branching logic correctly handles scalar buffers (ndim=0) separately from multi-dimensional buffers. The scalar assertion itemsize == len is sound, and this approach is consistent with existing scalar handling in for_each_segment and zip_eq methods.

crates/vm/src/stdlib/codecs.rs (1)

179-179: LGTM! Method rename aligns with Python naming conventions.

The change from is_ascii() to isascii() is consistent with the broader refactoring to align Rust method names with Python-exposed names. The logic remains unchanged—these are fast-path optimizations for ASCII strings.

Also applies to: 192-192

crates/vm/src/stdlib/operator.rs (1)

326-326: LGTM! Method rename is consistent.

The update to isascii() aligns with the refactoring in PyStr. The logic for constant-time ASCII string comparison remains unchanged.

crates/vm/src/builtins/str.rs (2)

628-632: LGTM! Improved naming consistency.

The rename from is_ascii to isascii aligns the Rust method name with the Python-exposed name, and simplifying the #[pymethod] attribute to use default naming is cleaner. This matches the naming convention used by PyBytes::isascii() and PyByteArray::isascii() as shown in the relevant code snippets.


963-968: LGTM! Binding attribute simplified.

Removing the explicit name = "__format__" parameter is appropriate since the Rust function name already matches the desired Python name. This follows the same pattern as the isascii change and reduces redundancy.

crates/stdlib/src/pystruct.rs (1)

31-31: LGTM! Call site updated correctly.

The method call is updated to use isascii(), consistent with the rename in PyStr. The format validation logic remains unchanged.

crates/vm/src/types/structseq.rs (1)

213-219: LGTM!

The explicit conversion from PyObjectRef to &PyObject via .as_ref() correctly aligns with the updated closure signature and ensures type safety.

crates/vm/src/stdlib/ast/python.rs (1)

50-51: Refactoring looks good - aligns Rust function naming with Python attribute name.

The change from #[pyattr(name = "_fields")] fn fields to #[pyattr] fn _fields improves consistency without breaking anything. The Python API remains unchanged (still exposes _fields), and no internal Rust code references the old method name.

.cspell.dict/python-more.txt (1)

151-151: LGTM!

The additions of needsfree (ctypes memory management flag) and PYTHONAPI (ctypes calling convention constant) are appropriate vocabulary expansions for the ctypes overhaul.

Also applies to: 172-172

.cspell.json (1)

78-78: LGTM!

The vocabulary additions (oparg, pytype, sighandler) are appropriate for the RustPython codebase. The static analysis errors from Biome are false positives—this is a JSONC file (JSON with comments and trailing commas), which Biome's strict JSON parser doesn't handle correctly.

Also applies to: 110-110, 134-134

crates/vm/src/builtins/builtin_func.rs (2)

117-120: LGTM!

The #[pygetset] macro correctly infers the attribute name __self__ from the function name. This simplification aligns with the broader PR pattern of using default naming instead of explicit name = "..." parameters.


184-187: LGTM!

Consistent simplification—the macro infers __self__ from the method name.

crates/vm/src/builtins/object.rs (1)

453-456: LGTM!

The function rename from get_class to __class__ allows the #[pygetset] macro to infer the Python attribute name, while the setter on line 458 correctly retains the explicit name = "__class__" since its function name is set_class.

crates/vm/src/exceptions.rs (3)

627-630: LGTM!

The function rename from get_suppress_context to __suppress_context__ follows the established pattern, allowing the macro to infer the Python attribute name.


1115-1115: LGTM!

The serialization code correctly calls the renamed method __suppress_context__().


1860-1863: LGTM!

The getter rename from get_strerror to strerror is consistent with the pattern. The os.rs:strerror function in the relevant snippets is unrelated—it's a standalone utility, not a getter on PyOSError.

crates/vm/src/stdlib/ctypes.rs (8)

1-31: LGTM!

Clean module reorganization with well-defined submodules and public re-exports. The structure provides a clear separation of concerns for the ctypes implementation.


32-85: LGTM!

The Py<PyType> extension methods provide clean abstractions for accessing and managing StgInfo type metadata. The mark_bases_final method's best-effort approach (discarding init_type_data result) is appropriate for its use case.


249-362: LGTM!

The CArgObject struct and its Representable implementation are well-structured, with comprehensive pattern matching for different type tags and appropriate fallback handling.


415-484: LGTM!

The Windows-only COMError implementation correctly follows CPython's pattern of setting instance attributes (hresult, text, details) in __init__ and storing the custom init function after new_heap to avoid the default init_wrapper override.


1083-1095: Inherently unsafe but correctly implemented.

The PyObj_FromPtr function correctly handles the conversion from raw pointer to PyObjectRef:

  • NULL check prevents null pointer dereference
  • ManuallyDrop prevents double-free
  • Clone creates a proper reference

However, there's no validation that ptr actually points to a valid PyObject. This matches CPython's behavior but is inherently dangerous. Consider adding a doc comment warning about this:

/// SAFETY: Caller must ensure ptr points to a valid, live PyObject.
/// Passing an invalid pointer causes undefined behavior.

854-894: LGTM!

The resize function correctly:

  • Validates minimum size against StgInfo.size
  • Checks for owned memory (rejects borrowed buffers)
  • Safely copies existing data to the new buffer

90-112: LGTM!

The make_module function properly initializes all ctypes metaclasses and extends the module with the necessary type registrations.


604-672: LGTM!

The create_pointer_type function correctly handles both complete pointer types (when cls is a type with _type_ set) and incomplete pointer types (when cls is a string). The caching strategy aligns with CPython's behavior.

crates/vm/src/builtins/type.rs (1)

1448-1452: LGTM!

Clean refactoring that aligns the function name with the Python attribute name. The simplified #[pygetset] macro will infer __mro__ from the function name, maintaining the same behavior.

crates/vm/src/buffer.rs (1)

264-312: LGTM! Well-structured PEP3118 extended format specifier handling.

The implementation correctly:

  • Uses depth tracking for nested T{...}, X{}, and (...) constructs
  • Provides clear error messages for unmatched delimiters
  • Skips field names (:name:) and standalone braces appropriately

One note: T/X characters not followed by { will fall through to FormatType::try_from, which will correctly error with "bad char in struct format" since they aren't valid standalone format codes per PEP3118.

crates/vm/src/stdlib/ctypes/library.rs (2)

45-49: LGTM!

The Drop implementation correctly delegates to close(), ensuring the underlying Library is properly released when SharedLibrary is dropped.


93-97: LGTM!

Good use of static_cell! to encapsulate the lazy initialization within libcache(), reducing top-level static exposure while maintaining thread-safe singleton behavior.

crates/vm/src/stdlib/ctypes/structure.rs (5)

170-218: LGTM! Well-structured field processing initialization.

The function properly:

  • Handles swapped byte order structures
  • Validates _fields_ as list or tuple
  • Extracts _pack_ and _align_ attributes with defaults
  • Initializes tracking variables from base class StgInfo when available

268-299: Good flag propagation logic.

The code correctly propagates type flags (HASPOINTER, HASUNION, HASBITFIELD) from field types to the containing structure, and properly marks field types as finalized when used as structure fields.


301-333: Format string construction follows PEP3118.

The padding and field format construction correctly:

  • Adds padding bytes before fields when needed
  • Handles array shapes with proper (shape)format syntax
  • Includes field names with :name: notation

427-491: SetAttr implementation handles finalization correctly.

Good pattern:

  • Releases read lock before calling process_fields to avoid deadlock (lines 440-445)
  • Guards _fields_ mutation with finalization check
  • Falls back to standard type attribute handling for non-_fields_ attributes

611-635: AsBuffer implementation aligns with PyCData model.

The buffer descriptor correctly:

  • Uses the actual buffer length from zelf.0.buffer
  • Retrieves format from class StgInfo with "B" fallback
  • Sets ndim=0 (empty dim_desc) for structure buffers per PEP3118

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (9)
crates/vm/src/stdlib/ctypes/union.rs (2)

553-577: Union buffer view now matches Structure’s 0‑dimensional semantics

PyCUnion::as_buffer now builds a BufferDescriptor with dim_desc: vec![] and itemsize = buffer_len, mirroring PyCStructure and CPython’s scalar ctypes behavior. This fixes the previous mismatch where unions appeared as 1‑D buffers.

Looks good.


12-29: Avoid silently treating malformed _fields_ as a zero-sized union

Both calculate_union_size and PyCUnion::init_pos_args use:

let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default();

If _fields_ exists but isn’t a proper sequence (or try_to_value fails), these helpers quietly behave as if there were no fields at all, yielding size 0 and skipping positional initialization. That makes broken union definitions hard to debug.

As with the struct helper, prefer propagating the conversion error (or mapping it to a TypeError consistent with process_fields) when _fields_ is present, e.g.:

-    if let Ok(fields_attr) = cls.as_object().get_attr("_fields_", vm) {
-        let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default();
+    if let Ok(fields_attr) = cls.as_object().get_attr("_fields_", vm) {
+        let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm)?;

and similarly in init_pos_args.

Also applies to: 487-515

crates/vm/src/stdlib/ctypes/function.rs (4)

369-385: Pointer restypes (c_char_p, c_wchar_p, c_void_p/POINTER) always return ints

ReturnType::from_ffi_type treats "P", "z", and "Z" uniformly:

Some("P") | Some("z") | Some("Z") => {
    vm.ctx.new_int(unsafe { *(value as *const usize) }).into()
}

And ctypes_callproc routes such restypes through RawResult::Pointer, which convert_raw_result also turns into a Python int.

That means:

  • restype = c_char_p → Python int pointer, not bytes/None.
  • restype = c_wchar_p → Python int pointer, not str/None.
  • General pointer types (including c_void_p) are also exposed only as bare integers.

CPython’s ctypes decodes c_char_p and c_wchar_p restypes into Python bytes/str (or None for NULL), while pointer types may expose pointer objects or addresses.

You already have string/wstring logic (string_at_impl, wstring_at_impl) and array/pointer helpers; consider:

  • Distinguishing "z" and "Z" in convert_raw_result / ReturnType::from_ffi_type, decoding the pointed-to string and returning bytes/str (or None), and
  • Leaving "P"/general pointer types as the raw address or appropriate ctypes pointer object.

Right now all pointer restypes collapse to integers, which diverges notably from ctypes semantics.

Also applies to: 1376-1395


186-287: ArgumentType::convert_object still bypasses .from_param, limiting ctypes compatibility

The new convert_object for PyTypeRef adds better handling for pointers/structures and raw PyCSimple, but it still never consults the type’s from_param hook. Instead it:

  • Tries downcast::<PyCSimple>(), or
  • Calls the type as a constructor and expects a PyCSimple.

This misses the primary ctypes customization point (from_param) for both built-in and user-defined types, and was already a pain point in the previous version.

Consider:

  • First attempting to call from_param on the type (or using a shared paramfunc) if it exists, and using that result as the canonical converted object.
  • Only falling back to the current downcast/constructor path when from_param is absent or clearly unsupported.

Without that, many legitimate argtypes patterns (especially custom Structure/Union/pointer subclasses) will not behave like CPython.


723-766: COM-style constructor still panics on non-Windows via NULL function pointer

The COM-method constructor branch:

if args.args.len() >= 2
    && first_arg.try_int(vm).is_ok()
    && args.args[1].downcast_ref::<PyStr>().is_some()
{
    // ...
    return PyCFuncPtr { _base: PyCData::from_bytes(vec![0u8; ptr_size], None), ... };
}

runs on all platforms. On non-Windows it creates a PyCFuncPtr with a NULL function pointer and no index/iid. Later, Callable::call does:

let (func_ptr, is_com_method) = (None::<CodePtr>, false); // non-Windows
let code_ptr = func_ptr.unwrap_or_else(|| zelf.get_code_ptr().expect("Function pointer should be set"));

Since get_code_ptr() returns None for a NULL pointer, this .expect will panic at call time.

You should either:

  • Gate this entire COM constructor branch under #[cfg(windows)], or
  • On non-Windows, detect this usage and raise a TypeError immediately instead of constructing a broken object.

As-is, non-Windows builds can create CFuncPtr instances that inevitably panic when called.

Also applies to: 1462-1480


1398-1452: OUT parameters currently discard the function’s own return value

build_result handles OUT buffers like this:

if prepared.out_buffers.is_empty() {
    return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none()));
}

let out_values = extract_out_values(prepared.out_buffers, vm);
Ok(match <[PyObjectRef; 1]>::try_from(out_values) {
    Ok([single]) => single,
    Err(v) => PyTuple::new_ref(v, &vm.ctx).into(),
})

As soon as any OUT parameters exist, the function’s (possibly errcheck-processed) return value is dropped and only the OUT values are returned. In CPython ctypes, the usual pattern is to return a tuple like (result, *outs).

You likely want:

  • To keep the existing “no OUT” branch as-is, and
  • When OUT buffers are present, return a tuple whose first element is result.unwrap_or(vm.ctx.none()) followed by the extracted OUT values.

That way, callers with both a meaningful return value and OUT/INOUT parameters see the full result, not just the OUTs.

crates/vm/src/stdlib/ctypes/structure.rs (1)

13-29: Don’t silently drop _fields_ conversion errors in calculate_struct_size

calculate_struct_size uses:

let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default();

If _fields_ is malformed or try_to_value fails, you quietly return size 0 instead of surfacing an error, which makes misconfigured structures hard to debug and can mask real bugs. Prefer propagating the error (or mapping it to a descriptive TypeError) when _fields_ exists.

A minimal fix:

-    if let Ok(fields_attr) = cls.as_object().get_attr("_fields_", vm) {
-        let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default();
+    if let Ok(fields_attr) = cls.as_object().get_attr("_fields_", vm) {
+        let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm)?;

This keeps the Ok(0) fallback for the “no _fields_ attribute” case but no longer hides bad _fields_ contents.

crates/vm/src/stdlib/ctypes/array.rs (2)

64-73: Guard against element_size * length overflow in array_type_from_ctype

Here you compute the total array size as element_size * length without checking for overflow:

let stg_info = StgInfo::new_array(
    element_size * length,
    element_align,
    length,
    ...
);

For large length or element_size, this can overflow usize (panic in debug, wrap in release) and produce a bogus layout. In PyCArrayType::init you already use a length > usize::MAX / item_size guard; mirror that here before calling StgInfo::new_array.

Example adjustment:

-    let stg_info = StgInfo::new_array(
-        element_size * length,
+    if element_size != 0 && length > usize::MAX / element_size {
+        return Err(vm.new_overflow_error("array too large"));
+    }
+    let stg_info = StgInfo::new_array(
+        element_size * length,
         element_align,
         length,
         ...
     );

613-638: c_wchar_p reading still hard-codes 32‑bit wchar and ignores WCHAR_SIZE helpers

The "Z" branch in read_element_from_buffer still does:

let ptr = ptr_val as *const u32;
while *ptr.add(len) != 0 { ... }
let wchars = std::slice::from_raw_parts(ptr, len);
let s: String = wchars.iter().filter_map(|&c| char::from_u32(c)).collect();

This assumes 32‑bit wchar_t everywhere. On Windows, wchar_t is 16‑bit, and you already introduced WCHAR_SIZE, wchar_from_bytes, and wstring_from_bytes that correctly handle platform differences.

You should decode the pointed‑to buffer using those helpers instead of hard‑coding u32, e.g.:

  • Treat ptr_val as *const u8.
  • Walk in WCHAR_SIZE‑sized chunks until a 0 code unit.
  • Use wchar_from_bytes/wstring_from_bytes to build the String.

This will make "Z" consistent with the array and helper APIs and avoid misaligned, incorrect reads on Windows.

Also applies to: 1220-1268

🧹 Nitpick comments (2)
crates/vm/src/stdlib/ctypes/library.rs (1)

66-86: Library cache no longer actually reuses existing libraries

get_or_insert_lib computes key from the freshly created SharedLibrary’s Library pointer, then immediately looks up self.libraries.get(&key). That key has never been inserted before, so the Some(l) branch (with is_closed check) is effectively unreachable, and a new SharedLibrary is always inserted.

If you intend _ctypes to reuse an existing handle for the same library (or at least to make the “closed → replace” path meaningful), the key should be derived from something stable across calls (e.g., the OS handle or canonicalized path) rather than the address of the just-created Library.

crates/vm/src/stdlib/ctypes/pointer.rs (1)

449-471: Document memory safety assumptions for pointer slice operations

In getitem_by_slice for "c" (c_char) types, raw pointer slices are created without bounds validation:

if step == 1 {
    // Optimized contiguous copy
    let start_addr = (ptr_value as isize + start * element_size as isize) as *const u8;
    unsafe {
        result.extend_from_slice(std::slice::from_raw_parts(start_addr, len));
    }
}

There's no validation that:

  • ptr_value points to valid, allocated memory
  • The slice [start_addr, start_addr + len) stays within valid bounds
  • The memory region has the expected lifetime

While pointer arithmetic in ctypes inherently requires callers to ensure validity (matching CPython's behavior), the lack of documentation makes it unclear whether this is intentional. Consider:

  1. Adding a comment documenting that callers must ensure the pointer and computed offsets are valid
  2. Optionally, checking ptr_value != 0 before slice operations (though CPython allows this for non-zero invalid pointers)
  3. For development builds, consider optional bounds checking via a debug feature flag

Similar patterns appear at lines 481-490 (c_wchar slice) and throughout read/write operations.

Note: This is standard ctypes behavior—users working with raw pointers are responsible for memory safety—but explicit documentation improves maintainability.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca7743c and 8c5948f.

📒 Files selected for processing (27)
  • .cspell.dict/python-more.txt (2 hunks)
  • .cspell.json (3 hunks)
  • crates/stdlib/src/pystruct.rs (1 hunks)
  • crates/vm/src/buffer.rs (1 hunks)
  • crates/vm/src/builtins/builtin_func.rs (2 hunks)
  • crates/vm/src/builtins/function.rs (1 hunks)
  • crates/vm/src/builtins/object.rs (1 hunks)
  • crates/vm/src/builtins/str.rs (2 hunks)
  • crates/vm/src/builtins/type.rs (1 hunks)
  • crates/vm/src/exceptions.rs (3 hunks)
  • crates/vm/src/object/core.rs (1 hunks)
  • crates/vm/src/protocol/buffer.rs (1 hunks)
  • crates/vm/src/stdlib/ast/python.rs (1 hunks)
  • crates/vm/src/stdlib/codecs.rs (2 hunks)
  • crates/vm/src/stdlib/ctypes.rs (14 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (8 hunks)
  • crates/vm/src/stdlib/ctypes/library.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/simple.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (0 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (4 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (0 hunks)
  • crates/vm/src/stdlib/functools.rs (1 hunks)
  • crates/vm/src/stdlib/operator.rs (1 hunks)
  • crates/vm/src/types/structseq.rs (2 hunks)
💤 Files with no reviewable changes (2)
  • crates/vm/src/stdlib/ctypes/util.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
🚧 Files skipped from review as they are similar to previous changes (7)
  • crates/vm/src/stdlib/codecs.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/buffer.rs
  • crates/vm/src/stdlib/operator.rs
  • crates/stdlib/src/pystruct.rs
  • crates/vm/src/types/structseq.rs
  • crates/vm/src/object/core.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/functools.rs
  • crates/vm/src/stdlib/ast/python.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/protocol/buffer.rs
  • crates/vm/src/exceptions.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/builtins/builtin_func.rs
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/simple.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/library.rs
🧠 Learnings (7)
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.

Applied to files:

  • crates/vm/src/exceptions.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/union.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations

Applied to files:

  • crates/vm/src/stdlib/ctypes/library.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • .cspell.json
🧬 Code graph analysis (6)
crates/vm/src/stdlib/functools.rs (7)
crates/vm/src/builtins/builtin_func.rs (2)
  • __reduce__ (123-126)
  • __reduce__ (169-182)
crates/vm/src/builtins/function.rs (1)
  • __reduce__ (804-812)
crates/vm/src/builtins/object.rs (1)
  • __reduce__ (503-505)
crates/vm/src/builtins/str.rs (1)
  • __reduce__ (302-307)
crates/vm/src/builtins/genericalias.rs (1)
  • __reduce__ (217-222)
crates/vm/src/builtins/list.rs (2)
  • __reduce__ (562-566)
  • __reduce__ (607-611)
crates/vm/src/stdlib/collections.rs (1)
  • __reduce__ (403-410)
crates/vm/src/builtins/function.rs (1)
crates/vm/src/types/slot.rs (1)
  • _cls_is (1083-1085)
crates/vm/src/protocol/buffer.rs (2)
crates/stdlib/src/array.rs (1)
  • itemsize (738-740)
crates/vm/src/builtins/memory.rs (2)
  • itemsize (584-586)
  • shape (594-603)
crates/vm/src/exceptions.rs (1)
crates/vm/src/stdlib/os.rs (1)
  • strerror (1544-1548)
crates/vm/src/builtins/str.rs (3)
crates/vm/src/builtins/bytes.rs (1)
  • isascii (262-264)
crates/vm/src/builtins/bytearray.rs (1)
  • isascii (271-273)
crates/vm/src/bytes_inner.rs (1)
  • isascii (326-328)
crates/vm/src/stdlib/ctypes.rs (6)
crates/vm/src/stdlib/ctypes/structure.rs (2)
  • base (111-111)
  • a (411-412)
crates/vm/src/stdlib/ctypes/function.rs (15)
  • std (102-102)
  • std (451-451)
  • std (453-453)
  • std (511-511)
  • std (699-699)
  • t (259-259)
  • t (323-323)
  • t (994-994)
  • t (1002-1002)
  • t (1004-1004)
  • t (1424-1424)
  • n (1049-1049)
  • src (614-614)
  • cast_impl (563-664)
  • typ (218-219)
crates/vm/src/stdlib/ctypes/array.rs (13)
  • std (597-597)
  • std (622-622)
  • std (1221-1221)
  • a (331-332)
  • t (62-62)
  • t (255-255)
  • t (376-376)
  • obj (1083-1083)
  • obj (1091-1091)
  • obj (1111-1111)
  • obj (1123-1123)
  • obj (1136-1136)
  • obj (1147-1147)
crates/vm/src/stdlib/ctypes/pointer.rs (9)
  • std (38-38)
  • std (268-268)
  • std (352-352)
  • std (437-437)
  • std (479-479)
  • std (547-547)
  • a (164-165)
  • t (543-543)
  • typ (138-140)
crates/vm/src/stdlib/ctypes/base.rs (28)
  • std (2113-2113)
  • std (2196-2196)
  • mem (1922-1922)
  • mem (1931-1931)
  • mem (1940-1940)
  • mem (1949-1949)
  • mem (1958-1958)
  • mem (1967-1967)
  • mem (1976-1976)
  • mem (1985-1985)
  • mem (1994-1994)
  • mem (2003-2003)
  • mem (2014-2014)
  • mem (2028-2028)
  • mem (2069-2069)
  • mem (2070-2070)
  • obj (735-735)
  • obj (759-759)
  • obj (781-781)
  • obj (796-796)
  • obj (1557-1557)
  • obj (1559-1559)
  • obj (1668-1669)
  • obj (1696-1697)
  • obj (1717-1718)
  • obj (1742-1742)
  • size (636-638)
  • size (1638-1640)
crates/vm/src/stdlib/ctypes/field.rs (3)
  • name (278-280)
  • type_ (283-285)
  • size (258-260)
🪛 Biome (2.1.2)
.cspell.json

[error] 78-78: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 78-78: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 110-110: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 110-110: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 134-134: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 134-134: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (16)
crates/vm/src/stdlib/ast/python.rs (1)

50-51: LGTM! Clean refactoring improves naming consistency.

The function name now matches the exposed Python attribute name, eliminating the need for an explicit name parameter in the decorator. This simplifies the code while preserving the external API.

crates/vm/src/builtins/builtin_func.rs (2)

117-120: LGTM! Improved consistency by removing redundant attribute name.

The explicit name = "__self__" parameter is unnecessary since the function name already matches the desired Python attribute name. This change aligns with the existing pattern used for other #[pygetset] attributes in this implementation block (lines 79-115).


184-187: LGTM! Consistent refactoring across both function and method types.

This change mirrors the improvement at line 117, removing the redundant name = "__self__" parameter. Both PyNativeFunction and PyNativeMethod now follow the same concise pattern for attribute binding.

crates/vm/src/exceptions.rs (3)

627-630: LGTM! Consistent with Python naming conventions.

The rename from get_suppress_context to __suppress_context__ aligns with Python's dunder method conventions and the broader refactoring pattern across the codebase. The implementation correctly returns the boolean value from the atomic cell.


1115-1115: Correctly updated to use the renamed method.

The serialization code properly uses the new __suppress_context__() method name, maintaining consistency with the refactored getter.


1860-1863: LGTM! Naming convention improved.

The strerror getter follows the standardized naming pattern without the get_ prefix, matching the style of other OSError field accessors (errno, filename, etc.). The implementation correctly returns the optional PyObjectRef.

crates/vm/src/builtins/function.rs (1)

632-636: LGTM! Clean refactoring.

The inlining of the conditional expression into the return statement improves conciseness without changing behavior. The descriptor protocol logic correctly returns the unbound function when accessed from a class context and a bound method when accessed from an instance.

.cspell.dict/python-more.txt (1)

151-151: LGTM! Dictionary additions align with ctypes implementation.

The additions of needsfree and PYTHONAPI are appropriate spell-check exceptions for the ctypes module implementation.

Also applies to: 172-172

.cspell.json (2)

78-78: LGTM! Vocabulary additions are appropriate.

The additions of oparg, pytype, and sighandler are reasonable spell-check exceptions for Python bytecode and signal handling terms.

Note: The static analysis hints from Biome are false positives—the JSON structure is valid.

Also applies to: 110-110, 134-134


80-80: Summary inconsistency: pyarg was not removed.

The AI summary claims pyarg was removed, but it's still present on Line 80. The term remains in the vocabulary list.

crates/vm/src/builtins/object.rs (1)

453-456: Good refactoring to align method name with Python attribute name.

The change from get_class with an explicit name parameter to __class__ without it is more idiomatic and consistent with the broader pattern in this PR. The search confirms no remaining references to the old get_class method exist in the codebase, so all necessary call sites have been updated.

crates/vm/src/protocol/buffer.rs (1)

205-216: LGTM! Correctly implements scalar buffer validation.

The branching logic properly handles scalar (0-dimensional) buffers:

  • When ndim() == 0, the buffer contains a single scalar item, so len must equal itemsize.
  • When ndim() > 0, the existing validation logic ensures shape_product * itemsize == len.

This change aligns with PEP 3118 and is consistent with how for_each_segment and zip_eq already handle scalar buffers. The implementation correctly supports ctypes Structure types as scalar buffers.

crates/vm/src/builtins/str.rs (1)

628-632: isascii/format binding changes look correct and behavior-preserving

Using #[pymethod] with Rust names isascii and __format__ keeps the Python API unchanged while simplifying bindings, and the isascii implementation remains consistent with the internal StrKind::Ascii flag.

Also applies to: 963-984

crates/vm/src/stdlib/functools.rs (1)

76-101: Partial.reduce/setstate implementation looks sound and CPython-aligned

State serialization/deserialization correctly handles func/args/keywords/instance __dict__, normalizes tuple/dict subclasses, and enforces callable/shape checks. No functional or safety issues spotted.

Also applies to: 103-192, 207-244

crates/vm/src/stdlib/ctypes/function.rs (2)

41-57: Double‑check ffi_type_from_tag('u') on Windows (wchar_t size)

ffi_type_from_tag maps the 'u' tag to Type::u32():

b'u' => Type::u32(),  // wchar_t

On Windows, wchar_t is 16‑bit, so using a 32‑bit FFI type here may misalign the call ABI for c_wchar/c_wchar_p arguments and returns. On Unix, 32‑bit is fine.

Please verify that this mapping matches how type_info("u") is defined and what libffi expects for wchar_t on each platform; if not, this should be made conditional on the platform (e.g., u16 on Windows, u32 elsewhere).


1585-1881: Callback/thunk implementation is thoughtfully structured; lifetime management looks correct

The PyCThunk/ThunkData/ThunkUserData machinery:

  • Builds a Cif using type codes via get_ffi_type,
  • Allocates ThunkUserData on the heap, stores a raw pointer with a ThunkData wrapper and Drop impl to free it,
  • Uses with_current_vm in thunk_callback to safely re-enter the VM, converting args and return values via ffi_to_python / python_to_ffi,
  • Exposes the callable and code pointer cleanly.

Given the constraints of libffi and ctypes callbacks, this design looks solid and much closer to CPython’s behavior.

Comment on lines +1084 to +1105
#[pyfunction(name = "PyObj_FromPtr")]
fn py_obj_from_ptr(ptr: usize, vm: &VirtualMachine) -> PyResult {
if ptr == 0 {
return Err(vm.new_value_error("NULL pointer access"));
}
let raw_ptr = ptr as *mut crate::object::PyObject;
unsafe {
let obj = crate::PyObjectRef::from_raw(std::ptr::NonNull::new_unchecked(raw_ptr));
let obj = std::mem::ManuallyDrop::new(obj);
Ok((*obj).clone())
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Find the py_obj_from_ptr function and surrounding context
echo "=== py_obj_from_ptr function ==="
rg -n "py_obj_from_ptr" crates/vm/src/stdlib/ctypes.rs -A 15

echo ""
echo "=== Usages of py_obj_from_ptr in codebase ==="
rg -r "py_obj_from_ptr" --type rust

echo ""
echo "=== PyObjectRef::from_raw definition ==="
rg -n "fn from_raw" --type rust -B 3 -A 10 | head -50

echo ""
echo "=== Checking for validation in ctypes module ==="
grep -n "NULL\|validation\|check" crates/vm/src/stdlib/ctypes.rs | head -30

Repository: RustPython/RustPython

Length of output: 5490


🏁 Script executed:

#!/bin/bash

# Look at the ctypes.rs file to understand the context around py_obj_from_ptr
echo "=== Context around lines 1084-1095 ==="
sed -n '1070,1110p' crates/vm/src/stdlib/ctypes.rs

echo ""
echo "=== Search for usages of PyObj_FromPtr ==="
rg "PyObj_FromPtr" --type rust -B 2 -A 2

echo ""
echo "=== Search for tests of this function ==="
rg "test.*py_obj_from_ptr\|py_obj_from_ptr.*test" --type rust -B 2 -A 5

echo ""
echo "=== Find PyObjectRef::from_raw ==="
rg "fn from_raw" --type rust -B 2 -A 8

Repository: RustPython/RustPython

Length of output: 6542


🏁 Script executed:

#!/bin/bash

# Check the file size and path
ls -la crates/vm/src/stdlib/ctypes.rs

echo ""
echo "=== Lines 1080-1100 of ctypes.rs ==="
wc -l crates/vm/src/stdlib/ctypes.rs
sed -n '1080,1100p' crates/vm/src/stdlib/ctypes.rs

Repository: RustPython/RustPython

Length of output: 954


🏁 Script executed:

cat crates/vm/src/stdlib/ctypes.rs | sed -n '1084,1095p'

Repository: RustPython/RustPython

Length of output: 549


🏁 Script executed:

#!/bin/bash
set -e

# Search for the file
find . -name ctypes.rs -type f 2>/dev/null | head -5

# Show the function
echo "=== Function code ==="
rg -n "def py_obj_from_ptr|fn py_obj_from_ptr" --type rust -A 15

echo ""
echo "=== All references to PyObj_FromPtr ==="
rg "PyObj_FromPtr" --type rust

echo ""
echo "=== from_raw implementations ==="
rg "fn from_raw" --type rust -A 10

Repository: RustPython/RustPython

Length of output: 6152


🏁 Script executed:

#!/bin/bash

# Search for the exact code pattern
rg -A 12 'PyObj_FromPtr.*pyfunction' --type rust

echo ""
echo "=== Check for usages of this function ==="  
rg "py_obj_from_ptr\|PyObj_FromPtr" --type rust -B 2 -A 2

echo ""
echo "=== Examine from_raw method ==="
rg "\.from_raw" --type rust -B 3 -A 3

Repository: RustPython/RustPython

Length of output: 140


🏁 Script executed:

#!/bin/bash

# First, locate and examine the exact function
rg -n "py_obj_from_ptr" --type rust -B 2 -A 15

# Search for any uses of this function
rg "py_obj_from_ptr" --type rust | grep -v "def \|fn " | head -10

# Search for from_raw usages to understand the pattern
rg "from_raw" --type rust -B 2 -A 5 | head -40

# Check if there are any tests
rg "test.*ctypes\|PyObj" --type rust | grep -i "test\|spec" | head -10

Repository: RustPython/RustPython

Length of output: 3420


Critical safety issue: PyObj_FromPtr enables arbitrary memory dereference with insufficient validation

The py_obj_from_ptr function converts an arbitrary usize pointer to a PyObjectRef with only a NULL check. It is exposed via #[pyfunction], making it directly callable from Python code:

if ptr == 0 {
    return Err(vm.new_value_error("NULL pointer access"));
}
let raw_ptr = ptr as *mut crate::object::PyObject;
unsafe {
    let obj = crate::PyObjectRef::from_raw(std::ptr::NonNull::new_unchecked(raw_ptr));

This can cause:

  • Segmentation faults from dereferencing invalid memory addresses
  • Use-after-free if the pointer refers to a freed object
  • Memory corruption from invalid reference counting or structure access
  • Arbitrary memory read/write via the returned object

There is no validation that:

  • The pointer points to valid, allocated memory
  • The memory contains a valid PyObject with correct layout
  • The object is still alive (not freed)
  • The reference count is correct

Remove this function or restrict it to internal use only with comprehensive safety documentation explaining the caller's obligations. If it must remain public, add runtime validation (though this cannot catch all invalid pointers).

🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes.rs around lines 1084-1095, the exposed
#[pyfunction] py_obj_from_ptr allows arbitrary usize -> PyObjectRef conversion
with only a NULL check; remove the Python-callable surface and either delete the
function or make it internal (remove #[pyfunction], change visibility to
pub(crate) or private) and document it as unsafe with clear caller obligations,
or gate it behind a safety feature flag (e.g. cfg(feature =
"internal_unsafe_api")); if you must keep a runtime entrypoint, replace the
pyfunction with a safe wrapper that performs strict validation using VM-managed
metadata (e.g. verify the pointer belongs to the VM’s allocation table/arena, is
aligned, type-tag matches PyObject, and the object is live) and ensure the
actual unsafe conversion is only in a clearly documented unsafe helper not
exported to Python.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (7)
crates/vm/src/stdlib/ctypes/array.rs (2)

64-73: Guard against element_size * length overflow when constructing array StgInfo

StgInfo::new_array is called with element_size * length without overflow checking. For large length or element_size this can overflow usize, corrupting the computed size and later allocations (e.g. in PyCArray::slot_new’s buffer).

You already have a safe pattern in PyCArrayType::Initializer (length > usize::MAX / item_size): mirror that here (e.g. checked_mul + OverflowError) before calling StgInfo::new_array.

Suggested fix sketch
-    let stg_info = StgInfo::new_array(
-        element_size * length,
+    let total_size = element_size
+        .checked_mul(length)
+        .ok_or_else(|| vm.new_overflow_error("array too large"))?;
+    let stg_info = StgInfo::new_array(
+        total_size,
         element_align,
         length,
         itemtype_ref.clone(),
         element_size,
         item_format.as_deref(),
         &item_shape,
         item_flags,
     );

613-637: c_wchar_p reader hardcodes 32‑bit wchar and ignores WCHAR_SIZE

The "Z" arm of read_element_from_buffer interprets the target as *const u32, assuming 4‑byte wide chars. On Windows wchar_t is 2 bytes; on other platforms you already expose WCHAR_SIZE, wchar_from_bytes, and wstring_from_bytes for platform‑correct handling.

This mismatch will break c_wchar_p arrays on Windows (wrong alignment/size) and duplicates logic that should be centralized in the wchar helpers.

Refactor the "Z" arm to use WCHAR_SIZE + wstring_from_bytes (or a shared helper in base.rs) rather than *const u32, so wide pointer reading matches the rest of the ctypes wchar handling.

Suggested direction (high level)
  • Keep reading the pointer value as you do now.
  • Use std::slice::from_raw_parts(ptr as *const u8, len_in_bytes) and then wstring_from_bytes to build the String.
  • Compute len_in_bytes as n * WCHAR_SIZE where n is the number of wchar elements up to the terminator, using wchar_from_bytes to walk until 0.

Also applies to: 1214-1262

crates/vm/src/stdlib/ctypes/function.rs (2)

1377-1395: Pointer restypes (c_char_p / c_wchar_p / c_void_p) are always exposed as Python ints

convert_raw_result maps RawResult::Pointer directly to vm.ctx.new_int(*ptr), and ReturnType::from_ffi_type also returns ints for type codes "P", "z", and "Z". Combined with CallInfo::is_pointer_return, this means:

  • restype = c_char_p yields an integer address, not bytes/None.
  • restype = c_wchar_p yields an integer address, not str/None.
  • restype = c_void_p also yields a bare int, which is arguable but inconsistent with ctypes’ pointer object behavior.

This diverges from ctypes, where "z"/"Z" restypes decode the pointed‑to C string and return bytes/str (or None), while generic pointers expose addresses.

Suggested high-level fix
  • In convert_raw_result, when RawResult::Pointer(ptr) and call_info.restype_obj is a type with _type_ "z" or "Z", read from foreign memory:
    • "z": if ptr == 0 return None, else read a char* C string (like string_at_impl) and return PyBytes.
    • "Z": similarly, read a wchar_t* string using the same platform‑aware logic as wstring_at_impl and return PyStr.
  • For "P" and other pointer types, keep returning the integer address or wrap it in the appropriate ctypes object.
  • Optionally, reuse string_at_impl/wstring_at_impl to avoid duplicating the memory‑walk logic.
```web How does CPython ctypes treat function results when restype is c_char_p or c_wchar_p? ```

1409-1452: OUT parameters currently discard the function’s own return value

build_result returns only OUT values when prepared.out_buffers is non‑empty:

if prepared.out_buffers.is_empty() {
    return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none()));
}
let out_values = extract_out_values(prepared.out_buffers, vm);
Ok(match <[PyObjectRef; 1]>::try_from(out_values) { ... })

This drops the function’s result (even post‑errcheck), whereas ctypes normally returns (result, *outs) when paramflags specify OUT/INOUT parameters.

Suggested behavior
  • Keep the current branch when out_buffers is empty.
  • When there are OUT values, construct a tuple whose first element is result.unwrap_or(vm.ctx.none()), followed by the extracted out values.
  • Preserve the single‑OUT special case only for compatibility if you explicitly want it, but align with ctypes’ (result, out1, out2, ...) semantics.
```web In CPython ctypes, what is returned when a function has both a non-void restype and OUT parameters via paramflags? ```
crates/vm/src/stdlib/ctypes/simple.rs (2)

59-75: c_char_p / c_wchar_p still reject None in constructors and .value assignments

set_primitive for "z" and "Z" only accepts ints/bytes/str:

"z" => {
    if value.downcast_ref_if_exact::<PyInt>(vm).is_some()
        || value.downcast_ref_if_exact::<PyBytes>(vm).is_some()
    { ... } else { TypeError(...) }
}
"Z" => {
    if value.downcast_ref_if_exact::<PyStr>(vm).is_some() { ... } else { TypeError(...) }
}

PyCSimple::slot_new and set_value both route explicit arguments through set_primitive when they’re not the special bytes/str fast paths. As a result:

  • c_char_p(None) and c_wchar_p(None) raise TypeError instead of representing a NULL pointer.
  • x.value = None for these types also raises, rather than clearing the pointer.

Although from_param accepts None for "z", "Z", and "P", constructors and attribute setters remain incompatible with ctypes semantics.

Suggested fix
  • Adjust set_primitive to accept None for "z" and "Z" (similar to "P"):
- "z" => {
-     if value.downcast_ref_if_exact::<PyInt>(vm).is_some()
-         || value.downcast_ref_if_exact::<PyBytes>(vm).is_some()
-     {
-         Ok(value.to_owned())
-     } else {
-         Err(vm.new_type_error(...))
-     }
- }
+ "z" => {
+     if value.downcast_ref_if_exact::<PyInt>(vm).is_some()
+         || value.downcast_ref_if_exact::<PyBytes>(vm).is_some()
+         || value.downcast_ref_if_exact::<PyNone>(vm).is_some()
+     {
+         Ok(value.to_owned())
+     } else {
+         Err(vm.new_type_error("bytes, integer address or None expected"))
+     }
+ }
  • Do the analogous change for "Z" (accept PyNone and mention “or None” in the error).
  • With that in place, slot_new and set_value will naturally allow None and encode it as a NULL pointer (all‑zero buffer) via value_to_bytes_endian.

Also applies to: 140-173, 961-1003, 1167-1221


705-713: c_wchar serialization ignores actual wchar_t size (hardcoded 4 bytes)

In value_to_bytes_endian, the "u" (c_wchar) branch always encodes as a 4‑byte u32 and returns vec![0; 4] on fallback, regardless of platform. Other parts of the ctypes code (e.g., the wchar helpers in array.rs / base.rs) correctly use size_of::<libc::wchar_t>() via WCHAR_SIZE.

On Windows, where wchar_t is 2 bytes, this will desynchronize simple c_wchar layout from arrays/pointers and from the actual ABI, risking data corruption.

Suggested adjustment
  • Use the same platform‑dependent size as the rest of the ctypes code:
- "u" => {
-     // c_wchar - 4 bytes (wchar_t on most platforms)
-     if let Ok(s) = value.str(vm)
-         && let Some(c) = s.as_str().chars().next()
-     {
-         return to_bytes!(c as u32);
-     }
-     vec![0; 4]
- }
+ "u" => {
+     use crate::stdlib::ctypes::array::WCHAR_SIZE; // or shared base helper
+     if let Ok(s) = value.str(vm)
+         && let Some(c) = s.as_str().chars().next()
+     {
+         let mut buf = vec![0u8; WCHAR_SIZE];
+         super::array::wchar_to_bytes(c as u32, &mut buf);
+         return buf;
+     }
+     vec![0; WCHAR_SIZE]
+ }
  • Or equivalently, reuse the shared WCHAR_SIZE / wchar_to_bytes from the common wchar helper module.
crates/vm/src/stdlib/ctypes/pointer.rs (1)

646-669: write_value_at_address "Z" branch still leaks wchar buffers and diverges from the safer make_wchar_buffer pattern

The "Z" arm of write_value_at_address constructs a Vec<libc::wchar_t>, takes its pointer, and mem::forgets it:

let mut wchars: Vec<libc::wchar_t> = s
    .as_str()
    .chars()
    .map(|c| c as libc::wchar_t)
    .chain(std::iter::once(0))
    .collect();
let ptr = wchars.as_mut_ptr();
std::mem::forget(wchars);
ptr as usize

This heap allocation is never tracked or freed. Meanwhile, setitem_by_index has its own "Z"+PyStr special case that uses make_wchar_buffer and stores a holder in buffer_holder, so you effectively have two divergent code paths for the same type:

  • This helper leaks memory on every call.
  • The higher-level setter uses a safer holder, but nothing prevents future callers from hitting this helper directly.
Suggested refactor
  • Remove the ad‑hoc Vec<libc::wchar_t> + mem::forget logic and instead:

    • Reuse make_wchar_buffer (as in setitem_by_index) to build a managed buffer and return its pointer.
    • Ensure the corresponding owner object is kept alive via keep_ref or a dedicated holder, not just a single shared buffer_holder.
  • If you want a single authoritative path, consider making setitem_by_index always call a small helper that:

    • Builds/keeps the holder.
    • Writes the pointer.
    • Registers the holder for lifetime tracking.
🧹 Nitpick comments (1)
crates/vm/src/stdlib/ctypes/function.rs (1)

500-535: Apply consistent wide-string handling to callback "Z" type, matching wstring_at_impl platform logic

In ffi_to_python, the "Z" (c_wchar_p) handler at lines 1631-1645 naively casts each wchar_t to char without platform branching, causing UTF-16 surrogate pairs on Windows to be mishandled. Align with the correct platform-aware logic used in wstring_at_impl: use Wtf8Buf::from_wide on Windows and char::from_u32 on other platforms.

Similarly, in python_to_ffi at line 1702, "Z" is treated as a raw integer pointer rather than a true wchar buffer. For callback returns, create and manage a wchar buffer (similar to make_wchar_buffer pattern) and return its pointer, keeping the buffer alive during callback execution to match non-callback c_wchar_p semantics.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c5948f and e640ebf.

📒 Files selected for processing (4)
  • crates/vm/src/stdlib/ctypes/array.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (8 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/simple.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/simple.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
🧠 Learnings (2)
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
🧬 Code graph analysis (2)
crates/vm/src/stdlib/ctypes/array.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (25)
  • new_array (172-216)
  • size (659-661)
  • size (1658-1660)
  • buffer (347-347)
  • buffer (354-354)
  • mem (1942-1942)
  • mem (1951-1951)
  • mem (1960-1960)
  • mem (1969-1969)
  • mem (1978-1978)
  • mem (1987-1987)
  • mem (1996-1996)
  • mem (2005-2005)
  • mem (2014-2014)
  • mem (2023-2023)
  • mem (2034-2034)
  • mem (2048-2048)
  • mem (2089-2089)
  • mem (2090-2090)
  • std (365-365)
  • std (366-366)
  • std (2133-2133)
  • std (2216-2216)
  • new (149-165)
  • new (1232-1240)
crates/vm/src/stdlib/ctypes/pointer.rs (3)
crates/vm/src/stdlib/ctypes/array.rs (17)
  • init (141-277)
  • obj (1077-1077)
  • obj (1085-1085)
  • obj (1105-1105)
  • obj (1117-1117)
  • obj (1130-1130)
  • obj (1141-1141)
  • from_param (283-312)
  • array_type_from_ctype (16-93)
  • b (543-543)
  • b (822-822)
  • std (597-597)
  • std (622-622)
  • std (1215-1215)
  • t (62-62)
  • t (255-255)
  • t (376-376)
crates/vm/src/stdlib/ctypes/base.rs (57)
  • obj (758-758)
  • obj (782-782)
  • obj (804-804)
  • obj (819-819)
  • obj (1577-1577)
  • obj (1579-1579)
  • obj (1688-1689)
  • obj (1716-1717)
  • obj (1737-1738)
  • obj (1762-1762)
  • proto (291-293)
  • proto (907-907)
  • proto (977-977)
  • proto (1527-1527)
  • proto (1544-1544)
  • mem (1942-1942)
  • mem (1951-1951)
  • mem (1960-1960)
  • mem (1969-1969)
  • mem (1978-1978)
  • mem (1987-1987)
  • mem (1996-1996)
  • mem (2005-2005)
  • mem (2014-2014)
  • mem (2023-2023)
  • mem (2034-2034)
  • mem (2048-2048)
  • mem (2089-2089)
  • mem (2090-2090)
  • value (841-842)
  • value (850-850)
  • value (850-850)
  • value (872-872)
  • value (886-886)
  • value (901-901)
  • value (911-911)
  • value (949-949)
  • value (1347-1347)
  • value (1355-1355)
  • value (1363-1363)
  • value (1412-1412)
  • value (1427-1427)
  • value (1441-1441)
  • value (1455-1455)
  • value (1481-1481)
  • from_bytes (426-436)
  • std (365-365)
  • std (366-366)
  • std (2133-2133)
  • std (2216-2216)
  • buffer (347-347)
  • buffer (354-354)
  • read_ptr_from_buffer (2132-2139)
  • at_address (461-474)
  • offset (1653-1655)
  • size (659-661)
  • size (1658-1660)
crates/vm/src/stdlib/ctypes/library.rs (1)
  • drop (46-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check Rust code with clippy

Comment on lines 1477 to 1604
// 6. Get code pointer
let code_ptr = func_ptr
.unwrap_or_else(|| zelf.get_code_ptr().expect("Function pointer should be set"));

// 7. Call the function
let raw_result = ctypes_callproc(code_ptr, &prepared, &call_info);

// 8. Build result
build_result(raw_result, &call_info, prepared, zelf, &args, vm)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Calling a CFuncPtr with a NULL function pointer will panic instead of raising a Python error

In PyCFuncPtr::call you compute:

let code_ptr = func_ptr
    .unwrap_or_else(|| zelf.get_code_ptr().expect("Function pointer should be set"));

If the underlying buffer stores a NULL pointer (e.g., CFuncPtr() with no address, failed symbol lookup, or explicit zero address), get_code_ptr() returns None and this expect panics the VM instead of raising a Python exception like ValueError("NULL function pointer").

Suggested fix
-        let code_ptr = func_ptr
-            .unwrap_or_else(|| zelf.get_code_ptr().expect("Function pointer should be set"));
+        let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) {
+            Some(cp) => cp,
+            None => {
+                return Err(vm.new_value_error("NULL function pointer"));
+            }
+        };
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/function.rs around lines 1477 to 1486, replace
the panic-causing expect when resolving the code pointer with a proper Python
exception: check if func_ptr is Some, otherwise call zelf.get_code_ptr() and if
that returns None return an Err containing a Python ValueError (e.g., "NULL
function pointer") via the VM API instead of calling expect; ensure the function
returns that Err so the VM raises the Python exception rather than panicking.

Comment on lines 523 to 596
fn setitem_by_index(
zelf: &Py<Self>,
index: isize,
value: PyObjectRef,
vm: &VirtualMachine,
) -> PyResult {
let offset = offset.unwrap_or(0);
if offset < 0 {
return Err(vm.new_value_error("offset cannot be negative".to_owned()));
) -> PyResult<()> {
let ptr_value = zelf.get_ptr_value();
if ptr_value == 0 {
return Err(vm.new_value_error("NULL pointer access"));
}
let offset = offset as usize;
let size = std::mem::size_of::<usize>();

let source_bytes = source.borrow_buf();
let buffer_len = source_bytes.len();
// Get element type, size and type_code from StgInfo.proto
let stg_info = zelf.class().stg_info(vm)?;
let proto_type = stg_info.proto();

if offset + size > buffer_len {
return Err(vm.new_value_error(format!(
"Buffer size too small ({} instead of at least {} bytes)",
buffer_len,
offset + size
)));
}
// Get type code from proto's _type_ attribute
let type_code: Option<String> = proto_type
.as_object()
.get_attr("_type_", vm)
.ok()
.and_then(|t| t.downcast_ref::<PyStr>().map(|s| s.to_string()));

// Read pointer value from buffer
let ptr_bytes = &source_bytes[offset..offset + size];
let ptr_val = usize::from_ne_bytes(ptr_bytes.try_into().expect("size is checked above"));
let element_size = proto_type
.stg_info_opt()
.map_or(std::mem::size_of::<usize>(), |info| info.size);

Ok(PyCPointer {
_base: PyCData::new(CDataObject::from_bytes(vec![], None)),
contents: PyRwLock::new(vm.ctx.new_int(ptr_val).into()),
// Calculate address
let offset = index * element_size as isize;
let addr = (ptr_value as isize + offset) as usize;

// Write value at address
// Handle Structure/Array types by copying their buffer
if let Some(cdata) = value.downcast_ref::<super::PyCData>()
&& (cdata.fast_isinstance(PyCStructure::static_type())
|| cdata.fast_isinstance(PyCArray::static_type())
|| cdata.fast_isinstance(PyCSimple::static_type()))
{
let src_buffer = cdata.buffer.read();
let copy_len = src_buffer.len().min(element_size);
unsafe {
let dest_ptr = addr as *mut u8;
std::ptr::copy_nonoverlapping(src_buffer.as_ptr(), dest_ptr, copy_len);
}
} else {
// Handle c_wchar_p (Z) specially to store holder
if type_code.as_deref() == Some("Z")
&& let Some(s) = value.downcast_ref::<PyStr>()
{
let (holder, ptr_val) = super::base::make_wchar_buffer(s.as_str(), vm);
*zelf.0.buffer_holder.write() = Some(holder);
unsafe {
*(addr as *mut usize) = ptr_val;
}
} else {
Self::write_value_at_address(addr, element_size, &value, type_code.as_deref(), vm)?;
}
}
.into_ref_with_type(vm, cls)?
.into())

// KeepRef: store reference to keep value alive using actual index
zelf.0.keep_ref(index as usize, value, vm)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Negative pointer indices combined with keep_ref(index as usize) can blow up reference tracking

PyCPointer::setitem_by_index allows arbitrary (including negative) indices for pointer arithmetic:

let offset = index * element_size as isize;
let addr = (ptr_value as isize + offset) as usize;
// ...
zelf.0.keep_ref(index as usize, value, vm)

Casting a negative index to usize wraps to a huge value (e.g. -1usize::MAX), which can cause pathological behavior in _objects/keep_ref (large sparse structures, unexpected panics, etc.).

You either need to normalize the index you use for keep_ref, or disallow negative indices for writes.

Suggested fix (simple)
  • Keep pointer arithmetic support, but clamp or reject negative indices for reference tracking:
-        let offset = index * element_size as isize;
+        let offset = index * element_size as isize;
         let addr = (ptr_value as isize + offset) as usize;
         // ...
-        // KeepRef: store reference to keep value alive using actual index
-        zelf.0.keep_ref(index as usize, value, vm)
+        // KeepRef: use a non-negative key (e.g. logical position) to track lifetime
+        let key = if index >= 0 { index as usize } else { 0 };
+        zelf.0.keep_ref(key, value, vm)
  • Or, if you don’t need per‑index tracking for pointers, consider always using a small fixed key (e.g. 0) instead of the index.
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/pointer.rs around lines 523–583, casting the
signed index to usize (keep_ref(index as usize, ...)) will wrap negative indices
into huge usize values; change the logic so keep_ref never receives a wrapped
value — either reject negative indices for writes (return a ValueError when
index < 0) or normalize the index used for tracking (e.g., map any negative
index to a fixed small key like 0 or clamp to 0) and pass that safe usize into
keep_ref; implement the chosen approach consistently before calling keep_ref.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/ctypes/function.rs (1)

300-385: Struct and Array return types incorrectly default to 32-bit ints

When ReturnType::to_ffi_type for PyTypeRef returns None (which happens for Structure and Array types without a _type_ attribute), extract_call_info falls back to Type::i32(). The from_ffi_type method then reads the return value as an i32, exposing only the low 32 bits as a Python int instead of creating an instance of the declared ctypes type.

Since CPython supports by-value struct and array returns, this implementation either needs to properly construct the return type instance from the memory value, or explicitly raise TypeError/NotImplementedError rather than silently returning an integer.

♻️ Duplicate comments (9)
crates/vm/src/stdlib/ctypes/function.rs (4)

227-287: Typed argument conversion still bypasses from_param, breaking custom and built‑in ctypes conversions

ArgumentType::convert_object for PyTypeRef only:

  • Checks for CArgObject, pointer/structure types, and _type_ == "P", then
  • Tries a direct downcast to PyCSimple, or
  • Calls the type as a constructor (self.as_object().call((value.clone(),), vm)).

It never calls the type’s from_param or routes through call_paramfunc for the typed (argtypes) path. This means:

  • Custom from_param on user ctypes types is ignored when argtypes is set.
  • Built‑in behavior for c_char_p, c_wchar_p, pointer/array types, etc. implemented via from_param/ParamFunc won’t be used in the typed path.

You already rely on call_paramfunc in the no‑argtypes path (conv_param); the typed path should be unified to consult from_param (or a shared paramfunc) first and only fall back to the direct PyCSimple/constructor logic when it’s absent or fails.


1376-1395: Pointer restypes (c_char_p, c_wchar_p, c_void_p) are always exposed as Python ints

When CallInfo::is_pointer_return is true (for _type_ "P" | "z" | "Z"), ctypes_callproc uses RawResult::Pointer, and convert_raw_result maps that to:

RawResult::Pointer(ptr) => Some(vm.ctx.new_int(*ptr).as_object().to_pyobject(vm)),

So:

  • restype = c_char_p → Python int address, never bytes/None.
  • restype = c_wchar_p → Python int address, never str/None.
  • restype = c_void_p → bare int, instead of a c_void_p/pointer object.

That diverges sharply from ctypes expectations and from your own pointer/string helpers elsewhere.

You likely want to branch here on the restype’s _type_:

  • "z" → decode char* as a null‑terminated C string → bytes / None.
  • "Z" → decode wchar_t* using the same wstring logic as wstring_at_impl / array::wstring_from_bytesstr / None.
  • "P" / general pointer types → keep returning an address (or a c_void_p instance).

Right now, any API using those common restypes will get unusable integer addresses.


1409-1452: OUT parameters drop the function’s own return value instead of returning (result, *outs)

In build_result:

if prepared.out_buffers.is_empty() {
    return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none()));
}

let out_values = extract_out_values(prepared.out_buffers, vm);
Ok(match <[PyObjectRef; 1]>::try_from(out_values) {
    Ok([single]) => single,
    Err(v) => PyTuple::new_ref(v, &vm.ctx).into(),
})

As soon as any OUT buffers exist, the (possibly errcheck‑processed) function result is discarded; callers only see the OUT values, unlike ctypes where the convention is (result, *outs).

This will break code that relies on both a return value and OUT/INOUT parameters. Consider returning a tuple whose first element is result (or None when absent), followed by the OUT values, while keeping the existing “no OUTs → bare result” behavior.


1477-1480: Calling a CFuncPtr with a NULL function pointer panics instead of raising a Python error

PyCFuncPtr::call resolves the code pointer as:

let code_ptr = func_ptr
    .unwrap_or_else(|| zelf.get_code_ptr().expect("Function pointer should be set"));

If:

  • No COM function pointer is resolved (func_ptr is None), and
  • The stored pointer in _base.buffer is NULL (get_code_ptr() returns None),

then the expect will panic the VM instead of raising a Python exception like ValueError("NULL function pointer"). This also affects the COM‑style constructor path on non‑Windows where the pointer is left at 0.

Return an error instead of panicking, e.g.:

let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) {
    Some(cp) => cp,
    None => return Err(vm.new_value_error("NULL function pointer")),
};

so callers get a proper Python‐level failure.

crates/vm/src/stdlib/ctypes/array.rs (2)

64-73: Unchecked element_size * length multiplication can overflow when building array StgInfo

array_type_from_ctype computes the total size as:

let stg_info = StgInfo::new_array(
    element_size * length,
    element_align,
    length,
    ...
);

with no checked_mul/overflow guard. For very large length or element sizes this can overflow usize (panic in debug, wrap in release), creating a broken StgInfo (wrong size/shape) for the array type.

You already guard this in PyCArrayType::Initializer (item_size != 0 && length > usize::MAX / item_size). Mirror that here (or use checked_mul + OverflowError) before calling StgInfo::new_array.


613-638: c_wchar_p array elements still assume 32‑bit wchar_t, ignoring WCHAR_SIZE

In read_element_from_buffer for "Z":

let ptr = ptr_val as *const u32;
while *ptr.add(len) != 0 { ... }
let wchars = std::slice::from_raw_parts(ptr, len);
let s: String = wchars.iter().filter_map(|&c| char::from_u32(c)).collect();

This hardcodes a 32‑bit wchar_t. On Windows, libc::wchar_t is 16‑bit, so this is misaligned and undefined behavior. You’ve already introduced WCHAR_SIZE, wchar_from_bytes, and wstring_from_bytes below to handle platform‑dependent wchar_t size; this branch should use those helpers instead of *const u32.

E.g., read the pointer value as now, then treat it as a raw u8 buffer and feed it to a wstring_from_bytes‑style reader that respects WCHAR_SIZE.

crates/vm/src/stdlib/ctypes/simple.rs (2)

140-161: set_primitive for c_char_p/c_wchar_p rejects None, breaking ctypes‑style NULL semantics

The "z" and "Z" arms currently accept only ints/bytes and str respectively:

"z" => {
    if value.downcast_ref_if_exact::<PyInt>(vm).is_some()
        || value.downcast_ref_if_exact::<PyBytes>(vm).is_some()
    { ... }
}
"Z" => {
    if value.downcast_ref_if_exact::<PyStr>(vm).is_some() { ... }
}

None is treated as an error, so:

  • c_char_p(None) / c_wchar_p(None) raise TypeError instead of representing a NULL pointer.
  • x.value = None for those types also fails.

CPython accepts None for these as a NULL pointer, and your own pointer "P" case already does. PyCSimpleType::from_param special‑cases None for "z" | "Z" | "P", but constructors and value setter still go through set_primitive and break.

Allow None here (e.g., vm.is_none(value) in the condition) and adjust the error messages accordingly so None is treated as a valid “NULL pointer” input.


685-693: c_wchar serialization ignores actual wchar_t size; should use shared WCHAR_SIZE helpers

The "u" branch of value_to_bytes_endian is:

"u" => {
    // c_wchar - 4 bytes (wchar_t on most platforms)
    if let Ok(s) = value.str(vm)
        && let Some(c) = s.as_str().chars().next()
    {
        return to_bytes!(c as u32);
    }
    vec![0; 4]
}

This hardcodes 4 bytes and encodes via u32, but elsewhere (array.rs) you’ve correctly abstracted wchar_t using WCHAR_SIZE = size_of::<libc::wchar_t>() plus wchar_to_bytes. On Windows this mismatch will desynchronize c_wchar size/representation from arrays, pointers, and StgInfo.

Reuse the same abstraction here: allocate WCHAR_SIZE bytes and call wchar_to_bytes with the code point, falling back to vec![0; WCHAR_SIZE] when invalid.

crates/vm/src/stdlib/ctypes/pointer.rs (1)

523-591: Negative indices combined with keep_ref(index as usize, …) can blow up reference tracking

setitem_by_index allows arbitrary signed indices for pointer arithmetic:

let offset = index * element_size as isize;
let addr = (ptr_value as isize + offset) as usize;
// ...
return zelf.0.keep_ref(index as usize, converted, vm);
// ...
zelf.0.keep_ref(index as usize, value, vm)

Casting a negative index to usize wraps (e.g. -1 → usize::MAX), so _objects/keep_ref may get extremely large keys. That can lead to pathological behavior (huge sparse dicts, surprising growth, potential panics in other code assuming “small” indices).

You probably want:

  • To keep pointer arithmetic semantics for reads/writes, but
  • Normalize the key used for keep_ref (e.g. clamp negatives to 0 or some small fixed key), or disallow negative indices for writes.

Right now, negative writes both mutate memory at a negative offset and also poison _objects with huge indices.

🧹 Nitpick comments (1)
crates/vm/src/stdlib/ctypes/pointer.rs (1)

133-155: set_type updates StgInfo.proto but not the pointer’s buffer format metadata

PyCPointerType::set_type only does:

if let Some(mut stg_info) = zelf.get_type_data_mut::<StgInfo>() {
    stg_info.proto = Some(typ_type.clone());
}
zelf.as_object().set_attr("_type_", typ_type, vm)?;

whereas the Initializer computes a full StgInfo including:

  • proto,
  • format string "&<element_format>",
  • flags |= TYPEFLAG_ISPOINTER, etc.

For incomplete pointer types that are later completed via set_type, StgInfo.format (and any other derived fields) remain whatever they were at initialization time, ignoring the newly provided element type’s format. That can desync the buffer protocol metadata from the actual pointed‑to type.

Not critical for basic usage, but you may want to recompute stg_info.format (and any other derived fields) in set_type to match the initializer’s logic.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e640ebf and c8478c0.

📒 Files selected for processing (4)
  • crates/vm/src/stdlib/ctypes/array.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (8 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/simple.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/simple.rs
  • crates/vm/src/stdlib/ctypes/array.rs
🧠 Learnings (2)
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
🧬 Code graph analysis (1)
crates/vm/src/stdlib/ctypes/pointer.rs (2)
crates/vm/src/stdlib/ctypes/array.rs (22)
  • obj (1081-1081)
  • obj (1089-1089)
  • obj (1109-1109)
  • obj (1121-1121)
  • obj (1134-1134)
  • obj (1145-1145)
  • value (295-295)
  • array_type_from_ctype (16-93)
  • b (543-543)
  • b (826-826)
  • std (597-597)
  • std (622-622)
  • std (1219-1219)
  • __getitem__ (848-863)
  • item (850-850)
  • item (858-858)
  • item (965-965)
  • item (973-973)
  • t (62-62)
  • t (255-255)
  • t (376-376)
  • __setitem__ (955-978)
crates/vm/src/stdlib/ctypes/base.rs (55)
  • obj (768-768)
  • obj (792-792)
  • obj (814-814)
  • obj (829-829)
  • obj (1581-1581)
  • obj (1583-1583)
  • obj (1692-1693)
  • obj (1720-1721)
  • obj (1741-1742)
  • obj (1766-1766)
  • mem (1946-1946)
  • mem (1955-1955)
  • mem (1964-1964)
  • mem (1973-1973)
  • mem (1982-1982)
  • mem (1991-1991)
  • mem (2000-2000)
  • mem (2009-2009)
  • mem (2018-2018)
  • mem (2027-2027)
  • mem (2038-2038)
  • mem (2052-2052)
  • mem (2093-2093)
  • mem (2094-2094)
  • new (149-165)
  • new (1237-1245)
  • value (851-852)
  • value (860-860)
  • value (860-860)
  • value (882-882)
  • value (896-896)
  • value (911-911)
  • value (921-921)
  • value (1352-1352)
  • value (1360-1360)
  • value (1368-1368)
  • value (1417-1417)
  • value (1432-1432)
  • value (1446-1446)
  • value (1460-1460)
  • value (1485-1485)
  • type_ref (2148-2148)
  • std (365-365)
  • std (366-366)
  • std (2137-2137)
  • std (2220-2220)
  • buffer (347-347)
  • buffer (354-354)
  • read_ptr_from_buffer (2136-2143)
  • at_address (475-487)
  • offset (1657-1659)
  • type_code (1537-1537)
  • type_code (1554-1554)
  • size (669-671)
  • size (1662-1664)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Check the WASM package and demo

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/vm/src/stdlib/ctypes.rs (1)

377-397: SIZEOF_TIME_T is hard-coded and wrong on 64‑bit Unix

You currently expose:

#[cfg(target_os = "windows")]
const SIZEOF_TIME_T: usize = 8;
#[cfg(not(target_os = "windows"))]
const SIZEOF_TIME_T: usize = 4;

On 64‑bit Linux and many other Unix platforms, time_t is 8 bytes, so this constant is incorrect there and will confuse code/tests that introspect sizes through ctypes.

You should derive this from the actual C type:

const SIZEOF_TIME_T: usize = std::mem::size_of::<libc::time_t>();

and drop the OS-based branching.

crates/vm/src/stdlib/ctypes/structure.rs (1)

514-588: Structure instance creation is correct, but _fields_ instance getter is a stub

PyCStructure::slot_new:

  • Reads size, align, and length from the class StgInfo.
  • Marks the class as final on first instantiation.
  • Collects field names from _fields_ and zero-initializes the buffer.
  • Applies kwargs and then positional args, with a clear “too many initializers” error.

That part looks solid. However, the _fields_ instance getter is currently:

fn _fields_(&self, vm: &VirtualMachine) -> PyObjectRef {
    // Return the _fields_ from the class, not instance
    vm.ctx.none()
}

This contradicts the comment: callers expecting instance._fields_ to reflect the class fields will just get None. If you want to support this attribute on instances, you probably want:

self.0.as_object().class().as_object().get_attr("_fields_", vm).unwrap_or(vm.ctx.none())

or similar. If you don’t intend to expose _fields_ on instances at all, consider removing this getset to let the class attribute be inherited normally.

♻️ Duplicate comments (7)
crates/vm/src/stdlib/ctypes/structure.rs (1)

13-29: calculate_struct_size now correctly propagates _fields_ errors

Switching to:

let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm)?;

instead of unwrap_or_default() means malformed _fields_ now raise rather than silently producing a 0‑sized structure. This is a nice improvement in debuggability and aligns with how process_fields handles invalid field tuples.

crates/vm/src/stdlib/ctypes/union.rs (2)

12-29: calculate_union_size now surfaces malformed _fields_ instead of silently returning 0

Using:

let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm)?;

prevents bad _fields_ definitions from turning into zero-sized unions. This improves debuggability and matches the stricter behavior introduced for structures.


553-577: Union buffer interface now matches structure’s 0‑D scalar behavior

AsBuffer for PyCUnion:

  • Uses info->format if available (falling back to "B").
  • Exposes len == itemsize == buffer_len and dim_desc: vec![] for a 0‑dimensional buffer.

This resolves the previous inconsistency with structures and aligns both with CPython’s “scalar struct/union” buffer semantics.

crates/vm/src/stdlib/ctypes/function.rs (3)

1473-1516: OUT parameters still discard the function’s own return value

When prepared.out_buffers is non‑empty, build_result ignores the function’s result (even after errcheck) and returns only the OUT values:

if prepared.out_buffers.is_empty() {
    return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none()));
}

let out_values = extract_out_values(prepared.out_buffers, vm);
Ok(match <[PyObjectRef; 1]>::try_from(out_values) { ... })

In ctypes, when paramflags specify OUT/INOUT parameters, the usual behavior is to return a tuple (result, *outs), preserving the function’s own return value.

You should change this branch to:

  • Use result.unwrap_or_else(|| vm.ctx.none()) as the first element, and
  • Append the extracted OUT values, returning a tuple even for a single OUT (to avoid ambiguity).

1521-1556: Calling a CFuncPtr with a NULL function pointer still invokes FFI with a null CodePtr

In PyCFuncPtr::call:

let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) {
    Some(cp) => cp,
    None => {
        debug_assert!(false, "NULL function pointer");
        CodePtr(std::ptr::null_mut())
    }
};
let raw_result = ctypes_callproc(code_ptr, &prepared, &call_info);

If the underlying buffer holds a NULL pointer and no COM method override is present, this will call libffi with a null CodePtr, leading to undefined behavior or a crash. The previous expect("Function pointer should be set") panic was replaced, but the core problem remains.

Instead, when both func_ptr and zelf.get_code_ptr() are None, return a Python exception (e.g. ValueError("NULL function pointer")) rather than constructing a null CodePtr:

let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) {
    Some(cp) => cp,
    None => return Err(vm.new_value_error("NULL function pointer")),
};

739-782: COM-style constructor branch still runs on non-Windows and builds a NULL pointer object

The COM constructor path:

if args.args.len() >= 2
    && first_arg.try_int(vm).is_ok()
    && args.args[1].downcast_ref::<PyStr>().is_some()
{
    #[cfg(windows)]
    let index = ...;
    // ...
    let paramflags = args.args.get(2)...
    return PyCFuncPtr { _base: PyCData::from_bytes(vec![0u8; ptr_size], None), ... }.into_ref...
}

is compiled and executed on all platforms. On non‑Windows:

  • The condition still matches, so this branch creates a PyCFuncPtr with a NULL function pointer and COM-related metadata partially absent.
  • Later, call will fall back to get_code_ptr() and hit the null‑pointer case above, causing UB/crash instead of a clean Python error.

The COM constructor should be Windows-only or explicitly rejected elsewhere. Options:

  • Guard the entire if block with #[cfg(windows)] and make it a no-op on non‑Windows, or
  • On non‑Windows, detect this pattern early and return Err(vm.new_type_error("COM methods are only supported on Windows")).
crates/vm/src/stdlib/ctypes.rs (1)

1093-1105: PyObj_FromPtr is an extremely unsafe public escape hatch

PyObj_FromPtr exposes:

let raw_ptr = ptr as *mut crate::object::PyObject;
unsafe {
    let obj = crate::PyObjectRef::from_raw(std::ptr::NonNull::new_unchecked(raw_ptr));
    let obj = std::mem::ManuallyDrop::new(obj);
    Ok((*obj).clone())
}

This lets arbitrary user code reinterpret any integer as a PyObjectRef with only a NULL check. Passing an invalid, stale, or non‑RustPython pointer can cause memory corruption or crashes, defeating RustPython’s usual safety guarantees.

If this is only for internal testing/compatibility with CPython’s PyObj_FromPtr, please:

  • Remove the #[pyfunction] export, or
  • Gate it behind a clearly documented cfg(feature = "unsafe_debug_apis"), or
  • At minimum, mark it as unsafe on the Python side and limit its availability (e.g. only in debug builds).

Leaving it callable from normal Python code in release builds is a significant safety hazard.

🧹 Nitpick comments (3)
crates/vm/src/stdlib/ctypes/function.rs (3)

79-150: Potentially surprising integer handling in convert_to_pointer

For integer inputs, convert_to_pointer uses:

int_val.as_bigint().to_usize().unwrap_or(0)

so any value outside usize silently turns into 0 (NULL). That can hide caller mistakes (e.g. negative addresses or too‑large values) and lead to confusing NULL dereferences.

Consider returning an OverflowError/ValueError instead when the integer cannot be represented as usize, to make misuses visible.


954-1030: extract_call_info pointer-return detection is hard‑coded to "P"|"z"|"Z"

is_pointer_return is currently:

.is_some_and(|tc| matches!(tc.as_str(), "P" | "z" | "Z"))

This works for c_void_p, c_char_p, c_wchar_p, but misses user pointer types (e.g. POINTER(T) where _type_ is a dynamically created type code or composite structure). In those cases, ctypes_callproc will treat the return as a Value instead of Pointer, and convert_raw_result won’t run the pointer-specific logic.

Consider deriving pointer-return behavior from the restype’s StgInfo / flags (TYPEFLAG_ISPOINTER) rather than only _type_ string, so custom pointer subclasses and more complex pointer restypes are correctly recognized.


1392-1460: Wide-string pointer restype decoding is platform-agnostic and may mishandle UTF‑16 on Windows

For RawResult::Pointer with _type_ == "Z" (c_wchar_p), you read a libc::wchar_t* and then build a String via:

let slice = std::slice::from_raw_parts(wstr_ptr, len);
let s: String = slice
    .iter()
    .filter_map(|&c| char::from_u32(c as u32))
    .collect();

On Windows (wchar_t = 16 bits, UTF‑16), this treats each 16‑bit code unit as an independent Unicode scalar, which breaks surrogate pairs and produces incorrect results for non‑BMP characters. In wstring_at_impl you already use Wtf8Buf::from_wide on Windows to handle UTF‑16 correctly.

It would be better to share that logic here (e.g. a helper that takes *const wchar_t and length and returns a String using Wtf8 on Windows and char::from_u32 on Unix), so c_wchar_p restypes behave consistently with wstring_at.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8478c0 and 2bf7dac.

📒 Files selected for processing (27)
  • .cspell.dict/python-more.txt (2 hunks)
  • .cspell.json (3 hunks)
  • crates/stdlib/src/pystruct.rs (1 hunks)
  • crates/vm/src/buffer.rs (1 hunks)
  • crates/vm/src/builtins/builtin_func.rs (2 hunks)
  • crates/vm/src/builtins/function.rs (1 hunks)
  • crates/vm/src/builtins/object.rs (1 hunks)
  • crates/vm/src/builtins/str.rs (2 hunks)
  • crates/vm/src/builtins/type.rs (1 hunks)
  • crates/vm/src/exceptions.rs (4 hunks)
  • crates/vm/src/object/core.rs (1 hunks)
  • crates/vm/src/protocol/buffer.rs (1 hunks)
  • crates/vm/src/stdlib/ast/python.rs (1 hunks)
  • crates/vm/src/stdlib/codecs.rs (2 hunks)
  • crates/vm/src/stdlib/ctypes.rs (14 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (8 hunks)
  • crates/vm/src/stdlib/ctypes/library.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/simple.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (0 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (4 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (0 hunks)
  • crates/vm/src/stdlib/functools.rs (1 hunks)
  • crates/vm/src/stdlib/operator.rs (1 hunks)
  • crates/vm/src/types/structseq.rs (2 hunks)
💤 Files with no reviewable changes (2)
  • crates/vm/src/stdlib/ctypes/util.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
🚧 Files skipped from review as they are similar to previous changes (8)
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/stdlib/operator.rs
  • crates/vm/src/builtins/builtin_func.rs
  • crates/vm/src/types/structseq.rs
  • .cspell.dict/python-more.txt
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/stdlib/codecs.rs
  • crates/vm/src/object/core.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/protocol/buffer.rs
  • crates/vm/src/buffer.rs
  • crates/vm/src/stdlib/functools.rs
  • crates/vm/src/stdlib/ast/python.rs
  • crates/stdlib/src/pystruct.rs
  • crates/vm/src/exceptions.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/library.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/stdlib/ctypes/simple.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/builtins/str.rs
🧠 Learnings (7)
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.

Applied to files:

  • crates/vm/src/exceptions.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/simple.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations

Applied to files:

  • crates/vm/src/stdlib/ctypes/library.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • .cspell.json
🧬 Code graph analysis (5)
crates/vm/src/stdlib/functools.rs (3)
crates/vm/src/builtins/builtin_func.rs (2)
  • __reduce__ (123-126)
  • __reduce__ (169-182)
crates/vm/src/builtins/function.rs (1)
  • __reduce__ (804-812)
crates/vm/src/builtins/object.rs (1)
  • __reduce__ (503-505)
crates/stdlib/src/pystruct.rs (2)
crates/vm/src/builtins/str.rs (3)
  • s (1783-1788)
  • s (1814-1818)
  • isascii (630-632)
crates/vm/src/builtins/bytes.rs (1)
  • isascii (262-264)
crates/vm/src/exceptions.rs (1)
crates/vm/src/stdlib/os.rs (1)
  • strerror (1544-1548)
crates/vm/src/stdlib/ctypes/union.rs (2)
crates/vm/src/stdlib/ctypes/base.rs (28)
  • std (365-365)
  • std (366-366)
  • std (2137-2137)
  • std (2220-2220)
  • default (129-145)
  • obj (768-768)
  • obj (792-792)
  • obj (814-814)
  • obj (829-829)
  • obj (1581-1581)
  • obj (1583-1583)
  • obj (1692-1693)
  • obj (1720-1721)
  • obj (1741-1742)
  • obj (1766-1766)
  • set_or_init_stginfo (2146-2152)
  • new (149-165)
  • new (1237-1245)
  • get_usize_attr (2115-2132)
  • size (669-671)
  • size (1662-1664)
  • is_big_endian (334-340)
  • proto (291-293)
  • proto (917-917)
  • proto (982-982)
  • proto (1531-1531)
  • proto (1548-1548)
  • attr (938-938)
crates/vm/src/protocol/buffer.rs (1)
  • format (187-200)
crates/vm/src/builtins/str.rs (3)
crates/vm/src/builtins/bytearray.rs (1)
  • isascii (271-273)
crates/vm/src/builtins/bytes.rs (1)
  • isascii (262-264)
crates/vm/src/bytes_inner.rs (1)
  • isascii (326-328)
🪛 Biome (2.1.2)
.cspell.json

[error] 78-78: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 78-78: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 110-110: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 110-110: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 134-134: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 134-134: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check Rust code with clippy
🔇 Additional comments (46)
.cspell.json (2)

78-78: Static analysis hints are false positives.

The Biome parse errors are false positives. The JSON syntax is valid, and the added words are correctly formatted as string entries within the words array.

Also applies to: 110-110, 134-134


78-78: LGTM!

The added spell-check vocabulary terms are appropriate technical terms that align with the ctypes overhaul and broader codebase changes.

Also applies to: 110-110, 134-134

crates/vm/src/stdlib/ast/python.rs (1)

50-51: LGTM! No Rust-level references to the old method name detected.

The refactoring correctly aligns the Rust method name with the Python attribute name, allowing the simplified #[pyattr] decorator. All access to the _fields attribute uses Python-level dynamic lookup via get_attr("_fields", vm), ensuring the change is backward compatible and introduces no breaking changes.

crates/vm/src/protocol/buffer.rs (1)

205-216: LGTM! Scalar buffer validation correctly implemented.

The branching logic properly handles zero-dimensional (scalar) buffers, which is essential for ctypes structures. The validation is sound:

  • Scalar case (ndim == 0): Correctly asserts itemsize == len since the buffer contains exactly one item.
  • Multi-dimensional case: Properly validates total size, strides, and suboffsets through shape product calculation.

This change is consistent with existing methods like for_each_segment and zip_eq, which already handle the ndim() == 0 case by returning early with special handling. Buffer creation sites in ctypes structures correctly initialize scalar buffers with an empty dim_desc vector to set ndim=0.

crates/stdlib/src/pystruct.rs (1)

30-41: Use of PyStr::isascii is consistent with the new string API

Switching to s.isascii() matches the renamed PyStr helper and keeps the ASCII-only invariant for struct format strings without changing behavior.

crates/vm/src/buffer.rs (1)

264-312: PEP 3118 extended-spec skipping looks correct; worth validating against edge cases

The new handling for T{...}/X{...}, array shapes (...), field names :name:, and standalone {/} is internally consistent: it correctly tracks nesting, errors on unmatched delimiters, and cleanly skips constructs the current FormatType doesn’t model. This should allow parsing richer PEP 3118 buffer formats without breaking existing struct codes.

Given the subtle semantics, I’d recommend adding/confirming tests for:

  • Unmatched {, (, or : at end of format.
  • Nested braces/parentheses.
  • Formats mixing repeats or regular codes around these extended constructs.
crates/vm/src/exceptions.rs (3)

605-635: __suppress_context__ pygetset wiring is coherent

Exposing __suppress_context__ via a dedicated getter plus the existing set_suppress_context setter cleanly matches the Python attribute name and keeps internal use of AtomicCell<bool> intact, with no behavior change.


1102-1116: Serializer now tracks suppress_context via the new dunder accessor

Switching serialization to self.exc.__suppress_context__() keeps the on-the-wire shape the same while aligning with the new API; no functional issues here.


1552-1562: PyOSError layout and strerror accessor updates are consistent with usage

Adding #[repr(C)] to PyOSError is compatible with its explicit base: PyException field and the transparent wrappers for subclasses, and it matches the comments that rely on well-defined layout. Renaming the getter to strerror and binding it via #[pygetset] plus #[pygetset(setter, name = "strerror")] keeps the Python attribute name stable while simplifying the Rust side.

Also applies to: 1861-1868

crates/vm/src/builtins/str.rs (2)

628-632: str.isascii implementation matches internal StrKind invariant

Using matches!(self.kind(), StrKind::Ascii) provides an O(1) check that’s consistent with how StrKind is maintained and with bytes/bytearray’s isascii helpers, while the #[pymethod] name exposes the correct Python method.


963-984: __format__ binding rename preserves formatting semantics

Renaming the method to __format__ and relying on a plain #[pymethod] only affects the binding name; the logic (empty-spec fast path, UTF‑8 validation, FormatSpec parsing, and use of CharLenStr) is unchanged, so str.__format__ behavior stays intact.

crates/vm/src/stdlib/functools.rs (1)

76-101: PyPartial.__reduce__ rename keeps pickle protocol intact

Switching to a method literally named __reduce__ with a plain #[pymethod] keeps the pickle tuple (partial_type, (func,), state) identical while making the Rust binding clearer; no behavioral concerns.

crates/vm/src/builtins/type.rs (1)

1448-1452: get_mro to __mro__ renaming is complete

The renaming is clean with no remaining callsites. The #[pygetset] macro correctly exposes __mro__ as a Python property, and the implementation properly preserves behavior via mro_map_collect. Rust code uses internal methods and the mro field directly, not the old get_mro method.

crates/vm/src/stdlib/ctypes/function.rs (3)

40-77: Wide‐char FFI mapping looks correct, but signedness for 'c' is inconsistent with tags

ffi_type_from_tag now uses WideChar size to choose Type::u16() vs Type::u32() for 'u', which correctly tracks platform wchar_t size and aligns with type_info("u"). Good change.

Note however that "c" in type_info maps to Type::u8, while the 'c' tag here maps to Type::i8(). That signed/unsigned mismatch can cause subtle differences between code paths that use _type_ vs CArgObject.tag for the same c_char value (e.g. negative bytes). If you care about strict CPython parity, consider unifying these to a single choice (likely i8).
[ suggest_recommended_refactor ]


573-680: cast_impl pointer/value semantics and _objects tracking look correct

The revamped cast_impl:

  • Validates the second argument is a pointer-like type via cast_check_pointertype.
  • Extracts the pointer value from integers, bytes, str, pointer instances, simple ctypes and other PyCData instances.
  • Shares and updates the _objects dict between src and the result to keep the source object alive, and
  • Writes the pointer either into a PyCPointer result or directly into the result’s buffer.

This matches CPython’s cast/_objects lifetime behavior and fixes the previous incorrect addressof handling. No issues spotted here.


1656-1850: Callback thunk implementation looks solid and mirrors ctypes semantics

The new PyCThunk/thunk_callback path:

  • Builds arg_types/res_type via type codes and get_ffi_type.
  • Converts C arguments to Python in ffi_to_python, including z/Z/P/? handling.
  • Invokes the Python callable and routes exceptions through run_unraisable with a helpful message.
  • Converts the Python result back to C in python_to_ffi, matching type codes and sizes.
  • Manages ThunkUserData lifetime safely via ThunkData and a Drop impl.

This is a non-trivial and error-prone area; the layering and lifetime management look correct.

crates/vm/src/stdlib/ctypes.rs (10)

118-135: LONG_DOUBLE_SIZE fallback seems reasonable but may mask exotic ABIs

The new LONG_DOUBLE_SIZE constants handle:

  • 16 bytes on x86_64/aarch64 non‑Windows,
  • sizeof(c_double) on Windows and the remaining platforms.

This matches most common ABIs (80‑bit extended on x86_64, 128‑bit on AArch64, double on Windows). If you need to support more exotic long‑double ABIs, this may need refinement, but for mainstream targets this is a sensible simplification.


142-237: type_info now correctly tracks platform c_long, c_ulong, and c_wchar sizes

The new type_info:

  • Uses mem::size_of::<c_long>() / c_ulong() to choose i32/i64 and u32/u64.
  • Uses mem::size_of::<WideChar>() to choose u16 vs u32 for "u".
  • Maps pointer-like codes to Type::pointer().

This fixes the previous hard‑coded 32‑bit mappings and aligns the FFI types with the actual C layout on LP64 vs LLP64 platforms. Good, no issues here.


499-560: sizeof for types/instances looks correct and much closer to CPython

The new sizeof implementation:

  • Prefers StgInfo.size for ctypes types.
  • Falls back to struct/union size calculators and simple type codes.
  • Returns pointer-size for pointer types and instances, and uses PyCData::size() for actual cdata.

This is substantially more robust and matches CPython’s behavior well; the remaining fallbacks to sizeof::<usize>() for simple types without _type_ seem acceptable as a last resort.


617-685: Pointer type creation and caching are sound, but incomplete-pointer handling is subtle

create_pointer_type:

  • Uses _pointer_type_cache in _ctypes to memoize pointer types.
  • Produces incomplete pointer types when cls is a str name, keyed by the result’s id.
  • Produces complete pointer types when cls is a ctypes type, keyed by cls.

This follows CPython’s approach. One subtlety: the incomplete pointer types are only retrievable by object id, not by the original string name, so calling POINTER("Foo") twice will create two distinct types. That matches CPython, but it’s worth being aware of; if you ever want to change that, you’d need a mapping keyed on the string instead.


716-724: addressof now returns actual cdata buffer address (fixes previous bug)

Switching addressof to:

if let Some(cdata) = obj.downcast_ref::<PyCData>() {
    Ok(cdata.buffer.read().as_ptr() as usize)
}

removes the incorrect size_of::<usize>() special case for PyCPointer and always returns the underlying buffer address, which is what ctypes users expect. Good fix.


726-762: byref semantics and lifetime tracking match CPython

The new byref:

  • Validates the argument is a ctypes instance.
  • Computes (char *)b_ptr + offset for PyCSimple and PyCData, matching CPython’s pointer arithmetic.
  • Stores the base object in CArgObject.obj and the offset in CArgObject.offset, so lifetime and address are tracked correctly.

This is a solid and faithful port of CPython’s byref.


864-904: resize correctly guards ownership and avoids mutating borrowed buffers

resize:

  • Verifies the class has initialized StgInfo (so this is a ctypes instance).
  • Rejects sizes smaller than the original size, with appropriate ValueError.
  • Uses Cow checks to prevent resizing borrowed memory (Cow::Borrowed), returning a clear error.
  • Reallocates and copies the buffer safely when owned.

This is a careful implementation and should be safe.


1043-1091: call_function_internal raw pointer usage looks safe given argument lifetimes

The helper:

  • Validates func_addr != 0.
  • Builds arg_values: Vec<isize> and ffi_args: Vec<Arg> that point into this stack-allocated vector.
  • For bytes and str, it takes pointers into the underlying buffers but the original args tuple (and its elements) live at least until the FFI call completes, so there is no dangling-pointer risk here.

The conversion set is intentionally small and the error message (“Don’t know how to convert parameter …”) is clear. No changes needed.


1144-1182: Windows FormatError helper looks correct and frees the buffer

FormatError:

  • Fetches or uses the provided error code.
  • Calls FormatMessageW with FORMAT_MESSAGE_ALLOCATE_BUFFER.
  • Handles the zero-length / null-buffer case with a safe "<no description>" fallback.
  • Converts to UTF‑16, trims the trailing newline, and calls LocalFree on the buffer.

This is a good, well-scoped wrapper around the Windows APIs.


1184-1244: CopyComPointer correctly handles AddRef semantics and byref destination

The COM helper:

  • Locates the destination pointer-to-pointer via a CArgObject from byref, returning E_POINTER when the shape is wrong or pointers are NULL.
  • Reads the source COM pointer from a ctypes instance buffer or treats None as NULL.
  • Calls AddRef when src_ptr != 0 by reading the IUnknown vtable and invoking the function at index 1.
  • Stores the pointer into *pdst and returns S_OK.

This matches expected COM semantics (copy and AddRef) and should interoperate well with ctypes COM usages.

crates/vm/src/stdlib/ctypes/array.rs (7)

137-278: PyCArrayType initialization and _length_ / _type_ validation look robust

The Initializer for PyCArrayType:

  • Validates _length_ is a non-negative integer fitting in usize, with clear TypeError/ValueError/OverflowError messages.
  • Validates _type_ is a proper ctypes type with initialized StgInfo.
  • Checks for item_size * length overflow.
  • Propagates flags and adds char/wchar getsets based on the element type code.

This is a solid and CPython-like initialization flow for array subtypes.


315-351: PyCArrayType.__mul__ overflow guard is good, but relies on stg_info.size

In AsNumber for PyCArrayType, you:

  • Convert b to an isize and reject negatives.
  • Use stg_info.size and an isize::MAX / current_size check to prevent current_size * n overflow before calling array_type_from_ctype.

That’s the right pattern; just be aware it uses the existing array type’s total size as the “element size” for the new array-of-arrays type, which is consistent with how (T * n) * m behaves in ctypes. Combined with the earlier comment about guarding element_size * length inside array_type_from_ctype, this will give you consistent safety.


441-523: int_to_bytes/bytes_to_int correctly handle signed/unsigned and size variants

The helpers:

  • Try unsigned conversions first for 1/2/4/8 bytes and fall back to signed for negative values.
  • Inverse that mapping when reading back from bytes based on the presence of unsigned type codes (B, H, I, L, Q).

This matches how ctypes treats integral arrays and should round-trip typical values correctly.


668-801: write_element_to_buffer nicely unifies type-specific writes and keep_ref handling

The writer:

  • Handles c, u, z, Z, f, d/g, ctypes instances, and plain ints.
  • Uses ensure_z_null_terminated and str_to_wchar_bytes to allocate backing buffers and then stores their holders via keep_ref, tying lifetime to the array element.
  • Calls PyCData::should_keep_ref/get_kept_objects for other ctypes instances to keep backing objects alive where needed.

This is a big improvement in clarity and correctness over ad-hoc per‑type code; no issues spotted.


844-1014: Slice get/set implementations match CPython semantics, including char/wchar special cases

__getitem__/__setitem__:

  • Delegate index vs slice handling to getitem_by_index/setitem_by_index and getitem_by_slice/setitem_by_slice.
  • getitem_by_slice returns:
    • bytes for c_char arrays,
    • str for c_wchar arrays (using WCHAR_SIZE and wchar_from_bytes), and
    • a list for other types.
  • setitem_by_slice uses SaturatedSliceIter to correctly handle negative steps and enforces size equality between source and target.

These behaviors are consistent with CPython’s ctypes array slicing.


1030-1072: Array buffer interface correctly exposes PEP 3118 metadata when available

as_buffer:

  • Pulls format, shape, and element_size from the type’s StgInfo.
  • Computes C-contiguous strides from innermost to outermost dimension.
  • Falls back to a simple 1D buffer when no format/shape info is available.

This should integrate cleanly with NumPy and other PEP 3118 consumers.


1077-1210: Char/wchar array getsets and helpers are well-implemented

The dynamically added getsets:

  • Provide value/raw for c_char arrays and value for c_wchar arrays.
  • Use wchar_from_bytes/wchar_to_bytes and wstring_from_bytes with WCHAR_SIZE to handle platform-dependent wchar_t.

These helpers give a convenient, CPython-like API while respecting platform width differences.

crates/vm/src/stdlib/ctypes/library.rs (3)

19-43: SharedLibrary abstraction and close semantics look fine

SharedLibrary wraps libloading::Library in a PyMutex<Option<_>> and provides:

  • close() that sets lib to None,
  • Drop that calls close() again.

Using the Library’s address as the key for the handle is a bit unconventional but internally consistent: as long as you always go through libcache() to create/load libraries, the numeric handle you hand to Python uniquely identifies that instance. No issues from a correctness standpoint.


66-86: get_or_insert_lib doesn’t deduplicate by path, but behavior is consistent

The new get_or_insert_lib:

let new_lib = SharedLibrary::new(library_path)?;
let key = new_lib.get_pointer();
match self.libraries.get(&key) { ... }
self.libraries.insert(key, new_lib);
Ok((key, self.libraries.get(&key).expect("just inserted")))

This effectively means each call returns a fresh handle (keyed by the new SharedLibrary pointer). There’s no deduplication for the same path, but ctypes doesn’t strictly require it, and the drop_lib path will correctly drop whatever handle the caller passes back.

If you ever want re-use by path (e.g. to share handles for the same library), you’d need a second map keyed by OsString, but the current behavior is still correct.


93-98: libcache refactor to static_cell! is a good improvement

Moving LIBCACHE into libcache() via static_cell!:

  • Avoids manual lazy_static-style boilerplate,
  • Keeps visibility scoped to this module via pub(super),
  • Still returns a stable 'static reference to a shared PyRwLock<ExternalLibs>.

This is a neat and idiomatic way to manage the global ctypes library cache.

crates/vm/src/stdlib/ctypes/structure.rs (4)

66-129: PyCStructType initialization flow is well-structured and handles inheritance correctly

The Initializer for PyCStructType:

  • Skips initialization for abstract types (_abstract_).
  • Uses check_not_initialized to prevent double init.
  • Processes direct _fields_ via process_fields, or
  • Clones StgInfo from the base class, clearing FINAL and then marking the base as final, or
  • Sets a default StgInfo for types without _fields_.

This mirrors CPython’s behavior and avoids RwLock reentrancy by carefully dropping guards before taking mutable references. Nicely done.


169-387: process_fields is comprehensive; just ensure get_field_format covers all relevant ctypes

The field-processing logic:

  • Validates _fields_ type, packing, and _align_.
  • Inherits size/align/flags/FFI field types from the base.
  • Computes padding and alignment per field, including array shapes.
  • Builds a PEP 3118 T{...} format string with field names and padding.
  • Propagates pointer/union/bitfield flags into the type’s StgInfo.

This is a faithful port of CPython’s structure layout logic. The only assumption is that get_field_format correctly covers all ctypes-y field types; given that’s centralized in base, this is the right place to delegate.


427-491: SetAttr for PyCStructType correctly protects _fields_ and respects descriptors

The custom setattro:

  • Rejects modifications to _fields_ once the type is final.
  • Uses the descriptor protocol to let data descriptors (e.g. slots) handle assignments before falling back to plain attribute storage.
  • Calls process_fields whenever _fields_ is set, updating StgInfo and descriptors accordingly.

This avoids deadlocks by checking StgInfo finality in a short-lived scope and then performing the heavier work afterwards. Looks good.


598-636: Structure buffer interface is consistent with union and exposes correct itemsize

AsBuffer for PyCStructure:

  • Uses the current buffer length as both len and itemsize, treating the structure as a scalar object.
  • Pulls format from type StgInfo with "B" as a fallback, and uses dim_desc: vec![] for a 0‑D buffer.

This matches the new union behavior and should integrate cleanly with buffer consumers.

crates/vm/src/stdlib/ctypes/union.rs (6)

66-133: PyCUnionType initialization mirrors structure logic and handles inheritance/finality well

The Initializer:

  • Skips _abstract_ types.
  • Uses check_not_initialized to avoid double init.
  • If _fields_ is present directly, delegates to set_attr to trigger setattro and process_fields.
  • Otherwise, clones StgInfo from the base (clearing FINAL) or builds a default union StgInfo with TYPEFLAG_HASUNION and a "B" format.

This is a clean and consistent design with the structure side.


136-296: Union process_fields correctly computes max size/align and updates flags

For each field in _fields_, process_fields:

  • Validates the tuple structure and name type.
  • Uses base helpers to get size/align and apply packing.
  • Tracks the maximum field size and effective alignment.
  • Propagates pointer and bitfield flags, marks field types as final, and collects FFI field types.
  • Sets every field’s offset to 0 (union semantics).
  • Computes an aligned total size and stores a StgInfo with TYPEFLAG_HASUNION, DICTFLAG_FINAL, and a "B" format.

This is exactly what we want for a union layout engine.


299-345: from_param for unions provides good diagnostics and handles CArgObject/proto

The new from_param:

  • Reuses an existing instance of the requested union type.
  • Accepts a CArgObject when its wrapped object matches the union’s proto or the union type itself, with a clear error when the pointer is to the wrong type.
  • Recurses through _as_parameter_ when needed.

This matches ctypes customization patterns and gives clear error messages on misuse.


359-413: Custom SetAttr for PyCUnionType correctly hooks _fields_ without breaking descriptors

setattro:

  • First defers to any data descriptor on the metatype, and if the attribute is _fields_, runs process_fields afterwards.
  • Otherwise, stores/removes attributes in the type’s dict.
  • When _fields_ is assigned directly, again calls process_fields so StgInfo and CField descriptors are created.

This is symmetric with the structure implementation and should behave well under subclassing.


432-449: Union instance creation correctly uses StgInfo and finalizes the type

PyCUnion::slot_new:

  • Reads size and align from the type’s StgInfo.
  • Marks the type as final on first instantiation.
  • Allocates a zero-initialized buffer via PyCData::from_stg_info.

This keeps layout metadata centralized in StgInfo and aligns with the structure side.


521-545: Union Initializer mirrors structure init and handles extra args nicely

PyCUnion::init:

  • Walks positional args through the inheritance chain with init_pos_args, checking for “too many initializers”.
  • Then applies keyword arguments as attribute assignments.

This is consistent with Structure initialization and should make union construction intuitive.

Comment on lines 247 to 303
fn convert_object(&self, value: PyObjectRef, vm: &VirtualMachine) -> PyResult<FfiArgValue> {
// Call from_param first to convert the value (like CPython's callproc.c:1235)
// converter = PyTuple_GET_ITEM(argtypes, i);
// v = PyObject_CallOneArg(converter, arg);
let from_param = self
.as_object()
.get_attr(vm.ctx.intern_str("from_param"), vm)?;
let converted = from_param.call((value.clone(),), vm)?;

// Then pass the converted value to ConvParam logic
// CArgObject (from from_param) -> use stored value directly
if let Some(carg) = converted.downcast_ref::<CArgObject>() {
return Ok(carg.value);
}

// None -> NULL pointer
if vm.is_none(&converted) {
return Ok(FfiArgValue::Pointer(0));
}

// For pointer types (POINTER(T)), we need to pass the ADDRESS of the value's buffer
if self.fast_issubclass(PyCPointer::static_type()) {
if let Some(cdata) = converted.downcast_ref::<PyCData>() {
let addr = cdata.buffer.read().as_ptr() as usize;
return Ok(FfiArgValue::Pointer(addr));
}
return convert_to_pointer(&converted, vm);
}

// For structure types, convert to pointer to structure
if self.fast_issubclass(PyCStructure::static_type()) {
return convert_to_pointer(&converted, vm);
}

// Get the type code for this argument type
let type_code = self
.as_object()
.get_attr(vm.ctx.intern_str("_type_"), vm)
.ok()
.and_then(|t| t.downcast_ref::<PyStr>().map(|s| s.to_string()));

// For pointer types (c_void_p, c_char_p, c_wchar_p), handle as pointer
if matches!(type_code.as_deref(), Some("P") | Some("z") | Some("Z")) {
return convert_to_pointer(&converted, vm);
}

// PyCSimple (already a ctypes instance from from_param)
if let Ok(simple) = converted.clone().downcast::<PyCSimple>() {
let typ = ArgumentType::to_ffi_type(self, vm)?;
let arg = simple
.to_arg(typ, vm)
.ok_or(vm.new_type_error("Unsupported argument type".to_string()))?;
return Ok(arg);
let ffi_value = simple
.to_ffi_value(typ, vm)
.ok_or(vm.new_type_error("Unsupported argument type"))?;
return Ok(ffi_value);
}
Err(vm.new_type_error("Unsupported argument type".to_string()))

Err(vm.new_type_error("Unsupported argument type"))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

ArgumentType::convert_object may drop from_param temporaries before the FFI call

convert_object calls the type’s from_param, uses the returned converted value to build an FfiArgValue, and then returns only the FfiArgValue. The converted object itself is not kept anywhere:

let converted = from_param.call((value.clone(),), vm)?;
// ...
return Ok(FfiArgValue::Pointer(addr)); // or from PyCSimple / convert_to_pointer

If a custom from_param allocates a fresh ctypes object (or bytes/str buffer) that isn’t otherwise referenced from the original argument, that temporary can be dropped as soon as convert_object returns. The FFI call will still use its raw pointer, risking use‑after‑free.

CPython avoids this by keeping the converted objects alive for the duration of the call (e.g. via a converters/keepalive list).

You should keep converted objects alive until after ctypes_callproc returns — for example, by:

  • Extending the API so convert_object also returns an optional keepalive PyObjectRef, and
  • Storing all such keepalives in the PreparedArgs or on the PyCFuncPtr instance for the duration of the call.

Comment on lines 302 to 324
Some("Z") => {
// 1. str → create c_wchar_p
if value.downcast_ref::<PyStr>().is_some() {
return create_simple_with_value("Z", &value);
}
// 2. Array/Pointer with c_wchar element type
if is_cwchar_array_or_pointer(&value, vm) {
return Ok(value);
}
// 3. CArgObject (byref(c_wchar(...)))
if let Some(carg) = value.downcast_ref::<CArgObject>()
&& carg.tag == b'u'
{
return Ok(value.clone());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, examine the from_param "Z" case (lines 302-316 plus context)
sed -n '295,325p' crates/vm/src/stdlib/ctypes/simple.rs | cat -n

Repository: RustPython/RustPython

Length of output: 1646


🏁 Script executed:

# Examine the value_to_bytes_endian "Z" case (lines 900-908 plus context)
sed -n '895,920p' crates/vm/src/stdlib/ctypes/simple.rs | cat -n

Repository: RustPython/RustPython

Length of output: 1212


🏁 Script executed:

# Examine the "z" PyBytes handling case (lines 277-300)
sed -n '270,310p' crates/vm/src/stdlib/ctypes/simple.rs | cat -n

Repository: RustPython/RustPython

Length of output: 2089


🏁 Script executed:

# Check if create_simple_with_value calls value_to_bytes_endian
sed -n '215,240p' crates/vm/src/stdlib/ctypes/simple.rs | cat -n

Repository: RustPython/RustPython

Length of output: 1435


🏁 Script executed:

# Search for any special PyStr handling in the file
rg -n 'PyStr.*"Z"|"Z".*PyStr|make_wchar_buffer' crates/vm/src/stdlib/ctypes/simple.rs

Repository: RustPython/RustPython

Length of output: 152


🏁 Script executed:

# Check recent commits that touched this file to see if this was supposedly "addressed"
git log --oneline -20 -- crates/vm/src/stdlib/ctypes/simple.rs

Repository: RustPython/RustPython

Length of output: 83


🏁 Script executed:

# Search for commits around the hashes mentioned in the review (1707861 to 2bf7dac)
git log --oneline --all | grep -E '1707861|2bf7dac'

Repository: RustPython/RustPython

Length of output: 83


from_param for c_wchar_p ("Z") produces NULL pointers for PyStr values

When from_param receives a PyStr for type code "Z" (c_wchar_p), it calls create_simple_with_value("Z", &value) (line 311). This function calls value_to_bytes_endian("Z", val, false, vm) (line 223), which only handles integer addresses via try_index(vm) and returns a zero-filled buffer (NULL pointer) for PyStr:

"Z" => {
    // PyStr case is handled in slot_new/set_value with make_wchar_buffer()
    if let Ok(int_val) = value.try_index(vm) {
        let v = int_val.as_bigint().to_usize().unwrap_or(0);
        return to_bytes!(v);
    }
    vec![0; std::mem::size_of::<usize>()]  // ← returns NULL for PyStr
}

The comment about slot_new/set_value is misleading: from_param bypasses those code paths. In contrast, the "z" case (c_char_p) at lines 278-287 has direct special handling to extract the actual bytes pointer. The "Z" case should have equivalent handling to create a wchar buffer pointer, not return NULL.

🤖 Prompt for AI Agents
crates/vm/src/stdlib/ctypes/simple.rs around lines 302-316: from_param for "Z"
currently routes PyStr through create_simple_with_value which ends up returning
a zeroed (NULL) pointer because value_to_bytes_endian only handles integer
addresses; change from_param to special-case PyStr like the "z" (c_char_p)
branch: when value is a PyStr, allocate/produce a wchar buffer (using the
existing make_wchar_buffer logic or equivalent conversion to wide chars), store
its pointer in the argument representation (or return the proper
CArgObject/array/pointer wrapper) and return that instead of falling back to
create_simple_with_value; keep the existing checks for cwchar arrays/pointers
and CArgObject intact.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
crates/vm/src/stdlib/ctypes/library.rs (1)

5-43: Check the cache before calling SharedLibrary::new in get_or_insert_lib

Line 71 eagerly calls SharedLibrary::new(library_path)? (which invokes unsafe { Library::new(..) }) before checking if a library with the same path is already cached. This means:

  • Every call triggers dlopen and library initialization routines, even when an equivalent library exists in the cache
  • While modern Unix systems return the same handle for repeated dlopen calls on the same path and only run init once, this is not guaranteed across all platforms or if different path representations (relative vs. absolute, different symlink resolutions) are passed
  • If the path provided differs even when the absolute path is the same, different file handles are returned, potentially triggering initialization multiple times

Refactor get_or_insert_lib to check the cache by library path before constructing a new SharedLibrary. You can check if a path is already loaded (using dlopen with RTLD_NOLOAD flag or by maintaining a path-based lookup) before calling SharedLibrary::new.

crates/vm/src/builtins/builtin_func.rs (1)

117-121: __self__ for builtin_function_or_method always returns None instead of the bound object

CPython exposes __self__ for built-in methods as either the bound object or None. The current implementation at lines 117-120 always returns None, discarding the bound object information already maintained by PyNativeFunction.zelf.

The PyNativeFunction struct stores the bound object in its zelf field and a get_self() method already exists to access it. However, the __self__ getter ignores this:

#[pygetset]
fn __self__(_zelf: PyObjectRef, vm: &VirtualMachine) -> PyObjectRef {
    vm.ctx.none()
}

By contrast, PyNativeMethod.__self__ (lines 184-187) correctly returns the bound object via zelf.func.zelf.clone().

Align PyNativeFunction.__self__ with the existing pattern by routing it through NativeFunctionOrMethod and get_self():

#[pygetset]
fn __self__(zelf: NativeFunctionOrMethod, vm: &VirtualMachine) -> PyObjectRef {
    match zelf.0.get_self() {
        Some(obj) => obj.clone().into(),
        None => vm.ctx.none(),
    }
}

This ensures both builtin_function_or_method and builtin_method expose the correct bound object to introspection tools.

crates/vm/src/stdlib/ctypes.rs (1)

704-714: Public _check_HRESULT uses todo!() and will panic at runtime

On Windows, calling _ctypes._check_HRESULT will unconditionally hit todo!(), aborting the VM instead of raising a Python exception. This should at least raise a proper error or delegate to the fully implemented check_hresult in function.rs.

Suggested minimal fix
-    fn check_hresult(_self: PyObjectRef, hr: i32, _vm: &VirtualMachine) -> PyResult<i32> {
-        // TODO: fixme
-        if hr < 0 {
-            // vm.ctx.new_windows_error(hr)
-            todo!();
-        } else {
-            Ok(hr)
-        }
-    }
+    fn check_hresult(_self: PyObjectRef, hr: i32, vm: &VirtualMachine) -> PyResult<i32> {
+        if hr < 0 {
+            return Err(vm.new_os_error(format!("HRESULT: 0x{:08X}", hr as u32)));
+        }
+        Ok(hr)
+    }
♻️ Duplicate comments (13)
crates/vm/src/stdlib/ctypes/pointer.rs (1)

528-596: Negative pointer indices cause pathological keep_ref behavior

This issue was flagged in a previous review and remains unaddressed. Lines 580, 588, and 595 cast index (an isize) to usize for keep_ref. When index is negative (valid for pointer arithmetic), it wraps to a huge value (e.g., -1usize::MAX), causing issues in the internal _objects tracking.

🔎 Suggested fix

Either reject negative indices for writes, or use a normalized key for reference tracking:

     fn setitem_by_index(
         zelf: &Py<Self>,
         index: isize,
         value: PyObjectRef,
         vm: &VirtualMachine,
     ) -> PyResult<()> {
+        // Use a stable key for keep_ref (negative indices are valid for pointer arithmetic
+        // but problematic for reference tracking)
+        let keep_ref_key = index.unsigned_abs();
+        
         // ... existing code ...
         
-        return zelf.0.keep_ref(index as usize, converted, vm);
+        return zelf.0.keep_ref(keep_ref_key, converted, vm);
         
         // ... and similarly for other keep_ref calls ...
crates/vm/src/stdlib/ctypes.rs (2)

716-762: addressof / byref semantics align with CPython and earlier review feedback

addressof now consistently returns the underlying PyCData buffer address (including pointer types), and byref computes the pointer as buffer_ptr + offset for both PyCSimple and PyCData. This matches CPython’s semantics and fixes the earlier PyCPointer special-case bug.


1093-1105: PyObj_FromPtr exposes unsafe arbitrary pointer → PyObjectRef conversion to Python

PyObj_FromPtr takes an arbitrary usize, does only a NULL check, and then constructs a PyObjectRef via from_raw(NonNull::new_unchecked(raw_ptr)). Any Python code can pass an invalid address here, leading to:

  • Undefined behavior inside Rust (invalid NonNull),
  • Use-after-free on stale pointers,
  • Silent memory corruption or crashes.

This is much less controlled than normal ctypes FFI (which at least goes through libffi and explicit C signatures) and breaks the VM’s safety guarantees.

If you want to keep a CPython-like escape hatch, it should not be directly exposed as a normal _ctypes function:

  • Either remove this pyfunction, or
  • Gate it behind a debug / internal feature (cfg(feature = "internal_unsafe_api")) and mark it unsafe on the Rust side with clear documentation for internal callers, or
  • At minimum, hide it from Python-level code and only use it in tightly controlled internals.
Suggested safe-facing change
-    #[pyfunction(name = "PyObj_FromPtr")]
-    fn py_obj_from_ptr(ptr: usize, vm: &VirtualMachine) -> PyResult {
+    // Do not expose directly to Python; keep as internal unsafe helper.
+    pub(crate) unsafe fn py_obj_from_ptr_unsafe(ptr: usize, vm: &VirtualMachine) -> PyResult {
         if ptr == 0 {
             return Err(vm.new_value_error("NULL pointer access"));
         }
         let raw_ptr = ptr as *mut crate::object::PyObject;
-        unsafe {
-            let obj = crate::PyObjectRef::from_raw(std::ptr::NonNull::new_unchecked(raw_ptr));
-            let obj = std::mem::ManuallyDrop::new(obj);
-            Ok((*obj).clone())
-        }
+        let obj = crate::PyObjectRef::from_raw(
+            std::ptr::NonNull::new(raw_ptr)
+                .ok_or_else(|| vm.new_value_error("invalid object pointer"))?,
+        );
+        let obj = std::mem::ManuallyDrop::new(obj);
+        Ok((*obj).clone())
     }

and drop the #[pyfunction] export for now.

crates/vm/src/stdlib/ctypes/union.rs (2)

12-29: Union size calculation now fails fast on malformed _fields_

calculate_union_size now uses try_to_value(vm)? and only falls back to Ok(0) when _fields_ is missing, not when it’s malformed. That fixes the previous “silent zero-sized union” behavior while still allowing empty unions.


553-577: Union buffer descriptor now uses 0-D scalar layout, matching Structure

PyCUnion::as_buffer now constructs a BufferDescriptor with dim_desc: vec![] and itemsize = buffer_len, using the StgInfo format when available. That brings unions in line with structures as scalar (0-D) ctypes values and addresses the earlier inconsistency.

crates/vm/src/stdlib/ctypes/simple.rs (3)

59-179: Primitive coercion logic for simple ctypes is much closer to CPython now

set_primitive’s handling of:

  • "z" / "Z" (accepting None, int, and the appropriate bytes / str),
  • numeric integer and float codes via try_index/try_float,
  • "?" and "v" via try_to_bool,

mirrors CPython’s ctypes expectations and fixes the earlier regressions around c_char_p(None) / c_wchar_p(None) and VARIANT_BOOL.


268-413: from_param for pointer-like simple types (z/Z/P) correctly builds owned buffers

The new from_param logic:

  • For "z": uses ensure_z_null_terminated and FfiArgValue::OwnedPointer to keep the bytes alive.
  • For "Z": uses str_to_wchar_bytes with an owned holder.
  • For "P": supports ints, bytes, str, ctypes arrays/pointers, CArgObject, PyCFuncPtr, and c_char_p/c_wchar_p instances.

This addresses earlier issues where from_param could produce NULL pointers for bytes/str and improves lifetime guarantees via OwnedPointer.


675-734: c_wchar serialization now respects platform wchar_t size

The "u" arm of value_to_bytes_endian now uses WCHAR_SIZE and wchar_to_bytes, rather than hardcoding 4 bytes. This aligns c_wchar with WCHAR_SIZE used elsewhere (e.g., arrays) and fixes the previous platform mismatch on Windows vs Unix.

crates/vm/src/stdlib/ctypes/function.rs (5)

33-77: Centralizing FFI type lookup through type_info is a solid improvement

get_ffi_type and the updated ffi_type_from_tag:

  • Reuse the shared type_info mapping,
  • Handle c_long/c_ulong via size_of::<libc::c_long/ulong>(),
  • Map u (c_wchar) using size_of::<super::WideChar>() to choose u16 vs u32.

That keeps libffi type selection consistent across the module.


247-303: Remaining caveat: custom from_param implementations still need to manage lifetimes carefully

ArgumentType::convert_object now correctly:

  • Uses CArgObject.value directly when from_param returns a CArgObject (including OwnedPointer keepalives),
  • Treats None as NULL,
  • Delegates pointer-like types to convert_to_pointer,
  • Extracts primitive values from PyCSimple via to_ffi_value.

For builtin ctypes types this is enough to keep buffers alive across the call, but user-defined from_param implementations that don’t return a CArgObject or a ctypes instance may still accidentally create temporaries whose buffers are dropped before the FFI call. That’s essentially the same concern as the earlier review.

Consider documenting that third‑party from_param should either:

  • Return a CArgObject with appropriate value/obj/keepalive semantics, or
  • Return a ctypes instance whose backing memory is kept alive elsewhere.

785-830: COM-style constructor branch still creates unusable CFuncPtr on non-Windows

The (index, name, [paramflags], [iid]) constructor path is compiled on all platforms, but:

  • On Windows it’s fine: index/iid are stored and resolve_com_method supplies the actual function pointer at call time.
  • On non-Windows, this branch still runs, but index/iid don’t exist, _base.buffer is all zeros, and call ends up building a CodePtr from address 0 and calling through libffi.

That revives the earlier issue: COM-style construction on non-Windows builds a broken object which later calls a NULL function pointer.

You should either gate this branch behind #[cfg(windows)] or reject it explicitly on non-Windows.

Suggested fix
-        if args.args.len() >= 2
-            && first_arg.try_int(vm).is_ok()
-            && args.args[1].downcast_ref::<PyStr>().is_some()
-        {
+        #[cfg(windows)]
+        if args.args.len() >= 2
+            && first_arg.try_int(vm).is_ok()
+            && args.args[1].downcast_ref::<PyStr>().is_some()
+        {
             // existing Windows COM constructor logic...
             // (index, iid, paramflags)
             return PyCFuncPtr { /* unchanged */ }
                 .into_ref_with_type(vm, cls)
                 .map(Into::into);
         }
+
+        #[cfg(not(windows))]
+        if args.args.len() >= 2
+            && first_arg.try_int(vm).is_ok()
+            && args.args[1].downcast_ref::<PyStr>().is_some()
+        {
+            return Err(vm.new_type_error(
+                "COM-style CFuncPtr construction is only supported on Windows",
+            ));
+        }

1510-1564: OUT parameters still discard the function’s own return value

When prepared.out_buffers is non-empty, build_result ignores the function’s (potentially errcheck-processed) return value and returns only the OUT/INOUT arguments. ctypes normally returns a tuple (result, *outs) in that situation.

This matches an earlier review concern and is still present.

You likely want:

  • result_obj = result.unwrap_or_else(|| vm.ctx.none()) as the first element, followed by extracted out values.
Suggested fix
-    if prepared.out_buffers.is_empty() {
-        return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none()));
-    }
-
-    let out_values = extract_out_values(prepared.out_buffers, vm);
-    Ok(match <[PyObjectRef; 1]>::try_from(out_values) {
-        Ok([single]) => single,
-        Err(v) => PyTuple::new_ref(v, &vm.ctx).into(),
-    })
+    if prepared.out_buffers.is_empty() {
+        return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none()));
+    }
+
+    let mut values = Vec::new();
+    // First element: function result (after errcheck), or None
+    values.push(result.unwrap_or_else(|| vm.ctx.none()));
+    // Followed by OUT / INOUT values
+    values.extend(extract_out_values(prepared.out_buffers, vm));
+
+    Ok(match <[PyObjectRef; 1]>::try_from(values) {
+        Ok([single]) => single,
+        Err(v) => PyTuple::new_ref(v, &vm.ctx).into(),
+    })

1589-1597: Calling a CFuncPtr with a NULL function pointer still calls through address 0

The new code_ptr resolution replaces the previous expect panic with:

let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) {
    Some(cp) => cp,
    None => {
        debug_assert!(false, "NULL function pointer");
        CodePtr(std::ptr::null_mut())
    }
};

In release builds this will happily call libffi with a null CodePtr, which is undefined behaviour and likely to crash the process. CPython’s ctypes is “unsafe”, but in Rust this crosses into UB territory.

A safer choice is to raise a Python ValueError("NULL function pointer") instead of constructing a null CodePtr.

Suggested change
-        let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) {
-            Some(cp) => cp,
-            None => {
-                debug_assert!(false, "NULL function pointer");
-                // In release mode, this will crash like CPython
-                CodePtr(std::ptr::null_mut())
-            }
-        };
+        let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) {
+            Some(cp) => cp,
+            None => {
+                return Err(vm.new_value_error("NULL function pointer"));
+            }
+        };
🧹 Nitpick comments (5)
crates/vm/src/stdlib/ctypes/structure.rs (1)

13-29: calculate_struct_size now surfaces errors instead of silently defaulting

Switching from unwrap_or_default() to fields_attr.try_to_value(vm)? is a good change: invalid _fields_ contents will now raise instead of quietly treating the structure as size 0, which makes debugging much easier.

If you ever touch this again, you might also consider guarding against total_size usize overflow when summing many large field sizes, but that’s a very edge‑case scenario.

crates/vm/src/stdlib/ctypes/pointer.rs (1)

664-682: Silent value truncation on integer overflow

Using unwrap_or(0) silently truncates values that don't fit in the target type. For example, writing 256 to a c_char results in 0 being written instead of raising an error.

🔎 Suggested fix with proper error handling
 match size {
     1 => {
-        *ptr = i.to_u8().unwrap_or(0);
+        *ptr = i.to_u8().ok_or_else(|| {
+            vm.new_overflow_error("value out of range for c_char")
+        })?;
     }
     2 => {
-        *(ptr as *mut i16) = i.to_i16().unwrap_or(0);
+        std::ptr::write_unaligned(
+            ptr as *mut i16,
+            i.to_i16().ok_or_else(|| {
+                vm.new_overflow_error("value out of range for c_short")
+            })?,
+        );
     }
     // ... similar for other sizes
 }
crates/vm/src/stdlib/ctypes.rs (2)

388-394: Derive SIZEOF_TIME_T from libc::time_t instead of hardcoding

Hardcoding SIZEOF_TIME_T to 8 on Windows and 4 elsewhere can be wrong on some Unix targets and diverges from CPython’s “reflect actual C type” behavior. Prefer computing this once from libc::time_t.

Suggested change
-    #[pyattr]
-    const SIZEOF_TIME_T: usize = 8;
+    #[pyattr]
+    const SIZEOF_TIME_T: usize = std::mem::size_of::<libc::time_t>();
@@
-    #[pyattr]
-    const SIZEOF_TIME_T: usize = 4;
+    #[pyattr]
+    const SIZEOF_TIME_T: usize = std::mem::size_of::<libc::time_t>();

1107-1117: Py_INCREF / Py_DECREF are no-ops – document or restrict usage

The exported Py_INCREF / Py_DECREF functions currently just return the object unchanged and don’t affect refcounts. That’s probably acceptable for now, but:

  • It’s observably different from CPython,
  • And misleading for anyone expecting real refcount semantics.

Consider either documenting them as stubs in _ctypes (e.g., in the docstring) or gating them behind a feature until a correct implementation is provided.

crates/vm/src/stdlib/ctypes/simple.rs (1)

957-977: Avoid panicking on missing StgInfo in is_cwchar_array_or_pointer

is_cwchar_array_or_pointer uses expect("array has StgInfo") / expect("pointer has StgInfo"). If someone constructs an incomplete array/pointer type or a misconfigured ctypes type without StgInfo, this will panic the VM instead of raising a Python exception.

You can preserve the fast path but turn misconfigurations into TypeError:

Suggested change
-    if let Some(arr) = value.downcast_ref::<PyCArray>() {
-        let info = arr.class().stg_info_opt().expect("array has StgInfo");
-        let elem_type = info.element_type.as_ref().expect("array has element_type");
+    if let Some(arr) = value.downcast_ref::<PyCArray>() {
+        let info = match arr.class().stg_info_opt() {
+            Some(info) => info,
+            None => return false,
+        };
+        let Some(elem_type) = info.element_type.as_ref() else {
+            return false;
+        };
@@
-    if let Some(ptr) = value.downcast_ref::<PyCPointer>() {
-        let info = ptr.class().stg_info_opt().expect("pointer has StgInfo");
+    if let Some(ptr) = value.downcast_ref::<PyCPointer>() {
+        let Some(info) = ptr.class().stg_info_opt() else {
+            return false;
+        };
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bf7dac and 5ebd684.

📒 Files selected for processing (27)
  • .cspell.dict/python-more.txt (2 hunks)
  • .cspell.json (3 hunks)
  • crates/stdlib/src/pystruct.rs (1 hunks)
  • crates/vm/src/buffer.rs (1 hunks)
  • crates/vm/src/builtins/builtin_func.rs (2 hunks)
  • crates/vm/src/builtins/function.rs (1 hunks)
  • crates/vm/src/builtins/object.rs (1 hunks)
  • crates/vm/src/builtins/str.rs (2 hunks)
  • crates/vm/src/builtins/type.rs (1 hunks)
  • crates/vm/src/exceptions.rs (4 hunks)
  • crates/vm/src/object/core.rs (1 hunks)
  • crates/vm/src/protocol/buffer.rs (1 hunks)
  • crates/vm/src/stdlib/ast/python.rs (1 hunks)
  • crates/vm/src/stdlib/codecs.rs (2 hunks)
  • crates/vm/src/stdlib/ctypes.rs (14 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (7 hunks)
  • crates/vm/src/stdlib/ctypes/library.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/simple.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (0 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (4 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (0 hunks)
  • crates/vm/src/stdlib/functools.rs (1 hunks)
  • crates/vm/src/stdlib/operator.rs (1 hunks)
  • crates/vm/src/types/structseq.rs (2 hunks)
💤 Files with no reviewable changes (2)
  • crates/vm/src/stdlib/ctypes/util.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
🚧 Files skipped from review as they are similar to previous changes (9)
  • crates/stdlib/src/pystruct.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/types/structseq.rs
  • crates/vm/src/buffer.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/stdlib/codecs.rs
  • crates/vm/src/stdlib/functools.rs
  • crates/vm/src/exceptions.rs
  • .cspell.dict/python-more.txt
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/protocol/buffer.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/stdlib/ast/python.rs
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/stdlib/operator.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/library.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/builtins/builtin_func.rs
  • crates/vm/src/stdlib/ctypes/simple.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/function.rs
🧠 Learnings (6)
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • .cspell.json
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/function.rs
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/simple.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations

Applied to files:

  • crates/vm/src/stdlib/ctypes/library.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/function.rs
🧬 Code graph analysis (3)
crates/vm/src/builtins/str.rs (3)
crates/vm/src/builtins/bytearray.rs (1)
  • isascii (271-273)
crates/vm/src/builtins/bytes.rs (1)
  • isascii (262-264)
crates/vm/src/bytes_inner.rs (1)
  • isascii (326-328)
crates/vm/src/stdlib/ctypes/pointer.rs (2)
crates/vm/src/stdlib/ctypes/array.rs (22)
  • obj (1097-1097)
  • obj (1105-1105)
  • obj (1125-1125)
  • obj (1137-1137)
  • obj (1150-1150)
  • obj (1161-1161)
  • value (300-300)
  • array_type_from_ctype (16-98)
  • b (546-546)
  • b (842-842)
  • std (600-600)
  • std (625-625)
  • std (1235-1235)
  • __getitem__ (864-879)
  • item (866-866)
  • item (874-874)
  • item (981-981)
  • item (989-989)
  • t (67-67)
  • t (260-260)
  • t (379-379)
  • __setitem__ (971-994)
crates/vm/src/stdlib/ctypes/base.rs (39)
  • obj (768-768)
  • obj (792-792)
  • obj (814-814)
  • obj (829-829)
  • obj (1581-1581)
  • obj (1583-1583)
  • obj (1692-1693)
  • obj (1720-1721)
  • obj (1741-1742)
  • obj (1766-1766)
  • new (149-165)
  • new (1237-1245)
  • value (851-852)
  • value (860-860)
  • value (860-860)
  • value (882-882)
  • value (896-896)
  • value (911-911)
  • value (921-921)
  • value (1352-1352)
  • value (1360-1360)
  • value (1368-1368)
  • value (1417-1417)
  • value (1432-1432)
  • value (1446-1446)
  • value (1460-1460)
  • value (1485-1485)
  • type_ref (2151-2151)
  • std (365-365)
  • std (366-366)
  • std (2140-2140)
  • std (2223-2223)
  • buffer (347-347)
  • buffer (354-354)
  • offset (1657-1659)
  • type_code (1537-1537)
  • type_code (1554-1554)
  • size (669-671)
  • size (1662-1664)
crates/vm/src/stdlib/ctypes/function.rs (1)
crates/vm/src/stdlib/ctypes.rs (7)
  • type_info (143-237)
  • std (541-541)
  • std (545-545)
  • std (556-556)
  • t (55-55)
  • pointer (688-697)
  • src (1209-1209)
🪛 Biome (2.1.2)
.cspell.json

[error] 78-78: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 78-78: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 110-110: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 110-110: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 134-134: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)


[error] 134-134: End of file expected

Use an array for a sequence of values: [1, 2]

(parse)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (18)
.cspell.json (1)

78-78: LGTM! Vocabulary additions are appropriate.

The three new terms added to the spell-check dictionary are legitimate technical terms for the ctypes implementation:

  • oparg: operation argument (bytecode terminology)
  • pytype: Python type terminology
  • sighandler: signal handler terminology

Note: The Biome parse errors on these lines are false positives. Biome is attempting to parse this as standard JSON, but .cspell.json uses an extended JSON format that supports trailing commas and comments. The file is valid cspell configuration.

Also applies to: 110-110, 134-134

crates/vm/src/stdlib/ast/python.rs (1)

50-53: LGTM! Improved naming clarity.

The refactoring from fields with #[pyattr(name = "_fields")] to _fields with #[pyattr] aligns the Rust function name with the Python attribute name. This improves code maintainability by eliminating the explicit name override while preserving the Python API.

crates/vm/src/protocol/buffer.rs (1)

203-218: LGTM! Correct implementation of scalar type support.

The conditional validation logic properly handles both scalar (ndim=0) and non-scalar cases:

  • Scalar case correctly asserts itemsize == len since a dimensionless buffer's length equals its item size
  • Non-scalar case maintains the existing validation: shape_product * itemsize == len with proper checks for suboffset >= 0 and stride != 0

This change is essential for ctypes Structure support and aligns with how for_each_segment (line 277) and zip_eq (line 316) already handle scalar buffers.

crates/vm/src/stdlib/operator.rs (1)

318-341: ASCII check rename to isascii is consistent with PyStr API

The _compare_digest path still correctly rejects non-ASCII strings; this just follows the new PyStr::isascii naming and looks good.

crates/vm/src/builtins/type.rs (1)

1446-1452: __mro__ getter matches expected semantics

Exposing __mro__ via mro_map_collect is straightforward and consistent with CPython; returning a new PyTuple per access is fine here.

crates/vm/src/builtins/object.rs (1)

453-488: __class__ getter rename is behavior‑preserving

Exposing __class__ directly instead of via a named getter still returns obj.class().to_owned() and keeps the existing setter semantics, so this is a clean naming/annotation tidy‑up.

crates/vm/src/builtins/str.rs (1)

628-632: isascii / __format__ binding cleanups are consistent

Renaming the internal method to isascii and relying on #[pymethod]’s default export keeps the Python API (str.isascii) intact and matches the bytes/bytearray helpers. Likewise, dropping the explicit name = "__format__" while keeping the Rust function named __format__ doesn’t change behavior.

Also applies to: 963-984

crates/vm/src/stdlib/ctypes/array.rs (2)

1-132: Array type creation and PyCArrayType initialization look sound

The new array_type_from_ctype and PyCArrayType::Initializer wiring are a big improvement:

  • You now cache array types in _array_type_cache keyed by (itemtype, length), matching CPython’s strategy and avoiding type churn.
  • element_size * length is checked with checked_mul (and mirrored in the Initializer path), which closes previous overflow holes for huge arrays.
  • _length_ and _type_ are validated and can be inherited from parent StgInfo, with appropriate ValueError/OverflowError semantics.
  • StgInfo for arrays is centralized via StgInfo::new_array, and _type_ / _length_ attributes plus char/wchar getsets are consistently attached.

This design should be much easier to reason about than ad‑hoc per‑class initialization.

Also applies to: 136-283, 285-357


963-1032: Wide‑char helpers and char/wchar array getsets are aligned and platform‑aware

The new wide‑char utilities and array getsets look good:

  • WCHAR_SIZE is derived from libc::wchar_t, and both wchar_from_bytes/wchar_to_bytes and wstring_from_bytes consistently branch on that size, covering UTF‑16 (Windows) and UTF‑32 (Unix‑like) layouts.
  • The "Z" (c_wchar_p) read/write paths in read_element_from_buffer / write_element_to_buffer now use WCHAR_SIZE and assemble/disassemble code points accordingly, fixing the earlier hard‑coded 32‑bit assumption.
  • char_array_* and wchar_array_* get/set pairs implement value/raw in a reasonably CPython‑like way, with length checks and null‑termination for char/wchar arrays; the dynamic add_*_array_getsets helpers correctly attach these descriptors to the generated array types.

Overall this brings the wchar handling much closer to CPython, especially on Windows.

Also applies to: 1040-1091, 1095-1229, 1230-1281

crates/vm/src/stdlib/ctypes/structure.rs (2)

31-129: Struct metatype initialization and _fields_ handling look correct and deadlock‑safe

The new PyCStructType implementation is a substantial cleanup:

  • slot_new marks base classes as FINAL and seeds a default StgInfo, deferring real layout work to Initializer::init.
  • The Initializer respects _abstract_, supports inheriting StgInfo from a base (clearing FINAL in the subclass, marking the base FINAL), and falls back to a sensible default StgInfo when there’s no base.
  • process_fields carefully computes offsets, padding, alignment, flags (TYPEFLAG_HASPOINTER, TYPEFLAG_HASUNION, TYPEFLAG_HASBITFIELD), and a PEP‑3118 style format string, while cloning base ffi field types; it also marks field types FINAL as they’re used, mirroring CPython’s semantics.
  • SetAttr for _fields_ checks the FINAL flag in a separate scope so the read lock on StgInfo is dropped before calling process_fields, which avoids RwLock self‑deadlocks; it also forbids deletion and mutation after finalization, as expected.

Overall this matches CPython’s structure layout behavior quite closely and should be much easier to maintain.

Also applies to: 131-167, 169-393, 407-425, 427-491


493-507: PyCStructure wrapping and buffer exposure align with the new StgInfo model

The PyCStructure refactor to a transparent wrapper over PyCData integrates nicely with the new StgInfo‑driven layout:

  • slot_new pulls size, align, and length from the type’s StgInfo, marks the type FINAL on first instantiation, and uses _fields_ to collect field names for initializing from positional/keyword arguments.
  • The instance buffer is created via PyCData::from_stg_info, so you don’t have to manually juggle buffer sizing or alignment here.
  • Field initialization delegates to normal attribute setting (set_attr), letting the CField descriptors do the heavy lifting; you correctly guard against too many positional initializers.
  • AsBuffer uses StgInfo.format when available (or "B" otherwise) and exposes a 0‑D buffer with itemsize == struct_size and len == buffer_len, which is consistent with a scalar struct view and CPython’s buffer semantics.

This all looks coherent with the rest of the ctypes overhaul.

Also applies to: 511-589, 598-637

crates/vm/src/stdlib/ctypes/pointer.rs (3)

17-56: LGTM!

The Initializer implementation for PyCPointerType correctly sets up the StgInfo with appropriate size, alignment, flags, and format string. The expect on line 47 is safe since it's guarded by if let Some(ref proto).


60-119: LGTM!

The from_param implementation follows the correct precedence order matching CPython's behavior: None → same type instance → pointed-to type (wrapped with byref) → compatible array/pointer → _as_parameter_ fallback.


222-234: LGTM!

The Initializer for PyCPointer correctly handles optional initial contents, delegating to set_contents only when a non-None value is provided.

crates/vm/src/stdlib/ctypes.rs (1)

136-237: FFI type mapping now matches platform C sizes

The TypeInfo / type_info table correctly derives FFI types from mem::size_of for c_long/c_ulong and WideChar, fixing the earlier LP64/LLP64 mismatches and ensuring c_wchar follows the actual underlying size. This looks consistent with ffi_type_from_tag in function.rs.

crates/vm/src/stdlib/ctypes/union.rs (1)

136-295: Union _fields_ processing and StgInfo population look consistent with structure handling

The new process_fields:

  • Validates _fields_ type (list/tuple of tuples, name is str, type is ctypes type),
  • Derives size/align via get_field_size / get_field_align, honours _pack_ / _align_,
  • Propagates HASPOINTER/HASBITFIELD, records ffi_field_types, and marks field types FINAL,
  • Computes an aligned union size and sets TYPEFLAG_HASUNION and a PEP-3118 "B" format.

This matches CPython’s ctypes union semantics and integrates cleanly with the new StgInfo model.

crates/vm/src/stdlib/ctypes/function.rs (2)

79-150: Pointer conversion helper (convert_to_pointer) aligns with ctypes’ ConvParam semantics

The layered handling (CArgObject → byref buffers, None → NULL, array/structure/pointer/simple → appropriate buffer or stored pointer, bytes/int → raw address, _as_parameter_ recursion) matches CPython’s ConvParam logic and provides a single place to reason about pointer arguments.


1860-1898: Thunk callback plumbing and unraisable handling look solid

The thunk_callback implementation:

  • Converts C arguments to Python using ffi_to_python,
  • Calls the Python callable inside the current VM,
  • Sends any exceptions through run_unraisable with a helpful context message,
  • Writes the result back via python_to_ffi using the declared restype.

The lifetime management via ThunkData / ThunkUserData (boxed userdata + libffi Closure) and PyCThunk’s PyRwLock<Option<ThunkData>> keeps the closure and userdata alive for the lifetime of the PyCThunk. This overall structure looks correct.

Comment on lines +530 to 683
fn getitem_by_index(zelf: &Py<PyCArray>, i: isize, vm: &VirtualMachine) -> PyResult {
let stg = zelf.class().stg_info_opt();
let length = stg.as_ref().map_or(0, |i| i.length) as isize;
let index = if i < 0 { length + i } else { i };
if index < 0 || index >= length {
return Err(vm.new_index_error("array index out of range".to_owned()));
return Err(vm.new_index_error("invalid index"));
}
let index = index as usize;
let element_size = zelf.element_size.load();
let element_size = stg.as_ref().map_or(0, |i| i.element_size);
let offset = index * element_size;
let buffer = zelf.cdata.read().buffer.clone();
if offset + element_size <= buffer.len() {
let bytes = &buffer[offset..offset + element_size];
Ok(Self::bytes_to_int(bytes, element_size, vm))
let type_code = Self::get_element_type_code(zelf, vm);

// Get target buffer and offset (base's buffer if available, otherwise own)
let base_obj = zelf.0.base.read().clone();
let (buffer_lock, final_offset) = if let Some(cdata) = base_obj
.as_ref()
.and_then(|b| b.downcast_ref::<super::PyCData>())
{
(&cdata.buffer, zelf.0.base_offset.load() + offset)
} else {
Ok(vm.ctx.new_int(0).into())
(&zelf.0.buffer, offset)
};

let buffer = buffer_lock.read();
Self::read_element_from_buffer(
&buffer,
final_offset,
element_size,
type_code.as_deref(),
vm,
)
}

/// Helper to read an element value from a buffer at given offset
fn read_element_from_buffer(
buffer: &[u8],
offset: usize,
element_size: usize,
type_code: Option<&str>,
vm: &VirtualMachine,
) -> PyResult {
match type_code {
Some("c") => {
// Return single byte as bytes
if offset < buffer.len() {
Ok(vm.ctx.new_bytes(vec![buffer[offset]]).into())
} else {
Ok(vm.ctx.new_bytes(vec![0]).into())
}
}
Some("u") => {
// Return single wchar as str
if let Some(code) = wchar_from_bytes(&buffer[offset..]) {
let s = char::from_u32(code)
.map(|c| c.to_string())
.unwrap_or_default();
Ok(vm.ctx.new_str(s).into())
} else {
Ok(vm.ctx.new_str("").into())
}
}
Some("z") => {
// c_char_p: pointer to bytes - dereference to get string
if offset + element_size > buffer.len() {
return Ok(vm.ctx.none());
}
let ptr_bytes = &buffer[offset..offset + element_size];
let ptr_val = usize::from_ne_bytes(
ptr_bytes
.try_into()
.unwrap_or([0; std::mem::size_of::<usize>()]),
);
if ptr_val == 0 {
return Ok(vm.ctx.none());
}
// Read null-terminated string from pointer address
unsafe {
let ptr = ptr_val as *const u8;
let mut len = 0;
while *ptr.add(len) != 0 {
len += 1;
}
let bytes = std::slice::from_raw_parts(ptr, len);
Ok(vm.ctx.new_bytes(bytes.to_vec()).into())
}
}
Some("Z") => {
// c_wchar_p: pointer to wchar_t - dereference to get string
if offset + element_size > buffer.len() {
return Ok(vm.ctx.none());
}
let ptr_bytes = &buffer[offset..offset + element_size];
let ptr_val = usize::from_ne_bytes(
ptr_bytes
.try_into()
.unwrap_or([0; std::mem::size_of::<usize>()]),
);
if ptr_val == 0 {
return Ok(vm.ctx.none());
}
// Read null-terminated wide string using WCHAR_SIZE
unsafe {
let ptr = ptr_val as *const u8;
let mut chars = Vec::new();
let mut pos = 0usize;
loop {
let code = if WCHAR_SIZE == 2 {
let bytes = std::slice::from_raw_parts(ptr.add(pos), 2);
u16::from_ne_bytes([bytes[0], bytes[1]]) as u32
} else {
let bytes = std::slice::from_raw_parts(ptr.add(pos), 4);
u32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
};
if code == 0 {
break;
}
if let Some(ch) = char::from_u32(code) {
chars.push(ch);
}
pos += WCHAR_SIZE;
}
let s: String = chars.into_iter().collect();
Ok(vm.ctx.new_str(s).into())
}
}
Some("f") => {
// c_float
if offset + 4 <= buffer.len() {
let bytes: [u8; 4] = buffer[offset..offset + 4].try_into().unwrap();
let val = f32::from_ne_bytes(bytes);
Ok(vm.ctx.new_float(val as f64).into())
} else {
Ok(vm.ctx.new_float(0.0).into())
}
}
Some("d") | Some("g") => {
// c_double / c_longdouble - read f64 from first 8 bytes
if offset + 8 <= buffer.len() {
let bytes: [u8; 8] = buffer[offset..offset + 8].try_into().unwrap();
let val = f64::from_ne_bytes(bytes);
Ok(vm.ctx.new_float(val).into())
} else {
Ok(vm.ctx.new_float(0.0).into())
}
}
_ => {
if offset + element_size <= buffer.len() {
let bytes = &buffer[offset..offset + element_size];
Ok(Self::bytes_to_int(bytes, element_size, type_code, vm))
} else {
Ok(vm.ctx.new_int(0).into())
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Slice reads for c_char / c_wchar arrays ignore base / base_offset

getitem_by_index and setitem_by_index correctly respect PyCData::base and base_offset: for arrays that are views into another object’s buffer (e.g. a c_char * n field inside a Structure, or arrays created via from-buffer APIs), element access is routed through the underlying base buffer.

In contrast, getitem_by_slice does:

  • For "c" (c_char arrays), it reads from zelf.0.buffer only, computing offsets as idx * element_size.
  • For "u" (c_wchar arrays), it also reads from zelf.0.buffer using WCHAR_SIZE, again ignoring base and base_offset.

So for array instances whose storage actually lives in a base CData, integer indexing will hit the correct underlying bytes, but slicing will instead read from the local buffer (which may be zero‑initialized or simply not tied to the base). That can lead to:

  • Inconsistent results between arr[i] and arr[i:j] for the same array.
  • Wrong behavior for c_char * n and c_wchar * n fields on structures/unions or other base‑backed arrays.

The “other types” branch of getitem_by_slice already delegates to getitem_by_index, so it inherits the correct base handling; only the "c" and "u" cases bypass it.

I’d suggest:

  • Mirroring the base/offset resolution from getitem_by_index in the "c" and "u" branches of getitem_by_slice (compute (buffer_lock, base_offset) once, then slice within that view), or
  • Reusing read_element_from_buffer via per‑index iteration even for the contiguous cases, but with the resolved final_offset based on base/base_offset.

Also applies to: 686-860, 881-967, 996-1032

🤖 Prompt for AI Agents
crates/vm/src/stdlib/ctypes/array.rs around lines 530-683: the slice-read
branches for type codes "c" and "u" ignore PyCData::base and base_offset,
causing slices to read from the local buffer instead of the underlying base
buffer; fix by reusing the same base/offset resolution logic used in
getitem_by_index (determine base_obj, select buffer_lock and compute final
base_offset = zelf.0.base_offset.load() + relative_offset when base exists,
otherwise use zelf.0.buffer and offset) and then read the slice data from that
resolved buffer/offset (either by calling read_element_from_buffer per element
with computed offsets or by slicing the resolved buffer at the computed start
and length); apply the same pattern to the other affected ranges (lines
~686-860, 881-967, 996-1032).

Comment on lines +1440 to +1507
/// Convert raw FFI result to Python object
fn convert_raw_result(
raw_result: &mut RawResult,
call_info: &CallInfo,
vm: &VirtualMachine,
) -> Option<PyObjectRef> {
match raw_result {
RawResult::Void => None,
RawResult::Pointer(ptr) => {
// Get type code from restype to determine conversion method
let type_code = call_info
.restype_obj
.as_ref()
.and_then(|t| t.clone().downcast::<PyType>().ok())
.and_then(|t| t.as_object().get_attr(vm.ctx.intern_str("_type_"), vm).ok())
.and_then(|t| t.downcast_ref::<PyStr>().map(|s| s.to_string()));

match type_code.as_deref() {
Some("z") => {
// c_char_p: NULL -> None, otherwise read C string -> bytes
if *ptr == 0 {
Some(vm.ctx.none())
} else {
let cstr = unsafe { std::ffi::CStr::from_ptr(*ptr as _) };
Some(vm.ctx.new_bytes(cstr.to_bytes().to_vec()).into())
}
}
Some("Z") => {
// c_wchar_p: NULL -> None, otherwise read wide string -> str
if *ptr == 0 {
Some(vm.ctx.none())
} else {
let wstr_ptr = *ptr as *const libc::wchar_t;
let mut len = 0;
unsafe {
while *wstr_ptr.add(len) != 0 {
len += 1;
}
}
let slice = unsafe { std::slice::from_raw_parts(wstr_ptr, len) };
let s: String = slice
.iter()
.filter_map(|&c| char::from_u32(c as u32))
.collect();
Some(vm.ctx.new_str(s).into())
}
}
_ => {
// c_void_p ("P") and other pointer types: NULL -> None, otherwise int
if *ptr == 0 {
Some(vm.ctx.none())
} else {
Some(vm.ctx.new_int(*ptr).into())
}
}
}
}
RawResult::Value(val) => call_info
.restype_obj
.as_ref()
.and_then(|f| f.clone().downcast::<PyType>().ok())
.map(|f| f.from_ffi_type(&mut output, vm).ok().flatten())
.unwrap_or_else(|| Some(vm.ctx.new_int(output as i32).as_object().to_pyobject(vm)));
if let Some(return_type) = return_type {
Ok(return_type)
} else {
Ok(vm.ctx.none())
.map(|f| {
f.from_ffi_type(val as *mut _ as *mut c_void, vm)
.ok()
.flatten()
})
.unwrap_or_else(|| Some(vm.ctx.new_int(*val as usize).as_object().to_pyobject(vm))),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Pointer restypes: wide-char decoding is still wrong on Windows

convert_raw_result now distinguishes "z" / "Z" / others, which is good, but the "Z" branch still does:

  • Treat libc::wchar_t elements as full Unicode code points with char::from_u32(c as u32).
  • On Windows, wchar_t is 16-bit UTF‑16, so surrogate pairs are mishandled and dropped.

This is the same UTF‑16 vs UTF‑32 issue as in PyCSimple::value and will corrupt c_wchar_p restype returns containing non‑BMP characters.

You already have a correct Windows implementation in wstring_at_impl using Wtf8Buf::from_wide. That logic should be reused here.

Suggested change for the `"Z"` arm
-                Some("Z") => {
-                    // c_wchar_p: NULL -> None, otherwise read wide string -> str
-                    if *ptr == 0 {
-                        Some(vm.ctx.none())
-                    } else {
-                        let wstr_ptr = *ptr as *const libc::wchar_t;
-                        let mut len = 0;
-                        unsafe {
-                            while *wstr_ptr.add(len) != 0 {
-                                len += 1;
-                            }
-                        }
-                        let slice = unsafe { std::slice::from_raw_parts(wstr_ptr, len) };
-                        let s: String = slice
-                            .iter()
-                            .filter_map(|&c| char::from_u32(c as u32))
-                            .collect();
-                        Some(vm.ctx.new_str(s).into())
-                    }
-                }
+                Some("Z") => {
+                    if *ptr == 0 {
+                        Some(vm.ctx.none())
+                    } else {
+                        let wstr_ptr = *ptr as *const libc::wchar_t;
+                        #[cfg(windows)]
+                        {
+                            use rustpython_common::wtf8::Wtf8Buf;
+                            let mut len = 0;
+                            unsafe {
+                                while *wstr_ptr.add(len) != 0 {
+                                    len += 1;
+                                }
+                                let slice = std::slice::from_raw_parts(wstr_ptr, len);
+                                let wide: Vec<u16> = slice.iter().map(|&c| c as u16).collect();
+                                let wtf8 = Wtf8Buf::from_wide(&wide);
+                                Some(vm.ctx.new_str(wtf8).into())
+                            }
+                        }
+                        #[cfg(not(windows))]
+                        {
+                            let mut len = 0;
+                            unsafe {
+                                while *wstr_ptr.add(len) != 0 {
+                                    len += 1;
+                                }
+                                let slice = std::slice::from_raw_parts(wstr_ptr, len);
+                                let s: String = slice
+                                    .iter()
+                                    .filter_map(|&c| char::from_u32(c as u32))
+                                    .collect();
+                                Some(vm.ctx.new_str(s).into())
+                            }
+                        }
+                    }
+                }
🤖 Prompt for AI Agents
crates/vm/src/stdlib/ctypes/function.rs around lines 1440 to 1507: the "Z"
branch decodes libc::wchar_t as full Unicode code points which breaks on Windows
where wchar_t is UTF‑16; replace the manual char::from_u32 iteration with the
same logic used by wstring_at_impl (or call wstring_at_impl/Wtf8Buf::from_wide)
to correctly decode UTF‑16 surrogate pairs on Windows and UTF‑32 on Unix, i.e.
build the wide slice from ptr, convert it via Wtf8Buf::from_wide (or
platform-appropriate conversion) into a Rust String/WTF-8 and return
vm.ctx.new_str(...) ; ensure NULL handling, unsafe slice creation, and
feature-gated/conditional code so non-Windows keeps the existing UTF‑32
behavior.

Comment on lines +1723 to +1767
fn ffi_to_python(ty: &Py<PyType>, ptr: *const c_void, vm: &VirtualMachine) -> PyObjectRef {
let type_code = ty.type_code(vm);
let raw_value: PyObjectRef = unsafe {
match type_code.as_deref() {
Some("b") => vm.ctx.new_int(*(ptr as *const i8) as i32).into(),
Some("B") => vm.ctx.new_int(*(ptr as *const u8) as i32).into(),
Some("c") => vm.ctx.new_bytes(vec![*(ptr as *const u8)]).into(),
Some("h") => vm.ctx.new_int(*(ptr as *const i16) as i32).into(),
Some("H") => vm.ctx.new_int(*(ptr as *const u16) as i32).into(),
Some("i") => vm.ctx.new_int(*(ptr as *const i32)).into(),
Some("I") => vm.ctx.new_int(*(ptr as *const u32)).into(),
Some("l") => vm.ctx.new_int(*(ptr as *const libc::c_long)).into(),
Some("L") => vm.ctx.new_int(*(ptr as *const libc::c_ulong)).into(),
Some("q") => vm.ctx.new_int(*(ptr as *const libc::c_longlong)).into(),
Some("Q") => vm.ctx.new_int(*(ptr as *const libc::c_ulonglong)).into(),
Some("f") => vm.ctx.new_float(*(ptr as *const f32) as f64).into(),
Some("d") => vm.ctx.new_float(*(ptr as *const f64)).into(),
Some("z") => {
// c_char_p: C string pointer → Python bytes
let cstr_ptr = *(ptr as *const *const libc::c_char);
if cstr_ptr.is_null() {
vm.ctx.none()
} else {
let cstr = std::ffi::CStr::from_ptr(cstr_ptr);
vm.ctx.new_bytes(cstr.to_bytes().to_vec()).into()
}
}
Some("Z") => {
// c_wchar_p: wchar_t* → Python str
let wstr_ptr = *(ptr as *const *const libc::wchar_t);
if wstr_ptr.is_null() {
vm.ctx.none()
} else {
let mut len = 0;
while *wstr_ptr.add(len) != 0 {
len += 1;
}
let slice = std::slice::from_raw_parts(wstr_ptr, len);
let s: String = slice
.iter()
.filter_map(|&c| char::from_u32(c as u32))
.collect();
vm.ctx.new_str(s).into()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Wide-char callback arguments ("Z") share the same UTF‑16 decoding bug as other Z paths

The "Z" arm of ffi_to_python (used for callback argument conversion) also interprets libc::wchar_t as a full code point and uses char::from_u32, which is wrong on Windows where wchar_t is UTF‑16. This should be fixed in the same way as convert_raw_result / PyCSimple::value, ideally by reusing a shared helper that mirrors _ctypes.wstring_at_impl.

Comment on lines +1723 to +1780
fn ffi_to_python(ty: &Py<PyType>, ptr: *const c_void, vm: &VirtualMachine) -> PyObjectRef {
let type_code = ty.type_code(vm);
let raw_value: PyObjectRef = unsafe {
match type_code.as_deref() {
Some("b") => vm.ctx.new_int(*(ptr as *const i8) as i32).into(),
Some("B") => vm.ctx.new_int(*(ptr as *const u8) as i32).into(),
Some("c") => vm.ctx.new_bytes(vec![*(ptr as *const u8)]).into(),
Some("h") => vm.ctx.new_int(*(ptr as *const i16) as i32).into(),
Some("H") => vm.ctx.new_int(*(ptr as *const u16) as i32).into(),
Some("i") => vm.ctx.new_int(*(ptr as *const i32)).into(),
Some("I") => vm.ctx.new_int(*(ptr as *const u32)).into(),
Some("l") => vm.ctx.new_int(*(ptr as *const libc::c_long)).into(),
Some("L") => vm.ctx.new_int(*(ptr as *const libc::c_ulong)).into(),
Some("q") => vm.ctx.new_int(*(ptr as *const libc::c_longlong)).into(),
Some("Q") => vm.ctx.new_int(*(ptr as *const libc::c_ulonglong)).into(),
Some("f") => vm.ctx.new_float(*(ptr as *const f32) as f64).into(),
Some("d") => vm.ctx.new_float(*(ptr as *const f64)).into(),
Some("z") => {
// c_char_p: C string pointer → Python bytes
let cstr_ptr = *(ptr as *const *const libc::c_char);
if cstr_ptr.is_null() {
vm.ctx.none()
} else {
let cstr = std::ffi::CStr::from_ptr(cstr_ptr);
vm.ctx.new_bytes(cstr.to_bytes().to_vec()).into()
}
}
Some("Z") => {
// c_wchar_p: wchar_t* → Python str
let wstr_ptr = *(ptr as *const *const libc::wchar_t);
if wstr_ptr.is_null() {
vm.ctx.none()
} else {
let mut len = 0;
while *wstr_ptr.add(len) != 0 {
len += 1;
}
let slice = std::slice::from_raw_parts(wstr_ptr, len);
let s: String = slice
.iter()
.filter_map(|&c| char::from_u32(c as u32))
.collect();
vm.ctx.new_str(s).into()
}
}
Some("P") => vm.ctx.new_int(*(ptr as *const usize)).into(),
Some("?") => vm.ctx.new_bool(*(ptr as *const u8) != 0).into(),
_ => return vm.ctx.none(),
}
};

if !is_simple_subclass(ty, vm) {
return raw_value;
}
ty.as_object()
.call((raw_value.clone(),), vm)
.unwrap_or(raw_value)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Callback result marshalling for c_char_p / c_wchar_p is too limited and can leave result undefined

In python_to_ffi, the "z" / "Z" / "P" branch only accepts values that pass try_int(vm) and writes them directly into the result slot as usize. For callbacks declared with restype c_char_p / c_wchar_p, CPython allows returning bytes / str and arranges for a temporary NULL-terminated buffer whose lifetime is tied to the call.

Current behaviour:

  • CFUNCTYPE(c_char_p)(py_callable) where py_callable returns bytes will silently do nothing in the "z" arm (since try_int fails), leaving result uninitialized.
  • Same for c_wchar_p and str.

At minimum this should reject unsupported return types with a clear Python error; ideally, it should mirror the constructor logic and:

  • For "z": accept bytes/bytearray/None and allocate a null-terminated buffer via ensure_z_null_terminated.
  • For "Z": accept str/None and allocate via str_to_wchar_bytes.
  • Keep the holder alive for the duration of the callback (e.g., by storing it in the ThunkUserData or another keepalive attached to the thunk).

Given the complexity, even a conservative improvement that raises on non-int results for "z"/"Z" would be better than silently writing garbage.

Comment on lines +605 to +628
unsafe {
let ptr = addr as *const u8;
match type_code {
Some("c") => Ok(vm.ctx.new_bytes(vec![*ptr]).into()),
Some("b") => Ok(vm.ctx.new_int(*(ptr as *const i8) as i32).into()),
Some("B") => Ok(vm.ctx.new_int(*ptr as i32).into()),
Some("h") => Ok(vm.ctx.new_int(*(ptr as *const i16) as i32).into()),
Some("H") => Ok(vm.ctx.new_int(*(ptr as *const u16) as i32).into()),
Some("i") | Some("l") => Ok(vm.ctx.new_int(*(ptr as *const i32)).into()),
Some("I") | Some("L") => Ok(vm.ctx.new_int(*(ptr as *const u32)).into()),
Some("q") => Ok(vm.ctx.new_int(*(ptr as *const i64)).into()),
Some("Q") => Ok(vm.ctx.new_int(*(ptr as *const u64)).into()),
Some("f") => Ok(vm.ctx.new_float(*(ptr as *const f32) as f64).into()),
Some("d") | Some("g") => Ok(vm.ctx.new_float(*(ptr as *const f64)).into()),
Some("P") | Some("z") | Some("Z") => {
Ok(vm.ctx.new_int(*(ptr as *const usize)).into())
}
_ => {
// Default: read as bytes
let bytes = std::slice::from_raw_parts(ptr, size).to_vec();
Ok(vm.ctx.new_bytes(bytes).into())
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential undefined behavior from unaligned memory access

The pointer casts (e.g., ptr as *const i16, ptr as *const i32) assume the address is properly aligned for the target type. On architectures with strict alignment requirements (e.g., ARM), unaligned reads cause undefined behavior or hardware faults.

🔎 Suggested fix using read_unaligned
 unsafe {
     let ptr = addr as *const u8;
     match type_code {
         Some("c") => Ok(vm.ctx.new_bytes(vec![*ptr]).into()),
         Some("b") => Ok(vm.ctx.new_int(*(ptr as *const i8) as i32).into()),
         Some("B") => Ok(vm.ctx.new_int(*ptr as i32).into()),
-        Some("h") => Ok(vm.ctx.new_int(*(ptr as *const i16) as i32).into()),
-        Some("H") => Ok(vm.ctx.new_int(*(ptr as *const u16) as i32).into()),
-        Some("i") | Some("l") => Ok(vm.ctx.new_int(*(ptr as *const i32)).into()),
-        Some("I") | Some("L") => Ok(vm.ctx.new_int(*(ptr as *const u32)).into()),
-        Some("q") => Ok(vm.ctx.new_int(*(ptr as *const i64)).into()),
-        Some("Q") => Ok(vm.ctx.new_int(*(ptr as *const u64)).into()),
-        Some("f") => Ok(vm.ctx.new_float(*(ptr as *const f32) as f64).into()),
-        Some("d") | Some("g") => Ok(vm.ctx.new_float(*(ptr as *const f64)).into()),
-        Some("P") | Some("z") | Some("Z") => {
-            Ok(vm.ctx.new_int(*(ptr as *const usize)).into())
-        }
+        Some("h") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const i16) as i32).into()),
+        Some("H") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const u16) as i32).into()),
+        Some("i") | Some("l") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const i32)).into()),
+        Some("I") | Some("L") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const u32)).into()),
+        Some("q") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const i64)).into()),
+        Some("Q") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const u64)).into()),
+        Some("f") => Ok(vm.ctx.new_float(std::ptr::read_unaligned(ptr as *const f32) as f64).into()),
+        Some("d") | Some("g") => Ok(vm.ctx.new_float(std::ptr::read_unaligned(ptr as *const f64)).into()),
+        Some("P") | Some("z") | Some("Z") => {
+            Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const usize)).into())
+        }
         // ...
     }
 }
🤖 Prompt for AI Agents
crates/vm/src/stdlib/ctypes/pointer.rs around lines 605-628: the current code
casts the raw byte pointer to larger integer/float pointer types (e.g., *const
i16, *const i32) and dereferences them, which can cause undefined behavior on
architectures that require alignment; change those direct dereferences to use
std::ptr::read_unaligned on the appropriately cast *const T so the value is read
safely even when unaligned (for each type case cast ptr to *const T and call
ptr::read_unaligned(ptr as *const T) and then convert to the VM value as before;
do this for i8/u8/i16/u16/i32/u32/i64/u64/f32/f64/usize cases while leaving the
default bytes branch unchanged).

Comment on lines +688 to +699
let f = float_val.to_f64();
match size {
4 => {
*(ptr as *mut f32) = f as f32;
}
8 => {
*(ptr as *mut f64) = f;
}
_ => {}
}
return Ok(());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Unaligned writes for float values

Similar to the read case, writing floats via pointer cast assumes alignment. Use write_unaligned for safety.

🔎 Suggested fix
 if let Ok(float_val) = value.try_float(vm) {
     let f = float_val.to_f64();
     match size {
         4 => {
-            *(ptr as *mut f32) = f as f32;
+            std::ptr::write_unaligned(ptr as *mut f32, f as f32);
         }
         8 => {
-            *(ptr as *mut f64) = f;
+            std::ptr::write_unaligned(ptr as *mut f64, f);
         }
         _ => {}
     }
     return Ok(());
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let f = float_val.to_f64();
match size {
4 => {
*(ptr as *mut f32) = f as f32;
}
8 => {
*(ptr as *mut f64) = f;
}
_ => {}
}
return Ok(());
}
let f = float_val.to_f64();
match size {
4 => {
std::ptr::write_unaligned(ptr as *mut f32, f as f32);
}
8 => {
std::ptr::write_unaligned(ptr as *mut f64, f);
}
_ => {}
}
return Ok(());
}
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/pointer.rs around lines 688 to 699, the code
performs potentially unaligned writes by dereferencing ptr as *mut f32/*mut f64
and assigning; replace those direct writes with core::ptr::write_unaligned to
avoid UB on unaligned addresses. Specifically, inside the match arms for size 4
and 8, cast ptr to *mut f32/*mut f64 and call write_unaligned with f as f32 / f
as f64 (inside the existing unsafe), keep the default branch and return Ok(())
logic unchanged.

Comment on lines +1087 to +1134
#[pygetset]
pub fn value(instance: PyObjectRef, vm: &VirtualMachine) -> PyResult<PyObjectRef> {
let zelf: &Py<Self> = instance
.downcast_ref()
.ok_or_else(|| vm.new_type_error("cannot get value of instance"))?;

// Get _type_ from class
let cls = zelf.class();
let type_attr = cls
.as_object()
.get_attr("_type_", vm)
.map_err(|_| vm.new_type_error("no _type_ attribute"))?;
let type_code = type_attr.str(vm)?.to_string();

// Special handling for c_char_p (z) and c_wchar_p (Z)
// z_get, Z_get - dereference pointer to get string
if type_code == "z" {
// c_char_p: read pointer from buffer, dereference to get bytes string
let buffer = zelf.0.buffer.read();
let ptr = super::base::read_ptr_from_buffer(&buffer);
if ptr == 0 {
return Ok(vm.ctx.none());
}
// Read null-terminated string at the address
unsafe {
let cstr = std::ffi::CStr::from_ptr(ptr as _);
return Ok(vm.ctx.new_bytes(cstr.to_bytes().to_vec()).into());
}
}
if type_code == "Z" {
// c_wchar_p: read pointer from buffer, dereference to get wide string
let buffer = zelf.0.buffer.read();
let ptr = super::base::read_ptr_from_buffer(&buffer);
if ptr == 0 {
return Ok(vm.ctx.none());
}
// Read null-terminated wide string at the address
unsafe {
let w_ptr = ptr as *const libc::wchar_t;
let len = libc::wcslen(w_ptr);
let wchars = std::slice::from_raw_parts(w_ptr, len);
let s: String = wchars
.iter()
.filter_map(|&c| char::from_u32(c as u32))
.collect();
return Ok(vm.ctx.new_str(s).into());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

c_wchar_p.value decodes UTF-16 incorrectly on Windows

In the "Z" branch of PyCSimple::value, you treat libc::wchar_t elements as full Unicode code points via char::from_u32(c as u32). On Windows, wchar_t is 16-bit UTF-16:

  • Surrogate pairs are split into individual 16‑bit units.
  • char::from_u32 returns None for surrogate code points, so they are silently dropped.
  • Any character outside the BMP will be corrupted or lost.

You already handle this correctly in _ctypes.wstring_at_impl using Wtf8Buf::from_wide on Windows. The same approach should be used here.

Suggested Windows-aware fix for `Z` case
-        if type_code == "Z" {
-            // c_wchar_p: read pointer from buffer, dereference to get wide string
-            let buffer = zelf.0.buffer.read();
-            let ptr = super::base::read_ptr_from_buffer(&buffer);
-            if ptr == 0 {
-                return Ok(vm.ctx.none());
-            }
-            // Read null-terminated wide string at the address
-            unsafe {
-                let w_ptr = ptr as *const libc::wchar_t;
-                let len = libc::wcslen(w_ptr);
-                let wchars = std::slice::from_raw_parts(w_ptr, len);
-                let s: String = wchars
-                    .iter()
-                    .filter_map(|&c| char::from_u32(c as u32))
-                    .collect();
-                return Ok(vm.ctx.new_str(s).into());
-            }
-        }
+        if type_code == "Z" {
+            // c_wchar_p: read pointer from buffer, dereference to get wide string
+            let buffer = zelf.0.buffer.read();
+            let ptr = super::base::read_ptr_from_buffer(&buffer);
+            if ptr == 0 {
+                return Ok(vm.ctx.none());
+            }
+            unsafe {
+                let w_ptr = ptr as *const libc::wchar_t;
+                let len = libc::wcslen(w_ptr);
+                let wchars = std::slice::from_raw_parts(w_ptr, len);
+                #[cfg(windows)]
+                {
+                    use rustpython_common::wtf8::Wtf8Buf;
+                    let wide: Vec<u16> = wchars.iter().map(|&c| c as u16).collect();
+                    let wtf8 = Wtf8Buf::from_wide(&wide);
+                    return Ok(vm.ctx.new_str(wtf8).into());
+                }
+                #[cfg(not(windows))]
+                {
+                    let s: String = wchars
+                        .iter()
+                        .filter_map(|&c| char::from_u32(c as u32))
+                        .collect();
+                    return Ok(vm.ctx.new_str(s).into());
+                }
+            }
+        }
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/simple.rs around lines 1087 to 1134, the "Z"
branch decodes wchar_t elements by mapping each element to a Rust char which
drops UTF-16 surrogates on Windows; change it to use the same platform-aware
decoding as _ctypes.wstring_at_impl: read the wchar_t slice, and on Windows
convert the &[libc::wchar_t] (u16) into a WTF-8 string via Wtf8Buf::from_wide
(or equivalent) then new_str that result, while on non-Windows continue using
from_raw_parts -> char conversion (or convert from u32) as before; ensure
null-termination is handled by using the computed length from wcslen and that
unsafe pointer usage stays confined to creating the slice before calling the
proper conversion.

@youknowone youknowone merged commit 70b9389 into RustPython:main Dec 19, 2025
13 checks passed
@youknowone youknowone deleted the ctypes-overhaul branch December 19, 2025 05:10
@coderabbitai coderabbitai bot mentioned this pull request Dec 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant