-
Notifications
You must be signed in to change notification settings - Fork 1.4k
ctypes overhaul #6450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ctypes overhaul #6450
Conversation
WalkthroughMajor multi-module refactor: spell/config updates; standardized dunder/default binding attributes; PEP3118-aware format parsing and buffer validation; Py upcast helpers; and a large, modular redesign of the _ctypes subsystem (new Initializer/SetAttr lifecycle, array/pointer/structure/simple/union, libffi call plumbing, visibility/signature changes). Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Py as Python caller
participant VM as Rust VM (_ctypes)
participant FFI as libffi / Native
participant Mem as Native memory
Note right of Py: create / call ctypes objects
Py->>VM: create type or instantiate Array/Struct/Pointer/FuncPtr
VM->>VM: Initializer::init / process_fields / build metadata (StgInfo/type_data)
VM->>Mem: allocate native buffer (size/align)
Mem-->>VM: return buffer pointer
VM-->>Py: return ctypes instance
Note right of Py: call into native code
Py->>VM: call PyCFuncPtr with Python args
VM->>VM: resolve argtypes/restype, build_callargs (Python → FFI)
VM->>FFI: invoke native function via libffi with prepared args
FFI->>Mem: writes result/out-params to buffer
FFI-->>VM: raw result/status
VM->>VM: convert_raw_result (FFI → Python), extract out-parameters
VM-->>Py: return Python result (and out-parameters)
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Areas needing extra attention:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code has been automatically formatted The code in this PR has been formatted using git pull origin ctypes-overhaul |
ee0efb7 to
ca7743c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
crates/vm/src/types/structseq.rs (1)
202-211: Fix type inconsistency in single-field case.The closure
format_fieldexpects(&PyObject, _), and the multi-field case (line 215) explicitly converts using.as_ref(). However, the single-field case passesvaluedirectly fromzelf.first().unwrap(), which returns&PyObjectRef, not&PyObject.🔎 Apply this diff to fix the type inconsistency:
let (body, suffix) = if let Some(_guard) = rustpython_vm::recursion::ReprGuard::enter(vm, zelf.as_ref()) { if field_names.len() == 1 { let value = zelf.first().unwrap(); - let formatted = format_field((value, field_names[0]))?; + let formatted = format_field((value.as_ref(), field_names[0]))?; (formatted, ",")crates/vm/src/stdlib/ctypes/array.rs (1)
315-353: Fix incomplete overflow handling inPyCArrayType::multiplyThe current overflow check only returns an error when
checked_mulsucceeds and the result exceedsisize::MAX. Ifchecked_mulreturnsNone(indicating overflow), the check is silently skipped andarray_type_from_ctypeis still called with a hugenvalue, creating nonsensical array types:if let Some(new_size) = current_size.checked_mul(n as usize) && new_size > isize::MAX as usize { return Err(vm.new_overflow_error("array too large")); }Handle both overflow cases:
- if let Some(new_size) = current_size.checked_mul(n as usize) - && new_size > isize::MAX as usize - { - return Err(vm.new_overflow_error("array too large")); - } + if current_size.checked_mul(n as usize) + .is_some_and(|new_size| new_size > isize::MAX as usize) + .not() + { + return Err(vm.new_overflow_error("array too large")); + }Or more explicitly:
- if let Some(new_size) = current_size.checked_mul(n as usize) - && new_size > isize::MAX as usize - { - return Err(vm.new_overflow_error("array too large")); - } + match current_size.checked_mul(n as usize) { + Some(new_size) if new_size > isize::MAX as usize => { + return Err(vm.new_overflow_error("array too large")); + } + None => { + return Err(vm.new_overflow_error("array too large")); + } + _ => {} + }crates/vm/src/stdlib/ctypes/function.rs (1)
333-407:c_bool('?') is not correctly marshalled in callbacks and certain conversion pathsTwo functions still lack handling for type code
"?":
ffi_to_python(lines 1625-1681): C callbacks returningc_boolcurrently fall through to the default case and produceNoneinstead ofTrue/False.python_to_ffi(lines 1683-1752): Python callbacks withrestype = c_booldon't write a proper boolean value back into the C result slot.Note:
ReturnType::from_ffi_typealready correctly handles"?", so regular function return types are not affected.Add the missing
"?"cases:
- In
ffi_to_python, add before the final_ => return vm.ctx.none():Some("P") => vm.ctx.new_int(*(ptr as *const usize)).into(), + Some("?") => vm + .ctx + .new_bool(*(ptr as *const u8) != 0) + .into(), _ => return vm.ctx.none(),
- In
python_to_ffi, add after theSome("P") | Some("z") | Some("Z")block:Some("P") | Some("z") | Some("Z") => { if let Ok(i) = obj.try_int(vm) { *(result as *mut usize) = i.as_bigint().to_usize().unwrap_or(0); } } + Some("?") => { + if let Ok(b) = obj.try_to_bool(vm) { + *(result as *mut u8) = if b { 1 } else { 0 }; + } + } _ => {}
🧹 Nitpick comments (9)
crates/vm/src/stdlib/ctypes/structure.rs (1)
13-29: Silent error swallowing incalculate_struct_size.Line 16 uses
unwrap_or_default()which silently returns an empty vector iftry_to_valuefails. This could mask type errors in_fields_. Consider propagating the error or at least logging it.🔎 Suggested improvement:
- let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default(); + let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm)?;crates/vm/src/stdlib/ctypes/union.rs (2)
11-28: Same silent error swallowing as incalculate_struct_size.Line 14 uses
unwrap_or_default()which silently returns an empty vector on error. Consider propagating the error for consistency with proper error handling.
358-413:SetAttrmay callprocess_fieldstwice for_fields_assignment.When setting
_fields_:
- If a descriptor exists (lines 370-381), it calls the descriptor then
process_fieldsat line 378- Otherwise, it stores the attribute (lines 385-402), then calls
process_fieldsagain at line 408This means
process_fieldsis called after every_fields_assignment regardless of path, but in the descriptor path it's called twice (once at 378, and the function returns at 380 so actually only once there).Wait, looking again: line 380 has
return Ok(())so the second call at 408 is not reached in the descriptor case. This is correct but the control flow is complex. Consider restructuring for clarity.🔎 Suggested restructure for clarity:
impl SetAttr for PyCUnionType { fn setattro( zelf: &Py<Self>, attr_name: &Py<PyStr>, value: PySetterValue, vm: &VirtualMachine, ) -> PyResult<()> { let pytype: &Py<PyType> = zelf.to_base(); let attr_name_interned = vm.ctx.intern_str(attr_name.as_str()); + let is_fields = attr_name.as_str() == "_fields_"; // 1. First, do PyType's setattro (PyType_Type.tp_setattro first) // Check for data descriptor first if let Some(attr) = pytype.get_class_attr(attr_name_interned) { let descr_set = attr.class().mro_find_map(|cls| cls.slots.descr_set.load()); if let Some(descriptor) = descr_set { descriptor(&attr, pytype.to_owned().into(), value.clone(), vm)?; - // After successful setattro, check if _fields_ and call process_fields - if attr_name.as_str() == "_fields_" - && let PySetterValue::Assign(fields_value) = value - { + if is_fields && let PySetterValue::Assign(fields_value) = value { PyCUnionType::process_fields(pytype, fields_value, vm)?; } return Ok(()); } } // Store in type's attributes dict match &value { PySetterValue::Assign(v) => { pytype .attributes .write() .insert(attr_name_interned, v.clone()); + if is_fields { + PyCUnionType::process_fields(pytype, v.clone(), vm)?; + } } PySetterValue::Delete => { let prev = pytype.attributes.write().shift_remove(attr_name_interned); if prev.is_none() { return Err(vm.new_attribute_error(format!( "type object '{}' has no attribute '{}'", pytype.name(), attr_name.as_str(), ))); } } } - - // 2. If _fields_, call process_fields (which checks FINAL internally) - if attr_name.as_str() == "_fields_" - && let PySetterValue::Assign(fields_value) = value - { - PyCUnionType::process_fields(pytype, fields_value, vm)?; - } Ok(()) } }crates/vm/src/buffer.rs (1)
264-281: The implicit fallthrough behavior for standalone T/X is correct but could be more explicit.When
TorXappears without a following{, the code falls through toFormatType::try_from(c), which returns an error sinceTandXare not validFormatTypevariants. This is the intended behavior (standaloneT/Xare invalid), but the code would be clearer with an explicit check and error message explaining that these characters require the{}syntax when used.Consider adding an explicit guard:
if (c == b'T' || c == b'X') && chars.peek() != Some(&b'{') { return Err("'T' and 'X' format specifiers require '{}'".to_owned()); }This would make the constraint explicit rather than relying on
FormatType::try_fromto catch it implicitly.crates/vm/src/stdlib/ctypes/simple.rs (1)
943-963:is_cwchar_array_or_pointerpanics if StgInfo is missing
is_cwchar_array_or_pointeruses.expect("array has StgInfo")/.expect("pointer has StgInfo"). If a user manages to construct a malformedPyCArray/PyCPointersubclass without properStgInfo, this will panic the VM instead of raising a Python exception.This is low‑probability but avoidable.
Suggested defensive change
- if let Some(arr) = value.downcast_ref::<PyCArray>() { - let info = arr.class().stg_info_opt().expect("array has StgInfo"); - let elem_type = info.element_type.as_ref().expect("array has element_type"); + if let Some(arr) = value.downcast_ref::<PyCArray>() { + let info = if let Some(info) = arr.class().stg_info_opt() { + info + } else { + return false; + }; + let elem_type = if let Some(elem_type) = info.element_type.as_ref() { + elem_type + } else { + return false; + };Same pattern for the pointer branch.
crates/vm/src/stdlib/ctypes/array.rs (1)
871-910: Minor: slice ops ignore base buffer for arrays backed by external memory
getitem_by_sliceandwchar_array_set_valuealways operate onzelf.0.bufferand ignorebase/base_offsetthe waygetitem_by_indexdoes. For arrays created viafrom_buffer/ views into other ctypes objects, slicing and the wchar setters will read/write the local buffer instead of the base buffer.This is subtle and probably rare, but it makes slice semantics inconsistent with single‑index semantics on such views.
Direction for aligning slice ops with base-backed arrays
Mirror the logic used in
getitem_by_index:let base_obj = zelf.0.base.read().clone(); let (buffer_lock, offset0) = if let Some(cdata) = base_obj .as_ref() .and_then(|b| b.downcast_ref::<super::PyCData>()) { (&cdata.buffer, zelf.0.base_offset.load()) } else { (&zelf.0.buffer, 0) };Then compute offsets relative to
offset0in both thecanduslice branches and in wchar setters.Also applies to: 1141-1169
crates/vm/src/stdlib/ctypes.rs (1)
486-547:sizeoffallback behavior for non‑StgInfo ctypes types is a bit ad-hocThe
sizeofimplementation tries several cases, then for simple types uses_type_andget_size, but finally returnssize_of::<usize>()or errors for many other cases. For some ctypes scenarios (custom simple subclasses, mis‑initialized types) this may yield surprising sizes instead of a clean error.Not a blocker, but worth tightening once the rest of the ctypes semantics stabilize.
Possible cleanup direction
- Prefer consistently using
StgInfo.sizefor any type that has it, including simple types, rather than recomputing from_type_.- For types without
StgInfoand without_type_/known metaclasses, consider always raisingTypeError("this type has no size")instead of falling back tosize_of::<usize>().crates/vm/src/stdlib/ctypes/function.rs (1)
153-176: Known leak in Windowsconv_paramforstr→ wide pointerThe Windows branch of
conv_paramforPyStrintentionally allocates aVec<u16>, leaks it withmem::forget, and returns its pointer:let wide: Vec<u16> = s.as_str().encode_utf16().chain(std::iter::once(0)).collect(); let addr = wide.as_ptr() as usize; std::mem::forget(wide); return Ok((Type::pointer(), FfiArgValue::Pointer(addr)));The comment notes this
FIXME/leak. Given typical usage, this is probably acceptable for now (and mirrors CPython’s tendency to tie such allocations to_objects), but it is a real leak and will be hit for every Windowsoledllstring parameter converted this way.Future improvement
Once the
_objects/ keep‑alive story for parameters is fully wired, this code should store the wide buffer in an owned object (e.g., aPyBytes/capsule referenced from the argument’s_objects) instead of leaking theVec<u16>.crates/vm/src/stdlib/ctypes/pointer.rs (1)
523-572: Negative pointer indices are stored in_objectsunder hugeusizekeys
setitem_by_indexacceptsindex: isize(allowing negative indices, as C pointer arithmetic does), but when callingkeep_refit usesindex as usize:let offset = index * element_size as isize; let addr = (ptr_value as isize + offset) as usize; ... zelf.0.keep_ref(index as usize, value, vm)For negative
index, this wraps to a very largeusize. Since_objectsis only used to keep referents alive and not looked up by index later, this is mostly an internal bookkeeping quirk rather than a functional bug, but it’s confusing and may make debugging harder.Possible clean‑up
- Either forbid negative indices in
__setitem__(matching typical ctypes usage) and guard before callingsetitem_by_index.- Or change
keep_refto accept a signed index (if feasible) so the stored key matches the logical index used in pointer arithmetic.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
.cspell.dict/python-more.txt(2 hunks).cspell.json(3 hunks)crates/stdlib/src/pystruct.rs(1 hunks)crates/vm/src/buffer.rs(1 hunks)crates/vm/src/builtins/builtin_func.rs(2 hunks)crates/vm/src/builtins/function.rs(1 hunks)crates/vm/src/builtins/object.rs(1 hunks)crates/vm/src/builtins/str.rs(2 hunks)crates/vm/src/builtins/type.rs(1 hunks)crates/vm/src/exceptions.rs(3 hunks)crates/vm/src/object/core.rs(1 hunks)crates/vm/src/protocol/buffer.rs(1 hunks)crates/vm/src/stdlib/ast/python.rs(1 hunks)crates/vm/src/stdlib/codecs.rs(2 hunks)crates/vm/src/stdlib/ctypes.rs(14 hunks)crates/vm/src/stdlib/ctypes/array.rs(5 hunks)crates/vm/src/stdlib/ctypes/function.rs(8 hunks)crates/vm/src/stdlib/ctypes/library.rs(5 hunks)crates/vm/src/stdlib/ctypes/pointer.rs(3 hunks)crates/vm/src/stdlib/ctypes/simple.rs(1 hunks)crates/vm/src/stdlib/ctypes/structure.rs(10 hunks)crates/vm/src/stdlib/ctypes/thunk.rs(0 hunks)crates/vm/src/stdlib/ctypes/union.rs(4 hunks)crates/vm/src/stdlib/ctypes/util.rs(0 hunks)crates/vm/src/stdlib/functools.rs(1 hunks)crates/vm/src/stdlib/operator.rs(1 hunks)crates/vm/src/types/structseq.rs(2 hunks)
💤 Files with no reviewable changes (2)
- crates/vm/src/stdlib/ctypes/util.rs
- crates/vm/src/stdlib/ctypes/thunk.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/stdlib/src/pystruct.rscrates/vm/src/buffer.rscrates/vm/src/stdlib/operator.rscrates/vm/src/builtins/type.rscrates/vm/src/protocol/buffer.rscrates/vm/src/types/structseq.rscrates/vm/src/exceptions.rscrates/vm/src/stdlib/functools.rscrates/vm/src/object/core.rscrates/vm/src/builtins/function.rscrates/vm/src/stdlib/codecs.rscrates/vm/src/builtins/str.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/builtins/object.rscrates/vm/src/stdlib/ctypes/library.rscrates/vm/src/stdlib/ctypes/union.rscrates/vm/src/stdlib/ast/python.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/builtins/builtin_func.rscrates/vm/src/stdlib/ctypes.rscrates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes/simple.rs
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.
Applied to files:
crates/vm/src/exceptions.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Applied to files:
.cspell.json
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.
Applied to files:
crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations
Applied to files:
crates/vm/src/stdlib/ctypes/library.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes.rs
🧬 Code graph analysis (8)
crates/stdlib/src/pystruct.rs (1)
crates/vm/src/builtins/str.rs (3)
s(1783-1788)s(1814-1818)isascii(630-632)
crates/vm/src/exceptions.rs (1)
crates/vm/src/stdlib/os.rs (1)
strerror(1544-1548)
crates/vm/src/stdlib/functools.rs (4)
crates/vm/src/builtins/builtin_func.rs (2)
__reduce__(123-126)__reduce__(169-182)crates/vm/src/builtins/function.rs (1)
__reduce__(804-812)crates/vm/src/builtins/object.rs (1)
__reduce__(503-505)crates/vm/src/stdlib/operator.rs (3)
__reduce__(360-365)__reduce__(458-461)__reduce__(527-544)
crates/vm/src/builtins/function.rs (1)
crates/vm/src/types/slot.rs (1)
_cls_is(1083-1085)
crates/vm/src/builtins/str.rs (3)
crates/vm/src/builtins/bytearray.rs (1)
isascii(271-273)crates/vm/src/builtins/bytes.rs (1)
isascii(262-264)crates/vm/src/bytes_inner.rs (1)
isascii(326-328)
crates/vm/src/stdlib/ctypes/union.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (33)
fields(2225-2225)fields(2227-2227)field_type(304-304)field_type(2143-2143)field_type(2158-2158)field_type(2191-2191)field_size(1183-1185)default(129-145)obj(725-725)obj(749-749)obj(771-771)obj(786-786)obj(1547-1547)obj(1549-1549)obj(1658-1659)obj(1686-1687)obj(1707-1708)obj(1732-1732)get_usize_attr(2081-2098)std(2103-2103)std(2186-2186)size(626-628)size(1628-1630)get_field_size(2157-2187)get_field_align(2190-2211)is_big_endian(334-340)make_anon_fields(2275-2316)proto(291-293)proto(874-874)proto(939-939)proto(1497-1497)proto(1514-1514)attr(895-895)
crates/vm/src/stdlib/ctypes.rs (5)
extra_tests/snippets/stdlib_ctypes.py (1)
c_double(124-125)crates/vm/src/suggestion.rs (1)
name(25-25)crates/vm/src/builtins/genericalias.rs (1)
args(102-102)crates/vm/src/stdlib/ast/other.rs (1)
object(39-39)crates/vm/src/stdlib/ctypes/library.rs (1)
null(31-31)
crates/vm/src/stdlib/ctypes/pointer.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (49)
obj(725-725)obj(749-749)obj(771-771)obj(786-786)obj(1547-1547)obj(1549-1549)obj(1658-1659)obj(1686-1687)obj(1707-1708)obj(1732-1732)mem(1912-1912)mem(1921-1921)mem(1930-1930)mem(1939-1939)mem(1948-1948)mem(1957-1957)mem(1966-1966)mem(1975-1975)mem(1984-1984)mem(1993-1993)mem(2004-2004)mem(2018-2018)mem(2059-2059)mem(2060-2060)value(808-809)value(817-817)value(817-817)value(839-839)value(853-853)value(868-868)value(878-878)value(911-911)value(1309-1309)value(1317-1317)value(1325-1325)value(1373-1373)value(1388-1388)value(1402-1402)value(1416-1416)value(1442-1442)from_bytes(399-408)std(2103-2103)std(2186-2186)buffer(347-347)buffer(354-354)read_ptr_from_buffer(2102-2109)at_address(432-444)size(626-628)size(1628-1630)
🪛 Biome (2.1.2)
.cspell.json
[error] 78-78: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 78-78: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 110-110: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 110-110: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 134-134: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 134-134: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run tests under miri
- GitHub Check: Check the WASM package and demo
- GitHub Check: Check Rust code with clippy
- GitHub Check: Run snippets and cpython tests (windows-latest)
🔇 Additional comments (28)
crates/vm/src/stdlib/ast/python.rs (1)
50-51: LGTM! Clean simplification of attribute naming.The change removes redundant name mapping by directly naming the Rust method
_fieldsinstead of mappingfieldsto_fields. The Python API remains unchanged and consistent with CPython's AST specification..cspell.dict/python-more.txt (1)
151-151: LGTM! Dictionary entries are appropriate for ctypes work.The added terms "needsfree" and "PYTHONAPI" are legitimate technical identifiers related to Python's C API and memory management, which align well with the ctypes overhaul objectives.
Also applies to: 172-172
.cspell.json (1)
78-78: LGTM! New dictionary terms are appropriate.The added terms "oparg", "pytype", "summands", and "sighandler" are legitimate technical identifiers that align with the ctypes overhaul.
Note: The Biome parse errors are false positives. This file uses JSONC (JSON with Comments), which is the standard format for cspell configuration files. The syntax is correct.
Also applies to: 110-110, 116-116, 134-134
crates/vm/src/builtins/function.rs (1)
632-636: LGTM! Clean refactoring.The change simplifies the return path by wrapping the if-else expression directly in
Ok(), eliminating an intermediate variable assignment. The descriptor protocol logic remains correct and the behavior is unchanged.crates/vm/src/types/structseq.rs (1)
213-219: LGTM! Correct type conversion in multi-field case.The explicit conversion via
.map(|value| value.as_ref())correctly transformsPyObjectRefto&PyObjectbefore pairing with field names and passing to the closure. This aligns with the updated closure signature.crates/vm/src/stdlib/functools.rs (1)
76-77: LGTM! Naming convention now consistent with codebase patterns.The change from explicit name mapping (
#[pymethod(name = "__reduce__")]withfn reduce) to implicit name-based binding (#[pymethod]withfn __reduce__) aligns with the established pattern used across the codebase (as seen in builtins/object.rs, builtins/function.rs, and stdlib/operator.rs). This improves readability and maintains consistency with the broader method naming simplification effort mentioned in the PR objectives.crates/vm/src/stdlib/codecs.rs (2)
191-196: LGTM! Consistent with the latin_1_encode change.The method rename from
is_ascii()toisascii()is applied consistently with the earlier change inlatin_1_encode. The ASCII optimization path remains functional.
178-183: Method rename is correct and complete.The change from
is_ascii()toisascii()aligns with Python's naming convention. PyStr definesisascii()as a public const method, and both occurrences inlatin_1_encodeandascii_encodecorrectly use it. No remainingis_ascii()calls exist on PyStr types in the codebase.crates/vm/src/stdlib/operator.rs (1)
326-326: Approved: Method rename aligns with Python naming conventions.The change from
is_ascii()toisascii()correctly implements Python'sstr.isascii()method name onPyStrRef. The method is properly defined and all related wrapper types (PyStr,PyBytes,PyBytearray) have been systematically updated. No clippy warnings.crates/vm/src/builtins/type.rs (1)
1448-1452: mro getter correctly wired to mro_map_collectThe new #[pygetset]
__mro__accessor cleanly exposes the existing MRO vector as a tuple, matching the prior behavior while aligning with the new naming convention.crates/vm/src/builtins/builtin_func.rs (2)
117-120: PyNativeFunction.self now uses default pygetset bindingSwitching to bare
#[pygetset]keeps the getter behavior (always returning None here) while aligning with the refactored attribute exposure pattern.
184-187: PyNativeMethod.self getter remains correct under new exposure styleThe
__self__property still returns the bound receiver fromzelf.func.zelf, with only the pygetset metadata simplified to the default form.crates/vm/src/exceptions.rs (3)
627-635: Expose suppress_context via suppress_context pygetsetRenaming the getter to
__suppress_context__and annotating it with#[pygetset]makes the Python-visible attribute explicit and pairs cleanly withset_suppress_context’s setter mapping, without changing the underlying AtomicCell logic.
1115-1116: Include suppress_context in serialized exception stateSerializing the
suppress_contextfield viaexc.__suppress_context__()ensures the chained-exception suppression flag is preserved in serde output, matching the BaseException state.
1860-1868: PyOSError.strerror getter now aligned with property nameUsing
#[pygetset] fn strerror(&self)with the existingset_strerrorsetter (named"strerror") keeps the Python attribute intact while simplifying the Rust-side naming.crates/vm/src/builtins/object.rs (1)
453-456: class getter renamed to match exposed attributeThe new
#[pygetset] fn __class__still returnsobj.class().to_owned(), preserving behavior while following the unified “method name == Python attribute name” convention.crates/vm/src/builtins/str.rs (2)
628-632: isascii now uses StrKind for an O(1) checkImplementing
isasciiasmatches!(self.kind(), StrKind::Ascii)is a nice fast-path, assuming StrData/StrKind maintain the invariant that ASCII-only strings are tagged asAscii.Please double-check that all constructors and transformations on
PyStrcorrectly promote/demoteStrKindso that ASCII-only results are always markedAscii(e.g., slicing, joins, replacements), to keepisasciifully compatible with CPython.
963-984: format exposure via default #[pymethod] keeps semantics intactSwitching to
#[pymethod] fn __format__only affects how the method is registered; the logic for empty vs non-empty spec and the UTF-8-only formatting path is unchanged and remains consistent with the rest of the formatting machinery.crates/stdlib/src/pystruct.rs (1)
30-41: Struct format ASCII validation updated to use PyStr.isasciiUsing
s.isascii()here matches the new string API and keeps the semantics of “format must be ASCII-only” the same forstrinputs, with bytes still validated viaAsciiStr::from_ascii.crates/vm/src/buffer.rs (2)
297-307: Missing error message for unmatched:in field name.The error message at line 302 says "unmatched ':' in format" which is correct but slightly inconsistent with PEP3118 terminology. Field names use the pattern
:name:, so running out of input before finding the closing:is indeed an unmatched delimiter.The implementation correctly handles this edge case.
309-312: Standalone braces are silently skipped.This handles pointer target syntax (
&followed by type in braces). The silent skip is appropriate since these are extended specifiers that don't contribute to the basic format parsing.crates/vm/src/protocol/buffer.rs (1)
202-218: LGTM! Proper 0-dimensional buffer support added.The validation now correctly handles scalar types (ndim=0) where
itemsize == len, which is needed for ctypes Structure types. The branching logic is clear and the assertions are appropriate for debug builds.crates/vm/src/stdlib/ctypes/library.rs (2)
62-86: Consider handling the race condition inget_or_insert_lib.The current logic creates a new library, gets its pointer as a key, then checks if that key already exists. However, if the library already exists and is open, the newly created
new_libwill be dropped (closing the handle) while the existing one is returned. This is mostly fine, but there's a subtle issue:If
l.is_closed()is true, the code inserts the new library. But the key comes fromnew_lib.get_pointer(), which might differ from the original closed library's key if library addresses can be reused.This is likely safe in practice since
libloadingtypically returns consistent handles, but worth noting.
93-98: LGTM! Clean static initialization pattern.Moving the
static_cell!macro invocation inside the function scope is a cleaner pattern and thepub(super)visibility appropriately restricts access to the ctypes module.crates/vm/src/stdlib/ctypes/structure.rs (3)
66-129: LGTM! Well-structured initialization with proper base class handling.The
Initializerimplementation correctly:
- Handles
_abstract_types- Checks for already-initialized state
- Processes direct
_fields_or inherits from base- Properly clears
DICTFLAG_FINALin subclasses while marking bases as final- Uses separate scopes to avoid holding locks during mutable operations
The lock release pattern (lines 91-98, 100-118) is correct for avoiding deadlocks.
427-491:SetAttrimplementation correctly guards_fields_modification.The implementation:
- Releases the read lock before calling
process_fields(avoiding deadlock)- Prevents deletion of
_fields_- Properly delegates non-
_fields_attributes to type's setattro logicHowever, there's a potential TOCTOU issue: between checking
is_final(line 440-445) and callingprocess_fields(line 456), another thread could modify the state. This is likely acceptable given Python's GIL semantics, but worth noting.
611-635: LGTM! Correct 0-dimensional buffer for Structure.The
AsBufferimplementation correctly creates a scalar buffer with:
ndim=0(emptydim_desc)itemsizeequal to total buffer length- Format string from
StgInfowith fallback to "B"This aligns with the
BufferDescriptor::validatechanges reviewed earlier.crates/vm/src/stdlib/ctypes/union.rs (1)
455-518: LGTM! Correct recursive field initialization.The
init_pos_argsmethod properly:
- Recursively processes base class fields first
- Handles kwargs duplicate detection
- Returns the count of consumed arguments
The lock release pattern at lines 470-480 correctly avoids holding the lock during recursion.
| impl<T: crate::class::PySubclass> Py<T> { | ||
| /// Converts `&Py<T>` to `&Py<T::Base>`. | ||
| #[inline] | ||
| pub fn to_base(&self) -> &Py<T::Base> { | ||
| debug_assert!(self.as_object().downcast_ref::<T::Base>().is_some()); | ||
| // SAFETY: T is #[repr(transparent)] over T::Base, | ||
| // so Py<T> and Py<T::Base> have the same layout. | ||
| unsafe { &*(self as *const Py<T> as *const Py<T::Base>) } | ||
| } | ||
|
|
||
| /// Converts `&Py<T>` to `&Py<U>` where U is an ancestor type. | ||
| #[inline] | ||
| pub fn upcast_ref<U: PyPayload + StaticType>(&self) -> &Py<U> | ||
| where | ||
| T: StaticType, | ||
| { | ||
| debug_assert!(T::static_type().is_subtype(U::static_type())); | ||
| // SAFETY: T is a subtype of U, so Py<T> can be viewed as Py<U>. | ||
| unsafe { &*(self as *const Py<T> as *const Py<U>) } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for PySubclass trait definition
echo "=== Searching for PySubclass trait definition ==="
rg -n "trait PySubclass" --type=rust | head -20
echo ""
echo "=== Searching for impl PySubclass ==="
rg -n "impl.*PySubclass" --type=rust | head -20
echo ""
echo "=== Searching for Py<T> type definition ==="
rg -n "^pub struct Py<" --type=rust | head -10
echo ""
echo "=== Checking file structure around lines 900-1130 in core.rs ==="
wc -l crates/vm/src/object/core.rsRepository: RustPython/RustPython
Length of output: 900
🏁 Script executed:
#!/bin/bash
# Find and display the relevant trait and type definitions
echo "=== Finding PySubclass trait ==="
ast-grep --pattern 'trait PySubclass { $$$ }' 2>/dev/null || rg -A 10 "trait PySubclass" --type=rust
echo ""
echo "=== Finding Py struct definition ==="
rg -B 2 -A 8 "pub struct Py<T>" --type=rust
echo ""
echo "=== Finding PyInner struct definition ==="
rg -B 2 -A 8 "pub struct PyInner<T>" --type=rust
echo ""
echo "=== Checking PySubclass context ==="
rg -B 5 -A 15 "pub fn to_base\|pub fn upcast_ref" crates/vm/src/object/core.rsRepository: RustPython/RustPython
Length of output: 776
🏁 Script executed:
#!/bin/bash
# Get PySubclass trait definition
echo "=== PySubclass trait definition ==="
sed -n '193,220p' crates/vm/src/class.rs
echo ""
echo "=== PyInner struct definition ==="
rg -B 2 -A 15 "pub struct PyInner<T>" --type=rust crates/vm/src/object/core.rs
echo ""
echo "=== Full context of to_base and upcast_ref methods ==="
sed -n '1105,1125p' crates/vm/src/object/core.rs
echo ""
echo "=== Check PySubclass::Base associated type ==="
rg -A 5 "type Base" crates/vm/src/class.rs | head -15Repository: RustPython/RustPython
Length of output: 1475
🏁 Script executed:
#!/bin/bash
# Look at PyInner definition
echo "=== PyInner struct definition ==="
rg -B 5 -A 20 "^pub struct PyInner" crates/vm/src/object/core.rs
echo ""
echo "=== Check existing PySubclass implementations ==="
# Get a few real implementations
rg -B 2 -A 10 "impl.*PySubclass" crates/vm/src/exceptions.rs | head -30
echo ""
echo "=== Look for repr attributes on PySubclass implementors ==="
rg -B 5 "impl.*PySubclass" crates/vm/src/class.rs
echo ""
echo "=== Check if PyInner has repr attributes ==="
rg -B 5 "struct PyInner" crates/vm/src/object/core.rs | grep -A 5 "repr"Repository: RustPython/RustPython
Length of output: 663
🏁 Script executed:
#!/bin/bash
# Get full PyInner definition
echo "=== Full PyInner struct ==="
sed -n '104,130p' crates/vm/src/object/core.rs
echo ""
echo "=== Look at the actual PyOSError structure ==="
rg -B 10 "impl.*PySubclass for PyOSError" crates/vm/src/exceptions.rs | grep -E "struct|pub"
echo ""
echo "=== Search for PyOSError struct definition ==="
rg -B 2 -A 6 "struct PyOSError" crates/vm/src/exceptions.rs
echo ""
echo "=== Check how PyPayload relates to memory layout ==="
rg -A 10 "pub trait PyPayload" crates/vm/src/object/mod.rsRepository: RustPython/RustPython
Length of output: 1551
🏁 Script executed:
#!/bin/bash
# Check PyException structure
echo "=== PyException definition ==="
rg -B 5 -A 15 "^pub struct PyException" crates/vm/src/exceptions.rs
echo ""
echo "=== Check if PyPayload trait has repr requirements ==="
rg -B 2 -A 20 "pub trait PyPayload" crates/vm/src/
echo ""
echo "=== Check comparisons with PyRef methods mentioned in review ==="
sed -n '1076,1103p' crates/vm/src/object/core.rs
echo ""
echo "=== Look for any repr attribute documentation on PySubclass implementors ==="
rg -B 10 "pub struct PyOSError" crates/vm/src/exceptions.rs | head -20Repository: RustPython/RustPython
Length of output: 3014
The PySubclass trait doesn't enforce required layout guarantees for unsafe pointer casts.
The to_base() and upcast_ref() methods use unsafe pointer casts based on layout assumptions that are not enforced by the trait definition. Specifically:
-
to_base()assumes "Tis #[repr(transparent)] overT::Base", butPySubclassonly requires thatT::Baseis aPyPayloadwith anas_base()implementation. For example,PyOSErrorimplementsPySubclassyet has no#[repr(transparent)]attribute—it's a regular struct with a base field. -
upcast_ref()assumes Python subtyping implies layout compatibility, which is not guaranteed by Rust type layout rules. -
debug_assert!()only validates type relationships at runtime, not memory layout, so casts could unsafely reinterpret different layouts.
By contrast, PyRef<T>::into_base() and PyRef<T>::upcast() (lines 1076–1102) safely handle similar conversions by going through PyObjectRef and using type-safe downcasting.
Either enforce #[repr(transparent)] on all PySubclass implementors, use safe downcasting like PyRef, or provide explicit layout guarantees in the trait documentation.
🤖 Prompt for AI Agents
In crates/vm/src/object/core.rs around lines 1105 to 1125, the unsafe pointer
casts in to_base() and upcast_ref() rely on unguaranteed layout assumptions from
PySubclass and must be fixed: either enforce/layout-safety in the trait
(require/document #[repr(transparent)] on all PySubclass implementors) or remove
the unsafe casts and implement these methods using the safe object-level
conversions already used by PyRef (go through self.as_object() and use
downcast_ref/downcast into the appropriate Py<U> or convert via PyObjectRef as
PyRef::into_base()/upcast() do). Update the code to use the safe downcast path
(or add the explicit repr requirement to the trait and tests) and remove the
unsafe transmute-style pointer casts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/vm/src/stdlib/ctypes/structure.rs (1)
514-562: Add validation to require fields before instance creation.Each subclass must define a fields attribute. Currently,
slot_newretrieves_fields_via MRO lookup but doesn't validate it exists before creating instances. When_fields_is absent, thefield_namesvector remains empty, preventing positional and keyword arguments from initializing any structure fields. This violates ctypes' expected behavior where positional arguments are used to initialize the fields in the same order as they appear in the fields definition, named arguments are used to initialize the fields with the corresponding name.Add a check in
slot_newto ensure_fields_is defined on the class (directly or via inheritance) before allowing instance creation, returning an appropriate error if absent.crates/vm/src/stdlib/ctypes/array.rs (1)
318-343:checked_mulresult is ignored, so overflow is silently allowedIn
PyCArrayType::as_numberyou compute:if let Some(stg_info) = zelf_type.stg_info_opt() { let current_size = stg_info.size; if let Some(new_size) = current_size.checked_mul(n as usize) && new_size > isize::MAX as usize { return Err(vm.new_overflow_error("array too large")); } }If
checked_mulreturnsNone(overflow), this branch is skipped and you proceed to construct the array type anyway, which will then overflow again when computing its total size.Treat
Nonefromchecked_mulas an overflow condition and raiseOverflowErrorinstead of silently continuing.
🧹 Nitpick comments (5)
crates/vm/src/types/structseq.rs (1)
208-211: Apply.as_ref()explicitly for consistency.In the multi-field case (line 215), there's an explicit
.map(|value| value.as_ref())to convertPyObjectRefto&PyObject. However, in the single-field case,valuefromzelf.first().unwrap()(which is&PyObjectRef) is passed directly toformat_field. While implicit deref coercion should work, this creates an inconsistency that reduces code clarity.🔎 Apply this diff for consistency:
if field_names.len() == 1 { - let value = zelf.first().unwrap(); + let value = zelf.first().unwrap().as_ref(); let formatted = format_field((value, field_names[0]))?; (formatted, ",")crates/vm/src/stdlib/ctypes.rs (2)
232-235: Clarify thatget_alignis only valid for simple types.The comment states "for C types, alignment equals size" which isn't universally true (e.g., structs). Consider renaming or documenting that this only applies to the simple type codes handled by
type_info().🔎 Suggested documentation improvement:
-/// Get alignment for a simple type - for C types, alignment equals size +/// Get alignment for a simple type code (b, B, h, H, i, I, etc.) +/// For these primitive C types, natural alignment equals size. fn get_align(ty: &str) -> usize { get_size(ty) }
1216-1230: Dangerous COM interop—consider additional safeguards.The vtable access and direct memory write are inherently unsafe. While this is necessary for COM interop, consider:
- Validate vtable pointer: Ensure
*iunknown(the vtable pointer) is non-null before dereferencing- Document safety requirements: The caller must ensure
src_ptris a validIUnknownpointer🔎 Add null check for vtable:
if src_ptr != 0 { unsafe { // IUnknown vtable: [QueryInterface, AddRef, Release, ...] let iunknown = src_ptr as *mut *const usize; let vtable = *iunknown; + if vtable.is_null() { + return Ok(E_POINTER); + } let addref_fn: extern "system" fn(*mut std::ffi::c_void) -> u32 = std::mem::transmute(*vtable.add(1)); // AddRef is index 1 addref_fn(src_ptr as *mut std::ffi::c_void); } }crates/vm/src/stdlib/ctypes/library.rs (1)
26-33: Minor: Unused return value from closed library.When the library is closed (None),
get_pointerreturns a null pointer. This is used as a HashMap key inget_or_insert_lib. Consider adding a brief comment explaining that the key remains valid for identification purposes even after the library is closed.crates/vm/src/stdlib/ctypes/structure.rs (1)
69-74: Consider simplifying the type conversion.The conversion from
PyRef<Self>toPyTypeRefviaPyObjectRefintermediate is verbose. SincePyCStructTypeis#[repr(transparent)]overPyType, you could potentially useupcastorto_basedirectly.Alternative approach:
// Using the new to_base() method since PyCStructType wraps PyType let new_type: &Py<PyType> = zelf.to_base();Note: This would require adjusting subsequent code that expects ownership.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
.cspell.dict/python-more.txt(2 hunks).cspell.json(3 hunks)crates/stdlib/src/pystruct.rs(1 hunks)crates/vm/src/buffer.rs(1 hunks)crates/vm/src/builtins/builtin_func.rs(2 hunks)crates/vm/src/builtins/function.rs(1 hunks)crates/vm/src/builtins/object.rs(1 hunks)crates/vm/src/builtins/str.rs(2 hunks)crates/vm/src/builtins/type.rs(1 hunks)crates/vm/src/exceptions.rs(3 hunks)crates/vm/src/object/core.rs(1 hunks)crates/vm/src/protocol/buffer.rs(1 hunks)crates/vm/src/stdlib/ast/python.rs(1 hunks)crates/vm/src/stdlib/codecs.rs(2 hunks)crates/vm/src/stdlib/ctypes.rs(14 hunks)crates/vm/src/stdlib/ctypes/array.rs(5 hunks)crates/vm/src/stdlib/ctypes/function.rs(8 hunks)crates/vm/src/stdlib/ctypes/library.rs(5 hunks)crates/vm/src/stdlib/ctypes/pointer.rs(3 hunks)crates/vm/src/stdlib/ctypes/simple.rs(1 hunks)crates/vm/src/stdlib/ctypes/structure.rs(10 hunks)crates/vm/src/stdlib/ctypes/thunk.rs(0 hunks)crates/vm/src/stdlib/ctypes/union.rs(4 hunks)crates/vm/src/stdlib/ctypes/util.rs(0 hunks)crates/vm/src/stdlib/functools.rs(1 hunks)crates/vm/src/stdlib/operator.rs(1 hunks)crates/vm/src/types/structseq.rs(2 hunks)
💤 Files with no reviewable changes (2)
- crates/vm/src/stdlib/ctypes/util.rs
- crates/vm/src/stdlib/ctypes/thunk.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/stdlib/codecs.rscrates/vm/src/builtins/object.rscrates/vm/src/stdlib/operator.rscrates/vm/src/object/core.rscrates/stdlib/src/pystruct.rscrates/vm/src/protocol/buffer.rscrates/vm/src/stdlib/functools.rscrates/vm/src/builtins/function.rscrates/vm/src/builtins/type.rscrates/vm/src/exceptions.rscrates/vm/src/builtins/builtin_func.rscrates/vm/src/stdlib/ctypes/union.rscrates/vm/src/buffer.rscrates/vm/src/builtins/str.rscrates/vm/src/types/structseq.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes/library.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes.rscrates/vm/src/stdlib/ast/python.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes/function.rs
🧠 Learnings (7)
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Applied to files:
.cspell.json
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.
Applied to files:
crates/vm/src/exceptions.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations
Applied to files:
crates/vm/src/stdlib/ctypes/library.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.
Applied to files:
crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
crates/vm/src/stdlib/ctypes/simple.rs
🧬 Code graph analysis (6)
crates/vm/src/protocol/buffer.rs (6)
crates/stdlib/src/array.rs (1)
itemsize(738-740)crates/vm/src/builtins/memory.rs (2)
itemsize(584-586)shape(594-603)crates/vm/src/builtins/str.rs (1)
len(1855-1857)crates/vm/src/builtins/dict.rs (2)
len(716-718)len(747-749)crates/vm/src/builtins/tuple.rs (1)
len(193-195)crates/vm/src/builtins/set.rs (1)
len(205-207)
crates/vm/src/builtins/function.rs (1)
crates/vm/src/types/slot.rs (1)
_cls_is(1083-1085)
crates/vm/src/exceptions.rs (1)
crates/vm/src/stdlib/os.rs (1)
strerror(1544-1548)
crates/vm/src/builtins/str.rs (3)
crates/vm/src/builtins/bytes.rs (1)
isascii(262-264)crates/vm/src/builtins/bytearray.rs (1)
isascii(271-273)crates/vm/src/bytes_inner.rs (1)
isascii(326-328)
crates/vm/src/stdlib/ctypes/structure.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (34)
std(2103-2103)std(2186-2186)default(129-145)obj(725-725)obj(749-749)obj(771-771)obj(786-786)obj(1547-1547)obj(1549-1549)obj(1658-1659)obj(1686-1687)obj(1707-1708)obj(1732-1732)new(149-165)new(1194-1202)value(808-809)value(817-817)value(817-817)value(839-839)value(853-853)value(868-868)value(878-878)value(911-911)value(1309-1309)value(1317-1317)value(1325-1325)value(1373-1373)value(1388-1388)value(1402-1402)value(1416-1416)get_usize_attr(2081-2098)size(626-628)size(1628-1630)attr(895-895)
crates/vm/src/stdlib/ctypes/function.rs (2)
crates/vm/src/stdlib/ctypes.rs (15)
base(61-61)type_info(143-225)std(528-528)std(532-532)std(543-543)std(708-708)std(793-793)std(1202-1202)std(1204-1204)std(1206-1206)t(55-55)pointer(675-684)obj(493-493)obj(539-539)src(1199-1199)crates/vm/src/stdlib/ctypes/base.rs (35)
std(2103-2103)std(2186-2186)value(808-809)value(817-817)value(817-817)value(839-839)value(853-853)value(868-868)value(878-878)value(911-911)value(1309-1309)value(1317-1317)value(1325-1325)value(1373-1373)value(1388-1388)value(1402-1402)value(1416-1416)value(1442-1442)obj(725-725)obj(749-749)obj(771-771)obj(786-786)obj(1547-1547)obj(1549-1549)obj(1658-1659)obj(1686-1687)obj(1707-1708)obj(1732-1732)buffer(347-347)buffer(354-354)read_ptr_from_buffer(2102-2109)call_paramfunc(1639-1652)size(626-628)size(1628-1630)default(129-145)
🪛 Biome (2.1.2)
.cspell.json
[error] 78-78: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 78-78: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 110-110: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 110-110: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 134-134: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 134-134: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
🔇 Additional comments (35)
crates/vm/src/stdlib/functools.rs (1)
76-77: LGTM! Clean refactoring to use default naming convention.The change from explicit
name = "__reduce__"to using the function name directly (__reduce__) is a clean refactoring that aligns with idiomatic usage of thepymethodmacro. The Python-facing API remains unchanged, and this approach is more maintainable.crates/vm/src/builtins/function.rs (1)
631-637: LGTM! Clean refactoring that improves code clarity.The inlining of the conditional expression directly into
Ok()removes an unnecessary intermediate variable while preserving the exact same behavior. This is idiomatic Rust and makes the descriptor logic more concise.crates/vm/src/protocol/buffer.rs (1)
205-216: LGTM! Proper scalar buffer validation.The branching logic correctly handles scalar buffers (ndim=0) separately from multi-dimensional buffers. The scalar assertion
itemsize == lenis sound, and this approach is consistent with existing scalar handling infor_each_segmentandzip_eqmethods.crates/vm/src/stdlib/codecs.rs (1)
179-179: LGTM! Method rename aligns with Python naming conventions.The change from
is_ascii()toisascii()is consistent with the broader refactoring to align Rust method names with Python-exposed names. The logic remains unchanged—these are fast-path optimizations for ASCII strings.Also applies to: 192-192
crates/vm/src/stdlib/operator.rs (1)
326-326: LGTM! Method rename is consistent.The update to
isascii()aligns with the refactoring inPyStr. The logic for constant-time ASCII string comparison remains unchanged.crates/vm/src/builtins/str.rs (2)
628-632: LGTM! Improved naming consistency.The rename from
is_asciitoisasciialigns the Rust method name with the Python-exposed name, and simplifying the#[pymethod]attribute to use default naming is cleaner. This matches the naming convention used byPyBytes::isascii()andPyByteArray::isascii()as shown in the relevant code snippets.
963-968: LGTM! Binding attribute simplified.Removing the explicit
name = "__format__"parameter is appropriate since the Rust function name already matches the desired Python name. This follows the same pattern as theisasciichange and reduces redundancy.crates/stdlib/src/pystruct.rs (1)
31-31: LGTM! Call site updated correctly.The method call is updated to use
isascii(), consistent with the rename inPyStr. The format validation logic remains unchanged.crates/vm/src/types/structseq.rs (1)
213-219: LGTM!The explicit conversion from
PyObjectRefto&PyObjectvia.as_ref()correctly aligns with the updated closure signature and ensures type safety.crates/vm/src/stdlib/ast/python.rs (1)
50-51: Refactoring looks good - aligns Rust function naming with Python attribute name.The change from
#[pyattr(name = "_fields")] fn fieldsto#[pyattr] fn _fieldsimproves consistency without breaking anything. The Python API remains unchanged (still exposes_fields), and no internal Rust code references the old method name..cspell.dict/python-more.txt (1)
151-151: LGTM!The additions of
needsfree(ctypes memory management flag) andPYTHONAPI(ctypes calling convention constant) are appropriate vocabulary expansions for the ctypes overhaul.Also applies to: 172-172
.cspell.json (1)
78-78: LGTM!The vocabulary additions (
oparg,pytype,sighandler) are appropriate for the RustPython codebase. The static analysis errors from Biome are false positives—this is a JSONC file (JSON with comments and trailing commas), which Biome's strict JSON parser doesn't handle correctly.Also applies to: 110-110, 134-134
crates/vm/src/builtins/builtin_func.rs (2)
117-120: LGTM!The
#[pygetset]macro correctly infers the attribute name__self__from the function name. This simplification aligns with the broader PR pattern of using default naming instead of explicitname = "..."parameters.
184-187: LGTM!Consistent simplification—the macro infers
__self__from the method name.crates/vm/src/builtins/object.rs (1)
453-456: LGTM!The function rename from
get_classto__class__allows the#[pygetset]macro to infer the Python attribute name, while the setter on line 458 correctly retains the explicitname = "__class__"since its function name isset_class.crates/vm/src/exceptions.rs (3)
627-630: LGTM!The function rename from
get_suppress_contextto__suppress_context__follows the established pattern, allowing the macro to infer the Python attribute name.
1115-1115: LGTM!The serialization code correctly calls the renamed method
__suppress_context__().
1860-1863: LGTM!The getter rename from
get_strerrortostrerroris consistent with the pattern. Theos.rs:strerrorfunction in the relevant snippets is unrelated—it's a standalone utility, not a getter onPyOSError.crates/vm/src/stdlib/ctypes.rs (8)
1-31: LGTM!Clean module reorganization with well-defined submodules and public re-exports. The structure provides a clear separation of concerns for the ctypes implementation.
32-85: LGTM!The
Py<PyType>extension methods provide clean abstractions for accessing and managingStgInfotype metadata. Themark_bases_finalmethod's best-effort approach (discardinginit_type_dataresult) is appropriate for its use case.
249-362: LGTM!The
CArgObjectstruct and itsRepresentableimplementation are well-structured, with comprehensive pattern matching for different type tags and appropriate fallback handling.
415-484: LGTM!The Windows-only
COMErrorimplementation correctly follows CPython's pattern of setting instance attributes (hresult,text,details) in__init__and storing the custom init function afternew_heapto avoid the defaultinit_wrapperoverride.
1083-1095: Inherently unsafe but correctly implemented.The
PyObj_FromPtrfunction correctly handles the conversion from raw pointer toPyObjectRef:
- NULL check prevents null pointer dereference
ManuallyDropprevents double-free- Clone creates a proper reference
However, there's no validation that
ptractually points to a validPyObject. This matches CPython's behavior but is inherently dangerous. Consider adding a doc comment warning about this:/// SAFETY: Caller must ensure ptr points to a valid, live PyObject. /// Passing an invalid pointer causes undefined behavior.
854-894: LGTM!The
resizefunction correctly:
- Validates minimum size against
StgInfo.size- Checks for owned memory (rejects borrowed buffers)
- Safely copies existing data to the new buffer
90-112: LGTM!The
make_modulefunction properly initializes all ctypes metaclasses and extends the module with the necessary type registrations.
604-672: LGTM!The
create_pointer_typefunction correctly handles both complete pointer types (whenclsis a type with_type_set) and incomplete pointer types (whenclsis a string). The caching strategy aligns with CPython's behavior.crates/vm/src/builtins/type.rs (1)
1448-1452: LGTM!Clean refactoring that aligns the function name with the Python attribute name. The simplified
#[pygetset]macro will infer__mro__from the function name, maintaining the same behavior.crates/vm/src/buffer.rs (1)
264-312: LGTM! Well-structured PEP3118 extended format specifier handling.The implementation correctly:
- Uses depth tracking for nested T{...}, X{}, and (...) constructs
- Provides clear error messages for unmatched delimiters
- Skips field names (:name:) and standalone braces appropriately
One note: T/X characters not followed by
{will fall through toFormatType::try_from, which will correctly error with "bad char in struct format" since they aren't valid standalone format codes per PEP3118.crates/vm/src/stdlib/ctypes/library.rs (2)
45-49: LGTM!The
Dropimplementation correctly delegates toclose(), ensuring the underlyingLibraryis properly released whenSharedLibraryis dropped.
93-97: LGTM!Good use of
static_cell!to encapsulate the lazy initialization withinlibcache(), reducing top-level static exposure while maintaining thread-safe singleton behavior.crates/vm/src/stdlib/ctypes/structure.rs (5)
170-218: LGTM! Well-structured field processing initialization.The function properly:
- Handles swapped byte order structures
- Validates
_fields_as list or tuple- Extracts
_pack_and_align_attributes with defaults- Initializes tracking variables from base class StgInfo when available
268-299: Good flag propagation logic.The code correctly propagates type flags (HASPOINTER, HASUNION, HASBITFIELD) from field types to the containing structure, and properly marks field types as finalized when used as structure fields.
301-333: Format string construction follows PEP3118.The padding and field format construction correctly:
- Adds padding bytes before fields when needed
- Handles array shapes with proper
(shape)formatsyntax- Includes field names with
:name:notation
427-491: SetAttr implementation handles finalization correctly.Good pattern:
- Releases read lock before calling
process_fieldsto avoid deadlock (lines 440-445)- Guards
_fields_mutation with finalization check- Falls back to standard type attribute handling for non-
_fields_attributes
611-635: AsBuffer implementation aligns with PyCData model.The buffer descriptor correctly:
- Uses the actual buffer length from
zelf.0.buffer- Retrieves format from class StgInfo with "B" fallback
- Sets
ndim=0(empty dim_desc) for structure buffers per PEP3118
ca7743c to
8c5948f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (9)
crates/vm/src/stdlib/ctypes/union.rs (2)
553-577: Union buffer view now matchesStructure’s 0‑dimensional semantics
PyCUnion::as_buffernow builds aBufferDescriptorwithdim_desc: vec![]anditemsize = buffer_len, mirroringPyCStructureand CPython’s scalar ctypes behavior. This fixes the previous mismatch where unions appeared as 1‑D buffers.Looks good.
12-29: Avoid silently treating malformed_fields_as a zero-sized unionBoth
calculate_union_sizeandPyCUnion::init_pos_argsuse:let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default();If
_fields_exists but isn’t a proper sequence (ortry_to_valuefails), these helpers quietly behave as if there were no fields at all, yielding size 0 and skipping positional initialization. That makes broken union definitions hard to debug.As with the struct helper, prefer propagating the conversion error (or mapping it to a
TypeErrorconsistent withprocess_fields) when_fields_is present, e.g.:- if let Ok(fields_attr) = cls.as_object().get_attr("_fields_", vm) { - let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default(); + if let Ok(fields_attr) = cls.as_object().get_attr("_fields_", vm) { + let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm)?;and similarly in
init_pos_args.Also applies to: 487-515
crates/vm/src/stdlib/ctypes/function.rs (4)
369-385: Pointer restypes (c_char_p,c_wchar_p,c_void_p/POINTER) always return ints
ReturnType::from_ffi_typetreats"P","z", and"Z"uniformly:Some("P") | Some("z") | Some("Z") => { vm.ctx.new_int(unsafe { *(value as *const usize) }).into() }And
ctypes_callprocroutes such restypes throughRawResult::Pointer, whichconvert_raw_resultalso turns into a Python int.That means:
restype = c_char_p→ Pythonintpointer, notbytes/None.restype = c_wchar_p→ Pythonintpointer, notstr/None.- General pointer types (including
c_void_p) are also exposed only as bare integers.CPython’s ctypes decodes
c_char_pandc_wchar_prestypes into Python bytes/str (orNonefor NULL), while pointer types may expose pointer objects or addresses.You already have string/wstring logic (
string_at_impl,wstring_at_impl) and array/pointer helpers; consider:
- Distinguishing
"z"and"Z"inconvert_raw_result/ReturnType::from_ffi_type, decoding the pointed-to string and returning bytes/str (orNone), and- Leaving
"P"/general pointer types as the raw address or appropriate ctypes pointer object.Right now all pointer restypes collapse to integers, which diverges notably from ctypes semantics.
Also applies to: 1376-1395
186-287:ArgumentType::convert_objectstill bypasses.from_param, limiting ctypes compatibilityThe new
convert_objectforPyTypeRefadds better handling for pointers/structures and rawPyCSimple, but it still never consults the type’sfrom_paramhook. Instead it:
- Tries
downcast::<PyCSimple>(), or- Calls the type as a constructor and expects a
PyCSimple.This misses the primary ctypes customization point (
from_param) for both built-in and user-defined types, and was already a pain point in the previous version.Consider:
- First attempting to call
from_paramon the type (or using a shared paramfunc) if it exists, and using that result as the canonical converted object.- Only falling back to the current downcast/constructor path when
from_paramis absent or clearly unsupported.Without that, many legitimate
argtypespatterns (especially customStructure/Union/pointer subclasses) will not behave like CPython.
723-766: COM-style constructor still panics on non-Windows via NULL function pointerThe COM-method constructor branch:
if args.args.len() >= 2 && first_arg.try_int(vm).is_ok() && args.args[1].downcast_ref::<PyStr>().is_some() { // ... return PyCFuncPtr { _base: PyCData::from_bytes(vec![0u8; ptr_size], None), ... }; }runs on all platforms. On non-Windows it creates a
PyCFuncPtrwith a NULL function pointer and noindex/iid. Later,Callable::calldoes:let (func_ptr, is_com_method) = (None::<CodePtr>, false); // non-Windows let code_ptr = func_ptr.unwrap_or_else(|| zelf.get_code_ptr().expect("Function pointer should be set"));Since
get_code_ptr()returnsNonefor a NULL pointer, this.expectwill panic at call time.You should either:
- Gate this entire COM constructor branch under
#[cfg(windows)], or- On non-Windows, detect this usage and raise a
TypeErrorimmediately instead of constructing a broken object.As-is, non-Windows builds can create
CFuncPtrinstances that inevitably panic when called.Also applies to: 1462-1480
1398-1452: OUT parameters currently discard the function’s own return value
build_resulthandles OUT buffers like this:if prepared.out_buffers.is_empty() { return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none())); } let out_values = extract_out_values(prepared.out_buffers, vm); Ok(match <[PyObjectRef; 1]>::try_from(out_values) { Ok([single]) => single, Err(v) => PyTuple::new_ref(v, &vm.ctx).into(), })As soon as any OUT parameters exist, the function’s (possibly
errcheck-processed) return value is dropped and only the OUT values are returned. In CPython ctypes, the usual pattern is to return a tuple like(result, *outs).You likely want:
- To keep the existing “no OUT” branch as-is, and
- When OUT buffers are present, return a tuple whose first element is
result.unwrap_or(vm.ctx.none())followed by the extracted OUT values.That way, callers with both a meaningful return value and OUT/INOUT parameters see the full result, not just the OUTs.
crates/vm/src/stdlib/ctypes/structure.rs (1)
13-29: Don’t silently drop_fields_conversion errors incalculate_struct_size
calculate_struct_sizeuses:let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default();If
_fields_is malformed ortry_to_valuefails, you quietly return size 0 instead of surfacing an error, which makes misconfigured structures hard to debug and can mask real bugs. Prefer propagating the error (or mapping it to a descriptiveTypeError) when_fields_exists.A minimal fix:
- if let Ok(fields_attr) = cls.as_object().get_attr("_fields_", vm) { - let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default(); + if let Ok(fields_attr) = cls.as_object().get_attr("_fields_", vm) { + let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm)?;This keeps the
Ok(0)fallback for the “no_fields_attribute” case but no longer hides bad_fields_contents.crates/vm/src/stdlib/ctypes/array.rs (2)
64-73: Guard againstelement_size * lengthoverflow inarray_type_from_ctypeHere you compute the total array size as
element_size * lengthwithout checking for overflow:let stg_info = StgInfo::new_array( element_size * length, element_align, length, ... );For large
lengthorelement_size, this can overflowusize(panic in debug, wrap in release) and produce a bogus layout. InPyCArrayType::inityou already use alength > usize::MAX / item_sizeguard; mirror that here before callingStgInfo::new_array.Example adjustment:
- let stg_info = StgInfo::new_array( - element_size * length, + if element_size != 0 && length > usize::MAX / element_size { + return Err(vm.new_overflow_error("array too large")); + } + let stg_info = StgInfo::new_array( + element_size * length, element_align, length, ... );
613-638:c_wchar_preading still hard-codes 32‑bit wchar and ignoresWCHAR_SIZEhelpersThe
"Z"branch inread_element_from_bufferstill does:let ptr = ptr_val as *const u32; while *ptr.add(len) != 0 { ... } let wchars = std::slice::from_raw_parts(ptr, len); let s: String = wchars.iter().filter_map(|&c| char::from_u32(c)).collect();This assumes 32‑bit
wchar_teverywhere. On Windows,wchar_tis 16‑bit, and you already introducedWCHAR_SIZE,wchar_from_bytes, andwstring_from_bytesthat correctly handle platform differences.You should decode the pointed‑to buffer using those helpers instead of hard‑coding
u32, e.g.:
- Treat
ptr_valas*const u8.- Walk in
WCHAR_SIZE‑sized chunks until a 0 code unit.- Use
wchar_from_bytes/wstring_from_bytesto build theString.This will make
"Z"consistent with the array and helper APIs and avoid misaligned, incorrect reads on Windows.Also applies to: 1220-1268
🧹 Nitpick comments (2)
crates/vm/src/stdlib/ctypes/library.rs (1)
66-86: Library cache no longer actually reuses existing libraries
get_or_insert_libcomputeskeyfrom the freshly createdSharedLibrary’sLibrarypointer, then immediately looks upself.libraries.get(&key). That key has never been inserted before, so theSome(l)branch (withis_closedcheck) is effectively unreachable, and a newSharedLibraryis always inserted.If you intend
_ctypesto reuse an existing handle for the same library (or at least to make the “closed → replace” path meaningful), the key should be derived from something stable across calls (e.g., the OS handle or canonicalized path) rather than the address of the just-createdLibrary.crates/vm/src/stdlib/ctypes/pointer.rs (1)
449-471: Document memory safety assumptions for pointer slice operationsIn
getitem_by_slicefor"c"(c_char) types, raw pointer slices are created without bounds validation:if step == 1 { // Optimized contiguous copy let start_addr = (ptr_value as isize + start * element_size as isize) as *const u8; unsafe { result.extend_from_slice(std::slice::from_raw_parts(start_addr, len)); } }There's no validation that:
ptr_valuepoints to valid, allocated memory- The slice
[start_addr, start_addr + len)stays within valid bounds- The memory region has the expected lifetime
While pointer arithmetic in ctypes inherently requires callers to ensure validity (matching CPython's behavior), the lack of documentation makes it unclear whether this is intentional. Consider:
- Adding a comment documenting that callers must ensure the pointer and computed offsets are valid
- Optionally, checking
ptr_value != 0before slice operations (though CPython allows this for non-zero invalid pointers)- For development builds, consider optional bounds checking via a debug feature flag
Similar patterns appear at lines 481-490 (c_wchar slice) and throughout read/write operations.
Note: This is standard ctypes behavior—users working with raw pointers are responsible for memory safety—but explicit documentation improves maintainability.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
.cspell.dict/python-more.txt(2 hunks).cspell.json(3 hunks)crates/stdlib/src/pystruct.rs(1 hunks)crates/vm/src/buffer.rs(1 hunks)crates/vm/src/builtins/builtin_func.rs(2 hunks)crates/vm/src/builtins/function.rs(1 hunks)crates/vm/src/builtins/object.rs(1 hunks)crates/vm/src/builtins/str.rs(2 hunks)crates/vm/src/builtins/type.rs(1 hunks)crates/vm/src/exceptions.rs(3 hunks)crates/vm/src/object/core.rs(1 hunks)crates/vm/src/protocol/buffer.rs(1 hunks)crates/vm/src/stdlib/ast/python.rs(1 hunks)crates/vm/src/stdlib/codecs.rs(2 hunks)crates/vm/src/stdlib/ctypes.rs(14 hunks)crates/vm/src/stdlib/ctypes/array.rs(5 hunks)crates/vm/src/stdlib/ctypes/function.rs(8 hunks)crates/vm/src/stdlib/ctypes/library.rs(5 hunks)crates/vm/src/stdlib/ctypes/pointer.rs(3 hunks)crates/vm/src/stdlib/ctypes/simple.rs(1 hunks)crates/vm/src/stdlib/ctypes/structure.rs(10 hunks)crates/vm/src/stdlib/ctypes/thunk.rs(0 hunks)crates/vm/src/stdlib/ctypes/union.rs(4 hunks)crates/vm/src/stdlib/ctypes/util.rs(0 hunks)crates/vm/src/stdlib/functools.rs(1 hunks)crates/vm/src/stdlib/operator.rs(1 hunks)crates/vm/src/types/structseq.rs(2 hunks)
💤 Files with no reviewable changes (2)
- crates/vm/src/stdlib/ctypes/util.rs
- crates/vm/src/stdlib/ctypes/thunk.rs
🚧 Files skipped from review as they are similar to previous changes (7)
- crates/vm/src/stdlib/codecs.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/buffer.rs
- crates/vm/src/stdlib/operator.rs
- crates/stdlib/src/pystruct.rs
- crates/vm/src/types/structseq.rs
- crates/vm/src/object/core.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/stdlib/functools.rscrates/vm/src/stdlib/ast/python.rscrates/vm/src/builtins/function.rscrates/vm/src/protocol/buffer.rscrates/vm/src/exceptions.rscrates/vm/src/builtins/str.rscrates/vm/src/builtins/builtin_func.rscrates/vm/src/builtins/object.rscrates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes.rscrates/vm/src/stdlib/ctypes/union.rscrates/vm/src/stdlib/ctypes/library.rs
🧠 Learnings (7)
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.
Applied to files:
crates/vm/src/exceptions.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.
Applied to files:
crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.
Applied to files:
crates/vm/src/stdlib/ctypes.rscrates/vm/src/stdlib/ctypes/union.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations
Applied to files:
crates/vm/src/stdlib/ctypes/library.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Applied to files:
.cspell.json
🧬 Code graph analysis (6)
crates/vm/src/stdlib/functools.rs (7)
crates/vm/src/builtins/builtin_func.rs (2)
__reduce__(123-126)__reduce__(169-182)crates/vm/src/builtins/function.rs (1)
__reduce__(804-812)crates/vm/src/builtins/object.rs (1)
__reduce__(503-505)crates/vm/src/builtins/str.rs (1)
__reduce__(302-307)crates/vm/src/builtins/genericalias.rs (1)
__reduce__(217-222)crates/vm/src/builtins/list.rs (2)
__reduce__(562-566)__reduce__(607-611)crates/vm/src/stdlib/collections.rs (1)
__reduce__(403-410)
crates/vm/src/builtins/function.rs (1)
crates/vm/src/types/slot.rs (1)
_cls_is(1083-1085)
crates/vm/src/protocol/buffer.rs (2)
crates/stdlib/src/array.rs (1)
itemsize(738-740)crates/vm/src/builtins/memory.rs (2)
itemsize(584-586)shape(594-603)
crates/vm/src/exceptions.rs (1)
crates/vm/src/stdlib/os.rs (1)
strerror(1544-1548)
crates/vm/src/builtins/str.rs (3)
crates/vm/src/builtins/bytes.rs (1)
isascii(262-264)crates/vm/src/builtins/bytearray.rs (1)
isascii(271-273)crates/vm/src/bytes_inner.rs (1)
isascii(326-328)
crates/vm/src/stdlib/ctypes.rs (6)
crates/vm/src/stdlib/ctypes/structure.rs (2)
base(111-111)a(411-412)crates/vm/src/stdlib/ctypes/function.rs (15)
std(102-102)std(451-451)std(453-453)std(511-511)std(699-699)t(259-259)t(323-323)t(994-994)t(1002-1002)t(1004-1004)t(1424-1424)n(1049-1049)src(614-614)cast_impl(563-664)typ(218-219)crates/vm/src/stdlib/ctypes/array.rs (13)
std(597-597)std(622-622)std(1221-1221)a(331-332)t(62-62)t(255-255)t(376-376)obj(1083-1083)obj(1091-1091)obj(1111-1111)obj(1123-1123)obj(1136-1136)obj(1147-1147)crates/vm/src/stdlib/ctypes/pointer.rs (9)
std(38-38)std(268-268)std(352-352)std(437-437)std(479-479)std(547-547)a(164-165)t(543-543)typ(138-140)crates/vm/src/stdlib/ctypes/base.rs (28)
std(2113-2113)std(2196-2196)mem(1922-1922)mem(1931-1931)mem(1940-1940)mem(1949-1949)mem(1958-1958)mem(1967-1967)mem(1976-1976)mem(1985-1985)mem(1994-1994)mem(2003-2003)mem(2014-2014)mem(2028-2028)mem(2069-2069)mem(2070-2070)obj(735-735)obj(759-759)obj(781-781)obj(796-796)obj(1557-1557)obj(1559-1559)obj(1668-1669)obj(1696-1697)obj(1717-1718)obj(1742-1742)size(636-638)size(1638-1640)crates/vm/src/stdlib/ctypes/field.rs (3)
name(278-280)type_(283-285)size(258-260)
🪛 Biome (2.1.2)
.cspell.json
[error] 78-78: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 78-78: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 110-110: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 110-110: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 134-134: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 134-134: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run tests under miri
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Check Rust code with clippy
- GitHub Check: Check the WASM package and demo
🔇 Additional comments (16)
crates/vm/src/stdlib/ast/python.rs (1)
50-51: LGTM! Clean refactoring improves naming consistency.The function name now matches the exposed Python attribute name, eliminating the need for an explicit
nameparameter in the decorator. This simplifies the code while preserving the external API.crates/vm/src/builtins/builtin_func.rs (2)
117-120: LGTM! Improved consistency by removing redundant attribute name.The explicit
name = "__self__"parameter is unnecessary since the function name already matches the desired Python attribute name. This change aligns with the existing pattern used for other#[pygetset]attributes in this implementation block (lines 79-115).
184-187: LGTM! Consistent refactoring across both function and method types.This change mirrors the improvement at line 117, removing the redundant
name = "__self__"parameter. BothPyNativeFunctionandPyNativeMethodnow follow the same concise pattern for attribute binding.crates/vm/src/exceptions.rs (3)
627-630: LGTM! Consistent with Python naming conventions.The rename from
get_suppress_contextto__suppress_context__aligns with Python's dunder method conventions and the broader refactoring pattern across the codebase. The implementation correctly returns the boolean value from the atomic cell.
1115-1115: Correctly updated to use the renamed method.The serialization code properly uses the new
__suppress_context__()method name, maintaining consistency with the refactored getter.
1860-1863: LGTM! Naming convention improved.The
strerrorgetter follows the standardized naming pattern without theget_prefix, matching the style of other OSError field accessors (errno,filename, etc.). The implementation correctly returns the optional PyObjectRef.crates/vm/src/builtins/function.rs (1)
632-636: LGTM! Clean refactoring.The inlining of the conditional expression into the return statement improves conciseness without changing behavior. The descriptor protocol logic correctly returns the unbound function when accessed from a class context and a bound method when accessed from an instance.
.cspell.dict/python-more.txt (1)
151-151: LGTM! Dictionary additions align with ctypes implementation.The additions of
needsfreeandPYTHONAPIare appropriate spell-check exceptions for the ctypes module implementation.Also applies to: 172-172
.cspell.json (2)
78-78: LGTM! Vocabulary additions are appropriate.The additions of
oparg,pytype, andsighandlerare reasonable spell-check exceptions for Python bytecode and signal handling terms.Note: The static analysis hints from Biome are false positives—the JSON structure is valid.
Also applies to: 110-110, 134-134
80-80: Summary inconsistency:pyargwas not removed.The AI summary claims
pyargwas removed, but it's still present on Line 80. The term remains in the vocabulary list.crates/vm/src/builtins/object.rs (1)
453-456: Good refactoring to align method name with Python attribute name.The change from
get_classwith an explicitnameparameter to__class__without it is more idiomatic and consistent with the broader pattern in this PR. The search confirms no remaining references to the oldget_classmethod exist in the codebase, so all necessary call sites have been updated.crates/vm/src/protocol/buffer.rs (1)
205-216: LGTM! Correctly implements scalar buffer validation.The branching logic properly handles scalar (0-dimensional) buffers:
- When
ndim() == 0, the buffer contains a single scalar item, solenmust equalitemsize.- When
ndim() > 0, the existing validation logic ensuresshape_product * itemsize == len.This change aligns with PEP 3118 and is consistent with how
for_each_segmentandzip_eqalready handle scalar buffers. The implementation correctly supports ctypes Structure types as scalar buffers.crates/vm/src/builtins/str.rs (1)
628-632: isascii/format binding changes look correct and behavior-preservingUsing
#[pymethod]with Rust namesisasciiand__format__keeps the Python API unchanged while simplifying bindings, and theisasciiimplementation remains consistent with the internalStrKind::Asciiflag.Also applies to: 963-984
crates/vm/src/stdlib/functools.rs (1)
76-101: Partial.reduce/setstate implementation looks sound and CPython-alignedState serialization/deserialization correctly handles func/args/keywords/instance
__dict__, normalizes tuple/dict subclasses, and enforces callable/shape checks. No functional or safety issues spotted.Also applies to: 103-192, 207-244
crates/vm/src/stdlib/ctypes/function.rs (2)
41-57: Double‑checkffi_type_from_tag('u')on Windows (wchar_t size)
ffi_type_from_tagmaps the'u'tag toType::u32():b'u' => Type::u32(), // wchar_tOn Windows,
wchar_tis 16‑bit, so using a 32‑bit FFI type here may misalign the call ABI forc_wchar/c_wchar_parguments and returns. On Unix, 32‑bit is fine.Please verify that this mapping matches how
type_info("u")is defined and what libffi expects forwchar_ton each platform; if not, this should be made conditional on the platform (e.g.,u16on Windows,u32elsewhere).
1585-1881: Callback/thunk implementation is thoughtfully structured; lifetime management looks correctThe
PyCThunk/ThunkData/ThunkUserDatamachinery:
- Builds a
Cifusing type codes viaget_ffi_type,- Allocates
ThunkUserDataon the heap, stores a raw pointer with aThunkDatawrapper andDropimpl to free it,- Uses
with_current_vminthunk_callbackto safely re-enter the VM, converting args and return values viaffi_to_python/python_to_ffi,- Exposes the callable and code pointer cleanly.
Given the constraints of libffi and ctypes callbacks, this design looks solid and much closer to CPython’s behavior.
| #[pyfunction(name = "PyObj_FromPtr")] | ||
| fn py_obj_from_ptr(ptr: usize, vm: &VirtualMachine) -> PyResult { | ||
| if ptr == 0 { | ||
| return Err(vm.new_value_error("NULL pointer access")); | ||
| } | ||
| let raw_ptr = ptr as *mut crate::object::PyObject; | ||
| unsafe { | ||
| let obj = crate::PyObjectRef::from_raw(std::ptr::NonNull::new_unchecked(raw_ptr)); | ||
| let obj = std::mem::ManuallyDrop::new(obj); | ||
| Ok((*obj).clone()) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the py_obj_from_ptr function and surrounding context
echo "=== py_obj_from_ptr function ==="
rg -n "py_obj_from_ptr" crates/vm/src/stdlib/ctypes.rs -A 15
echo ""
echo "=== Usages of py_obj_from_ptr in codebase ==="
rg -r "py_obj_from_ptr" --type rust
echo ""
echo "=== PyObjectRef::from_raw definition ==="
rg -n "fn from_raw" --type rust -B 3 -A 10 | head -50
echo ""
echo "=== Checking for validation in ctypes module ==="
grep -n "NULL\|validation\|check" crates/vm/src/stdlib/ctypes.rs | head -30Repository: RustPython/RustPython
Length of output: 5490
🏁 Script executed:
#!/bin/bash
# Look at the ctypes.rs file to understand the context around py_obj_from_ptr
echo "=== Context around lines 1084-1095 ==="
sed -n '1070,1110p' crates/vm/src/stdlib/ctypes.rs
echo ""
echo "=== Search for usages of PyObj_FromPtr ==="
rg "PyObj_FromPtr" --type rust -B 2 -A 2
echo ""
echo "=== Search for tests of this function ==="
rg "test.*py_obj_from_ptr\|py_obj_from_ptr.*test" --type rust -B 2 -A 5
echo ""
echo "=== Find PyObjectRef::from_raw ==="
rg "fn from_raw" --type rust -B 2 -A 8Repository: RustPython/RustPython
Length of output: 6542
🏁 Script executed:
#!/bin/bash
# Check the file size and path
ls -la crates/vm/src/stdlib/ctypes.rs
echo ""
echo "=== Lines 1080-1100 of ctypes.rs ==="
wc -l crates/vm/src/stdlib/ctypes.rs
sed -n '1080,1100p' crates/vm/src/stdlib/ctypes.rsRepository: RustPython/RustPython
Length of output: 954
🏁 Script executed:
cat crates/vm/src/stdlib/ctypes.rs | sed -n '1084,1095p'Repository: RustPython/RustPython
Length of output: 549
🏁 Script executed:
#!/bin/bash
set -e
# Search for the file
find . -name ctypes.rs -type f 2>/dev/null | head -5
# Show the function
echo "=== Function code ==="
rg -n "def py_obj_from_ptr|fn py_obj_from_ptr" --type rust -A 15
echo ""
echo "=== All references to PyObj_FromPtr ==="
rg "PyObj_FromPtr" --type rust
echo ""
echo "=== from_raw implementations ==="
rg "fn from_raw" --type rust -A 10Repository: RustPython/RustPython
Length of output: 6152
🏁 Script executed:
#!/bin/bash
# Search for the exact code pattern
rg -A 12 'PyObj_FromPtr.*pyfunction' --type rust
echo ""
echo "=== Check for usages of this function ==="
rg "py_obj_from_ptr\|PyObj_FromPtr" --type rust -B 2 -A 2
echo ""
echo "=== Examine from_raw method ==="
rg "\.from_raw" --type rust -B 3 -A 3Repository: RustPython/RustPython
Length of output: 140
🏁 Script executed:
#!/bin/bash
# First, locate and examine the exact function
rg -n "py_obj_from_ptr" --type rust -B 2 -A 15
# Search for any uses of this function
rg "py_obj_from_ptr" --type rust | grep -v "def \|fn " | head -10
# Search for from_raw usages to understand the pattern
rg "from_raw" --type rust -B 2 -A 5 | head -40
# Check if there are any tests
rg "test.*ctypes\|PyObj" --type rust | grep -i "test\|spec" | head -10Repository: RustPython/RustPython
Length of output: 3420
Critical safety issue: PyObj_FromPtr enables arbitrary memory dereference with insufficient validation
The py_obj_from_ptr function converts an arbitrary usize pointer to a PyObjectRef with only a NULL check. It is exposed via #[pyfunction], making it directly callable from Python code:
if ptr == 0 {
return Err(vm.new_value_error("NULL pointer access"));
}
let raw_ptr = ptr as *mut crate::object::PyObject;
unsafe {
let obj = crate::PyObjectRef::from_raw(std::ptr::NonNull::new_unchecked(raw_ptr));This can cause:
- Segmentation faults from dereferencing invalid memory addresses
- Use-after-free if the pointer refers to a freed object
- Memory corruption from invalid reference counting or structure access
- Arbitrary memory read/write via the returned object
There is no validation that:
- The pointer points to valid, allocated memory
- The memory contains a valid
PyObjectwith correct layout - The object is still alive (not freed)
- The reference count is correct
Remove this function or restrict it to internal use only with comprehensive safety documentation explaining the caller's obligations. If it must remain public, add runtime validation (though this cannot catch all invalid pointers).
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes.rs around lines 1084-1095, the exposed
#[pyfunction] py_obj_from_ptr allows arbitrary usize -> PyObjectRef conversion
with only a NULL check; remove the Python-callable surface and either delete the
function or make it internal (remove #[pyfunction], change visibility to
pub(crate) or private) and document it as unsafe with clear caller obligations,
or gate it behind a safety feature flag (e.g. cfg(feature =
"internal_unsafe_api")); if you must keep a runtime entrypoint, replace the
pyfunction with a safe wrapper that performs strict validation using VM-managed
metadata (e.g. verify the pointer belongs to the VM’s allocation table/arena, is
aligned, type-tag matches PyObject, and the object is live) and ensure the
actual unsafe conversion is only in a clearly documented unsafe helper not
exported to Python.
8c5948f to
e640ebf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (7)
crates/vm/src/stdlib/ctypes/array.rs (2)
64-73: Guard againstelement_size * lengthoverflow when constructing arrayStgInfo
StgInfo::new_arrayis called withelement_size * lengthwithout overflow checking. For largelengthorelement_sizethis can overflowusize, corrupting the computed size and later allocations (e.g. inPyCArray::slot_new’s buffer).You already have a safe pattern in
PyCArrayType::Initializer(length > usize::MAX / item_size): mirror that here (e.g.checked_mul+OverflowError) before callingStgInfo::new_array.Suggested fix sketch
- let stg_info = StgInfo::new_array( - element_size * length, + let total_size = element_size + .checked_mul(length) + .ok_or_else(|| vm.new_overflow_error("array too large"))?; + let stg_info = StgInfo::new_array( + total_size, element_align, length, itemtype_ref.clone(), element_size, item_format.as_deref(), &item_shape, item_flags, );
613-637:c_wchar_preader hardcodes 32‑bit wchar and ignoresWCHAR_SIZEThe
"Z"arm ofread_element_from_bufferinterprets the target as*const u32, assuming 4‑byte wide chars. On Windowswchar_tis 2 bytes; on other platforms you already exposeWCHAR_SIZE,wchar_from_bytes, andwstring_from_bytesfor platform‑correct handling.This mismatch will break
c_wchar_parrays on Windows (wrong alignment/size) and duplicates logic that should be centralized in the wchar helpers.Refactor the
"Z"arm to useWCHAR_SIZE+wstring_from_bytes(or a shared helper inbase.rs) rather than*const u32, so wide pointer reading matches the rest of the ctypes wchar handling.Suggested direction (high level)
- Keep reading the pointer value as you do now.
- Use
std::slice::from_raw_parts(ptr as *const u8, len_in_bytes)and thenwstring_from_bytesto build theString.- Compute
len_in_bytesasn * WCHAR_SIZEwherenis the number of wchar elements up to the terminator, usingwchar_from_bytesto walk until0.Also applies to: 1214-1262
crates/vm/src/stdlib/ctypes/function.rs (2)
1377-1395: Pointer restypes (c_char_p/c_wchar_p/c_void_p) are always exposed as Python ints
convert_raw_resultmapsRawResult::Pointerdirectly tovm.ctx.new_int(*ptr), andReturnType::from_ffi_typealso returns ints for type codes"P","z", and"Z". Combined withCallInfo::is_pointer_return, this means:
restype = c_char_pyields an integer address, not bytes/None.restype = c_wchar_pyields an integer address, not str/None.restype = c_void_palso yields a bare int, which is arguable but inconsistent with ctypes’ pointer object behavior.This diverges from ctypes, where
"z"/"Z"restypes decode the pointed‑to C string and return bytes/str (or None), while generic pointers expose addresses.```web How does CPython ctypes treat function results when restype is c_char_p or c_wchar_p? ```Suggested high-level fix
- In
convert_raw_result, whenRawResult::Pointer(ptr)andcall_info.restype_objis a type with_type_"z"or"Z", read from foreign memory:
"z": ifptr == 0returnNone, else read achar*C string (likestring_at_impl) and returnPyBytes."Z": similarly, read awchar_t*string using the same platform‑aware logic aswstring_at_impland returnPyStr.- For
"P"and other pointer types, keep returning the integer address or wrap it in the appropriate ctypes object.- Optionally, reuse
string_at_impl/wstring_at_implto avoid duplicating the memory‑walk logic.
1409-1452: OUT parameters currently discard the function’s own return value
build_resultreturns only OUT values whenprepared.out_buffersis non‑empty:if prepared.out_buffers.is_empty() { return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none())); } let out_values = extract_out_values(prepared.out_buffers, vm); Ok(match <[PyObjectRef; 1]>::try_from(out_values) { ... })This drops the function’s result (even post‑
errcheck), whereas ctypes normally returns(result, *outs)whenparamflagsspecify OUT/INOUT parameters.```web In CPython ctypes, what is returned when a function has both a non-void restype and OUT parameters via paramflags? ```Suggested behavior
- Keep the current branch when
out_buffersis empty.- When there are OUT values, construct a tuple whose first element is
result.unwrap_or(vm.ctx.none()), followed by the extracted out values.- Preserve the single‑OUT special case only for compatibility if you explicitly want it, but align with ctypes’
(result, out1, out2, ...)semantics.crates/vm/src/stdlib/ctypes/simple.rs (2)
59-75:c_char_p/c_wchar_pstill rejectNonein constructors and.valueassignments
set_primitivefor"z"and"Z"only accepts ints/bytes/str:"z" => { if value.downcast_ref_if_exact::<PyInt>(vm).is_some() || value.downcast_ref_if_exact::<PyBytes>(vm).is_some() { ... } else { TypeError(...) } } "Z" => { if value.downcast_ref_if_exact::<PyStr>(vm).is_some() { ... } else { TypeError(...) } }
PyCSimple::slot_newandset_valueboth route explicit arguments throughset_primitivewhen they’re not the special bytes/str fast paths. As a result:
c_char_p(None)andc_wchar_p(None)raiseTypeErrorinstead of representing a NULL pointer.x.value = Nonefor these types also raises, rather than clearing the pointer.Although
from_paramacceptsNonefor"z","Z", and"P", constructors and attribute setters remain incompatible with ctypes semantics.Suggested fix
- Adjust
set_primitiveto acceptNonefor"z"and"Z"(similar to"P"):- "z" => { - if value.downcast_ref_if_exact::<PyInt>(vm).is_some() - || value.downcast_ref_if_exact::<PyBytes>(vm).is_some() - { - Ok(value.to_owned()) - } else { - Err(vm.new_type_error(...)) - } - } + "z" => { + if value.downcast_ref_if_exact::<PyInt>(vm).is_some() + || value.downcast_ref_if_exact::<PyBytes>(vm).is_some() + || value.downcast_ref_if_exact::<PyNone>(vm).is_some() + { + Ok(value.to_owned()) + } else { + Err(vm.new_type_error("bytes, integer address or None expected")) + } + }
- Do the analogous change for
"Z"(acceptPyNoneand mention “or None” in the error).- With that in place,
slot_newandset_valuewill naturally allowNoneand encode it as a NULL pointer (all‑zero buffer) viavalue_to_bytes_endian.Also applies to: 140-173, 961-1003, 1167-1221
705-713:c_wcharserialization ignores actualwchar_tsize (hardcoded 4 bytes)In
value_to_bytes_endian, the"u"(c_wchar) branch always encodes as a 4‑byteu32and returnsvec![0; 4]on fallback, regardless of platform. Other parts of the ctypes code (e.g., the wchar helpers inarray.rs/base.rs) correctly usesize_of::<libc::wchar_t>()viaWCHAR_SIZE.On Windows, where
wchar_tis 2 bytes, this will desynchronize simplec_wcharlayout from arrays/pointers and from the actual ABI, risking data corruption.Suggested adjustment
- Use the same platform‑dependent size as the rest of the ctypes code:
- "u" => { - // c_wchar - 4 bytes (wchar_t on most platforms) - if let Ok(s) = value.str(vm) - && let Some(c) = s.as_str().chars().next() - { - return to_bytes!(c as u32); - } - vec![0; 4] - } + "u" => { + use crate::stdlib::ctypes::array::WCHAR_SIZE; // or shared base helper + if let Ok(s) = value.str(vm) + && let Some(c) = s.as_str().chars().next() + { + let mut buf = vec![0u8; WCHAR_SIZE]; + super::array::wchar_to_bytes(c as u32, &mut buf); + return buf; + } + vec![0; WCHAR_SIZE] + }
- Or equivalently, reuse the shared
WCHAR_SIZE/wchar_to_bytesfrom the common wchar helper module.crates/vm/src/stdlib/ctypes/pointer.rs (1)
646-669:write_value_at_address"Z"branch still leaks wchar buffers and diverges from the safermake_wchar_bufferpatternThe
"Z"arm ofwrite_value_at_addressconstructs aVec<libc::wchar_t>, takes its pointer, andmem::forgets it:let mut wchars: Vec<libc::wchar_t> = s .as_str() .chars() .map(|c| c as libc::wchar_t) .chain(std::iter::once(0)) .collect(); let ptr = wchars.as_mut_ptr(); std::mem::forget(wchars); ptr as usizeThis heap allocation is never tracked or freed. Meanwhile,
setitem_by_indexhas its own"Z"+PyStrspecial case that usesmake_wchar_bufferand stores a holder inbuffer_holder, so you effectively have two divergent code paths for the same type:
- This helper leaks memory on every call.
- The higher-level setter uses a safer holder, but nothing prevents future callers from hitting this helper directly.
Suggested refactor
Remove the ad‑hoc
Vec<libc::wchar_t>+mem::forgetlogic and instead:
- Reuse
make_wchar_buffer(as insetitem_by_index) to build a managed buffer and return its pointer.- Ensure the corresponding owner object is kept alive via
keep_refor a dedicated holder, not just a single sharedbuffer_holder.If you want a single authoritative path, consider making
setitem_by_indexalways call a small helper that:
- Builds/keeps the holder.
- Writes the pointer.
- Registers the holder for lifetime tracking.
🧹 Nitpick comments (1)
crates/vm/src/stdlib/ctypes/function.rs (1)
500-535: Apply consistent wide-string handling to callback "Z" type, matchingwstring_at_implplatform logicIn
ffi_to_python, the "Z" (c_wchar_p) handler at lines 1631-1645 naively casts each wchar_t to char without platform branching, causing UTF-16 surrogate pairs on Windows to be mishandled. Align with the correct platform-aware logic used inwstring_at_impl: useWtf8Buf::from_wideon Windows andchar::from_u32on other platforms.Similarly, in
python_to_ffiat line 1702, "Z" is treated as a raw integer pointer rather than a true wchar buffer. For callback returns, create and manage a wchar buffer (similar tomake_wchar_bufferpattern) and return its pointer, keeping the buffer alive during callback execution to match non-callbackc_wchar_psemantics.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
crates/vm/src/stdlib/ctypes/array.rs(5 hunks)crates/vm/src/stdlib/ctypes/function.rs(8 hunks)crates/vm/src/stdlib/ctypes/pointer.rs(3 hunks)crates/vm/src/stdlib/ctypes/simple.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/stdlib/ctypes/pointer.rs
🧠 Learnings (2)
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.
Applied to files:
crates/vm/src/stdlib/ctypes/array.rs
🧬 Code graph analysis (2)
crates/vm/src/stdlib/ctypes/array.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (25)
new_array(172-216)size(659-661)size(1658-1660)buffer(347-347)buffer(354-354)mem(1942-1942)mem(1951-1951)mem(1960-1960)mem(1969-1969)mem(1978-1978)mem(1987-1987)mem(1996-1996)mem(2005-2005)mem(2014-2014)mem(2023-2023)mem(2034-2034)mem(2048-2048)mem(2089-2089)mem(2090-2090)std(365-365)std(366-366)std(2133-2133)std(2216-2216)new(149-165)new(1232-1240)
crates/vm/src/stdlib/ctypes/pointer.rs (3)
crates/vm/src/stdlib/ctypes/array.rs (17)
init(141-277)obj(1077-1077)obj(1085-1085)obj(1105-1105)obj(1117-1117)obj(1130-1130)obj(1141-1141)from_param(283-312)array_type_from_ctype(16-93)b(543-543)b(822-822)std(597-597)std(622-622)std(1215-1215)t(62-62)t(255-255)t(376-376)crates/vm/src/stdlib/ctypes/base.rs (57)
obj(758-758)obj(782-782)obj(804-804)obj(819-819)obj(1577-1577)obj(1579-1579)obj(1688-1689)obj(1716-1717)obj(1737-1738)obj(1762-1762)proto(291-293)proto(907-907)proto(977-977)proto(1527-1527)proto(1544-1544)mem(1942-1942)mem(1951-1951)mem(1960-1960)mem(1969-1969)mem(1978-1978)mem(1987-1987)mem(1996-1996)mem(2005-2005)mem(2014-2014)mem(2023-2023)mem(2034-2034)mem(2048-2048)mem(2089-2089)mem(2090-2090)value(841-842)value(850-850)value(850-850)value(872-872)value(886-886)value(901-901)value(911-911)value(949-949)value(1347-1347)value(1355-1355)value(1363-1363)value(1412-1412)value(1427-1427)value(1441-1441)value(1455-1455)value(1481-1481)from_bytes(426-436)std(365-365)std(366-366)std(2133-2133)std(2216-2216)buffer(347-347)buffer(354-354)read_ptr_from_buffer(2132-2139)at_address(461-474)offset(1653-1655)size(659-661)size(1658-1660)crates/vm/src/stdlib/ctypes/library.rs (1)
drop(46-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Check Rust code with clippy
| // 6. Get code pointer | ||
| let code_ptr = func_ptr | ||
| .unwrap_or_else(|| zelf.get_code_ptr().expect("Function pointer should be set")); | ||
|
|
||
| // 7. Call the function | ||
| let raw_result = ctypes_callproc(code_ptr, &prepared, &call_info); | ||
|
|
||
| // 8. Build result | ||
| build_result(raw_result, &call_info, prepared, zelf, &args, vm) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling a CFuncPtr with a NULL function pointer will panic instead of raising a Python error
In PyCFuncPtr::call you compute:
let code_ptr = func_ptr
.unwrap_or_else(|| zelf.get_code_ptr().expect("Function pointer should be set"));If the underlying buffer stores a NULL pointer (e.g., CFuncPtr() with no address, failed symbol lookup, or explicit zero address), get_code_ptr() returns None and this expect panics the VM instead of raising a Python exception like ValueError("NULL function pointer").
Suggested fix
- let code_ptr = func_ptr
- .unwrap_or_else(|| zelf.get_code_ptr().expect("Function pointer should be set"));
+ let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) {
+ Some(cp) => cp,
+ None => {
+ return Err(vm.new_value_error("NULL function pointer"));
+ }
+ };🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/function.rs around lines 1477 to 1486, replace
the panic-causing expect when resolving the code pointer with a proper Python
exception: check if func_ptr is Some, otherwise call zelf.get_code_ptr() and if
that returns None return an Err containing a Python ValueError (e.g., "NULL
function pointer") via the VM API instead of calling expect; ensure the function
returns that Err so the VM raises the Python exception rather than panicking.
| fn setitem_by_index( | ||
| zelf: &Py<Self>, | ||
| index: isize, | ||
| value: PyObjectRef, | ||
| vm: &VirtualMachine, | ||
| ) -> PyResult { | ||
| let offset = offset.unwrap_or(0); | ||
| if offset < 0 { | ||
| return Err(vm.new_value_error("offset cannot be negative".to_owned())); | ||
| ) -> PyResult<()> { | ||
| let ptr_value = zelf.get_ptr_value(); | ||
| if ptr_value == 0 { | ||
| return Err(vm.new_value_error("NULL pointer access")); | ||
| } | ||
| let offset = offset as usize; | ||
| let size = std::mem::size_of::<usize>(); | ||
|
|
||
| let source_bytes = source.borrow_buf(); | ||
| let buffer_len = source_bytes.len(); | ||
| // Get element type, size and type_code from StgInfo.proto | ||
| let stg_info = zelf.class().stg_info(vm)?; | ||
| let proto_type = stg_info.proto(); | ||
|
|
||
| if offset + size > buffer_len { | ||
| return Err(vm.new_value_error(format!( | ||
| "Buffer size too small ({} instead of at least {} bytes)", | ||
| buffer_len, | ||
| offset + size | ||
| ))); | ||
| } | ||
| // Get type code from proto's _type_ attribute | ||
| let type_code: Option<String> = proto_type | ||
| .as_object() | ||
| .get_attr("_type_", vm) | ||
| .ok() | ||
| .and_then(|t| t.downcast_ref::<PyStr>().map(|s| s.to_string())); | ||
|
|
||
| // Read pointer value from buffer | ||
| let ptr_bytes = &source_bytes[offset..offset + size]; | ||
| let ptr_val = usize::from_ne_bytes(ptr_bytes.try_into().expect("size is checked above")); | ||
| let element_size = proto_type | ||
| .stg_info_opt() | ||
| .map_or(std::mem::size_of::<usize>(), |info| info.size); | ||
|
|
||
| Ok(PyCPointer { | ||
| _base: PyCData::new(CDataObject::from_bytes(vec![], None)), | ||
| contents: PyRwLock::new(vm.ctx.new_int(ptr_val).into()), | ||
| // Calculate address | ||
| let offset = index * element_size as isize; | ||
| let addr = (ptr_value as isize + offset) as usize; | ||
|
|
||
| // Write value at address | ||
| // Handle Structure/Array types by copying their buffer | ||
| if let Some(cdata) = value.downcast_ref::<super::PyCData>() | ||
| && (cdata.fast_isinstance(PyCStructure::static_type()) | ||
| || cdata.fast_isinstance(PyCArray::static_type()) | ||
| || cdata.fast_isinstance(PyCSimple::static_type())) | ||
| { | ||
| let src_buffer = cdata.buffer.read(); | ||
| let copy_len = src_buffer.len().min(element_size); | ||
| unsafe { | ||
| let dest_ptr = addr as *mut u8; | ||
| std::ptr::copy_nonoverlapping(src_buffer.as_ptr(), dest_ptr, copy_len); | ||
| } | ||
| } else { | ||
| // Handle c_wchar_p (Z) specially to store holder | ||
| if type_code.as_deref() == Some("Z") | ||
| && let Some(s) = value.downcast_ref::<PyStr>() | ||
| { | ||
| let (holder, ptr_val) = super::base::make_wchar_buffer(s.as_str(), vm); | ||
| *zelf.0.buffer_holder.write() = Some(holder); | ||
| unsafe { | ||
| *(addr as *mut usize) = ptr_val; | ||
| } | ||
| } else { | ||
| Self::write_value_at_address(addr, element_size, &value, type_code.as_deref(), vm)?; | ||
| } | ||
| } | ||
| .into_ref_with_type(vm, cls)? | ||
| .into()) | ||
|
|
||
| // KeepRef: store reference to keep value alive using actual index | ||
| zelf.0.keep_ref(index as usize, value, vm) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negative pointer indices combined with keep_ref(index as usize) can blow up reference tracking
PyCPointer::setitem_by_index allows arbitrary (including negative) indices for pointer arithmetic:
let offset = index * element_size as isize;
let addr = (ptr_value as isize + offset) as usize;
// ...
zelf.0.keep_ref(index as usize, value, vm)Casting a negative index to usize wraps to a huge value (e.g. -1 → usize::MAX), which can cause pathological behavior in _objects/keep_ref (large sparse structures, unexpected panics, etc.).
You either need to normalize the index you use for keep_ref, or disallow negative indices for writes.
Suggested fix (simple)
- Keep pointer arithmetic support, but clamp or reject negative indices for reference tracking:
- let offset = index * element_size as isize;
+ let offset = index * element_size as isize;
let addr = (ptr_value as isize + offset) as usize;
// ...
- // KeepRef: store reference to keep value alive using actual index
- zelf.0.keep_ref(index as usize, value, vm)
+ // KeepRef: use a non-negative key (e.g. logical position) to track lifetime
+ let key = if index >= 0 { index as usize } else { 0 };
+ zelf.0.keep_ref(key, value, vm)- Or, if you don’t need per‑index tracking for pointers, consider always using a small fixed key (e.g.
0) instead of the index.
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/pointer.rs around lines 523–583, casting the
signed index to usize (keep_ref(index as usize, ...)) will wrap negative indices
into huge usize values; change the logic so keep_ref never receives a wrapped
value — either reject negative indices for writes (return a ValueError when
index < 0) or normalize the index used for tracking (e.g., map any negative
index to a fixed small key like 0 or clamp to 0) and pass that safe usize into
keep_ref; implement the chosen approach consistently before calling keep_ref.
e640ebf to
c8478c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/ctypes/function.rs (1)
300-385: Struct and Array return types incorrectly default to 32-bit intsWhen
ReturnType::to_ffi_typeforPyTypeRefreturnsNone(which happens for Structure and Array types without a_type_attribute),extract_call_infofalls back toType::i32(). Thefrom_ffi_typemethod then reads the return value as an i32, exposing only the low 32 bits as a Pythonintinstead of creating an instance of the declared ctypes type.Since CPython supports by-value struct and array returns, this implementation either needs to properly construct the return type instance from the memory value, or explicitly raise
TypeError/NotImplementedErrorrather than silently returning an integer.
♻️ Duplicate comments (9)
crates/vm/src/stdlib/ctypes/function.rs (4)
227-287: Typed argument conversion still bypassesfrom_param, breaking custom and built‑in ctypes conversions
ArgumentType::convert_objectforPyTypeRefonly:
- Checks for
CArgObject, pointer/structure types, and_type_ == "P", then- Tries a direct downcast to
PyCSimple, or- Calls the type as a constructor (
self.as_object().call((value.clone(),), vm)).It never calls the type’s
from_paramor routes throughcall_paramfuncfor the typed (argtypes) path. This means:
- Custom
from_paramon user ctypes types is ignored whenargtypesis set.- Built‑in behavior for
c_char_p,c_wchar_p, pointer/array types, etc. implemented viafrom_param/ParamFuncwon’t be used in the typed path.You already rely on
call_paramfuncin the no‑argtypespath (conv_param); the typed path should be unified to consultfrom_param(or a shared paramfunc) first and only fall back to the directPyCSimple/constructor logic when it’s absent or fails.
1376-1395: Pointer restypes (c_char_p,c_wchar_p,c_void_p) are always exposed as Python intsWhen
CallInfo::is_pointer_returnis true (for_type_"P" | "z" | "Z"),ctypes_callprocusesRawResult::Pointer, andconvert_raw_resultmaps that to:RawResult::Pointer(ptr) => Some(vm.ctx.new_int(*ptr).as_object().to_pyobject(vm)),So:
restype = c_char_p→ Pythonintaddress, never bytes/None.restype = c_wchar_p→ Pythonintaddress, neverstr/None.restype = c_void_p→ bareint, instead of ac_void_p/pointer object.That diverges sharply from ctypes expectations and from your own pointer/string helpers elsewhere.
You likely want to branch here on the restype’s
_type_:
"z"→ decodechar*as a null‑terminated C string →bytes/None."Z"→ decodewchar_t*using the samewstringlogic aswstring_at_impl/array::wstring_from_bytes→str/None."P"/ general pointer types → keep returning an address (or ac_void_pinstance).Right now, any API using those common restypes will get unusable integer addresses.
1409-1452: OUT parameters drop the function’s own return value instead of returning(result, *outs)In
build_result:if prepared.out_buffers.is_empty() { return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none())); } let out_values = extract_out_values(prepared.out_buffers, vm); Ok(match <[PyObjectRef; 1]>::try_from(out_values) { Ok([single]) => single, Err(v) => PyTuple::new_ref(v, &vm.ctx).into(), })As soon as any OUT buffers exist, the (possibly
errcheck‑processed) function result is discarded; callers only see the OUT values, unlike ctypes where the convention is(result, *outs).This will break code that relies on both a return value and OUT/INOUT parameters. Consider returning a tuple whose first element is
result(orNonewhen absent), followed by the OUT values, while keeping the existing “no OUTs → bare result” behavior.
1477-1480: Calling aCFuncPtrwith a NULL function pointer panics instead of raising a Python error
PyCFuncPtr::callresolves the code pointer as:let code_ptr = func_ptr .unwrap_or_else(|| zelf.get_code_ptr().expect("Function pointer should be set"));If:
- No COM function pointer is resolved (
func_ptrisNone), and- The stored pointer in
_base.bufferis NULL (get_code_ptr()returnsNone),then the
expectwill panic the VM instead of raising a Python exception likeValueError("NULL function pointer"). This also affects the COM‑style constructor path on non‑Windows where the pointer is left at 0.Return an error instead of panicking, e.g.:
let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) { Some(cp) => cp, None => return Err(vm.new_value_error("NULL function pointer")), };so callers get a proper Python‐level failure.
crates/vm/src/stdlib/ctypes/array.rs (2)
64-73: Uncheckedelement_size * lengthmultiplication can overflow when building arrayStgInfo
array_type_from_ctypecomputes the total size as:let stg_info = StgInfo::new_array( element_size * length, element_align, length, ... );with no
checked_mul/overflow guard. For very largelengthor element sizes this can overflowusize(panic in debug, wrap in release), creating a brokenStgInfo(wrong size/shape) for the array type.You already guard this in
PyCArrayType::Initializer(item_size != 0 && length > usize::MAX / item_size). Mirror that here (or usechecked_mul+OverflowError) before callingStgInfo::new_array.
613-638:c_wchar_parray elements still assume 32‑bitwchar_t, ignoringWCHAR_SIZEIn
read_element_from_bufferfor"Z":let ptr = ptr_val as *const u32; while *ptr.add(len) != 0 { ... } let wchars = std::slice::from_raw_parts(ptr, len); let s: String = wchars.iter().filter_map(|&c| char::from_u32(c)).collect();This hardcodes a 32‑bit
wchar_t. On Windows,libc::wchar_tis 16‑bit, so this is misaligned and undefined behavior. You’ve already introducedWCHAR_SIZE,wchar_from_bytes, andwstring_from_bytesbelow to handle platform‑dependentwchar_tsize; this branch should use those helpers instead of*const u32.E.g., read the pointer value as now, then treat it as a raw
u8buffer and feed it to awstring_from_bytes‑style reader that respectsWCHAR_SIZE.crates/vm/src/stdlib/ctypes/simple.rs (2)
140-161:set_primitiveforc_char_p/c_wchar_prejectsNone, breaking ctypes‑style NULL semanticsThe
"z"and"Z"arms currently accept only ints/bytes andstrrespectively:"z" => { if value.downcast_ref_if_exact::<PyInt>(vm).is_some() || value.downcast_ref_if_exact::<PyBytes>(vm).is_some() { ... } } "Z" => { if value.downcast_ref_if_exact::<PyStr>(vm).is_some() { ... } }
Noneis treated as an error, so:
c_char_p(None)/c_wchar_p(None)raiseTypeErrorinstead of representing a NULL pointer.x.value = Nonefor those types also fails.CPython accepts
Nonefor these as a NULL pointer, and your own pointer"P"case already does.PyCSimpleType::from_paramspecial‑casesNonefor"z" | "Z" | "P", but constructors andvaluesetter still go throughset_primitiveand break.Allow
Nonehere (e.g.,vm.is_none(value)in the condition) and adjust the error messages accordingly soNoneis treated as a valid “NULL pointer” input.
685-693:c_wcharserialization ignores actualwchar_tsize; should use sharedWCHAR_SIZEhelpersThe
"u"branch ofvalue_to_bytes_endianis:"u" => { // c_wchar - 4 bytes (wchar_t on most platforms) if let Ok(s) = value.str(vm) && let Some(c) = s.as_str().chars().next() { return to_bytes!(c as u32); } vec![0; 4] }This hardcodes 4 bytes and encodes via
u32, but elsewhere (array.rs) you’ve correctly abstractedwchar_tusingWCHAR_SIZE = size_of::<libc::wchar_t>()pluswchar_to_bytes. On Windows this mismatch will desynchronizec_wcharsize/representation from arrays, pointers, andStgInfo.Reuse the same abstraction here: allocate
WCHAR_SIZEbytes and callwchar_to_byteswith the code point, falling back tovec![0; WCHAR_SIZE]when invalid.crates/vm/src/stdlib/ctypes/pointer.rs (1)
523-591: Negative indices combined withkeep_ref(index as usize, …)can blow up reference tracking
setitem_by_indexallows arbitrary signed indices for pointer arithmetic:let offset = index * element_size as isize; let addr = (ptr_value as isize + offset) as usize; // ... return zelf.0.keep_ref(index as usize, converted, vm); // ... zelf.0.keep_ref(index as usize, value, vm)Casting a negative
indextousizewraps (e.g.-1 → usize::MAX), so_objects/keep_refmay get extremely large keys. That can lead to pathological behavior (huge sparse dicts, surprising growth, potential panics in other code assuming “small” indices).You probably want:
- To keep pointer arithmetic semantics for reads/writes, but
- Normalize the key used for
keep_ref(e.g. clamp negatives to 0 or some small fixed key), or disallow negative indices for writes.Right now, negative writes both mutate memory at a negative offset and also poison
_objectswith huge indices.
🧹 Nitpick comments (1)
crates/vm/src/stdlib/ctypes/pointer.rs (1)
133-155:set_typeupdatesStgInfo.protobut not the pointer’s buffer format metadata
PyCPointerType::set_typeonly does:if let Some(mut stg_info) = zelf.get_type_data_mut::<StgInfo>() { stg_info.proto = Some(typ_type.clone()); } zelf.as_object().set_attr("_type_", typ_type, vm)?;whereas the Initializer computes a full
StgInfoincluding:
proto,formatstring"&<element_format>",flags |= TYPEFLAG_ISPOINTER, etc.For incomplete pointer types that are later completed via
set_type,StgInfo.format(and any other derived fields) remain whatever they were at initialization time, ignoring the newly provided element type’s format. That can desync the buffer protocol metadata from the actual pointed‑to type.Not critical for basic usage, but you may want to recompute
stg_info.format(and any other derived fields) inset_typeto match the initializer’s logic.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
crates/vm/src/stdlib/ctypes/array.rs(5 hunks)crates/vm/src/stdlib/ctypes/function.rs(8 hunks)crates/vm/src/stdlib/ctypes/pointer.rs(3 hunks)crates/vm/src/stdlib/ctypes/simple.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/stdlib/ctypes/array.rs
🧠 Learnings (2)
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.
Applied to files:
crates/vm/src/stdlib/ctypes/array.rs
🧬 Code graph analysis (1)
crates/vm/src/stdlib/ctypes/pointer.rs (2)
crates/vm/src/stdlib/ctypes/array.rs (22)
obj(1081-1081)obj(1089-1089)obj(1109-1109)obj(1121-1121)obj(1134-1134)obj(1145-1145)value(295-295)array_type_from_ctype(16-93)b(543-543)b(826-826)std(597-597)std(622-622)std(1219-1219)__getitem__(848-863)item(850-850)item(858-858)item(965-965)item(973-973)t(62-62)t(255-255)t(376-376)__setitem__(955-978)crates/vm/src/stdlib/ctypes/base.rs (55)
obj(768-768)obj(792-792)obj(814-814)obj(829-829)obj(1581-1581)obj(1583-1583)obj(1692-1693)obj(1720-1721)obj(1741-1742)obj(1766-1766)mem(1946-1946)mem(1955-1955)mem(1964-1964)mem(1973-1973)mem(1982-1982)mem(1991-1991)mem(2000-2000)mem(2009-2009)mem(2018-2018)mem(2027-2027)mem(2038-2038)mem(2052-2052)mem(2093-2093)mem(2094-2094)new(149-165)new(1237-1245)value(851-852)value(860-860)value(860-860)value(882-882)value(896-896)value(911-911)value(921-921)value(1352-1352)value(1360-1360)value(1368-1368)value(1417-1417)value(1432-1432)value(1446-1446)value(1460-1460)value(1485-1485)type_ref(2148-2148)std(365-365)std(366-366)std(2137-2137)std(2220-2220)buffer(347-347)buffer(354-354)read_ptr_from_buffer(2136-2143)at_address(475-487)offset(1657-1659)type_code(1537-1537)type_code(1554-1554)size(669-671)size(1662-1664)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Check Rust code with clippy
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Check the WASM package and demo
c8478c0 to
2bf7dac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/vm/src/stdlib/ctypes.rs (1)
377-397:SIZEOF_TIME_Tis hard-coded and wrong on 64‑bit UnixYou currently expose:
#[cfg(target_os = "windows")] const SIZEOF_TIME_T: usize = 8; #[cfg(not(target_os = "windows"))] const SIZEOF_TIME_T: usize = 4;On 64‑bit Linux and many other Unix platforms,
time_tis 8 bytes, so this constant is incorrect there and will confuse code/tests that introspect sizes through ctypes.You should derive this from the actual C type:
const SIZEOF_TIME_T: usize = std::mem::size_of::<libc::time_t>();and drop the OS-based branching.
crates/vm/src/stdlib/ctypes/structure.rs (1)
514-588: Structure instance creation is correct, but_fields_instance getter is a stub
PyCStructure::slot_new:
- Reads
size,align, andlengthfrom the classStgInfo.- Marks the class as final on first instantiation.
- Collects field names from
_fields_and zero-initializes the buffer.- Applies kwargs and then positional args, with a clear “too many initializers” error.
That part looks solid. However, the
_fields_instance getter is currently:fn _fields_(&self, vm: &VirtualMachine) -> PyObjectRef { // Return the _fields_ from the class, not instance vm.ctx.none() }This contradicts the comment: callers expecting
instance._fields_to reflect the class fields will just getNone. If you want to support this attribute on instances, you probably want:self.0.as_object().class().as_object().get_attr("_fields_", vm).unwrap_or(vm.ctx.none())or similar. If you don’t intend to expose
_fields_on instances at all, consider removing this getset to let the class attribute be inherited normally.
♻️ Duplicate comments (7)
crates/vm/src/stdlib/ctypes/structure.rs (1)
13-29:calculate_struct_sizenow correctly propagates_fields_errorsSwitching to:
let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm)?;instead of
unwrap_or_default()means malformed_fields_now raise rather than silently producing a 0‑sized structure. This is a nice improvement in debuggability and aligns with howprocess_fieldshandles invalid field tuples.crates/vm/src/stdlib/ctypes/union.rs (2)
12-29:calculate_union_sizenow surfaces malformed_fields_instead of silently returning 0Using:
let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm)?;prevents bad
_fields_definitions from turning into zero-sized unions. This improves debuggability and matches the stricter behavior introduced for structures.
553-577: Union buffer interface now matches structure’s 0‑D scalar behavior
AsBuffer for PyCUnion:
- Uses
info->formatif available (falling back to"B").- Exposes
len == itemsize == buffer_lenanddim_desc: vec![]for a 0‑dimensional buffer.This resolves the previous inconsistency with structures and aligns both with CPython’s “scalar struct/union” buffer semantics.
crates/vm/src/stdlib/ctypes/function.rs (3)
1473-1516: OUT parameters still discard the function’s own return valueWhen
prepared.out_buffersis non‑empty,build_resultignores the function’s result (even aftererrcheck) and returns only the OUT values:if prepared.out_buffers.is_empty() { return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none())); } let out_values = extract_out_values(prepared.out_buffers, vm); Ok(match <[PyObjectRef; 1]>::try_from(out_values) { ... })In ctypes, when paramflags specify OUT/INOUT parameters, the usual behavior is to return a tuple
(result, *outs), preserving the function’s own return value.You should change this branch to:
- Use
result.unwrap_or_else(|| vm.ctx.none())as the first element, and- Append the extracted OUT values, returning a tuple even for a single OUT (to avoid ambiguity).
1521-1556: Calling aCFuncPtrwith a NULL function pointer still invokes FFI with a nullCodePtrIn
PyCFuncPtr::call:let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) { Some(cp) => cp, None => { debug_assert!(false, "NULL function pointer"); CodePtr(std::ptr::null_mut()) } }; let raw_result = ctypes_callproc(code_ptr, &prepared, &call_info);If the underlying buffer holds a NULL pointer and no COM method override is present, this will call libffi with a null
CodePtr, leading to undefined behavior or a crash. The previousexpect("Function pointer should be set")panic was replaced, but the core problem remains.Instead, when both
func_ptrandzelf.get_code_ptr()areNone, return a Python exception (e.g.ValueError("NULL function pointer")) rather than constructing a nullCodePtr:let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) { Some(cp) => cp, None => return Err(vm.new_value_error("NULL function pointer")), };
739-782: COM-style constructor branch still runs on non-Windows and builds a NULL pointer objectThe COM constructor path:
if args.args.len() >= 2 && first_arg.try_int(vm).is_ok() && args.args[1].downcast_ref::<PyStr>().is_some() { #[cfg(windows)] let index = ...; // ... let paramflags = args.args.get(2)... return PyCFuncPtr { _base: PyCData::from_bytes(vec![0u8; ptr_size], None), ... }.into_ref... }is compiled and executed on all platforms. On non‑Windows:
- The condition still matches, so this branch creates a
PyCFuncPtrwith a NULL function pointer and COM-related metadata partially absent.- Later,
callwill fall back toget_code_ptr()and hit the null‑pointer case above, causing UB/crash instead of a clean Python error.The COM constructor should be Windows-only or explicitly rejected elsewhere. Options:
- Guard the entire
ifblock with#[cfg(windows)]and make it a no-op on non‑Windows, or- On non‑Windows, detect this pattern early and
return Err(vm.new_type_error("COM methods are only supported on Windows")).crates/vm/src/stdlib/ctypes.rs (1)
1093-1105:PyObj_FromPtris an extremely unsafe public escape hatch
PyObj_FromPtrexposes:let raw_ptr = ptr as *mut crate::object::PyObject; unsafe { let obj = crate::PyObjectRef::from_raw(std::ptr::NonNull::new_unchecked(raw_ptr)); let obj = std::mem::ManuallyDrop::new(obj); Ok((*obj).clone()) }This lets arbitrary user code reinterpret any integer as a
PyObjectRefwith only a NULL check. Passing an invalid, stale, or non‑RustPython pointer can cause memory corruption or crashes, defeating RustPython’s usual safety guarantees.If this is only for internal testing/compatibility with CPython’s
PyObj_FromPtr, please:
- Remove the
#[pyfunction]export, or- Gate it behind a clearly documented
cfg(feature = "unsafe_debug_apis"), or- At minimum, mark it as unsafe on the Python side and limit its availability (e.g. only in debug builds).
Leaving it callable from normal Python code in release builds is a significant safety hazard.
🧹 Nitpick comments (3)
crates/vm/src/stdlib/ctypes/function.rs (3)
79-150: Potentially surprising integer handling inconvert_to_pointerFor integer inputs,
convert_to_pointeruses:int_val.as_bigint().to_usize().unwrap_or(0)so any value outside
usizesilently turns into0(NULL). That can hide caller mistakes (e.g. negative addresses or too‑large values) and lead to confusing NULL dereferences.Consider returning an
OverflowError/ValueErrorinstead when the integer cannot be represented asusize, to make misuses visible.
954-1030:extract_call_infopointer-return detection is hard‑coded to"P"|"z"|"Z"
is_pointer_returnis currently:.is_some_and(|tc| matches!(tc.as_str(), "P" | "z" | "Z"))This works for
c_void_p,c_char_p,c_wchar_p, but misses user pointer types (e.g.POINTER(T)where_type_is a dynamically created type code or composite structure). In those cases,ctypes_callprocwill treat the return as aValueinstead ofPointer, andconvert_raw_resultwon’t run the pointer-specific logic.Consider deriving pointer-return behavior from the restype’s
StgInfo/ flags (TYPEFLAG_ISPOINTER) rather than only_type_string, so custom pointer subclasses and more complex pointer restypes are correctly recognized.
1392-1460: Wide-string pointer restype decoding is platform-agnostic and may mishandle UTF‑16 on WindowsFor
RawResult::Pointerwith_type_ == "Z"(c_wchar_p), you read alibc::wchar_t*and then build aStringvia:let slice = std::slice::from_raw_parts(wstr_ptr, len); let s: String = slice .iter() .filter_map(|&c| char::from_u32(c as u32)) .collect();On Windows (
wchar_t= 16 bits, UTF‑16), this treats each 16‑bit code unit as an independent Unicode scalar, which breaks surrogate pairs and produces incorrect results for non‑BMP characters. Inwstring_at_implyou already useWtf8Buf::from_wideon Windows to handle UTF‑16 correctly.It would be better to share that logic here (e.g. a helper that takes
*const wchar_tand length and returns aStringusing Wtf8 on Windows andchar::from_u32on Unix), soc_wchar_prestypes behave consistently withwstring_at.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
.cspell.dict/python-more.txt(2 hunks).cspell.json(3 hunks)crates/stdlib/src/pystruct.rs(1 hunks)crates/vm/src/buffer.rs(1 hunks)crates/vm/src/builtins/builtin_func.rs(2 hunks)crates/vm/src/builtins/function.rs(1 hunks)crates/vm/src/builtins/object.rs(1 hunks)crates/vm/src/builtins/str.rs(2 hunks)crates/vm/src/builtins/type.rs(1 hunks)crates/vm/src/exceptions.rs(4 hunks)crates/vm/src/object/core.rs(1 hunks)crates/vm/src/protocol/buffer.rs(1 hunks)crates/vm/src/stdlib/ast/python.rs(1 hunks)crates/vm/src/stdlib/codecs.rs(2 hunks)crates/vm/src/stdlib/ctypes.rs(14 hunks)crates/vm/src/stdlib/ctypes/array.rs(5 hunks)crates/vm/src/stdlib/ctypes/function.rs(8 hunks)crates/vm/src/stdlib/ctypes/library.rs(5 hunks)crates/vm/src/stdlib/ctypes/pointer.rs(3 hunks)crates/vm/src/stdlib/ctypes/simple.rs(1 hunks)crates/vm/src/stdlib/ctypes/structure.rs(10 hunks)crates/vm/src/stdlib/ctypes/thunk.rs(0 hunks)crates/vm/src/stdlib/ctypes/union.rs(4 hunks)crates/vm/src/stdlib/ctypes/util.rs(0 hunks)crates/vm/src/stdlib/functools.rs(1 hunks)crates/vm/src/stdlib/operator.rs(1 hunks)crates/vm/src/types/structseq.rs(2 hunks)
💤 Files with no reviewable changes (2)
- crates/vm/src/stdlib/ctypes/util.rs
- crates/vm/src/stdlib/ctypes/thunk.rs
🚧 Files skipped from review as they are similar to previous changes (8)
- crates/vm/src/builtins/object.rs
- crates/vm/src/stdlib/operator.rs
- crates/vm/src/builtins/builtin_func.rs
- crates/vm/src/types/structseq.rs
- .cspell.dict/python-more.txt
- crates/vm/src/builtins/function.rs
- crates/vm/src/stdlib/codecs.rs
- crates/vm/src/object/core.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/protocol/buffer.rscrates/vm/src/buffer.rscrates/vm/src/stdlib/functools.rscrates/vm/src/stdlib/ast/python.rscrates/stdlib/src/pystruct.rscrates/vm/src/exceptions.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/union.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes/library.rscrates/vm/src/builtins/type.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes.rscrates/vm/src/builtins/str.rs
🧠 Learnings (7)
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.
Applied to files:
crates/vm/src/exceptions.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.
Applied to files:
crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.
Applied to files:
crates/vm/src/stdlib/ctypes/union.rscrates/vm/src/stdlib/ctypes/simple.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations
Applied to files:
crates/vm/src/stdlib/ctypes/library.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Applied to files:
.cspell.json
🧬 Code graph analysis (5)
crates/vm/src/stdlib/functools.rs (3)
crates/vm/src/builtins/builtin_func.rs (2)
__reduce__(123-126)__reduce__(169-182)crates/vm/src/builtins/function.rs (1)
__reduce__(804-812)crates/vm/src/builtins/object.rs (1)
__reduce__(503-505)
crates/stdlib/src/pystruct.rs (2)
crates/vm/src/builtins/str.rs (3)
s(1783-1788)s(1814-1818)isascii(630-632)crates/vm/src/builtins/bytes.rs (1)
isascii(262-264)
crates/vm/src/exceptions.rs (1)
crates/vm/src/stdlib/os.rs (1)
strerror(1544-1548)
crates/vm/src/stdlib/ctypes/union.rs (2)
crates/vm/src/stdlib/ctypes/base.rs (28)
std(365-365)std(366-366)std(2137-2137)std(2220-2220)default(129-145)obj(768-768)obj(792-792)obj(814-814)obj(829-829)obj(1581-1581)obj(1583-1583)obj(1692-1693)obj(1720-1721)obj(1741-1742)obj(1766-1766)set_or_init_stginfo(2146-2152)new(149-165)new(1237-1245)get_usize_attr(2115-2132)size(669-671)size(1662-1664)is_big_endian(334-340)proto(291-293)proto(917-917)proto(982-982)proto(1531-1531)proto(1548-1548)attr(938-938)crates/vm/src/protocol/buffer.rs (1)
format(187-200)
crates/vm/src/builtins/str.rs (3)
crates/vm/src/builtins/bytearray.rs (1)
isascii(271-273)crates/vm/src/builtins/bytes.rs (1)
isascii(262-264)crates/vm/src/bytes_inner.rs (1)
isascii(326-328)
🪛 Biome (2.1.2)
.cspell.json
[error] 78-78: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 78-78: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 110-110: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 110-110: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 134-134: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 134-134: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run tests under miri
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Check Rust code with clippy
🔇 Additional comments (46)
.cspell.json (2)
78-78: Static analysis hints are false positives.The Biome parse errors are false positives. The JSON syntax is valid, and the added words are correctly formatted as string entries within the words array.
Also applies to: 110-110, 134-134
78-78: LGTM!The added spell-check vocabulary terms are appropriate technical terms that align with the ctypes overhaul and broader codebase changes.
Also applies to: 110-110, 134-134
crates/vm/src/stdlib/ast/python.rs (1)
50-51: LGTM! No Rust-level references to the old method name detected.The refactoring correctly aligns the Rust method name with the Python attribute name, allowing the simplified
#[pyattr]decorator. All access to the_fieldsattribute uses Python-level dynamic lookup viaget_attr("_fields", vm), ensuring the change is backward compatible and introduces no breaking changes.crates/vm/src/protocol/buffer.rs (1)
205-216: LGTM! Scalar buffer validation correctly implemented.The branching logic properly handles zero-dimensional (scalar) buffers, which is essential for ctypes structures. The validation is sound:
- Scalar case (ndim == 0): Correctly asserts
itemsize == lensince the buffer contains exactly one item.- Multi-dimensional case: Properly validates total size, strides, and suboffsets through shape product calculation.
This change is consistent with existing methods like
for_each_segmentandzip_eq, which already handle thendim() == 0case by returning early with special handling. Buffer creation sites in ctypes structures correctly initialize scalar buffers with an emptydim_descvector to setndim=0.crates/stdlib/src/pystruct.rs (1)
30-41: Use ofPyStr::isasciiis consistent with the new string APISwitching to
s.isascii()matches the renamedPyStrhelper and keeps the ASCII-only invariant for struct format strings without changing behavior.crates/vm/src/buffer.rs (1)
264-312: PEP 3118 extended-spec skipping looks correct; worth validating against edge casesThe new handling for
T{...}/X{...}, array shapes(...), field names:name:, and standalone{/}is internally consistent: it correctly tracks nesting, errors on unmatched delimiters, and cleanly skips constructs the currentFormatTypedoesn’t model. This should allow parsing richer PEP 3118 buffer formats without breaking existing struct codes.Given the subtle semantics, I’d recommend adding/confirming tests for:
- Unmatched
{,(, or:at end of format.- Nested braces/parentheses.
- Formats mixing repeats or regular codes around these extended constructs.
crates/vm/src/exceptions.rs (3)
605-635:__suppress_context__pygetset wiring is coherentExposing
__suppress_context__via a dedicated getter plus the existingset_suppress_contextsetter cleanly matches the Python attribute name and keeps internal use ofAtomicCell<bool>intact, with no behavior change.
1102-1116: Serializer now trackssuppress_contextvia the new dunder accessorSwitching serialization to
self.exc.__suppress_context__()keeps the on-the-wire shape the same while aligning with the new API; no functional issues here.
1552-1562:PyOSErrorlayout andstrerroraccessor updates are consistent with usageAdding
#[repr(C)]toPyOSErroris compatible with its explicitbase: PyExceptionfield and the transparent wrappers for subclasses, and it matches the comments that rely on well-defined layout. Renaming the getter tostrerrorand binding it via#[pygetset]plus#[pygetset(setter, name = "strerror")]keeps the Python attribute name stable while simplifying the Rust side.Also applies to: 1861-1868
crates/vm/src/builtins/str.rs (2)
628-632:str.isasciiimplementation matches internalStrKindinvariantUsing
matches!(self.kind(), StrKind::Ascii)provides an O(1) check that’s consistent with howStrKindis maintained and withbytes/bytearray’sisasciihelpers, while the#[pymethod]name exposes the correct Python method.
963-984:__format__binding rename preserves formatting semanticsRenaming the method to
__format__and relying on a plain#[pymethod]only affects the binding name; the logic (empty-spec fast path, UTF‑8 validation,FormatSpecparsing, and use ofCharLenStr) is unchanged, sostr.__format__behavior stays intact.crates/vm/src/stdlib/functools.rs (1)
76-101:PyPartial.__reduce__rename keeps pickle protocol intactSwitching to a method literally named
__reduce__with a plain#[pymethod]keeps the pickle tuple(partial_type, (func,), state)identical while making the Rust binding clearer; no behavioral concerns.crates/vm/src/builtins/type.rs (1)
1448-1452:get_mroto__mro__renaming is completeThe renaming is clean with no remaining callsites. The
#[pygetset]macro correctly exposes__mro__as a Python property, and the implementation properly preserves behavior viamro_map_collect. Rust code uses internal methods and themrofield directly, not the oldget_mromethod.crates/vm/src/stdlib/ctypes/function.rs (3)
40-77: Wide‐char FFI mapping looks correct, but signedness for'c'is inconsistent with tags
ffi_type_from_tagnow usesWideCharsize to chooseType::u16()vsType::u32()for'u', which correctly tracks platformwchar_tsize and aligns withtype_info("u"). Good change.Note however that
"c"intype_infomaps toType::u8, while the'c'tag here maps toType::i8(). That signed/unsigned mismatch can cause subtle differences between code paths that use_type_vsCArgObject.tagfor the samec_charvalue (e.g. negative bytes). If you care about strict CPython parity, consider unifying these to a single choice (likelyi8).
[ suggest_recommended_refactor ]
573-680:cast_implpointer/value semantics and_objectstracking look correctThe revamped
cast_impl:
- Validates the second argument is a pointer-like type via
cast_check_pointertype.- Extracts the pointer value from integers, bytes, str, pointer instances, simple ctypes and other
PyCDatainstances.- Shares and updates the
_objectsdict betweensrcand the result to keep the source object alive, and- Writes the pointer either into a
PyCPointerresult or directly into the result’s buffer.This matches CPython’s
cast/_objects lifetime behavior and fixes the previous incorrectaddressofhandling. No issues spotted here.
1656-1850: Callback thunk implementation looks solid and mirrors ctypes semanticsThe new
PyCThunk/thunk_callbackpath:
- Builds
arg_types/res_typevia type codes andget_ffi_type.- Converts C arguments to Python in
ffi_to_python, includingz/Z/P/?handling.- Invokes the Python callable and routes exceptions through
run_unraisablewith a helpful message.- Converts the Python result back to C in
python_to_ffi, matching type codes and sizes.- Manages
ThunkUserDatalifetime safely viaThunkDataand aDropimpl.This is a non-trivial and error-prone area; the layering and lifetime management look correct.
crates/vm/src/stdlib/ctypes.rs (10)
118-135:LONG_DOUBLE_SIZEfallback seems reasonable but may mask exotic ABIsThe new
LONG_DOUBLE_SIZEconstants handle:
- 16 bytes on x86_64/aarch64 non‑Windows,
sizeof(c_double)on Windows and the remaining platforms.This matches most common ABIs (80‑bit extended on x86_64, 128‑bit on AArch64, double on Windows). If you need to support more exotic long‑double ABIs, this may need refinement, but for mainstream targets this is a sensible simplification.
142-237:type_infonow correctly tracks platformc_long,c_ulong, andc_wcharsizesThe new
type_info:
- Uses
mem::size_of::<c_long>()/c_ulong()to choosei32/i64andu32/u64.- Uses
mem::size_of::<WideChar>()to chooseu16vsu32for"u".- Maps pointer-like codes to
Type::pointer().This fixes the previous hard‑coded 32‑bit mappings and aligns the FFI types with the actual C layout on LP64 vs LLP64 platforms. Good, no issues here.
499-560:sizeoffor types/instances looks correct and much closer to CPythonThe new
sizeofimplementation:
- Prefers
StgInfo.sizefor ctypes types.- Falls back to struct/union size calculators and simple type codes.
- Returns pointer-size for pointer types and instances, and uses
PyCData::size()for actual cdata.This is substantially more robust and matches CPython’s behavior well; the remaining fallbacks to
sizeof::<usize>()for simple types without_type_seem acceptable as a last resort.
617-685: Pointer type creation and caching are sound, but incomplete-pointer handling is subtle
create_pointer_type:
- Uses
_pointer_type_cachein_ctypesto memoize pointer types.- Produces incomplete pointer types when
clsis astrname, keyed by the result’sid.- Produces complete pointer types when
clsis a ctypes type, keyed bycls.This follows CPython’s approach. One subtlety: the incomplete pointer types are only retrievable by object id, not by the original string name, so calling
POINTER("Foo")twice will create two distinct types. That matches CPython, but it’s worth being aware of; if you ever want to change that, you’d need a mapping keyed on the string instead.
716-724:addressofnow returns actual cdata buffer address (fixes previous bug)Switching
addressofto:if let Some(cdata) = obj.downcast_ref::<PyCData>() { Ok(cdata.buffer.read().as_ptr() as usize) }removes the incorrect
size_of::<usize>()special case forPyCPointerand always returns the underlying buffer address, which is what ctypes users expect. Good fix.
726-762:byrefsemantics and lifetime tracking match CPythonThe new
byref:
- Validates the argument is a ctypes instance.
- Computes
(char *)b_ptr + offsetforPyCSimpleandPyCData, matching CPython’s pointer arithmetic.- Stores the base object in
CArgObject.objand the offset inCArgObject.offset, so lifetime and address are tracked correctly.This is a solid and faithful port of CPython’s
byref.
864-904:resizecorrectly guards ownership and avoids mutating borrowed buffers
resize:
- Verifies the class has initialized
StgInfo(so this is a ctypes instance).- Rejects sizes smaller than the original
size, with appropriateValueError.- Uses
Cowchecks to prevent resizing borrowed memory (Cow::Borrowed), returning a clear error.- Reallocates and copies the buffer safely when owned.
This is a careful implementation and should be safe.
1043-1091:call_function_internalraw pointer usage looks safe given argument lifetimesThe helper:
- Validates
func_addr != 0.- Builds
arg_values: Vec<isize>andffi_args: Vec<Arg>that point into this stack-allocated vector.- For
bytesandstr, it takes pointers into the underlying buffers but the originalargstuple (and its elements) live at least until the FFI call completes, so there is no dangling-pointer risk here.The conversion set is intentionally small and the error message (“Don’t know how to convert parameter …”) is clear. No changes needed.
1144-1182: WindowsFormatErrorhelper looks correct and frees the buffer
FormatError:
- Fetches or uses the provided error code.
- Calls
FormatMessageWwithFORMAT_MESSAGE_ALLOCATE_BUFFER.- Handles the zero-length / null-buffer case with a safe
"<no description>"fallback.- Converts to UTF‑16, trims the trailing newline, and calls
LocalFreeon the buffer.This is a good, well-scoped wrapper around the Windows APIs.
1184-1244:CopyComPointercorrectly handles AddRef semantics and byref destinationThe COM helper:
- Locates the destination pointer-to-pointer via a
CArgObjectfrombyref, returningE_POINTERwhen the shape is wrong or pointers are NULL.- Reads the source COM pointer from a ctypes instance buffer or treats
Noneas NULL.- Calls
AddRefwhensrc_ptr != 0by reading the IUnknown vtable and invoking the function at index 1.- Stores the pointer into
*pdstand returnsS_OK.This matches expected COM semantics (copy and AddRef) and should interoperate well with ctypes COM usages.
crates/vm/src/stdlib/ctypes/array.rs (7)
137-278: PyCArrayType initialization and_length_/_type_validation look robustThe
InitializerforPyCArrayType:
- Validates
_length_is a non-negative integer fitting inusize, with clearTypeError/ValueError/OverflowErrormessages.- Validates
_type_is a proper ctypes type with initializedStgInfo.- Checks for
item_size * lengthoverflow.- Propagates flags and adds char/wchar getsets based on the element type code.
This is a solid and CPython-like initialization flow for array subtypes.
315-351:PyCArrayType.__mul__overflow guard is good, but relies onstg_info.sizeIn
AsNumberforPyCArrayType, you:
- Convert
bto anisizeand reject negatives.- Use
stg_info.sizeand anisize::MAX / current_sizecheck to preventcurrent_size * noverflow before callingarray_type_from_ctype.That’s the right pattern; just be aware it uses the existing array type’s total size as the “element size” for the new array-of-arrays type, which is consistent with how
(T * n) * mbehaves in ctypes. Combined with the earlier comment about guardingelement_size * lengthinsidearray_type_from_ctype, this will give you consistent safety.
441-523:int_to_bytes/bytes_to_intcorrectly handle signed/unsigned and size variantsThe helpers:
- Try unsigned conversions first for 1/2/4/8 bytes and fall back to signed for negative values.
- Inverse that mapping when reading back from bytes based on the presence of unsigned type codes (
B,H,I,L,Q).This matches how ctypes treats integral arrays and should round-trip typical values correctly.
668-801:write_element_to_buffernicely unifies type-specific writes and keep_ref handlingThe writer:
- Handles
c,u,z,Z,f,d/g, ctypes instances, and plain ints.- Uses
ensure_z_null_terminatedandstr_to_wchar_bytesto allocate backing buffers and then stores their holders viakeep_ref, tying lifetime to the array element.- Calls
PyCData::should_keep_ref/get_kept_objectsfor other ctypes instances to keep backing objects alive where needed.This is a big improvement in clarity and correctness over ad-hoc per‑type code; no issues spotted.
844-1014: Slice get/set implementations match CPython semantics, including char/wchar special cases
__getitem__/__setitem__:
- Delegate index vs slice handling to
getitem_by_index/setitem_by_indexandgetitem_by_slice/setitem_by_slice.getitem_by_slicereturns:
bytesforc_chararrays,strforc_wchararrays (usingWCHAR_SIZEandwchar_from_bytes), and- a list for other types.
setitem_by_sliceusesSaturatedSliceIterto correctly handle negative steps and enforces size equality between source and target.These behaviors are consistent with CPython’s ctypes array slicing.
1030-1072: Array buffer interface correctly exposes PEP 3118 metadata when available
as_buffer:
- Pulls
format,shape, andelement_sizefrom the type’sStgInfo.- Computes C-contiguous strides from innermost to outermost dimension.
- Falls back to a simple 1D buffer when no format/shape info is available.
This should integrate cleanly with NumPy and other PEP 3118 consumers.
1077-1210: Char/wchar array getsets and helpers are well-implementedThe dynamically added getsets:
- Provide
value/rawforc_chararrays andvalueforc_wchararrays.- Use
wchar_from_bytes/wchar_to_bytesandwstring_from_byteswithWCHAR_SIZEto handle platform-dependentwchar_t.These helpers give a convenient, CPython-like API while respecting platform width differences.
crates/vm/src/stdlib/ctypes/library.rs (3)
19-43: SharedLibrary abstraction and close semantics look fine
SharedLibrarywrapslibloading::Libraryin aPyMutex<Option<_>>and provides:
close()that setslibtoNone,Dropthat callsclose()again.Using the
Library’s address as the key for the handle is a bit unconventional but internally consistent: as long as you always go throughlibcache()to create/load libraries, the numeric handle you hand to Python uniquely identifies that instance. No issues from a correctness standpoint.
66-86:get_or_insert_libdoesn’t deduplicate by path, but behavior is consistentThe new
get_or_insert_lib:let new_lib = SharedLibrary::new(library_path)?; let key = new_lib.get_pointer(); match self.libraries.get(&key) { ... } self.libraries.insert(key, new_lib); Ok((key, self.libraries.get(&key).expect("just inserted")))This effectively means each call returns a fresh handle (keyed by the new
SharedLibrarypointer). There’s no deduplication for the same path, but ctypes doesn’t strictly require it, and thedrop_libpath will correctly drop whatever handle the caller passes back.If you ever want re-use by path (e.g. to share handles for the same library), you’d need a second map keyed by
OsString, but the current behavior is still correct.
93-98:libcacherefactor tostatic_cell!is a good improvementMoving
LIBCACHEintolibcache()viastatic_cell!:
- Avoids manual
lazy_static-style boilerplate,- Keeps visibility scoped to this module via
pub(super),- Still returns a stable
'staticreference to a sharedPyRwLock<ExternalLibs>.This is a neat and idiomatic way to manage the global ctypes library cache.
crates/vm/src/stdlib/ctypes/structure.rs (4)
66-129: PyCStructType initialization flow is well-structured and handles inheritance correctlyThe
InitializerforPyCStructType:
- Skips initialization for abstract types (
_abstract_).- Uses
check_not_initializedto prevent double init.- Processes direct
_fields_viaprocess_fields, or- Clones
StgInfofrom the base class, clearingFINALand then marking the base as final, or- Sets a default
StgInfofor types without_fields_.This mirrors CPython’s behavior and avoids RwLock reentrancy by carefully dropping guards before taking mutable references. Nicely done.
169-387:process_fieldsis comprehensive; just ensureget_field_formatcovers all relevant ctypesThe field-processing logic:
- Validates
_fields_type, packing, and_align_.- Inherits size/align/flags/FFI field types from the base.
- Computes padding and alignment per field, including array shapes.
- Builds a PEP 3118
T{...}format string with field names and padding.- Propagates pointer/union/bitfield flags into the type’s
StgInfo.This is a faithful port of CPython’s structure layout logic. The only assumption is that
get_field_formatcorrectly covers all ctypes-y field types; given that’s centralized inbase, this is the right place to delegate.
427-491:SetAttrforPyCStructTypecorrectly protects_fields_and respects descriptorsThe custom
setattro:
- Rejects modifications to
_fields_once the type is final.- Uses the descriptor protocol to let data descriptors (e.g. slots) handle assignments before falling back to plain attribute storage.
- Calls
process_fieldswhenever_fields_is set, updatingStgInfoand descriptors accordingly.This avoids deadlocks by checking
StgInfofinality in a short-lived scope and then performing the heavier work afterwards. Looks good.
598-636: Structure buffer interface is consistent with union and exposes correct itemsize
AsBuffer for PyCStructure:
- Uses the current buffer length as both
lenanditemsize, treating the structure as a scalar object.- Pulls
formatfrom typeStgInfowith"B"as a fallback, and usesdim_desc: vec![]for a 0‑D buffer.This matches the new union behavior and should integrate cleanly with buffer consumers.
crates/vm/src/stdlib/ctypes/union.rs (6)
66-133: PyCUnionType initialization mirrors structure logic and handles inheritance/finality wellThe
Initializer:
- Skips
_abstract_types.- Uses
check_not_initializedto avoid double init.- If
_fields_is present directly, delegates toset_attrto triggersetattroandprocess_fields.- Otherwise, clones
StgInfofrom the base (clearingFINAL) or builds a default unionStgInfowithTYPEFLAG_HASUNIONand a"B"format.This is a clean and consistent design with the structure side.
136-296: Unionprocess_fieldscorrectly computes max size/align and updates flagsFor each field in
_fields_,process_fields:
- Validates the tuple structure and name type.
- Uses base helpers to get size/align and apply packing.
- Tracks the maximum field size and effective alignment.
- Propagates pointer and bitfield flags, marks field types as final, and collects FFI field types.
- Sets every field’s offset to 0 (union semantics).
- Computes an aligned total size and stores a
StgInfowithTYPEFLAG_HASUNION,DICTFLAG_FINAL, and a"B"format.This is exactly what we want for a union layout engine.
299-345:from_paramfor unions provides good diagnostics and handles CArgObject/protoThe new
from_param:
- Reuses an existing instance of the requested union type.
- Accepts a
CArgObjectwhen its wrapped object matches the union’sprotoor the union type itself, with a clear error when the pointer is to the wrong type.- Recurses through
_as_parameter_when needed.This matches ctypes customization patterns and gives clear error messages on misuse.
359-413: CustomSetAttrforPyCUnionTypecorrectly hooks_fields_without breaking descriptors
setattro:
- First defers to any data descriptor on the metatype, and if the attribute is
_fields_, runsprocess_fieldsafterwards.- Otherwise, stores/removes attributes in the type’s dict.
- When
_fields_is assigned directly, again callsprocess_fieldssoStgInfoand CField descriptors are created.This is symmetric with the structure implementation and should behave well under subclassing.
432-449: Union instance creation correctly usesStgInfoand finalizes the type
PyCUnion::slot_new:
- Reads
sizeandalignfrom the type’sStgInfo.- Marks the type as final on first instantiation.
- Allocates a zero-initialized buffer via
PyCData::from_stg_info.This keeps layout metadata centralized in
StgInfoand aligns with the structure side.
521-545: UnionInitializermirrors structure init and handles extra args nicely
PyCUnion::init:
- Walks positional args through the inheritance chain with
init_pos_args, checking for “too many initializers”.- Then applies keyword arguments as attribute assignments.
This is consistent with
Structureinitialization and should make union construction intuitive.
| fn convert_object(&self, value: PyObjectRef, vm: &VirtualMachine) -> PyResult<FfiArgValue> { | ||
| // Call from_param first to convert the value (like CPython's callproc.c:1235) | ||
| // converter = PyTuple_GET_ITEM(argtypes, i); | ||
| // v = PyObject_CallOneArg(converter, arg); | ||
| let from_param = self | ||
| .as_object() | ||
| .get_attr(vm.ctx.intern_str("from_param"), vm)?; | ||
| let converted = from_param.call((value.clone(),), vm)?; | ||
|
|
||
| // Then pass the converted value to ConvParam logic | ||
| // CArgObject (from from_param) -> use stored value directly | ||
| if let Some(carg) = converted.downcast_ref::<CArgObject>() { | ||
| return Ok(carg.value); | ||
| } | ||
|
|
||
| // None -> NULL pointer | ||
| if vm.is_none(&converted) { | ||
| return Ok(FfiArgValue::Pointer(0)); | ||
| } | ||
|
|
||
| // For pointer types (POINTER(T)), we need to pass the ADDRESS of the value's buffer | ||
| if self.fast_issubclass(PyCPointer::static_type()) { | ||
| if let Some(cdata) = converted.downcast_ref::<PyCData>() { | ||
| let addr = cdata.buffer.read().as_ptr() as usize; | ||
| return Ok(FfiArgValue::Pointer(addr)); | ||
| } | ||
| return convert_to_pointer(&converted, vm); | ||
| } | ||
|
|
||
| // For structure types, convert to pointer to structure | ||
| if self.fast_issubclass(PyCStructure::static_type()) { | ||
| return convert_to_pointer(&converted, vm); | ||
| } | ||
|
|
||
| // Get the type code for this argument type | ||
| let type_code = self | ||
| .as_object() | ||
| .get_attr(vm.ctx.intern_str("_type_"), vm) | ||
| .ok() | ||
| .and_then(|t| t.downcast_ref::<PyStr>().map(|s| s.to_string())); | ||
|
|
||
| // For pointer types (c_void_p, c_char_p, c_wchar_p), handle as pointer | ||
| if matches!(type_code.as_deref(), Some("P") | Some("z") | Some("Z")) { | ||
| return convert_to_pointer(&converted, vm); | ||
| } | ||
|
|
||
| // PyCSimple (already a ctypes instance from from_param) | ||
| if let Ok(simple) = converted.clone().downcast::<PyCSimple>() { | ||
| let typ = ArgumentType::to_ffi_type(self, vm)?; | ||
| let arg = simple | ||
| .to_arg(typ, vm) | ||
| .ok_or(vm.new_type_error("Unsupported argument type".to_string()))?; | ||
| return Ok(arg); | ||
| let ffi_value = simple | ||
| .to_ffi_value(typ, vm) | ||
| .ok_or(vm.new_type_error("Unsupported argument type"))?; | ||
| return Ok(ffi_value); | ||
| } | ||
| Err(vm.new_type_error("Unsupported argument type".to_string())) | ||
|
|
||
| Err(vm.new_type_error("Unsupported argument type")) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArgumentType::convert_object may drop from_param temporaries before the FFI call
convert_object calls the type’s from_param, uses the returned converted value to build an FfiArgValue, and then returns only the FfiArgValue. The converted object itself is not kept anywhere:
let converted = from_param.call((value.clone(),), vm)?;
// ...
return Ok(FfiArgValue::Pointer(addr)); // or from PyCSimple / convert_to_pointerIf a custom from_param allocates a fresh ctypes object (or bytes/str buffer) that isn’t otherwise referenced from the original argument, that temporary can be dropped as soon as convert_object returns. The FFI call will still use its raw pointer, risking use‑after‑free.
CPython avoids this by keeping the converted objects alive for the duration of the call (e.g. via a converters/keepalive list).
You should keep converted objects alive until after ctypes_callproc returns — for example, by:
- Extending the API so
convert_objectalso returns an optional keepalivePyObjectRef, and - Storing all such keepalives in the
PreparedArgsor on thePyCFuncPtrinstance for the duration of the call.
| Some("Z") => { | ||
| // 1. str → create c_wchar_p | ||
| if value.downcast_ref::<PyStr>().is_some() { | ||
| return create_simple_with_value("Z", &value); | ||
| } | ||
| // 2. Array/Pointer with c_wchar element type | ||
| if is_cwchar_array_or_pointer(&value, vm) { | ||
| return Ok(value); | ||
| } | ||
| // 3. CArgObject (byref(c_wchar(...))) | ||
| if let Some(carg) = value.downcast_ref::<CArgObject>() | ||
| && carg.tag == b'u' | ||
| { | ||
| return Ok(value.clone()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, examine the from_param "Z" case (lines 302-316 plus context)
sed -n '295,325p' crates/vm/src/stdlib/ctypes/simple.rs | cat -nRepository: RustPython/RustPython
Length of output: 1646
🏁 Script executed:
# Examine the value_to_bytes_endian "Z" case (lines 900-908 plus context)
sed -n '895,920p' crates/vm/src/stdlib/ctypes/simple.rs | cat -nRepository: RustPython/RustPython
Length of output: 1212
🏁 Script executed:
# Examine the "z" PyBytes handling case (lines 277-300)
sed -n '270,310p' crates/vm/src/stdlib/ctypes/simple.rs | cat -nRepository: RustPython/RustPython
Length of output: 2089
🏁 Script executed:
# Check if create_simple_with_value calls value_to_bytes_endian
sed -n '215,240p' crates/vm/src/stdlib/ctypes/simple.rs | cat -nRepository: RustPython/RustPython
Length of output: 1435
🏁 Script executed:
# Search for any special PyStr handling in the file
rg -n 'PyStr.*"Z"|"Z".*PyStr|make_wchar_buffer' crates/vm/src/stdlib/ctypes/simple.rsRepository: RustPython/RustPython
Length of output: 152
🏁 Script executed:
# Check recent commits that touched this file to see if this was supposedly "addressed"
git log --oneline -20 -- crates/vm/src/stdlib/ctypes/simple.rsRepository: RustPython/RustPython
Length of output: 83
🏁 Script executed:
# Search for commits around the hashes mentioned in the review (1707861 to 2bf7dac)
git log --oneline --all | grep -E '1707861|2bf7dac'Repository: RustPython/RustPython
Length of output: 83
from_param for c_wchar_p ("Z") produces NULL pointers for PyStr values
When from_param receives a PyStr for type code "Z" (c_wchar_p), it calls create_simple_with_value("Z", &value) (line 311). This function calls value_to_bytes_endian("Z", val, false, vm) (line 223), which only handles integer addresses via try_index(vm) and returns a zero-filled buffer (NULL pointer) for PyStr:
"Z" => {
// PyStr case is handled in slot_new/set_value with make_wchar_buffer()
if let Ok(int_val) = value.try_index(vm) {
let v = int_val.as_bigint().to_usize().unwrap_or(0);
return to_bytes!(v);
}
vec![0; std::mem::size_of::<usize>()] // ← returns NULL for PyStr
}The comment about slot_new/set_value is misleading: from_param bypasses those code paths. In contrast, the "z" case (c_char_p) at lines 278-287 has direct special handling to extract the actual bytes pointer. The "Z" case should have equivalent handling to create a wchar buffer pointer, not return NULL.
🤖 Prompt for AI Agents
crates/vm/src/stdlib/ctypes/simple.rs around lines 302-316: from_param for "Z"
currently routes PyStr through create_simple_with_value which ends up returning
a zeroed (NULL) pointer because value_to_bytes_endian only handles integer
addresses; change from_param to special-case PyStr like the "z" (c_char_p)
branch: when value is a PyStr, allocate/produce a wchar buffer (using the
existing make_wchar_buffer logic or equivalent conversion to wide chars), store
its pointer in the argument representation (or return the proper
CArgObject/array/pointer wrapper) and return that instead of falling back to
create_simple_with_value; keep the existing checks for cwchar arrays/pointers
and CArgObject intact.
2bf7dac to
5ebd684
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
crates/vm/src/stdlib/ctypes/library.rs (1)
5-43: Check the cache before callingSharedLibrary::newinget_or_insert_libLine 71 eagerly calls
SharedLibrary::new(library_path)?(which invokesunsafe { Library::new(..) }) before checking if a library with the same path is already cached. This means:
- Every call triggers dlopen and library initialization routines, even when an equivalent library exists in the cache
- While modern Unix systems return the same handle for repeated dlopen calls on the same path and only run init once, this is not guaranteed across all platforms or if different path representations (relative vs. absolute, different symlink resolutions) are passed
- If the path provided differs even when the absolute path is the same, different file handles are returned, potentially triggering initialization multiple times
Refactor
get_or_insert_libto check the cache by library path before constructing a newSharedLibrary. You can check if a path is already loaded (usingdlopenwithRTLD_NOLOADflag or by maintaining a path-based lookup) before callingSharedLibrary::new.crates/vm/src/builtins/builtin_func.rs (1)
117-121:__self__forbuiltin_function_or_methodalways returnsNoneinstead of the bound objectCPython exposes
__self__for built-in methods as either the bound object orNone. The current implementation at lines 117-120 always returnsNone, discarding the bound object information already maintained byPyNativeFunction.zelf.The
PyNativeFunctionstruct stores the bound object in itszelffield and aget_self()method already exists to access it. However, the__self__getter ignores this:#[pygetset] fn __self__(_zelf: PyObjectRef, vm: &VirtualMachine) -> PyObjectRef { vm.ctx.none() }By contrast,
PyNativeMethod.__self__(lines 184-187) correctly returns the bound object viazelf.func.zelf.clone().Align
PyNativeFunction.__self__with the existing pattern by routing it throughNativeFunctionOrMethodandget_self():#[pygetset] fn __self__(zelf: NativeFunctionOrMethod, vm: &VirtualMachine) -> PyObjectRef { match zelf.0.get_self() { Some(obj) => obj.clone().into(), None => vm.ctx.none(), } }This ensures both
builtin_function_or_methodandbuiltin_methodexpose the correct bound object to introspection tools.crates/vm/src/stdlib/ctypes.rs (1)
704-714: Public_check_HRESULTusestodo!()and will panic at runtimeOn Windows, calling
_ctypes._check_HRESULTwill unconditionally hittodo!(), aborting the VM instead of raising a Python exception. This should at least raise a proper error or delegate to the fully implementedcheck_hresultinfunction.rs.Suggested minimal fix
- fn check_hresult(_self: PyObjectRef, hr: i32, _vm: &VirtualMachine) -> PyResult<i32> { - // TODO: fixme - if hr < 0 { - // vm.ctx.new_windows_error(hr) - todo!(); - } else { - Ok(hr) - } - } + fn check_hresult(_self: PyObjectRef, hr: i32, vm: &VirtualMachine) -> PyResult<i32> { + if hr < 0 { + return Err(vm.new_os_error(format!("HRESULT: 0x{:08X}", hr as u32))); + } + Ok(hr) + }
♻️ Duplicate comments (13)
crates/vm/src/stdlib/ctypes/pointer.rs (1)
528-596: Negative pointer indices cause pathologicalkeep_refbehaviorThis issue was flagged in a previous review and remains unaddressed. Lines 580, 588, and 595 cast
index(anisize) tousizeforkeep_ref. Whenindexis negative (valid for pointer arithmetic), it wraps to a huge value (e.g.,-1→usize::MAX), causing issues in the internal_objectstracking.🔎 Suggested fix
Either reject negative indices for writes, or use a normalized key for reference tracking:
fn setitem_by_index( zelf: &Py<Self>, index: isize, value: PyObjectRef, vm: &VirtualMachine, ) -> PyResult<()> { + // Use a stable key for keep_ref (negative indices are valid for pointer arithmetic + // but problematic for reference tracking) + let keep_ref_key = index.unsigned_abs(); + // ... existing code ... - return zelf.0.keep_ref(index as usize, converted, vm); + return zelf.0.keep_ref(keep_ref_key, converted, vm); // ... and similarly for other keep_ref calls ...crates/vm/src/stdlib/ctypes.rs (2)
716-762:addressof/byrefsemantics align with CPython and earlier review feedback
addressofnow consistently returns the underlyingPyCDatabuffer address (including pointer types), andbyrefcomputes the pointer asbuffer_ptr + offsetfor bothPyCSimpleandPyCData. This matches CPython’s semantics and fixes the earlierPyCPointerspecial-case bug.
1093-1105:PyObj_FromPtrexposes unsafe arbitrary pointer →PyObjectRefconversion to Python
PyObj_FromPtrtakes an arbitraryusize, does only a NULL check, and then constructs aPyObjectRefviafrom_raw(NonNull::new_unchecked(raw_ptr)). Any Python code can pass an invalid address here, leading to:
- Undefined behavior inside Rust (invalid
NonNull),- Use-after-free on stale pointers,
- Silent memory corruption or crashes.
This is much less controlled than normal ctypes FFI (which at least goes through libffi and explicit C signatures) and breaks the VM’s safety guarantees.
If you want to keep a CPython-like escape hatch, it should not be directly exposed as a normal
_ctypesfunction:
- Either remove this pyfunction, or
- Gate it behind a debug / internal feature (
cfg(feature = "internal_unsafe_api")) and mark itunsafeon the Rust side with clear documentation for internal callers, or- At minimum, hide it from Python-level code and only use it in tightly controlled internals.
Suggested safe-facing change
- #[pyfunction(name = "PyObj_FromPtr")] - fn py_obj_from_ptr(ptr: usize, vm: &VirtualMachine) -> PyResult { + // Do not expose directly to Python; keep as internal unsafe helper. + pub(crate) unsafe fn py_obj_from_ptr_unsafe(ptr: usize, vm: &VirtualMachine) -> PyResult { if ptr == 0 { return Err(vm.new_value_error("NULL pointer access")); } let raw_ptr = ptr as *mut crate::object::PyObject; - unsafe { - let obj = crate::PyObjectRef::from_raw(std::ptr::NonNull::new_unchecked(raw_ptr)); - let obj = std::mem::ManuallyDrop::new(obj); - Ok((*obj).clone()) - } + let obj = crate::PyObjectRef::from_raw( + std::ptr::NonNull::new(raw_ptr) + .ok_or_else(|| vm.new_value_error("invalid object pointer"))?, + ); + let obj = std::mem::ManuallyDrop::new(obj); + Ok((*obj).clone()) }and drop the
#[pyfunction]export for now.crates/vm/src/stdlib/ctypes/union.rs (2)
12-29: Union size calculation now fails fast on malformed_fields_
calculate_union_sizenow usestry_to_value(vm)?and only falls back toOk(0)when_fields_is missing, not when it’s malformed. That fixes the previous “silent zero-sized union” behavior while still allowing empty unions.
553-577: Union buffer descriptor now uses 0-D scalar layout, matchingStructure
PyCUnion::as_buffernow constructs aBufferDescriptorwithdim_desc: vec![]anditemsize = buffer_len, using theStgInfoformat when available. That brings unions in line with structures as scalar (0-D) ctypes values and addresses the earlier inconsistency.crates/vm/src/stdlib/ctypes/simple.rs (3)
59-179: Primitive coercion logic for simple ctypes is much closer to CPython now
set_primitive’s handling of:
"z"/"Z"(acceptingNone,int, and the appropriatebytes/str),- numeric integer and float codes via
try_index/try_float,"?"and"v"viatry_to_bool,mirrors CPython’s ctypes expectations and fixes the earlier regressions around
c_char_p(None)/c_wchar_p(None)and VARIANT_BOOL.
268-413:from_paramfor pointer-like simple types (z/Z/P) correctly builds owned buffersThe new
from_paramlogic:
- For
"z": usesensure_z_null_terminatedandFfiArgValue::OwnedPointerto keep the bytes alive.- For
"Z": usesstr_to_wchar_byteswith an owned holder.- For
"P": supports ints, bytes, str, ctypes arrays/pointers,CArgObject,PyCFuncPtr, andc_char_p/c_wchar_pinstances.This addresses earlier issues where
from_paramcould produce NULL pointers forbytes/strand improves lifetime guarantees viaOwnedPointer.
675-734:c_wcharserialization now respects platformwchar_tsizeThe
"u"arm ofvalue_to_bytes_endiannow usesWCHAR_SIZEandwchar_to_bytes, rather than hardcoding 4 bytes. This alignsc_wcharwithWCHAR_SIZEused elsewhere (e.g., arrays) and fixes the previous platform mismatch on Windows vs Unix.crates/vm/src/stdlib/ctypes/function.rs (5)
33-77: Centralizing FFI type lookup throughtype_infois a solid improvement
get_ffi_typeand the updatedffi_type_from_tag:
- Reuse the shared
type_infomapping,- Handle
c_long/c_ulongviasize_of::<libc::c_long/ulong>(),- Map
u(c_wchar) usingsize_of::<super::WideChar>()to chooseu16vsu32.That keeps libffi type selection consistent across the module.
247-303: Remaining caveat: customfrom_paramimplementations still need to manage lifetimes carefully
ArgumentType::convert_objectnow correctly:
- Uses
CArgObject.valuedirectly whenfrom_paramreturns aCArgObject(includingOwnedPointerkeepalives),- Treats
Noneas NULL,- Delegates pointer-like types to
convert_to_pointer,- Extracts primitive values from
PyCSimpleviato_ffi_value.For builtin ctypes types this is enough to keep buffers alive across the call, but user-defined
from_paramimplementations that don’t return aCArgObjector a ctypes instance may still accidentally create temporaries whose buffers are dropped before the FFI call. That’s essentially the same concern as the earlier review.Consider documenting that third‑party
from_paramshould either:
- Return a
CArgObjectwith appropriatevalue/obj/keepalive semantics, or- Return a ctypes instance whose backing memory is kept alive elsewhere.
785-830: COM-style constructor branch still creates unusableCFuncPtron non-WindowsThe
(index, name, [paramflags], [iid])constructor path is compiled on all platforms, but:
- On Windows it’s fine:
index/iidare stored andresolve_com_methodsupplies the actual function pointer at call time.- On non-Windows, this branch still runs, but
index/iiddon’t exist,_base.bufferis all zeros, andcallends up building aCodePtrfrom address 0 and calling through libffi.That revives the earlier issue: COM-style construction on non-Windows builds a broken object which later calls a NULL function pointer.
You should either gate this branch behind
#[cfg(windows)]or reject it explicitly on non-Windows.Suggested fix
- if args.args.len() >= 2 - && first_arg.try_int(vm).is_ok() - && args.args[1].downcast_ref::<PyStr>().is_some() - { + #[cfg(windows)] + if args.args.len() >= 2 + && first_arg.try_int(vm).is_ok() + && args.args[1].downcast_ref::<PyStr>().is_some() + { // existing Windows COM constructor logic... // (index, iid, paramflags) return PyCFuncPtr { /* unchanged */ } .into_ref_with_type(vm, cls) .map(Into::into); } + + #[cfg(not(windows))] + if args.args.len() >= 2 + && first_arg.try_int(vm).is_ok() + && args.args[1].downcast_ref::<PyStr>().is_some() + { + return Err(vm.new_type_error( + "COM-style CFuncPtr construction is only supported on Windows", + )); + }
1510-1564: OUT parameters still discard the function’s own return valueWhen
prepared.out_buffersis non-empty,build_resultignores the function’s (potentiallyerrcheck-processed) return value and returns only the OUT/INOUT arguments. ctypes normally returns a tuple(result, *outs)in that situation.This matches an earlier review concern and is still present.
You likely want:
result_obj = result.unwrap_or_else(|| vm.ctx.none())as the first element, followed by extracted out values.Suggested fix
- if prepared.out_buffers.is_empty() { - return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none())); - } - - let out_values = extract_out_values(prepared.out_buffers, vm); - Ok(match <[PyObjectRef; 1]>::try_from(out_values) { - Ok([single]) => single, - Err(v) => PyTuple::new_ref(v, &vm.ctx).into(), - }) + if prepared.out_buffers.is_empty() { + return result.map(Ok).unwrap_or_else(|| Ok(vm.ctx.none())); + } + + let mut values = Vec::new(); + // First element: function result (after errcheck), or None + values.push(result.unwrap_or_else(|| vm.ctx.none())); + // Followed by OUT / INOUT values + values.extend(extract_out_values(prepared.out_buffers, vm)); + + Ok(match <[PyObjectRef; 1]>::try_from(values) { + Ok([single]) => single, + Err(v) => PyTuple::new_ref(v, &vm.ctx).into(), + })
1589-1597: Calling aCFuncPtrwith a NULL function pointer still calls through address 0The new
code_ptrresolution replaces the previousexpectpanic with:let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) { Some(cp) => cp, None => { debug_assert!(false, "NULL function pointer"); CodePtr(std::ptr::null_mut()) } };In release builds this will happily call libffi with a null
CodePtr, which is undefined behaviour and likely to crash the process. CPython’s ctypes is “unsafe”, but in Rust this crosses into UB territory.A safer choice is to raise a Python
ValueError("NULL function pointer")instead of constructing a nullCodePtr.Suggested change
- let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) { - Some(cp) => cp, - None => { - debug_assert!(false, "NULL function pointer"); - // In release mode, this will crash like CPython - CodePtr(std::ptr::null_mut()) - } - }; + let code_ptr = match func_ptr.or_else(|| zelf.get_code_ptr()) { + Some(cp) => cp, + None => { + return Err(vm.new_value_error("NULL function pointer")); + } + };
🧹 Nitpick comments (5)
crates/vm/src/stdlib/ctypes/structure.rs (1)
13-29:calculate_struct_sizenow surfaces errors instead of silently defaultingSwitching from
unwrap_or_default()tofields_attr.try_to_value(vm)?is a good change: invalid_fields_contents will now raise instead of quietly treating the structure as size 0, which makes debugging much easier.If you ever touch this again, you might also consider guarding against
total_sizeusizeoverflow when summing many large field sizes, but that’s a very edge‑case scenario.crates/vm/src/stdlib/ctypes/pointer.rs (1)
664-682: Silent value truncation on integer overflowUsing
unwrap_or(0)silently truncates values that don't fit in the target type. For example, writing256to ac_charresults in0being written instead of raising an error.🔎 Suggested fix with proper error handling
match size { 1 => { - *ptr = i.to_u8().unwrap_or(0); + *ptr = i.to_u8().ok_or_else(|| { + vm.new_overflow_error("value out of range for c_char") + })?; } 2 => { - *(ptr as *mut i16) = i.to_i16().unwrap_or(0); + std::ptr::write_unaligned( + ptr as *mut i16, + i.to_i16().ok_or_else(|| { + vm.new_overflow_error("value out of range for c_short") + })?, + ); } // ... similar for other sizes }crates/vm/src/stdlib/ctypes.rs (2)
388-394: DeriveSIZEOF_TIME_Tfromlibc::time_tinstead of hardcodingHardcoding
SIZEOF_TIME_Tto 8 on Windows and 4 elsewhere can be wrong on some Unix targets and diverges from CPython’s “reflect actual C type” behavior. Prefer computing this once fromlibc::time_t.Suggested change
- #[pyattr] - const SIZEOF_TIME_T: usize = 8; + #[pyattr] + const SIZEOF_TIME_T: usize = std::mem::size_of::<libc::time_t>(); @@ - #[pyattr] - const SIZEOF_TIME_T: usize = 4; + #[pyattr] + const SIZEOF_TIME_T: usize = std::mem::size_of::<libc::time_t>();
1107-1117:Py_INCREF/Py_DECREFare no-ops – document or restrict usageThe exported
Py_INCREF/Py_DECREFfunctions currently just return the object unchanged and don’t affect refcounts. That’s probably acceptable for now, but:
- It’s observably different from CPython,
- And misleading for anyone expecting real refcount semantics.
Consider either documenting them as stubs in
_ctypes(e.g., in the docstring) or gating them behind a feature until a correct implementation is provided.crates/vm/src/stdlib/ctypes/simple.rs (1)
957-977: Avoid panicking on missingStgInfoinis_cwchar_array_or_pointer
is_cwchar_array_or_pointerusesexpect("array has StgInfo")/expect("pointer has StgInfo"). If someone constructs an incomplete array/pointer type or a misconfigured ctypes type withoutStgInfo, this will panic the VM instead of raising a Python exception.You can preserve the fast path but turn misconfigurations into
TypeError:Suggested change
- if let Some(arr) = value.downcast_ref::<PyCArray>() { - let info = arr.class().stg_info_opt().expect("array has StgInfo"); - let elem_type = info.element_type.as_ref().expect("array has element_type"); + if let Some(arr) = value.downcast_ref::<PyCArray>() { + let info = match arr.class().stg_info_opt() { + Some(info) => info, + None => return false, + }; + let Some(elem_type) = info.element_type.as_ref() else { + return false; + }; @@ - if let Some(ptr) = value.downcast_ref::<PyCPointer>() { - let info = ptr.class().stg_info_opt().expect("pointer has StgInfo"); + if let Some(ptr) = value.downcast_ref::<PyCPointer>() { + let Some(info) = ptr.class().stg_info_opt() else { + return false; + };
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
.cspell.dict/python-more.txt(2 hunks).cspell.json(3 hunks)crates/stdlib/src/pystruct.rs(1 hunks)crates/vm/src/buffer.rs(1 hunks)crates/vm/src/builtins/builtin_func.rs(2 hunks)crates/vm/src/builtins/function.rs(1 hunks)crates/vm/src/builtins/object.rs(1 hunks)crates/vm/src/builtins/str.rs(2 hunks)crates/vm/src/builtins/type.rs(1 hunks)crates/vm/src/exceptions.rs(4 hunks)crates/vm/src/object/core.rs(1 hunks)crates/vm/src/protocol/buffer.rs(1 hunks)crates/vm/src/stdlib/ast/python.rs(1 hunks)crates/vm/src/stdlib/codecs.rs(2 hunks)crates/vm/src/stdlib/ctypes.rs(14 hunks)crates/vm/src/stdlib/ctypes/array.rs(5 hunks)crates/vm/src/stdlib/ctypes/function.rs(7 hunks)crates/vm/src/stdlib/ctypes/library.rs(5 hunks)crates/vm/src/stdlib/ctypes/pointer.rs(3 hunks)crates/vm/src/stdlib/ctypes/simple.rs(1 hunks)crates/vm/src/stdlib/ctypes/structure.rs(10 hunks)crates/vm/src/stdlib/ctypes/thunk.rs(0 hunks)crates/vm/src/stdlib/ctypes/union.rs(4 hunks)crates/vm/src/stdlib/ctypes/util.rs(0 hunks)crates/vm/src/stdlib/functools.rs(1 hunks)crates/vm/src/stdlib/operator.rs(1 hunks)crates/vm/src/types/structseq.rs(2 hunks)
💤 Files with no reviewable changes (2)
- crates/vm/src/stdlib/ctypes/util.rs
- crates/vm/src/stdlib/ctypes/thunk.rs
🚧 Files skipped from review as they are similar to previous changes (9)
- crates/stdlib/src/pystruct.rs
- crates/vm/src/builtins/function.rs
- crates/vm/src/types/structseq.rs
- crates/vm/src/buffer.rs
- crates/vm/src/object/core.rs
- crates/vm/src/stdlib/codecs.rs
- crates/vm/src/stdlib/functools.rs
- crates/vm/src/exceptions.rs
- .cspell.dict/python-more.txt
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/protocol/buffer.rscrates/vm/src/builtins/str.rscrates/vm/src/stdlib/ast/python.rscrates/vm/src/builtins/object.rscrates/vm/src/builtins/type.rscrates/vm/src/stdlib/operator.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/union.rscrates/vm/src/stdlib/ctypes/library.rscrates/vm/src/stdlib/ctypes.rscrates/vm/src/builtins/builtin_func.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes/function.rs
🧠 Learnings (6)
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Applied to files:
.cspell.json
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes.rscrates/vm/src/stdlib/ctypes/function.rs
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.
Applied to files:
crates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/simple.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.
Applied to files:
crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the `Lib/` directory; modifications should be minimal and only to work around RustPython limitations
Applied to files:
crates/vm/src/stdlib/ctypes/library.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Applied to files:
crates/vm/src/stdlib/ctypes.rscrates/vm/src/stdlib/ctypes/function.rs
🧬 Code graph analysis (3)
crates/vm/src/builtins/str.rs (3)
crates/vm/src/builtins/bytearray.rs (1)
isascii(271-273)crates/vm/src/builtins/bytes.rs (1)
isascii(262-264)crates/vm/src/bytes_inner.rs (1)
isascii(326-328)
crates/vm/src/stdlib/ctypes/pointer.rs (2)
crates/vm/src/stdlib/ctypes/array.rs (22)
obj(1097-1097)obj(1105-1105)obj(1125-1125)obj(1137-1137)obj(1150-1150)obj(1161-1161)value(300-300)array_type_from_ctype(16-98)b(546-546)b(842-842)std(600-600)std(625-625)std(1235-1235)__getitem__(864-879)item(866-866)item(874-874)item(981-981)item(989-989)t(67-67)t(260-260)t(379-379)__setitem__(971-994)crates/vm/src/stdlib/ctypes/base.rs (39)
obj(768-768)obj(792-792)obj(814-814)obj(829-829)obj(1581-1581)obj(1583-1583)obj(1692-1693)obj(1720-1721)obj(1741-1742)obj(1766-1766)new(149-165)new(1237-1245)value(851-852)value(860-860)value(860-860)value(882-882)value(896-896)value(911-911)value(921-921)value(1352-1352)value(1360-1360)value(1368-1368)value(1417-1417)value(1432-1432)value(1446-1446)value(1460-1460)value(1485-1485)type_ref(2151-2151)std(365-365)std(366-366)std(2140-2140)std(2223-2223)buffer(347-347)buffer(354-354)offset(1657-1659)type_code(1537-1537)type_code(1554-1554)size(669-671)size(1662-1664)
crates/vm/src/stdlib/ctypes/function.rs (1)
crates/vm/src/stdlib/ctypes.rs (7)
type_info(143-237)std(541-541)std(545-545)std(556-556)t(55-55)pointer(688-697)src(1209-1209)
🪛 Biome (2.1.2)
.cspell.json
[error] 78-78: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 78-78: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 110-110: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 110-110: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 134-134: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
[error] 134-134: End of file expected
Use an array for a sequence of values: [1, 2]
(parse)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run tests under miri
- GitHub Check: Check Rust code with clippy
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Ensure compilation on various targets
🔇 Additional comments (18)
.cspell.json (1)
78-78: LGTM! Vocabulary additions are appropriate.The three new terms added to the spell-check dictionary are legitimate technical terms for the ctypes implementation:
oparg: operation argument (bytecode terminology)pytype: Python type terminologysighandler: signal handler terminologyNote: The Biome parse errors on these lines are false positives. Biome is attempting to parse this as standard JSON, but
.cspell.jsonuses an extended JSON format that supports trailing commas and comments. The file is valid cspell configuration.Also applies to: 110-110, 134-134
crates/vm/src/stdlib/ast/python.rs (1)
50-53: LGTM! Improved naming clarity.The refactoring from
fieldswith#[pyattr(name = "_fields")]to_fieldswith#[pyattr]aligns the Rust function name with the Python attribute name. This improves code maintainability by eliminating the explicit name override while preserving the Python API.crates/vm/src/protocol/buffer.rs (1)
203-218: LGTM! Correct implementation of scalar type support.The conditional validation logic properly handles both scalar (ndim=0) and non-scalar cases:
- Scalar case correctly asserts
itemsize == lensince a dimensionless buffer's length equals its item size- Non-scalar case maintains the existing validation:
shape_product * itemsize == lenwith proper checks forsuboffset >= 0andstride != 0This change is essential for ctypes Structure support and aligns with how
for_each_segment(line 277) andzip_eq(line 316) already handle scalar buffers.crates/vm/src/stdlib/operator.rs (1)
318-341: ASCII check rename toisasciiis consistent withPyStrAPIThe
_compare_digestpath still correctly rejects non-ASCII strings; this just follows the newPyStr::isasciinaming and looks good.crates/vm/src/builtins/type.rs (1)
1446-1452:__mro__getter matches expected semanticsExposing
__mro__viamro_map_collectis straightforward and consistent with CPython; returning a newPyTupleper access is fine here.crates/vm/src/builtins/object.rs (1)
453-488:__class__getter rename is behavior‑preservingExposing
__class__directly instead of via a named getter still returnsobj.class().to_owned()and keeps the existing setter semantics, so this is a clean naming/annotation tidy‑up.crates/vm/src/builtins/str.rs (1)
628-632:isascii/__format__binding cleanups are consistentRenaming the internal method to
isasciiand relying on#[pymethod]’s default export keeps the Python API (str.isascii) intact and matches the bytes/bytearray helpers. Likewise, dropping the explicitname = "__format__"while keeping the Rust function named__format__doesn’t change behavior.Also applies to: 963-984
crates/vm/src/stdlib/ctypes/array.rs (2)
1-132: Array type creation andPyCArrayTypeinitialization look soundThe new
array_type_from_ctypeandPyCArrayType::Initializerwiring are a big improvement:
- You now cache array types in
_array_type_cachekeyed by(itemtype, length), matching CPython’s strategy and avoiding type churn.element_size * lengthis checked withchecked_mul(and mirrored in theInitializerpath), which closes previous overflow holes for huge arrays._length_and_type_are validated and can be inherited from parentStgInfo, with appropriateValueError/OverflowErrorsemantics.- StgInfo for arrays is centralized via
StgInfo::new_array, and_type_/_length_attributes plus char/wchar getsets are consistently attached.This design should be much easier to reason about than ad‑hoc per‑class initialization.
Also applies to: 136-283, 285-357
963-1032: Wide‑char helpers and char/wchar array getsets are aligned and platform‑awareThe new wide‑char utilities and array getsets look good:
WCHAR_SIZEis derived fromlibc::wchar_t, and bothwchar_from_bytes/wchar_to_bytesandwstring_from_bytesconsistently branch on that size, covering UTF‑16 (Windows) and UTF‑32 (Unix‑like) layouts.- The
"Z"(c_wchar_p) read/write paths inread_element_from_buffer/write_element_to_buffernow useWCHAR_SIZEand assemble/disassemble code points accordingly, fixing the earlier hard‑coded 32‑bit assumption.char_array_*andwchar_array_*get/set pairs implementvalue/rawin a reasonably CPython‑like way, with length checks and null‑termination for char/wchar arrays; the dynamicadd_*_array_getsetshelpers correctly attach these descriptors to the generated array types.Overall this brings the wchar handling much closer to CPython, especially on Windows.
Also applies to: 1040-1091, 1095-1229, 1230-1281
crates/vm/src/stdlib/ctypes/structure.rs (2)
31-129: Struct metatype initialization and_fields_handling look correct and deadlock‑safeThe new
PyCStructTypeimplementation is a substantial cleanup:
slot_newmarks base classes as FINAL and seeds a defaultStgInfo, deferring real layout work toInitializer::init.- The
Initializerrespects_abstract_, supports inheritingStgInfofrom a base (clearing FINAL in the subclass, marking the base FINAL), and falls back to a sensible defaultStgInfowhen there’s no base.process_fieldscarefully computes offsets, padding, alignment, flags (TYPEFLAG_HASPOINTER,TYPEFLAG_HASUNION,TYPEFLAG_HASBITFIELD), and a PEP‑3118 style format string, while cloning base ffi field types; it also marks field types FINAL as they’re used, mirroring CPython’s semantics.SetAttrfor_fields_checks the FINAL flag in a separate scope so the read lock onStgInfois dropped before callingprocess_fields, which avoids RwLock self‑deadlocks; it also forbids deletion and mutation after finalization, as expected.Overall this matches CPython’s structure layout behavior quite closely and should be much easier to maintain.
Also applies to: 131-167, 169-393, 407-425, 427-491
493-507:PyCStructurewrapping and buffer exposure align with the newStgInfomodelThe
PyCStructurerefactor to a transparent wrapper overPyCDataintegrates nicely with the newStgInfo‑driven layout:
slot_newpullssize,align, andlengthfrom the type’sStgInfo, marks the type FINAL on first instantiation, and uses_fields_to collect field names for initializing from positional/keyword arguments.- The instance buffer is created via
PyCData::from_stg_info, so you don’t have to manually juggle buffer sizing or alignment here.- Field initialization delegates to normal attribute setting (
set_attr), letting theCFielddescriptors do the heavy lifting; you correctly guard against too many positional initializers.- AsBuffer uses
StgInfo.formatwhen available (or"B"otherwise) and exposes a 0‑D buffer withitemsize == struct_sizeandlen == buffer_len, which is consistent with a scalar struct view and CPython’s buffer semantics.This all looks coherent with the rest of the ctypes overhaul.
Also applies to: 511-589, 598-637
crates/vm/src/stdlib/ctypes/pointer.rs (3)
17-56: LGTM!The
Initializerimplementation forPyCPointerTypecorrectly sets up theStgInfowith appropriate size, alignment, flags, and format string. Theexpecton line 47 is safe since it's guarded byif let Some(ref proto).
60-119: LGTM!The
from_paramimplementation follows the correct precedence order matching CPython's behavior: None → same type instance → pointed-to type (wrapped with byref) → compatible array/pointer →_as_parameter_fallback.
222-234: LGTM!The
InitializerforPyCPointercorrectly handles optional initial contents, delegating toset_contentsonly when a non-None value is provided.crates/vm/src/stdlib/ctypes.rs (1)
136-237: FFI type mapping now matches platform C sizesThe
TypeInfo/type_infotable correctly derives FFI types frommem::size_offorc_long/c_ulongandWideChar, fixing the earlier LP64/LLP64 mismatches and ensuringc_wcharfollows the actual underlying size. This looks consistent withffi_type_from_taginfunction.rs.crates/vm/src/stdlib/ctypes/union.rs (1)
136-295: Union_fields_processing andStgInfopopulation look consistent with structure handlingThe new
process_fields:
- Validates
_fields_type (list/tuple of tuples, name isstr, type is ctypes type),- Derives size/align via
get_field_size/get_field_align, honours_pack_/_align_,- Propagates
HASPOINTER/HASBITFIELD, recordsffi_field_types, and marks field types FINAL,- Computes an aligned union size and sets
TYPEFLAG_HASUNIONand a PEP-3118"B"format.This matches CPython’s ctypes union semantics and integrates cleanly with the new
StgInfomodel.crates/vm/src/stdlib/ctypes/function.rs (2)
79-150: Pointer conversion helper (convert_to_pointer) aligns with ctypes’ ConvParam semanticsThe layered handling (CArgObject → byref buffers,
None→ NULL, array/structure/pointer/simple → appropriate buffer or stored pointer,bytes/int→ raw address,_as_parameter_recursion) matches CPython’s ConvParam logic and provides a single place to reason about pointer arguments.
1860-1898: Thunk callback plumbing and unraisable handling look solidThe
thunk_callbackimplementation:
- Converts C arguments to Python using
ffi_to_python,- Calls the Python callable inside the current VM,
- Sends any exceptions through
run_unraisablewith a helpful context message,- Writes the result back via
python_to_ffiusing the declared restype.The lifetime management via
ThunkData/ThunkUserData(boxed userdata + libffiClosure) andPyCThunk’sPyRwLock<Option<ThunkData>>keeps the closure and userdata alive for the lifetime of thePyCThunk. This overall structure looks correct.
| fn getitem_by_index(zelf: &Py<PyCArray>, i: isize, vm: &VirtualMachine) -> PyResult { | ||
| let stg = zelf.class().stg_info_opt(); | ||
| let length = stg.as_ref().map_or(0, |i| i.length) as isize; | ||
| let index = if i < 0 { length + i } else { i }; | ||
| if index < 0 || index >= length { | ||
| return Err(vm.new_index_error("array index out of range".to_owned())); | ||
| return Err(vm.new_index_error("invalid index")); | ||
| } | ||
| let index = index as usize; | ||
| let element_size = zelf.element_size.load(); | ||
| let element_size = stg.as_ref().map_or(0, |i| i.element_size); | ||
| let offset = index * element_size; | ||
| let buffer = zelf.cdata.read().buffer.clone(); | ||
| if offset + element_size <= buffer.len() { | ||
| let bytes = &buffer[offset..offset + element_size]; | ||
| Ok(Self::bytes_to_int(bytes, element_size, vm)) | ||
| let type_code = Self::get_element_type_code(zelf, vm); | ||
|
|
||
| // Get target buffer and offset (base's buffer if available, otherwise own) | ||
| let base_obj = zelf.0.base.read().clone(); | ||
| let (buffer_lock, final_offset) = if let Some(cdata) = base_obj | ||
| .as_ref() | ||
| .and_then(|b| b.downcast_ref::<super::PyCData>()) | ||
| { | ||
| (&cdata.buffer, zelf.0.base_offset.load() + offset) | ||
| } else { | ||
| Ok(vm.ctx.new_int(0).into()) | ||
| (&zelf.0.buffer, offset) | ||
| }; | ||
|
|
||
| let buffer = buffer_lock.read(); | ||
| Self::read_element_from_buffer( | ||
| &buffer, | ||
| final_offset, | ||
| element_size, | ||
| type_code.as_deref(), | ||
| vm, | ||
| ) | ||
| } | ||
|
|
||
| /// Helper to read an element value from a buffer at given offset | ||
| fn read_element_from_buffer( | ||
| buffer: &[u8], | ||
| offset: usize, | ||
| element_size: usize, | ||
| type_code: Option<&str>, | ||
| vm: &VirtualMachine, | ||
| ) -> PyResult { | ||
| match type_code { | ||
| Some("c") => { | ||
| // Return single byte as bytes | ||
| if offset < buffer.len() { | ||
| Ok(vm.ctx.new_bytes(vec![buffer[offset]]).into()) | ||
| } else { | ||
| Ok(vm.ctx.new_bytes(vec![0]).into()) | ||
| } | ||
| } | ||
| Some("u") => { | ||
| // Return single wchar as str | ||
| if let Some(code) = wchar_from_bytes(&buffer[offset..]) { | ||
| let s = char::from_u32(code) | ||
| .map(|c| c.to_string()) | ||
| .unwrap_or_default(); | ||
| Ok(vm.ctx.new_str(s).into()) | ||
| } else { | ||
| Ok(vm.ctx.new_str("").into()) | ||
| } | ||
| } | ||
| Some("z") => { | ||
| // c_char_p: pointer to bytes - dereference to get string | ||
| if offset + element_size > buffer.len() { | ||
| return Ok(vm.ctx.none()); | ||
| } | ||
| let ptr_bytes = &buffer[offset..offset + element_size]; | ||
| let ptr_val = usize::from_ne_bytes( | ||
| ptr_bytes | ||
| .try_into() | ||
| .unwrap_or([0; std::mem::size_of::<usize>()]), | ||
| ); | ||
| if ptr_val == 0 { | ||
| return Ok(vm.ctx.none()); | ||
| } | ||
| // Read null-terminated string from pointer address | ||
| unsafe { | ||
| let ptr = ptr_val as *const u8; | ||
| let mut len = 0; | ||
| while *ptr.add(len) != 0 { | ||
| len += 1; | ||
| } | ||
| let bytes = std::slice::from_raw_parts(ptr, len); | ||
| Ok(vm.ctx.new_bytes(bytes.to_vec()).into()) | ||
| } | ||
| } | ||
| Some("Z") => { | ||
| // c_wchar_p: pointer to wchar_t - dereference to get string | ||
| if offset + element_size > buffer.len() { | ||
| return Ok(vm.ctx.none()); | ||
| } | ||
| let ptr_bytes = &buffer[offset..offset + element_size]; | ||
| let ptr_val = usize::from_ne_bytes( | ||
| ptr_bytes | ||
| .try_into() | ||
| .unwrap_or([0; std::mem::size_of::<usize>()]), | ||
| ); | ||
| if ptr_val == 0 { | ||
| return Ok(vm.ctx.none()); | ||
| } | ||
| // Read null-terminated wide string using WCHAR_SIZE | ||
| unsafe { | ||
| let ptr = ptr_val as *const u8; | ||
| let mut chars = Vec::new(); | ||
| let mut pos = 0usize; | ||
| loop { | ||
| let code = if WCHAR_SIZE == 2 { | ||
| let bytes = std::slice::from_raw_parts(ptr.add(pos), 2); | ||
| u16::from_ne_bytes([bytes[0], bytes[1]]) as u32 | ||
| } else { | ||
| let bytes = std::slice::from_raw_parts(ptr.add(pos), 4); | ||
| u32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) | ||
| }; | ||
| if code == 0 { | ||
| break; | ||
| } | ||
| if let Some(ch) = char::from_u32(code) { | ||
| chars.push(ch); | ||
| } | ||
| pos += WCHAR_SIZE; | ||
| } | ||
| let s: String = chars.into_iter().collect(); | ||
| Ok(vm.ctx.new_str(s).into()) | ||
| } | ||
| } | ||
| Some("f") => { | ||
| // c_float | ||
| if offset + 4 <= buffer.len() { | ||
| let bytes: [u8; 4] = buffer[offset..offset + 4].try_into().unwrap(); | ||
| let val = f32::from_ne_bytes(bytes); | ||
| Ok(vm.ctx.new_float(val as f64).into()) | ||
| } else { | ||
| Ok(vm.ctx.new_float(0.0).into()) | ||
| } | ||
| } | ||
| Some("d") | Some("g") => { | ||
| // c_double / c_longdouble - read f64 from first 8 bytes | ||
| if offset + 8 <= buffer.len() { | ||
| let bytes: [u8; 8] = buffer[offset..offset + 8].try_into().unwrap(); | ||
| let val = f64::from_ne_bytes(bytes); | ||
| Ok(vm.ctx.new_float(val).into()) | ||
| } else { | ||
| Ok(vm.ctx.new_float(0.0).into()) | ||
| } | ||
| } | ||
| _ => { | ||
| if offset + element_size <= buffer.len() { | ||
| let bytes = &buffer[offset..offset + element_size]; | ||
| Ok(Self::bytes_to_int(bytes, element_size, type_code, vm)) | ||
| } else { | ||
| Ok(vm.ctx.new_int(0).into()) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slice reads for c_char / c_wchar arrays ignore base / base_offset
getitem_by_index and setitem_by_index correctly respect PyCData::base and base_offset: for arrays that are views into another object’s buffer (e.g. a c_char * n field inside a Structure, or arrays created via from-buffer APIs), element access is routed through the underlying base buffer.
In contrast, getitem_by_slice does:
- For
"c"(c_char arrays), it reads fromzelf.0.bufferonly, computing offsets asidx * element_size. - For
"u"(c_wchar arrays), it also reads fromzelf.0.bufferusingWCHAR_SIZE, again ignoringbaseandbase_offset.
So for array instances whose storage actually lives in a base CData, integer indexing will hit the correct underlying bytes, but slicing will instead read from the local buffer (which may be zero‑initialized or simply not tied to the base). That can lead to:
- Inconsistent results between
arr[i]andarr[i:j]for the same array. - Wrong behavior for
c_char * nandc_wchar * nfields on structures/unions or other base‑backed arrays.
The “other types” branch of getitem_by_slice already delegates to getitem_by_index, so it inherits the correct base handling; only the "c" and "u" cases bypass it.
I’d suggest:
- Mirroring the base/offset resolution from
getitem_by_indexin the"c"and"u"branches ofgetitem_by_slice(compute(buffer_lock, base_offset)once, then slice within that view), or - Reusing
read_element_from_buffervia per‑index iteration even for the contiguous cases, but with the resolvedfinal_offsetbased on base/base_offset.
Also applies to: 686-860, 881-967, 996-1032
🤖 Prompt for AI Agents
crates/vm/src/stdlib/ctypes/array.rs around lines 530-683: the slice-read
branches for type codes "c" and "u" ignore PyCData::base and base_offset,
causing slices to read from the local buffer instead of the underlying base
buffer; fix by reusing the same base/offset resolution logic used in
getitem_by_index (determine base_obj, select buffer_lock and compute final
base_offset = zelf.0.base_offset.load() + relative_offset when base exists,
otherwise use zelf.0.buffer and offset) and then read the slice data from that
resolved buffer/offset (either by calling read_element_from_buffer per element
with computed offsets or by slicing the resolved buffer at the computed start
and length); apply the same pattern to the other affected ranges (lines
~686-860, 881-967, 996-1032).
| /// Convert raw FFI result to Python object | ||
| fn convert_raw_result( | ||
| raw_result: &mut RawResult, | ||
| call_info: &CallInfo, | ||
| vm: &VirtualMachine, | ||
| ) -> Option<PyObjectRef> { | ||
| match raw_result { | ||
| RawResult::Void => None, | ||
| RawResult::Pointer(ptr) => { | ||
| // Get type code from restype to determine conversion method | ||
| let type_code = call_info | ||
| .restype_obj | ||
| .as_ref() | ||
| .and_then(|t| t.clone().downcast::<PyType>().ok()) | ||
| .and_then(|t| t.as_object().get_attr(vm.ctx.intern_str("_type_"), vm).ok()) | ||
| .and_then(|t| t.downcast_ref::<PyStr>().map(|s| s.to_string())); | ||
|
|
||
| match type_code.as_deref() { | ||
| Some("z") => { | ||
| // c_char_p: NULL -> None, otherwise read C string -> bytes | ||
| if *ptr == 0 { | ||
| Some(vm.ctx.none()) | ||
| } else { | ||
| let cstr = unsafe { std::ffi::CStr::from_ptr(*ptr as _) }; | ||
| Some(vm.ctx.new_bytes(cstr.to_bytes().to_vec()).into()) | ||
| } | ||
| } | ||
| Some("Z") => { | ||
| // c_wchar_p: NULL -> None, otherwise read wide string -> str | ||
| if *ptr == 0 { | ||
| Some(vm.ctx.none()) | ||
| } else { | ||
| let wstr_ptr = *ptr as *const libc::wchar_t; | ||
| let mut len = 0; | ||
| unsafe { | ||
| while *wstr_ptr.add(len) != 0 { | ||
| len += 1; | ||
| } | ||
| } | ||
| let slice = unsafe { std::slice::from_raw_parts(wstr_ptr, len) }; | ||
| let s: String = slice | ||
| .iter() | ||
| .filter_map(|&c| char::from_u32(c as u32)) | ||
| .collect(); | ||
| Some(vm.ctx.new_str(s).into()) | ||
| } | ||
| } | ||
| _ => { | ||
| // c_void_p ("P") and other pointer types: NULL -> None, otherwise int | ||
| if *ptr == 0 { | ||
| Some(vm.ctx.none()) | ||
| } else { | ||
| Some(vm.ctx.new_int(*ptr).into()) | ||
| } | ||
| } | ||
| } | ||
| } | ||
| RawResult::Value(val) => call_info | ||
| .restype_obj | ||
| .as_ref() | ||
| .and_then(|f| f.clone().downcast::<PyType>().ok()) | ||
| .map(|f| f.from_ffi_type(&mut output, vm).ok().flatten()) | ||
| .unwrap_or_else(|| Some(vm.ctx.new_int(output as i32).as_object().to_pyobject(vm))); | ||
| if let Some(return_type) = return_type { | ||
| Ok(return_type) | ||
| } else { | ||
| Ok(vm.ctx.none()) | ||
| .map(|f| { | ||
| f.from_ffi_type(val as *mut _ as *mut c_void, vm) | ||
| .ok() | ||
| .flatten() | ||
| }) | ||
| .unwrap_or_else(|| Some(vm.ctx.new_int(*val as usize).as_object().to_pyobject(vm))), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pointer restypes: wide-char decoding is still wrong on Windows
convert_raw_result now distinguishes "z" / "Z" / others, which is good, but the "Z" branch still does:
- Treat
libc::wchar_telements as full Unicode code points withchar::from_u32(c as u32). - On Windows,
wchar_tis 16-bit UTF‑16, so surrogate pairs are mishandled and dropped.
This is the same UTF‑16 vs UTF‑32 issue as in PyCSimple::value and will corrupt c_wchar_p restype returns containing non‑BMP characters.
You already have a correct Windows implementation in wstring_at_impl using Wtf8Buf::from_wide. That logic should be reused here.
Suggested change for the `"Z"` arm
- Some("Z") => {
- // c_wchar_p: NULL -> None, otherwise read wide string -> str
- if *ptr == 0 {
- Some(vm.ctx.none())
- } else {
- let wstr_ptr = *ptr as *const libc::wchar_t;
- let mut len = 0;
- unsafe {
- while *wstr_ptr.add(len) != 0 {
- len += 1;
- }
- }
- let slice = unsafe { std::slice::from_raw_parts(wstr_ptr, len) };
- let s: String = slice
- .iter()
- .filter_map(|&c| char::from_u32(c as u32))
- .collect();
- Some(vm.ctx.new_str(s).into())
- }
- }
+ Some("Z") => {
+ if *ptr == 0 {
+ Some(vm.ctx.none())
+ } else {
+ let wstr_ptr = *ptr as *const libc::wchar_t;
+ #[cfg(windows)]
+ {
+ use rustpython_common::wtf8::Wtf8Buf;
+ let mut len = 0;
+ unsafe {
+ while *wstr_ptr.add(len) != 0 {
+ len += 1;
+ }
+ let slice = std::slice::from_raw_parts(wstr_ptr, len);
+ let wide: Vec<u16> = slice.iter().map(|&c| c as u16).collect();
+ let wtf8 = Wtf8Buf::from_wide(&wide);
+ Some(vm.ctx.new_str(wtf8).into())
+ }
+ }
+ #[cfg(not(windows))]
+ {
+ let mut len = 0;
+ unsafe {
+ while *wstr_ptr.add(len) != 0 {
+ len += 1;
+ }
+ let slice = std::slice::from_raw_parts(wstr_ptr, len);
+ let s: String = slice
+ .iter()
+ .filter_map(|&c| char::from_u32(c as u32))
+ .collect();
+ Some(vm.ctx.new_str(s).into())
+ }
+ }
+ }
+ }🤖 Prompt for AI Agents
crates/vm/src/stdlib/ctypes/function.rs around lines 1440 to 1507: the "Z"
branch decodes libc::wchar_t as full Unicode code points which breaks on Windows
where wchar_t is UTF‑16; replace the manual char::from_u32 iteration with the
same logic used by wstring_at_impl (or call wstring_at_impl/Wtf8Buf::from_wide)
to correctly decode UTF‑16 surrogate pairs on Windows and UTF‑32 on Unix, i.e.
build the wide slice from ptr, convert it via Wtf8Buf::from_wide (or
platform-appropriate conversion) into a Rust String/WTF-8 and return
vm.ctx.new_str(...) ; ensure NULL handling, unsafe slice creation, and
feature-gated/conditional code so non-Windows keeps the existing UTF‑32
behavior.
| fn ffi_to_python(ty: &Py<PyType>, ptr: *const c_void, vm: &VirtualMachine) -> PyObjectRef { | ||
| let type_code = ty.type_code(vm); | ||
| let raw_value: PyObjectRef = unsafe { | ||
| match type_code.as_deref() { | ||
| Some("b") => vm.ctx.new_int(*(ptr as *const i8) as i32).into(), | ||
| Some("B") => vm.ctx.new_int(*(ptr as *const u8) as i32).into(), | ||
| Some("c") => vm.ctx.new_bytes(vec![*(ptr as *const u8)]).into(), | ||
| Some("h") => vm.ctx.new_int(*(ptr as *const i16) as i32).into(), | ||
| Some("H") => vm.ctx.new_int(*(ptr as *const u16) as i32).into(), | ||
| Some("i") => vm.ctx.new_int(*(ptr as *const i32)).into(), | ||
| Some("I") => vm.ctx.new_int(*(ptr as *const u32)).into(), | ||
| Some("l") => vm.ctx.new_int(*(ptr as *const libc::c_long)).into(), | ||
| Some("L") => vm.ctx.new_int(*(ptr as *const libc::c_ulong)).into(), | ||
| Some("q") => vm.ctx.new_int(*(ptr as *const libc::c_longlong)).into(), | ||
| Some("Q") => vm.ctx.new_int(*(ptr as *const libc::c_ulonglong)).into(), | ||
| Some("f") => vm.ctx.new_float(*(ptr as *const f32) as f64).into(), | ||
| Some("d") => vm.ctx.new_float(*(ptr as *const f64)).into(), | ||
| Some("z") => { | ||
| // c_char_p: C string pointer → Python bytes | ||
| let cstr_ptr = *(ptr as *const *const libc::c_char); | ||
| if cstr_ptr.is_null() { | ||
| vm.ctx.none() | ||
| } else { | ||
| let cstr = std::ffi::CStr::from_ptr(cstr_ptr); | ||
| vm.ctx.new_bytes(cstr.to_bytes().to_vec()).into() | ||
| } | ||
| } | ||
| Some("Z") => { | ||
| // c_wchar_p: wchar_t* → Python str | ||
| let wstr_ptr = *(ptr as *const *const libc::wchar_t); | ||
| if wstr_ptr.is_null() { | ||
| vm.ctx.none() | ||
| } else { | ||
| let mut len = 0; | ||
| while *wstr_ptr.add(len) != 0 { | ||
| len += 1; | ||
| } | ||
| let slice = std::slice::from_raw_parts(wstr_ptr, len); | ||
| let s: String = slice | ||
| .iter() | ||
| .filter_map(|&c| char::from_u32(c as u32)) | ||
| .collect(); | ||
| vm.ctx.new_str(s).into() | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wide-char callback arguments ("Z") share the same UTF‑16 decoding bug as other Z paths
The "Z" arm of ffi_to_python (used for callback argument conversion) also interprets libc::wchar_t as a full code point and uses char::from_u32, which is wrong on Windows where wchar_t is UTF‑16. This should be fixed in the same way as convert_raw_result / PyCSimple::value, ideally by reusing a shared helper that mirrors _ctypes.wstring_at_impl.
| fn ffi_to_python(ty: &Py<PyType>, ptr: *const c_void, vm: &VirtualMachine) -> PyObjectRef { | ||
| let type_code = ty.type_code(vm); | ||
| let raw_value: PyObjectRef = unsafe { | ||
| match type_code.as_deref() { | ||
| Some("b") => vm.ctx.new_int(*(ptr as *const i8) as i32).into(), | ||
| Some("B") => vm.ctx.new_int(*(ptr as *const u8) as i32).into(), | ||
| Some("c") => vm.ctx.new_bytes(vec![*(ptr as *const u8)]).into(), | ||
| Some("h") => vm.ctx.new_int(*(ptr as *const i16) as i32).into(), | ||
| Some("H") => vm.ctx.new_int(*(ptr as *const u16) as i32).into(), | ||
| Some("i") => vm.ctx.new_int(*(ptr as *const i32)).into(), | ||
| Some("I") => vm.ctx.new_int(*(ptr as *const u32)).into(), | ||
| Some("l") => vm.ctx.new_int(*(ptr as *const libc::c_long)).into(), | ||
| Some("L") => vm.ctx.new_int(*(ptr as *const libc::c_ulong)).into(), | ||
| Some("q") => vm.ctx.new_int(*(ptr as *const libc::c_longlong)).into(), | ||
| Some("Q") => vm.ctx.new_int(*(ptr as *const libc::c_ulonglong)).into(), | ||
| Some("f") => vm.ctx.new_float(*(ptr as *const f32) as f64).into(), | ||
| Some("d") => vm.ctx.new_float(*(ptr as *const f64)).into(), | ||
| Some("z") => { | ||
| // c_char_p: C string pointer → Python bytes | ||
| let cstr_ptr = *(ptr as *const *const libc::c_char); | ||
| if cstr_ptr.is_null() { | ||
| vm.ctx.none() | ||
| } else { | ||
| let cstr = std::ffi::CStr::from_ptr(cstr_ptr); | ||
| vm.ctx.new_bytes(cstr.to_bytes().to_vec()).into() | ||
| } | ||
| } | ||
| Some("Z") => { | ||
| // c_wchar_p: wchar_t* → Python str | ||
| let wstr_ptr = *(ptr as *const *const libc::wchar_t); | ||
| if wstr_ptr.is_null() { | ||
| vm.ctx.none() | ||
| } else { | ||
| let mut len = 0; | ||
| while *wstr_ptr.add(len) != 0 { | ||
| len += 1; | ||
| } | ||
| let slice = std::slice::from_raw_parts(wstr_ptr, len); | ||
| let s: String = slice | ||
| .iter() | ||
| .filter_map(|&c| char::from_u32(c as u32)) | ||
| .collect(); | ||
| vm.ctx.new_str(s).into() | ||
| } | ||
| } | ||
| Some("P") => vm.ctx.new_int(*(ptr as *const usize)).into(), | ||
| Some("?") => vm.ctx.new_bool(*(ptr as *const u8) != 0).into(), | ||
| _ => return vm.ctx.none(), | ||
| } | ||
| }; | ||
|
|
||
| if !is_simple_subclass(ty, vm) { | ||
| return raw_value; | ||
| } | ||
| ty.as_object() | ||
| .call((raw_value.clone(),), vm) | ||
| .unwrap_or(raw_value) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Callback result marshalling for c_char_p / c_wchar_p is too limited and can leave result undefined
In python_to_ffi, the "z" / "Z" / "P" branch only accepts values that pass try_int(vm) and writes them directly into the result slot as usize. For callbacks declared with restype c_char_p / c_wchar_p, CPython allows returning bytes / str and arranges for a temporary NULL-terminated buffer whose lifetime is tied to the call.
Current behaviour:
CFUNCTYPE(c_char_p)(py_callable)wherepy_callablereturnsbyteswill silently do nothing in the"z"arm (sincetry_intfails), leavingresultuninitialized.- Same for
c_wchar_pandstr.
At minimum this should reject unsupported return types with a clear Python error; ideally, it should mirror the constructor logic and:
- For
"z": acceptbytes/bytearray/Noneand allocate a null-terminated buffer viaensure_z_null_terminated. - For
"Z": acceptstr/Noneand allocate viastr_to_wchar_bytes. - Keep the holder alive for the duration of the callback (e.g., by storing it in the
ThunkUserDataor another keepalive attached to the thunk).
Given the complexity, even a conservative improvement that raises on non-int results for "z"/"Z" would be better than silently writing garbage.
| unsafe { | ||
| let ptr = addr as *const u8; | ||
| match type_code { | ||
| Some("c") => Ok(vm.ctx.new_bytes(vec![*ptr]).into()), | ||
| Some("b") => Ok(vm.ctx.new_int(*(ptr as *const i8) as i32).into()), | ||
| Some("B") => Ok(vm.ctx.new_int(*ptr as i32).into()), | ||
| Some("h") => Ok(vm.ctx.new_int(*(ptr as *const i16) as i32).into()), | ||
| Some("H") => Ok(vm.ctx.new_int(*(ptr as *const u16) as i32).into()), | ||
| Some("i") | Some("l") => Ok(vm.ctx.new_int(*(ptr as *const i32)).into()), | ||
| Some("I") | Some("L") => Ok(vm.ctx.new_int(*(ptr as *const u32)).into()), | ||
| Some("q") => Ok(vm.ctx.new_int(*(ptr as *const i64)).into()), | ||
| Some("Q") => Ok(vm.ctx.new_int(*(ptr as *const u64)).into()), | ||
| Some("f") => Ok(vm.ctx.new_float(*(ptr as *const f32) as f64).into()), | ||
| Some("d") | Some("g") => Ok(vm.ctx.new_float(*(ptr as *const f64)).into()), | ||
| Some("P") | Some("z") | Some("Z") => { | ||
| Ok(vm.ctx.new_int(*(ptr as *const usize)).into()) | ||
| } | ||
| _ => { | ||
| // Default: read as bytes | ||
| let bytes = std::slice::from_raw_parts(ptr, size).to_vec(); | ||
| Ok(vm.ctx.new_bytes(bytes).into()) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential undefined behavior from unaligned memory access
The pointer casts (e.g., ptr as *const i16, ptr as *const i32) assume the address is properly aligned for the target type. On architectures with strict alignment requirements (e.g., ARM), unaligned reads cause undefined behavior or hardware faults.
🔎 Suggested fix using read_unaligned
unsafe {
let ptr = addr as *const u8;
match type_code {
Some("c") => Ok(vm.ctx.new_bytes(vec![*ptr]).into()),
Some("b") => Ok(vm.ctx.new_int(*(ptr as *const i8) as i32).into()),
Some("B") => Ok(vm.ctx.new_int(*ptr as i32).into()),
- Some("h") => Ok(vm.ctx.new_int(*(ptr as *const i16) as i32).into()),
- Some("H") => Ok(vm.ctx.new_int(*(ptr as *const u16) as i32).into()),
- Some("i") | Some("l") => Ok(vm.ctx.new_int(*(ptr as *const i32)).into()),
- Some("I") | Some("L") => Ok(vm.ctx.new_int(*(ptr as *const u32)).into()),
- Some("q") => Ok(vm.ctx.new_int(*(ptr as *const i64)).into()),
- Some("Q") => Ok(vm.ctx.new_int(*(ptr as *const u64)).into()),
- Some("f") => Ok(vm.ctx.new_float(*(ptr as *const f32) as f64).into()),
- Some("d") | Some("g") => Ok(vm.ctx.new_float(*(ptr as *const f64)).into()),
- Some("P") | Some("z") | Some("Z") => {
- Ok(vm.ctx.new_int(*(ptr as *const usize)).into())
- }
+ Some("h") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const i16) as i32).into()),
+ Some("H") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const u16) as i32).into()),
+ Some("i") | Some("l") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const i32)).into()),
+ Some("I") | Some("L") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const u32)).into()),
+ Some("q") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const i64)).into()),
+ Some("Q") => Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const u64)).into()),
+ Some("f") => Ok(vm.ctx.new_float(std::ptr::read_unaligned(ptr as *const f32) as f64).into()),
+ Some("d") | Some("g") => Ok(vm.ctx.new_float(std::ptr::read_unaligned(ptr as *const f64)).into()),
+ Some("P") | Some("z") | Some("Z") => {
+ Ok(vm.ctx.new_int(std::ptr::read_unaligned(ptr as *const usize)).into())
+ }
// ...
}
}🤖 Prompt for AI Agents
crates/vm/src/stdlib/ctypes/pointer.rs around lines 605-628: the current code
casts the raw byte pointer to larger integer/float pointer types (e.g., *const
i16, *const i32) and dereferences them, which can cause undefined behavior on
architectures that require alignment; change those direct dereferences to use
std::ptr::read_unaligned on the appropriately cast *const T so the value is read
safely even when unaligned (for each type case cast ptr to *const T and call
ptr::read_unaligned(ptr as *const T) and then convert to the VM value as before;
do this for i8/u8/i16/u16/i32/u32/i64/u64/f32/f64/usize cases while leaving the
default bytes branch unchanged).
| let f = float_val.to_f64(); | ||
| match size { | ||
| 4 => { | ||
| *(ptr as *mut f32) = f as f32; | ||
| } | ||
| 8 => { | ||
| *(ptr as *mut f64) = f; | ||
| } | ||
| _ => {} | ||
| } | ||
| return Ok(()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unaligned writes for float values
Similar to the read case, writing floats via pointer cast assumes alignment. Use write_unaligned for safety.
🔎 Suggested fix
if let Ok(float_val) = value.try_float(vm) {
let f = float_val.to_f64();
match size {
4 => {
- *(ptr as *mut f32) = f as f32;
+ std::ptr::write_unaligned(ptr as *mut f32, f as f32);
}
8 => {
- *(ptr as *mut f64) = f;
+ std::ptr::write_unaligned(ptr as *mut f64, f);
}
_ => {}
}
return Ok(());
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let f = float_val.to_f64(); | |
| match size { | |
| 4 => { | |
| *(ptr as *mut f32) = f as f32; | |
| } | |
| 8 => { | |
| *(ptr as *mut f64) = f; | |
| } | |
| _ => {} | |
| } | |
| return Ok(()); | |
| } | |
| let f = float_val.to_f64(); | |
| match size { | |
| 4 => { | |
| std::ptr::write_unaligned(ptr as *mut f32, f as f32); | |
| } | |
| 8 => { | |
| std::ptr::write_unaligned(ptr as *mut f64, f); | |
| } | |
| _ => {} | |
| } | |
| return Ok(()); | |
| } |
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/pointer.rs around lines 688 to 699, the code
performs potentially unaligned writes by dereferencing ptr as *mut f32/*mut f64
and assigning; replace those direct writes with core::ptr::write_unaligned to
avoid UB on unaligned addresses. Specifically, inside the match arms for size 4
and 8, cast ptr to *mut f32/*mut f64 and call write_unaligned with f as f32 / f
as f64 (inside the existing unsafe), keep the default branch and return Ok(())
logic unchanged.
| #[pygetset] | ||
| pub fn value(instance: PyObjectRef, vm: &VirtualMachine) -> PyResult<PyObjectRef> { | ||
| let zelf: &Py<Self> = instance | ||
| .downcast_ref() | ||
| .ok_or_else(|| vm.new_type_error("cannot get value of instance"))?; | ||
|
|
||
| // Get _type_ from class | ||
| let cls = zelf.class(); | ||
| let type_attr = cls | ||
| .as_object() | ||
| .get_attr("_type_", vm) | ||
| .map_err(|_| vm.new_type_error("no _type_ attribute"))?; | ||
| let type_code = type_attr.str(vm)?.to_string(); | ||
|
|
||
| // Special handling for c_char_p (z) and c_wchar_p (Z) | ||
| // z_get, Z_get - dereference pointer to get string | ||
| if type_code == "z" { | ||
| // c_char_p: read pointer from buffer, dereference to get bytes string | ||
| let buffer = zelf.0.buffer.read(); | ||
| let ptr = super::base::read_ptr_from_buffer(&buffer); | ||
| if ptr == 0 { | ||
| return Ok(vm.ctx.none()); | ||
| } | ||
| // Read null-terminated string at the address | ||
| unsafe { | ||
| let cstr = std::ffi::CStr::from_ptr(ptr as _); | ||
| return Ok(vm.ctx.new_bytes(cstr.to_bytes().to_vec()).into()); | ||
| } | ||
| } | ||
| if type_code == "Z" { | ||
| // c_wchar_p: read pointer from buffer, dereference to get wide string | ||
| let buffer = zelf.0.buffer.read(); | ||
| let ptr = super::base::read_ptr_from_buffer(&buffer); | ||
| if ptr == 0 { | ||
| return Ok(vm.ctx.none()); | ||
| } | ||
| // Read null-terminated wide string at the address | ||
| unsafe { | ||
| let w_ptr = ptr as *const libc::wchar_t; | ||
| let len = libc::wcslen(w_ptr); | ||
| let wchars = std::slice::from_raw_parts(w_ptr, len); | ||
| let s: String = wchars | ||
| .iter() | ||
| .filter_map(|&c| char::from_u32(c as u32)) | ||
| .collect(); | ||
| return Ok(vm.ctx.new_str(s).into()); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c_wchar_p.value decodes UTF-16 incorrectly on Windows
In the "Z" branch of PyCSimple::value, you treat libc::wchar_t elements as full Unicode code points via char::from_u32(c as u32). On Windows, wchar_t is 16-bit UTF-16:
- Surrogate pairs are split into individual 16‑bit units.
char::from_u32returnsNonefor surrogate code points, so they are silently dropped.- Any character outside the BMP will be corrupted or lost.
You already handle this correctly in _ctypes.wstring_at_impl using Wtf8Buf::from_wide on Windows. The same approach should be used here.
Suggested Windows-aware fix for `Z` case
- if type_code == "Z" {
- // c_wchar_p: read pointer from buffer, dereference to get wide string
- let buffer = zelf.0.buffer.read();
- let ptr = super::base::read_ptr_from_buffer(&buffer);
- if ptr == 0 {
- return Ok(vm.ctx.none());
- }
- // Read null-terminated wide string at the address
- unsafe {
- let w_ptr = ptr as *const libc::wchar_t;
- let len = libc::wcslen(w_ptr);
- let wchars = std::slice::from_raw_parts(w_ptr, len);
- let s: String = wchars
- .iter()
- .filter_map(|&c| char::from_u32(c as u32))
- .collect();
- return Ok(vm.ctx.new_str(s).into());
- }
- }
+ if type_code == "Z" {
+ // c_wchar_p: read pointer from buffer, dereference to get wide string
+ let buffer = zelf.0.buffer.read();
+ let ptr = super::base::read_ptr_from_buffer(&buffer);
+ if ptr == 0 {
+ return Ok(vm.ctx.none());
+ }
+ unsafe {
+ let w_ptr = ptr as *const libc::wchar_t;
+ let len = libc::wcslen(w_ptr);
+ let wchars = std::slice::from_raw_parts(w_ptr, len);
+ #[cfg(windows)]
+ {
+ use rustpython_common::wtf8::Wtf8Buf;
+ let wide: Vec<u16> = wchars.iter().map(|&c| c as u16).collect();
+ let wtf8 = Wtf8Buf::from_wide(&wide);
+ return Ok(vm.ctx.new_str(wtf8).into());
+ }
+ #[cfg(not(windows))]
+ {
+ let s: String = wchars
+ .iter()
+ .filter_map(|&c| char::from_u32(c as u32))
+ .collect();
+ return Ok(vm.ctx.new_str(s).into());
+ }
+ }
+ }🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/simple.rs around lines 1087 to 1134, the "Z"
branch decodes wchar_t elements by mapping each element to a Rust char which
drops UTF-16 surrogates on Windows; change it to use the same platform-aware
decoding as _ctypes.wstring_at_impl: read the wchar_t slice, and on Windows
convert the &[libc::wchar_t] (u16) into a WTF-8 string via Wtf8Buf::from_wide
(or equivalent) then new_str that result, while on non-Windows continue using
from_raw_parts -> char conversion (or convert from u32) as before; ensure
null-termination is handled by using the computed length from wcslen and that
unsafe pointer usage stays confined to creating the slice before calling the
proper conversion.
close #2364
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Chores
✏️ Tip: You can customize this high-level summary in your review settings.