Add buffer support to c-api#8071
Conversation
📝 WalkthroughWalkthroughThis PR implements Python buffer protocol FFI support in RustPython's C API. It exports a new ChangesPython Buffer Protocol FFI
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
8a071ff to
9056150
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/capi/src/pybuffer.rs`:
- Around line 363-366: The functions PyBuffer_ToContiguous and
PyBuffer_FromContiguous currently bail out when
is_contiguous_for_order(view_ref, ...) returns false and then assume contiguous
memory by using dst.copy_from_slice(view_ref.buf), so they fail for
strided/indirect buffers; update both functions to handle non-contiguous views
by iterating over the view_ref (using its shape/strides/len/offset fields) and
performing element-wise reads/writes to pack/unpack into the contiguous dst
buffer according to the requested order instead of calling dst.copy_from_slice;
keep the existing contiguous fast-path (when is_contiguous_for_order is true)
and implement a fallback that computes the source/destination index using
view_ref.strides and itemsize to correctly handle strided/indirect layouts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 9a37bdf8-bace-453f-b4fd-7fe323270fca
📒 Files selected for processing (2)
crates/capi/src/lib.rscrates/capi/src/pybuffer.rs
| if !is_contiguous_for_order(view_ref, (order as u8).to_ascii_uppercase()) { | ||
| return Err(vm.new_buffer_error( | ||
| "PyBuffer_ToContiguous only supports contiguous exported buffers", | ||
| )); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ls -la
rg -n "PyBuffer_(To|From)Contiguous" crates/capi/src/pybuffer.rs || true
rg -n "is_contiguous_for_order" crates/capi/src/pybuffer.rs || true
# Print around the cited lines
sed -n '330,430p' crates/capi/src/pybuffer.rs | cat -n
# Also show definition of is_contiguous_for_order if present elsewhere in file
rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs || true
if rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs >/dev/null; then
start=$(rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs | head -n1 | cut -d: -f1)
start=$((start-40))
end=$((start+120))
sed -n "${start},${end}p" crates/capi/src/pybuffer.rs | cat -n
fi
# Search for any code path that handles non-contiguous (e.g., flatten/strides) near these functions
rg -n "strid|stride|shape|indirect|view.*contiguous|ToContiguous|FromContiguous" crates/capi/src/pybuffer.rs || trueRepository: RustPython/RustPython
Length of output: 16604
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ls -la
rg -n "PyBuffer_(To|From)Contiguous" crates/capi/src/pybuffer.rs || true
rg -n "is_contiguous_for_order" crates/capi/src/pybuffer.rs || true
# Print around the cited lines
sed -n '330,430p' crates/capi/src/pybuffer.rs | cat -n
# Also show definition of is_contiguous_for_order if present elsewhere in file
rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs || true
if rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs >/dev/null; then
start=$(rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs | head -n1 | cut -d: -f1)
start=$((start-40))
end=$((start+120))
sed -n "${start},${end}p" crates/capi/src/pybuffer.rs | cat -n
fi
# Search for any code path that handles non-contiguous (e.g., flatten/strides) near these functions
rg -n "strid|stride|shape|indirect|view.*contiguous|ToContiguous|FromContiguous" crates/capi/src/pybuffer.rs || trueRepository: RustPython/RustPython
Length of output: 16604
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "PyBuffer_(To|From)Contiguous" crates/capi/src/pybuffer.rs || true
rg -n "is_contiguous_for_order" crates/capi/src/pybuffer.rs || true
echo "---- around 363-406 ----"
sed -n '340,420p' crates/capi/src/pybuffer.rs | cat -n
echo "---- definition of is_contiguous_for_order ----"
rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs || true
if rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs >/dev/null; then
start=$(rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs | head -n1 | cut -d: -f1)
start=$((start-60))
end=$((start+180))
sed -n "${start},${end}p" crates/capi/src/pybuffer.rs | cat -n
fi
echo "---- search for flatten/strided logic in file ----"
rg -n "strid|stride|flatten|indirect|non[- ]contig|non[- ]contiguous|ToContiguous|FromContiguous" crates/capi/src/pybuffer.rs || trueRepository: RustPython/RustPython
Length of output: 6988
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "PyBuffer_(To|From)Contiguous" crates/capi/src/pybuffer.rs || true
rg -n "is_contiguous_for_order" crates/capi/src/pybuffer.rs || true
echo "---- around 363-406 ----"
sed -n '340,420p' crates/capi/src/pybuffer.rs | cat -n
echo "---- definition of is_contiguous_for_order ----"
rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs || true
if rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs >/dev/null; then
start=$(rg -n "fn is_contiguous_for_order" crates/capi/src/pybuffer.rs | head -n1 | cut -d: -f1)
start=$((start-60))
end=$((start+180))
sed -n "${start},${end}p" crates/capi/src/pybuffer.rs | cat -n
fi
echo "---- search for flatten/strided logic in file ----"
rg -n "strid|stride|flatten|indirect|non[- ]contig|non[- ]contiguous|ToContiguous|FromContiguous" crates/capi/src/pybuffer.rs || trueRepository: RustPython/RustPython
Length of output: 6988
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "PyBuffer_(To|From)Contiguous" -S .
rg -n "ToContiguous\(" -S crates src Lib || true
rg -n "FromContiguous\(" -S crates src Lib || trueRepository: RustPython/RustPython
Length of output: 1556
🏁 Script executed:
sed -n '200,360p' crates/capi/src/pybuffer.rs | cat -nRepository: RustPython/RustPython
Length of output: 6483
🌐 Web query:
CPython PyBuffer_ToContiguous semantics non-contiguous buffer strided indirect
💡 Result:
PyBuffer_ToContiguous is a CPython C-API function used to copy data from a source Py_buffer structure into a contiguous destination memory block [1][2]. Its primary purpose is to flatten potentially non-contiguous, strided, or indirect (suboffset-containing) buffer data into a standard, tightly packed C-style or Fortran-style representation [1][3]. The function signature is: int PyBuffer_ToContiguous(void *buf, const Py_buffer *src, Py_ssize_t len, char order) Key semantics and behaviors include: 1. Contiguity Handling: The function first checks if the source buffer (src) is already contiguous according to the requested order ('C', 'F', or 'A' for either) using PyBuffer_IsContiguous [1][3]. If it is already contiguous, it performs a high-speed memcpy directly from the source buffer's memory [3]. 2. Complex Buffers (Strided/Indirect): If the source buffer is not already in the requested contiguous format—such as when it features non-standard strides, multi-dimensional layouts, or suboffsets (indirect pointers)—the function invokes a more complex internal copying routine [3]. This routine systematically iterates through the buffer's dimensions, strides, and suboffsets to reconstruct the data into a linear sequence in the destination buffer (buf) [3][2]. 3. Safety and Validation: - The function requires that the requested length (len) exactly matches the total length of the source buffer (src->len) [1][4][3]. If these do not match, the function fails and returns -1 [1][3]. - It ensures that the destination buffer has sufficient space, though in practice, the caller is responsible for providing a destination buffer of the correct size (matching src->len) [4][3]. 4. Usage Notes: - Order: The order parameter specifies the target memory layout ('C' for row-major, 'F' for column-major) [1][2]. - Error Handling: It returns 0 on success and -1 on error [1][5]. - Evolution: Historical implementations of this function had bugs regarding non-contiguous arrays (e.g., issue #12834) and off-by-one errors [6][7]. Modern CPython versions (3.3+) have significantly improved these routines to handle complex buffer layouts correctly by using an internal, recursive copying mechanism [8][6][3]. Essentially, PyBuffer_ToContiguous acts as a bridge for C-extensions to normalize disparate memory layouts into a predictable format required by many low-level libraries or I/O operations [2].
Citations:
- 1: https://docs.python.org/3/c-api/buffer.html
- 2: https://weshallneversurrender.com/PyBuffer-ToContiguous/
- 3: https://hg.python.org/cpython/rev/8fbbc7c8748e
- 4: https://bugs.python.org/issue33126
- 5: https://github.com/python/cpython/blob/main/Doc/c-api/buffer.rst
- 6: https://bugs.python.org/issue12834
- 7: PyBuffer_ToContiguous() off-by-one error for non-contiguous buffers python/cpython#67538
- 8: memoryview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays python/cpython#57043
🏁 Script executed:
sed -n '200,360p' crates/capi/src/pybuffer.rs | cat -nRepository: RustPython/RustPython
Length of output: 6483
🌐 Web query:
CPython PyBuffer_ToContiguous semantics non-contiguous buffer strided indirect
💡 Result:
PyBuffer_ToContiguous is a CPython C-API function used to copy data from an exported buffer (Py_buffer) into a flat, contiguous destination buffer (buf) [1][2]. Semantics for Non-Contiguous, Strided, and Indirect Buffers: 1. Contiguity Check: The function first checks if the source buffer is already contiguous in the requested order ('C' for C-style, 'F' for Fortran-style, or 'A' for either) using PyBuffer_IsContiguous [1][3]. If it is, a direct memory copy (memcpy) is performed [3][4]. 2. Non-Contiguous/Strided Handling: If the buffer is not contiguous, the function employs a more complex procedure to pack the data [3][4]. It iterates over the logical structure of the buffer using its shape, strides, and suboffsets to extract elements and place them sequentially into the destination buffer [3]. 3. Indirect Buffers (Suboffsets): If a buffer is indirect (i.e., has suboffsets), PyBuffer_ToContiguous correctly handles the dereferencing required to access the actual data pointed to by the suboffset pointers [5][3]. 4. Requirements: The caller must provide a destination buffer of at least the size specified by the source buffer's len field. The function will fail and return -1 if the provided length does not exactly match the source buffer's total length (src->len) [1][6][3]. In essence, PyBuffer_ToContiguous acts as a formatter that linearizes complex, multi-dimensional, or strided memory layouts into a flat byte sequence based on the requested memory order [3][4]. It is the underlying mechanism used by memoryview.tobytes to resolve non-contiguous memory structures [7][4].
Citations:
- 1: https://docs.python.org/release/3.11.1/c-api/buffer.html
- 2: https://docs.python.org/3/c-api/buffer.html
- 3: https://hg.python.org/cpython/rev/8fbbc7c8748e
- 4: https://gopy.tamnd.com/docs/annotations/objects/objects_memoryview6_detail
- 5: https://python.readthedocs.io/en/stable/c-api/buffer.html
- 6: https://github.com/python/cpython/blob/main/Doc/c-api/buffer.rst
- 7: https://bugs.python.org/issue12834
🏁 Script executed:
sed -n '200,360p' crates/capi/src/pybuffer.rs | cat -nRepository: RustPython/RustPython
Length of output: 6483
🌐 Web query:
CPython PyBuffer_ToContiguous semantics non-contiguous buffer strided indirect
💡 Result:
In CPython, the PyBuffer_ToContiguous function is used to copy data from a Py_buffer structure into a destination buffer as a contiguous byte array [1][2]. Its semantics regarding buffer types are as follows: 1. Contiguous Buffers: If the source buffer is already contiguous in the requested order (C-style, Fortran-style, or either), the function effectively performs a memory copy (e.g., memcpy) to the destination [3]. 2. Non-Contiguous and Strided Buffers: If the buffer is non-contiguous or employs strides, the function automatically reconstructs the logical data layout into a contiguous representation within the provided destination buffer [3]. It traverses the buffer according to its shape and strides, gathering the non-contiguous elements into a packed, contiguous memory block [3]. 3. Indirect Buffers: Buffers that utilize suboffsets (indirect buffers) are inherently considered non-contiguous in CPython's buffer protocol [4]. PyBuffer_ToContiguous handles these by resolving the pointers to collect the data into the requested contiguous format [3]. Function Signature: int PyBuffer_ToContiguous(void *buf, const Py_buffer *src, Py_ssize_t len, char order) Parameters: - buf: The destination buffer where the contiguous data will be copied. - src: The source Py_buffer containing the data (which may be strided or indirect). - len: The total number of bytes to copy. This must match src->len [1][2]. - order: The desired layout order: 'C' (C-style), 'F' (Fortran-style), or 'A' (either one) [1][2]. Return Value: Returns 0 on success and -1 on error [1][2]. Historical Note: The function has historically been subject to bugs related to off-by-one errors and incorrect handling of complex buffer layouts, particularly when dealing with non-contiguous or multi-dimensional arrays [5][3]. While these issues have been addressed in modern CPython versions, users should ensure they are using updated versions of the interpreter if relying on this API for complex buffer manipulations [5][3].
Citations:
- 1: https://docs.python.org/3/c-api/buffer.html
- 2: https://github.com/python/cpython/blob/main/Doc/c-api/buffer.rst
- 3: https://bugs.python.org/issue12834
- 4: https://gopy.tamnd.com/docs/annotations/objects/objects_bufferobject_detail
- 5: PyBuffer_ToContiguous() off-by-one error for non-contiguous buffers python/cpython#67538
Fix PyBuffer_{To,From}Contiguous to support non-contiguous (strided/indirect) buffers
crates/capi/src/pybuffer.rs currently rejects any view failing is_contiguous_for_order(...) (lines 363-366 and 403-406) and then performs a raw dst.copy_from_slice(src) over view_ref.buf, so strided/indirect layouts can’t be packed/filled as PyBuffer_{To,From}Contiguous is expected to do.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/capi/src/pybuffer.rs` around lines 363 - 366, The functions
PyBuffer_ToContiguous and PyBuffer_FromContiguous currently bail out when
is_contiguous_for_order(view_ref, ...) returns false and then assume contiguous
memory by using dst.copy_from_slice(view_ref.buf), so they fail for
strided/indirect buffers; update both functions to handle non-contiguous views
by iterating over the view_ref (using its shape/strides/len/offset fields) and
performing element-wise reads/writes to pack/unpack into the contiguous dst
buffer according to the requested order instead of calling dst.copy_from_slice;
keep the existing contiguous fast-path (when is_contiguous_for_order is true)
and implement a fallback that computes the source/destination index using
view_ref.strides and itemsize to correctly handle strided/indirect layouts.
Summary by CodeRabbit