[Stdlib] Add CPython protocol builders for Mojo-defined Python types#6453
[Stdlib] Add CPython protocol builders for Mojo-defined Python types#6453winding-lines wants to merge 3 commits into
Conversation
1c56f7e to
43f912d
Compare
|
Hi @winding-lines - thanks for opening this PR! It's been in my queue as well as @ConnorGray's for a bit: I have started reviewing this and will leave a more detailed review this afternoon. Appreciate your patience - I'm excited by this work and what it can help unlock! |
| def _unwrap_self[ | ||
| T: ImplicitlyDestructible | ||
| ](py_self: PyObjectPtr) -> UnsafePointer[T, MutAnyOrigin]: | ||
| """Downcast a raw PyObjectPtr to a typed Mojo pointer, aborting on failure. | ||
| """ | ||
| try: | ||
| return PythonObject(from_borrowed=py_self).downcast_value_ptr[T]() | ||
| except e: | ||
| abort( | ||
| String("Python method receiver did not have the expected type: ", e) | ||
| ) |
There was a problem hiding this comment.
_unwrap_self aborts the process on a type mismatch. Every protocol slot funnels through here, so a wrong-type object reaching an unbound method kills the whole interpreter.
Pythonic behavior is to set PyExc_TypeError and return the slot's C error indicator (NULL for *func, -1 for lenfunc/inquiry/getbufferproc/objobjproc, etc.). Either return a Variant/Result-style outcome to each wrapper, or have _unwrap_self itself raise and let the existing try/except in each wrapper handle it.
Side benefit: makes negative tests for the protocol surface actually testable — right now any test that misroutes a receiver type aborts the test binary.
There was a problem hiding this comment.
Great feedback, removed the try/except and see how to handle the error in the caller.
| except e: | ||
| var msg = String(e) | ||
| if NotImplementedError.name == msg: | ||
| var not_implemented = cpython.lib.call[ | ||
| "Py_GetConstantBorrowed", PyObjectPtr | ||
| ](4) | ||
| return cpython.Py_NewRef(not_implemented) |
There was a problem hiding this comment.
I see two issues co-located on these lines.
1. Py_GetConstantBorrowed is Python 3.13+ only. _cpython.mojo:1547-1569 already documents this and version-branches Py_None. The CI matrix runs 3.10, 3.11, 3.12, 3.13, 3.14 (bazel/common.MODULE.bazel:117-123). On 3.10–3.12 the first binary handler that raises NotImplementedError will fail with a missing-symbol error. Suggest routing Py_NotImplemented through CPython.__init__ the same way Py_None is, so the version branch lives in one place; this call site then becomes cpython.Py_NotImplemented and the same fix covers lines 253 and 424.
2. Dispatch by message string is fragile. if NotImplementedError.name == msg matches any error whose message text equals the literal "NotImplementedError". The sentinel also collides by name with Python's built-in NotImplementedError, which is semantically different (a real exception vs. a singleton fallback). Suggest renaming the sentinel (e.g. PyNotImplementedSignal) and dispatching via a side-channel — for example a thread-local flag set by the sentinel's constructor and cleared in the wrapper — rather than a string compare. Same comment applies at lines 251 and 421.
| except e: | ||
| var msg = String(e) | ||
| if NotImplementedError.name == msg: | ||
| var not_implemented = cpython.lib.call[ | ||
| "Py_GetConstantBorrowed", PyObjectPtr | ||
| ](4) | ||
| return cpython.Py_NewRef(not_implemented) |
There was a problem hiding this comment.
Same 3.13+ Py_GetConstantBorrowed portability issue as line 208 (ternary path). Same string-match dispatch fragility. Please fix together — all three call sites (here, 208, 424) should consume one shared cpython.Py_NotImplemented accessor.
| except e: | ||
| # Mojo lacks multiple except branches; dispatch on the error name. | ||
| var msg = String(e) | ||
| if NotImplementedError.name == msg: | ||
| # Py_CONSTANT_NOT_IMPLEMENTED = 4 (CPython 3.13+ stable ABI) | ||
| var not_implemented = cpython.lib.call[ | ||
| "Py_GetConstantBorrowed", PyObjectPtr | ||
| ](4) | ||
| return cpython.Py_NewRef(not_implemented) |
There was a problem hiding this comment.
Same 3.13+ Py_GetConstantBorrowed portability issue as lines 208/253 (richcompare path). The # Py_CONSTANT_NOT_IMPLEMENTED = 4 (CPython 3.13+ stable ABI) comment acknowledges this but the call site doesn't guard for it.
While fixing, please add a test that triggers this branch under Python 3.10 and 3.12 — today nothing in test_type_protocol.py or test_number.py exercises the NotImplementedError → Py_NotImplemented path on the pre-3.13 matrix entries, so the bug would land silently.
| struct _PySlotIndex: | ||
| """CPython slot index constants for use with `PythonTypeBuilder._insert_slot`. | ||
|
|
||
| These match the values in CPython's `typeslots.h` and are part of the | ||
| stable ABI — do not renumber them. | ||
| """ | ||
|
|
||
| # Buffer protocol | ||
| comptime bf_getbuffer = Int32(1) | ||
| comptime bf_releasebuffer = Int32(2) | ||
| # Mapping protocol | ||
| comptime mp_setitem = Int32(3) # mp_ass_subscript | ||
| comptime mp_length = Int32(4) | ||
| comptime mp_getitem = Int32(5) # mp_subscript | ||
| # Number protocol | ||
| comptime nb_absolute = Int32(6) | ||
| comptime nb_add = Int32(7) | ||
| comptime nb_and = Int32(8) | ||
| comptime nb_bool = Int32(9) | ||
| comptime nb_divmod = Int32(10) | ||
| comptime nb_float = Int32(11) | ||
| comptime nb_floor_divide = Int32(12) | ||
| comptime nb_index = Int32(13) | ||
| comptime nb_inplace_add = Int32(14) | ||
| comptime nb_inplace_and = Int32(15) | ||
| comptime nb_inplace_floor_divide = Int32(16) | ||
| comptime nb_inplace_lshift = Int32(17) | ||
| comptime nb_inplace_multiply = Int32(18) | ||
| comptime nb_inplace_or = Int32(19) | ||
| comptime nb_inplace_power = Int32(20) | ||
| comptime nb_inplace_remainder = Int32(21) | ||
| comptime nb_inplace_rshift = Int32(22) | ||
| comptime nb_inplace_subtract = Int32(23) | ||
| comptime nb_inplace_true_divide = Int32(24) | ||
| comptime nb_inplace_xor = Int32(25) | ||
| comptime nb_int = Int32(26) | ||
| comptime nb_invert = Int32(27) | ||
| comptime nb_lshift = Int32(28) | ||
| comptime nb_multiply = Int32(29) | ||
| comptime nb_negative = Int32(30) | ||
| comptime nb_or = Int32(31) | ||
| comptime nb_positive = Int32(32) | ||
| comptime nb_power = Int32(33) | ||
| comptime nb_remainder = Int32(34) | ||
| comptime nb_rshift = Int32(35) | ||
| comptime nb_subtract = Int32(36) | ||
| comptime nb_true_divide = Int32(37) | ||
| comptime nb_xor = Int32(38) | ||
| # Sequence protocol | ||
| comptime sq_ass_item = Int32(39) | ||
| comptime sq_concat = Int32(40) | ||
| comptime sq_contains = Int32(41) | ||
| comptime sq_inplace_concat = Int32(42) | ||
| comptime sq_inplace_repeat = Int32(43) | ||
| comptime sq_item = Int32(44) | ||
| comptime sq_length = Int32(45) | ||
| comptime sq_repeat = Int32(46) | ||
| # Type protocol | ||
| comptime tp_alloc = Int32(47) | ||
| comptime tp_base = Int32(48) | ||
| comptime tp_bases = Int32(49) | ||
| comptime tp_call = Int32(50) | ||
| comptime tp_clear = Int32(51) | ||
| comptime tp_dealloc = Int32(52) | ||
| comptime tp_del = Int32(53) | ||
| comptime tp_descr_get = Int32(54) | ||
| comptime tp_descr_set = Int32(55) | ||
| comptime tp_doc = Int32(56) | ||
| comptime tp_getattr = Int32(57) | ||
| comptime tp_getattro = Int32(58) | ||
| comptime tp_hash = Int32(59) | ||
| comptime tp_init = Int32(60) | ||
| comptime tp_is_gc = Int32(61) | ||
| comptime tp_iter = Int32(62) | ||
| comptime tp_iternext = Int32(63) | ||
| comptime tp_methods = Int32(64) | ||
| comptime tp_new = Int32(65) |
There was a problem hiding this comment.
_PySlotIndex re-numbers slots that _cpython.mojo:419-423 already defines as named constants (Py_tp_dealloc = 52, Py_tp_init = 60, Py_tp_methods = 64, Py_tp_new = 65, Py_tp_repr = 66) with typed PyType_Slot.tp_X(func) named constructors. Two parallel sources of truth for ABI numbers is a bug magnet.
Either extend the _cpython.mojo set with the additional mp_* / nb_* / sq_* / bf_* entries this PR needs, or — if the intent is to replace the Py_tp_* constants — do that conversion in this PR or call it out as an explicit follow-up. Today buffer.mojo:34-35 forks a third copy of the same numbers, see comment there.
| # Allocate storage for: shape[0] (8 bytes) + strides[0] (8 bytes) | ||
| # + format string (fmt_len bytes) + null terminator (1 byte). | ||
| # On 64-bit platforms Int = 8 bytes, so the two Int fields occupy 16 bytes. | ||
| var fmt_bytes = info.format.as_bytes() | ||
| var fmt_len = len(fmt_bytes) | ||
| var alloc_size = 16 + fmt_len + 1 # 2 * sizeof(Int64) + format + NUL | ||
|
|
||
| # List.steal_data() gives us an owned UnsafePointer we can free later. | ||
| var store = List[UInt8](capacity=alloc_size) | ||
| store.resize(alloc_size, 0) | ||
| var alloc = store.steal_data() |
There was a problem hiding this comment.
One List[UInt8] allocation + resize(alloc_size, 0) + steal_data() per memoryview(obj) / numpy.frombuffer(obj) call, just to hold one shape[0], one strides[0], and the format string. For 1D zero-copy hot paths (e.g. calling into numpy in a tight loop) this churns the allocator unnecessarily.
Two options worth considering:
- Move shape/strides/format storage into the host Mojo struct (or alongside it) so
_bf_getbuffer_wrapperjust hands back pointers — no allocation per call. - At minimum, leave a
# TODOhere so future profiling work has a breadcrumb.
Same applies to _bf_releasebuffer_impl on line 232 since it has to free the matching block.
| from .mapping import MappingProtocolBuilder | ||
| from .number import NumberProtocolBuilder | ||
| from .sequence import SequenceProtocolBuilder | ||
| from .type_protocol import TypeProtocolBuilder |
There was a problem hiding this comment.
Re-exports MappingProtocolBuilder, NumberProtocolBuilder, SequenceProtocolBuilder, TypeProtocolBuilder — but not BufferProtocolBuilder. test/python/buffer_mojo_module.mojo has to reach into std.python.buffer directly as a result. Either include all five or document why buffer is intentionally separate.
| """Raise this from a rich compare handler to signal Python's NotImplemented. | ||
|
|
||
| When caught by the `_richcompare_wrapper`, this causes the wrapper to | ||
| return `Py_NotImplemented` to Python rather than setting an exception, | ||
| allowing Python to try the reflected operation on the other operand. | ||
|
|
||
| Example: | ||
| ```mojo | ||
| @staticmethod | ||
| def rich_compare( | ||
| self_ptr: PythonObject, other: PythonObject, op: Int | ||
| ) raises -> Bool: | ||
| if op == RichCompareOps.Py_EQ: | ||
| ... | ||
| raise NotImplementedError() | ||
| ``` | ||
| """ | ||
|
|
||
| comptime name: String = "NotImplementedError" | ||
| """Well-known name used by `_richcompare_wrapper` for dispatch.""" | ||
|
|
||
| def write_to(self, mut writer: Some[Writer]): |
There was a problem hiding this comment.
Naming this sentinel NotImplementedError collides with Python's built-in NotImplementedError exception. The Mojo source raise NotImplementedError() and Python source except NotImplementedError: look identical at the syntactic level but refer to different concepts: this is a CPython protocol fallback sentinel (returns Py_NotImplemented), whereas Python's built-in is a genuine exception users may legitimately raise.
Suggest renaming to PyNotImplementedSignal (or just NotImplemented, which mirrors Python's singleton), and expanding the docstring to explicitly call out that this is the sentinel for Py_NotImplemented dispatch — not a user-facing exception type. Pair this with replacing the string-match dispatch in adapters.mojo:206/251/421 (see comment there).
| target_compatible_with = select({ | ||
| "//:asan": ["@platforms//:incompatible"], | ||
| "//:ubsan": ["@platforms//:incompatible"], | ||
| "//conditions:default": [], | ||
| }), |
There was a problem hiding this comment.
Inconsistent sanitizer exclusion. The matching mojo_shared_library and modular_py_test targets in mojo/stdlib/test/python/BUILD.bazel:82-86, 101-106 exclude asan, tsan, ubsan for the same code path; this example excludes only asan and ubsan. tsan is missing here — likely an oversight. Please match the test-side config.
| [ | ||
| modular_py_test( | ||
| name = "test_" + name, | ||
| srcs = ["test_" + name + ".py"], | ||
| data = [":" + name + "_mojo_module"], | ||
| tags = ["no-pydeps"] if name == "buffer" else [], | ||
| target_compatible_with = select({ | ||
| "//:asan": ["@platforms//:incompatible"], | ||
| "//:tsan": ["@platforms//:incompatible"], | ||
| "//:ubsan": ["@platforms//:incompatible"], | ||
| "//conditions:default": [], | ||
| }), | ||
| ) | ||
| for name in _PROTOCOL_TESTS | ||
| ] |
There was a problem hiding this comment.
Three test-coverage gaps worth closing as part of this PR rather than as follow-ups:
- Pre-3.13
Py_NotImplementedpath. Nothing intest_number.py/test_type_protocol.pyexercises a raised-NotImplementedError → Py_NotImplementedhandoff on Python 3.10–3.12. That is exactly the matrix wherePy_GetConstantBorrowedwill fail to resolve (seeadapters.mojo:208). Add at least one such test per protocol that uses the sentinel. - Buffer writable-rejection branch.
buffer.mojo:161-167returns -1 withPyExc_BufferErrorwhenPyBUF_WRITABLEis requested on a read-only buffer. No test exercises this path — it would be straightforward to add viamemoryview(obj, writable=True)(or the lower-level__buffer__protocol). - Type-mismatch handling. No negative tests around
_unwrap_self's wrong-type path because the current implementation aborts; replacing the abort with aTypeError(see comment onadapters.mojo:36) makes this testable.
There was a problem hiding this comment.
@winding-lines thanks for your hard work on extending these capabilities and building these out! I think they will be super useful.
Once we align on the feedback at large, I'd suggest we split this up into 5 separate PRs - one for each builder essentially. @ConnorGray mentioned you and him had already chatted about this. This will make each PR much more bite-sized and easier to review, especially for ABI and other issues.
I would also recommend having a short design doc for each of the builders to help give context and understand the proposed changes for people that may not be as close to the Python-Mojo interop so far (such as @NathanSWard). It would be good to help understand the use cases each one unlocks and how it fits into the broader Python-Mojo interop story for the stdlib.
BEGIN_PUBLIC
[Stdlib] Add CPython protocol builders for Mojo-defined Python types
Adds a family of protocol builders under `std.python` that let Mojo
structs exposed as Python types implement the standard CPython type
protocols without writing CPython slot wrappers by hand.
New builders, each constructed from a `PythonTypeBuilder`:
- `BufferProtocolBuilder` — `bf_getbuffer` / `bf_releasebuffer`,
enabling zero-copy `memoryview(obj)` / `numpy.frombuffer(obj)`.
- `MappingProtocolBuilder` — `mp_length`, `mp_subscript`,
`mp_ass_subscript` (assignment and deletion).
- `NumberProtocolBuilder` — full `PyNumberMethods` surface: unary
(`__abs__`, `__neg__`, `__pos__`, `__invert__`, `__int__`,
`__float__`, `__index__`), inquiry (`__bool__`), binary (arithmetic,
bitwise, in-place), and ternary (`__pow__`, `__ipow__`).
- `SequenceProtocolBuilder` — `sq_length`, `sq_item`, `sq_ass_item`,
`sq_contains`, `sq_concat`, `sq_repeat`.
- `TypeProtocolBuilder` — `tp_richcompare` for the six comparison
operators.
Each `def_*` method accepts the corresponding Python dunder signature
and overloads on receiver kind (pointer, value, mut) and on raising vs
non-raising methods. Binary and ternary number methods may raise
`NotImplementedError` to return `Py_NotImplemented`, triggering Python's
reflected-operation fallback. Number methods may also return any
`ConvertibleToPython` type rather than `PythonObject` directly.
Supporting additions:
- `std.python.adapters` — internal C ABI wrappers, `_PySlotIndex`
constants, and lift/conv helpers shared by all builders.
- `std.python.utils` — `RichCompareOps` constants and a
`NotImplementedError` sentinel.
- `std.python.builders` — re-exports the four `*ProtocolBuilder`
types as a convenience entry point.
Adds a `columnar` example under `mojo/examples/python-interop`
demonstrating buffer-protocol interop, and integration tests covering
each protocol.
END_PUBLIC
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Marius S <39998+winding-lines@users.noreply.github.com>
43f912d to
8182768
Compare
BEGIN_PUBLIC
[Stdlib] Add CPython protocol builders for Mojo-defined Python types
Adds a family of protocol builders under
std.pythonthat let Mojo structs exposed as Python types implement the standard CPython type protocols without writing CPython slot wrappers by hand.New builders, each constructed from a
PythonTypeBuilder:BufferProtocolBuilder—bf_getbuffer/bf_releasebuffer, enabling zero-copymemoryview(obj)/numpy.frombuffer(obj).MappingProtocolBuilder—mp_length,mp_subscript,mp_ass_subscript(assignment and deletion).NumberProtocolBuilder— fullPyNumberMethodssurface: unary (__abs__,__neg__,__pos__,__invert__,__int__,__float__,__index__), inquiry (__bool__), binary (arithmetic, bitwise, in-place), and ternary (__pow__,__ipow__).SequenceProtocolBuilder—sq_length,sq_item,sq_ass_item,sq_contains,sq_concat,sq_repeat.TypeProtocolBuilder—tp_richcomparefor the six comparison operators.Each
def_*method accepts the corresponding Python dunder signature and overloads on receiver kind (pointer, value, mut) and on raising vs non-raising methods. Binary and ternary number methods may raiseNotImplementedErrorto returnPy_NotImplemented, triggering Python's reflected-operation fallback. Number methods may also return anyConvertibleToPythontype rather thanPythonObjectdirectly.Supporting additions:
std.python.adapters— internal C ABI wrappers,_PySlotIndexconstants, and lift/conv helpers shared by all builders.std.python.utils—RichCompareOpsconstants and aNotImplementedErrorsentinel.std.python.builders— re-exports the four*ProtocolBuildertypes as a convenience entry point.Adds a
columnarexample undermojo/examples/python-interopdemonstrating buffer-protocol interop, and integration tests covering each protocol.END_PUBLIC
Summary
Extend Python extension capabilities provided by the standard library. This PR adds support for:
Assisted-by: Claude Opus 4.7 (Anthropic) noreply@anthropic.com
Testing
Adds a
columnarexample undermojo/examples/python-interopdemonstrating buffer-protocol interop, and integration tests covering each protocol.Additionally the standalone library that this code comes from (https://pontoneer.dev) is used by https://github.com/kszucs/marrow
Checklist
sequence of smaller PRs
./bazelw run formatto format my changesAssisted-by:trailer in my commit message or this PR description(see AI Tool Use Policy)