Skip to content

[Stdlib] Add CPython protocol builders for Mojo-defined Python types#6453

Open
winding-lines wants to merge 3 commits into
modular:mainfrom
winding-lines:pontoneer-all
Open

[Stdlib] Add CPython protocol builders for Mojo-defined Python types#6453
winding-lines wants to merge 3 commits into
modular:mainfrom
winding-lines:pontoneer-all

Conversation

@winding-lines
Copy link
Copy Markdown
Contributor

BEGIN_PUBLIC
[Stdlib] Add CPython protocol builders for Mojo-defined Python types

Adds a family of protocol builders under std.python that let Mojo structs exposed as Python types implement the standard CPython type protocols without writing CPython slot wrappers by hand.

New builders, each constructed from a PythonTypeBuilder:

  • BufferProtocolBuilderbf_getbuffer / bf_releasebuffer, enabling zero-copy memoryview(obj) / numpy.frombuffer(obj).
  • MappingProtocolBuildermp_length, mp_subscript, mp_ass_subscript (assignment and deletion).
  • NumberProtocolBuilder — full PyNumberMethods surface: unary (__abs__, __neg__, __pos__, __invert__, __int__, __float__, __index__), inquiry (__bool__), binary (arithmetic, bitwise, in-place), and ternary (__pow__, __ipow__).
  • SequenceProtocolBuildersq_length, sq_item, sq_ass_item, sq_contains, sq_concat, sq_repeat.
  • TypeProtocolBuildertp_richcompare for the six comparison operators.

Each def_* method accepts the corresponding Python dunder signature and overloads on receiver kind (pointer, value, mut) and on raising vs non-raising methods. Binary and ternary number methods may raise NotImplementedError to return Py_NotImplemented, triggering Python's reflected-operation fallback. Number methods may also return any ConvertibleToPython type rather than PythonObject directly.

Supporting additions:

  • std.python.adapters — internal C ABI wrappers, _PySlotIndex constants, and lift/conv helpers shared by all builders.
  • std.python.utilsRichCompareOps constants and a NotImplementedError sentinel.
  • std.python.builders — re-exports the four *ProtocolBuilder types as a convenience entry point.

Adds a columnar example under mojo/examples/python-interop demonstrating buffer-protocol interop, and integration tests covering each protocol.
END_PUBLIC

Summary

Extend Python extension capabilities provided by the standard library. This PR adds support for:

  • mapping protocol — obj[key], len(obj), obj[key] = val
  • number protocol — arithmetic operators, abs(), bool(), etc.
  • sequence protocol — indexed access, in operator, concatenation, repetition
  • rich comparison — ==, !=, <, <=, >, >=

Assisted-by: Claude Opus 4.7 (Anthropic) noreply@anthropic.com

Testing

Adds a columnar example under mojo/examples/python-interop demonstrating buffer-protocol interop, and integration tests covering each protocol.

Additionally the standalone library that this code comes from (https://pontoneer.dev) is used by https://github.com/kszucs/marrow

Checklist

  • PR is small and focused — consider splitting larger changes into a
    sequence of smaller PRs
  • I ran ./bazelw run format to format my changes
  • I added or updated tests to cover my changes
  • If AI tools assisted with this contribution, I have included an
    Assisted-by: trailer in my commit message or this PR description
    (see AI Tool Use Policy)

@winding-lines winding-lines requested review from a team as code owners April 24, 2026 02:13
@github-actions github-actions Bot added mojo-stdlib Tag for issues related to standard library waiting-on-review mojo-docs labels Apr 24, 2026
@winding-lines winding-lines force-pushed the pontoneer-all branch 2 times, most recently from 1c56f7e to 43f912d Compare April 24, 2026 02:47
@JoeLoser
Copy link
Copy Markdown
Member

JoeLoser commented May 8, 2026

Hi @winding-lines - thanks for opening this PR! It's been in my queue as well as @ConnorGray's for a bit: I have started reviewing this and will leave a more detailed review this afternoon. Appreciate your patience - I'm excited by this work and what it can help unlock!

Comment on lines +36 to +46
def _unwrap_self[
T: ImplicitlyDestructible
](py_self: PyObjectPtr) -> UnsafePointer[T, MutAnyOrigin]:
"""Downcast a raw PyObjectPtr to a typed Mojo pointer, aborting on failure.
"""
try:
return PythonObject(from_borrowed=py_self).downcast_value_ptr[T]()
except e:
abort(
String("Python method receiver did not have the expected type: ", e)
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_unwrap_self aborts the process on a type mismatch. Every protocol slot funnels through here, so a wrong-type object reaching an unbound method kills the whole interpreter.

Pythonic behavior is to set PyExc_TypeError and return the slot's C error indicator (NULL for *func, -1 for lenfunc/inquiry/getbufferproc/objobjproc, etc.). Either return a Variant/Result-style outcome to each wrapper, or have _unwrap_self itself raise and let the existing try/except in each wrapper handle it.

Side benefit: makes negative tests for the protocol surface actually testable — right now any test that misroutes a receiver type aborts the test binary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great feedback, removed the try/except and see how to handle the error in the caller.

Comment on lines +204 to +210
except e:
var msg = String(e)
if NotImplementedError.name == msg:
var not_implemented = cpython.lib.call[
"Py_GetConstantBorrowed", PyObjectPtr
](4)
return cpython.Py_NewRef(not_implemented)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two issues co-located on these lines.

1. Py_GetConstantBorrowed is Python 3.13+ only. _cpython.mojo:1547-1569 already documents this and version-branches Py_None. The CI matrix runs 3.10, 3.11, 3.12, 3.13, 3.14 (bazel/common.MODULE.bazel:117-123). On 3.10–3.12 the first binary handler that raises NotImplementedError will fail with a missing-symbol error. Suggest routing Py_NotImplemented through CPython.__init__ the same way Py_None is, so the version branch lives in one place; this call site then becomes cpython.Py_NotImplemented and the same fix covers lines 253 and 424.

2. Dispatch by message string is fragile. if NotImplementedError.name == msg matches any error whose message text equals the literal "NotImplementedError". The sentinel also collides by name with Python's built-in NotImplementedError, which is semantically different (a real exception vs. a singleton fallback). Suggest renaming the sentinel (e.g. PyNotImplementedSignal) and dispatching via a side-channel — for example a thread-local flag set by the sentinel's constructor and cleared in the wrapper — rather than a string compare. Same comment applies at lines 251 and 421.

Comment on lines +249 to +255
except e:
var msg = String(e)
if NotImplementedError.name == msg:
var not_implemented = cpython.lib.call[
"Py_GetConstantBorrowed", PyObjectPtr
](4)
return cpython.Py_NewRef(not_implemented)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same 3.13+ Py_GetConstantBorrowed portability issue as line 208 (ternary path). Same string-match dispatch fragility. Please fix together — all three call sites (here, 208, 424) should consume one shared cpython.Py_NotImplemented accessor.

Comment on lines +418 to +426
except e:
# Mojo lacks multiple except branches; dispatch on the error name.
var msg = String(e)
if NotImplementedError.name == msg:
# Py_CONSTANT_NOT_IMPLEMENTED = 4 (CPython 3.13+ stable ABI)
var not_implemented = cpython.lib.call[
"Py_GetConstantBorrowed", PyObjectPtr
](4)
return cpython.Py_NewRef(not_implemented)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same 3.13+ Py_GetConstantBorrowed portability issue as lines 208/253 (richcompare path). The # Py_CONSTANT_NOT_IMPLEMENTED = 4 (CPython 3.13+ stable ABI) comment acknowledges this but the call site doesn't guard for it.

While fixing, please add a test that triggers this branch under Python 3.10 and 3.12 — today nothing in test_type_protocol.py or test_number.py exercises the NotImplementedError → Py_NotImplemented path on the pre-3.13 matrix entries, so the bug would land silently.

Comment on lines +440 to +516
struct _PySlotIndex:
"""CPython slot index constants for use with `PythonTypeBuilder._insert_slot`.

These match the values in CPython's `typeslots.h` and are part of the
stable ABI — do not renumber them.
"""

# Buffer protocol
comptime bf_getbuffer = Int32(1)
comptime bf_releasebuffer = Int32(2)
# Mapping protocol
comptime mp_setitem = Int32(3) # mp_ass_subscript
comptime mp_length = Int32(4)
comptime mp_getitem = Int32(5) # mp_subscript
# Number protocol
comptime nb_absolute = Int32(6)
comptime nb_add = Int32(7)
comptime nb_and = Int32(8)
comptime nb_bool = Int32(9)
comptime nb_divmod = Int32(10)
comptime nb_float = Int32(11)
comptime nb_floor_divide = Int32(12)
comptime nb_index = Int32(13)
comptime nb_inplace_add = Int32(14)
comptime nb_inplace_and = Int32(15)
comptime nb_inplace_floor_divide = Int32(16)
comptime nb_inplace_lshift = Int32(17)
comptime nb_inplace_multiply = Int32(18)
comptime nb_inplace_or = Int32(19)
comptime nb_inplace_power = Int32(20)
comptime nb_inplace_remainder = Int32(21)
comptime nb_inplace_rshift = Int32(22)
comptime nb_inplace_subtract = Int32(23)
comptime nb_inplace_true_divide = Int32(24)
comptime nb_inplace_xor = Int32(25)
comptime nb_int = Int32(26)
comptime nb_invert = Int32(27)
comptime nb_lshift = Int32(28)
comptime nb_multiply = Int32(29)
comptime nb_negative = Int32(30)
comptime nb_or = Int32(31)
comptime nb_positive = Int32(32)
comptime nb_power = Int32(33)
comptime nb_remainder = Int32(34)
comptime nb_rshift = Int32(35)
comptime nb_subtract = Int32(36)
comptime nb_true_divide = Int32(37)
comptime nb_xor = Int32(38)
# Sequence protocol
comptime sq_ass_item = Int32(39)
comptime sq_concat = Int32(40)
comptime sq_contains = Int32(41)
comptime sq_inplace_concat = Int32(42)
comptime sq_inplace_repeat = Int32(43)
comptime sq_item = Int32(44)
comptime sq_length = Int32(45)
comptime sq_repeat = Int32(46)
# Type protocol
comptime tp_alloc = Int32(47)
comptime tp_base = Int32(48)
comptime tp_bases = Int32(49)
comptime tp_call = Int32(50)
comptime tp_clear = Int32(51)
comptime tp_dealloc = Int32(52)
comptime tp_del = Int32(53)
comptime tp_descr_get = Int32(54)
comptime tp_descr_set = Int32(55)
comptime tp_doc = Int32(56)
comptime tp_getattr = Int32(57)
comptime tp_getattro = Int32(58)
comptime tp_hash = Int32(59)
comptime tp_init = Int32(60)
comptime tp_is_gc = Int32(61)
comptime tp_iter = Int32(62)
comptime tp_iternext = Int32(63)
comptime tp_methods = Int32(64)
comptime tp_new = Int32(65)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PySlotIndex re-numbers slots that _cpython.mojo:419-423 already defines as named constants (Py_tp_dealloc = 52, Py_tp_init = 60, Py_tp_methods = 64, Py_tp_new = 65, Py_tp_repr = 66) with typed PyType_Slot.tp_X(func) named constructors. Two parallel sources of truth for ABI numbers is a bug magnet.

Either extend the _cpython.mojo set with the additional mp_* / nb_* / sq_* / bf_* entries this PR needs, or — if the intent is to replace the Py_tp_* constants — do that conversion in this PR or call it out as an explicit follow-up. Today buffer.mojo:34-35 forks a third copy of the same numbers, see comment there.

Comment on lines +169 to +179
# Allocate storage for: shape[0] (8 bytes) + strides[0] (8 bytes)
# + format string (fmt_len bytes) + null terminator (1 byte).
# On 64-bit platforms Int = 8 bytes, so the two Int fields occupy 16 bytes.
var fmt_bytes = info.format.as_bytes()
var fmt_len = len(fmt_bytes)
var alloc_size = 16 + fmt_len + 1 # 2 * sizeof(Int64) + format + NUL

# List.steal_data() gives us an owned UnsafePointer we can free later.
var store = List[UInt8](capacity=alloc_size)
store.resize(alloc_size, 0)
var alloc = store.steal_data()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One List[UInt8] allocation + resize(alloc_size, 0) + steal_data() per memoryview(obj) / numpy.frombuffer(obj) call, just to hold one shape[0], one strides[0], and the format string. For 1D zero-copy hot paths (e.g. calling into numpy in a tight loop) this churns the allocator unnecessarily.

Two options worth considering:

  1. Move shape/strides/format storage into the host Mojo struct (or alongside it) so _bf_getbuffer_wrapper just hands back pointers — no allocation per call.
  2. At minimum, leave a # TODO here so future profiling work has a breadcrumb.

Same applies to _bf_releasebuffer_impl on line 232 since it has to free the matching block.

Comment on lines +14 to +17
from .mapping import MappingProtocolBuilder
from .number import NumberProtocolBuilder
from .sequence import SequenceProtocolBuilder
from .type_protocol import TypeProtocolBuilder
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-exports MappingProtocolBuilder, NumberProtocolBuilder, SequenceProtocolBuilder, TypeProtocolBuilder — but not BufferProtocolBuilder. test/python/buffer_mojo_module.mojo has to reach into std.python.buffer directly as a result. Either include all five or document why buffer is intentionally separate.

Comment on lines +44 to +65
"""Raise this from a rich compare handler to signal Python's NotImplemented.

When caught by the `_richcompare_wrapper`, this causes the wrapper to
return `Py_NotImplemented` to Python rather than setting an exception,
allowing Python to try the reflected operation on the other operand.

Example:
```mojo
@staticmethod
def rich_compare(
self_ptr: PythonObject, other: PythonObject, op: Int
) raises -> Bool:
if op == RichCompareOps.Py_EQ:
...
raise NotImplementedError()
```
"""

comptime name: String = "NotImplementedError"
"""Well-known name used by `_richcompare_wrapper` for dispatch."""

def write_to(self, mut writer: Some[Writer]):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming this sentinel NotImplementedError collides with Python's built-in NotImplementedError exception. The Mojo source raise NotImplementedError() and Python source except NotImplementedError: look identical at the syntactic level but refer to different concepts: this is a CPython protocol fallback sentinel (returns Py_NotImplemented), whereas Python's built-in is a genuine exception users may legitimately raise.

Suggest renaming to PyNotImplementedSignal (or just NotImplemented, which mirrors Python's singleton), and expanding the docstring to explicitly call out that this is the sentinel for Py_NotImplemented dispatch — not a user-facing exception type. Pair this with replacing the string-match dispatch in adapters.mojo:206/251/421 (see comment there).

Comment on lines +54 to +58
target_compatible_with = select({
"//:asan": ["@platforms//:incompatible"],
"//:ubsan": ["@platforms//:incompatible"],
"//conditions:default": [],
}),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent sanitizer exclusion. The matching mojo_shared_library and modular_py_test targets in mojo/stdlib/test/python/BUILD.bazel:82-86, 101-106 exclude asan, tsan, ubsan for the same code path; this example excludes only asan and ubsan. tsan is missing here — likely an oversight. Please match the test-side config.

Comment on lines +95 to +109
[
modular_py_test(
name = "test_" + name,
srcs = ["test_" + name + ".py"],
data = [":" + name + "_mojo_module"],
tags = ["no-pydeps"] if name == "buffer" else [],
target_compatible_with = select({
"//:asan": ["@platforms//:incompatible"],
"//:tsan": ["@platforms//:incompatible"],
"//:ubsan": ["@platforms//:incompatible"],
"//conditions:default": [],
}),
)
for name in _PROTOCOL_TESTS
]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three test-coverage gaps worth closing as part of this PR rather than as follow-ups:

  1. Pre-3.13 Py_NotImplemented path. Nothing in test_number.py / test_type_protocol.py exercises a raised-NotImplementedError → Py_NotImplemented handoff on Python 3.10–3.12. That is exactly the matrix where Py_GetConstantBorrowed will fail to resolve (see adapters.mojo:208). Add at least one such test per protocol that uses the sentinel.
  2. Buffer writable-rejection branch. buffer.mojo:161-167 returns -1 with PyExc_BufferError when PyBUF_WRITABLE is requested on a read-only buffer. No test exercises this path — it would be straightforward to add via memoryview(obj, writable=True) (or the lower-level __buffer__ protocol).
  3. Type-mismatch handling. No negative tests around _unwrap_self's wrong-type path because the current implementation aborts; replacing the abort with a TypeError (see comment on adapters.mojo:36) makes this testable.

Copy link
Copy Markdown
Member

@JoeLoser JoeLoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@winding-lines thanks for your hard work on extending these capabilities and building these out! I think they will be super useful.

Once we align on the feedback at large, I'd suggest we split this up into 5 separate PRs - one for each builder essentially. @ConnorGray mentioned you and him had already chatted about this. This will make each PR much more bite-sized and easier to review, especially for ABI and other issues.

I would also recommend having a short design doc for each of the builders to help give context and understand the proposed changes for people that may not be as close to the Python-Mojo interop so far (such as @NathanSWard). It would be good to help understand the use cases each one unlocks and how it fits into the broader Python-Mojo interop story for the stdlib.

winding-lines and others added 3 commits May 13, 2026 19:34
BEGIN_PUBLIC
[Stdlib] Add CPython protocol builders for Mojo-defined Python types

Adds a family of protocol builders under `std.python` that let Mojo
structs exposed as Python types implement the standard CPython type
protocols without writing CPython slot wrappers by hand.

New builders, each constructed from a `PythonTypeBuilder`:

  - `BufferProtocolBuilder` — `bf_getbuffer` / `bf_releasebuffer`,
    enabling zero-copy `memoryview(obj)` / `numpy.frombuffer(obj)`.
  - `MappingProtocolBuilder` — `mp_length`, `mp_subscript`,
    `mp_ass_subscript` (assignment and deletion).
  - `NumberProtocolBuilder` — full `PyNumberMethods` surface: unary
    (`__abs__`, `__neg__`, `__pos__`, `__invert__`, `__int__`,
    `__float__`, `__index__`), inquiry (`__bool__`), binary (arithmetic,
    bitwise, in-place), and ternary (`__pow__`, `__ipow__`).
  - `SequenceProtocolBuilder` — `sq_length`, `sq_item`, `sq_ass_item`,
    `sq_contains`, `sq_concat`, `sq_repeat`.
  - `TypeProtocolBuilder` — `tp_richcompare` for the six comparison
    operators.

Each `def_*` method accepts the corresponding Python dunder signature
and overloads on receiver kind (pointer, value, mut) and on raising vs
non-raising methods. Binary and ternary number methods may raise
`NotImplementedError` to return `Py_NotImplemented`, triggering Python's
reflected-operation fallback. Number methods may also return any
`ConvertibleToPython` type rather than `PythonObject` directly.

Supporting additions:

  - `std.python.adapters` — internal C ABI wrappers, `_PySlotIndex`
    constants, and lift/conv helpers shared by all builders.
  - `std.python.utils` — `RichCompareOps` constants and a
    `NotImplementedError` sentinel.
  - `std.python.builders` — re-exports the four `*ProtocolBuilder`
    types as a convenience entry point.

Adds a `columnar` example under `mojo/examples/python-interop`
demonstrating buffer-protocol interop, and integration tests covering
each protocol.
END_PUBLIC

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Marius S <39998+winding-lines@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mojo-docs mojo-stdlib Tag for issues related to standard library waiting-on-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants