a smalltalk on
Object and Protocol in CPython
shiyao.ma <i@introo.me>
May. 4th
Why this
‣ Python is great for carrying out research experiment.
this should lay the foundation why I discuss Python.
‣ Life is short. You need Python.
this should lay the foundation why people like Python.
2
life is neither a short nor a long, just a (signed) int, 31bits at most, I say.
Takeaway
‣ Understand the inter-relation among {.py, .pyc .c} file.
‣ Understand that everything in Python is an object.
‣ Understand how functions on TypeObject affect
InstanceObject.
3
CPython Overview
‣ First implemented in Dec.1989 by GvR, the BDFL
‣ Serving as the reference implementation.
‣ IronPython (clr)
‣ Jython (jvm)
‣ Brython (v8, spider) [no kidding]
‣ Written in ANSI C.
‣ flexible language binding
‣ embedding (libpython), e.g., openwrt, etc.
4
CPython Overview
‣ Code maintained by Mercurial.
‣ source: https://hg.python.org/cpython/
‣ Build toolchain is autoconf (on *nix)
./configure --with-pydebug && make -j2
5
CPython Overview
‣ Structure
6
cpython
configure.ac
Doc
Grammar
Include
Lib
Mac
Modules
Objects
Parser
Programs
Python
CPython Overview
‣ execution lifetime
7
PY
Parser
PY[CO]
VM
LOAD 1
LOAD 2
ADD
LOAD X
STORE X
x = 1 + 2
1 1
2
3
3
x
STACK
takeaway: py/pyc/c inter-relatoin
8
object and protocol:
the objects
Object
object: memory of C structure with common header
9
PyListObject
PyDictObject
PyTupleObject
PySyntaxErrorObject
PyImportErrorObject
…
takeaway: everything is object
ob_type
ob_refcnt
PyObject
ob_type
ob_size
ob_refcnt
PyVarObject
Object Structure
Will PyLongObject overflow?
10
The answer: chunk-chunk
digit[n]
…
digit[3]
digit[2]
ob_type
ob_size
digit ob_digit[1]
ob_refcnt
PyLongObject
typedef PY_UINT32_T digit;
result = PyObject_MALLOC(offsetof(PyLongObject, ob_digit) +
size*sizeof(digit));
n = 2 ** 64 # more bits than a word
assert type(n) is int and n > 0
Object Structure
Why my multi-dimensional array won’t work?
11
The answer: indirection, mutability
allocated
ob_type
ob_item
ob_refcnt
ob_size
PyListObject
PyObject*
PyObject*
…
PyObject*
PyObject*
allocated
ob_type
ob_item
ob_refcnt
ob_size
PyObject*
PyObject*
… 42
None
m, n = 4, 2
arr = [ [ None ] * n ] * m
arr[1][1] = 42
# [[None, 42], [None, 42], [None, 42], [None, 42]]
PyList_SetItem
Object Structure
what is the ob_type?
12
The answer: flexible type system
class Machine(type): pass
# Toy = Machine(foo, bar, hoge)
class Toy(metaclass=Machine): pass
toy = Toy()
# Toy, Machine, type, type
print(type(toy), type(Toy), type(Machine), type(type))
ob_type
ob_refcnt
…
toy
ob_type
ob_refcnt
…
ob_type
ob_refcnt
…
Toy
Machine
ob_type
ob_refcnt
…
Type
Object Structure
what is the ob_type?
13
# ob_type2
# 10fd69490 - 10fd69490 - 10fd69490
print("%x - %x - %x" % (id(42 .__class__), id(233 .__class__), id(int)))
assert dict().__class__ is dict
# dynamically create a class named "MagicKlass"
klass=“MagicKlass"
klass=type(klass, (object,), {"quack": lambda _: print("quack")});
duck = klass()
# quack
duck.quack()
assert duck.__class__ is klass
Object Structure
what is the ob_type?
14
ob_type
…
…
…
ob_refcnt
PyObject
…
*tp_as_mapping
*tp_as_sequence
*tp_as_number
…
ob_type
tp_getattr
…
tp_print
ob_refcnt
PyTypeObject
…
nb_subtract
…
nb_add
15
object and protocol:
the protocol
16
Protocol:
duck-typing in typing
AOL
‣ Abstract Object Layer
17
…
*tp_as_mapping
*tp_as_sequence
*tp_as_number
…
ob_type
tp_getattr
…
tp_print
ob_refcnt
PyTypeObject
…
nb_subtract
…
nb_add
When I see a bird that walks like a duck and swims like a duck and
quacks like a duck, I call that bird a duck.
Object Protocol
Number Protocol
Sequence Protocol
Iterator Protocol
Buffer Protocol
int PyObject_Print(PyObject *o, FILE *fp, int flags)
int PyObject_HasAttr(PyObject *o, PyObject *attr_name)
int PyObject_DelAttr(PyObject *o, PyObject *attr_name)
…
PyObject* PyNumber_Add(PyObject *o1, PyObject *o2)
PyObject* PyNumber_Multiply(PyObject *o1, PyObject *o2)
PyObject* PyNumber_FloorDivide(PyObject *o1, PyObject *o2)
…
PyObject* PySequence_Concat(PyObject *o1, PyObject *o2)
PyObject* PySequence_Repeat(PyObject *o, Py_ssize_t count)
PyObject* PySequence_GetItem(PyObject *o, Py_ssize_t i)
…
int PyIter_Check(PyObject *o)
PyObject* PyIter_Next(PyObject *o)
int PyObject_GetBuffer(PyObject *exporter, Py_buffer *view, int flags)
void PyBuffer_Release(Py_buffer *view)
int PyBuffer_IsContiguous(Py_buffer *view, char order)
…
Mapping Protocol
int PyMapping_HasKey(PyObject *o, PyObject *key)
PyObject* PyMapping_GetItemString(PyObject *o, const char *key)
int PyMapping_SetItemString(PyObject *o, const char *key, PyObject *v)
…
Example
‣ Number Protocol (PyNumber_Add)
18
// v + w?
PyObject *
PyNumber_Add(PyObject *v, PyObject *w)
{
// this just an example!
// try on v
result = v->ob_type->tp_as_number.nb_add(v, w)
// if fail or if w->ob_type is a subclass of v->ob_type
result = w->ob_type->tp_as_number.nb_add(w, v)
// return result
}
…
*tp_as_mapping
*tp_as_sequence
*tp_as_number
…
ob_type
tp_getattr
…
tp_print
ob_refcnt
PyTypeObject
…
nb_subtract
…
nb_add
takeaway: typeobject stores meta information
More Example
Why can we multiply a list? Is it slow?
19
arr = [None] * 3
# [None, None, None]
Exercise:
arr = [None] + [None]
# [None, None]
Magic Methods
access slots of tp_as_number, and its friends
20
Note tp_as_mapping->mp_length and tp_as_sequence->sq_length map to the
same slot __len__

If your C based MyType implements both, what’s MyType.__len__ and
len(MyType()) ?

# access magic method of dict and list
dict.__getitem__ # tp_as_mapping->mp_subscript
dict.__len__ # tp_as_mapping->mp_length
list.__getitem__ # tp_as_sequence->sq_item
list.__len__ # tp_as_sequence->sq_length
Magic Methods
backfill as_number and its friends
21
class A():
def __len__(self):
return 42
class B(): pass
# 42
print(len(A()))
# TypeError: object of type 'B' has no len()
print(len(B()))
Py_ssize_t
PyObject_Size(PyObject *o)
{
PySequenceMethods *m;
if (o == NULL) {
null_error();
return -1;
}
m = o->ob_type->tp_as_sequence;
if (m && m->sq_length)
return m->sq_length(o);
return PyMapping_Size(o);
}Which field does A.__len__ fill?
Next: Heterogeneous
Have you ever felt insecure towards negative indexing of
PyListObject?
22
The answer: RTFSC
words = "the quick brown fox jumps over the old lazy dog".split()
assert words[-1] == "dog"
words.insert(-100, "hayabusa")
assert words[-100] == ??
Thanks
23

Intro python-object-protocol

  • 1.
    a smalltalk on Objectand Protocol in CPython shiyao.ma <i@introo.me> May. 4th
  • 2.
    Why this ‣ Pythonis great for carrying out research experiment. this should lay the foundation why I discuss Python. ‣ Life is short. You need Python. this should lay the foundation why people like Python. 2 life is neither a short nor a long, just a (signed) int, 31bits at most, I say.
  • 3.
    Takeaway ‣ Understand theinter-relation among {.py, .pyc .c} file. ‣ Understand that everything in Python is an object. ‣ Understand how functions on TypeObject affect InstanceObject. 3
  • 4.
    CPython Overview ‣ Firstimplemented in Dec.1989 by GvR, the BDFL ‣ Serving as the reference implementation. ‣ IronPython (clr) ‣ Jython (jvm) ‣ Brython (v8, spider) [no kidding] ‣ Written in ANSI C. ‣ flexible language binding ‣ embedding (libpython), e.g., openwrt, etc. 4
  • 5.
    CPython Overview ‣ Codemaintained by Mercurial. ‣ source: https://hg.python.org/cpython/ ‣ Build toolchain is autoconf (on *nix) ./configure --with-pydebug && make -j2 5
  • 6.
  • 7.
    CPython Overview ‣ executionlifetime 7 PY Parser PY[CO] VM LOAD 1 LOAD 2 ADD LOAD X STORE X x = 1 + 2 1 1 2 3 3 x STACK takeaway: py/pyc/c inter-relatoin
  • 8.
  • 9.
    Object object: memory ofC structure with common header 9 PyListObject PyDictObject PyTupleObject PySyntaxErrorObject PyImportErrorObject … takeaway: everything is object ob_type ob_refcnt PyObject ob_type ob_size ob_refcnt PyVarObject
  • 10.
    Object Structure Will PyLongObjectoverflow? 10 The answer: chunk-chunk digit[n] … digit[3] digit[2] ob_type ob_size digit ob_digit[1] ob_refcnt PyLongObject typedef PY_UINT32_T digit; result = PyObject_MALLOC(offsetof(PyLongObject, ob_digit) + size*sizeof(digit)); n = 2 ** 64 # more bits than a word assert type(n) is int and n > 0
  • 11.
    Object Structure Why mymulti-dimensional array won’t work? 11 The answer: indirection, mutability allocated ob_type ob_item ob_refcnt ob_size PyListObject PyObject* PyObject* … PyObject* PyObject* allocated ob_type ob_item ob_refcnt ob_size PyObject* PyObject* … 42 None m, n = 4, 2 arr = [ [ None ] * n ] * m arr[1][1] = 42 # [[None, 42], [None, 42], [None, 42], [None, 42]] PyList_SetItem
  • 12.
    Object Structure what isthe ob_type? 12 The answer: flexible type system class Machine(type): pass # Toy = Machine(foo, bar, hoge) class Toy(metaclass=Machine): pass toy = Toy() # Toy, Machine, type, type print(type(toy), type(Toy), type(Machine), type(type)) ob_type ob_refcnt … toy ob_type ob_refcnt … ob_type ob_refcnt … Toy Machine ob_type ob_refcnt … Type
  • 13.
    Object Structure what isthe ob_type? 13 # ob_type2 # 10fd69490 - 10fd69490 - 10fd69490 print("%x - %x - %x" % (id(42 .__class__), id(233 .__class__), id(int))) assert dict().__class__ is dict # dynamically create a class named "MagicKlass" klass=“MagicKlass" klass=type(klass, (object,), {"quack": lambda _: print("quack")}); duck = klass() # quack duck.quack() assert duck.__class__ is klass
  • 14.
    Object Structure what isthe ob_type? 14 ob_type … … … ob_refcnt PyObject … *tp_as_mapping *tp_as_sequence *tp_as_number … ob_type tp_getattr … tp_print ob_refcnt PyTypeObject … nb_subtract … nb_add
  • 15.
  • 16.
  • 17.
    AOL ‣ Abstract ObjectLayer 17 … *tp_as_mapping *tp_as_sequence *tp_as_number … ob_type tp_getattr … tp_print ob_refcnt PyTypeObject … nb_subtract … nb_add When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck. Object Protocol Number Protocol Sequence Protocol Iterator Protocol Buffer Protocol int PyObject_Print(PyObject *o, FILE *fp, int flags) int PyObject_HasAttr(PyObject *o, PyObject *attr_name) int PyObject_DelAttr(PyObject *o, PyObject *attr_name) … PyObject* PyNumber_Add(PyObject *o1, PyObject *o2) PyObject* PyNumber_Multiply(PyObject *o1, PyObject *o2) PyObject* PyNumber_FloorDivide(PyObject *o1, PyObject *o2) … PyObject* PySequence_Concat(PyObject *o1, PyObject *o2) PyObject* PySequence_Repeat(PyObject *o, Py_ssize_t count) PyObject* PySequence_GetItem(PyObject *o, Py_ssize_t i) … int PyIter_Check(PyObject *o) PyObject* PyIter_Next(PyObject *o) int PyObject_GetBuffer(PyObject *exporter, Py_buffer *view, int flags) void PyBuffer_Release(Py_buffer *view) int PyBuffer_IsContiguous(Py_buffer *view, char order) … Mapping Protocol int PyMapping_HasKey(PyObject *o, PyObject *key) PyObject* PyMapping_GetItemString(PyObject *o, const char *key) int PyMapping_SetItemString(PyObject *o, const char *key, PyObject *v) …
  • 18.
    Example ‣ Number Protocol(PyNumber_Add) 18 // v + w? PyObject * PyNumber_Add(PyObject *v, PyObject *w) { // this just an example! // try on v result = v->ob_type->tp_as_number.nb_add(v, w) // if fail or if w->ob_type is a subclass of v->ob_type result = w->ob_type->tp_as_number.nb_add(w, v) // return result } … *tp_as_mapping *tp_as_sequence *tp_as_number … ob_type tp_getattr … tp_print ob_refcnt PyTypeObject … nb_subtract … nb_add takeaway: typeobject stores meta information
  • 19.
    More Example Why canwe multiply a list? Is it slow? 19 arr = [None] * 3 # [None, None, None] Exercise: arr = [None] + [None] # [None, None]
  • 20.
    Magic Methods access slotsof tp_as_number, and its friends 20 Note tp_as_mapping->mp_length and tp_as_sequence->sq_length map to the same slot __len__ If your C based MyType implements both, what’s MyType.__len__ and len(MyType()) ? # access magic method of dict and list dict.__getitem__ # tp_as_mapping->mp_subscript dict.__len__ # tp_as_mapping->mp_length list.__getitem__ # tp_as_sequence->sq_item list.__len__ # tp_as_sequence->sq_length
  • 21.
    Magic Methods backfill as_numberand its friends 21 class A(): def __len__(self): return 42 class B(): pass # 42 print(len(A())) # TypeError: object of type 'B' has no len() print(len(B())) Py_ssize_t PyObject_Size(PyObject *o) { PySequenceMethods *m; if (o == NULL) { null_error(); return -1; } m = o->ob_type->tp_as_sequence; if (m && m->sq_length) return m->sq_length(o); return PyMapping_Size(o); }Which field does A.__len__ fill?
  • 22.
    Next: Heterogeneous Have youever felt insecure towards negative indexing of PyListObject? 22 The answer: RTFSC words = "the quick brown fox jumps over the old lazy dog".split() assert words[-1] == "dog" words.insert(-100, "hayabusa") assert words[-100] == ??
  • 23.