Metaprogramming

As everything in Python is an object, classes are objects too: each class is created by instantiating a special class called a metaclass. Metaprogramming, as discussed here, is about overriding some of the behavior of that metaclass. It is often used to follow the DRY principle (Don’t Repeat Yourself) and avoid repeating the same or similar code inside a program. Frameworks such as Django rely on it because it makes their APIs much easier to use.

Example #1: keeping the old values

Imagine we want a given class to keep the old values of some of its attributes. There are several ways to achieve this, but one of them is to use a descriptor, that is, a class that controls access to an attribute.

class Descriptor:
    def __init__(self, name):
        self.name = name

    def __set__(self, instance, value):
        # store the value in the attribute
        instance.__dict__[self.name] = value
        # if there is an attribute [name]_history, add the new value
        if hasattr(instance, self.name + '_history'):
            getattr(instance, self.name + '_history').insert(0, value)
        # otherwise create that attribute
        else:
            setattr(instance, self.name + '_history', [value])

    def __delete__(self, instance):
        del instance.__dict__[self.name]
        # also delete the history attribute if it exists
        if hasattr(instance, self.name + '_history'):
            delattr(instance, self.name + '_history')

The descriptor above overrides the default update behavior, keeping all the values of the desired attribute in a list named “<attribute name>_history”. This makes updating the value slower than usual, but because the descriptor does not override the __get__ action, reads (the most common access) are as fast as usual. A more complex descriptor could also record who made the updates and when.
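As a minimal sketch of the "who and when" idea, the variant below stores a timestamp next to each value. The names TimestampedDescriptor and Account are hypothetical and only serve the illustration; recording the author of a change would need extra context that is not modeled here.

```python
import time


class TimestampedDescriptor:
    """Hypothetical variant of Descriptor that records when each value was set."""

    def __init__(self, name):
        self.name = name

    def __set__(self, instance, value):
        # store the current value directly in the instance dictionary
        instance.__dict__[self.name] = value
        # setdefault() creates the history list on the first assignment
        history = instance.__dict__.setdefault(self.name + '_history', [])
        # newest entry first, as in the original descriptor
        history.insert(0, (time.time(), value))


class Account:
    balance = TimestampedDescriptor('balance')


acct = Account()
acct.balance = 100
acct.balance = 250

for stamp, value in acct.balance_history:
    print(stamp, value)
```

Reads still go straight to the instance dictionary because the descriptor defines no __get__.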

Note that one can access instance attributes either through the instance dictionary (“instance.__dict__”) or by calling the built-in functions getattr(), hasattr(), setattr() and delattr(). The built-in functions are more human-readable but probably not as fast as directly accessing the dictionary. One cannot however use setattr() or delattr() on the current attribute (lines 7 and 16), as they would end up calling Descriptor.__set__() and Descriptor.__delete__() respectively, triggering infinite recursion.
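The recursion problem can be reproduced with a deliberately broken descriptor. This is only an illustrative sketch; RecursiveDescriptor and Broken are hypothetical names.

```python
class RecursiveDescriptor:
    """Deliberately broken: uses setattr() on the attribute it manages."""

    def __init__(self, name):
        self.name = name

    def __set__(self, instance, value):
        # setattr() finds this same descriptor on the class and calls
        # __set__ again, which calls setattr() again, and so on
        setattr(instance, self.name, value)


class Broken:
    attr = RecursiveDescriptor('attr')


try:
    Broken().attr = 1
    outcome = 'no error'
except RecursionError:
    outcome = 'RecursionError'

print(outcome)
```

Writing to instance.__dict__ directly, as the original descriptor does, bypasses the descriptor lookup and so avoids the problem.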

Once the descriptor is defined, it can be used as follows:

>>> class MyClass(object):
...     attr = Descriptor('attr')
...
>>> obj = MyClass()
>>> obj.attr = "Value"
>>> obj.attr = "New value"
>>> obj.attr
'New value'
>>> obj.attr_history
['New value', 'Value']

Because descriptors that define __set__ take precedence over instance attributes of the same name (as we have previously seen), we can declare “attr” at the class level. That does not however mean that two instances of MyClass will share the same values: the descriptor’s __set__ method modifies the instance, not the class attributes.
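A quick way to check that instances do not share values is to assign through two objects and compare their histories. The sketch below uses a condensed copy of the descriptor (without __delete__) so that it runs on its own.

```python
class Descriptor:
    # condensed copy of the descriptor above, kept short for this check
    def __init__(self, name):
        self.name = name

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        instance.__dict__.setdefault(self.name + '_history', []).insert(0, value)


class MyClass:
    attr = Descriptor('attr')


first, second = MyClass(), MyClass()
first.attr = 'a'
first.attr = 'b'
second.attr = 'z'

print(first.attr_history)   # ['b', 'a']
print(second.attr_history)  # ['z']
# the class itself only holds the descriptor object, not any value
print(type(MyClass.__dict__['attr']).__name__)  # Descriptor
```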

But what if we want to have a more automatic way to declare such attributes? What if, in a DRY fashion, we want to avoid repeating the attribute name twice?

One way is to use metaprogramming. Let’s define a convention where a class attribute “attr_version” is a list of attribute names that should be declared as versioned attributes. And let’s use metaprogramming as the plumbing to make it work.

Metaclasses

Each class is created by instantiating the metaclass “type”. It is indeed possible to create a class entirely dynamically by explicitly instantiating “type”. The following class:

class MyClass(object):
    attr = 42
    def display(self):
        print('Value: ' + str(self.attr))

can be written 100% dynamically by using the following code (*), even though the result is not as human-readable:

exec("def display(self): print('Value: ' + str(self.attr))")
globals()['MyClass'] = type(
    'MyClass',
    (object,),
    {'attr': 42, 'display': globals()['display']})

The above code first defines a function “display” (note that it calls exec() and passes a string, thus making the creation really dynamic). It then instantiates “type”, passing 1) the class name, 2) the base class(es) and 3) a dictionary of class attributes (remember, a method is just a callable attribute). It also uses globals() instead of hard-coding variable names (“MyClass = type(…)”, “'display': display”).
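For comparison, here is a sketch of the same three-argument call to “type” with hard-coded names, without exec() or globals(). It is less dynamic but easier to read, and it makes the role of each argument visible.

```python
def display(self):
    print('Value: ' + str(self.attr))


# type(name, bases, namespace) builds and returns a new class object
MyClass = type('MyClass', (object,), {'attr': 42, 'display': display})

obj = MyClass()
obj.display()

print(type(MyClass))    # the metaclass of the new class
print(MyClass.__mro__)  # the class itself followed by its bases
```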

It is however possible to create your own metaclass by deriving from “type”:

class MyType(type):
    def __new__(cls, name, bases, clsdict):
        for attr_name in clsdict['attr_version']:
            clsdict[attr_name] = Descriptor(attr_name)
        clsobj = super().__new__(cls,name,bases,clsdict)
        return clsobj

The __new__ method is called when the new class is being created (you can also override __init__, which is called after the class has been created). It receives as parameters the metaclass itself, the name of the new class, its base class(es) and the new class dictionary, which contains all its attributes. From here, it is easy to fetch the attribute ‘attr_version’ (line 3) and add new class attributes to the dictionary (line 4).
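The order of the two hooks, and what each receives as its first argument, can be observed with a small tracing metaclass. TracingType, Traced and the calls list are hypothetical names used only for this sketch.

```python
calls = []


class TracingType(type):
    def __new__(mcs, name, bases, clsdict):
        # first argument is the metaclass: the class does not exist yet
        calls.append(('__new__', name, mcs.__name__))
        return super().__new__(mcs, name, bases, clsdict)

    def __init__(cls, name, bases, clsdict):
        # first argument is the freshly created class
        calls.append(('__init__', name, cls.__name__))
        super().__init__(name, bases, clsdict)


class Traced(metaclass=TracingType):
    pass


for call in calls:
    print(call)
```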

Once this is done, you can use this metaclass the following way:

>>> class MyClass(metaclass=MyType):
...     attr_version = ['attr1', 'attr2']
...
>>> obj = MyClass()
>>> obj.attr1 = "Value"
>>> obj.attr1 = "Other value"
>>> obj.attr1_history
['Other value', 'Value']
>>> obj.attr2 = 42
>>> obj.attr2 = 43
>>> obj.attr2_history
[43, 42]
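Put together, the descriptor and the metaclass fit in one short script. The sketch below makes one assumption beyond the text: it reads ‘attr_version’ with dict.get() so that classes which do not follow the convention (subclasses, for instance) are still created without a KeyError. Article is a hypothetical example class.

```python
class Descriptor:
    # condensed copy of the history-keeping descriptor
    def __init__(self, name):
        self.name = name

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        instance.__dict__.setdefault(self.name + '_history', []).insert(0, value)


class MyType(type):
    def __new__(cls, name, bases, clsdict):
        # assumption: a missing 'attr_version' means "no versioned attributes"
        for attr_name in clsdict.get('attr_version', ()):
            clsdict[attr_name] = Descriptor(attr_name)
        return super().__new__(cls, name, bases, clsdict)


class Article(metaclass=MyType):
    attr_version = ['title', 'body']


class Draft(Article):
    # inherits the metaclass and the descriptors; declares nothing new
    pass


post = Draft()
post.title = 'Draft'
post.title = 'Final'
print(post.title_history)  # ['Final', 'Draft']
```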

Example #2: automatically generating constructors

Pretty much any class has a constructor such as:

class MyClass(object):
    def __init__(self, attr1, attr2, attr3, ...):
        self.attr1 = attr1
        self.attr2 = attr2
        self.attr3 = attr3
        (...)

How could we simplify this boring, repetitive task? We can do so by defining another convention, where the class attribute “attr_init” contains the list of attributes that should be passed to the constructor, and by defining a metaclass that generates the proper constructor.

class MyType(type):
    def __new__(cls, name, bases, clsdict):
        attr_names = clsdict['attr_init']
        params = ''.join(', ' + attr_name + '=None' for attr_name in attr_names)
        code = 'def __init__(self' + params + '):\n'
        for attr_name in attr_names:
            code += '    if ' + attr_name + \
                    ' is not None: self.' + \
                    attr_name + ' = ' + \
                    attr_name + '\n'
        exec(code, clsdict)
        clsobj = super().__new__(cls, name, bases, clsdict)
        return clsobj

class MyClass(metaclass=MyType):
    attr_init = ['attr1', 'attr2']

This metaclass dynamically generates a string that contains the constructor code (lines 4 to 10). This string is turned into actual Python code by calling exec() on line 11. Defining MyClass(metaclass=MyType) as shown above implicitly generates the following constructor:

def __init__(self, attr1=None, attr2=None):
    if attr1 is not None: self.attr1 = attr1
    if attr2 is not None: self.attr2 = attr2

In terms of performance, generating MyClass takes more time than explicitly writing the same constructor, as Python needs to parse the code passed to exec() in order to generate the constructor. But classes are not generated often, so it shouldn’t be a problem. Once MyType.__new__() completes, the resulting constructor is as fast as if it had been explicitly written, since it has been fully compiled to bytecode.
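The generated constructor can be exercised as follows. This sketch is a variation on the metaclass above rather than a copy: it runs exec() in a separate, throwaway namespace so that the ‘__builtins__’ entry exec() adds does not end up in the class dictionary, and it appends a ‘pass’ statement so that an empty ‘attr_init’ still produces valid code. InitType and Point are hypothetical names.

```python
class InitType(type):
    def __new__(cls, name, bases, clsdict):
        attr_names = clsdict['attr_init']
        params = ''.join(', ' + n + '=None' for n in attr_names)
        code = 'def __init__(self' + params + '):\n'
        for n in attr_names:
            code += '    if {0} is not None: self.{0} = {0}\n'.format(n)
        code += '    pass\n'  # keeps the body valid when attr_init is empty
        namespace = {}
        exec(code, namespace)  # defines __init__ inside the namespace
        clsdict['__init__'] = namespace['__init__']
        return super().__new__(cls, name, bases, clsdict)


class Point(metaclass=InitType):
    attr_init = ['x', 'y']


p = Point(x=1)
print(p.x)                # 1
print(hasattr(p, 'y'))    # False: y was left at None, so it was never set
q = Point(1, 2)
print(q.y)                # 2
```

Note that an argument left at None is never assigned, so reading it afterwards raises AttributeError, exactly as with the constructor generated in the article.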

Conclusion

Metaprogramming can be very useful to simplify code. It should however be used carefully: it is easy to forget our own conventions once they become too numerous and complex, which makes the code more difficult to maintain.

For those who are interested in further metaprogramming examples, I recommend this video (from which this article is heavily inspired)


(*) If you’re not convinced, run the following commands for both classes and check that the output is the same:

>>> import dis
>>> dis.dis(MyClass) # looks at the generated bytecode
>>> MyClass.__mro__  # looks at the parent classes
>>> dir(MyClass)     # looks at the class dictionary
