August 14, 2011 at 06:05. Tags: Articles, Python

Python rightfully prides itself on being a relatively straightforward language, without a lot of "magic" hiding in its workings and features. Sometimes, however, to make interesting abstractions possible, one can dig deep into Python's dustier, more obscure corners to find language constructs that are a bit more magical than usual. Metaclasses are one such feature.

Unfortunately, metaclasses have a reputation for being "a solution seeking a problem". The aim of this article is to demonstrate a few actual uses of metaclasses in widely used Python code.

There is a lot of material on Python metaclasses online, so this isn't just another tutorial (look in the References section below for some links I found useful). I will spend some time explaining what metaclasses are, but my main aim is the examples. That said, this article still aspires to be self-contained - you can start reading it even if you don't know what metaclasses are.

Another quick note before we begin - this article focuses on Python 2.6 & 2.7, because most of the code you find online is still for these versions. In Python 3.x metaclasses work similarly, although the syntax for specifying them is a bit different. So the vast majority of this article applies to 3.x as well.
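For reference, here is roughly what that syntax difference looks like. This sketch is Python 3 only, and Meta and the class names are made up for illustration; the Python 2 spelling used throughout this article puts `__metaclass__ = Meta` inside the class body instead:

```python
# Python 3 only: the metaclass is a keyword argument in the class header.
class Meta(type):
    pass

class MyKlass(object, metaclass=Meta):
    foo = 2

# In Python 3 a __metaclass__ attribute in the class body is just an
# ordinary class attribute; it no longer selects the metaclass.
class OldStyleSpelling(object):
    __metaclass__ = Meta
```

So `type(MyKlass)` is `Meta`, while `type(OldStyleSpelling)` is plain `type`.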

Classes are objects too

To understand metaclasses, first we should make some things clear about classes. In Python, everything is an object, and that includes classes. In fact, classes in Python are first-class objects - they can be created at runtime, passed as parameters and returned from functions, and assigned to variables. Here's a short interactive session that demonstrates these qualities of classes:

```python
>>> def make_myklass(**kwattrs):
...     return type('MyKlass', (object,), dict(**kwattrs))
...
>>> myklass_foo_bar = make_myklass(foo=2, bar=4)
>>> myklass_foo_bar
<class '__main__.MyKlass'>
>>> x = myklass_foo_bar()
>>> x
<__main__.MyKlass object at 0x01F6B050>
>>> x.foo, x.bar
(2, 4)
```

Here we use the 3-argument form of the type built-in function to dynamically create a class named MyKlass, inheriting from object, with some attributes provided as arguments. Then we create one such class. As you can see, myklass_foo_bar is equivalent to:

```python
class MyKlass(object):
    foo = 2
    bar = 4
```

But it was created at runtime, returned from a function and assigned to a variable.
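To push the "first-class" point a little further, here is a small sketch (with_greeting, Base and Greeter are hypothetical names) of a function that receives a class as an argument and returns a brand-new subclass built at runtime with the 3-argument type, this time including a method:

```python
def with_greeting(cls):
    """Return a new subclass of cls, created at runtime, with a greet() method."""
    def greet(self):
        # type(self).__name__ is the name passed to type() below
        return 'hello from ' + type(self).__name__
    # type(name, bases, dict) builds the class object directly
    return type(cls.__name__ + 'WithGreeting', (cls,), {'greet': greet})

class Base(object):
    foo = 2

Greeter = with_greeting(Base)   # a class, passed in and returned like any value
g = Greeter()
```

The new class inherits `foo` from `Base`, and `g.greet()` returns `'hello from BaseWithGreeting'`.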

The class of a class

Every object (including built-ins) in Python has a class. We've just seen that classes are objects too, so classes must also have a class, right? Exactly. Python lets us examine the class of an object with the __class__ attribute. Let's see this in action:

```python
>>> class SomeKlass(object): pass
...
>>> someobject = SomeKlass()
>>> someobject.__class__
<class '__main__.SomeKlass'>
>>> SomeKlass.__class__
<type 'type'>
```

We've created a class and an object of that class. Examining the __class__ of someobject, we see that it's SomeKlass. Next comes the interesting part. What is the class of SomeKlass? We can again examine it with __class__, and we see it's type.

So type is the class of Python classes. In other words, while in the example above someobject is a SomeKlass object, SomeKlass itself is a type object.

I don't know about you, but I find this reassuring. Since we learned that classes are objects in Python, it makes sense that they also have a class, and it's nice to know there's a built-in class (type) serving the role of being the class of classes.
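The same relationships can be checked with the type() and isinstance() built-ins. A short sketch, valid for new-style classes under both Python 2 and 3:

```python
class SomeKlass(object):
    pass

someobject = SomeKlass()

# An instance's class is SomeKlass; SomeKlass's own class is type.
assert someobject.__class__ is SomeKlass
assert SomeKlass.__class__ is type
assert isinstance(SomeKlass, type)

# Built-in classes are instances of type too, and type is its own class.
assert type(42) is int
assert type(int) is type
assert type(type) is type
```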

Metaclass

A metaclass is defined as "the class of a class". Any class whose instances are themselves classes is a metaclass. So, according to what we've seen above, this makes type a metaclass - in fact, the most commonly used metaclass in Python, since it's the default metaclass of all classes.

Since a metaclass is the class of a class, it is used to construct classes (just as a class is used to construct objects). But wait a second, don't we create classes with a standard class definition? Definitely, but what Python does under the hood is the following:

1. When it sees a class definition, Python executes it to collect the attributes (including methods) into a dictionary.
2. When the class definition is over, Python determines the metaclass of the class. Let's call it Meta.
3. Eventually, Python executes Meta(name, bases, dct), where:
   - Meta is the metaclass, so this invocation is instantiating it.
   - name is the name of the newly created class.
   - bases is a tuple of the class's base classes.
   - dct maps attribute names to objects, listing all of the class's attributes.

How do we determine the metaclass of a class? Simply stated, if either a class or one of its bases has a __metaclass__ attribute, it's taken as the metaclass. Otherwise, type is the metaclass. So what happens when we define:

```python
class MyKlass(object):
    foo = 2
```

is this: MyKlass has no __metaclass__ attribute, so type is used instead, and the class creation is done as:

```python
MyKlass = type(name, bases, dct)
```

which is consistent with what we've seen at the beginning of the article. If, on the other hand, MyKlass does have a metaclass defined:

```python
class MyKlass(object):
    __metaclass__ = MyMeta
    foo = 2
```

then the class creation is done as:

```python
MyKlass = MyMeta(name, bases, dct)
```

So MyMeta should be implemented appropriately to support such a calling form and return the new class. It's actually similar to writing a normal class with a pre-defined constructor signature.
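Because class creation boils down to Meta(name, bases, dct), you can also make that call yourself. The sketch below (Meta, MyKlass and SubKlass are hypothetical names) calls a trivial metaclass directly, which conveniently sidesteps the Python 2 vs. 3 syntax difference, and then shows that a subclass picks up the metaclass of its base:

```python
class Meta(type):
    """A trivial metaclass; it exists only so we can see it being used."""
    pass

# Equivalent to a class statement with Meta as the metaclass:
#   name='MyKlass', bases=(object,), dct={'foo': 2}
MyKlass = Meta('MyKlass', (object,), {'foo': 2})

# A plain class statement deriving from MyKlass: Python finds the
# metaclass through the base class, so Meta is used again.
class SubKlass(MyKlass):
    bar = 4
```

Both `type(MyKlass)` and `type(SubKlass)` are `Meta`, even though SubKlass never mentions a metaclass.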

Metaclass's __new__ and __init__

To control the creation and initialization of the class in the metaclass, you can implement the metaclass's __new__ method and/or __init__ constructor. Most real-life metaclasses will probably override just one of them. __new__ should be implemented when you want to control the creation of a new object (class in our case), and __init__ should be implemented when you want to control the initialization of the new object after it has been created.

So when the call to MyMeta is done above, what happens under the hood is this:

```python
MyKlass = MyMeta.__new__(MyMeta, name, bases, dct)
MyMeta.__init__(MyKlass, name, bases, dct)
```

Here's a more concrete example that should demonstrate what's going on. Let's write down this definition for a metaclass:

```python
class MyMeta(type):
    def __new__(meta, name, bases, dct):
        print '-----------------------------------'
        print "Allocating memory for class", name
        print meta
        print bases
        print dct
        return super(MyMeta, meta).__new__(meta, name, bases, dct)

    def __init__(cls, name, bases, dct):
        print '-----------------------------------'
        print "Initializing class", name
        print cls
        print bases
        print dct
        super(MyMeta, cls).__init__(name, bases, dct)
```

When Python executes the following class definition:

```python
class MyKlass(object):
    __metaclass__ = MyMeta

    def foo(self, param):
        pass

    barattr = 2
```

What gets printed is this (reformatted for clarity):

```
-----------------------------------
Allocating memory for class MyKlass
<class '__main__.MyMeta'>
(<type 'object'>,)
{'barattr': 2, '__module__': '__main__',
 'foo': <function foo at 0x00B502F0>,
 '__metaclass__': <class '__main__.MyMeta'>}
-----------------------------------
Initializing class MyKlass
<class '__main__.MyKlass'>
(<type 'object'>,)
{'barattr': 2, '__module__': '__main__',
 'foo': <function foo at 0x00B502F0>,
 '__metaclass__': <class '__main__.MyMeta'>}
```

Study and understand this example and you'll grasp most of what one needs to know about writing metaclasses.
It's important to note here that these print-outs are actually done at class creation time, i.e. when the module containing the class is being imported for the first time. Keep this detail in mind for later.
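To see both the call order and the "class creation time" point without print statements, here is a sketch (LoggingMeta and creation_log are made-up names) that appends to a list instead of printing. It creates the class by calling the metaclass directly, so it runs unchanged under Python 2 and 3:

```python
creation_log = []   # records metaclass activity instead of printing it

class LoggingMeta(type):
    def __new__(meta, name, bases, dct):
        # Runs first: meta is LoggingMeta and the class doesn't exist yet.
        creation_log.append(('__new__', meta.__name__, name))
        return super(LoggingMeta, meta).__new__(meta, name, bases, dct)

    def __init__(cls, name, bases, dct):
        # Runs second: cls is the freshly created class object.
        public = sorted(k for k in dct if not k.startswith('__'))
        creation_log.append(('__init__', cls.__name__, public))
        super(LoggingMeta, cls).__init__(name, bases, dct)

# Creating the class triggers both methods right away,
# before any instance of MyKlass exists.
MyKlass = LoggingMeta('MyKlass', (object,),
                      {'barattr': 2, 'foo': lambda self, param: None})
```

After this runs, creation_log holds exactly two entries, `__new__` first and `__init__` second. Instantiating MyKlass later adds nothing, because neither method is involved in creating instances.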

Metaclass's __call__

Another metaclass method that's occasionally useful to override is __call__. The reason I'm discussing it separately from __new__ and __init__ is that, unlike those two, which get called at class creation time, __call__ is called when the already-created class is "called" to instantiate a new object. Here's some code to clarify this:

```python
class MyMeta(type):
    def __call__(cls, *args, **kwds):
        print '__call__ of ', str(cls)
        print '__call__ *args=', str(args)
        return type.__call__(cls, *args, **kwds)

class MyKlass(object):
    __metaclass__ = MyMeta

    def __init__(self, a, b):
        print 'MyKlass object with a=%s, b=%s' % (a, b)

print 'gonna create foo now...'
foo = MyKlass(1, 2)
```

This prints:

```
gonna create foo now...
__call__ of  <class '__main__.MyKlass'>
__call__ *args= (1, 2)
MyKlass object with a=1, b=2
```

Here MyMeta.__call__ just notifies us of the arguments and delegates to type.__call__. But it can also interfere in the process, affecting the way objects of the class are created. In a way, this is not unlike overriding the __new__ method of the class itself, although there are some differences.
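As one hypothetical illustration of __call__ "interfering" with instantiation (a common pattern, not taken from any code discussed in this article), the sketch below makes every call to a class return the same instance. SingletonMeta, _ConfigBase and Config are made-up names:

```python
class SingletonMeta(type):
    """Hypothetical example: each class using this metaclass has one instance."""
    def __init__(cls, name, bases, dct):
        super(SingletonMeta, cls).__init__(name, bases, dct)
        cls._instance = None   # set once per class, at class creation time

    def __call__(cls, *args, **kwds):
        # Only the first call actually constructs an object, so the
        # class's own __init__ runs just that once.
        if cls._instance is None:
            cls._instance = type.__call__(cls, *args, **kwds)
        return cls._instance

class _ConfigBase(object):
    def __init__(self, value):
        self.value = value

# Calling the metaclass directly works in both Python 2 and 3.
Config = SingletonMeta('Config', (_ConfigBase,), {})
```

Here `Config(1)` and `Config(2)` return the very same object, whose `value` stays 1, since the second call never reaches `_ConfigBase.__init__`.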

Examples

We've now covered enough theory to understand what metaclasses are and how to write them. At this point, it's time for the examples that should make things clearer. As I mentioned above, instead of writing synthetic examples, I prefer to examine the usage of metaclasses in real Python code.

string.Template

The first example of a metaclass is taken from the Python standard library. It is one of the very few examples of metaclasses that ship with Python itself.

string.Template provides convenient, named string substitutions, and can serve as a very simple templating system. If you're not familiar with this class, this would be a good time to read the docs. I will just explain how it uses metaclasses. Here are the first few lines of class Template:

```python
class Template:
    """A string class for supporting $-substitutions."""
    __metaclass__ = _TemplateMetaclass

    delimiter = '$'
    idpattern = r'[_a-z][_a-z0-9]*'

    def __init__(self, template):
        self.template = template
```

And this is _TemplateMetaclass:

```python
class _TemplateMetaclass(type):
    pattern = r"""
    %(delim)s(?:
      (?P<escaped>%(delim)s) |   # Escape sequence of two delimiters
      (?P<named>%(id)s)      |   # delimiter and a Python identifier
      {(?P<braced>%(id)s)}   |   # delimiter and a braced identifier
      (?P<invalid>)              # Other ill-formed delimiter exprs
    )
    """

    def __init__(cls, name, bases, dct):
        super(_TemplateMetaclass, cls).__init__(name, bases, dct)
        if 'pattern' in dct:
            pattern = cls.pattern
        else:
            pattern = _TemplateMetaclass.pattern % {
                'delim' : _re.escape(cls.delimiter),
                'id'    : cls.idpattern,
                }
        cls.pattern = _re.compile(pattern, _re.IGNORECASE | _re.VERBOSE)
```

The explanation provided in the first part of this article should be sufficient for understanding how _TemplateMetaclass works. Its __init__ method looks at some class attributes (specifically, pattern, delimiter and idpattern) and uses them (or its own defaults) to build a compiled regex, which is then stored back into the class's pattern attribute.

According to its documentation, Template can be subclassed to provide a custom delimiter and ID pattern, or the whole regex. The metaclass makes sure that these get converted into a compiled regex pattern at class creation time, so this is an optimization of sorts.
What I mean is that the same customization could be achieved without a metaclass, by simply building the compiled regex in the constructor. However, this means that the compilation step is done each time a Template object is instantiated. Consider the following usage, which IMHO is common with string.Template:

```python
>>> from string import Template
>>> Template("$name is $value").substitute(name='me', value='2')
'me is 2'
```

Leaving regex compilation to Template instantiation time means the regex is built and compiled each time such a piece of code runs. And this is a shame, because the regex really doesn't depend on the template string, but only on the properties of the class.

With a metaclass, the pattern class attribute is created just once, when the module is being loaded and the class Template (or its subclass) definition is being executed. This saves time when Template objects are created, and makes sense because at class creation time we have all the information we need to compile the regex - so why delay this operation?

One may claim that this is a premature optimization, and this could be true. I don't plan to defend this (or any) usage of a metaclass. My intention here is simply to demonstrate how metaclasses are being used in real code for various tasks. So, for this educational purpose it's a good example, since it shows an interesting use case. Whether premature optimization or not, the metaclass does make code more efficient by moving a computation one step earlier in the process of code execution.
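The mechanism can be boiled down to a much smaller sketch. PatternMeta, MiniTemplate and PercentTemplate are hypothetical names, and this is far simpler than the real string.Template (no escaping, braces, or pattern override); the point is only that the regex is compiled once per class, when the class is created, and then shared by all its instances:

```python
import re

class PatternMeta(type):
    """Compile cls.pattern from cls.delimiter once, when each class is created."""
    def __init__(cls, name, bases, dct):
        super(PatternMeta, cls).__init__(name, bases, dct)
        raw = re.escape(cls.delimiter) + r'(?P<named>[_a-z][_a-z0-9]*)'
        cls.pattern = re.compile(raw, re.IGNORECASE)

class _MiniTemplateBase(object):
    delimiter = '$'

    def __init__(self, template):
        # Instantiation only stores the string; no regex work happens here.
        self.template = template

    def substitute(self, **mapping):
        return self.pattern.sub(lambda m: str(mapping[m.group('named')]),
                                self.template)

# Calling the metaclass directly works in both Python 2 and 3.
MiniTemplate = PatternMeta('MiniTemplate', (_MiniTemplateBase,), {})

class PercentTemplate(MiniTemplate):
    delimiter = '%'   # the inherited metaclass compiles a new pattern here
```

For example, `MiniTemplate('$name is $value').substitute(name='me', value=2)` gives `'me is 2'`, and every MiniTemplate instance uses the same compiled pattern object.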

twisted.python.reflect.AccessorType

The following example is a frequently used demonstration of metaclasses. An excerpt from its documentation: "Metaclass that generates properties automatically. Using this metaclass for your class will give you explicit accessor methods; a method called set_foo, will automatically create a property 'foo' that uses set_foo as a setter method. Same for get_foo and del_foo."

Here's the metaclass, shortened a bit to emphasize the important parts:

```python
class AccessorType(type):
    def __init__(self, name, bases, d):
        type.__init__(self, name, bases, d)
        accessors = {}
        prefixs = ["get_", "set_", "del_"]
        for k in d.keys():
            v = getattr(self, k)
            for i in range(3):
                if k.startswith(prefixs[i]):
                    accessors.setdefault(k[4:], [None, None, None])[i] = v
        for name, (getter, setter, deler) in accessors.items():
            # create default behaviours for the property - if we leave
            # the getter as None we won't be able to getattr, etc..
            # [...] some code that implements the above comment
            setattr(self, name, property(getter, setter, deler, ""))
```

What this does is straightforward:

1. Find all attributes of the class that start with get_, set_ or del_.
2. Organize them by the property they aim to control (the part of their name that comes after the underscore).
3. For each getter, setter, deleter triple thus found:
   - Make sure all three exist, or create suitable defaults.
   - Set them as a property on the class.

How useful is such a metaclass? It's hard to say, really. Twisted itself doesn't use it, but does provide it as a public API. If you have several classes to write with a lot of properties, this metaclass may save quite a bit of coding.
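A stripped-down sketch of the same idea follows. AccessorMeta, AccessorBase and Point are hypothetical names, and unlike Twisted's version this one does not create default accessors: a missing getter, setter or deleter simply stays None, so the corresponding operation raises AttributeError:

```python
class AccessorMeta(type):
    """Turn get_foo/set_foo/del_foo methods into a property named foo."""
    prefixes = ('get_', 'set_', 'del_')

    def __init__(cls, name, bases, dct):
        super(AccessorMeta, cls).__init__(name, bases, dct)
        accessors = {}
        for attr, value in dct.items():
            for i, prefix in enumerate(AccessorMeta.prefixes):
                if attr.startswith(prefix) and callable(value):
                    # Slot 0 = getter, 1 = setter, 2 = deleter
                    slots = accessors.setdefault(attr[len(prefix):],
                                                 [None, None, None])
                    slots[i] = value
        for prop_name, (getter, setter, deleter) in accessors.items():
            setattr(cls, prop_name, property(getter, setter, deleter))

# Calling the metaclass directly works in both Python 2 and 3.
AccessorBase = AccessorMeta('AccessorBase', (object,), {})

class Point(AccessorBase):
    def __init__(self):
        self._x = 0

    def get_x(self):
        return self._x

    def set_x(self, value):
        self._x = int(value)

p = Point()
p.x = '5'   # goes through set_x, which converts the value to int
```

After the assignment, `p.x` reads back as the integer 5 via `get_x`, and `Point.x` is an ordinary property object.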

pygments Lexer and RegexLexer

The pygments library presents an interesting idiom of metaclass usage. A base class is created with a custom metaclass. User classes can then inherit from this base class, and get the metaclass as a bonus. First, let's look at the LexerMeta metaclass, which is used as the metaclass for Lexer - the base class of lexers in pygments:

```python
class LexerMeta(type):
    """
    This metaclass automagically converts ``analyse_text`` methods into
    static methods which always return float values.
    """

    def __new__(cls, name, bases, d):
        if 'analyse_text' in d:
            d['analyse_text'] = make_analysator(d['analyse_text'])
        return type.__new__(cls, name, bases, d)
```

This metaclass overrides the __new__ method to intercept the definition of the analyse_text method and turn it into a static method that always returns a floating point value (this is what the make_analysator function does).

Note the usage of __new__ instead of __init__ here. Why isn't __init__ used? In my opinion, this is simply a matter of preference - the same effect could also be achieved by overriding __init__.

The second example from pygments is more complicated, but worth the effort to explain since it contains a couple of features we haven't seen in previous examples. The code for RegexLexerMeta is quite long, so I will snip it to leave the relevant part:

```python
class RegexLexerMeta(LexerMeta):
    """
    Metaclass for RegexLexer, creates the self._tokens attribute from
    self.tokens on the first instantiation.
    """

    # [...] snip

    def __call__(cls, *args, **kwds):
        """Instantiate cls after preprocessing its token definitions."""
        if not hasattr(cls, '_tokens'):
            cls._all_tokens = {}
            cls._tmpname = 0
            if hasattr(cls, 'token_variants') and cls.token_variants:
                # don't process yet
                pass
            else:
                cls._tokens = cls.process_tokendef('', cls.tokens)

        return type.__call__(cls, *args, **kwds)
```

Generally, the code is quite clear - the metaclass examines the tokens class attribute, and creates _tokens from it.
This is only done on the first instantiation of the class. There are two things of special interest here:

1. RegexLexerMeta inherits from LexerMeta, so its users also get the service LexerMeta provides. Metaclass inheritance is one of the reasons metaclasses are among the most powerful language constructs in Python. Contrast this with class decorators, for example. For some simple tasks, class decorators could replace metaclasses, but a decorator applied to a base class is not automatically re-applied to its subclasses, whereas a metaclass is inherited and runs again for every subclass, and metaclasses can themselves form inheritance relationships.
2. The process_tokendef computation is performed in __call__, and a special check makes sure it actually runs only on the first instantiation of the class (although __call__ itself is called for all instantiations). Why do it like this, instead of at class creation time (say, in the metaclass's __init__)? It appears to me this could be an optimization of sorts. pygments comes with many lexers, but you may want to use just one or two in any given code. Why spend loading time on lexers you don't need, as opposed to just the lexers you use?

Whether this is the real reason or not, I think it's still an interesting aspect of metaclasses to ponder - the great flexibility they provide you to choose where and how to perform their meta-work.
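Both ideas can be combined in one small sketch modeled on the structure described above. Everything here is hypothetical and simplified: make_float_analyser is a stand-in for pygments' make_analysator (not its actual implementation), and process_tokendef just sorts rules and counts calls. RegexLexerMeta inherits the analyse_text rewriting from LexerMeta and adds lazy, first-instantiation-only processing in __call__:

```python
def make_float_analyser(f):
    """Hypothetical stand-in: wrap f as a static method returning a float."""
    def text_analyse(text):
        return float(f(text) or 0.0)
    return staticmethod(text_analyse)

class LexerMeta(type):
    # Runs at class creation: rewrite analyse_text before the class exists.
    def __new__(meta, name, bases, d):
        if 'analyse_text' in d:
            d['analyse_text'] = make_float_analyser(d['analyse_text'])
        return type.__new__(meta, name, bases, d)

class RegexLexerMeta(LexerMeta):
    # Inherits LexerMeta.__new__; adds lazy token processing.
    def __call__(cls, *args, **kwds):
        # Checking cls.__dict__ rather than hasattr() keeps a subclass from
        # reusing a _tokens attribute already computed for its base class.
        if '_tokens' not in cls.__dict__:
            cls._tokens = cls.process_tokendef(cls.tokens)
        return type.__call__(cls, *args, **kwds)

class _RegexLexerBase(object):
    tokens = {}
    process_count = 0

    @classmethod
    def process_tokendef(cls, tokens):
        # Hypothetical "processing": count calls and sort each state's rules.
        cls.process_count += 1
        return dict((state, sorted(rules)) for state, rules in tokens.items())

# Calling the metaclass directly works in both Python 2 and 3.
RegexLexer = RegexLexerMeta('RegexLexer', (_RegexLexerBase,), {})

class IniLikeLexer(RegexLexer):
    tokens = {'root': ['value', 'section']}

    def analyse_text(text):   # no self: LexerMeta turns it into a staticmethod
        return text.lstrip().startswith('[')
```

Defining IniLikeLexer does no token work at all; the first `IniLikeLexer()` call computes `_tokens`, and later calls skip it. Meanwhile `IniLikeLexer.analyse_text('[section]')` already returns `1.0`, courtesy of the inherited LexerMeta.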