Mon 02 February 2015

This sample is from a previous version of the book. See the new second edition here.





A common use of metaclasses is to automatically register types in your program. Registration is useful for doing reverse lookups, where you need to map a simple identifier back to a corresponding class.

For example, say you want to implement your own serialized representation of a Python object using JSON. You need a way to take an object and turn it into a JSON string. Here I do this generically by defining a base class that records the constructor parameters and turns them into a JSON dictionary.

class Serializable ( object ): def __init__ ( self , * args ): self . args = args def serialize ( self ): return json . dumps ({ 'args' : self . args })

This class makes it easy to serialize simple, immutable data structures like Point2D to a string.

class Point2D ( Serializable ): def __init__ ( self , x , y ): super () . __init__ ( x , y ) self . x = x self . y = y def __repr__ ( self ): return 'Point2D( %d , %d )' % ( self . x , self . y ) point = Point2D ( 5 , 3 ) print ( 'Object: ' , point ) print ( 'Serialized:' , point . serialize ())

>>> Object: Point2D(5, 3) Serialized: {"args": [5, 3]}

Now I need to deserialize this JSON string and construct the Point2D object it represents. Here I define another class that can deserialize the data from its Serializable parent class.

class Deserializable ( Serializable ): @classmethod def deserialize ( cls , json_data ): params = json . loads ( json_data ) return cls ( * params [ 'args' ])

Using Deserializable makes it easy to serialize and deserialize simple, immutable objects in a generic way.

class BetterPoint2D ( Deserializable ): # ... point = BetterPoint2D ( 5 , 3 ) print ( 'Before: ' , point ) data = point . serialize () print ( 'Serialized:' , data ) after = BetterPoint2D . deserialize ( data ) print ( 'After: ' , after )

>>> Before: BetterPoint2D(5, 3) Serialized: {"args": [5, 3]} After: BetterPoint2D(5, 3)

The problem with this approach is it only works if you know the intended type of the serialized data ahead of time (e.g., Point2D , BetterPoint2D ). Ideally you’d have a large number of classes serializing to JSON and one common function that could deserialize any of them back to a corresponding Python object.

To do this, I can include the serialized object’s class name in the JSON data.

class BetterSerializable ( object ): def __init__ ( self , * args ): self . args = args def serialize ( self ): return json . dumps ({ 'class' : self . __class__ . __name__ , 'args' : self . args , }) def __repr__ ( self ): # ...

Then I can maintain a mapping of class names back to constructors for those objects. The general deserialize function will work for any classes passed to register_class .

registry = {} def register_class ( target_class ): registry [ target_class . __name__ ] = target_class def deserialize ( data ): params = json . loads ( data ) name = params [ 'class' ] target_class = registry [ name ] return target_class ( * params [ 'args' ])

To ensure deserialize always works properly, I must call register_class for every class I may want to deserialize in the future.

class EvenBetterPoint2D ( BetterSerializable ): def __init__ ( self , x , y ): super () . __init__ ( x , y ) self . x = x self . y = y register_class ( EvenBetterPoint2D )

Now I can deserialize an arbitrary JSON string without having to know which class it contains.

point = EvenBetterPoint2D ( 5 , 3 ) print ( 'Before: ' , point ) data = point . serialize () print ( 'Serialized:' , data ) after = deserialize ( data ) print ( 'After: ' , after )

>>> Before: EvenBetterPoint2D(5, 3) Serialized: {"class": "EvenBetterPoint2D", "args": [5, 3]} After: EvenBetterPoint2D(5, 3)

The problem with this approach is that you can forget to call register_class .

class Point3D ( BetterSerializable ): def __init__ ( self , x , y , z ): super () . __init__ ( x , y , z ) self . x = x self . y = y self . z = z # Forgot to call register_class! Whoops!

This will cause your code to break at runtime, when you finally try to deserialize an object of a class you forgot to register.

point = Point3D ( 5 , 9 , - 4 ) data = point . serialize () deserialize ( data )

>>> KeyError: 'Point3D'

Even though you chose to subclass BetterSerializable , you won’t actually get all of its features if you forget to call register_class after your class statement body. This approach is error prone and especially challenging for beginners. The same omission can happen with class decorators in Python 3.

What if you could somehow act on the programmer’s intent to use BetterSerializable and ensure register_class is called in all cases? Metaclasses enable this by intercepting the class statement when subclasses are defined. This lets you register the new type immediately after the class’s body.

class Meta ( type ): def __new__ ( meta , name , bases , class_dict ): cls = type . __new__ ( meta , name , bases , class_dict ) register_class ( cls ) return cls class RegisteredSerializable ( BetterSerializable , metaclass = Meta ): pass

When I define a subclass of RegisteredSerializable , I can be confident that the call to register_class happened and deserialize will always work as expected.

class Vector3D ( RegisteredSerializable ): def __init__ ( self , x , y , z ): super () . __init__ ( x , y , z ) self . x , self . y , self . z = x , y , z v3 = Vector3D ( 10 , - 7 , 3 ) print ( 'Before: ' , v3 ) data = v3 . serialize () print ( 'Serialized:' , data ) print ( 'After: ' , deserialize ( data ))

>>> Before: Vector3D(10, -7, 3) Serialized: {"class": "Vector3D", "args": [10, -7, 3]} After: Vector3D(10, -7, 3)

Using metaclasses for class registration ensures that you’ll never miss a class as long as the inheritance tree is right. This works well for serialization, as I’ve shown, and also applies to Database ORMs, plug-in systems, and system hooks.

Things to Remember