The Explorer

The wonders of cooperative inheritance, or using super in Python 3

by Michele Simionato

February 3, 2010



Summary

This essay is intended for Python programmers wanting to understand the concept of cooperative inheritance and the usage of super. It does not require any previous reading. The target is Python 3.0, since it has a nicer syntax for super, even if most of what I say here can be backported down to Python 2.2.


Most languages supporting inheritance support cooperative inheritance, i.e. there is a language-supported way for children methods to dispatch to their parent method: this is usually done via a super keyword. Things are easy when the language support single inheritance only, since each class has a single parent and there is an unique concept of super method. Things are difficult when the language support multiple inheritance: in that case the programmer has to understand the intricacies of the Method Resolution Order.

Why cooperative hierarchies are tricky This paper is intended to be very practical, so let me start with an example. Consider the following hierarchy (in Python 3): class A(object): def __init__(self): print('A.__init__') super().__init__() class B(object): def __init__(self): print('B.__init__') super().__init__() class C(A, B): def __init__(self): print('C.__init__') super().__init__() What is the "superclass" of A ? In other words, when I create an instance of A , which method will be called by super().__init__() ? Notice that I am considering here generic instances of A , not only direct instances: in particular, an instance of C is also an instance of A and instantiating C will call super().__init__() in A.__init__ at some point: the tricky point is to understand which method will be called for indirect instances of A . In a single inheritance language there is an unique answer both for direct and indirect instances ( object is the super class of A and object.__init__ is the method called by super().__init__() ). On the other hand, in a multiple inheritance language there is no easy answer. It is better to say that there is no super class and it is impossible to know which method will be called by super().__init__() unless the subclass from wich super is called is known. In particular, this is what happens when we instantiate C : >>> c = C() C.__init__ A.__init__ B.__init__ As you see the super call in C dispatches to A.__init__ and then the super call there dispatches to B.__init__ which in turns dispatches to object.__init__ . The important point is that the same super call can dispatch to different methods: when super().__init__() is called directly by instantiating A it dispatches to object.__init__ whereas when it is called indirectly by instantiating C it dispatches to B.__init__ . If somebody extends the hierarchy, adds subclasses of A and instantiated them, then the super call in A.__init__ can dispatch to an entirely different method: the super method call depends on the instance I am starting from. The precise algorithm specifying the order in which the methods are called is called the Method Resolution Order algorithm, or MRO for short. It is discussed in detail in an old essay I wrote years ago and interested readers are referred to it (see the references below). Here I will take the easy way and I will ask Python. Given any class, it is possibly to extract its linearization, i.e. the ordered list of its ancestors plus the class itself: the super call follow such list to decide which is the right method to dispatch to. For instance, if you are considering a direct instance of A , object is the only class the super call can dispatch to: >>> A.mro() [<class '__main__.A'>, <class 'object'>] If you are considering a direct instance of C , super looks at the linearization of C : >>> C.mro() [<class '__main__.C'>, <class '__main__.A'>, <class '__main__.B'>, <class 'object'>] A super call in C will look first at A , then at B and finally at object . Finding out the linearization is non-trivial; just to give an example suppose we add to our hierarchy three classes D , E and F in this way: >>> class D: pass >>> class E(A, D): pass >>> class F(E, C): pass >>> for c in F.mro(): ... print(c.__name__) F E C A D B object As you see, for an instance of F a super call in A.__init__ will dispatch at D.__init__ and not directly at B.__init__ !

The problem with incompatible signatures I have just shown that one cannot tell in advance where the supercall will dispatch, unless one knows the whole hierarchy: this is quite different from the single inheritance situation and it is also very much error prone and brittle. When you design a hierarchy you will expect for instance that A.__init__ will call B.__init__ , but adding classes (and such classes may be added by a third party) may change the method chain. In this case A.__init__ (when invoked by an F instance) will call D.__init__ . This is dangerous: for instance, if the behavior of your code depends on the ordering of the methods you may get in trouble. Things are worse if one of the methods in the cooperative chain does not have a compatible signature, since the chain will break. This problem is not theoretical and it happens even in very trivial hierarchies. For instance, here is an example of incompatible signatures in the __init__ method (this problem affects even Python 2.6, not only Python 3.X): class X(object): def __init__(self, a): super().__init__() class Y(object): def __init__(self, a): super().__init__() class Z(X, Y): def __init__(self, a): super().__init__(a) Here instantiating X and Y works fine, but as soon as you introduce Z you get in trouble since super().__init__(a) in Z.__init__ will call super().__init__() in X which in turns will call Y.__init__ with no arguments, resulting in a TypeError ! In older Python versions (from 2.2 to 2.5) such problem can be avoided by leveraging on the fact that object.__init__ accepts any number of arguments (ignoring them), by replacing super().__init__() with super().__init__(a) . In Python 2.6+ instead there is no real solution for this problem, except avoiding super in the constructor or avoiding multiple inheritance. In general if you want to support multiple inheritance you should use super only when the methods in a cooperative chain have consistent signature: that means that you will not use super in __init__ and __new__ since likely your constructors will have custom arguments whereas object.__init__ and object.__new__ have no arguments. However, in practice, you may inherits from third party classes which do not obey this rule, or others could derive from your classes without following this rule and breakage may occur. For instance, I have used super for years in my __init__ methods and I never had problems because in older Python versions object.__init__ accepted any number of arguments: but in Python 3 all that code is fragile under multiple inheritance. I am left with two choices: removing super or telling people that those classes are not intended to be used in multiple inheritance situations, i.e. the constructors will break if they do that. Nowadays I tend to favor the second choice. Luckily, usually multiple inheritance is used with mixin classes, and mixins do not have constructors, so that in practice the problem is mitigated.

The intended usage for super Even if super has its shortcomings, there are meaningful use cases for it, assuming you think multiple inheritance is a legitimate design technique. For instance, if you use metaclasses and you want to support multiple inheritance, you must use super in the __new__ and __init__ methods: there is no problem, since the constructor for metaclasses has a fixed signature (name, bases, dictionary). But metaclasses are extremely rare, so let me give a more meaningful example for an application programmer where a design bases on cooperative multiple inheritances could be reasonable. Suppose you have a bunch of Manager classes which share many common methods and which are intended to manage different resources, such as databases, FTP sites, etc. To be concrete, suppose there are two common methods: getinfolist which returns a list of strings describing the managed resorce (containing infos such as the URI, the tables in the database or the files in the site, etc.) and close which closes the resource (the database connection or the FTP connection). You can model the hierarchy with a Manager abstract base class class Manager(object): def close(self): pass def getinfolist(self): return [] and two concrete classes DbManager and FtpManager : class DbManager(Manager): def __init__(self, dsn): self.conn = DBConn(dsn) def close(self): super().close() self.conn.close() def getinfolist(self): return super().getinfolist() + ['db info'] class FtpManager(Manager): def __init__(self, url): self.ftp = FtpSite(url) def close(self): super().close() self.ftp.close() def getinfolist(self): return super().getinfolist() + ['ftp info'] Now suppose you need to manage both a database and an FTP site: then you can define a MultiManager as follows: class MultiManager(DbManager, FtpManager): def __init__(self, dsn, url): DbManager.__init__(dsn) FtpManager.__init__(url) Everything works: calling MultiManager.close will in turn call DbManager.close and FtpManager.close . There is no risk of running in trouble with the signature since the close and getinfolist methods have all the same signature (actually they take no arguments at all). Notice also that I did not use super in the constructor. You see that super is essential in this design: without it, only DbManager.close would be called and your FTP connection would leak. The getinfolist method works similarly: forgetting super would mean losing some information. An alternative not using super would require defining an explicit method close in the MultiManager , calling DbManager.close and FtpManager.close explicitly, and an explicit method getinfolist calling `DbManager.getinfolist and FtpManager.getinfolist : def close(self): DbManager.close(self) FtpManager.close(self) def getinfolist(self): return DbManager.getinfolist(self) + FtpManager.getinfolist(self) This would be less elegant but probably clearer and safer so you can always decide not to use super if you really hate it. However, if you have N common methods, there is some boiler plate to write; moreover, every time you add a Manager class you must add it to the N common methods, which is ugly. Here N is just 2, so not using super may work well, but in general it is clear that the cooperative approach is more effective. Actually, I strongly believe (and always had) that super and the MRO are the right way to do multiple inheritance: but I also believe that multiple inheritance itself is wrong. For instance, in the MultiManager example I would not use multiple inheritance but composition and I would probably use a generalization such as the following: class MyMultiManager(Manager): def __init__(self, *managers): self.managers = managers def close(self): for mngr in self.managers: mngr.close() def getinfolist(self): return sum(mngr.getinfolist() for mngr in self.managers) There are languages that do not provide inheritance (even single inheritance!) and are perfectly fine, so you should always question if you should use inheritance or not. There are always many options and the design space is rather large. Personally, I always use super but I use single-inheritance only, so that my cooperative hierarchies are trivial.

The magic of super in Python 3 Deep down, super in Python 3 is the same as in Python 2.X. However, on the surface - at the syntactic level, not at the semantic level - there is a big difference: Python 3 super is smart enough to figure out the class it is invoked from and the first argument of the containing method. Actually it is so smart that it works also for inner classes and even if the first argument is not called self . In Python 2.X super is dumber and you must tell the class and the argument explicitly: for instance our first example must be written class A(object): def __init__(self): print('A.__init__') super(A, self).__init__() By the way, this syntax works both in Python 3 and in Python 2, this is why I said that deep down super is the same. The new feature in Python 3 is that there is a shortcut notation super() for super(A, self) . In Python 3 the (bytecode) compiler is smart enough to recognize that the supercall is performed inside the class A so that it inserts the reference to A automagically; moreover it inserts the reference to the first argument of the current method too. Typically the first argument of the current method is self , but it may be cls or any identifier: super will work fine in any case. Since super() knows the class it is invoked from and the class of the original caller, it can walk the MRO correctly. Such information is stored in the attributes .__thisclass__ and .__self_class__ and you may understand how it works from the following example: class Mother(object): def __init__(self): sup = super() print(sup.__thisclass__) print(sup.__self_class__) sup.__init__() class Child(Mother): pass >>> child = Child() <class '__main__.Mother'> <class '__main__.Child'> Here .__self__class__ is just the class of the first argument ( self ) but this is not always the case. The exception is the case of classmethods and staticmethods taking a class as first argument, such as __new__ . Specifically, super(cls, x) checks if x is an instance of cls and then sets .__self_class__ to x.__class__ ; otherwise (and that happens for classmethods and for __new__ ) it checks if x is a subclass of cls and then sets .__self_class__ to x directly. For instance, in the following example class C0(object): @classmethod def c(cls): print('called classmethod C0.c') class C1(C0): @classmethod def c(cls): sup = super() print('__thisclass__', sup.__thisclass__) print('__selfclass__', sup.__self_class__) sup.c() class C2(C1): pass the attribute .__self_class__ is not the class of the first argument (which would be type the metaclass of all classes) but simply the first argument: >>> C2.c() __thisclass__ <class '__main__.C1'> __selfclass__ <class '__main__.C2'> called classmethod C0.c There is a lot of magic going on in Python 3 super , and even more. For instance, this is a syntax that cannot work: def __init__(self): print('calling __init__') super().__init__() class C(object): __init__ = __init__ if __name__ == '__main__': c = C() If you try to run this code you will get a SystemError: super(): __class__ cell not found and the reason is obvious: since the __init__ method is external to the class the compiler cannot infer to which class it will be attached at runtime. On the other hand, if you are completely explicit and you use the full syntax, by writing the external method as def __init__(self): print('calling __init__') super(C, self).__init__() everything will work because you are explicitly telling than the method will be attached to the class C . I will close this section by noticing a wart of super in Python 3, pointed out by Armin Ronacher and others: the fact that super should be a keyword but it is not. Therefore horrors like the following are possible: def super(): print("I am evil, you are NOT calling the supermethod!") class C(object): def __init__(self): super().__init__() if __name__ == '__main__': c = C() # prints "I am evil, you are NOT calling the supermethod!" DON'T DO THAT! Here the called __init__ is the __init__ method of the object None !! Of course, only an evil programmer would shadow super on purpose, but that may happen accidentally. Consider for instance this use case: you are refactoring an old code base written before the existence of super and using from mod import * (this is ugly but we know that there are code bases written this way), with mod defining a function super which has nothing to do with the super builtin. If in this code you replace Base.method(self, *args) with super().method(*args) you will introduce a bug. This is not common (it never happened to me), but still it is bug that could not happen if super were a keyword. Moreover, super is special and it will not work if you change its name as in this example: # from http://lucumr.pocoo.org/2010/1/7/pros-and-cons-about-python-3 _super = super class Foo(Bar): def foo(self): _super().foo() Here the bytecode compiler will not treat specially _super , only super . It is unfortunate that we missed the opportunity to make super a keyword in Python 3, without good reasons (Python 3 was expected to break compatibility anyway).

References There is plenty of material about super and multiple inheritance. You should probably start from the MRO paper, then read Super considered harmful by James Knight. A lot of the issues with super , especially in old versions of Python are covered in Things to know about super. I did spent some time thinking about ways to avoid multiple inheritance; you may be interested in reading my series Mixins considered harmful.

Talk Back!

Have an opinion? Be the first to post a comment about this weblog entry.

RSS Feed

If you'd like to be notified whenever Michele Simionato adds a new entry to his weblog, subscribe to his RSS feed.

About the Blogger

Michele Simionato started his career as a Theoretical Physicist, working in Italy, France and the U.S. He turned to programming in 2003; since then he has been working professionally as a Python developer and now he lives in Milan, Italy. Michele is well known in the Python community for his posts in the newsgroup(s), his articles and his Open Source libraries and recipes. His interests include object oriented programming, functional programming, and in general programming metodologies that enable us to manage the complexity of modern software developement.

This weblog entry is Copyright © 2010 Michele Simionato. All rights reserved.