April 16, 2012 at 07:03 Tags Articles, Python internals

[The Python version described in this article is 3.x]

This article aims to explore the process of creating new objects in Python. As I explained in a previous article, object creation is just a special case of calling a callable. Consider this Python code:

```python
class Joe:
    pass

j = Joe()
```

What happens when j = Joe() is executed? Python sees it as a call to the callable Joe, and routes it to the internal function PyObject_Call, with Joe passed as the first argument. PyObject_Call looks at the type of its first argument and extracts that type's tp_call slot.

Now, what is the type of Joe? Whenever we define a new Python class, unless we specify a metaclass for it (explicitly, or implicitly by inheriting from a class that has one), its type is type. Therefore, when PyObject_Call looks at the type of Joe, it finds type and picks its tp_call slot. In other words, the function type_call in Objects/typeobject.c is invoked.

This is an interesting function, and it's short, so I'll paste it wholly here:

```c
static PyObject *
type_call(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    PyObject *obj;

    if (type->tp_new == NULL) {
        PyErr_Format(PyExc_TypeError,
                     "cannot create '%.100s' instances",
                     type->tp_name);
        return NULL;
    }

    obj = type->tp_new(type, args, kwds);
    if (obj != NULL) {
        /* Ugly exception: when the call was type(something),
           don't call tp_init on the result. */
        if (type == &PyType_Type &&
            PyTuple_Check(args) && PyTuple_GET_SIZE(args) == 1 &&
            (kwds == NULL ||
             (PyDict_Check(kwds) && PyDict_Size(kwds) == 0)))
            return obj;
        /* If the returned object is not an instance of type,
           it won't be initialized. */
        if (!PyType_IsSubtype(Py_TYPE(obj), type))
            return obj;
        type = Py_TYPE(obj);
        if (type->tp_init != NULL &&
            type->tp_init(obj, args, kwds) < 0) {
            Py_DECREF(obj);
            obj = NULL;
        }
    }
    return obj;
}
```

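So what arguments is type_call being passed in our case? The first one is Joe itself, but how is it represented? Well, Joe is a class, so it's a type (all classes are types in Python 3). Types are represented inside the CPython VM by PyTypeObject structs. The other two arguments carry the call's arguments: args is a tuple of the positional arguments (empty here), and kwds holds the keyword arguments, or is NULL when there are none.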
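Both facts can be checked from Python itself. The sketch below uses only builtins; the explicit type(Joe).__call__(Joe) spelling is purely illustrative, since normal code would simply write Joe().

```python
class Joe:
    pass

# With no explicit metaclass, a class is an instance of type.
print(type(Joe) is type)        # True
print(isinstance(Joe, type))    # True

# Joe() is dispatched through the call slot of Joe's type. Spelling it
# out explicitly gives the same result: a fresh Joe instance.
j = type(Joe).__call__(Joe)
print(type(j) is Joe)           # True
```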

What type_call does is first call the tp_new slot of the given type. Then it checks for a special case we can ignore for simplicity (a one-argument call to type itself, as in type(x), whose result must not be initialized), makes sure tp_new returned an instance of the expected type or of a subtype of it, and then calls the tp_init slot of the returned object's actual type. If an object of an unrelated type was returned, it is not initialized.
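To make these steps concrete, here is a rough pure-Python model of type_call. The helper name model_type_call is hypothetical and is not a CPython API; it is a sketch that approximates the C logic (issubclass stands in for PyType_IsSubtype) and skips the "cannot create instances" check for types without tp_new.

```python
def model_type_call(cls, *args, **kwargs):
    """Hypothetical, simplified model of CPython's type_call."""
    # Special case: a one-argument call to type itself returns the
    # argument's type, with no initialization step.
    if cls is type and len(args) == 1 and not kwargs:
        return type(args[0])

    # tp_new: create the object.
    obj = cls.__new__(cls, *args, **kwargs)

    # Objects that are not instances of cls (or of a subclass of it)
    # are returned as-is, without initialization.
    if not issubclass(type(obj), cls):
        return obj

    # tp_init: __init__ is looked up on the object's actual type,
    # which may be a subclass of cls.
    if type(obj).__init__(obj, *args, **kwargs) is not None:
        raise TypeError('__init__() should return None')
    return obj


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = model_type_call(Point, 1, 2)
print(p.x, p.y)                     # 1 2
print(model_type_call(type, 42))    # <class 'int'>
```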

Translated to Python, this means: if your class defines the __new__ special method, it gets called first when a new instance of the class is created. This method has to return some object. Usually this will be an instance of the class, but it doesn't have to be. Objects that are instances of the class (including instances of its subclasses) then get __init__ invoked on them. Here's an example:

```python
class Joe:
    def __new__(cls, *args, **kwargs):
        obj = super(Joe, cls).__new__(cls)
        print('__new__ called. got new obj id=0x%x' % id(obj))
        return obj

    def __init__(self, arg):
        print('__init__ called (self=0x%x) with arg=%s' % (id(self), arg))
        self.arg = arg

j = Joe(12)
print(type(j))
```

This prints:

```
__new__ called. got new obj id=0x7f88e7218290
__init__ called (self=0x7f88e7218290) with arg=12
<class '__main__.Joe'>
```
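The example above only shows the common case. The sketch below (the class names are illustrative) exercises the two other branches of type_call: when __new__ returns an object of an unrelated type, __init__ is skipped entirely; when it returns an instance of a subclass, the subclass's __init__ runs, because type_call re-reads the type from the returned object.

```python
class Joe:
    def __new__(cls, *args, **kwargs):
        # Return an object that is NOT an instance of Joe.
        return 42

    def __init__(self, arg):
        # Never reached: tp_init is only called when the object
        # returned by tp_new is an instance of the class being called.
        print('__init__ called')
        self.arg = arg

j = Joe(12)
print(j, type(j))                    # 42 <class 'int'>


class Base:
    def __new__(cls, *args, **kwargs):
        # Hand back an instance of a subclass instead of cls itself.
        return super().__new__(Derived)

    def __init__(self, arg):
        self.init_by = 'Base'

class Derived(Base):
    def __init__(self, arg):
        self.init_by = 'Derived'

d = Base(1)
print(type(d).__name__, d.init_by)   # Derived Derived
```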

Customizing the sequence

As we saw above, since the type of Joe is type, the type_call function is what defines the creation sequence for Joe instances. This sequence can be changed by specifying a custom type for Joe; in other words, a metaclass. Let's modify the previous example to specify a custom metaclass for Joe:

```python
class MetaJoe(type):
    def __call__(cls, *args, **kwargs):
        print('MetaJoe.__call__')
        return None

class Joe(metaclass=MetaJoe):
    def __new__(cls, *args, **kwargs):
        obj = super(Joe, cls).__new__(cls)
        print('__new__ called. got new obj id=0x%x' % id(obj))
        return obj

    def __init__(self, arg):
        print('__init__ called (self=0x%x) with arg=%s' % (id(self), arg))
        self.arg = arg

j = Joe(12)
print(type(j))
```

So now the type of Joe is not type, but MetaJoe. Consequently, when PyObject_Call picks the call function to execute for j = Joe(12), it takes MetaJoe's tp_call slot, which dispatches to MetaJoe.__call__. The latter prints a notice about itself and returns None, so we don't expect the __new__ and __init__ methods of Joe to be called at all. Indeed, this is the outcome:

```
MetaJoe.__call__
<class 'NoneType'>
```
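A metaclass does not have to replace the sequence outright. The following sketch (TracingMeta and the events list are hypothetical names chosen for illustration) wraps it instead: MetaJoe-style __call__ code runs first, then super().__call__ hands control to type.__call__, i.e. the default type_call sequence of __new__ followed by __init__.

```python
events = []

class TracingMeta(type):
    def __call__(cls, *args, **kwargs):
        events.append('meta: before')
        # Delegate to type.__call__, which runs __new__ and then __init__.
        obj = super().__call__(*args, **kwargs)
        events.append('meta: after')
        return obj

class Joe(metaclass=TracingMeta):
    def __new__(cls, *args, **kwargs):
        events.append('__new__')
        return super().__new__(cls)

    def __init__(self, arg):
        events.append('__init__')
        self.arg = arg

j = Joe(12)
print(events)   # ['meta: before', '__new__', '__init__', 'meta: after']
print(type(j).__name__, j.arg)   # Joe 12
```

This pattern is a common way to add behavior around instantiation (logging, caching, registration) while keeping the normal creation sequence intact.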

Digging deeper - tp_new

Alright, so now we have a better understanding of the object creation sequence. One crucial piece of the puzzle is still missing, though. While we almost always define __init__ for our classes, defining __new__ is rather rare. Moreover, a quick look at type_call shows that __new__ is the more fundamental of the two: it is the method that actually creates the new object, and type_call calls it exactly once per instantiation. __init__, on the other hand, receives an already constructed object and may not be called at all (as we saw, it is skipped when __new__ returns an object of another type); nothing prevents it from being called multiple times, either.

Since the type parameter passed to type_call in our original example is Joe, and Joe does not define a custom __new__ method, type->tp_new is simply the tp_new slot inherited from the base type. The base type of Joe is object, the root of every class hierarchy in Python 3 (every class other than object itself derives from it, directly or indirectly). The object.tp_new slot is implemented in CPython by the object_new function in Objects/typeobject.c. object_new is actually very simple. It does some argument checking, verifies that the type we're trying to instantiate is not abstract, and then does this:

```c
return type->tp_alloc(type, 0);
```

tp_alloc is a low-level slot of the type object in CPython. It's not directly accessible from Python code, but should be familiar to C extension developers. A custom type defined in a C extension may override this slot to supply a custom memory allocation scheme for instances of itself. Most C extension types, however, defer this allocation to the function PyType_GenericAlloc. This function is part of the public C API of CPython, and it also happens to be assigned to the tp_alloc slot of object (defined in Objects/typeobject.c). It figures out how much memory the new object needs, allocates a memory chunk from CPython's memory allocator and zero-fills it. It then initializes the bare essential PyObject fields (type and reference count), does some GC bookkeeping and returns. The result is a freshly allocated instance.
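Much of this tp_new behavior is observable from Python. The sketch below (the Point and Shape classes are illustrative) checks that a class without __new__ resolves it to object.__new__, that calling object.__new__ directly yields an instance on which __init__ has not run, that __init__ is an ordinary method that can be called again, and that object_new rejects abstract classes. The exact TypeError message differs between Python versions, so the sketch does not show it.

```python
from abc import ABC, abstractmethod

class Joe:
    pass

# Joe does not define __new__, so lookup finds object.__new__
# (in C terms, Joe's tp_new slot is inherited from object).
print(Joe.__new__ is object.__new__)     # True

class Point:
    def __init__(self, x):
        self.x = x

# object.__new__ only allocates the instance; __init__ does not run.
p = object.__new__(Point)
print(vars(p))                           # {}

# __init__ is an ordinary method: it can be called later, and again.
p.__init__(1)
p.__init__(2)
print(p.x)                               # 2

class Shape(ABC):
    @abstractmethod
    def area(self): ...

# object_new refuses to instantiate a class with abstract methods.
try:
    Shape()
except TypeError as e:
    print('TypeError:', e)
```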