Python, like other high-level programming languages such as Java, Ruby or JavaScript, manages memory automatically. Because memory management is done for them, developers often pay little attention to it, which may lead to excessive memory usage and leaks. This post is a high-level description of how CPython (just Python below) manages the object life cycle. It emerged from various notes I made while working on vprof, and I hope it will be useful to others.

Reference counting

Every Python object contains a reference count: a number that shows how many references point to this object. It is stored in the ob_refcnt field and is manipulated explicitly with the C macros Py_INCREF, which increments it, and Py_DECREF, which decrements it. Py_DECREF is the more complex of the two, because it runs the deallocator specified by the object’s type when the count reaches zero.
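The effect of these macros can be observed from Python code with sys.getrefcount. The sketch below is only an illustration: the function name refcount_deltas is hypothetical, and it compares differences between counts rather than absolute values, because the absolute number reported by sys.getrefcount includes temporary references and varies between interpreter versions.

```python
import sys


def refcount_deltas():
    """Hypothetical helper: report how the count changes relative to a baseline."""
    obj = object()
    base = sys.getrefcount(obj)

    alias = obj                      # a second name adds one reference
    after_alias = sys.getrefcount(obj)

    holder = [obj]                   # storing it in a list adds another one
    after_list = sys.getrefcount(obj)

    del alias                        # dropping the name removes a reference
    holder.clear()                   # emptying the list removes the other one
    after_release = sys.getrefcount(obj)

    return (after_alias - base, after_list - base, after_release - base)


print(refcount_deltas())
```

On a standard CPython build the three differences should be 1, 2 and 0: each new reference raises the count by one, and releasing the references brings it back to the baseline.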

Generally, you need to worry about these macros in two cases: when you implement your own data structures, or when you modify existing ones, through the Python C API. If you use only the built-in data structures from Python code, everything is done for you.

It is also possible to reference an object without incrementing its reference count by using weak references, or weakrefs. Weakrefs are incredibly useful for implementing caches and proxies.
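A minimal sketch of a weak reference and a weakref-based cache follows. The Resource class and the weakref_demo function are hypothetical names made up for this illustration; weakref.ref and weakref.WeakValueDictionary come from the standard library. The sketch relies on CPython freeing an object as soon as its reference count reaches zero.

```python
import weakref


class Resource:
    """Hypothetical object that we want to cache without keeping it alive."""

    def __init__(self, name):
        self.name = name


def weakref_demo():
    res = Resource("config")

    ref = weakref.ref(res)                 # does not increment the reference count
    cache = weakref.WeakValueDictionary()  # values are held through weak references
    cache["config"] = res

    alive_before = ref() is res and "config" in cache

    del res                                # the last strong reference is gone

    # On CPython the object is deallocated right away, so the weak
    # reference is cleared and the cache entry disappears.
    gone_after = ref() is None and "config" not in cache
    return alive_before, gone_after


print(weakref_demo())
```

Because the cache never owns its values, an entry lives only as long as some other part of the program still uses the object.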

Garbage collection

Reference counting was the only way to manage the object life cycle before Python 2.0. It has one weakness, though: it fails to delete objects in the presence of so-called reference cycles. The simplest example of a reference cycle is an object that refers to itself:

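lst = []  # create an empty list
lst.append(lst)  # the list now holds a reference to itself
del lst  # the name is gone, but the reference count never drops to zero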

In most cases reference cycles can be avoided; nevertheless, sometimes (e.g. in long-running programs) they are inevitable.

To solve this problem, Python 2.0 introduced a new garbage collector (just GC below). The main difference between this GC and the GCs of other language runtimes, such as the JVM and the CLR, is that it works alongside reference counting and is designed only to find reference cycles.
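The division of labour between reference counting and the GC can be sketched as follows. Node and cycle_demo are hypothetical names used only here, and a weak reference serves as a probe to check whether the object is still alive. Automatic collection is paused during the experiment so that only the explicit gc.collect() call can free the cycle.

```python
import gc
import weakref


class Node:
    """Hypothetical container whose instances refer to themselves."""

    def __init__(self):
        self.me = self  # creates a reference cycle


def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections from interfering
    try:
        node = Node()
        probe = weakref.ref(node)  # lets us observe the object without owning it

        del node
        # Reference counting alone cannot free the object: it still
        # holds a reference to itself.
        alive_after_del = probe() is not None

        gc.collect()
        # The GC detects the unreachable cycle and frees it.
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


print(cycle_demo())
```

The object survives the del statement and disappears only after the collector has run.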

Only container objects can create reference cycles, so the Python GC does not track types such as integers and strings.
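Whether a particular object is tracked can be checked with gc.is_tracked. The snippet below is a small sketch; the Box class and the tracking_report function are hypothetical. It deliberately sticks to a few clear-cut cases, since tracking of some containers (for example tuples and dicts) is an implementation detail that has changed between versions.

```python
import gc


class Box:
    """Hypothetical user-defined class; its instances can hold references."""


def tracking_report():
    samples = {
        "int": 42,
        "float": 3.14,
        "str": "text",
        "list": [],
        "instance": Box(),
    }
    # gc.is_tracked returns True if the GC currently monitors the object
    return {name: gc.is_tracked(value) for name, value in samples.items()}


for name, tracked in tracking_report().items():
    print(name, tracked)
```

Atomic values such as numbers and strings are reported as untracked, while the list and the class instance are tracked.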

The GC divides objects into 3 generations. Every generation has a counter and a threshold. When an object is created, it is automatically assigned to generation 0. When the counter of some generation exceeds its threshold, the GC runs on that generation. Surviving objects are moved to the next generation and the respective counter is reset. Objects in generation 2 stay in generation 2.
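The counters and thresholds can be inspected through the gc module. This is only a sketch: gc_snapshot is a hypothetical helper name, and no concrete numbers are shown because the defaults and the exact meaning of the values depend on the interpreter version.

```python
import gc


def gc_snapshot():
    """Hypothetical helper: collect the per-generation thresholds and counters."""
    return {
        "thresholds": gc.get_threshold(),  # one threshold per generation
        "counts": gc.get_count(),          # one current counter per generation
    }


snapshot = gc_snapshot()
print("thresholds:", snapshot["thresholds"])
print("counters:  ", snapshot["counts"])

# A collection of the youngest generation can also be requested explicitly;
# gc.collect returns a count of the objects it found.
found = gc.collect(0)
print("objects found by a generation 0 collection:", found)
```

The thresholds can be tuned with gc.set_threshold, which is occasionally useful for programs that allocate many long-lived container objects.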

Before Python 3.4 the GC had its own Achilles’ heel: objects with an overridden __del__ method. Since such objects can reference each other and the GC can’t determine the right order in which to call __del__, the GC just skipped them. They could be found in gc.garbage, and reference cycles containing them had to be broken manually.

Python 3.4 introduced a new finalization approach (PEP 442). The GC is now able to break cycles that contain these objects, and they no longer end up in gc.garbage.
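The behaviour on Python 3.4 and later can be checked with a short experiment. Tracked, finalized and finalizer_demo are hypothetical names for this sketch; it builds a cycle of two objects that both define __del__ and then looks at what the collector did with them.

```python
import gc

finalized = []  # names of objects whose __del__ has run


class Tracked:
    """Hypothetical class with a finalizer."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        finalized.append(self.name)


def finalizer_demo():
    finalized.clear()

    a, b = Tracked("a"), Tracked("b")
    a.partner, b.partner = b, a  # two objects with __del__ form a cycle
    del a, b

    gc.collect()

    # Objects of our class that the GC gave up on (expected to be none on 3.4+)
    leftovers = [obj for obj in gc.garbage if isinstance(obj, Tracked)]
    return sorted(finalized), leftovers


print(finalizer_demo())
```

Both finalizers run and nothing is left in gc.garbage. The order in which the two __del__ methods are called is not something to rely on, which is why the names are sorted before being returned.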

It is also worth mentioning that it’s possible to disable the GC and rely on reference counting only, if you’re sure your code does not create reference cycles (or you just don’t care about them).
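A minimal sketch of pausing the GC around a piece of code is shown below. The helper run_without_gc is hypothetical; gc.disable, gc.enable and gc.isenabled are the standard library calls. Note that disabling the GC only stops automatic cycle detection: reference counting keeps working, and gc.collect() can still be called manually.

```python
import gc


def run_without_gc(func, *args):
    """Hypothetical helper: call func with automatic cycle collection paused."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return func(*args)
    finally:
        # Restore the previous state instead of enabling unconditionally
        if was_enabled:
            gc.enable()


# Inside the helper the GC reports itself as disabled
print(run_without_gc(gc.isenabled))
```

Restoring the previous state in a finally block keeps the helper safe to use even if the wrapped function raises an exception.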