Memory use in CPython and MicroPython


At PyCon 2017, Kavya Joshi looked at some of the differences between the Python reference implementation (known as "CPython") and MicroPython. In particular, she described the differences in memory use and handling between the two. Those differences are part of what allows MicroPython to run on the severely memory-constrained microcontrollers it targets—an environment that could never support CPython.

CPython is the standard and default implementation that everyone loves and uses, she said. But it has a bad reputation when it comes to memory use. That has led to some alternate implementations, including MicroPython.

As the "leanest and meanest" of the alternative implementations, MicroPython targets microcontrollers, so that Python can be used to control hardware. MicroPython can run in 16KB of RAM with 256KB of ROM; it implements most of Python 3.4. Some things have been left out of the language and standard library, since they do not make sense for microcontrollers, such as metaclasses and multiprocessing.

CPython and MicroPython are fairly similar under the hood, Joshi said. They are both bytecode interpreters for stack-based virtual machines. They are also both C programs. Yet they are on the opposite ends of the memory-use spectrum.

As a simple experiment, and not any kind of formal benchmark, she tested Python 3.6 and MicroPython 1.8 on a 64-bit Ubuntu 16.10 system with 4GB of RAM. Her test was to create 200,000 objects of various types (integers, strings, lists) and to prevent them from being garbage collected. She then measured the heap use of the programs from within Python itself.
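Joshi's actual measurement script was not shown; a minimal sketch of this kind of experiment on CPython might use the standard tracemalloc module (the sampling interval and use of tracemalloc here are my assumptions, not her code):

```python
# Sketch of the experiment described above: allocate 200,000 integer
# objects, keep references so they cannot be garbage collected, and
# sample heap use from within the interpreter as the test runs.
import tracemalloc

tracemalloc.start()
keep = []                      # holding references prevents collection
samples = []
for i in range(200_000):
    keep.append(10**10 + i)    # consecutive integers starting at 10**10
    if i % 20_000 == 0:
        current, _peak = tracemalloc.get_traced_memory()
        samples.append(current)

# On CPython the sampled heap use grows with the number of objects,
# matching the linearly increasing graph described in the talk.
assert samples == sorted(samples)
```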

CPython and MicroPython both use custom allocators to manage their heaps. But CPython will grow its heap on demand, while MicroPython has a fixed-size heap. Since Joshi is measuring heap use, that fixed size is the upper boundary for the measurements. But, as we will see, MicroPython does not use the heap in the same way that CPython does, which will also affect the measurements.

She showed graphs of the heap usage, starting with the integer test (Joshi's slides are available at Speaker Deck). The test created 200,000 integer objects for consecutive integers starting at 10^10 (10**10 in Python-speak). The CPython graph shows a linear increase in heap use, while the MicroPython graph is completely flat. The graphs look the same for string objects. The CPython heap use is more than that of MicroPython, but the actual numbers do not matter all that much, she said, as it is the shape of the graphs that she is interested in.

Lists show something a bit different, however. The test creates a list that it keeps appending small integers to; the graphs show the heap use as the list gets bigger. Both graphs show a step function for heap usage, though the MicroPython steps seem to double in depth and height, while the increase in step size for CPython is more gradual. One of the questions she would like to answer is why the two interpreters "have such vastly different memory footprints and memory profiles".

Objects and memory

To explain what is going on, she needed to show how objects are implemented internally for both interpreters. Since CPython uses more memory in these tests, there must be an underlying reason. Either CPython objects are larger or CPython allocates more of them—or both.

CPython allocates all of its objects on the heap, so the statement "x = 1" will create an object for "1" on the heap. Each object has two parts, the header (PyObject_HEAD) and some variable object-specific fields. The header is the overhead for the object and consists of two eight-byte fields for a reference count and a pointer to the type of object it represents. That means 16 bytes is the lower bound on the size of a CPython object.

For an integer, the object-specific field consists of an eight-byte length indicating how many four-byte values will follow (remember that Python can represent arbitrary-sized integers). So to store an integer object that has a value of 2^30-1 or less (two bits are used for other purposes), it takes 28 bytes, 24 of which are overhead. For the values in the test (10**10 and up), it took 32 bytes per object, so 200,000 integer objects consumed 6MB of space.
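On a 64-bit CPython build, these sizes can be checked directly with sys.getsizeof:

```python
import sys

# On a 64-bit CPython build, a small non-zero integer carries the
# 24 bytes of header/length overhead plus one four-byte digit.
print(sys.getsizeof(1))        # 28 bytes
# 10**10 needs 34 bits, i.e. two 30-bit digits, so it grows to 32.
print(sys.getsizeof(10**10))   # 32 bytes
```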

In MicroPython, though, each "object" is only eight bytes in size. Those eight bytes could be used as a pointer to another data structure or used more directly. "Pointer tagging" allows for multiple uses of eight-byte objects. It is used to encode some extra information in the objects; since all addresses are aligned to eight-byte boundaries, the three low-order bits are available to store tag values. Not all of the tag bits are used that way, though; if the low-order bit is one, that indicates a "small" integer, which is stored in the other 63 bits. A binary 10 in the low-order two bits means there is a 62-bit index to an interned string in the rest of the value. A binary 00 in those bits indicates the other bits are a pointer to a concrete object (i.e. anything that is not a small integer or interned string).
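A toy Python model may make the tagging scheme clearer; the real implementation is C operating on machine words, and the helper names here are invented for illustration:

```python
# Toy model of MicroPython-style pointer tagging on a 64-bit word
# (illustrative only; not MicroPython's actual code).
SMALL_INT_TAG = 0b1    # low bit set: the value lives in the upper 63 bits
QSTR_TAG      = 0b10   # low two bits 10: interned-string index above them

def encode_small_int(value):
    return (value << 1) | SMALL_INT_TAG

def encode_interned_str(index):
    return (index << 2) | QSTR_TAG

def decode(word):
    if word & 0b1:
        return ("small_int", word >> 1)
    if word & 0b11 == QSTR_TAG:
        return ("interned_str", word >> 2)
    # Low bits 00: an eight-byte-aligned pointer to a concrete object.
    return ("object_pointer", word)

assert decode(encode_small_int(10**10)) == ("small_int", 10**10)
assert decode(encode_interned_str(5)) == ("interned_str", 5)
```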

Since the values stored in the test will fit as small integers, only eight bytes are used per object. Those are stored on the stack, rather than the heap, which explains the flat heap usage graph line for MicroPython. Even if you measure the stack usage, which is a little more complicated to do, Joshi said, MicroPython is using way less memory than CPython for these objects.

For the string experiment, which created 200,000 strings of ten characters or less, the explanation is similar. CPython has 48 bytes of overhead for an ASCII string (PyASCIIObject), so a one-character, two-byte string (e.g. "a" plus a null terminator) takes up 50 bytes; a ten-character string would take up 59 bytes. MicroPython stores small strings (254 characters or less) as arrays with three bytes of overhead. So the one-character string takes five bytes, which is 1/10 of the storage that CPython needs.
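The CPython side of these numbers can also be verified with sys.getsizeof, again assuming a 64-bit build:

```python
import sys

# On 64-bit CPython, a compact ASCII string has 48 bytes of fixed
# overhead plus one byte per character plus a terminating null byte.
print(sys.getsizeof("a"))           # 50 bytes
print(sys.getsizeof("abcdefghij"))  # 59 bytes
```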

That still doesn't explain the flat line in the graph, however. Strings in MicroPython are stored in a pre-allocated array, she said. There are no new heap allocations when a string object is created. Effectively, the allocation has just been moved in time so it doesn't appear as an increase in heap size in the graph.

Mutable objects

Strings and integers are both immutable objects, but the third experiment used lists, which are mutable objects. In CPython, that adds overhead in the form of a PyGC_HEAD structure. That structure is used for garbage-collection tracking and is 24 bytes in size. In all, a PyListObject is 64 bytes in size; but you also have to add pointers for each of the objects in the list, which are stored as an array, and the storage for each object.

The steps in the graph for the list experiment indicate dynamic resizing of the array of list elements. If that resizing was done for each append, the graph would be linearly increasing as with integers and strings, but CPython overallocates the array in anticipation of further append operations. That has the effect of amortizing the cost of the resizing over multiple append operations.
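The over-allocation is easy to observe from within CPython by watching sys.getsizeof as a list grows:

```python
import sys

# Watch the allocated size of a list as elements are appended; the
# size stays flat for several appends, then jumps when CPython
# over-allocates the backing array in anticipation of more appends.
lst, sizes = [], []
for i in range(32):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

print(sizes)
# The sequence is a step function: plateaus of repeated values with
# occasional jumps, rather than a per-append increase.
assert sizes == sorted(sizes)            # capacity never shrinks
assert len(set(sizes)) < len(sizes)      # plateaus between resizes
```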

The garbage collection mechanism for MicroPython is completely different from that of CPython and does not use reference counting. So, MicroPython concrete objects, which are used for lists, do not have a reference count; they still use a type pointer, though, as CPython does. Removing the reference count saves eight bytes over CPython. In addition, there is no additional overhead for garbage collection in MicroPython, which saves 24 bytes. Thus, mutable objects in MicroPython are 32 bytes smaller than their CPython counterparts.

So it turns out that CPython objects are larger than those of MicroPython, generally substantially larger. The differences for other object types, and classes in particular, are not as clear cut when compared to CPython 3.6, Joshi said, as there have been some CPython optimizations that substantially reduce the overhead. In addition, CPython allocates more objects in these tests than MicroPython does. MicroPython stores small integers directly on the stack, rather than allocate them from the heap as CPython does.

Garbage collection

The overhead incurred to track CPython objects for garbage collection is probably making the audience wonder how that works, Joshi said. CPython uses reference counting, which simply tracks references to an object; every time an object is assigned to a variable, placed on a list, or inserted into a dictionary, for example, its count is increased. When a reference goes away, because of a del() operation or a variable going out of scope, for instance, the count is decreased. When the count reaches zero, the object is deallocated.
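CPython exposes the count through sys.getrefcount, which can be used to watch references come and go (the reported value includes one temporary reference held by the call itself):

```python
import sys

# Each new reference to an object raises its reference count;
# dropping a reference lowers it again.
obj = []
baseline = sys.getrefcount(obj)   # includes the call's own temporary ref
alias = obj                       # assignment adds a reference
assert sys.getrefcount(obj) == baseline + 1
container = [obj]                 # placing it in a list adds another
assert sys.getrefcount(obj) == baseline + 2
del alias                         # dropping a reference lowers the count
assert sys.getrefcount(obj) == baseline + 1
```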

But that doesn't work in all cases, which is where the PyGC_HEAD structure for mutable objects comes into play. If objects refer to themselves or indirectly contain a reference to themselves, it results in a reference cycle. She gave an example:

    x = [1, 2, 3]
    x.append(x)

If x subsequently goes out of scope or del(x) is executed, the reference count will not drop to zero, thus the object would never be freed. CPython handles that with a cyclic garbage collector that detects and breaks reference cycles. Since only mutable objects can have these cycles, they are the only objects tracked using the PyGC_HEAD information.
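The cyclic collector's effect can be observed with the gc and weakref modules; a class instance stands in for the list from the example above, since plain lists cannot be weakly referenced:

```python
import gc
import weakref

class Node:
    pass

x = Node()
x.ref = x                  # self-reference: a reference cycle
alive = weakref.ref(x)     # weak reference to observe the object's fate
del x                      # refcount never reaches zero due to the cycle
assert alive() is not None # reference counting alone did not free it
gc.collect()               # the cyclic collector finds and breaks the cycle
assert alive() is None     # now the object has been freed
```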

MicroPython tracks allocations in the heap using a bitmap. It breaks the heap up into 32-byte allocation units, each of which has its free or in-use state tracked with two bits. There is a mark-and-sweep garbage collector that maintains the bitmap when it periodically runs; it manages all of the objects that are allocated on the heap. In that way, the mark-and-sweep approach is effectively trading execution time for a reduction in the overhead needed for tracking memory.
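A toy Python model of such a bitmap heap might look like the following; this illustrates the general idea only, not MicroPython's actual data structures or state encoding:

```python
# Toy model of a bitmap-tracked heap (illustrative only): the heap is
# divided into 32-byte blocks, and each block's state fits in two bits
# (here: free, the head of an allocation, the tail of a multi-block
# allocation, or marked as reachable during a collection).
FREE, HEAD, TAIL, MARKED = 0, 1, 2, 3

class BitmapHeap:
    def __init__(self, num_blocks):
        self.state = [FREE] * num_blocks   # two bits per block in real life

    def alloc(self, num_blocks):
        # Find a run of free blocks, mark the first HEAD, the rest TAIL.
        run = 0
        for i, s in enumerate(self.state):
            run = run + 1 if s == FREE else 0
            if run == num_blocks:
                start = i - num_blocks + 1
                self.state[start] = HEAD
                for j in range(start + 1, i + 1):
                    self.state[j] = TAIL
                return start
        return None                         # out of memory

    def sweep(self, roots):
        # Mark every allocation reachable from the root set...
        for start in roots:
            self.state[start] = MARKED
        # ...then free everything that was not marked.
        for i, s in enumerate(self.state):
            if s == MARKED:
                self.state[i] = HEAD
            elif s == HEAD:
                self.state[i] = FREE
                j = i + 1
                while j < len(self.state) and self.state[j] == TAIL:
                    self.state[j] = FREE
                    j += 1

heap = BitmapHeap(8)
a = heap.alloc(2)          # a two-block object
b = heap.alloc(1)          # a one-block object
heap.sweep(roots=[b])      # only b is reachable; a's blocks are reclaimed
assert heap.state[a] == FREE
assert heap.state[b] == HEAD
```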

Joshi said that she had "barely scratched the surface" of the optimizations that MicroPython has to reduce its memory use. But the code is available; it is clear, easy-to-follow C that anyone who is interested should go and look at.

Differing approaches

She completed the talk with an evaluation of why CPython and MicroPython chose the approaches they did. Typically, there is a trade-off between memory use and performance, but that is not really the case here. MicroPython performs better than CPython for most benchmarks, Joshi said, though it does do worse on some, especially those involving large dictionaries. The trade-off here is functionality; MicroPython does not implement all of the features of CPython. That means, for example, that users do not have access to the entire ecosystem of third-party packages that CPython users enjoy.

Different design decisions were made by the CPython and MicroPython projects in part because of the philosophies behind their development. Back in the 1980s and early 1990s when CPython was being created, its developers wanted "simplicity of implementation and design first", they would "worry about performance later".

That is exactly the opposite of the philosophy behind MicroPython. It is a fairly modern implementation, so it could borrow ideas from Lua, JavaScript, and other languages that have come about in the decades since CPython was created. MicroPython is also solely targeted at these highly constrained microcontrollers. Effectively, they are targeting very different kinds of systems and use cases, which helps explain the choices that have been made.

A YouTube video of Joshi's talk is available.

[I would like to thank the Linux Foundation for travel assistance to Portland for PyCon.]

