Python Memory Management

One of the major challenges in writing (somewhat) large-scale Python programs is to keep memory usage at a minimum. However, managing memory in Python is easy—if you just don’t care. Python allocates memory transparently, manages objects using a reference count system, and frees memory when an object’s reference count falls to zero. In theory, it’s swell. In practice, you need to know a few things about Python memory management to get a memory-efficient program running. One of the things you should know, or at least get a good feel about, is the sizes of basic Python objects. Another thing is how Python manages its memory internally.
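The reference-count mechanism described above can be observed directly with the standard library's `sys.getrefcount` (a minimal Python 3 sketch; note that `getrefcount` reports one extra reference for its own temporary argument):

```python
import sys

a = []                    # a fresh list, referenced only by the name `a`
r1 = sys.getrefcount(a)   # includes getrefcount's own temporary reference

b = a                     # binding a second name increments the count
r2 = sys.getrefcount(a)

del b                     # dropping the name decrements it again
r3 = sys.getrefcount(a)

print(r1, r2, r3)         # r2 is exactly r1 + 1, and r3 equals r1
```

When the last reference disappears, the count reaches zero and the object's memory is eligible for reuse.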

So let us begin with the sizes of basic objects. Python does not have many primitive data types: there are ints, longs (an unlimited-precision version of ints), floats (which are doubles), tuples, strings, lists, dictionaries, and classes.

Basic Objects

What is the size of an int? A programmer with a C or C++ background will probably guess that a machine-specific int is something like 32 bits, maybe 64, and that it therefore occupies at most 8 bytes. But is that so in Python?

Let us first write a function that shows the sizes of objects (recursively if necessary):

```python
import sys

def show_sizeof(x, level=0):
    print "\t" * level, x.__class__, sys.getsizeof(x), x
    if hasattr(x, '__iter__'):
        if hasattr(x, 'items'):
            for xx in x.items():
                show_sizeof(xx, level + 1)
        else:
            for xx in x:
                show_sizeof(xx, level + 1)
```

We can now use the function to inspect the sizes of the different basic data types:

```python
show_sizeof(None)
show_sizeof(3)
show_sizeof(2**63)
show_sizeof(102947298469128649161972364837164)
show_sizeof(918659326943756134897561304875610348756384756193485761304875613948576297485698417)
```

If you have a 32-bit 2.7.x Python, you'll see:

```
8 None
12 3
22 9223372036854775808
28 102947298469128649161972364837164
48 918659326943756134897561304875610348756384756193485761304875613948576297485698417
```

and if you have a 64-bit 2.7.x Python, you'll see:

```
16 None
24 3
36 9223372036854775808
40 102947298469128649161972364837164
60 918659326943756134897561304875610348756384756193485761304875613948576297485698417
```

Let us focus on the 64-bit version (mainly because that's what we need most often in our case). None takes 16 bytes. An int takes 24 bytes, three times as much memory as a C int64_t, despite being some kind of "machine-friendly" integer. Long integers (unbounded precision), used to represent integers larger than 2^63 - 1, have a minimum size of 36 bytes. The size then grows linearly in the logarithm of the integer represented.

Python's floats are implementation-specific, but seem to be C doubles.
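The logarithmic growth of integer sizes is easy to check for yourself. The following Python 3 sketch (where the single arbitrary-precision `int` type has replaced the int/long pair, but the same pattern holds) prints the size alongside the bit length:

```python
import sys

# The size of an integer grows roughly linearly in the number of bits
# needed to represent it, i.e., in the logarithm of the value.
values = (1, 2**63, 2**200, 2**2000)
sizes = [sys.getsizeof(v) for v in values]

for v, s in zip(values, sizes):
    print(v.bit_length(), "bits ->", s, "bytes")
```

Exact byte counts are CPython- and platform-specific, but the sizes are strictly increasing with the magnitude.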
However, they do not eat up only 8 bytes:

```python
show_sizeof(3.14159265358979323846264338327950288)
```

outputs

```
16 3.14159265359
```

on a 32-bit platform and

```
24 3.14159265359
```

on a 64-bit platform. That is, again, three times the size a C programmer would expect.

Now, what about strings?

```python
show_sizeof("")
show_sizeof("My hovercraft is full of eels")
```

outputs, on a 32-bit platform:

```
21
50 My hovercraft is full of eels
```

and, on a 64-bit platform:

```
37
66 My hovercraft is full of eels
```

An empty string costs 37 bytes in a 64-bit environment! The memory used by a string then grows linearly with the length of the (useful) string.

* * *

Other commonly used structures, tuples, lists, and dictionaries, are worthwhile to examine. Lists (which are implemented as array lists, not as linked lists, with everything that entails) are arrays of references to Python objects, allowing them to be heterogeneous. Let us look at our sizes:

```python
show_sizeof([])
show_sizeof([4, "toaster", 230.1])
```

outputs

```
32 []
44 [4, 'toaster', 230.1]
```

on a 32-bit platform and

```
72 []
96 [4, 'toaster', 230.1]
```

on a 64-bit platform. An empty list eats up 72 bytes. The size of an empty, 64-bit C++ std::list() is only 16 bytes, 4 to 5 times less.

What about tuples? And dictionaries?

```python
show_sizeof({})
show_sizeof({'a': 213, 'b': 2131})
```

outputs, on a 32-bit box,

```
136 {}
136 {'a': 213, 'b': 2131}
	32 ('a', 213)
		22 a
		12 213
	32 ('b', 2131)
		22 b
		12 2131
```

and, on a 64-bit box,

```
280 {}
280 {'a': 213, 'b': 2131}
	72 ('a', 213)
		38 a
		24 213
	72 ('b', 2131)
		38 b
		24 2131
```

This last example is particularly interesting because it "doesn't add up." If we look at individual key/value pairs, they take 72 bytes (while their components take 38 + 24 = 62 bytes, leaving 10 bytes for the pair itself), but the dictionary takes 280 bytes (rather than a strict minimum of 144 = 72 × 2 bytes). The dictionary is supposed to be an efficient data structure for search, and the two likely implementations will use more space than strictly necessary.
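The per-element overhead of lists of boxed objects can be made concrete by comparing a list of ints against the standard library's `array` module, which stores raw C values inline (a Python 3 sketch; exact numbers are CPython-specific):

```python
import sys
from array import array

n = 1000
lst = list(range(n))
arr = array('q', range(n))     # 'q' = signed 64-bit C integers, stored inline

# A list stores references; the int objects themselves live elsewhere,
# so a fair total includes the size of every element.
list_total = sys.getsizeof(lst) + sum(sys.getsizeof(i) for i in lst)
arr_total = sys.getsizeof(arr)

print(list_total, arr_total)
```

The compact array wins by a wide margin, which is exactly the trade-off numeric libraries such as NumPy exploit.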
If it's some kind of tree, then we should pay the cost of internal nodes that contain a key and two pointers to child nodes; if it's a hash table, then we must have some room with free entries to ensure good performance.

The (somewhat) equivalent C++ std::map structure takes 48 bytes when created (that is, empty). An empty C++ string takes 8 bytes (the allocated size then grows linearly with the size of the string). An integer takes 4 bytes (32 bits).

* * *

Why does all this matter? It seems that whether an empty string takes 8 bytes or 37 doesn't change anything much. That's true, until you need to scale. Then, you need to be really careful about how many objects you create in order to limit the amount of memory your program uses. This is a problem in real-life applications. However, to devise a really good strategy about memory management, we must consider not only the sizes of objects, but how many are created, and in which order. This turns out to be very important for Python programs. One key element to understand is how Python allocates its memory internally, which we will discuss next.

Internal Memory Management

To speed up memory allocation (and reuse), Python uses a number of lists for small objects. Each list contains objects of similar size: there is a list for objects 1 to 8 bytes in size, one for objects 9 to 16 bytes, etc. When a small object needs to be created, either we reuse a free block in the list, or we allocate a new one.

There are some internal details on how Python manages those lists into blocks, pools, and "arenas": a number of blocks form a pool, pools are gathered into arenas, etc., but they're not very relevant to the point we want to make (if you really want to know, read Evan Jones' ideas on how to improve Python's memory allocation). The important point is that those lists never shrink.

Indeed: if an item (of size x) is deallocated (freed by lack of reference), its location is not returned to Python's global memory pool (and even less to the system), but merely marked as free and added to the free list of items of size x. The dead object's location will be reused if another object of compatible size is needed. If there are no dead objects available, new ones are created.

If small-object memory is never freed, then the inescapable conclusion is that, like goldfish, these small-object lists only keep growing, never shrinking, and that the memory footprint of your application is dominated by the largest number of small objects allocated at any given point.

* * *

Therefore, one should work hard to allocate only the number of small objects necessary for one task, favoring (otherwise unpythonèsque) loops where only a small number of elements are created/processed rather than (more pythonèsque) patterns where lists are created using list generation syntax and then processed.
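The contrast between the two patterns can be sketched (in Python 3 syntax) by comparing a list comprehension, which materializes every element at once, with a generator expression, which holds only one element's worth of state at a time:

```python
import sys

n = 100000

squares_list = [i * i for i in range(n)]   # allocates all n result objects at once
squares_gen = (i * i for i in range(n))    # produces one value at a time, lazily

list_size = sys.getsizeof(squares_list)    # grows with n
gen_size = sys.getsizeof(squares_gen)      # small constant, independent of n

print(list_size, gen_size)

# Both patterns compute the same result, but the generator never
# populates the small-object lists with n intermediate values.
assert sum(squares_gen) == sum(squares_list)
```

(`sys.getsizeof` reports only the container or generator frame, not the elements, which is precisely the point: the generator has no stored elements.)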
While the second pattern is more à la Python, it is rather the worst case: you end up creating lots of small objects that will come to populate the small-object lists, and even once the list is dead, the dead objects (now all in the free lists) will still occupy a lot of memory.

* * *

The fact that the free lists grow does not seem like much of a problem, because the memory they contain is still accessible to the Python program. But from the OS's perspective, your program's size is the total (maximum) memory allocated to Python. Since Python returns memory to the OS (from the heap that allocates objects other than small objects) only on Windows, if you run on Linux you will only see the total memory used by your program increase.

* * *

Let us prove my point using memory_profiler, a Python add-on module (which depends on the python-psutil package) by Fabian Pedregosa (see the module's GitHub page). This add-on provides the decorator @profile that allows one to monitor the memory usage of one specific function. It is extremely simple to use. Let us consider the following program:

```python
import copy
import memory_profiler

@profile
def function():
    x = list(range(1000000))  # allocate a big list
    y = copy.deepcopy(x)
    del x
    return y

if __name__ == "__main__":
    function()
```

Invoking

```
python -m memory_profiler memory-profile-me.py
```

prints, on a 64-bit computer:

```
Filename: memory-profile-me.py

Line #    Mem usage    Increment   Line Contents
================================================
     4                             @profile
     5      9.11 MB      0.00 MB   def function():
     6     40.05 MB     30.94 MB       x = list(range(1000000)) # allocate a big list
     7     89.73 MB     49.68 MB       y = copy.deepcopy(x)
     8     82.10 MB     -7.63 MB       del x
     9     82.10 MB      0.00 MB       return y
```

This program creates a list of n = 1,000,000 ints (n × 24 bytes = ~23 MB) and an additional list of references (n × 8 bytes = ~7.6 MB), which amounts to a total memory usage of ~31 MB.
copy.deepcopy copies both lists, which allocates again ~50 MB (I am not sure where the additional overhead of 50 MB - 31 MB = 19 MB comes from). The interesting part is del x: it deletes x, but the memory usage only decreases by 7.63 MB! This is because del only deletes the list of references, not the actual integer values, which remain on the heap and cause a memory overhead of ~23 MB.

This example allocates ~73 MB in total, which is more than twice the amount of memory needed to store a single list of ~31 MB. You can see that memory usage can increase surprisingly if you are not careful! Note that you might get different results on a different platform or with a different Python version.
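A similar experiment can be run without any add-on module using the standard library's tracemalloc (Python 3; a rough sketch, and the exact numbers are platform-specific). It confirms the same effect: after del x, most of the allocated memory is still live, because the integer objects are still reachable through y:

```python
import copy
import tracemalloc

tracemalloc.start()

x = list(range(1000000))   # allocate a big list
y = copy.deepcopy(x)       # builds a second list of references (in CPython,
                           # the immutable ints themselves are shared, not copied)
del x                      # frees only x's array of references; the int
                           # objects survive because y still refers to them

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(current, peak)       # current stays high: most allocations are still live
```

Here `current` remains in the tens of megabytes even though x is gone, while the gap between `peak` and `current` roughly matches the freed reference array.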