by Mikhail Vorontsov

This article gives an overview of the memory consumption of some popular Java data types. It follows the An overview of memory saving techniques in Java and Memory consumption of popular Java data types – part 1 articles published earlier in this blog. I strongly recommend reading An overview of memory saving techniques in Java before this one. This article assumes that a reference occupies 4 bytes.

HashMap, THashMap

HashMap is the most popular map in the JDK. Provided that its keys have a good-quality hashCode method and its load factor is not too high, it will generally give you O(1) performance for the get and put methods (as well as for similar methods like containsKey ).

You can find a proper hash map description in one of the popular textbooks (“Algorithms (4th Edition)” or “Introduction to Algorithms”) or on Wikipedia. Here we will only discuss the JDK HashMap implementation.

HashMap is built on top of an array of Map.Entry objects. The implementation resizes this array whenever the number of entries exceeds array_length * load_factor , so the array length is always at least size / load_factor . The default load factor for HashMap is 0.75 and the default capacity (the initial array length) is 16. The load factor specifies which part of the array may be used for storage and is normally a value between 0 and 1. The higher the load factor, the less space is wasted, but the HashMap gets slower due to the increased rate of collisions. The lower the load factor, the more memory is wasted, but the HashMap gets faster because collisions become less likely.

So the default ( new HashMap<>() ) array of entries has 16 slots and is resized once the map holds more than 16 * 0.75 = 12 entries. The JDK also keeps the array length a power of two, so size / load_factor is only a lower bound for the real array length.

What is a HashMap.Entry ? It contains a key, a value, the int hash of the key and a pointer to the next entry in the same bucket (entries whose keys collide are chained together). This means that an entry occupies 32 bytes (12 bytes header + 16 bytes data + 4 bytes padding). So a HashMap with size = S spends 32 * S bytes on entry storage. In addition, it uses 4 * C bytes for the entries array, where C is the map capacity.

As you can see, if the map is sparsely filled (its size is less than 12.5% of its capacity), the entry array will start to dominate over the entries themselves.
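The 12.5% threshold comes from comparing the two terms: the array costs more than the entries when 4 * C > 32 * S, that is, when S / C < 1/8. Here is a minimal sketch of that comparison, using the per-entry and per-slot sizes assumed in this article. The class and method names are made up for illustration.

```java
class EntryArrayBreakEven {
    static final int ENTRY_BYTES = 32; // per-entry size assumed in this article
    static final int SLOT_BYTES = 4;   // one 4-byte reference per array slot

    // True when the entry array occupies more memory than the entries themselves.
    static boolean arrayDominates(long size, long capacity) {
        return SLOT_BYTES * capacity > ENTRY_BYTES * size;
    }

    public static void main(String[] args) {
        System.out.println(arrayDominates(100, 1024)); // fill ratio below 12.5%: true
        System.out.println(arrayDominates(200, 1024)); // fill ratio above 12.5%: false
    }
}
```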

A HashMap instance occupies 32 * SIZE + 4 * CAPACITY bytes, while the theoretical lower limit for a map is 8 * SIZE bytes (2 arrays of keys and values with no space wasted). Of course, such a “map” requires O(N) lookup time in the general case. In special cases, though, such as EnumMap , it could be as low as 4 * CAPACITY .

Can we reduce hash map memory consumption without sacrificing O(1) lookup/add times? Certainly. Trove THashMap is a replacement implementation for HashMap . Internally, THashMap contains 2 arrays: one for keys, another for values. It means that THashMap needs 8 * CAPACITY bytes for storage. Its default load factor is 0.5, but you can increase it if necessary.

Let’s compare the memory usage of HashMap and THashMap with default load factors and size = 100. HashMap capacity will be at least 134 (100/0.75) and THashMap capacity will be at least 200 (100/0.5); the real implementations round these figures up further, for example the JDK to a power of two. The total memory consumption of a HashMap will therefore be at least 32 * 100 + 4 * 134 = 3200 + 536 = 3736 bytes (not including the memory occupied by keys or values!). The memory consumption of THashMap will be about 8 * 200 = 1600 bytes . Furthermore, if we set the THashMap load factor to 0.75 (thus making it equal to the HashMap load factor), THashMap memory consumption will go down to about 8 * 134 = 1072 bytes .

As you can see, it is worth replacing JDK HashMap with Trove THashMap if you want to save some memory.
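The arithmetic above can be reproduced with a few lines of Java. This is only a sketch of the estimation formulas used in this article, not a measurement. The class and method names are hypothetical, and minCapacity deliberately ignores any rounding a real implementation applies to the array length.

```java
class MapFootprint {
    // HashMap model from this article: 32 bytes per entry plus one 4-byte slot per array cell.
    static long hashMapBytes(long size, long capacity) {
        return 32 * size + 4 * capacity;
    }

    // THashMap model from this article: two parallel arrays of 4-byte references.
    static long tHashMapBytes(long capacity) {
        return 8 * capacity;
    }

    // Smallest array length that keeps size / capacity within the load factor (no rounding applied).
    static long minCapacity(long size, double loadFactor) {
        return (long) Math.ceil(size / loadFactor);
    }

    public static void main(String[] args) {
        long size = 100;
        System.out.println(hashMapBytes(size, minCapacity(size, 0.75))); // 3736
        System.out.println(tHashMapBytes(minCapacity(size, 0.5)));       // 1600
        System.out.println(tHashMapBytes(minCapacity(size, 0.75)));      // 1072
    }
}
```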

HashSet, THashSet

JDK HashSet is built on top of a HashMap<T, Object> , where the value is a singleton ‘present’ object. It means that the memory consumption of a HashSet is identical to that of a HashMap : in order to store SIZE values, you need 32 * SIZE + 4 * CAPACITY bytes (plus the size of your values). It is definitely not a memory-friendly collection.
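To make the “set on top of a map” idea concrete, here is a heavily simplified sketch of such a wrapper. It is not the JDK source: MapBackedSet is a hypothetical class that only shows why every stored value still pays for a full map entry.

```java
import java.util.HashMap;

class MapBackedSet<T> {
    // Single shared marker object used as the value for every key.
    private static final Object PRESENT = new Object();

    // Each element of the set becomes a key of this map, so it costs one full map entry.
    private final HashMap<T, Object> map = new HashMap<>();

    // Returns true if the value was not in the set before.
    boolean add(T value) {
        return map.put(value, PRESENT) == null;
    }

    boolean contains(T value) {
        return map.containsKey(value);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("a")); // true: newly added
        System.out.println(set.add("a")); // false: already present
        System.out.println(set.size());   // 1
    }
}
```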

Trove THashSet could be the easiest replacement collection for a HashSet : it implements Set<E> and Iterable<E> , which means you only need to add a single letter in the initialization of your set.

THashSet uses a single object array for its values, so it uses 4 * CAPACITY bytes for storage. As you can see, compared to JDK HashSet , you will save 32 * SIZE bytes at an identical load factor, which is a huge improvement.

LinkedHashMap

LinkedHashMap is the most memory-hungry collection in the JDK. It extends HashMap by using LinkedHashMap.Entry as the entry type in the internal array of entries. LinkedHashMap.Entry extends HashMap.Entry by adding ‘before’ and ‘after’ pointers, thus implementing a linked deque. For us it means that LinkedHashMap.Entry consumes 40 bytes (8 bytes more than HashMap.Entry ).

It means that LinkedHashMap consumes 40 * SIZE + 4 * CAPACITY bytes. Unfortunately, Trove does not offer a direct replacement for this collection, but I would advise you to use a linked list if you want to store a small number of elements.
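What those extra 8 bytes per entry buy you is a predictable iteration order: a LinkedHashMap iterates its keys in insertion order. A small illustration (the class and helper names are made up for this example):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InsertionOrderDemo {
    // Copies the keys into a list in the order the map iterates them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> linked = new LinkedHashMap<>();
        linked.put("zebra", 1);
        linked.put("apple", 2);
        linked.put("mango", 3);
        // The 'before'/'after' pointers keep entries in insertion order.
        System.out.println(keysInIterationOrder(linked)); // [zebra, apple, mango]
    }
}
```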

LinkedHashSet, TLinkedHashSet

JDK LinkedHashSet extends HashSet , but it is backed by a LinkedHashMap rather than a plain HashMap . Its memory consumption is therefore that of a LinkedHashMap : 40 * SIZE + 4 * CAPACITY bytes (plus the size of your values).

Trove TLinkedHashSet could be used as a replacement. It extends THashSet with a TIntList that stores the order of entries, which means that it occupies 8 * CAPACITY bytes.

TreeMap

TreeMap is a red-black tree. This means that a tree contains exactly map.size() nodes. Each tree node contains a key, a value, pointers to the left and right children, a pointer to the parent and a boolean ‘colour’ flag. It means that a node needs 12 bytes for the header, 20 bytes for 5 object fields and 1 byte for the flag: 12 + 20 + 1 = 33 bytes, which is rounded up to 40 bytes due to 8-byte alignment. The flag looks odd in this design, because it makes each node 8 bytes larger (1 byte for the flag itself plus 7 bytes of alignment padding). This could be avoided by having 2 types of nodes: one class for red nodes and another for black ones.

So, the total memory consumption of a TreeMap is 40 * SIZE bytes, which is approximately the same as the memory consumption of a HashMap . In terms of CPU consumption, a TreeMap is worse: its lookup and update methods have O(log N) complexity compared to O(1) for HashMap (provided that the load factor is low enough).
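The effect of alignment on the node size can be checked with a short calculation. This sketch assumes the sizes used throughout this article (12-byte header, 4-byte references, 1-byte boolean, 8-byte alignment) and ignores any field-layout details of a particular JVM. The names are illustrative only.

```java
class NodeSize {
    static final int HEADER_BYTES = 12;   // object header size assumed in this article
    static final int REFERENCE_BYTES = 4; // reference size assumed in this article

    // Rounds a raw size up to the next multiple of 8.
    static int align8(int bytes) {
        return (bytes + 7) & ~7;
    }

    // Estimated size of an object with the given number of reference and boolean fields.
    static int objectBytes(int referenceFields, int booleanFields) {
        return align8(HEADER_BYTES + REFERENCE_BYTES * referenceFields + booleanFields);
    }

    public static void main(String[] args) {
        System.out.println(objectBytes(5, 1)); // 40: tree node with the colour flag (33 rounded up)
        System.out.println(objectBytes(5, 0)); // 32: the same node without the flag
    }
}
```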

The advantage of a TreeMap is that it implements the NavigableMap interface, which allows you to make cheap requests for the previous/next entries according to the natural ordering or a provided Comparator , as well as to make arbitrary range selections.
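A short illustration of these NavigableMap queries follows. The sample keys and values are arbitrary, and the class name is made up for this example.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

class NavigableMapDemo {
    // Builds a small sorted map to query.
    static NavigableMap<Integer, String> sample() {
        NavigableMap<Integer, String> map = new TreeMap<>();
        map.put(10, "ten");
        map.put(20, "twenty");
        map.put(30, "thirty");
        map.put(40, "forty");
        return map;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> map = sample();
        System.out.println(map.lowerKey(20));                // 10: greatest key strictly below 20
        System.out.println(map.higherKey(20));               // 30: smallest key strictly above 20
        System.out.println(map.floorKey(25));                // 20: greatest key <= 25
        System.out.println(map.subMap(20, true, 30, true));  // {20=twenty, 30=thirty}
        System.out.println(map.headMap(20, false));          // {10=ten}
    }
}
```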

TreeSet

JDK TreeSet is backed by a TreeMap , so its memory consumption is identical: 40 * SIZE bytes.

PriorityQueue

The last collection to discuss is PriorityQueue . This collection is used when, from time to time, you need to extract/get the smallest (or the largest, depending on your settings) element in the collection, as well as to add new elements to it. Extractions and insertions are mixed in no particular order.

PriorityQueue is based on a binary heap, which is essentially an array where the children of the entry at index N are located at 2*N+1 and 2*N+2 . The size of this array is increased twofold on each resize, so for sufficiently large collections we use between 50 and 100% of the entries in the array. There are no wrappers for collection elements, which means that PriorityQueue consumption is between 4 * SIZE and 8 * SIZE bytes (identical to ArrayList despite their different purposes). We can also specify this size as 4 * CAPACITY bytes.
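The index arithmetic of an array-based binary heap fits in a few lines. This is a generic min-heap sketch over an int array, written for illustration, not the JDK PriorityQueue code. All names are hypothetical.

```java
class HeapIndex {
    static int leftChild(int n) {
        return 2 * n + 1;
    }

    static int rightChild(int n) {
        return 2 * n + 2;
    }

    // Inverse of the two formulas above: both children map back to the same parent.
    static int parent(int n) {
        return (n - 1) / 2;
    }

    // Checks the min-heap property: no element is smaller than its parent.
    static boolean isMinHeap(int[] heap) {
        for (int i = 1; i < heap.length; i++) {
            if (heap[parent(i)] > heap[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMinHeap(new int[] {1, 3, 2, 7, 4, 5})); // true
        System.out.println(isMinHeap(new int[] {3, 1, 2}));          // false: 1 is below its parent 3
    }
}
```

Because the tree structure is implied by the indices, no per-element node objects are needed, which is why the collection costs only one reference per array slot.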

Summary

Always try to replace HashMap with Trove THashMap , HashSet with THashSet and, finally, LinkedHashSet with Trove TLinkedHashSet . Such a replacement requires adding a single letter to your code (the letter ‘T’) and no other code changes except the import statement. It will give you significant memory savings; see the table below.

The following table summarizes the storage occupied per stored value, assuming that a reference occupies 4 bytes. Note that you must spend 4 bytes per Object reference in any case, so subtract 4 bytes from the values in the following table to find the storage overhead (subtract 8 bytes for maps, because there is a key as well as a value).

| JDK collection | Size | Possible Trove substitution | Size |
|---|---|---|---|
| HashMap | 32 * SIZE + 4 * CAPACITY bytes | THashMap | 8 * CAPACITY bytes |
| HashSet | 32 * SIZE + 4 * CAPACITY bytes | THashSet | 4 * CAPACITY bytes |
| LinkedHashMap | 40 * SIZE + 4 * CAPACITY bytes | None | |
| LinkedHashSet | 40 * SIZE + 4 * CAPACITY bytes | TLinkedHashSet | 8 * CAPACITY bytes |
| TreeMap, TreeSet | 40 * SIZE bytes | None | |
| PriorityQueue | 4 * CAPACITY bytes | None | |

See also