java.util.concurrent.ConcurrentHashMap is a highly optimized concurrent hash map implementation. Here are 5 tips we can learn from its implementation:

Disclaimer: The techniques described here increase the complexity of the code, making it harder to reason about it and test. So please only apply have them when you have seen through profiling that your code is on the hot path.

Use Bit Operations Instead of Expensive Mathematical Operations

An example of this technique is the use of the 'and' operation instead of modulo in ConcurrentHashMap. ConcurrentHashMap stores all values in an array. To calculate the position where an entry should be stored, we need to calculate the hash code modulo the array size. When the array size is a power of two as in the case of the ConcurrentHashmap we can write hash code and ( array size - 1 ) instead of hash code modulo size. The following shows this for a size of 8 and a hash code of 23:

int sizeMinusOne = 0b00000111; // 7 int hashCode = 0b00010111; // 23 int positionInArray = hashCode & sizeMinusOne; // 7





This is, for example, done in the get method:

public V get(Object key) { Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek; int h = spread(key.hashCode()); if ((tab = table) != null && (n = tab.length) > 0 && (e = tabAt(tab, (n - 1) & h)) != null) { // other code omitted } return null; }





In line 5, the array position is calculated using (n - 1) & h. The and operation is about 20 percent faster than the modulo operation. So if you have an expensive mathematical operation inside your critical path, it is a good idea to see if you can replace it with a bit operation.

Use Fine-Grained Locks

If multiple threads are accessing the same lock, it becomes a bottleneck. A solution to this problem is to use fine-grained locking. As can be seen in the putVal method, ConcurrentHashMap uses the array elements as monitors for synchronized locks:

final V putVal(K key, V value, boolean onlyIfAbsent) { if (key == null || value == null) throw new NullPointerException(); int hash = spread(key.hashCode()); int binCount = 0; for (Node<K,V>[] tab = table;;) { Node<K,V> f; int n, i, fh; if (tab == null || (n = tab.length) == 0) tab = initTable(); else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) { if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null))) break; // no lock when adding to empty bin } else if ((fh = f.hash) == MOVED) tab = helpTransfer(tab, f); else { V oldVal = null; synchronized (f) { // other code omitted } } addCount(1L, binCount); return null; }





If we need to change a non-empty array element, it is locked using a synchronized block with the array element as the monitor. This is done with the synchronized block in line 18 on the array element received in line 9. If the hash function used disperses the elements properly among the array elements, threads access different array elements and therefore synchronize on different monitors.

So to improve scalability, use the smallest independent value as the lock. Using this technique leads to many lock objects. Since reentrant locks consume more memory than the usage of synchronized blocks, this technique should be used with synchronized blocks instead of reentrant locks.

Use Volatile Fields for Reading and Locks for Writing

As we have seen, writing in the putVal method uses a lock on the array element. As can be seen in the get method, reading does not use locks but only consists of reading from a volatile field:

public V get(Object key) { Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek; int h = spread(key.hashCode()); if ((tab = table) != null && (n = tab.length) > 0 && (e = tabAt(tab, (n - 1) & h)) != null) { if ((eh = e.hash) == h) { if ((ek = e.key) == key || (ek != null && key.equals(ek))) return e.val; } else if (eh < 0) return (p = e.find(h, key)) != null ? p.val : null; while ((e = e.next) != null) { if (e.hash == h && ((ek = e.key) == key || (ek != null && key.equals(ek)))) return e.val; } } return null; }





The array element is received in line 5 using the method tabAt. The volatile read is done in the method tabAt as can be seen here, using the method getObjectVolatile from sun.misc.Unsafe:

static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) { return (Node<K,V>)U.getObjectVolatile(tab, ((long)i << ASHIFT) + ABASE); }





When you have a class with many reads and some writes, use this technique. Reading simply consists reading from a volatile field while writing uses a lock. In ConcurrentHashMap the values of the Node are directly modified. This makes it necessary to declare the fields of this class also as volatile:

static class Node<K,V> implements Map.Entry<K,V> { final int hash; final K key; volatile V val; volatile Node<K,V> next; // methods omitted }





This has the disadvantage that the values can change while you read them. An easier variation of this technique is used in java.util.concurrent.CopyOnWriteArrayList which include a copy of the data during the write.

Lazily Create Immutable Data

ConcurrentHashMap has multiple functions to create a view on this collection, like the following:

private transient KeySetView<K,V> keySet; public KeySetView<K,V> keySet() { KeySetView<K,V> ks; return (ks = keySet) != null ? ks : (keySet = new KeySetView<K,V>(this, null)); } public static class KeySetView<K,V> extends CollectionView<K,V,K> implements Set<K>, java.io.Serializable { private static final long serialVersionUID = 7249069246763182397L; private final V value; KeySetView(ConcurrentHashMap<K,V> map, V value) { // non-public super(map); this.value = value; } // methods omitted }





As we can see in line 1, the field keySet is not volatile and the method keySet(), line 3 till 6 is not synchronized. This leads to two problems, first the KeySetView object is not correctly published and second it might lead to the creation of multiple KeySetView objects. Through the usage of the final field in line 11, we make sure that the object gets correctly initialized even when they are incorrectly published. And the creation of multiple objects does not lead to inconsistent state since they are immutable.

Use java.util.concurrent.atomic.LongAdder for Counting

Writing and deleting threads means needing to update the size counter of the collection. So the modification of the size becomes a bottleneck even if we use the atomic methods from java.util.concurrent.atomic.AtomicLong. To solve this problem, ConcurrentHashMap uses the class CounterCell:

/** * A padded cell for distributing counts. Adapted from LongAdder * and Striped64. See their internal docs for explanation. */ @sun.misc.Contended static final class CounterCell { volatile long value; CounterCell(long x) { value = x; } }





Implementing this mechanism yourself is probably not a good idea since it is rather complicated. Luckily, the JDK provides an implementation of the technique used, the class java.util.concurrent.atomic.LongAdder. The idea behind the counter cells and when to use it is described in the Javadoc of java.util.concurrent.atomic.LongAdder:

One or more variables that together maintain an initially zero long sum. When updates (method add(long)) are contended across threads, the set of variables may grow dynamically to reduce contention.

What Is Next?

As we could see ConcurrentHashMap is full of ticks to write high-performance yet still thread-safe java. The next time we will look at java.util.concurrent.ConcurrentSkipListMap. In the meantime, if you want to test whether your application is thread-safe, try vmlens for free.

I would be glad to hear from you about the techniques you use to achieve high-performance thread-safe classes.