Performance of Reader/Writer Locks with HLE

These are the results of a micro-benchmark for a reader/writer workload using different lock implementations. The other parameters are the probability that a request is a write request (the probability in the columns of the table below) and the size of the data structure (a simple hash table). In all cases four threads are used. In this output the hash table size is fixed at a large value (65536 entries), which keeps the chance of collisions low, as should be the case.

The first part of the table shows the raw execution times of the code. Aside from the faster reader/writer lock implementation, a simple mutex lock based on futexes is compared. All three implementations exist in two forms, with and without HLE. The very same binary can be executed on a machine without HLE support with no disadvantage for the HLE variant, since such processors ignore the XACQUIRE and XRELEASE prefixes. Finally, there are also implementations using the standard POSIX mutex and reader/writer locks from NPTL.
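The HLE and non-HLE forms of a futex-based mutex can differ only in two memory-order flags. The following is a minimal sketch, assuming GCC's x86 `__ATOMIC_HLE_ACQUIRE`/`__ATOMIC_HLE_RELEASE` extensions to the `__atomic` builtins and the Linux `futex` system call. The three-state lock word (0 = unlocked, 1 = locked, 2 = locked with possible waiters) and all names such as `DEFINE_FUTEX_MUTEX` and `futex_mutex_demo` are illustrative assumptions, not the code that was benchmarked. Where the HLE flags are not available the sketch falls back to plain atomics.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Use the GCC x86 HLE memory-order flags when available, else plain atomics. */
#if defined(__ATOMIC_HLE_ACQUIRE) && defined(__ATOMIC_HLE_RELEASE)
# define HLE_ACQ __ATOMIC_HLE_ACQUIRE
# define HLE_REL __ATOMIC_HLE_RELEASE
#else
# define HLE_ACQ 0
# define HLE_REL 0
#endif

static void futex_wait(int *addr, int val) {
  syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, val, NULL, NULL, 0);
}

static void futex_wake(int *addr, int nwake) {
  syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, nwake, NULL, NULL, 0);
}

/* Lock word: 0 = unlocked, 1 = locked, 2 = locked with possible waiters.
   ACQ/REL are OR-ed only into the operations that take and release an
   uncontended lock; that is the whole difference between the variants. */
#define DEFINE_FUTEX_MUTEX(name, ACQ, REL)                                  \
  static void name##_lock(int *m) {                                         \
    int c = 0;                                                              \
    if (__atomic_compare_exchange_n(m, &c, 1, false,                        \
                                    __ATOMIC_ACQUIRE | (ACQ),               \
                                    __ATOMIC_RELAXED))                      \
      return; /* fast path: 0 -> 1 */                                      \
    if (c != 2)                                                             \
      c = __atomic_exchange_n(m, 2, __ATOMIC_ACQUIRE);                      \
    while (c != 0) { /* sleep while the lock word is 2 */                   \
      futex_wait(m, 2);                                                     \
      c = __atomic_exchange_n(m, 2, __ATOMIC_ACQUIRE);                      \
    }                                                                       \
  }                                                                         \
  static void name##_unlock(int *m) {                                       \
    /* 1 -> 0 restores the pre-lock value, as XRELEASE elision needs */     \
    if (__atomic_fetch_sub(m, 1, __ATOMIC_RELEASE | (REL)) != 1) {          \
      __atomic_store_n(m, 0, __ATOMIC_RELEASE);                             \
      futex_wake(m, 1);                                                     \
    }                                                                       \
  }

DEFINE_FUTEX_MUTEX(futex_mutex, 0, 0)                 /* plain variant */
DEFINE_FUTEX_MUTEX(futex_mutex_hle, HLE_ACQ, HLE_REL) /* HLE variant */

struct demo { int lock; long counter; long iters; bool hle; };

static void *demo_worker(void *p) {
  struct demo *d = p;
  for (long i = 0; i < d->iters; ++i) {
    if (d->hle) futex_mutex_hle_lock(&d->lock);
    else        futex_mutex_lock(&d->lock);
    d->counter++;                       /* the protected critical section */
    if (d->hle) futex_mutex_hle_unlock(&d->lock);
    else        futex_mutex_unlock(&d->lock);
  }
  return NULL;
}

/* Four threads increment one counter under the chosen variant. */
long futex_mutex_demo(bool hle, long iters) {
  struct demo d = { .lock = 0, .counter = 0, .iters = iters, .hle = hle };
  pthread_t t[4];
  for (int i = 0; i < 4; ++i) pthread_create(&t[i], NULL, demo_worker, &d);
  for (int i = 0; i < 4; ++i) pthread_join(t[i], NULL);
  return d.counter;
}
```

With GCC on x86 the two generated variants should differ only in the XACQUIRE prefix on the initial compare-and-exchange and the XRELEASE prefix on the final decrement. `futex_mutex_demo(true, 100000)` runs four threads under the HLE variant and returns 400000.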

The second, color-coded part of the table compares the execution time of the various implementations relative to the pthread_rwlock_t implementation from NPTL. For all workloads, all the implementations perform better than the POSIX reader/writer lock. This is easily explained by the complexity required of a generic reader/writer lock implementation that must fulfill the POSIX requirements.

The third part of the table, also color-coded, compares the HLE-using variants of the new reader/writer lock and of the simple futex mutex with the same code built without HLE. When executed on an HLE-capable machine, significant speed-ups are achieved just by adding the HLE prefixes in the appropriate places, with no other change at all.
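To illustrate what "only adding the HLE prefixes" can mean for a reader/writer lock, here is a minimal spinning reader/writer lock sketch. It is not the benchmarked reader/writer lock or "fastrwlock", whose code is not shown here; the lock-word layout (`RW_WRITER` bit plus a reader count), the macro `DEFINE_RWLOCK`, and `rwlock_demo` are hypothetical, and it again assumes GCC's x86 HLE memory-order flags with a plain-atomics fallback. The two variants are generated from identical code; every release undoes exactly the change its acquire made, so the lock word returns to its prior value, which elision requires.

```c
#include <pthread.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
# include <immintrin.h>
# define cpu_relax() _mm_pause()   /* spin-wait hint (PAUSE) */
#else
# define cpu_relax() ((void) 0)
#endif

#if defined(__ATOMIC_HLE_ACQUIRE) && defined(__ATOMIC_HLE_RELEASE)
# define HLE_ACQ __ATOMIC_HLE_ACQUIRE
# define HLE_REL __ATOMIC_HLE_RELEASE
#else
# define HLE_ACQ 0
# define HLE_REL 0
#endif

#define RW_WRITER 0x80000000u  /* writer bit; low bits count readers */

#define DEFINE_RWLOCK(name, ACQ, REL)                                       \
  static void name##_rdlock(unsigned *l) {                                  \
    for (;;) {                                                              \
      unsigned old = __atomic_fetch_add(l, 1, __ATOMIC_ACQUIRE | (ACQ));    \
      if (!(old & RW_WRITER))                                               \
        return;                            /* no writer: read lock held */  \
      __atomic_fetch_sub(l, 1, __ATOMIC_RELAXED); /* back out, then wait */ \
      while (__atomic_load_n(l, __ATOMIC_RELAXED) & RW_WRITER)              \
        cpu_relax();                                                        \
    }                                                                       \
  }                                                                         \
  static void name##_rdunlock(unsigned *l) {                                \
    __atomic_fetch_sub(l, 1, __ATOMIC_RELEASE | (REL));                     \
  }                                                                         \
  static void name##_wrlock(unsigned *l) {                                  \
    unsigned expected = 0;                                                  \
    while (!__atomic_compare_exchange_n(l, &expected, RW_WRITER, false,     \
                                        __ATOMIC_ACQUIRE | (ACQ),           \
                                        __ATOMIC_RELAXED)) {                \
      while (__atomic_load_n(l, __ATOMIC_RELAXED) != 0)                     \
        cpu_relax();                                                        \
      expected = 0;                                                         \
    }                                                                       \
  }                                                                         \
  static void name##_wrunlock(unsigned *l) {                                \
    /* subtract, not store 0: readers may be backing out concurrently */    \
    __atomic_fetch_sub(l, RW_WRITER, __ATOMIC_RELEASE | (REL));             \
  }

DEFINE_RWLOCK(rw, 0, 0)                 /* plain variant */
DEFINE_RWLOCK(rw_hle, HLE_ACQ, HLE_REL) /* same code with HLE prefixes */

struct shared { unsigned lock; long a, b; long iters; bool hle; long bad; };

static void *rw_worker(void *p) {
  struct shared *s = p;
  for (long i = 0; i < s->iters; ++i) {
    if (i % 4 == 0) {                   /* 25% writes: keep a == b */
      if (s->hle) rw_hle_wrlock(&s->lock); else rw_wrlock(&s->lock);
      s->a++;
      s->b++;
      if (s->hle) rw_hle_wrunlock(&s->lock); else rw_wrunlock(&s->lock);
    } else {                            /* 75% reads: check the invariant */
      if (s->hle) rw_hle_rdlock(&s->lock); else rw_rdlock(&s->lock);
      if (s->a != s->b) __atomic_fetch_add(&s->bad, 1, __ATOMIC_RELAXED);
      if (s->hle) rw_hle_rdunlock(&s->lock); else rw_rdunlock(&s->lock);
    }
  }
  return NULL;
}

/* Four threads; returns the final value of a, stores violations in *bad. */
long rwlock_demo(bool hle, long iters, long *bad) {
  struct shared s = { .lock = 0, .a = 0, .b = 0, .iters = iters,
                      .hle = hle, .bad = 0 };
  pthread_t t[4];
  for (int i = 0; i < 4; ++i) pthread_create(&t[i], NULL, rw_worker, &s);
  for (int i = 0; i < 4; ++i) pthread_join(t[i], NULL);
  *bad = s.bad;
  return s.a;
}
```

When elision succeeds, readers in the HLE variant only read the lock word, so concurrent readers no longer bounce its cache line between cores. The sketch has no writer preference, so a steady stream of readers can delay writers.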

One word on the results for the “fastrwlock” implementation: blindly using HLE can actually have negative effects. When transaction restarts are the norm, the operations cost more than simply taking the lock right away. This can be seen by comparing the “fastrwlock” and “fastrwlock_hle” variants: with a high proportion of write requests, the latter can perform significantly worse than the non-HLE variant.
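The trade-off can be made explicit with a deliberately simplified cost model; it is an illustrative assumption, not something derived from the measurements. Let $c_\ell$ be the cost of executing the critical section with the lock actually taken, $c_t$ the cost of an elision attempt (assumed to be roughly the same whether it commits or aborts), and $a$ the fraction of attempts that abort. Because an aborted HLE region is re-executed with the lock taken, the expected cost per operation and the break-even condition are:

```latex
\begin{aligned}
E[\mathrm{cost}_{\mathrm{HLE}}] &= c_t + a\,c_\ell, \\
E[\mathrm{cost}_{\mathrm{HLE}}] < c_\ell &\iff a < 1 - \frac{c_t}{c_\ell}.
\end{aligned}
```

Under this model, a write-heavy workload raises the abort rate $a$, and once $a$ passes the threshold the HLE variant is slower than taking the lock directly, which is consistent with the behavior described for “fastrwlock_hle”.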