Object Pooling is a creational design pattern that reuses a set of objects in a “pool” rather than creating and destroying them on demand. This pattern is very useful when an object is expensive to create and you want to improve the performance of your software by avoiding on-demand object creation.
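The core idea can be sketched with a minimal pool backed by a BlockingQueue (this is an illustration only, not the API of any specific library discussed below):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal illustrative object pool: objects are created once up front
// and recycled via borrow()/release() instead of being created on demand.
class SimplePool<T> {
    private final BlockingQueue<T> objects;

    SimplePool(int size, Supplier<T> factory) {
        objects = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            objects.add(factory.get()); // pay the creation cost once
        }
    }

    // Blocks until an object is available instead of creating a new one.
    T borrow() throws InterruptedException {
        return objects.take();
    }

    // Returns the object so another thread can reuse it.
    void release(T object) {
        objects.offer(object);
    }
}
```

A real pool also has to deal with validation and with objects that are broken by the previous user, which is exactly why pooling has to be used with care.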

Object Pooling can improve performance significantly. However, it has to be used with care, as pooling changes the usual object lifetime and you have to make sure that an object can be safely reused the next time it is borrowed.

Why did I want to benchmark object pools?

In currently released versions of WSO2 API Manager, the StackObjectPool is used in a few places. Last year, a customer reported a performance issue when a high number of concurrent users were accessing the APIs: some response times were longer than expected. We were able to reproduce the issue in a development environment, and we took a profiling recording using Java Flight Recorder. From the recording, we identified that the StackObjectPool used for key validation clients was causing a bottleneck. The main reason is that StackObjectPool creates new objects when all the objects in the pool are in use by other threads, and objects returned to the pool are discarded if the pool is already full. The method that creates a new object is synchronized, which caused many thread contentions. As a workaround for this performance issue, we increased the pool size. The time spent waiting to acquire the lock was then lower, and the response times were much better.
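The behaviour described above can be illustrated with a simplified, hypothetical pool (this is not the actual Commons Pool source): when the stack is empty, every borrowing thread funnels through one synchronized factory call, and returns to a full pool are discarded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Simplified illustration of a stack-based pool with a synchronized
// factory path. Under high concurrency the stack is often empty, so
// threads contend on the lock while expensive objects are created.
class StackLikePool<T> {
    private final Deque<T> stack = new ArrayDeque<>();
    private final int maxIdle;
    private final Supplier<T> factory;

    StackLikePool(int maxIdle, Supplier<T> factory) {
        this.maxIdle = maxIdle;
        this.factory = factory;
    }

    synchronized T borrow() {
        // Empty pool: create a new object while holding the lock.
        return stack.isEmpty() ? factory.get() : stack.pop();
    }

    synchronized void release(T object) {
        if (stack.size() < maxIdle) {
            stack.push(object); // kept for reuse
        }
        // else: the pool is full, so the object (and its expensive
        // creation cost) is simply discarded.
    }
}
```

Increasing `maxIdle` means fewer borrows hit the empty-pool path, which is why enlarging the pool reduced the lock contention in the workaround.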

While working on this issue, I wanted to see how other Java Object Pool implementations work and find out which is the best-performing Object Pool. The easiest way to find out is to write a benchmark using JMH.

Benchmarking Object Pools with JMH

JMH is the tool you need to write a proper Java benchmark and produce accurate results. For more details, read Avoiding Benchmarking Pitfalls on the JVM.

The benchmark code performs the following steps.

1. Borrow an object from a Java Object Pool implementation. (The object is very expensive to create: it first consumes CPU using the Blackhole.consumeCPU() method with 10,000 tokens, and then a String object with 10,000 characters is created.)
2. Simulate a delay using the Blackhole.consumeCPU() method with 1,000 tokens. The main reason is that usually an object is taken from a pool to do some operations, and the object will be used for some time before being returned to the pool.
3. Use the object by executing a method to get data, and consume the data with a JMH Blackhole.
4. Release the object back to the pool.
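The steps above can be sketched in plain Java without the JMH harness. Here Blackhole.consumeCPU() is replaced with a simple busy-work loop, and a BlockingQueue stands in for the pool under test; the names and sizes below mirror the description, not the actual benchmark source.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Plain-Java sketch of one benchmark iteration; in the real benchmark
// JMH's Blackhole.consumeCPU() provides the calibrated busy work and a
// Blackhole consumes the result.
class BenchmarkSketch {
    static final int POOL_SIZE = 10;
    static final BlockingQueue<String> pool = new ArrayBlockingQueue<>(POOL_SIZE);

    static { // pre-fill the pool with expensive objects
        for (int i = 0; i < POOL_SIZE; i++) {
            pool.add(createExpensiveObject());
        }
    }

    // Stands in for the expensive creation: burn CPU (10,000 "tokens"),
    // then build a 10,000-character String.
    static String createExpensiveObject() {
        consumeCpu(10_000);
        char[] chars = new char[10_000];
        Arrays.fill(chars, 'x');
        return new String(chars);
    }

    static long sink; // crude stand-in for JMH's Blackhole

    static void consumeCpu(long tokens) {
        long t = sink;
        for (long i = 0; i < tokens; i++) {
            t += i;
        }
        sink = t;
    }

    static int oneIteration() throws InterruptedException {
        String object = pool.take();   // 1. borrow
        consumeCpu(1_000);             // 2. simulate work while holding it
        int data = object.length();    // 3. use the object to get data
        pool.offer(object);            // 4. release
        return data;
    }
}
```

In the real benchmark each numbered step maps to code in a JMH @Benchmark method, so JMH can measure the borrow/release cycle accurately.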

I used the following Java Object Pool implementations in the benchmark.

Commons Pool and Commons Pool 2 also have SoftReference-based object pool implementations. See Commons Pool Soft Reference Object Pool and Commons Pool 2 Soft Reference Object Pool. These SoftReference-based Object Pools were excluded from the benchmark.

When working on the benchmarks, I found out that the Stormpot author, Chris Vest, has also written similar object pool benchmarks. He also has a Medium story on the Stormpot 2.4 release and its benchmark results. Even though Chris Vest has done a comprehensive benchmark, I wanted to continue with my project to analyze the results myself and improve my knowledge of JMH. I also improved my code by looking at the Stormpot benchmarks. I used the latest versions of all object pool implementations.

Source code and running the benchmark

My Object Pool Benchmarks code is at https://github.com/chrishantha/object-pool-benchmarks

There is a script named benchmark.sh to run the benchmarks. Please build the benchmarks using “mvn clean install” before running them.

Benchmark Results

Environment

The benchmarks were executed on my Lenovo ThinkPad X1 Carbon 4th Generation laptop.

Following are some details about the hardware and software.

Processor:

Model: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz

CPUs: 4

Threads per core: 2

Cores per socket: 2

Sockets: 1

Memory:

System Memory: 8GiB

L1 Cache (Instructions): 64KiB

L1 Cache (Data): 64KiB

L2 Cache: 512KiB

L3 Cache: 4MiB

Software:

OS: Ubuntu 17.10

Kernel: 4.13.0-16-generic

Java: 1.8.0_152

JMH: 1.19

The latest versions of all Object Pool implementations were used. I used the command mvn versions:display-dependency-updates to check for updates to all dependencies in the project.

JMH Setup

I used the following command to run the benchmarks.

time ./benchmark.sh 2>&1 | tee benchmark.log

The script ran the benchmarks using different numbers of threads and different pool sizes.

Forks: 2 (A new JVM is started for each fork)

Warm-up: 5 iterations, 1 second each

Measurement: 5 iterations, 1 second each

VM Options: -Xms4g -Xmx4g (4GB Heap)

Threads: 10, 50, and 100

Pool Sizes: 10, 50, 100, and 150

Benchmark Modes: “thrpt” and “sample”

Time Unit: ms (milliseconds)

Different thread counts and pool sizes were used to understand how the Object Pool implementations behave under low to high contention.

The GC profiler was also included to measure GC allocation rates. When using an object pool, the allocation rate should be lower.

The benchmark took almost 3 hours to run. Following is the output of the time command.

real 170m56.091s

user 490m24.931s

sys 7m20.652s

Visualizing the results

The benchmark script created 3 result files (one for each execution with a different number of threads), and I used Python to visualize the results.

I used an amazing Python library called “pandas”, which has easy-to-use data structures for analyzing data. I used pandas to concatenate all the CSV files from the benchmarks and analyze the data. I used Seaborn, a Python visualization library based on matplotlib, to visualize the results.

Summary of the results

Let’s compare the throughput score (ops/ms) for all object pools.