[ 2015-December-27 14:42 ]

TL;DR: The Java NIO API caches a maximum-sized direct ByteBuffer for each thread, which looks like a native memory leak if you read or write large blocks from many threads. You can easily patch the JDK yourself to work around this problem. Always use direct ByteBuffers with Java NIO APIs for the best performance, and to avoid this "leak." Under the covers, heap ByteBuffers are copied to temporary direct ByteBuffers on each I/O call. [Update 2016-02-10: JDK 9 has a property to control this (Thanks Tony!). Run services with -Djdk.nio.maxCachedBufferSize=262144 to avoid this problem.]

The full story

The Java NIO APIs use ByteBuffers as the source and destination of I/O calls. ByteBuffers come in two flavours. Heap ByteBuffers wrap a byte[] array, allocated in the garbage collected Java heap. Direct ByteBuffers wrap memory allocated outside the Java heap using malloc. Operating system calls need memory that stays at a fixed address, and the garbage collector can move objects in the Java heap, so only "native" memory can be passed to them. This means that when you use a heap ByteBuffer for I/O, its contents are copied through a temporary direct ByteBuffer. The JDK caches one temporary buffer per thread, without any memory limits. As a result, if you call I/O methods with large heap ByteBuffers from multiple threads, your process can use a huge amount of additional native memory, which looks like a native memory leak. This can cause your process to unexpectedly run into memory limits and get killed.
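To make the two flavours concrete, here is a minimal sketch. The class name BufferFlavours and the helper writeAll are made up for this illustration, and it writes to a temporary file rather than a socket. Both calls use the same FileChannel.write method; the only visible difference is how the buffer was allocated. The extra copy for the heap buffer happens inside the JDK.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names here are illustrative only.
final class BufferFlavours {
    // Writes everything remaining in the buffer to a throwaway temp file
    // and returns the number of bytes written.
    static int writeAll(ByteBuffer buffer) throws IOException {
        Path file = Files.createTempFile("nio-flavours", ".bin");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            int written = 0;
            while (buffer.hasRemaining()) {
                written += channel.write(buffer);
            }
            return written;
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Heap flavour: backed by a byte[] in the garbage collected heap.
        ByteBuffer heap = ByteBuffer.allocate(64 * 1024);
        // Direct flavour: backed by native memory outside the Java heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024);

        System.out.println("heap.isDirect()   = " + heap.isDirect());
        System.out.println("direct.isDirect() = " + direct.isDirect());
        System.out.println("heap bytes written   = " + writeAll(heap));
        System.out.println("direct bytes written = " + writeAll(direct));
    }
}
```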

Our team at Twitter ran into this issue. We had a process that would slowly use more and more memory, until it hit its limit and was killed. It turns out that Finagle responses are currently contained in heap ByteBuffers, triggering this issue. (Finagle will eventually switch to a new version of Netty, which will avoid this issue by using direct ByteBuffers.) To work around the problem, Twitter's JVM team added a flag to our internal JVM version to limit the size of this cache. It also turns out you can easily replace one of the JDK classes for a single program, which makes it easy to avoid this native memory leak by following the steps below. I've sent an email to the nio-dev mailing list to see if we can limit the size of this cache upstream. In the meantime, if you are affected by this, you can try my workaround.

Demonstrating the leak

I wrote a program that writes to /dev/null from multiple threads with both heap and direct ByteBuffers. It shows that direct ByteBuffers behave as you would expect: they are garbage collected once they are no longer used. Heap ByteBuffers, however, cause direct ByteBuffers to be allocated and cached until the threads exit. You can also use the program to show that my quick-and-dirty patch below avoids the leak. I've put the code in a GitHub repository, and the README includes sample output.
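If you just want a quick way to see the effect, the sketch below is a much smaller, single-threaded illustration. It is not the program from the repository. It assumes a HotSpot-style JVM that exposes a buffer pool named "direct" through BufferPoolMXBean, and it writes to a temporary file instead of /dev/null so it runs on any platform. The class and method names are made up for this example, and the numbers you see will depend on your JDK version and flags.

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not the code from the author's repository.
final class TempBufferGrowthDemo {
    // Bytes currently held by direct ByteBuffers, or -1 if the JVM does
    // not expose a pool named "direct" (an assumption about the JVM).
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    // Writes the buffer to a temp file on the calling thread and returns
    // how much the direct pool grew while doing so.
    static long growthAfterWrite(ByteBuffer buffer) throws IOException {
        Path file = Files.createTempFile("nio-growth", ".bin");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            long before = directMemoryUsed();
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            return directMemoryUsed() - before;
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        int size = 8 * 1024 * 1024;
        ByteBuffer direct = ByteBuffer.allocateDirect(size);
        ByteBuffer heap = ByteBuffer.allocate(size);
        System.out.println("growth after direct write: " + growthAfterWrite(direct));
        System.out.println("growth after heap write:   " + growthAfterWrite(heap));
    }
}
```

Any growth reported for the heap write is native memory that the caller never allocated explicitly. Multiply that by the number of I/O threads to get a feel for the worst case.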

The code behind the leak

When you pass a ByteBuffer to an I/O API, the JDK checks whether it is a heap ByteBuffer and, if so, copies the data through a temporary direct ByteBuffer around the actual system call. For example, for network I/O, you use a SocketChannel, which is actually an instance of sun.nio.ch.SocketChannelImpl. Reading from a socket calls IOUtil.read, and writing calls IOUtil.write. Both methods check if the ByteBuffer is a direct ByteBuffer. If it is not, they allocate a temporary direct ByteBuffer by calling Util.getTemporaryDirectBuffer, copy the data, then call the "real" readIntoNativeBuffer or writeFromNativeBuffer implementations. The leak itself is in Util.getTemporaryDirectBuffer, which caches the maximum-sized buffer for each thread.
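The caching behaviour is easier to see in a toy model. The sketch below is a simplified illustration and is not the JDK source: the real code in sun.nio.ch.Util is more involved. Only the method name getTemporaryDirectBuffer comes from the JDK; the class name, the cachedCapacity helper and the ThreadLocal layout are assumptions made for this example.

```java
import java.nio.ByteBuffer;

// Simplified model for illustration; this is NOT the JDK source.
final class TemporaryBufferCacheModel {
    // One cached direct buffer per thread, with no upper bound on its size.
    private static final ThreadLocal<ByteBuffer> CACHE = new ThreadLocal<>();

    // Returns a direct buffer with exactly `size` bytes remaining, reusing
    // the calling thread's cached buffer when it is big enough.
    static ByteBuffer getTemporaryDirectBuffer(int size) {
        ByteBuffer cached = CACHE.get();
        if (cached == null || cached.capacity() < size) {
            // Grow: the larger buffer replaces the cached one and is never
            // shrunk again while the thread is alive.
            cached = ByteBuffer.allocateDirect(size);
            CACHE.set(cached);
        }
        cached.clear();
        cached.limit(size);
        return cached;
    }

    // Capacity of the buffer cached for the calling thread (0 if none).
    static int cachedCapacity() {
        ByteBuffer cached = CACHE.get();
        return cached == null ? 0 : cached.capacity();
    }

    public static void main(String[] args) {
        getTemporaryDirectBuffer(4 * 1024);          // small write
        getTemporaryDirectBuffer(16 * 1024 * 1024);  // one large write
        getTemporaryDirectBuffer(4 * 1024);          // small write again
        System.out.println("cached bytes for this thread: " + cachedCapacity());
    }
}
```

In this model a single large write is enough to pin a large native buffer to the thread, and later small writes keep reusing it. That is the pattern that makes memory use creep up when many long-lived threads occasionally handle a big response.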

Patching the leak

Tony Printezis submitted a version of the patch he wrote for Twitter, which has been merged into JDK 9. I suggest running all services with -Djdk.nio.maxCachedBufferSize=262144 to ensure the JDK doesn't cache buffers larger than 256 kB. I would really love to have this get set as the default, but unfortunately that seems unlikely.

However, if you are running on an older version, the source code for the Java libraries is available as part of OpenJDK. It is actually quite easy to compile your own version of a single class, and replace it using the -Xbootclasspath/p option: