After migrating the project to Spring Boot, the problem of excessive memory usage came out. This article describes the entire troubleshooting process and the tools used, and is also very useful for troubleshooting other out-of-heap memory problems.

The Background

In order to manage the projects better, we migrated one project in the group to the MDP framework (based on Spring Boot), and then we found that the system reported the exception of excessive usage on Swap area frequently. I was called to help to look for the cause, but found that there was 4G configured for the on-heap memory, and the actual physical memory used was as high as 7G. ​​It was not normal.

Here the JVM parameter configuration was:

-XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=256M -XX:+AlwaysPreTouch -XX:ReservedCodeCacheSize=128m -XX:InitialCodeCacheSize=128m, -Xss512k -Xmx4g -Xms4g,-XX:+UseG1GC -XX:G1HeapRegionSize=4M

The actual physical memory used was as shown below:

How I Troubleshooted The Problem

1. Use Java-level tools to locate memory areas (in-heap memory, Code segment, or off-heap memory that is requested by using unsafe.allocateMemory and DirectByteBuffer )

I added the JVM parameter of -XX:NativeMemoryTracking=detail to the project to restart the project, and used the command of jcmd pid VM.native_memory detail to view the memory distribution, which was as follows:

I found that the committed memory was less than the physical memory, because the memory displayed by the jcmd command contained the in-memory memory, the Code segment, as well as the memory requested by unsafe.allocateMemory and DirectByteBuffer , but did not contain the off-heap memory requested by other Native Code (C code). So I guessed that the problem was caused by the usage of Native Code's requesting memory.

I also used pmap to view the memory distribution in order to prevent misjudgment, and found a large number of addresses whose memories were 64M. But these address spaces were not in the address spaces given by the jcmd command, so I almost could judge that the problem was caused by these 64M memories.

2. Use system-level tools to locate off-heap memory

Since I had determined that it was caused by Native Code, and Java-level tools were not very useful for solving such problems, I must take the help of system-level tools.

First, use gperftools to locate the problem.

You can refer to gperftools for the usage of gperftools. The monitoring from gperftools was as follows:

What could be seen from the above image was that, the memory requested by malloc was released when it was up to 3G, and then it was kept always at 700M-800M. My first response was: Didn't it request by using mmap/brkby directly instead of malloc in the Native Code? (The principle for gperftools is to replace the default memory allocator (glibc) of the operating system with dynamic linking.)

Then, use strace to track system calls.

Since the memory couldn't be tracked with gperftools, I used the command strace -f -e"brk,mmap,munmap" -p pid to track the request for memory to OS, but didn't find any suspected memory request. The monitoring through strace was shown below:

Next, use GDB to dump suspected memory.

Since the suspected memory request couldn't be tracked out by strace, I thought about checking the memory. So I went into GDB directly by the command gdp -pid pid , and then used the command dump memory mem.bin startAddress endAddress to dump the memory (startAddress and endAddress could be found from /proc/pid/smaps ). Then I viewed the results through strings mem.bin , which were as follows:

It looked something like the decompressed JAR package information. The information of JAR package should be read at the start of the project, so strace works not very well after the project has been started. So you should use strace during the startup of a project, not after the startup has been complete.

Again, use strace to track system calls during the startup of the project.

So I used strace to track system calls during the startup of the project and found that there were indeed a lot of 64M memories requested. The screenshot was as follows:

The memory used by the address space requested by the mmap could be seen in the pmap tool:

Finally, use jstack to view the corresponding thread.

Since the thread ID of the requested memory had been displayed by the strace command, I could use the command jstack pid directly to view the thread stack to find the corresponding thread stack (note the conversion between decimal and hexadecimal):

Now I found the problem: MCC (MtConfigClient) used Reflections to scan the packages, and used Spring Boot to load JARs on the bottom. The off-heap memory was needed as the Inflater class was to be used when decompressing JARs. Then I used Btrace to trace the class:

Then I took a look at the place where MCC was used, and foud that the path for scanning packages was not configured. The default was to scan all the packages. So I only needed to modify the code and configure the path for scanning packages to solve the memory problem.

3. Why Is The Off-Heap Memory Not Released?

Although the problem has been solved, there still are several questions:

Why is there no problem with the old framework?

Why is the off-heap memory not released?

Why are the memories all 64M?

Why does gperftools finally show that the memory used is about 700M, and does the decompression package really not use malloc to request for memory?

With these questions in mind, I checked the relevant block of source code of Spring Boot Loader, and found that in Spring Boot, the InflaterInputStream in Java JDK was wrapped and the Inflater was used, and the Inflater itself needed to involve the off-heap memory when uncompressing JAR packages. However, the packaged class ZipInflaterInputStream did not release the off-heap memory held by the Inflater . So I thought it was the reason and reported the bug to the Spring Boot community immediately. But soon after that, I discovered that the Inflater object itself implemented the finalize method, in which there was a logic that could be called to release of the off-heap memory. In other words, Spring Boot relies on GC's releasing the off-heap memory.

Then when I viewed the objects in the heap by jmap, I found that there was nearly no Inflater object. So I started to doubt that it hadn't called finalize during GC. I replaced the Inflater in the Spring Boot Loader with my own wrapped Inflater , and then added breakpoints to the finalize method for debugging. The fact was that the finalize method was actually called. So I went to check the C code corresponding to Inflater , and found that it requested memory by malloc during the initialization and called free to release the memory during ending.

At this moment, I could only suspect that it didn't really release the memory when calling free , so I replaced the InflaterInputStream wrapped by Spring Boot with the one that came with Java JDK. Then, the memory problem was solved.

Now I went back to see the memory distribution of gperftools, and I found that the memory usage had been increasing when using Spring Boot, but it dropped a lot at a certain point (the usage reduced from 3G to 700M). It was caused by GC, and the memory was released. However, the memory change couldn't be seen at the operating system level. Was the memory not released to the operating system, but held by the memory allocator?

Then I discovered that the memory address distribution of the default system memory allocator (glibc 2.12) was quite different from the one of using gperftools. The address 2.5G was found to belong to the Native Stack by using smaps. The memory address was distributed as follows:

So there was little doubt that it was due to the memory allocator. I searched glibc 64M, and found that glibc introduce the memory pool to each thread since 2.11 (64-bit machine is 64M memory). The original was as follows:

An enhanced dynamic memory allocation (malloc) behaviour enabling higher scalability across many sockets and cores. This is achieved by assigning threads their own memory pools and by avoiding locking in some situations. The amount of additional memory used for the memory pools (if any) can be controlled using the environment variables MALLOC_ARENA_TEST and MALLOC_ARENA_MAX. MALLOC_ARENA_TEST specifies that a test for the number of cores is performed once the number of memory pools reaches this value. MALLOC_ARENA_MAX sets the maximum number of memory pools used, regardless of the number of cores.

I modified the environment variable of MALLOC_ARENA_MAX as described in the article but found it didn't work. Then I checked the tcmalloc (the memory allocator used by gperftools), and got to know it used the approach of memory pool too.

In order to verify the concluditon, I wrote a simple memory allocator which had no memory pool. Then I used the command gcc zjbmalloc.c -fPIC -shared -o zjbmalloc.so to generate a dynamic library, and then replaced the memory allocator of glibc by export LD_PRELOAD=zjbmalloc.so . The code was as follows:

#include <sys/mman.h> #include <stdlib.h> #include <string.h> #include <stdio.h> void* malloc(size_t size) { long* ptr = mmap(0, size + sizeof(long), PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS); if(ptr == MAP_FAILED) { return NULL; } *ptr = size; // First 8 bytes contain lenght. return (void*)(&ptr[1]); // Memory that is after length variable } void *calloc(size_t n, size_t size){ void* ptr = malloc(n * size); if(ptr == NULL){ return NULL; } memset(ptr, 0, n * size); return ptr; } void *realloc(void *ptr, size_t size) { if(size == 0){ free(ptr); return NULL; } if(ptr == NULL){ return malloc(size); } long *plen = (long*)ptr; plen--; // Reach top of memory long len = *plen; if(size <= len){ return ptr; } void* rptr = malloc(size); if(rptr == NULL){ free(ptr); return NULL; } rptr = memcpy(rptr, ptr, len); free(ptr); return rptr; } void free(void* ptr) { if(ptr == NULL){ return; } long *plen = (long*)ptr; plen--; // Reach top of memory long len = *plen; // Read lenght munmap((void*)plen, len + sizeof(long)); }

It could be found that the off-heap memory requested actually was always between 700M-800M after the program had been started when adding event tracking to the custom allocator. The shown memory usage in gperftools was also around 700M-800M. However, the memory occupied by the process was very different at the operating system level (here only the off-heap memory was monitored).

I made a test for using different allocators to scan packages in varying degrees. And the memories occupied were as follows:

Allocator Name No Scan All Packages Scan All Packages Once Scan All Packages Twice glibc 750M 1.5G 2.3G tcmalloc 900M 2.0G 2.77G customize 1.7G 1.7G 1.7G

Why did the custom malloc request for 800M, but the physical memory occupied finally was 1.7G?

The reason was that the custom memory allocator allocated memory through the way of mmap, which needed to round up to an integer number of pages on demand when allocating memory. So there was a huge waste of space. It also could be found that the number of pages requested finally was about 536k, while the memory actually requested to the system was equal to 512k * 4k(pagesize) = 2G.

Why was it greater than 1.7G?

The operating system used the way of delay allocation, so when the mmap requested for memory to the system, the system only returned the memory address but did not allocate real physical memory. Only when the memory was actually used, the system would generate a page fault interrupt and then allocate an actual physical Page.

Sum up

The flow of the entire memory allocation is shown in the image above. The default configuration for the MCC Scanner is to scan all JAR packages. When scanning packages, Spring Boot does not release the off-heap memory automatically, which will make the off-heap memory consumption soar continuely during the scanning phase. When GC occurs, Spring Boot relies on the finalize mechanism to release the off-heap memory; but for performance reasons, glibc does not actually return the memory to the operating system, but leaves the memory in the memory pool, which makes the application layer consider there has occurred a "memory leak" . Therefore, the configuration path for the MCC should be modified to specific JAR packages, to solve this problem.

By the time I published this article, I found that the latest version of Spring Boot (2.0.5.RELEASE) had made changes, and the ZipInflaterInputStream could release the off-heap memory actively and no longer depend on GC; so the problem also can be solved by upgrading Spring Boot to the latest version. :)

References