Not every story is a success story; reality rarely works that way. We would like to share a true story that was disappointing, but also a phenomenal learning experience, in the hope that it is useful to you.

This is a story about optimizing the memory utilization of a web application. The application was configured with a lot of memory (4GB) just to service a handful of transactions per second, so we set out to study its memory utilization patterns. We captured heap dumps of the application using the ‘jmap’ tool and uploaded them to HeapHero, a heap dump analysis tool similar to Eclipse MAT, JProfiler, or YourKit.

HeapHero profiled the memory and provided statistics on total classes, total objects, and heap size, along with a histogram view of the large objects residing in memory. On top of these traditional metrics, HeapHero reported the total amount of memory wasted due to inefficient programming practices. In modern computing, a considerable amount of memory is wasted because of practices such as duplicate object creation, suboptimal data type definitions (declaring ‘double’ and assigning only ‘float’ values), and over-allocation and underutilization of data structures.
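To make the ‘double’ versus ‘float’ point concrete, here is a minimal sketch of the arithmetic. The class name, the method, and the element count are illustrative assumptions, not taken from the application; the sketch counts only the element data and ignores the array header.

```java
class PrimitiveFootprint {

    // Bytes needed for the element data of a primitive array.
    // The array header (a few bytes per array) is deliberately ignored.
    static long payloadBytes(int elements, int bytesPerElement) {
        return (long) elements * bytesPerElement;
    }

    public static void main(String[] args) {
        int readings = 1_000_000; // hypothetical element count
        System.out.println("double[] payload: " + payloadBytes(readings, Double.BYTES) + " bytes");
        System.out.println("float[] payload:  " + payloadBytes(readings, Float.BYTES) + " bytes");
    }
}
```

If the values never need more than ‘float’ precision, the ‘double’ declaration costs twice the memory for the same data.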

This application was no exception. HeapHero reported that the application was wasting 56% of its memory due to inefficient programming practices. Yes, that was eyebrow-raising. In particular, it reported that 30% of the application’s memory was wasted because of duplicate strings.

Fig: HeapHero tool reporting amount of memory wasted due to inefficient programming
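For readers who have not met the term, "duplicate strings" are separate String objects on the heap that hold the same characters. The sketch below is only an illustration: the class name and the "ACTIVE" value are hypothetical, and ‘new String(...)’ stands in for whatever parsing or I/O code produces fresh strings in a real application.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class DuplicateStringDemo {

    // Number of distinct String *objects* (compared by identity, ==).
    static int distinctObjects(List<String> strings) {
        Set<String> byIdentity =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        byIdentity.addAll(strings);
        return byIdentity.size();
    }

    // Number of distinct String *values* (compared by equals()).
    static int distinctValues(List<String> strings) {
        return new HashSet<>(strings).size();
    }

    public static void main(String[] args) {
        List<String> statuses = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            // new String(...) forces a fresh object each time, much as parsing
            // a request or a database row typically would.
            statuses.add(new String("ACTIVE"));
        }
        System.out.println("String objects: " + distinctObjects(statuses));
        System.out.println("Distinct values: " + distinctValues(statuses));
    }
}
```

The gap between the two counts is the kind of waste a heap dump analyzer attributes to duplicate strings.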

String Deduplication

In Java 8 update 20, a new JVM argument, ‘ -XX:+UseStringDeduplication ’, was introduced. When an application is launched with this argument, the JVM eliminates duplicate strings from the application’s memory during garbage collection. However, please be advised that, in Java 8, the ‘ -XX:+UseStringDeduplication ’ argument works only with the G1 GC algorithm. You can activate the G1 GC algorithm by passing ‘ -XX:+UseG1GC ’.
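Before trusting any measurement, it helps to confirm that the running JVM actually received these arguments. A minimal sketch using the standard RuntimeMXBean follows; the class and method names are our own, and the check looks only at explicitly passed arguments, so it says nothing about JVM defaults.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class DedupFlagCheck {

    // True only if both flags appear among the given JVM arguments.
    // This inspects explicit arguments only, not JVM defaults.
    static boolean dedupRequested(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+UseG1GC")
                && jvmArgs.contains("-XX:+UseStringDeduplication");
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("G1 + string deduplication requested: " + dedupRequested(jvmArgs));
    }
}
```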

We got excited. We thought that just by introducing the ‘ -XX:+UseG1GC -XX:+UseStringDeduplication ’ JVM arguments, we would be able to save 30% of memory without any code refactoring. Wow, wouldn’t that be wonderful? To verify this theory, we conducted two different tests in our performance lab:

Test 1: Passing ‘ -XX:+UseG1GC ’

Test 2: Passing ‘ -XX:+UseG1GC -XX:+UseStringDeduplication ’

We enabled garbage collection logs on the application to study the memory usage pattern, and analyzed them using GCeasy, a free online garbage collection log analysis tool. We were hoping that test run #2 would show a 30% reduction in memory consumption because of the elimination of duplicate strings. However, the reality was quite different. We didn’t see any difference in memory usage: both test runs consistently showed the same amount of memory utilization. See the heap usage graphs that GCeasy generated from the garbage collection logs.

Fig: GCeasy Heap usage graph with ‘ -XX:+UseG1GC ’

Fig: GCeasy heap usage graph with ‘ -XX:+UseG1GC -XX:+UseStringDeduplication ’

In test run #1, the heap usage hovered around 1500 MB throughout the test. In test run #2, it also hovered around 1500 MB. Disappointingly, we didn’t see the anticipated 30% reduction in memory usage, despite introducing the ‘ -XX:+UseG1GC -XX:+UseStringDeduplication ’ JVM arguments.
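The graphs above come from garbage collection logs. As a rough in-process cross-check, heap usage can also be sampled with the standard MemoryMXBean. This is only a sketch and not how the numbers in this post were obtained; the class name, the five samples, and the one-second interval are arbitrary choices for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapUsageSampler {

    static long toMegabytes(long bytes) {
        return bytes / (1024 * 1024);
    }

    // Currently used heap, in MB, as reported by the JVM.
    static long usedHeapMegabytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return toMegabytes(heap.getUsed());
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            System.out.println("used heap: " + usedHeapMegabytes() + " MB");
            Thread.sleep(1_000); // sampling interval chosen arbitrarily
        }
    }
}
```

Sampled values include garbage that has not been collected yet, so they are noisier than the post-GC numbers a GC log provides.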

Why There Wasn’t a Reduction in Heap Usage

‘Why wasn’t there a reduction in heap usage?’ This question really puzzled us. Did we configure the JVM arguments correctly? Doesn’t ‘ -XX:+UseStringDeduplication ’ do its job correctly? Is the analysis report from the GCeasy tool correct? All these questions troubled our sleep.

After a detailed analysis, we figured out the bitter truth. Apparently, ‘ -XX:+UseStringDeduplication ’ eliminates only the duplicate strings that are present in the old generation of the memory. It does not eliminate duplicate strings in the young generation. Java memory has 3 primary regions: young generation, old generation, and Metaspace. Newly created objects go into the young generation. Objects that survive for a longer period are promoted to the old generation. JVM-related objects and metadata are stored in Metaspace.

In other words, ‘ -XX:+UseStringDeduplication ’ removes only duplicate strings that live for a longer period. Since this is a web application, most of its string objects were created and destroyed almost immediately. This was very clear from the following statistics reported in the GCeasy log analysis report:

Fig: Object creation/promotion stats reported by GCeasy

The average object creation rate of this application is 44.93 MB/sec, whereas the average promotion rate (i.e., from the young generation to the old generation) is only 918 KB/sec, which is about 2% of what is allocated. This indicates that only a very small percentage of objects live for a long time. Even within those 918 KB/sec of promoted objects, string objects are only a portion. Thus, the amount of duplicate strings removed by ‘ -XX:+UseStringDeduplication ’ was negligible, and, sadly, we didn’t see the expected reduction in memory.
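The relationship between the two rates is simple arithmetic. Here it is as a small sketch; the class and method names are ours, and the conversion assumes 1 MB = 1024 KB (the GCeasy report does not say which convention it uses, and either gives roughly the same answer).

```java
class PromotionRatio {

    // Fraction of allocated bytes that ends up promoted to the old generation.
    static double promotedFraction(double creationMbPerSec, double promotionKbPerSec) {
        double creationKbPerSec = creationMbPerSec * 1024.0; // assumes 1 MB = 1024 KB
        return promotionKbPerSec / creationKbPerSec;
    }

    public static void main(String[] args) {
        // Rates taken from the GCeasy report discussed above.
        double fraction = promotedFraction(44.93, 918.0);
        System.out.printf("Promoted: %.1f%% of allocated bytes%n", fraction * 100);
    }
}
```

Only this small promoted fraction is even eligible for deduplication, and strings make up just part of it.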

Conclusion

(a). ‘ -XX:+UseStringDeduplication ’ is useful only if the application has a lot of long-lived duplicate strings. It won’t yield fruitful results for applications where the majority of objects are short-lived. Unfortunately, in most modern web applications and microservices, the majority of objects are short-lived.

(b). Another well-known option recommended in the industry for eliminating duplicate strings is the String#intern() function. However, String#intern() isn’t going to be useful for this application, because with String#intern() you still end up creating the string object and then discarding it right after in favor of the pooled copy. If a string is short-lived by nature, this step is unnecessary, as regular garbage collection will reclaim the string anyway. Also, String#intern() can add a (very small) latency overhead to the transaction, along with CPU overhead.
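The "create, then discard" behavior of String#intern() can be seen in a few lines. This is an illustrative sketch only; the class name, method name, and the "ACTIVE" value are hypothetical.

```java
class InternDemo {

    // Builds a fresh String object and then interns it.
    // Returns true if intern() handed back the pooled instance rather than
    // the freshly built object, which means the fresh copy became garbage.
    static boolean internDiscardsFreshCopy(String value) {
        String canonical = value.intern();  // make sure the value is pooled first
        String fresh = new String(value);   // the allocation still happens
        String interned = fresh.intern();   // returns the pooled instance
        return interned == canonical && interned != fresh;
    }

    public static void main(String[] args) {
        System.out.println("fresh copy discarded: " + internDiscardsFreshCopy("ACTIVE"));
    }
}
```

The allocation is not avoided; intern() only decides which copy survives, which is of little value when the string would have died young anyway.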

(c). Given the current situation, the best way to eliminate duplicate strings from the application is to refactor the code so that duplicate strings are not created in the first place. HeapHero points out the code paths where a lot of duplicate strings are created. Using those pointers, we are going to continue our journey of refactoring the code to reduce memory consumption.
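As one example of what such a refactoring might look like, the sketch below replaces per-call string construction with values that are built once and reused. It is a hypothetical illustration, not code from the application: the class, the "STATUS_" labels, and the chosen codes are all made up.

```java
import java.util.HashMap;
import java.util.Map;

class StatusLabels {

    // "Before": allocates a brand-new String on every call.
    static String labelPerCall(int code) {
        return new StringBuilder("STATUS_").append(code).toString();
    }

    // "After": labels for the common codes are built once and shared.
    private static final Map<Integer, String> LABELS = new HashMap<>();
    static {
        for (int code : new int[] {200, 404, 500}) { // hypothetical common codes
            LABELS.put(code, labelPerCall(code));
        }
    }

    static String labelShared(int code) {
        String label = LABELS.get(code);
        // Uncommon codes fall back to building the label on demand.
        return label != null ? label : labelPerCall(code);
    }

    public static void main(String[] args) {
        System.out.println("per call, same object: " + (labelPerCall(200) == labelPerCall(200)));
        System.out.println("shared, same object:   " + (labelShared(200) == labelShared(200)));
    }
}
```

Whether a lookup like this pays off depends on how often the same values recur on the code paths the heap analysis flags.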

Appendix

(1). Garbage collection log analysis report of test run #1

(2). Garbage collection log analysis report of test run #2