Modern Java applications do a lot of string manipulation: web service API calls (e.g. JSON, REST, SOAP, …), external data source calls (SQL queries, data returned from the DB, …), text parsing, text building, and so on. As a result, String objects can easily occupy 30% or more of an application's memory, and a large share of those String objects are often duplicates. Because of this duplication, a considerable amount of memory is wasted. JEP 192 was introduced to reduce the memory wasted by duplicate String objects, which makes it a welcome enhancement to Java.

What Does JEP 192 Do?

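When the G1 GC algorithm runs, it removes garbage objects from memory. With JEP 192, it can also deduplicate strings: when it finds String objects whose contents are identical, it makes them share a single backing character array, so the duplicate arrays become garbage and their memory is reclaimed. The String objects themselves remain separate objects. This feature is activated by passing the following JVM arguments: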

-XX:+UseG1GC -XX:+UseStringDeduplication
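
Deduplication happens inside the garbage collector, so application code never calls it directly. As a rough mental model only, the sketch below mimics the idea in plain Java: a table maps string contents to one canonical backing array, and each duplicate is repointed at that array so its own copy becomes garbage. DedupModel and HypotheticalString are hypothetical names invented for this illustration; a real String's array cannot be reassigned from Java code, and G1's actual implementation differs (it runs in a background thread and only considers strings that have survived a few GC cycles).

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model only: HypotheticalString stands in for java.lang.String.
public class DedupModel {

    static final class HypotheticalString {
        char[] value; // backing array that "deduplication" may replace

        HypotheticalString(String s) {
            this.value = s.toCharArray();
        }
    }

    // Maps string contents to the single backing array that equal strings share.
    private final Map<String, char[]> table = new HashMap<>();

    /** Points s at the canonical array; returns true if a duplicate array was dropped. */
    boolean deduplicate(HypotheticalString s) {
        String key = new String(s.value);
        char[] canonical = table.get(key);
        if (canonical == null) {
            table.put(key, s.value); // first occurrence becomes the canonical array
            return false;
        }
        if (canonical == s.value) {
            return false; // already sharing the canonical array
        }
        s.value = canonical; // the old array is now unreachable and can be collected
        return true;
    }

    public static void main(String[] args) {
        DedupModel model = new DedupModel();
        HypotheticalString a = new HypotheticalString("Hello World-0");
        HypotheticalString b = new HypotheticalString("Hello World-0");
        model.deduplicate(a);
        boolean dropped = model.deduplicate(b);
        System.out.println("duplicate array dropped: " + dropped);    // true
        System.out.println("arrays shared: " + (a.value == b.value)); // true
        System.out.println("objects distinct: " + (a != b));          // true
    }
}
```

The last line mirrors the real feature: after deduplication the two strings still compare as different objects with `==`; only their underlying storage is shared.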

Note 1: This feature requires Java 8 update 20 or later.

Note 2: In Java 8, ‘-XX:+UseStringDeduplication’ works only with the G1 GC algorithm, so ‘-XX:+UseG1GC’ must be passed as well (G1 is not the default collector in Java 8).
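
To confirm that a running JVM actually picked up both options, its flags can be read at runtime. A minimal sketch, assuming a HotSpot-based JVM (Java 8u20 or later), reads them through the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; DedupFlagCheck and optionValue are names made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class DedupFlagCheck {

    // HotSpot-specific diagnostic bean; not available on non-HotSpot JVMs.
    private static final HotSpotDiagnosticMXBean BEAN =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    /** Returns the current value of a VM flag, e.g. "true" or "false" for boolean flags. */
    static String optionValue(String name) {
        return BEAN.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("UseG1GC = " + optionValue("UseG1GC"));
        System.out.println("UseStringDeduplication = " + optionValue("UseStringDeduplication"));
    }
}
```

Launched with `-XX:+UseG1GC -XX:+UseStringDeduplication`, both lines should report `true`.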

Let’s Study With an Example

Let’s validate this feature with a simple program, chosen specifically to study how the JVM handles duplicate strings.

import java.util.ArrayList;
import java.util.List;

public class StringDeduplicationExample {

    // Static field keeps every string reachable, so the GC cannot collect them.
    public static List<String> myStrings = new ArrayList<>();

    public static void main(String[] args) throws Exception {
        for (int counter = 0; counter < 200; ++counter) {
            for (int secondCounter = 0; secondCounter < 1000; ++secondCounter) {
                // Add it 1000 times; each concatenation creates a new String.
                myStrings.add("Hello World-" + counter);
            }
            System.out.println("Hello World-" + counter + " has been added 1000 times");
        }
    }
}

This program creates:

1000 instances of “Hello World-0” strings
1000 instances of “Hello World-1” strings
1000 instances of “Hello World-2” strings
...
1000 instances of “Hello World-199” strings

That is 200,000 String objects in total, but only 200 distinct values.
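
These counts can be checked directly. This small sketch (DuplicateCount and build are hypothetical names) rebuilds the same list and counts it three ways: total elements, distinct objects by identity, and distinct values. The Java Language Specification (§15.18.1) requires a non-constant string concatenation to produce a newly created String, so every element is its own object even though only 200 values occur.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class DuplicateCount {

    // Rebuilds the same data as StringDeduplicationExample.
    static List<String> build() {
        List<String> strings = new ArrayList<>();
        for (int counter = 0; counter < 200; ++counter) {
            for (int i = 0; i < 1000; ++i) {
                strings.add("Hello World-" + counter); // a new String object every time
            }
        }
        return strings;
    }

    public static void main(String[] args) {
        List<String> strings = build();
        // Identity-based set: counts separate objects, even when their contents are equal.
        Set<String> objects = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        objects.addAll(strings);
        // Value-based set: equal contents collapse into one entry.
        Set<String> values = new HashSet<>(strings);
        System.out.println(strings.size() + " elements, " + objects.size()
                + " String objects, " + values.size() + " distinct values");
    }
}
```

The gap between the second and third numbers is exactly the duplication that string deduplication targets.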

We ran this program twice, with two different sets of JVM arguments.

Run #1

In the first run, we passed the ‘-XX:+UseStringDeduplication’ JVM argument:

-Xmx20M -XX:+UseG1GC -XX:+UseStringDeduplication

Run #2

In the second run, we executed the same program without the ‘-XX:+UseStringDeduplication’ argument:

-Xmx20M -XX:+UseG1GC

During both runs, we captured heap dumps and analyzed them with the heap dump analysis tool HeapHero.io. HeapHero.io detects the amount of memory wasted by various inefficient programming practices, including duplicate strings.
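
The article does not say how the heap dumps were taken; running `jmap -dump:live,format=b,file=<file> <pid>` against the process is one common option. Another is to have the program dump its own heap, which also sidesteps the fact that the example exits as soon as its loop finishes. A minimal sketch using the HotSpot-specific HotSpotDiagnosticMXBean.dumpHeap; HeapDumper, dump and the file name are hypothetical. Because G1 deduplicates in a background thread, a dump taken immediately after the loop may not yet reflect all deduplication.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HeapDumper {

    /** Writes an .hprof heap dump of this JVM; live=true dumps only reachable objects. */
    static Path dump(String fileName, boolean live) throws IOException {
        Path path = Paths.get(fileName).toAbsolutePath();
        Files.deleteIfExists(path); // dumpHeap throws IOException if the file already exists
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path.toString(), live);
        return path;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file name; newer JDKs require the ".hprof" suffix.
        Path p = dump("string-dedup-run.hprof", true);
        System.out.println("Heap dump written to " + p + " (" + Files.size(p) + " bytes)");
    }
}
```

Calling HeapDumper.dump(...) at the end of StringDeduplicationExample.main, once with and once without the flag, would produce two files that can be compared in a heap analyzer.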

Here are the reports generated by HeapHero.io:

Here are a few interesting observations from the reports:

Metric                                   Run #1 (argument passed)   Run #2 (argument not passed)
Overall heap size                        7.94 MB                    15.89 MB
Total # of strings                       206k                       206k
Memory wasted due to duplicate strings   5.6 MB                     13.81 MB

Even though the same code was executed, the overall heap size in Run #1 (where ‘-XX:+UseStringDeduplication’ was passed) was 7.94 MB, whereas in Run #2 (where it was not passed) it grew considerably, to 15.89 MB.

Even though both runs contain the same number of String objects (206k), the memory wasted by duplicate strings was 5.6 MB in Run #1, versus 13.81 MB in Run #2.
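
These numbers can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes a 64-bit HotSpot JVM on Java 8 with compressed oops and 8-byte object alignment (Java 9+ compact strings store Latin-1 text in a smaller byte[], so the figures would differ), and it counts as "waste" every copy beyond the first of each value. HeapHero.io's exact accounting is not documented here, so this is only a rough model; DuplicateStringEstimate and its methods are hypothetical names.

```java
public class DuplicateStringEstimate {

    // Assumed layout: 64-bit HotSpot, Java 8, compressed oops, 8-byte alignment.
    static final int STRING_OBJECT_BYTES = 24; // 12-byte header + char[] ref + int hash, padded
    static final int ARRAY_HEADER_BYTES = 16;  // 12-byte header + 4-byte length

    static long align8(long n) {
        return (n + 7) & ~7L;
    }

    static long charArrayBytes(int length) {
        return align8(ARRAY_HEADER_BYTES + 2L * length); // 2 bytes per char in Java 8
    }

    /** Bytes held by copies beyond the first of each value; sharesArrays models deduplication. */
    static long wastedBytes(int values, int copies, boolean sharesArrays) {
        long total = 0;
        for (int counter = 0; counter < values; ++counter) {
            int len = ("Hello World-" + counter).length();
            // With deduplication, only the String object itself is still duplicated.
            long perDuplicate = STRING_OBJECT_BYTES + (sharesArrays ? 0 : charArrayBytes(len));
            total += (copies - 1) * perDuplicate;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("without dedup: ~%.2f MiB%n", wastedBytes(200, 1000, false) / 1048576.0);
        System.out.printf("with dedup:    ~%.2f MiB%n", wastedBytes(200, 1000, true) / 1048576.0);
    }
}
```

Under these assumptions the estimate prints roughly 13.72 MiB and 4.57 MiB, the same order of magnitude as the reported 13.81 MB and 5.6 MB. It suggests that most of the waste left in Run #1 is the per-string object overhead that deduplication does not remove.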

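This dramatic reduction in memory consumption was made possible by the ‘-XX:+UseStringDeduplication’ argument: G1 made duplicate strings share their backing arrays, so the redundant copies could be reclaimed. Some waste remains in Run #1 partly because deduplication does not remove the duplicate String objects themselves, only their arrays.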

Thus we encourage you to take advantage of ‘-XX:+UseG1GC -XX:+UseStringDeduplication’ and reduce memory wastage caused by duplicate strings. This change has the potential to reduce the overall memory footprint of your application.