The Garbage-First Garbage Collector (G1GC) is a GC algorithm introduced in later versions of JDK 1.7. Prior to the introduction of G1GC, the two main choices were the ParallelGC and Concurrent Mark Sweep (CMS) algorithms. While ParallelGC was the choice for high-throughput applications like SOA, CMS was the choice for applications requiring low latency. ParallelGC, CMS and G1GC all exist in JDK 1.7 and higher versions, so one must make an informed decision to pick the right algorithm for their application.

This blog covers tuning G1GC specifically for the Oracle SOA product. However, the same tuning process can be applied to other applications/products running with the G1GC algorithm.

Considerations for Moving to G1GC

While ParallelGC may be well suited for SOA applications, there are times when one might consider moving to G1GC instead. Some of the considerations for moving to G1GC are the following:

1. Long or Frequent Full GCs

ParallelGC is a high-throughput algorithm that accumulates garbage over time. Once the garbage has eaten up the whole heap, it performs a Full GC (Garbage Collection) to clean up. However, this Full GC takes a long time to execute. That may not be acceptable for many solutions because of the functional implications of the stop-the-world pause a Full GC introduces. A long Full GC pause can also trigger secondary problems such as false dead-node detection, service migration, JTA migration, etc. So, if Full GCs are taking a long time and that is not acceptable for the solution, G1GC can be chosen.
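Before switching algorithms, it helps to quantify how long and how frequent the Full GCs actually are. A minimal sketch using the standard JDK 7-era HotSpot logging flags (the jar name and log path are placeholders, not from this deployment):

```shell
# Standard HotSpot GC-logging flags on JDK 7 (app jar and paths are examples)
java -Xmx18g -Xms18g \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -Xloggc:/tmp/gc.log \
     -jar app.jar

# Afterwards, list the Full GC events and eyeball their durations:
grep "Full GC" /tmp/gc.log
```

These flags only add logging; they do not change collector behavior, so they are safe to run in a load test before any tuning decision.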

2. Very Large Heap

With a large heap on the order of 16 GB or more, Full GCs with ParallelGC take even longer, and the time grows as the heap size grows. With a very large heap, G1GC can be chosen.

3. Hardware

G1GC has more system-thread overhead than ParallelGC, because it does incremental GC collections concurrently, without the usual stop-the-world (STW) pauses. Hence G1GC is recommended on hardware that has sufficient hardware threads, so that concurrent GC cycles can occur without noticeable degradation of application performance.

Tuning G1GC

After going through the considerations, if your situation meets some or all of them, it is a good idea to go ahead with the G1GC algorithm.

The SOA application referenced in this blog met all the criteria above. It had a large heap of about 22 GB, and with 22 GB of heap ParallelGC took a long time to complete a Full GC. Moreover, this SOA application was running on Exalogic with 22 hardware threads per node. This gave sufficient concurrency to run G1GC without compromising the threads used by the application.

The next step was to tune G1GC to meet the following goals:

Goals:

1. Best Throughput:

The G1GC algorithm should provide throughput comparable to that of ParallelGC. The throughput shall be measured in every test, and tuning shall be done to improve it test after test until it cannot be improved any further.

2. Avoid Full GC OR Constant-Time Full GC:

As far as possible, the tuning must help avoid Full GCs. If Full GCs are unavoidable, then at least they should take constant time, and this time should be as low as possible. For this, the object promotion rate and the freed memory per minute should be improved test after test.
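Throughput, as reported by tools like gceasy.io, means the fraction of wall-clock time not spent paused in GC. A quick sketch with made-up example numbers (a 3600 s run with 20 s of cumulative GC pause):

```shell
# throughput = (elapsed time - total GC pause time) / elapsed time
awk 'BEGIN { total=3600; gc=20; printf "%.2f%%\n", (total - gc) / total * 100 }'
# -> 99.44%
```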

There were multiple tests that were run and after each test the results were analyzed to further tune the G1GC until the desired goals were met.

Analysis Tools:

For the analysis of the test results the following tools were used:

1. gceasy.io
A web-based GC analyzer tool available at www.gceasy.io.

2. GCViewer
GCViewer is a Java-based GUI tool for GC log analysis. Note: Make sure to download the latest source of GCViewer from Git and then compile the project to get the jar file.

3. GC Logs
The raw GC logs were analyzed for each test run.
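Before loading a log into gceasy.io or GCViewer, a quick grep gives a first impression of how often the bad events occur. A sketch using a two-line sample log built from entry formats like those excerpted later in this post:

```shell
# Build a tiny sample log with two representative entries
printf '%s\n' \
  '2016-05-24T20:38:04.082-0500: 2397.689: [GC pause (young) (to-space exhausted), 52.1631267 secs]' \
  '2016-05-24T20:39:38.799-0500: 2492.403: [Full GC 17G->6211M(18G), 52.7499550 secs]' > sample_gc.log

grep -c "Full GC" sample_gc.log             # -> 1
grep -c "to-space exhausted" sample_gc.log  # -> 1
```

The same two greps on a real log quickly tell you whether you are fighting Full GCs, evacuation failures, or both.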

Test1:

This is the first test after changing the GC algorithm from ParallelGC to G1GC. The following are the ParallelGC configurations that were tuned for the SOA application.

Parallel GC configuration:

-Xmx18432M -Xms18432M -Xmn6144M -XX:PermSize=1536M -XX:MaxPermSize=1536M -XX:+UseParallelOldGC -XX:ParallelGCThreads=22 -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=4

When moving from ParallelGC to G1GC, do not introduce too many tuning changes in the very first test. Tuning should always be incremental, and there needs to be a good rationale for tuning any parameter. The following was the first cut of G1GC parameters:

G1GC Configuration:

-Xmx18432M -Xms18432M -XX:PermSize=1536M -XX:MaxPermSize=1536M -XX:+UseG1GC -XX:ConcGCThreads=6 -XX:MaxGCPauseMillis=1000 -XX:ParallelGCThreads=22

Clearly some additional parameters have been added and some have been taken off. The following explains the reasoning behind the parameter changes:

-XX:-UseAdaptiveSizePolicy and -Xmn6144M were both removed because G1GC automatically adjusts the sizes of the young and old generations.

ConcGCThreads is set to 6. Generally this should be 1/4th of ParallelGC threads.

MaxGCPauseMillis is set to 1000ms.
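The 1/4 rule of thumb for ConcGCThreads, rounded up, reproduces the value chosen here (the rounding direction is an assumption made to match the chosen value of 6):

```shell
PARALLEL_GC_THREADS=22
# ConcGCThreads ~= ParallelGCThreads / 4, rounded up
echo $(( (PARALLEL_GC_THREADS + 3) / 4 ))   # -> 6
```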

Observations and Recommendations:

1. In the logs the following traces are observed:

13491.652: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 107 regions, reclaimable: 543317752 bytes (2.80 %), threshold: 10.00 %]

This shows that the concurrent marking cycle has completed but mixed GCs are not started, because the reclaimable space is below the tolerable waste limit set with G1HeapWastePercent (default 10%). G1HeapWastePercent can be decreased to around 2% so that mixed GCs occur at a lower threshold.
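The 2.80 % figure in that trace can be roughly reproduced from the reclaimable bytes and the 18432 MB heap; the small difference is expected, since G1 may compute the percentage against a slightly different heap figure:

```shell
# reclaimable bytes / heap bytes, as a percentage (heap = 18432 MB)
awk 'BEGIN { printf "%.2f%%\n", 543317752 / (18432 * 1024 * 1024) * 100 }'
# -> 2.81%
```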

2. The heap has grown to 16 GB and there is not enough space left to promote objects, so the young GC fails. At this point, Ergonomics requests the start of a concurrent marking cycle because the heap occupancy has crossed the 45% threshold:

2016-05-24T20:38:04.082-0500: 2397.689: [GC pause (young) 2397.689: (to-space exhausted), 52.1631267 secs] [Eden:10096.0M(10096.0M)->0.0B(8192.0K) Survivors: 320.0M->1304.0M Heap: 16.0G(18.1G)->17.4G(18.1G)]

Another attempt to collect but that fails too:

2016-05-24T20:38:56.269-0500: 2449.874: [GC pause (young) (initial-mark) 2449.874: (to-space exhausted), 40.8441375 secs] [Eden: 8192.0K(8192.0K)->0.0B(880.0M) Survivors: 1304.0M->40.0M Heap: 17.4G(18.1G)->17.6G(18.1G)]

Now the heap is too full, so G1 resorts to a Full GC:

2016-05-24T20:39:38.799-0500: 2492.403: [Full GC 17G->6211M(18G), 52.7499550 secs] [Eden: 0.0B(920.0M)->0.0B(9832.0M) Survivors: 0.0B->0.0B Heap: 17.7G(18.1G)->6211.3M(18.1G)]

To handle this situation the following recommendations can be considered:

i. Decrease the pause-time goal to, say, 500 ms, or leave it at the default, so that young regions get collected before eden reaches 10 GB occupancy.

ii. Set G1ReservePercent=15, which would reserve 2.7 GB as free space. This will help avoid to-space exhaustion, since we would always have 2.7 GB free for object promotion (as we saw from the logs, when heap occupancy was 15.6 GB the young GC could still successfully promote objects). If we still see to-space exhaustion with this setting, we can increase the value further.

iii. Increase the heap size to 18 + 2.7 GB (or higher), since 2.7 GB will be kept as reserved free space.

iv. Tuning InitiatingHeapOccupancyPercent is not recommended for now, because almost 10 GB is occupied by the young regions, and if those are collected in time then the Full GCs can be avoided.
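The figures in recommendations (ii) and (iii) follow from simple percentage arithmetic on the 18432 MB heap:

```shell
HEAP_MB=18432
RESERVE_MB=$(( HEAP_MB * 15 / 100 ))   # -XX:G1ReservePercent=15
echo "$RESERVE_MB"                     # -> 2764 (MB), i.e. ~2.7 GB kept free
echo $(( HEAP_MB + RESERVE_MB ))       # -> 21196 (MB); Test 2 rounds this up to 22528M
```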

Test 2:

G1GC Configuration:

-Xmx22528M -Xms22528M -XX:PermSize=1536M -XX:MaxPermSize=1536M -XX:+UseG1GC -XX:ConcGCThreads=6 -XX:ParallelGCThreads=22 -XX:MaxGCPauseMillis=1000 -XX:G1HeapWastePercent=2 -XX:G1ReservePercent=15

Observations and Recommendations:

1. Avg Promotion Rate = 732 KB/sec
2. Freed Mem/min = 3872 MB/min
3. Throughput = 98.69%
4. The test indicates that a Full GC was encountered.
5. The following traces are observed:

2016-05-26T02:18:47.053-0500: 26232.157: [GC pause (mixed) 26232.157: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 18970, predicted base time: 408.01 ms, remaining time: 591.99 ms, target pause time: 1000.00 ms] 26232.157: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 105 regions, survivors: 23 regions, predicted young region time: 54.20 ms]

26232.157: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 4 regions, max: 256 regions, reclaimable: 428589248 bytes (2.00 %), threshold: 2.00 %]

26232.157: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 105 regions, survivors: 23 regions, old: 4 regions, predicted pause time: 480.20 ms, target pause time: 1000.00 ms]

26232.399: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 11525947392 bytes, allocation request: 0 bytes, threshold: 9663676380 bytes (45.00 %), source: end of GC]

26232.399: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 103 regions, reclaimable: 428589248 bytes (2.00 %), threshold: 2.00 %], 0.2423031 secs]

[Eden: 840.0M(840.0M)->0.0B(6224.0M) Survivors: 184.0M->96.0M Heap: 11.7G(20.0G)->10.8G(20.0G)]



The GC has collected only 4 old regions. The expectation was that more old regions would be collected; hence consider working with the following parameters:

-XX:G1OldCSetRegionThresholdPercent=15 (increase this parameter from 10 to 15 so that we have a higher upper limit on the number of old regions collected per mixed GC)

-XX:G1MixedGCLiveThresholdPercent=55 (decrease this parameter from 65 to 55 so that old regions become collection candidates at this live-occupancy threshold)

6. There is a large flux in eden sizes. For this issue, limiting the maximum eden space may be a good idea. Setting -XX:G1MaxNewSizePercent=25 binds the eden size to 1/4 of the total heap. This should avoid the eden flux.

[Eden: 7984.0M(7984.0M)->0.0B(10.5G) Survivors: 104.0M->168.0M Heap: 13.6G(20.0G)->6022.3M(20.0G)]

Here the young gen is so large that G1GC cannot possibly finish a marking cycle in time if too much data survives.

7. With this in place the behavior should be a bit more predictable, and InitiatingHeapOccupancyPercent can be set to 60; the default value of 45 is too low, since there appears to be some sort of steady state around 12-13 GB.

Test 3:

G1GC Configuration:

-Xmx22528M -Xms22528M -XX:PermSize=1536M -XX:MaxPermSize=1536M -XX:+UseG1GC -XX:ConcGCThreads=12 -XX:ParallelGCThreads=22 -XX:MaxGCPauseMillis=1000 -XX:G1HeapWastePercent=1 -XX:G1ReservePercent=15 -XX:+UnlockExperimentalVMOptions -XX:G1OldCSetRegionThresholdPercent=20 -XX:G1MixedGCLiveThresholdPercent=45 -XX:G1MaxNewSizePercent=25 -XX:InitiatingHeapOccupancyPercent=60

Observations and Recommendations:

1. Avg Promotion Rate = 567 KB/sec
2. Freed Mem/min = 3925 MB/min
3. Throughput = 98.53%
4. The test indicates that a Full GC was encountered.
5. The eden flux is no longer seen after capping the max eden space. However, eden still keeps decreasing over time, making way for the old gen to utilize the space. A few more tests can be run before deciding to cap the min eden space.

6. Another problem seen is that not enough old regions were included in the collections.

[Eden: 1008.0M(1008.0M)->0.0B(976.0M) Survivors: 112.0M->144.0M Heap: 18.6G(22.0G)->17.7G(22.0G)] [Times: user=3.41 sys=0.00, real=0.26 secs]

2016-05-31T07:29:30.091-0500: 63963.414: [GC pause (mixed) 63963.414: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 18677, predicted base time: 240.93 ms, remaining time: 759.07 ms, target pause time: 1000.00 ms] 63963.414: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 122 regions, survivors: 18 regions, predicted young region time: 65.67 ms]

63963.414: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 3 regions, max: 564 regions, reclaimable: 236219288 bytes (1.00 %), threshold: 1.00 %]

63963.414: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 122 regions, survivors: 18 regions, old: 3 regions, predicted pause time: 310.84 ms, target pause time: 1000.00 ms]

63963.709: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 18824036352 bytes, allocation request: 0 bytes, threshold: 14173392060 bytes (60.00 %), source: end of GC]

63963.709: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 44 regions, reclaimable: 236219288 bytes (1.00 %), threshold: 1.00 %], 0.2948710 secs]

[Eden: 976.0M(976.0M)->0.0B(976.0M) Survivors: 144.0M->144.0M Heap: 18.6G(22.0G)->17.7G(22.0G)]

For this problem, setting G1MixedGCLiveThresholdPercent=85 makes sure that a good number of old regions become candidates for inclusion. It looks like lowering the value of G1MixedGCLiveThresholdPercent (in the previous test) had a negative effect on the tests.

7. Setting the parameter G1HeapWastePercent to an aggressive value of 1 does not benefit much. So for the next test we will set the value back to 2.

Test 4:

G1GC Configuration:

-Xmx20480M -Xms20480M -XX:PermSize=1536M -XX:MaxPermSize=1536M -XX:+UseG1GC -XX:ConcGCThreads=12 -XX:ParallelGCThreads=22 -XX:MaxGCPauseMillis=1000 -XX:G1HeapWastePercent=2 -XX:NewSize=5g -XX:G1ReservePercent=15 -XX:+UnlockExperimentalVMOptions -XX:G1OldCSetRegionThresholdPercent=15 -XX:G1MixedGCLiveThresholdPercent=85 -XX:G1MaxNewSizePercent=25

Observations and Recommendations:

1. Avg Promotion Rate = 567 KB/sec
2. Freed Mem/min = 3949 MB/min
3. Throughput = 99.56%
4. The following traces are observed:

47131.902: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 220 regions, reclaimable: 428700856 bytes (2.00 %), threshold: 2.00 %]

47450.922: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 11114905600 bytes, allocation request: 0 bytes, threshold: 9663676380 bytes (45.00 %), source: end of GC]

47450.922: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate old regions available, candidate old regions: 223 regions, reclaimable: 437196056 bytes (2.04 %), threshold: 2.00 %]

2016-06-01T07:08:18.279-0500: 47468.369: [GC pause (mixed) 47468.370: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 15583, predicted base time: 436.80 ms, remaining time: 563.20 ms, target pause time: 1000.00 ms]

47468.742: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 11131682816 bytes, allocation request: 0 bytes, threshold: 9663676380 bytes (45.00 %), source: end of GC]

47468.742: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 219 regions, reclaimable: 428301560 bytes (1.99 %), threshold: 2.00 %]

47805.150: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 11140071424 bytes, allocation request: 0 bytes, threshold: 9663676380 bytes (45.00 %), source: end of GC]

47805.150: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate old regions available, candidate old regions: 227 regions, reclaimable: 441265288 bytes (2.05 %), threshold: 2.00 %]

2016-06-01T07:14:10.805-0500: 47820.886: [GC pause (mixed) 47820.886: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 15272, predicted base time: 487.50 ms, remaining time: 512.50 ms, target pause time: 1000.00 ms]

47821.265: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 11148460032 bytes, allocation request: 0 bytes, threshold: 9663676380 bytes (45.00 %), source: end of GC]

47821.265: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 219 regions, reclaimable: 429481256 bytes (2.00 %), threshold: 2.00 %]

To continue having mixed collections, set G1MixedGCLiveThresholdPercent to 90.

Test 5:

G1GC Configuration:

-Xmx22528M -Xms22528M -XX:PermSize=1536M -XX:MaxPermSize=1536M -XX:+UseG1GC -XX:ConcGCThreads=12 -XX:ParallelGCThreads=22 -XX:MaxGCPauseMillis=1000 -XX:G1HeapWastePercent=2 -XX:G1ReservePercent=15 -XX:+UnlockExperimentalVMOptions -XX:G1OldCSetRegionThresholdPercent=15 -XX:G1MixedGCLiveThresholdPercent=90 -XX:G1MaxNewSizePercent=25

Observations and Recommendations:

1. Avg Promotion Rate = 592 KB/sec
2. Freed Mem/min = 3900 MB/min
3. Throughput = 99.47%
4. A Full GC was encountered.
5. We now need to shift our attention to the fallback part of Goal 2, i.e., to contain the Full GC times within a limit.

The GC logs show that the mixed GCs are happening and G1 is collecting as much as it can. The data it is not able to collect is considered live. This live data, however, contains garbage referenced and kept alive by unreachable classes. In JDK 1.7, these unreachable classes are unloaded only as part of a Full GC, and that is when this unclaimed garbage also gets collected. So, this is as far as we could go in ensuring that the old regions get collected as part of the mixed GCs.

Now, to make sure that the Full GCs, which are bound to happen in order to unload the unreachable classes and collect the objects they reference, are contained, we will set a minimum size for the young gen: 1/5th of the heap as the minimum and 1/4th as the maximum.
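On the 22528 MB heap, those fractions work out to the following young-gen bounds:

```shell
HEAP_MB=22528
echo $(( HEAP_MB * 20 / 100 ))   # -XX:G1NewSizePercent=20    -> 4505 (MB minimum, ~1/5)
echo $(( HEAP_MB * 25 / 100 ))   # -XX:G1MaxNewSizePercent=25 -> 5632 (MB maximum, 1/4)
```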

Test 6:

G1GC Configuration:

-Xmx22528M -Xms22528M -XX:PermSize=1536M -XX:MaxPermSize=1536M -XX:+UseG1GC -XX:ConcGCThreads=12 -XX:ParallelGCThreads=22 -XX:MaxGCPauseMillis=1000 -XX:G1HeapWastePercent=2 -XX:G1ReservePercent=15 -XX:+UnlockExperimentalVMOptions -XX:G1OldCSetRegionThresholdPercent=15 -XX:G1MixedGCLiveThresholdPercent=90 -XX:G1NewSizePercent=20 -XX:G1MaxNewSizePercent=25

Observations and Recommendations:

1. Avg Promotion Rate = 522 KB/sec
2. Freed Mem/min = 4642 MB/min
3. Throughput = 99.35%
4. A Full GC was encountered.
5. The avg promotion rate and freed memory/min are the best of all the test runs. Even with all the tuning, there is no way to avoid a Full GC, because the objects in the live data that are referenced by unreachable classes are not collected during mixed GCs in JDK 1.7. This behaviour has changed in JDK 1.8. Since this exercise was performed on SOA 11g, we could only use JDK 1.7, as SOA 11g is not yet supported on JDK 1.8.
6. The eden space is observed to stay between the min and max limits.
7. The throughput is 99.35%, which is almost the same as ParallelGC in theory.

Summary:

Although we could not avoid the occurrence of Full GCs, and therefore could not keep the tenured-generation growth flat, we were still able to achieve the following:

1. The occurrence of Full GCs was restricted to one in a little over a day.
2. Every Full GC took a roughly constant time, averaging 28 s. This is very reasonable given that G1GC runs Full GC on a single thread, compared to ParallelGC, which makes use of all the ParallelGC threads. However, due to the incremental garbage collection mechanism in G1GC, the garbage to be cleared by a Full GC is far less than with ParallelGC.
3. The promotion rate of objects was reduced.
4. The average throughput was 99.46%, which is very much comparable to the ParallelGC throughput.

Using JDK 8 and following the same tuning method, one can achieve a flatter growth of the tenured generation.
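For reference, the JDK 8 behaviour change alluded to here is that G1 (from 8u40) can unload classes during the concurrent marking cycle, controlled by -XX:+ClassUnloadingWithConcurrentMark, which is enabled by default there; verify this against your JDK's documentation before relying on it. A sketch (the jar name is a placeholder):

```shell
# JDK 8u40+ sketch: class unloading during concurrent mark is on by default,
# so unreachable classes no longer have to wait for a Full GC
java -XX:+UseG1GC -XX:+ClassUnloadingWithConcurrentMark \
     -Xmx22528M -Xms22528M -jar app.jar
```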