Summary

Enhance the G1 garbage collector to automatically return Java heap memory to the operating system when idle.

Non-Goals

Sharing of committed but empty pages between Java processes. Memory should be returned (uncommitted) to the operating system.

The process of giving back memory does not need to be frugal with CPU resources, nor does it need to be instantaneous.

Use of methods for returning memory other than the existing uncommit of memory.

Support for other collectors than G1.

Success Metrics

G1 should release unused Java heap memory within a reasonable period of time if there is very low application activity.

Motivation

Currently the G1 garbage collector may not return committed Java heap memory to the operating system in a timely manner. G1 only returns memory from the Java heap at either a full GC or during a concurrent cycle. Since G1 tries hard to completely avoid full GCs, and only triggers a concurrent cycle based on Java heap occupancy and allocation activity, it will not return Java heap memory in many cases unless forced to do so externally.

This behavior is particularly disadvantageous in container environments where resources are paid for by use. Even during phases where the VM only uses a fraction of its assigned memory resources due to inactivity, G1 will retain all of the Java heap. This results in customers paying for all resources all the time, and cloud providers not being able to fully utilize their hardware.

If the VM were able to detect phases of Java heap under-utilization ("idle" phases), and automatically reduce its heap usage during that time, both would benefit.

Shenandoah and OpenJ9's GenCon collector already provide similar functionality.

Tests with a prototype in Bruno et al., section 5.5, show that, based on the real-world utilization of a Tomcat server that serves HTTP requests during the day and is mostly idle during the night, this solution can reduce the amount of memory committed by the Java VM by 85%.

Description

To accomplish the goal of returning a maximum amount of memory to the operating system, G1 will, during inactivity of the application, periodically try to continue or trigger a concurrent cycle to determine overall Java heap usage. This will cause it to automatically return unused portions of the Java heap back to the operating system. Optionally, under user control, a full GC can be performed to maximize the amount of memory returned.

The application is considered inactive, and G1 triggers a periodic garbage collection if both:

More than G1PeriodicGCInterval milliseconds have passed since any previous garbage collection pause and there is no concurrent cycle in progress at this point. A value of zero indicates that periodic garbage collections to promptly reclaim memory are disabled.

The average one-minute system load value as returned by the getloadavg() call on the JVM host system (e.g. container) is below G1PeriodicGCSystemLoadThreshold. This condition is ignored if G1PeriodicGCSystemLoadThreshold is zero.

If either of these conditions is not met, the current prospective periodic garbage collection is cancelled. A periodic garbage collection is reconsidered the next time G1PeriodicGCInterval time passes.
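The two conditions above can be modeled as a small predicate. The following is a minimal sketch, not HotSpot code: the class name PeriodicGcCheck, the method shouldStartPeriodicGc, and all parameter names are hypothetical, and the comparisons follow the wording of this section ("more than" the interval, "below" the threshold). The main method reads the one-minute load average through the standard OperatingSystemMXBean, which is used here only as a Java-level stand-in for getloadavg().

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical model of the periodic GC trigger check described above; not HotSpot code. */
final class PeriodicGcCheck {

    /**
     * @param intervalMs             value of G1PeriodicGCInterval; zero disables periodic GCs
     * @param loadThreshold          value of G1PeriodicGCSystemLoadThreshold; zero disables the load check
     * @param msSinceLastPause       milliseconds since the most recent GC pause of any kind
     * @param concurrentCycleRunning whether a concurrent cycle is currently in progress
     * @param oneMinuteLoadAvg       one-minute system load average of the host
     * @return true if a periodic collection should be started now
     */
    static boolean shouldStartPeriodicGc(long intervalMs, double loadThreshold,
                                         long msSinceLastPause, boolean concurrentCycleRunning,
                                         double oneMinuteLoadAvg) {
        if (intervalMs == 0) {
            return false; // feature disabled
        }
        if (concurrentCycleRunning) {
            return false; // a concurrent cycle is already in progress
        }
        if (msSinceLastPause <= intervalMs) {
            return false; // a GC pause happened too recently
        }
        if (loadThreshold > 0.0 && oneMinuteLoadAvg >= loadThreshold) {
            return false; // host is too busy; a zero threshold skips this check
        }
        return true;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Negative when the platform cannot report a load average; this sketch
        // does not treat that case specially.
        double load = os.getSystemLoadAverage();
        boolean start = shouldStartPeriodicGc(3000, 2.0, 5000, false, load);
        System.out.println("load=" + load + " startPeriodicGc=" + start);
    }
}
```

When the predicate returns false, the check is simply repeated once the next interval has elapsed, which matches the cancellation rule stated above.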

The type of periodic garbage collection is determined by the value of the G1PeriodicGCInvokesConcurrent option: if set, G1 continues or starts a concurrent cycle, otherwise G1 performs a full GC. At the end of either collection, G1 adjusts the current Java heap size, potentially returning memory to the operating system. The new Java heap size is determined by the existing configuration for adjusting the Java heap size, including but not limited to MinHeapFreeRatio, MaxHeapFreeRatio, and the minimum and maximum heap size configuration.
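To illustrate how the free-ratio options bound the new heap size, here is a simplified sketch. It assumes only the documented meaning of MinHeapFreeRatio and MaxHeapFreeRatio (the minimum and maximum percentage of the heap that should be free after a collection) and ignores details of the real G1 policy such as region alignment. The class HeapResizeSketch and its methods are hypothetical names introduced for this example.

```java
/** Simplified, hypothetical model of free-ratio based heap sizing; not the actual G1 policy. */
final class HeapResizeSketch {

    /** Capacity at which exactly freePercent of the heap is free, rounded up. */
    static long capacityForFreeRatio(long usedBytes, int freePercent) {
        if (freePercent >= 100) {
            return Long.MAX_VALUE; // no finite capacity leaves 100% free
        }
        long denom = 100 - freePercent;
        return (usedBytes * 100 + denom - 1) / denom; // ceiling division
    }

    /**
     * Keeps the current capacity if its free share already lies between the two
     * ratios, otherwise moves it to the nearest bound, then clamps the result to
     * the configured minimum and maximum heap size.
     */
    static long targetCapacity(long usedBytes, long currentCapacity,
                               int minHeapFreeRatio, int maxHeapFreeRatio,
                               long minHeapBytes, long maxHeapBytes) {
        long lower = capacityForFreeRatio(usedBytes, minHeapFreeRatio); // below this: expand
        long upper = capacityForFreeRatio(usedBytes, maxHeapFreeRatio); // above this: shrink
        long target = Math.max(lower, Math.min(currentCapacity, upper));
        return Math.max(minHeapBytes, Math.min(target, maxHeapBytes));
    }

    public static void main(String[] args) {
        long MiB = 1024L * 1024L;
        // 30 MiB live data in a 200 MiB heap, free ratios of 40 and 70 percent.
        long target = targetCapacity(30 * MiB, 200 * MiB, 40, 70, 8 * MiB, 1024 * MiB);
        System.out.println("target capacity: " + (target / MiB) + " MiB");
    }
}
```

In this model, 30 MiB of live data in a 200 MiB heap with a maximum free ratio of 70 percent yields a target of 100 MiB, so the remaining 100 MiB becomes eligible to be uncommitted. The smaller the live data found by the periodic collection, the lower the target, down to the minimum heap size.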

By default, G1 starts or continues a concurrent cycle during this periodic garbage collection. This minimizes disruption of the application, but compared to a full collection may ultimately not be able to return as much memory.

Any garbage collection triggered by this mechanism is tagged with the G1 Periodic Collection cause. An example of such a log follows:

(1) [6.084s][debug][gc,periodic ] Checking for periodic GC.
    [6.086s][info ][gc ] GC(13) Pause Young (Concurrent Start) (G1 Periodic Collection) 37M->36M(78M) 1.786ms
(2) [9.087s][debug][gc,periodic ] Checking for periodic GC.
    [9.088s][info ][gc ] GC(15) Pause Young (Prepare Mixed) (G1 Periodic Collection) 9M->9M(32M) 0.722ms
(3) [12.089s][debug][gc,periodic ] Checking for periodic GC.
    [12.091s][info ][gc ] GC(16) Pause Young (Mixed) (G1 Periodic Collection) 9M->5M(32M) 1.776ms
(4) [15.092s][debug][gc,periodic ] Checking for periodic GC.
    [15.097s][info ][gc ] GC(17) Pause Young (Mixed) (G1 Periodic Collection) 5M->1M(32M) 4.142ms
(5) [18.098s][debug][gc,periodic ] Checking for periodic GC.
    [18.100s][info ][gc ] GC(18) Pause Young (Concurrent Start) (G1 Periodic Collection) 1M->1M(32M) 1.685ms
(6) [21.101s][debug][gc,periodic ] Checking for periodic GC.
    [21.102s][info ][gc ] GC(20) Pause Young (Concurrent Start) (G1 Periodic Collection) 1M->1M(32M) 0.868ms
(7) [24.104s][debug][gc,periodic ] Checking for periodic GC.
    [24.104s][info ][gc ] GC(22) Pause Young (Concurrent Start) (G1 Periodic Collection) 1M->1M(32M) 0.778ms

In the above example, run with a G1PeriodicGCInterval of 3000ms, in step (1) G1 initiates a concurrent cycle, as indicated by (Concurrent Start) and (G1 Periodic Collection), after some inactivity of the application. This concurrent cycle initially returns some memory, shown by the decrease in the capacity number from (78M) in step (1) to (32M) in step (2). Between (2) and (4) more periodic collections are triggered, this time resulting in mixed collections that compact the heap. The following periodic garbage collections (5) to (7) each start a concurrent cycle, as G1 policy determines that at that time there is not enough garbage in the old generation to start a mixed GC phase. In this case, periodic garbage collections (5) to (7) do not shrink the heap further since the minimum heap size has already been reached.
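For readers who want to track committed heap size from such a log, the capacity figure can be extracted mechanically. The sketch below assumes the line layout of the example above, with sizes printed in M; the class PeriodicGcLogLine and its parse method are hypothetical helpers, not part of any JDK API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that reads the heap transition from a periodic GC log line. */
final class PeriodicGcLogLine {

    // Matches "(G1 Periodic Collection) <before>M-><after>M(<capacity>M)".
    private static final Pattern HEAP_TRANSITION = Pattern.compile(
            "\\(G1 Periodic Collection\\)\\s+(\\d+)M->(\\d+)M\\((\\d+)M\\)");

    /**
     * @return {usedBefore, usedAfter, capacity} in M, or null if the line is not
     *         a periodic collection pause in the assumed format
     */
    static long[] parse(String line) {
        Matcher m = HEAP_TRANSITION.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    public static void main(String[] args) {
        String[] lines = {
            "[6.084s][debug][gc,periodic ] Checking for periodic GC.",
            "[6.086s][info ][gc ] GC(13) Pause Young (Concurrent Start) (G1 Periodic Collection) 37M->36M(78M) 1.786ms",
            "[9.088s][info ][gc ] GC(15) Pause Young (Prepare Mixed) (G1 Periodic Collection) 9M->9M(32M) 0.722ms"
        };
        for (String line : lines) {
            long[] heap = parse(line);
            if (heap != null) {
                System.out.println("used " + heap[1] + "M, capacity " + heap[2] + "M");
            }
        }
    }
}
```

The third number is the heap capacity after the pause, which is the value that drops from 78M to 32M between steps (1) and (2).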

Changes to object liveness during application inactivity (e.g., due to soft references expiring) may trigger further reductions in committed Java heap during that idle time.

Alternatives

Similar functionality could be achieved from outside the VM, e.g., via the jcmd tool or some code injected into the VM. This has hidden costs: assuming that the check is performed using a cron-based task, in case of hundreds or thousands of containers on a node this may mean that the heap compaction action is performed at the same time by many of these containers, which results in very large CPU spikes on the host.

Another alternative is a Java agent which is automatically attached to each Java process. Then the time of the check is distributed naturally as containers start at different times, and it is less expensive on CPU because no new process is launched. However, this method adds significant complexity for users, which may discourage adoption.

The given use case, shrinking the Java heap in a timely fashion, is considered a fairly common use case that warrants special support in the VM.

Risks and Assumptions

In the default values of the configuration we disable this feature. This results in no unexpected changes in the VM behavior for latency- or throughput-sensitive applications. When enabled, we assume that in general giving back Java heap memory to the operating system is desirable, and the impact of the resulting concurrent cycle or its continuation on application throughput is negligible.

When this feature is enabled, the VM runs these periodic collections under the conditions above regardless of other options. For example, the VM could assume that a user who sets -Xms equal to -Xmx, or uses other (combinations of) options, wants minimal and consistent garbage collection pauses, and suppress periodic collections accordingly. The VM will not make such assumptions, for consistency reasons.

In case periodic garbage collections are still disturbing program execution too much, we provide controls to let the decision take overall system CPU load into account, or let the user disable periodic garbage collections completely.