If the GC logging (i.e., the write() calls) is blocked by the OS, the blocking time contributes to the STW pause. The next question is: why can buffered writes be blocked? Digging into various resources, including the kernel source code, we realized that buffered writes can get stuck in kernel code. There are multiple reasons, including: (1) stable page write; and (2) journal committing.
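Such stalls can be observed directly by timing individual write() calls from Java. The sketch below (the class name WriteLatencyProbe and the 1,000-line workload are our own illustration, not part of the JVM) records the worst-case latency of small appends to a log file; under heavy background IO, the maximum can reach hundreds of milliseconds even though each write only goes to the page cache.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Times each write() syscall issued while appending short lines to a
// log file, and reports the worst-case latency observed.
public class WriteLatencyProbe {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "gc-probe.log";
        long maxNanos = 0;
        try (FileOutputStream out = new FileOutputStream(path)) {
            for (int i = 0; i < 1000; i++) {
                byte[] line = ("probe line " + i + "\n").getBytes(StandardCharsets.UTF_8);
                long start = System.nanoTime();
                out.write(line);  // one write() into the page cache; may block in the kernel
                long elapsed = System.nanoTime() - start;
                if (elapsed > maxNanos) maxNanos = elapsed;
            }
        }
        System.out.println("max write latency (ms): " + maxNanos / 1_000_000);
    }
}
```

On an idle disk the reported maximum stays near zero; running this alongside a heavy background writer makes the stable-page-write and journal-commit effects described below visible.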



Stable page write: a JVM write to the GC log file first “dirties” the corresponding file cache pages. Although the cache pages are later persisted to the disk file via the OS’s writeback mechanism, dirtying the cache pages in memory is still subject to page contention caused by “stable page write.” With “stable page write,” if a page is under OS writeback, a write() to that page has to wait for the writeback to complete. The page is locked to ensure data consistency by preventing a partially updated page from being persisted to disk.



Journal committing: on a journaling file system, the appropriate journal entries are generated during file writing. When appending to the GC log file results in new blocks being allocated, the file system needs to commit the journal data to disk first. During journal committing, if the OS has other IO activities, the commit may have to wait; if the background IO activities are heavy, the wait can be noticeably long. Note that the EXT4 file system has a “delayed allocation” feature that postpones some journal data to OS writeback time, which alleviates this problem. Note also that changing EXT4’s data mode from the default “ordered” mode to “writeback” does not really address this cause, since the journal still needs to be persisted before the file-extending write() call returns.

Background IO activities

From the standpoint of a particular JVM garbage collection, background IO activities are inevitable in typical production environments. There are several sources of such IO: (1) OS activity; (2) administration and housekeeping software; (3) other co-located applications; and (4) IO from the same JVM instance. First, the OS itself contains many mechanisms (e.g., the “/proc” file system) that incur data writing to the underlying disks. Second, system-level software such as CFEngine also performs disk IO. Third, if the node hosts co-located applications that share the disk drives, those applications contend for IO. Fourth, the JVM instance itself may use disk IO in ways other than GC logging.
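For experimentation, this kind of background IO pressure can be reproduced with a simple synthetic writer. The sketch below (the class name and chunk sizes are illustrative) keeps the disk busy by writing large chunks and forcing each one to be flushed, so the OS continuously accumulates dirty pages and performs writeback.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Illustrative generator of background write traffic: repeatedly writes
// large chunks and syncs them, keeping the disk and the OS writeback
// machinery busy.
public class BackgroundIoLoad {
    public static void write(String path, int chunkMb, int chunks) throws IOException {
        byte[] chunk = new byte[chunkMb * 1024 * 1024];
        Arrays.fill(chunk, (byte) 'x');
        try (FileOutputStream out = new FileOutputStream(path)) {
            for (int i = 0; i < chunks; i++) {
                out.write(chunk);    // dirty page-cache pages
                out.getFD().sync();  // force writeback, keeping the disk busy
            }
        }
    }

    public static void main(String[] args) throws IOException {
        write("background-io.dat", 64, 1000);  // sustained heavy write traffic
    }
}
```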

Solutions

Since, in the current HotSpot JVM implementation (as well as in some others), GC logging can be blocked by background IO activities, various solutions can help mitigate this problem when writing to the GC log file.

First, the JVM implementation could be enhanced to address this issue completely. In particular, if the GC logging activities were separated from the critical JVM GC processes that cause STW pauses, this problem would go away. For instance, the JVM could move the GC logging into a separate thread that handles log file writing independently, so it no longer contributes to STW pauses. The separate-thread approach, however, risks losing the last GC log entries if the JVM crashes. It might make sense to expose a JVM flag that lets users specify their preference.
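A minimal sketch of the separate-thread idea is shown below (the class name and queue-based design are our own illustration, not the HotSpot implementation): the GC path enqueues log lines and returns immediately, while a dedicated thread performs the potentially blocking write() calls.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Asynchronous log writer: callers enqueue lines without blocking; a
// background thread does the actual file IO, so write() stalls never
// extend the caller's pause.
public class AsyncGcLogger implements AutoCloseable {
    private static final String POISON = "\0shutdown\0";  // sentinel to stop the writer
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread writerThread;

    public AsyncGcLogger(String path) throws IOException {
        Writer out = new FileWriter(path);
        writerThread = new Thread(() -> {
            try (Writer w = out) {
                String line;
                while (!(line = queue.take()).equals(POISON)) {
                    w.write(line);   // may block on disk IO, but only this thread waits
                    w.write('\n');
                }
            } catch (IOException | InterruptedException ignored) {
            }
        }, "gc-log-writer");
        writerThread.start();
    }

    // Called from the latency-critical path: non-blocking enqueue.
    public void log(String line) {
        queue.offer(line);
    }

    @Override
    public void close() throws InterruptedException {
        queue.offer(POISON);
        writerThread.join();  // drain remaining entries before exit
    }
}
```

The trade-off mentioned above is visible in this sketch: lines still sitting in the queue when the process dies are lost.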

Since the extent of the STW pauses caused by background IO depends on how heavy that IO is, various ways of reducing the background IO intensity can be applied: for instance, moving other IO-intensive applications off the node, reducing other types of logging, and improving log rotation.

For latency-sensitive applications, such as online applications serving interactive users, large STW pauses (e.g., >0.25 seconds) are intolerable, so special treatment is needed. The bottom line for avoiding large OS-induced STW pauses is to keep GC logging from being blocked by OS IO activities.

One solution is to put GC log files on tmpfs (i.e., -Xloggc:/tmpfs/gc.log). Since tmpfs is not backed by disk files, writing to tmpfs files incurs no disk activity and hence is not blocked by disk IO. There are two problems with this approach: (1) the GC log file is lost after a system crash; and (2) it consumes physical memory. A remedy is to periodically back up the log file to persistent storage, limiting how much can be lost.
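Such a periodic backup can run as a small helper outside the GC path. The sketch below (class name, paths, and the interval are illustrative) copies the tmpfs-resident log to persistent storage on a schedule, so at most one interval's worth of log is lost on a crash.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodically copies the tmpfs-resident GC log to persistent storage.
public class GcLogBackup {
    public static ScheduledExecutorService start(Path tmpfsLog, Path persistentCopy,
                                                 long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // The copy runs off the GC path, so blocking here is harmless.
                Files.copy(tmpfsLog, persistentCopy, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                // A failed backup attempt should not affect the application.
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        // Hypothetical paths and interval for illustration.
        start(Paths.get("/tmpfs/gc.log"), Paths.get("/var/log/gc.log.bak"), 30);
    }
}
```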

Another approach is to put GC log files on SSDs (Solid-State Drives), which typically have much better IO performance. Depending on the IO load, an SSD can be adopted as a dedicated drive for GC logging or shared with other IO loads. However, the cost of SSDs needs to be taken into consideration.

Cost-wise, rather than using an SSD, a more cost-effective approach is to put the GC log file on a dedicated HDD. With GC logging as the only IO activity, the dedicated HDD can likely meet the low-pause JVM performance goal. In fact, Scenario I shown above mimics such a setup, since in that setup no other IO activities exist on the GC-logging drive.

Evaluation of putting GC log on SSD and tmpfs

We take the dedicated-file-system approach, putting the GC log file on an SSD and on tmpfs. We run the same Java workload and the same background IO load as in Scenario II.

For both the SSD and tmpfs, we observe similar results; the following figure shows the results of putting the GC log file on an SSD. We notice that the JVM pause performance is on par with Scenario I, with all pauses under 0.25 seconds. The results indicate that the background IO load does not impact the application performance.