Ruby 2.1: RGenGC

Ruby 2.1 adds a "restricted" generational collector, with minor mark phases that dramatically reduce the cost of GC.

Let's take a look at the evolution of Ruby's GC.

Ruby 1.8: simple mark and sweep

Classic mark and sweep implementation. The entire world is stopped during both phases.

Traverse object graph from roots and mark live objects, using a bit inside the object structure (FL_MARK). Iterate over all heap slots and add unmarked slots to the freelist.
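To make the two phases concrete, here is a toy model in plain Ruby. It is only a sketch: ToyObject, mark and sweep are made-up names, the "heap" is an ordinary array, and the mark uses an explicit stack rather than whatever traversal the 1.8 VM used.

```ruby
# Toy model of a stop-the-world mark and sweep pass. ToyObject, mark and
# sweep are illustrative names, not part of Ruby's GC.
ToyObject = Struct.new(:name, :refs, :marked)

# Mark phase: walk the object graph from the roots, setting the mark bit.
def mark(roots)
  stack = roots.dup
  until stack.empty?
    obj = stack.pop
    next if obj.marked
    obj.marked = true
    stack.concat(obj.refs)
  end
end

# Sweep phase: visit every heap slot. Unmarked slots go to the freelist,
# marked slots have their mark bit cleared for the next cycle.
def sweep(heap)
  freelist = []
  heap.each do |obj|
    if obj.marked
      obj.marked = false
    else
      freelist << obj
    end
  end
  freelist
end

a = ToyObject.new("a", [], false)
b = ToyObject.new("b", [a], false)
c = ToyObject.new("c", [], false) # nothing references c
heap = [a, b, c]

mark([b])
freed = sweep(heap).map(&:name)
puts freed.inspect # only "c" is unreachable from the root
```

Note that the sweep touches every slot regardless of how much garbage there is, which is why both phases contribute to the pause.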

Ruby 1.9.3: lazy sweep

@nari3 adds LazySweepGC, reducing GC pauses to just the mark phase. The heap is swept incrementally as object slots are required.
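The idea can be sketched with a toy allocator that only sweeps another page when it has run out of free slots. LazyHeap and its methods are hypothetical names for illustration, and pages are modelled as arrays of :marked / :unmarked symbols left behind by the mark phase.

```ruby
# Toy lazy sweep: pages are swept one at a time, and only when the
# allocator runs out of free slots. LazyHeap is an illustrative name.
class LazyHeap
  attr_reader :pages_swept

  # pages: array of pages; each page is an array of slots that are
  # :marked (live) or :unmarked (garbage) after the mark phase.
  def initialize(pages)
    @unswept = pages.map(&:dup)
    @free_slots = 0
    @pages_swept = 0
  end

  # Hand out one slot, sweeping further pages only if needed.
  def allocate
    sweep_next_page while @free_slots.zero? && !@unswept.empty?
    return nil if @free_slots.zero? # a real VM would start a new mark here
    @free_slots -= 1
    :slot
  end

  private

  def sweep_next_page
    page = @unswept.shift
    @free_slots += page.count(:unmarked)
    @pages_swept += 1
  end
end

heap = LazyHeap.new([
  [:marked, :unmarked, :unmarked],
  [:marked, :marked, :marked],
  [:unmarked, :marked, :marked],
])

2.times { heap.allocate }
after_two = heap.pages_swept   # the first page had enough free slots
heap.allocate
after_three = heap.pages_swept # swept past the full page to find a slot
exhausted = heap.allocate.nil?
```

The sweep work is spread across allocations instead of being paid in one pause right after the mark.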

Ruby 2.0: bitmaps for COW-safety

@nari3 adds Bitmap Marking GC, helping Unix systems share memory across child processes. The mark phase is also rewritten to be non-recursive.

Although the memory savings from bitmaps were only modest, the patch freed up a bit (FL_MARK became FL_WB_PROTECTED later) and laid the groundwork for a generational collector.

Ruby 2.1: oldgen and minor marking

@ko1 designs RGenGC, a generational collector that can be implemented incrementally and supports C-extensions.

Objects on the heap are divided into two categories:

- protected by a write-barrier (FL_WB_PROTECTED)
- unprotected (or "shady")
  - missing write-barrier (Proc, Ruby::Env)
  - unsafe access from C-extension (RARRAY_PTR, RSTRUCT_PTR)

Only protected objects can be promoted to oldgen.

(This is the "restricted" in RGenGC.)

Unprotected objects cannot be promoted, but if referenced from oldgen they are added to a remembered set. Minor marks are much faster because they only have to traverse references from the remembered set.
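A toy model shows why this pays off. It is a sketch under simplifying assumptions: ToyObj, major_mark and minor_mark are invented names, the remembered set is a plain array, and a minor mark here simply refuses to follow references into oldgen.

```ruby
# Toy comparison of major vs minor marking. All names are illustrative.
class ToyObj
  attr_reader :name, :refs
  attr_accessor :old

  def initialize(name, old: false)
    @name = name
    @old = old
    @refs = []
  end
end

# Shared traversal: returns how many objects were visited.
def mark(start, skip_old:)
  marked = {}
  stack = start.dup
  until stack.empty?
    obj = stack.pop
    next if marked[obj]
    marked[obj] = true
    obj.refs.each { |ref| stack << ref unless skip_old && ref.old }
  end
  marked.size
end

# Major mark: traverse everything reachable from the roots.
def major_mark(roots)
  mark(roots, skip_old: false)
end

# Minor mark: skip oldgen, but always traverse the remembered set.
def minor_mark(roots, remembered_set)
  mark(roots.reject(&:old) + remembered_set, skip_old: true)
end

# 1,000 promoted objects in a chain, hanging off an old root.
old_objects = Array.new(1000) { |i| ToyObj.new("old#{i}", old: true) }
old_objects.each_cons(2) { |a, b| a.refs << b }

shady = ToyObj.new("shady") # unprotected, so never promoted
young = ToyObj.new("young")
tmp   = ToyObj.new("tmp")
old_objects.last.refs << shady # oldgen -> shady: remember the shady object
shady.refs << young
remembered_set = [shady]

roots = [old_objects.first, tmp]
major_visited = major_mark(roots)                 # all old objects + shady + young + tmp
minor_visited = minor_mark(roots, remembered_set) # tmp, shady, young
puts "major: #{major_visited} objects, minor: #{minor_visited} objects"
```

Without the remembered set, the minor mark would never reach young, because the only path to it runs through oldgen.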

Heap Layout

Ruby objects live on the ruby heap, which is split up into pages. Each page is 16KB and holds ~408 object slots.

Every RVALUE slot on this page occupies 40 bytes. Strings shorter than 23 bytes and Arrays with fewer than four elements can be embedded directly within this 40-byte slot.

    GC::INTERNAL_CONSTANTS[:RVALUE_SIZE]    #=> 40 bytes per object slot (on x86_64)
    GC::INTERNAL_CONSTANTS[:HEAP_OBJ_LIMIT] #=> 408 slots per heap page
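The two numbers are consistent with each other, as a quick calculation shows. The constants below are copied from the text rather than read from the VM, and attributing the leftover bytes to per-page bookkeeping is an assumption.

```ruby
PAGE_BYTES  = 16 * 1024 # 16KB page, from the text
RVALUE_SIZE = 40        # bytes per slot on x86_64, from the text
SLOTS       = 408       # slots per page, from the text

upper_bound = PAGE_BYTES / RVALUE_SIZE # slots if the page held nothing else
slot_bytes  = SLOTS * RVALUE_SIZE      # bytes actually used by slots
left_over   = PAGE_BYTES - slot_bytes  # presumably header/alignment overhead

puts "upper bound: #{upper_bound} slots"
puts "408 slots use #{slot_bytes} bytes, leaving #{left_over} bytes"
```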

Pages in the heap reside in one of two places: eden or tomb. Eden contains pages that have one or more live objects. The tomb contains empty pages with no objects. The sum of these pages represents the capacity of the ruby heap:

    GC.stat(:heap_used) == GC.stat(:heap_eden_page_length) + GC.stat(:heap_tomb_page_length)

During lazy sweep eden pages are swept one at a time. Each page provides up to 408 object slots for re-use. By the time the lazy sweep completes, all unmarked slots in eden have been replaced by new objects.

Empty pages found during sweep are moved to the tomb. This reduces fragmentation in eden (by filling in sparse pages first), and allows the tomb to be grown or shrunk before it is used.

Once eden runs out of slots, the empty pages in tomb are moved back to eden to add space. This happens incrementally, one page at a time. When the tomb runs out of pages, a mark is triggered and the cycle begins anew.

Major vs Minor GC

As an example, let's look at github.com's large rails app.

First, we'll measure how many long-lived objects are created after the app is booted:

    # preload controllers/models and other code
    GitHub.preload_all

    # measure heap stats after a major mark and full sweep
    GC.start # same as GC.start(full_mark: true, immediate_sweep: true)

    # three ways to measure live slots
    count = ObjectSpace.count_objects
    count[:TOTAL] - count[:FREE] #=> 565121

    GC.stat(:heap_live_slot) #=> 565121

    GC.stat(:total_allocated_object) - GC.stat(:total_freed_object) #=> 565121

Of these ~565k long-lived bootup objects, ~95% are promoted to oldgen:

    s = GC.stat
    100.0 * s[:old_object] / s[:heap_live_slot] #=> 94.90
    100.0 * s[:remembered_shady_object] / s[:heap_live_slot] #=> 1.88

This means only ~5% of the heap needs to be traversed on minor marks, via references from the ~2% of objects that are remembered.
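If you want to track these ratios in your own app, a small helper along these lines may be useful. It is a sketch: stat_any and oldgen_breakdown are made-up names, and the second key in each lookup is the name we believe later Ruby releases use for the same counter, so treat those as assumptions and check GC.stat.keys on your version.

```ruby
# Look up a counter under its Ruby 2.1 name, falling back to an assumed
# later name. Returns 0 if neither is present.
def stat_any(stats, *names)
  key = names.find { |name| stats.key?(name) }
  key ? stats[key] : 0
end

# Percentage of live slots that are in oldgen / in the remembered set.
def oldgen_breakdown(stats = GC.stat)
  live  = stat_any(stats, :heap_live_slot, :heap_live_slots)
  old   = stat_any(stats, :old_object, :old_objects)
  shady = stat_any(stats, :remembered_shady_object, :remembered_wb_unprotected_objects)
  return { old: 0.0, remembered: 0.0 } if live.zero?

  {
    old: (100.0 * old / live).round(2),
    remembered: (100.0 * shady / live).round(2),
  }
end

# Synthetic numbers, chosen only to make the arithmetic easy to follow.
sample = { heap_live_slot: 1000, old_object: 950, remembered_shady_object: 20 }
sample_breakdown = oldgen_breakdown(sample)

GC.start
live_breakdown = oldgen_breakdown
puts "this process: #{live_breakdown.inspect}"
```

Call it after a GC.start, as above, so the counters reflect a freshly marked heap.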

As expected, this makes minor GC pauses much shorter: only 7ms in our app compared to 58ms for a major mark.

    time { GC.start(full_mark: true, immediate_sweep: false) }  #=> 0.058
    time { GC.start(full_mark: false, immediate_sweep: false) } #=> 0.007
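The time helper in that snippet is not part of Ruby. A minimal stand-in, assuming all it does is return elapsed wall-clock seconds, could look like this:

```ruby
# Wall-clock seconds taken by the block. This definition is a guess at
# something equivalent to the helper used in the measurements.
def time
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

major = time { GC.start(full_mark: true,  immediate_sweep: false) }
minor = time { GC.start(full_mark: false, immediate_sweep: false) }
puts format("major mark: %.3fs, minor mark: %.3fs", major, minor)
```

On a small script the two numbers will be close together; the gap only becomes obvious once the process holds a large oldgen.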

The majority of mark phases during code execution will use a minor mark and finish very quickly. However, over time the size of the remembered set and oldgen can grow. If either of these doubles in size, a major mark is used to reset them.

The limits used to trigger a major mark can be monitored via GC.stat:

    >> GC.stat.values_at(:remembered_shady_object, :old_object)
    => [10647, 536785]
    >> GC.stat.values_at(:remembered_shady_object_limit, :old_object_limit)
    => [21284, 1073030]

The frequency of major vs minor marks can also be monitored. For example, you might graph "major GCs per request", "minor GCs per request" or "minor GCs per major GC".

    GC.stat(:count) == GC.stat(:minor_gc_count) + GC.stat(:major_gc_count)
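One way to get per-request numbers is to snapshot the counters around a unit of work. gc_counts_during is a hypothetical helper; in a real app it would wrap the request handler, for example from a Rack middleware, instead of the synthetic workload used here.

```ruby
# Count the GCs triggered while the block runs. gc_counts_during is a
# made-up helper name, not part of Ruby.
def gc_counts_during
  before = GC.stat
  yield
  after = GC.stat
  {
    total: after[:count] - before[:count],
    minor: after[:minor_gc_count] - before[:minor_gc_count],
    major: after[:major_gc_count] - before[:major_gc_count],
  }
end

counts = gc_counts_during do
  100_000.times { "x" * 100 } # short-lived garbage
  GC.start                    # forces a major GC
end
puts counts.inspect
```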

Tuning Variables

In our app above, we use the following GC settings:

    export RUBY_GC_HEAP_INIT_SLOTS=600000
    export RUBY_GC_HEAP_FREE_SLOTS=600000
    export RUBY_GC_HEAP_GROWTH_FACTOR=1.25
    export RUBY_GC_HEAP_GROWTH_MAX_SLOTS=300000

RUBY_GC_HEAP_INIT_SLOTS: initial number of slots on the heap (default: 10000)

Our app boots with ~600k long-lived objects, so we use 600k here to reduce GC activity during boot.

RUBY_GC_HEAP_FREE_SLOTS: minimum free slots reserved for sweep re-use (default: 4096)

Our servers have extra RAM, so we bump this up to trade off memory for time between GCs. An average request allocates 75k objects, so 600k free slots give us ~8 requests in between each mark phase.

RUBY_GC_HEAP_GROWTH_FACTOR: factor used to grow the heap (default: 1.8x)

Since our heap is already quite big with the settings above, we reduce the growth factor (1.25x) to add slots in smaller increments.

RUBY_GC_HEAP_GROWTH_MAX_SLOTS: maximum new slots to add (default: no limit)

In addition to reducing the growth factor, we cap it so a maximum of 300k objects can be added to the heap at once.
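To see how the growth factor and the cap interact, here is a simplified model. next_heap_slots is an invented function: it assumes the heap grows to current * factor unless that would add more than the cap. The VM grows in whole pages, so real numbers will differ slightly.

```ruby
# Simplified model of heap growth: multiply by the growth factor, but add
# at most max_slots at a time. An approximation, not the VM's exact logic.
def next_heap_slots(current, factor:, max_slots: nil)
  grown = (current * factor).ceil
  grown = [grown, current + max_slots].min if max_slots
  grown
end

slots = 600_000
steps = 5.times.map do
  slots = next_heap_slots(slots, factor: 1.25, max_slots: 300_000)
end

steps.each { |s| puts s }
```

With these settings each step adds 25% until the heap passes 1.2M slots. After that, 25% would exceed 300k, so the cap takes over and growth becomes linear.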

malloc() Limits

So Ruby objects occupy 40 bytes each, inside pages on the eden heap.

When an object needs even more space, it allocates memory on the regular process heap (via a ruby_xmalloc() wrapper). For instance when a string outgrows 23 bytes, it allocates a separate, larger buffer for itself. The additional memory used by this string (or any object) can be measured with ObjectSpace.memsize_of(o) in objspace.so.
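A quick way to see the difference is to compare a string that fits in its slot with one that does not. The exact byte counts depend on the Ruby version and platform, so none are shown here.

```ruby
require "objspace"

short = "a" * 10     # small enough to be embedded in the object slot
long  = "a" * 10_000 # needs a separate buffer on the process heap

short_size = ObjectSpace.memsize_of(short)
long_size  = ObjectSpace.memsize_of(long)
puts "short: #{short_size} bytes, long: #{long_size} bytes"
```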

Internally the VM keeps track of malloc_increase, the number of bytes that have been allocated but not yet freed. This is effectively the memory growth of the process. When more than 16MB is added, a GC is forced even if free slots are still available. The limit starts at 16MB, but adapts to the memory usage patterns in your code.

The initial/max values and dynamic growth factor can also be controlled via environment variables:

RUBY_GC_MALLOC_LIMIT: (default: 16MB)

RUBY_GC_MALLOC_LIMIT_MAX: (default: 32MB)

RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR: (default: 1.4x)

Similarly, the memory growth associated with oldgen is tracked separately in oldmalloc_increase. When this limit is tripped, a major GC is forced. These limits can be tuned as well:

RUBY_GC_OLDMALLOC_LIMIT: (default: 16MB)

RUBY_GC_OLDMALLOC_LIMIT_MAX: (default: 128MB)

RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR: (default: 1.2x)

Both malloc increase and limit values can be monitored via GC.stat:

    >> GC.stat.values_at(:malloc_increase, :malloc_limit)
    => [14224, 64000000]
    >> GC.stat.values_at(:oldmalloc_increase, :oldmalloc_limit)
    => [20464, 64000000]
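The counter names above are the Ruby 2.1 ones. Assuming they may be spelled differently on other versions, a more forgiving way to dump them is to filter GC.stat by name instead of hard-coding the keys:

```ruby
# Print every GC.stat counter whose name mentions malloc, whatever the
# exact key names are on this Ruby version.
malloc_stats = GC.stat.select { |name, _value| name.to_s.include?("malloc") }
malloc_stats.each { |name, value| puts "#{name}: #{value}" }
```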

In our app we've increased the initial limit to 64MB, to reduce GC activity during boot and when memory usage peaks.

    export RUBY_GC_MALLOC_LIMIT=64000000
    export RUBY_GC_OLDMALLOC_LIMIT=64000000

GC Events

And finally, ruby 2.1 ships with new tracepoints that can be used to monitor the GC at runtime. These are available from C, via rb_tracepoint_new():

RUBY_INTERNAL_EVENT_GC_START

RUBY_INTERNAL_EVENT_GC_END_MARK

RUBY_INTERNAL_EVENT_GC_END_SWEEP

C-extensions using these events can also take advantage of rb_gc_stat() and rb_gc_latest_gc_info(), which provide safe access to GC.stat and GC.latest_gc_info.

Ruby 2.2 and beyond

With the introduction of RGenGC, Ruby 2.1 includes a significant upgrade to ruby's GC. Seven millisecond minor marks and a 95% oldgen promotion rate are remarkable achievements, especially considering not one of our C-extensions had to be modified. Hats off to @ko1!