MRI maintainers have put a tremendous amount of work into improving the garbage collector in Ruby 2.0 through 2.2. The engine has thus gained a lot more horsepower. However, it’s still not trivial to get the most out of it. In this post we’re going to gain a better understanding of how and what to tune for.

Photo by Steve Arnold, used under a Creative Commons license.

Koichi Sasada (_ko1, Ruby MRI maintainer) famously mentioned in a presentation (slide 89):

Try GC parameters There is no silver bullet No one answer for all applications You should not believe other applications settings easily

Try and try and try!

This is true in theory but a whole lot harder to pull off in practice due to three primary problems:

Interpreter GC semantics and configuration change over time.

One GC config isn’t optimal for all app runtime contexts: tests, requests, background jobs, rake tasks, etc.

During the lifetime and development cycles of a project, it’s very likely that existing GC settings are invalidated quickly.

An evolving Garbage Collector

The garbage collector has frequently changed in the latest MRI Ruby releases. The changes have also broken many existing assumptions and environment variables that tune the GC. Compare GC.stat on Ruby 2.1:

1 2 3 4 5 6 7 8 { :count => 7 , :heap_used => 66 , :heap_length => 66 , :heap_increment => 0 , :heap_live_slot => 26397 , :heap_free_slot => 507 , :heap_final_slot => 0 , :heap_swept_slot => 10698 , :heap_eden_page_length => 66 , :heap_tomb_page_length => 0 , :total_allocated_object => 75494 , :total_freed_object => 49097 , :malloc_increase => 465840 , :malloc_limit => 16777216 , :minor_gc_count => 5 , :major_gc_count => 2 , :remembered_shady_object => 175 , :remembered_shady_object_limit => 322 , :old_object => 9109 , :old_object_limit => 15116 , :oldmalloc_increase => 1136080 , :oldmalloc_limit => 16777216 }

…with Ruby 2.2:

1 2 3 4 5 6 7 8 9 10 11 { :count => 6 , :heap_allocated_pages => 74 , :heap_sorted_length => 75 , :heap_allocatable_pages => 0 , :heap_available_slots => 30162 , :heap_live_slots => 29729 , :heap_free_slots => 433 , :heap_final_slots => 0 , :heap_marked_slots => 14752 , :heap_swept_slots => 11520 , :heap_eden_pages => 74 , :heap_tomb_pages => 0 , :total_allocated_pages => 74 , :total_freed_pages => 0 , :total_allocated_objects => 76976 , :total_freed_objects => 47247 , :malloc_increase_bytes => 449520 , :malloc_increase_bytes_limit => 16777216 , :minor_gc_count => 4 , :major_gc_count => 2 , :remembered_wb_unprotected_objects => 169 , :remembered_wb_unprotected_objects_limit => 278 , :old_objects => 9337 , :old_objects_limit => 10806 , :oldmalloc_increase_bytes => 1147760 , :oldmalloc_increase_bytes_limit => 16777216 }

In Ruby 2.2 we can see a lot more to introspect and tune, but this also comes with a steep learning curve which is (and should be) out of scope for most developers.

One codebase, different roles

A modern Rails application is typically used day to day in different contexts:

Running tests

rake tasks

database migrations

background jobs

They all start pretty much the same way with the VM compiling code to instruction sequences. Different roles affect the Ruby heap and the garbage collector in very different ways, however.

This job typically runs for 13 minutes, triggers 133 GC cycles and allocates a metric ton of objects. Allocations are very bursty and in batches.

1 2 3 4 5 6 7 class CartCleanupJob < ActiveJob :: Base queue_as :default def perform ( * args ) Cart . cleanup ( Time . now ) end end

This controller action allocates 24 555 objects. Allocator throughput isn’t very variable.

1 2 3 4 5 class CartsController < ApplicationController def show @cart = Cart . find ( params [ :id ] ) end end

Our test case contributes 175 objects to the heap. Test cases generally are very variable and bursty in allocation patterns.

1 2 3 4 5 def test_add_to_cart cart = carts ( :empty ) cart << products ( :butter ) assert_equal 1 , cart . items . size end

The default GC behavior isn’t optimal for all of these execution paths within the same project and neither is throwing a single set of RUBY_GC_* environment variables at it.

We’d like to refer to processing in these different contexts as “units of work”.

Fast development cycles

During the lifetime and development cycle of a project, it’s very likely that garbage collector settings that were valid yesterday aren’t optimal anymore after the next two sprints. Changes to your Gemfile, rolling out new features, and bumping the Ruby interpreter all affect the garbage collector.

1 2 3 4 5 source 'https://rubygems.org' ruby '2.2.0' # Invalidates most existing RUBY_GC_* variables gem 'mail' # slots galore

Process lifecycle events

Let’s have a look at a few events that are important during the lifetime of a process. They help the tuner to gain valuable insights into how well the garbage collector is working and how to further optimize it. They all hint at how the heap changes and what triggered a GC cycle.

How many mutations happened for example while

processing a request

between booting the app and processing a request

during the lifetime of the application?

When it booted

When the application is ready to start doing work. For Rails application, this is typically when the app has been fully loaded in production, ready to serve requests, ready to accept background work, etc. All source files have been loaded and most resources acquired.

When processing started

At the start of a unit of work. Typically the start of an HTTP request, when a background job has been popped off a queue, the start of a test case or any other type of processing that is the primary purpose of running the process.

When processing ended

At the end of a unit of work. Typically the end of a HTTP request, when a background job has finished processing, the end of a test case or any other type of processing that is the primary purpose of running the process.

When it terminated

Triggered when the application terminates.

Knowing when and why GC happens

Tracking GC cycles interleaved with the aforementioned application events yield insights into why a particular GC cycle happens. The progression from BOOTED to TERMINATED and everything else is important because mutations that happen during the fifth HTTP request of a new Rails process also contribute to a GC cycle during request number eight.

On tuning

Primarily the garbage collector exposes tuning variables in these three categories:

Heap slot values: where Ruby objects live

Malloc limits: off heap storage for large strings, arrays and other structures

Growth factors: by how much to grow slots, malloc limits etc.

Tuning GC parameters is generally a tradeoff between tuning for speed (thus using more memory) and tuning for low memory usage while giving up speed. We think it’s possible to infer a reasonable set of defaults from observing the application at runtime that’s conservative with memory, yet maintain reasonable throughput.

A solution

We’ve been working on a product, TuneMyGC for a few weeks that attempts to do just that. Our goals and objectives are:

A repeatable and systematic tuning process that respects fast development cycles

It should have awareness of runtime profiles being different for HTTP requests, background job processing etc.

It should support current mainline Ruby versions without developers having to keep up to date with changes

Deliver reasonable memory footprints with better runtime performance

Provide better insights into GC characteristics both for app owners and possibly also ruby-core

Here’s an example of Discourse being automatically tuned for better 99th percentile throughput. Response times in milliseconds, 200 requests:

Controller GC defaults Tuned GC categories 227 160 home 163 113 topic 55 40 user 92 76

1 $ RUBY_GC_TUNE = 1 RUBY_GC_TOKEN = a5a672761b25265ec62a1140e21fc81f ruby script/bench.rb -m -i 200

Raw GC stats from Discourse’s bench.rb script:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 GC STATS: count: 106 heap_allocated_pages: 2447 heap_sorted_length: 2455 heap_allocatable_pages: 95 heap_available_slots: 997407 heap_live_slots: 464541 heap_free_slots: 532866 heap_final_slots: 0 heap_marked_slots: 464530 heap_swept_slots: 532876 heap_eden_pages: 2352 heap_tomb_pages: 95 total_allocated_pages: 2447 total_freed_pages: 0 total_allocated_objects: 27169276 total_freed_objects: 26704735 malloc_increase_bytes: 4352 malloc_increase_bytes_limit: 16777216 minor_gc_count: 91 major_gc_count: 15 remembered_wb_unprotected_objects: 11669 remembered_wb_unprotected_objects_limit: 23338 old_objects: 435435 old_objects_limit: 870870 oldmalloc_increase_bytes: 4736 oldmalloc_increase_bytes_limit: 30286118

1 $ RUBY_GC_TUNE = 1 RUBY_GC_TOKEN = a5a672761b25265ec62a1140e21fc81f RUBY_GC_HEAP_INIT_SLOTS = 997339 RUBY_GC_HEAP_FREE_SLOTS = 626600 RUBY_GC_HEAP_GROWTH_FACTOR = 1.03 RUBY_GC_HEAP_GROWTH_MAX_SLOTS = 88792 RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR = 2.4 RUBY_GC_MALLOC_LIMIT = 34393793 RUBY_GC_MALLOC_LIMIT_MAX = 41272552 RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR = 1.32 RUBY_GC_OLDMALLOC_LIMIT = 39339204 RUBY_GC_OLDMALLOC_LIMIT_MAX = 47207045 RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR = 1.2 ruby script/bench.rb -m -i 200

Raw GC stats from Discourse’s bench.rb script:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 GC STATS: count: 44 heap_allocated_pages: 2893 heap_sorted_length: 2953 heap_allocatable_pages: 161 heap_available_slots: 1179182 heap_live_slots: 460935 heap_free_slots: 718247 heap_final_slots: 0 heap_marked_slots: 460925 heap_swept_slots: 718277 heap_eden_pages: 2732 heap_tomb_pages: 161 total_allocated_pages: 2893 total_freed_pages: 0 total_allocated_objects: 27167493 total_freed_objects: 26706558 malloc_increase_bytes: 4352 malloc_increase_bytes_limit: 34393793 minor_gc_count: 34 major_gc_count: 10 remembered_wb_unprotected_objects: 11659 remembered_wb_unprotected_objects_limit: 27981 old_objects: 431838 old_objects_limit: 1036411 oldmalloc_increase_bytes: 4736 oldmalloc_increase_bytes_limit: 39339204

We can see a couple of interesting points here:

There is much less GC activity – only 44 rounds instead of 106.

Slot buffers are still decent for high throughput. There are 718247 free slots ( heap_free_slots ) of 1179182 available slots ( heap_available_slots ), which is 64% of the current live objects ( heap_live_slots ). This value however is slightly skewed because the Discourse benchmark script forces a major GC before dumping these stats - there are about as many swept slots as free slots ( heap_swept_slots ).

) of 1179182 available slots ( ), which is 64% of the current live objects ( ). This value however is slightly skewed because the Discourse benchmark script forces a major GC before dumping these stats - there are about as many swept slots as free slots ( ). Malloc limits ( malloc_increase_bytes_limit and oldmalloc_increase_bytes_limit ) and growth factors ( old_objects_limit and remembered_wb_unprotected_objects_limit ) are in line with actual app usage. The TuneMyGC service considers when limits and growth factors are bumped during the app lifecycle and attempts to raise limits via environment variables slightly higher to prevent excessive GC activity.

Now it’s your turn.

Feel free to take your Rails app for a spin too!

1. Add to your Gemfile.

1 gem 'tunemygc'

2. Register your Rails application.

1 2 $ bundle exec tunemygc -r bundletunemygc -r [email protected] Application registered. Use RUBY_GC_TOKEN = 08de9e8822c847244b31290cedfc1d51 in your environment.

3. Boot your app. We recommend an optimal GC configuration when it ends

1 $ RUBY_GC_TOKEN = 08de9e8822c847244b31290cedfc1d51 RUBY_GC_TUNE = 1 bundle exec rails s

Related articles

This article is a part of a series about Rails performance optimization and GC tuning. Other articles in the series: