The article “Ruby Garbage Collection: Still Not Ready for Production” has been making the rounds.

In it we learned that our GC algorithm is flawed and were prescribed some rather drastic and dangerous workarounds.

At the core it had one big demonstration:

Run this on Ruby 2.1.1 and you will be out of memory soon:

while true
  "a" * (1024 ** 2)
end

Malloc limits, Ruby and you

Ruby has tracked memory allocation since its very early versions. This is why I found FUD comments such as this troubling:

the issue is that the Ruby GC is triggered on total number of objects, and not total amount of used memory

This clearly misunderstands Ruby. In fact, the aforementioned article never mentions that memory allocation may trigger a GC.

Historically Ruby was quite conservative issuing GCs based on the amount of memory allocated. Ruby keeps track of all memory allocated (using malloc) outside of the Ruby heaps between GCs. In Ruby 2.0, out-of-the-box every 8MB of allocations will result in a full GC. This number is way too small for almost any Rails app, which is why increasing RUBY_GC_MALLOC_LIMIT is one of the most cargo culted settings out there in the wild.

Matz picked this tiny number years ago, when it was a reasonable default; however, it was not revised until Ruby 2.1 landed.

For Ruby 2.1 Koichi decided to revamp this sub-system. The goal was to have defaults that work well for both scripts and web apps.

Instead of having a single malloc limit for our app, we now have a starting malloc limit that grows dynamically every time we trigger a GC by exceeding it. To stop unbounded growth of the limit, we have max values set.
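To make the intended behaviour concrete, here is a rough model of a limit that grows by a factor each time it is exceeded and is clamped at a max. This is only a sketch: the method name next_malloc_limit is made up, the numbers are the 2.1 defaults for the minor GC malloc limit (16MB start, 1.4x growth, 32MB max), and the real logic lives in gc.c and has more cases (for example, it can also shrink the limit).

```ruby
# Toy model of the growing malloc limit. This is NOT the code in gc.c,
# just an illustration of "grow by a factor, but never past the max".
MB = 1024 * 1024

# Hypothetical helper: the limit to use after the current one was exceeded.
def next_malloc_limit(limit, growth_factor, max)
  grown = (limit * growth_factor).to_i
  grown > max ? max : grown
end

limit  = 16 * MB        # assumed starting point (RUBY_GC_MALLOC_LIMIT default)
limits = [limit]

# Pretend every GC below was triggered by exceeding the current limit.
5.times do
  limit = next_malloc_limit(limit, 1.4, 32 * MB)
  limits << limit
end

puts limits.map { |l| (l / MB.to_f).round(1) }.inspect
```

In this model the limit climbs for a couple of GCs and then sits at the max for good, which is the behaviour we expect from the real thing.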

We track memory allocations from 2 points in time:

memory allocated outside Ruby heaps since the last minor GC

memory allocated since the last major GC

At any point in time we can get a snapshot of the current situation with GC.stat:

> GC.stat
=> {:count=>25, :heap_used=>263, :heap_length=>406, :heap_increment=>143,
    :heap_live_slot=>106806, :heap_free_slot=>398, :heap_final_slot=>0,
    :heap_swept_slot=>25258, :heap_eden_page_length=>263, :heap_tomb_page_length=>0,
    :total_allocated_object=>620998, :total_freed_object=>514192,
    :malloc_increase=>1572992, :malloc_limit=>16777216,
    :minor_gc_count=>21, :major_gc_count=>4,
    :remembered_shady_object=>1233, :remembered_shady_object_limit=>1694,
    :old_object=>65229, :old_object_limit=>93260,
    :oldmalloc_increase=>2298872, :oldmalloc_limit=>16777216}

malloc_increase denotes the amount of memory we allocated since the last minor GC; oldmalloc_increase denotes the amount since the last major GC.
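If you only care about the malloc side of GC.stat, a small helper can pull out just those four numbers. This is a sketch: the name malloc_pressure is made up, and the fallback keys assume the renamed *_bytes keys that later Ruby versions use, so a value comes back as nil if your build exposes neither name.

```ruby
# Hypothetical helper: extract the malloc counters and limits from a
# GC.stat hash. Tries the Ruby 2.1 key first, then the later *_bytes name.
def malloc_pressure(stat = GC.stat)
  pick = ->(*keys) { keys.map { |k| stat[k] }.compact.first }
  {
    malloc_increase:    pick.(:malloc_increase,    :malloc_increase_bytes),
    malloc_limit:       pick.(:malloc_limit,       :malloc_increase_bytes_limit),
    oldmalloc_increase: pick.(:oldmalloc_increase, :oldmalloc_increase_bytes),
    oldmalloc_limit:    pick.(:oldmalloc_limit,    :oldmalloc_increase_bytes_limit)
  }
end

malloc_pressure.each do |name, bytes|
  puts "#{name}: #{bytes.inspect}"
end
```

Logging these four values periodically is usually enough to tell whether an app is pushing against its malloc limits at all.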

We can tune our settings, from “Ruby 2.1 Out-of-Band GC”:

RUBY_GC_MALLOC_LIMIT: (default: 16MB)

RUBY_GC_MALLOC_LIMIT_MAX: (default: 32MB)

RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR: (default: 1.4x)

and

RUBY_GC_OLDMALLOC_LIMIT: (default: 16MB)

RUBY_GC_OLDMALLOC_LIMIT_MAX: (default: 128MB)

RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR: (default: 1.2x)

So, in theory, this unbounded memory growth is not possible for the script above. The two MAX values should just cap the growth and force GCs.

However, this is not the case in Ruby 2.1.1.

Investigating the issue

We spent a lot of time ensuring we had extensive instrumentation built into Ruby 2.1: we added memory profiling hooks, we added GC hooks, and we exposed a large amount of internal information. This has certainly paid off.

Analyzing the issue raised by this mini script is trivial using the gc_tracer gem. This gem allows us to get a very detailed snapshot of the system every time a GC is triggered and store it in a text file that is easily consumed by a spreadsheet.

We simply add this to the rogue script:

require 'gc_tracer'
GC::Tracer.start_logging("log.txt")

And get a very detailed trace back in the text file:

In the snippet above we can see minor GCs being triggered by exceeding malloc limits (where major_by is 0) and major GCs being triggered by exceeding malloc limits. We can see our malloc limit and old malloc limit growing. We can see when GC starts and ends, and lots more.

Trouble is, our limit max for both oldmalloc and malloc grows well beyond the max values we have defined:

So the bottom line is that it looks like we have a straight-out bug.

https://bugs.ruby-lang.org/issues/9687

A one-line bug that will be patched in Ruby 2.1.2 and is already fixed in master.

Are you affected by this bug?

It is possible your production app on Ruby 2.1.1 is impacted by this. The simplest way to find out is to issue a GC.stat as soon as memory usage is really high.

The script above is very aggressive and triggers the pathological issue; it is quite possible you are not even pushing against malloc limits. The only way to find out is to measure.
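A rough way to automate that check is to compare the limits GC.stat reports against the max values you expect. This is a sketch: the method name limits_past_max? is made up, the defaults are the 32MB and 128MB max values listed earlier (pass your own if you set the environment variables), and it falls back to the renamed *_bytes_limit keys of later Rubies.

```ruby
MB = 1024 * 1024

# Hypothetical check: true when either malloc limit has grown past the
# max it is supposed to be capped at, which is the symptom of the bug.
def limits_past_max?(stat, malloc_max: 32 * MB, oldmalloc_max: 128 * MB)
  malloc_limit    = stat[:malloc_limit]    || stat[:malloc_increase_bytes_limit]
  oldmalloc_limit = stat[:oldmalloc_limit] || stat[:oldmalloc_increase_bytes_limit]

  (!malloc_limit.nil? && malloc_limit > malloc_max) ||
    (!oldmalloc_limit.nil? && oldmalloc_limit > oldmalloc_max)
end

puts "limits past max: #{limits_past_max?(GC.stat)}"
```

If this prints true on a 2.1.1 process with high memory usage, you are very likely hitting the bug; if it prints false, your memory growth is coming from somewhere else.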

General memory growth under Ruby 2.1.1

A more complicated issue we need to tackle is the more common “memory doubling” issue under Ruby 2.1.1. The general complaint goes something along the lines of “I just upgraded Ruby and now my RSS has doubled”.

This issue is described in detail here: Bug #9607: Change the full GC timing - Ruby trunk - Ruby Issue Tracking System

Memory usage growth is partly unavoidable when employing a generational GC. A certain section of the heap is getting scanned far less often. It’s a performance/memory trade-off. That said, the algorithm used in 2.1 is a bit too simplistic.

If an object ever survives a minor GC it will be flagged as oldgen; such objects will only be scanned during a major GC. This algorithm is particularly problematic for web applications.

Web applications perform a large amount of “medium” lived memory allocations. A large number of objects are needed for the lifetime of a web request. If a minor GC hits in the middle of a web request we will “promote” a bunch of objects to the “long lived” oldgen even though they will no longer be needed at the end of the request.

This has a few bad side effects:

It forces major GC to run more often (growth of oldgen is a trigger for running a major GC)

It forces the oldgen heaps to grow beyond what we need.

A bunch of memory is retained when it is clearly not needed.

.NET and Java employ 3 generations to overcome this issue. Survivors in Gen 0 collections are promoted to Gen 1 and so on.

Koichi is planning on refining the current algorithm to employ a somewhat similar technique of deferred promotion. Instead of promoting objects to oldgen on the first minor GC, an object will have to survive two minor GCs to be promoted. This means that if no more than 1 minor GC runs during a request, our heaps will be able to stay at optimal sizes. This work is already prototyped in Ruby 2.1; see RGENGC_THREEGEN in gc.c (note, the name is likely to change). This is slated to be released in Ruby 2.2.
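The difference between the two promotion rules is easy to see in a toy model. Everything here is made up for illustration (ToyObject, minor_gc, promoted_during_request); real promotion happens inside gc.c and tracks far more state. The model simply counts how many request-scoped objects end up in oldgen when minor GCs fire mid-request.

```ruby
# Toy model only: an "object" is just an age counter and an old flag.
ToyObject = Struct.new(:age, :old)

# Every live young object survives this minor GC and ages by one;
# it is promoted once it has survived promote_after minor GCs.
def minor_gc(live_objects, promote_after)
  live_objects.each do |obj|
    next if obj.old
    obj.age += 1
    obj.old = true if obj.age >= promote_after
  end
end

# How many of a request's objects are promoted if the given number of
# minor GCs run while the request is still in flight.
def promoted_during_request(promote_after, minor_gcs_in_request)
  request_objects = Array.new(1000) { ToyObject.new(0, false) }
  minor_gcs_in_request.times { minor_gc(request_objects, promote_after) }
  request_objects.count(&:old)
end

puts "promote on 1st survival, 1 minor GC:  #{promoted_during_request(1, 1)}"
puts "promote on 2nd survival, 1 minor GC:  #{promoted_during_request(2, 1)}"
puts "promote on 2nd survival, 2 minor GCs: #{promoted_during_request(2, 2)}"
```

With promotion on first survival, a single badly timed minor GC moves every in-flight object to oldgen. With deferred promotion, nothing is promoted unless a second minor GC also lands inside the same request.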

We can see this problem in action using this somewhat simplistic test:

@retained = []
@rand = Random.new(999)

MAX_STRING_SIZE = 100

def stress(allocate_count, retain_count, chunk_size)
  chunk = []
  while retain_count > 0 || allocate_count > 0
    if retain_count == 0 || (@rand.rand < 0.5 && allocate_count > 0)
      chunk << " " * (@rand.rand * MAX_STRING_SIZE).to_i
      allocate_count -= 1
      if chunk.length > chunk_size
        chunk = []
      end
    else
      @retained << " " * (@rand.rand * MAX_STRING_SIZE).to_i
      retain_count -= 1
    end
  end
end

start = Time.now

# simulate rails boot, 2M objects allocated 600K retained in memory
stress(2_000_000, 600_000, 200_000)

# simulate 100 requests that allocate 100K objects
stress(10_000_000, 0, 100_000)

puts "Duration: #{(Time.now - start).to_f}"

puts "RSS: #{`ps -eo rss,pid | grep #{Process.pid} | grep -v grep | awk '{ print $1; }'`}"

In Ruby 2.0 we get:

% ruby stress.rb
Duration: 10.074556277
RSS: 122784

In Ruby 2.1.1 we get:

% ruby stress.rb
Duration: 7.031792076
RSS: 236244

Performance has improved, but memory almost doubled.

To mitigate the current pain point we can use the new RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR environment var.

Out of the box we trigger a major GC if our oldobject count doubles. We can tune this down to, say, 1.3 times and see a significant improvement memory-wise:

% RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.3 ruby stress.rb
Duration: 6.85115156
RSS: 184928

On memory constrained machines we can go even further and disable generational GC altogether.

% RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 ruby stress.rb
Duration: 6.759709765
RSS: 149728

We can always add jemalloc for good measure to shave off an extra 10% or so:

% LD_PRELOAD=/home/sam/Source/jemalloc-3.5.0/lib/libjemalloc.so RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 ruby stress.rb
Duration: 6.204024629
RSS: 144440

If that is still not enough you can push malloc limits down (and have more GCs run due to hitting them):

% RUBY_GC_MALLOC_LIMIT_MAX=8000000 RUBY_GC_OLDMALLOC_LIMIT_MAX=8000000 LD_PRELOAD=/home/sam/Source/jemalloc-3.5.0/lib/libjemalloc.so RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 ruby stress.rb
Duration: 9.02354988
RSS: 120668

This gets us back to Ruby 2.0 memory numbers, but we have also lost a pile of performance.
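Running these combinations by hand gets tedious, so here is a small runner that launches a script once per setting and prints what it reports. This is a sketch: the labels, the TUNINGS table and the run_with helper are made up for illustration, the values are the ones used in the runs above, and jemalloc is left out because the LD_PRELOAD path is specific to each machine.

```ruby
require "open3"
require "rbconfig"

# Environment overrides for each run. Labels are arbitrary.
TUNINGS = {
  "defaults"       => {},
  "oldobject x1.3" => { "RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR" => "1.3" },
  "oldobject x0.9" => { "RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR" => "0.9" },
  "x0.9 + low malloc max" => {
    "RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR" => "0.9",
    "RUBY_GC_MALLOC_LIMIT_MAX"            => "8000000",
    "RUBY_GC_OLDMALLOC_LIMIT_MAX"         => "8000000"
  }
}

# Runs the script in a child Ruby (same interpreter as this one) with
# the extra environment variables set, and returns what it printed.
def run_with(env, script)
  output, _status = Open3.capture2(env, RbConfig.ruby, script)
  output
end

# Usage: ruby compare.rb stress.rb
script = ARGV[0]
if script
  TUNINGS.each do |label, env|
    puts "== #{label}"
    puts run_with(env, script)
  end
end
```

Each child process starts with a fresh heap, so the Duration and RSS lines are comparable between runs on the same machine.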

Ruby 2.1 is ready for production

Ruby 2.1 has been running in production at GitHub for a few months with great success. The 2.1.0 release was a little rough; 2.1.1 addresses the majority of the big issues it had. 2.1.2 will address the malloc issue, which may or may not affect you.

If you are considering deploying Ruby 2.1 I would strongly urge giving GitHub Ruby a go since it contains a fairly drastic performance boost due to funny-falcon's excellent method cache patch.

Performance has much improved at the cost of memory. That said, you can tune memory as needed and measure the impact of various settings effectively.

Summary

If you discover any issues, please report them on https://bugs.ruby-lang.org/

Use Ruby 2.1.1 in production, and upgrade to 2.1.2 as soon as it is released.

Be sure to look at jemalloc and GC tuning for memory constrained systems. See also: Feature #9113: Ship Ruby for Linux with jemalloc out-of-the-box - Ruby trunk - Ruby Issue Tracking System

Always be measuring. If you are seeing issues, run GC.stat. You can attach to the rogue process using rbtrace, a gem you should consider including on production systems.

Resources: