EDIT: For all those tuning into my blog for the first time be sure to check out the Ruby threading bugfix from an earlier post: Ruby Threading Bugfix: Small fix goes a long way

In this blog post I’m releasing some patches to MRI Ruby 1.8.7p72 that add heap dumping, object reference finder, stack dumping, object allocation/deallocation tracking, and some more goodies to MRI Ruby 1.8.7p72. These are some changes to Ruby I’ve been working on at the startup I am with (Kickball Labs!), but I think they are generally useful to other people who are looking to try and track down memory problems in Ruby. Let’s get started…

Memory leaks? In Ruby ?!

Many people working on large Ruby applications have complained about memory consumption and the lack of tools available to dump the object heap and search for which objects hold references to other objects. In a future blog post, I hope to talk about different methods of garbage collection and dissect some popular languages and their GC algorithms. Very briefly though, let’s take a quick look at how Ruby’s GC works and how leaks can occur.

The garbage collector in MRI Ruby marks objects as in use during the mark phase if a reference to the object is found. The traversal is started at a set of top level root objects which are marked as in use and recurses on the objects which are referenced from the current object. In this way all reachable objects are marked as in use. The sweep phase then hits every object on the heap and marks objects as free if they are not in use.

We all try to write code which is decoupled, clean, and well documented. Let’s face it though: we are human and we make mistakes. Image a pile of objects all holding references to each other. As long as this pile of objects has no reachable reference from a path originating from a root object, these objects will get garbage collected. Unfortunately, from time to time we forget to clear out all references to old objects we are no longer using. In our pile of objects example, all it takes is one reachable reference from a root object to keep the entire pile of objects in memory.

Here at Kickball labs we thought we were encountering such a situation, but we had no easy way to verify and track it down. I hacked up a heap dumper, stack dumper, reference finder, and some other stuff to help in our quest to take out the trash.

Memtrack APIs

To start I did a bunch of google searching to find if anyone else had gone down this path before. I stumbled across a company called Software Verify, which released a really cool set of APIs, Ruby Memory Tracking API , for tracking object allocation, deallocation, getting the root heap objects, and getting references to objects. The API was really clean and allowed me to hook in my own callbacks for each of those events. One small downside though; the memtrack APIs were for ruby 1.8.6 and when I created a diff they did not apply cleanly to 1.8.7. With a tiny bit of work I was able to get APIs to apply to 1.8.7p72. The patch can be found below and applies to MRI Ruby 1.8.7-p72.

Heap/Stack dumper overview

I hooked in some callbacks to the Memtrack API and added some additional code for heap_find and stack_dump. My algorithm for heap_dump and heap_find is pretty simple. I start by traversing each of the objects on the heap and dumping them each (for heap_dump) or checking if the object has a reference to the object passed into heap_find.

I also copied some code from the mark phase of Ruby’s GC to check the contents of the registers and stack frames. I did this because Ruby checks this information before marking an object as in use. You may be wondering how this can possibly work; couldn’t any FIXNUM that gets passed around in the Ruby internals look like a heap pointer? Well… not quite. Ruby FIXNUMs are 31 bits and as a result they can be bit shifted by 1 bit and their LSB can be set to 1 without losing any data. Doing this ensures that the resulting value cannot look like a pointer to the heap.

HOWTO use the stack/heap dumper

Four functions were added to GC: heap_dump, heap_find, heap_counts, and stack_dump. After applying the patches below and rebuilding Ruby, you can use these functions. There are a bunch of environment variables you can set to control how much output you get. WARNING: There can be a lot of output.

Environment vars and what they do:

RUBY_HEAP_LOG – setting this environment var will cause heap dumping output to be written to the specified file. If this is not set data will be written to stderr

RUBY_HEAP_ALLOC – setting this environment var to anything greather than or equal to 1 will cause output whenever a ruby object is allocated or deallocated this creates a LOT of output.

RUBY_HEAP_SETUP – setting this environment var to anything greater than or equal to 1 will cause output whenever a ruby object has its klass field set.

Ok – now let’s take a quick look at the four added functions and what they do when you use them:

GC.heap_dump – this function writes to stderr (unless RUBY_HEAP_LOG is set) the contents of the Ruby heap. It outputs each object that is not marked as free and information about that object and the objects it references. For strings, for example, the value and length of the string will be output. For classes, the name of the class (unless it is an anonymous class) and the superclass.

GC.heap_find – this function takes one argument and traverses the heap looking for objects which hold a reference to the argument. Once found, it outputs information about the object holding the reference and also information about the object you passed in.

GC.heap_counts – this function is poorly implemented and can only be called after you have called GC.heap_dump. This function outputs the number of each type of Ruby object that has been created (object, node, class, module, string, fixnum, float, etc).

GC.stack_dump – this function checks the registers and stack frames for things which look like pointers to the heap. If the pointer looks like it is pointed at the heap, the address of the pointer is output as well as information about the object it references.

The good stuff: Patches for Ruby 1.8.7p72

Apply the patches below to the MRI Ruby 1.8.7p72 code base and rebuild Ruby. The code below is a bit ugly as it was written in a rush. Use with caution and be sure to email me with any bugfixes or cool additions you come up with!

Patch to apply Memtrack API to MRI Ruby 1.8.7-p72:

memtrack-api-187p72.patch

Patch to apply heap dumper, stack dumper, ref finder, alloc/dealloc tracker, and everything else which is awesome:

heapdumper.patch

Conclusion

Ruby is lacking tools for serious memory analysis. Using the Memtrack API and some other code, tracking down memory issues can be less painful. The tools included in the patches above are only a first cut at developing real, useful memory analysis tools for Ruby. I hope to see more work in creating tools for Ruby.

Thanks for reading! If you have problems using the patches or you create a cool memory analysis tool send me some mail.