Gleb Arshinov

The News : Ruby on Rails performance is dominated by garbage collection. We present a set of patches to greatly improve Rails performance and show how to profile memory usage to get further performance gains.

What's at Stake : Rails is slow for many uses and did not lend itself well to optimization. Significant performance gains could only be achieved at application level at large development cost.

The Upside : 5x potential performance gains; easy way to identify whether GC is a bottleneck; deterministic process to fix memory bottlenecks; set of canned patches to solve the biggest problems; you can help



I don't have time, make my app 5 times faster!

Apply the monkey patches below. Enjoy.

Time your app with garbage collection disabled to figure out how much more performance you are leaving on the table. Enjoy.

Come back for more patches. Enjoy.

What's going on here?

Ruby runtime uses garbage collection. Your code allocates memory as it runs. From time to time garbage collector kicks in, halts the whole system, and cleans up no-longer-referenced memory using mark-and-sweep algorithm.

GC gets triggered by multiple conditions. The one that matters the most is 8M trigger. Every time you allocate 8M of memory GC runs. Complex Rails request can allocate hundreds or even thousands of megabytes of memory, making GC runs dozens of times. Each GC pass takes 50-150ms. You do the math.

It's possible to patch ruby interpreter, increase the triggers and reduce frequency of garbage collection at the cost of additional memory use. This helps, but does not resolve the problem. If there is more garbage, each GC run takes longer. We'll devote a separate post to this later.

But, I am running ruby-prof already, you ignorant fool!?

And so are we. It does not really help:

ruby-prof attributes full cost of garbage collection to the method where GC gets triggered, not the methods that allocate the memory . That's like blaming the last straw for breaking the camel's back.

. That's like blaming the last straw for breaking the camel's back. ruby-prof does not show that garbage collection took place and how much time it took. All you can tell is that some method took a long time.

If you go by the these result, and remove the straw that's breaking the camel's back his load does not get much lighter.

Statistics being on your side, things are not entirely hopeless. GC gets triggered within malloc, so methods doing a lot allocations are also likely to trigger GC. Further, simply allocating memory is relatively expensive, and that will also show up in profiler output.

You might get lucky, optimize ruby-prof hotspot and get better performance, but more often than not you will fail. The correlation is just too weak. We believe this is exactly why Rails resisted performance optimization. Up till now.

Profiling memory allocations

Ruby-prof is almost the right tool for memory profiling. "Almost" is because the free cheese is only found in the mousetrap. You'll have to do some work first.

You need to build patched ruby (patch here) and latest ruby-prof (patch here). With both tools patched, set RubyProf.measure_mode = RubyProf::MEMORY before running the profiler.

Make sure to turn off GC when profiling for pure performance. You'll get get much more representative results.

The detailed HOWTO and description on the process is in our next blog post.

Will it help me?

In general your code will benefit most if you:

Create or update a lot of rows using ActiveRecord

Run large number of SQL statements of any kind using ActiveRecord, e.g. load many AR object through separate #find calls

Use ActiveView helpers, especially in a loop, to generate complex HTML

Use has_many AR association

It's easy to tell for sure if you have performance tests. All that "performance test" means is that you can run a single request and time how long it takes.

Just run the same test as usual, and then with GC disabled and compare the results. Most code will run much faster with GC disabled. E.g. for Acunote we test how long it takes to copy 120 tasks from one sprint from another.

Before optimizations with GC: 14 sec Before optimizations w/out GC: 6 sec

You can't run ruby without GC in production. It will rapidly run out of memory and crash. You can, however, apply optimization, produce less garbage and get your code be even faster than original benchmark with GC turned off!

After optimizations with GC: 3 sec

The code that allocates little memory, but is algorithmically complex and is slow because of that, will see little improvement. For example Acunote's burndown and sprint completion date calculations fall squarely into this category and did not benefit. These aside, Acunote performance was dominated by the cost of memory allocation.

Give me the patches!

Problem Patch ActiveRecord::Base#create calls attributes_with_quotes twice, ActiveRecord::Base#update once. #attributes_with_quotes calls #attributes , which clones all attribute values. None of these clones are necessary.



Impact:

When copying 120 tasks in Acunote this costs 650M. Patch improves performance from 14s to 6s.

Notes:

Helps when you do a lot of creates and updates through ActiveRecord. Rails 1.2 module ActiveRecord class Base private def attributes_with_quotes ( include_primary_key = true ) result = {} @attributes . each_key do | name | if column = column_for_attribute ( name ) result [ name ] = quote_value ( read_attribute ( name ), column ) unless ! include_primary_key && column . primary end end result end end end Rails 2.0 module ActiveRecord class Base private def attributes_with_quotes ( include_primary_key = true , include_readonly_attributes = true ) quoted = {} @attributes . each_pair do | name , value | if column = column_for_attribute ( name ) quoted [ name ] = quote_value ( read_attribute ( name ), column ) unless ! include_primary_key && column . primary end end include_readonly_attributes ? quoted : remove_readonly_attributes ( quoted ) end end end Rails surrounds every SQL call and ActionView::Base#render with Benchmark#realtime . Benchmark#realtime allocates unnecessary 45k per call.



Impact:

When copying 120 tasks in Acunote this costs 120M. Patch improves performance from 6s to 5s.

Notes:

Helps when you do read, create, update lots of AR records, or execute any other SQL, or call #render(:partial) in a loop. module Benchmark def realtime r0 = Time . now yield r1 = Time . now r1 . to_f - r0 . to_f end module_function :realtime end Rails 1.2 tries to pretty-print and log every SQL statement even when logger is in Logger::INFO mode (production environment) and prints nothing.



Impact:

When copying 120 tasks in Acunote this costs 17M. Patch improves performance by 1s.

Notes:

Helps when you do read, create, update lots of AR records, or execute any other SQL in a loop.

Patch only applicable to Rails 1.2. Rails 2.0 has the correct log_info implementation. Rails 1.2 module ActiveRecord module ConnectionAdapters class AbstractAdapter protected def log_info_with_level_check ( sql , name , runtime ) return unless @logger and @logger . level == Logger :: DEBUG log_info_without_level_check ( sql , name , runtime ) end alias_method_chain :log_info , :level_check end end end Rails 2.0

No patch necessary. Comparison of BigDecimal's (used by Rails for Numeric data types) to booleans is slow because it requires unnecessary method_missing call and exception catch.



Impact:

When rendering 120 tasks in Acunote this costs 4M. Patch improves performance by 100-120ms.

Notes:

Helps when you have the imprudence to accidentally compare BigDecimal's with true or false. class BigDecimal alias_method :eq_without_boolean_comparison , :== def eq_with_boolean_comparison ( other ) return false if [ FalseClass , TrueClass ]. include? other . class eq_without_boolean_comparison ( other ) end alias_method :== , :eq_with_boolean_comparison end ActionView::Helpers::JavaScriptHelper #link_to_function allocates lots of memory when run. It runs every time view is rendered.



Impact:

When rendering 120 tasks in Acunote this costs 5M. This improves performance by 100-110ms.

Notes:

Helps when you have rich AJAX page with lots of dynamic elements. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests. Hand-replace link_to_function with <a href="#" onclick="..."/> . ActionView::Helpers::FormTagHelper #text_field_tag allocates lots of memory when run. It runs every time view is rendered.



Impact:

When rendering 120 tasks in Acunote this costs 2M. This improves performance by 40-50ms.

Notes:

Helps when you have page with lots of input boxes. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests. Hand-replace text_field_tag with <input type="text"/> . ActionView::Helpers::AssetTagHelper #image_tag allocates lots of memory when run. It runs every time view is rendered.



Impact:

When rendering 120 tasks in Acunote this costs 8M. This improves performance by 110-150ms.

Notes:

Helps when you have page with lots of images. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests. image_tag itself can be optimized. Help wanted. If you don't use asset hosts, hand-replace image_tag with <img src="..."/> .

If you use asset hosts, there's no patch. Passing a block to a method on instance of ActiveRecord::Associations ::{HasAndBelongsToManyAssociation /etc.} chews up memory.

An example (from activerecord/lib /activerecord/associations.rb) is association.select { |record| record.new_record? } line inside #add_multiple_associated _save_callbacks method.



Impact:

This is used all over ActiveRecord code, and often by application code.

Notes:

We have been unable to figure out the root cause. Help wanted. Separate article to come. None known.

Sometimes it helps to convert association to array before using, for example association.to_a.select { |record| record.new_record? }

How can I help?

Profile memory allocations for your Rails application. Figure out and fix hotspots. Post or link to patches and notes in the comments section. We'll be updating this post, as you and we figure out more.

section. We'll be updating this post, as you and we figure out more. If you are writing a large commercial Rails application, have one of you engineers do some memory profiling. Fast is a feature, and it's a nice way to contribute to the community.

Help us figure out the some of the issues we found, but haven't been able to figure out yet.

Submitted Patches

Benchmark#realtime patch

Patch accepted. Matz committed it to Ruby 1.8 branch. Also, included in Rails trunk.

patch Patch accepted. Matz committed it to Ruby 1.8 branch. Also, included in Rails trunk. ActiveRecord::Base#attributes_with_quotes patch

Patch accepted. Michael Koziarski committed it to Rails trunk.

Thanks To

Alexander Dymo for actually doing all the work

Alexander Goldstein for directing us to look at GC in the first place

DHH and rails-core for giving us something to optimize, and for making Rails code so easy to understand and modify

Charlie Savage and Shugo Maeda for ruby-prof

Sylvain Joyeux's for original memory allocations number measurement patch

Stefan Kaes for all his performance work, railsbench and especially GC patch

Sightings

Updates

Update 1: noted that patches are against 1.2.4 Requested patches against 2.0

Update 2: added patches against Rails 2.0

Update 3: added "submitted patches" section with upstream patches

Update 4: #realtime and #attributes_with_quotes patches accepted upstream

Update 5: added Sightings section