Alexander Dymo

The Battle For Performance

Rails is slow. Your rails application which does fairy simple database operations will be born slow and will eventually become slower and slower while still not doing anything fancy. That's like getting a new powerful locomotive (on Rails) together with a free bonus of overladen freight cars.

Having said that, I can virtually hear the herd of voices screaming "dude, optimize your own application and don't blame Ruby and Rails for everything". Sure thing, our application isn't perfect and I do believe it can be optimized but you can't win the performance battle just by that.

The winning tactics for this battle is guerrilla tactics. It turns out you can guerrilla-patch (aka monkey-patch) Ruby and Rails and easily get up to 3x speedup in some cases. We started doing that to our own application - Acunote and got amazing results so far. Read on to find out how.

Ruby-Prof: The Guerrilla's Weapon

Acunote, our pet rails application, often need to copy a bunch of ActiveRecord (AR) objects. As any other project management tools, our application works with tasks organized into iterations (sprints). Sometimes you want to copy tasks from current iteration/sprint to the next iteration/sprint. And boy, that copy can become really slow. As an example, let's look at the performance and profiling data of one request to copy 120 tasks.

Let's measure 3 things:

the real time request takes in production environment;

processing time profile information using the excellent ruby-prof profiler by Charlie Savage and Shugo Maeda;

memory profile information gathered with patched ruby-prof (I took Sylvain Joyeux's memory allocations measurement mode patches and hacked together my own ruby-prof memory size measurement mode).

So, for "copy 120" request we get:

Request time (without profiler): 13.6 sec. Time: Total: 16.78 %self total self wait child calls name 18.36 3.34 3.08 0.00 0.26 20766 Kernel#clone ... Allocated Memory Size (in KB): Total: 1 001 730.00 %self total self wait child calls name 61.87 646795.00 619745.00 0.00 27049.00 20766 Kernel#clone ...

"WTF" politely describes my initial reaction when I first saw that. How on earth we can spend 17 seconds and consume 1 Gigabyte of memory by just copying 120 ActiveRecord objects? Ok, ok, we not only copy them but store some more information to the database like task tags, copy events and so on. Therefore we end up saving actually 360 AR objects instead of 120. Also the memory profiler is inaccurate and usually shows a bit more memory than the process actually takes.

Still, what is going on during those 17 seconds? Why do we need a gig of RAM? Gee, on the same machine I can compile the whole ruby interpreter in 57 seconds and the compiler will never take more than 130M...

Heh, let's calm down and see what profiler says. The obvious suspect is Kernel#clone method. The (stripped) call tree for it as reported by ruby-prof is

Task#save ... ActiveRecord::Base#create_without_callbacks ActiveRecord::Base#attributes_with_quotes ActiveRecord::Base#attributes ActiveRecord::Base#clone_attributes ActiveRecord::Base#clone_attribute_value Kernel#clone

From Charlie's blog we know that AR::Base::attributes method is evil. It clones attributes before returning them and Charlie gave a wise advice to not use it. And indeed in your application is a good idea to call, for example, task['foo'] instead of task.foo or task.attributes['foo'] . But here AR::Base::create itself does the evil thing.

Let me cite Rails code (from rails/activerecord/lib/activerecord/base.rb):

def update connection.update( "UPDATE #{self.class.table_name} " + "SET #{quoted_comma_pair_list(connection, attributes_with_quotes(false))} " + "WHERE #{self.class.primary_key} = #{quote_value(id)}", "#{self.class.name} Update" ) end

def create if self.id.nil? && connection.prefetch_primary_key?(self.class.table_name) self.id = connection.next_sequence_value(self.class.sequence_name) end self.id = connection.insert( "INSERT INTO #{self.class.table_name} " + "(#{quoted_column_names.join(', ')}) " + "VALUES(#{attributes_with_quotes.values.join(', ')})", "#{self.class.name} Create", self.class.primary_key, self.id, self.class.sequence_name ) @new_record = false id end

def quoted_column_names(attributes = attributes_with_quotes) attributes.keys.collect do |column_name| self.class.connection.quote_column_name(column_name) end end

def attributes_with_quotes(include_primary_key = true) attributes.inject({}) do |quoted, (name, value)| if column = column_for_attribute(name) quoted[name] = quote_value(value, column) unless !include_primary_key && column.primary end quoted end end

Great! AR::create clones all attributes twice (first with quoted_column_names and then with attributes_with_quotes ). AR::update is a nicer guy, it clones attributes only once. For each of those 120 copied tasks we use AR::create twice and AR::update once. Therefore we call clone_attributes not less than 12022+120 = 600 times.

Profiler says we lose 3 seconds and 650M memory because of that:

ActiveRecord::Base#clone_attributes: Time: 3.21 sec Memory: 650 301.00 K

Ok. Time to employ our promised guerrilla tactics, but first let's talk about memory.

Lyrical Digression: Why Do We Care About Memory?

While guerrilla warriors take a nap before the battle, let's digress and look at why we measure the size of allocated memory. Sure thing, allocating literally tons of memory is bad. But we're using Ruby and Ruby has its own guerrilla warrior which strikes our application when it innocently eats its tasty memory.

The name of our enemy is "Ruby Garbage Collector" and the cunning enemy it is. As ruby's gc.c source says and Why the Lucky Stiff explains, ruby GC kicks in when you allocate more memory than is available on the heap and heap boundary is 8M. When we sequentially allocate 650 megabytes there's a chance that GC will be called 650/8=81 times. Each GC call accounts for approximately 70-80ms which in our case should add roundabout 5 seconds to the running time.

Let's do some math... Kernel#clone itself takes 3 seconds and we expect garbage collection to take 5 more seconds. In total, 8 seconds are wasted because of cloning. See the trend? The more memory we allocate the worse our performance is.

Here's the morale: "watch your memory". Large memory consumption costs you more than you'd otherwise think because of garbage collection.

Guerrilla patching AR::create and AR::update

Now it's time to employ our guerrilla tactics and fix ActiveRecord. It's easy to rewrite attributes_with_quotes and avoid cloning the attributes:

module ActiveRecord class Base private def attributes_with_quotes(include_primary_key = true) result = {} @attributes.each_key do |name| if column = column_for_attribute(name) result[name] = quote_value(read_attribute(name), column) unless !include_primary_key && column.primary end end result end end end

Now let's put that to environment.rb and see what we get:

Request time (without profiler): 6.7 sec. Time: Total: 9.5 %self total self wait child calls name 4.84 0.69 0.46 0.00 0.23 24023 ActiveRecord::ConnectionAdapters::Quoting#quote ... 2.22 0.52 0.18 0.00 0.34 2779 <Module::Benchmark>#realtime ... 0.00 0.00 0.00 0.00 0.00 6 Kernel#clone ... Allocated Memory Size (in KB): Total: 350 966.00 %self total self wait child calls name 34.31 125655.00 120420.00 0.00 5235.00 2779 <Module::Benchmark>#realtime ... 0.00 0.00 0.00 0.00 0.00 6 Kernel#clone ...

Ok, much better: 650 megabytes of memory saved with that little monkey patch! We estimated that will save 8 seconds. Let's look:

Original request time: 13.6 sec Request time after AR patch: 6.7 sec Time saved: 6.9 sec

It turns out our estimation was almost correct. We indeed saved not only the time taken by Kernel#clone but also the time spent by garbage collector and gained 2x speedup for free. Nice! Our guerrilla tactics clearly works and we can for sure do better than 2x.

Guerrilla-patching Benchmark.realtime

Next suspect, as seen from memory profile above, is Benchmark#realtime . From time profiler the method doesn't look bad. We spend only 180 milliseconds in it but look at memory profiler! It 180ms our Benchmark#realtime friend allocates 120 megabytes of memory. Guess what? Ruby garbage collector will be happy to kick our ass again at least 15 times and that estimates to about 1 second loss.

Let's see what's wrong in benchmark.rb:

def measure(label = "") t0, r0 = Benchmark.times, Time.now yield t1, r1 = Benchmark.times, Time.now Benchmark::Tms.new(t1.utime - t0.utime, t1.stime - t0.stime, t1.cutime - t0.cutime, t1.cstime - t0.cstime, r1.to_f - r0.to_f, label) end

def realtime(&blk) Benchmark::measure(&blk).real end

Great, we calculate lots of useless information (like cpu time, user time, etc) in Benchmark#measure and then just throw it away. Let's do better, simpler and more memory efficient:

module Benchmark def realtime r0 = Time.now yield r1 = Time.now r1.to_f - r0.to_f end module_function :realtime end

That's it. No extra object creation. And here's our income:

Request time (without profiler): 5.8 sec. Time: Total: 8.63 %self total self wait child calls name 4.29 0.69 0.46 0.00 0.23 24023 ActiveRecord::ConnectionAdapters::Quoting#quote ... 0.35 0.19 0.03 0.00 0.16 2779 <Module::Benchmark>#realtime ... Allocated Memory Size (in KB): Total: 225 668.00 %self total self wait child calls name 43.78 98805.00 98805.00 0.00 0.00 5280 Kernel#binding 0.00 1918.00 2.00 0.00 1916.00 2779 <Module::Benchmark>#realtime ...

Whoa! 120 megabytes and 1 second were saved with 9-liner guerrilla patch. Nice, isn't it? Instead of original 13.6 sec our request now runs in 5.8 sec. That's more than 2x improvement already and we can do even better, but I'll leave that for the next blog.

When Guerrilla Tactics Help

Nothing with Ruby and Rails is wrong until you do things in the loop. For example, Benchmark#realtime doesn't have any impact if all you do is save one AR object. You just allocate 25 extra objects that take 45 extra kilobytes. But you're doomed once you do that in a loop. Do the benchmark 200 times and you'll guaranteed the garbage collector will happily shoot you in the back.

Also bear in mind that garbage collector should not be running while you profile your code. With GC you'll never get repeatable results because it will kick in at different times randomly increasing the time and memory consumption of your methods.

Make Your Application Fast: Optimize MEMORY

Let me reiterate, by optimizing memory usage instead of (or in parallel with) processing time you can get significant performance improvement, and guerrilla patching memory-inefficient bits is the right way to make Ruby and Rails faster I think we're on the right track here. Memory is what we need to optimize. Memory is what wasn't optimized before. And you've just seen what we can get by optimizing memory.

We (Acunote devs) will continue the memory profiling and will blog as soon as we get new results. Our previous blog post briefly summarize our current results, be sure to read it if you haven't done that yet. Meanwhile, don't wait for us, grab ruby-prof, my memory profiler patches and dig by yourself ;) I bet you'll find more big memory eaters and when you do so, please drop us a note. Let's collect our guerrilla patches and then file them upstream.