in /tech/ruby

We’re pretty obsessed with performance at Gilt Groupe. You can get a taste for what we’re dealing with, and how we’re dealing with it, from our recent presentation at RailsConf.

One of the techniques we’re using is to precompute what certain high-volume pages will look like at a given time in the future, and store the result as static HTML that we serve to the actual users at that time. For ease of initial development, and because there’s still a fair bit of business logic involved in determining which version of a particular page to serve, this was done inside our normal controller actions which look for a static file to serve, before falling back to generating it dynamically.

We’re now running on Rails 2.3 and, of course, Rails Metal is the new hotness in 2.3. I spent the last couple days looking into how much improvement in static file serving we would see by moving it into the Metal layer. Based on most of what I’ve read, I expected we might shave off a couple milliseconds. This expectation turned out to be dramatically wrong.

Metal components operate outside the realm of the usual Rails timing and logging components, so you don’t get any internal measurements of page performance. Instead, I fired up ab to measure the serving times externally. What I found for the page I was benchmarking was that the Metal implementation took about 5ms. The old controller action took 170ms. But, wait... the Rails logs were only reporting 8ms for that action. Something was fishy.

I started inserting timers at various places in the Rails stack, trying to figure out where the other 160ms was going. A little bit was routing logic and other miscellaneous overhead, but even setting a timer around the very entry points into the Rails request serving path, I was only seeing 15ms being spent. This was getting really puzzling, because at this point where a Rack response is returned to the web server, I expected things to look identical between Metal and ActionController. However, looking more closely at the response objects I discovered the critical difference. The Metal response returns an [String] , while the controller returned an ActionController::Response.

I went into the Rails source and found the each method for ActionController::Response. Here it is:

def each(&callback) if @body.respond_to?(:call) @writer = lambda { |x| callback.call(x) } @body.call(self, self) elsif @body.is_a?(String) @body.each_line(&callback) else @body.each(&callback) end @writer = callback @block.call(self) if @block end

The critical line is the case where the body is a String. The code iterates over each line in the response. Each line is written individually to the network socket. In the case of the particular page I was looking at, that was 1300 writes. Ouch.

To confirm this was the problem, I changed that line to

yield @body

With the whole body being sent in a single write, ab reported 15ms per request, right in line with what I measured inside Rails.

1 line changed. 150ms gained. Not too bad.

This sort of performance pessimization we uncovered is particularly insidious because it’s completely invisible to all the usual Rails monitoring tools. It doesn’t show up in your logged response time; you won’t see it in NewRelic or TuneUp. The only way you’re going to find out about it is by running an external benchmarking tool. Of course, this is always a good idea, but it’s easy to forget to do it, because the tools that work inside the Rails ecosystem are so nice. But the lesson here is, if you’re working on performance optimizations, make sure to always get a second opinion.