Ruby 2.1: Profiling Ruby

Ruby 2.1 is shipping with rb_profile_frames() , a new C-api for fetching ruby backtraces. The api performs no allocations and adds minimal cpu overhead making it ideal for profiling, even in production environments.

I've implemented a sampling callstack profiler for 2.1 called stackprof with this API, using techniques popularized by gperftools and @brendangregg. It works remarkably well, and provides incredible insight into the execution of your code.

For example, I recently used StackProf::Middlware on one of our production github.com unicorn workers. The resulting profile is analyzed using bin/stackprof :

$ stackprof data/stackprof-cpu-4120-1384979644.dump --text --limit 4 ================================== Mode: cpu(1000) Samples: 9145 (1.25% miss rate) GC: 448 (4.90%) ================================== TOTAL (pct) SAMPLES (pct) FRAME 236 (2.6%) 231 (2.5%) String#blank? 546 (6.0%) 216 (2.4%) ActiveRecord::ConnectionAdapters::Mysql2Adapter#select 212 (2.3%) 199 (2.2%) Mysql2::Client#query_with_timing 190 (2.1%) 155 (1.7%) ERB::Util#html_escape

Right away, we see that 2.6% of cpu time in the app is spent in String#blank? .

Let's zoom in for a closer look:

$ stackprof data/stackprof-cpu-4120-1384979644.dump --method 'String#blank?' String#blank? (lib/active_support/core_ext/object/blank.rb:80) samples: 231 self (2.5%) / 236 total (2.6%) callers: 112 ( 47.5%) Object#present? code: | 80 | def blank? 187 (2.0%) / 187 (2.0%) | 81 | self !~ /[^[:space:]]/ | 82 | end

We see that half the calls into blank? are coming from Object#present? .

As expected, most of the time in the method is spent in the regex on line 81. I noticed the line could be optimized slightly to remove the double negative:

def blank? - self !~ /[^[:space:]]/ + self =~ /\A[[:space:]]*\z/ end

That helped, but I was curious where these calls were coming from. Let's follow the stack up and look at the Object#present? callsite:

$ stackprof data/stackprof-cpu-4120-1384979644.dump --method 'Object#present?' Object#present? (lib/active_support/core_ext/object/blank.rb:20) samples: 12 self (0.1%) / 133 total (1.5%) callers: 55 ( 41.4%) RepositoryControllerMethods#owner 31 ( 23.3%) RepositoryControllerMethods#current_repository callees (121 total): 112 ( 92.6%) String#blank? 6 ( 5.0%) Object#blank? 3 ( 2.5%) NilClass#blank? code: | 20 | def present? 133 (1.5%) / 12 (0.1%) | 21 | !blank? | 22 | end

So Object#present? is almost always called on String objects (92.6%), with the majority of these calls coming from two helper methods in RepositoryControllerMethods .

The callers in RepositoryControllerMethods appeared quite simple, but after a few minutes of staring I discovered the fatal mistake causing repeated calls to present? :

def owner - @owner ||= User.find_by_login(params[:user_id]) if params[:user_id].present? + @owner ||= (User.find_by_login(params[:user_id]) if params[:user_id].present?) end

This simple 2 byte change removed 40% of calls to String#blank? in our app. To further minimize the cost of String#blank? in our app, we switched to @samsaffron's pure C implementation in the fast_blank gem.

The end result of all these optimizations was a dramatic reduction in cpu time (i.e. without idle time) spent processing requests:

With 2.1 and stackprof, it's easier than ever to make your ruby code fast. Try it today!