At Codemancers, we’re building Rbkit, a fresh code profiler for the Ruby language with tonnes of cool features. I’m currently working on implementing a CPU profiler inside rbkit gem which would help rbkit UI to reconstruct the call graph of the profiled ruby process and draw useful visualizations on the screen. I learned a bunch of new things along the way and I’d love to share them with you in this series of blog posts.

We’re going to start from the fundamentals and step by step, we’re going to write a rudimentary CPU profiler for Ruby ourselves! By the time we finish we would learn :

What’s CPU profiling

Profiling modes - Instrumentation and Sampling

CPU time and Wall time - what they mean and how to measure them

Writing a simple C extension and using it in Ruby land

Ruby tracepoints - call and return

and Signal handling in C

Pausing Ruby process using a signal and peeking at the call stack

Some useful and silly experiments using profiling data

Part I. An Introduction to CPU Profiling

By doing a CPU profile of your program, you can find out how expensive your program is with respect to CPU usage. In order to profile your program, you’ll need to use a profiling tool and follow the following steps :

Start CPU profiling Execute the code you want to profile Stop CPU profiling and get profiling result Analyze result

By analyzing the profiling result, you can find the bottlenecks which slow down your whole program.

Profiling modes

CPU profiling is done in broadly 2 ways:

1. Instrumentation

In this mode, the profiling tool makes use of some hooks, either provided by the interpreter or inserted into the program, to understand the call graph and measure the execution time of each method in the call graph.

For example, consider the following piece of Ruby code :

def main 3 . times do find_many_square_roots find_many_squares end end def find_many_square_roots 5000 . times { | i | Math . sqrt ( i )} end def find_many_squares 5000 . times { | i | i ** 2 } end main

I’ve inserted some comments to help understand how the hooks would get executed if the Ruby interpreter gave us method call and return hooks:

def main # method call hook gets executed 3 . times do find_many_square_roots find_many_squares end # method end hook gets executed end def find_many_square_roots # method call hook gets executed 5000 . times { | i | Math . sqrt ( i )} # method end hook gets executed end def find_many_squares # method call hook gets executed 5000 . times { | i | i ** 2 } # method end hook gets executed end main

Now if we could print the current time and the name of the current method inside these hooks, we’d get an output which looks somewhat like this :

sec:00 usec:201007 called main sec:00 usec:201108 called find_many_square_roots sec:00 usec:692123 returned find_many_square_roots sec:00 usec:692178 called find_many_squares sec:00 usec:846540 returned find_many_squares sec:00 usec:846594 called find_many_square_roots sec:01 usec:336166 returned find_many_square_roots sec:01 usec:336215 called find_many_squares sec:01 usec:484880 returned find_many_squares sec:01 usec:484945 called find_many_square_roots sec:01 usec:959254 returned find_many_square_roots sec:01 usec:959315 called find_many_squares sec:02 usec:106474 returned find_many_squares sec:02 usec:106526 returned main

As you can see, this output can tell us how much time was spent inside each method. It also tells us how many times each method was called. This is roughly how instrumentation profiling works.

Pros :

High accuracy

We get method invocation count

Easy to implement

Cons :

Extra overhead of executing hooks for each and every method that’s profiled

2. Sampling

In this mode of profiling, the profiler interrupts the program execution once in every x unit of time, and takes a peek into the call stack and records what it sees(called a “sample”). Once the program finishes running, the profiler collects all the samples and finds out the number of times each method appears across all the samples.

Hard to visualize? Let’s look at the same example code and see how different the output would be, if we used a sampling profiler.

Output from a sampling profiler would look like this :

Call stack at 0.5sec: main/find_many_square_roots Call stack at 1.0sec: main/find_many_square_roots Call stack at 1.5sec: main/find_many_square_roots Call stack at 2.0sec: main/find_many_squares

In this example, the process was interrupted every 0.5 second and the call stack was recorded. Thus we got 4 samples over the lifetime of the program and out of those 4 samples, find_many_square_roots is present in 3 and find_many_squares is present in only one sample. From this sampling, we say that find_many_square_roots took 75% of the CPU where as find_many_squares took only 25%. That’s roughly how sampling profilers work.

Pros :

Negligible overhead compared to instrumentation profiling

Easy to find slow/long running methods

Cons :

Not good at measuring short running methods

We don’t get method invocation count

Not easy to write a sampling profiler on your own

Recap

We just looked into what CPU profiling means and the 2 common strategies of CPU profiling. In part 2, we’ll explore the 2 units of measuring CPU usage - CPU time and Wall time. We’ll also get our hands dirty and write some code to get these measurements. Thanks for reading!

If you have any questions or feedback, feel free to drop us a mail at team@codemancers.com