I’ve been working on some Python and Ruby libraries lately that wrap C extensions. An interesting and important observation came to me as I was doing some of the design.

Please note that this is not meant as commentary on the relative merits of these languages. It’s a specific observation that is useful when you are crossing the language barrier and deciding where to draw the boundary between what goes in the high-level language and what goes in C or C++.

The observation I made is that C and C++ compilers can inline, whereas interpreters for Python and Ruby generally do not.

This may seem like a mundane observation, but it means that building abstractions in the high-level language has a noticeable cost, whereas in C and C++ simple abstractions built around function calls are basically free.
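One way to see the lack of inlining directly, at least in CPython, is to disassemble a function that calls a small method in a loop. This is a minimal sketch rather than anything definitive: `loop_with_method` is a hypothetical name chosen for illustration, and because opcode names differ between CPython versions, the code only filters for names that begin with `CALL` instead of expecting a particular one.

```python
import dis


class Adder:
    def __init__(self):
        self.total = 0

    def add(self, i):
        self.total += i


def loop_with_method(n):
    # Hypothetical helper: the same loop as the toy example, wrapped in a
    # function so that it can be disassembled.
    adder = Adder()
    for i in range(n):
        adder.add(i)  # compiled as a real call, not inlined
    return adder.total


# Collect the opcode names the bytecode compiler emitted for the function.
opnames = [ins.opname for ins in dis.get_instructions(loop_with_method)]
call_ops = [name for name in opnames if name.startswith("CALL")]
print(call_ops)
```

The call to `adder.add(i)` shows up as a call instruction in the bytecode, so the interpreter has to look up the method and set up a new frame on every iteration.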

To illustrate, take this Python program:

    total = 0
    for i in range(1000000):
        total += i
    print(total)

Now suppose we want to abstract this a bit (this is a toy example, but mirrors the structure of real abstractions):

    class Adder:
        def __init__(self):
            self.total = 0

        def add(self, i):
            self.total += i

    adder = Adder()
    for i in range(1000000):
        adder.add(i)
    print(adder.total)

On my machine, the second example runs at less than half the speed of the first: it takes roughly 2.5 times as long. The same was true of Ruby when I tried equivalent programs.

    $ time python test.py
    499999500000

    real    0m0.158s
    user    0m0.133s
    sys     0m0.023s

    $ time python test2.py
    499999500000

    real    0m0.396s
    user    0m0.367s
    sys     0m0.024s
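If you would rather measure this in-process than with `time`, here is a rough sketch using the standard `timeit` module. It makes some assumptions that differ from the programs above: both loops are wrapped in functions (`inline_loop` and `method_loop` are names invented here), which changes how the interpreter accesses `total`, so the absolute numbers and the ratio will not match the shell timings exactly and will vary by machine and Python version.

```python
import timeit

N = 1000000  # same iteration count as the Python programs above


def inline_loop():
    # Hypothetical wrapper around the first program's loop.
    total = 0
    for i in range(N):
        total += i
    return total


class Adder:
    def __init__(self):
        self.total = 0

    def add(self, i):
        self.total += i


def method_loop():
    # Hypothetical wrapper around the second program's loop.
    adder = Adder()
    for i in range(N):
        adder.add(i)
    return adder.total


# Take the best of a few runs to reduce noise from the rest of the system.
inline_time = min(timeit.repeat(inline_loop, number=1, repeat=3))
method_time = min(timeit.repeat(method_loop, number=1, repeat=3))
print("inline: %.3fs  method: %.3fs  ratio: %.1fx"
      % (inline_time, method_time, method_time / inline_time))
```

Both functions compute the same sum, so any difference in time comes from the extra method call and attribute access per iteration.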

Compare this with the C++ equivalent of the first program (I used “volatile” to prevent the compiler from being too smart and collapsing the loop completely):

    #include <stdio.h>

    int main() {
      volatile long total = 0;
      for (long i = 0; i < 100000000; i++) {
        total += i;
      }
      printf("%ld\n", total);
    }

And the version with the adder abstracted into a class:

    #include <stdio.h>

    class Adder {
     public:
      Adder() : total(0) {}
      void add(long i) { total += i; }
      volatile long total;
    };

    int main() {
      Adder adder;
      for (long i = 0; i < 100000000; i++) {
        adder.add(i);
      }
      printf("%ld\n", adder.total);
    }

On my machine, not only do the two versions take the same amount of time, they compile to exactly the same machine code: the call to add() has been inlined away.

We already know that Python and Ruby are noticeably slower than C and C++ (again, not a dig; they serve different purposes), which suggests that performance-critical code should go in C or C++. But the extra observation here is that any layers or abstractions in Python or Ruby have an inherent cost, whereas in C or C++ you can layer abstractions much more freely without fear of additional overhead, particularly for functions or classes in a single source file.
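One practical consequence, sketched here with the toy Adder from earlier, is that a coarse-grained interface can amortize the per-call cost: instead of one interpreted method call per element, you make one call per batch and let the loop run in native code. The `add_many` method is a hypothetical addition for illustration, and the built-in `sum()` (implemented in C in CPython) stands in for whatever a real C extension would do.

```python
class Adder:
    def __init__(self):
        self.total = 0

    def add(self, i):
        # Fine-grained: one interpreted method call per element.
        self.total += i

    def add_many(self, values):
        # Coarse-grained (hypothetical): one method call per batch, with
        # the per-element loop running inside the built-in sum().
        self.total += sum(values)


one_at_a_time = Adder()
for i in range(1000000):
    one_at_a_time.add(i)

batched = Adder()
batched.add_many(range(1000000))

# Both approaches produce the same total.
print(one_at_a_time.total == batched.total)
```

The abstraction is still there, but it sits at a point where it is crossed once rather than a million times.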