Last month I described a program I'd written in Perl for a machine translation project, and how it turned out to be much slower than I expected. As you may recall, I rewrote the program in C++, and it was a hundred times faster. I suspected this was due to all the conversion back and forth between strings and floating point numbers in the Perl version, and to all the string copying necessary for function calls. Some people (in person and in the comments) suggested I try rewriting it in Python, which has native floating point numbers, to see if it was any faster.

It wasn't. I'd never written a Python program before, so I timed myself as I ported and debugged it: about an hour and a half. When I finally had it working, it ran about as slowly as the Perl version. A full run of the Perl version takes 20 hours, and I wasn't feeling patient enough to let the Python version run to completion, but eyeballing the program's progress indicator showed the Perl and Python versions were about the same speed, while the C++ version was much faster.

This implies it's not actually all the string conversion and copying that are making the program slow, because Python doesn't do those. So what's slowing Perl and Python down? I decided to find out. I wrote little programs called add, mult, and func, in C++, Python, and Perl—nine programs altogether. Each consists of a single loop, and the names tell you what happens in that loop: add adds integers, mult multiplies floats, and func calls a dummy function. (In fact, mult and func also do an integer add for the loop counter, so they're really like add plus an additional operation.) All of them take a single command-line argument which tells them how many times to loop.

I started running them, increasing the number of iterations by a factor of ten each time, until I got some nice, macroscopic timings that were big enough to swamp any fixed startup cost. The magic number was 10,000,000 iterations. I've collated the timings, which are the "real" times reported by the Unix time command, in the table below. The "Slowdown" column tells how many times slower than the fastest (C++) version each program is:

Program   Language   Time (ms)   Slowdown
add       C++               23        1 x
add       Perl            1464       64 x
add       Python          3242      141 x
mult      C++               99        1 x
mult      Perl            2686       27 x
mult      Python          5285       53 x
func      C++               43        1 x
func      Perl            5143      120 x
func      Python          6413      149 x

As you can see, it's not specifically the floating point operations or the function calls that are causing the trouble. Even a tight loop containing nothing but integer operations is 64 times slower (Perl) or 141 times slower (Python) than the C++ version. Doing floating point multiplications or function calls in the loop affects the timings a bit, but the results are still roughly proportional: C++ is fast, Perl is an order of magnitude or two slower, and Python is roughly another factor of two slower than that.

This leads me to think that the performance differences are due almost entirely to the fact that Python and Perl are interpreted languages. I guess there's a lot of per-operation overhead in the interpreter's main loop, which has to decode and dispatch each statement (or each bytecode instruction, in the case of Python) every time through the loop, instead of running native machine instructions directly. At least, I'm pretty sure that's what's going on in the interpreters; if anyone knows better, I'd be curious to hear about it.
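To get a concrete feel for that per-iteration overhead, here's a small illustration I've added (it wasn't part of the original benchmarks): Python's standard dis module can print the bytecode for the add loop, showing the handful of instructions the interpreter has to dispatch on every single pass.

```python
import dis

def add_loop(count):
    # The same loop as add.py: each pass executes several bytecode
    # instructions (load, compare, add, store, jump), and the
    # interpreter dispatches every one of them individually.
    i = 0
    while i < count:
        i += 2
    return i

# Print the bytecode; a C compiler turns the same loop into a
# couple of machine instructions (or optimizes it away entirely).
dis.dis(add_loop)
```

The exact opcode names vary between Python versions, but the point is the same: one line of Python costs several dispatched instructions per iteration.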

These results are pretty disappointing. I like the high-level scripting languages, with all their built-in convenience, for quick-and-dirty programming. As I said, this was my introduction to Python, which I'm about to use for an unrelated project, and it's a pretty nice language: very terse, kind of like pseudo-code that actually runs. Unfortunately, for any project that's going to be computationally intensive (which is pretty much the definition of "computational linguistics") a 27-to-149-times performance hit just isn't acceptable. With Python (and maybe Perl, I'm not sure) it's possible to write the innermost loops in C or C++, but I don't really like that solution, for two reasons. First, as long as I'm going to have to write some C++ for speed, I might as well write it all in C++ and have the whole program be fast. Second, if I ever distribute the code, it narrows the potential audience to people who know, and have installed, two languages instead of just one.
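For what it's worth, the mechanics of that mixed-language approach aren't too bad. Here's a minimal sketch (my addition, not something from the original project) using Python's ctypes module to call a compiled C function. I'm calling libc's abs just to show the mechanism; the same steps work for a shared library you compile yourself containing a hot inner loop.

```python
import ctypes
import ctypes.util

# Load the C standard library. For your own code you'd compile a
# shared library (e.g. gcc -shared -fPIC -o loops.so loops.c)
# and pass its path to ctypes.CDLL instead.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-5))  # prints 5
```

The function body runs at C speed; you only pay interpreter overhead at the boundary, which is why this pattern helps most when each call does a lot of work.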

Hmm, now I'm wondering how Java would perform. No! Must... resist! Must... finish... generals paper...

[Note: This is actually the third version of this post. The numbers reported in the original version were wrong—even though I'd called g++ with the -O0 option, it still did some optimizations that threw the timings off, especially for mult. After this was pointed out in the comments, I tried to do a quick fix and ended up screwing it up much worse. Oops.

In particular, multiplying by -1.0 was being compiled into an exclusive-or instruction instead of a floating point multiplication, so it ran too fast. My quick fix was to multiply by -1.1 instead, but raising -1.1 to the ten millionth power overflowed repeatedly and wasted a bunch of time in floating point exception handling. That made it run way too slow. The real fix was to multiply twice in each loop, once by 10.0 and once by 0.1 (and do the loop half as many times). That correctly compiles to floating point multiplication instructions and never overflows. I also had to make all the other programs increment their loop counters by two in order to get an addition instruction instead of a special-purpose increment instruction, but the performance difference there was very small.

I decided that having an update and an update to that update in this post would be hopelessly confusing, so I rewrote the whole thing instead. Thanks to Joshua "Necro" Macy, Russell "Mimsy was the" Borogove, and the mysterious KK for pointing this stuff out and keeping me honest. I've left all the original comments intact, so if you're interested, you can probably reconstruct most of the sordid history of this post.

Oh, and greetings to any future hiring committees who might be reading this. Now, you might be thinking that the initial errors in this post imply a certain sloppiness on my part, but I prefer to focus on how the final, corrected version demonstrates my commitment to getting the details right... eventually.]

[Update: In response to several requests, here's the source for the various programs:

add.cpp

#include <stdlib.h>

int main(int argc, char** argv)
{
    int i = 0;
    int count = 2 * atoi(argv[1]);
    while (i < count) {
        i += 2;
    }
    return 0;
}

add.pl

#!/usr/bin/perl

$i = 0;
$count = 2 * $ARGV[0];
while ($i < $count) {
    $i += 2;
}

add.py

#!/usr/bin/python

import sys

i = 0
count = 2 * int(sys.argv[1])
while i < count:
    i += 2

mult.cpp

#include <stdlib.h>

int main(int argc, char** argv)
{
    double val = 1.0;
    int i = 0;
    int count = 2 * atoi(argv[1]);
    while (i < count) {
        val *= 10.0;
        i += 2;
        val *= 0.1;
        i += 2;
    }
    return 0;
}

mult.pl

#!/usr/bin/perl

$val = 1.0;
$i = 0;
$count = 2 * $ARGV[0];
while ($i < $count) {
    $val *= 10.0;
    $i += 2;
    $val *= 0.1;
    $i += 2;
}

mult.py

#!/usr/bin/python

import sys

val = 1.0
i = 0
count = 2 * int(sys.argv[1])
while i < count:
    val *= 10.0
    i += 2
    val *= 0.1
    i += 2

func.cpp

#include <stdlib.h>

void func(void)
{
    return;
}

int main(int argc, char** argv)
{
    int i = 0;
    int count = 2 * atoi(argv[1]);
    while (i < count) {
        func();
        i += 2;
    }
    return 0;
}

func.pl

#!/usr/bin/perl

sub func
{
}

$i = 0;
$count = 2 * $ARGV[0];
while ($i < $count) {
    &func();
    $i += 2;
}

func.py

#!/usr/bin/python

import sys

def func():
    return

i = 0
count = 2 * int(sys.argv[1])
while i < count:
    func()
    i += 2

If you have different results on your system, feel free to post the results in the comments.]

[Update: For some reason, this post seems to be accepting new comments but not displaying them. I'm hoping that this update mysteriously causes the post to republish correctly. (Get in the car and try it again!) If not, my apologies to John-Mark Gurney and Andrew.]