This is the C::Blocks Advent Calendar, in which I release a new treat each day about the C::Blocks library. Yesterday I showed how to declare and share C functions, which I think is C::Blocks’ killer feature. Today I will show some simple benchmarks that illustrate the performance of C::Blocks-compiled code.

Sorry that this Fifth Advent entry is being posted on the Seventh day of December! I got some really weird benchmark results and it took a while to really dig through them and figure out what was going on. I’ll have to work double-time for the next few days to get caught back up!

Let’s begin by measuring how long it takes to load C::Blocks . As a point of comparison, a script that does nothing except load strict and warnings would be this:

use strict; use warnings;

On my machine the bash built-in time usually reports a duration of 5-10ms:

$ time perl test.pl real 0m0.009s user 0m0.008s sys 0m0.000s

If we simply include C::Blocks, the time jumps up a little:

# test.pl use strict; use warnings; use C::Blocks; $ time perl test.pl real 0m0.043s user 0m0.040s sys 0m0.000s

Simply loading C::Blocks costs about 30ms. Finally, if we load PerlAPI, Types, and include a cblock that modifies a variable, we get some idea for the minimal cost of really using C::Blocks. Here’s the test script:

use strict; use warnings; use C::Blocks; use C::Blocks::PerlAPI; use C::Blocks::Types qw(uint); my uint $foo = 0; cblock { $foo = 5; }

Notice it doesn’t print anything because I don’t want the I/O system to interfere with the timing. Running this I get a running time of no more than 60ms:

$ time perl test.pl real 0m0.060s user 0m0.048s sys 0m0.008s

Thus, the minimal additional cost of doing something useful with C::Blocks (on my machine) should be at least 50ms, five times what it takes to merely load the Perl interpreter.

With these as our baseline, let’s do some more useful benchmarks. Perl is supposed to have perfomant I/O and string manipulation. What is the tradeoff between pure-Perl and C for munging the contents of a text file? We expect the C code to be lengthier, but will it lead to any appreciable performance increase? To find out, I generated text files with lines of random text, some of which included Fred. The program should create a new file with the same contents except Fred is replaced by Barney. Here is the Perl program:

use strict; use warnings; open my $out_fh, '>', 'perlout.txt'; while (<>) { s/Fred/Barney/g; print $out_fh $_; }

Here is the C::Blocks program:

use strict; use warnings; use C::Blocks; use C::Blocks::Types qw(char_array); my char_array $filename = $ARGV[0]; cblock { FILE * in_fh = fopen($filename, "r"); FILE * out_fh = fopen("cout.txt", "w"); char * original = "Fre"; int match_length = 0; int curr_char = fgetc(in_fh); while (curr_char != EOF) { if (curr_char == original[match_length]) { /* found character in sequence */ match_length++; } else if (match_length == 3 && curr_char == 'd') { /* found full name! print and reset */ fprintf(out_fh, "Barney"); match_length = 0; } else { /* incomplete match, print what we've skipped */ if (match_length) fprintf(out_fh, "%.*s", match_length, original); /* just in case we have FFred or FreFred */ if (curr_char == 'F') match_length = 1; else { match_length = 0; fputc(curr_char, out_fh); } } curr_char = fgetc(in_fh); } fclose(in_fh); fclose(out_fh); }

Here’s a breakdown of the difference between these two programs. For the Perl script, we have:

Lines of code: 7

Development time: about 10 seconds

Cleverness: none, this is a boringly obvious script

Execution time: 80ms to 130ms

As for the C script, we have:

Lines of code: 41

Development time: 45 minutes, including one failed start and two rounds of testing

Cleverness: mildly happy about tracking state with original

Execution time: 130ms to 160ms

First the caveats. The execution times are obviously specific to my machine, but the relationship between them is likely to remain the same on other platforms. If this code were compiled using gcc, it would likely run faster because the Tiny C Compiler underlying C::Blocks does not produce lightning fast C code. (We’ll get to that in a second.) Also, I am pretty sure that a more succinct C version could be written, and there’s probably another C implementation of this process that would run faster. Finally, I didn’t really hit noticeable performance issues until I processed 100,000 lines of text. Anything shorter than that was hardly any different from the startup times.

All of those caveats having been said, I doubt that the C::Blocks version can be made to run significantly faster. As such, we are left with the basic conclusion that the Perl version is easier to write, easier to understand, fewer lines of code, and comparable speed. If you are trying to write code that performs lots of I/O and string manipulation, just work with pure Perl.

The conclusion from the previous benchmark is not a surprise because I was playing to Perl’s strengths. One of Perl’s weak spots is fast numerics, like the sort of random number generator work I introduced yesterday. Is C::Blocks significantly faster than Perl? And, does it deliver performance that is comparable to what we would get with an XS module? To find out, I ran the script located at the bottom of this post. The script implements the random number generator in pure Perl, C::Blocks, and Inline::C. In these results, the N = 10 line indicates the number of iterations run through the random number generator in a single function call, so we actually have a sequence of results for a wide range of iteration count:

--- For N = 10 --- Rate Perl CBlocks Inline Perl 275691/s -- -96% -97% CBlocks 7126245/s 2485% -- -24% Inline 9331219/s 3285% 31% -- --- For N = 31 --- Rate Perl CBlocks Inline Perl 91167/s -- -97% -98% CBlocks 3243566/s 3458% -- -44% Inline 5782587/s 6243% 78% -- --- For N = 100 --- Rate Perl CBlocks Inline Perl 28981/s -- -98% -99% CBlocks 1214700/s 4091% -- -56% Inline 2752511/s 9398% 127% -- --- For N = 316 --- Rate Perl CBlocks Inline Perl 9221/s -- -98% -99% CBlocks 413537/s 4385% -- -59% Inline 1017940/s 10940% 146% -- --- For N = 1000 --- Rate Perl CBlocks Inline Perl 2901/s -- -98% -99% CBlocks 132740/s 4476% -- -61% Inline 337317/s 11528% 154% -- --- For N = 3162 --- Rate Perl CBlocks Inline Perl 922/s -- -98% -99% CBlocks 42309/s 4487% -- -62% Inline 110277/s 11856% 161% -- --- For N = 10000 --- Rate Perl CBlocks Inline Perl 293/s -- -98% -99% CBlocks 13525/s 4512% -- -62% Inline 35223/s 11911% 160% --

Above N = 10,000, the relative results were essentially unchanged.

The good news is that the C::Blocks implementation consistently beats out the Perl implementation by a significant margin. Looking at the rate, we see the C::Blocks version producing between 25x and 45x more random numbers per second than the Perl implementation. What’s more, the line counts for these two implementations are pretty similar. Switching to a C::Blocks implementation would produce a huge speedup in this portion of code at essentially the same code complexity.

On the other hand, the Inline::C implementation is clearly the fastest implementation. For 10,000 iterations and higher it will produce 2.6 random numbers with each round of the C::Blocks implementation. For some, that factor of 2.6 is crucial, and an XS-based solution is the right tool.

Real code does not boil down to benchmarks like this. The execution of real code is often governed by a number of slow points whose effect is cumulative. Speeding up the worst offender will speed up your program, but then you’ve only removed one of many slow points. While the others remain, making any satisfactory code blazingly fast is a waste of time. If you are considering using using C to speed up some critical piece of code, C::Blocks won’t give you the fastest implementation, but it’ll probably be fast enough.

Today I looked into the performance of C::Blocks for two important circumstances: a task dominated by I/O and text processing, and a task dominated by computation. For text processing Perl is clearly the right tool for the job. For numerical computation, Inline::C is clearly faster, but if the computation is only one bottleneck in a complicated program, the speedup from C::Blocks is likely good enough to move the pain-point to some other part of the code base.

C::Blocks Advent Day 1 2 3 4 5 6 7 8 9 10 11 12 13

P. S. Why was this entry so late? Apart from being pressed for time, I got benchmark results that varied wildly. I’m OK presenting benchmark results that I don’t like: I’m a bit annoyed by that factor of 2.6, but that’s an fair representation of things. I’m not OK with the kind of crazy results I was getting, sometimes showing that the C::Blocks implementation was competitive with Inline::C, and other times showing that it ran at the same speed as the Perl one. After a wide variety of implementations, I came to the conclusion that it was a CPU memory page alignment issue. The latest version on github fixes this and I will release another pre-Beta version soon (but for now, the bed is calling).