Thinking inside the box , and kindly contributed to Want to share your content on R-bloggers? [This article was first published on, and kindly contributed to R-bloggers ]. (You can report issue about the content on this page here Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R 2.13.0 will be released tomorrow.

It contains a new package by R Core member

Luke Tierney: compiler. What

a simple and straightforward name, and something that Luke has been working

on for several years. The NEWS file says

o Package compiler is now provided as a standard package. See ?compiler::compile for information on how to use the compiler. This package implements a byte code compiler for R: by default the compiler is not used in this release. See the 'R Installation and Administration Manual' for how to compile the base and recommended packages.

so the compiler is not yet used for R’s own base and recommended packages.

While working on my slides for the upcoming

Rcpp workshop preceding R/Finance 2011,

I thought of a nice test example to illustrate the compiler. Last summer and fall,

Radford Neal had sent a very nice, long and detailed list of possible patches for R

performance improvements to the development list. Some patches were integrated and some more discussion ensued. One strand was on the

difference in parsing between normal parens and curly braces. In isolation, these seem too large,

but (as I recall) this is due some things ‘deep down’ in the parser.

However, some folks really harped on this topic. And it just doesn’t die as a

post from last week

demonstrated once more.

Last year, Christian Robert had whittled it down

to a set of functions, and I made a somewhat sarcastic post argueing that I’d rather

use Rcpp to get 80-fold speed increase

than spend my time argueing over ten percent changes in code that could be made faster so easily.

So let us now examine what the compiler package can do for us. The starting point is the same as last year: five variants

of computing 1/(1+x) for a scalar x inside an explicit for loop. Real code would never do it this way as vectorisation comes

to the rescue. But for (Christian’s) argument’s sake, it is useful to highlight differences in the parser. We once again use

the nice rbenchmark package to run, time and summarise alternatives:

> ## cf http://dirk.eddelbuettel.com/blog/2010/09/07#straight_curly_or_compiled > f g h j k ## now load some tools > library(rbenchmark) > ## now run the benchmark > N benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1), + columns=c("test", "replications", + "elapsed", "relative"), + order="relative", replications=10) test replications elapsed relative 5 k(N, 1) 10 9.764 1.00000 1 f(N, 1) 10 9.998 1.02397 4 j(N, 1) 10 11.019 1.12853 2 g(N, 1) 10 11.822 1.21077 3 h(N, 1) 10 14.560 1.49119

This replicates Christian’s result. We find that function k() is the fastest using curlies, and that explicit exponentiation in

function h() is the slowest with a relative penalty of 49%, or an absolute difference of almost five seconds between the 9.7

for the winner and 14.6 for the worst variant. On the other hand, function f() , the normal way of writing things, does pretty

well.

So what happens when we throw the compiler into the mix? Let’s first create compiled variants using the new cmpfun() function and then try again:

> ## R 2.13.0 brings this toy > library(compiler) > lf lg lh lj lk # now run the benchmark > N benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1), + lf(N,1), lg(N,1), lh(N,1), lj(N,1), lk(N,1), + columns=c("test", "replications", + "elapsed", "relative"), + order="relative", replications=10) test replications elapsed relative 9 lj(N, 1) 10 2.971 1.00000 10 lk(N, 1) 10 2.980 1.00303 6 lf(N, 1) 10 2.998 1.00909 7 lg(N, 1) 10 3.007 1.01212 8 lh(N, 1) 10 4.024 1.35443 1 f(N, 1) 10 9.479 3.19051 5 k(N, 1) 10 9.526 3.20633 4 j(N, 1) 10 10.775 3.62673 2 g(N, 1) 10 11.299 3.80310 3 h(N, 1) 10 14.049 4.72871

Now things have gotten interesting and substantially faster, for very little cost. Usage is straightforward: take your function and compile

it, and reap more than a threefold speed gain. Not bad at all. Also of note, the difference between the different expressions essentially

vanishes. The explicit exponentiation is still the loser, but there may be an additional explicit function call involved.

So we do see the new compiler as a potentially very useful addition. I am sure more folks will jump on this and run more

tests to find clearer corner cases. To finish, we have to of course once more go back to

Rcpp for some real gains, and some comparison between compiled

and byte code compiled code.

> ## now with Rcpp and C++ > library(inline) > ## and define our version in C++ > src (ns); + double x = as (xs); + for (int i=0; i l ## now run the benchmark again > benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1), + l(N,1), + lf(N,1), lg(N,1), lh(N,1), lj(N,1), lk(N,1), + columns=c("test", "replications", + "elapsed", "relative"), + order="relative", replications=10) test replications elapsed relative 6 l(N, 1) 10 0.120 1.0000 11 lk(N, 1) 10 2.961 24.6750 7 lf(N, 1) 10 3.128 26.0667 8 lg(N, 1) 10 3.140 26.1667 10 lj(N, 1) 10 3.161 26.3417 9 lh(N, 1) 10 4.212 35.1000 5 k(N, 1) 10 9.500 79.1667 1 f(N, 1) 10 9.621 80.1750 4 j(N, 1) 10 10.868 90.5667 2 g(N, 1) 10 11.409 95.0750 3 h(N, 1) 10 14.077 117.3083

Rcpp still shoots the lights out by a factor of 80 (or even almost 120 to the

worst manual implementation) relative to interpreted code. Relative to the compiled byte code, the speed difference is about 25-fold. Now,

these are of course entirely unrealistic code examples that are in no way, shape or form representative of real R work. Effective speed

gains will be smaller for both the (pretty exciting new) compiler package and also for our C++ integration package

Rcpp.

Before I close, two more public service announcements. First, if you use Ubuntu see

this post by Michael on r-sig-debian

announcing his implementation of a suggestion of mine: we now have R alpha/beta/rc builds via his Launchpad PPA. Last Friday, I had the

current R-rc snapshot of R 2.13.0 on my Ubuntu box only about six hours after I (as Debian maintainer for R) uploaded the underlying new

R-rc package build to Debian unstable. This will be nice for testing of upcoming releases. Second, as I mentioned, the

Rcpp workshop

on April 28 preceding

R/Finance 2011 on April 29 and 30 still has a few slots

available, as has the conference itself.