Two Surprising Things about R

2010-08-15 at 12:33 am

I see that it’s been over a year since my last post! I have a backlog of blog post ideas, but something else always seems to have higher priority. Today, though, I have some interesting (and useful) things to say about R, which I discovered in the last few days, and which shouldn’t take long to blog about. Of course, some other people may already be quite familiar with these things. Or maybe not…

First up, a useful feature of R that I hadn’t realized existed, which comes with a surprising gain in efficiency. Second, something surprisingly slow about R’s implementation of a very common operation.

First, the good thing I discovered about R. In complex mathematical expressions, it’s common to use more than one type of bracket, so that it’s easier to pair them up visually. Typical programming languages use only parentheses, other brackets having been appropriated for other uses. But it turns out that in R you can use both parentheses and curly brackets! The curly brackets are normally used to group statements, but an expression is one type of statement, and the last (or only) in the group provides the value. I’m not sure that this was always the case — I vaguely recall otherwise with some earlier version (perhaps an early version of S). But it works now.
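A quick way to confirm that the two bracket styles compute the same thing is to compare them directly. This minimal sketch uses the same values of a, b and c as the timing test. It also shows a curly-bracket group taking the value of its last expression.

```r
a <- 5; b <- 1; c <- 4

# Mixing bracket types in one expression, as in ordinary mathematical notation
x1 <- 1/{a*{b+c}}
x2 <- 1/(a*(b+c))
print(identical(x1, x2))   # TRUE: both are 1/25

# A group in curly brackets evaluates each expression in turn and
# returns the value of the last one
y <- {b+c; a*(b+c)}
print(y)                   # 25
```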

Here’s the even more surprising thing. It occurred to me that before rushing out and using this feature, I should check that it doesn’t introduce some horrible inefficiency, as might be the case if curly brackets were optimized for their more common use in grouping statements. So I did a little test, as follows:

> a <- 5; b <- 1; c <- 4
> f <- function (n) for (i in 1:n) d <- 1/{a*{b+c}}
> g <- function (n) for (i in 1:n) d <- 1/(a*(b+c))
> system.time(f(1000000))
   user  system elapsed
   3.92    0.00    3.94
> system.time(g(1000000))
   user  system elapsed
   4.17    0.00    4.17

Using curly brackets speeds the program up by about 6%!

That’s with R version 2.9.2 on a Windows XP machine. Of course I ran it several times to be sure that results were consistent. And I ran it on two other versions of R, on Intel Solaris and Sun SPARC machines, with similar results.
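Running a timing several times by hand can also be scripted. This is a minimal sketch, assuming the same f and g as in the test above. The helper name time_runs and the choice of five repetitions are illustrative, not part of the original test. On a current R release, which byte-compiles loops like these, the numbers may well differ from the 2.9.2 figures.

```r
a <- 5; b <- 1; c <- 4
f <- function (n) for (i in 1:n) d <- 1/{a*{b+c}}
g <- function (n) for (i in 1:n) d <- 1/(a*(b+c))

# Hypothetical helper: run fun(n) 'reps' times and collect the elapsed seconds
time_runs <- function (fun, n, reps = 5)
  replicate(reps, system.time(fun(n))[["elapsed"]])

tf <- time_runs(f, 1000000)
tg <- time_runs(g, 1000000)

# Look at the spread of each set of runs, then compare medians,
# which are less affected by an occasional slow run
print(summary(tf))
print(summary(tg))
print(median(tg) / median(tf))
```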

I’m having difficulty imagining how curly brackets can be more efficient than parentheses. Could there be a dispatch operation somewhere in which a curly bracket operator gets recognized faster than a parenthesis operator? But surely any such search wouldn’t be done linearly, or by any other method where this could happen. I could imagine some strange accidental effect of caching, except that it’s consistent over different versions of R and different machine architectures.
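One thing that can be checked from within R is how the two forms are represented. This sketch looks at the parse trees. Each bracket becomes an ordinary call to a primitive function named `(` or `{`. typeof also reports them as different kinds of primitive: "builtin" for `(` and "special" for `{`. Whether that difference has anything to do with the timing is only a guess, which this sketch does not test.

```r
e1 <- quote(1/(a*(b+c)))
e2 <- quote(1/{a*{b+c}})

# Element 1 of each call is `/`, element 2 is the 1, and element 3 is the
# bracketed denominator, itself a call whose function is `(` or `{`
print(e1[[3]][[1]])
print(e2[[3]][[1]])
print(class(e1[[3]]))       # "("
print(class(e2[[3]]))       # "{"

# Both bracket functions are primitives and can be called by name
print(`(`(3))               # 3
print(`{`(1, 2))            # 2, the value of the last argument

# They are different kinds of primitive: `(` receives its argument already
# evaluated, while `{` receives its arguments unevaluated and evaluates them itself
print(typeof(`(`))          # "builtin"
print(typeof(`{`))          # "special"
```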

The second surprising thing is how long it takes R to square all the numbers in a vector. Consider the following:

> a <- 1:10000
> f <- function (n) for (i in 1:n) b <- a^2
> g <- function (n) for (i in 1:n) b <- a*a
> system.time(f(1000))
   user  system elapsed
   0.58    0.00    0.58
> system.time(g(1000))
   user  system elapsed
   0.16    0.00    0.16

So multiplying the vector by itself is about 3.6 times faster than squaring it the obvious way with ^2. This is again R version 2.9.2, released 2009-08-24.
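Before relying on the replacement, it's worth checking that the two forms give the same numbers. One subtlety is easy to miss. 1:10000 is an integer vector, so a*a is computed in integer arithmetic and returns integers, whereas a^2 always returns doubles. The timing above is therefore not quite a like-for-like comparison. This sketch checks both the integer case and a double vector.

```r
a <- 1:10000                           # integer vector, as in the timing test
print(typeof(a * a))                   # "integer"
print(typeof(a^2))                     # "double"
print(isTRUE(all.equal(a^2, a * a)))   # same values, different storage types

ad <- as.numeric(a)                    # double vector, for a like-for-like check
print(identical(ad^2, ad * ad))
```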

My first thought was that R treats ^2 as a general exponentiation operation, requiring taking a log, multiplying by two, and then exponentiating. But no: as seen below, a general exponentiation takes even longer:

> h <- function (n) for (i in 1:n) b <- a^2.1
> system.time(h(1000))
   user  system elapsed
   2.15    0.00    2.15
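The "take a log, multiply, exponentiate" idea can be written out in R for comparison. This uses only the textbook identity a^p = exp(p log a) for positive a. It is not a description of what R's C code actually does. For p = 2.1 the two agree to within floating-point rounding, not bit for bit.

```r
a <- 1:10000
p <- 2.1

direct  <- a^p                  # R's own exponentiation
via_log <- exp(p * log(a))      # the log / multiply / exp identity, valid for a > 0

# Agreement up to rounding error, measured as the largest relative difference
print(isTRUE(all.equal(direct, via_log)))
print(max(abs(direct - via_log) / direct))
```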

So my guess is that R does check for an exponent being exactly two, and treats that specially, but that it does this check again and again, for every element of the vector.
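The guess above can be made concrete with a small model of the two strategies. One tests the exponent against two inside the loop over elements. The other tests it once and then does a plain elementwise multiplication. The functions pow_check_each and pow_check_once are hypothetical and are written in R only to show the control flow. They are not R's internal C code. Because an explicit R loop is slow for unrelated reasons, timing them against each other would say nothing about the cost of the check itself.

```r
# Hypothetical model of the guess: the exponent is compared with 2 once per element
pow_check_each <- function (x, y) {
  out <- numeric(length(x))
  for (i in seq_along(x))
    out[i] <- if (y == 2) x[i] * x[i] else x[i]^y
  out
}

# Hypothetical alternative: compare the exponent with 2 once, then use a vectorized op
pow_check_once <- function (x, y) {
  if (y == 2) x * x else x^y
}

x <- as.numeric(1:10000)
print(identical(pow_check_each(x, 2), pow_check_once(x, 2)))
print(isTRUE(all.equal(pow_check_each(x, 2.1), pow_check_once(x, 2.1))))
```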

The speed gain from replacing a^2 with a*a is enough to justify this replacement in time-critical code, even though it makes the program less readable. But perhaps squaring will be (has already been?) made faster in a later version.
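If readability is the worry, one compromise is to hide the multiplication behind a small function with a descriptive name. The name sq is hypothetical. The extra function call has its own overhead in R, so whether this keeps the speed gain is something to time rather than assume.

```r
# Hypothetical helper: square each element of a numeric vector by multiplication
sq <- function (x) x * x

a <- 1:10000
b1 <- sq(a)
b2 <- a^2
print(isTRUE(all.equal(b1, b2)))   # same values (b1 is integer here, b2 is double)

# Time it the same way as f and g above
h <- function (n) for (i in 1:n) b <- sq(a)
print(system.time(h(1000)))
```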


Entry filed under: R Programming, Statistics, Statistics - Computing.