I’ve been mentioning here that I’ll be discussing a new package, Rth, developed by me and Drew Schmidt, the latter of pbdR fame. It’s now ready for use! In this post, I’ll explain what goals Rth has, and how to use it.

Platform Flexibility

The key feature of Rth is in the word flexible in the title of this post, which refers to the fact that Rth can be used on two different kinds of platforms for parallel computation: multicore systems and Graphics Processing Units (GPUs). You all know about the former (it’s hard to buy a PC these days that is not at least dual-core), and many of you know about the latter. If your PC or laptop has a somewhat high-end graphics card, it enables extremely fast computation on certain kinds of problems. So, whether you have, say, a quad-core PC or a good NVIDIA graphics card, you can run Rth for fast computation, again for certain types of applications. And both multicore machines and GPUs are available in the Amazon EC2 cloud service.

Rth Quick Start

Our Rth home page tells you the GitHub site at which you can obtain the package, and how to install it. (We plan to place it on CRAN later on.) Usage is simple, as in this example:

> library(Rth)
Loading required package: Rcpp
> x <- runif(10)
> x
 [1] 0.21832266 0.04970642 0.39759941 0.27867082 0.01540710 0.15906994
 [7] 0.65361604 0.95695404 0.25700848 0.94633625
> sort(x)
 [1] 0.01540710 0.04970642 0.15906994 0.21832266 0.25700848 0.27867082
 [7] 0.39759941 0.65361604 0.94633625 0.95695404
> rthsort(x)
 [1] 0.01540710 0.04970642 0.15906994 0.21832266 0.25700848 0.27867082
 [7] 0.39759941 0.65361604 0.94633625 0.95695404
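If you want more than an eyeball comparison of the two outputs, a small check along the following lines may help. This is only a sketch: the helper name agrees_with_sort is my own invention for illustration, and the Rth call is guarded so that the code still runs on a machine where Rth is not installed.

```r
# Returns TRUE when sort_fn gives the same result as base R's sort()
# on n fresh U(0,1) draws. The helper name is hypothetical.
agrees_with_sort <- function(sort_fn, n = 1000) {
  x <- runif(n)
  isTRUE(all.equal(sort_fn(x), sort(x)))
}

# Base sort() trivially agrees with itself; this just exercises the helper.
agrees_with_sort(sort)

# Only touch Rth if it is actually installed on this machine.
if (requireNamespace("Rth", quietly = TRUE)) {
  print(agrees_with_sort(Rth::rthsort))
}
```

The same helper can be pointed at any other drop-in sort replacement you want to sanity-check.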

Performance

So, let’s see how fast we can sort 50,000,000 U(0,1) numbers. We’ll try R’s built-in sort (with the default method, Quicksort), and then try Rth with 2 threads and then 4.

> x <- runif(50000000)
> system.time(sort(x))
   user  system elapsed
 18.866   0.209  19.144
> system.time(rthsort(x,nthreads=2))
   user  system elapsed
  5.763   0.739   3.949
> system.time(rthsort(x,nthreads=4))
   user  system elapsed
  8.798   1.114   3.724

I ran this on a 32-core machine, so I could have tried even more threads, though typically one reaches a point at which increasing the number of cores actually slows things down.

The cognoscenti out there will notice immediately that we obtained a speedup of far more than 2 (about 4.8, going by elapsed time) while using only 2 cores. Parallelism alone cannot explain that; the rest comes from a different sorting algorithm being used in Thrust, a software system on top of which Rth runs. (See the Rth home page for details on Thrust.)
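To make the arithmetic concrete, here is a short sketch that computes speedup and per-thread efficiency from the timings reported above. The variable names are mine; the numbers are copied from the system.time() output in this post.

```r
# Elapsed and user times (seconds) copied from the runs above.
elapsed <- c(base = 19.144, rth2 = 3.949, rth4 = 3.724)
user    <- c(base = 18.866, rth2 = 5.763, rth4 = 8.798)
threads <- c(rth2 = 2, rth4 = 4)

# Speedup = elapsed time of base sort() / elapsed time of rthsort().
speedup <- elapsed[["base"]] / elapsed[names(threads)]

# Efficiency = speedup per thread; a value above 1 cannot come from
# parallelism alone, so it points to a difference in algorithm.
efficiency <- speedup / threads

print(round(speedup, 2))     # rth2: 4.85, rth4: 5.14
print(round(efficiency, 2))

# Total CPU work (user time) also dropped, which is consistent with
# the Thrust sort simply doing less work than R's sort().
print(user[["rth2"]] < user[["base"]])
```

Note also how little was gained in going from 2 threads to 4, which fits the remark above about diminishing returns.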

Rth is an example of what I call Pretty Good Parallelism (an allusion to Pretty Good Privacy). For certain applications it can get you good speedup on two different kinds of common platforms (multicore, GPU). Like most parallel computation systems, it works best on very regular, “embarrassingly parallel” problems. For very irregular, complex apps, one may need to resort to very detailed C code to get a good speedup.
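For readers new to the term, here is a tiny illustration of an embarrassingly parallel computation. Note that it uses R’s built-in parallel package, not Rth, and the task (a sum of squares) is an assumption chosen only because it splits cleanly: each chunk is processed with no communication between workers, and the partial results are combined at the end.

```r
library(parallel)

x <- seq_len(1e5) / 1e5

# mclapply() relies on forking, which is unavailable on Windows,
# so fall back to a single worker there.
n_workers <- if (.Platform$OS.type == "windows") 1L else 2L

# Split the data into 4 contiguous chunks.
chunks <- split(x, cut(seq_along(x), 4, labels = FALSE))

# Each chunk is handled independently of the others.
partial <- mclapply(chunks, function(chunk) sum(chunk^2),
                    mc.cores = n_workers)

# Combine the partial sums and compare with the serial answer.
total <- sum(unlist(partial))
print(isTRUE(all.equal(total, sum(x^2))))
```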

Platforms

As mentioned, the code runs on top of Thrust, which runs on Linux, Mac and Windows OSs. Also, it uses Rcpp, which is cross-platform as well.

In other words, Rth should run under all three OSs. However, so far it has been tested only on Linux and Mac platforms. It should work fine on Windows, but neither of us has ready access to such a machine, so it hasn’t been tested there yet.

Necessary Programming Background

As seen above, the Rth functions are just R code, hence usable by anyone familiar with R. No knowledge of Thrust, C++, GPUs, etc. is required.

However, you may wish to write your own Rth functions. In fact, we hope you can contribute to the package! For this you need a good knowledge of C++, which is what Thrust is written in.

What Functions Are Available, And What Might Be Available?

Currently the really fast operations available in Rth are: sort/order/rank; distance computation; histograms; and contingency tables. These can be used as foundations for developing other functions. For example, the parallel distance computation can be used to write code for parallel k-means clustering, or for kernel-based nonparametric multivariate density estimation. Some planned new functions are listed on the home page.
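To illustrate how a distance computation serves as the foundation for k-means, here is a sketch of one k-means iteration in plain serial R. The function names cross_dist2 and kmeans_step are hypothetical, and nothing here calls Rth; the idea is that the distance step, which dominates the run time, is the part one would hand to a parallel routine.

```r
# Squared Euclidean distances between the rows of x (n x p) and the
# rows of centers (k x p), returned as an n x k matrix. This is the
# step a parallel distance routine would replace.
cross_dist2 <- function(x, centers) {
  # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
  d2 <- outer(rowSums(x^2), rowSums(centers^2), "+") -
    2 * x %*% t(centers)
  pmax(d2, 0)  # guard against tiny negative values from rounding
}

# One k-means iteration: assign each point to its nearest center,
# then move each center to the mean of its assigned points.
kmeans_step <- function(x, centers) {
  d2 <- cross_dist2(x, centers)
  cluster <- max.col(-d2, ties.method = "first")  # nearest center
  new_centers <- centers
  for (j in seq_len(nrow(centers))) {
    members <- cluster == j
    if (any(members)) {
      new_centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = new_centers)
}

# Two well-separated groups of two points each.
x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
centers <- rbind(c(1, 1), c(9, 9))
step <- kmeans_step(x, centers)
print(step$cluster)  # 1 1 2 2
print(step$centers)  # rows (0, 0.5) and (10, 10.5)
```

Repeating kmeans_step until the cluster assignments stop changing gives the usual algorithm.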

Conclusion

Give Rth a try! Let us know about your experiences with it, and again, code contributions would be highly welcome.

I plan to devote some of my future blog posts here to other topics in parallel computation. Much of the material will come from my forthcoming book, Parallel Computation for Data Science.