
Over the last several days, I have had the “pleasure” of getting parallel processing with R running on the Ohio Supercomputer Center’s (OSC) Glenn cluster. I am working on a project that uses GenMatch from Sekhon’s Matching package, which relies on the snow library to manage parallel processing. Getting snow to run properly on a single machine, or even across a cluster of machines via ssh connections, is fairly trivial. But using it on the OSC cluster turned out to be a bit more difficult. Well, difficult in relative terms: once you know the steps to take, it’s not all that bad. While I am still not completely sure I’ve done everything correctly, I thought I would post this short guide in hopes that it could save someone else a few days of headaches. I’ll update the post if I discover something is incorrect.
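For context, here is roughly what handing GenMatch a snow cluster looks like. This is a minimal sketch using the lalonde data that ships with Matching; the covariate choices and cluster size are illustrative, not my actual analysis:

library(Matching)
library(snow)

# Spin up a snow cluster; on a single machine a socket cluster also works,
# e.g. makeCluster(8, type = "SOCK")
cl <- makeCluster(8, type = "MPI")

data(lalonde)
X <- cbind(lalonde$age, lalonde$educ, lalonde$re74)

# GenMatch accepts a snow cluster via its cluster argument and farms the
# genetic search out to the workers
gm <- GenMatch(Tr = lalonde$treat, X = X, cluster = cl)

stopCluster(cl)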

Step 1: Compile Rmpi

In order to utilize more than one node on the Glenn cluster, you need to have Rmpi installed and, importantly, linked to the appropriate MPI libraries provided by OSC. To do so, you first need to create a .R/Makevars file in your home directory that will instruct R to use mpicc instead of gcc to compile the Rmpi library.

$ mkdir ~/.R
$ nano ~/.R/Makevars

And this is what you should place in Makevars:

CC=mpicc
SHLIB_LD=mpicc

Next, you will need to swap out the default mpi module for an alternative. If the R module isn’t loaded yet, you will need to do that as well.

$ module swap mpi mvapich2-1.0.2p1-gnu
$ module load R-2.8.0

If you aren’t sure which version of MPI you should load, you can use the module avail command to see what’s available. Or, better yet, you could email the excellent support staff at OSC. Note that I was not able to get Rmpi to install correctly with R-2.11.1. Since I had 2.8 working, I didn’t do much further investigation.

Now it’s time to compile and install Rmpi. Download the most recent version and place it in your working directory. You can either do that through your browser or with wget; e.g.,

$ wget http://cran.r-project.org/src/contrib/Rmpi_0.5-9.tar.gz

Just be sure to replace the Rmpi package version above with the most recent. After doing so, the following command should correctly install the package.

$ R CMD INSTALL --configure-vars="CPPFLAGS=-I${MPICH_HOME}/include LDFLAGS=-L${MPICH_HOME}/lib" \
--configure-args="--with-Rmpi-include=${MPICH_HOME}/include --with-Rmpi-libpath=${MPICH_HOME}/lib --with-Rmpi-type=MPICH2" \
Rmpi_0.5-9.tar.gz

Note that the command above should only have line breaks immediately after each \. In other words, the whole command spans three lines, the first two of which end with \, which marks a continuation.
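If the install succeeds, the library should load in an interactive R session without linker errors. A quick sanity check (mpi.get.processor.name() is part of the Rmpi API; depending on how MPI is configured, you may need to run this inside a batch or interactive job rather than on a login node):

$ R
> library(Rmpi)              # fails here if the MPI libraries aren't linked
> mpi.get.processor.name()   # prints the name of the current host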

Step 2: Set up your PBS job script

Successfully running a job across multiple nodes with R and snow requires some small changes to your PBS script. If you aren’t yet familiar with PBS scripts, OSC’s batch-processing documentation is a good place to start. First, you should create a directory to hold all of the files associated with your batch job. Here I create one called Test in my home directory:

$ mkdir ~/Test

Now create a PBS script file.

$ nano SnowTest.job

And add something like this:

#PBS -l walltime=00:10:00
#PBS -l nodes=2:ppn=8
#PBS -N SnowTest
#PBS -S /bin/bash
#PBS -j oe
#PBS -m abe
#PBS -M [email protected]

set echo
export TEST=${HOME}/Test
pbsdcp -r ${TEST}/* $TMPDIR
cd $TMPDIR
module swap mpi mvapich2-1.0.2p1-gnu
module load R-2.8.0
mpiexec -n 16 RMPISNOW < SnowTest.R
pbsdcp -g -r '*' ${TEST}/
exit
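The mpiexec line pipes SnowTest.R into RMPISNOW, which starts R on all 16 MPI processes (one master, the rest workers) and builds the snow cluster for you, so the script itself just needs to retrieve that cluster. A minimal sketch of what SnowTest.R might contain; the toy computation is mine, while getMPIcluster(), clusterEvalQ(), clusterApply(), and stopCluster() are standard snow functions:

# SnowTest.R -- executed as: mpiexec -n 16 RMPISNOW < SnowTest.R
library(snow)

cl <- getMPIcluster()          # retrieve the cluster RMPISNOW created

# Confirm the workers actually span both nodes
nodes <- clusterEvalQ(cl, Sys.info()[["nodename"]])
print(unlist(nodes))

# A trivial piece of distributed work
res <- clusterApply(cl, 1:30, function(i) i^2)
print(sum(unlist(res)))

stopCluster(cl)                # shut the workers down cleanly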