Have you ever found yourself searching for a statistics package in Python, but it just isn’t available? This is the biggest reason I’ve heard when my colleagues say they’re unwilling to make the switch from R to Python for statistical analysis. To counteract that argument, the geniuses developing IPython created Rmagic, a package which allows you to run R code within the IPython interface.

Let’s get to the cut and dry and see how it works. If you want to follow along, the IPython Notebook and data file is available on github: https://github.com/rhiever/rmagic-tutorial

Installing Rmagic

Rmagic comes along with the IPython package, so just follow my IPython tutorial to install IPython. Beyond IPython, all you need is Python’s rpy2 package to run Rmagic:

sudo easy_install rpy2

I also exclusively use the pandas package when I’m dealing with data, so make sure to have that installed as well:

sudo easy_install pandas

Using Rmagic for statistical analysis

I do most of my statistical analysis in Python nowadays, but sometimes there’s just that one statistical function that I need that Python doesn’t quite have yet. In those cases, I use Rmagic to “fill the gap” in Python’s statistics packages. Rmagic lets me pass my data to R, run the R function on the data, then seamlessly return the data back to Python before I start having nightmares about using R again. Since there are already a ton of R statistics tutorials out there, this tutorial will instead concentrate on how to use Rmagic to link together Python and R code.

Load the rmagic extension

%load_ext rmagic

Use R to view summary information about the data

If you like how R provides summary information about the data, you can print out the summary() function.

%R tells IPython that the line is R code.

-i parasiteData tells IPython to pass the parasiteData DataFrame to R.

The rest is R code. You can have multiple lines of R code separated by semicolons.

Note that when you want to print anything out from R, you need to place the print() function around it.

from pandas import * # read data from data file into a pandas DataFrame parasiteData = read_csv("parasite_data.csv", sep=",", na_values=["", " "]) # print the R summary() function for the data %R -i parasiteData print(summary(parasiteData)) Virulence Replicate ShannonDiversity Min. :0.50 Min. : 1.0 Min. :0.0000 1st Qu.:0.60 1st Qu.:13.0 1st Qu.:0.0000 Median :0.75 Median :25.5 Median :0.8457 Mean :0.75 Mean :25.5 Mean :0.8364 3rd Qu.:0.90 3rd Qu.:38.0 3rd Qu.:1.5337 Max. :1.00 Max. :50.0 Max. :2.9008 NA's :50

View a R plot in IPython

You can even view plots of your data in IPython from R.

from pandas import * # read data from data file into a pandas DataFrame parasiteData = read_csv("parasite_data.csv", sep=",", na_values=["", " "]) # plot the Shannon Diversity as a function of Virulence in R %R -i parasiteData plot(ShannonDiversity ~ Replicate, data = parasiteData, xlab="Replicate", ylab="Shannon Diversity")

Filling in the gaps

Beyond the basics, you can use R to perform a single function on your data, then return the result back to Python. Here’s a simple example.

-o meanShannonDiversity tells IPython to return the meanShannonDiversity variable back from R to IPython.