Advanced computing with IPython


If you use Python, there's a good chance you have heard of IPython, which provides an enhanced read-eval-print loop (REPL) for Python. But there is more to IPython than just a more convenient REPL. Today's IPython comes with integrated libraries that turn it into an assistant for several advanced computing tasks. We will look at two of those tasks, using multiple languages and distributed computing, in this article.

IPython offers convenient access to documentation, integration with matplotlib, persistent history, and many other features that greatly ease interactive work with Python. IPython also comes with a collection of "magic" commands that alter the effect of single lines or blocks of code; for example, you can time your code simply by typing %%time at the prompt before entering your Python statements. All of these features also work when using the Jupyter notebook with the IPython kernel, so you can freely switch between the terminal and the browser-based interface while using the same commands.

Multilingual computing

No one language is ideal for everything. IPython and Jupyter allow you to exploit the strengths of multiple languages in a single notebook or interactive session.

The figure on the right, which is a snippet from a Jupyter notebook, shows the simplest way to make use of this feature. The %%ruby cell magic command (the double %% means that the magic command will apply to the entire cell) causes the cell contents to be handled by the Ruby interpreter. The --out flag stores the cell output into the named variable, which is then globally available to the Python kernel that interprets the contents of the other cells. The following cell casts the string output of the Ruby code into an integer (note that int() is legal Python, but not a part of Ruby). The Ruby code simply adds up the integers from 1 to 100; the final result is stored in the variable a.

This can be done without installing any IPython or Jupyter kernels or extensions; only Ruby is required. The same thing can be done with Perl, Bash, or sh; other interpreted languages can be added by editing the list in the source at [path to IPython]/core/magics/script.py.
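Under the hood, the script magics do little more than pipe the cell body to the named interpreter and capture what it prints. Here is a minimal sketch of that idea in plain Python; it uses sh instead of Ruby so it runs on any Unix system, and it is only an illustration of the mechanism, not IPython's actual implementation:

```python
import subprocess

# Roughly what a %%script-style cell magic does: feed the cell body
# to the interpreter on stdin and capture whatever it prints.
# (sh stands in for Ruby here so no extra interpreter is needed.)
cell_body = "expr 1 + 2"

proc = subprocess.run(["sh"], input=cell_body,
                      capture_output=True, text=True)
out = proc.stdout.strip()   # the string a --out variable would hold

# As in the article's example, the captured output is a string that
# the Python side must cast before doing arithmetic with it.
total = int(out)
```

This also shows why the cast in the following notebook cell is necessary: everything that crosses the boundary between interpreters arrives as text.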

F2PY is a component of NumPy/SciPy that compiles and wraps Fortran subroutines so that they can be used with Python. The appeal is being able to take advantage of Fortran's fast numerical operations together with the high-level convenience and interactivity of Python. Typically, using F2PY requires several manual steps. However, a third-party extension called Fortran magic (installable with pip) provides a cell magic that uses F2PY under the hood to do the compilation and interface creation. All that is needed is a single magic line in a cell containing a Fortran subroutine or function (a Fortran compiler may need to be installed if one is not already present on the system).

The figure below shows the process. First, we define a Python function, called eap(), that uses a slowly converging approximation to e, the base of the natural logarithms. It calculates d successive approximations, returning the final one. The next cell loads the Fortran magic machinery, which generates a user warning (collapsed), as this was developed for an older version of IPython/Jupyter (but everything still works). The load command defines the cell magic that we use in the cell after that, which contains a Fortran version of the same function, called eaf(). When that cell is executed, IPython compiles the code and generates the Python interface. In the last two cells, each program is invoked with timing turned on; they produce comparable outputs, but the Fortran version is about 24 times faster.
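The notebook cells themselves are not reproduced here, but a plausible Python version of such a function might look like the following. The name eap() comes from the article; the particular series, the limit (1 + 1/n)**n → e, is an assumption chosen because it converges slowly:

```python
import math

def eap(d):
    """Return the d-th successive approximation to e.

    Uses the slowly converging limit (1 + 1/n)**n -> e: the loop
    computes d approximations and returns the final one.
    """
    approx = 0.0
    for n in range(1, d + 1):
        approx = (1.0 + 1.0 / n) ** n
    return approx
```

A Fortran eaf() containing the same loop is what the Fortran-magic cell would hold; after compilation, F2PY exposes it as an ordinary Python callable that can be timed side by side with eap().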

With only a single magic command, you can package a compiled Fortran routine for interactive use in your session or notebook. Since the numerical parts of your programs are both the easiest to translate into Fortran and the parts that benefit the most from compilation, this is a simple way to speed up a computation, and a good demonstration of the power of the IPython ecosystem.

Parallel and distributed computing

IPython provides a number of convenient solutions for dividing a computation among the processing cores of either a single machine or multiple networked computers. The IPython parallel computing tools do much of the setup and bookkeeping; in simple cases, they make parallel computation in an interactive context almost as simple as an ordinary single-processor calculation.

A common reason for running code on multiple processors is to speed it up or to increase throughput. This is only possible for certain types of problems, however, and only works well if the time saved doing arithmetic outweighs the overhead of moving data between processors.

If your goal is to maximize the speed of your computation, you will want to speed up its serial performance (its speed on a single processing core) as much as is practical before trying to take advantage of parallel or distributed computation. This set of slides [PDF] provides a clear and concise introduction to the various ways you might approach this problem when your core language is Python; it also introduces a few parallelization strategies. There are a large number of approaches to speeding up Python, all of which lie beyond the concerns of this article (and, regardless of language, the first approach should be a critical look at your algorithms and data structures).

Aside from trying to speed up your calculation, another purpose of networked, distributed computing is to gather information from or run tests on a collection of computers. For example, an administrator of a set of web servers located around the world can take advantage of the techniques described below to gather performance data from all the servers with a single command, using an IPython session as a central control center.

First, a note about NumPy. The easiest way to parallelize a Python computation is simply to express it, if possible, as a sequence of array operations on NumPy arrays. This will automatically distribute the array data among all the cores on your machine and perform array arithmetic in parallel (if you have any doubt, submit a longish NumPy calculation while observing a CPU monitor, such as htop, and you will see all cores engaged). Not every computation can be expressed in this way; but if your program already uses NumPy in a way that allows parallel execution, applying the techniques described below to run it on multiple cores will introduce unnecessary interprocess communication, slowing things down rather than speeding them up.
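To make the contrast concrete, here is a small sketch comparing an element-at-a-time Python loop with the equivalent whole-array NumPy expression; the numbers are arbitrary, and the point is the shape of the code, not this particular calculation:

```python
import math
import numpy as np

n = 1_000_000

# Element-at-a-time Python: one core, with interpreter overhead
# paid on every single element.
serial = sum(math.sqrt(k) for k in range(n))

# Whole-array NumPy: the loop runs in compiled code, and operations
# backed by threaded libraries can keep several cores busy at once.
a = np.arange(n, dtype=np.float64)
vectorized = float(np.sqrt(a).sum())
```

Both produce the same sum; the second form is the one that lets NumPy exploit the machine's cores without any help from ipyparallel.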

In cases where your program is not a natural fit for NumPy's array processing, IPython provides other, nearly equally convenient methods for taking advantage of multiple processors. To use these facilities, you must install the ipyparallel library (pip install ipyparallel will do it).

IPython and the ipyparallel library support a large variety of styles and paradigms for parallel and distributed computing. I have constructed a few examples that demonstrate several of these paradigms in a simple way. This should give some entry points to begin experimenting immediately, with a minimum of setup, and give an idea of the range of possibilities. To learn about all the options, consult the documentation [PDF].

The first example replicates a computation on each of the machine's CPU cores. As mentioned above, NumPy automatically divides work among these cores, but with IPython you can access them in other ways. To begin, you must create a computing "cluster". With the installations of IPython and ipyparallel come several command-line tools. The command to create a cluster is:

$ ipcluster start --n=x

Normally, x is set to the number of cores in your system; my laptop has four, so I used --n=4. That command should result in a message that the cluster was created. You can now interact with it from within IPython or a Jupyter notebook.

The figure on the right shows part of a Jupyter session using the cluster. The first two cells import the IPython parallel library and instantiate a Client. To check that all four cores are in fact available, the ids list of the Client instance is displayed.

The next cell (in the figure on the left) imports the choice() function, which randomly chooses an element from a collection, imports pylab for plotting, and configures a setting that causes plots to be embedded in the notebook.

Note that the cell uses the %%px magic. This incantation, at the top of a cell, causes the calculation in the cell to be replicated, by default, into a separate process for each core. Since we started our cluster with four cores, there will be four processes, each of which has its own private versions of all the variables, and each of which runs independently of the others.

The next two cells (below) compute a 2D random walk with a million steps. They are each decorated by the %%timeit cell magic, which times the calculation within a cell. The first cell uses the --targets argument to limit the calculation to a single process on a single core (core "0"; we could have chosen any number from 0 to 3); the second uses %%px without an argument to use all cores. Note that in this style of parallel computing, each variable, including each list, is replicated among the cores. This is in contrast to array calculations in NumPy, where arrays are split among cores, each one working on a part of a larger problem. Therefore, in this example, each core will calculate a separate, complete million-step random walk.
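The cells being timed are not reproduced here, but a 2D random walk of the kind described can be written in a few lines of plain Python. This is a sketch under the assumption that each step moves one unit in a random grid direction; the article's exact code may differ:

```python
from random import choice

def random_walk(steps):
    # Each step moves one unit in one of the four grid directions,
    # picked with choice(); returns the visited x and y coordinates.
    x = y = 0
    xs, ys = [0], [0]
    for _ in range(steps):
        dx, dy = choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x += dx
        y += dy
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = random_walk(1_000_000)
```

Under %%px, each engine runs code like this independently with its own random state, so four complete million-step walks are computed, one per core.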

If the timings showed identical times for each version, that would mean that we actually did four times as much work in the same amount of time in the second case, because the calculation is done four times. However, the second version actually took a bit more than twice as long, which means that we only achieved a speedup of about two. This is due to the overhead of interprocess communication and setup, and should decrease as the calculations become more lengthy.

What happens if we append a plot command to the cells, to see what the random walks look like? The next figure shows how each process produces its own plot (with its own random seed). This style of multiprocessing can be a convenient way to compare different versions of a calculation side by side, without having to run each iteration one after the other.

You can also execute a one-liner on the cluster, using %px , which is the single-line version of the magic command. Using this, you can mix serial and parallel code within a single cell. So after importing the random integer function ( randint() ):

%px randint(0,9)
Out[0:8]: 1
Out[1:8]: 3
Out[2:8]: 9
Out[3:8]: 0

The output cell is labeled, as usual, in order (here it was the 8th calculation in my session), but also indicates which of the four compute cores produced each result.

The %%px and %px magic commands are easy ways to replicate a computation among a cluster of processors when you want each processor to operate on its own private copy of the data. A classic technique for speeding up a calculation on a list by using multiple processors follows a different pattern: the list is divided among the processors, each goes to work on its individual segment, and the results are reassembled into a single list. This works best if the calculation on each array element does not depend on the other elements.

IPython provides some convenience functions for making these computations easy to express. Here we'll take a look at the one that's most generally useful. First, consider Python's map() function; for example:

list(map(lambda x:f(x), range(16)))

That will apply the function f() to each element of the list [0..15] and return the resulting list. It will do this on one processor, one element at a time.
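With a concrete stand-in for f() (the article leaves it abstract, so squaring is purely an illustration), the serial version looks like this:

```python
def f(x):
    # Hypothetical per-element computation; the article does not
    # define f(), so squaring is just a placeholder.
    return x * x

# One processor, one element at a time.
result = list(map(lambda x: f(x), range(16)))
```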

But if we've started up a cluster as above, we can write it this way:

rp[:].map_sync(lambda x:f(x), range(16))

Here rp is the Client instance created earlier. This will divide the list into four segments of equal length, send each piece to a separate processor, and put the result back together, replacing the original. You can control which processors are employed by indexing, so you can easily tell rp[0:1] to work on one array while you have rp[2:3] do something else.
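ipyparallel handles the splitting and reassembly for you. To make the pattern itself concrete, here is the same split/apply/combine shape sketched with the standard library's concurrent.futures; this is a local stand-in for illustration, not how ipyparallel is implemented:

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    return x * x  # hypothetical per-element computation

# Divide range(16) among four workers, apply f to each element,
# and reassemble the results in their original order -- the same
# pattern that map_sync() applies across the engines of a cluster.
with ThreadPoolExecutor(max_workers=4) as ex:
    result = list(ex.map(f, range(16)))
```

As with map_sync(), the caller sees an ordinary list come back; all of the scheduling and reassembly happens behind the scenes.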

The processors that are used need not be cores on the local machine. They can reside on any computers that can be reached through the internet, or on the local network. The simplest setup for computing over a network is when your compute cluster consists of machines that you can reach using SSH. Naturally, the configuration is a bit more involved than simply typing the ipcluster command. I have created a document [PDF] to describe a minimal example configuration that will get you started computing on a networked cluster over SSH.

For over 20 years, computational scientists have relied on various approaches to parallel computing as the only way to perform really large calculations. As the desire for more accurate climate modeling, processing larger and larger data sets for machine learning, better simulations of galactic evolution, and more powerful calculations in many different fields outstripped the capabilities of single processors, parallel processing was employed to break the bottleneck. This used to be the exclusive domain of people willing to rewrite their algorithms to incorporate special parallel libraries into their Fortran and C programs, and to tailor their programs to the peculiarities of individual supercomputers.

IPython with ipyparallel offers an unprecedented ability to combine the exploratory powers of scientific Python with nearly instant access to multiple computing cores. The system presents high-level abstractions that make it intuitive to interact with a local or networked cluster of compute nodes, regardless of the details of how the cluster is implemented. This ease of interactive use has helped IPython and Python to become a popular tool for scientific computation and data science across a wide variety of disciplines. For just one recent example, this paper [PDF], presenting research on a problem at the intersection of machine learning and biochemistry, benefited from the ease of use of ipyparallel; it includes a section discussing the advantages of the system.

Sometimes an idea is more easily expressed, or a calculation will run faster, in another language. Multilingual computing used to require elaborate interfaces to multiple compilers, or working in separate interpreters. The ability to enhance the fluid, exploratory nature of computing that IPython and Jupyter already provide, by allowing the user to code in different languages at will, enables a genuinely new way to interact with a computer.

IPython and Jupyter are more than just interfaces to an interpreter. The enhancements to the computing experience described in this article are fairly recent, but they are not merely curiosities for software developers; they are already being put to use by scientists and engineers in applications. Tools such as these are levers for creativity; what they help to bring forth in the future will be interesting to see.