Notice that we will be paying special attention to one specific multiprocessing programming pattern. We want a scheme in which (1) there are multiple servers; (2) there are multiple clients; (3) any client can submit a task (function call) to be evaluated by any available server. You might think of this pattern as using a pool of servers (processes) to which clients can submit (often compute intensive) function calls.

For each of the above alternatives I'll try to cover: (1) appropriate (and inappropriate) uses; (2) possible use cases; (3) some how-to instruction; and (4) example code.

My central goal in writing this document is to enable and encourage more of us to write the software that puts those machines and cores to work.

Enough ranting ... The alternatives and options discussed in this document are all intended to solve that problem. We have tools that are looking for uses. We need to learn how to put them to fuller use so that next year we can justify buying yet another machine with more cores to add to our home networks.

So, why do we all have machines with so many unused cores? Because Intel and AMD must compete, and to do so, must give us what appear to be faster machines. They can't give us more cycles (per second), since, if they did, our machines would melt. So, they give us additional cores. The number of transistors goes up, and Moore's law (technically) holds true, but for most of us, that power is largely unused and unusable.

We all have multi-core machines. It's easy to imagine a home with multiple computers and devices of several different kinds connected on a LAN (local area network) through Ethernet or wireless connections. Most (soon all) of those devices have multiple cores. And, yet most of that power is wasted while many of those cores are idle.

This document is a survey of several different ways of implementing multiprocessing systems in Python. It attempts to provide a small amount of guidance on when it is appropriate and useful to use these different approaches, and when not.

FYI, I've been able to run the above XML-RPC scripts across my LAN. In fact, I've run the server on one of my desktop machines, and I connect via WiFi from the client on my Android smart phone using QPython. For more information about QPython see: http://qpython.com/ .

Notice that in the server, we can expose a method from within a class, also.
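A sketch of that technique follows, using Python 3's xmlrpc.server module (under Python 2 it was SimpleXMLRPCServer). The class name, method name, and port are illustrative assumptions:

```python
from xmlrpc.server import SimpleXMLRPCServer

class MathService:
    # Public methods of the registered instance become callable via
    # XML-RPC; the class and method names here are illustrative.
    def multiply(self, x, y):
        return x * y

server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
server.register_instance(MathService())
# server.serve_forever()
server.server_close()
```

With register_instance, every public method of the instance is exposed, so clients call proxy.multiply(...) without any per-function registration.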

And, in the client, create the proxy with the following:
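Something like the following, where 192.168.0.7 is a hypothetical LAN address standing in for your server's:

```python
import xmlrpc.client

# 192.168.0.7 is a hypothetical LAN address; substitute your server's.
# Creating the proxy does not open a connection; each method call does.
proxy = xmlrpc.client.ServerProxy("http://192.168.0.7:8000/")
# proxy.multiply(3, 4) would now invoke multiply() on the remote server.
```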

If you only want to access this XML-RPC server from the local machine, then you might create the server with the following:
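For example (the port number 8000 is an arbitrary choice):

```python
from xmlrpc.server import SimpleXMLRPCServer

# Binding to "localhost" means only clients on this same machine can
# connect; bind to "0.0.0.0" to accept connections from the LAN.
server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
# ... register functions here, then call server.serve_forever().
server.server_close()
```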

And, on the client side, it's simply a matter of creating a "proxy" and doing what looks like a standard Python function call through that proxy. Here is a simple, sample client:
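A sketch of such a client follows. The add function, the port, and the in-process stand-in server thread are assumptions made only so the example is self-contained and runnable; in real use the server would already be running, possibly on another machine:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Stand-in server so this client sketch can run on its own.
server = SimpleXMLRPCServer(("127.0.0.1", 8001), logRequests=False)
server.register_function(lambda x, y: x + y, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client: create a proxy and call the remote function as though it
# were a local one.
proxy = ServerProxy("http://127.0.0.1:8001/")
result = proxy.add(3, 4)
print(result)  # prints 7

server.shutdown()
server.server_close()
```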

On the server side, we implement conventional Python functions, and then register them with an XML-RPC server. Here is a simple, sample server:
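Here is a minimal sketch of such a server, using Python 3 module names (under Python 2 the modules were SimpleXMLRPCServer and xmlrpclib). The multiply function, the port, and the background-thread round trip are illustrative assumptions:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def multiply(x, y):
    # An ordinary Python function; the name and the task are illustrative.
    return x * y

server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
server.register_function(multiply, "multiply")

# A real server would just call server.serve_forever(); here we serve from
# a background thread so this sketch can demonstrate a full round trip.
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

proxy = ServerProxy("http://127.0.0.1:8000/")
result = proxy.multiply(6, 7)
print(result)  # prints 42

server.shutdown()
server.server_close()
```

In real use you would simply call server.serve_forever() and run the client from a separate process or machine.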

XML-RPC is a simple and easy way to get distributed processing. With it, you can request that a function be called in a Python process on a remote machine and that the result be returned to you.

You can also create parallel functions by using a Python decorator. Example:

There is more information on using IPython parallel computing with remote hosts here: http://ipython.org/ipython-doc/dev/parallel/parallel_process.html#using-the-ipcontroller-and-ipengine-commands

But change the user name and IP address to that of the remote machine.

When you create your client, use something like the following:

Copy your client profile ~/.ipython/profile_default/security/ipcontroller-client.json from the remote machine to the security/ directory under the profile you will be using on the local machine.

Start the IPython controller and engines on the remote machine. For example:

Submitting jobs to be run on IPython engines on a remote machine turns out, in some cases at least, to be very easy. Do the following:

We started the cluster with the default scheduler scheme, which is "least load". For other schemes do the following and look for "scheme":

Because these function calls are executed in separate processes, they avoid conflict over Python's GIL (global interpreter lock).

This example asks parallel python to execute four function calls in parallel in four separate processes.

Create the cluster. Use the ipcluster executable from IPython's parallel processing support. Example:

We'd like to know how to submit tasks for parallel execution. Here is a bit of instruction on how to do it.

One easy way to install Python itself and IPython, SciPy, Numpy, etc. is to install the Anaconda toolkit. You can find out about it here: http://www.continuum.io/ and here https://store.continuum.io/cshop/anaconda/ .

The documentation has examples. And, here is some sample code that is a little more complex:
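Here is a sketch of that kind of usage, built from Process and Queue in the multiprocessing module. The squaring task, the None sentinel protocol, and the pool size of four are illustrative choices, not the document's original code:

```python
import multiprocessing

def worker(task_queue, result_queue):
    # Pull tasks until the sentinel None arrives, then quit.
    for value in iter(task_queue.get, None):
        result_queue.put(value * value)

def main():
    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(4)
    ]
    for proc in workers:
        proc.start()
    for value in range(10):
        task_queue.put(value)
    for _ in workers:
        task_queue.put(None)      # one sentinel per worker
    # Collect exactly as many results as tasks submitted.
    results = sorted(result_queue.get() for _ in range(10))
    for proc in workers:
        proc.join()
    return results

if __name__ == "__main__":
    print(main())  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Note that results come back in whatever order the workers finish, which is why this sketch sorts them; a real application would usually tag each task with an identifier.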

Be aware that the multiprocessing module creates separate operating system processes. Each one runs in its own memory space; each one has its own Python interpreter; each one has its own GIL (global interpreter lock); each one has its own copies of imported modules; and each module in each of these multiple processes has its own copies of global variables.

The Python standard library contains the module multiprocessing . That module (it's actually a Python package, or a library that acts like a module) contains reasonable support for creating and running multiple processes implemented in Python and for communicating between those processes using Queues and Pipes (also in the multiprocessing module). You can learn more about that module here: https://docs.python.org/2/library/multiprocessing.html

The examples provided with the distribution work well. But, the project does not seem very active.

It is light, easy to install and integrate with other python software.

PP is a Python module which provides a mechanism for parallel execution of Python code on SMP (systems with multiple processors or cores) and clusters (computers connected via network).

And, here is the worker, also written in Python:

Again, so as to show how to request services implemented in Python from Node.js, our client is written in Node.js and the broker and workers are written in Python.

The previous example sent a task to a worker, even if that worker was not yet finished with its previous task. In this next example, the broker will forward a request to a worker only if that worker has signaled that it is finished with its previous task, if it had one, and that it is ready for its next task.

Finally, here is the Python worker that actually uses Lxml to provide XML processing capabilities:

And, this is the Python broker that acts like an intermediary between clients and one or more workers:
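The forwarding pattern at the heart of such a broker can be sketched with pyzmq's ROUTER/DEALER sockets. The following single-process, all-Python stand-in (ports 5559/5560, one worker, and one request are invented for illustration) forwards a single request and reply; a long-running broker would instead loop forever, for example via zmq.proxy(frontend, backend):

```python
import threading
import zmq

ctx = zmq.Context()

def broker():
    # ROUTER faces the clients, DEALER faces the workers.  We forward
    # the full multipart message (routing envelope included) each way.
    frontend = ctx.socket(zmq.ROUTER)
    frontend.bind("tcp://127.0.0.1:5559")
    backend = ctx.socket(zmq.DEALER)
    backend.bind("tcp://127.0.0.1:5560")
    backend.send_multipart(frontend.recv_multipart())   # request ->
    frontend.send_multipart(backend.recv_multipart())   # <- reply
    backend.close()
    frontend.close()

def worker():
    # A worker that echoes one request back.
    rep = ctx.socket(zmq.REP)
    rep.connect("tcp://127.0.0.1:5560")
    msg = rep.recv()
    rep.send(b"echo: " + msg)
    rep.close()

t1 = threading.Thread(target=broker)
t2 = threading.Thread(target=worker)
t1.start()
t2.start()

client = ctx.socket(zmq.REQ)
client.connect("tcp://127.0.0.1:5559")
client.send(b"task-1")
result = client.recv()
print(result)  # b'echo: task-1'

client.close()
t1.join()
t2.join()
ctx.term()
```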

In our example, the Node.js module makes multiple requests in the form of ZeroMQ messages that go to a "broker", which passes them along to a Python worker module. If we start up more than one worker process, these requests will be forwarded, round-robin style, to one or another worker.

In this example, we will use ZeroMQ to accomplish (at least) two things:

One significant benefit of using ZeroMQ is that we can write different processes in different languages. Thus, we can, for example, implement a process in Node.js that sends messages to and requests services from a process written in Python.

However, if you start one instance of hwserver.py and multiple instances of hwclient.py , you will notice a longer delay between each echo. That's because multiple clients are waiting on a single server. Notice the delay ( time.sleep(1) ) in the server. Our next challenge is to run the server in multiple processes so that the load from multiple clients will be balanced across multiple servers. We could use IPython multiple processing to do that. But, there are ways to accomplish something similar with ZeroMQ itself. See, for example, the documentation on A Load Balancing Message Broker.

If you start hwserver.py in one (bash) session and hwclient.py in another session, you should see the server and the client echoing each other in their respective sessions.

And, here is the "Hello, World" client using pyzmq :
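A sketch of that client follows, modeled on the zguide's hwclient.py. The stand-in server thread and port 5555 are included only so the example is self-contained and runnable in a single script:

```python
import threading
import zmq

ctx = zmq.Context()

def hwserver():
    # Minimal stand-in for hwserver.py: answer one request, then exit.
    rep = ctx.socket(zmq.REP)
    rep.bind("tcp://127.0.0.1:5555")
    rep.recv()           # wait for "Hello"
    rep.send(b"World")
    rep.close()

thread = threading.Thread(target=hwserver)
thread.start()

# The client: send a request, then block waiting for the reply.
req = ctx.socket(zmq.REQ)
req.connect("tcp://127.0.0.1:5555")
req.send(b"Hello")
reply = req.recv()
print(reply)  # b'World'

req.close()
thread.join()
ctx.term()
```

Note that the client can connect before the server binds; ZeroMQ queues the message and delivers it once the connection is established.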

We should note that with ZeroMQ, our programming is in some sense using the Actor model, as does Erlang. This is the Actor model in the sense that (1) we are creating separate processes which do not share (in memory) resources and (2) we communicate between those processes by sending messages and waiting on message queues. ZeroMQ differs from Erlang, with respect to the Actor model in the following ways:

For my testing with Python, I used the Anaconda Python distribution, which contains support for zmq .

In order to use pyzmq and to run the examples, you will need to install the ZeroMQ library itself and the pyzmq Python bindings.

There is a good set of examples written in a number of different languages for ZeroMQ. To get them, download the ZeroMQ guide ( https://github.com/imatix/zguide.git ), then (for us Python programmers) look in zguide/examples/Python .

Note that ZeroMQ is underneath IPython parallel. So, it may be appropriate to think of IPython parallel computing as a high level wrapper around ZeroMQ.

ØMQ (also known as ZeroMQ, 0MQ, or zmq) looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fan-out, pub-sub, task distribution, and request-reply. It's fast enough to be the fabric for clustered products. Its asynchronous I/O model gives you scalable multi-core applications, built as asynchronous message-processing tasks. It has a score of language APIs and runs on most operating systems. ØMQ is from iMatix and is LGPLv3 open source. [Pieter Hintjens; http://zguide.zeromq.org/page:all ]

What's left to do is to make sure (1) that each Nodejs process has its own Python process, so that compute-intensive, long-running Python code (for example, code that results in complex calls to Numpy/SciPy) does not wait on other requests or become slowed down by conflict over a shared Python GIL (global interpreter lock), and (2) that the Python processes, once started, stay alive, because starting a process is slow.

And, here is Python code that could be called by the above:

Here is an example of JavaScript (running under Nodejs, say), calling a method in a class written in Python:

What's left to do is to call Python. Since Nodejs is written in JavaScript, this requires some kind of foreign function call. One solution would be to use a message based system, for example ZeroMQ ( http://zeromq.org/ ). zerorpc , which is a package built on top of ZeroMQ, looks hopeful (see: http://zerorpc.dotcloud.com/ ).

Web application development is not a goal of this document, but there is plenty of help and lots of docs at http://nodejs.org and sites that it links to.

When we implement a Web site with Nodejs, we get parallel processing with almost no extra effort. Although a Nodejs Web server handles all requests in a single thread, we can use the Nodejs Cluster module to distribute the handling of requests across multiple processes (Web socket and AJAX requests not included?). Thus, if we use the Nodejs cluster add-on, we get separate, parallel processes and load balancing.

Erlang does multiprocessing; Erlang enables us to communicate between processes; Erlang with Erlport enables us to create and communicate with Python processes. So, why not try multiprocessing in Python with an Erlang controller of some kind?

A few clarifications:

Erlport/Python processes are different from the processes that are internal to Erlang and that Erlang enables us to create and use. Erlport/Python processes are OS processes, not Erlang processes. They are much heavier weight than Erlang processes, and therefore are slower to create. An implication of this, if we want to make many requests, is that we will want to create a pool of these processes and reuse them when requested.

Erlang programming is based on the actor model. This means that Erlang programs, even local ones, are built from separate processes that do not share resources (memory, for example) and that communicate by sending messages between those processes. And so, a multiprocessing and remote/distributed processing application, implemented in a way in which external (OS) processes send messages to each other, seems to be a good fit with Erlang and its capabilities.

We'll look at several examples in this document:

The first is a simple one that creates a single Erlport/Python process and then sends it requests and receives results back from it. Next we'll write an Erlang program that creates a pool of Erlport/Python processes and sends a series of requests to an available process, but waits for a process to become available if all processes in the pool are busy. And, finally, we'll implement something like the above pool of processes, but with the use of Erlang behaviors. One of the benefits to be gained from this is that, if one of our Erlport/Python processes dies, a new process will be started to replace it.

All of our examples will use the same Python code. Here it is:

```python
#!/usr/bin/env python
"""
Synopsis:
    Sample math functions for use with Erlang and Erlport.
Details:
    test_01 -- Solve the continuous algebraic Riccati equation, or CARE,
    defined as (A'X + XA - XBR^-1B'X+Q=0) directly using a Schur
    decomposition method.
"""

import numpy as np
from scipy import linalg
from erlport.erlterms import Atom
#import json


def test_01(m, n):
    a = np.random.random((m, m))
    b = np.random.random((m, n))
    q = np.random.random((m, m))
    r = np.random.random((n, n))
    print '(test_01) m: {} n: {}'.format(m, n, )
    result = linalg.solve_continuous_are(a, b, q, r)
    return result


def run(m=4, n=3):
    result = test_01(m, n)
    #print result
    #json_result = json.dumps(result.tolist())
    return (Atom('ok'), result.tolist())


def main():
    run()


if __name__ == '__main__':
    main()
```

Notes:

The erlport package must be located where Python can import it, and Numpy and Scipy must be installed. Once again, I'm using the Anaconda distribution from Continuum Analytics (see: http://www.continuum.io/).

The function test_01 uses Numpy to create several arrays of random numbers, then uses linalg in SciPy to solve the continuous algebraic Riccati equation, or CARE, defined as (A'X + XA - XBR^-1B'X+Q=0), directly using a Schur decomposition method.

The function run calls test_01 , then returns a tuple containing the Erlang atom "ok" and the solution array after converting it to a list. We convert the Numpy array to a Python list before returning it, because Erlport understands Python lists but not Numpy arrays.

8.1 A simple call from Erlang into Python

And, here is a simple Erlang program that uses that Python sample with the help of Erlport:

```erlang
-module(erlport_01).
-export([main/0, show_list/2]).

main() ->
    {ok, Pid} = python:start(),
    {ok, Result} = python:call(Pid, 'py_math_01', main, []),
    show_list(Result, 1),
    ok.

show_list([], _) ->
    ok;
show_list([Item|Items], Count) ->
    io:format("~p. Item: ~p~n", [Count, Item]),
    show_list(Items, Count + 1).
```

Notes:

We use the Erlport Python support to call function main in Python module py_math_01 .

The show_list/2 function prints out each item in the list returned from py_math_01.main() .

In the Erlang interactive shell erl we can compile and then run this as follows:

```
11> c(erlport_01).
{ok,erlport_01}
12> erlport_01:main().
1. Item: [12.74527763335136,-4.514001033364517,-7.4452420386061835,
    5.7441252569184345]
2. Item: [-5.795658295009697,3.897769387542307,4.148522353989249,
    -3.1221815191228965]
3. Item: [-7.157830191325373,4.088737828859971,8.493144407323305,
    -6.348281687731655]
4. Item: [3.996836318360595,-2.353597255054639,-3.098202007414951,
    3.5956798233304914]
ok
```

8.2 Erlang and a simple pool of Erlang/Python processes

In this example, we implement a pool of Erlang+Python processes so that we can request a process from the pool (and wait until one is available, if necessary), use it, and then return it to the pool. The processes in the pool are actually Erlang processes; however, each of those Erlang processes holds (remembers the PID, or process identifier, of) a Python process. We create each Python process with Erlport. Here is our Erlang code that implements the pool of processes:

```erlang
-module(erlport_04).
-export([ init/0, start/3, stop/0, rpc/1 ]).

init() ->
    ets:new(pipelinetable01, [named_table]),
    ok.

%
% Args:
%     NumProcesses -- (int) number of processes to put in the pool.
%     PythonModule -- (atom) the name of the Python module.
%     ProcessWaitTime -- (int) number of milliseconds to wait if all
%         processes are busy.
%
start(NumProcesses, PythonModule, ProcessWaitTime) ->
    PyProcPids = start_python_processes(NumProcesses, PythonModule, []),
    PoolPid = spawn(fun() -> pool_loop(PyProcPids, ProcessWaitTime) end),
    ets:insert(pipelinetable01, {poolpid, PoolPid}),
    ok.

stop() ->
    rpc(stop_python),
    ok.

rpc(Request) ->
    case Request of
        {call_python, Function, Args} ->
            [{poolpid, PoolPid} | _] = ets:lookup(pipelinetable01, poolpid),
            PoolPid ! {pop, self()},
            receive
                {ok, PyProcPid} ->
                    PyProcPid ! {call_python, self(), {Function, Args}},
                    receive
                        {ok, Result} ->
                            PoolPid ! {push, self(), PyProcPid},
                            case Result of
                                {ok, Result1} -> {ok, Result1};
                                _ -> unknown_result
                            end;
                        Msg ->
                            {unknown_response, Msg}
                    end;
                _ ->
                    error
            end;
        get_pypid ->
            [{poolpid, PoolPid} | _] = ets:lookup(pipelinetable01, poolpid),
            PoolPid ! {pop, self()},
            receive
                {ok, PyProcPid} ->
                    {ok, PyProcPid}
            end;
        {put_pypid, PyProcPid} ->
            [{poolpid, PoolPid} | _] = ets:lookup(pipelinetable01, poolpid),
            PoolPid ! {push, self(), PyProcPid},
            receive
                ok -> ok
            end;
        stop_python ->
            [{poolpid, PoolPid} | _] = ets:lookup(pipelinetable01, poolpid),
            PoolPid ! {stop, self()},
            receive
                ok -> ok
            end
    end.

pool_loop(PyProcPids, ProcessWaitTime) ->
    receive
        {push, _From, Proc} ->
            PyProcPids1 = [Proc | PyProcPids],
            pool_loop(PyProcPids1, ProcessWaitTime);
        {pop, From} ->
            case PyProcPids of
                [] ->
                    % Give it a chance to return a process to the pool.
                    timer:sleep(ProcessWaitTime),
                    self() ! {pop, From},
                    pool_loop(PyProcPids, ProcessWaitTime);
                [PyProcPid | PyProcPids1] ->
                    From ! {ok, PyProcPid},
                    pool_loop(PyProcPids1, ProcessWaitTime)
            end;
        {stop, From} ->
            stop_python_processes(PyProcPids),
            From ! ok,
            ok
    end.

python_loop(PyPid, PythonModule) ->
    receive
        {call_python, From, {Function, Args}} ->
            Result = python:call(PyPid, PythonModule, Function, Args),
            From ! {ok, Result},
            python_loop(PyPid, PythonModule);
        {stop, From} ->
            python:stop(PyPid),
            From ! ok
    end.

start_python_processes(0, _, PyProcPids) ->
    PyProcPids;
start_python_processes(N, PythonModule, PyProcPids) ->
    {ok, PyPid} = python:start(),
    PyProcPid = spawn(fun() -> python_loop(PyPid, PythonModule) end),
    io:format("Started Erlang/Python process -- PyProcPid: ~p~n", [PyProcPid]),
    start_python_processes(N - 1, PythonModule, [PyProcPid | PyProcPids]).

stop_python_processes([]) ->
    ok;
stop_python_processes([PyProcPid|PyProcPids]) ->
    io:format("Stopping Erlang/Python process -- PyProcPid: ~p~n", [PyProcPid]),
    PyProcPid ! {stop, self()},
    stop_python_processes(PyProcPids).
```

Notes:

Before using the above code, we compile it with erlc .

init/0 sets up an ETS table that enables us to remember the process ID of the pool.

The process pool is itself a process. We use it by sending it messages to pop (get) and push (return) a Python process.

start/3 (1) creates the Python processes, each of which is implemented by python_loop/2 , and (2) creates a process to hold those Python processes. It's this second process from which we'll request the next available Python process, and so we save its process ID in the ETS table.

rpc/1 implements our interface or API that enables us to make our requests (remote procedure calls) to the Python processes and get results back.

We might ask: Why is pool_loop/2 implemented as a process rather than an ordinary function? The first thing to recognize is that a process, in Erlang, is just a function that we spawn. And, next, consider that implementing the ability to create and use a process may give us some flexibility later when we want to request and use a Python process, created in Erlang with Erlport, from a separate application (a separate OS process) or even from an application running on a different machine.

And, here is an Erlang script that can be run from the command line and can be used to drive and test the above Erlang code:

```erlang
#!/usr/bin/env escript
%% vim:ft=erlang:
%%! -sname magpie1 -setcookie dp01

main(["-h"]) -> usage();
main(["--help"]) -> usage();
main(Args) ->
    ArgsSpec = [
        {"p", "processes", yes},
        {"o", "outfile", yes}
    ],
    Args1 = erlopt:getopt(ArgsSpec, Args),
    %io:format("Args1: ~p~n", [Args1]),
    Opts = proplists:get_all_values(opt, Args1),
    Args2 = proplists:get_all_values(arg, Args1),
    %io:format("Opts: ~p~n", [Opts]),
    %io:format("Args2: ~p~n", [Args2]),
    NumProcs1 = proplists:get_value("p", Opts),
    NumProcs2 = proplists:get_value("processes", Opts),
    %io:format("NumProcs1: ~p NumProcs2: ~p~n", [NumProcs1, NumProcs2]),
    NumProcs = case NumProcs1 of
        undefined ->
            case NumProcs2 of
                undefined -> 2;
                _ -> list_to_integer(NumProcs2)
            end;
        _ -> list_to_integer(NumProcs1)
    end,
    OutFile1 = proplists:get_value("o", Opts),
    OutFile2 = proplists:get_value("outfile", Opts),
    OutFile = case OutFile1 of
        undefined ->
            case OutFile2 of
                undefined -> standard_io;
                _ ->
                    {ok, OutFile3} = file:open(OutFile2, [write]),
                    OutFile3
            end;
        _ ->
            {ok, OutFile3} = file:open(OutFile1, [write]),
            OutFile3
    end,
    {NumReps1, M1, N1} = case Args2 of
        [] -> {2, 4, 3};
        [NumReps] -> {list_to_integer(NumReps), 4, 3};
        [NumReps, M, N] -> {list_to_integer(NumReps),
                            list_to_integer(M), list_to_integer(N)}
    end,
    run(NumProcs, NumReps1, M1, N1, OutFile),
    case OutFile of
        standard_io -> ok;
        _ ->
            file:close(OutFile),
            ok
    end.

run(NumProcs, Count, M, N, IoDevice) ->
    io:format("NumProcs: ~p Count: ~p M: ~p N: ~p~n",
        [NumProcs, Count, M, N]),
    erlport_04:init(),
    erlport_04:start(NumProcs, py_math_01, 100),
    run_n(1, Count, M, N, IoDevice),
    erlport_04:stop(),
    ok.

run_n(Count, Max, _, _, _) when Count > Max ->
    ok;
run_n(Count, Max, M, N, IoDevice) ->
    %io:format("M: ~p N: ~p~n", [M, N]),
    Result = erlport_04:rpc({call_python, run, [M, N]}),
    io:format(IoDevice, "Result ~p:~n~p~n", [Count, Result]),
    run_n(Count + 1, Max, M, N, IoDevice).

usage() ->
    io:format(standard_error, "usage:~n", []),
    io:format(standard_error, "    $ erlport_04.escript [options] iters [m n]~n", []),
    io:format(standard_error, "options:~n", []),
    io:format(standard_error, "    -p -- number of processes~n", []),
    io:format(standard_error, "    -o filename -- output file name~n", []),
    io:format(standard_error, "arguments:~n", []),
    io:format(standard_error, "    iters -- number of iterations to run~n", []),
    io:format(standard_error, "    m n -- size of array to create~n", []),
    ok.
```

Notes:

We use erlopt to parse command line options and arguments. It's available here: erlopt -- getopt() for Erlang: https://code.google.com/p/erlopt/

We call run/5 to perform initialization and create processes, then call run_n/5 some specified number of times, and finally stop (kill) the processes we created.

run_n/5 does a call to rpc/1 that uses one of the created processes to call into Python to perform a calculation using Numpy and return the result.

You can test the above code by running the following:

```
$ ./erlport_04.escript 3 4 3
NumProcs: 2 Count: 3 M: 4 N: 3
Started Erlang/Python process -- PyProcPid: <0.39.0>
Started Erlang/Python process -- PyProcPid: <0.41.0>
Result 1:
{ok,[[0.2597350443603386,0.8903581238544376,0.5228550551729187,
      -2.3417305007787257],
     [0.1943395864795484,0.3445498796542211,1.232814979418004,
      1.1994281436306256],
     [-0.03154488685636464,0.33179939314319556,1.829732033028535,
      0.9854826930442282],
     [1.0718123745676555,0.25710117274099364,1.7896961147779082,
      6.970965264066136]]}
(test_01) m: 4 n: 3
Result 2:
{ok,[[21.039769292519303,20.393141829871603,-37.29447963582768,
      0.2965148465091619],
     [1.9668685825947811,0.10150396421323271,-1.6867007920529111,
      0.10472863629222694],
     [2.1560394759605814,0.5010587053323622,-1.7840165638277685,
      -0.17305258786094993],
     [24.58501266962403,22.892461157404806,-43.847443385224864,
      1.8052772934572985]]}
(test_01) m: 4 n: 3
Result 3:
{ok,[[-0.20902511715668637,0.7778417615117266,0.9960684337538017,
      1.1824488386010166],
     [0.08584635712529537,-0.9819482057272886,1.2448114851957999,
      0.993406879690676],
     [1.1897115059332493,0.5189873231754997,-0.5711746123118333,
      -0.966994204829159],
     [1.5173355312750667,-0.3145814955274761,-0.6455456477102114,
      -1.5082534601988669]]}
Stopping Erlang/Python process -- PyProcPid: <0.41.0>
Stopping Erlang/Python process -- PyProcPid: <0.39.0>
```

8.3 A pool of processes with failure recovery

What we try to gain in this example, over and above the previous example, is the ability to recover from the failure of one of the Erlang/Python processes. In this code, we ask that we be notified when one of the Erlang/Python processes fails so that we can (1) remove the old (dead) process from the pool and (2) create a new process and insert it into the pool. Here is the code that does this:

```erlang
-module(erlport_06).
-export([
    start/3, start_link/3,
    init/0, stop/1,
    restarter/1,
    rpc/1,
    pool_loop/3, python_loop/2
]).

%
% Args:
%     NumProcesses -- (int) number of processes to put in the pool.
%     PythonModule -- (atom) the name of the Python module.
%     ProcessWaitTime -- (int) number of milliseconds to wait if all
%         processes are busy.
%
start(NumProcesses, PythonModule, ProcessWaitTime) ->
    init(),
    PyProcs = start_python_processes(NumProcesses, PythonModule, []),
    PoolPid = spawn(?MODULE, pool_loop, [
        PyProcs, ProcessWaitTime, PythonModule]),
    ets:insert(pipelinetable01, {poolpid, PoolPid}),
    RestarterPid = spawn(?MODULE, restarter, [PoolPid]),
    RestarterPid.

start_link(NumProcesses, PythonModule, ProcessWaitTime) ->
    init(),
    PyProcs = start_python_processes(NumProcesses, PythonModule, []),
    PoolPid = spawn(?MODULE, pool_loop, [
        PyProcs, ProcessWaitTime, PythonModule]),
    ets:insert(pipelinetable01, {poolpid, PoolPid}),
    RestarterPid = spawn(?MODULE, restarter, [PoolPid]),
    RestarterPid.

init() ->
    io:format("creating ETS table~n"),
    ets:new(pipelinetable01, [named_table]),
    ok.

stop(RestarterPid) ->
    rpc(stop_python),
    RestarterPid ! shutdown,
    ok.

rpc(Request) ->
    case Request of
        {call_python, Function, Args} ->
            io:format("call_python. F: ~p A: ~p~n", [Function, Args]),
            [{poolpid, PoolPid} | _] = ets:lookup(pipelinetable01, poolpid),
            PoolPid ! {pop, self()},
            receive
                {ok, PyProcPid} ->
                    PyProcPid ! {call_python, self(), {Function, Args}},
                    receive
                        {ok, Result} ->
                            PoolPid ! {push, self(), PyProcPid},
                            case Result of
                                {ok, Result1} -> {ok, Result1};
                                _ -> unknown_result
                            end;
                        Msg ->
                            {unknown_response, Msg}
                    end;
                _ ->
                    error1
            end;
        get_pypid ->
            [{poolpid, PoolPid} | _] = ets:lookup(pipelinetable01, poolpid),
            PoolPid ! {pop, self()},
            receive
                {ok, PyProcsPid} ->
                    {ok, PyProcsPid}
            end;
        {put_pypid, PyProcsPid} ->
            [{poolpid, PoolPid} | _] = ets:lookup(pipelinetable01, poolpid),
            PoolPid ! {push, self(), PyProcsPid},
            receive
                ok -> ok
            end;
        exit ->
            io:format("(rpc) 1. testing exit~n", []),
            [{poolpid, PoolPid} | _] = ets:lookup(pipelinetable01, poolpid),
            PoolPid ! {pop, self()},
            receive
                {ok, PyProcPid} ->
                    io:format("(rpc) 2. testing exit. P: ~p~n", [PyProcPid]),
                    exit(PyProcPid, test_failure),
                    ok;
                _ ->
                    error2
            end;
        stop_python ->
            [{poolpid, PoolPid} | _] = ets:lookup(pipelinetable01, poolpid),
            PoolPid ! {stop, self()},
            receive
                ok -> ok
            end
    end.

%~ monitor_loop() ->
%~     receive
%~         {'DOWN', Ref, process, Pid, Reason} ->
%~             % remove this python process and start a new one to replace it.
%~             io:format("Python process ~p because ~p crashed; restarting~n",
%~                 [Pid, Reason]),
%~             [{poolpid, PoolPid} | _] = ets:lookup(pipelinetable01, poolpid),
%~             PoolPid ! {remove_and_add, Ref, Pid},
%~             monitor_loop()
%~     end.

restarter(PoolPid) ->
    receive
        {'EXIT', _Pid, normal} ->
            % not a crash
            ok;
        {'EXIT', _From, shutdown} ->
            exit(shutdown);      % manual termination, not a crash
        {'EXIT', PyProcPid, Reason} ->
            io:format("Restarting Py process ~p/~p~n", [PyProcPid, Reason]),
            %
            % Remove the old process that died from the pool.
            % Restart a new erlang/python process to replace the one
            % that died.  Insert the new one in the pool.
            %
            PoolPid ! {restart, PyProcPid},
            restarter(PoolPid);
        shutdown ->
            ok
    end.

pool_loop(PyProcs, ProcessWaitTime, PythonModule) ->
    receive
        {push, _From, Proc} ->
            PyProcs1 = [Proc | PyProcs],
            pool_loop(PyProcs1, ProcessWaitTime, PythonModule);
        {pop, From} ->
            case PyProcs of
                [] ->
                    % Give it a chance to return a process to the pool.
                    timer:sleep(ProcessWaitTime),
                    self() ! {pop, From},
                    pool_loop(PyProcs, ProcessWaitTime, PythonModule);
                [PyProc | PyProcs1] ->
                    From ! {ok, PyProc},
                    pool_loop(PyProcs1, ProcessWaitTime, PythonModule)
            end;
        {restart, PyProcPid} ->
            case lists:member(PyProcPid, PyProcs) of
                true ->
                    % remove the python process from the pool.
                    PyProcs1 = proplists:delete(PyProcPid, PyProcs),
                    % create a new python process.
                    {ok, PyPid} = python:start(),
                    PyProcPid1 = spawn_link(
                        erlport_05_py, python_loop, [PyPid, PythonModule]),
                    % add the new python process to the pool.
                    PyProcs2 = [PyProcPid1 | PyProcs1],
                    pool_loop(PyProcs2, ProcessWaitTime, PythonModule);
                false ->
                    pool_loop(PyProcs, ProcessWaitTime, PythonModule)
            end;
        {stop, From} ->
            stop_python_processes(PyProcs),
            From ! ok,
            ok
    end.

python_loop(PyPid, PythonModule) ->
    receive
        {call_python, From, {Function, Args}} ->
            Result = python:call(PyPid, PythonModule, Function, Args),
            From ! {ok, Result},
            python_loop(PyPid, PythonModule);
        {stop, From} ->
            python:stop(PyPid),
            From ! ok
    end.

start_python_processes(0, _, PyProcs) ->
    PyProcs;
start_python_processes(N, PythonModule, PyProcs) ->
    {ok, PyPid} = python:start(),
    PyProcPid = spawn_link(?MODULE, python_loop, [PyPid, PythonModule]),
    %PyProcsPid = spawn(fun() ->
    %    erlport_05_py:python_loop(PyPid, PythonModule) end),
    io:format("Started Erlang/Python process -- PyProcPid: ~p PyPid: ~p~n",
        [PyProcPid, PyPid]),
    start_python_processes(N - 1, PythonModule, [PyProcPid | PyProcs]).

stop_python_processes([]) ->
    ok;
stop_python_processes([PyProcPid | PyProcs]) ->
    io:format("Stopping Erlang/Python process -- PyProcPid: ~p~n", [PyProcPid]),
    PyProcPid ! {stop, self()},
    stop_python_processes(PyProcs).
```

Notes:

Notice how, in function start_python_processes , we use spawn_link rather than spawn to create our processes. That tells Erlang to send us (i.e. the process that called spawn_link ) a message when any of these processes fails.

Then in our "start" function, we create a process to listen for and receive those failure messages. This process is implemented in function restarter .

Function restarter listens for those messages. If it receives a message that indicates a failure, it sends a message to the pool process telling it to remove the dead one and to create a new one and add it to the pool.

And, this capability is implemented by a new clause in the receive statement in function pool_loop .

And, here is the driver, an Erlang script that can be used to run the above code:

```erlang
#!/usr/bin/env escript
%% vim:ft=erlang:
%%! -sname crow1 -setcookie dp01

main(["-h"]) -> usage();
main(["--help"]) -> usage();
main(Args) ->
    ArgsSpec = [
        {"p", "processes", yes},
        {"o", "outfile", yes}
    ],
    Args1 = erlopt:getopt(ArgsSpec, Args),
    Opts = proplists:get_all_values(opt, Args1),
    Args2 = proplists:get_all_values(arg, Args1),
    NumProcs1 = proplists:get_value("p", Opts),
    NumProcs2 = proplists:get_value("processes", Opts),
    NumProcs = case NumProcs1 of
        undefined ->
            case NumProcs2 of
                undefined -> 2;
                _ -> list_to_integer(NumProcs2)
            end;
        _ -> list_to_integer(NumProcs1)
    end,
    OutFile1 = proplists:get_value("o", Opts),
    OutFile2 = proplists:get_value("outfile", Opts),
    OutFile = case OutFile1 of
        undefined ->
            case OutFile2 of
                undefined -> standard_io;
                _ ->
                    {ok, OutFile3} = file:open(OutFile2, [write]),
                    OutFile3
            end;
        _ ->
            {ok, OutFile3} = file:open(OutFile1, [write]),
            OutFile3
    end,
    {NumReps1, M1, N1} = case Args2 of
        [] -> {2, 4, 3};
        [NumReps] -> {list_to_integer(NumReps), 4, 3};
        [NumReps, M, N] -> {list_to_integer(NumReps),
                            list_to_integer(M), list_to_integer(N)}
    end,
    run(NumProcs, NumReps1, M1, N1, OutFile),
    case OutFile of
        standard_io -> ok;
        _ ->
            file:close(OutFile),
            ok
    end.

run(NumProcs, Count, M, N, IoDevice) ->
    io:format("NumProcs: ~p Count: ~p M: ~p N: ~p~n",
        [NumProcs, Count, M, N]),
    RestarterPid = erlport_06:start(NumProcs, py_math_01, 100),
    run_n(1, Count, M, N, IoDevice),
    erlport_06:stop(RestarterPid),
    ok.

run_n(Count, Max, _, _, _) when Count > Max ->
    ok;
run_n(Count, Max, M, N, IoDevice) ->
    Result = erlport_06:rpc({call_python, run, [M, N]}),
    io:format(IoDevice, "Result ~p:~n~p~n", [Count, Result]),
    run_n(Count + 1, Max, M, N, IoDevice).

usage() ->
    io:format(standard_error, "usage:~n", []),
    io:format(standard_error, "    $ erlport_06.escript -h|--help -- show this help~n", []),
    io:format(standard_error, "    $ erlport_06.escript [options] iters [m n]~n", []),
    io:format(standard_error, "options:~n", []),
    io:format(standard_error, "    -p -- number of processes~n", []),
    io:format(standard_error, "    -o filename -- output file name~n", []),
    io:format(standard_error, "arguments:~n", []),
    io:format(standard_error, "    iters -- number of iterations to run~n", []),
    io:format(standard_error, "    m n -- size of array to create~n", []),
    ok.
```

Notes:

We use erlopt to give some help with parsing command line arguments. It's available here: https://code.google.com/p/erlopt/.

After collecting command line options and arguments, we call function run/5 , which (1) initializes our processes and starts up restarter/1 ; (2) calls the Python function the requested number of times; and, finally, (3) stops all the Erlang/Python processes and the restarter/1 process itself.