Introduction to NetWorkSpaces

Installation

Server installation

Create a directory for your Python modules, say ~/myInstalls/python. Download the tgz or zip archive from the class web site and unpack it in a temporary directory (best not to use your module directory). Then:

cd nws/python/open_server
python setup.py install --prefix= --home=~/myInstalls/python

Test in a different directory, say /tmp. Let Python know where to find the nws module:

export PYTHONPATH=~/myInstalls/python/lib/python

Then, let 'er rip:

$ twistd -noy ~/myInstalls/python/nws.tac
2007/02/03 14:30 EST [-] Log opened.
2007/02/03 14:30 EST [-] twistd 2.1.0 (/usr/bin/python 2.4.2) starting up
2007/02/03 14:30 EST [-] reactor class: twisted.internet.selectreactor.SelectReactor
2007/02/03 14:30 EST [-] Loading /home/accts/njc2/myInstalls/python/nws.tac...
2007/02/03 14:30 EST [-] clientCode served from directory clientCode
2007/02/03 14:30 EST [-] clientCode directory doesn't exist
2007/02/03 14:30 EST [-] Loaded.
2007/02/03 14:30 EST [-] nwss.server.NwsFactory starting on 8765
2007/02/03 14:30 EST [-] Starting factory
2007/02/03 14:30 EST [-] twisted.web.server.Site starting on 8766
2007/02/03 14:30 EST [-] Starting factory
2007/02/03 14:30 EST [-] using temp directory /tmp
...

Start Firefox (or another browser) on the same machine and enter the URL localhost:8766. You should see the server's web interface, which lists the workspaces the server currently holds.

Client Installation

In a new window, back in your temporary install directory, execute:

cd nws/python/open_client
python setup.py install --prefix= --home=~/myInstalls/python

Then start the babelfish translator:

python ~/myInstalls/python/lib/python/nws/babelfish.py &

As before, test in a different directory and let Python know where to find the nws module:

export PYTHONPATH=~/myInstalls/python/lib/python

Now try it:

$ python
Python 2.4.2 (#1, Oct 13 2006, 17:11:24)
[GCC 4.1.0 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nws.client as nwsC
>>> ws = nwsC.NetWorkSpace('testing ...')
>>> ws.store('v', range(3))
>>> ws.fetch('v')
[0, 1, 2]
>>> ws.store('xyz', 123.456)
>>> ws.store('zyx', {'cat': 'dog', 'mouse': 'click'})

Refresh the browser; you should now see your workspace. Clicking on the workspace should list the variables in it, and clicking on a variable should list the values bound to it.

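If the server runs on another host or port, pass serverHost=foo.bar.baz and/or serverPort=XXXX to NetWorkSpace (the host defaults to localhost). Likewise, start babelfish.py with -h foo.bar.baz and/or -p XXXX.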

Use

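Consider a simple assignment:

x = y

The value bound to the name y is looked up, and that value is then bound to the name x.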

NWS is designed to be a language-neutral coordination facility. NWS clients exist for a variety of languages, including MATLAB, Perl, Python, R, and Ruby. This neutrality offers advantages, chief among them: i) NWS coordination patterns and idioms can be applied in any of these languages, and ii) by using the quasi-lingua franca of ASCII strings to encode data, NWS can coordinate heterogeneous ensembles of programs written in them.

It also carries costs; in particular, NWS cannot always be integrated as seamlessly as native language bindings.

Toss up: NWS names can be any ASCII string.

Thus, the above rendered in NWS-ese:

ws.store('x', ws.fetch('y'))

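Here fetch retrieves a value bound to the name 'y', store binds that value to the name 'x', and both are methods of ws, the workspace object in which the names live.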

In many languages it is possible to neaten this up — the introductory lectures demonstrated a “cleaner” API for Python:

sv.x = sv.y

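where sv is an object wrapping a workspace, so that reading an attribute performs a fetch and assigning to one performs a store.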
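The lectures' own wrapper is not reproduced here. The following is a minimal sketch (Python 3) assuming only that such a wrapper maps attribute reads onto fetch and attribute assignment onto store. ToyWorkspace and SharedVars are hypothetical names, and ToyWorkspace is a single-process, non-blocking stand-in for a real workspace so the example runs without an NWS server.

```python
from collections import defaultdict, deque


class ToyWorkspace:
    """Hypothetical single-process stand-in for an NWS workspace.

    Each name maps to a FIFO queue: store appends, fetch removes the
    oldest value. Unlike a real workspace, fetch raises KeyError
    instead of waiting when the name has no values.
    """

    def __init__(self):
        self._vars = defaultdict(deque)

    def store(self, name, value):
        self._vars[name].append(value)

    def fetch(self, name):
        if not self._vars[name]:
            raise KeyError(name)
        return self._vars[name].popleft()


class SharedVars:
    """Attribute-style access: reading sv.y fetches 'y', assigning sv.x stores 'x'."""

    def __init__(self, ws):
        # Bypass our own __setattr__ so '_ws' becomes a real attribute, not a store.
        object.__setattr__(self, '_ws', ws)

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. workspace variables.
        return self._ws.fetch(name)

    def __setattr__(self, name, value):
        self._ws.store(name, value)


ws = ToyWorkspace()
sv = SharedVars(ws)
ws.store('y', 42)
sv.x = sv.y           # equivalent to ws.store('x', ws.fetch('y'))
print(ws.fetch('x'))  # 42
```

Hooking __getattr__ rather than __getattribute__ keeps ordinary attributes such as _ws working normally; only unknown names are routed to the workspace.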

Coordinated Binding Behavior

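Consider again:

x = y

In a sequential program, if y is unbound, this is an error, and rightly so: no one else is ever going to come along and bind a value to y.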

But in an ensemble setting, somebody very well might do just that. In other words, in the context of coordination, an unbound name has a perfectly valid (and useful) interpretation: “Please hold.” It doesn't have to have this interpretation, but it seems to make sense, so let's run with it.

Now consider:

x = 123
x = 456

In a sequential program, the second assignment simply overwrites the first; the value 123 is gone.

But in an ensemble setting, lots of other processes may be interested in the sequence of values bound to x. If so, how do we know when a particular value of x has been put to good use?

Enter generative communication: some coordination events generate data that exist independently of any process, while others consume such data. Let's interpret binding a value to a name as adding that value to a list of values mapped to that name, rather than (possibly) overwriting a single associated value. Let's further stipulate that the list is maintained as a FIFO queue. But how do we ever shed values? To complete the picture: retrieving a value bound to a name removes one value from the queue. Again, this isn't the only possible interpretation, but it is arguably a reasonable one.

In sum: an assignment records a value of interest, a retrieval consumes one value, an empty list of values triggers 'Please Hold' for a retrieval.
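These three rules can be modeled in a few lines. The sketch below (Python 3) is a hypothetical in-memory model, not the nws.client implementation: BlockingWorkspace is an illustrative name, and a threading.Condition provides the "Please hold" behavior within a single process.

```python
import threading
from collections import defaultdict, deque


class BlockingWorkspace:
    """Hypothetical in-memory model of the store/fetch semantics above.

    store appends to a per-name FIFO queue; fetch removes the oldest
    value, waiting ("please hold") while the queue is empty.
    """

    def __init__(self):
        self._vars = defaultdict(deque)
        self._cond = threading.Condition()

    def store(self, name, value):
        with self._cond:
            self._vars[name].append(value)
            self._cond.notify_all()  # wake any fetchers that are waiting

    def fetch(self, name):
        with self._cond:
            # wait_for re-checks the predicate after every wake-up.
            self._cond.wait_for(lambda: len(self._vars[name]) > 0)
            return self._vars[name].popleft()


ws = BlockingWorkspace()
results = []

# This consumer may start before 'x' has any value; its fetch then holds.
consumer = threading.Thread(target=lambda: results.append(ws.fetch('x')))
consumer.start()

ws.store('x', 123)
ws.store('x', 456)    # queued behind 123 rather than overwriting it
consumer.join()
print(results)        # [123]
print(ws.fetch('x'))  # 456
```

Both values survive: the consumer takes the older one, and 456 stays queued until someone else retrieves it.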

Let's see how well these play together. In one or more Python sessions, run the following:

import nws.client

def f(x):
    return x*x*x

ws = nws.client.NetWorkSpace('table test')
while True:
    # Wait for an 'x', consume it, and publish f(x) as 'r'.
    ws.store('r', f(ws.fetch('x')))

Then, in one more session, run:

import nws.client

ws = nws.client.NetWorkSpace('table test')
for x in range(10):
    ws.store('x', x)
# Collect ten results; each fetch waits until some worker has stored one.
for x in range(10):
    print('f(%d) = %d' % (x, ws.fetch('r')))

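Things to try: start the workers before (and after) the master; vary the number of workers; and watch the ordering of the results, which is guaranteed to match the inputs only for a two-body ensemble (one master, one worker).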
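To experiment without a server, the sketch below models the same ensemble in one process (Python 3). Threads stand in for separate sessions, and BlockingWorkspace is a hypothetical in-memory model of the FIFO store/fetch semantics, not nws.client. As a variation on the worker above, each result is stored as an (x, f(x)) pair, so the master can match results to inputs whatever order they arrive in.

```python
import threading
from collections import defaultdict, deque


class BlockingWorkspace:
    """Hypothetical in-memory stand-in for a workspace: FIFO queues, blocking fetch."""

    def __init__(self):
        self._vars = defaultdict(deque)
        self._cond = threading.Condition()

    def store(self, name, value):
        with self._cond:
            self._vars[name].append(value)
            self._cond.notify_all()

    def fetch(self, name):
        with self._cond:
            self._cond.wait_for(lambda: len(self._vars[name]) > 0)
            return self._vars[name].popleft()


def f(x):
    return x * x * x


def worker(ws):
    # Same loop as the worker session, except each result carries its input.
    while True:
        x = ws.fetch('x')
        ws.store('r', (x, f(x)))


ws = BlockingWorkspace()
for _ in range(3):  # the number of workers is arbitrary
    threading.Thread(target=worker, args=(ws,), daemon=True).start()

for x in range(10):
    ws.store('x', x)

# Results arrive in completion order; the tags let us match them up anyway.
table = dict(ws.fetch('r') for _ in range(10))
for x in range(10):
    print('f(%d) = %d' % (x, table[x]))
```

Tagging costs a little extra data per result but removes any dependence on how many workers are running or how they are scheduled.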

Variations

Consider maintaining a global maximum. Suppose many processes are cooperating in a search for a value, x_max, that maximizes a function F, and that, knowing F's maximum is at least F+, we can rule out some candidate x's. Further, assume F is expensive to evaluate but the winnowing check is cheap. (In the code, f evaluates F and noGo performs the winnowing check.)

We would like to do something like:

for x in MyCandidateList:
    currentMax = ws.fetch('max')
    if noGo(currentMax, x):
        continue
    y = f(x)
    if y > currentMax:
        ws.store('max', y)

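The problem: fetch consumes the value, and it may never be replaced. If noGo rules x out, or y doesn't beat currentMax, nothing is stored back, and every other process blocks on its next fetch('max').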

The remedy is find, which returns a value bound to a name without removing it:

for x in MyCandidateList:
    currentMax = ws.find('max')
    if noGo(currentMax, x):
        continue
    y = f(x)
    if y > currentMax:
        ws.store('max', y)

Are we maintaining a single 'max'?

Is currentMax really current?

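for x in MyCandidateList:
    # find: a cheap, non-destructive look at the current maximum.
    # A stale value is safe for winnowing, since 'max' only ever increases.
    currentMax = ws.find('max')
    if noGo(currentMax, x):
        continue
    y = f(x)
    # fetch removes the single 'max' value, so other updaters wait here
    # until we store it back: the compare-and-update cannot interleave.
    currentMax = ws.fetch('max')
    if y > currentMax:
        currentMax = y
    ws.store('max', currentMax)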
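The corrected loop can be exercised end to end with a single-process model. In this sketch (Python 3), BlockingWorkspace is a hypothetical in-memory stand-in with blocking fetch and find, threads play the cooperating processes, and f and no_go are placeholder functions invented for the example; no_go never rules anything out here, so it marks where a real winnowing test would go.

```python
import threading
from collections import defaultdict, deque


class BlockingWorkspace:
    """Hypothetical in-memory model: FIFO queues with blocking fetch and find."""

    def __init__(self):
        self._vars = defaultdict(deque)
        self._cond = threading.Condition()

    def store(self, name, value):
        with self._cond:
            self._vars[name].append(value)
            self._cond.notify_all()

    def fetch(self, name):
        # Remove and return the oldest value, waiting while there is none.
        with self._cond:
            self._cond.wait_for(lambda: len(self._vars[name]) > 0)
            return self._vars[name].popleft()

    def find(self, name):
        # Like fetch, but leave the value in place.
        with self._cond:
            self._cond.wait_for(lambda: len(self._vars[name]) > 0)
            return self._vars[name][0]


def f(x):
    # Placeholder for the expensive function F.
    return -(x - 37) ** 2


def no_go(current_max, x):
    # Placeholder for the cheap winnowing check; rules nothing out here.
    return False


def search(ws, candidates):
    for x in candidates:
        current_max = ws.find('max')
        if no_go(current_max, x):
            continue
        y = f(x)
        current_max = ws.fetch('max')  # take the only copy; others now wait
        if y > current_max:
            current_max = y
        ws.store('max', current_max)   # put exactly one value back


ws = BlockingWorkspace()
ws.store('max', float('-inf'))  # seed the single 'max' value
chunks = [range(i, 100, 4) for i in range(4)]
threads = [threading.Thread(target=search, args=(ws, c)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ws.find('max'))  # 0, attained at x = 37
```

Because every update goes through fetch followed by store, exactly one 'max' value exists whenever no update is in progress, which answers both questions above.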

There are other uses for find, the most common being “write-once” variables: data that are established at the beginning of a computation, or that are independent of any one computation, and that are needed by two or more ensemble members.

find changes how the value queue is accessed, but what about variations in the queue itself? NWS supports four “types” (aka “modes”):

FIFO: the default

LIFO: because you cannot have one without the other

Non-deterministic: Back to our Linda roots

Single: Not uncommon just to want the “last one”. Works well with find . Good for status values — simplifies monitors that read them.

A variable's mode is set by declaring the variable with ws.declare().
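The four modes differ only in which value a retrieval selects and in what a store keeps. The sketch below (Python 3) models them for a single variable. ToyVariable is a hypothetical class, the mode strings are this sketch's own labels rather than the NWS API, and retrieval raises IndexError on an empty queue instead of waiting.

```python
import random
from collections import deque


class ToyVariable:
    """Hypothetical model of the four value-queue modes (not the NWS API)."""

    MODES = ('fifo', 'lifo', 'multi', 'single')

    def __init__(self, mode='fifo'):
        if mode not in self.MODES:
            raise ValueError('unknown mode: %r' % mode)
        self.mode = mode
        self.values = deque()

    def store(self, value):
        if self.mode == 'single':
            self.values.clear()  # keep only the most recent value
        self.values.append(value)

    def _index(self):
        # Which queued value a retrieval selects under this mode.
        if not self.values:
            raise IndexError('no value bound')
        if self.mode == 'lifo':
            return len(self.values) - 1
        if self.mode == 'multi':
            return random.randrange(len(self.values))  # nondeterministic choice
        return 0  # fifo, and single (which holds at most one value)

    def fetch(self):
        # Remove and return the selected value.
        i = self._index()
        value = self.values[i]
        del self.values[i]
        return value

    def find(self):
        # Return the selected value without removing it.
        return self.values[self._index()]


for mode in ('fifo', 'lifo', 'single'):
    v = ToyVariable(mode)
    for value in (1, 2, 3):
        v.store(value)
    print(mode, v.find(), v.fetch())
# fifo 1 1
# lifo 3 3
# single 3 3
```

Note how single pairs naturally with find: a monitor can read the latest status repeatedly without consuming it, and each new store silently replaces the old one.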

Managing Workspaces

So if multiple processes each execute:

ws = nws.client.NetWorkSpace('snake pit', serverHost='python.zoo.cs.yale.edu')

they all share a single workspace, named 'snake pit', on that server.

But who would own it? And why would that matter?

Answers: by default, the process that first mentions the workspace to the server owns it. Ownership matters because somebody has to clean up, and traditional garbage collection doesn't really apply.

Workspaces, like tuplespaces, can be persistent. In practice this can quickly lead to a mess, so by default they are transitory: when the process that owns a workspace exits, the workspace goes away too. This can make staging an ensemble a bit of a pain, even if the general idea is right. Hence the distinction between “opening” a workspace, which can make the caller its owner, and merely “using” one, which does not.