vxargs: running arbitrary commands with explicit parallelism, visualization and redirection

Overview

vxargs is inspired by xargs and pssh. It provides a parallel version of any arbitrary command, including ssh, rsync, scp, wget, curl, and so on. One reason to use it is to control a large set of machines over a wide-area network. For example, I use vxargs on PlanetLab to control hundreds of machines spread around the globe while working on the DHARMA project.

The main features are:

parallelism: run many jobs at the same time

flexibility: arbitrary command with arbitrary options

visualization: monitor the total/per job progress in a curses-based UI

redirection: the stdout and stderr of each individual job are redirected to separate files for further analysis.
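The redirection idea can be pictured with a plain-shell sketch. The directory and file names below are illustrative only, not vxargs's actual naming scheme, and vxargs adds a parallelism limit, timeouts, and the curses UI on top of this:

```shell
# Run one job per argument in parallel, giving each job its own
# stdout/stderr files, then wait for all of them to finish.
mkdir -p /tmp/vxargs-demo
for arg in alpha beta gamma; do
    echo "job for $arg" > "/tmp/vxargs-demo/$arg.out" 2> "/tmp/vxargs-demo/$arg.err" &
done
wait
cat /tmp/vxargs-demo/alpha.out
```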

FAQs

Why not use pssh ?

There are a couple of reasons:

(1) with pssh, you can run only a limited set of commands (e.g. ssh, rsync), and with limited command-line options. It is not flexible. With vxargs you can run anything, exactly the way you like it.

(2) vxargs has a curses-based user interface that can dynamically monitor the execution process.

(3) vxargs is a single Python script, which makes it extremely simple to install.

Why not use xargs?

xargs can do some of the work: check out its (rarely used) --max-procs (-P) and --replace (-i) options. However, there is no easy way to keep track of which individual process is running, and the output from all processes is mixed together. xargs also cannot set a maximal lifetime for each process. vxargs addresses these issues.
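For comparison, a rough xargs equivalent of a parallel run might look like the following (GNU xargs assumed; echo stands in for the real ssh command so the sketch is harmless to run):

```shell
# Parallel substitution with xargs: -P sets the job count, -I {} does the
# replacement. Note that the output of all jobs is interleaved on one
# stream -- the limitation that vxargs's per-job redirection removes.
printf '%s\n' host1 host2 host3 | xargs -P 2 -I {} echo "would run: ssh {} uptime"
```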

Download

ChangeLog

RPM package of vxargs maintained by Andras Horvath

vxargs link on freshmeat

HOWTO

To install vxargs, simply download the latest vxargs Python script, rename it to your favorite name (e.g. vxargs), make sure it has executable permission (e.g. chmod +x /home/username/bin/vxargs ), and make sure its directory is in your PATH. Of course, make sure you have Python 2.2 or above installed.
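The install steps above, as shell commands. A fabricated stand-in file is used here in place of the real downloaded script, so the sketch is self-contained; substitute the actual vxargs download:

```shell
# Put the script in a directory on PATH and mark it executable.
# The printf line creates a stand-in; replace it with the downloaded script.
mkdir -p "$HOME/bin"
printf '#!/usr/bin/env python\nprint("vxargs stand-in")\n' > "$HOME/bin/vxargs"
chmod +x "$HOME/bin/vxargs"
export PATH="$HOME/bin:$PATH"
command -v vxargs
```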

Read the man page or type vxargs --help for detailed usage. Here I'll show several examples to explain how it works. Suppose the iplist.txt file has the following content:

$ cat iplist.txt
216.165.109.79 #planetx.scs.cs.nyu.edu
158.130.6.254 #planetlab1.cis.upenn.edu
158.130.6.253 #planetlab2.cis.upenn.edu
128.232.103.203 #planetlab3.xeno.cl.cam.ac.uk

Check the uptime of every node in iplist.txt using ssh, in parallel:

vxargs -a iplist.txt -o /tmp/result ssh {} uptime

Note: {} is replaced by each dynamic argument (here, an IP address) in turn. This is equivalent to running:

ssh 216.165.109.79 uptime
ssh 158.130.6.254 uptime
ssh 158.130.6.253 uptime
ssh 128.232.103.203 uptime

The arguments of jobs that exited abnormally (e.g. with a non-zero exit status) are collected in the abnormal_list file inside the output directory:

cat /tmp/result/abnormal_list
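A hypothetical follow-up is to feed the failed hosts straight back into another run (e.g. vxargs -a /tmp/result/abnormal_list ...). The sketch below fabricates an abnormal_list so it can be run without vxargs, then loops over it:

```shell
# Fabricate an abnormal_list the way a failed run would leave one behind,
# then iterate over the failed hosts one per line.
mkdir -p /tmp/result
printf '%s\n' 158.130.6.254 158.130.6.253 > /tmp/result/abnormal_list
while read -r host; do
    echo "retry: $host"
done < /tmp/result/abnormal_list
```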



Synchronize the local directory mirror with all current PlanetLab production nodes:

curl --silent https://www.planet-lab.org/db/nodes/production_hosts.php | vxargs -P 2 -y rsync -az -e ssh --delete mirror $SLICE@{}:



Run startjob on every cluster node named cluster001 through cluster128 (New!):

pattern cluster[001-128] | vxargs -P 2 -o cluster/result/`safepath` ssh {} startjob
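If the pattern helper is not available, a plain-shell substitute (bash with GNU seq assumed) generates the same zero-padded names:

```shell
# Emit cluster001 .. cluster128, one per line; seq -w zero-pads every
# number to the width of the largest one.
for i in $(seq -w 1 128); do
    echo "cluster$i"
done
```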



Download cotop information from every node in the list:

vxargs.py -a iplist.txt --timeout=20 curl 'http://{}:3120/cotop?sort=9'



Run a CoDNS query for "www.google.com" on every node:

vxargs.py -a iplist.txt -o /tmp/codns/ -y -t 20 bash -c 'echo www.google.com| nc {} 4119'



Known Bugs

Note: the following bug was fixed in version 0.3.

When a command spawns multiple processes, e.g. bash -c 'echo www.google.com| nc {} 4119' , only the main process is terminated or killed after a timeout (in the example, bash is killed but nc may still be alive).
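The distinction behind this bug is signalling a single PID versus the whole process group. The sketch below illustrates the group-kill approach in plain shell; it uses Linux's setsid to start a fresh process group and the negative-PID form of kill, which are general POSIX-style conventions, not vxargs's actual code:

```shell
# setsid makes the child the leader of a new process group, so its PID
# doubles as the group ID. Signalling the negative PID then reaches every
# process in the group, not just the direct child.
setsid sleep 5 &
pid=$!
kill -TERM -- -"$pid"
wait "$pid" 2>/dev/null
echo "group terminated"
```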

Feedback

Send me email if you encounter problems, find bugs, or have any random comments: maoy AT cis.upenn.edu

Last Modified: $Id: index.html,v 1.27 2005/07/27 21:04:02 maoy Exp $