As described in the comments above, I made you some sample code. It is based on Redis, and I suggest you run Redis on your cluster manager node, which is presumably close to the nodes of your cluster and always up - making it a good candidate for hosting a statistics-gathering service.

The sample code is a dummy job written in Python and a monitoring routine written in bash, but the job could just as easily be written in C/C++ and the monitoring routine in Perl - there are Redis bindings for all sorts of languages, so don't get hung up on the choice of language.

Even if you can't read Python, it is very easy to understand. There are three threads running in parallel. One simply updates a string in Redis with the total elapsed processing time. The other two update Redis lists with time-series data - a synthesised triangular wave - one running at 5 Hz and the other at 1 Hz.
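The synthesised wave is just a bounded counter that ramps up to 10 and back down to 0. Here is that ramp logic on its own, with no Redis involved - the helper name `triangle_steps` is mine, not part of the job below:

```python
def triangle_steps(n, lo=0, hi=10):
    """Return the first n samples of the triangular wave used by the dummy job."""
    value, inc, samples = 0, 1, []
    for _ in range(n):
        value += inc
        samples.append(value)
        if value == lo:      # bottomed out - ramp up again
            inc = 1
        if value == hi:      # peaked - ramp down
            inc = -1
    return samples
```

So `triangle_steps(20)` ramps 1..10 and then back down 9..0, which is exactly the shape you will see in the monitor's graph.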

I used a Redis string where variables don't need to record a history and a Redis list where a history is needed. Other data structures are available.
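If it helps, here is how those two structures - plus a hash, one of the "other data structures" - look from `redis-cli`. The key names `jobInfo`, `host` and `node7` below are illustrative, not from the job itself (multi-field `HSET` needs Redis 4.0 or later):

```
$ redis-cli
127.0.0.1:6379> SET processTime 42
OK
127.0.0.1:6379> LPUSH seriesA 1 2 3
(integer) 3
127.0.0.1:6379> LRANGE seriesA 0 2
1) "3"
2) "2"
3) "1"
127.0.0.1:6379> HSET jobInfo pid 1234 host node7
(integer) 2
127.0.0.1:6379> HGETALL jobInfo
1) "pid"
2) "1234"
3) "host"
4) "node7"
```

`SET` holds just the latest value, `LPUSH`/`LRANGE` keep a newest-first history, and `HSET` groups related fields under a single key.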

In the code below, the only three interesting lines are:

```python
# Connect to Redis server by IP address/name
r = redis.Redis(host='localhost', port=6379, db=0)

# Set a Redis string called 'processTime' to value `processTime`
r.set('processTime', processTime)

# Push a value to left end of Redis list
r.lpush(RedisKeyName, value)
```

Here is the dummy job that is being monitored. Start reading where it says

```
################################################################################
# Main
################################################################################
```

Here is the code:

```python
#!/usr/local/bin/python3

import redis
import _thread
import time
import os

################################################################################
# Separate thread periodically updating the 'processTime' in Redis
################################################################################
def processTimeThread():
    """Calculate time since we started and update every so often in Redis"""
    start = time.time()
    while True:
        processTime = int(time.time() - start)
        r.set('processTime', processTime)
        time.sleep(0.2)

################################################################################
# Separate thread generating a time series and storing in Redis with the given
# name and update rate
################################################################################
def generateSeriesThread(RedisKeyName, interval):
    """Generate a saw-tooth time series and log to Redis"""
    # Delete any values from previous runs
    r.delete(RedisKeyName)
    value = 0
    inc = 1
    while True:
        # Generate next value and store in Redis
        value = value + inc
        r.lpush(RedisKeyName, value)
        if value == 0:
            inc = 1
        if value == 10:
            inc = -1
        time.sleep(interval)

################################################################################
# Main
################################################################################

# Connect to Redis on local host - but could just as easily be on another machine
r = redis.Redis(host='localhost', port=6379, db=0)

# Get start time of job in RFC2822 format
startTime = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime())
# ... and set Redis string "startTime"
r.set('startTime', startTime)

# Get process id (pid)
pid = os.getpid()
# ... and set Redis string "pid"
r.set('pid', pid)

# Start some threads generating data
_thread.start_new_thread(processTimeThread, ())
_thread.start_new_thread(generateSeriesThread, ('seriesA', 0.2))
_thread.start_new_thread(generateSeriesThread, ('seriesB', 1))

# Hang around (with threads still running) till user presses a key
key = input("Press Return/Enter to stop.")
```

I then wrote a monitoring script in bash that connects to Redis, grabs the values and displays them in a TUI (Text User Interface) on the terminal. You could equally write it in Python, Perl or PHP, and equally make it a graphical or web-based interface.

```bash
#!/bin/bash

################################################################################
# drawGraph
################################################################################
drawGraph(){
   top=$1 ; shift
   data=( "$@" )
   for ((row=0;row<10;row++)) ; do
      ((y=10-row))
      ((screeny=top+row))
      line=""
      for ((col=0;col<30;col++)) ; do
         char=" "
         declare -i v
         v=${data[col]}
         [ $v -eq $y ] && char="X"
         line="${line}${char}"
      done
      printf "$(tput cup $screeny 0)%s" "${line}"
   done
}

# Save screen and clear and make cursor invisible
tput smcup
tput clear
tput civis

# Trap exit
trap 'exit 1' INT TERM
trap 'tput rmcup; tput clear' EXIT

while :; do
   # Get process id from Redis and display
   pid=$(redis-cli <<< "get pid")
   printf "$(tput cup 0 0)ProcessId: $pid"

   # Get process start time from Redis and display
   startTime=$(redis-cli <<< "get startTime")
   printf "$(tput cup 1 0)Start Time: $startTime"

   # Get process running time from Redis and display
   processTime=$(redis-cli <<< "get processTime")
   printf "$(tput cup 2 0)Running Time: $(tput el)$processTime"

   # Display seriesA last few values
   seriesA=( $(redis-cli <<< "lrange seriesA 0 30") )
   printf "$(tput cup 5 0)seriesA latest values: $(tput el)"
   printf "%d " "${seriesA[@]}"

   # Display seriesB last few values
   seriesB=( $(redis-cli <<< "lrange seriesB 0 30") )
   printf "$(tput cup 6 0)seriesB latest values: $(tput el)"
   printf "%d " "${seriesB[@]}"

   drawGraph 8  "${seriesA[@]}"
   drawGraph 19 "${seriesB[@]}"

   # Put cursor at bottom of screen and tell user how to quit
   printf "$(tput cup 30 0)Hit Ctrl-C to quit"
done
```
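To illustrate the "could equally write it in Python" point, here is a minimal sketch of the same polling loop using the redis-py client. The `ascii_graph` helper and the plain line-by-line layout are my own simplifications, not a translation of the `tput` screen handling above:

```python
import time

def ascii_graph(data, height=10, width=30):
    """Render a list of small ints as rows of 'X' marks, top row = highest value."""
    rows = []
    for y in range(height, 0, -1):
        rows.append("".join(
            "X" if col < len(data) and data[col] == y else " "
            for col in range(width)))
    return "\n".join(rows)

def monitor(r, period=0.5):
    """Poll the keys the dummy job sets in Redis and print them."""
    while True:
        # redis-py returns bytes (or None if the key is missing)
        print("ProcessId:   ", (r.get('pid') or b'?').decode())
        print("Start Time:  ", (r.get('startTime') or b'?').decode())
        print("Running Time:", (r.get('processTime') or b'?').decode())
        # Newest 30 values of the series, then a crude graph
        series = [int(v) for v in r.lrange('seriesA', 0, 29)]
        print(ascii_graph(series))
        time.sleep(period)
```

Run it against the same server the job writes to, e.g. `monitor(redis.Redis(host='localhost', port=6379, db=0))` - it needs the `redis` (redis-py) package installed.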

Hopefully you can see that you can grab data structures from Redis very easily. This, for example, gets the processTime variable set by the job on the cluster node:

```bash
processTime=$(redis-cli <<< "get processTime")
```

The TUI looks like this: