How to monitor geth and autorestart it on crashes with monit

Monit monitors Geth, autorestarts it on crashes and sends email alerts. Good, good Monit!

I’m running geth and TheMillionEtherHomepage.com back-end on a low-cost Digital Ocean (DO) Ubuntu droplet. I use monit to autorestart geth and send email alerts when something goes wrong with the processes.

Here I’ll cover a working solution and a beautiful one that doesn’t work yet (I hope somebody will eventually fix it).

Install and set up monit

$ sudo apt-get update

$ sudo apt-get install monit

Back up the config file and start editing it with nano:

$ sudo cp /etc/monit/monitrc /etc/monit/monitrc-orig # backup

$ sudo nano /etc/monit/monitrc

Allow access to monit status. Uncomment these lines:

set daemon 120 # number of seconds between checkups

set httpd port 2812 and

use address localhost # only accept connection from localhost

allow localhost # allow localhost to connect to the server

Let monit include *.cfg files from its conf.d directory. Uncomment (or add at the very end):

include /etc/monit/conf.d/*.cfg

Set up a mail server to send alerts. Add these lines if you are using Gmail:

# GMAIL

set mailserver smtp.gmail.com port 587

username "username@gmail.com" password "12345678"

using tlsv1

with timeout 30 seconds

set alert username@gmail.com # monit will send alerts to this address

Exit the nano editor: press Ctrl+X, type Y to confirm saving, then press Enter when it asks for a filename (leave the same filename).

Restrict the permissions, as monitrc now contains very sensitive data (your mail password).

$ sudo chmod 600 /etc/monit/monitrc
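To confirm that the permissions really changed, you can read the mode back. The sketch below assumes GNU stat (the -c option, as shipped with Ubuntu); check_mode is a hypothetical helper name, not something monit provides, and the demo runs against a scratch file rather than the real config.

```shell
# Hypothetical helper: succeed only if FILE has exactly the EXPECTED octal mode.
# Assumes GNU stat (the -c option), as found on Ubuntu.
check_mode() {
    file=$1
    expected=$2
    actual=$(stat -c '%a' "$file") || return 2
    [ "$actual" = "$expected" ]
}

# Demo on a scratch file so nothing under /etc is touched.
demo=$(mktemp)
chmod 600 "$demo"
check_mode "$demo" 600 && echo "mode ok"
rm -f "$demo"
```

On the real file, `sudo stat -c '%a' /etc/monit/monitrc` should print 600, and `sudo monit -t` asks monit to check the syntax of its control file.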

For better understanding of monit:

Set up geth monitoring: the working solution

Create the geth.cfg file in the conf.d directory (the one monit includes) and open it with nano:

$ sudo nano /etc/monit/conf.d/geth.cfg

Paste this code:

# GETH

CHECK PROCESS geth MATCHING "[g]eth.*fast"

start program = "/bin/bash -c 'geth --fast --rpc --rpcport 8545 --rpccorsdomain localhost --cache=16 >/dev/null 2> /home/my-logs-folder/geth.log'" as uid username as gid username

stop program = "/bin/bash -c 'kill -HUP $(ps aux | grep '[g]eth' | awk '{print $2}')'" as uid username as gid username

if not exist then alert

if not exist then restart

Press Ctrl+X, type Y to confirm saving, then press Enter when it asks for a filename.

We redirect stdout to /dev/null because geth writes all of its output to stderr (standard error), so stdout has nothing worth keeping. The 2> redirection then saves all stderr output (i.e. all logs) to the /home/my-logs-folder/geth.log file.
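The effect of the two redirections can be seen without geth. In this small sketch, noisy is a made-up stand-in that prints one line to each stream, and the log goes to a temporary file instead of the real log folder.

```shell
# Stand-in for geth: one line to stdout, one line to stderr.
noisy() {
    echo "this goes to stdout"
    echo "this goes to stderr" >&2
}

log=$(mktemp)

# Same redirections as in the start command:
# stdout is discarded, stderr is written to the log file.
noisy >/dev/null 2>"$log"

cat "$log"   # prints only: this goes to stderr
```

Note that 2> truncates the log file each time the command starts; 2>> would append to it instead.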

We send the HUP signal because SIGINT (which you typically send with Ctrl+C) will not reach geth when sent this way.

Reload monit so that it picks up the new config:
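The stop command writes the process name as [g]eth rather than geth. This is a common trick to keep grep from matching its own command line in the ps output. The sketch below simulates it with a fake two-line process list; count_matches is a hypothetical helper used only for illustration.

```shell
# Count lines matching PATTERN in a simulated process list that contains
# geth itself plus the grep command that is searching for it.
count_matches() {
    pattern=$1
    printf '%s\n' 'geth --fast --rpc' "grep $pattern" | grep -c -- "$pattern"
}

count_matches 'geth'     # prints 2: the plain pattern also finds "grep geth"
count_matches '[g]eth'   # prints 1: the text "[g]eth" does not contain "geth"
```

With the bracketed pattern, only the real geth line survives, so awk hands kill a single PID.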

$ sudo monit reload

With this config, monit emails you when geth crashes and restarts it automatically.

But there is a disadvantage: I could not get monit to send the “graceful” SIGINT signal to geth, so the config sends HUP instead, which is presumably not safe (see the question “Is it safe to kill geth with SIGTERM?”). I tried it and it works, but I prefer to send SIGINT manually instead:

$ sudo monit unmonitor geth # so that monit won't restart geth again

$ PID_OF_GETH=`pidof geth` # get geth's process ID

$ kill -2 $PID_OF_GETH # KILL IT NOW!... but do it gracefully.
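If you stop geth by hand often, the signal-and-wait part can be wrapped in a small function. This is only a sketch: graceful_stop and its arguments are hypothetical names, the default 30 second timeout is an arbitrary assumption, and it only defines the function without touching any process.

```shell
# Hypothetical helper: send SIGNAL (default INT) to PID, then wait up to
# TIMEOUT seconds (default 30) for the process to disappear.
graceful_stop() {
    pid=$1
    signal=${2:-INT}
    timeout=${3:-30}

    # Fails if the process does not exist or we may not signal it.
    kill -s "$signal" "$pid" || return 1

    waited=0
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "process $pid still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage (after: sudo monit unmonitor geth):
#   graceful_stop "$(pidof geth)"
```

The function returns non-zero if geth is still alive after the timeout, so you can decide yourself whether to wait longer before trying anything harsher.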

This is clumsy. I tried to make it more graceful but failed, so I posted a question on Stack Exchange: “How to monitor and auto-restart geth with monit?”

Set up geth monitoring: the right way

I know that the config should look very similar to this.

geth.cfg

# GETH

CHECK PROCESS geth with pidfile "/home/username/monit/geth.pid"

start program = "/bin/bash -c '/home/username/monit/geth.sh start'" as uid username as gid username

stop program = "/bin/bash -c '/home/username/monit/geth.sh stop'" as uid username as gid username

if not exist then alert

if not exist then restart

geth.sh

#!/bin/bash

PID='/home/username/monit/geth.pid'

LOGS='/home/username/monit/geth.log'

KEYS='--fast --rpc --rpcport 8545 --rpccorsdomain localhost --ipcpath /home/username/.ethereum/geth.ipc --cache=16'

case $1 in

start)

setsid geth "$KEYS" &> "$LOGS" &

echo $! > "$PID"

;;

stop)

kill -SIGINT `cat "$PID"`

rm "$PID"

;;

*)

echo "usage: geth.sh {start|stop}"

;;

esac

exit 0

Beautiful, isn’t it? If you know how to make it work, please submit your answer to the Stack Exchange question “How to monitor and auto-restart geth with monit?”
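One thing that may be worth checking in geth.sh (a guess on my part, not a confirmed fix) is how the flags are passed. Because "$KEYS" is wrapped in double quotes, the shell hands the whole string to the program as a single argument. The sketch below shows the difference using count_args, a hypothetical stand-in for geth that only reports how many arguments it received.

```shell
# Stand-in for geth: just report how many arguments it received.
count_args() {
    echo "$#"
}

KEYS='--fast --rpc --rpcport 8545'

count_args "$KEYS"   # prints 1: quoted, the whole string is ONE argument
count_args $KEYS     # prints 4: unquoted, the shell splits it on spaces
```

If this is indeed part of the problem, expanding the variable unquoted (or keeping the flags in a bash array) would let each flag arrive as its own argument. Other causes are possible too, so treat this as one place to start looking.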

Get monit status

After you restart monit, you can start geth with the following command:

$ sudo monit start geth

This will run geth, autorestart it on crashes and notify you via email if something goes wrong.

You can also check the monit status with the following command:

$ sudo monit status

You will get a screen like this (mlneth here is the Python back-end of my project):

[Screenshot: Monitoring geth]

Happy monitoring and no alerts to you all!
