My monitoring tool of choice is currently Prometheus. With Prometheus you can easily gather metrics from applications and/or databases to see their actual performance. With a tool like Zabbix or Nagios, you’ll need to write one or more scripts to gather all metrics, and you have to find out how much you can store in the database without losing performance of the monitoring tool itself. Why I chose Prometheus over Zabbix or another monitoring tool is a subject for another blog post.

One interesting application to monitor is Consul. When you search Google for monitoring Consul, you’ll find a lot of pages showing how to use Consul as a monitoring tool, but not many on how to monitor Consul itself. In this blog post I’ll describe the steps I have taken to monitor Consul. Please keep in mind that this is just a start and it is incomplete, so if you have suggestions to improve it, please let me know.

In this blog post we will do the following:

Configure Consul

Configure statsd-exporter

Configure Prometheus

Create some graphs

Configure Consul

Consul has a mechanism for exposing metrics, called telemetry. With telemetry you can configure Consul to send performance metrics to external tools or applications. More information about configuring Consul telemetry can be found on this page: https://www.consul.io/docs/agent/options.html#telemetry. In this blog post we will use the “statsd_address” option. To make this happen, we have to add the following to the Consul configuration on the Consul servers:

"telemetry": { "statsd_address": "192.168.1.202:9125" },

The IP address is that of the host itself, and the metrics are sent to port 9125, where the statsd-exporter will listen. Once this is configured on all Consul servers, we need to restart them one by one so the Consul cluster keeps running.
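A rolling restart only keeps the cluster available if each server has rejoined before the next one goes down. The sketch below shows one way to check that between restarts. It is a minimal sketch, assuming the Consul HTTP API is reachable on its default port 8500 and that the cluster has three servers as in this setup; the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_status(base_url="http://127.0.0.1:8500"):
    """Return (leader, peers) from Consul's /v1/status endpoints.

    Assumes the Consul HTTP API listens on the default port 8500.
    """
    with urllib.request.urlopen(base_url + "/v1/status/leader", timeout=5) as resp:
        leader = json.load(resp)  # e.g. "192.168.1.202:8300"
    with urllib.request.urlopen(base_url + "/v1/status/peers", timeout=5) as resp:
        peers = json.load(resp)  # list of "ip:port" strings of the raft peers
    return leader, peers


def cluster_ready(leader, peers, expected_servers):
    """True when a leader is elected and every expected server is a raft peer."""
    # Assumption: the leader endpoint returns an empty string while no leader is elected.
    return bool(leader) and len(peers) == expected_servers
```

After restarting one server, you would call `leader, peers = fetch_status()` and only continue with the next server once `cluster_ready(leader, peers, 3)` returns True.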

Configure statsd-exporter

When you use Prometheus, you use exporters to expose the metrics of your applications or databases to Prometheus. Prometheus scrapes these metrics at a configurable interval, for example every 15 seconds, and stores them in its database. Because Consul doesn’t have an endpoint that Prometheus can scrape for these metrics, we make use of the “statsd-exporter”. We already configured the Consul servers to send metrics to a statsd server, so we only have to make sure we start one on each host running a Consul server.

Before we start a statsd-exporter, we have to do some configuration. We need a statsd mapper file. With this file we map statsd metric names to Prometheus metric names, and we can add labels per metric. I have put almost all mapping entries in this gist: https://gist.github.com/dj-wasabi/d9b31c4b74e561c72512f4edbdfe6927

Let’s walk through a single entry:

consul.*.runtime.*
name="consul_runtime"
type="$2"
host="{{ inventory_hostname }}"

The first line of this mapping entry is the name of the statsd metric. The asterisks are wildcards, and whatever they match can be used as a value by assigning it to a label: the first asterisk is available as $1, the second as $2, and so on. The “name” is the name of the metric in Prometheus, in this case consul_runtime. Prometheus doesn’t accept dots in metric names, so we use underscores instead.

We then create a label named “type” and assign it the value $2. The original statsd metric that Consul sends to the statsd-exporter looks like this:

consul.b139924a6f44.runtime.num_goroutines

With this mapping entry, $1 gets the value b139924a6f44 and $2 the value num_goroutines. The last label, “host”, is something I add with Ansible: {{ inventory_hostname }} is replaced by the host’s inventory name when the file is deployed. I use Ansible to deploy this statsd mapper file (and all other monitoring-related configuration) to all my Consul servers, so that in Prometheus, or in a graphing tool like Grafana, I can filter which metrics belong to which host.
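To make the wildcard and $1/$2 behaviour concrete, here is a simplified Python re-implementation of the matching described above. It is for illustration only and is not the statsd-exporter’s own code. It assumes each asterisk matches one dot-free part of the name, and the host value vserver-204 stands in for what Ansible renders from {{ inventory_hostname }}.

```python
import re


def compile_glob(pattern):
    """Turn a statsd glob such as 'consul.*.runtime.*' into a regex.

    Each '*' becomes a capture group that matches one dot-free component.
    """
    escaped = re.escape(pattern)  # 'consul\.\*\.runtime\.\*'
    return re.compile(escaped.replace(r"\*", r"([^.]*)"))


def expand(template, match):
    """Replace $1, $2, ... in a label template with the wildcard captures."""
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), template)


def map_metric(statsd_name, pattern, name, labels):
    """Return (prometheus_name, labels) if statsd_name matches pattern, else None."""
    match = compile_glob(pattern).fullmatch(statsd_name)
    if match is None:
        return None
    return name, {key: expand(value, match) for key, value in labels.items()}


print(map_metric("consul.b139924a6f44.runtime.num_goroutines",
                 "consul.*.runtime.*", "consul_runtime",
                 {"type": "$2", "host": "vserver-204"}))
# ('consul_runtime', {'type': 'num_goroutines', 'host': 'vserver-204'})
```

Because the whole name has to match, a metric such as consul.b139924a6f44.raft.apply is not picked up by this entry and needs its own mapping line.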

I use the Docker image for the statsd-exporter. I place the statsd mapper file at /data/statsd-exporter.conf and run the following command:

docker run --name statsd-exporter \
  -v /data/statsd-exporter.conf:/tmp/statsd-exporter.conf:ro \
  -p 9102:9102 -p 9125:9125/udp prom/statsd-exporter \
  -statsd.mapping-config=/tmp/statsd-exporter.conf \
  -statsd.add-suffix=false

I mount the statsd mapper file read-only (ro), publish two ports and configure the statsd-exporter to use the mapper file. Port 9125 (UDP) is where the statsd-exporter receives the metrics sent by Consul, and port 9102 is where Prometheus scrapes them. The -statsd.add-suffix=false option stops the exporter from appending the metric type (counter, gauge or timer) to the metric names.
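Before pointing Consul at the exporter, you can test it by hand with a single statsd packet. A minimal sketch, assuming Python 3 on the Consul host and the ports from the docker run command above. It formats a gauge in the plain statsd line format (name:value|g), sends it over UDP and filters the exporter’s metrics page. The value 82 is sample data and the function names are hypothetical.

```python
import socket
import urllib.request


def statsd_gauge(name, value):
    """Format one gauge in the plain statsd line format: <name>:<value>|g."""
    return f"{name}:{value}|g"


def send_statsd(line, host="127.0.0.1", port=9125):
    """Send a single statsd line over UDP (fire-and-forget, no reply expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


def fetch_metrics(url="http://127.0.0.1:9102/metrics"):
    """Download the exporter's metrics page in the Prometheus text format."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def matching_lines(metrics_text, prefix="consul_runtime"):
    """Keep only sample lines for one metric (skips # HELP / # TYPE comments)."""
    return [line for line in metrics_text.splitlines() if line.startswith(prefix)]
```

Running `send_statsd(statsd_gauge("consul.b139924a6f44.runtime.num_goroutines", 82))` followed by `print(matching_lines(fetch_metrics()))` should show a consul_runtime sample with type="num_goroutines" if the mapper file was loaded.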

Configure Prometheus

For now, I have added the following to the Prometheus configuration to let Prometheus scrape the statsd-exporter metrics:

scrape_configs:
  - job_name: 'consul'
    static_configs:
      - targets: ['192.168.1.202:9102']
        labels: {'host': 'vserver-202'}
      - targets: ['192.168.1.203:9102']
        labels: {'host': 'vserver-203'}
      - targets: ['192.168.1.204:9102']
        labels: {'host': 'vserver-204'}

This works for now because I use Ansible to generate the Prometheus configuration, but I will probably switch to consul_sd_configs in the near future so I won’t have to maintain all this static configuration.
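The static list above is generated from a host list, one static_configs entry per exporter. The same expansion can be sketched in Python. This is not my Ansible template: build_scrape_config is a hypothetical helper, and the IP-to-hostname mapping is the one used in this post. The result is printed as JSON, which YAML parsers also accept.

```python
import json


def build_scrape_config(hosts, port=9102):
    """Build the 'consul' scrape job from a {ip: hostname} mapping."""
    static_configs = [
        # One target per statsd-exporter, labelled with the host it runs on.
        {"targets": [f"{ip}:{port}"], "labels": {"host": hostname}}
        for ip, hostname in hosts.items()
    ]
    return {"scrape_configs": [{"job_name": "consul", "static_configs": static_configs}]}


hosts = {
    "192.168.1.202": "vserver-202",
    "192.168.1.203": "vserver-203",
    "192.168.1.204": "vserver-204",
}
print(json.dumps(build_scrape_config(hosts), indent=2))
```

Adding a fourth Consul server then only means adding one entry to the host mapping.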

Once Prometheus has been restarted and the statsd-exporter containers are running, the following metrics appear in Prometheus:

consul_runtime{host="vserver-204",type="free_count"} 2.3117552e+08
consul_runtime{host="vserver-204",type="heap_objects"} 22853
consul_runtime{host="vserver-204",type="num_goroutines"} 82

(And many more, but these three illustrate the mapping explained in the previous paragraphs.)
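To look at these values outside the Prometheus web UI, you can query the Prometheus HTTP API directly. A minimal sketch, assuming Prometheus listens on its default port 9090 on localhost; the function names are hypothetical. It builds an instant query for one of the metrics above and flattens the vector result by its host and type labels.

```python
import json
import urllib.parse
import urllib.request


def query_url(expr, base_url="http://localhost:9090"):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(response):
    """Turn an instant-vector response into {(host, type): value}."""
    if response.get("status") != "success":
        raise ValueError("query failed: %s" % response.get("error"))
    result = {}
    for sample in response["data"]["result"]:
        labels = sample["metric"]
        _timestamp, value = sample["value"]  # the value is sent as a string
        result[(labels.get("host"), labels.get("type"))] = float(value)
    return result


def query(expr, base_url="http://localhost:9090"):
    """Run an instant query and return the flattened result."""
    with urllib.request.urlopen(query_url(expr, base_url), timeout=5) as resp:
        return parse_vector(json.load(resp))
```

For example, `query('consul_runtime{type="num_goroutines"}')` returns the goroutine count per Consul server, keyed by host.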

Create some graphs

Now that we have the metrics in Prometheus, we need to create some graphs. We use Grafana for this, to show the actual performance of Consul. I’ve created a dashboard and uploaded it to grafana.com: https://grafana.com/dashboards/2351

The dashboard shows, among other things:

Which server is the Consul leader;

How many Consul servers are running;

Some CPU idle utilisation and load information (you’ll need the node-exporter for this);

Performance of writing data to disk on the Consul leader, or to the other nodes;

etc.

The dashboard is not finished yet and mixes Consul leader data with data specific to each Consul server. Some graphs show information for the selected Consul server (dropdown at the top of the page), while others show data for the Consul leader.

If you have suggestions to improve the current situation, either a better statsd mapper configuration file or improvements to the dashboard, please let me know so I can improve it. I hope we can all learn from each other to improve the availability and performance of Consul.