One of the key requirements for running a great platform is ensuring that it’s running optimally. To do this, detailed monitoring is required, and it’s something we’re constantly tweaking and refining here at Conetix. We want to instantly see when there are issues and address them quickly to avoid any performance or service interruptions. This is reflected in our network uptime: we consistently deliver a platform with very few interruptions.

With all of the monitoring then comes the need to analyse and disseminate the data. More than 10,000 points of data are recorded every minute, and finding anomalies or errors which aren’t part of a widespread pattern can become the proverbial search for a needle in a haystack. For instance, we process over 10 million emails a week, just through the systems we manage directly. This is mostly a smooth operation. We monitor processing latency for sending and receiving, spam rates, virus rates, throughput per domain and IP, and protocol connection latency. By doing so, we catch most tiny glitches before a customer has even noticed.

Of course there's all the standard monitoring of server and application resource usage at a high level, which we entrust to PRTG. We monitor everything from basic pings through to DNS resolution latency and SQL server deadlocks to give us as much data as possible. We then have alerts set for various thresholds to instantly notify us of possible issues. This is then further checked through multiple external points using Panopta. Having these external checks is critical to ensure every aspect is covered, especially since Panopta have 5 points of presence in Australia alone.

Through our Fortigate firewall's Intrusion Prevention System (IPS), we also get a measure of what attacks are occurring and from where. We see attacks constantly and automatically block up to 50,000 of them a day. The data is processed through an Elasticsearch, Logstash and Kibana (ELK) stack to provide breakdowns of attack times, source, destination and method of attack. This also gives us the power of instantaneous search, which greatly enhances our ability to analyse specific incidents.

We also monitor each client VPS using some custom collection, with the data stored in Graphite and displayed via Grafana. The key to the system is that it's a zero configuration platform. Through the use of the Parallels virtualisation API, we're able to automatically gather these stats as soon as a new VPS instance is spun up. With Central, we have also started to expose this data directly (and automatically!) for clients to see too.
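For those curious how stats end up in Graphite, here is a minimal Python sketch. It is not our actual collector: the metric names, the `vps.<id>.<stat>` naming scheme, the host name and the helper functions are all assumptions made up for illustration. The only fixed part is Graphite's plaintext protocol, which accepts lines of the form `<path> <value> <timestamp>` (by default on TCP port 2003).

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def vps_metric_lines(vps_id, stats, timestamp=None):
    """Turn a dict of stats for one VPS into plaintext protocol lines.

    The 'vps.<id>.<stat>' path layout is a hypothetical naming scheme.
    """
    return [
        format_metric(f"vps.{vps_id}.{name}", value, timestamp)
        for name, value in sorted(stats.items())
    ]


def send_metrics(lines, host="graphite.example.com", port=2003):
    """Send the lines to a Graphite (carbon) listener over TCP.

    The host is a placeholder; 2003 is carbon's default plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Example usage (requires a reachable Graphite server):
# send_metrics(vps_metric_lines("101", {"cpu_percent": 12.5, "mem_mb": 512}))
```

Because the metric path is built from the VPS identifier, a newly created VPS would show up in Graphite the first time its stats are sent, with no per-VPS setup needed on the Graphite side.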

So, we have the collection of data down pat. The next step was to increase the detail in our real-time monitoring of this data. Until now, we had 2 x 24” monitors displaying data as well as additional monitors for our system administrators so that they could always see this information. However, this wasn’t enough.

We started planning what most Internet Service Providers (ISPs) and hosting companies call a Network Operations Centre (NOC) within our office. Essentially, it's a large bank of displays to see everything as it occurs in real-time. With the advent of cheap TVs, this is surprisingly cost effective to do these days. We picked up an absolute bargain from Officeworks, snapping up 4 x 40” LED LCD TVs with full 1080p resolution for $299 ea. Only 3 years ago, you couldn’t even buy a basic TV (without the full HD resolution or LED back lighting) for the price we paid for all 4 of them.

Mounting them was fairly easy because the TVs themselves are extremely lightweight. Without the stand, they’re only 8 kg each. That’s nearly as light as the 24” monitors they replaced!

This wasn’t even a challenge for seasoned renovator and system administrator, Sid. In fact, the biggest issue was the 100+ year old brick and mortar walls, which crumbled more than expected when we drilled the holes. We ended up using liquid nails to ensure the plugs wouldn't come out when the bolts were in.

The next step was to find a system to drive these. After considering a NUC PC or similar, we went with some generic, Chinese-made Android HDMI sticks. The particular model we went with is the MK809III, which is powered by a quad core RockChip processor. These are in plentiful supply (just do a search on eBay) and they're very cheap to replace.

All we needed to do was boot up and display a web browser, so we knew this would be a reasonably easy task for them to achieve. They’re also low power and we simply power them off the USB ports in the TV. We've even written a small custom Android app, which is essentially a full screen browser that opens at boot and gets its URL from a JSON call. We'll be detailing this in a follow-up blog post for those who are especially interested in this aspect.
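To give a rough idea of the "URL from a JSON call" step, here is a small Python sketch of the lookup logic only. It is not the Android app itself, and the JSON layout (a `displays` object keyed by a display name), the display names and the fallback URL are all assumptions invented for this example.

```python
import json
from urllib.parse import urlparse

# Hypothetical fallback shown when no usable URL is configured.
DEFAULT_URL = "about:blank"


def dashboard_url_from_json(body, display_id):
    """Pick the dashboard URL for one display out of a JSON response body.

    Assumed (made-up) layout: {"displays": {"<display id>": "<url>"}}.
    Falls back to DEFAULT_URL if the body is invalid or the URL is unusable.
    """
    try:
        config = json.loads(body)
    except (TypeError, ValueError):
        return DEFAULT_URL

    displays = config.get("displays") if isinstance(config, dict) else None
    if not isinstance(displays, dict):
        return DEFAULT_URL

    url = displays.get(display_id)
    if not isinstance(url, str):
        return DEFAULT_URL

    # Only accept absolute http(s) URLs so a bad entry can't break the display.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return DEFAULT_URL
    return url
```

Keeping the URL in a central JSON document means a display can be pointed at a different dashboard without touching the stick itself, and falling back to a known page keeps a malformed response from leaving a screen in a broken state.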

And the end result!

We're constantly evolving the way we're showing the data in order to achieve the best possible results. For instance, we now also have one of the displays rotating through information for the customer service staff, such as a summary of the ticket counts.

You can now read Part 2 covering the Android Sticks and Part 3 which covers some of the data collection.