Working with a large cluster of machines with no means of direct access presents a fresh set of challenges for sysadmins familiar with the traditional cloud hosting ecosystem.

Access to logs is key to debugging and resolving critical errors in your deployed application, because as comprehensive as your build tests are, the production environment can be chaos.

In centralised networks, logs are typically local to the machine, with a variety of cloud delivery methods for periodically transmitting both aggregate and verbose logs to a centralised store. In the decentralised space, this technique begins to exhibit prohibitive performance and cost implications at scale.

We’ve spent a lot of time road-testing applications capable of handling pub/sub message broadcast, in order to head off performance pinch points in the network. We’ve settled on NSQ.

Architecturally, the implementation is relatively simple, with room to grow as network applications influence requirements. In the first phase of integration, the NSQ server has become part of the Stargate application deployment process. Stargate is designed to live on a high-bandwidth network and, crucially, is capable of consuming inbound transmissions from its direct descendants with ease.

In the testnet there are a manageable number of Gateways, so we started integration on Hosts first. We’re using Logrus for local logging, which is easy to extend, and have added a custom hook (a ‘context’ method) enabling logs to be broadcast across the network to the Stargate NSQ server.

Here’s how it looks:

// Original
log.Info(fmt.Sprintf("Queue size is %d", queue.len))
// Output: "Queue size is 16"

// New
log.Context("request.queue.size").Info(queue.len)
// Output: "INFO[0001] 16 context=request.queue.size"

The context value dictates which NSQ channel the data belongs to, and is prefixed with the application type. Now, without any direct access to the Host hardware, we can see realtime logs in a vertically scalable service.