Centralized Logging

With horizontally scaled monoliths or services, the default logging behavior of most web frameworks loses value with each server you add. Out of the box, a framework typically writes its logs to STDOUT or a log file, which works great when you are first starting a project. Once the service is in production, however, debugging becomes a guessing game: you hope that the server whose logs you are tailing happens to be the one handling the request, and that traffic is low enough that you can read the log line before more requests push it off your screen.

Centralized logging solves this problem by having all your servers relay their logs to a central database. With all your logs in one place, you gain the database's features for filtering and finding exactly the information you need. Tracking down an issue in a single service, or comparing one service against others, even across environments, becomes trivial.

One of the software stacks for accomplishing this is called “ELK”. ELK is an abbreviation for Elasticsearch, Logstash, & Kibana, and it has been one of the more popular solutions for centralized logging over the past few years. At Bleacher Report we outsourced the ELK stack by using Logz (https://logz.io), an ELK/logging software-as-a-service. ELK is a very JVM-centric stack, so it made sense for us to have someone else do all the JVM tuning and setup, as we don’t use the JVM for our services.

In addition to Logz, which gave us the ability to format, view, and visualize our logs, we needed a way to send our logs to Logz. For that we used Filebeat (https://www.elastic.co/products/beats/filebeat), a Logstash forwarder written in Go by the ELK creators, Elastic. Our ops team set up an AWS Elastic Beanstalk extension to have our log files rotated (so they don’t use all our disk space) and forwarded to Logz through Filebeat.
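The Filebeat side of that setup amounts to pointing Filebeat at the rotated log files and at Logz’s listener. A rough sketch of what such a filebeat.yml can look like (the log path, token placeholder, and certificate path are illustrative; consult the Logz and Filebeat documentation for the current listener endpoint and TLS settings):

```yaml
filebeat:
  prospectors:
    # Ship every rotated application log file (path is illustrative)
    - paths:
        - /var/log/my_app/*.log
      fields:
        # Logz-specific fields: tell Logz the lines are JSON, and authenticate
        logzio_codec: json
        token: YOUR_LOGZIO_ACCOUNT_TOKEN
      fields_under_root: true

output:
  logstash:
    # Logz's Logstash-compatible listener (host/port per the Logz docs)
    hosts: ["listener.logz.io:5015"]
    ssl:
      certificate_authorities: ["/etc/pki/tls/certs/logzio.crt"]
```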

Plug_logger_json

After discovering Logz, I started to integrate it with one of our Elixir Phoenix applications but soon hit roadblocks around formatting our logs as JSON (Logz’s recommended format) and logging more fields than Phoenix’s Plug.Logger logs by default. I searched existing packages but didn’t find the right fit for our use case, so I began creating plug_logger_json (https://github.com/bleacherreport/plug_logger_json).
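Wiring plug_logger_json into a Phoenix app is a small change: add the dependency and swap Plug.Logger for Plug.LoggerJSON in the endpoint. A sketch, with an illustrative version number (check the README for the current version and options):

```elixir
# mix.exs — add the dependency (version illustrative)
defp deps do
  [{:plug_logger_json, "~> 0.5"}]
end

# lib/my_app/endpoint.ex — replace `plug Plug.Logger` with:
plug Plug.LoggerJSON, log: Logger.level()
```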

The end result was that plug_logger_json formats each HTTP request log as JSON and logs a standard set of fields about the request (see the README for the full list):
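To give a feel for the output, a logged request looks roughly like the following (the field names and values here are illustrative, not the package’s exact schema):

```json
{
  "date_time": "2017-01-01T12:00:00.000Z",
  "duration": 4.21,
  "handler": "MyApp.PageController#index",
  "method": "GET",
  "path": "/",
  "request_id": "f2q9l3k1h8",
  "status": 200
}
```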

In addition to the above fields, Filebeat adds the field beat.hostname. This is a key part of our setup: our hostnames include the service name and environment, which lets us use that single field to filter by service or environment. A query of “beat.hostname: *prod*” will return metrics for all production instances. Likewise, a query of “beat.hostname: *stag-articles” will return metrics for all our staging article instances.
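The trick is purely naming: because every hostname embeds both the service and the environment, a single substring (or wildcard) match slices the logs any way we need. A toy Elixir illustration with made-up hostnames:

```elixir
hostnames = ["articles-prod-01", "articles-stag-01", "social-prod-02"]

# Everything in production, regardless of service:
Enum.filter(hostnames, &String.contains?(&1, "prod"))
# => ["articles-prod-01", "social-prod-02"]
```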

With these metrics being logged to Logz, we’ve been able to build dashboard visualizations to:

View 95th-percentile response times for any given environment, service, host, or controller action

View average response times for any given environment, service, host, or controller action

View the percentage of our requests with any given HTTP status code (a 500 Internal Server Error, for example) and view the exact request (with sensitive information filtered) that caused the 500

View the req/s for any given environment, service, host, or controller action

View our most/least popular endpoints

View usage percentages for android, iOS, and web browsers

View CDN response times

Trace requests across microservices to find where an error occurred

and more
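The percentile numbers above come straight out of the logged duration field. As a sketch of the underlying calculation (Elixir, decoding one JSON log line per string with the Jason library — any JSON decoder works — and using the nearest-rank percentile method):

```elixir
defmodule LogStats do
  # Nearest-rank p95 over the "duration" field of JSON log lines.
  def p95(lines) do
    durations =
      lines
      |> Enum.map(&Jason.decode!/1)   # each line is one JSON object
      |> Enum.map(& &1["duration"])   # request duration in milliseconds
      |> Enum.sort()

    # Nearest-rank: the ceil(0.95 * n)-th smallest value (1-indexed)
    index = Float.ceil(0.95 * length(durations)) |> trunc()
    Enum.at(durations, index - 1)
  end
end
```

In practice Kibana does this aggregation for us; the sketch just shows there is no magic behind the dashboards, only the duration field logged on every request.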

Best of all, all of these metrics can be viewed historically or in real time. Even without first-class New Relic support for Elixir, we’ve effectively built our own application monitoring tools through Logz. This has been incredibly valuable, as we’ve been able to standardize our logging across all our Elixir and Phoenix apps.

We’re already using this system in production to monitor many of our Elixir services, and have even used it to performance test how our services handle 10x production traffic.

To try plug_logger_json for yourself, check out the README at https://github.com/bleacherreport/plug_logger_json to view the latest setup information. Setting up plug_logger_json takes less than 5 minutes!