Almost every customer I've worked with has had a different way of capturing and analysing activity in their production systems. Researching and comparing the available offerings feels like an endless task, and a constant source of debate. Whichever tools you choose, being able to centralise and analyse application server exceptions is an important step towards improving the reliability of code in production.

I wanted to see how Logstash, with Elasticsearch as a data store and Kibana as an analysis tool, could be used to monitor a Django application.

To start, I created a small Django application that served random data, occasionally raised exceptions via a catch-all view, and had a celery task that recorded to a randomly-chosen log level every time a new task was scheduled. Together, these should make for a good set of example data to dig through and analyse.

## One Django project, many response types

I installed Django, django-celery, SQLAlchemy (needed to record celery's results to an SQLite file), gunicorn and python-logstash. I then created a project called faulty and an app within it called example.

```shell
$ pip install Django==1.7.1 \
    django-celery==3.1.16 \
    gunicorn==19.1.1 \
    SQLAlchemy==0.9.8 \
    python-logstash==0.4.2
$ django-admin startproject faulty
$ cd faulty
$ django-admin startapp example
```

I made the following changes to faulty/settings.py:

```python
import djcelery

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'kombu.transport.django',
    'djcelery',
    'example',
)

BROKER_URL = 'django://'
CELERY_RESULT_BACKEND = 'db+sqlite:///results.db'
CELERY_SEND_EVENTS = True
CELERY_TASK_RESULT_EXPIRES = 10
CELERY_TIMEZONE = 'UTC'
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"

djcelery.setup_loader()

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'logstash': {
            'level': 'WARNING',
            'class': 'logstash.LogstashHandler',
            'host': 'localhost',
            'port': 5000,
            'version': 1,
        },
    },
    'loggers': {
        'django.request': {
            'handlers': ['logstash'],
            'level': 'WARNING',
            'propagate': True,
        },
        'example.tasks': {
            'handlers': ['logstash'],
            'level': 'WARNING',
        },
    },
}
```

I then created a task that would accept a randomly-generated content string, its reported length and its content type. It would create a sha1 hash of the content and output that hash, the reported content length and the content type to a random log level. The content length would be the length before the content was wrapped. I could have derived the length in the task itself, or hashed the content in the view, but the purpose of this exercise is to create noise.
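The interesting part of that settings file is the logstash handler: python-logstash ships each log record to Logstash as a JSON message over the network. As a rough illustration of the mechanism (my own toy stand-in, not python-logstash's actual implementation), here is a self-contained handler plus a throwaway UDP socket playing the part of Logstash's input:

```python
import json
import logging
import socket


class JsonUdpHandler(logging.Handler):
    """Toy stand-in for a Logstash logging handler: serialise each
    record as JSON and send it as a single UDP datagram."""

    def __init__(self, host, port):
        logging.Handler.__init__(self)
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit(self, record):
        event = {
            '@version': 1,
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        }
        self.sock.sendto(json.dumps(event).encode('utf-8'), self.addr)


# A throwaway UDP socket standing in for Logstash's udp input
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(('127.0.0.1', 0))  # ephemeral port keeps the example self-contained
listener.settimeout(5)
host, port = listener.getsockname()

logger = logging.getLogger('example.tasks')
logger.setLevel(logging.WARNING)
logger.propagate = False
logger.addHandler(JsonUdpHandler(host, port))
logger.warning('deadbeef 1024 text/html')

event = json.loads(listener.recv(4096).decode('utf-8'))
print(event['level'], event['message'])
```

The real handler adds more fields (timestamps, host, stack traces), but the shape is the same: one structured JSON event per log record, pushed over a socket rather than written to a file.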
example/tasks.py:

```python
import hashlib
from random import choice

from celery import task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)


@task()
def log_response(content, content_length, content_type):
    log_method = choice((logger.info, logger.warning,
                         logger.error, logger.critical))
    hasher = hashlib.sha1()
    hasher.update(content)
    log_method('{0} {1} {2}'.format(hasher.hexdigest(),
                                    content_length, content_type))
    return True
```

I configured the project to direct all traffic to a single endpoint.

faulty/urls.py:

```python
from django.conf.urls import patterns, url

urlpatterns = patterns('',
    url(r'^$', 'example.views.endpoint'),
)
```

I then created a view that has a 25% chance of raising an exception; otherwise it generates random content, sends that content to the celery task and serves it.

example/views.py:

```python
import mimetypes
from random import choice, randint
from textwrap import wrap

from django.http import HttpResponse

from .tasks import log_response


def endpoint(request):
    if randint(0, 3) == 3:
        # 25% chance of raising an error
        raise choice((AssertionError, IndexError, KeyError))

    # Pick a random content type:
    content_type = choice(mimetypes.types_map.values())

    # Create a random string between 512 and 2048 characters long
    # and line wrap it every 50 characters
    content_length = randint(512, 2048)
    choices = 'abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)'
    content = ''.join([choice(choices) for _ in range(content_length)])
    content = '\n'.join(wrap(content, width=50))

    # Send the content, its length and type to the task
    log_response.delay(content, content_length, content_type)
    return HttpResponse(content, content_type=content_type)
```
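The view's trick of logging the pre-wrap length while hashing and serving the wrapped content can be reproduced in isolation. This sketch reuses the same stdlib pieces to show that the served body is always longer than the length the task logs, which is exactly the kind of near-miss noise the exercise is after:

```python
import hashlib
from random import choice, randint
from textwrap import wrap

# Same alphabet and bounds as the view above
choices = 'abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)'
content_length = randint(512, 2048)
content = ''.join(choice(choices) for _ in range(content_length))

# Wrapping inserts newlines but drops no characters, so the wrapped
# body is always longer than the pre-wrap length that gets logged
wrapped = '\n'.join(wrap(content, width=50))

# The task hashes the wrapped content, i.e. what the client receives
digest = hashlib.sha1(wrapped.encode('utf-8')).hexdigest()
print(len(digest), len(wrapped) > content_length)  # -> 40 True
```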

## Running the app server and task handler

I then set up the database (celery needs it to keep track of its tasks) and launched celeryd and gunicorn.

```shell
$ python manage.py syncdb
$ python manage.py migrate
$ python manage.py celeryd &
$ gunicorn faulty.wsgi:application -b 127.0.0.1:8050 &
```

I tested the endpoint a few times and no two responses looked alike. Here is one response I got back:

```
$ curl -v localhost:8050
...
< HTTP/1.1 200 OK
< Server: gunicorn/19.1.1
< Date: Mon, 03 Nov 2014 09:53:34 GMT
< Connection: close
< Transfer-Encoding: chunked
< Vary: Cookie
< X-Frame-Options: SAMEORIGIN
< Content-Type: image/jpeg
<
9h*mybxd!*+bf4%#hir+3w)kmp)%_vc%idx6qvz@7678=pkf-p
4h%(d9=ip7snmb&$s0@wrpj2*s_uif&saam=_-%u50n0_4*vh#
i9+n_c@!c1&zi__9yxx$4#tfy0a1-mv+6oyl@-1(h4_fk^jm-o
!7^ak(ih)q=q@@n0_0!$zy^a==-k99%%q8^20q$k4(_3qvt30e
k=l(i4_2sxq^zt=5mncp!saue0b@&$do-04%wrx4(#kn2pwzl+
%@d50ib#rlz2av*wxjg47i&m7da2ff^9g74-w$fd3_^34bzz%*
945$woykwd4wot4*1l+lcca2+5%riudbxyy0d=jm-pkp123v@v
xibyy0rc00!cwggfoa2k$q-qd)e%-f&gbn2042@1(0bomv7%j4
jk7llo7x6!oj($l-+mcncp$4k^uw-ldi=-$zyxq*0+9@y2)-62
3^=nywszj3tlb@!1nk659r1jn_s6zk(sllfd9+3iqfqr(^-8da
*ysw9s6huzt^n1r_0eg=ap&mzgd#r(=0*5a^39lh!(%v0f-fv+
@pxhpokbt(zum3s2i4*(@n_qszcfl+8kmi$d#ixi(&!1l#f*@3
c8o8lc)f@ts+mde4m))h9%lqaqqafh7(3@#=wli_veot7*1lwr
kr#u!#pu7t_d34l^flqt$brxkh+4hd9e58na0)rz!8+0=+w9pr
#ca0=dkk9na_9_hbammn==7hi3aflyr^$_1e$s+%!hrb7^6j*&
d5$wish65s1b3!q*^*f7bhuxfi!=tv^=$-#p5(m^-fzc_zpq@1
8anr)2av*+pb$14lfz1_f-^n34n%3%9*y8^++ijun_e@zm=&_0
@5^rpf$c&3&zsofrqjeuz2i(b9k=6j_c%q##jc#j+7=&iwa5t=
aise&^owl$iho0k(*33r-l-=s+3kdd2x)ee^$+8&qk$nsk&h&h
i2zxl7&s^9zpbd+r3qp)d%ste9=)(hn5**&kvi^(w(d^0bklrk
4#_&*
```

And another response, this time returning HTTP 404:

```
$ curl -v localhost:8050
...
< HTTP/1.1 404 NOT FOUND
< Server: gunicorn/19.1.1
< Date: Mon, 03 Nov 2014 09:59:28 GMT
< Connection: close
< Transfer-Encoding: chunked
< Vary: Cookie
< X-Frame-Options: SAMEORIGIN
< Content-Type: text/html
<
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Page not found at /</title>
...
```
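As a quick sanity check on the view's error branch: `randint(0, 3)` has four equally likely outcomes, so the `== 3` test should fire on roughly a quarter of requests. A seeded simulation (my own check, not part of the app) confirms the rate:

```python
from random import Random

# Deterministic simulation of the view's error branch
rng = Random(42)
trials = 100000
errors = sum(1 for _ in range(trials) if rng.randint(0, 3) == 3)
rate = errors / float(trials)
print(round(rate, 3))
```

Over 200 ab requests that works out to around 50 exceptions, which is plenty of material for Logstash to capture.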

## Exception archaeology

In this setup, Logstash will transport exceptions originating from Django and store them in Elasticsearch, and Kibana will be used to view and analyse them. These are some condensed installation commands I ran on my Ubuntu 14 system.

Install the Java runtime:

```shell
$ sudo apt install default-jre
```

Install and start Elasticsearch:

```shell
$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.4.deb
$ sudo dpkg -i elasticsearch-1.3.4.deb
$ sudo /etc/init.d/elasticsearch start
```

Install Logstash:

```shell
$ wget https://download.elasticsearch.org/logstash/logstash/packages/debian/logstash_1.4.2-1-2c0f5a1_all.deb
$ sudo dpkg -i logstash_1.4.2-1-2c0f5a1_all.deb
```

Install Kibana and launch a web server serving its static content on port 8000:

```shell
$ wget https://download.elasticsearch.org/kibana/kibana/kibana-3.1.1.tar.gz
$ tar xzf kibana-3.1.1.tar.gz
$ cd kibana-3.1.1/
$ python -m SimpleHTTPServer &
```

## Capturing exceptions

With all the required packages installed and the web application and task handler already running, I created the following configuration for Logstash. It collects messages from syslog (udp:5000) and tcp:5000 (used by Django for reporting exceptions), sends them to Elasticsearch and prints them to stdout.

```
$ cat logstash.conf
input {
    tcp {
        port => 5000
        codec => json
    }
    udp {
        port => 5000
        type => syslog
    }
}
output {
    elasticsearch_http {
        host => "127.0.0.1"
        port => 9200
    }
    stdout {
        codec => rubydebug
    }
}
```

I used this command to launch Logstash:

```shell
$ /opt/logstash/bin/logstash agent -f logstash.conf
```
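To get a feel for what that tcp input sees on the wire, here is a self-contained sketch. It assumes newline-delimited JSON events (one object per line, the common framing for line-oriented senders) and uses a throwaway in-process listener in place of Logstash:

```python
import json
import socket
import threading

# Collected events, as Logstash's json codec would decode them
received = []


def listener(server):
    """Accept one connection and decode each line as a JSON event."""
    conn, _ = server.accept()
    for line in conn.makefile('r'):
        received.append(json.loads(line))
    conn.close()


server = socket.socket()
server.bind(('127.0.0.1', 0))  # ephemeral port
server.listen(1)
server.settimeout(5)
port = server.getsockname()[1]
t = threading.Thread(target=listener, args=(server,))
t.start()

# What a client would send: one JSON object, terminated by a newline
client = socket.create_connection(('127.0.0.1', port))
client.sendall((json.dumps({'@version': 1, 'message': 'boom'}) + '\n').encode('utf-8'))
client.close()
t.join()
server.close()
print(received[0]['message'])
```

Anything that can open a socket and write JSON lines can feed this pipeline, which is why the Logstash side needs no Django-specific configuration at all.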

## Generating traffic

I used ab to generate traffic for Django:

```shell
$ ab -n 200 -c 10 http://127.0.0.1:8050/
```

With a decent number of exceptions captured and sent to Elasticsearch, I looked to see which index name Logstash had picked; it turned out to be logstash-2014.11.04:

```
$ curl --silent localhost:9200/_cat/indices?v
health index               pri rep docs.count docs.deleted store.size pri.store.size
yellow logstash-2014.11.04   5   1        848            0    453.8kb        453.8kb
```

I then opened Kibana, loaded the default dashboard/file/logstash.json dashboard with the URL below and saw an interface that looked like this.

http://127.0.0.1:8000/index.html#/dashboard/file/logstash.json
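The daily index name is not something I chose: Logstash's default Elasticsearch output writes to an index named after the event's UTC date, using the pattern logstash-%{+YYYY.MM.dd}. A sketch of how such a name is derived (my own reconstruction, not Logstash code):

```python
from datetime import datetime

# Logstash's default index pattern is logstash-%{+YYYY.MM.dd},
# i.e. one index per UTC day
event_date = datetime(2014, 11, 4)  # the day the ab run was indexed
index = 'logstash-' + event_date.strftime('%Y.%m.%d')
print(index)  # -> logstash-2014.11.04
```

One index per day keeps each index small and makes retention trivial: dropping old data is a matter of deleting whole dated indices rather than purging individual documents.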