Web application for calculating and reporting the status of your IT services.

Introduction

For anyone who has spent even a short time in the IT environment of a medium-to-large company with some history, the following picture is hardly surprising:

There are lots of applications, built with different technologies, by different teams, at different times, with dependencies on each other. Users depend on a subset of them to carry out their daily work. Applications are up and running, people using internal services are productive, and customers using the web services outside are bringing in profit for the company.

… most of the time

What happens when an error occurs in one or more applications?

Well, in this case the monitoring system may raise an alert, users start calling or mailing screenshots, and customers are left waiting.

Administrators immediately have to:

Identify the problem

Identify the causes

Try to give a quick solution while collecting information about the problem for future reference

All this revolves around a ringing phone (if not people showing up in person), and users are under the impression that no one is engaged until someone answers their calls and assures them that somebody is doing their best to solve the problem as soon as possible.

In a situation like this, it may not be very clear how many applications are having problems, or exactly which ones. The dependencies may not be obvious to the person sitting in the admin's chair at the time of the failure, and user feedback is either misleading or sounds like the end of the world.

A step in the right direction

In this article the reader will find a first approach to the problem. We are building a fairly simple dashboard-style web application with the Django framework, under the trivial, unimaginative name "appstatus".

We will not cover the early stages of a basic Django project here (Python virtual environment, package installation, creating the project, creating the app). You can find them in multiple earlier articles. We assume that you already have a working project. Let's just start writing a new app.

The model

We have a single model called "Application" with a many-to-many reference to itself (a self-referential relation). It also has two methods:

"appok_rel" for the relative application status

"failedbecauseof" for the reason for failure

~/operations_env/operations/appstatus/models.py

from django.db import models


class Application(models.Model):
    name = models.CharField(max_length=50, unique=True)
    date_created = models.DateTimeField(auto_now_add=True)
    date_modified = models.DateTimeField(auto_now=True)
    url = models.CharField(max_length=100, unique=True)
    description = models.TextField(blank=True, max_length=1024)
    # Self-referential and one-directional: A depending on B does not
    # imply that B depends on A.
    dependencies = models.ManyToManyField('self', symmetrical=False, blank=True)
    appok = models.BooleanField(default=True)
    failreason = models.TextField(blank=True, max_length=1024)

    def __str__(self):
        return self.name

    def appok_rel(self):
        # Relative status: True when no declared dependency has failed.
        return not self.dependencies.filter(appok=False).exists()

    def failedbecauseof(self):
        # Name of the first failed dependency ('None' if there is none).
        return str(self.dependencies.filter(appok=False).first())

We plan to declare on every "Application" object all of its dependencies on other "Application" objects, even the transitive ones. For example, if we have App1 > App2 > App3, we need to declare two dependencies on App1: one on App2 and one on App3. This keeps the status calculation simple, because each application only has to look at its own declared dependencies.
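
To get a feeling for what has to be declared by hand, here is a minimal plain-Python sketch (no Django involved) that expands direct dependencies into the full set for each application. The function name `all_dependencies` and the `direct` mapping are hypothetical and only illustrate the App1 > App2 > App3 chain:

```python
from typing import Dict, Set


def all_dependencies(direct: Dict[str, Set[str]], app: str) -> Set[str]:
    """Return every application that `app` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(direct.get(app, ()))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; this also stops the walk on dependency cycles
        seen.add(dep)
        stack.extend(direct.get(dep, ()))
    return seen


# Hypothetical direct dependencies for the chain App1 > App2 > App3.
direct = {"App1": {"App2"}, "App2": {"App3"}, "App3": set()}

# What would have to be declared on each "Application" object in the admin.
declared = {app: all_dependencies(direct, app) for app in direct}
```

Here `declared["App1"]` contains both App2 and App3, which is exactly the pair of dependencies we enter manually for App1.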

View & Template

The following code is self-explanatory:

~/operations_env/operations/appstatus/views.py

from django.views.generic import ListView

from appstatus.models import Application


class ApplicationList(ListView):
    model = Application
    template_name = 'application_list.html'
    # False sorts before True, so failed applications are listed first.
    queryset = Application.objects.all().order_by('appok')

~/operations_env/operations/appstatus/templates/application_list.html

{% extends "base.html" %}
{% load humanize %}

{% block app_list %}
{% for application in object_list %}
<div class="col-xs-12 col-sm-4 col-md-3">
  {% if not application.appok %}
  <div class="panel panel-danger">
    <div class="panel-heading">
      <h3 class="panel-title"><b>{{ application.name }}</b></h3>
    </div>
    <div class="panel-body">
      <div>Reason: {{ application.failreason }}</div>
      <div><i>Updated {{ application.date_modified|naturaltime }}</i></div>
    </div>
  </div>
  {% elif not application.appok_rel %}
  <div class="panel panel-warning">
    <div class="panel-heading">
      <h3 class="panel-title"><b>{{ application.name }}</b></h3>
    </div>
    <div class="panel-body">
      <div>Failed due to dependency on: <b>{{ application.failedbecauseof }}</b></div>
      <div><i>Updated {{ application.date_modified|naturaltime }}</i></div>
    </div>
  </div>
  {% else %}
  <div class="panel panel-success">
    <div class="panel-heading">
      <h3 class="panel-title"><b>{{ application.name }}</b></h3>
    </div>
    <div class="panel-body">
      <div>Application ok.</div>
      <div><i>Updated {{ application.date_modified|naturaltime }}</i></div>
    </div>
  </div>
  {% endif %}
</div>
{% endfor %}
{% endblock %}

As you can see, we use Bootstrap panels, choosing the Bootstrap class according to the relative status of each "Application" object.

Now, if we want to use the Django "admin" application to insert applications, declare dependencies and play with statuses, we have to register our model with the admin app:
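
The choice of panel can also be followed without a running Django project. The sketch below is a plain-Python stand-in, not the actual model: the `App` dataclass and the `panel_class` function are hypothetical names that only mirror the `appok` / `appok_rel` checks of the template, applied to the App1 > App2 > App3 chain with App3 marked as failed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class App:
    """Plain-Python stand-in for the Application model (no database involved)."""
    name: str
    appok: bool = True
    dependencies: List["App"] = field(default_factory=list)

    def appok_rel(self) -> bool:
        # True when no declared dependency is marked as failed.
        return all(dep.appok for dep in self.dependencies)


def panel_class(app: App) -> str:
    """Mirror the if / elif / else chain of application_list.html."""
    if not app.appok:
        return "panel-danger"    # the application itself has failed
    if not app.appok_rel():
        return "panel-warning"   # a declared dependency has failed
    return "panel-success"


app3 = App("App3", appok=False)
app2 = App("App2", dependencies=[app3])
app1 = App("App1", dependencies=[app2, app3])  # transitive dependency declared explicitly

for app in (app1, app2, app3):
    print(app.name, panel_class(app))
```

App3 gets the red panel, while App1 and App2 get the warning panel. Note that App1 is flagged only because its dependency on App3 was declared explicitly.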

~/operations_env/operations/appstatus/admin.py

from django.contrib import admin

from appstatus.models import Application


@admin.register(Application)
class ApplicationAdmin(admin.ModelAdmin):
    list_display = ('name', 'appok')

Demo

For demonstration purposes, let's implement the initial graph of the article in the following situation:

App3: Failure

Reason: db problem

The result is the screenshot below:

Roadmap

Of course, this example is a rather immature approach to a dependency management system.

In reality, every "Application" object is a set of services, each with its own dependencies. It is too drastic to report a whole application as "failed" because of a dependency of a single service.

Furthermore, there are some features we would like to add to our project in a future release:

Create service dependencies and group services (endpoints) as Applications.

Report Applications as OK, Failed or Partially Failed depending on the status of their services.

Build a REST API for status updates and reports, for interaction with other systems (like monitoring).

Build a simple administrative UI for manual status updates of whole applications (groups of services).

Give users the ability to filter the Applications appearing on screen by their status.

Declare only the direct dependencies of services and let the app calculate the status of the entire ecosystem.

Give developers the ability to discover the dependencies of their apps without changing the status of any service.
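
As a rough idea of where this could go, and not a design decision, here is a plain-Python sketch of two of these items: deriving a service's effective status from direct dependencies only, and summarizing an application from its services. All names (`service_ok`, `application_status`, the service identifiers) are hypothetical, and treating a dependency cycle as "no additional failure" is an assumption made just for this example.

```python
from typing import Dict, FrozenSet, Set


def service_ok(name: str, own_ok: Dict[str, bool], direct: Dict[str, Set[str]],
               _path: FrozenSet[str] = frozenset()) -> bool:
    """A service is effectively up if it, and everything reachable through
    its direct dependencies, is up."""
    if name in _path:
        return True  # cycle: this service is already being checked up the call chain
    if not own_ok[name]:
        return False
    return all(service_ok(dep, own_ok, direct, _path | {name})
               for dep in direct.get(name, ()))


def application_status(services: Dict[str, bool]) -> str:
    """Summarize an application from the effective state of its services."""
    if all(services.values()):
        return "OK"
    if not any(services.values()):
        return "Failed"
    return "Partially Failed"


# Hypothetical ecosystem: only direct dependencies are declared.
own_ok = {"db": False, "payments/api": True, "shop/checkout": True, "shop/catalog": True}
direct = {"payments/api": {"db"}, "shop/checkout": {"payments/api"}}

# The hypothetical "shop" application groups two services.
shop = {s: service_ok(s, own_ok, direct) for s in ("shop/checkout", "shop/catalog")}
shop_status = application_status(shop)
```

With the database down, the checkout service is affected through the payments API while the catalog keeps working, so the shop as a whole would be reported as partially failed rather than failed.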

Stay tuned.