Why are custom metrics important?

While there are volumes of discourse on the topic, it can't be overstated how important custom application metrics are. Unlike the core service metrics you'll want to collect for your Django application (application and web server stats, key DB and cache operational metrics), custom metrics are data points unique to your domain with bounds and thresholds known only by you. In other words, it's the fun stuff.

How might these metrics be useful? Consider:

You run an ecomm website and track average order size. Suddenly that order size isn't so average. With solid application metrics and monitoring you can catch the bug before it breaks the bank.

You're writing a scraper that pulls the most recent articles from a news website every hour. Suddenly the most recent articles aren't so recent. Solid metrics and monitoring will reveal the breakage earlier.

I 👏 Think 👏 You 👏 Get 👏 The 👏 Point 👏

Setting up the Django Application

Besides the obvious dependencies (looking at you, pip install Django), we'll need some additional packages for our pet project. Go ahead and pip install django-prometheus, plus a PostgreSQL driver such as psycopg2, since we'll be using Postgres. This gives us a Python Prometheus client (prometheus_client, pulled in as a dependency) to play with. It also gives us some helpful Django hooks, including middleware and a nifty DB wrapper. Next, we'll run the Django management commands to start a project and app, update our settings to use the Prometheus client, and add the Prometheus URLs to our URL conf.

Start a new project and app

For the purposes of this post, and in keeping with our agency brand, we'll be building a dog walking service. Mind you, it won't actually do much, but it should serve well enough as a teaching tool. Go ahead and execute:

```shell
django-admin startproject demo
cd demo
python manage.py startapp walker
```

```python
# settings.py
INSTALLED_APPS = [
    ...
    'walker',
    ...
]
```

Now we'll add some basic models and views. For the sake of brevity, I'll only include the implementation for the portions we'll be instrumenting. If you'd like to follow along in full, just grab the demo app source.

```python
# walker/models.py
from django.db import models
from django_prometheus.models import ExportModelOperationsMixin


class Walker(ExportModelOperationsMixin('walker'), models.Model):
    name = models.CharField(max_length=127)
    email = models.CharField(max_length=127)

    def __str__(self):
        return f'{self.name} // {self.email} ({self.id})'


class Dog(ExportModelOperationsMixin('dog'), models.Model):
    SIZE_XS = 'xs'
    SIZE_SM = 'sm'
    SIZE_MD = 'md'
    SIZE_LG = 'lg'
    SIZE_XL = 'xl'
    DOG_SIZES = (
        (SIZE_XS, 'xsmall'),
        (SIZE_SM, 'small'),
        (SIZE_MD, 'medium'),
        (SIZE_LG, 'large'),
        (SIZE_XL, 'xlarge'),
    )

    size = models.CharField(max_length=31, choices=DOG_SIZES, default=SIZE_MD)
    name = models.CharField(max_length=127)
    age = models.IntegerField()

    def __str__(self):
        return f'{self.name} // {self.age}y ({self.size})'


class Walk(ExportModelOperationsMixin('walk'), models.Model):
    dog = models.ForeignKey(Dog, related_name='walks', on_delete=models.CASCADE)
    walker = models.ForeignKey(Walker, related_name='walks', on_delete=models.CASCADE)

    distance = models.IntegerField(default=0, help_text='walk distance (in meters)')

    start_time = models.DateTimeField(null=True, blank=True, default=None)
    end_time = models.DateTimeField(null=True, blank=True, default=None)

    @property
    def is_complete(self):
        return self.end_time is not None

    @classmethod
    def in_progress(cls):
        """ get the list of `Walk`s currently in progress """
        return cls.objects.filter(start_time__isnull=False, end_time__isnull=True)

    def __str__(self):
        return f'{self.walker.name} // {self.dog.name} @ {self.start_time} ({self.id})'
```

```python
# walker/views.py
from django.shortcuts import render, redirect
from django.views import View
from django.core.exceptions import ObjectDoesNotExist
from django.http import HttpResponseNotFound, JsonResponse, HttpResponseBadRequest, Http404
from django.urls import reverse
from django.utils.timezone import now

from walker import models, forms


class WalkDetailsView(View):
    def get_walk(self, walk_id=None):
        try:
            return models.Walk.objects.get(id=walk_id)
        except ObjectDoesNotExist:
            raise Http404(f'no walk with ID {walk_id} in progress')


class CheckWalkStatusView(WalkDetailsView):
    def get(self, request, walk_id=None, **kwargs):
        walk = self.get_walk(walk_id=walk_id)
        return JsonResponse({'complete': walk.is_complete})


class CompleteWalkView(WalkDetailsView):
    def get(self, request, walk_id=None, **kwargs):
        walk = self.get_walk(walk_id=walk_id)
        return render(request, 'index.html', context={'form': forms.CompleteWalkForm(instance=walk)})

    def post(self, request, walk_id=None, **kwargs):
        try:
            walk = models.Walk.objects.get(id=walk_id)
        except ObjectDoesNotExist:
            return HttpResponseNotFound(content=f'no walk with ID {walk_id} found')

        if walk.is_complete:
            return HttpResponseBadRequest(content=f'walk {walk.id} is already complete')

        form = forms.CompleteWalkForm(data=request.POST, instance=walk)

        if form.is_valid():
            updated_walk = form.save(commit=False)
            updated_walk.end_time = now()
            updated_walk.save()
            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')


class StartWalkView(View):
    def get(self, request):
        return render(request, 'index.html', context={'form': forms.StartWalkForm()})

    def post(self, request):
        form = forms.StartWalkForm(data=request.POST)

        if form.is_valid():
            walk = form.save(commit=False)
            walk.start_time = now()
            walk.save()
            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')
```

Update app settings and add Prometheus urls

Now that we have a Django project and app set up, it's time to add the required settings for django-prometheus. In settings.py, apply the following:

```python
# settings.py
import os

INSTALLED_APPS = [
    ...
    'django_prometheus',
    ...
]

MIDDLEWARE = [
    # must come first in the list
    'django_prometheus.middleware.PrometheusBeforeMiddleware',
    ...
    # must come last in the list
    'django_prometheus.middleware.PrometheusAfterMiddleware',
]

# we're assuming a Postgres DB here because, well, that's just the right choice :)
DATABASES = {
    'default': {
        # the django-prometheus wrapper around the stock PostgreSQL backend
        'ENGINE': 'django_prometheus.db.backends.postgresql',
        'NAME': os.getenv('DB_NAME'),
        'USER': os.getenv('DB_USER'),
        'PASSWORD': os.getenv('DB_PASSWORD'),
        'HOST': os.getenv('DB_HOST'),
        'PORT': os.getenv('DB_PORT', '5432'),
    },
}
```

And add the following to your urls.py:

```python
# urls.py
from django.urls import include, path

urlpatterns = [
    ...
    # exposes the /metrics endpoint
    path('', include('django_prometheus.urls')),
]
```

At this point, we have a basic application configured and primed for instrumentation.

Instrument the code with Prometheus metrics

Thanks to the out-of-the-box functionality provided by django-prometheus, basic model operations like insertions and deletions are tracked right away for models that use ExportModelOperationsMixin. You can see this in action at the /metrics endpoint, where you'll find something like:

(Screenshot: default metrics provided by django-prometheus)
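Under the hood, /metrics serves Prometheus' plain-text exposition format. The sketch below shows the shape of that output. It renders a single counter with prometheus_client's generate_latest(). The throwaway CollectorRegistry and the walks_started counter are illustrative choices, not what django-prometheus does internally. One detail is worth knowing before you query anything: counter samples are exposed with a _total suffix.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# A private registry holding one illustrative counter; django-prometheus
# serves the default global registry, which renders the same way.
registry = CollectorRegistry()
walks_started = Counter('walks_started', 'number of walks started', registry=registry)
walks_started.inc(2)

# generate_latest() returns the registry as bytes in the text exposition
# format: "# HELP" and "# TYPE" comment lines followed by one line per sample.
exposition = generate_latest(registry).decode('utf-8')
print(exposition)
```

In the printed output, the counter's sample line reads walks_started_total 2.0, so that is the name to use in Prometheus queries.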

Let's make this a bit more interesting.

Start by adding a walker/metrics.py where we'll define some basic metrics to track.

```python
# walker/metrics.py
from prometheus_client import Counter, Histogram

walks_started = Counter('walks_started', 'number of walks started')
walks_completed = Counter('walks_completed', 'number of walks completed')
invalid_walks = Counter('invalid_walks', 'number of walks attempted to be started, but invalid')

walk_distance = Histogram(
    'walk_distance',
    'distribution of distance walked',
    buckets=[0, 50, 200, 400, 800, 1600, 3200],
)
```

Painless, eh? The Prometheus documentation does a good job of explaining what each metric type should be used for. In short, we use counters for values that only ever go up over time; they reset to zero when the process restarts but never decrease otherwise. We use histograms for values whose distribution we want to track, counting each observation into configurable buckets. Let's start instrumenting our application code.

```python
# walker/views.py
...
from walker import metrics

...


class CompleteWalkView(WalkDetailsView):
    ...

    def post(self, request, walk_id=None, **kwargs):
        ...
        if form.is_valid():
            updated_walk = form.save(commit=False)
            updated_walk.end_time = now()
            updated_walk.save()

            # record the completed walk and how far it went
            metrics.walks_completed.inc()
            metrics.walk_distance.observe(updated_walk.distance)

            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')

...


class StartWalkView(View):
    ...

    def post(self, request):
        form = forms.StartWalkForm(data=request.POST)

        if form.is_valid():
            walk = form.save(commit=False)
            walk.start_time = now()
            walk.save()

            metrics.walks_started.inc()

            return redirect(f'{reverse("walk_start")}?walk={walk.id}')

        # the form was rejected, so count an invalid walk attempt
        metrics.invalid_walks.inc()
        return HttpResponseBadRequest(content=f'form validation failed with errors {form.errors}')
```
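The views only ever call inc() and observe() on objects imported from walker.metrics. The sketch below exercises the same metric definitions against a private CollectorRegistry, so it runs outside Django. It shows three behaviours of prometheus_client that explain the design. Counters refuse to go down. Histogram buckets are cumulative. Registering the same metric name twice fails, which is one reason to declare metrics once in a module (Python imports a module only once per process) rather than inside a view. The walk distances are hypothetical.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram

# A private registry (instead of the global default one) keeps this sketch
# self-contained; the metric definitions mirror walker/metrics.py.
registry = CollectorRegistry()
walks_completed = Counter('walks_completed', 'number of walks completed', registry=registry)
walk_distance = Histogram(
    'walk_distance',
    'distribution of distance walked',
    buckets=[0, 50, 200, 400, 800, 1600, 3200],
    registry=registry,
)

# 1. Counters only move up: a negative increment raises ValueError.
walks_completed.inc()
try:
    walks_completed.inc(-1)
except ValueError as exc:
    print(f'counter rejected decrement: {exc}')

# 2. Histogram buckets are cumulative. Each `le` bucket counts every
#    observation <= its bound, and a final +Inf bucket is added automatically.
for meters in (120, 450, 5000):  # hypothetical walk distances, in meters
    walk_distance.observe(meters)

for bound in ('50.0', '200.0', '800.0', '3200.0', '+Inf'):
    count = registry.get_sample_value('walk_distance_bucket', {'le': bound})
    print(f'le={bound}: {count}')

# 3. Declaring a second metric with an already-registered name fails.
try:
    Counter('walks_completed', 'number of walks completed', registry=registry)
except ValueError as exc:
    print(f'duplicate rejected: {exc}')
```

The 5000 m walk lands only in the +Inf bucket. It still counts toward walk_distance_count and walk_distance_sum, but any bucket-based estimate can only say it was "more than 3200 m".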

If we make a few sample requests, we'll be able to see the new metrics flowing through the endpoint.

(Screenshot: peep the walk distance and created walks metrics)

(Screenshot: our metrics are now available for graphing in Prometheus)
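When graphing a histogram like walk_distance, the usual PromQL move is to estimate a quantile from its buckets, for example histogram_quantile(0.5, rate(walk_distance_bucket[5m])). To show where such an estimate comes from, here is a simplified Python re-implementation of the linear interpolation histogram_quantile() performs. It is not Prometheus' code, and the bucket counts are hypothetical, reusing three example walks of 120 m, 450 m and 5000 m.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket, like the `le`-labelled
    `_bucket` series Prometheus scrapes for a histogram.
    """
    if not 0 < q <= 1:
        raise ValueError('this sketch only handles 0 < q <= 1')
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    # First finite bucket whose cumulative count reaches the rank; if none
    # does, the rank lies in the +Inf bucket.
    b = next(
        (i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank),
        len(buckets) - 1,
    )
    if b == len(buckets) - 1:
        # No upper bound to interpolate towards: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    if b == 0 and upper <= 0:
        return upper
    lower, below = (0.0, 0) if b == 0 else buckets[b - 1]
    # Linear interpolation, assuming observations are spread evenly in the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Cumulative walk_distance bucket counts after three hypothetical walks of
# 120 m, 450 m and 5000 m (bounds in meters, matching walker/metrics.py).
walk_distance_buckets = [
    (0.0, 0), (50.0, 0), (200.0, 1), (400.0, 1), (800.0, 2),
    (1600.0, 2), (3200.0, 2), (math.inf, 3),
]

print(histogram_quantile(0.5, walk_distance_buckets))  # 600.0
print(histogram_quantile(0.9, walk_distance_buckets))  # 3200.0
```

Look at the median in this example. The middle walk was 450 m, but the estimate is 600 m, because the histogram only knows that it fell somewhere between 400 and 800. Bucket boundaries set the resolution of every quantile you graph, so choose them around the values you actually care about.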

By this point we've defined our custom metrics in code, instrumented the application to track these metrics, and verified that the metrics are updated and available at the /metrics endpoint. Let's move on to deploying our instrumented application to a Kubernetes cluster.

Deploying the application with Helm

I'll keep this part brief and limited to the configuration relevant to metric tracking and exporting. The full Helm chart, with the complete deployment and service configuration, can be found in the demo app. As a jumping-off point, here are some snippets of the deployment and configmap, highlighting the portions that matter for metric exporting.

```yaml
# helm/demo/templates/nginx-conf-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "demo.fullname" . }}-nginx-conf
  ...
data:
  demo.conf: |
    upstream app_server {
      server 127.0.0.1:8000 fail_timeout=0;
    }

    server {
      listen 80;
      client_max_body_size 4G;

      # set the correct host(s) for your site
      server_name{{ range .Values.ingress.hosts }} {{ . }}{{- end }};

      keepalive_timeout 5;

      root /code/static;

      location / {
        # checks for static file, if not found proxy to app
        try_files $uri @proxy_to_app;
      }

      location ^~ /metrics {
        auth_basic "Metrics";
        auth_basic_user_file /etc/nginx/secrets/.htpasswd;

        # pass the original Host through; by default nginx would send the
        # upstream name (app_server), which Django rejects as an invalid host
        proxy_set_header Host $http_host;
        proxy_pass http://app_server;
      }

      location @proxy_to_app {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $http_host;
        # we don't want nginx trying to do something clever with
        # redirects, we set the Host: header above already.
        proxy_redirect off;
        proxy_pass http://app_server;
      }
    }
```

```yaml
# helm/demo/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "demo.name" . }}
        app.kubernetes.io/instance: {{ .Release.Name }}
        app: {{ include "demo.name" . }}
    spec:
      volumes:
        ...
        - name: nginx-conf
          configMap:
            name: {{ include "demo.fullname" . }}-nginx-conf
        - name: prometheus-auth
          secret:
            secretName: prometheus-basic-auth
        ...
      containers:
        - name: {{ .Chart.Name }}-nginx
          image: "{{ .Values.nginx.image.repository }}:{{ .Values.nginx.image.tag }}"
          imagePullPolicy: IfNotPresent
          volumeMounts:
            ...
            - name: nginx-conf
              mountPath: /etc/nginx/conf.d/
            # a secret volume mounts as a directory with one file per key, so this
            # assumes the secret stores the htpasswd data under the key ".htpasswd"
            - name: prometheus-auth
              mountPath: /etc/nginx/secrets
              readOnly: true
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          command: ["gunicorn", "--worker-class", "gthread", "--threads", "3", "--bind", "0.0.0.0:8000", "demo.wsgi:application"]
          env: {{ include "demo.env" . | nindent 12 }}
          ports:
            - name: gunicorn
              containerPort: 8000
              protocol: TCP
        ...
```
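nginx's auth_basic expects the standard Authorization: Basic header, carrying the base64 encoding of username:password. Prometheus sends this header when a scrape job sets basic_auth. For a quick manual check of the protected endpoint, a minimal standard-library sketch might look like the one below. The URL and port are hypothetical: it assumes you have port-forwarded the nginx container's port 80 to localhost:8080. The prometheus/prometheus credentials are the ones used in the scrape config later in this post.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP basic auth header value: 'Basic ' + base64('user:password')."""
    token = base64.b64encode(f'{username}:{password}'.encode('utf-8')).decode('ascii')
    return f'Basic {token}'


def fetch_metrics(url: str, username: str, password: str, timeout: float = 5.0) -> str:
    """GET a basic-auth protected metrics page and return its body as text.

    urllib raises urllib.error.HTTPError (status 401) if nginx rejects the
    credentials.
    """
    request = urllib.request.Request(
        url, headers={'Authorization': basic_auth_header(username, password)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode('utf-8')


# Hypothetical usage, assuming nginx's port 80 is port-forwarded to localhost:8080:
# print(fetch_metrics('http://localhost:8080/metrics', 'prometheus', 'prometheus'))
```

Requesting the same URL without the header should get a 401 from nginx, which is a cheap way to confirm the metrics really are protected.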

Nothing too magick-y here, just your good ol' YAML blob. There are only two important points I'd like to draw attention to:

1. We put the /metrics endpoint behind basic auth using an nginx reverse proxy, with an auth_basic directive set on that location block. You'll probably want to deploy gunicorn behind a reverse proxy anyway, and doing so gives us the added benefit of protecting our application metrics.
2. We run a single multi-threaded gunicorn worker (the gthread worker class with 3 threads) rather than multiple worker processes. The Prometheus client does support a multiprocess mode, but it is more complex to set up in a Kubernetes environment. Why does this matter? Each worker process keeps its own in-memory metric values. With several workers in one pod, every scrape reports the values of whichever worker happened to handle it. Because the Prometheus Kubernetes SD scrape config treats the pod as a single target, these (potentially) jumping values are misread as counter resets, leading to inconsistent measurements.

You don't necessarily need to follow all of the above, but the big TL;DR here is: if you don't know better, start with either a single-worker, single-threaded gunicorn setup or a single worker with multiple threads.
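To see why interleaved worker values hurt, here is a toy model. Two hypothetical gunicorn workers in one pod each hold their own walks_started counter, and successive scrapes land on whichever worker answers. The function applies the reset rule PromQL's increase() and rate() use: any drop is treated as a counter restart. It is a simplified sketch, not Prometheus' implementation, which also extrapolates to the edges of the query range.

```python
from typing import Sequence


def increase_with_resets(samples: Sequence[float]) -> float:
    """Total growth of a counter series, treating any drop as a counter reset.

    If a sample is lower than the one before it, the counter is assumed to
    have restarted from zero, so the new sample's whole value counts as growth.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        total += current - previous if current >= previous else current
    return total


# Hypothetical scrapes of one pod running two worker processes. Worker A's
# counter sits at 10 and worker B's at 2, and no new walks start at all;
# successive scrapes simply alternate between the workers.
interleaved = [10.0, 2.0, 10.0, 2.0]
# The same pod with a single (possibly multi-threaded) worker holding all 12.
single_worker = [12.0, 12.0, 12.0, 12.0]

print(increase_with_resets(interleaved))    # 12.0: phantom growth from fake resets
print(increase_with_resets(single_worker))  # 0.0
```

Nothing happened in either case, yet the interleaved series reports twelve new walks. That is the kind of inconsistent measurement the single-worker setup avoids.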

Deploying Prometheus with Helm

With the help of Helm, deploying Prometheus to the cluster is a 🍰. Without further ado:

```shell
helm upgrade --install prometheus stable/prometheus
```

After a few minutes, you should be able to port-forward into the Prometheus server pod (the default container port is 9090).

Configuring a Prometheus scrape target for the application

The Prometheus Helm chart has a ton of customization options, but for our purposes we just need to set extraScrapeConfigs. To do so, start by creating a values.yaml. As with most of this post, you can skip this section and just use the demo app as a prescriptive guide if you'd like. In that file, you'll want:

```yaml
extraScrapeConfigs: |
  - job_name: demo
    scrape_interval: 5s
    metrics_path: /metrics
    basic_auth:
      username: prometheus
      password: prometheus
    tls_config:
      insecure_skip_verify: true
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - default
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: demo
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: http
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service
      - source_labels: [__meta_kubernetes_service_name]
        target_label: job
      - target_label: endpoint
        replacement: http
```
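The relabel_configs above do two jobs. The keep rules drop every discovered endpoint whose labels don't match. The remaining rules copy __meta_* discovery labels into regular target labels. The sketch below is a deliberately simplified Python model of that behaviour, not Prometheus' relabeling engine. It supports only the keep action and the default replace action, joins source labels with the default ";" separator, and translates only the $1 replacement. It anchors regexes at both ends, as Prometheus does. The discovered label values, including the pod name, are hypothetical. They also assume the Service names its ports after the container ports, so only the nginx port (http) survives and scrapes go through basic auth.

```python
import re
from typing import Dict, List, Optional


def relabel(labels: Dict[str, str], rules: List[dict]) -> Optional[Dict[str, str]]:
    """Apply a simplified subset of Prometheus relabeling to one target.

    Returns the final target labels, or None if a `keep` rule drops the target.
    """
    labels = dict(labels)
    for rule in rules:
        source = ';'.join(labels.get(name, '') for name in rule.get('source_labels', []))
        match = re.fullmatch(rule.get('regex', '(.*)'), source)  # anchored, like Prometheus
        action = rule.get('action', 'replace')
        if action == 'keep':
            if match is None:
                return None
        elif action == 'replace':
            if match is not None:
                # Prometheus writes capture groups as $1; Python's expand() uses \1.
                replacement = rule.get('replacement', '$1').replace('$1', r'\1')
                labels[rule['target_label']] = match.expand(replacement)
        else:
            raise ValueError(f'action not supported by this sketch: {action}')
    # Labels starting with __ are discarded once relabeling finishes.
    return {key: value for key, value in labels.items() if not key.startswith('__')}


DEMO_RULES = [
    {'source_labels': ['__meta_kubernetes_service_label_app'], 'regex': 'demo', 'action': 'keep'},
    {'source_labels': ['__meta_kubernetes_endpoint_port_name'], 'regex': 'http', 'action': 'keep'},
    {'source_labels': ['__meta_kubernetes_namespace'], 'target_label': 'namespace'},
    {'source_labels': ['__meta_kubernetes_pod_name'], 'target_label': 'pod'},
    {'source_labels': ['__meta_kubernetes_service_name'], 'target_label': 'service'},
    {'source_labels': ['__meta_kubernetes_service_name'], 'target_label': 'job'},
    {'target_label': 'endpoint', 'replacement': 'http'},
]

# Hypothetical labels Kubernetes service discovery might attach to the nginx port.
discovered = {
    '__meta_kubernetes_service_label_app': 'demo',
    '__meta_kubernetes_endpoint_port_name': 'http',
    '__meta_kubernetes_namespace': 'default',
    '__meta_kubernetes_pod_name': 'demo-7d9f8c-abcde',
    '__meta_kubernetes_service_name': 'demo',
}

print(relabel(discovered, DEMO_RULES))
# The gunicorn port of the same pod is dropped by the second keep rule:
print(relabel({**discovered, '__meta_kubernetes_endpoint_port_name': 'gunicorn'}, DEMO_RULES))
```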

After creating the file, you should be able to apply the update to your Prometheus deployment from the previous step via:

```shell
helm upgrade --install prometheus stable/prometheus -f values.yaml
```

To verify that everything worked, open http://localhost:9090/targets in your browser (assuming you've already port-forwarded into the running Prometheus server pod). If you see the demo app in the target list, then that's a big 👍.
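The same check can be scripted against Prometheus' HTTP API, which lists scrape targets at /api/v1/targets. The sketch below assumes the port-forward to localhost:9090 is in place. It also assumes a Prometheus release whose target entries include a scrapePool field holding the scrape config's job_name. That field matters here because the relabel rules overwrite the job label with the Service name. The function names are illustrative, and the test data is a hand-written sample, not real output.

```python
import json
import urllib.request


def fetch_targets(base_url='http://localhost:9090'):
    """Query the Prometheus HTTP API for scrape targets (needs the port-forward)."""
    with urllib.request.urlopen(f'{base_url}/api/v1/targets', timeout=5) as response:
        return json.load(response)


def target_health(targets, scrape_pool='demo'):
    """Return (scrapeUrl, health) pairs for active targets of one scrape job.

    `health` is reported as 'up', 'down' or 'unknown'.
    """
    return [
        (target['scrapeUrl'], target['health'])
        for target in targets['data']['activeTargets']
        if target.get('scrapePool') == scrape_pool
    ]


# Usage, with the port-forward running:
# print(target_health(fetch_targets()))
```

An empty list usually means the keep rules filtered the endpoint out. A 'down' entry points at the scrape itself, for example wrong basic auth credentials.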

Try it yourself

I'm going to make a bold statement here: Capturing custom application metrics and setting up the corresponding reporting and monitoring is one of the most immediately gratifying tasks in software engineering. Luckily for us, it's actually really simple to integrate Prometheus metrics into your Django application, as I hope this post has shown. If you'd like to start instrumenting your own app, feel free to rip configuration and ideas from the full sample application, or just fork the repo and hack away. Happy trails 🐶