
Google Stackdriver is not only about logging and monitoring the resource usage of your application or cluster of applications: it consists of multiple tools that let you extract a lot of information from your data, from the behaviour of a single pod or application up to your infrastructure as a whole.

Today we are going to work with Stackdriver Tracing, another tool in Google’s toolbox which gets pretty useful if you want to analyse latency within your microservice-based architecture.

Tracing

So, what is tracing and why should you even care? Tracing is basically the process of gathering the timing data needed to troubleshoot latency problems in service-oriented architectures. If you want to track what is happening within your Kubernetes cluster, the task can quickly become complicated, but all is not lost: we can install a distributed tracing solution such as a Zipkin agent. You can use Zipkin together with Stackdriver Tracing to analyse and report on traces from GKE.

Developers with experience building microservices at scale understand the role and importance of distributed tracing: per-process logging and metric monitoring have their place, but neither can reconstruct the elaborate journeys that transactions take as they propagate across a distributed system. Distributed traces are these journeys. (fragment of: OpenTracing blog)

What Stackdriver Tracing can do for you:

it helps gather the timing data needed to troubleshoot latency problems in service architectures

collects latency data from your applications and displays it in the Google Cloud Platform Console

allows you to generate in-depth latency reports to surface performance degradations

tracks how requests propagate through your application

automatically identifies recent changes to your application’s performance with Analysis Reports

Web Application

For the purpose of this tutorial I have created a simple Play application with tracing enabled. Enabling tracing in a Play-based application is not a single flag to set, but the setup is not too difficult either. We are using the Play Zipkin Tracing example from GitHub and will modify it a bit to make it work on a Google Kubernetes Engine cluster.

To get the full sources and a working example, download the GCP Goodies repository and navigate to the part-6/play-scala-stackdriver-tracing project.

First, we add a single dependency for zipkin-tracing to our build.sbt:

libraryDependencies += "jp.co.bizreach" %% "play-zipkin-tracing-play26" % "2.0.1"

Next, change the configuration of the Play application in application.conf.
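The exact settings live in the repository, but a minimal sketch of the tracing section might look like this (the key names follow play-zipkin-tracing's documented defaults; the service name and sample rate here are illustrative):

```hocon
trace {
  # Name under which this application's spans are reported
  service-name = "play-scala-stackdriver-tracing"

  zipkin {
    # Where the Zipkin collector listens; works for a locally run Zipkin
    base-url = "http://localhost:9411"
    # Fraction of requests to sample (1.0 = trace everything)
    sample-rate = 1.0
  }
}
```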

Please note that the base-url specified above will not work on your Kubernetes cluster. We will need to modify it later, once the Zipkin agent service has been installed.

The third part of our setup is adding one small and simple Filter to our Play application:
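Play picks up a class named Filters from the root package automatically. A sketch of such a filter class, assuming the ZipkinTraceFilter provided by the play-zipkin-tracing-play26 module (check the repository for the exact package path):

```scala
import javax.inject.Inject

import jp.co.bizreach.trace.play26.filter.ZipkinTraceFilter
import play.api.http.DefaultHttpFilters

// Registers the library's tracing filter so every incoming
// request is wrapped in a Zipkin span.
class Filters @Inject() (
  zipkinTraceFilter: ZipkinTraceFilter
) extends DefaultHttpFilters(zipkinTraceFilter)
```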

The example application includes endpoints that execute calls within the application itself, some simple and some nested, so you can see the difference in the Zipkin UI and later in the Stackdriver console.

GET /once controllers.HomeController.once

GET /nested controllers.HomeController.nested
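In the example project, /once returns a simple response while /nested makes an outgoing HTTP call back into the application, so its trace contains a child span. A rough sketch of the controller, assuming play-zipkin-tracing's traced WS client wrapper (the class and method names, the span label and the URL are illustrative; see the repository for the real code):

```scala
package controllers

import javax.inject.Inject
import scala.concurrent.{ExecutionContext, Future}

import jp.co.bizreach.trace.play26.TraceWSClient
import play.api.mvc.{AbstractController, ControllerComponents}

class HomeController @Inject() (ws: TraceWSClient, cc: ControllerComponents)
                               (implicit ec: ExecutionContext)
    extends AbstractController(cc) {

  // A single, self-contained span.
  def once = Action.async { implicit request =>
    Future.successful(Ok("once"))
  }

  // Calls back into the application over HTTP, producing a nested child span.
  def nested = Action.async { implicit request =>
    ws.url("nested-call", "http://localhost:9000/once").get().map(r => Ok(r.body))
  }
}
```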

Local Setup

Apart from the configuration we have already discussed, our Play-based application is trivial. Run it the standard way with sbt:

sbt run

In a separate terminal window, download and run Zipkin. Detailed instructions and more information about the project can be found here.

curl -sSL https://zipkin.io/quickstart.sh | bash -s

java -jar zipkin.jar

Hit the endpoints and analyse the traces in Zipkin

Analysing a single call:

Analysing a nested call:

The Zipkin UI is pretty cool and there are a few things to explore, but we are not going to use it in our cluster (though it is of course possible if you wish to). Instead, we want a Zipkin agent running within the cluster and gathering stats from our running services. To test that, we need to start with cluster creation.

Preparing GKE Cluster

As in the last part, we are going to use gcloud commands to set up the cluster. Log in, set your zone and current project:

NAME=stackdriver-test3
ZONE=us-west2-a

gcloud auth login
gcloud config set compute/zone $ZONE
gcloud config set project softwaremill-playground-2

Create the cluster with Stackdriver Kubernetes monitoring enabled:

gcloud container clusters create $NAME --num-nodes=2 --enable-stackdriver-kubernetes

Once the cluster is up and running, we can start installing things on it. For the Zipkin agent we are going to use the zipkin-collector provided by Google. Run it with kubectl using the latest available Docker image:

kubectl run stackdriver-zipkin \
  --image=gcr.io/stackdriver-trace-docker/zipkin-collector:v0.6.0 \
  --expose --port=9411

More info about collecting Zipkin traces on GCP can be found here. The full list of Docker images is available here.

Release the application

Now it’s time to release our application, which will send tracing data to our collector. Before we do that, though, we need to change our configuration a bit to point the Zipkin calls at the proper endpoint in our cluster. For that, we will create another config file and override only the zipkin settings:
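A sketch of such an override, assuming the base-url key from application.conf and the fact that the stackdriver-zipkin service we install on the cluster is reachable under its service name on port 9411:

```hocon
# Reuse everything from the default configuration
include "application.conf"

# Point the tracer at the in-cluster Zipkin collector service
trace.zipkin.base-url = "http://stackdriver-zipkin:9411"
```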

The config called application-prod.conf will be used by our Docker image released to GCR.

Again, similarly to the previous posts in this series, we are going to use the sbt release plugin and push the created Docker image to the Google Container Registry.

sbt clean release

Note the version number uploaded, or check it on the GCR console itself. When the image is ready, you can deploy an instance of it on our cluster:

kubectl create deployment stackdriver-test3-tracing --image=eu.gcr.io/softwaremill-playground-2/play-scala-stackdriver-tracing:1.2

For convenience, we are going to expose our newly created service to the world:

kubectl expose deployment stackdriver-test3-tracing --type=LoadBalancer --port 80 --target-port 9000
kubectl get service -w

The last command waits until the external IP comes up.

Execute the GET requests on our application’s available endpoints multiple times:

Navigate to Stackdriver -> Monitoring and select Trace. The stats are pretty impressive: you can check the latency of your calls as well as investigate them further to see the other calls nested within your application.

Traces for /once endpoint:

Traces for /nested endpoint:

Traces timeline:

With the increased complexity of request tracking in a microservice-based architecture, distributed tracing can be a very valuable tool in our toolbox. The idea is not new and of course all of this could be done even with simple logs, but extracting information from such a solution would be a nightmare. Zipkin tracing coupled with Stackdriver, on the other hand, provides an easy to use and very readable source of information where you can find bottlenecks and analyse the flow of information within your infrastructure.

Happy Tracing!