
I’ve seen quite a few projects using various orchestration tools to deploy applications. Probably the most popular one nowadays is Kubernetes (K8S). Even though these tools offer a vast amount of functionality to help applications run in a scalable and resilient manner, I keep noticing that engineers don’t use the features they have.

One example I often see is missing or misused healthchecks. The world would be a great place if the orchestration tool could just figure out whether the application is healthy and take the necessary actions if it’s not. Fortunately, it’s 2020 and such tools are already available (and have been for a long time :)).

Today I’m going to focus on Kubernetes and show you how to set up proper healthchecks to monitor a Spring Boot application that has Actuator set up.

Healthcheck in Kubernetes

Let’s begin with a little bit of introduction into the healthcheck mechanism of Kubernetes.

The probe actions

All the healthchecks are managed by so-called “probes” in the K8S ecosystem. Imagine the probe as a process that periodically does something to determine the health of the application. There are 3 actions a probe can perform.

Executing a command

I’ll cover this one only briefly. You can execute a command (with its arguments) inside the container. If the command exits with code 0, the application is considered healthy. If it exits with any other code, it’s unhealthy and needs action.
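To make the exit code rule concrete, here is a minimal plain-Java sketch of the same idea. It only illustrates the rule described above and is not how Kubernetes implements exec probes. The class and method names are made up for this example, and treating a command that cannot be started as unhealthy is an assumption of the sketch.

```java
import java.io.IOException;

// Hypothetical helper for illustration; not part of Kubernetes or Spring.
final class ExecProbeSketch {

    /** Runs the command and reports healthy only when it exits with code 0. */
    static boolean isHealthy(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)                        // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // the output itself is not needed
                    .start();
            return process.waitFor() == 0;
        } catch (IOException e) {
            // Assumption of this sketch: a command that cannot start counts as a failure.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

On a system with a POSIX shell, calling ExecProbeSketch.isHealthy("sh", "-c", "exit 0") should report healthy, while a command that exits with 3 should report unhealthy.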

Opening a TCP socket

With this type of probe, Kubernetes will attempt to open a TCP socket on a specified port. If the connection is established successfully, the container is considered healthy. If the connection cannot be established, the container is considered unhealthy.
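The same check can be sketched in a few lines of plain Java. Again, this is just an illustration of the idea under my own assumptions (the class name, the method name and the timeout parameter are hypothetical), not the actual Kubernetes implementation.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Kubernetes or Spring.
final class TcpProbeSketch {

    /** Healthy when a TCP connection to host:port can be opened within the timeout. */
    static boolean isHealthy(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, unreachable host or timeout: all count as a failed probe.
            return false;
        }
    }
}
```

Note that a successful connection only proves that something is listening on the port. It says nothing about whether the application behind it can actually serve requests.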

Executing an HTTP GET

This one is the most sophisticated one. The system will execute an HTTP GET request against a specific endpoint. If the API returns a status code from 200 to 399, the container is considered healthy. If it returns any other status code, or the request could not be executed, the container is unhealthy. You can also provide custom headers to be sent with the healthcheck request in case you have a special case.
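Here is a minimal sketch of that rule using the JDK’s built-in HTTP client (Java 11+). The class name, the method names and the X-Probe header are hypothetical and only there for illustration. Also note that this sketch does not follow redirects, which may differ from what Kubernetes does.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper for illustration; not part of Kubernetes or Spring.
final class HttpProbeSketch {

    /** The rule described above: 200 to 399 counts as success. */
    static boolean isSuccessStatus(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    /** Sends one GET request and applies the status code rule; any error counts as a failure. */
    static boolean isHealthy(URI endpoint, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .timeout(timeout)
                .header("X-Probe", "liveness") // hypothetical custom header
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return isSuccessStatus(response.statusCode());
        } catch (IOException e) {
            // Covers refused connections and timeouts.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```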

Liveness probe

Kubernetes provides 3 different types of probes. Each one is suitable for a different use-case.

Liveness probe

Readiness probe

Startup probe

In this article, I’m going to cover only the first one – liveness probe.

The purpose of this type is to detect when an application gets into a state it cannot recover from. Imagine a container running for days/weeks and suddenly it stops serving requests. The only way to resolve the problem is to restart it. Of course I know there must be an underlying issue in the application that needs resolution but for now let’s not go into that direction.

As soon as the liveness probe detects that the application is not passing the healthcheck, it will initiate a container restart on the pod. Note that the pod itself is not restarted, only the unhealthy container inside it.

There are 5 configuration parameters for a probe:

initialDelaySeconds The number of seconds to wait after the container starts before the first probe is run. For example, if you know your app takes at least 10 seconds to start, set this to 10 so the liveness probe won’t count the startup as a failure.

periodSeconds Defines how often the probe performs the healthcheck, in seconds.

timeoutSeconds The number of seconds after which the probe times out. Think of an HTTP GET probe with a 1 second timeout: if the response does not arrive within 1 second (because the application is slow), the probe counts it as a failure.

successThreshold The minimum number of consecutive healthcheck successes before the container is considered healthy again after being unhealthy. For a liveness probe this value must be 1.

failureThreshold The number of consecutive healthcheck failures after which the container is considered unhealthy and is restarted.
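To show how the two thresholds interact, here is a small plain-Java sketch of the counting logic as I understand it from the descriptions above. The class is hypothetical and only models the bookkeeping. With a real liveness probe the container is restarted once it is considered unhealthy, so the recovery path is mostly relevant for the other probe types.

```java
// Hypothetical model of the threshold bookkeeping; not Kubernetes code.
final class ProbeThresholdSketch {
    private final int failureThreshold;
    private final int successThreshold;
    private int consecutiveFailures;
    private int consecutiveSuccesses;
    private boolean healthy = true;

    ProbeThresholdSketch(int failureThreshold, int successThreshold) {
        this.failureThreshold = failureThreshold;
        this.successThreshold = successThreshold;
    }

    /** Records one probe result and returns whether the container is now considered healthy. */
    boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveFailures = 0;
            consecutiveSuccesses++;
            if (!healthy && consecutiveSuccesses >= successThreshold) {
                healthy = true;
            }
        } else {
            consecutiveSuccesses = 0;
            consecutiveFailures++;
            if (healthy && consecutiveFailures >= failureThreshold) {
                healthy = false;
            }
        }
        return healthy;
    }
}
```

The important detail is the word consecutive: a single successful probe resets the failure counter, so isolated hiccups do not add up to a restart.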



An example TCP socket based liveness probe configuration looks like the following, just to give you a feel:

    apiVersion: v1
    kind: Pod
    metadata:
      name: goproxy
      labels:
        app: goproxy
    spec:
      containers:
      - name: goproxy
        image: k8s.gcr.io/goproxy:0.1
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20

Spring Boot Actuator health API

Alright, we’ve covered the basic idea of the probes in Kubernetes, let’s look at Spring Boot Actuator.

Spring Boot Actuator is an extension module for Spring Boot to monitor and manage the application through JMX or HTTP. The marketing slogan is: enhancing the application with production-ready features. The module has lots and lots of features. I’m not going to cover all of them in this article, but if you are interested, you can find more info in the Spring Boot Actuator documentation.

There is one interesting feature though: the health API. It is a single HTTP endpoint you can call, /actuator/health. The default behavior is simple: if the application is healthy, it responds with HTTP 200 and the following JSON:

    {
      "status": "UP"
    }

If the application is unhealthy, it will respond with HTTP 503 and the following JSON:

    {
      "status": "DOWN"
    }
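The relationship between the health status and the HTTP status code can be written down as a tiny lookup. The sketch below is plain Java with hypothetical names. The UP and DOWN entries come from the behavior described above, while the OUT_OF_SERVICE and UNKNOWN entries and the fallback to 200 reflect my understanding of Actuator’s defaults, so treat them as assumptions.

```java
import java.util.Map;

// Hypothetical illustration of the status to HTTP code mapping; not Actuator code.
final class HealthStatusMappingSketch {

    // UP and DOWN as described above; the other two entries are assumed defaults.
    private static final Map<String, Integer> HTTP_STATUS = Map.of(
            "UP", 200,
            "DOWN", 503,
            "OUT_OF_SERVICE", 503,
            "UNKNOWN", 200);

    /** Returns the HTTP status code for a health status; unmapped statuses fall back to 200 (assumption). */
    static int httpStatusFor(String healthStatus) {
        return HTTP_STATUS.getOrDefault(healthStatus, 200);
    }

    /** True when an HTTP GET probe would accept the response (200 to 399). */
    static boolean probeWouldPass(String healthStatus) {
        int code = httpStatusFor(healthStatus);
        return code >= 200 && code < 400;
    }
}
```

This is what makes the endpoint a natural fit for an HTTP GET probe: UP lands inside the accepted range and DOWN lands outside of it.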

Customizing the health indicator

I don’t want to go deep into how Actuator works under the hood, but there is an interface called HealthIndicator whose implementations contribute to the overall system health. There are more than a dozen of them already auto-configured for you. An example is DiskSpaceHealthIndicator, which checks for low disk space.

Writing a custom HealthIndicator is quite easy. Simply create a new class that implements the HealthIndicator interface and mark it as a @Component . Spring will pick it up automatically.

For the sake of the testing, I’ll show you a very simple HealthIndicator that can be switched UP / DOWN with a simple HTTP call.

I’m starting off from a project generated on start.spring.io, a Gradle one with the Actuator and Web dependencies. So, as a first step, let’s create a Spring bean that will hold the health state (healthy/unhealthy):

    import java.util.concurrent.atomic.AtomicBoolean;

    import org.springframework.stereotype.Component;

    @Component
    public class ManualHealthHolder {
        private final AtomicBoolean healthy = new AtomicBoolean(true);

        // Flips the flag. The compare-and-set loop keeps the toggle atomic,
        // a separate get() followed by set() could lose a concurrent update.
        public void switchHealth() {
            boolean current;
            do {
                current = healthy.get();
            } while (!healthy.compareAndSet(current, !current));
        }

        public boolean isHealthy() {
            return healthy.get();
        }
    }

Nothing special, just a state holder class with a single boolean value that represents the health of the system.

The HealthIndicator is also very simple:

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.actuate.health.Health;
    import org.springframework.boot.actuate.health.HealthIndicator;
    import org.springframework.stereotype.Component;

    @Component
    public class ManualHealthIndicator implements HealthIndicator {
        @Autowired
        private ManualHealthHolder manualHealthHolder;

        // Reports UP or DOWN depending on the manually controlled flag.
        @Override
        public Health health() {
            boolean healthy = manualHealthHolder.isHealthy();
            if (healthy) {
                return Health.up().build();
            }
            return Health.down().build();
        }
    }

There is a single method on the HealthIndicator interface that needs to be implemented: HealthIndicator#health. You can do very complicated things like introducing new health states to the system, but we’ll go with the existing UP and DOWN states. In this particular example, the health is decided by the ManualHealthHolder bean. If it says healthy, the state will be UP. If it says unhealthy, the state will be DOWN.

The next and last step is to create an HTTP endpoint for changing the state.

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.http.HttpStatus;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class ManualHealthRestController {
        @Autowired
        private ManualHealthHolder manualHealthHolder;

        // Toggles the health flag; a GET mapping keeps the demo easy to trigger with curl.
        @GetMapping("/switch")
        public ResponseEntity<?> switchHealth() {
            manualHealthHolder.switchHealth();
            return new ResponseEntity<>(HttpStatus.OK);
        }
    }

Very minimal again. There is a single HTTP GET mapping for switching the status: /switch.

Testing time. Start up the application with ./gradlew clean build bootRun.

If you cURL localhost:8080/actuator/health , you’ll get the UP response (I’ve used jq here to format the response nicely).

    $ curl localhost:8080/actuator/health | jq
    {
      "status": "UP"
    }

To simulate downtime of the application, we can call the localhost:8080/switch API. It flips the healthy flag internally, so querying the /actuator/health endpoint now returns the DOWN state.

    $ curl localhost:8080/actuator/health | jq
    {
      "status": "DOWN"
    }

Liveness probe with Actuator

Now that we know the building blocks, let’s integrate Actuator and Kubernetes. Kubernetes runs containers, so we need to build a Docker image from the Spring application. A simple Dockerfile looks like the following:

    FROM openjdk:8-jdk-alpine
    RUN apk add --no-cache --upgrade bash
    RUN apk add --no-cache --upgrade curl
    COPY build/libs/actuator-healtcheck-example-0.0.1-SNAPSHOT.jar app.jar
    ENTRYPOINT ["java","-jar","/app.jar"]

Alright, now after executing

    $ ./gradlew clean build

we can also execute the

    $ docker build . -t actuator-healthcheck-example

command to build the docker image.

I’m using minikube here for testing so let me add a few more steps to properly create the image so we are able to deploy it to the actual Kubernetes cluster.

Before creating the image, you should execute the following command to point your docker client at the Docker daemon running inside the minikube cluster.

    $ eval $(minikube docker-env)

Now that you have the docker context set up, execute the command to build the docker image. You can verify with the docker images command whether the image was successfully created.

    $ docker images
    REPOSITORY                     TAG      IMAGE ID       CREATED         SIZE
    actuator-healthcheck-example   latest   6db0841a7102   5 seconds ago   124MB

The next step is the deployment to the cluster. The initial deployment file looks like the following:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: actuator-healthcheck-example
      labels:
        app: actuator-healthcheck-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: actuator-healthcheck-example
      template:
        metadata:
          labels:
            app: actuator-healthcheck-example
        spec:
          containers:
          - name: actuator-healthcheck-example
            image: actuator-healthcheck-example:latest
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 8080

It’s a very basic deployment descriptor. One important point is to set imagePullPolicy to IfNotPresent or Never so Kubernetes uses the locally built docker image instead of trying to download it.

Adding the liveness probe:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: actuator-healthcheck-example
      labels:
        app: actuator-healthcheck-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: actuator-healthcheck-example
      template:
        metadata:
          labels:
            app: actuator-healthcheck-example
        spec:
          containers:
          - name: actuator-healthcheck-example
            image: actuator-healthcheck-example:latest
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 8080
            livenessProbe:
              httpGet:
                path: /actuator/health
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 10
              failureThreshold: 2

So the trick here is to use the httpGet action on the probe and point it at the /actuator/health endpoint. As I said earlier, with the httpGet action the probe considers the application healthy when the status code is between 200 and 399. Guess what, the /actuator/health API fulfills that contract: it responds with 200 when the application reports a healthy state and with 503 when it’s down.

The rest of the configuration tells Kubernetes to wait 5 seconds after the container starts before the first probe, to execute the GET request against the endpoint every 10 seconds, and to consider the container unhealthy after 2 consecutive failed healthchecks.
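As a rough worked example, these numbers also tell you how long an unhealthy application can keep running before the restart is triggered. The plain-Java sketch below uses hypothetical names and assumes that a failing probe answers immediately (a 503 comes back right away) and that the probe fires exactly every periodSeconds. It ignores the time the restart itself takes.

```java
// Hypothetical back-of-the-envelope helper; not part of Kubernetes or Spring.
final class RestartDelaySketch {

    /** The app turns unhealthy right before a probe: only the remaining failures have to be waited for. */
    static int bestCaseSeconds(int periodSeconds, int failureThreshold) {
        return (failureThreshold - 1) * periodSeconds;
    }

    /** The app turns unhealthy right after a probe passed: a full period passes before the first failure. */
    static int worstCaseSeconds(int periodSeconds, int failureThreshold) {
        return failureThreshold * periodSeconds;
    }
}
```

With periodSeconds set to 10 and failureThreshold set to 2, the restart is triggered somewhere between roughly 10 and 20 seconds after the application starts reporting DOWN.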

Putting it all together

That’s it. Testing time. If you’ve read this far, I assume the docker image is already built, so we’re going from there.

The only thing we need to do is to deploy the application. With a little kubectl command you can do it:

    $ kubectl apply -f k8s-deployment.yaml

The output should be:

    deployment.apps/actuator-healthcheck-example created

So now if we take a look at the pods we have:

    $ kubectl get pods
    NAME                                            READY   STATUS    RESTARTS   AGE
    actuator-healthcheck-example-6bf74bd94c-4xmvb   1/1     Running   0          17s

Everything looks good. The next phase of the testing is to flip the healthy flag in the application so we can see the liveness probe keeping the container in a healthy state.

Accessing the switch API for testing

There are 2 options for this. One is to open a terminal inside the container so we can locally trigger the /switch endpoint. The other one is to proxy the pod traffic to the local machine.

To get a shell inside the container, execute the following command (of course change the pod name to yours):

    $ kubectl exec -it actuator-healthcheck-example-6bf74bd94c-t5h2f -- bash

From then on, executing

    bash-4.4# curl localhost:8080/switch

will switch the flag. If you query the /actuator/health API the same way, it’s going to say DOWN .

The other option to trigger the /switch API is to forward requests from your local machine directly to the pod with the use of kubectl. Strictly speaking, kubectl can forward a port to a pod without any further setup, but I also expose the pod’s port as a Service here so the application is reachable inside the cluster. To do that, let’s extend the descriptor we created:

    apiVersion: v1
    kind: Service
    metadata:
      name: actuator-healthcheck-example-svc
      labels:
        app: actuator-healthcheck-example
    spec:
      ports:
      - port: 8080
        targetPort: 8080
      selector:
        app: actuator-healthcheck-example

So the full k8s-deployment.yaml file looks like the following:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: actuator-healthcheck-example
      labels:
        app: actuator-healthcheck-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: actuator-healthcheck-example
      template:
        metadata:
          labels:
            app: actuator-healthcheck-example
        spec:
          containers:
          - name: actuator-healthcheck-example
            image: actuator-healthcheck-example:latest
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 8080
            livenessProbe:
              httpGet:
                path: /actuator/health
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 10
              failureThreshold: 2
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: actuator-healthcheck-example-svc
      labels:
        app: actuator-healthcheck-example
    spec:
      ports:
      - port: 8080
        targetPort: 8080
      selector:
        app: actuator-healthcheck-example

Now that everything is in place, redeploy the stack with

    $ kubectl apply -f k8s-deployment.yaml

To access the API, only a port-forward is needed:

    $ kubectl port-forward actuator-healthcheck-example-6bf74bd94c-t5h2f 9876:8080

The command binds the local 9876 port to the pod’s 8080 port. So from now on, you can access the API from your local machine through localhost:9876.

    $ curl localhost:9876/switch

Observing the liveness probe

The application is deployed. We can access the API. Everything is ready to see the liveness probe in action. First of all, let’s verify that the pod is alive and the /actuator/health API is returning the UP status.

    $ kubectl get pods
    NAME                                            READY   STATUS    RESTARTS   AGE
    actuator-healthcheck-example-6bf74bd94c-l6b9j   1/1     Running   0          15m

    $ curl localhost:9876/actuator/health
    {"status":"UP"}

Looks good so far. Now let’s switch the health with /switch.

    $ curl localhost:9876/switch

    $ kubectl get pods
    NAME                                            READY   STATUS    RESTARTS   AGE
    actuator-healthcheck-example-6bf74bd94c-l6b9j   1/1     Running   0          16m

    $ curl localhost:9876/actuator/health
    {"status":"DOWN"}

From the pod’s perspective everything looks good, but Actuator says the service is DOWN. A little later, the pod events clearly show that the container was in fact restarted because of it.

    $ kubectl describe pod actuator-healthcheck-example-7dcdd4dd48-97qxm

    Events:
      Type     Reason     Age                From               Message
      ----     ------     ----               ----               -------
      Normal   Scheduled  65s                default-scheduler  Successfully assigned default/actuator-healthcheck-example-7dcdd4dd48-97qxm to minikube
      Normal   Pulled     24s (x2 over 64s)  kubelet, minikube  Container image "actuator-healthcheck-example:latest" already present on machine
      Warning  Unhealthy  24s (x2 over 34s)  kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 503
      Normal   Killing    24s                kubelet, minikube  Container actuator-healthcheck-example failed liveness probe, will be restarted
      Normal   Created    23s (x2 over 64s)  kubelet, minikube  Created container actuator-healthcheck-example
      Normal   Started    23s (x2 over 64s)  kubelet, minikube  Started container actuator-healthcheck-example

You can see the message Liveness probe failed: HTTP probe failed with statuscode: 503. It happened 2 times, so the container was considered unhealthy and was restarted.

Once the container has restarted, you can query the Actuator health again and it will respond with the UP status, because the restart put the in-memory healthy flag back to its initial value.

    $ curl localhost:9876/actuator/health
    {"status":"UP"}

Conclusion

I hope you see how easy it is to set up a proper healthcheck (at least a liveness one) with Kubernetes and Spring Boot. It’s definitely something I recommend doing to create a more resilient system and react to problems automatically.

The code can be found on GitHub.