Navigator is a Kubernetes extension for managing distributed databases. In this post we’ll tell you about all the improvements we’ve made since we unveiled it last year, including: experimental support for Apache Cassandra clusters, improved support for Elasticsearch clusters, and a Helm chart for easy installation! We’ll also give you an overview of the Navigator roadmap for 2018.

1. Experimental Support for Apache Cassandra

The Apache Cassandra database is a leading NoSQL database designed for scalability and high availability without compromising performance. It’s an ideal candidate for running on a scalable, highly available, distributed platform such as Kubernetes. For that reason it was the database chosen to showcase the potential of Kubernetes StatefulSets (or PetSets, as they were known initially). Since then, a Cassandra example has been added to the official Kubernetes documentation, but it is only an example; it is not sufficient if you want to run a Cassandra cluster on Kubernetes in production.

Enter Navigator! Navigator now has experimental support for the Apache Cassandra database. Navigator codifies and automates many of the processes that would previously have been performed by a human operator, such as a database administrator or SRE (Site Reliability Engineer). For example, it will bootstrap a Cassandra cluster and create a load balancer for distributing CQL connections to all the cluster nodes. It performs regular node health checks, which means that if a node becomes unresponsive, it will be automatically bypassed by the load balancer and eventually restarted. Navigator can also scale up your Cassandra cluster, and it’s as simple as using Helm to increment the replicas field on the CassandraCluster API resource. See the demo below.

This is what’s currently possible and our goal is to make it simple enough that any developer can request a secure, highly available, scalable Cassandra database and Navigator will take care of the rest, ensuring that:

Adequate database nodes are running to service the database clients.

Failed or unresponsive database nodes are restarted.

Database nodes are distributed across zones / data centers.

Database seed nodes are established in each data center.

Database nodes are cleanly removed from the cluster before being removed or upgraded.

A quorum of database nodes is maintained.

Database state is backed up.

Database state can be recovered in the event of a catastrophic failure.

Now for a demo.

Demo

In this short screencast we demonstrate how to install Navigator and then install a Cassandra cluster, using Helm. We also show how to examine the status and logs of Navigator components and of the Cassandra database nodes. Finally, we demonstrate how to scale up the Cassandra database and then connect to it using cqlsh to create a keyspace and a table, and insert some data.

The commands used in the demo are available in the Navigator repository.

2. Improved Support for Elasticsearch

Elasticsearch was the first database supported by Navigator and we’ve made dozens of improvements in the last six months, working closely with customers and the community.

Here are some examples:

Pilot Resources

The Navigator Pilot is a process which is injected into the container of the target database and becomes the entry point for the container. So instead of starting the database process immediately, Kubernetes will actually start a /pilot process which first connects to the Navigator API to discover the desired database configuration. It then configures and starts up an Elasticsearch sub-process.

We’ve introduced a new Pilot API resource. This is a place where the controller can publish the desired configuration (spec) of each database node. And it’s where a Pilot process can publish the actual state of its database sub-process (status).

The Navigator controller creates a Pilot resource for every pod under its control.

Sub-process Management

We’ve made many improvements to the Pilot to ensure that:

It cleanly starts the database sub-process.

It catches TERM signals and allows its database sub-process to be cleanly stopped.

It can reliably detect when its database sub-process has stopped.
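
To make the three points above concrete, here is a minimal sketch of the general supervise-and-forward pattern. It is written in Python for brevity and is not the Pilot’s actual code; the function name `run_supervised` is our own invention for this illustration. It starts a child process, forwards TERM signals to it, and reports the exit code once the child has stopped.

```python
import signal
import subprocess
import sys


def run_supervised(argv):
    """Start argv as a child process, forward SIGTERM to it,
    and return the child's exit code once it has stopped."""
    child = subprocess.Popen(argv)

    def forward_term(signum, frame):
        # Pass the signal on so the database can shut down cleanly,
        # instead of the supervisor exiting and orphaning it.
        child.send_signal(signum)

    # Must be called from the main thread (a Python restriction).
    previous = signal.signal(signal.SIGTERM, forward_term)
    try:
        # wait() blocks until the child exits, which is how the
        # supervisor detects that the sub-process has stopped.
        return child.wait()
    finally:
        signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    # Hypothetical usage: supervise whatever command follows on the command line.
    sys.exit(run_supervised(sys.argv[1:]))
```

Because the supervisor is the container’s entry point, it is the process that receives the TERM signal when Kubernetes stops the pod, which is why forwarding matters.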

Health Checks

The Pilot now has a new REST endpoint (/healthz) through which it responds to Kubernetes Readiness and Liveness probes. While the database is running, the Pilot queries the Elasticsearch API to gather the current state of the database and publishes it via the /healthz endpoint.
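
As an illustration of the idea only, the sketch below serves a /healthz endpoint from a small shared state object, using nothing but the Python standard library. The class and function names, the JSON response body, and the choice of 200 versus 503 status codes are all assumptions made for this example; the real Pilot may differ on every one of them. In a real Pilot, something would periodically query the database and call `update()` with the result.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthState:
    """Thread-safe holder for the most recently observed database state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._healthy = False
        self._detail = "no status collected yet"

    def update(self, healthy, detail):
        with self._lock:
            self._healthy = healthy
            self._detail = detail

    def snapshot(self):
        with self._lock:
            return self._healthy, self._detail


def make_handler(state):
    class HealthzHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            healthy, detail = state.snapshot()
            body = json.dumps({"healthy": healthy, "detail": detail}).encode()
            # Kubernetes HTTP probes treat 2xx/3xx as success, anything else as failure.
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep frequent probe requests out of the logs

    return HealthzHandler


def serve_healthz(state, port=0):
    """Serve /healthz in a background thread; port=0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```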

Leader Election

Pilots now have a leader election mechanism. This allows a single “leader” Pilot process to perform cluster-wide administrative functions.
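
This post does not go into how the election works, so the following is only a generic sketch of one common approach, a lease with an expiry time, and not a description of the Pilot’s mechanism. `LeaseStore` is a hypothetical in-memory stand-in for a shared record that every candidate can read and update; time is passed in explicitly to keep the example deterministic.

```python
import threading


class LeaseStore:
    """In-memory stand-in for a shared lock record (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now, ttl):
        """Grant or renew the lease if it is free, expired, or already
        held by this candidate. Returns True if candidate is the leader."""
        with self._lock:
            if self.holder in (None, candidate) or now >= self.expires_at:
                self.holder = candidate
                self.expires_at = now + ttl
            return self.holder == candidate
```

Each candidate calls `try_acquire` periodically. The current leader keeps renewing its lease, and another candidate can take over only after the leader stops renewing and the lease expires.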

Scale Down Safely

This is all groundwork which will allow us to safely scale down Elasticsearch clusters.

3. Full Support for RBAC

Navigator now runs in (and is tested in) Kubernetes environments where RBAC is enabled. The Navigator API server, the Navigator controller, and the Pilots all run with the least necessary privilege.

And if you prefer, you can run a separate Navigator controller for each database namespace. We’ve implemented a new Filtered Informer mechanism so that, in this mode, the Navigator controller will only be able to interact with API resources in a single namespace.
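
Conceptually, the effect is that event handlers only ever see objects from one namespace. The sketch below shows that idea as a simple client-side wrapper; it is an illustration under our own assumptions (the `namespace_filter` name and the dictionary-shaped objects are made up for this example) and does not reproduce the Filtered Informer implementation.

```python
def namespace_filter(namespace, handler):
    """Wrap handler so that it is only called for objects
    whose metadata.namespace matches the given namespace."""

    def filtered(obj):
        if obj.get("metadata", {}).get("namespace") == namespace:
            handler(obj)

    return filtered


# Hypothetical usage: collect only the events from the "db-a" namespace.
seen = []
on_event = namespace_filter("db-a", seen.append)
on_event({"metadata": {"namespace": "db-a", "name": "es-0"}})
on_event({"metadata": {"namespace": "db-b", "name": "es-1"}})
```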

4. A Navigator API Server

Navigator now has its own API server which is aggregated into the Kubernetes API.

The reason for this change was to overcome the limitations of CRDs (Custom Resource Definitions). Most importantly it allows us to assign versions to our Navigator API resource types, as they evolve. And it allows seamless conversion between different versions of the resources.

And while the Navigator architecture has become somewhat more complex, the installation of Navigator has been vastly simplified by the introduction of a Navigator Helm chart (see below).

5. A Helm Chart for Easy Installation

Navigator now has a tried and tested Helm chart.

This allows you to install and configure the Navigator API server, the Navigator controller, and all the supporting Kubernetes resources, in a single fool-proof step. We use that same Helm chart to install Navigator before running our end-to-end tests, so we (and you) can be confident that this installation mechanism is reliable. The Helm chart is also configurable; for example, it allows you to adjust the logging level of the Navigator components.

6. End-to-end Tests

We have also been working on a suite of end-to-end tests for Navigator.

These verify that the Navigator API server and the Navigator controller can be installed, using Helm, as documented. They verify that the Navigator API server is successfully aggregated in all the versions of Kubernetes we support, and that the Navigator controller starts an Elasticsearch or Cassandra database cluster matching the desired configuration in the API server. Most importantly, they verify that the databases are actually usable after they have been launched.

7. Test Infrastructure

We now use Kubernetes test infrastructure to run unit tests and end-to-end tests.

We initially tried running tests against Minikube on Travis CI and it totally worked! But we soon encountered limitations. We needed to use the minikube start --bootstrapper=kubeadm option in order to properly set up the Navigator API server aggregation, but that doesn’t work on the Ubuntu 14.04 operating system provided by Travis CI.

Additionally, some of our end-to-end tests were attempting to launch (and scale up) multi-node Elasticsearch and Cassandra databases. A single Travis CI virtual machine just doesn’t cut the mustard.

So we’ve set up our own test infrastructure on Google Cloud, where we installed and tweaked the Kubernetes Test-Infra for our own purposes, and it works great!

We’ll write more about this in a future blog post but for now take a look at: ‘Prow: Testing the way to Kubernetes Next’ and ‘Making Use of Kubernetes Test Infra Tools’, which give a great introduction to the Kubernetes Test-Infra tools.

Stay tuned

The Navigator API is subject to change and the project is still in an alpha state so we ask that you do not use it in production, yet! But we’re working flat out to add more features and to make Navigator as robust as possible. Here are some of the features that we’re working on:

Scale Down: Safely scale down all supported databases

Database Upgrade: Rolling upgrades of all supported databases

Backup and Restore: Scheduled database backup and automated restore

So stay tuned! And join us if you can at QCon London, in March 2018, where we plan to announce and demonstrate a new Navigator release. Hope to see you there!