Cassandra is one of my favorite databases when I need something fast, scalable, and distributed. When it came time to deploy Cassandra for our microservices, I almost went the old route of creating virtual machines or provisioning bare-metal servers. But our microservices were already deployed in IBM Cloud Kubernetes clusters, so ideally the Cassandra clusters would be too. It turns out deploying Cassandra was super easy, especially with dynamically provisioned block storage (more on that later).

This article assumes you already have an IBM Cloud Kubernetes cluster. If you haven't yet installed and configured kubectl, go here to get that sorted out. You'll want to be able to run kubectl commands against the cluster.

For a more advanced deployment, take a look at my two-part series on Making Apache Cassandra on IBM Cloud Kubernetes Production Ready.

Create a Headless Service

A normal Service in Kubernetes allows you to load balance across the pods behind a single service IP, and creates a DNS entry for you. A headless service in Kubernetes is a service definition that doesn't have a service IP. This can be useful if you don't need load balancing for a service but would rather have just a set of A records for the service, one pointing to each individual pod. Since Cassandra doesn't really need load balancing behind an IP (your Cassandra client connects to the nodes directly), there's no need to create a real service. With a headless service, you can point your Cassandra client at cassandra.data.svc.cluster.local.

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
  namespace: data
spec:
  clusterIP: None
  ports:
  - port: 9042
  selector:
    app: cassandra
```

This effectively creates a DNS entry for cassandra.data.svc.cluster.local, where an A record will exist for each Kubernetes pod with app: cassandra in its labels. This makes configuring the Cassandra client easy: just point it at cassandra.data.svc.cluster.local.
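To illustrate what the headless service buys you (a hypothetical helper, not part of any Kubernetes API), a client can enumerate every Cassandra pod with an ordinary DNS lookup, because Kubernetes publishes one A record per ready pod:

```python
import socket

def resolve_contact_points(hostname, port=9042):
    """Return the unique IPv4 addresses behind a DNS name.

    Against a headless service, each address is one Cassandra pod,
    since Kubernetes publishes an A record per ready pod.
    """
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

# Inside the cluster you would call:
#   resolve_contact_points("cassandra.data.svc.cluster.local")
# and get back one IP per Cassandra pod.
```

Most Cassandra drivers do this resolution for you when you hand them the service name as a contact point.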

Create the StatefulSet

A StatefulSet is like a Deployment in Kubernetes, but rather than treating Pods as disposable and giving each one a dynamically generated name, it gives each Pod a stable identity: Kubernetes assigns an index (starting at zero) to each Pod and uses that index in the Pod name, and a replaced Pod keeps the same name. This is one of the reasons we're using a StatefulSet instead of a Deployment. We want to tell our Cassandra clients that the nodes are located at cassandra-0, cassandra-1, and cassandra-2, and having predictable Pod names makes this much easier. Here's the StatefulSet definition as a whole.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
        image: cassandra:3.11
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - nodetool drain
        env:
        - name: CASSANDRA_SEEDS
          value: cassandra-0.cassandra.data.svc.cluster.local
        - name: MAX_HEAP_SIZE
          value: 1024M
        - name: HEAP_NEWSIZE
          value: 100M
        - name: CASSANDRA_CLUSTER_NAME
          value: "Cassandra"
        - name: CASSANDRA_DC
          value: "DAL10"
        - name: CASSANDRA_RACK
          value: "Rack-1"
        - name: CASSANDRA_ENDPOINT_SNITCH
          value: GossipingPropertyFileSnitch
        volumeMounts:
        - name: cassandra-data
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
      labels:
        billingType: "hourly"
    spec:
      storageClassName: "ibmc-block-silver"
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
```

Most of this will look like a Deployment definition, and you can figure out what settings you need for your setup. The more interesting part is the volumeClaimTemplates section, which specifies how volumes are created for each Pod. Normally, you'd have to allocate storage from somewhere, make that storage available as a PersistentVolume (PV) in Kubernetes, claim part of that PV by creating a PersistentVolumeClaim (PVC), and only THEN reference that PVC as a volume in your Deployment. With a StatefulSet, you instead specify what kind of storage you need and how much of it each Pod should get. The StatefulSet then creates a PVC for each Pod it creates and makes it available as a volume to be referenced in the container spec. So in the above definition, three PVCs will be created: PVC cassandra-data-cassandra-0 will be claimed by Pod cassandra-0, PVC cassandra-data-cassandra-1 will be claimed by Pod cassandra-1, and so on.
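The naming is deterministic, so you can always predict which PVC belongs to which Pod. As a sketch (hypothetical helpers that just restate the naming rules above):

```python
def pod_name(statefulset, ordinal):
    """StatefulSet Pods are named <statefulset>-<ordinal>, starting at 0."""
    return f"{statefulset}-{ordinal}"

def pvc_name(claim_template, statefulset, ordinal):
    """Each Pod's claim is named <claimTemplate>-<podName>."""
    return f"{claim_template}-{pod_name(statefulset, ordinal)}"

# For replicas: 3 in the definition above:
for i in range(3):
    print(pod_name("cassandra", i), "claims",
          pvc_name("cassandra-data", "cassandra", i))
# cassandra-0 claims cassandra-data-cassandra-0
# cassandra-1 claims cassandra-data-cassandra-1
# cassandra-2 claims cassandra-data-cassandra-2
```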

So where does the storage for the PVC come from? Normally, you would need to allocate some storage and register it as a PersistentVolume within Kubernetes. With IBM Cloud, you can enable a plug-in that allocates the block storage for you. Notice that the definition above specifies storageClassName: "ibmc-block-silver". This tells Kubernetes to invoke the plug-in, which provisions block storage (at the IOPS tier the class specifies) on the fly and adds it to Kubernetes as a PV. So in order for the above to work, we need to enable the plug-in.
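For comparison (a hand-written equivalent, not output copied from a cluster), this is roughly the PVC that the volumeClaimTemplates entry generates for Pod cassandra-0; applying a manifest like it by hand would trigger the same dynamic provisioning:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cassandra-data-cassandra-0
  namespace: data
  labels:
    billingType: "hourly"
spec:
  storageClassName: "ibmc-block-silver"
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi
```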

Installing the IBM Cloud Block Storage Plug-in

If you haven’t used Helm yet on your IBM Cloud Kubernetes cluster, follow these instructions before proceeding.

Okay, so you have Helm installed? Good, now run the following commands:

```shell
$ helm install ibm/ibmcloud-block-storage-plugin
$ kubectl get storageclasses | grep block
```

This installs the plug-in and lists the available block storage classes. (Note that with Helm 3, `helm install` requires a release name as its first argument.) You can find more detailed information about each of the storage classes here.

Putting it All Together

Important Note

Be careful about which storage class you use. You'll notice if you list the storage classes that each flavor comes in two variants:

```shell
$ kubectl get storageclasses | grep block
ibmc-block-bronze          ibm.io/ibmc-block   26d
ibmc-block-custom          ibm.io/ibmc-block   26d
ibmc-block-gold            ibm.io/ibmc-block   26d
ibmc-block-retain-bronze   ibm.io/ibmc-block   26d
ibmc-block-retain-custom   ibm.io/ibmc-block   26d
ibmc-block-retain-gold     ibm.io/ibmc-block   26d
ibmc-block-retain-silver   ibm.io/ibmc-block   26d
ibmc-block-silver          ibm.io/ibmc-block   26d
```

Each flavor (bronze, silver, gold, etc.) has a 'retain' version. With the 'retain' classes, when the PVC is deleted, the PV and the physical storage device provisioned in your IBM Cloud Infrastructure account are still around for you to mount and reuse. This is important: if you don't use a 'retain' class and the PVC is deleted, the underlying PV and physical storage device will automatically be deleted, along with your data!
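Under the hood (field names from the core Kubernetes API; the volume name below is made up), the difference shows up as the `persistentVolumeReclaimPolicy` on the provisioned PV: `Retain` keeps the PV and the backing device after the PVC is deleted, while `Delete` removes both. A trimmed PV from a retain class looks something like:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0a1b2c3d-example        # hypothetical generated name
spec:
  capacity:
    storage: 50Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: ibmc-block-retain-silver
  persistentVolumeReclaimPolicy: Retain   # would be "Delete" for non-retain classes
```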

If you’re looking to take this to the next level, read about how to make Cassandra production ready on Kubernetes here.
