Kubernetes is a popular, open-source platform for managing containerized workloads and services. With the introduction of Custom Resource Definitions (CRDs) in version 1.7, the platform also became extensible: admins can define their own types of resources, which provide a more domain-specific schema than, for example, deployments or services. But rest assured, this does not necessarily mean more lines of YAML!

CRDs are only a means to specify a configuration, though. The cluster still needs controllers to monitor its state and reconcile the resources to match the configuration. This is where Operators come into play.

Operators are controllers working in association with custom resources to perform tasks that well… “human operators” have to take care of. Think about deploying a database cluster with a manageable number of nodes, managing upgrades and even performing backups. A custom resource would specify the version of the database, the number of nodes to deploy and the frequency of the backups as well as their target storage, and the controller would implement all the business logic needed to perform these operations. This is what the etcd-operator does, for example.

Yet, writing a CRD schema and its accompanying controller can be a daunting task. Thankfully, the Operator SDK is here to help. Let’s see how it can be used to build a simple operator that scales pods running a particular Docker image up and down. In other words, let’s code a basic Kubernetes ReplicaSet!

Please note that at the time of writing, the Operator SDK was at version v0.4.0, so things may have changed since then. Also, in this article we’ll build a Go operator, but keep in mind that operators can also be developed as Ansible playbooks or Helm charts.

Primary resources and secondary resources

Before diving into the Operator SDK and the source code, let’s take a step back and discuss operators a little more. As we’ve seen before, operators are CRDs associated with controllers, which observe and act upon changes in the configuration or in the state of the cluster. But there are actually two kinds of resources that the controller needs to monitor: the primary resources and the secondary resources.

In the case of a ReplicaSet, the primary resource is the ReplicaSet itself: it specifies the Docker image to run and the number of pods in the set. The secondary resources are the pods themselves. When a change occurs in the ReplicaSet spec (e.g., the version of the image was changed or the number of pods was updated) or in the pods (e.g., a pod was deleted), the controller is notified and acts accordingly to reconcile the state of the cluster, by rolling out a new version or just scaling the pods up or down.

In the case of a DaemonSet, the primary resource is the DaemonSet itself, while the secondary resources are once again the pods, but also the nodes of the cluster. The difference here is that the controller also monitors the nodes of the cluster to add or remove pods as the cluster grows or shrinks.

The key point here is that it is critical to clearly identify the primary and secondary resources of the operator, as they will determine the behavior of its controllers.

Building a PodSet operator

Enough with the theory, it’s now time to dive into the code and build a basic ReplicaSet-ish operator called PodSet, which will take care of scaling up and down pods that execute sleep 3600 in a busybox container. Nothing fancy here, but the focus of this article is building an operator, not diving too deep into the specifics of setting up a cluster of this or that application.

Installing the Operator SDK

The first thing to do is download and install the SDK:



$ mkdir -p $GOPATH/src/github.com/operator-framework
$ cd $GOPATH/src/github.com/operator-framework
$ git clone https://github.com/operator-framework/operator-sdk
$ cd operator-sdk
$ git checkout v0.4.0
$ make dep
$ make install

Bootstrapping the Go project

Next, let’s use the operator-sdk command to bootstrap the project:

# bootstrap the project and run "dep ensure" and "git init"
$ operator-sdk new podset-operator
# check the project structure
$ cd podset-operator
$ tree -I vendor
.
├── Gopkg.lock
├── Gopkg.toml
├── build
│   └── Dockerfile
├── cmd
│   └── manager
│       └── main.go
├── deploy
│   ├── operator.yaml
│   ├── role.yaml
│   ├── role_binding.yaml
│   └── service_account.yaml
├── pkg
│   ├── apis
│   │   └── apis.go
│   └── controller
│       └── controller.go
└── version
    └── version.go

Once this is done, we have the base layout for our project. It contains not only the Go code to run the operator (cmd/manager/main.go), but also the Dockerfile to package the binary into a Docker image, and a set of YAML manifests to 1) deploy the Docker image and 2) create the service account that runs the controller, along with a role and a role binding that allow the required operations (adding or removing pods, etc.).

Adding a CRD and a controller

Now, let’s create a CRD for the PodSet operator, as well as its associated controller. Since this is the first version of the operator, it is good practice to set the version to v1alpha1:

# Add a new API for the custom resource PodSet
$ operator-sdk add api --api-version=app.example.com/v1alpha1 --kind=PodSet

# Add a new controller that watches for PodSet
$ operator-sdk add controller --api-version=app.example.com/v1alpha1 --kind=PodSet

These two commands generated another set of YAML files and Go code. The most noticeable changes are the following:

1. The new deploy/crds folder contains the Custom Resource Definition of the PodSet, along with an example custom resource:

$ cat deploy/crds/app_v1alpha1_podset_crd.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: podsets.app.example.com
spec:
  group: app.example.com
  names:
    kind: PodSet
    listKind: PodSetList
    plural: podsets
    singular: podset
  scope: Namespaced
  version: v1alpha1

$ cat deploy/crds/app_v1alpha1_podset_cr.yaml
apiVersion: app.example.com/v1alpha1
kind: PodSet
metadata:
  name: example-podset
spec:
  # Add fields here
  size: 3

The full name of the CRD is podsets.app.example.com, but there are also various names associated with it (PodSet, PodSetList, podsets and podset). They will be part of the extended cluster API and available from the kubectl command line, as we’ll see later when deploying and running the operator.

2. The pkg/apis/app/v1alpha1/podset_types.go file defines the structure of the PodSetSpec, which is the expected state of the PodSet as specified by the user in the aforementioned deploy/crds/app_v1alpha1_podset_cr.yaml file. It also defines the structure of the PodSetStatus, which provides the observed state of the PodSet when the kubectl describe command is executed. More on this later.

3. The scaffold file pkg/controller/podset/podset_controller.go is where we will put the business logic of the controller.

The rest of the changes are the necessary plumbing to register the CRD and the controller.
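As mentioned in the first item above, the full name podsets.app.example.com is not arbitrary: Kubernetes requires a CRD's metadata.name to be the plural resource name followed by a dot and the API group. The tiny Go sketch below only illustrates that naming rule; the crdName helper is hypothetical and not part of the Operator SDK.

```go
package main

import "fmt"

// crdName is a hypothetical helper that builds the metadata.name
// Kubernetes expects for a CRD: "<plural>.<group>".
func crdName(plural, group string) string {
	return plural + "." + group
}

func main() {
	fmt.Println(crdName("podsets", "app.example.com")) // podsets.app.example.com
}
```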

Implementing the business logic of the controller

Out of the box, the controller code generated by the SDK creates a single pod if none with a given app label already exists. Although this is not exactly what we want here, it is nonetheless a good starting point, since it shows how to use the Kubernetes API, whether to list pods or to create new ones.

The first thing we want to do is change the PodSetSpec and PodSetStatus Go structs by adding fields to store the number of replicas in the former and the names of the pods in the latter:

type PodSetSpec struct {
	Replicas int32 `json:"replicas"`
}

type PodSetStatus struct {
	PodNames []string `json:"podNames"`
}

Each time we make a change in these structures, we need to run the operator-sdk generate k8s command to update the pkg/apis/app/v1alpha1/zz_generated.deepcopy.go file accordingly.
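The deepcopy file matters because controllers typically copy objects (for example, objects read from the client's cache) before modifying them, so a copy must not share the backing array of a slice such as PodNames. The hand-written sketch below only illustrates the idea; the actual zz_generated.deepcopy.go content is produced by operator-sdk generate k8s and may differ in form.

```go
package main

import "fmt"

// Copy of the PodSetStatus struct shown above.
type PodSetStatus struct {
	PodNames []string `json:"podNames"`
}

// DeepCopyInto copies the receiver into out, allocating a new backing
// array for PodNames so that the copy does not alias the original slice.
func (in *PodSetStatus) DeepCopyInto(out *PodSetStatus) {
	*out = *in
	if in.PodNames != nil {
		out.PodNames = make([]string, len(in.PodNames))
		copy(out.PodNames, in.PodNames)
	}
}

// DeepCopy returns a newly allocated deep copy, or nil for a nil receiver.
func (in *PodSetStatus) DeepCopy() *PodSetStatus {
	if in == nil {
		return nil
	}
	out := new(PodSetStatus)
	in.DeepCopyInto(out)
	return out
}

func main() {
	orig := &PodSetStatus{PodNames: []string{"pod-a"}}
	cp := orig.DeepCopy()
	cp.PodNames[0] = "pod-b" // modifying the copy leaves the original intact
	fmt.Println(orig.PodNames[0], cp.PodNames[0]) // pod-a pod-b
}
```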

Then, we need to configure the primary and secondary resources that the controller will monitor in the namespace. For our PodSet operator, the primary resource is the PodSet resource and the secondary resources are the pods in the namespace. Luckily, we don’t have to do anything here, as this was already implemented in the generated code. Remember that by default, the controller operates on a PodSet resource and creates a pod.
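In the scaffold, the generated add() function of the controller should already watch PodSet resources directly, and watch pods through controller-runtime's EnqueueRequestForOwner handler, so that an event on a pod triggers a reconcile of the PodSet that controls it. The pure-Go sketch below models that mapping with simplified, hypothetical types (OwnerReference, Pod, Request); the real handler works on metav1.OwnerReference and also checks the owner's API group.

```go
package main

import "fmt"

// OwnerReference is a simplified, hypothetical stand-in for the
// metav1.OwnerReference that SetControllerReference adds to a pod.
type OwnerReference struct {
	Kind       string
	Name       string
	Controller bool
}

// Pod is a hypothetical, minimal pod model.
type Pod struct {
	Name      string
	Namespace string
	Owners    []OwnerReference
}

// Request identifies the primary resource to reconcile.
type Request struct {
	Namespace string
	Name      string
}

// requestForOwner maps an event on a secondary resource (a pod) to a
// reconcile request for its controlling owner of the given kind.
func requestForOwner(p Pod, ownerKind string) (Request, bool) {
	for _, ref := range p.Owners {
		if ref.Controller && ref.Kind == ownerKind {
			return Request{Namespace: p.Namespace, Name: ref.Name}, true
		}
	}
	return Request{}, false
}

func main() {
	owned := Pod{Name: "example-podset-podc2ckn", Namespace: "myproject",
		Owners: []OwnerReference{{Kind: "PodSet", Name: "example-podset", Controller: true}}}
	stray := Pod{Name: "other", Namespace: "myproject"}

	if req, ok := requestForOwner(owned, "PodSet"); ok {
		fmt.Println("reconcile", req.Namespace+"/"+req.Name) // reconcile myproject/example-podset
	}
	if _, ok := requestForOwner(stray, "PodSet"); !ok {
		fmt.Println("ignored", stray.Name) // ignored other
	}
}
```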

Lastly, we need to implement the logic of scaling up and down the pods and updating the custom resource status with the names of the pods. All of this happens in the Reconcile function of the controller.

During the reconcile, the controller fetches the PodSet resource in the current namespace and compares the value of its Replicas field with the actual number of pods that match a specific set of labels (here, app and version) to decide whether pods need to be created or deleted.
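Counting the pods that belong to a PodSet boils down to an equality-based label match: a pod matches when it carries every key/value pair of the selector, extra labels being allowed. A minimal sketch in plain Go, where the label values (such as version v0.1) are hypothetical; a real controller would typically let the Kubernetes API client do this filtering through a label selector rather than filter by hand.

```go
package main

import "fmt"

// matchesLabels reports whether labels contains every key/value pair
// of selector; additional labels on the pod are allowed.
func matchesLabels(labels, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// countMatching returns how many label sets match the selector.
func countMatching(pods []map[string]string, selector map[string]string) int {
	n := 0
	for _, labels := range pods {
		if matchesLabels(labels, selector) {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical labels: app is the PodSet name, version is illustrative.
	selector := map[string]string{"app": "example-podset", "version": "v0.1"}
	pods := []map[string]string{
		{"app": "example-podset", "version": "v0.1"},
		{"app": "example-podset", "version": "v0.1", "extra": "x"},
		{"app": "example-podset", "version": "v0.2"},
		{"app": "podset-operator"},
	}
	fmt.Println(countMatching(pods, selector)) // 2
}
```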

Instead of going into the details of the implementation of the controller’s Reconcile function (the code is available on GitHub), let’s focus on the key points to remember here:

The Reconcile function is invoked each time the PodSet resource is changed or a change happens in the pods belonging to the PodSet.

If pods need to be added or removed, the Reconcile function should add or remove only one pod at a time, return, and wait for the next invocation (it will be called again after the pod was created or deleted).

Make sure that the pods are “owned” by the PodSet primary resource using the controllerutil.SetControllerReference() function. With this ownership in place, when the PodSet resource is deleted, all its “child” pods are deleted as well.
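The one-pod-at-a-time rule above can be expressed as a small decision function. The sketch below is plain Go with hypothetical names (Action, nextAction) and is not the author's Reconcile code: each call returns at most one change, and the loop in main stands in for the successive invocations that would be triggered after each pod creation.

```go
package main

import "fmt"

// Action is the single change a reconcile pass decides to make.
type Action int

const (
	NoOp Action = iota
	CreatePod
	DeletePod
)

func (a Action) String() string {
	switch a {
	case CreatePod:
		return "create"
	case DeletePod:
		return "delete"
	default:
		return "noop"
	}
}

// nextAction compares the desired replica count with the number of
// matching pods and returns at most one change.
func nextAction(desired int32, current int) Action {
	switch {
	case current < int(desired):
		return CreatePod
	case current > int(desired):
		return DeletePod
	default:
		return NoOp
	}
}

func main() {
	// Simulate successive reconcile calls scaling from 1 to 3 pods.
	pods := 1
	for {
		a := nextAction(3, pods)
		fmt.Println(pods, a) // prints "1 create", "2 create", then "3 noop"
		if a == NoOp {
			break
		}
		if a == CreatePod {
			pods++
		} else {
			pods--
		}
	}
}
```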

Building and publishing the operator

Let’s use the Operator SDK to build the Docker image containing the controller, and let’s push it to an online registry. We’ll use Quay.io in this case, but other registries would work as well:

# build the Docker image using the Operator SDK
$ operator-sdk build quay.io/xcoulon/podset-operator
# login to Quay.io
$ docker login -u xcoulon quay.io
# push the image to Quay.io
$ docker push quay.io/xcoulon/podset-operator

Also, we need to update the operator.yaml manifest to use the new Docker image available on Quay.io:

# On Linux:
$ sed -i 's|REPLACE_IMAGE|quay.io/xcoulon/podset-operator|g' deploy/operator.yaml
# On OSX:
$ sed -i "" 's|REPLACE_IMAGE|quay.io/xcoulon/podset-operator|g' deploy/operator.yaml

Setting up Minishift

Note: the first batch of instructions to set up the VM and log in is specific to Minishift, the tool to run OpenShift locally. If you want to deploy the operator on a regular Kubernetes cluster instead, you can skip to the next section. In that case, just replace oc with kubectl in the commands to execute.

Install Minishift and start a VM with the admin-user addon enabled:

# create a new profile to test the operator
$ minishift profile set operator
# enable the admin-user add-on
$ minishift addon enable admin-user
# optionally, configure the VM
$ minishift config set cpus 4
$ minishift config set memory 8GB
$ minishift config set vm-driver virtualbox
# start the instance
$ minishift start
# login with the admin account
$ oc login -u system:admin

Deploying on Minishift

Prior to deploying the operator controller, we need to create the service account and assign it a role with the proper permissions to manage resources:

# Setup Service Account
$ oc create -f deploy/service_account.yaml
# Setup RBAC
$ oc create -f deploy/role.yaml
$ oc create -f deploy/role_binding.yaml

Once this is done, we can create the CRD and deploy the operator controller:

# Setup the CRD
$ oc create -f deploy/crds/app_v1alpha1_podset_crd.yaml
# Deploy the podset-operator
$ oc create -f deploy/operator.yaml

Once the CRD has been created, it becomes accessible not only via the cluster API endpoint, but also from the command line tools. As mentioned earlier, this is how Kubernetes shines as an extensible platform:

# check the CRD
$ oc get crd podsets.app.example.com
NAME
podsets.app.example.com

# check the operator controller
$ oc get pods
NAME                               READY   STATUS
podset-operator-685bbbc858-d4gf7   1/1     Running

# check if there's a CR using the CRD fullname...
$ oc get podsets.app.example.com
No resources found.

# ... or one of its aliases
$ oc get podsets
No resources found.

OK, so we have both our CRD and the operator controller in place. It’s finally time to create our PodSet resource configured with 3 replicas:

$ echo "apiVersion: app.example.com/v1alpha1
kind: PodSet
metadata:
  name: example-podset
spec:
  replicas: 3" | oc create -f -

And now, we can check the pods in the namespace:

$ oc get pods -l app=example-podset
NAME                      READY   STATUS
example-podset-podc2ckn   1/1     Running
example-podset-podjnqqr   1/1     Running
example-podset-podlx55r   1/1     Running

As we could expect from the operator controller, when we delete a pod, a new one is created, and when we scale up or down, the number of pods increases or decreases accordingly:

# let's delete a pod
$ oc delete pod example-podset-podlx55r
pod "example-podset-podlx55r" deleted

# let's check the pods again - a new pod was created to replace
# the one we just deleted
$ oc get pods -l app=example-podset
NAME                      READY   STATUS    RESTARTS   AGE
example-podset-pod85zf5   1/1     Running   0          46s
example-podset-podc2ckn   1/1     Running   0          8m
example-podset-podjnqqr   1/1     Running   0          8m

# let's scale down the pod set
$ echo "apiVersion: app.example.com/v1alpha1
kind: PodSet
metadata:
  name: example-podset
spec:
  replicas: 2" | oc apply -f -

# let's check the pods
$ oc get pods -l app=example-podset
NAME                      READY   STATUS        RESTARTS   AGE
example-podset-pod85zf5   1/1     Terminating   0          39m
example-podset-podc2ckn   1/1     Running       0          46m
example-podset-podjnqqr   1/1     Running       0          46m

# let's scale up the pod set
$ echo "apiVersion: app.example.com/v1alpha1
kind: PodSet
metadata:
  name: example-podset
spec:
  replicas: 4" | oc apply -f -

# let's check the pods again
$ oc get pods -l app=example-podset
NAME                      READY   STATUS    RESTARTS   AGE
example-podset-pod5hj4r   1/1     Running   0          8s
example-podset-podc2ckn   1/1     Running   0          49m
example-podset-podjnqqr   1/1     Running   0          49m
example-podset-podlf7xm   1/1     Running   0          8s

Also, the custom resource status shows the names of the pods:

$ oc describe podset/example-podset
Name:         example-podset
Namespace:    myproject
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"app.example.com/v1alpha1","kind":"PodSet","metadata":{"annotations":{},"name":"example-podset","namespace":"myproject"},"spec":{"replica...
API Version:  app.example.com/v1alpha1
Kind:         PodSet
Metadata:
  ...
Spec:
  Replicas:  4
Status:
  Pod Names:
    example-podset-pod5hj4r
    example-podset-podc2ckn
    example-podset-podjnqqr
    example-podset-podlf7xm
Events:  <none>

And finally, when we delete the PodSet resource, all associated (i.e., “owned”) pods are deleted as well:

# let's delete the PodSet resource
$ oc delete podset example-podset
podset.app.example.com "example-podset" deleted

# let's check the pods are deleted as well
$ oc get pods -l app=example-podset
NAME                      READY   STATUS
example-podset-pod5hj4r   1/1     Terminating
example-podset-podc2ckn   1/1     Terminating
example-podset-podjnqqr   1/1     Terminating
example-podset-podzdzrn   1/1     Terminating

Et voilà! We built and deployed our first Kubernetes Operator 🎉

For further information about the Operator SDK, check the GitHub repository and the overview on the CoreOS website.

The code for the operator developed in this article is on GitHub as well.