
tl;dr - outline of some approaches I’ve taken to storage on my small k8s cluster, why I can’t just use Rook (which is/was primarily Ceph underneath), and setup & evaluation of OpenEBS. OpenEBS is working great and is wonderfully simple – I’m probably going to be using it for everything from now on.

Discovering Rook (and as a result Ceph, which was Rook’s first underlying system) was a huge moment for me in figuring out how to do interesting things with Kubernetes. My “cluster” is super small (only the one node!), but I always wanted to get away from the hackiness of hostPath volumes, and use something that was a little more dynamic.

Using Rook with Ceph underneath meant that you needed to hand over an entire disk for Ceph to manage. I re-read both the Ceph and Rook documentation countless times because both seem to suggest you can just run “in a folder”, but I’m convinced that what they mean is a folder on a dedicated disk. There’s also the fundamental problem of constraining folder sizes in Linux to consider. Either way, giving an entire disk to Ceph to manage is doable – you can just let ansible do the heavy lifting of wiping the second drive and reformatting it. In my case things were a little more difficult because Hetzner’s dedicated servers come with software-RAID1’d disks (RAID1 = multiple copies of the same data). This meant I had to spend some time learning about software RAID on Linux in general and learning to disassemble it.

Relatively recently this setup bit me (I’ll get into the how/why later) and I reverted to the indignity that is managed hostPath volumes again (it’s not that bad, I just make sure to keep all data in a central location like /var/data, with a folder per project). Recovering the data is also really easy if you follow this path, because you can just SSH on to the machine (and if it doesn’t boot, go to recovery mode) and rsync the whole folder out (don’t forget rsync’s compression options!).

You might be wondering why I’d go for a solution like Rook/Ceph when I could just use any of the other awesome volume types that Kubernetes supports. Well, there are a few reasons:

- I don’t use a cloud provider (“baremetal” cluster)
- I prefer (basically only use) F/OSS solutions
- local volumes are awesome but don’t support dynamic provisioning as of now

And of course, tools like Rook/Ceph are what I want to get used to going forward because they do things like handle replication of data under the covers for you (so essentially RAIDx), and I’m preparing for the day I join the normal case of 99% of k8s operators who run more than one node and data starts flying everywhere. I have a weakness for good yet general solutions, so I’d rather run Rook/Ceph on one node and figure out how to run it on more later, than commit to a one-node solution that isn’t really manageable in a multi-node environment (given that I’ll be multi-node sooner rather than later).

I only recently heard of OpenEBS, through a random comment by u/dirtypete1981 on reddit – I had no idea it existed. The idea of Container Attached Storage is interesting, although “CAS” is already a terribly overloaded term in computer science. After reading up on the concepts (or watching the FOSDEM 2018 talk by Jeffrey Molanus), I was interested in trying it out – while I don’t know that a case can be made for the CAS approach being faster than traditional approaches, the flexibility is self-evident.

The CAS approach is kind of like Ceph turned inside out – the OSD/Mons and other internal stuff are exposed as part of your infrastructure instead of behind the Ceph curtain. Could OpenEBS be my solution to small-scale but general storage-for-my-workloads problems? (Spoilers: the answer is yes, which is why this blog post exists).

How I got here: borking my Rook setup

tl;dr - While prepping for a Kubernetes 1.12 upgrade, I carelessly updated the OS (and with it, grub2) as well.

I don’t know why I keep doing this to myself (it’s not the first time), but in the middle of getting ready to migrate to Kubernetes 1.12, I did an apt-get update && apt-get upgrade . Nothing better than adding one big upgrade while you perform another. While apt was doing its thing I noticed that grub2-install was trying to configure itself and asking me for input. I picked some settings that I thought were correct (and that the config tool LGTM’d), but it turns out that grub/grub2 basically doesn’t support proper installation on LVM/software RAID. That, or I’m just not smart enough to get it to work and I need more gray hairs in my beard – either way my setup was borked. Say goodbye to that sweet Kubernetes 1.12 upgrade.

Cue hours of downtime. I spent lots of time running around the internet frantically searching terms like “grub”, “raid1”, “mdadm”, and “grub-install”, trying to figure out how I could get grub to realize where it should be booting from.

I will spoil it for you now though: in the end I had to rescue my data and rebuild the server completely. The silver lining is that my ansible infrastructure code (you can find an unmaintained snapshot of it on gitlab) was able to go from a fresh 18.04 install to k8s 1.12 (might as well do the upgrade if I’m remaking the cluster) very quickly with no manual intervention. I did choose to remove the code that disassembled software RAID (you can approximate it by scaling your RAID cluster down to 1 disk, then doing stuff with the second one without going into Hetzner rescue mode) – I’m going to leave the drives RAID1’d on Hetzner boxes from now on.

Well let’s pretend that it was going to work – here are some helpful resources I found along the way:

The first link helped me mount disks properly in Hetzner’s rescue mode after observing that the machine wouldn’t boot. After requesting a live connection to my machine I saw that it was stuck at the Hetzner PXE boot screen (I really wish Hetzner let you bring your own PXE boot setup) – it tried the local disk but never succeeded. My first instinct was of course to try and get my data off the possibly-borked server.

After searching and trying many things, checking the drives for errors, wiping and re-building the raid partitions and messing with grub configuration, I gave up. As I said before, in the end I didn’t win this particular battle, but the silver lining is that I got to test my infrastructure code and it didn’t rot very much at all.

Now that I’m no longer trying to undo the RAID so I can give a full disk to Ceph, I started to wonder what my other options were. While I was knee deep in incomplete help threads and beginner-level instructions, I realized that another way I could have solved this problem was to create a loop-based virtual disk and give that to Ceph. So now here are the options I know of:

- hostPath / local volumes
- loop-based virtual disk
- OpenEBS (?)

By the title/flow of this article you probably know which one I’m going to investigate, but I do want to note that the virtual disk solution actually seems really promising for dynamic provisioning at a per-node level, because it seems nestable. Instead of trying to deal with size-constraining folders on disk, why not make one “big” virtual disk (let’s say 500GB), mount that, then partition it into smaller virtual disks? I get the feeling I could hack together an operator to do this and provide the ever-elusive “dynamic hostPath/local volumes” very very easily. Eventually I’ll find time to explore that idea, but that time isn’t this time.
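A rough sketch of the nesting idea, assuming plain files plus loop devices – the paths and sizes here are made up, and the losetup/mkfs/mount steps need root so they are only shown as comments:

```shell
# One "big" sparse virtual disk -- allocates no real space until written to:
truncate -s 500G /tmp/big-disk.img

# With root you could then attach and format it, e.g.:
#   losetup -f --show /tmp/big-disk.img   # prints the device, e.g. /dev/loop0
#   mkfs.ext4 /dev/loop0 && mount /dev/loop0 /mnt/virtual

# Smaller size-constrained "volumes" could then just be files inside that
# mount, each attached via its own loop device:
truncate -s 5G /tmp/small-pv.img

ls -lh /tmp/big-disk.img /tmp/small-pv.img
```

Because the files are sparse, the size constraint is enforced by the filesystem inside each loop device rather than by anything folder-level, which is exactly what plain directories can’t give you.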

Before choosing to go with OpenEBS I took a step back to evaluate why I want to solve this problem at all. At the end of the day I want:

- Dynamic PV provisioning (a way to use the second disk on my dedicated server without sitting there slicing up partitions/etc)
- Consistency/ease of application deployment (I can just use PersistentVolumes and PersistentVolumeClaims, no managing hostPaths)
- Replication & durability (less useful in my current case of one drive on one machine)

Evaluating OpenEBS

Step 0: RTFM

One of the first things I looked at was OpenEBS’s list of features, and they’re pretty great. Some highlights:

- Synchronous replication – similar to how Ceph does it, to ensure durability of writes
- Snapshots – one of the biggest questions I rarely see answered. Getting a PVC up and running is fine, but what happens when the application goes down or I need to migrate to another node?

Snapshots are a huge differentiator if true. Rook is still working on them according to its roadmap (scheduled for v0.9). I was also really impressed by the architecture docs – they’re pretty concise and informative. Reading through the docs, it looks like OpenEBS is going to offer me dynamically allocated drives & PV/PVCs without the static provisioning that you’d need for local volumes.
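Based on the snapshot CRDs the operator ships (the volumesnapshot.external-storage.k8s.io group also shows up in the RBAC rules and StorageClasses we’ll see later), my understanding is that requesting a snapshot would look roughly like this – treat the exact shape as an assumption on my part, and the names as illustrative:

```yaml
apiVersion: volumesnapshot.external-storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: demo-snapshot        # illustrative name
  namespace: default
spec:
  persistentVolumeClaimName: pvc-test-data   # the PVC to snapshot
```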

NOTE: I just realised that Rook 0.9 is out now, so they should have snapshots.

Obviously, there’s a lot of tech to read up on here if you’re new to the space. At this point I basically know enough about Kubernetes + Rook + Ceph to be dangerous, after reading documentation and setting up/fiddling. Here’s a loose list of things you may want to read up on/know about:

Skimming these resources is obviously enough – it would take months/years to become an actual expert, never mind gaining the actual in-the-trenches experience. Importantly, we need to keep the user-level goal in mind, which I can try to encapsulate with this statement:

When I start a pod, if there is space either in a local disk or some network attached storage I’ve purchased, I want a PVC to be automatically created for it, and I want to automatically have data replicated as the pod makes use of the filesystem

The idea is simple of course, but the devil is in the details, and there are deals (tradeoffs) to be made all over. Storage systems can be good for some use cases but bad for others – one system might be great for storing & replicating pictures, but bad for storing and replicating the writes a Write-Ahead Log (like the one Postgres, or your favorite database, keeps) might perform. I didn’t choose GlusterFS when I was first looking into distributed storage, mostly because of reports (that never seemed to get rebutted) that it was less than ideal for running databases on. What I’m looking for is a solution with decent general-case behavior. I’m not Google or a tech giant; I don’t run applications that write thousands of times a second, but I do want to enable easy operations.

OK enough exposition let’s get to installing OpenEBS.

Step 1: Installing OpenEBS

The OpenEBS documentation has a section on installing OpenEBS, as you’d expect, which we’re going to follow. We’ll use the default Jiva store, which seems to work with a local folder on the machine by default. I’m basing this understanding on the following quote:

OpenEBS can be used to create Storage Pool on a host disk or an externally mounted disk. This Storage Pool can be used to create Jiva volume which can be utilized to run applications. By default, Jiva volume will be deployed on host path. If you are using an external disk, see storage pool for more details about creating a storage pool with an external disk.

Hopefully they don’t mean this in the same way Rook/Ceph did, and I can just give OpenEBS a folder on-disk (which again is actually 2 disks software-RAIDed together) and OpenEBS will manage sizing and dynamic provisioning of data and expose it via iSCSI. Which brings me to one of the hard requirements of OpenEBS – you need open-iscsi installed, as noted in the prerequisites:

root@Ubuntu-1810-cosmic-64-minimal ~ # sudo apt-get install open-iscsi

As for the Kubernetes parts, they recommend that you install with Helm or by running kubectl on a monolithic YAML file like this:

kubectl apply -f https://openebs.github.io/charts/openebs-operator-0.8.0.yaml

As usual, I don’t ever do that; instead I pull down and split up the monolithic YAML file to get an idea of what’s running. Here’s what it looks like for me (I use the makeinfra pattern):

infra/kubernetes/cluster/storage/openebs/openebs.ns.yaml :

```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openebs
```

infra/kubernetes/cluster/storage/openebs/openebs.serviceaccount.yaml :

```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: openebs-maya-operator
  namespace: openebs
```

infra/kubernetes/cluster/storage/openebs/openebs.configmap.yaml :

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: openebs-ndm-config
  namespace: openebs
data:
  # udev-probe is default or primary probe which should be enabled to run ndm
  # filterconfigs contains configs of filters - in their form of include
  # and exclude comma separated strings
  node-disk-manager.config: |
    probeconfigs:
      - key: udev-probe
        name: udev probe
        state: true
      - key: smart-probe
        name: smart probe
        state: true
    filterconfigs:
      - key: os-disk-exclude-filter
        name: os disk exclude filter
        state: true
        exclude: "/,/etc/hosts,/boot"
      - key: vendor-filter
        name: vendor filter
        state: true
        include: ""
        exclude: "CLOUDBYT,OpenEBS"
      - key: path-filter
        name: path filter
        state: true
        include: ""
        exclude: "loop,/dev/fd0,/dev/sr0,/dev/ram,/dev/dm-,/dev/md"
```

infra/kubernetes/cluster/storage/openebs/openebs.rbac.yaml :

```yaml
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: openebs-maya-operator
rules:
  - apiGroups: ["*"]
    resources: ["nodes", "nodes/proxy"]
    verbs: ["*"]
  - apiGroups: ["*"]
    resources: ["namespaces", "services", "pods", "deployments", "events", "endpoints", "configmaps", "jobs"]
    verbs: ["*"]
  - apiGroups: ["*"]
    resources: ["storageclasses", "persistentvolumeclaims", "persistentvolumes"]
    verbs: ["*"]
  - apiGroups: ["volumesnapshot.external-storage.k8s.io"]
    resources: ["volumesnapshots", "volumesnapshotdatas"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get", "list", "create", "update", "delete"]
  - apiGroups: ["*"]
    resources: ["disks"]
    verbs: ["*"]
  - apiGroups: ["*"]
    resources: ["storagepoolclaims", "storagepools"]
    verbs: ["*"]
  - apiGroups: ["*"]
    resources: ["castemplates", "runtasks"]
    verbs: ["*"]
  - apiGroups: ["*"]
    resources: ["cstorpools", "cstorvolumereplicas", "cstorvolumes"]
    verbs: ["*"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: openebs-maya-operator
  namespace: openebs
subjects:
  - kind: ServiceAccount
    name: openebs-maya-operator
    namespace: openebs
  - kind: User
    name: system:serviceaccount:default:default
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: openebs-maya-operator
  apiGroup: rbac.authorization.k8s.io
```

infra/kubernetes/cluster/storage/openebs/openebs-api-server.deployment.yaml :

```yaml
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: maya-apiserver
  namespace: openebs
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: maya-apiserver
    spec:
      serviceAccountName: openebs-maya-operator
      containers:
        - name: maya-apiserver
          imagePullPolicy: IfNotPresent
          image: quay.io/openebs/m-apiserver:0.8.0
          ports:
            - containerPort: 5656
          env:
            # OPENEBS_IO_KUBE_CONFIG enables maya api service to connect to K8s
            # based on this config. This is ignored if empty.
            # This is supported for maya api server version 0.5.2 onwards
            #- name: OPENEBS_IO_KUBE_CONFIG
            #  value: "/home/ubuntu/.kube/config"
            # OPENEBS_IO_K8S_MASTER enables maya api service to connect to K8s
            # based on this address. This is ignored if empty.
            # This is supported for maya api server version 0.5.2 onwards
            #- name: OPENEBS_IO_K8S_MASTER
            #  value: "http://172.28.128.3:8080"
            # OPENEBS_IO_INSTALL_DEFAULT_CSTOR_SPARSE_POOL decides whether a default
            # cstor sparse pool should be configured as part of the openebs installation.
            # If "true" a default cstor sparse pool will be configured, if "false" it will not be.
            - name: OPENEBS_IO_INSTALL_DEFAULT_CSTOR_SPARSE_POOL
              value: "true"
            # OPENEBS_NAMESPACE provides the namespace of this deployment as an
            # environment variable
            - name: OPENEBS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # OPENEBS_SERVICE_ACCOUNT provides the service account of this pod as an
            # environment variable
            - name: OPENEBS_SERVICE_ACCOUNT
              valueFrom:
                fieldRef:
                  fieldPath: spec.serviceAccountName
            # OPENEBS_MAYA_POD_NAME provides the name of this pod as an
            # environment variable
            - name: OPENEBS_MAYA_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OPENEBS_IO_JIVA_CONTROLLER_IMAGE
              value: "quay.io/openebs/jiva:0.8.0"
            - name: OPENEBS_IO_JIVA_REPLICA_IMAGE
              value: "quay.io/openebs/jiva:0.8.0"
            - name: OPENEBS_IO_JIVA_REPLICA_COUNT
              value: "3"
            - name: OPENEBS_IO_CSTOR_TARGET_IMAGE
              value: "quay.io/openebs/cstor-istgt:0.8.0"
            - name: OPENEBS_IO_CSTOR_POOL_IMAGE
              value: "quay.io/openebs/cstor-pool:0.8.0"
            - name: OPENEBS_IO_CSTOR_POOL_MGMT_IMAGE
              value: "quay.io/openebs/cstor-pool-mgmt:0.8.0"
            - name: OPENEBS_IO_CSTOR_VOLUME_MGMT_IMAGE
              value: "quay.io/openebs/cstor-volume-mgmt:0.8.0"
            - name: OPENEBS_IO_VOLUME_MONITOR_IMAGE
              value: "quay.io/openebs/m-exporter:0.8.0"
            # OPENEBS_IO_ENABLE_ANALYTICS if set to true sends anonymous usage
            # events to Google Analytics
            - name: OPENEBS_IO_ENABLE_ANALYTICS
              value: "false"
            # OPENEBS_IO_ANALYTICS_PING_INTERVAL can be used to specify the duration (in hours)
            # for periodic ping events sent to Google Analytics. Default is 24 hours.
            #- name: OPENEBS_IO_ANALYTICS_PING_INTERVAL
            #  value: "24h"
          livenessProbe:
            exec:
              command:
                - /usr/local/bin/mayactl
                - version
            initialDelaySeconds: 30
            periodSeconds: 60
          readinessProbe:
            exec:
              command:
                - /usr/local/bin/mayactl
                - version
            initialDelaySeconds: 30
            periodSeconds: 60
```

infra/kubernetes/cluster/storage/openebs/openebs-provisioner.deployment.yaml :

```yaml
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: openebs-provisioner
  namespace: openebs
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: openebs-provisioner
    spec:
      serviceAccountName: openebs-maya-operator
      containers:
        - name: openebs-provisioner
          imagePullPolicy: IfNotPresent
          image: quay.io/openebs/openebs-k8s-provisioner:0.8.0
          env:
            # OPENEBS_IO_K8S_MASTER enables openebs provisioner to connect to K8s
            # based on this address. This is ignored if empty.
            # This is supported for openebs provisioner version 0.5.2 onwards
            #- name: OPENEBS_IO_K8S_MASTER
            #  value: "http://10.128.0.12:8080"
            # OPENEBS_IO_KUBE_CONFIG enables openebs provisioner to connect to K8s
            # based on this config. This is ignored if empty.
            # This is supported for openebs provisioner version 0.5.2 onwards
            #- name: OPENEBS_IO_KUBE_CONFIG
            #  value: "/home/ubuntu/.kube/config"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: OPENEBS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # OPENEBS_MAYA_SERVICE_NAME provides the maya-apiserver K8s service name,
            # that provisioner should forward the volume create/delete requests to.
            # If not present, "maya-apiserver-service" will be used for lookup.
            # This is supported for openebs provisioner version 0.5.3-RC1 onwards
            #- name: OPENEBS_MAYA_SERVICE_NAME
            #  value: "maya-apiserver-apiservice"
          livenessProbe:
            exec:
              command:
                - pgrep
                - ".*openebs"
            initialDelaySeconds: 30
            periodSeconds: 60
```

infra/kubernetes/cluster/storage/openebs/openebs-snapshot-operator.deployment.yaml :

```yaml
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: openebs-snapshot-operator
  namespace: openebs
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        name: openebs-snapshot-operator
    spec:
      serviceAccountName: openebs-maya-operator
      containers:
        - name: snapshot-controller
          image: quay.io/openebs/snapshot-controller:0.8.0
          imagePullPolicy: IfNotPresent
          env:
            - name: OPENEBS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          livenessProbe:
            exec:
              command:
                - pgrep
                - ".*controller"
            initialDelaySeconds: 30
            periodSeconds: 60
          # OPENEBS_MAYA_SERVICE_NAME provides the maya-apiserver K8s service name,
          # that snapshot controller should forward the snapshot create/delete requests to.
          # If not present, "maya-apiserver-service" will be used for lookup.
          # This is supported for openebs provisioner version 0.5.3-RC1 onwards
          #- name: OPENEBS_MAYA_SERVICE_NAME
          #  value: "maya-apiserver-apiservice"
        - name: snapshot-provisioner
          image: quay.io/openebs/snapshot-provisioner:0.8.0
          imagePullPolicy: IfNotPresent
          env:
            - name: OPENEBS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # OPENEBS_MAYA_SERVICE_NAME provides the maya-apiserver K8s service name,
            # that snapshot provisioner should forward the clone create/delete requests to.
            # If not present, "maya-apiserver-service" will be used for lookup.
            # This is supported for openebs provisioner version 0.5.3-RC1 onwards
            #- name: OPENEBS_MAYA_SERVICE_NAME
            #  value: "maya-apiserver-apiservice"
          livenessProbe:
            exec:
              command:
                - pgrep
                - ".*provisioner"
            initialDelaySeconds: 30
            periodSeconds: 60
```

infra/kubernetes/cluster/storage/openebs/openebs-disk-manager.ds.yaml :

```yaml
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: openebs-ndm
  namespace: openebs
spec:
  template:
    metadata:
      labels:
        name: openebs-ndm
    spec:
      # By default the node-disk-manager will be run on all kubernetes nodes
      # If you would like to limit this to only some nodes, say the nodes
      # that have storage attached, you could label those nodes and use
      # nodeSelector.
      #
      # e.g. label the storage nodes with - "openebs.io/nodegroup"="storage-node"
      # kubectl label node <node-name> "openebs.io/nodegroup"="storage-node"
      #nodeSelector:
      #  "openebs.io/nodegroup": "storage-node"
      serviceAccountName: openebs-maya-operator
      hostNetwork: true
      containers:
        - name: node-disk-manager
          command:
            - /usr/sbin/ndm
            - start
          image: quay.io/openebs/node-disk-manager-amd64:v0.2.0
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /host/node-disk-manager.config
              subPath: node-disk-manager.config
              readOnly: true
            - name: udev
              mountPath: /run/udev
            - name: procmount
              mountPath: /host/mounts
            - name: sparsepath
              mountPath: /var/openebs/sparse
          env:
            # pass hostname as env variable using downward API to the NDM container
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # specify the directory where the sparse files need to be created.
            # if not specified, then sparse files will not be created.
            - name: SPARSE_FILE_DIR
              value: "/var/openebs/sparse"
            # Size (bytes) of the sparse file to be created.
            - name: SPARSE_FILE_SIZE
              value: "10737418240"
            # Specify the number of sparse files to be created
            - name: SPARSE_FILE_COUNT
              value: "1"
          livenessProbe:
            exec:
              command:
                - pgrep
                - ".*ndm"
            initialDelaySeconds: 30
            periodSeconds: 60
      volumes:
        - name: config
          configMap:
            name: openebs-ndm-config
        - name: udev
          hostPath:
            path: /run/udev
            type: Directory
        # mount /proc/1/mounts (mount file of process 1 of host) inside container
        # to read which partition is mounted on / path
        - name: procmount
          hostPath:
            path: /proc/1/mounts
        - name: sparsepath
          hostPath:
            path: /var/openebs/sparse
```

infra/kubernetes/cluster/storage/openebs/openebs.svc.yaml :

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: maya-apiserver-service
  namespace: openebs
spec:
  ports:
    - name: api
      port: 5656
      protocol: TCP
      targetPort: 5656
  selector:
    name: maya-apiserver
  sessionAffinity: None
```

And a very basic Makefile to tie it all together:

infra/kubernetes/cluster/storage/openebs/Makefile :

```makefile
.PHONY: install uninstall

KUBECTL := kubectl

install: namespace serviceaccount rbac configmap api-server provisioner snapshot-operator node-disk-manager svc

namespace:
	$(KUBECTL) apply -f openebs.ns.yaml

serviceaccount:
	$(KUBECTL) apply -f openebs.serviceaccount.yaml

configmap:
	$(KUBECTL) apply -f openebs.configmap.yaml

rbac:
	$(KUBECTL) apply -f openebs.rbac.yaml

svc:
	$(KUBECTL) apply -f openebs.svc.yaml

api-server:
	$(KUBECTL) apply -f openebs-api-server.deployment.yaml

provisioner:
	$(KUBECTL) apply -f openebs-provisioner.deployment.yaml

snapshot-operator:
	$(KUBECTL) apply -f openebs-snapshot-operator.deployment.yaml

node-disk-manager:
	$(KUBECTL) apply -f openebs-disk-manager.ds.yaml

uninstall:
	$(KUBECTL) delete -f openebs.svc.yaml
	$(KUBECTL) delete -f openebs-disk-manager.ds.yaml
	$(KUBECTL) delete -f openebs-snapshot-operator.deployment.yaml
	$(KUBECTL) delete -f openebs-provisioner.deployment.yaml
	$(KUBECTL) delete -f openebs-api-server.deployment.yaml
	$(KUBECTL) delete -f openebs.configmap.yaml
	$(KUBECTL) delete -f openebs.rbac.yaml
	$(KUBECTL) delete -f openebs.serviceaccount.yaml
	$(KUBECTL) delete -f openebs.ns.yaml
```

OK, now that it’s installed let’s check if everything looks good:

```
$ make
kubectl apply -f openebs.ns.yaml
namespace/openebs created
kubectl apply -f openebs.serviceaccount.yaml
serviceaccount/openebs-maya-operator created
kubectl apply -f openebs.rbac.yaml
clusterrole.rbac.authorization.k8s.io/openebs-maya-operator created
clusterrolebinding.rbac.authorization.k8s.io/openebs-maya-operator created
kubectl apply -f openebs.configmap.yaml
configmap/openebs-ndm-config created
kubectl apply -f openebs-api-server.deployment.yaml
deployment.apps/maya-apiserver created
kubectl apply -f openebs-provisioner.deployment.yaml
deployment.apps/openebs-provisioner created
kubectl apply -f openebs-snapshot-operator.deployment.yaml
deployment.apps/openebs-snapshot-operator created
kubectl apply -f openebs-disk-manager.ds.yaml
daemonset.extensions/openebs-ndm created
kubectl apply -f openebs.svc.yaml
service/maya-apiserver-service created

$ # ... wait some time ...

$ kubectl get all -n openebs
NAME                                             READY   STATUS    RESTARTS   AGE
pod/cstor-sparse-pool-o9mk-7b585d7b8d-bgc4q      2/2     Running   0          2m33s
pod/maya-apiserver-78c59c89c-5h674               1/1     Running   0          3m16s
pod/openebs-ndm-29d9g                            1/1     Running   0          2m50s
pod/openebs-provisioner-77dd68645b-tv98t         1/1     Running   5          3m14s
pod/openebs-snapshot-operator-85dd4d7c94-hbbd8   2/2     Running   0          3m12s

NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/maya-apiserver-service   ClusterIP   10.110.168.61   <none>        5656/TCP   34s

NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/openebs-ndm   1         1         1       1            1           <none>          2m51s

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cstor-sparse-pool-o9mk      1/1     1            1           2m34s
deployment.apps/maya-apiserver              1/1     1            1           3m18s
deployment.apps/openebs-provisioner         1/1     1            1           3m15s
deployment.apps/openebs-snapshot-operator   1/1     1            1           3m13s

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/cstor-sparse-pool-o9mk-7b585d7b8d      1         1         1       2m34s
replicaset.apps/maya-apiserver-78c59c89c               1         1         1       3m17s
replicaset.apps/openebs-provisioner-77dd68645b         1         1         1       3m15s
replicaset.apps/openebs-snapshot-operator-85dd4d7c94   1         1         1       3m13s
```

Well that certainly looks good to me – no errors, and the node management daemon set is running without issue. Let’s try and test it out.

Step 2: Testing it out with a simple Pod + PVC

Now that we have the system theoretically in a working state, let’s make a Pod with a PersistentVolumeClaim to validate it. I want to note here that StatefulSets and PersistentVolumeClaims are separate concepts. I often see people mention them as if the only way to use a PersistentVolumeClaim is through a StatefulSet – but this has more to do with how the other options work (i.e. a Deployment): it’s perfectly possible to have a single Deployment use a PVC, you just can’t have more than one replica, because the second instance/replica would try to mount the same PV. StatefulSets offer more, like stable startup ordering and naming, and that’s what makes them well suited for less flexible stateful workloads.
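To make that distinction concrete, here’s a sketch of a single-replica Deployment using a PVC – all names are illustrative, and it assumes a ReadWriteOnce PVC called single-writer-data already exists:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: single-writer
spec:
  replicas: 1   # must stay at 1: a second replica would try to mount the same RWO PV
  selector:
    matchLabels:
      app: single-writer
  template:
    metadata:
      labels:
        app: single-writer
    spec:
      containers:
        - name: app
          image: alpine
          command: ["ash", "-c", "while true; do sleep 60s; done"]
          volumeMounts:
            - mountPath: /var/data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: single-writer-data   # assumed to exist
```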

The Makefile is a little disingenuous because of how the operator works: a bunch of Custom Resource Definitions (CRDs) also got installed, as well as things like StorageClasses. Since we’ll need to know the pool to be able to make our PersistentVolumeClaim, let’s list them:

```
$ kubectl get sc
NAME                        PROVISIONER                                                 AGE
openebs-cstor-sparse        openebs.io/provisioner-iscsi                                124m
openebs-jiva-default        openebs.io/provisioner-iscsi                                125m
openebs-snapshot-promoter   volumesnapshot.external-storage.k8s.io/snapshot-promoter    124m
```

Let’s use openebs-jiva-default – now we can write our resource definitions for our PersistentVolumeClaim and Pod:

openebs-test.allinone.yaml :

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test-data
  namespace: default
  labels:
    app: pvc-test
spec:
  storageClassName: openebs-jiva-default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
  namespace: default
  labels:
    app: pvc-test
spec:
  containers:
    - name: pvc-test
      image: alpine
      command: ["ash", "-c", "while true; do sleep 60s; done"]
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: 0.25
          memory: "256Mi"
        limits:
          cpu: 0.50
          memory: "512Mi"
      volumeMounts:
        - mountPath: /var/data
          name: data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-test-data
```

Shortly after kubectl apply -f ing that file:

```
$ kubectl get pods
NAME                                                             READY   STATUS              RESTARTS   AGE
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84   2/2     Running             0          25s
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-2fr6d    0/1     Pending             0          25s
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-j7swl    1/1     Running             0          25s
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-sndh2    0/1     Pending             0          25s
pvc-test                                                         0/1     ContainerCreating   0          12s
```

OK, so here we see the CAS concept taking off – there are a bunch of Pods being started that manage the data being shuffled around; if you look closely you can see -ctrl- and -rep- in the pod names. I assume that’s 3 data-holding replicas + 1 controller for the one PVC. I did nothing to tell OpenEBS I only have one node, so it’s running in the usual HA pattern.

After waiting a bit for some of the Pending containers to come out of pending and the pvc-test pod to get created I realized there was something wrong. A quick kubectl describe pod pvc-test reveals the problem:

```
Events:
  Type     Reason                  Age                  From                                    Message
  ----     ------                  ----                 ----                                    -------
  Normal   Scheduled               3m3s                 default-scheduler                       Successfully assigned default/pvc-test to ubuntu-1810-cosmic-64-minimal
  Normal   SuccessfulAttachVolume  3m3s                 attachdetach-controller                 AttachVolume.Attach succeeded for volume "pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e"
  Warning  FailedMount             63s (x8 over 2m42s)  kubelet, ubuntu-1810-cosmic-64-minimal  MountVolume.WaitForAttach failed for volume "pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e" : failed to get any path for iscsi disk, last err seen:
    iscsi: failed to sendtargets to portal 10.108.192.121:3260 output: iscsiadm: cannot make connection to 10.108.192.121: Connection refused
    iscsiadm: cannot make connection to 10.108.192.121: Connection refused
    iscsiadm: cannot make connection to 10.108.192.121: Connection refused
    iscsiadm: cannot make connection to 10.108.192.121: Connection refused
    iscsiadm: cannot make connection to 10.108.192.121: Connection refused
    iscsiadm: cannot make connection to 10.108.192.121: Connection refused
    iscsiadm: connection login retries (reopen_max) 5 exceeded
    iscsiadm: No portals found, err exit status 21
  Warning  FailedMount             60s                  kubelet, ubuntu-1810-cosmic-64-minimal  Unable to mount volumes for pod "pvc-test_default(6ead8128-10b4-11e9-9cf0-8c89a517d15e)": timeout expired waiting for volumes to attach or mount for pod "default"/"pvc-test". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-lsfvf]
```

Well, this is par for the course – things very rarely work the first time – so let’s get in and solve the issues. Before we go on though, let’s check what those other pods are doing:

```
Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  3m16s (x37 over 8m19s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules.
```

OK, so the pod couldn’t be scheduled because its anti-affinity requirements couldn’t be met. This isn’t a problem but is actually expected behavior – OpenEBS runs 3 replicas for fault tolerance, and I’m going in the face of this since I’m only using one node. I love tools that I can predict/reason/guess about armed with only documentation knowledge – in this case it was just a guess, but this is a great sign. Rather than re-configure OpenEBS to create fewer replicas right now, I’m going to just ignore the 2 pending containers and focus on the connection issues.
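If I did want to fix this properly rather than ignore it, my reading of the OpenEBS docs is that replica count is a per-StorageClass policy set through the cas.openebs.io/config annotation – something like the following sketch (untested on my part, class name illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-jiva-single-replica   # illustrative name
  annotations:
    openebs.io/cas-type: jiva
    cas.openebs.io/config: |
      - name: ReplicaCount
        value: "1"   # one replica = no pending anti-affinity pods on a single node
provisioner: openebs.io/provisioner-iscsi
```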

DEBUG: Connection refused to 10.108.192.121

Since we're having connectivity issues, let's make sure I don't have any NetworkPolicy set (I use and love kube-router in my cluster) that's preventing the communication:

```
$ kubectl get networkpolicy
No resources found.
```

OK, all’s clear on that front, let’s figure out what is behind 10.108.192.121 that my pod is trying to talk to:

```
$ kubectl get pods -o=wide
NAME                                                             READY   STATUS              RESTARTS   AGE   IP             NODE                            NOMINATED NODE   READINESS GATES
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84   2/2     Running             0          13m   10.244.0.137   ubuntu-1810-cosmic-64-minimal   <none>           <none>
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-2fr6d    0/1     Pending             0          13m   <none>         <none>                          <none>           <none>
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-j7swl    1/1     Running             0          13m   10.244.0.136   ubuntu-1810-cosmic-64-minimal   <none>           <none>
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-sndh2    0/1     Pending             0          13m   <none>         <none>                          <none>           <none>
pvc-test                                                         0/1     ContainerCreating   0          12m   <none>         ubuntu-1810-cosmic-64-minimal   <none>           <none>
```

The wider output ( -o=wide ) shows us the IPs of the running pods (again, the Pending pods are fine, since we're in a very not-HA situation) – and that the IP we're trying to connect to isn't any one of these pods. But if you stop and think about it, of course it isn't – Pod IPs can shift, and if you want a reliable pointer to another pod, what you need is a Service . Let's check the IPs of our services:

```
$ kubectl get svc -o=wide
NAME                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
kubernetes                                          ClusterIP   10.96.0.1        <none>        443/TCP                      26d   <none>
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-svc   ClusterIP   10.108.192.121   <none>        3260/TCP,9501/TCP,9500/TCP   15m   openebs.io/controller=jiva-controller,openebs.io/persistent-volume=pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e
```

BINGO! As you might expect with the CAS model, we have a Service exposing the hard drive interface that our Pod will use, making it accessible. Now we need to figure out why our Pod can't seem to talk to this Service. Let's dig deeper into the Service and make sure it has Endpoint s attached:

```
$ kubectl describe svc pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-svc
Name:              pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-svc
Namespace:         default
Labels:            openebs.io/cas-template-name=jiva-volume-create-default-0.8.0
                   openebs.io/cas-type=jiva
                   openebs.io/controller-service=jiva-controller-svc
                   openebs.io/persistent-volume=pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e
                   openebs.io/persistent-volume-claim=pvc-test-data
                   openebs.io/storage-engine-type=jiva
                   openebs.io/version=0.8.0
                   pvc=pvc-test-data
Annotations:       <none>
Selector:          openebs.io/controller=jiva-controller,openebs.io/persistent-volume=pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e
Type:              ClusterIP
IP:                10.108.192.121
Port:              iscsi  3260/TCP
TargetPort:        3260/TCP
Endpoints:         10.244.0.137:3260
Port:              api  9501/TCP
TargetPort:        9501/TCP
Endpoints:         10.244.0.137:9501
Port:              exporter  9500/TCP
TargetPort:        9500/TCP
Endpoints:         10.244.0.137:9500
Session Affinity:  None
Events:            <none>
```

All of this looks fine and dandy to me – in particular, there are endpoints for the pods that did start up. Everything looks fine as far as Kubernetes concepts go, so let’s look back at the error message for some hints:

```
Warning  FailedMount  63s (x8 over 2m42s)  kubelet, ubuntu-1810-cosmic-64-minimal  MountVolume.WaitForAttach failed for volume "pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to sendtargets to portal 10.108.192.121:3260 output: iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: connection login retries (reopen_max) 5 exceeded
iscsiadm: No portals found
```

So it looks like the iscsi subsystem tried to connect to the Kubernetes Service @ 10.108.192.121:3260 (which routes to the Endpoint for the -ctrl- pod @ 10.244.0.137 ). Let's see what's happening in the Pod with that IP address – we see it's Running , but how are things actually going?

```
$ kubectl logs pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84
Error from server (BadRequest): a container name must be specified for pod pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84, choose one of: [pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-con maya-volume-exporter]
```

OK, so I need to pick one of the internal containers, how about pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-con :

```
$ kubectl logs pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84 -c pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-con
time="2019-01-05T06:37:57Z" level=info msg="REPLICATION_FACTOR: 3"
time="2019-01-05T06:37:57Z" level=info msg="Starting controller with frontendIP: , and clusterIP: 10.108.192.121"
time="2019-01-05T06:37:57Z" level=info msg="resetting controller"
time="2019-01-05T06:37:57Z" level=info msg="Listening on :9501"
time="2019-01-05T06:38:11Z" level=info msg="List Replicas"
time="2019-01-05T06:38:11Z" level=info msg="List Replicas"
time="2019-01-05T06:38:11Z" level=info msg="Register Replica for address 10.244.0.136"
time="2019-01-05T06:38:11Z" level=info msg="Register Replica, Address: 10.244.0.136 Uptime: 15.399307176s State: closed Type: Backend RevisionCount: 0"
time="2019-01-05T06:38:11Z" level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"
10.244.0.136 - - [05/Jan/2019:06:38:11 +0000] "POST /v1/register HTTP/1.1" 200 0
time="2019-01-05T06:38:16Z" level=info msg="Register Replica for address 10.244.0.136"
time="2019-01-05T06:38:16Z" level=info msg="Register Replica, Address: 10.244.0.136 Uptime: 20.396328606s State: closed Type: Backend RevisionCount: 0"
10.244.0.136 - - [05/Jan/2019:06:38:16 +0000] "POST /v1/register HTTP/1.1" 200 0
time="2019-01-05T06:38:16Z" level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"
time="2019-01-05T06:38:21Z" level=info msg="Register Replica for address 10.244.0.136"
<the last ~3 lines loop forever>
```

OK, this was actually the hypothesis I was starting to form in my head – in particular, the fact that I haven't told OpenEBS how many replicas it'd actually be able to create (which is my fault, since I only have one node, not 3) might be causing some issues. It's only a warning, but the repeating nature suggests that registration isn't completing because of this mismatch. Since this isn't quite a smoking gun, let's check the other container's logs:

```
$ kubectl logs pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84 -c maya-volume-exporter
I0105 06:38:01.357964       1 command.go:97] Starting maya-exporter ...
I0105 06:38:01.358045       1 logs.go:43] Initialising maya-exporter for the jiva
I0105 06:38:01.358175       1 exporter.go:39] Starting http server....
```

Well, absolutely no visible problems there… so let's go ahead and reduce the replication factor that OpenEBS is using and see if that fixes things. It took a little digging after re-reading the docs on deploying Jiva, but the StorageClass we're using for the PersistentVolumeClaim is where we can make this change. Let's make a new one based on the existing default:

```
$ kubectl get sc openebs-jiva-default -o=yaml > openebs-jiva-non-ha.storageclass.yaml
$ emacs -nw openebs-jiva-non-ha.storageclass.yaml
.... make edits ...
```

And here’s what I ended up with:

```yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-jiva-non-ha
  annotations:
    cas.openebs.io/config: |
      - name: ReplicaCount
        value: "1"
      - name: StoragePool
        value: default
      #- name: TargetResourceLimits
      #  value: |-
      #      memory: 1Gi
      #      cpu: 100m
      #- name: AuxResourceLimits
      #  value: |-
      #      memory: 0.5Gi
      #      cpu: 50m
      #- name: ReplicaResourceLimits
      #  value: |-
      #      memory: 2Gi
    openebs.io/cas-type: jiva
provisioner: openebs.io/provisioner-iscsi
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

Those limits definitely seem like a good idea, but I'm ignoring them for now (the default is the same way). After kubectl apply ing this StorageClass , and updating our PVC to use the changed storageClassName , we can delete everything ( kubectl delete -f openebs-test.allinone.yaml ), update our Makefile and re-make everything. After we do:
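For completeness, here's roughly what the updated PersistentVolumeClaim looks like pointed at the new class. The claim name ( pvc-test-data ) comes from my test manifest; the access mode and 1Gi size are assumptions for the sketch – use whatever your workload actually needs:

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test-data
  namespace: default
spec:
  storageClassName: openebs-jiva-non-ha   # the new non-HA class from above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi   # assumption -- pick a size that fits your use case
```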

```
$ kubectl apply -f openebs-test.allinone.yaml
persistentvolumeclaim/pvc-test-data created
pod/pvc-test created

... after waiting a few seconds ...

$ kubectl get pods
NAME                                                             READY   STATUS              RESTARTS   AGE
pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e-ctrl-5b5d84cd8f-v5zcn   2/2     Running             0          37s
pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e-rep-84897dfc97-t59bb    1/1     Running             0          37s
pvc-test                                                         0/1     ContainerCreating   0          37s
sjr-pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-tcj7-6d484          0/1     Completed           0          5m18s
```

Great – no more pending -rep- pods, and there's one sjr-pvc pod that I've never seen before, but it seems that pod gets left behind after cleanup happens. More important is making sure pvc-test makes it out of the ContainerCreating state; let's inspect it:

```
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Warning  FailedScheduling        2m14s (x3 over 2m14s)  default-scheduler        pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled               2m14s                  default-scheduler        Successfully assigned default/pvc-test to ubuntu-1810-cosmic-64-minimal
  Normal   SuccessfulAttachVolume  2m14s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e"
  Warning  FailedCreatePodSandBox  8s (x9 over 116s)      kubelet, ubuntu-1810-cosmic-64-minimal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: OCI runtime create failed: container_linux.go:265: starting container process caused "process_linux.go:348: container init caused \"read init-p: connection reset by peer\"": unknown
```

Well, good news and bad news: the Pod was able to attach its volume, but it looks like containerd is having some issues… which may have nothing to do with OpenEBS. Let's take a detour.

Bonus Round: Impromptu debugging of PodSandBox creation issues

containerd 's systemd status says it's running fine, so let's try and start a pod without a PVC:

```
Normal   Scheduled               16s               default-scheduler  Successfully assigned default/no-pvc-test to ubuntu-1810-cosmic-64-minimal
Warning  FailedCreatePodSandBox  3s (x2 over 15s)  kubelet, ubuntu-1810-cosmic-64-minimal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: OCI runtime create failed: container_linux.go:265: starting container process caused "process_linux.go:348: container init caused \"read init-p: connection reset by peer\"": unknown
```

Alright, it looks like something is just wrong with containerd . That's good news, because it means OpenEBS is ostensibly working, but bad news because it's a bit of a chink in the armor – not being able to create new Pods would definitely not be ideal in a more serious production environment. To avoid a full machine reboot, let's take a look at the kubelet logs:

```
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: E0105 08:28:09.535560    1586 kuberuntime_sandbox.go:65] CreatePodSandbox for pod "no-pvc-test_default(3f8d2a77-10bb-11e9-9cf0-8c89a517d15e)" failed: rpc error: code = Unknown desc = failed to start sandbox contai
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: E0105 08:28:09.535587    1586 kuberuntime_manager.go:662] createPodSandbox for pod "no-pvc-test_default(3f8d2a77-10bb-11e9-9cf0-8c89a517d15e)" failed: rpc error: code = Unknown desc = failed to start sandbox conta
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: E0105 08:28:09.535659    1586 pod_workers.go:190] Error syncing pod 3f8d2a77-10bb-11e9-9cf0-8c89a517d15e ("no-pvc-test_default(3f8d2a77-10bb-11e9-9cf0-8c89a517d15e)"), skipping: failed to "CreatePodSandbox" for "n
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: W0105 08:28:09.708588    1586 manager.go:1195] Failed to process watch event {EventType:0 Name:/kubepods/burstable/pod13bea637-10ba-11e9-9cf0-8c89a517d15e/83cce6ed723480f83227706c155fc6f6ead206c4587b64e4c5084416bb
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: W0105 08:28:09.709232    1586 container.go:409] Failed to create summary reader for "/kubepods/burstable/pod3f8d2a77-10bb-11e9-9cf0-8c89a517d15e/55d52719eb084edcbb77f64167d9de7cce6e25f54990364d5fe7e1c8819d437d": n
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: E0105 08:28:09.753328    1586 dns.go:132] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 213.133.98.98 213.133.99.99 213.133.100.100
```

This is only a partial view, but you can see things aren't going well. Unfortunately, neither kubelet nor containerd got back into a good state after restarting them, so I'm just going to restart the box :(. I believe this has happened before, and the quickest fix was to just restart everything – this time I will absolutely not try to upgrade the entire system.

Well, I did all that only to realize that the issue was more nuanced – the resources I set in the pod specification were bad. With some binary-search commenting and uncommenting, I realized my memory specification was wrong. Here's the no-pvc-test Pod after the fix:

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: no-pvc-test
  namespace: default
  labels:
    app: no-pvc-test
spec:
  containers:
    - name: no-pvc-test
      image: alpine
      command: ["ash", "-c", "while true; do sleep 60s; done"]
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: 0.25
          memory: "512Mi"
        limits:
          cpu: 0.50
          memory: "512Mi"
```

Whoops! Looks like a classic case of user error.

Finally putting it all together

I went back and fixed the other resources and everything worked out just fine – all the pods are running (with the PVC):

```
$ kubectl get pods
NAME                                                             READY   STATUS      RESTARTS   AGE
pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e-ctrl-5b5d84cd8f-v5zcn   2/2     Running     2          38m
pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e-rep-84897dfc97-t59bb    1/1     Running     1          38m
pvc-test                                                         1/1     Running     0          114s
sjr-pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-tcj7-6d484          0/1     Completed   0          43m
sjr-pvc-af0c599e-10b9-11e9-9cf0-8c89a517d15e-hdu8-vjf6v          0/1     Completed   0          39m
```

Let’s try and kubectl exec our way in to try and write some data:

```
$ kubectl exec -it pvc-test ash
/ # ls /var/data
lost+found
/ # echo "HELLO WORLD" > /var/data/hello-world.txt
/ # ls /var/data
hello-world.txt  lost+found
```

Now, let's delete only the pod (careful not to delete the PVC – we have reclaimPolicy set to Delete , though it probably wouldn't delete fast enough to matter). After we delete the pod, we should be able to restart it and have it pick up the same volume:
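As an aside, if the Delete reclaim policy makes you nervous, the StorageClass above could instead retain the PersistentVolume after the claim goes away. This is a sketch using the standard Kubernetes reclaimPolicy field – the class name here is hypothetical, and the rest mirrors the non-HA class from earlier:

```yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-jiva-non-ha-retain   # hypothetical name for the safer variant
  annotations:
    cas.openebs.io/config: |
      - name: ReplicaCount
        value: "1"
      - name: StoragePool
        value: default
    openebs.io/cas-type: jiva
provisioner: openebs.io/provisioner-iscsi
reclaimPolicy: Retain     # keep the PV (and its data) after the PVC is deleted
volumeBindingMode: Immediate
```

With Retain , a released PV sticks around until an admin cleans it up manually, which trades convenience for an extra safety net on accidental PVC deletion.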

```
$ kubectl delete pod pvc-test
pod "pvc-test" deleted
$ kubectl apply -f openebs-test.allinone.yaml
persistentvolumeclaim/pvc-test-data unchanged
pod/pvc-test created
$ kubectl exec -it pvc-test ash
/ # ls /var/data
hello-world.txt  lost+found
/ # cat /var/data/hello-world.txt
HELLO WORLD
```

We did it! We've got awesome persistent volumes working with OpenEBS and a great non-HA (but could easily go HA) setup. We're standing on the shoulders of many giants, and things definitely look pretty good from up here!

If we look at the on-disk representation, we can check out the files in /var/openebs :

```
root@Ubuntu-1810-cosmic-64-minimal ~ # tree /var/openebs/
/var/openebs/
├── pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e
│   ├── revision.counter
│   ├── volume-head-000.img
│   ├── volume-head-000.img.meta
│   └── volume.meta
├── pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e
│   └── scrubbed.txt
├── pvc-af0c599e-10b9-11e9-9cf0-8c89a517d15e
│   └── scrubbed.txt
├── shared-cstor-sparse-pool
│   ├── cstor-sparse-pool.cache
│   ├── uzfs.sock
│   └── zrepl.lock
└── sparse
    └── 0-ndm-sparse.img

5 directories, 10 files
```

Looks like OpenEBS is basically doing that "loop-based disk image maintenance" idea I had, or something similar (and I'm sure far more robustly) – this might just be the best solution I've come across so far for storage with Kubernetes. Let's check out what some of these files are:
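The names like 0-ndm-sparse.img hint at why this works on a small disk: these are (at least partly) sparse files, where the recorded size and the blocks actually occupied on disk are two separate numbers. A quick local demonstration of the idea – nothing OpenEBS-specific, just coreutils:

```shell
# Create a 1 GiB sparse file: the size is recorded in metadata,
# but no data blocks are actually allocated yet.
truncate -s 1G demo.img

# Apparent size: the full 1 GiB the file claims to be.
du -h --apparent-size demo.img

# Actual disk usage: (nearly) nothing, until real data is written into it.
du -h demo.img

rm demo.img
```

This is why a directory full of volume images can be much "bigger" than the disk it lives on – space is only consumed as data is actually written.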

```
/var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e:                          directory
/var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e/volume-head-000.img:      Linux rev 1.0 ext4 filesystem data, UUID=4820dfb0-7574-47b4-91b0-39a31580fbf2 (needs journal recovery) (extents) (64bit) (large files) (huge files)
/var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e/volume-head-000.img.meta: ASCII text
/var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e/volume.meta:              ASCII text
/var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e/revision.counter:         ASCII text, with no line terminators
```

Pretty awesome, straightforward, and predictable stuff!

Wrapup

Thus concludes our whirlwind tour through setting up OpenEBS. As you can see the work was pretty light on our side, things just worked, and that’s thanks to a lot of hard work from the team behind OpenEBS and committers to the project (and all the other giants we’re standing on).

Going forward, it looks like I'm going to be using OpenEBS over Rook for my bare metal clusters (on Hetzner, at least) – it was/is a blast trying to keep up with this area and watch how it evolves over time.