Like many techies out there, I’ve accumulated various Raspberry Pi-like development boards over the years. And also like many techies, most of them have been sitting in a “tech I’ll use someday” drawer.

Well that someday finally came for me :)

I had a few weeks off from work over the winter holidays, which gave me plenty of time to take stock of all the hardware I had, and what I could do with it. This included:

A five disk RAID enclosure exposed over USB3

Raspberry Pi model B (OG model)

CubbieBoard 1

Banana Pi M1

HP Netbook (2012??)

Out of those 5 pieces of hardware, I was only using the RAID and the Netbook, together acting as a subpar NAS. Since the Netbook didn’t support USB3, I wasn’t getting the full speed potential out of my RAID.

Life Goals!

That RAID was being done a disservice by the netbook, so I set some goals for a better setup:

A NAS with USB3 and Gigabit ethernet

A better way to manage the software on the device

(bonus) Ability to stream some media off the RAID to my Fire TV

Since none of the devices I had supported both USB3 and Gigabit ethernet, I sadly had to do some shopping.

The board I landed on was the ROC-RK3328-CC. It had all the specs I wanted, and had decent enough OS support.

With the hardware specs addressed (and waiting for it to arrive), I turned my attention to the 2nd goal.

Managing the software on the device

Part of what I felt made my dev-board projects fail in the past was my lack of reproducibility and documentation. Whenever I got everything set up the way I needed it, I never wrote down the steps I took or the blog posts I followed. And when something eventually went wrong months or years later, I had no idea what I had originally done when attempting to fix the issue.

So I said to myself “this time it will be different!”

I turned to a beast I know quite well, Kubernetes!

While K8s is a pretty heavy-handed solution to a pretty simple problem, after almost three years of managing clusters at $dayjob using various solutions (home grown, kops, etc.), it’s also something I’m deeply familiar with.

Plus, deploying it outside of a cloud environment, and on ARM devices for that matter, seemed like an interesting challenge.

I also figured that since my existing hardware didn’t have the specs I needed for the NAS, I could at least cluster them, and maybe some of the software that didn’t need the higher specs could run on my older devices.

Kubernetes on ARM

Since I hadn’t had a chance at work to try using the kubeadm tool for provisioning clusters, I figured now was a perfect time to take it for a test drive.

For my OS I decided on Armbian, as it had the most support across all the boards I had.

I found a good blog post for setting up Kubernetes on a Raspberry Pi using HypriotOS. Since I wasn’t too confident in HypriotOS being available for all my boards, I adapted the instructions to Debian/Armbian.

Prerequisites

Before starting I needed to install:

Docker

kubelet

kubeadm

kubectl

Docker needed to be installed using their convenience script (as noted if running Raspbian).

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

And then I installed the Kubernetes components based on the instructions from the Hypriot blog, but adapted to lock all my dependencies to specific versions.

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet=1.13.1-00 kubectl=1.13.1-00 kubeadm=1.13.1-00

Raspberry Pi B

I hit my first hiccup when trying to bootstrap a cluster on my original Raspberry Pi B.

$ kubeadm init
Illegal instruction

Turns out that Kubernetes dropped support for ARMv6. Oh well, that left the CubbieBoard and the Banana Pi.
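A quick pre-flight check before burning time on a board is to look at `uname -m`. This is just a sketch of that check; `check_arch` is a hypothetical helper name, and “supported” here only means it clears the ARMv6 cutoff that bit me, not any other requirement.

```shell
#!/bin/sh
# Modern Kubernetes binaries are built for ARMv7 and up, so ARMv6 boards
# (like the original Raspberry Pi B) die with "Illegal instruction".
check_arch() {
  case "$1" in
    armv6*) echo "unsupported" ;;  # e.g. armv6l on the OG Pi B
    *)      echo "supported"  ;;   # armv7l, aarch64, x86_64, ...
  esac
}

check_arch "$(uname -m)"
```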

Banana Pi

The same process on the Banana Pi initially seemed to have much more success, but the kubeadm init command eventually timed out waiting for the control plane to become healthy.

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

Checking what the containers were doing with docker ps, I saw that the kube-controller-manager and kube-scheduler had both been up for at least 4-5 minutes, but the kube-apiserver was only about a minute or two old.

$ docker ps
CONTAINER ID   COMMAND                  CREATED              STATUS
de22427ad594   "kube-apiserver --au…"   About a minute ago   Up About a minute
dc2b70dd803e   "kube-scheduler --ad…"   5 minutes ago        Up 5 minutes
60b6cc418a66   "kube-controller-man…"   5 minutes ago        Up 5 minutes
1e1362a9787c   "etcd --advertise-cl…"   5 minutes ago        Up 5 minutes

Obviously the api-server was dying, or an external process was killing and restarting it.

Checking the logs I saw some pretty standard looking startup procedures, a log that it had started listening on the secure port, and then a long pause before lots of TLS handshake errors.

20:06:48.604881 naming_controller.go:284] Starting NamingConditionController
20:06:48.605031 establishing_controller.go:73] Starting EstablishingController
20:06:50.791098 log.go:172] http: TLS handshake error from 192.168.1.155:50280: EOF
20:06:51.797710 log.go:172] http: TLS handshake error from 192.168.1.155:50286: EOF
20:06:51.971690 log.go:172] http: TLS handshake error from 192.168.1.155:50288: EOF
20:06:51.990556 log.go:172] http: TLS handshake error from 192.168.1.155:50284: EOF
20:06:52.374947 log.go:172] http: TLS handshake error from 192.168.1.155:50486: EOF
20:06:52.612617 log.go:172] http: TLS handshake error from 192.168.1.155:50298: EOF
20:06:52.748668 log.go:172] http: TLS handshake error from 192.168.1.155:50290: EOF

And then the server would shut down shortly after. Some more Googling brought me to this issue, which seemed to indicate this was possibly caused by slow crypto on some ARM devices.

I took a leap and figured that maybe the api-server was being overwhelmed by the repeated retries of the scheduler and controller-manager.

Moving those files out of the manifests directory would tell the kubelet to terminate those pods.

mkdir /etc/kubernetes/manifests.bak
mv /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/manifests.bak/
mv /etc/kubernetes/manifests/kube-controller-manager.yaml /etc/kubernetes/manifests.bak/

Tailing the logs of the api-server, I saw it get further than before, but it was still dying around the 2 minute mark. Then I remembered: the manifest probably contained a liveness probe with timeouts that were set much too low for my slow-as-crap device (that’s the technical term).

So I checked /etc/kubernetes/manifests/kube-apiserver.yaml, and sure enough…

livenessProbe:
  failureThreshold: 8
  httpGet:
    host: 192.168.1.155
    path: /healthz
    port: 6443
    scheme: HTTPS
  initialDelaySeconds: 15
  timeoutSeconds: 15

My pod was getting killed after 135 seconds (initialDelaySeconds + timeoutSeconds * failureThreshold). I bumped the initialDelaySeconds up to 120…
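As a sanity check on that math, here’s the back-of-envelope calculation as a minimal shell sketch (the variable names just mirror the manifest fields above):

```shell
#!/bin/sh
# Worst-case seconds before the kubelet gives up on the pod,
# using the values from the manifest above.
initial_delay=15   # initialDelaySeconds
timeout=15         # timeoutSeconds
failures=8         # failureThreshold

deadline=$(( initial_delay + timeout * failures ))
echo "$deadline"   # prints 135
```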

Success! Well, I still got the handshake errors (presumably from the kubelet), but it made it through the startup.

20:06:54.957236 log.go:172] http: TLS handshake error from 192.168.1.155:50538: EOF
20:06:55.004865 log.go:172] http: TLS handshake error from 192.168.1.155:50384: EOF
20:06:55.118343 log.go:172] http: TLS handshake error from 192.168.1.155:50292: EOF
20:06:55.252586 cache.go:39] Caches are synced for autoregister controller
20:06:55.253907 cache.go:39] Caches are synced for APIServiceRegistrationController controller
20:06:55.545881 controller_utils.go:1034] Caches are synced for crd-autoregister controller
...
20:06:58.921689 storage_rbac.go:187] created clusterrole.rbac.authorization.k8s.io/cluster-admin
20:06:59.049373 storage_rbac.go:187] created clusterrole.rbac.authorization.k8s.io/system:discovery
20:06:59.214321 storage_rbac.go:187] created clusterrole.rbac.authorization.k8s.io/system:basic-user

Once the api-server was up, I moved the controller and scheduler yamls back into the manifests directory, and they started up normally as well.

Now to double check, could I get everything to boot up normally if I left all the files in the manifests directory, and just increased the livenessProbe initial delay?

20:29:33.306983 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://192.168.1.155:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.1.155:6443: i/o timeout
20:29:33.434541 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.ReplicationController: Get https://192.168.1.155:6443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp 192.168.1.155:6443: i/o timeout
20:29:33.435799 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.PersistentVolume: Get https://192.168.1.155:6443/api/v1/persistentvolumes?limit=500&resourceVersion=0: dial tcp 192.168.1.155:6443: i/o timeout
20:29:33.477405 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1beta1.PodDisruptionBudget: Get https://192.168.1.155:6443/apis/policy/v1beta1/poddisruptionbudgets?limit=500&resourceVersion=0: dial tcp 192.168.1.155:6443: i/o timeout
20:29:33.493660 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.PersistentVolumeClaim: Get https://192.168.1.155:6443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0: dial tcp 192.168.1.155:6443: i/o timeout
20:29:37.974938 controller_utils.go:1027] Waiting for caches to sync for scheduler controller
20:29:38.078558 controller_utils.go:1034] Caches are synced for scheduler controller
20:29:38.078867 leaderelection.go:205] attempting to acquire leader lease kube-system/kube-scheduler
20:29:38.291875 leaderelection.go:214] successfully acquired lease kube-system/kube-scheduler

Yes, it eventually worked, but these older devices were probably not going to be suitable for running the control plane, given that repeated TLS connections caused such a drastic slowdown. But, I now had a working K8s install on ARM!

Moving on…

Mounting the RAID

Since SD cards are not suitable for long-term sustained writes, I wanted to have the more volatile parts of the filesystem persisted on a more durable medium, in this case, the RAID. I gave it 4 partitions:

50GB

2x 20GB

3.9TB

I didn’t have a precise use for the 20GB partitions, but I wanted to leave some options open for the future.

In the /etc/fstab file I mounted the 50GB partition at /mnt/root and the 3.9TB partition at /mnt/raid, and then bind-mounted the etcd and docker directories into the 50GB partition.

UUID=655a39e8-9a5d-45f3-ae14-73b4c5ed50c3 /mnt/root ext4 defaults,rw,user,auto,exec 0 0
UUID=0633df91-017c-4b98-9b2e-4a0d27989a5c /mnt/raid ext4 defaults,rw,user,auto 0 0
/mnt/root/var/lib/etcd /var/lib/etcd none defaults,bind 0 0
/mnt/root/var/lib/docker /var/lib/docker none defaults,bind 0 0

The ROC-RK3328-CC Arrives

With the new board in hand, I fired it up, installed the K8s prerequisites, and ran kubeadm init. After a few minutes it succeeded and printed the join command to run on the other nodes.

Success! No need to fiddle with timeouts.

Since this board is also the one that hosts the RAID, I needed to set up the mounts again as well. Putting it all together:

1. Disk mounts in /etc/fstab

UUID=655a39e8-9a5d-45f3-ae14-73b4c5ed50c3 /mnt/root ext4 defaults,rw,user,auto,exec 0 0
UUID=0633df91-017c-4b98-9b2e-4a0d27989a5c /mnt/raid ext4 defaults,rw,user,auto 0 0
/mnt/root/var/lib/etcd /var/lib/etcd none defaults,bind 0 0
/mnt/root/var/lib/docker /var/lib/docker none defaults,bind 0 0

2. Install Docker and K8s binaries

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet=1.13.1-00 kubectl=1.13.1-00 kubeadm=1.13.1-00

3. Set a unique hostname (Important once I add multiple nodes)

hostnamectl set-hostname k8s-master-1

4. Initialize Kubernetes

I skip the mark-control-plane phase because I want to be able to schedule normal pods on this node as well.

kubeadm init --skip-phases mark-control-plane

5. Install a network plugin

The Hypriot blog post was a little out of date here, as Weave is now also a supported network plugin on ARM.

export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

6. Add node labels

Since I’ll need the NAS server to run on this node, I need to mark it with some labels I can use when scheduling.

kubectl label nodes k8s-master-1 marshallbrekka.raid=true
kubectl label nodes k8s-master-1 marshallbrekka.network=gigabit
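Later, a pod spec can target this node via those labels. A minimal sketch of how the NAS workload might consume the raid label (the surrounding deployment is omitted; only the nodeSelector is the point here):

```yaml
# Hypothetical pod spec fragment: schedule only on the node with the RAID attached.
spec:
  nodeSelector:
    marshallbrekka.raid: "true"
```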

Joining other nodes to the cluster

Setting up my other devices (Banana Pi, CubbieBoard) was just as easy: follow the first 3 steps (customizing the mounts for whatever drives or flash storage was available), and then run the kubeadm join command instead of kubeadm init.

Finding ARM docker containers

While I could normally build most docker containers I wanted from my Mac, doing so for ARM was not as easy. I did find many blog posts showing how to use QEMU to accomplish the task, but I ended up finding most of the apps I needed already built, many of them from linuxserver.

Next Steps

I still don’t have my initial device setup quite as automated/scripted as I would like, but the few commands I do have to run (mounts, docker, kubeadm) are now well documented in a Git repo. The rest of my apps are also defined as K8s yamls in that repo, which makes it trivial to re-create the setup if I need to rebuild from scratch for any reason.

Looking forward, there are a few things I would like to do: