The facts

Kured (KUbernetes REboot Daemon) is a Kubernetes DaemonSet that performs safe, automatic node reboots when the package management system of the underlying OS indicates that one is required.

In essence, kured:

Watches for the presence of a reboot sentinel, e.g. /var/run/reboot-required

Utilises a lock in the API server to ensure only one node reboots at a time

Optionally defers reboots in the presence of active Prometheus alerts

Cordons & drains worker nodes before reboot, uncordoning them afterwards
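The behaviour listed above can be sketched in a few dozen lines of Go, the language kured is written in. This is a minimal illustration under stated assumptions, not kured's actual source: the Lock, Node, memLock and fakeNode types and the checkOnce and afterBoot functions are hypothetical, the lock held in the API server is replaced by an in-memory stand-in, and the Prometheus query is reduced to a count of active alerts.

```go
package main

import (
	"fmt"
	"os"
)

// Lock lets only one node reboot at a time. kured keeps its lock in the
// Kubernetes API server; memLock is a hypothetical in-memory stand-in.
type Lock interface {
	Acquire(node string) bool
	Release(node string)
}

// memLock: Acquire succeeds if the lock is free or already held by the
// caller; Release only frees a lock the caller holds.
type memLock struct{ holder string }

func (l *memLock) Acquire(node string) bool {
	if l.holder == "" || l.holder == node {
		l.holder = node
		return true
	}
	return false
}

func (l *memLock) Release(node string) {
	if l.holder == node {
		l.holder = ""
	}
}

// Node abstracts the node operations (hypothetical); a real daemon would
// call the Kubernetes API and the host's reboot command instead.
type Node interface {
	CordonAndDrain() error
	Uncordon() error
	Reboot() error
}

// rebootRequired reports whether the sentinel file exists.
func rebootRequired(sentinel string) bool {
	_, err := os.Stat(sentinel)
	return err == nil
}

// checkOnce runs one iteration of the reboot loop for a single node.
func checkOnce(name, sentinel string, lock Lock, activeAlerts int, node Node) (string, error) {
	if !rebootRequired(sentinel) {
		return "idle", nil
	}
	if activeAlerts > 0 {
		return "deferred: active alerts", nil
	}
	if !lock.Acquire(name) {
		return "deferred: lock held elsewhere", nil
	}
	if err := node.CordonAndDrain(); err != nil {
		lock.Release(name) // nothing was rebooted, so let another node try
		return "", fmt.Errorf("drain %s: %w", name, err)
	}
	// The lock stays held across the reboot; afterBoot releases it.
	if err := node.Reboot(); err != nil {
		return "", fmt.Errorf("reboot %s: %w", name, err)
	}
	return "rebooting", nil
}

// afterBoot runs when the daemon restarts on the node holding the lock.
func afterBoot(name string, lock Lock, node Node) error {
	if err := node.Uncordon(); err != nil {
		return err
	}
	lock.Release(name)
	return nil
}

// fakeNode records which operations were invoked (demonstration only).
type fakeNode struct{ drained, uncordoned, rebooted bool }

func (n *fakeNode) CordonAndDrain() error { n.drained = true; return nil }
func (n *fakeNode) Uncordon() error { n.uncordoned = true; return nil }
func (n *fakeNode) Reboot() error { n.rebooted = true; return nil }

func main() {
	// Simulated sentinel file; both fake nodes share it for brevity.
	f, err := os.CreateTemp("", "reboot-required")
	if err != nil {
		panic(err)
	}
	f.Close()
	defer os.Remove(f.Name())

	lock, a, b := &memLock{}, &fakeNode{}, &fakeNode{}
	s, _ := checkOnce("node-a", f.Name(), lock, 1, a)
	fmt.Println("node-a:", s) // node-a: deferred: active alerts
	s, _ = checkOnce("node-a", f.Name(), lock, 0, a)
	fmt.Println("node-a:", s) // node-a: rebooting
	s, _ = checkOnce("node-b", f.Name(), lock, 0, b)
	fmt.Println("node-b:", s) // node-b: deferred: lock held elsewhere
	_ = afterBoot("node-a", lock, a)
	s, _ = checkOnce("node-b", f.Name(), lock, 0, b)
	fmt.Println("node-b:", s) // node-b: rebooting
}
```

In this sketch a node keeps the lock for the whole reboot and releases it only once it is back up and uncordoned, so a second node cannot start draining while the first is still down.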

The Reboot Problem

At Weaveworks, the development and production clusters underpinning Weave Cloud are orchestrated with Kubernetes running on EC2, maintained with Terraform and Ansible.

The EC2 instances run Ubuntu 16.04 with unattended-upgrades enabled, so the machines need to be rebooted periodically (mainly in response to kernel upgrades). If they aren’t, the clusters are at risk from security vulnerabilities, and eventually run out of disk space as the OS is unable to remove older kernels and modules.

The first attempt

Our initial approach to this problem was to trigger a Prometheus alert whenever the /var/run/reboot-required file appeared on any of the nodes. We paired this with a manual process: wait for a safe moment (defined as no active alerts), then drain the application pods and reboot each node in turn.

Automation makes everything better

Whilst this worked in practice, the frequency of OS updates, combined with the number of nodes, eventually drove us to an automated solution. For the past six months, all reboots have been carried out safely and automatically by kured, our Kubernetes reboot daemon.

During this time, kured has effected hundreds of node reboots in our dev and prod clusters without human intervention; in fact, until the relatively recent addition of Slack notifications, we were mostly unaware that it was happening at all.

Kured is here!

Now that we have gained confidence in the implementation through an extended period of operational use, we are pleased to share the 1.0.0 release of kured with the community under the Apache 2.0 license. Kured should work with most Kubernetes installations and distributions; you can read more about how it works on GitHub.

Join our community Slack if you have any questions or suggestions!

