Motivation

In our company, we use Kubernetes as our main deployment technology. Recently, we tackled the task of upgrading our Kubernetes cluster. We thought we were decently prepared: we had gathered and read the upgrade documentation upfront, and we already had prior working experience with Kubernetes, CoreOS, and Docker. Everything seemed straightforward - just follow the documentation and we are home free...

But, as always, reality tested our naive optimism, and we ended up struggling with the upgrade. The task took more time than we expected, with some pitfalls along the way.

Since the official documentation is rather sparse on this topic, and web resources were insufficient, we had to come up with our own procedure.

So, if you are planning a Kubernetes upgrade from version 1.5 to 1.6, this might be a good read for you! It might not be the best, but it works and we hope some of you will find it helpful. Any comments or enhancements are more than welcome.

Cluster Details

The patient is an internal cluster that we use for hosting continuous integration, development, and test automation tools.

The specification looks as follows:

5 nodes, each running on a physical machine with CoreOS installed via Ignition.

1 Kubernetes master node and 4 workers.

etcd running in a cluster configuration with one server and 4 proxies (a bit of an unfortunate setup).

flannel for networking.

Since no production services run on the cluster, downtime during the upgrade was acceptable. We also have full backups of all disks.

Exact Software Versions

Initial

CoreOS 1353.8.0 (the latest one at that time)

Kubernetes 1.5.2

etcd 2.3.7

Target

CoreOS - no change here

Kubernetes 1.6.1

etcd 3.0.10

The Plan

The Ignition Way

The cluster being upgraded was brought up using Matchbox and Ignition. In such cases, one way to go is to update the Ignition configuration files with the new versions, then tear down and reinstall the whole cluster from scratch. According to the documentation, following this path would result in a brand new Kubernetes 1.6 cluster.

Our Way

Since we already had a working installation, we preferred to avoid complete reinstallation with Ignition due to the risk of ending up with the whole cluster down in case of failure.

Instead, we decided to do manual changes in the existing installation, which seemed easier to roll back.

This was our step-by-step plan:

1. Stop Kubernetes.
2. Back up etcd data.
3. Upgrade etcd to version 3.
4. Migrate etcd data to the v3 format.
5. Upgrade Kubernetes to version 1.6.
6. Start Kubernetes.
7. Perform post-upgrade actions.

First Things First

Many of the commands below need to be executed as root, so root access to all nodes is assumed.

Shutting Down Kubernetes

To shut down all nodes (master and workers), perform the steps below for each one.

Mark the node unschedulable so that replication controllers (RCs) cannot create new pods on it. The command below also deletes all pods from the node.



$ kubectl drain --force my_node_name



The --force option is necessary so that pods not managed by any RC, such as kube-proxy, don't block the drain. After the command returns, check the status:



$ kubectl get nodes



It should display "Ready,SchedulingDisabled" for the drained node.
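With five nodes, draining one by one gets tedious. A minimal sketch that only prints the drain commands for review (the node names are placeholders, not our actual hostnames):

```shell
#!/bin/sh
# Print a "kubectl drain" command per node instead of executing it,
# so the list can be reviewed first. NODES holds placeholder names --
# substitute your own, or feed it from "kubectl get nodes".
NODES="master-1 worker-1 worker-2 worker-3 worker-4"
for node in $NODES; do
  echo "kubectl drain --force $node"
done
```

Once the list looks right, the output can be piped to `sh` to actually run the drains.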

Next, SSH into the my_node_name node and stop kubelet:



$ systemctl stop kubelet



and hyperkube (hyperkube_container_id can be obtained with the docker ps command):



$ docker stop hyperkube_container_id

Additionally, for the master node only, stop the following containers:

$ docker stop controller-manager_container_id

$ docker stop scheduler_container_id

$ docker stop apiserver_container_id

$ docker stop proxy_container_id

Again, the *_container_id values can be obtained with the docker ps command.

At this point, with controller-manager, scheduler, apiserver, and proxy stopped, Kubernetes is shut down.

In order to verify the Kubernetes cluster state, check running processes.

$ ps aux | grep kube

There should be no kube-related ones. Apart from that, take a look at the nodes:

$ kubectl get nodes



The status of all nodes should be "NotReady,SchedulingDisabled."

Upgrade etcd

Before upgrading to Kubernetes 1.6, it is necessary to upgrade etcd from version 2 to 3.

We found this step to be the most difficult one in the whole procedure, mainly because there was no consistent documentation describing the complete upgrade.

In this section, we present our complete and tested procedure for the etcd upgrade. The documentation we used to prepare it can be found in the references section. Kingsley Jarrett's blog post was a big help.

etcd Backup

It is strongly advised to create a backup of the etcd data.

Make sure the etcd cluster is healthy before taking any actions.

$ etcdctl cluster-health

In case it's not, fix the problem before going further.

Now, shut down etcd. This action consists of stopping the systemd unit and disabling it from starting on boot:

$ systemctl stop etcd2

$ systemctl disable etcd2

Repeat the etcd shutdown on all nodes.



Take a backup of the etcd2 data. This should be performed on all etcd servers; proxies do not need to be backed up. In our infrastructure, that meant a single backup:



$ etcdctl backup --data-dir /var/lib/etcd2 --backup-dir /home/core/etcd2-backup

Copy the backup, and additionally, just in case, the raw /var/lib/etcd2 directory to a safe place.

Please note that apart from data-dir, there might also be a wal-dir to back up. If the default wal-dir is contained inside data-dir, as it was in our case, it doesn't have to be backed up separately. If you do need a wal-dir backup, refer to the documentation.
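A quick sketch to check which case applies, assuming the default etcd2 on-disk layout where the WAL lives under member/wal inside the data directory:

```shell
#!/bin/sh
# Report whether the etcd2 WAL lives inside the data directory.
# DATA_DIR defaults to /var/lib/etcd2; member/wal is where etcd2 keeps
# the WAL when no separate --wal-dir flag was configured.
DATA_DIR=${DATA_DIR:-/var/lib/etcd2}
if [ -d "$DATA_DIR/member/wal" ]; then
  echo "wal-dir is inside data-dir: a single backup covers both"
else
  echo "no member/wal under $DATA_DIR: check for a separately configured wal-dir"
fi
```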

Move Existing etcd Data

Copy the existing etcd2 data, /var/lib/etcd2, to the directory used by etcd3, /var/lib/etcd (yep, it's etcd without any number).

Clean up the target directory before copying:



$ rm -rf /var/lib/etcd/*



Copy the existing data:



$ cp -a /var/lib/etcd2/* /var/lib/etcd



Note that the data still has to be converted to the v3 format. We will return to this shortly.

Do the Upgrade

The actual upgrade consists of switching systemd units: from etcd2.service to etcd-member.service, which is dedicated to etcd3. The new unit is already shipped with CoreOS; any missing binaries are downloaded at the first startup of the new unit.

Since the main unit definition, /usr/lib/systemd/system/etcd-member.service, belongs to a read-only filesystem, we need to provide our configuration as a drop-in. We do this simply by copying the one from etcd2.

The following steps are to be performed on all nodes.

Create drop-in directory:

$ mkdir /etc/systemd/system/etcd-member.service.d



And copy the drop-in:

$ cp /etc/systemd/system/etcd2.service.d/40-etcd-cluster.conf /etc/systemd/system/etcd-member.service.d

No changes in the copied file were necessary.
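For reference, a 40-etcd-cluster.conf drop-in typically looks something like the following. The names and addresses are illustrative placeholders, not our actual configuration; keep whatever your etcd2 drop-in already contains:

```ini
[Service]
# Illustrative values only -- the ETCD_* environment variables configure
# the etcd member wrapped by etcd-member.service.
Environment="ETCD_NAME=node-1"
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://10.0.0.1:2379"
Environment="ETCD_INITIAL_CLUSTER=node-1=http://10.0.0.1:2380"
```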

Now we can enable and start the new etcd3 unit:

$ systemctl daemon-reload

$ systemctl enable etcd-member

$ systemctl start etcd-member

Check the status:

$ systemctl status etcd-member

You might encounter errors like the following:

Jun 20 15:49:36 localhost rkt[8025]: rm: unable to resolve UUID from file: open /var/lib/coreos/etcd-member-wrapper.uuid: no such file or directory
Jun 20 15:49:36 localhost rkt[8025]: rm: failed to remove one or more pods

In that case, restart the flanneld systemd service and try again - it worked for us.

Finally, after repeating the procedure on all nodes, verify the installation.

Check the status on each node:

$ etcdctl cluster-health



Check the unit logs on the master:

$ journalctl -e -u etcd-member

Look for messages like:

Jun 20 15:06:37 localhost etcd-wrapper[31792]: 2017-06-20 15:06:37.779303 I | etcdserver: starting server... [version: 3.0.10, cluster version: 2.3]
Jun 20 15:06:37 localhost etcd-wrapper[31792]: 2017-06-20 15:06:37.785814 N | membership: updated the cluster version from 2.3 to 3.0
Jun 20 15:06:37 localhost etcd-wrapper[31792]: 2017-06-20 15:06:37.785869 I | api: enabled capabilities for version 3.0

Perform some manual testing of writing and reading data using etcdctl .

On one node:

$ etcdctl mkdir /somename

$ etcdctl mk /somename/key value



Then on another node:

$ etcdctl get /somename/key

value



$ etcdctl update /somename/key value2



On yet another one:

$ etcdctl get /somename/key

value2





Migrate Existing Data to v3

We are almost there - the one thing left is to migrate the existing data to the v3 format. Here again, the etcdctl tool can help. However, for some unknown reason, the version installed on CoreOS is still 2.3.7, while we need a v3 binary to do the migration. This issue caused us a lot of headaches, and eventually our colleague came up with the following solution.

Download the desired etcdctl v3 release and unpack it:

$ wget https://github.com/coreos/etcd/releases/download/v3.2.0/etcd-v3.2.0-linux-amd64.tar.gz

$ tar xzf etcd-v3.2.0-linux-amd64.tar.gz

$ cd etcd-v3.2.0-linux-amd64

and use the extracted binary directly:

$ ETCDCTL_API=3 ./etcdctl migrate --data-dir=/var/lib/etcd

Please note the environment variable selecting the v3 API explicitly.

After the command returns, we have the data migrated to the etcd3 format.

That's it - we are now running etcd in version 3!

etcd - Post Upgrade

At the end, it's good to establish a proper systemd default presets policy so etcd-member is enabled by default.

Replace ‘etcd2.service’ with ‘etcd-member.service’ in

/etc/systemd/system-preset/20-ignition.preset



Refresh:

$ systemctl daemon-reload



Verify:

$ systemctl status etcd-member
● etcd-member.service - etcd (System Application Container)
   Loaded: loaded (/usr/lib/systemd/system/etcd-member.service; enabled; vendor preset: enabled)

The same check for etcd2 should show "vendor preset: disabled".

What were we here for... ah, Kubernetes. Let's go on with our main task!

Upgrade Kubernetes

This part actually turns out to be quite simple: update the kubelet version in an environment variable defined in the kubelet.service systemd unit.



Additionally, we change the name of the variable from KUBELET_VERSION to the currently preferred KUBELET_IMAGE_TAG. This is not mandatory, since the old name is still supported, but it's better to follow the latest conventions.

Bump Version

Update kubelet.service, starting with the master node and then continuing with the worker nodes.



Replace

KUBELET_VERSION=v1.5.2_coreos.0



with

KUBELET_IMAGE_TAG=v1.6.1_coreos.0



Refresh systemd units:



$ systemctl daemon-reload



Restart kubelet:



$ systemctl restart kubelet



Now Kubernetes should start, downloading the new binaries behind the scenes. Error messages about “Orphaned pods” can be ignored.

Next, update the manifests of the static pods. Check which manifests need editing:

$ grep -R v1.5.2_coreos.0 /etc/kubernetes/manifests/*
/etc/kubernetes/manifests/kube-apiserver.yaml: image: quay.io/coreos/hyperkube:v1.5.2_coreos.0
/etc/kubernetes/manifests/kube-controller-manager.yaml: image: quay.io/coreos/hyperkube:v1.5.2_coreos.0
/etc/kubernetes/manifests/kube-proxy.yaml: image: quay.io/coreos/hyperkube:v1.5.2_coreos.0
/etc/kubernetes/manifests/kube-scheduler.yaml: image: quay.io/coreos/hyperkube:v1.5.2_coreos.0

and adjust versions in all manifest YAML files listed by the above command.
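Editing four files by hand works, but the substitution can also be scripted. A sketch, demonstrated on a scratch copy; the same sed line works in place on /etc/kubernetes/manifests/*.yaml on the real master:

```shell
#!/bin/sh
# Demonstrate the tag bump on a throwaway manifest copy; on the master,
# run the same sed against /etc/kubernetes/manifests/*.yaml instead.
demo=$(mktemp -d)
echo '    image: quay.io/coreos/hyperkube:v1.5.2_coreos.0' > "$demo/kube-apiserver.yaml"
sed -i 's/v1\.5\.2_coreos\.0/v1.6.1_coreos.0/g' "$demo"/*.yaml
cat "$demo/kube-apiserver.yaml"   # image now references v1.6.1_coreos.0
```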

Finally, restart kubelet.service one more time:

$ systemctl restart kubelet

After finishing the steps above on the master and the workers, list all nodes:

$ kubectl get nodes

You should see them all with the new version: v1.6.1+coreos.0

Bring the Kubernetes Cluster Back to the Operating State

Although we have upgraded and started all nodes, they are still in a "SchedulingDisabled" state due to draining them at the very beginning:



$ kubectl get nodes



shows "Ready,SchedulingDisabled".



To re-enable scheduling, repeat the following for each node:



$ kubectl uncordon my_node_name

node my_node_name uncordoned



and check again:



$ kubectl get nodes



It should display a "Ready" status for all nodes.



Replication controllers should now start scheduling pods on the nodes.
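As with draining, uncordoning every node can be scripted. A sketch that only prints the commands for review (node names are placeholders):

```shell
#!/bin/sh
# Print a "kubectl uncordon" command per node for review before running.
# NODES holds placeholder names -- substitute your own.
NODES="master-1 worker-1 worker-2 worker-3 worker-4"
for node in $NODES; do
  echo "kubectl uncordon $node"
done
```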



In case you use Kubernetes Dashboard, the new version has to be installed: https://github.com/kubernetes/dashboard/releases.



Congratulations, you are now a happy Kubernetes 1.6 user!

Post Upgrade

The only remaining task, not mandatory yet still important, is bringing the Ignition configuration in line with the current state of the cluster. This is necessary for automated reinstallation of an existing node or for bringing up a new one.

But this is a topic for another post.