I recently wanted to deploy a newer version of Kubernetes to see it working with our Cloud Native Storage (CNS) feature. Having assisted with the original landing pages for CPI and CSI, I’d done this a few times in the past. However, the deployment tutorial that we used back then was based on Kubernetes version 1.14.2. I wanted to go with a more recent build of K8s, e.g. 1.16.3. By the way, if you are unclear about the purposes of the CPI and CSI, you can learn more about them on the landing pages, here for CPI and here for CSI.

OK, before we begin I do want to make it clear that the instructions in the tutorial are still completely valid. The only issue is that with the later releases of K8s, the API versions of some of the “Kinds” (K8s components) have changed. This will become clear as we go through the process. The other thing that has caught me out personally is the requirement to use hard-coded names for the configuration files, both for the CPI and the CSI. I’ll show you how issues manifest themselves when these hard-coded names are not used.

I am going to continue to use Flannel for my CNI. However, there are a number of modifications required to the flannel YAML. I’ll highlight these to you as we go along.

1. Changes to K8s Master/Control Plane Deployment

The obvious change here is that we need to install the v1.16.3 K8s tools rather than v1.14.2. This is quite straightforward to do. For the tools, change the apt install to the correct versions:

# apt install -qy kubeadm=1.16.3-00 kubelet=1.16.3-00 kubectl=1.16.3-00
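Since the same three packages have to be pinned on the master and on every worker, it can help to build the package list from a single version string. This is only a minimal sketch: the helper name k8s_pkgs is my own invention, and the -00 package revision is simply the one used in the command above.

```shell
# Print the pinned apt package list for one Kubernetes package version.
# k8s_pkgs is a hypothetical helper name, not part of any K8s tooling.
k8s_pkgs() {
  version="$1"
  printf 'kubeadm=%s kubelet=%s kubectl=%s\n' "$version" "$version" "$version"
}

# Usage (as root, on each node):
#   apt install -qy $(k8s_pkgs 1.16.3-00)
```

You may also want to run apt-mark hold kubelet kubeadm kubectl afterwards, so that a later apt upgrade does not move the nodes to a different version.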

For the correct K8s distribution, modify the master’s kubeadminit.yaml, as shown here:



apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
useHyperKubeImage: false
kubernetesVersion: v1.16.3
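It is easy to bump the packages and forget the configuration file, or vice versa. As a rough sanity check, the version can be read back out of the file and compared with the installed tools. The helper name config_k8s_version is hypothetical, and the sketch assumes kubernetesVersion sits on a line of its own, as it does above.

```shell
# Print the kubernetesVersion value found in a kubeadm configuration file.
# Assumes "kubernetesVersion: <value>" appears on a single line.
config_k8s_version() {
  awk '$1 == "kubernetesVersion:" { print $2; exit }' "$1"
}

# Usage, on a node where kubeadm is installed:
#   [ "$(config_k8s_version kubeadminit.yaml)" = "$(kubeadm version -o short)" ] \
#     || echo "config and kubeadm versions differ"
```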

After deploying the K8s control plane/master as per the tutorial, you will see the following message displayed:

You should now deploy a pod network to the cluster. Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: https://kubernetes.io/docs/concepts/cluster-administration/addons/

Until a network has been deployed, the control plane/master node stays in a NotReady state.

root@k8s-master-01:~# kubectl get nodes
NAME            STATUS     ROLES    AGE   VERSION
k8s-master-01   NotReady   master   66s   v1.16.3

You can see the reason for the NotReady via a kubectl describe of the node:

root@k8s-master-01:~# kubectl describe node k8s-master-01
Name:               k8s-master-01
Roles:              master
...
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 06 Mar 2020 09:04:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 06 Mar 2020 09:04:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 06 Mar 2020 09:04:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Fri, 06 Mar 2020 09:04:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false \
      reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

We need to provide a CNI (Container Network Interface), and for the purposes of the tutorial, we have been using flannel. However, with us using a newer version of Kubernetes, this is where things start to get interesting.

2. Changes to Flannel deployment in K8s 1.16.3

Let’s do the very first step from the tutorial and see what happens when we apply the kube-flannel yaml:

root@k8s-master-01:~# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
    no matches for kind "PodSecurityPolicy" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
    no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
    no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
    no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
    no matches for kind "DaemonSet" in version "extensions/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml": \
    no matches for kind "DaemonSet" in version "extensions/v1beta1"
root@k8s-master-01:~#

The errors above are the result of API changes in Kubernetes 1.16, which no longer serves these kinds from extensions/v1beta1. PodSecurityPolicy is now in the policy/v1beta1 API and DaemonSet is now in the apps/v1 API. After downloading kube-flannel.yml and making the appropriate changes (saved locally as kube-flannel-test.yaml), I ran it again.
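I made the API version changes by hand, but they are mechanical enough to script. Below is a rough sketch, not what I actually ran: the function name fix_api_versions is made up, and it assumes that each kind: line immediately follows its apiVersion: line, which I believe is how the flannel manifest is laid out. It writes the rewritten manifest to stdout and leaves the input file untouched.

```shell
# Rewrite extensions/v1beta1 to the API group each kind moved to:
#   PodSecurityPolicy -> policy/v1beta1, DaemonSet -> apps/v1.
# Assumes "kind:" is the line directly after "apiVersion:".
fix_api_versions() {
  awk '
    # Hold back the old apiVersion line until the kind is known.
    /^apiVersion: extensions\/v1beta1[ \t\r]*$/ { held = 1; next }
    held && /^kind: DaemonSet[ \t\r]*$/         { print "apiVersion: apps/v1"; held = 0 }
    held && /^kind: PodSecurityPolicy[ \t\r]*$/ { print "apiVersion: policy/v1beta1"; held = 0 }
    # Any other kind keeps its original apiVersion.
    held { print "apiVersion: extensions/v1beta1"; held = 0 }
    { print }
    END { if (held) print "apiVersion: extensions/v1beta1" }
  ' "$1"
}

# Usage:
#   fix_api_versions kube-flannel.yml > kube-flannel-test.yaml
```

Note that this only fixes the API versions. The further DaemonSet and CNI configuration changes described in the rest of this section still have to be made.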

root@k8s-master-01:~# kubectl apply -f kube-flannel-test.yaml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel unchanged
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
serviceaccount/flannel unchanged
configmap/kube-flannel-cfg unchanged
error: error validating "kube-flannel-test.yaml": error validating data: \
    ValidationError(DaemonSet.spec): missing required field "selector" \
    in io.k8s.api.apps.v1.DaemonSetSpec; if you choose to ignore these errors, turn validation off with --validate=false
root@k8s-master-01:~#

This error is a consequence of moving DaemonSet from extensions/v1beta1 to apps/v1: the apps/v1 API version requires a selector in the DaemonSet spec, and the selector must match the labels on the pod template. So once again, I modified the flannel YAML file, adding a selector and a matching template label to each DaemonSet spec as follows:

Before:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel

After:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-arm
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      name: kube-flannel
  template:
    metadata:
      labels:
        name: kube-flannel
        tier: node
        app: flannel
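The flannel manifest contains several DaemonSets, one per architecture, so it is easy to miss one when adding selectors by hand. A crude check is to compare the number of DaemonSets with the number of spec-level selector: lines. The function name check_daemonset_selectors is hypothetical, and the sketch assumes two-space indentation and that no other kind in the file has a spec-level selector.

```shell
# Succeed only if the manifest has one spec-level "selector:" per DaemonSet.
# A rough count-based check, not a YAML parser.
check_daemonset_selectors() {
  # grep -c prints 0 but exits non-zero on no match, hence "|| true".
  kinds=$(grep -c '^kind: DaemonSet' "$1" || true)
  selectors=$(grep -c '^  selector:' "$1" || true)
  [ "$kinds" -eq "$selectors" ]
}

# Usage:
#   check_daemonset_selectors kube-flannel-test.yaml || echo "a DaemonSet is missing its selector"
```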

After making that change, I deployed my flannel yaml once more. Success!

root@k8s-master-01:~# kubectl apply -f kube-flannel-test.yaml
podsecuritypolicy.policy/psp.flannel.unprivileged configured
clusterrole.rbac.authorization.k8s.io/flannel unchanged
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
serviceaccount/flannel unchanged
configmap/kube-flannel-cfg unchanged
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
root@k8s-master-01:~#

At least, I thought it was a success. However, when I checked on my master node, it still wasn’t ready.

root@k8s-master-01:~# kubectl get nodes
NAME            STATUS     ROLES    AGE   VERSION
k8s-master-01   NotReady   master   23m   v1.16.3

root@k8s-master-01:~# kubectl describe nodes k8s-master-01
Name:               k8s-master-01
Roles:              master
...
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 06 Mar 2020 09:24:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 06 Mar 2020 09:24:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 06 Mar 2020 09:24:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Fri, 06 Mar 2020 09:24:37 +0000   Fri, 06 Mar 2020 09:01:33 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false \
      reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Hmm. Still not ready. It was time to look at the node’s kubelet logs to see whether something else was wrong. This is what I found:

root@k8s-master-01:~# journalctl -xe | grep kubelet
...
Mar 06 09:25:54 k8s-master-01 kubelet[19505]: W0306 09:25:54.953423 19505 cni.go:202] \
    Error validating CNI config &{cbr0 false [0xc0003a53e0 0xc0003a5780]...

After a quick search, I found an issue reported on GitHub, entitled “network not ready after `kubectl apply -f kube-flannel.yaml` in v1.16 cluster” (#1178). The solution is to add a cniVersion number to the CNI configuration in the flannel yaml file. So the following entry was added to my flannel yaml:

Before:

data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",

After:

data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
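Hand-editing JSON inside a YAML ConfigMap is easy to get subtly wrong, particularly if the line is pasted from a web page and the straight quotes arrive as typographic ones, which JSON does not accept. Here is a small pre-flight check I would suggest before reapplying. The function name check_cni_conf is my own, and it only greps the manifest; it does not validate the JSON.

```shell
# Fail if the manifest contains typographic double quotes, or if it has
# no "cniVersion" key at all. A grep-based sketch, not a JSON validator.
check_cni_conf() {
  if grep -q -e '“' -e '”' "$1"; then
    echo "typographic quotes found in $1" >&2
    return 1
  fi
  if ! grep -q '"cniVersion"' "$1"; then
    echo "cniVersion not found in $1" >&2
    return 1
  fi
  return 0
}

# Usage:
#   check_cni_conf kube-flannel-test.yaml && kubectl apply -f kube-flannel-test.yaml
```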

I reapplied the flannel yaml once more and checked the status of my master node. Finally, it’s Ready!

root@k8s-master-01:~# kubectl apply -f kube-flannel-test.yaml
podsecuritypolicy.policy/psp.flannel.unprivileged configured
clusterrole.rbac.authorization.k8s.io/flannel unchanged
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
serviceaccount/flannel unchanged
configmap/kube-flannel-cfg configured
daemonset.apps/kube-flannel-ds-amd64 unchanged
daemonset.apps/kube-flannel-ds-arm64 unchanged
daemonset.apps/kube-flannel-ds-arm unchanged
daemonset.apps/kube-flannel-ds-ppc64le unchanged
daemonset.apps/kube-flannel-ds-s390x unchanged
root@k8s-master-01:~#
root@k8s-master-01:~# kubectl get nodes
NAME            STATUS   ROLES    AGE   VERSION
k8s-master-01   Ready    master   35m   v1.16.3
root@k8s-master-01:~#

OK, we can now get on with deploying the worker nodes. If you didn’t want to mess about with all this flannel-related stuff, you could of course choose another pod network add-on, such as Calico. If you want the new flannel yaml manifest, you can download it with all the changes from here.

3. Changes to Worker Node Deployments

There is very little change required here. You must simply make sure that you deploy the newer kubectl, kubeadm and kubelet versions: 1.16.3 rather than 1.14.2. We have already seen how to do that for the master node in step 1; repeat this for the workers. I added two workers to my cluster:

root@k8s-master-01:~# kubectl get nodes
NAME            STATUS   ROLES    AGE   VERSION
k8s-master-01   Ready    master   39m   v1.16.3
k8s-worker-01   Ready    <none>   41s   v1.16.3
k8s-worker-02   Ready    <none>   20s   v1.16.3
root@k8s-master-01:~#
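With more than a handful of nodes, it is quicker to filter the node list than to eyeball it. A minimal sketch, assuming the default column layout of kubectl get nodes shown above; the function name not_ready_nodes is hypothetical.

```shell
# Read "kubectl get nodes" output on stdin and print the name of every
# node whose STATUS column is not exactly "Ready". Note that a cordoned
# node ("Ready,SchedulingDisabled") is also reported.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage:
#   kubectl get nodes | not_ready_nodes
```

An empty result means every node in the list reports Ready.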

The remainder of the deployment is pretty much the same as before. As you go about deploying the CPI (Cloud Provider Interface) and the CSI (Container Storage Interface), you will notice that some of the manifests reference a sub-folder called 1.14. It is OK to continue using these manifests, even for later versions (1.16.3) of K8s.

4. A word about CPI and CSI configuration files

The remaining steps involve deploying the CPI and CSI drivers so that your Kubernetes cluster can consume vSphere storage, and have the usage bubbled up in the vSphere Client. However, something that has caught numerous people out, including myself, is that the CPI and CSI configuration filenames are hard-coded: for CPI you must use a configuration file called vsphere.conf, and for CSI you must use a configuration file called csi-vsphere.conf. I did a quick exercise to show the sorts of failures you can expect to see if different configuration filenames are used.

4.1 CPI failure scenario

After deploying the CPI yaml manifests, the first thing to check is that the cloud-controller-manager pod deployed successfully. You will see something like this if it did not:

root@k8s-master-01:~# kubectl get pods -n kube-system
NAME                                     READY   STATUS             RESTARTS   AGE
..
kube-apiserver-k8s-master-01             1/1     Running            0          13m
kube-controller-manager-k8s-master-01    1/1     Running            0          13m
kube-proxy-58pkq                         1/1     Running            0          9m10s
kube-proxy-gqzpc                         1/1     Running            0          13m
kube-proxy-qx4rd                         1/1     Running            0          7m7s
kube-scheduler-k8s-master-01             1/1     Running            0          13m
vsphere-cloud-controller-manager-4792t   0/1     CrashLoopBackOff   5          5m50s

Let’s look at the logs from the pod. I’ve just cut a few snippets out of the complete log output:

root@k8s-master-01:~# kubectl logs vsphere-cloud-controller-manager-4792t -n kube-system
I0305 13:54:17.317921 1 flags.go:33] FLAG: --address="0.0.0.0"
I0305 13:54:17.318366 1 flags.go:33] FLAG: --allocate-node-cidrs="false"
I0305 13:54:17.318383 1 flags.go:33] FLAG: --allow-untagged-cloud="false"
...
I0305 13:54:17.318492 1 flags.go:33] FLAG: --cloud-config="/etc/cloud/vsphere.conf"
I0305 13:54:17.318496 1 flags.go:33] FLAG: --cloud-provider="vsphere"
...
F0305 13:54:18.517986 1 plugins.go:128] Couldn't open cloud provider configuration \
    /etc/cloud/vsphere.conf: &os.PathError{Op:"open", Path:"/etc/cloud/vsphere.conf", Err:0x2}
goroutine 1 [running]:
k8s.io/klog.stacks(0x37e9801, 0x3, 0xc0007fe000, 0xb4)
    /go/pkg/mod/k8s.io/klog@v0.3.2/klog.go:900 +0xb1
k8s.io/klog.(*loggingT).output(0x37e98c0, 0xc000000003, 0xc000423490, 0x3751216, 0xa, 0x80, 0x0)
    /go/pkg/mod/k8s.io/klog@v0.3.2/klog.go:815 +0xe6
k8s.io/klog.(*loggingT).printf(0x37e98c0, 0x3, 0x2008bd6, 0x32, 0xc0005818a0, 0x2, 0x2)
    /go/pkg/mod/k8s.io/klog@v0.3.2/klog.go:727 +0x14e
k8s.io/klog.Fatalf(...)

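The fatal error shows the CPI trying to open /etc/cloud/vsphere.conf and failing; Err:0x2 is errno 2, “no such file or directory”. So ensure that you use the vsphere.conf filename for the CPI.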

4.2 CSI failure scenario

CSI is similar to CPI in that it requires a hard-coded configuration filename. Here is what you might observe if the csi-vsphere.conf name is not used; the log snippet is taken from the CSI controller pod.

root@k8s-master-01:~# kubectl logs vsphere-csi-controller-0 -n kube-system vsphere-csi-controller
I0305 16:39:14.210288 1 config.go:261] GetCnsconfig called with cfgPath: /etc/cloud/csi-vsphere.conf
I0305 16:39:14.210376 1 config.go:265] Could not stat /etc/cloud/csi-vsphere.conf, reading config params from env
E0305 16:39:14.210401 1 config.go:202] No Virtual Center hosts defined
E0305 16:39:14.210415 1 config.go:269] Failed to get config params from env. Err: No Virtual Center hosts defined
E0305 16:39:14.210422 1 service.go:103] Failed to read cnsconfig. Error: stat /etc/cloud/csi-vsphere.conf: no such file or directory
I0305 16:39:14.210430 1 service.go:88] configured: csi.vsphere.vmware.com with map[mode:controller]
time="2020-03-05T16:39:14Z" level=info msg="removed sock file" path=/var/lib/csi/sockets/pluginproxy/csi.sock
time="2020-03-05T16:39:14Z" level=fatal msg="grpc failed" error="stat /etc/cloud/csi-vsphere.conf: no such file or directory"
root@k8s-master-01:~#

The take-away is to not deviate from the hard-coded configuration filenames for both the CPI and CSI.
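A likely reason the filename matters so much, assuming the configuration reaches the pods through a ConfigMap or Secret created with --from-file as I believe the tutorial does: kubectl uses the file’s basename as the key unless you supply one, and that key becomes the filename mounted under /etc/cloud. A small guard can catch a misnamed file before anything is created. The function name require_conf_name and the example paths are hypothetical.

```shell
# Fail unless the basename of the given path is exactly the expected name.
require_conf_name() {
  expected="$1"
  actual=$(basename "$2")
  if [ "$actual" != "$expected" ]; then
    echo "expected a file named $expected, got $actual" >&2
    return 1
  fi
  return 0
}

# Usage, before creating the ConfigMap or Secret from each file:
#   require_conf_name vsphere.conf     /etc/kubernetes/vsphere.conf
#   require_conf_name csi-vsphere.conf /etc/kubernetes/csi-vsphere.conf
```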