The Good Part

Disclaimer: I do NOT consider myself a Kubernetes network expert. Not being an expert, and with the lack of documentation on this specific subject, what is presented in this article was found after a lot of trial and error and a lot of digging through GitHub issue comments. The steps might not be optimal or the right way to do it, but they worked on multiple setups of ours, and we have a real product running on the Kubernetes cluster, so follow these instructions at your own risk.

Changing from Weave to Calico is a 3 step process:

1. Ensuring that your network setup is right
2. Removing Weave
3. Installing Calico

Pre-requisites

- You have ssh access to all the nodes that are part of the Kubernetes cluster and can escalate privileges to execute some of the actions.
- All these steps were tested on a Kubernetes 1.13 cluster. They should work for Kubernetes 1.11+, but there are no guarantees.
- This was also tested with Calico 3.9.1, but the steps should work with Calico 3.10+.
- kubectl is configured with cluster admin access.
- Weave is installed as a CNI plugin/add-on.
- This is specific to an IPv4 cluster network setup; for IPv6 some of the options to install Calico might be different.

A small warning: these steps will cause some downtime in your Kubernetes cluster.

Checking your network setup

This step might be optional if you are super sure that your network is set up correctly, but if that is the case you probably wouldn’t be reading this article :)

These are some of the things that we found while investigating our network issues.

kube-apiserver

Most of its settings can be found at /etc/kubernetes/manifests/kube-apiserver.yaml in your master nodes:

- The --service-cluster-ip-range CIDR doesn’t overlap with the kube-proxy --cluster-cidr parameter, and the range is not in use by anything else.
- Ensure that the Kubernetes API is correctly exposed. You can do this by running kubectl get ep kubernetes and checking that all the IP addresses listed are in the same subnet through which nodes can talk to each other. If that is not the case, which can happen when there are multiple NICs in the nodes, you will need to tweak the --bind-address and --advertise-address parameters of the kube-apiserver so it exposes the API on the address that you want.

kube-proxy

Its settings can be found at /etc/kubernetes/manifests/kube-proxy.yaml in all your nodes. There are 2 things we need to look for:

- The --cluster-cidr parameter doesn’t overlap with the --service-cluster-ip-range of the kube-apiserver, and it is not in use by anything else.
- Ensure that --masquerade-all is either not set or set to false, as it would SNAT all your pod communication, making it impossible for NetworkPolicies to work, since they use the source IP to match iptables rules. This is recommended by both Weave and Calico, and also in the kube-proxy documentation itself.

Also, take note of the --cluster-cidr value and of one of the IP addresses of your Kubernetes endpoint (kubectl get ep kubernetes); we will use these later on to set up Calico.
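Since both checks above boil down to "these two CIDR ranges must not overlap", here is a minimal pure-shell sketch of an overlap check; the CIDR values used in the call are example ranges, not taken from any real cluster:

```shell
# Minimal sketch: check that two IPv4 CIDR ranges do not overlap.
# The ranges passed at the bottom are example values only.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "overlap" if the two CIDRs intersect, "ok" otherwise.
cidr_overlap() {
  local net1=${1%/*} len1=${1#*/} net2=${2%/*} len2=${2#*/} len mask
  # Two ranges intersect iff they agree on the shorter (wider) prefix.
  if [ "$len1" -lt "$len2" ]; then len=$len1; else len=$len2; fi
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  if [ $(( $(ip2int "$net1") & mask )) -eq $(( $(ip2int "$net2") & mask )) ]; then
    echo overlap
  else
    echo ok
  fi
}

cidr_overlap 10.244.0.0/16 10.96.0.0/12   # prints "ok"
```

Run it with your actual pod CIDR and service CIDR; anything other than "ok" means you need to fix the ranges before continuing.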

To understand more on what these parameters do, feel free to read more on them and all the other options at:

Remove Weave

Removing Weave itself consists of a few substeps:

1. Remove the Kubernetes objects
2. Clean the CNI configuration and binaries
3. Remove the weave interface from all the Kubernetes nodes

Step 1: Remove Weave objects

Warning: as soon as you perform this step, pods might not be able to communicate with each other, new pods might fail to get scheduled, and nodes can be marked as NotReady.

a) If you still have the Weave YAML file from which you installed Weave, this step could be as simple as:

kubectl delete -n kube-system -f weave-net.yaml

This should remove all the objects installed by Weave.

b) If you have an internet connection that you trust and don’t care about backing up your current setup, you can delete against the Weave YAML downloaded from the official URL; this will delete all the objects that a Weave installation would apply:

kubectl delete -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

This URL is taken from the official Weave install documentation.

c) For manual deletion these are all the components (at the moment of writing this article) that Weave installs and that should be removed:

# Optional: To backup your current setup

kubectl get -o yaml --export -n kube-system daemonset weave-net > weave-net-ds.yaml

kubectl get -o yaml --export -n kube-system rolebinding weave-net > weave-net-rb.yaml

kubectl get -o yaml --export -n kube-system role weave-net > weave-net-role.yaml

kubectl get -o yaml --export -n kube-system clusterrolebinding weave-net > weave-net-crb.yaml

kubectl get -o yaml --export -n kube-system clusterrole weave-net > weave-net-crole.yaml

kubectl get -o yaml --export -n kube-system serviceaccount weave-net > weave-net-sa.yaml

# Delete weave objects

kubectl delete -n kube-system daemonset weave-net

kubectl delete -n kube-system rolebinding weave-net

kubectl delete -n kube-system role weave-net

kubectl delete -n kube-system clusterrolebinding weave-net

kubectl delete -n kube-system clusterrole weave-net

kubectl delete -n kube-system serviceaccount weave-net
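The six delete commands above can be collapsed into one loop. The `echo` below makes it a dry run that only prints each command; drop it when you are ready to delete for real:

```shell
# Dry run: print the delete command for each Weave object kind.
# Remove the `echo` to actually delete the objects.
for kind in daemonset rolebinding role clusterrolebinding clusterrole serviceaccount; do
  echo kubectl delete -n kube-system "$kind" weave-net
done
```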

Additionally, if you have network encryption with Weave you should also remove the secret created for it:

kubectl get secret -n kube-system

And look for the relevant Weave secret and delete it.

The really important component here is the weave-net DaemonSet, because it is the one that actually does all the work. All the others are optional (including the secret), but it is better to ensure that you have a clean slate before proceeding.

Step 2: Clean CNI files

The commands in this section are meant to be run on ALL the nodes that are part of the Kubernetes cluster.

Now that we have removed Weave from Kubernetes, there are a few files that we need to clean up on the nodes themselves. All your nodes should have a directory /opt/cni/bin where Weave (and any CNI plugin) places the binaries it needs to run. Go into that directory and remove any Weave binaries:

rm -f /opt/cni/bin/weave*

Do not remove the entire directory! There are other binaries in that directory that other Kubernetes components, including Calico, use to run. I learned that the hard way…

The initialization configuration that nodes use to start their Kubernetes network lives at /etc/cni/net.d and needs to be removed, as it will conflict with the one from Calico once it gets installed. It should be safe to remove this directory completely, as it will be recreated when Calico starts.

rm -rf /etc/cni/net.d

If you don’t feel safe doing that, then you can go in there and delete the files with weave in their name:

rm -f /etc/cni/net.d/*weave*

These directories and files will be recreated by Weave in case you decide to roll back, so don’t worry too much about losing them.
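Since this step has to run on every node, it can help to drive it over ssh from one machine. This is a dry-run sketch: the node names are placeholders for your real hostnames, and the `echo` only prints each command instead of running it.

```shell
# Dry run: print the CNI cleanup command for every node.
# NODES is a placeholder -- replace with your real node hostnames,
# and drop the `echo` to actually run the cleanup over ssh.
NODES="node-1 node-2 node-3"
for node in $NODES; do
  echo ssh "$node" 'sudo rm -f /opt/cni/bin/weave* && sudo rm -rf /etc/cni/net.d'
done
```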

Step 3: Remove the weave interface

Back on each of the nodes, we will remove the weave interface as the final step. We only need to run a couple of commands for this:

ip link set weave down

ip link delete weave

And that is it, Weave is gone now. And don’t worry: in case you need to roll back, this interface will be recreated by Weave.

Install Calico

Installing Calico is (surprise, surprise) a 3 step process:

1. Get the images.
2. Get the YAML and configure it.
3. Deploy the YAML.

Step 1: Get the images

If your nodes and pods have access to the public Docker registry you can skip this step. If they don’t, you will need to pull the Calico images on all your nodes (I highly recommend doing this either way), so the Calico pods can run without issues:

docker pull calico/node:v3.9.1

docker pull calico/pod2daemon-flexvol:v3.9.1

docker pull calico/cni:v3.9.1

docker pull calico/kube-controllers:v3.9.1

You might want to use a different version of Calico, for that just replace the v3.9.1 with the version that you want.
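To avoid repeating the version four times, the pulls above can be generated from a single value. This is just a convenience sketch; the helper name calico_images is my own, not part of any tooling:

```shell
# Print the four Calico image refs for a given version.
calico_images() {
  local version="$1"
  local img
  for img in node pod2daemon-flexvol cni kube-controllers; do
    echo "calico/${img}:${version}"
  done
}

calico_images v3.9.1
# On each node, pipe the list into docker to pull everything:
#   calico_images v3.9.1 | xargs -n1 docker pull
```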

In case your cluster only has access to a private registry, you will need to tag the images and push them to that registry, so they stay available if they ever get deleted from the nodes. Even in this case I would recommend having the images pre-pulled on all the nodes.

Step 2: Configure Calico

From the machine with kubectl , get the Calico YAML file with:
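The original command did not survive here, but assuming the standard manifest URL from the Project Calico v3.9 documentation, the download would look something like this:

```shell
# Download the Calico v3.9 manifest into the current directory
# (URL assumed from the Calico v3.9 docs -- verify before running)
curl -O https://docs.projectcalico.org/v3.9/manifests/calico.yaml
```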

This particular calico.yaml file is for Calico v3.9; for other versions visit the Project Calico website.

Using your favourite editor, there are a few tweaks that we need to make to this file before deploying it.

If in Step 1 you used a private registry for the Calico images, scan for all the image: keys and prefix their values with your registry of choice.

Look for the calico-node DaemonSet, specifically the calico-node container (spec.template.spec.containers[name=calico-node]); there should be only one container, the rest are just initContainers. Look for the env section; there are a couple of values that we are going to update here:

- The CALICO_IPV4POOL_CIDR value should match or fall within the kube-proxy --cluster-cidr range.
- In case you have multiple NICs in your nodes, add the environment variable IP_AUTODETECTION_METHOD, using the value of the Kubernetes API endpoint IP that you collected in the first section of this article. Your entry should look something like this:

- name: IP_AUTODETECTION_METHOD

value: "can-reach=<kube_api_ep_ip>"

By default Calico will pick the first interface that it encounters, which might not be the one used for talking between nodes. There are other detection methods provided by Calico, such as specifying a pattern or the precise name of the interface to use, but note that this setting applies to all the nodes, so be sure to choose one that fits your needs.
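Putting both settings together, the env section of the calico-node container might look like the sketch below. The CIDR is an example value, and interface=eth1 stands in for whatever your inter-node NIC is actually called; substitute your own values.

```yaml
# Hedged example -- the values here are placeholders
env:
  # must match or fall within kube-proxy's --cluster-cidr
  - name: CALICO_IPV4POOL_CIDR
    value: "10.244.0.0/16"
  # pick the NIC that can reach the API endpoint IP collected earlier,
  # or keep the can-reach form shown above
  - name: IP_AUTODETECTION_METHOD
    value: "interface=eth1"
```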

More configuration options can be found here:

Step 3: Deploy Calico

You made it! There is only one thing left to do:

kubectl apply -f calico.yaml

And that is it, you now should have Calico up and running!

Well actually…

I lied a little bit. Because you just changed your network controller, doing a full restart of your cluster might be a good idea. If you don’t want to restart your VMs, then here is how I did it:

### In all your cluster nodes

service kubelet stop

# Just to ensure that old Weave rules are not present
iptables -t nat -F && iptables -t mangle -F && iptables -F && iptables -X

service kubelet start

### From a machine with kubectl access

# Run this a few times and ensure that all your nodes are marked as Ready
kubectl get nodes

# And recreate all your pods (look at this as re-registering your pods with the network)

for ns in $(kubectl get ns -o name | cut -f2 -d'/'); do

kubectl delete pods -n $ns --all

done

If your Kubernetes applications are set up correctly, they should recover after this restart.

And now we are truly done!

Any comments or feedback are truly appreciated. @AndresPerezl