As Kubernetes takes over the world and spreads everywhere, the number of people confused about cluster networking keeps growing. And once these people finally grasp the Kubernetes networking model, they are faced with another problem: choosing a CNI plugin, which is a hard choice to make.

Even if you have a managed cluster, it's still a good idea to know a bit more about the different CNI plugins. Choosing a big cloud provider, like GCP or AWS, won't save you from making this decision. If your cluster happens to grow into thousands of services, the CNI plugin is one of the first things to look at in order to scale smoothly, so you might need to change it down the line. And if you want custom network policies, again, you may have to switch to another CNI plugin.



On the other hand, if your cluster runs on bare metal, there is no escaping this decision: you have to choose a CNI plugin up front. You'd better get it right, as changing it later without breaking the cluster can be quite challenging.



I will not go through the list of CNI plugins and evaluate them here, as there are many options and each has its use case. The purpose of this blog post is to explain why we're quite happy with Cilium and what it brings to the table.

Our Story

We're building an internal cloud for a client that is one of the biggest automotive manufacturers in the world. Naturally, the cluster we provide should be able to scale to thousands of services. When we started working on it, we had no idea that the CNI plugin would play such an important role in reaching that goal.



After checking the networking provider list in the Kubernetes documentation, we decided to go with Flannel. It worked well for us until we decided to use the Border Gateway Protocol, at which point we switched to Calico (more on that here). We were very pleased with Calico until we noticed a huge number of iptables rules on our nodes. Wait, why would that be a problem?



I think it’s time to pause the story for a moment and explain why iptables is relevant here.



In the default iptables proxy mode, the kube-proxy component of Kubernetes, which runs on every node in the cluster, watches the Kubernetes master node(s) for the addition and removal of Service and Endpoint objects. For each object, it installs the necessary iptables rules, so that traffic sent to and from the pods is properly routed to the node and port backing each Service.

So, every packet received or transmitted is matched against this list of rules, one by one. Since we aim to scale to thousands of services, this could create a serious bottleneck in our system.
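To make the cost concrete, here is a minimal Python sketch of what linear rule matching implies. The rule set, IPs, and backend names are made up for illustration; this is a model of the behavior, not actual kube-proxy code:

```python
# Hypothetical model of iptables-style matching: each incoming packet is
# checked against every rule in order until one matches.
rules = [
    # (destination_ip, destination_port, target_backend)
    ("10.96.0.%d" % (i % 250), 8000 + i, "backend-%d" % i)
    for i in range(10_000)  # one rule per service, 10k services
]

def match(packet_ip, packet_port):
    comparisons = 0
    for dst_ip, dst_port, backend in rules:  # O(n) linear scan
        comparisons += 1
        if dst_ip == packet_ip and dst_port == packet_port:
            return backend, comparisons
    return None, comparisons

# A packet destined for the last service has to walk the entire chain.
backend, comparisons = match("10.96.0.249", 8000 + 9_999)
print(backend, comparisons)  # → backend-9999 10000
```

Every extra service makes the worst-case scan longer, and the same linearity applies to rule updates, which is exactly what bites at scale.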



You can find a good discussion of this here: How many and how big services should Kubernetes scale to?

Around that time, we came across this blog post about Cilium, another CNI plugin, which, by contrast, does not rely on iptables. Instead, it uses a Linux kernel technology called BPF (eBPF, to be precise). I don't want to dive into BPF here, as it's already covered in the blog post I just mentioned, but this part is important:



“The aforementioned KubeCon Talk performed specific measurements on iptables as a bottleneck for Kubernetes service forwarding and noted that throughput degraded by ~30% with 5,000 services deployed, and by 80% with 10,000 services (a 6X performance difference). Likewise, rule updates at 5,000 services took 11 minutes, ages in a world of continuous delivery.



Thanks to the flexibility of BPF, Cilium performs this same operation with O(1) average runtime behavior using a simple BPF map based hash table, meaning the lookup latency at 10,000 or even 20,000 services is constant. Likewise, updates to these BPF maps from userspace are highly-efficient, meaning that even with 20,000+ services, the time to update a forwarding rule is microseconds, not hours.



For these reasons, Facebook has recently presented their use of BPF and XDP for load-balancing in a public talk to replace IPVS after measuring an almost 10x performance increase.”
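A BPF map is, in essence, a hash table that lives in the kernel. A Python dict has the same average-case behavior, so the contrast with the linear scan above can be sketched like this (illustrative names and numbers, not Cilium code or a benchmark):

```python
# Hypothetical service table modeled as a hash map, analogous to a BPF
# hash-table map: lookup cost does not grow with the number of services.
service_map = {
    ("10.96.0.%d" % (i % 250), 8000 + i): "backend-%d" % i
    for i in range(20_000)  # 20k services
}

def lookup(packet_ip, packet_port):
    # A single hashed lookup, O(1) on average regardless of table size.
    return service_map.get((packet_ip, packet_port))

print(lookup("10.96.0.249", 8000 + 19_999))  # → backend-19999
```

Updates get the same benefit: replacing one entry in a hash table touches one slot, instead of rewriting and reloading a long chain of rules.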

This was convincing enough for us to give Cilium a shot.

Since our cluster runs on bare metal, we use MetalLB as a load balancer. Our biggest worry was that Cilium wouldn't work well with it. That worry turned out to be unfounded: the two work seamlessly together, and the number of iptables rules on our nodes dropped dramatically.



The second most important feature of Cilium for us is its custom network policies, which operate at Layer 7 and let us enforce rules on both ingress and egress. You can allow or block requests based on path, header, and request method, all without touching application code. That is great flexibility, especially if you combine Cilium network policies with a service mesh technology such as Istio.
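As an illustration, an L7 policy of roughly this shape can restrict ingress to a single HTTP method and path. This is a sketch following the CiliumNetworkPolicy format; the names, labels, and port here are made up for the example:

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-healthz        # hypothetical policy name
spec:
  endpointSelector:
    matchLabels:
      app: my-service            # hypothetical label on the protected pods
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: my-client           # hypothetical label on the allowed caller
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/healthz"       # only GET /healthz is allowed through
```

Anything that doesn't match the HTTP rule, say a POST to the same port, is rejected at the network layer rather than in the application.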



Cilium also plays well with Istio, and the community even has plans to reduce Istio's latency by using an in-kernel proxy instead of Istio's userspace Envoy proxy. You can read more about it here.



Speaking of the community, I have to say that one of the upsides of switching to Cilium is its community. They are very helpful in tracking down Cilium-related issues in your cluster. Don't be surprised if you find yourself debugging a networking issue at 10 pm with its core contributors. I strongly suggest joining Cilium's Slack channel.



Considering its capabilities, Cilium is an underrated technology, and I personally believe we will hear much more about it and BPF in the coming years.



Thanks to Thomas Graf and all other contributors of Cilium.