by Alexander Kirillov

Networking has always been seen as the black sheep of the IT world, or at least one of the blackest. This becomes especially true when moving from a monolithic architecture to microservices, because the number of components that need to communicate with each other explodes.

In this context, it is fundamental to have an infrastructure that can handle the networking complexity of a microservices environment and make network management as easy and smooth as possible. Kubernetes is one of the platforms that exemplifies this kind of infrastructure.

Let’s see how this container orchestrator handles the topic and routes traffic between its containers. There are three cases, which we will describe in this article:

Traffic between containers in the same pod

Traffic between containers in different pods in the same node

Traffic between containers in different pods in different nodes

Traffic between containers in the same pod

It is worth recalling that Kubernetes implements a networking model in which all containers in a Pod share the same IP address. As a result, if a container wants to talk to another one, it just needs to send traffic to the right port on localhost. The Pod must be configured to avoid port allocation conflicts between containers: two containers in the same Pod must not bind to the same port.

From a technical point of view, Kubernetes implements this by creating a virtual network interface that is shared between the containers of the pod.
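This localhost-only communication can be sketched with a small simulation: two threads stand in for two containers sharing a Pod’s network namespace. The message text and the use of an OS-assigned port are illustrative choices, not anything Kubernetes itself prescribes.

```python
import socket
import threading

# The "server" container binds a port inside the shared namespace.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]

def server_container():
    # Accept one connection and answer, then exit.
    conn, _ = srv.accept()
    with conn:
        conn.sendall(b"hello from the server container")

t = threading.Thread(target=server_container)
t.start()

# The "client" container dials localhost directly: since both containers
# share the Pod's IP address, no service discovery is needed at all.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", port))
    msg = cli.recv(1024).decode()

t.join()
srv.close()
print(msg)  # hello from the server container
```

Trying to bind the same port a second time while the first socket holds it would raise an error, which is exactly why two containers in the same Pod must not be configured to use the same port.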

Traffic between containers in different pods in the same node

For the traffic between containers in different pods, Kubernetes relies on a network bridge, which has the role of an L2 switch. All the virtual network interfaces of the different pods inside a node are connected to this bridge. This means that the Pods are in the same L2 subnet, the one defined by the bridge.

Let’s assume that container a wants to send a packet to container d. Here is what happens behind the scenes:

1. Container a sends the packet through its only network interface, eth0. This interface in the container’s network namespace is the counterpart of vethxxx, the virtual interface on the host network namespace.
2. The packet arrives at the bridge docker0.
3. The bridge forwards the packet to vethyyy after resolving the MAC address with an ARP request (don’t forget that vethxxx and vethyyy are in the same L2 network).
4. The packet arrives at vethyyy and is delivered to the right container depending on the port it is addressed to.
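The bridge’s role as an L2 switch can be modeled in a few lines: it learns which “port” (veth interface) each MAC address lives behind, and floods frames with an unknown destination, much like the ARP broadcast above. This is a toy model; the interface names and MAC addresses are made up for illustration.

```python
# Toy L2 bridge: forwards frames between veth "ports" based on a
# learned MAC table, flooding when the destination MAC is unknown.
class Bridge:
    def __init__(self):
        self.mac_table = {}   # MAC address -> port name
        self.ports = {}       # port name -> receive callback

    def attach(self, port, on_receive):
        self.ports[port] = on_receive

    def send(self, in_port, src_mac, dst_mac, payload):
        self.mac_table[src_mac] = in_port   # learn where src lives
        out = self.mac_table.get(dst_mac)
        if out is not None:
            self.ports[out](payload)
        else:
            # Unknown destination: flood every port except the sender.
            for name, deliver in self.ports.items():
                if name != in_port:
                    deliver(payload)

received = []
br = Bridge()
br.attach("vethxxx", lambda p: received.append(("vethxxx", p)))
br.attach("vethyyy", lambda p: received.append(("vethyyy", p)))

# Container a (behind vethxxx) pings container d (behind vethyyy);
# the first frame is flooded, then the reply uses the learned table.
br.send("vethxxx", "aa:aa", "dd:dd", b"ping")
br.send("vethyyy", "dd:dd", "aa:aa", b"pong")
print(received)
```

After the first frame, the bridge has learned both MAC addresses and subsequent frames go straight to the right veth interface.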

Traffic between containers in different pods in different nodes

In this case, an overlay network is necessary to connect the nodes with each other. Several implementations exist, but the simplest one, and probably the most widely used in Kubernetes, is Flannel, a networking plugin developed by CoreOS.

The overlay network makes sure that the traffic from one container is routed to the node that hosts the destination container. It keeps an up-to-date routing table in each node with the mapping between the Pod IP ranges (Pod CIDR) and the node the traffic should be routed to in order to reach these IP ranges.
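The routing table described above boils down to a CIDR lookup: given a destination Pod IP, find the node whose Pod CIDR contains it. A minimal sketch with Python’s standard ipaddress module follows; the node names and 10.244.x.0/24 ranges are hypothetical (though 10.244.0.0/16 is a common Flannel default).

```python
import ipaddress

# Hypothetical cluster layout: each node owns one Pod CIDR, as the
# overlay network records in its per-node routing table.
routes = {
    "node-1": ipaddress.ip_network("10.244.1.0/24"),
    "node-2": ipaddress.ip_network("10.244.2.0/24"),
}

def node_for(pod_ip):
    """Return the node whose Pod CIDR contains pod_ip, or None."""
    ip = ipaddress.ip_address(pod_ip)
    for node, cidr in routes.items():
        if ip in cidr:
            return node
    return None

print(node_for("10.244.2.17"))  # this Pod IP falls in node-2's CIDR
```

Because each node owns a disjoint CIDR, the lookup is unambiguous, which is exactly the non-conflict property the overlay network guarantees.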

The Flannel daemon creates a TUN interface on each node. Kubernetes traffic goes through this interface; the daemon encapsulates it in UDP packets and sends them to the right host.
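The encapsulation step can be sketched as wrapping the original Pod-to-Pod packet in a UDP datagram addressed to the destination node. The “header” format below is invented for illustration (real Flannel carries full IP packets), and a localhost UDP socket stands in for node 2:

```python
import socket

# Stand-in for node 2's flannel daemon: a UDP listener.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))
node2_addr = recv_sock.getsockname()

def encapsulate(src_pod_ip, dst_pod_ip, payload):
    # Minimal made-up "inner header": packed source and destination
    # Pod IPs, followed by the original packet bytes.
    return (socket.inet_aton(src_pod_ip)
            + socket.inet_aton(dst_pod_ip)
            + payload)

def decapsulate(datagram):
    src = socket.inet_ntoa(datagram[:4])
    dst = socket.inet_ntoa(datagram[4:8])
    return src, dst, datagram[8:]

# "Node 1" wraps the Pod packet and addresses the UDP datagram to node 2.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(encapsulate("10.244.1.5", "10.244.2.17", b"hello"),
                 node2_addr)

# "Node 2" unwraps it and recovers the original Pod addresses.
src, dst, payload = decapsulate(recv_sock.recvfrom(2048)[0])
send_sock.close()
recv_sock.close()
print(src, dst, payload.decode())
```

The key idea is that the outer UDP packet only needs node-level addressing; the Pod-level addresses travel untouched inside the payload.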

The overlay network also makes sure that the Pod CIDR in each node doesn’t conflict with the Pod CIDR of another node in the Kubernetes cluster.

Let’s assume again that container a wants to send a packet to container d. Here is what happens:

1. Container a sends the packet through its only network interface, eth0. This interface in the container’s network namespace is the counterpart of vethxxx, the virtual interface on the host network namespace.
2. The packet arrives at docker0. However, since vethxxx and vethyyy are not in the same subnet, the host needs to consult its routing table. This is how it learns that the packet is addressed to a Pod on another node, and it sends the packet to the flannel0 TUN device.
3. Behind the flannel0 device sits the flannel daemon, which maintains the routing table of the Pods (it communicates with the Kubernetes API server) and knows that to reach the destination Pod’s IP address the packet should be sent to node 2.
4. The daemon encapsulates the packet and writes it back to the kernel with the IP address of node 2 as the destination.
5. The default interface of node 1 sends the packet to node 2.
6. The default interface of node 2 receives the packet and hands it to the flannel daemon.
7. The flannel daemon extracts the inner packet addressed to the container from the encapsulated packet and writes it to the kernel.
8. The kernel checks its routing table and routes the packet to docker0. Once there, an ARP request is broadcast to the Pods in the node to find out which virtual interface should receive the packet.
9. The packet arrives at vethyyy and is delivered to the right container depending on the port it is addressed to.

Does it seem complex? Here is the good news…

All this seems very complex and at odds with the requirement of a simple networking system that lets containers work smoothly. So what is so fascinating about it? Actually, in day-to-day operations you just don’t have to manage this complexity. The initial configuration of Kubernetes networking is not that difficult, and it’s even nonexistent if you use a managed Kubernetes service. So here is the good news: you don’t need to care about the networking, because Kubernetes manages it for you. Just plug your cluster in and play!

Thanks for reading! Feel free to leave your feedback. Don’t forget to follow us on Twitter and join our Telegram chat to stay tuned!

You might also want to check our Containerum project on GitHub. We need your feedback to make it stronger — you can submit an issue, or just support the project by giving it a ⭐. Your support really matters to us!