The main purpose of this blog is to help Kubernetes users get comfortable with the major K8s network components, common usage patterns, and the corresponding troubleshooting tools. This should give you a good foundation for designing your next cluster, or for analyzing network issues in an existing cluster and suggesting improvements.

First question: kube-proxy is a critical, required component in every K8s cluster, so which mode is the right one for you, iptables or IPVS?

Next, how do you choose the best L2/L3 network solution? Kube-router, Calico, Flannel, or others?

Once the cluster is deployed and the network is up and running, what tools can you use to verify the expected routing and load-balancing behavior?

To answer all the above questions, we set up the three network combinations below, aiming to cover the most common scenarios.

Network Plugins and KubeProxy modes Combinations

Cluster A: Calico(ipip cross-subnet) + KubeProxy(IPVS mode)

Cluster B: Calico(ipip always) + KubeProxy(iptables mode)

Cluster C: Kube-router + KubeProxy(iptables mode)

Cluster A: Calico(ipip cross-subnet) + KubeProxy(IPVS mode)

In this cluster, Calico is the network plugin and kube-proxy runs in IPVS mode.

IP-in-IP encapsulation uses the cross-subnet mode, meaning it is applied only to traffic crossing subnet boundaries. This gives better performance in AWS multi-AZ deployments, since same-subnet traffic skips the encapsulation overhead.

Let’s take a look at the worker node routing table.

Calico cross-subnet mode routing table

tunl0 is used for cross-subnet communication between nodes.

The VM eth0 is used for intra-subnet node communication.

Notice the 10.0.7.0/24 blackhole route: this subnet is used by the pods local to the worker node, which communicate through the cali- interfaces.
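As a rough sketch of what such a cross-subnet routing table looks like (node and pod addresses here are assumed for illustration, reusing the 10.0.7.0/24 local pod CIDR mentioned above):

```shell
# Illustrative `ip route` output on a worker node; all addresses are assumed.
ip route
# default via 10.0.1.1 dev eth0
# blackhole 10.0.7.0/24 proto bird                       # local pod CIDR
# 10.0.7.5 dev cali1a2b3c4d5 scope link                  # one route per local pod
# 10.0.8.0/24 via 10.0.1.20 dev eth0 proto bird          # same-subnet node: no tunnel
# 10.0.9.0/24 via 10.0.2.30 dev tunl0 proto bird onlink  # cross-subnet node: IP-in-IP
```

The three interface types (eth0, tunl0, cali-) each play exactly the role described above.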

For all three interface types in the routing table, we can find the corresponding entries in the ip addr output below.

Worker node Network interfaces

Command: ip addr

Calico cross-subnet mode network interface

From the result above, we can see that the IPVS proxier creates a dummy interface, kube-ipvs0, and binds the service IP addresses to it.

Notice that each service IP under the kube-ipvs0 interface has a matching entry in the IPVS load-balancing table, as shown in the ipvsadm output below.

IPVS-Based Load Balancing

IPVS (IP Virtual Server) is built on top of Netfilter and implements transport-layer load balancing as part of the Linux kernel.

Kube-proxy can configure IPVS to handle the translation of virtual Service IPs to pod IPs.

In the snippet below, we can see each service cluster IP load balancing across its pod IPs.

ipvsadm -ln

10.1.0.1:443 refers to the default kubernetes API server service; the backend IPs are the master node IPs.

10.1.0.10:53 refers to the CoreDNS service; the backend IPs point to the two CoreDNS pods.
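Putting the two examples together, the relevant entries look roughly like the sketch below (the service IPs come from the text above; backend addresses are assumed for illustration):

```shell
# Illustrative `ipvsadm -ln` output; backend addresses are assumed.
ipvsadm -ln
# Prot LocalAddress:Port Scheduler Flags
#   -> RemoteAddress:Port    Forward Weight
# TCP  10.1.0.1:443 rr
#   -> 10.0.1.10:6443        Masq    1      # master node IP
# TCP  10.1.0.10:53 rr
#   -> 10.0.7.2:53           Masq    1      # coredns pod
#   -> 10.0.8.2:53           Masq    1      # coredns pod
```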

IPVS supports many more load-balancing algorithms than iptables mode (which only offers equal-probability random selection). These scheduling algorithms are implemented as kernel modules, and ten ship with the Linux Virtual Server.
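The scheduler is selectable through kube-proxy configuration. A minimal fragment (flag form shown; the same fields exist in the KubeProxyConfiguration API):

```shell
# Run kube-proxy in IPVS mode with an explicit scheduling algorithm.
kube-proxy \
  --proxy-mode=ipvs \
  --ipvs-scheduler=rr   # e.g. rr, wrr, lc, wlc, sh, dh, sed, nq
```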

Cluster B: Calico(ipip always) + KubeProxy(iptables mode)

In this cluster, the IP-in-IP mode is set to Always: Calico routes all traffic originating from a Calico-enabled node to all Calico-networked containers and nodes over IP-in-IP.

Notice in the routing table below:

The VM eth0 is not used by the Calico network.

Only tunl0 is used for inter-node traffic.

For the pods on the VM, the cali- interfaces are used.

Calico ipip always mode routing table

Network interfaces

The interface settings match the routing table: only eth0, tunl0, and cali- interfaces are used. No kube-ipvs0 is involved, since kube-proxy is running in iptables mode.

How about K8s service and pod load balancing?

Let’s take a look at the iptables output for a K8s ingress controller.

KUBE-SVC-3C2I2DNJ4VWZY5PF refers to an ingress controller service with cluster IP 10.1.60.159 (Scroll to the right in the gist to see more)

The service IP is load balanced across 3 pods; each KUBE-SEP chain is selected with equal probability, approximating round-robin.

The first record in the iptables output, with tcp dpt:30998, represents the node port used by the AWS ELB, which points to the ingress controller service.

The KUBE-SEP-XXXXXXXXXX chains refer to the ingress controller pods, which belong to a replica set of size 3:

KUBE-SEP-P6JNEFEXMECE2WS6 with a pod IP 10.0.11.27

KUBE-SEP-DX25GZBAXCASAQMI with a pod IP 10.0.35.21

KUBE-SEP-HI43CU4ZL6YUQHDB with a pod IP 10.0.11.26

iptables for k8s svc and pod
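The equal-probability selection above is worth spelling out: in iptables mode, kube-proxy does not do true round-robin. For N endpoints it emits N-1 "statistic --mode random" rules plus one unconditional rule, tuned so every endpoint receives a 1/N share of new connections. The small sketch below prints the per-rule probabilities for the three-pod ingress example:

```shell
# For N backends, rule i matches with probability 1/(N-i+1), so the
# overall share works out to 1/N per backend. N=3 gives 1/3, 1/2, rest.
n=3
for i in $(seq 1 "$n"); do
  if [ "$i" -lt "$n" ]; then
    awk -v i="$i" -v n="$n" \
      'BEGIN { printf "KUBE-SEP rule %d: --probability %.5f\n", i, 1/(n-i+1) }'
  else
    echo "KUBE-SEP rule $n: unconditional (takes the remainder)"
  fi
done
# KUBE-SEP rule 1: --probability 0.33333
# KUBE-SEP rule 2: --probability 0.50000
# KUBE-SEP rule 3: unconditional (takes the remainder)
```

These are exactly the probability values you will see attached to the KUBE-SEP jump rules in the iptables output.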

Cluster C: Kube-router + KubeProxy(iptables mode)

Similar to Calico cross-subnet mode, kube-router uses eth0 for intra-subnet traffic and tunneling for inter-subnet traffic between nodes.

For the pods on the node, kube-bridge carries container traffic before it hits the eth0 or tun- interfaces.

Network interfaces

As seen in the network interface output, there are two interesting interface types:

kube-bridge, sitting between the VM eth0 and the pod veth interfaces.

A veth pair, created between each pod and the VM.
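To see this topology on a node, standard iproute2 commands are enough (the veth names will differ per pod):

```shell
# List the veth interfaces enslaved to kube-bridge
ip link show master kube-bridge

# Show which bridge each veth belongs to
bridge link show
```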

More useful tools

crictl: CLI for kubelet CRI.

For k8s network troubleshooting on the worker node, crictl is more k8s-friendly than docker.

Compared to the docker ps output, crictl ps does not show the unrelated pause containers or the extremely long container names. Besides, it outputs the pod ID each container belongs to.

crictl ps

When debugging in Cluster B, to confirm that the cali- interfaces and the 10.0.104.0/24 blackhole are used by local pods, it is very convenient to use crictl to get the local pod IPs.
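A short sketch of that workflow (the pod ID is a placeholder; the exact JSON layout depends on your CRI runtime):

```shell
# List the pod sandboxes managed by the local kubelet
crictl pods

# Inspect one sandbox; its IP appears in the network status section
crictl inspectp <pod-id> | grep -i '"ip"'
```

The returned pod IPs can then be matched against the cali- routes and the blackhole subnet in the node routing table.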

netshoot: Kubernetes network trouble-shooting swiss-army container

In several situations, installing missing tools is not an option when you try to understand what is happening in your network infrastructure.

When following Immutable Infrastructure, no tools can be installed on the VM.

When you do not have enough permissions to install tools

The VM disk is not writable

The netshoot container ships a set of powerful networking debugging tools that can be used to troubleshoot Docker and Kubernetes networking issues:

apache2-utils

bash

bird

bridge-utils

busybox-extras

calicoctl

conntrack-tools

curl

dhcping

drill

ethtool

file

fping

iftop

iperf

iproute2

iptables

iptraf-ng

iputils

ipvsadm

libc6-compat

liboping

mtr

net-snmp-tools

netcat-openbsd

ngrep

nmap

nmap-nping

py-crypto

py2-virtualenv

python2

scapy

socat

strace

tcpdump

tcptraceroute

util-linux

vim

In the example below, netshoot runs in privileged mode inside the host’s network namespace; this gives you almost all the access you need from inside the container.

docker run -it --privileged --net host nicolaka/netshoot
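If you would rather launch it through Kubernetes than Docker, the same image can run as a throwaway pod (the pod name is arbitrary; hostNetwork is added here so node-level interfaces are visible):

```shell
# Throwaway debugging pod; --rm deletes it when the shell exits
kubectl run tmp-shell --rm -it --image nicolaka/netshoot \
  --overrides='{"spec": {"hostNetwork": true}}' -- /bin/bash
```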

That’s it! This should provide a good foundation for exploring other network solutions and troubleshooting tools. Hope it is helpful!