As I begin to dig deeper into placing docker into production, I find myself needing an elaborate lab. I personally use KVM on an Ubuntu system, but this should apply to any hypervisor which uses Linux bridges to accomplish networking.

I wanted end-to-end connectivity from my host system to containers hosted in a virtual machine. For example, I should be able to create a VM (from my KVM host machine) named dockerhost01, run containers in this machine, and ping a container from my KVM host machine. With the default routing table this will not work.

In order to aid in representing our configuration I have a crude hand-drawn diagram. I'm far too lazy to start up my Windows VM just to get Visio going, so I hope this will do it justice.

In the image you can see we have a workstation with a wlan adapter at 192.168.0.104. This workstation also has the default virtual bridge that KVM creates. We can think of a virtual bridge as a "switch" which lives on our host. We are, in essence, turning our workstation into a multi-port switch. Each virtual machine we power on literally "plugs" its interface into this virtual switch, named virbr0. I will return to Linux virtual bridges later in the post.

Next you can see that we have a KVM VM named DockerHost. This VM itself has another Linux virtual bridge, named docker0. This is how containers communicate with their host machine. In the same fashion, our containers "plug" their interfaces directly into docker0.

Now if this is your first introduction to Linux bridges you may be slightly confused. I would suggest you play with them a little to see how they work. It's a simple construct but can be conceptually confusing. Let's install the bridge-utils tools and inspect our switches.
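On Debian/Ubuntu the bridge tools come from the bridge-utils package (the package name may differ on other distros; modern iproute2 also ships a `bridge` command that replaces most of brctl):

```shell
sudo apt-get install bridge-utils
```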

#Workstation machine
root@ldubuntu:/home/ldelossa# brctl show
bridge name     bridge id               STP enabled     interfaces
virbr0          8000.5254007de659       yes             virbr0-nic
                                                        vnet0
                                                        vnet1
                                                        vnet2
                                                        vnet3
                                                        vnet4
root@ldubuntu:/home/ldelossa#

What we see here is the default Linux bridge that KVM creates. You can see that each VM I have created on my workstation is a "plugged-in" interface on the bridge. The bridge also has an IP address:

root@ldubuntu:/home/ldelossa# ip addr list
5: virbr0: mtu 1500 qdisc noqueue state UP group default
    link/ether 52:54:00:7d:e6:59 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
6: virbr0-nic: mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 500
    link/ether 52:54:00:7d:e6:59 brd ff:ff:ff:ff:ff:ff
7: vnet0: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 500
    link/ether fe:54:00:1a:57:13 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe1a:5713/64 scope link
       valid_lft forever preferred_lft forever
8: vnet1: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 500
    link/ether fe:54:00:61:5d:99 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe61:5d99/64 scope link
       valid_lft forever preferred_lft forever
10: vnet2: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 500
    link/ether fe:54:00:ba:3c:e3 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:feba:3ce3/64 scope link
       valid_lft forever preferred_lft forever
11: vnet3: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 500
    link/ether fe:54:00:88:b8:44 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe88:b844/64 scope link
       valid_lft forever preferred_lft forever
12: vnet4: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 500
    link/ether fe:54:00:63:42:aa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe63:42aa/64 scope link
       valid_lft forever preferred_lft forever

This IP address is used in NAT operations. Any node OUTSIDE of our workstation trying to reach the 192.168.122.0/24 network must first send packets to our workstation, which then forwards them onto the Linux bridge. In the other direction, traffic leaving the VMs for the outside world is NATed behind the workstation.
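You can see the NAT in action by listing the firewall rules libvirt installs on the workstation. The masquerade rule below is roughly what libvirt's default network creates on my systems; exact rules vary by libvirt version, so treat it as illustrative:

```shell
# As root on HostWorkStation:
#   iptables -t nat -S POSTROUTING
#
# Among the output, a masquerade rule for the VM network, roughly:
#   -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
```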

And here is where things get confusing if you are not used to Linux bridging. One would think that on our workstation, if we tried to trace the path to our DockerHostVM, we would see the Linux bridge’s interface IP as a hop. However we must keep in mind that our workstation IS the Linux bridge. There’s no hop necessary. We are directly connected to the 192.168.122.0/24 network simply by BEING the bridge.

Okay, if that doesn’t make too much sense don’t worry. The above can be demonstrated the same way for DockerHostVM. The same concepts are at play, however docker calls its Linux bridge docker0.

So if you look at our diagram, what is the main goal we need to accomplish? We need to get packets originating from HostWorkStation, destined for 172.17.0.2, over to DockerHostVM, and then to the container. The reply packets must follow the same path in reverse: up through DockerHostVM and back to HostWorkstation.

So let's take a look at what we know. We know that both Linux machines are going to need to act as packet-forwarding routers. So let's do this first and enable packet forwarding on both machines.

Run the following commands on both machines:

sysctl -w net.ipv4.ip_forward=1
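Note that `sysctl -w` only changes the running kernel; the setting is lost on reboot. A sketch of making it persistent (the file name below is just a convention I'm assuming, adjust for your distro):

```shell
# Add this line to /etc/sysctl.conf, or to a (hypothetical) drop-in file
# such as /etc/sysctl.d/99-ipforward.conf:
#   net.ipv4.ip_forward = 1
#
# Then reload all sysctl configuration without rebooting:
#   sysctl --system
```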

Okay cool, so now we have the mechanism which will allow us to forward packets based on our routing table entries. That last statement was a hint at where to go next. Let's get an idea of what the routing tables look like on each machine in play.

#HostWorkStation
ldelossa@ldubuntu:~$ ip route
default via 192.168.0.1 dev wlan0 proto static metric 600
192.168.0.0/24 dev wlan0 proto kernel scope link src 192.168.0.104 metric 600
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1

#DockerHostVM
[root@dockerhost01 ~]# ip route
default via 192.168.122.1 dev ens3 proto static metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.11 metric 100

#Container
root@22e95e83a211:/# ip route
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.2

Okay, so let's look at this information from the bottom up. First, the container's routing table. Not much going on here, which is fine. All traffic leaving the container is going to head the only way it can – out its one interface, which we know is virtually "plugged" into docker0.

Next we take a look at our DockerHostVM routing table. Here we have a little more complexity, but still not bad. We know to direct packets heading to the containers' IP range (172.17.0.0/16) toward the docker0 bridge. We also know that we are directly connected to 192.168.122.0/24 via our ens3 interface. This interface is "plugged" into our virbr0 bridge on our HOST machine. So ANY packets whose destination we aren't sure about, we send up to our Linux bridge on HostWorkStation via the default route.

Now we have our HostWorkStation. This is our point of interest. My workstation's default route is going to send all unknown packets out my wireless LAN interface, which is appropriate: that's where the internet is, and consequently where we should send any unknown packets. Next we have a route for our VM network (192.168.122.0/24), directing any packets that need to go to our VMs toward the virbr0 interface. This is all great and dandy, but what are we missing?

We need to tell our HostWorkStation to send packets destined for our docker container somewhere other than out my wireless LAN interface. Right now there's no route for 172.17.0.0/16, hence those packets will head out wlan0, and die. So where do we need to send those packets? My first inclination was to send those packets to our bridge interface, 192.168.122.1 – however this is incorrect. We need to remember that our HostWorkStation IS the bridge! The bridge interface is not a separate device with its own routing table; it literally is the Linux machine we are running. Therefore we want to route packets to the next device that knows how to get to our docker containers: we want to route packets to DockerHostVM (192.168.122.11).
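The route selection described above is nothing more than longest-prefix matching. As a toy illustration (assuming bash; `ip_to_int` and `best_route` are hypothetical helpers written for this post, not real tools), here is HostWorkStation's table reduced to a lookup:

```shell
#!/usr/bin/env bash
# Toy longest-prefix-match route lookup, mirroring what the kernel does.

ip_to_int() {             # dotted quad -> 32-bit integer
  local IFS=. a b c d
  read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

best_route() {            # best_route DEST "CIDR next-hop/dev"...
  local dest route cidr len net mask best="" best_len=-1
  dest=$(ip_to_int "$1"); shift
  for route in "$@"; do
    cidr=${route%% *}
    len=${cidr#*/}
    net=$(ip_to_int "${cidr%/*}")
    if (( len == 0 )); then mask=0
    else mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF )); fi
    # A route matches when the destination falls inside its prefix;
    # the longest (most specific) matching prefix wins.
    if (( (dest & mask) == (net & mask) && len > best_len )); then
      best_len=$len best=$route
    fi
  done
  echo "$best"
}

# HostWorkStation's table as shown above ('default' written as 0.0.0.0/0):
table=("0.0.0.0/0 via 192.168.0.1 dev wlan0"
       "192.168.0.0/24 dev wlan0"
       "192.168.122.0/24 dev virbr0")

best_route 172.17.0.2 "${table[@]}"
# -> 0.0.0.0/0 via 192.168.0.1 dev wlan0   (heads out wlan0 and dies)

best_route 172.17.0.2 "${table[@]}" "172.17.0.0/16 via 192.168.122.11"
# -> 172.17.0.0/16 via 192.168.122.11      (the route we're about to add)
```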

Let's do just that:

#HostWorkStation
ip route add 172.17.0.0/16 via 192.168.122.11

Let's test connectivity:

ldelossa@ldubuntu:~$ ping 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=63 time=0.815 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=63 time=0.466 ms
64 bytes from 172.17.0.2: icmp_seq=3 ttl=63 time=0.550 ms
64 bytes from 172.17.0.2: icmp_seq=4 ttl=63 time=0.539 ms
64 bytes from 172.17.0.2: icmp_seq=5 ttl=63 time=0.396 ms
64 bytes from 172.17.0.2: icmp_seq=6 ttl=63 time=0.452 ms
64 bytes from 172.17.0.2: icmp_seq=7 ttl=63 time=0.554 ms
64 bytes from 172.17.0.2: icmp_seq=8 ttl=63 time=0.568 ms

Very nice!

So, the full picture – how does this work?

#from workstation to container

1) We generate a ping from our workstation toward 172.17.0.2

2) Workstation looks at its routing table and says "Okay, I want to send to 172.17.0.2; no problem, I'll send these packets over to 192.168.122.11"

3) The networking stack then determines where to source this ping from. It determines that it has an interface on the 192.168.122.0/24 network, the bridge, and sends the ping out this interface, over to 192.168.122.11 sourced from 192.168.122.1. (this step is exactly why we do not see 192.168.122.1 as a hop, the networking stack owns the bridge, and can source packets from this bridge directly)

4) Our DockerHostVM (192.168.122.11) gets this packet with the destination of 172.17.0.2

5) Our DockerHostVM does a routing table lookup, finds its route for 172.17.0.0/16 pointing toward docker0, and sends the packet that way.

6) The packet is delivered via the bridge to the correct container
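You can also ask the kernel directly which route and source address it would pick for a destination, which is a quick way to verify steps 2 and 3. The output below is roughly what I'd expect on my workstation after adding the route; it will vary by system:

```shell
# On HostWorkStation:
#   ip route get 172.17.0.2
#
# Prints something like the line below -- note the source address is the
# bridge's own IP, and no intermediate hop appears:
#   172.17.0.2 via 192.168.122.11 dev virbr0 src 192.168.122.1
```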

Let's see this with tcpdump. We will initiate a ping from our HostWorkStation toward 172.17.0.2, and run tcpdump on both DockerHostVM and our container.

#HostWorkStation
ldelossa@ldubuntu:~$ ping 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=63 time=0.613 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=63 time=0.665 ms
64 bytes from 172.17.0.2: icmp_seq=3 ttl=63 time=0.477 ms
64 bytes from 172.17.0.2: icmp_seq=4 ttl=63 time=0.650 ms
64 bytes from 172.17.0.2: icmp_seq=5 ttl=63 time=0.535 ms
64 bytes from 172.17.0.2: icmp_seq=6 ttl=63 time=0.583 ms
64 bytes from 172.17.0.2: icmp_seq=7 ttl=63 time=0.632 ms
64 bytes from 172.17.0.2: icmp_seq=8 ttl=63 time=0.448 ms
64 bytes from 172.17.0.2: icmp_seq=9 ttl=63 time=0.611 ms

#DockerHostVM
[root@dockerhost01 ~]# tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on docker0, link-type EN10MB (Ethernet), capture size 65535 bytes
23:43:47.053790 IP 192.168.122.1 > 172.17.0.2: ICMP echo request, id 24412, seq 1, length 64
23:43:47.053956 IP 172.17.0.2 > 192.168.122.1: ICMP echo reply, id 24412, seq 1, length 64
23:43:47.055546 IP 172.17.0.2.55398 > dns02.kvm.lan.domain: 8571+ PTR? 1.122.168.192.in-addr.arpa. (44)
23:43:47.056461 IP dns02.kvm.lan.domain > 172.17.0.2.55398: 8571 NXDomain* 0/1/0 (99)
23:43:47.056987 IP 172.17.0.2.36237 > dns02.kvm.lan.domain: 25760+ PTR? 3.122.168.192.in-addr.arpa. (44)
23:43:47.058016 IP dns02.kvm.lan.domain > 172.17.0.2.36237: 25760* 1/2/2 PTR dns02.kvm.lan. (137)
23:43:48.052682 IP 192.168.122.1 > 172.17.0.2: ICMP echo request, id 24412, seq 2, length 64
23:43:48.052936 IP 172.17.0.2 > 192.168.122.1: ICMP echo reply, id 24412, seq 2, length 64
23:43:49.051689 IP 192.168.122.1 > 172.17.0.2: ICMP echo request, id 24412, seq 3, length 64
23:43:49.051818 IP 172.17.0.2 > 192.168.122.1: ICMP echo reply, id 24412, seq 3, length 64
23:43:50.050764 IP 192.168.122.1 > 172.17.0.2: ICMP echo request, id 24412, seq 4, length 64
23:43:50.050964 IP 172.17.0.2 > 192.168.122.1: ICMP echo reply, id 24412, seq 4, length 64
23:43:51.050628 IP 192.168.122.1 > 172.17.0.2: ICMP echo request, id 24412, seq 5, length 64
23:43:51.050728 IP 172.17.0.2 > 192.168.122.1: ICMP echo reply, id 24412, seq 5, length 64
23:43:52.050675 IP 192.168.122.1 > 172.17.0.2: ICMP echo request, id 24412, seq 6, length 64
23:43:52.050817 IP 172.17.0.2 > 192.168.122.1: ICMP echo reply, id 24412, seq 6, length 64

#Container
[root@dockerhost01 ~]# docker exec -it nipap-psql01 /bin/bash
root@22e95e83a211:/# tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
04:43:47.053812 IP 192.168.122.1 > 22e95e83a211: ICMP echo request, id 24412, seq 1, length 64
04:43:47.053953 IP 22e95e83a211 > 192.168.122.1: ICMP echo reply, id 24412, seq 1, length 64
04:43:47.055538 IP 22e95e83a211.55398 > dns02.kvm.lan.domain: 8571+ PTR? 1.122.168.192.in-addr.arpa. (44)
04:43:47.056467 IP dns02.kvm.lan.domain > 22e95e83a211.55398: 8571 NXDomain* 0/1/0 (99)
04:43:47.056981 IP 22e95e83a211.36237 > dns02.kvm.lan.domain: 25760+ PTR? 3.122.168.192.in-addr.arpa. (44)
04:43:47.058036 IP dns02.kvm.lan.domain > 22e95e83a211.36237: 25760* 1/2/2 PTR dns02.kvm.lan. (137)
04:43:48.052782 IP 192.168.122.1 > 22e95e83a211: ICMP echo request, id 24412, seq 2, length 64
04:43:48.052933 IP 22e95e83a211 > 192.168.122.1: ICMP echo reply, id 24412, seq 2, length 64
04:43:49.051751 IP 192.168.122.1 > 22e95e83a211: ICMP echo request, id 24412, seq 3, length 64
04:43:49.051816 IP 22e95e83a211 > 192.168.122.1: ICMP echo reply, id 24412, seq 3, length 64
04:43:50.050798 IP 192.168.122.1 > 22e95e83a211: ICMP echo request, id 24412, seq 4, length 64
04:43:50.050960 IP 22e95e83a211 > 192.168.122.1: ICMP echo reply, id 24412, seq 4, length 64
04:43:51.050651 IP 192.168.122.1 > 22e95e83a211: ICMP echo request, id 24412, seq 5, length 64
04:43:51.050726 IP 22e95e83a211 > 192.168.122.1: ICMP echo reply, id 24412, seq 5, length 64
04:43:52.050706 IP 192.168.122.1 > 22e95e83a211: ICMP echo request, id 24412, seq 6, length 64
04:43:52.050814 IP 22e95e83a211 > 192.168.122.1: ICMP echo reply, id 24412, seq 6, length 64

We have successfully achieved end-to-end connectivity from our KVM host to our docker containers hosted in a VM. This now allows us to lab with the more exciting aspects of docker such as multi-host overlay networking, CI deployments, orchestration, and logging, just to name a few.

Hope this helped anyone who was looking to perform a similar set-up.