Open vSwitch (OVS) is a virtual switch implementation that can be used as a tool for Software Defined Networking (SDN). The concepts managed by OVS are the same as those managed by hardware switches (ports, routes, etc.). The most important features of OVS, for me, are:

It enables having a virtual switch inside a host, to which virtual machines, containers, etc. can be connected.

It enables connecting switches in different hosts using network tunnels (GRE or VXLAN).

It is possible to program the switch using OpenFlow.

In order to start working with OVS, this time I learned how to create an overlay network using Open vSwitch in order to connect LXC containers.

Well, first of all, I have to say that I have used LXC instead of VMs because containers are lightweight and very straightforward to use in an Ubuntu distribution. But once you understand the procedure, you should be able to follow this how-to and use VMs instead of containers.

What we want (Our set-up)

What we are creating is shown in the next figure:

We have two nodes (ovsnodeXX) and several containers deployed on them (nodeXXcYY). The result is that all the containers can connect to each other, but their traffic is not seen on the 10.10.2.x LAN, apart from the connection between the OVS switches (because it is tunnelled).

Starting point

I have 2 ubuntu-based nodes (ovsnode01 and ovsnode02), with 1 network interface each, and they are able to ping one to each other:

root@ovsnode01:~# ping ovsnode01 -c 3
PING ovsnode01 (10.10.2.21) 56(84) bytes of data.
64 bytes from ovsnode01 (10.10.2.21): icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from ovsnode01 (10.10.2.21): icmp_seq=2 ttl=64 time=0.028 ms
64 bytes from ovsnode01 (10.10.2.21): icmp_seq=3 ttl=64 time=0.033 ms
--- ovsnode01 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.022/0.027/0.033/0.007 ms
root@ovsnode01:~# ping ovsnode02 -c 3
PING ovsnode02 (10.10.2.22) 56(84) bytes of data.
64 bytes from ovsnode02 (10.10.2.22): icmp_seq=1 ttl=64 time=1.45 ms
64 bytes from ovsnode02 (10.10.2.22): icmp_seq=2 ttl=64 time=0.683 ms
64 bytes from ovsnode02 (10.10.2.22): icmp_seq=3 ttl=64 time=0.756 ms
--- ovsnode02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.683/0.963/1.451/0.347 ms

I am able to create LXC containers by issuing commands such as:

lxc-create -n node1 -t ubuntu

I am able to create linux bridges (i.e. I have installed the bridge-utils package).

Spoiler (or quick setup)

If you just want the solution, here it is (later I will explain all the steps). These are the steps for ovsnode01:

ovsnode01:~# brctl addbr br-cont0
ovsnode01:~# ip link set dev br-cont0 up
ovsnode01:~# cat > ./container-template.lxc << EOF
lxc.network.type = veth
lxc.network.link = br-cont0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx
EOF
ovsnode01:~# lxc-create -f ./container-template.lxc -n node01c01 -t ubuntu
ovsnode01:~# lxc-start -dn node01c01
ovsnode01:~# lxc-create -f ./container-template.lxc -n node01c02 -t ubuntu
ovsnode01:~# lxc-start -dn node01c02
ovsnode01:~# lxc-attach -n node01c01 -- ip addr add 192.168.1.11/24 dev eth0
ovsnode01:~# lxc-attach -n node01c01 -- ifconfig eth0 mtu 1400
ovsnode01:~# lxc-attach -n node01c02 -- ip addr add 192.168.1.12/24 dev eth0
ovsnode01:~# lxc-attach -n node01c02 -- ifconfig eth0 mtu 1400
ovsnode01:~# apt-get install openvswitch-switch
ovsnode01:~# ovs-vsctl add-br ovsbr0
ovsnode01:~# ovs-vsctl add-port ovsbr0 br-cont0
ovsnode01:~# ovs-vsctl add-port ovsbr0 vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=10.10.2.22

You will need to follow the same steps for ovsnode02, taking into account the names of the containers and the IP addresses.
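Since the per-node differences are only the container names, the addresses, and the tunnel's remote IP, the steps for the second node can be sketched with a small helper script that just prints the command list. This is a hypothetical helper (the function name gen_node_cmds is mine), assuming the nodeXXcYY naming and 192.168.1.XY addressing used in this post:

```shell
#!/bin/sh
# Hypothetical helper: print the setup commands for one node, following the
# nodeXXcYY naming and 192.168.1.XY addressing scheme used in this how-to.
gen_node_cmds() {
  node=$1     # this node's number (1 or 2)
  remote=$2   # the *other* node's address (the tunnel endpoint)
  for cont in 1 2; do
    echo "lxc-create -f ./container-template.lxc -n node0${node}c0${cont} -t ubuntu"
    echo "lxc-start -dn node0${node}c0${cont}"
    echo "lxc-attach -n node0${node}c0${cont} -- ip addr add 192.168.1.${node}${cont}/24 dev eth0"
    echo "lxc-attach -n node0${node}c0${cont} -- ifconfig eth0 mtu 1400"
  done
  echo "ovs-vsctl add-port ovsbr0 vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=${remote}"
}

# For ovsnode02 the remote tunnel endpoint is ovsnode01 (10.10.2.21):
gen_node_cmds 2 10.10.2.21
```

You could pipe the output into a shell on the corresponding node, or just use it as a checklist.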

Preparing the network

First we are going to create a bridge (br-cont0) to which the containers will be bridged.

We are not using lxcbr0 because it may have other associated services, such as dnsmasq, that we would have to disable first. Moreover, creating our own bridge makes it easier to understand what we are doing.

We will issue this command on both ovsnode01 and ovsnode02 nodes.

brctl addbr br-cont0

As this bridge has no IP address, it is down (you can check it using the ip command). Now we are going to bring it up (also on both nodes):

ip link set dev br-cont0 up

Creating the containers

Now we need a template to associate the network to the containers. So we have to create the file container-template.lxc:

cat > ./container-template.lxc << EOF
lxc.network.type = veth
lxc.network.link = br-cont0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx
EOF

In this file we are saying that the containers should be automatically bridged to br-cont0 (remember that we created it before) and that they will have an interface with a hardware address that follows that template. We could also modify the /etc/lxc/default.conf file instead of creating a new one.

Now, we can create the containers on node ovsnode01:

ovsnode01# lxc-create -f ./container-template.lxc -n node01c01 -t ubuntu
ovsnode01# lxc-start -dn node01c01
ovsnode01# lxc-create -f ./container-template.lxc -n node01c02 -t ubuntu
ovsnode01# lxc-start -dn node01c02

And also in ovsnode02:

ovsnode02# lxc-create -f ./container-template.lxc -n node02c01 -t ubuntu
ovsnode02# lxc-start -dn node02c01
ovsnode02# lxc-create -f ./container-template.lxc -n node02c02 -t ubuntu
ovsnode02# lxc-start -dn node02c02

If you followed my steps, the containers will not have any IP address. This is because there is no DHCP server and the containers do not have static IP addresses. Assigning them addresses is what we are going to do now.

We are setting the IP addresses 192.168.1.11 and 192.168.1.12 for node01c01 and node01c02 respectively. In order to do so, we issue these two commands on ovsnode01:

ovsnode01# lxc-attach -n node01c01 -- ip addr add 192.168.1.11/24 dev eth0
ovsnode01# lxc-attach -n node01c02 -- ip addr add 192.168.1.12/24 dev eth0

You need to configure the containers in ovsnode02 too:

ovsnode02# lxc-attach -n node02c01 -- ip addr add 192.168.1.21/24 dev eth0
ovsnode02# lxc-attach -n node02c02 -- ip addr add 192.168.1.22/24 dev eth0
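The addressing scheme used here can be summed up as: container nodeXXcYY gets the address 192.168.1.XY. A tiny sketch of that rule (addr_for is a hypothetical function name, just for illustration):

```shell
#!/bin/sh
# Hypothetical helper illustrating the addressing scheme of this how-to:
# container nodeXXcYY gets 192.168.1.XY (X = node number, Y = container number).
addr_for() {
  node=$1
  cont=$2
  echo "192.168.1.${node}${cont}"
}

addr_for 1 1   # node01c01 -> 192.168.1.11
addr_for 2 2   # node02c02 -> 192.168.1.22
```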

Making connections

At this point, you should be able to connect between the containers in the same node: between node01c01 and node01c02, and between node02c01 and node02c02. But not between containers in different nodes.

ovsnode01# lxc-attach -n node01c01 -- ping 192.168.1.11 -c3
PING 192.168.1.11 (192.168.1.11) 56(84) bytes of data.
64 bytes from 192.168.1.11: icmp_seq=1 ttl=64 time=0.136 ms
64 bytes from 192.168.1.11: icmp_seq=2 ttl=64 time=0.051 ms
64 bytes from 192.168.1.11: icmp_seq=3 ttl=64 time=0.055 ms
ovsnode01# lxc-attach -n node01c01 -- ping 192.168.1.21 -c3
PING 192.168.1.21 (192.168.1.21) 56(84) bytes of data.
From 192.168.1.11 icmp_seq=1 Destination Host Unreachable
From 192.168.1.11 icmp_seq=2 Destination Host Unreachable
From 192.168.1.11 icmp_seq=3 Destination Host Unreachable

This is because br-cont0 behaves somewhat like a classic network hub, in which all the traffic can be seen by all the devices connected to it. So the containers on the same bridge are in the same LAN (on the same cable, indeed). But there is no connection between the hubs yet, and we will create it using Open vSwitch.

In the case of Ubuntu, installing OVS is as easy as issuing the next command:

apt-get install openvswitch-switch

Simple, isn’t it? But now we have to prepare the network, and the first step is to create a virtual switch (on both ovsnode01 and ovsnode02):

ovs-vsctl add-br ovsbr0

Open vSwitch works like a physical switch, with ports that can be connected and so on… And we are going to connect our hub to our switch (i.e. our bridge to the virtual switch):

ovs-vsctl add-port ovsbr0 br-cont0

We’ll do this on both ovsnode01 and ovsnode02.

Finally, we’ll connect the OVS switches to each other, using a VXLAN tunnel:

ovsnode01# ovs-vsctl add-port ovsbr0 vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=10.10.2.22
ovsnode02# ovs-vsctl add-port ovsbr0 vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=10.10.2.21

We’ll issue each of the two commands above on the corresponding node. Take care that the remote IP address is set to the other node 😉

We can check the final configuration of the nodes (let’s show only ovsnode01, but the other is very similar):

ovsnode01# brctl show
bridge name	bridge id		STP enabled	interfaces
br-cont0	8000.fe502f26ea2d	no		veth3BUL7S
						vethYLRPM2
ovsnode01# ovs-vsctl show
2096d83a-c7b9-47a8-8fff-d38c6d5ab04d
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {remote_ip="10.10.2.22"}
    ovs_version: "2.0.2"

Warning

Using this set-up as is, you will get ping working, but probably no other traffic. This is because the traffic is encapsulated in a transport network. Did you know about the MTU?

If we check the eth0 interface from one container we’ll get this:

# lxc-attach -n node01c01 -- ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:16:3e:52:42:2f
          inet addr:192.168.1.11  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          ...

Pay attention to the MTU value (it is 1500). And if we check the MTU of eth0 from the node, we’ll get this:

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 60:00:00:00:20:15
          inet addr:10.10.2.21  Bcast:10.10.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          ...

Summarizing, the MTU (Maximum Transmission Unit) is the maximum size of the payload of an Ethernet frame, which usually is 1500 bytes. But here we are sending messages inside other messages: if we try to use it as is, we are trying to fit things that have a size of “1500 + some overhead” in a room of size “1500” (we are consciously omitting the units). And “1500 + some overhead” is bigger than “1500”, and that is why it will not work.
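As a back-of-the-envelope check (assuming VXLAN over IPv4 with the standard header sizes), the overhead comes from wrapping the whole inner Ethernet frame in VXLAN, UDP and IP headers:

```shell
#!/bin/sh
# VXLAN-over-IPv4 encapsulation overhead, per standard header sizes:
# inner Ethernet header (14) + VXLAN header (8) + outer UDP (8) + outer IPv4 (20).
INNER_ETH=14
VXLAN_HDR=8
OUTER_UDP=8
OUTER_IP=20
OVERHEAD=$((INNER_ETH + VXLAN_HDR + OUTER_UDP + OUTER_IP))
INNER_MTU=$((1500 - OVERHEAD))
echo "overhead=${OVERHEAD} bytes, largest safe inner MTU=${INNER_MTU}"
```

So with a physical MTU of 1500, the largest safe MTU inside the containers is 1450; the value of 1400 used below simply leaves some extra margin.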

We have to change the MTU of the containers to a lower size. It is as simple as:

ovsnode01:~# lxc-attach -n node01c01 -- ifconfig eth0 mtu 1400
ovsnode01:~# lxc-attach -n node01c02 -- ifconfig eth0 mtu 1400
ovsnode02:~# lxc-attach -n node02c01 -- ifconfig eth0 mtu 1400
ovsnode02:~# lxc-attach -n node02c02 -- ifconfig eth0 mtu 1400

This method is not persistent, and the setting will be lost if the container is rebooted. In order to make it persist, you can set it in the DHCP server (in case you are using one), or in the network device set-up. In the case of Ubuntu, it is as simple as adding a line with ‘mtu 1400’ to the proper device in /etc/network/interfaces. As an example, for container node01c01:

auto eth0
iface eth0 inet static
    address 192.168.1.11
    netmask 255.255.255.0
    mtu 1400

Some words on Virtual Machines

If you have some expertise with Virtual Machines and Linux (I suppose that if you are following this how-to, that is your case), you should be able to do all the set-up for your VMs. When you create a VM, you simply need to bridge its interfaces to the bridge that we have created (br-cont0), and that’s all.

And now, what?

Well, now we have what we wanted to have, and you can play with it. I suggest creating a DHCP server (so as not to have to set the MTU and the IP addresses by hand), a router, a DNS server, etc.
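For the DHCP idea, a minimal dnsmasq configuration could look like the following sketch (it assumes dnsmasq runs inside one of the containers, e.g. node01c01, and the address range is an arbitrary choice of mine). Note that dnsmasq can also push the MTU to the clients via DHCP option 26, which removes the need to set it by hand:

```
# /etc/dnsmasq.conf (sketch; assumes dnsmasq runs inside node01c01)
interface=eth0
bind-interfaces
# Hand out addresses from the overlay range (leases last 12 hours)
dhcp-range=192.168.1.100,192.168.1.150,12h
# Push MTU 1400 to the clients (DHCP option 26)
dhcp-option=option:mtu,1400
```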

And as an advanced option, you can play with traffic tagging, in order to create multiple overlay networks and isolate them from each other.