From lxadm | Linux administration tips, tutorials, HOWTOs and articles

Introduction

Virtualization is a great technology, as it lets you run multiple virtual systems on a single server.

It's easy to create a "LAN" for containers on a single server: attach them to the same bridge, use the same subnet (e.g. 10.10.10.0/24), and you're done. Containers can then communicate with each other using their private IP addresses.

However, with more than one server *not* in the same LAN (e.g. two LXD servers in different datacentres, or even in the same datacentre but with a hosting provider that doesn't offer private LANs), things get tricky. There are numerous examples of such hosting providers, e.g. Amazon AWS or Hetzner.

Virtual Extensible LAN (VXLAN) can help us solve this issue. No more hassle with port redirections or with adjusting IPs after containers have been migrated.

While this HOWTO is written primarily with LXD in mind, it can be used with other virtualization technologies as well: LXC, KVM, Xen and Docker should all be fine. Make sure to read the MTU notes.





Network diagram

We will build a Virtual Extensible LAN as shown in the example diagrams below. Each server uses a unicast VXLAN link to every other server. In the examples, the containers use the 10.10.10.0/24 subnet, but you can of course use any other subnet.

Example 1:

LXD1: IP 1.1.1.1, Europe          LXD2: IP 2.2.2.2, Asia
container01, 10.10.10.10          container04, 10.10.10.20
container02, 10.10.10.11          container05, 10.10.10.21
container03, 10.10.10.12          container06, 10.10.10.22

LXD3: IP 3.3.3.3, US              LXD4: IP 4.4.4.4, Africa
container07, 10.10.10.30          container10, 10.10.10.40
container08, 10.10.10.31          container11, 10.10.10.41
container09, 10.10.10.32          container12, 10.10.10.42



Example 2:

LXD1: IP 1.1.1.1, Hetzner DC18    LXD2: IP 2.2.2.2, Hetzner DC19
container01, 10.10.10.10          container04, 10.10.10.20
container02, 10.10.10.11          container05, 10.10.10.21
container03, 10.10.10.12          container06, 10.10.10.22

LXD3: IP 3.3.3.3, Hetzner DC20    LXD4: IP 4.4.4.4, Hetzner DC20
container07, 10.10.10.30          container10, 10.10.10.40
container08, 10.10.10.31          container11, 10.10.10.41
container09, 10.10.10.32          container12, 10.10.10.42





Prerequisites

You will need a fairly modern kernel and userspace. Ubuntu 14.04 LTS is too old; Ubuntu 16.04 LTS is fine. CentOS 7 should also be fine.

Each container needs to be attached to the bridge created by the script below. We're using vxbr0 as the bridge for VXLAN devices.





Performance

VXLAN offers near wire speed. For example, if iperf between your containers using public IPs shows 90 MB/s, the same test over VXLAN should show around 85 MB/s.





Drawbacks

VXLAN does not encrypt the traffic

VXLAN does not compress the traffic

While it's typically fine to run unencrypted traffic between servers in the same VPC / security group in AWS or, more generally, within a single datacentre, you may want to think about extra traffic encryption between servers in different geographical locations.





MTU issue

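VXLAN encapsulation adds overhead to every packet, so a VXLAN interface lowers the usable MTU to 1450 (from the usual 1500). This means that any container attached to a bridge with VXLAN devices needs its own network interface set to an MTU of 1450 as well. Otherwise, connections that send larger packets will hang.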

If you're using LXD, you can run "lxc config edit your-container" and set it as below:

(...)
devices:
  eth0:
    mtu: "1450"      # <----- lowers container's NIC MTU to 1450
    nictype: bridged
    parent: vxbr0    # <----- bridge where the container has to be attached
    type: nic
  root:
    path: /
    type: disk
(...)
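The 1450 figure follows from the encapsulation overhead: on an IPv4 underlay, every frame gains 50 bytes (14 for the inner Ethernet header, 8 for VXLAN, 8 for UDP and 20 for the outer IPv4 header). Below is a minimal sketch of that arithmetic, assuming an IPv4 underlay without VLAN tags. The helper names vxlan_mtu and ping_payload are made up for this illustration and are not part of vxlansetup.sh; 10.10.10.20 is container04 from the example diagram.

```shell
#!/bin/bash
# Hypothetical helpers for illustration, not part of vxlansetup.sh.

# VXLAN overhead on an IPv4 underlay (assumption: no VLAN tag):
# inner Ethernet (14) + VXLAN (8) + UDP (8) + outer IPv4 (20) = 50 bytes
VXLAN_OVERHEAD=50

# Print the MTU usable inside the VXLAN for a given underlay MTU ($1)
vxlan_mtu() {
    echo $(( $1 - VXLAN_OVERHEAD ))
}

# Print the largest ICMP payload that fits into a given MTU ($1)
# (IPv4 header 20 + ICMP header 8 = 28 bytes)
ping_payload() {
    echo $(( $1 - 28 ))
}

MTU=$(vxlan_mtu 1500)
echo "container MTU: $MTU"
# -M do forbids fragmentation, so the ping only works if the packet fits
echo "test with: ping -M do -s $(ping_payload "$MTU") 10.10.10.20"
```

Running the printed ping command from container01 should succeed if the MTU is set correctly on both ends, while a payload one byte larger should be rejected.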





Preventing bridge looping

If you use more than two servers, each server will have more than one VXLAN device attached to its bridge. Left alone, the bridge forwards traffic arriving on one VXLAN device out through the others, which creates packet loops. To prevent this, we use ebtables to block traffic between different VXLAN devices on the same server.

The script sets this up automatically.

Please note that the script assumes it's the only thing on the server which manipulates ebtables.
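To make the rule set concrete, here is a small sketch that only prints the rules rather than applying them. It assumes the default device prefix vxlan; the function name print_loop_rules is made up for this illustration. One DROP rule is needed for every ordered pair of distinct VXLAN devices, so N remote servers give N * (N - 1) rules.

```shell
#!/bin/bash
# Illustration only: prints the ebtables rules instead of applying them.
# print_loop_rules is a hypothetical name, not part of vxlansetup.sh.

# Print one DROP rule for every ordered pair of distinct vxlan devices.
# $1 = number of remote servers (one vxlan device per remote)
print_loop_rules() {
    local count=$1 i j
    for (( i = 0; i < count; i++ )); do
        for (( j = 0; j < count; j++ )); do
            if [ "$i" -ne "$j" ]; then
                echo "ebtables -A FORWARD -i vxlan$i -o vxlan$j -j DROP"
            fi
        done
    done
}

# Four servers in total means three remotes, hence 3 * 2 = 6 rules
print_loop_rules 3
```

With a single remote server (two servers in total) the function prints nothing, which matches the script adding no rules in that case.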





vxlansetup.sh script

The script creates VXLAN devices between every server.

usage - available actions are start, stop, restart and status

# ./vxlansetup.sh
 * Usage: vxlansetup.sh {start|stop|restart|status}
#





copy the script to every server running the containers (do not copy or use the script in the containers)

modify LOCALIP, REMOTEIPS, LOCAL_DEV, BRIDGE_DEV variables on every server

DRYRUN - set it to 1 to see which commands would be run

VXLAN_DEV, PORT - should be OK for most systems, but adjust if needed

you have to run the script on every server (remember to adjust the IPs accordingly)
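Adjusting the IPs by hand on every server is easy to get wrong. One possible way to derive the per-server values is sketched below, assuming you keep a single list of all participating servers. The ALL_SERVERS array and the remote_ips function are made up for this illustration and are not part of vxlansetup.sh.

```shell
#!/bin/bash
# Hypothetical helper for illustration, not part of vxlansetup.sh.

# All servers taking part in the VXLAN (the same list on every server)
ALL_SERVERS=(1.1.1.1 2.2.2.2 3.3.3.3 4.4.4.4)

# Print every server IP except the local one, space-separated.
# $1 = local IP
remote_ips() {
    local localip=$1 ip
    local remotes=()
    for ip in "${ALL_SERVERS[@]}"; do
        if [ "$ip" != "$localip" ]; then
            remotes+=("$ip")
        fi
    done
    echo "${remotes[*]}"
}

# Show the values to put into vxlansetup.sh on each server
for server in "${ALL_SERVERS[@]}"; do
    echo "on $server: LOCALIP=$server REMOTEIPS=( $(remote_ips "$server") )"
done
```

The loop prints one line per server, which you can compare against the LOCALIP and REMOTEIPS values in that server's copy of the script.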





#!/bin/bash

LOCALIP=1.1.1.1
REMOTEIPS=(2.2.2.2 3.3.3.3 4.4.4.4)

# existing local device where $LOCALIP is attached
LOCAL_DEV=vmbr0

# New bridge device where $VXLAN_DEV devices will be attached
BRIDGE_DEV=vxbr0

# Will set up one vxlan* device per remote IP.
# For example, with two remote IPs (three servers in total), we will set up vxlan0 and vxlan1 devices,
# attached to BRIDGE_DEV (vxbr0)
VXLAN_DEV=vxlan

# Port used for vxlan
PORT=4789

# If set to 1 - only print commands we would run
DRYRUN=1

# No need to change anything below

# Run the given command, or only print it in DRYRUN mode
function vxrun() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

function vxlan_check() {
    brctl show $BRIDGE_DEV 2>&1 | grep -q 'No such device'
    if [ $? -ne 0 ]; then
        echo "vxlan already set up?"
        exit 1
    fi

    ip addr | grep -q "$VXLAN_DEV"
    if [ $? -eq 0 ]; then
        echo "vxlan already set up?"
        exit 1
    fi
}

function vxlan_start() {
    # brctl is needed by vxlan_check and by everything below
    which brctl >/dev/null
    if [ $? -ne 0 ]; then
        echo "brctl command not found"
        exit 1
    fi

    # first, check if we have any vxlan devices or interfaces
    vxlan_check

    # Index of the last vxlan device (devices are numbered from 0)
    VXLAN_DEVICES=$(( ${#REMOTEIPS[@]} - 1 ))

    # If there is more than one remote IP, we have to add ebtables rules to prevent looping
    if [ "${#REMOTEIPS[@]}" -gt 1 ]; then
        # Check if ebtables is installed
        which ebtables >/dev/null
        if [ $? -ne 0 ]; then
            echo "ebtables command not found"
            exit 1
        fi

        # Our vxlan* devices must not pass traffic to each other
        for i in $(seq 0 $VXLAN_DEVICES); do
            for j in $(seq 0 $VXLAN_DEVICES); do
                if [ $i -ne $j ]; then
                    vxrun ebtables -A FORWARD -i ${VXLAN_DEV}${i} -o ${VXLAN_DEV}${j} -j DROP
                fi
            done
        done
    fi

    # Add bridge
    vxrun brctl addbr $BRIDGE_DEV
    #vxrun brctl stp $BRIDGE_DEV on
    vxrun ip link set up $BRIDGE_DEV

    # Add vxlan* devices
    for VXLAN_DEVICE in $(seq 0 $VXLAN_DEVICES); do
        vxrun ip link add ${VXLAN_DEV}${VXLAN_DEVICE} type vxlan id ${VXLAN_DEVICE} remote ${REMOTEIPS[$VXLAN_DEVICE]} local $LOCALIP dev $LOCAL_DEV dstport $PORT
        vxrun ip link set up dev ${VXLAN_DEV}${VXLAN_DEVICE}
        vxrun brctl addif $BRIDGE_DEV ${VXLAN_DEV}${VXLAN_DEVICE}
    done
}

function vxlan_stop() {
    vxrun ip link set down $BRIDGE_DEV
    vxrun brctl delbr $BRIDGE_DEV

    # List all vxlan devices
    VXLAN_SETS=$(ip addr | awk -F: "/$VXLAN_DEV/ {print \$2}")
    for VXLAN_SET in $VXLAN_SETS; do
        vxrun ip link set down $VXLAN_SET
        vxrun ip link del $VXLAN_SET
    done

    # We assume ebtables are only used for our vxlan setup script
    # (run through vxrun so that DRYRUN mode does not flush real rules)
    vxrun ebtables -F
}

function vxlan_status() {
    echo vxlan bridge:
    brctl show $BRIDGE_DEV
    echo
    echo vxlan interfaces:
    ip link show | grep -A 1 $VXLAN_DEV
    echo
    ebtables -L
}

MODE=$1
set -u

# start, stop, restart, status
if [ "$MODE" == "start" ]; then
    vxlan_start
elif [ "$MODE" == "stop" ]; then
    vxlan_stop
elif [ "$MODE" == "restart" ]; then
    vxlan_stop
    vxlan_start
elif [ "$MODE" == "status" ]; then
    vxlan_status
else
    echo " * Usage: $0 {start|stop|restart|status}"
    exit 1
fi





Example output

Here is example output of the script in DRYRUN mode:

# vxlansetup.sh start
ebtables -A FORWARD -i vxlan0 -o vxlan1 -j DROP
ebtables -A FORWARD -i vxlan0 -o vxlan2 -j DROP
ebtables -A FORWARD -i vxlan1 -o vxlan0 -j DROP
ebtables -A FORWARD -i vxlan1 -o vxlan2 -j DROP
ebtables -A FORWARD -i vxlan2 -o vxlan0 -j DROP
ebtables -A FORWARD -i vxlan2 -o vxlan1 -j DROP
brctl addbr vxbr0
ip link set up vxbr0
ip link add vxlan0 type vxlan id 0 remote 2.2.2.2 local 1.1.1.1 dev vmbr0 dstport 4789
ip link set up dev vxlan0
brctl addif vxbr0 vxlan0
ip link add vxlan1 type vxlan id 1 remote 3.3.3.3 local 1.1.1.1 dev vmbr0 dstport 4789
ip link set up dev vxlan1
brctl addif vxbr0 vxlan1
ip link add vxlan2 type vxlan id 2 remote 4.4.4.4 local 1.1.1.1 dev vmbr0 dstport 4789
ip link set up dev vxlan2
brctl addif vxbr0 vxlan2