Physical interface test

So, I finally made up my mind and prepared a more professional and precise test.

Xeon processors from Broadwell onward have a feature named Posted Interrupts.

This feature allows a guest OS to receive interrupts from a device without passing through the hypervisor. KVM has supported this since 2013, and thanks to it the performance of a device passed to a guest via PCI passthrough is essentially identical to bare metal:

“Although close to bare metal, the average interrupt invocation latency of KVM+DID is 0.9µs higher.

[…]

As a result, there is no noticeable difference between the packet latency of the SRIOV-BM configuration and the SRIOV-DID configuration, because the latter does not incur a VM exit on interrupt delivery.”

So I got some Intel 82599ES 10 Gigabit cards, as suggested by the FreeBSD network performance tuning guide, plus another identical server, and I connected the two servers back to back in this setup:

+----------------+              +----------------+
|                |              |                |
|       NIC2 <---+---<---<---<--+---- NIC1       |
|                |              |                |
|     Server     |              |      TRex      |
|                |              |                |
|       NIC3 ----+--->--->--->--+---> NIC4       |
|                |              |                |
+----------------+              +----------------+

One server ran a single Linux or FreeBSD guest at a time, with the NICs assigned via PCIe passthrough, while the other server ran TRex bound to its two 10 Gbit cards.
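For reference, this is roughly how a NIC can be handed to the guest with VFIO; the PCI addresses, the device ID (8086:10fb is the 82599ES SFP+ function) and the QEMU invocation below are only an illustrative sketch, not the exact commands used for the test.

# Bind both ports of the 82599ES to vfio-pci (example PCI addresses)
modprobe vfio-pci
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.1 > /sys/bus/pci/devices/0000:03:00.1/driver/unbind
echo 8086 10fb > /sys/bus/pci/drivers/vfio-pci/new_id

# Start the guest with the two ports assigned to it (disk and other options trimmed)
qemu-system-x86_64 -enable-kvm -cpu host -smp 8 -m 4G \
    -device vfio-pci,host=0000:03:00.0 \
    -device vfio-pci,host=0000:03:00.1 \
    -drive file=guest.img,format=qcow2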

TRex is a powerful DPDK-based traffic generator capable of generating tens of millions of packets per second. It can easily achieve line rate on 10 Gbit cards by sending 64 byte frames from 4 CPUs.

TRex sends 10 Gbit/s (14.8 million 64 byte packets per second) from NIC1; the packets are received on NIC2, handled by the OS under test, sent out through NIC3, and finally go back to TRex on NIC4, which checks them and gathers statistics.
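For reference, a run of this kind can be started with something along these lines; the traffic profile file name is a made-up example, while -c, -m and -d are standard TRex options for core count, rate multiplier and duration.

# 4 cores, 60 second run, multiplier pushed high enough to reach line rate
./t-rex-64 -f cap2/udp_64B.yaml -c 4 -m 40000 -d 60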

With this setup and PCI passthrough, the test results became stable and reproducible across different runs, so I assume this is the correct way to do it.

Each test was repeated ten times; the graph line plots the average, while the minimum and maximum values are shown as candlesticks.

The first test is a software bridge: the two interfaces are bridged and packets are simply forwarded at layer 2.
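For reference, the bridge can be set up roughly like this on the two systems (the interface names ix0/ix1 and eth1/eth2 are placeholders):

# FreeBSD: if_bridge over the two passed-through interfaces
ifconfig bridge0 create
ifconfig bridge0 addm ix0 addm ix1 up
ifconfig ix0 up
ifconfig ix1 up

# Linux: kernel bridge configured with iproute2
ip link add br0 type bridge
ip link set eth1 master br0
ip link set eth2 master br0
ip link set eth1 up
ip link set eth2 up
ip link set br0 up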

L2 forwarding

The first thing that struck me is that the FreeBSD packet rate was substantially the same with one or with eight CPUs. I investigated a bit and found it to be a known issue: bridging under FreeBSD is known to be slow because the if_bridge driver is practically single-threaded due to excessive locking, as written in the FreeBSD network optimization guide.

The second thing I noted is that when running a test on a single-core FreeBSD guest, the system freezes until the traffic is stopped. This only happens to FreeBSD when the guest has a single core. Initially I thought it could be a glitch in the serial or tty driver, but then I ran a while sleep 1; do date; done loop: if it were just an output issue, the reported time would not freeze. I looked through all the sysctls to check whether the FreeBSD kernel was preemptible, and it is, so I can't explain what is going on. I made an asciinema recording which better illustrates this weird behavior.

The second test is routing. Two IP addresses belonging to different networks are assigned to the interfaces, and the TRex NIC4 address is set as the default route. TRex sends packets to the first interface and the OS under test forwards them.
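Concretely, the setup looks more or less like this; the addresses are made up, and 10.0.2.2 stands for the TRex NIC4 address.

# Linux guest: address the interfaces, point the default route at TRex, enable forwarding
ip addr add 10.0.1.1/24 dev eth1
ip addr add 10.0.2.1/24 dev eth2
ip route add default via 10.0.2.2
sysctl -w net.ipv4.ip_forward=1

# FreeBSD guest: equivalent configuration
ifconfig ix0 inet 10.0.1.1/24
ifconfig ix1 inet 10.0.2.1/24
route add default 10.0.2.2
sysctl net.inet.ip.forwarding=1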

When it comes to L3 forwarding, both OSes scale quite well. While they achieve more or less the same performance with a single core, Linux does a better job with multiple processors.

The third test is about firewalling. The setup is the same as the routing test, except that a set of firewall rules is loaded.

The rules are generated so that they cannot match any packet sent by TRex (they use a different port range than the generated traffic); they are there only to add load.

Both OSes have two firewall systems: Linux has iptables and nftables, while FreeBSD has PF and IPFW. I tested all of them, and in the graph below I report the results for iptables and IPFW, because they turned out to be faster than the other two solutions.
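To give an idea, the generated rules look roughly like the following; the port range and the number of rules are illustrative, the only requirement being that they never match the generated traffic.

# iptables: rules on ports outside the range used by TRex, so they are never matched
for p in $(seq 2000 2999); do
    iptables -A FORWARD -p udp --dport "$p" -j DROP
done

# IPFW: equivalent non-matching rules
for p in $(seq 2000 2999); do
    ipfw add deny udp from any to any dst-port "$p"
done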