In which detailed data confirms the obvious

If it becomes clear that the Zyxel is part of a larger pattern of substandard performance rather than the exception, I’ll start digging into kernel performance profiling and really get to the bottom of the issue.

The good news is that the GL-B1300 has good enough price/performance to become our new go-to midrange pick. The bad news is that’s at 100 Mbps of throughput, at least 3x less than what I had estimated the device to be capable of.

True to my word, I dug into the performance at the kernel level on both the WD MyNet n600 and the GL-B1300, trying to get to the bottom of our performance woes.

First off, thanks to Brendan Gregg for the wonderful Perl script that converts data from the Linux perf tool into easily readable Flame Graphs. For those of you not familiar, each graph represents call stacks: the higher up a box sits, the deeper ‘down’ the call stack it is. The width of a box is a relative measure of how much CPU time was spent in that function and everything it called.
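To make the "width equals CPU time" idea concrete, here is a minimal sketch of how box widths fall out of sampled stacks. It assumes the semicolon-separated "folded" stack format that the Flame Graph scripts work with; the frame names and sample counts are invented for illustration and are not taken from the captures in this post.

```python
from collections import Counter

# Hypothetical folded stacks: "frame;frame;frame sample_count" per line.
# Frame names and counts are made up for illustration only.
FOLDED = """\
kworker;process_work;wireguard_encrypt;chacha20poly1305 700
kworker;process_work;wireguard_encrypt;copy_buffer 100
kworker;process_work;ip_forward;netfilter_hooks 50
kworker;process_work;ip_forward;route_lookup 150
"""

def frame_shares(folded):
    """Return, per frame name, the fraction of samples whose stack contains it."""
    totals = Counter()
    all_samples = 0
    for line in folded.splitlines():
        if not line.strip():
            continue
        # The sample count is the last space-separated field on the line.
        stack, count = line.rsplit(" ", 1)
        samples = int(count)
        all_samples += samples
        # set() so a recursive frame is only counted once per stack.
        for frame in set(stack.split(";")):
            totals[frame] += samples
    return {frame: n / all_samples for frame, n in totals.items()}

shares = frame_shares(FOLDED)
for frame in ("wireguard_encrypt", "ip_forward"):
    print(f"{frame}: {shares[frame]:.0%} of samples")
```

With these invented numbers the encryption frame would be drawn four times as wide as the forwarding frame, which is the kind of imbalance to look for in the real graphs.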

The graphs below were captured from kworker threads over 100-second intervals during an iperf UDP test, using kernels compiled with flags as close to identical as possible.

The n600 is a pretty old device; it runs a 560 MHz AR9344 MIPS core right out of 2011. Variants of this core remain popular even in modern routers. Tested speed with Althea is ~25 Mbps.

The GL-B1300 is brand new and built on the Atheros IPQ4028, a quad-core 717 MHz ARM chip. Tested speed with Althea is ~100 Mbps.

Considering the process and architecture improvements between these two processors, we should be seeing a lot more than a 4x improvement. The latter processor is easily a dozen times more powerful.
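As a rough back-of-the-envelope check on that gap, the sketch below compares the measured speedup against raw clock capacity alone. It uses only the figures quoted above, and it makes two loud assumptions: that the n600 has a single core, and that the four ARM cores' clocks can simply be added together. It ignores per-clock architecture gains entirely, so it understates what the newer chip should manage.

```python
# Figures quoted in the post. cores=1 for the n600 and "clocks are additive"
# are simplifying assumptions, not measurements.
devices = {
    "n600": {"mhz": 560, "cores": 1, "mbps": 25},
    "GL-B1300": {"mhz": 717, "cores": 4, "mbps": 100},
}

old, new = devices["n600"], devices["GL-B1300"]

# Measured throughput improvement between the two devices.
speedup = new["mbps"] / old["mbps"]

# Ratio of aggregate clock capacity (MHz x cores), ignoring architecture gains.
clock_ratio = (new["mhz"] * new["cores"]) / (old["mhz"] * old["cores"])

# Throughput delivered per MHz of aggregate clock.
per_mhz = {
    name: d["mbps"] / (d["mhz"] * d["cores"]) for name, d in devices.items()
}

print(f"measured speedup:      {speedup:.1f}x")
print(f"aggregate clock ratio: {clock_ratio:.2f}x")
for name, value in per_mhz.items():
    print(f"{name}: {value:.4f} Mbps per MHz")
```

Under these assumptions the newer device delivers less throughput per MHz than the 2011 part, before even counting any architectural improvement.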

The reason we’re focusing on processing power is that in our design with Althea we actually nest two WireGuard tunnels. One provides security for your traffic as it traverses out to the internet; the other exists between every hop. A very important part of being able to bill for traffic is being able to identify who is responsible for paying for that traffic.

Since each hop pays the hop adjacent to it in our system, WireGuard is actually the most expedient way to authenticate that traffic really is from the peer who will be billed. In theory we could improve efficiency by removing the ChaCha20 encryption and using only the Poly1305 authentication.
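To illustrate what "authentication only" would mean for a single hop, here is a minimal sketch in which the payload travels in the clear but carries a tag that only the billed peer could have produced. It is not WireGuard and not Poly1305: HMAC-SHA256 from the Python standard library is swapped in as a stand-in authenticator, and the function names, the 16-byte tag length and the per-peer keys are all assumptions made for the example.

```python
import hashlib
import hmac
import os

TAG_LEN = 16  # assumed tag size for this sketch

def tag_packet(peer_key, payload):
    """Append an authentication tag; the payload itself is NOT encrypted."""
    tag = hmac.new(peer_key, payload, hashlib.sha256).digest()[:TAG_LEN]
    return payload + tag

def verify_packet(peer_key, packet):
    """Return the payload if the tag matches this peer's key, else None."""
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(peer_key, payload, hashlib.sha256).digest()[:TAG_LEN]
    # compare_digest avoids leaking how many tag bytes matched.
    return payload if hmac.compare_digest(tag, expected) else None

# Hypothetical per-peer keys shared between adjacent hops.
billed_peer_key = os.urandom(32)
other_peer_key = os.urandom(32)

packet = tag_packet(billed_peer_key, b"forwarded traffic")
print(verify_packet(billed_peer_key, packet))  # payload accepted
print(verify_packet(other_peer_key, packet))   # None: wrong peer
```

The point of the sketch is only that the receiving hop can attribute traffic to a specific peer without paying for encryption; whether that is enough depends on the threat model.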

This is something we even seriously discussed as it became clear that even modern devices were not performing to expectations. But I decided to do some more investigating before going down that route.

In the graph for the n600, CPU time is dominated by WireGuard tasks, so much so that the iptables and routing rules remain thin little licks of ‘flame’.