I spent this morning gathering performance data for all of our major supported devices. The results are very interesting.

First, a note on methodology: these tests were intentionally performed against a distant (70ms) exit server and against real internet targets. In terms of pure iperf performance these devices can usually do about 30% more, but I wanted realistic usage numbers, including effects like TCP window scaling brought on by the high latency and distance. What’s not realistic about this test is that the mesh links are perfect (Ethernet cables); I wanted to inspect the software stack that’s under our direct control for problems before diving into antenna tuning.
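For anyone who wants to reproduce the setup, here’s a minimal sketch of the sort of harness I mean, in Python. It assumes iperf3 is running in server mode on the exit and that plain Linux ping is available; the exit hostname is hypothetical.

```python
import json
import subprocess

EXIT_SERVER = "exit.example.net"  # hypothetical exit server ~70ms away

def measure_throughput(host: str, seconds: int = 30) -> float:
    """Run iperf3 against the exit and return the received rate in Mbit/s."""
    out = subprocess.run(
        ["iperf3", "-c", host, "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(out.stdout)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6

def measure_latency(host: str, count: int = 20) -> str:
    """Ping the exit and return the rtt summary line."""
    out = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()[-1]  # "rtt min/avg/max/mdev = ..."

if __name__ == "__main__":
    print(f"throughput: {measure_throughput(EXIT_SERVER):.1f} Mbit/s")
    print(f"latency: {measure_latency(EXIT_SERVER)}")
```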

Latency-wise, performance is flawless across the board: less than half a millisecond of additional latency per hop, meaning video calls and live streaming will remain viable even deep into the network.
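To put that in perspective, here’s the arithmetic, taking the 70ms base RTT from the test setup and treating the half-millisecond figure as a per-hop ceiling:

```python
BASE_RTT_MS = 70.0  # round trip to the distant exit server
PER_HOP_MS = 0.5    # measured ceiling on added latency per mesh hop

for hops in (1, 5, 10, 20):
    rtt = BASE_RTT_MS + PER_HOP_MS * hops
    print(f"{hops:2d} hops deep: ~{rtt:.1f} ms RTT")

# Even 20 hops in, the mesh adds only 10 ms on top of the 70 ms base,
# a rounding error for interactive traffic.
```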

In terms of throughput lost to meshing, things are also pretty positive. When meshing with a higher-performance gateway device, nodes actually saw their throughput increase, which says good things about our packet scheduling and network design. On the other hand, deeper chains of identical devices must suffer the rules of queueing theory: throughput has to go down as the chain increases in length, at a rate of about 5-10% per hop in current tests, though those numbers aren’t reliable enough to extrapolate from.
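As a back-of-the-envelope model only, if the loss compounds multiplicatively at the measured rate, depth affects throughput roughly like this; the 100 Mbit/s starting figure is purely illustrative, not a measured number:

```python
def throughput_at_depth(gateway_mbps: float, hops: int, loss_per_hop: float) -> float:
    """Geometric decay: each hop keeps (1 - loss) of the previous hop's rate."""
    return gateway_mbps * (1.0 - loss_per_hop) ** hops

for loss in (0.05, 0.10):
    rates = ", ".join(
        f"{throughput_at_depth(100.0, h, loss):.0f}" for h in range(1, 6)
    )
    print(f"{loss:.0%} per hop: {rates} Mbit/s at hops 1-5")
```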

The absolute value of mesh throughput is a little more complicated, and comes down to WireGuard encryption performance and other device properties. Let’s compare our two highest-performance devices: the EdgeRouter X, using an MT7621, and the Turris Omnia, using an Armada 38x. You may notice that the Omnia has about double the CPU clock speed and about double the performance.

While this is intuitively fine, the reality is that it shouldn’t be the case. The Armada has an L2 cache (L1 cache specs are the same on both devices), and ARM is typically considered to have higher IPC (instructions per clock) than the MIPS core in the MT7621, so the Omnia should be more than twice as fast. It’s hard to narrow down what’s causing what here. My hunch is memory speed and a lack of DMA to make copying traffic around efficient, but I should probably do some cache miss stats on the Omnia before reaching further conclusions.
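When I do, something along these lines should work, assuming a perf build is available in the Omnia’s Turris OS image and that perf’s generic cache events map onto the Armada 38x PMU:

```python
import subprocess
import sys

# Generic perf event aliases; actual support depends on the Armada 38x PMU
# and on perf being present in the router's firmware image.
PERF_EVENTS = "cycles,instructions,cache-references,cache-misses"

def cache_stats(pid: int, seconds: int = 10) -> str:
    """Attach perf stat to a running process (e.g. a forwarding benchmark)
    for a fixed window and return the counter summary."""
    out = subprocess.run(
        ["perf", "stat", "-e", PERF_EVENTS, "-p", str(pid),
         "--", "sleep", str(seconds)],
        capture_output=True, text=True,
    )
    return out.stderr  # perf stat writes its summary to stderr

if __name__ == "__main__":
    print(cache_stats(int(sys.argv[1])))
```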

A supporting note is that on the n750, 35% of the total CPU load is in sirq, i.e. softirq time, which on these devices is mostly spent copying data off of the NIC and into memory. This is consistent with a lack of DMA (direct memory access) leaving the CPU to do the copying itself, a memory-bound workload where higher clock frequencies would help directly.
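For reference, that sirq share can be reproduced without top by sampling /proc/stat twice and comparing the softirq counter against total CPU time:

```python
import time

def cpu_times():
    """Aggregate counters from the first line of /proc/stat.
    Field order: user nice system idle iowait irq softirq steal ..."""
    with open("/proc/stat") as f:
        fields = f.readline().split()
    assert fields[0] == "cpu"
    return [int(x) for x in fields[1:]]

def softirq_share(interval: float = 5.0) -> float:
    """Fraction of all CPU time spent in softirq over the interval."""
    before = cpu_times()
    time.sleep(interval)
    after = cpu_times()
    delta = [b - a for a, b in zip(before, after)]
    return delta[6] / sum(delta)  # index 6 is the softirq column

if __name__ == "__main__":
    print(f"softirq share of CPU time: {softirq_share():.1%}")
```

Run it while saturating the router with traffic to see the sirq share under load.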