We chose OpenVPN to build secure connections to our Private Spaces. We braced for difficulties, but that was only the beginning. The point of this post is that integration testing does make a difference, and that OpenVPN is a very nice tool!

Busy? Scroll to the bottom for a summary of main lessons.

When we started playing with OpenVPN, we knew that we wanted to use PKI and certificates. The simple reason was that we already had an easy-to-deploy PKI system for issuing certificates, certificate revocation lists (CRLs), and everything else you need to do it professionally, including real secure hardware (read: cloud HSM, if you know it doesn’t mean “High School Musical”).

We were after a simple user interface, and that meant quite a bit of tailoring: a completely new VPN user management layer added to the PKI system, which we took as the basis for Private Space security. That was something we planned for.

Six weeks into the project, we had a decent skeleton of a product, which included:

PKI for key management – each user gets a certificate; when their access is revoked, the certificate is added to a revocation list (CRL) and the change is applied almost instantly.

Dynamic DNS – each instance of the VPN gets a domain in .umph.io, which we generate from a randomly chosen name of a British town, e.g., cambridge2.umph.io. If you restart the virtual machine with a new IP address, the DNS records are updated and everything works – business as usual.

Let’s Encrypt certificate – connections to the server are browser-trusted and you can see a satisfying green padlock (or a less satisfying gray padlock in Safari).

Cloud HSM, which protects the VPN’s certification authority – no key leakage here, all signing is done in secure hardware with FIPS 140-2 Level 3 security. Frankly – neither AWS nor Azure will give you that!
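A minimal sketch of how a name such as cambridge2.umph.io could be generated. The town list, the numeric suffix range, and the function name are assumptions made for illustration; the post only shows the one example name, not the generator itself.

```python
import secrets

# Hypothetical sample of towns; the real list is not given in the post.
BRITISH_TOWNS = ["cambridge", "oxford", "bath", "york", "ely"]
BASE_DOMAIN = "umph.io"


def random_instance_hostname(towns=BRITISH_TOWNS, max_suffix=9, rng=None):
    """Return a hostname of the form <town><number>.umph.io.

    The numeric suffix is an assumption inferred from the example
    "cambridge2.umph.io".
    """
    rng = rng or secrets.SystemRandom()
    town = rng.choice(towns)
    suffix = rng.randint(1, max_suffix)  # inclusive on both ends
    return f"{town}{suffix}.{BASE_DOMAIN}"


if __name__ == "__main__":
    print(random_instance_hostname())
```

A real deployment would also have to check that the generated name is not already taken before creating the DNS record.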

At this point we thought – great, job done, let’s start a few instances on Amazon AWS and tell people. Alright, we were reasonable enough to start using it ourselves first. We connected some office servers to it, our mobile phones and laptops and started testing.

UDP Shaping

The first obvious question was how “fast” it was. That turned out to be a difficult question to answer. The first tests we ran were on Virgin (UK) business fiber at 200Mbps. A big disappointment – the peak on the left is 10Mbps. The next day, we ran tests on ADSL2 and Cambridge University networks. These went reasonably well, so we thought the problem was on Amazon’s side. Trying to figure out what was going on, we asked Amazon support, and they said their side was fine.

At this point, we were testing both UDP and TCP options on the Virgin fiber while keeping an eye on OpenVPN server logs. We got two different types of errors:

“Replay-window backtrack occurred” (with UDP) – network congestion, with packets arriving out of order. Most likely linked to ISP shaping. You can start figuring out the problem by changing the size of “replay-window”.

“MULTI: packet dropped due to output saturation” (with TCP) – either you live with it, or you can start messing with the configuration options tun-mtu, fragment, and mssfix (we used TUN).
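The options named above are ordinary OpenVPN configuration directives. The sketch below renders them as config lines so that different combinations can be tried in a repeatable way. The function name and the numeric values are illustrative assumptions, not the settings used in the post. One caveat built into the sketch: to the best of our knowledge OpenVPN accepts fragment only with UDP transport, so it is rejected for TCP here.

```python
def tuning_directives(proto, replay_window=None, tun_mtu=None,
                      fragment=None, mssfix=None):
    """Render selected OpenVPN tuning directives as config lines.

    All values are supplied by the caller; nothing here is a recommendation.
    """
    if proto not in ("udp", "tcp"):
        raise ValueError("proto must be 'udp' or 'tcp'")

    lines = []
    if replay_window is not None:
        # Size of the sliding window used for replay protection.
        lines.append(f"replay-window {replay_window}")
    if tun_mtu is not None:
        lines.append(f"tun-mtu {tun_mtu}")
    if fragment is not None:
        # Assumption stated in the lead-in: fragment is a UDP-only option.
        if proto != "udp":
            raise ValueError("fragment is only used with UDP transport")
        lines.append(f"fragment {fragment}")
    if mssfix is not None:
        lines.append(f"mssfix {mssfix}")
    return lines


if __name__ == "__main__":
    # Illustrative values only.
    for line in tuning_directives("udp", replay_window=256,
                                  fragment=1300, mssfix=1300):
        print(line)
```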

To cut a long story short, Amazon were correct, of course. The whole problem was in Virgin’s extraordinary shaping of UDP traffic.

DNS and basic networking

Our goal was to completely hide all the traffic inside the encrypted VPN channel, including DNS. We had therefore already installed dnsmasq. Then we encountered yet another quirk: occasional routing problems, as if DNS had stopped working. We think it is linked to “PEERDNS=yes”, which is set by default on AWS EC2 instances. Once we replaced that with the public DNS server 8.8.8.8, the problem disappeared.

OpenVPN server config files and sysctl

When you make a lot of configuration changes, you expect your VPN client to reconnect quickly and in line with the configuration you define. We ran into a problem where we made a change in the server configuration, but the client behaved as if nothing had happened. It got so bad that we started optimizing network-related sysctl settings. We started to suspect “magic”, but … no magic, just “advanced” technology. It turned out that the OpenVPN server reads all files with the “.conf” extension, and when we renamed old configuration files, we kept the extension intact. Well – that was a day spent with sysctl and network analysis.

Once we figured this one out, we were able to get reliable reconnections without any sysctl configuration.
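A quick way to catch this kind of mistake is to list every “.conf” file in the configuration directory and compare it with the files you expect to be active. The sketch below does exactly that. The directory path /etc/openvpn and the file name server.conf in the usage example are assumptions; adjust them to your own setup.

```python
from pathlib import Path


def unexpected_conf_files(conf_dir, expected):
    """Return names of *.conf files in conf_dir that are not in `expected`.

    A renamed backup such as "server-old.conf" still ends in ".conf" and is
    therefore reported; "server.conf.bak" is not matched and is ignored.
    """
    expected = set(expected)
    return sorted(
        path.name
        for path in Path(conf_dir).glob("*.conf")
        if path.is_file() and path.name not in expected
    )


if __name__ == "__main__":
    # Assumed location and file name; both depend on your installation.
    conf_dir = Path("/etc/openvpn")
    if conf_dir.is_dir():
        for name in unexpected_conf_files(conf_dir, {"server.conf"}):
            print(f"unexpected config file: {name}")
```

Renaming old files to something like server.conf.bak, or moving them out of the directory altogether, keeps them from being picked up.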

Hint: reconnections still don’t work? Think about what you need to persist.

Speed, LZO, EC2 types

We had access to a fast Cambridge University network thanks to our ideaSpace membership. The maximum speed there is very good, so we could test what difference there is between the t2.micro, t2.small, and t2.medium EC2 instance types that Amazon offers.

Again, we tested both OpenVPN transport protocols, UDP as well as TCP.

The bottom line was that we couldn’t show a statistically significant difference between UDP and TCP in the bandwidth data. There was no difference between t2.micro and t2.small – an expected result, as the only difference between these two EC2 types is the available RAM. We could, however, see a difference between t2.small and t2.medium, in the region of 20Mbps – see the table on the left above for a few test results. More interesting was the difference on the EC2 / AWS side, where OpenVPN clearly used a significant number of processor cycles.

Server processor use

T2 instances collect CPU credits, which can be spent on short-term spikes in utilization. The t2.small instance managed to stay just below the level for “spikes” when LZO compression was switched off. (If you wonder, we used two different tests: 1. a web service for testing broadband speed, and 2. a download of a large file from the VPN server. The latter shows “network out” only. The CPU utilization difference was negligible.)

Note: you may also want to have a look at TCP window sizing and its impact on network speed. However, it makes sense only if your network speed is approaching or exceeding 150Mbps with 50ms link latency. Current operating systems use decent window scaling factors – but it is still worth keeping in mind if you encounter “magic”.
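The arithmetic behind that note is the bandwidth-delay product: the amount of data that has to be in flight to keep a link full. The sketch below computes it for the figures in the note, assuming that “link latency” means round-trip time. The second function shows the reverse calculation for a fixed window, using the 65,535-byte limit that applies to TCP without window scaling.

```python
def bandwidth_delay_product_bytes(bandwidth_mbps, rtt_ms):
    """Bytes that must be in flight to saturate the link."""
    # Mbit/s -> bit/s, ms -> s, bits -> bytes, folded into one expression.
    return bandwidth_mbps * 1_000_000 * rtt_ms / (8 * 1000)


def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on throughput for a given window and round-trip time."""
    return window_bytes * 8 * 1000 / (rtt_ms * 1_000_000)


if __name__ == "__main__":
    # 150 Mbps at 50 ms RTT needs a window of 937500.0 bytes, roughly 0.9 MB.
    print(bandwidth_delay_product_bytes(150, 50))

    # Without window scaling the window is capped at 65,535 bytes,
    # which limits a 50 ms path to about 10.5 Mbps.
    print(max_throughput_mbps(65_535, 50))
```

A window far smaller than the bandwidth-delay product caps throughput no matter how fast the link is, which is why window sizing only starts to matter at high speeds or long round trips.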

Never give up

It is possible to set up OpenVPN relatively quickly, using one of the many sample configuration files you can find with a quick search. If you want to create a reliable VPN, brace yourself for a long journey in which you keep improving your network one step at a time.

We eventually created a configuration that was stable, reasonably fast (measured by download times of large files), hid all the traffic, including DNS queries, and worked on Mac, Windows, and smartphones. But it’s certainly not the end of it!

Lessons

UDP v TCP – many ISPs don’t like UDP and will enforce harsh traffic shaping. Switching to TCP comes with a penalty in speed and ping latency, but it’s not as bad as you may fear.

Compression – it’s up to you, but most data is already compressed at the source, and HTTPS traffic is encrypted and therefore effectively incompressible.

Setting rcvbuf / sndbuf does make a difference.

Windows can be nasty and you have to tell it explicitly to ignore “outside” DNS.

You have to get networking basics right – like DNS setup.

Before you start optimizing, ask yourself whether it is to get tests right, or for real-world use.
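Several of these lessons map onto OpenVPN directives: proto for the transport choice, sndbuf and rcvbuf for the socket buffers (note the spelling of the send-buffer option), and block-outside-dns, which is the Windows-only option for ignoring DNS servers outside the tunnel. The sketch below renders them as config lines. The function name and the buffer size are illustrative assumptions, not the values used in the post.

```python
def lesson_directives(proto, buffer_bytes=None, windows_client=False):
    """Render config lines reflecting the lessons listed above."""
    if proto not in ("udp", "tcp"):
        raise ValueError("proto must be 'udp' or 'tcp'")

    lines = [f"proto {proto}"]
    if buffer_bytes is not None:
        # Same size for both directions, purely to keep the sketch short.
        lines.append(f"sndbuf {buffer_bytes}")
        lines.append(f"rcvbuf {buffer_bytes}")
    if windows_client:
        # Only meaningful on Windows clients.
        lines.append("block-outside-dns")
    return lines


if __name__ == "__main__":
    # Illustrative buffer size; measure before settling on a value.
    for line in lesson_directives("tcp", buffer_bytes=393216,
                                  windows_client=True):
        print(line)
```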