This article will show you how to use the native Azure and AWS VPN solutions to create a full VPN mesh between Azure and AWS, with dual gateways on both sides and active-active connectivity across the board.

As far as I know, this is the first time this kind of setup has been publicly documented, so enjoy the front-row seat!

A bit of context…

Establishing a site-to-site (S2S) VPN between AWS and Azure used to be a less than ideal process, as it required at least one of the sides to use a third-party network virtual appliance (NVA) such as Windows RRAS, a virtual firewall from one of the available vendors, or a Linux box running VPN software such as strongSwan.

Not only was the setup process cumbersome, it also meant you had to build your own HA capabilities on whichever side you deployed your third-party NVA, and you had to build complex setups for automated failover with minimal downtime.

Why was it like that?

The reason you had to use a third-party NVA on either side is simple: AWS's and Azure's native VPN solutions were not compatible.

AWS chose to support only IKEv1 for its native S2S VPN solution, whilst Azure chose to move to IKEv2 only for its own. This created a situation where the two VPN solutions could not agree on which IKE version to use when establishing an S2S tunnel.

As a side note, Azure does have a VPN gateway SKU that supports IKEv1: the Basic SKU. However, it is not supported for production environments, and it has other interoperability issues with AWS VPN that also make the connection impossible.

What has changed?

AWS has announced that they now support IKEv2, which is great news for everyone, but mainly for their customers and for those who have to interact with AWS VPCs.

Does that mean that a VPN between Azure and AWS is now possible? Yes, but with some caveats.

What can be accomplished now?

Right now you can configure tunnels between AWS and Azure that take advantage of the HA capabilities on both sides, with just a few caveats:

You can't use BGP.

You need to create two VPN connections on the AWS side to achieve active-active on both sides; otherwise only the AWS side is active-active (using both VGW tunnel endpoints), while all traffic is sent to one of the VPN gateways on Azure.

Scenario

We’re going to configure a site-to-site connection between Azure and AWS. Unfortunately we can’t use BGP at the moment, because AWS forces you to use APIPA (169.254.0.0/16) addresses for the tunnel’s inside IPs, which are also the addresses BGP listens on, while Azure forces you to use the last available IP address on the gateway subnet as the BGP listener. These two requirements are incompatible.
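To make the clash concrete, here's a small sketch (my own illustration, not part of any cloud SDK) using Python's ipaddress module. The GatewaySubnet prefix is a hypothetical example; the point is that Azure's BGP listener can never land inside the APIPA range AWS demands:

```python
import ipaddress

# AWS requires the tunnel inside IPs (and therefore the BGP peer
# addresses) to come from the link-local/APIPA range.
apipa = ipaddress.ip_network("169.254.0.0/16")

# Azure uses the last usable address of the GatewaySubnet as its BGP
# listener. 10.0.255.0/27 is a hypothetical GatewaySubnet for this VNet.
gateway_subnet = ipaddress.ip_network("10.0.255.0/27")
azure_bgp_peer = list(gateway_subnet.hosts())[-1]

print(azure_bgp_peer)           # 10.0.255.30
print(azure_bgp_peer in apipa)  # False -> AWS won't accept this peer
```

Since the GatewaySubnet is carved out of the VNet's own address space, its last usable address can never fall in 169.254.0.0/16, so the two BGP peering requirements can't be satisfied at the same time.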

We will still use dual tunnels on the AWS side and active-active on the Azure side. I’ll initially build active-standby from Azure (i.e. only one VPN gateway public IP) and test connectivity. This is the scenario (please ignore the IP addresses):

Only then will I add the second VPN gateway instance on the Azure side to accomplish this scenario:

AWS Configuration

VPC IP addressing: 172.31.0.0/16
Region: us-east-2
VGW IP 1: 18.220.213.254
VGW IP 2: 52.15.136.135
CGW: 52.174.95.131

Azure configuration

VNet IP addressing: 10.0.0.0/16
Region: West Europe
VPN GW Public IP: 52.174.95.131
Local Network Gateway 1: 18.220.213.254
Local Network Gateway 2: 52.15.136.135
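Before building the tunnels it's worth sanity-checking that the two address spaces don't overlap, since overlapping CIDRs would break routing across the VPN. A quick check with Python's ipaddress module, using the prefixes above:

```python
import ipaddress

aws_vpc = ipaddress.ip_network("172.31.0.0/16")   # AWS VPC range
azure_vnet = ipaddress.ip_network("10.0.0.0/16")  # Azure VNet range

# overlaps() returns True if the two ranges share any address.
print(aws_vpc.overlaps(azure_vnet))  # False -> safe to route between them
```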

Step by step setup

Bring up the first tunnel

1. Create the Azure VNet and the AWS VPC as per the above settings.
2. Create the Azure VPN gateway (do this first, as it takes a while to create).
3. Create the AWS VGW and attach it to your VPC.
4. Create the tunnel on the AWS side as per the above settings: no BGP, and leave the inside tunnel addresses and PSK to be generated automatically.
5. Create the new connection in Azure pointing to the first VGW endpoint, which you’ll need to create as a Local Network Gateway. Use the PSK from AWS’s configuration.

At this point your first tunnel will come up:

Bring up the second tunnel

Create a second connection on the Azure side, for which you will have to create a new Local Network Gateway with the second VGW endpoint’s IP address and the same IP address space.

And that’s it, you should now see both tunnels up. Yeah, it’s that easy!

Routing

Now that the tunnels are up, the Azure VNet will effectively learn the routes to the networks on the other side of the tunnels. You can check these in the effective routes for the NIC of any VM deployed in the VNet.





The AWS side requires you to enable route propagation on the VPC’s route table first:

Once enabled you should immediately see the route in the route table:

Testing

I have deployed a VM on each side of the tunnel and will do some testing. I’ll start with ICMP, but remember that ICMP is stateless, so it plays quite well with asymmetric routing.

Next up we’ll run some iPerf tests to show TCP connectivity. As there are no stateful devices receiving traffic on different interfaces, the potential asymmetry of this setup should not be a problem.

AWS VM IP address: 172.31.33.246
AWS VM size: t2.micro
Azure VM IP address: 10.0.1.4
Azure VM size: A0

ICMP test

We’ll ping the AWS VM from the Azure VM, capture the output of the ping command on the Azure side and the output of a tcpdump (only the on-screen output) on the AWS side.

Azure

pjperez@vpntest-Azure:~$ ping 172.31.33.246

PING 172.31.33.246 (172.31.33.246) 56(84) bytes of data.

64 bytes from 172.31.33.246: icmp_seq=1 ttl=254 time=109 ms

64 bytes from 172.31.33.246: icmp_seq=2 ttl=254 time=109 ms

64 bytes from 172.31.33.246: icmp_seq=3 ttl=254 time=110 ms

64 bytes from 172.31.33.246: icmp_seq=4 ttl=254 time=108 ms

64 bytes from 172.31.33.246: icmp_seq=5 ttl=254 time=110 ms

^C

--- 172.31.33.246 ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 4002ms

rtt min/avg/max/mdev = 108.781/109.743/110.653/0.611 ms

AWS

[root@ip-172-31-33-246 ~]# tcpdump host 10.0.1.4

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

12:10:12.944619 IP ip-10-0-1-4.us-east-2.compute.internal > ip-172-31-33-246.us-east-2.compute.internal: ICMP echo request, id 25440, seq 1, length 64

12:10:12.944647 IP ip-172-31-33-246.us-east-2.compute.internal > ip-10-0-1-4.us-east-2.compute.internal: ICMP echo reply, id 25440, seq 1, length 64

12:10:13.945001 IP ip-10-0-1-4.us-east-2.compute.internal > ip-172-31-33-246.us-east-2.compute.internal: ICMP echo request, id 25440, seq 2, length 64

12:10:13.945029 IP ip-172-31-33-246.us-east-2.compute.internal > ip-10-0-1-4.us-east-2.compute.internal: ICMP echo reply, id 25440, seq 2, length 64

12:10:14.946326 IP ip-10-0-1-4.us-east-2.compute.internal > ip-172-31-33-246.us-east-2.compute.internal: ICMP echo request, id 25440, seq 3, length 64

12:10:14.946350 IP ip-172-31-33-246.us-east-2.compute.internal > ip-10-0-1-4.us-east-2.compute.internal: ICMP echo reply, id 25440, seq 3, length 64

12:10:15.945779 IP ip-10-0-1-4.us-east-2.compute.internal > ip-172-31-33-246.us-east-2.compute.internal: ICMP echo request, id 25440, seq 4, length 64

12:10:15.945807 IP ip-172-31-33-246.us-east-2.compute.internal > ip-10-0-1-4.us-east-2.compute.internal: ICMP echo reply, id 25440, seq 4, length 64

12:10:16.946711 IP ip-10-0-1-4.us-east-2.compute.internal > ip-172-31-33-246.us-east-2.compute.internal: ICMP echo request, id 25440, seq 5, length 64

12:10:16.946738 IP ip-172-31-33-246.us-east-2.compute.internal > ip-10-0-1-4.us-east-2.compute.internal: ICMP echo reply, id 25440, seq 5, length 64

^C

10 packets captured

10 packets received by filter

0 packets dropped by kernel

As you can see there’s no packet loss, and the latency is more or less what you’d expect for a transatlantic connection.

TCP (iPerf3) test

In this case we’ll start an iPerf3 server on the AWS side and push traffic with 8 parallel threads from the Azure side. I will only share the final result from each side for brevity.

Azure (Client)



[ ID] Interval Transfer Bandwidth Retr

[ 4] 0.00-10.00 sec 14.3 MBytes 12.0 Mbits/sec 3 sender

[ 4] 0.00-10.00 sec 12.6 MBytes 10.6 Mbits/sec receiver

[ 6] 0.00-10.00 sec 21.9 MBytes 18.3 Mbits/sec 6 sender

[ 6] 0.00-10.00 sec 19.0 MBytes 15.9 Mbits/sec receiver

[ 8] 0.00-10.00 sec 12.9 MBytes 10.8 Mbits/sec 3 sender

[ 8] 0.00-10.00 sec 11.2 MBytes 9.41 Mbits/sec receiver

[ 10] 0.00-10.00 sec 14.9 MBytes 12.5 Mbits/sec 8 sender

[ 10] 0.00-10.00 sec 13.1 MBytes 11.0 Mbits/sec receiver

[ 12] 0.00-10.00 sec 11.0 MBytes 9.21 Mbits/sec 6 sender

[ 12] 0.00-10.00 sec 9.38 MBytes 7.87 Mbits/sec receiver

[ 14] 0.00-10.00 sec 15.2 MBytes 12.7 Mbits/sec 5 sender

[ 14] 0.00-10.00 sec 13.2 MBytes 11.1 Mbits/sec receiver

[ 16] 0.00-10.00 sec 16.0 MBytes 13.4 Mbits/sec 8 sender

[ 16] 0.00-10.00 sec 13.7 MBytes 11.5 Mbits/sec receiver

[ 18] 0.00-10.00 sec 24.3 MBytes 20.4 Mbits/sec 2 sender

[ 18] 0.00-10.00 sec 21.4 MBytes 17.9 Mbits/sec receiver

[SUM] 0.00-10.00 sec 130 MBytes 109 Mbits/sec 41 sender

[SUM] 0.00-10.00 sec 114 MBytes 95.2 Mbits/sec receiver

iperf Done.

You’ll see there are a number of retransmissions and an overall throughput of roughly 100 Mbps. This is restricted by the VM sizes on both sides, as I’m using A0 on Azure (the smallest, slowest, most restricted SKU) and t2.micro on the AWS side.

AWS (Server)

[ ID] Interval Transfer Bandwidth

[ 5] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender

[ 5] 0.00-10.13 sec 12.6 MBytes 10.4 Mbits/sec receiver

[ 7] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender

[ 7] 0.00-10.13 sec 19.0 MBytes 15.7 Mbits/sec receiver

[ 9] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender

[ 9] 0.00-10.13 sec 11.2 MBytes 9.29 Mbits/sec receiver

[ 11] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender

[ 11] 0.00-10.13 sec 13.1 MBytes 10.9 Mbits/sec receiver

[ 13] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender

[ 13] 0.00-10.13 sec 9.38 MBytes 7.77 Mbits/sec receiver

[ 15] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender

[ 15] 0.00-10.13 sec 13.2 MBytes 10.9 Mbits/sec receiver

[ 17] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender

[ 17] 0.00-10.13 sec 13.7 MBytes 11.3 Mbits/sec receiver

[ 19] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender

[ 19] 0.00-10.13 sec 21.4 MBytes 17.7 Mbits/sec receiver

[SUM] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender

[SUM] 0.00-10.13 sec 114 MBytes 94.0 Mbits/sec receiver

Server listening on 5201

As you can see, this has been quite easy and works like a charm.

You don’t need anything else to have a reliable and performant VPN connection between Azure and AWS, but if you’d like to use active-active on the Azure side too please keep reading…

Steps to add active-active on the Azure side

1. Make sure you have created your Azure VPN gateway as active-active; otherwise make the change as per the instructions.
2. Grab the secondary public IP from the Azure VPN gateway and create a new AWS VPN connection with it. You can, and unfortunately should, keep using static routing.
3. Create two new S2S connections on the Azure side, just as we did for the initial tunnels, but pointing to the two new AWS public IP addresses. This brings the total to four connections for the gateway pair.

Now you have a full VPN mesh between the two gateway pairs!
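The full mesh is simply every combination of the two Azure gateway instance IPs and the two AWS tunnel endpoint IPs. A small sketch that enumerates the four connections, using the article's public IPs plus a made-up secondary Azure IP (substitute the secondary IP your active-active gateway actually gets):

```python
from itertools import product

# Azure VPN gateway instance public IPs. The second one is hypothetical;
# an active-active gateway exposes two public IPs.
azure_gw_ips = ["52.174.95.131", "52.174.95.132"]

# AWS VGW tunnel endpoint public IPs from the article.
aws_vgw_ips = ["18.220.213.254", "52.15.136.135"]

# Every Azure instance pairs with every AWS endpoint: a 2x2 full mesh.
connections = list(product(azure_gw_ips, aws_vgw_ips))
for azure_ip, aws_ip in connections:
    print(f"Azure {azure_ip} <-> AWS {aws_ip}")

print(len(connections))  # 4 tunnels in total
```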

Please note that you don’t need to make any further routing changes on AWS, as the route table is already accepting propagated routes.

Featured image: photo by israel palacio on Unsplash