Leave room for your network to grow while safely segmented

Life is full of competing priorities, and technology is certainly no exception. For some time we have known that the safest system is one that allows the minimal number of people or other systems access. For computer networks, this leads to a fairly obvious ideal configuration where each interface only allows pre-approved connections both inbound and outbound. Further, we would want to be able to inspect the traffic being sent to look for signs of intrusion or attack. Historically, the cost to achieve this desired state has been at odds with the perceived benefits, but with the advent of the cloud, we can get closer than ever before!

Internet Protocol Addressing and Routing

Where the rubber hits the road, and the internet we all know and love becomes real, is in the world of IP (“internet protocol”) and the two most common transport protocols: TCP (“transmission control protocol”) and UDP (“user datagram protocol”).

Let’s break down how data moves between network interfaces in a typical network, leaving out how the data gets generated by the application and picking up at the IP layer. IP provides the methodology for ensuring that chunks of data known as “packets” can be moved from one network interface to another across separately managed networks. The smallest logical “network” is the subnet. Within a subnet, network interfaces can generally communicate directly with each other. We generally use what are known as “switches” to hook up the various network interfaces that are part of the same subnet.

Depiction of a subnet containing multiple network interfaces connected together via a network switch

In IP, a subnet is generally described by its starting IP and the size of the subnet in the form of a prefix (bitmask) length. For example, the network 10.0.0.0/8 starts at IP 10.0.0.0, and ends at 10.255.255.255. The “/8” portion indicates that you can identify this subnet by the first 8 bits of the address, and the remaining 24 bits are used to identify the interface within the network.
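The arithmetic for 10.0.0.0/8 can be checked with Python's standard ipaddress module. This is only a small sketch; the variable names and the two sample interface addresses are made up for illustration.

```python
import ipaddress

# The example subnet from the text: 8 network bits, 24 interface bits.
network = ipaddress.ip_network("10.0.0.0/8")

first_address = network.network_address      # start of the range
last_address = network.broadcast_address     # end of the range
network_bits = network.prefixlen             # the "/8" part
interface_bits = network.max_prefixlen - network.prefixlen

# Membership test: is a given interface address inside this subnet?
# Both sample addresses are illustrative.
inside = ipaddress.ip_address("10.42.7.9") in network
outside = ipaddress.ip_address("11.0.0.1") in network

print(first_address, last_address, network_bits, interface_bits, inside, outside)
```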

For a packet to move between subnets, we rely on devices that provide “routing” functionality. For an IP-based network, these routers examine the destination IP address of a given packet, consult the “routing table” associated with the network interface the packet arrived on, and use the resulting destination interface to push the packet back out.
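A routing lookup can be sketched in a few lines. The table contents and interface names below are hypothetical, and the sketch assumes the common "longest prefix wins" rule for choosing among overlapping routes, which the description above does not spell out.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> outgoing interface name.
ROUTES = {
    "10.0.1.0/24": "eth1",
    "10.0.0.0/16": "eth2",
    "0.0.0.0/0": "eth0",  # default route
}

def next_interface(destination, routes=ROUTES):
    """Return the interface for the most specific route matching destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), interface)
        for prefix, interface in routes.items()
        if address in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None  # no matching route: the packet cannot be forwarded
    # Assumption: the route with the longest prefix is preferred.
    best_network, best_interface = max(matches, key=lambda m: m[0].prefixlen)
    return best_interface
```

With this table, a packet for 10.0.1.5 matches all three routes and leaves via the most specific one, while a packet for an internet address only matches the default route.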

Multiple subnets interconnected by a network router

What might the ideal rule controlling traffic between interfaces look like then? Perhaps we would require that the packet be proven to come from a known user, be addressed to a known application, and carry contents that are guaranteed not to have been modified in transit. (Spoiler alert: research “service mesh” to see how to get close to this!)

You may have noticed that the devices we’re discussing are getting more and more complex in their decision making: switches simply look at the physical address (the “MAC” address) of the packet’s destination and push it out the appropriate port, while routers must analyze the IP address and look up routing rules. This increasing complexity is no accident; it reflects the trade-off between the granularity of protection afforded in the network and the expense of the devices connecting its interfaces. Not to mention the complexity of managing all those rules!

What Virtualization has Bought Us

When devices were predominantly physical in nature, we generally purchased switches, routers, firewalls, and other network devices that were also primarily physical. This led to increased pressure to create “flat” networks in order to keep costs down, which resulted in large swaths of systems sitting on the same subnet and able to communicate directly. Is it any surprise then that many application owners don’t fully understand and appreciate the various network connection dependencies their applications have?

With the advent of virtualization, and especially with concepts introduced by various cloud vendors such as the “Virtual Private Cloud (VPC)” and “Virtual Network (VNET)”, we now have the tools necessary to right-size our subnets and enforce more intelligent connection rules than ever.

Layered Security

Within AWS we have the concept of a “security group”. These are applied to the various network interfaces (called ENI or elastic network interface) and provide a mechanism to supply IP-level inbound and outbound rules. This means we can enforce rules such as “this network interface is allowed to accept connections from the 10.0.1.0/24 network” or “this interface is allowed to initiate connections to the 0.0.0.0/0 network” and meet one part of our ideal state network control goals. Good stuff!
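The two example rules can be modeled as a simple allow-list. The sketch below is a simplified illustration, not the AWS API: the Rule type, the is_allowed helper, and the port choices are all hypothetical, and real security groups have additional behavior (for example, they track connection state) that is not modeled here.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    cidr: str       # peer network the rule applies to
    protocol: str   # "tcp" or "udp"
    from_port: int  # first port in the allowed range
    to_port: int    # last port in the allowed range

def is_allowed(rules, peer_ip, protocol, port):
    """Allow-list semantics: permit only if some rule matches, otherwise deny."""
    peer = ipaddress.ip_address(peer_ip)
    return any(
        peer in ipaddress.ip_network(rule.cidr)
        and protocol == rule.protocol
        and rule.from_port <= port <= rule.to_port
        for rule in rules
    )

# "Accept connections from the 10.0.1.0/24 network" (port 443 is an assumption).
inbound = [Rule("10.0.1.0/24", "tcp", 443, 443)]
# "Initiate connections to the 0.0.0.0/0 network" (any TCP port is an assumption).
outbound = [Rule("0.0.0.0/0", "tcp", 0, 65535)]
```

Note that nothing in this check looks at the packet contents, which is exactly the gap discussed next.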

Unfortunately, while this provides a degree of security and control, it does not inspect the packet contents to ensure they aren’t malicious. Further, in regulated environments, we need to ensure that changes to connectivity between applications are fully auditable to remain compliant. We can provide auditability via secure CI/CD pipelines and infrastructure as code, but in many environments, we have legacy processes or immature teams where a more traditional firewall-based control plane is required.

How in AWS might we achieve our goals of micro-segmented networks with such a firewall control plane injected?

Key AWS Building Blocks

Before we can design a solution, we should spend a few moments analyzing the components that are used to create networks within AWS.

Virtual Private Cloud (VPC)

Virtual Private Clouds are the AWS mechanism of defining a private network within the AWS cloud. VPCs are a concept built on top of the AWS hyperplane, and well worth a deep dive! For our conversation here, it is most relevant to keep the following in mind:

VPCs are “flat” with respect to ENI to ENI communication. You cannot create a route out of the VPC for an IP address that lives within the VPC’s own IP CIDRs. You are expected to use security groups at this level.

Creating a VPC is free

VPCs are a region level construct within AWS

VPC Subnet

Within the VPC you define a series of subnets. These subnets must be within the set of IP CIDRs that have been associated with a VPC and are bound to a particular availability zone (AZ). Each subnet has a routing table that is used to determine the “next hop” based on the destination address.
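When planning subnets, it can help to check a proposed layout before creating anything. The following is a minimal sketch using Python's standard ipaddress module; the validate_subnets helper is hypothetical, and the overlap check is an extra assumption (subnets in a VPC may not overlap one another) beyond what is stated above.

```python
import ipaddress

def validate_subnets(vpc_cidrs, subnet_cidrs):
    """Return a list of problems; an empty list means the plan looks consistent."""
    vpc_networks = [ipaddress.ip_network(c) for c in vpc_cidrs]
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = []
    # Every subnet must fall inside one of the CIDRs associated with the VPC.
    for subnet in subnets:
        if not any(subnet.subnet_of(vpc) for vpc in vpc_networks):
            problems.append(f"{subnet} is outside the VPC CIDRs")
    # Assumption: subnets must not overlap each other.
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems
```

For example, a 10.0.0.0/16 VPC with subnets 10.0.1.0/24 and 10.0.2.0/24 passes both checks and still leaves most of the address space free for later growth.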

VPC Route Table

The mechanism for determining these next hops is the VPC route table. Typical next hops used within a routing table within AWS are the following:

Internet Gateway (IGW)

NAT Gateway (NGW)

Elastic Network Interface (ENI)

Virtual Private Gateway (VGW)

Transit Gateway (TGW)

VPC Peer

VPC Gateway Endpoint (for example, to provide private access to S3 or DynamoDB)

Keep in mind that if the destination address is contained within the IP CIDRs associated with the VPC it is considered “local” and you will not be allowed to create a route for such an address.

VPN Connection

AWS VPN connections start with a “Customer Gateway (CGW)”, which specifies the IP address where the remote (non-AWS) side of the VPN tunnel should be connected and whether automated route table exchange should happen via BGP. This dynamic route exchange option requires you to select an autonomous system number (ASN). Keep this ASN in mind; we’ll need it shortly as we discuss the transit gateway.

Once the CGW has been created, you create the VPN by specifying the CGW and either a virtual private gateway or, as we will use in this example, a transit gateway.

Internet Gateway

The internet gateway is required for your VPC to have a direct connection to the internet. You attach it to your VPC, then in a route table create an appropriate route. For example, you can create a “public” subnet by routing 0.0.0.0/0 to the IGW, and by either attaching an elastic IP to an ENI or allowing the ENI to acquire an automatic public address.

NAT Gateway

The most common way to provide outbound internet access for network interfaces within a VPC that you do not wish to be publicly addressable is by creating a NAT Gateway. You typically place the NAT gateway in a “public” subnet as described above, then set the 0.0.0.0/0 route of “private” subnets to be the NAT Gateway. While NAT Gateways are highly available, and autoscale to meet capacity demands, they do not provide any traffic inspection or filtering capabilities.

Transit Gateway (TGW)

The transit gateway exists to provide a solution for interconnecting many VPCs. While it is possible to create a “mesh” of peered VPCs to allow them to communicate, maintaining such an arrangement creates management overhead, and as the number of VPCs in an enterprise increases, you quickly reach the limits of peering connections.

Instead of requiring you to create such a mesh, the transit gateway functions more as a hub-and-spoke model. You attach a VPC to the transit gateway by specifying the subnets in which to place the TGW’s network interfaces, then update the route tables of the subnets that should be able to access the broader network to reference the transit gateway. If you specify this route as 0.0.0.0/0, you can force all non-local traffic to go through the transit gateway.

Each attachment of a VPC to the transit gateway is associated with a TGW route table. These route tables can be used to force all traffic between VPCs to run through a central “inspection” VPC which we will refer to as the “core.”
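One way to picture this is as two TGW route tables: one shared by the workload attachments and one for the core. The sketch below is a toy model, not the AWS API; the attachment names, CIDRs, and helper functions are all hypothetical.

```python
import ipaddress

# Hypothetical attachments and the TGW route table each one is associated with.
ASSOCIATIONS = {"vpc-a": "workload", "vpc-b": "workload", "core": "core"}

ROUTE_TABLES = {
    # Workload attachments know a single route: send everything to the core.
    "workload": {"0.0.0.0/0": "core"},
    # The core's table knows how to reach each workload VPC.
    "core": {"10.1.0.0/16": "vpc-a", "10.2.0.0/16": "vpc-b"},
}

def tgw_next_hop(source_attachment, destination_ip):
    """Look up the next attachment using the table associated with the source."""
    table = ROUTE_TABLES[ASSOCIATIONS[source_attachment]]
    address = ipaddress.ip_address(destination_ip)
    candidates = [
        (ipaddress.ip_network(prefix), target)
        for prefix, target in table.items()
        if address in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    # Prefer the most specific matching prefix.
    return max(candidates, key=lambda c: c[0].prefixlen)[1]

def path(source_attachment, destination_ip):
    """Trace the attachment hops a packet takes through the transit gateway."""
    hops = [source_attachment]
    first = tgw_next_hop(source_attachment, destination_ip)
    hops.append(first)
    if first == "core":
        # After inspection, the core's own table selects the destination VPC.
        hops.append(tgw_next_hop("core", destination_ip))
    return hops
```

Because the workload table contains no direct route to another workload VPC, traffic from vpc-a to an address in vpc-b has no choice but to pass through the core first.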

Equal Cost Multi-path Routing (ECMP)

The concept of equal-cost multi-path routing is that when multiple routes to the same destination are advertised and they have the same “cost”, a network can choose to balance streams across the available connections. As a result, we gain two key benefits: scale and availability.
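A common way to balance per stream is to hash the fields that identify a flow and use the result to pick a path, so every packet of one stream takes the same route. The sketch below illustrates that general idea only; the hash choice, field layout, and path names are assumptions, not a description of how any particular AWS service implements ECMP.

```python
import hashlib

def pick_path(flow, paths):
    """Map a flow 5-tuple onto one of several equal-cost paths."""
    # Serialize the flow identity: (source IP, destination IP, protocol,
    # source port, destination port).
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # The same flow always hashes to the same index, keeping a stream on one path.
    index = int.from_bytes(digest[:8], "big") % len(paths)
    return paths[index]

# Hypothetical equal-cost connections and an example flow.
paths = ["vpn-1", "vpn-2", "vpn-3", "vpn-4"]
flow = ("10.1.0.5", "10.2.0.9", "tcp", 49152, 443)
chosen = pick_path(flow, paths)
```

If a path is withdrawn, calling pick_path with the shorter list spreads flows over the survivors, which is where the availability benefit comes from.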

For some time AWS has supported ECMP across Direct Connect, and with the introduction of the Transit Gateway, they added ECMP for VPN connections. We will take full advantage of this!

Custom Network Appliances in AWS

As discussed earlier when reviewing route tables, a potential target of a route within AWS is an Elastic Network Interface (ENI). This enables us to add custom devices (for example next-generation firewalls, or transparent proxies) and set routing up such that traffic must pass through these devices. Pretty cool, but now we must solve for both scaling and availability subject to all the typical AWS limitations.

What can we do to create a highly available solution? We can certainly minimize our mean time to recovery (MTTR) by setting up a warm spare and, when a health check fails, automatically updating the route table to fail over to the spare device. This is indeed a reasonable solution for availability, but it does nothing to address how we can scale.
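The warm-spare approach boils down to a small piece of control logic. Here is a minimal sketch using plain dictionaries in place of a real route table and health checker; the failover function and the ENI names are hypothetical, and a real implementation would call the cloud provider's API to replace the route.

```python
def failover(route_table, health, primary_eni, spare_eni):
    """Repoint every route that targets an unhealthy primary at the warm spare."""
    if health.get(primary_eni, False):
        return route_table  # primary is healthy, nothing to do
    return {
        prefix: (spare_eni if target == primary_eni else target)
        for prefix, target in route_table.items()
    }

# Hypothetical route table: default traffic flows through the primary appliance.
routes = {"0.0.0.0/0": "eni-primary", "10.0.0.0/16": "local"}
after_failure = failover(routes, {"eni-primary": False}, "eni-primary", "eni-spare")
```

Notice that at any moment only one appliance carries traffic, so the spare improves recovery time without adding any capacity.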

To scale in this arrangement, we would need to limit the devices pointing to a particular appliance by creating multiple subnets, each with a dedicated appliance, and individualized health monitoring and spare capacity.

Is there a better solution?

Marrying the Building Blocks and Creating a Highly Available, Scalable Network Core

Let’s begin by describing a few key characteristics we’d like our approach to have:

Meet service level agreements made to each network segment (“workload VPC”) in the form of guaranteed-available minimum bandwidth transiting the core.

Ensure failure of an individual device affects at most a single network stream, where a reconnect of that stream will immediately be re-routed.

Be easy to reason about in terms of the connectivity path, failure modes, approach to scaling, and how we would recover.

Ensure failure of each component results in a secure and, wherever possible, continuously available environment.

To get over the limitations of routing to an ENI, we can use the ECMP capabilities of the TGW to create a fleet of devices to perform the type of inspection and advanced rule evaluation we’re interested in doing. On the other side of that device, the native attachment mechanisms of the TGW are already highly available and scalable.

Ensuring we can meet our obligations to our workload VPCs that will be routed through the fleet is now a simple matter of adding up the required capacity and dividing by the maximum capacity of an individual device. To reach an acceptable level of risk for our availability targets, we can take this capacity-based device count and add a number of additional devices. Unlike our previous scenario, where the extra devices were warm but not adding capacity, in this scenario they are actually hot: able to directly increase the overall capacity of the system while they enable us to meet our availability targets!
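The sizing arithmetic can be written down directly. This is a sketch under stated assumptions: the function names are hypothetical, and the bandwidth figures in the example are invented purely for illustration rather than taken from any vendor's specifications.

```python
import math

def fleet_size(required_gbps, per_device_gbps, spare_devices):
    """Devices needed for committed capacity, plus spares that also carry traffic."""
    if per_device_gbps <= 0:
        raise ValueError("per_device_gbps must be positive")
    # Add up the commitments, divide by per-device capacity, and round up.
    capacity_count = math.ceil(sum(required_gbps) / per_device_gbps)
    return capacity_count + spare_devices

def survivable_capacity(total_devices, per_device_gbps, failed_devices):
    """Aggregate bandwidth remaining after a number of devices fail."""
    return max(total_devices - failed_devices, 0) * per_device_gbps

# Illustrative numbers only: three workload VPCs with minimum bandwidth
# commitments, a per-device ceiling, and two extra devices for availability.
commitments = [2.0, 1.5, 4.0]
devices = fleet_size(commitments, per_device_gbps=1.25, spare_devices=2)
remaining = survivable_capacity(devices, 1.25, failed_devices=2)
```

In this example the commitments total 7.5 Gbps, so six devices cover capacity and eight are deployed. Even with two devices down, the fleet still carries the full 7.5 Gbps, and with all eight healthy it has headroom to spare.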

Both failures and updates to the fleet can be handled either by a rolling deployment or by creating an entirely new core and updating the TGW’s route table to point to the new ECMP balanced set of VPN connections!