Evolution of the AppDirect Kubernetes Network Infrastructure

By Alexandre Gervais / Jun 07, 2018

At AppDirect, we have embraced Kubernetes—an open-source system for automating the deployment, scaling, and management of containerized applications—since the early days of the project. We ran sandboxes with beta releases and successfully deployed applications and served production traffic with Kubernetes 1.1.

In October of 2016, members of the AppDirect team attended the second Kubernetes Montreal Meetup, and one question was on everyone’s lips during the event: "How the heck do you handle and dispatch incoming requests to services?"

Back then there seemed to be no easy answers or magic solutions, but in the two years since that meetup we have managed Kubernetes and network infrastructure in many different environments: static production setups, dynamic development environments, cloud-based and on-premises. From that experience, our answer to the question has evolved.

Our Approach Before Kubernetes

Before experimenting with Kubernetes, AppDirect relied heavily on Terraform "infrastructure as code" to document, review, change, and reproduce our infrastructure. In addition to preferring our Terraform toolbox over manual configuration and untraceable infrastructure changes, we had a major constraint: our infrastructure had to remain deployable in a "cloud agnostic" manner, including on premises. This meant we did not allow Kubernetes to interact with Amazon Web Services (AWS) to, say, change route tables or dynamically create load balancers for LoadBalancer-typed services.

With these considerations in mind, here are the different approaches we used to allow external inbound requests to reach the desired service.

Different Approaches with Kubernetes

Ingress Controller

Load Balancers

Needing to walk before we ran, our first approach was the most trivial: You need to expose a service? Throw in a load balancer sending traffic to a static NodePort. What a great idea! This pattern allowed us to perform SSL termination on each load balancer and apply different network ACLs. However, it also meant each service had to be exposed with a different domain name. No worries, though: with our Terraform expertise, we built a reusable module to prevent duplication and iterate faster.
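As a sketch of this pattern (the service name and port values here are illustrative, not our actual configuration), a service exposed this way looked something like the following, with a Terraform-managed load balancer forwarding external traffic to the reserved NodePort on every node:

```yaml
# Hypothetical service exposed on a statically reserved NodePort.
# An external load balancer terminates SSL and forwards traffic
# to port 30080 on each cluster node.
apiVersion: v1
kind: Service
metadata:
  name: billing-api
spec:
  type: NodePort
  selector:
    app: billing-api
  ports:
    - port: 80          # cluster-internal port
      targetPort: 8080  # container port
      nodePort: 30080   # statically reserved in our global NodePort list
```

One load balancer, one DNS entry, and one reserved NodePort per service is exactly what made this approach hard to scale.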

This solution scaled well for the first 10 services but became a drag as we approached 20. We also knew onboarding new services was not going to slow down; teams were picking up the pace in working and shipping in a micro-service architecture. There were other clear drawbacks:

We had to maintain a global list of available and reserved NodePorts.

Work had to be repeated in the dev-to-test-to-prod promotion cycle.

Development teams were not autonomous. Every new service onboarding required infrastructure changes and infrastructure access.

Load balancers and DNS entries cluttered our infrastructure.

To try to solve this, we threw in a little more automation.

HAProxy with Consul Template

We experimented with Consul and Consul-template to dynamically create and update an HAProxy configuration. The plan was simple: Replace all statically created load-balancers with one dynamic proxy configuration.

Note that Kubernetes itself was not aware of Consul. This design required an external process to register Kubernetes services (and their now randomly allocated NodePorts) into the pool of registered Consul services. This external tool was incorporated into our delivery pipeline, so service registration was performed on user-triggered events rather than by actively listening for state changes.
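To make the idea concrete, here is a minimal Consul-template sketch, assuming each Kubernetes service was registered in Consul under its name with its NodePort as the service port (service and domain names are illustrative). Consul-template re-renders the file on catalog changes and triggers an HAProxy reload:

```
# haproxy.cfg.ctmpl -- rendered by consul-template, which then reloads HAProxy.
frontend http-in
    bind *:80
{{ range services }}
    use_backend {{ .Name }} if { hdr(host) -i {{ .Name }}.dev.example.com }
{{ end }}

{{ range services }}
backend {{ .Name }}
{{ range service .Name }}
    server {{ .Node }} {{ .Address }}:{{ .Port }} check
{{ end }}
{{ end }}
```

Every registered service adds a template watch, which is what eventually overwhelmed our development environments.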

Although it allowed us to scale our development environments faster, this solution never made it to production. In the end, we were effectively DDoS-ing ourselves with too many Consul-template watches, and unpredictable service deregistration (a side effect of the service registration mechanism in Consul) left ghost configurations lingering in our development environments.

We were seeing the benefits of this approach, but also the drawbacks of all the added complexity and moving pieces. Surely it could be simplified. We just had to generate an active proxy configuration using the Kubernetes API.

Domain Router

Our current ingress controller solution is custom built to interact with the Kubernetes API and generate an auto-reloading HAProxy configuration. It depends only on a set of proprietary annotations we apply to Kubernetes service objects. These annotations configure the network ACLs and customize the domain names exposed by the domain router.

In Envoy-popularized terms: a Go control plane with an HAProxy data plane.

The infrastructure components required to publicly expose the domain router are very similar to our initial approach: a load balancer placed in front of the service. It is simplified, however, by a wildcard DNS entry and SSL certificate. Everything comes down to a single NodePort and a single load balancer.
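The annotation names below are hypothetical (the real ones are proprietary), but a service consumed by the domain router looked conceptually like this; the controller watches the Kubernetes API and regenerates the HAProxy configuration whenever annotated services change:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: billing-api
  annotations:
    # Hypothetical annotation names standing in for our proprietary ones:
    # the domain to expose and the network ACLs to apply.
    router.example.com/domain: "billing.dev.example.com"
    router.example.com/acl: "office-ips,vpn"
spec:
  selector:
    app: billing-api
  ports:
    - port: 80
      targetPort: 8080
```

With this, development teams could expose a new service by annotating it at deploy time, without any infrastructure changes.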

Accessing services based on the domain name introduced some complexity in our service architecture, trickling down to our partners and client integrations. This was because:

Every new API endpoint led to more configuration keys.

We had no unified auth and rate-limiting strategies.

We were exposing the internal logic of the business domain modeling.

All old domains and APIs were still pointing to a single application, the AppDirect platform.

API Gateway

Building on the lessons learned from our domain-based ingress controller, the growing appetite for new services and the demands on the AppDirect platform naturally led us to incorporate a more intelligent piece of software into our infrastructure: an API gateway.

The goal of our API gateway was to leave the exposed public APIs untouched and accessible, even through legacy URLs and partner-customized domains, while allowing us to grow by "injecting" and replacing old components one by one. After all, introducing a new domain for each service meant adding logic and configuration complexity to client applications. Other wins would include full multi-tenant support for all services (relying on the request host) and enforcing a unified security strategy (authentication, authorization, rate limiting, CORS, etc.).

Kong

Our first prototype of an API gateway was built with Kong. We had some big concerns from the beginning:

Kong uses an external datastore for which we had no operational knowledge. Everything we run in Kubernetes is stateless, so introducing a StatefulSet or managing yet another set of virtual machines specifically for this purpose was a rude awakening.

We had to build a new CI/CD workflow for Kong configuration management, aligning our application deployments with active Kong configuration.

In order for us to implement AuthN/AuthZ strategies, we had to write custom plugins and package our own distribution of Kong.

The team wasn’t happy with this prototype and wanted to build something closer to our domain router, so we shifted our focus in late 2017 to one of the new kids on the block, Ambassador.

Ambassador

Relying entirely on the Kubernetes native API—which we know and love—Ambassador is lightweight, stateless, and uses no external datastore. Ambassador exclusively uses Kubernetes annotations to drive the active route configuration (i.e., it is the control plane for Envoy's data plane). With the underlying Envoy component's built-in metrics, we have full observability of the API gateway's traffic and behavior through our existing Grafana dashboard.

The open-source project's design was so similar to our custom-built domain router that we immediately felt confident this was the right approach. Not only is the design similar, but we also didn't have to change any of our CD pipeline. Since gateway configurations are defined in Kubernetes service annotations, they are reviewable and auditable, as any infrastructure change should be.

Our deployment strategy for Ambassador is the same as for our domain router: bind it to a single NodePort and terminate SSL at the load balancer. Ambassador also allowed us to extend the AuthN/AuthZ mechanisms using its External Auth service. In our case, we deployed our implementation of this service as a sidecar running in Ambassador's pod.

The end result: Using existing partner-specific domains, requests are routed to services based on paths by the Ambassador API gateway.
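In Ambassador's annotation-driven configuration of that era (the v0 annotation format), a path-based route is a Mapping embedded in the service's own annotations; the service name and prefix here are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: billing-api
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: billing_mapping
      prefix: /api/billing/
      service: billing-api
spec:
  selector:
    app: billing-api
  ports:
    - port: 80
      targetPort: 8080
```

Because the route lives in the same manifest as the service, it flows through the same review, deployment, and audit process as the rest of the application.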

The Answer We’ve Been Looking For

Ever since the inception of Kubernetes, we’ve witnessed a community of ideas coming together and evolving best practices. Ambassador is one of the building blocks we adopted and are proud to be contributing to.

Alexandre Gervais is a Staff Software Engineer at AppDirect.