Almost any kind of software can be continuously delivered, and internet-based software applications (exposed via an API or web page) are particularly well suited to this practice because you typically have complete control over the rollout of new functionality. All web-based applications expose their functionality via some form of “gateway” at the point of ingress, and the technology used here can range from simple language container runtimes (Passenger, Tomcat, etc.) through smart proxies (NGINX, Ambassador) to enterprise-grade API management solutions (CA, TIBCO, etc.). In this article you will learn about the impact of your choice of gateway on your ability to continuously deliver web applications.

Continuous Delivery 101

Continuous Delivery is fundamentally a set of practices and disciplines in which software delivery teams produce valuable and robust software in short cycles. As noted by Steve Smith, an active thought-leader within this domain, “continuous delivery is achieved when stability and speed can satisfy business demand [in a sustainable manner]”. Your goal is to make deployments as predictable and routine as possible, and to ensure that you and the organisation can obtain effective feedback from each release of new business functionality.

As an edge gateway or API gateway is typically acting as the “front door” to your application, it can interact with the continuous delivery (CD) process in a multitude of ways. I’ve been fortunate enough to watch Jez Humble, Dave Farley and Dan North speak at several conferences, and nearly every time I hear them talk about continuous delivery they mention the concept of a “walking skeleton” (or “dancing skeleton” in Dan’s case). A walking skeleton often takes the form of a real application that acts as proof-of-concept of your high-level designs, which is delivered through to production via an end-to-end deployment process. In my experience working as a consultant I have found this an invaluable technique.

Design and Development: Walking Skeletons

Implementing a walking skeleton and creating an associated build pipeline that deploys your code through to QA, staging and production environments identifies many issues early within the design and prototyping stages of a project — both from a technical perspective and an organisational/social perspective. I’m going to save talking about the organisational issues for another article (suffice to say that a walking skeleton often gets InfoSec involved early in the project, and also identifies political blockers), but the technical issues I have often bumped into are wide-ranging and diverse — including having to copy deployment artifacts manually (via USB stick) to the production environment, discovering that production is running a ten-year-old version of Linux, and realising that no-one has the password to log in to the firewall appliance!

The primary benefit of using an API gateway within the development stage of a project is the ability to deploy your application or service to production and “hide” it — i.e. not expose the endpoints to end-users. A gateway can block traffic to a new endpoint, or simply not expose the endpoints publicly. Some gateways can also be configured to route only permitted traffic to a new endpoint, either via security policies or request header metadata. This allows you to test your walking skeleton application deployed into the real environment. This is more likely to give you results that are highly correlated with an actual live release — you can’t get a more production-like environment than production itself!
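
To make the header-based routing idea concrete, here is a minimal sketch in Python. The service names, path, and the `X-Canary-User` header are all illustrative assumptions, not the configuration syntax of any particular gateway product:

```python
# A hypothetical set of deployed-but-not-released services.
HIDDEN_SERVICES = {"checkout-v2"}

def route(path: str, headers: dict) -> str:
    """Return the upstream service for a request.

    Requests destined for a hidden service only reach it when the
    caller presents an opt-in header; everyone else is routed to the
    stable version, so the new deployment stays invisible to end-users.
    """
    target = "checkout-v2" if path.startswith("/checkout") else "catalog-v1"
    if target in HIDDEN_SERVICES and headers.get("X-Canary-User") != "true":
        return "checkout-v1"  # fall back to the publicly released version
    return target
```

A tester who sets `X-Canary-User: true` exercises the walking skeleton in production, while regular traffic never sees it.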

Test and QA: Shadowing and Shifting

A modern API gateway can help with testing on many levels. As mentioned previously, we can deploy a service — or a new version of a service — into production, hide this deployment via the gateway, and run acceptance and nonfunctional tests here (e.g. load tests and security analysis). This is invaluable in and of itself, but we can also use a gateway to “shadow” (duplicate) real production traffic to the new version of the service and hide the responses from the user. This allows you to learn how this service will perform under realistic use cases and load.
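
The shadowing flow can be sketched in a few lines of Python. The `call_service` function is a stand-in for a real HTTP call, and a real gateway would fire the shadow copy asynchronously; it is shown inline here purely for simplicity:

```python
def call_service(version: str, request: dict) -> dict:
    # Stand-in for an HTTP call to the named upstream version.
    return {"version": version, "status": 200}

def handle(request: dict, shadow_log: list) -> dict:
    """Serve the request from the live version, shadowing a copy to v2."""
    live_response = call_service("v1", request)
    # The shadow response is recorded for comparison (latency, diffs,
    # error rates) but is never returned to the caller.
    shadow_log.append(call_service("v2", request))
    return live_response  # the user only ever sees v1's answer
```

Comparing `shadow_log` against live responses is essentially what diffing proxies automate.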

Unicorn organisations use this technique of shadowing traffic (or “dark launching” features) all the time. Facebook famously tested the release of its username registration service in 2009 by directing real user traffic at the service and hiding the data that was returned. Twitter have also talked about their creation and use of the internal “Diffy” tool, which acts as a proxy, multicasts requests, and then compares, or “diffs”, the responses.

Adrian Colyer wrote a great summary of a 2016 paper written by the Facebook team that talked about their “Kraken” load testing tool. In a nutshell, the Kraken tool integrates tightly with the Facebook gateways and can “shift” (or route) part of its global traffic to systems (or data centers) under test and monitor the results — reverting the traffic shifting if monitoring systems show errors. For example, if Facebook want to stress test a new data center that has just opened in Germany, then they can shift all of the European traffic to this center in a controlled and gradual fashion, and watch what happens. I appreciate that not all of us are Facebook, but I think this is a very interesting technique nonetheless, and it helps me think differently about the way in which I can utilise an application gateway.
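
The core control loop of this style of traffic shifting can be sketched simply. This is my own illustrative approximation, not Kraken’s actual algorithm; the region names, step size and error threshold are all assumptions:

```python
def shift_traffic(region_weights: dict, target: str, step: float,
                  error_rate: float, threshold: float = 0.01) -> dict:
    """Perform one step of gradual traffic shifting towards `target`.

    The fraction of traffic routed to the data center under test grows
    by `step` each iteration, and is reverted to zero as soon as the
    monitored error rate breaches the threshold.
    """
    weights = dict(region_weights)
    if error_rate > threshold:
        weights[target] = 0.0  # revert: stop sending traffic to the target
    else:
        weights[target] = min(1.0, weights.get(target, 0.0) + step)
    return weights
```

Running this loop until the target carries 100% of a region’s traffic (or monitoring trips the revert) is the essence of the “shift and watch” approach.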

The final topic of testing that I can’t resist talking about is referred to by many names: chaos engineering, chaos testing, chaos experimentation, or even “resilience testing”. This type of testing has increased in popularity as teams build distributed systems and bump into the realities and complex failure scenarios of working within this domain. Chaos testing allows a team to form a hypothesis about how a system will react to failure, design and run the experiment, and monitor what happens. The Netflix team have historically been the pioneers within this space, and I’m sure many of you will have heard of (or even used) Chaos Monkey and the Simian Army. The second evolution of these tools introduced “Failure Injection Testing” (FIT), where failure could be injected into specific requests (perhaps for a test user, or a cohort of tolerant end-users) and the results monitored. Target requests were identified and modified via an application gateway.
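
A gateway-level failure-injection filter can be sketched as follows. The `X-Fail-Injection` header name and the fault it describes are hypothetical, chosen only to illustrate the shape of the technique:

```python
def apply_failure_injection(headers: dict):
    """Return an injected error response, or None to proceed normally.

    Only requests explicitly tagged for the experiment (e.g. those of a
    test user or a tolerant cohort) receive the artificial fault; all
    other traffic is untouched.
    """
    if headers.get("X-Fail-Injection") == "latency+503":
        # A real filter would also add artificial latency here; this
        # sketch simply short-circuits with the injected status code.
        return {"status": 503, "injected": True}
    return None
```

The chaos experiment then consists of tagging a cohort’s requests, injecting the fault, and watching whether the system degrades as hypothesised.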

Deploy and Release: Decouple for Speed and Safety

I’m sure many of you have thought that some of the API gateway techniques already mentioned could be used to deploy and release functionality. I agree. It’s worth mentioning that it is considered best practice within the continuous delivery community to decouple deployment from release. The term “deployment” refers to the act of deploying a change to application components or infrastructure, and the term “release” refers to the act of enabling or exposing a feature to end-users (with a corresponding business impact). An API gateway can help with this in primarily two ways: smart routing (a.k.a. “dynamic routing”), and feature flagging (a.k.a. “feature toggling”).

Smart routing can enable blue/green releases, canary releases (where a portion of traffic is routed to the newly “released” service), and incremental rollout. Incremental or “phased” rollout is a logical extension of canary releasing, where all traffic is gradually routed to the new service over time. The benefits of all these techniques are fully realised with an effective monitoring solution — particularly if this is integrated within the gateway — as any deviation in operational or business metrics can trigger the halt of a rollout, and potentially even trigger a rollback.
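
The routing decision behind a canary (and, by extension, an incremental rollout) is a simple weighted choice. A minimal sketch, with illustrative version names:

```python
import random

def pick_version(canary_weight: float, rng=random.random) -> str:
    """Route a request to v2 with probability `canary_weight`, else v1.

    A canary release keeps the weight small (say 0.05); an incremental
    rollout raises it step by step until it reaches 1.0.
    """
    return "v2" if rng() < canary_weight else "v1"
```

With a weight of 0.0 every request stays on v1; at 1.0 the rollout is complete. A monitoring-driven controller would adjust (or zero) the weight between steps.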

Feature flagging — the ability to “toggle” parts of your application on and off — can be implemented by adding filters or “plugins” to a gateway that can modify request header metadata. This metadata can in turn be used to determine within the gateway which services to call and what data to transform, and can also be inspected within an application or service further down in the call stack in order to provide fine-grained control of the functionality being exposed. Flickr and Etsy have talked extensively about their use of these techniques, and the ever-generous Netflix team have created some fantastic content about how they run experiments and A/B test using this technique.
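
Here is a sketch of that flow: a gateway filter stamps the request with the flags enabled for the caller, and a downstream service consults the same metadata. The flag name, group name and `X-Feature-Flags` header are all hypothetical:

```python
# Hypothetical flag configuration: flag name -> user groups it is on for.
FLAGS = {"new-search": {"beta-testers"}}

def stamp_flags(headers: dict, user_groups: set) -> dict:
    """Gateway filter: add the caller's enabled flags as request metadata."""
    enabled = [flag for flag, groups in FLAGS.items() if user_groups & groups]
    headers = dict(headers)
    headers["X-Feature-Flags"] = ",".join(sorted(enabled))
    return headers

def downstream_flag_enabled(headers: dict, flag: str) -> bool:
    """Service-side check: is this flag toggled on for this request?"""
    return flag in headers.get("X-Feature-Flags", "").split(",")
```

Because the decision travels with the request, every service in the call stack can make the same fine-grained choice without re-querying the flag store.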

One word of caution I am keen to offer here relates to coupling. When working with flexible API gateways like Zuul, which allow the dynamic injection of scripts at runtime in order to alter routing and request/response data transformation, it can be tempting to introduce business logic into the gateway — accidentally, or otherwise. With great power comes great responsibility, and although this can be beneficial for (niche) use cases, the coupling and lack of cohesion introduced by spreading business logic between a service and a gateway makes the continuous delivery of features more challenging, simply because of the additional moving parts and the coordination and orchestration required between them.

Getting Started with an API Gateway

Hopefully this article has convinced you of the benefits a well-implemented and well-managed API gateway can provide. Your choice of gateway will largely be determined by your requirements for development workflow, testing processes, and target deployment platform.

As part of my work with Datawire I’ll be creating a series of articles in the future that will provide a guide to implementing many of the techniques mentioned above using the Ambassador Kubernetes-native API Gateway. The introductory article that demonstrates how to deploy and release a Java microservices-based application with Kubernetes and Ambassador can be found on the Ambassador blog.