Continuous Delivery: How Can an API Gateway Help (or Hinder)?

Daniel Bryant
Published in Ambassador Labs · 7 min read · Mar 6, 2018


Almost any kind of software can be continuously delivered. An internet-based software application (exposed via an API or web page) is particularly well suited to continuous delivery because you typically have complete control over the rollout of new functionality. All web-based applications expose their functionality via some form of “gateway” at the point of ingress, and the technology used here ranges from language-specific application servers (Passenger, Tomcat, etc.) to smart proxies (NGINX, Ambassador) and enterprise-grade API management solutions (CA, TIBCO, etc.). In this article you will learn about the impact of your choice of gateway on your ability to continuously deliver web applications.

What is Continuous Delivery?

Continuous Delivery is a set of practices and disciplines in which software delivery teams produce valuable and robust software in short cycles. As noted by Steve Smith, an active thought-leader within this domain, “continuous delivery is achieved when stability and speed can satisfy business demand [in a sustainable manner]”. Your goal is to make deployments as predictable and routine as possible and ensure that you and the organisation can obtain effective feedback from each release of new business functionality.

As an edge gateway or API gateway typically acts as the “front door” to your application, it can interact with the continuous delivery (CD) process in many ways. I’ve watched Jez Humble, Dave Farley, and Dan North speak at several conferences, and nearly every time they talk about continuous delivery they mention the concept of a “walking skeleton” (or “dancing skeleton” in Dan’s case).

A walking skeleton often takes the form of a real application that acts as a proof-of-concept of your high-level designs, which is delivered through to production via an end-to-end deployment process. In my experience working as a consultant, I have found this an invaluable technique.

Design and Development: Walking Skeletons

Implementing a walking skeleton and creating an associated build pipeline that deploys your code through QA, staging, and production environments identifies many issues early in the design and prototyping stages of a project, both from a technical perspective and from an organisational/social perspective.
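To make this concrete, here is a minimal sketch of what a walking skeleton might look like in Python: a single service exposing only a health endpoint, which is just enough for the pipeline, the gateway, and the monitoring system to exercise the full path to production. The service name, port, and version string are illustrative, not from any particular project.

```python
# A minimal "walking skeleton" service: just enough functionality to be
# deployed end-to-end through the pipeline. Port and version are illustrative.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class SkeletonHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # A health endpoint is usually the first thing the pipeline,
            # the gateway, and the monitoring system need to agree on.
            body = json.dumps({"status": "ok", "version": "0.1.0"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), SkeletonHandler).serve_forever()
```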

I’m going to save the organisational issues for another article (suffice to say that a walking skeleton often gets InfoSec involved early in the project and also identifies political blockers), but the technical issues I have bumped into are wide-ranging and diverse: having to copy deployment artifacts manually (via USB stick) to the production environment, discovering that production is running a ten-year-old version of Linux, and realising that no-one has the password to log in to the firewall appliance!

Using an API gateway within the development stage

The primary benefit of using an API gateway within the development stage of a project is the ability to deploy your application or service to production and “hide” it, i.e., not expose its endpoints to end-users. A gateway can block traffic to a new endpoint or simply not expose the endpoints publicly. Some gateways can also be configured to route only permitted traffic to a new endpoint, via security policies or by inspecting request header metadata. This allows you to test your walking skeleton application deployed into the real environment, which is far more likely to give you results that correlate with an actual live release: you can’t get a more production-like environment than production itself!
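As a rough illustration of the “route only permitted traffic” idea, the Python sketch below gates a hidden endpoint on a request header. The route, header name, and token are hypothetical, and a real gateway would express this as routing configuration rather than application code.

```python
# A sketch of header-based gating at the gateway: requests carrying an opt-in
# header reach the new (hidden) endpoint, everything else is routed to the
# stable version. All names below are illustrative.
HIDDEN_ROUTES = {"/users/v2"}          # endpoints deployed but not yet released
PREVIEW_HEADER = "x-preview-token"     # hypothetical header name
PREVIEW_TOKENS = {"qa-team-secret"}    # hypothetical allow-list

def choose_upstream(path: str, headers: dict) -> str:
    """Return which upstream service should handle this request."""
    if path in HIDDEN_ROUTES:
        if headers.get(PREVIEW_HEADER) in PREVIEW_TOKENS:
            return "users-v2"          # permitted traffic reaches the new service
        return "users-v1"              # everyone else sees the stable service
    return "default-backend"

# Example: an end-user request vs. a QA request to the same path.
assert choose_upstream("/users/v2", {}) == "users-v1"
assert choose_upstream("/users/v2", {"x-preview-token": "qa-team-secret"}) == "users-v2"
```

The design benefit is that the new service is already running behind real infrastructure, even though only the team holding the token can reach it.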

Test and QA: Shadowing and Shifting

A modern API gateway like Ambassador Edge Stack can help with testing on many levels. We can deploy a service (or a new version of a service) into production, hide this deployment via the gateway, and run acceptance and non-functional tests against it (e.g., load tests and security analysis). This is invaluable in and of itself, but we can also use a gateway to “shadow” (duplicate) real production traffic to the new version of the service and hide the responses from the user. This allows you to learn how the service will perform under realistic use cases and load.
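A simplified sketch of what shadowing looks like from the gateway’s point of view is shown below. The upstream URLs are hypothetical, and production gateways implement this as configuration rather than hand-written code.

```python
# A sketch of traffic shadowing: the gateway sends every request to the stable
# service and, in the background, mirrors it to the new version, discarding
# the shadow response. Upstream addresses are hypothetical.
import concurrent.futures
import urllib.request

PRIMARY = "http://users-v1.internal"   # stable version (illustrative)
SHADOW = "http://users-v2.internal"    # new version under test (illustrative)

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def handle(path: str) -> bytes:
    # Fire-and-forget the shadow call; its result never reaches the user.
    _pool.submit(lambda: urllib.request.urlopen(SHADOW + path, timeout=2).read())
    # Only the primary response is returned to the caller.
    with urllib.request.urlopen(PRIMARY + path, timeout=2) as resp:
        return resp.read()
```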

Unicorn organisations use traffic shadowing (or “dark launching”) all of the time. For example, Facebook famously tested the release of its username registration service by directing real user traffic at the service and hiding the data that was returned. Twitter has also talked about the creation and use of its internal “Diffy” tool, which acts as a proxy, multicasts requests, and then compares or “diffs” the responses.
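The sketch below captures the Diffy idea (multicast the request, then diff the responses) in a few lines of Python. The service URLs and the list of ignored “noisy” fields are assumptions for illustration, not details of Twitter’s actual tool.

```python
# A Diffy-style sketch: send the same request to the current and candidate
# versions of a service and report any difference in the JSON responses.
# URLs and ignored field names are illustrative.
import json
import urllib.request

def fetch(base_url: str, path: str) -> dict:
    with urllib.request.urlopen(base_url + path, timeout=2) as resp:
        return json.loads(resp.read())

def diff_responses(path: str) -> list:
    current = fetch("http://users-v1.internal", path)
    candidate = fetch("http://users-v2.internal", path)
    # Report keys whose values differ; ignore noisy fields such as timestamps.
    ignored = {"timestamp", "request_id"}
    return [
        key for key in current.keys() | candidate.keys()
        if key not in ignored and current.get(key) != candidate.get(key)
    ]
```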

Adrian Colyer wrote a great summary of a 2016 paper by the Facebook team about their “Kraken” load testing tool. In a nutshell, Kraken integrates tightly with the Facebook gateways and can “shift” (or route) part of Facebook’s global traffic to systems (or data centers) under test and monitor the results, reverting the traffic shift if the monitoring systems show an error. So, for example, if Facebook wants to stress test a new data center that has just opened in Germany, they can gradually shift European traffic to this center in a controlled fashion and watch what happens.
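A toy version of this “shift and watch” loop might look like the sketch below. The data-center names, the weighted routing table, and the health_check() callback are all hypothetical stand-ins for Facebook’s real gateway and monitoring integration.

```python
# A sketch of Kraken-style load shifting, assuming a weighted routing table
# and a health_check() callback supplied by the monitoring system.
import random

weights = {"eu-frankfurt": 0.0, "eu-dublin": 1.0}   # fraction of EU traffic per site

def route_request() -> str:
    # The gateway picks a data center in proportion to the current weights.
    return random.choices(list(weights), weights=list(weights.values()))[0]

def shift_traffic(target: str, fallback: str, health_check, step: float = 0.1):
    """Gradually move traffic to `target`, reverting if monitoring reports errors."""
    while weights[target] < 1.0:
        weights[target] = min(1.0, weights[target] + step)
        weights[fallback] = 1.0 - weights[target]
        if not health_check():
            # Error budget exceeded: revert the shift and stop the test.
            weights[target], weights[fallback] = 0.0, 1.0
            break
```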

I appreciate that not all of us are Facebook, but I think this is a very interesting technique nonetheless. It helps me think differently about how I can utilise an application gateway.

Chaos Engineering and Chaos Testing

The final testing topic I can’t resist talking about goes by many names: chaos engineering, chaos testing, or “resilience testing”. This type of testing has increased in popularity as teams build distributed systems and bump into the realities and complex failure scenarios of working within this domain.

Chaos testing allows a team to hypothesize how a system will react to failure, design and run the experiment, and monitor what happens. The Netflix team have historically been the pioneers within this space, and I’m sure many of you will have heard of (or even used) the Chaos Monkey and the Simian Army. The second evolution of these tools introduced “Failure Injection Testing” (FIT), where failure can be injected into specific requests (perhaps for a test user or a cohort of tolerant end-users) and the results monitored. Target requests are identified and modified via an application gateway.
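The sketch below shows the general shape of request-level failure injection at the gateway. The header format and the test cohort are invented for illustration and are not taken from Netflix’s FIT.

```python
# A sketch of failure injection at the gateway: requests from an opted-in test
# cohort can have a fault injected via a header. Header format and cohort are
# illustrative.
import time

FIT_HEADER = "x-fault-injection"        # e.g. "latency:2000" or "error:503"
TEST_COHORT = {"chaos-test-user"}       # users who have agreed to see failures

def maybe_inject_fault(headers: dict, user_id: str):
    """Return an HTTP status to fail with, or None to continue normally."""
    if user_id not in TEST_COHORT:
        return None                                  # normal users are never affected
    directive = headers.get(FIT_HEADER, "")
    if directive.startswith("error:"):
        return int(directive.split(":", 1)[1])       # e.g. respond with a 503
    if directive.startswith("latency:"):
        time.sleep(int(directive.split(":", 1)[1]) / 1000)   # add artificial delay
    return None
```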

Deploy and Release: Decouple for Speed and Safety

I’m sure many of you have thought that some of the API gateway techniques already mentioned could be used to deploy and release functionality. I agree. It’s worth mentioning that it is considered best practice within the continuous delivery community to decouple deployment from release.

The term “deployment” refers to the act of deploying a change to application components or infrastructure, and the term “release” refers to the act of enabling or exposing a feature to end-users (with a corresponding business impact). An API gateway can help with this in two primary ways: smart routing (a.k.a. “dynamic routing”) and feature flagging (a.k.a. “feature toggling”).

Smart routing can enable blue/green releases, canary releases (where a portion of traffic is routed to the newly “released” service), and incremental rollout. Incremental “phased” rollout is a logical extension to canary releasing, where all traffic is gradually routed to the new service over time. The benefits of all these techniques are fully realised with an effective monitoring solution — particularly if this is integrated within the gateway — as any deviation in operational or business metrics can halt a rollout and potentially even trigger a rollback.
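As a rough sketch, an incremental rollout driven by monitoring might look like the following; set_canary_weight() and error_rate() are hypothetical hooks into the gateway and the monitoring system respectively.

```python
# A sketch of an incremental (phased) rollout loop: the canary weight only
# increases while the monitoring check stays healthy, and any regression
# halts the rollout and rolls traffic back.
import time

def incremental_rollout(set_canary_weight, error_rate,
                        steps=(1, 5, 10, 25, 50, 100),
                        soak_seconds=300, max_error_rate=0.01):
    """set_canary_weight(pct) updates the gateway; error_rate() reads monitoring."""
    for pct in steps:
        set_canary_weight(pct)
        time.sleep(soak_seconds)           # let each step soak before judging it
        if error_rate() > max_error_rate:
            set_canary_weight(0)           # halt the rollout and roll back
            return False
    return True                            # the new version now takes all traffic
```

The important design point is the feedback loop: the gateway provides the routing lever, but the monitoring signal decides whether the lever keeps moving.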

How to Implement Feature Flagging

Feature flagging is the ability to “toggle” parts of your application on and off, and it can be implemented by adding filters or “plugins” to a gateway that modify request header metadata. This metadata can be used to determine which services the gateway calls and which data it transforms, and it can also be inspected within an application or service further down the call stack to provide fine-grained control over the exposed functionality. Flickr and Etsy have talked extensively about their use of these techniques. In addition, the ever-generous Netflix team have created some fantastic content about how they run experiments and A/B tests using this technique.
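A minimal sketch of this pattern is shown below: the gateway translates its flag configuration into a request header, and a downstream service branches on that header rather than on its own configuration. The flag, group, and header names are illustrative.

```python
# A sketch of a feature flag carried as request metadata. Names are
# illustrative, not from any specific flagging product.
ENABLED_FLAGS = {"new-checkout": {"beta-testers"}}   # flag -> user groups

def gateway_annotate(headers: dict, user_group: str) -> dict:
    """Run at the gateway: translate flag config into a request header."""
    flags = [name for name, groups in ENABLED_FLAGS.items() if user_group in groups]
    headers["x-feature-flags"] = ",".join(flags)
    return headers

def service_handle(headers: dict) -> str:
    """Run in a downstream service: branch on the header, not on local config."""
    if "new-checkout" in headers.get("x-feature-flags", "").split(","):
        return "render new checkout flow"
    return "render existing checkout flow"
```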

One word of caution I am keen to offer here relates to coupling. When working with flexible API Gateways like Zuul, which allow the dynamic injection of scripts at runtime to alter routing and request/response data transformation, it can be tempting to introduce business logic into the gateway — accidentally or otherwise.

With great power comes great responsibility. Although this can be beneficial for (niche) use cases, the coupling and lack of cohesion introduced by spreading business logic between a service and the gateway make the continuous delivery of features more challenging, simply because of the additional moving parts and the coordination and orchestration required between them.

Getting Started with an API Gateway

Hopefully, this article has convinced you of the benefits that a well-implemented and well-managed API gateway can provide. Your choice of gateway will largely be determined by your requirements for development workflow, testing processes, and target deployment platform.

As part of my work with Ambassador Labs I’ll be creating a series of articles that provide a guide to implementing many of the techniques mentioned above using the Ambassador Kubernetes-native API Gateway, such as how to deploy and release a Java microservices-based application with Kubernetes and Ambassador. Learn more on the Ambassador Labs blog.
