Engineering

Using Mutating Admission Controllers to Ease Kubernetes Migrations

Migrating applications to a new Kubernetes environment can be challenging, but Admission Controllers can help reduce risk and lower the impact to your users.

Andres Osorio
Ambassador Labs
Nov 21, 2022 · 6 min read


The old staging environment of Ambassador Labs was just a set of namespaces in the same cluster as our production environment. Using namespaces to separate environments is not ideal since changes in staging can affect production.

For example, updating cluster-wide resources like the Ambassador Edge Stack CRDs could impact the whole cluster. The risk of breaking production also forced us to deploy some staging changes manually instead of through our GitOps-style continuous delivery pipeline, which meant missing the key benefits GitOps provides.

To address these issues, we separated the two environments, running each in its own Kubernetes cluster. Migrating applications to a new environment can be challenging due to differences in external dependencies, hardware constraints, and so on. In addition, applications change constantly, which makes it difficult to keep the environments in sync.

In this article, I’ll explain how mutating admission controllers enabled us to quickly update the manifests for the new environment while keeping the old environment running.

Our approach to GitOps

Before I explain how we tackled the migration, let’s talk about how we do GitOps.

At Ambassador Labs, all the manifests deployed to a Kubernetes cluster are stored in a Git repository. This repository is monitored by ArgoCD, and when there is a new version of a manifest, ArgoCD will deploy it.

There are typically two types of applications in our cluster:

  1. Applications that we use but don’t own, like Grafana and Prometheus. The manifests for these applications are kept in the manifest repository.
  2. Applications owned by Ambassador Labs, like Edge Stack and Telepresence. The source code and manifests for these applications are kept in their own repository. When a change is made, a job will push the latest version of their manifests to ArgoCD’s repository.

Kubernetes Admission Controllers to the Rescue!

One of the options we had to migrate to the new staging environment was to create a copy of all the staging manifests. This could be implemented in a couple of different ways:

  1. Create a new Git repository to be used as the source for staging’s manifests. All Ambassador-owned applications would push the staging manifests here. This approach would require changing several repositories and keeping the old and new staging manifests up to date until the old environment is torn down.
  2. Create branches on each Ambassador-owned application repo, which would be used as the source for staging. Downsides of this approach include: the new staging environment will slowly diverge from the old one unless the branches are kept up to date, and ArgoCD will have to be given permissions to multiple source repositories, which is undesirable from a security perspective.

From a technical point of view, both options would work, but with their respective limitations. However, there were a couple of requirements that had to be met:

  1. The old staging environment should be kept up to date (as a backup) in case the new one has issues.
  2. The staging environment is used by different development teams, each one with its own priorities. This meant that the migration process should be transparent to them, and we should be able to onboard teams gradually as their applications were migrated to the new cluster.

Empathize with Users

The last requirement is not a technical one, but a ‘people’ one. One of Ambassador Labs’ core values is to empathize with users, so we wanted a solution that would not just meet the technical goals but also solve the problem in a way that was sympathetic to our developers.

Keeping in mind the previous constraints, we decided to use the same Git repository in the old and new environments and use a mutating admission controller to patch resources on the fly in the new cluster.

This gave us the following advantages:

  1. All changes necessary to bootstrap the new cluster were defined in one place. Later, these changes could be ported to each of the application repositories.
  2. The old and new staging environments could now work side by side, and their differences were minimal.
  3. If there were issues with the new environment, it was possible to revert to the old one with minimal effort.

How does it work?

Kubernetes supports dynamic admission control through the MutatingAdmissionWebhook and ValidatingAdmissionWebhook admission controllers, which provide a mechanism for configuring webhooks that can modify (mutating webhooks) or reject (validating webhooks) requests to the Kubernetes API server. Typically, these webhooks are used for enforcing security practices, ensuring resources follow specific policies, or configuration management (e.g., configuring resource limits).
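A mutating webhook is registered with the API server through a MutatingWebhookConfiguration. As an illustration, the sketch below expresses one as a Python dict mirroring the YAML manifest; the names, namespaces, and policy choices are assumptions for this example, not Ambassador Labs' actual configuration.

```python
# Minimal sketch of a MutatingWebhookConfiguration that routes writes to a
# webhook service; all names here are hypothetical.
webhook_config = {
    "apiVersion": "admissionregistration.k8s.io/v1",
    "kind": "MutatingWebhookConfiguration",
    "metadata": {"name": "patcher"},
    "webhooks": [{
        "name": "patcher.example.com",
        "admissionReviewVersions": ["v1"],
        "sideEffects": "None",
        # Scope the webhook narrowly: only create/update of the kinds we patch.
        "rules": [{
            "apiGroups": ["", "apps"],
            "apiVersions": ["v1"],
            "operations": ["CREATE", "UPDATE"],
            "resources": ["configmaps", "deployments", "serviceaccounts"],
        }],
        # "Ignore" fails open so a webhook outage cannot block unrelated
        # deploys; "Fail" is the stricter alternative.
        "failurePolicy": "Ignore",
        "clientConfig": {
            "service": {"namespace": "patcher", "name": "patcher",
                        "path": "/mutate"},
        },
    }],
}
```

Note that the `rules` field is what limits the webhook's blast radius, a point revisited in the availability discussion below.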

We created a mutating webhook called patcher that intercepts every request to create or modify certain resources (e.g., ConfigMaps and Deployments), and transforms the manifest if necessary. Here are some of the updates that patcher handles:

Initializing Docker registry credentials: Patcher sets the imagePullSecrets field on the ServiceAccount used by a pod, which enables the pod to pull images from a private Docker registry.
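A minimal sketch of how such a mutation can be expressed as a JSON Patch (RFC 6902); the function and secret name are hypothetical, not patcher's actual code.

```python
# Build a JSON Patch that ensures a ServiceAccount references a registry
# pull secret. The secret name is an assumption for this example.
def image_pull_secret_patch(service_account: dict, secret_name: str) -> list:
    secrets = service_account.get("imagePullSecrets", [])
    if any(s.get("name") == secret_name for s in secrets):
        return []  # already present, nothing to patch
    if not secrets:
        # Field absent: create the whole list in one operation.
        return [{"op": "add", "path": "/imagePullSecrets",
                 "value": [{"name": secret_name}]}]
    # Field present: append to the end of the list ("-" index).
    return [{"op": "add", "path": "/imagePullSecrets/-",
             "value": {"name": secret_name}}]

sa = {"apiVersion": "v1", "kind": "ServiceAccount",
      "metadata": {"name": "default"}}
print(image_pull_secret_patch(sa, "registry-creds"))
# → [{'op': 'add', 'path': '/imagePullSecrets', 'value': [{'name': 'registry-creds'}]}]
```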

Replacing values referencing the old staging environment or its dependencies: To run both environments at the same time, we need to replace any settings that would cause conflicts or be incorrect, such as hostnames or single-instance integrations.
To do this, patcher inspects the webhook request, and if it contains an object that should be modified, it returns a response with a JSON patch describing the changes to apply.
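A sketch of what that webhook response can look like, assuming the admission/v1 AdmissionReview API; the helper name and the patch contents are illustrative.

```python
import base64
import json

# Build the AdmissionReview response a webhook returns when it wants to
# allow the request but apply changes. Field names follow admission/v1.
def allow_with_patch(review: dict, patch_ops: list) -> dict:
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],  # must echo the request UID
            "allowed": True,
            "patchType": "JSONPatch",
            # The patch is sent base64-encoded in the response body.
            "patch": base64.b64encode(json.dumps(patch_ops).encode()).decode(),
        },
    }

review = {"request": {"uid": "abc-123"}}
ops = [{"op": "replace", "path": "/data/HOSTNAME",
        "value": "staging-new.example.com"}]  # hypothetical replacement
resp = allow_with_patch(review, ops)
```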

Blocking the creation of objects that are deprecated or unwanted in the new environment: This is done by setting the allowed key of the webhook response to false.
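A minimal sketch of such a denial response, again assuming the admission/v1 AdmissionReview shape; the status code and message are illustrative.

```python
# Reject a request outright: the webhook simply sets allowed to false
# and attaches a human-readable reason for kubectl/ArgoCD to surface.
def deny(review: dict, reason: str) -> dict:
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],
            "allowed": False,
            "status": {"code": 403, "message": reason},
        },
    }

resp = deny({"request": {"uid": "abc-123"}},
            "this application is not deployed in the new staging cluster")
```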

Now that I’ve explained how patcher works, let’s look at the process to set up a new cluster:

  1. Install patcher and configure it to block production-only applications.
  2. Configure ArgoCD for the new cluster, and deploy all applications so that they start in an ‘inactive’ state (e.g., Secrets for connecting to a database will be missing).
  3. For each application:
  • Find all the settings that have to be updated and create a patch.
  • Update the application, so it becomes active.

Once the new cluster is stable, and the old one has been retired, we will port each patch back to the source repository and remove it from patcher.

Other considerations

While the use of admission controllers made the staging migration easier, there were a few areas that required special consideration to avoid issues. I explain them below:

Cluster availability

Admission controllers can be considered part of the control plane, and as such, they should be carefully designed and implemented. For instance:

  • Admission webhooks should be highly available and quick to return. If the webhook is unavailable, the cluster can become inaccessible or break in unexpected ways. If the webhook is deployed in the Kubernetes cluster, multiple instances can be run behind a Service to improve availability.
  • Limit the scope of the objects modified by the webhook. While you can configure a webhook to accept requests for all objects, this is not recommended since any issues with the webhook can severely impact the cluster. A better approach would be to limit the objects that the webhook modifies.

Persisted vs. live state

The use of mutating admission controllers can introduce differences between the source manifests and the live manifests, which can make it difficult to troubleshoot issues or update applications. At Ambassador Labs, we used mutating webhooks as a tool to achieve a goal quickly, but once we finished the migration, additional work was done to port the changes introduced by patcher back to the source repositories.

Conclusion

Admission controllers are a very powerful tool, and migrating applications is just one use case for them. However, care must be taken to ensure that webhooks are reliable and that they don’t introduce unexpected behaviors or change the cluster state in unanticipated ways. If you’re interested in learning more, the following references are a good start.

References

Kubernetes blog — A Guide to Kubernetes Admission Controllers

Kubernetes docs — Using Admission Controllers

Kubernetes docs — Dynamic Admission Control
