In this article, I will share my experiences with 3 major types of Kubernetes ingress solutions. Let’s go through their pros and cons and find out which one suits your needs.

Nginx Ingress Controller

How does it work behind the scenes?

First, let’s deploy a hello-world service with 2 Pods running in the demo namespace. Next, we apply the hello-world ingress resource file as below.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: hello-world
spec:
  rules:
  - http:
      paths:
      - path: /api/hello-world
        backend:
          serviceName: hello-world
          servicePort: 80
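Assuming the manifest above is saved as hello-world-ingress.yaml (the filename is just an example), it can be applied with:

kubectl apply -f hello-world-ingress.yaml -n demo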

Once the Ingress resource is deployed, let’s take a look at how the ingress controller translates it into Nginx configuration.

For the API path /api/hello-world, an upstream directive like the one below routes incoming traffic to Service hello-world, with 2 destination Pod IPs on container port 8080, in the namespace demo. Pretty straightforward, right? It is very similar to an iptables or IPVS routing table.
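A simplified sketch of the generated nginx.conf (the Pod IPs here are hypothetical, and the real file contains many more directives):

upstream demo-hello-world-80 {
    server 10.0.1.12:8080;  # Pod 1 IP (hypothetical)
    server 10.0.2.34:8080;  # Pod 2 IP (hypothetical)
}

server {
    location /api/hello-world {
        proxy_pass http://demo-hello-world-80;
    }
}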

Nginx Ingress relies on a Classic Load Balancer (ELB)

The Nginx ingress controller can be deployed anywhere, and when initialized in AWS, it will create a Classic ELB to expose the Nginx ingress controller behind a Service of Type=LoadBalancer. This may be an issue for some people, since ELB is considered a legacy technology and AWS recommends migrating existing ELBs to the Network Load Balancer (NLB). However, under regular traffic volume, it has never been a problem for us.

If NLB is preferred in your cluster, the good news is: it has been supported since v1.10.0 as an ALPHA feature, as shown below.

annotations:
  # by default the type is elb
  service.beta.kubernetes.io/aws-load-balancer-type: nlb

Supports distributed ingress changes

In a microservice environment, applications are created or decommissioned all the time. A centralized routing file that includes all the ingress rules, hosts, and paths becomes harder for all the microservice teams to share and maintain. At one of the KubeCon 2018 sessions, we heard that Uber runs 4,000 microservices. Imagine having to manage thousands of ingress rules in one giant configuration file for each environment. Sounds like a nightmare for the teams who have to collaborate on it.

Not only are dedicated pipelines needed to keep these rules deployed and synced across all environments, but race conditions can become another problem. Besides, we (DevOps) do not want to fight with these monolithic routing configurations and become a bottleneck for frequent changes from development teams. It is preferable for each microservice team to own its ingress and be able to change it however it likes. Therefore, for each application Helm chart, we add an ingress.yaml under the templates folder, as sketched below, so these routing rules can be deployed or promoted together with the app code to different environments, without any intervention.
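A minimal sketch of such a per-application templates/ingress.yaml (the value names .Values.name and .Values.apiPath are assumptions; adapt them to your chart):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  # each app templates its own Ingress from its chart values
  name: {{ .Values.name }}
spec:
  rules:
  - http:
      paths:
      - path: {{ .Values.apiPath }}
        backend:
          serviceName: {{ .Values.name }}
          servicePort: 80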

One potential issue we thought might happen is that different applications’ ingress rules could conflict with each other over API path namespaces. However, in real development cycles, people always check Swagger for all the existing APIs, do code reviews, and write test automation. This potential API conflict never became a real problem.

Do not use Nginx Ingress without a defined scope

When getting started, people tend to create an ingress controller with default values and start to try things out, such as deploying the dashboard or migrating a few applications. This is very common, and we did the same thing. In the beginning, everything went smoothly, and we kept adding new environments to our Kubernetes cluster. Then, suddenly, we hit our first routing issue: during this environment-onboarding process, the Nginx configuration quickly grew to 200 thousand lines and started to have config reload issues.

We took a close look at the Nginx Ingress controller Helm chart, and it has the following settings:

controller.scope.enabled: defaults to false, which means watch all namespaces

controller.scope.namespace: the namespace to watch for ingress, defaults to empty

This means that, by default, each ingress controller will listen to ingress events from all namespaces and add the corresponding directives and rules into the Nginx configuration file.

Let’s take another look at the ingress controller deployment, sketched below. Notice that when the chart is deployed, these settings are translated into a container argument called --watch-namespace. This might come in handy and save you some time during debugging. (Consistent naming conventions are hard.)
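For example, scoping the controller to one namespace through the chart (the demo namespace is just an example) might produce a Deployment fragment like this sketch:

# helm install stable/nginx-ingress \
#   --set controller.scope.enabled=true \
#   --set controller.scope.namespace=demo
containers:
- name: nginx-ingress-controller
  args:
  - /nginx-ingress-controller
  # the chart's scope settings surface here under a different name
  - --watch-namespace=demo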

Do not share Nginx Ingress across multiple environments

After a shared ingress controller was abused by 30+ environments, the Nginx config file got humongous and very slow to reload. Pod IPs went stale, and we started to see 5xx errors. Scaling up the same ingress controller deployment did not seem to solve the problem.

Since then, we have used dedicated ingress controllers for each environment. Besides, there are bonus advantages to this solution:

Granular access control for each Nginx Ingress service: customized AWS security groups can be applied to each ingress ELB.

Tuned configurations for special environments, such as chaos testing and perf testing. These settings are more flexible to manage and more forgiving when we try things out.

Tuning Worker Process and Memory Settings

Nginx has a default setting worker_processes auto, which means the number of worker processes matches the number of CPU cores on the host VM.

Notice I said host VM instead of container resources? This is because Nginx is not CPU-cgroup-aware, and the ingress controller will ignore the following 2 constraints:

spec.containers[].resources.limits.cpu

spec.containers[].resources.requests.cpu
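In other words, even with CPU constraints like the following sketch in the pod spec, Nginx still sizes its worker pool by the host VM’s core count:

resources:
  requests:
    cpu: "1"   # ignored by Nginx's worker_processes auto
  limits:
    cpu: "2"   # ignored as well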

When we upgraded our cluster VM nodes from m4.xlarge (4 vCPU cores) to c5.4xlarge (16 vCPU cores), our ingress controller pods suddenly started to fail during continuous ingress changes. After logging into a pod and checking /etc/nginx/nginx.conf, we found that each ingress controller pod had 16 worker processes instead of 4.

When frequent ingress changes happened, Nginx kept reloading the configuration for all 16 worker processes, quickly consumed all the memory we had allocated for the pods, and got them OOM-killed.

A lesson learned. There are, however, several solutions to this problem:

Allocate more memory to the Nginx container in the pod.

Deploy the ingress controller to VMs with fewer CPU cores using pod affinity.

Tune the worker_processes number through the Helm chart config (see the sketch below).

For the first solution, under our load test, when deployed on c5.4xlarge with 16 CPU cores, the Nginx container launched 16 worker processes. With 1G of memory (increased from 500M), it handled the load without any failure.
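For the third solution, here is a minimal sketch of pinning the worker count in the chart’s values.yaml; worker-processes is one of the ingress controller’s ConfigMap options, and the value 4 is just an example:

controller:
  config:
    # override worker_processes auto so the worker count
    # no longer follows the host VM's core count
    worker-processes: "4"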

AWS ALB Ingress Controller

AWS Application Load Balancer (ALB) is a popular and mature service for load balancing traffic at the application layer (L7). Both path-based and host-based routing rules are supported. We have been leveraging this AWS service since it launched. During the early phase of evaluating Kubernetes ingress controllers, the AWS ALB ingress controller was my first choice. However, at that time, this open-source project had not yet been donated to Kubernetes SIG-AWS or officially blessed by AWS, and a few critical features were missing for our needs. Recently, we decided to give it another try and found some of its features very promising.

Using AWS ALB, which supports L7 routing natively

AWS ALB supports application layer routing natively: each target group represents one Kubernetes service and routes incoming requests either to the worker nodes where the pods for this service reside, or directly to the pods in IP mode (I will explain this more later).

This offloads all the operational maintenance, together with the scalability and availability concerns, to AWS, since it is a fully managed service. Besides, there is no need to dig into Nginx and become an expert to troubleshoot all its gotchas.

In addition to ALB’s native L7 routing, AWS keeps adding new functionality to this service; several features are quite beneficial to us:

AWS Web Application Firewall (WAF) integration support (WAF is awesome; one less major piece of infrastructure to maintain, yay!)

Native redirect of insecure HTTP requests to HTTPS (I know, right?)

Fixed responses without forwarding to the application (not a fancy 404 page, but close!)

Authentication on ALB: OIDC, Facebook, Google Auth, AWS Cognito

These features can be quite handy when you start to secure and productionize Kubernetes Ingress controllers in your environment.
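As an illustration, the HTTP-to-HTTPS redirect is driven entirely by annotations. The sketch below follows the controller’s documented ssl-redirect action; the certificate ARN is a placeholder, and you should verify the exact syntax against the controller version you run:

metadata:
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: <your-acm-certificate-arn>
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": {"Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
spec:
  rules:
  - http:
      paths:
      # the backend serviceName must match the action name above
      - path: /*
        backend:
          serviceName: ssl-redirect
          servicePort: use-annotation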

Requires all the ingress resources to be defined in one place

In the current version, when the ALB ingress controller receives new ingress resources from a yaml file, it will not update the existing ALB by appending the new ingress rules. Instead, it does a complete overwrite and applies only the rules in the newest file.

If all the ingress rules are static and predefined in one central file, this may not be an issue at all. However, in certain situations, as mentioned in the earlier section on distributed ingress changes, we have ingress changes coming from various teams at various times, and we prefer a self-service, automated solution.

Luckily, the community has heard our request and is actively working on the “create option to reuse an existing ALB” feature right now. I am very excited and can’t wait to try it out.

ALB Ingress on IP Mode with AWS CNI Plugin

AWS ALB Ingress controller supports two traffic modes: instance mode and IP mode.

instance mode: Ingress traffic starts from the ALB and reaches the NodePort opened for your service. Traffic is then routed to the container pods within the cluster. The number of hops for the packet to reach its destination in this mode is always two.

ip mode: Ingress traffic starts from the ALB and reaches the container pods within the cluster directly. In order to use this mode, the networking plugin for the Kubernetes cluster must use a secondary IP address on an ENI as the pod IP, aka the AWS CNI plugin for Kubernetes. The number of hops for the packet to reach its destination in this mode is always one.
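The traffic mode is selected per ingress through an annotation; a minimal sketch (target-type defaults to instance):

metadata:
  annotations:
    kubernetes.io/ingress.class: alb
    # route the ALB target group straight to pod IPs
    alb.ingress.kubernetes.io/target-type: ip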

Whenever an Ingress resource is created, the ingress controller will:

Create an ALB and listeners (80/443) if they do not exist yet

Create a target group on the ALB for each K8S service.

Update Path and Host ingress configs on each target group

Add only the VMs where the backend pods are running (instead of all VMs) to the target group.

Yes, instead of having a load balancer sitting on top of all the worker nodes in the cluster, each target group load-balances across only the few nodes where your pods actually run. This is unique compared to the ALB ingress controller in instance mode or the Nginx ingress controller.

Notice this also greatly reduces the chance of a load balancer routing traffic to an irrelevant VM and then relying on the local kube-proxy and network agent (e.g. calico-node) to find the VM where your pod is actually running.

In addition, there is one more piece of network complexity to consider. When using popular K8S network plugins like Calico or Flannel, the overlay network is optional inside the same subnet but required for cross-subnet traffic. This is expected for static on-premises datacenters. However, when deployed in a cloud such as AWS, an extra layer of EC2-VPC fabric is added, and our network stack becomes more complex.

Therefore, to keep a simple and efficient network stack, this AWS ALB ingress controller (IP mode) solution looks very promising for the following reasons:

For the first time, a load balancer can be pod-location-aware.

The number of hops for the packet to reach its destination is always one.

No extra overlay network compared to using network plugins (Calico, Flannel) directly in the cloud (AWS, GCP, Azure).

Currently, the latest release of the AWS CNI plugin is 1.3.0. You can get it from the amazon-vpc-cni-k8s GitHub repository and try it out.
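If you are not sure which CNI version a cluster is running, one way to check (assuming the plugin’s usual aws-node DaemonSet in kube-system) is:

kubectl describe daemonset aws-node -n kube-system | grep Image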

This is Part 1 of my “Kubernetes Ingress Controllers: How to choose the right one” series. In the next part, I will share my experience with the third category, Envoy-based ingress controllers. We will take a deep dive into this popular option and discuss its pros and cons.
