A story about getting our Kubernetes infra into a better state and slashing two-thirds of our AWS bill in the process

It all started when…

Nell Shamrell noted in the #infrastructure-team channel that our monthly AWS bill was really high, far higher than an organization like OperationCode should be paying.

I’ve been helping out with ops stuff at OperationCode for about a year, and our Kubernetes (aka k8s) cluster definitely needed some work:

- It was still on Kubernetes 1.11 (21 months old! a hundred in k8s years), and the steps to upgrade in-place were fraught with peril.
- It had a cumbersome user onboarding process involving Google authentication and a secrets.json file that everyone (including the admins!) kept losing.
- It was costing us way too much, consisting of 6x t3.medium instances (3 masters, 3 worker nodes), 3 NAT gateways (these are surprisingly expensive!), and overly large EBS volumes (200 GB per host).

As you can see, all of this really added up quickly for a small organization like ours.

Our AWS costs for Feb 2020

Part 0: Implement Monitoring and Observability

Before proceeding with any changes, I took a good hard look at our monitoring situation. We have Sentry.io to notify us of application errors, as well as Prometheus with AlertManager to capture metrics and send alerts from inside our Kubernetes cluster.

In addition, I added two external (and free) points of visibility:

Observability instrumentation from Honeycomb.io (my daytime employer), utilizing the free Community plan, which gives us 500 MB of storage and 15 GB/mo of telemetry data ingest. It was relatively easy to implement the Python Beeline in our back-end application; in fact, most of the work involved fixing the build and updating dependencies and documentation.

External site monitoring with StatusCake, also utilizing their free plan to give us 10 uptime test URLs and 1 page speed test.

This way I had the confidence to proceed with a total infrastructure makeover, knowing we had sufficient test coverage for our production environment.

As Charity Majors always says, you really do test in prod.

Source: the charity.wtf blog

Part 1: Clean up unused resources

Our initial examination found nearly $200/mo of resources that were totally unused, left over from a previous version of the app that ran on Elastic Beanstalk and a few experiments. A one-hour pairing session with two other OCers got those cleaned up quickly.

Part 2: Reflecting on why Kubernetes matters to OperationCode

One thing I love about OperationCode is how we use our own web application and infrastructure as a teaching resource; all of the apps and infra are open source. This allows the vets we help (as well as all volunteers) to gain experience with today’s popular development practices. I believe our choice to use and provide access to a “real, in production” Kubernetes cluster follows in the same spirit. It’s totally okay if it’s a little bit overkill for our needs, because it provides invaluable hands-on experience to anyone who wants it.

The feedback about our kops-based k8s cluster was that it was a little too hard to work with, which made it feel scary for people to experiment on. After all, this is our production infrastructure we’re talking about; it needs to stay up and be easy to understand. Having had positive experiences with Amazon’s “managed” EKS offering in the past, that felt like the natural choice moving forward. It balances accessibility with reliability, while giving us ops volunteers a reasonable reduction in the complexity and effort of day-to-day maintenance.

Furthermore, the idea of “serverless” Kubernetes with the new Fargate-based operating model seemed like an attractive way to reduce operational concerns even further. It didn’t quite work out, but more on that later.

Part 3: Building out the new cluster

An AWS Container, I don’t make this stuff up

Thanks to eksctl, building out a new EKS cluster was incredibly easy. I appreciated being able to practice a bit of Infrastructure as Code by driving everything from a config file. eksctl also provides a really handy eksctl utils write-kubeconfig command, which made onboarding new users incredibly simple.
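For illustration, here is a minimal sketch of what an eksctl config file for a Fargate-backed cluster can look like. The cluster name and profile selectors are illustrative rather than our exact config, and the region is assumed from our certificate ARN:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: operationcode   # illustrative name
  region: us-east-2     # assumed from our ACM certificate ARN
fargateProfiles:
  - name: default
    selectors:
      # pods in these namespaces get scheduled onto Fargate
      - namespace: default
      - namespace: kube-system

From there, eksctl create cluster -f cluster.yaml builds everything, and eksctl utils write-kubeconfig --cluster operationcode hands a new volunteer a working kubeconfig.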

After the cluster was built (this took virtually no time at all!), the AWS docs recommended installing a handful of add-on components, the ALB Ingress Controller and ExternalDNS among them.

These were relatively easy to set up following the instructions, except for external-dns. The example configuration pointed to an outdated image, and unfortunately the external-dns docs didn’t provide any guidance on how to make it work on Fargate. After some experimentation and help from the friendly folks in the Kubernetes Slack (#eks channel), I figured out that I needed to use the eksctl create iamserviceaccount command to create a role for external-dns, and then use the eks.amazonaws.com/role-arn annotation to map the Kubernetes service account to the IAM role (code here).
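In rough terms, the resulting ServiceAccount ends up looking like the sketch below, and the external-dns Deployment just references it via serviceAccountName. The account ID and role name are placeholders, and the namespace is an assumption:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: kube-system   # assumption: wherever external-dns runs
  annotations:
    # created by `eksctl create iamserviceaccount`; ties this Kubernetes
    # service account to an IAM role allowed to modify Route53 records
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/external-dns   # placeholder ARN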

Finally, we use Argo CD for continuous delivery of our back-end services that run on k8s, so that was important to get deployed. One of the known limitations of EKS+Fargate is that it doesn’t support persistent volume claims; it can only run pods with ephemeral storage. I ended up adding a single t3a.medium node to the cluster so that Argo wouldn’t lose all of its data every time a pod cycled :)

Rather than use Argo’s HA install template (which was overkill for us), I modified the quickstart template’s Redis server section to be a StatefulSet with a persistent volume claim (code here), which provides a sufficient level of availability for Argo.
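Here’s a rough sketch of the shape of that change, not our exact manifest (the image tag and storage size are illustrative): the quickstart’s Redis becomes a StatefulSet with a volumeClaimTemplate, so its data lands on an EBS-backed volume instead of ephemeral pod storage.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-redis
spec:
  serviceName: argocd-redis
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-redis
  template:
    metadata:
      labels:
        app.kubernetes.io/name: argocd-redis
    spec:
      containers:
        - name: redis
          image: redis:5.0.8              # illustrative tag
          args: ["--appendonly", "yes"]   # persist writes to the mounted volume
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi                  # illustrative size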

Part 4: Updating our services’ k8s manifests

In testing, we found a few changes that were going to be necessary to our application’s manifests in order to run on the new cluster.

The most important change was switching all the services from type: ClusterIP to type: NodePort, as required by the ALB ingress controller and ExternalDNS controller. After that it was simply a matter of implementing the ingress annotations and rules correctly. The controllers watch for these annotations and automatically create the AWS ALB and Route53 DNS records, pretty cool!

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: back-end
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:633607774026:certificate/d59d030e-0239-4bfa-8553-e4bafb6481b4
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01
    external-dns.alpha.kubernetes.io/hostname: backend.k8s.operationcode.org
  labels:
    app: back-end
spec:
  rules:
    - host: backend.k8s.operationcode.org
      http:
        paths:
          - path: /*
            backend:
              serviceName: back-end-service
              servicePort: 80
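For completeness, the back-end-service that the Ingress points at is just a plain Service switched to type: NodePort. Roughly, and under assumptions (the selector and container port below are illustrative, not our exact manifest):

apiVersion: v1
kind: Service
metadata:
  name: back-end-service
  labels:
    app: back-end
spec:
  type: NodePort        # required by the ALB ingress controller
  selector:
    app: back-end       # assumption: label on the back-end pods
  ports:
    - port: 80
      targetPort: 8000  # assumption: the app's container port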

And here’s the complete pull request with all the changes:

Finally, cut-over was as simple as a DNS record change, which was performed (mostly) successfully on a Sunday evening. It helped that our PostgreSQL database is on RDS and didn’t need to be touched aside from a security group change.

Part 5: Realizing our costs were still too high (ugh!)

DevOps action set

After a week of operating the new setup, I cleaned up the kops cluster and then we began closely monitoring daily AWS spend. I implemented MiserBot to give us daily AWS spend updates. Unfortunately, once it stabilized we were still running at $14.51/day ($435/mo), which was still way too high.

My analysis and recommendations:

Back in the #infrastructure-team chat in the OC slack

Part 6: Transitioning to Spot Instance worker nodes

Okay, so Fargate was a cool idea, but it definitely cost more than I thought it would. The Fargate hours, NAT gateway, and t3a.small added up to a combined spend of $6.16/day, which I knew we could roll into a triplet of AWS t3.small instances costing a total of $0.45/day at spot pricing. That was a worthwhile change.
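Here’s roughly what a spot-backed worker node group looks like in eksctl config terms; this is a sketch under assumptions, with the instance type mix and sizes being illustrative rather than our exact setup:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: operationcode   # illustrative name
  region: us-east-2
nodeGroups:
  - name: spot-workers
    minSize: 3
    maxSize: 3
    desiredCapacity: 3
    instancesDistribution:
      instanceTypes: ["t3.small", "t3a.small"]   # illustrative mix of spot pools
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0     # run 100% on spot capacity
      spotInstancePools: 2

Something like eksctl create nodegroup --config-file=cluster.yaml then adds the group to the existing cluster.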

I wanted to be careful about spot instances cycling (being terminated and re-created) impacting our operations, so I implemented additional monitoring with CloudWatch alarms and a CloudWatch-alarm-to-Slack sender.

I was also a little nervous about transitioning the kube-system pods from Fargate to the worker nodes, but that ended up going very smoothly, as did cutting over the rest of the namespaces. I did discover after transitioning the staging environment that our RDS security groups were set to allow traffic from the NAT gateway (which also adds data transfer costs!), so I quickly implemented VPC peering instead, which was a better solution from both a security and a cost perspective.

You can ninja-edit Infrastructure if nobody notices, right?

One additional thing I realized was how BAD our backend request latency was on Fargate, with many requests nearing 3 seconds! Post-cutover, the graph looks so much better.

Honeycomb’s Slack unfurls are so good, right?

The difference in response time was shocking!

Conclusion

Running Kubernetes takes work, testing and vigilant monitoring. The project moves so fast that you can’t take your eye off it for a year and then expect a smooth upgrade. Let it go for 4–5 minor versions and you’ll find it much easier to lift-and-shift everything to a new cluster, as we did.

We’re now in a much better place in terms of cost and manageability for our Kubernetes cluster. The cluster itself now costs less than $5/day to operate (including the ingress ALBs) and our total AWS spend is $7.80/day which is a far more sustainable rate for an organization such as ours.

Fargate was a bit of a disappointment for this project: I wish AWS had been clearer upfront about EKS Fargate costs. Yes, you can find them if you really dig around, but it’s non-obvious. But the price/performance ratio was the biggest bummer.

Finally, organizations like OperationCode do good work and depend on the generosity of our service providers as well as your donations. Please donate!