Introduction

I have a background as a system administrator and software developer going back about ten years, and working as a freelancer usually means you are required to know more than what a specific role strictly demands.

About five years ago I started calling myself a DevOps engineer, which felt like a good next step for a sysadmin with developer experience. I read some books on the topic, DevOps culture, principles, success stories, and I was prepared to get into that world.

Three years after that move everything was going great. A lot of different responsibilities came with the new role, and I had the opportunity to work with some cool startups and projects, mostly helping development teams deliver software quickly, consistently, and reliably.

I continued learning and trying new tools and methodologies: I was introduced to AWS, the serverless approach, microservices architectures, distributed systems, containerization, and several CI/CD servers. I also tried difficult things, like introducing new processes in teams that don't like change, and a lot more that I will try to write about later.

Why start working on something you don't know

At that moment (and even today) everything you read about DevOps was focused on tools, and one in particular stood out: Kubernetes. It was not exactly new, but every person or company using it let you know it, and it felt like not knowing it meant being left behind in your field. That is not a good feeling, even though it's not mandatory for you to know everything.

But luckily I got a contract to help a team with their platform, which was suffering intermittent outages; some of their clients were insurance and airline companies, so they really needed to guarantee an SLA.

The application and infrastructure were composed of multiple microservices on top of three Docker Swarm nodes, a RabbitMQ deployment for message queuing, a Redis server for caching, a MongoDB database, and a single big server running HAProxy to route traffic (not a good choice if you want to maintain high availability with traffic of approximately 2.5 billion req/month, or ~1000 req/sec).
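
As a quick sanity check on those numbers, assuming the traffic is spread evenly over a 30-day month:

```sh
# 2.5 billion requests per month, averaged over a 30-day month:
echo $(( 2500000000 / (30 * 24 * 3600) ))   # => 964, i.e. roughly ~1000 req/sec
```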

It was a really simple architecture, but for some reason the Docker Swarm nodes kept breaking, and the whole stack had to be redeployed for the containers to become active and process requests again. Session tracking metrics were lost each time, and users had to restart whatever workflow they were in.
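
In Swarm terms, that recovery is essentially a forced redeploy. A minimal sketch of what it looks like (the stack and service names here are hypothetical, not the team's actual ones):

```sh
# Force Swarm to reschedule every task of one service (names are hypothetical)
docker service update --force app_api

# Or redeploy the whole stack from its compose file
docker stack deploy --compose-file docker-stack.yml app
```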

Fortunately, at the beginning the CTO told me that they didn't want to continue using Docker Swarm and instead wanted to try Kubernetes (k8s). They needed to be on Oracle Cloud Infrastructure (because they were part of the Oracle startup accelerator program), and they had credits to create any resources they wanted on that cloud.

Well, I took on the challenge! I started right away on my path to learn everything I could about k8s. I have to be honest, it was confusing at the beginning: trying to map what I knew about virtualization and containers onto k8s resources, designing the architecture, and on top of that creating the k8s cluster itself, with all the hurdles around certificates, network configuration, the etcd database, and so on. Everything was so overwhelming!

Getting into Kubernetes exposes you to a vast list of resources, and reading just for the sake of it is worth little, so my investigation led me to kubespray. Oracle Cloud Infrastructure (OCI) was not supported, though, so after researching a bit more I found that the OCI team had a Terraform repository that helped me create a self-managed k8s cluster, configured with specific container deployments that let Kubernetes manage cloud resources directly, like volumes and load balancers, through the cloud API.
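
To illustrate what that integration buys you (a sketch, not the project's actual manifests): once the cloud controller components are in place, declaring a Service of type LoadBalancer or a PersistentVolumeClaim makes Kubernetes call the cloud API and provision the resource for you.

```sh
# A Service of type LoadBalancer and a PVC; with the cloud integration
# configured, k8s provisions a cloud load balancer and a block volume for them.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: api            # hypothetical service name
spec:
  type: LoadBalancer   # triggers load balancer creation via the cloud API
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: api-data       # hypothetical claim name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi    # provisioned as a cloud block volume
EOF
```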

Getting my hands dirty

Not knowing Kubernetes and its scope didn't help with planning, nor with several other issues I only realized days after starting: managing your own k8s cluster, the way I was trying to, was not the best approach, but the company didn't have another choice.

In the end we moved forward and solved issues as they appeared, which gave me a better understanding of the amount of configuration and the number of components a k8s cluster needs. The official documentation was good, but thanks to Kubernetes The Hard Way (by Kelsey Hightower) and a couple more web resources (some of them here), I was able to start moving with confidence in both the administrator and developer scopes.
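
A few basic checks give a feel for those components when you are debugging your own cluster (nothing project-specific here, just standard kubectl commands):

```sh
kubectl get nodes -o wide                  # node status, versions, runtime
kubectl get pods --namespace kube-system   # control-plane and networking pods
kubectl cluster-info                       # API server and core service endpoints
```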

Once I had the cluster setup solved a couple of weeks later, I started migrating the application services to Helm packages. Helm is a tool for packaging common k8s resources as YAML files using a template engine, so you can customize specific values when you deploy them all at once, with the benefit of tuning the values declared in your templates through the CLI. Thanks to this I was reading YAML files and k8s APIs (you can see a reference by k8s version here) all day long.
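
A minimal sketch of that workflow (the chart and value names are made up for illustration):

```sh
helm create app-chart        # scaffolds a chart with templated k8s resources
# templates reference values, e.g. image: "myrepo/api:{{ .Values.image.tag }}"
helm install app ./app-chart \
  --set image.tag=1.2.3 \
  --set replicaCount=3       # tune declared template values from the CLI
```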

I must say that Kubernetes from a developer perspective is great, because it changes the way you think about deploying your applications, but that only happens once you know the core concepts, like services, deployments, persistent volumes, persistent volume claims, ingresses, ingress controllers, secrets, and config maps, and how to define them in YAML files.
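
For instance, the smallest useful pair, a Deployment plus a Service in front of it, looks roughly like this (names and image are placeholders):

```sh
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello               # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginx:1.25  # placeholder image
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello               # routes traffic to the Deployment's pods
  ports:
    - port: 80
EOF
```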

Once you have the basics of your app working, you can proceed to simulate how it scales and integrate it with other services, which was my next step.
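
Simulating scale can be as simple as changing the replica count, or letting the cluster adjust it based on load (again a sketch, with hypothetical names):

```sh
kubectl scale deployment hello --replicas=5                           # manual scaling
kubectl autoscale deployment hello --min=2 --max=10 --cpu-percent=80  # HPA
# (the autoscaler needs cluster metrics to be available to act on CPU usage)
```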

The good part of Helm as a package manager is the registry of applications from the community, which gives you the ability to run them in your cluster with one command. That gets you a running RabbitMQ and Redis, each configured as a cluster and following best practices, which saves you a lot of time.
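
The exact repository names have changed over the years, but with today's community charts (Bitnami's, for example) that one command looks like:

```sh
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install rabbitmq bitnami/rabbitmq   # a preconfigured RabbitMQ cluster
helm install redis bitnami/redis         # a preconfigured Redis deployment
```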

YAML files are good for keeping track of your infrastructure changes, but the easiest way to experiment is with kubectl, the official CLI client that makes calls to the Kubernetes API server. I used it to test everything, and only once the resources worked as expected did I write them down as YAML files: using the get subcommand with the -o yaml flag, I could pull the resource definition in YAML format directly from k8s and adapt it to the Helm package I was writing (I found it really easy to work this way).
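
Concretely, the loop looks something like this (resource names and image are hypothetical):

```sh
# Experiment imperatively first...
kubectl create deployment api --image=myrepo/api:1.0   # hypothetical image
kubectl expose deployment api --port=80

# ...then pull the working definition back as YAML and fold it into the chart
# (after stripping the status and server-set metadata fields)
kubectl get deployment api -o yaml > app-chart/templates/deployment.yaml
```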

After the application was running on Kubernetes, and the MongoDB database had also been migrated to Oracle Cloud Infrastructure (though not onto k8s), we ran our internal tests. The results gave us confidence in the platform, and we started migrating traffic to the new cluster. We did it progressively, checking usage and performance metrics (also set up on k8s, using the Prometheus Operator and the Grafana Helm package) to be sure everything kept working and latency didn't increase.
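
The monitoring stack itself is a chart install away; the chart names have moved around since then, but with the current community repository that's roughly:

```sh
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# kube-prometheus-stack bundles the Prometheus Operator plus Grafana dashboards
helm install monitoring prometheus-community/kube-prometheus-stack
```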

Deployment and cluster issues disappeared from day-to-day conversations, and when they did happen they were not as stressful as before because they were far less frequent. The infrastructure-as-code approach allowed us to replicate the same infrastructure for development/qa/stage environments, and applications were isolated in different namespaces, so testing new versions or features was quick.
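
Replicating an environment then boils down to a namespace and a values file (names here are illustrative):

```sh
kubectl create namespace qa
helm install app ./app-chart --namespace qa -f values-qa.yaml  # per-environment overrides
```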