Inspired by GitOps...

This article is intended to share ideas and solutions for some challenges related to Configuration Management, especially in cloud environments. I hope you find this read helpful.

The approach described in this article was conceptualized a few years back, then implemented and used across many, many projects to build configuration management components for production-grade systems and applications.

The Use Case

This problem is quite common, and we have seen it over the years not only in cloud-based deployments but also in on-premises setups of the “3 blades in the rack next room” variety. It applies to any deployment with more than one environment in the picture, like DEV, QA, STG, PROD and so on.

And the problem is, as you have probably guessed, configuration data and its management. Wikipedia covers Configuration Management (CM) at great length: CM planning and management, controls, status, so on and so forth. But as an architect and DevOps engineer, I'm always about the details (well, that's where you-know-who lives...) and about how to get this done in the fastest and most efficient way.

So, let’s get to it...

When I refer to configuration data, I’m talking about different types of it:

Fetched once, used many times

Fetched every now and then, used many times

Fetched each time it is used

I’m not going to dive into the details of what each type of configuration data looks like and its use cases. If you’re an experienced engineer, chances are you’re well familiar with these; if you’re not, Google is here to help :)
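To make the three types a bit more concrete, here is a minimal sketch (not any real client library; names and the TTL default are illustrative) of how an app might access the same config store under each pattern:

```python
import time


class ConfigClient:
    """Illustrative sketch of the three config access patterns."""

    def __init__(self, fetch, ttl_seconds=60):
        self._fetch = fetch          # callable that reads the config store
        self._ttl = ttl_seconds
        self._cache = None
        self._fetched_at = 0.0

    def get_static(self, key):
        # Pattern 1: fetched once, used many times.
        if self._cache is None:
            self._cache = self._fetch()
        return self._cache[key]

    def get_refreshed(self, key):
        # Pattern 2: fetched every now and then (TTL-based refresh).
        if self._cache is None or time.time() - self._fetched_at > self._ttl:
            self._cache = self._fetch()
            self._fetched_at = time.time()
        return self._cache[key]

    def get_live(self, key):
        # Pattern 3: fetched each time it is used.
        return self._fetch()[key]
```

The trade-off is the usual one: pattern 1 is cheapest but goes stale, pattern 3 is always fresh but hammers the store, and pattern 2 sits in between.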

At first, the problem arises when you start thinking about how you’re going to manage the basic config parameters of your app or service, like a port number, a database connection string or your Consul DNS. I mean, you can’t hardcode them! Okay, there are ways to solve that (config files, environment variables, etc.). But what happens if your app is containerized or runs in VMs? In order to change one configuration parameter, you’d have to re-deploy the app. Or, if you change one thing in your app ecosystem, it cascades into re-deploying other components as well. A bit costly, isn’t it?

Then, as the project progresses, you add complexity to your solution and have to think about how to manage more and more configurations: say, a Hadoop cluster, plus a Spark cluster with running jobs, plus a whole ecosystem of technologies to run Web services with load balancing and service discovery, plus, plus, plus… And, on top of that, you’d have a CI/CD pipeline which has its own configurations. Now, multiply all this by the number of environments you might have (i.e. development, prod, staging, sandbox #1, sandbox #2, engineering, etc.). And, if this is not enough, add features like security and audit (who did what and when).
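The usual first step away from hardcoding is externalizing these basic parameters, for example via environment variables in the 12-factor style. A minimal sketch (the variable names and defaults here are illustrative, not from any particular project):

```python
import os

# The same code/image runs in every environment; only the injected
# environment variables differ. Changing a value still means restarting
# the container, though, which is exactly the limitation discussed above.
PORT = int(os.environ.get("APP_PORT", "8080"))
DB_URL = os.environ.get("APP_DB_URL", "postgres://localhost:5432/dev")
CONSUL_DNS = os.environ.get("APP_CONSUL_DNS", "consul.service.consul")
```

This keeps secrets and endpoints out of the code, but it does nothing for drift, audit, or changing a value without a re-deploy, which is where the rest of this article is headed.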

The most annoying part of all this is that there is no single version of the truth anymore: your configurations start drifting as you go through the lifecycle of the application or the system, and quite often you don’t have a clean record of what changed and by whom.

The complexity of CM took on a whole new level when we were delivering Analytical Projects: enabling Data Scientists to develop, train and deploy models (sometimes quite complex ones) without a bunch of engineers supporting them around the clock. In other words, the Analytical Platform has to be robust and stable, yet tailored to the needs of Data Scientists, so that you avoid seeing 5–6 digit numbers on the cloud consumption bill.

At some point, I will have to find time to write an article describing how we built an open-source based, fully containerized and scalable Analytical Platform to manage the full lifecycle of Analytical Models (which saved hundreds of thousands of dollars per month on cloud costs for one of our clients… really not kidding).

If all the above is not enough, we have come across requirements from a few clients to deploy an application, or various components of it, across multiple cloud platforms (i.e. GCP + Azure + AWS), which implies CM at both the cloud and the application levels.

To sum it all up, we have identified the need for configuration data management to:

maintain a single version of the truth.

keep a precise record of changes, along with a reference to what triggered each change (in other words, an audit trail).

implement separation of concerns, i.e. applications should be busy implementing business logic, not managing configs.

manage various complexities of configuration data, including nested configs.

be as cloud-agnostic as possible, and hence support applications deployed across multiple cloud providers.

simplify and streamline deployment procedures.

cover all levels of config data: application, system, infrastructure, deployment, schedule, so on and so forth.
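The first two requirements, a single version of the truth plus an audit trail, are worth sketching. In a GitOps-style setup that role is played by a Git repo or a dedicated config service; the toy store below (entirely hypothetical, not the implementation from our projects) just demonstrates the two properties themselves:

```python
import time
from dataclasses import dataclass, field


@dataclass
class ConfigStore:
    """Toy versioned key/value store with an audit trail.

    One current value per key (single version of the truth) and a
    record of every change: who made it, why, old and new values.
    """
    _data: dict = field(default_factory=dict)
    _audit: list = field(default_factory=list)

    def set(self, key, value, who, why):
        old = self._data.get(key)
        self._data[key] = value
        self._audit.append({
            "ts": time.time(), "key": key,
            "old": old, "new": value, "who": who, "why": why,
        })

    def get(self, key):
        # Everyone reads the same current value; no per-app copies to drift.
        return self._data[key]

    def history(self, key):
        # The audit trail: every change to this key, in order.
        return [e for e in self._audit if e["key"] == key]
```

With Git as the store you get both properties for free (HEAD is the truth, `git log` is the audit trail), which is a big part of why the GitOps approach mentioned at the top is so appealing here.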

Phew...