Minimalist Software Architecture

Lessons learned from building large-scale multi-region distributed systems

No man is an island, and neither are software systems. When designing a system, the software architect usually needs to choose dependencies: infrastructure, authentication, storage, and so on. When I first started taking on software architecture responsibilities at IBM, I tended to pick whatever dependencies got the job done, but I soon learned this lesson: Be a minimalist. Only introduce a new dependency when you absolutely need it.

What is a dependency, really?

The answer seems straightforward: if your system relies on something to function, it’s a dependency. However, that is just the tip of the iceberg. Every dependency also brings requirements and costs of its own:

Presence: wherever your system goes, it goes, too.

Compliance: it meets the same compliance requirements as your system.

“UX”: it makes your users happy.

Maintenance: you need to spend extra resources to maintain it.

Of course, there are many other factors when picking a dependency, but I feel these are worth talking about.

Presence: a war story

At IBM Watson, like at many other global businesses, our system is deployed all over the world. When designing an internal SaaS, our team wanted to use a database-as-a-service hosted by another IBM team. We built a prototype with it in our dev environment and even did an alpha rollout in our US production environment. Everything looked great.

However, when we were asked, on a deadline, to deploy our system to non-US locations such as Europe and Australia, we realized that the database team had its own global rollout schedule. We could either wait for them or add a flag to our code that disabled some functions depending on whether the database was present.

We ended up going with the latter. But it fragmented our deployments, which led to more trouble in our CI/CD pipeline. It also added unnecessary complexity to our codebase: whenever we added a new feature, we needed to consider two cases (with that database or without it). If you have N dependencies that may or may not be present, you potentially end up with 2^N combinations to support. Needless to say, that quickly becomes a nightmare.

That’s why, when you choose a dependency, you should get a guarantee that it will be present wherever your system is.
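
To make the 2^N growth concrete, here is a minimal Python sketch that enumerates every presence combination for a list of optional dependencies. The dependency names are hypothetical and chosen only for illustration; they are not the services from the story above.

```python
from itertools import product


def presence_combinations(dependencies):
    """Return every present/absent combination of the given dependencies."""
    return [
        dict(zip(dependencies, flags))
        for flags in product([True, False], repeat=len(dependencies))
    ]


# Hypothetical optional dependencies, named only for illustration.
optional_deps = ["managed_db", "auth_service", "object_storage"]
combos = presence_combinations(optional_deps)

print(len(combos))  # 8, i.e. 2**3 deployment variants to build and test
```

Each entry in combos is one deployment variant that the code, the tests, and the CI/CD pipeline would have to account for. Adding a fourth optional dependency doubles the count to 16.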

Compliance

Today, computing systems face more regulations and compliance requirements than ever. If your company serves people in the EU, you can be subject to GDPR. If you wish to serve healthcare businesses in the US, you have to comply with HIPAA. All these regulations put requirements on your system and on your dependencies. Just as you need to ensure presence, you need a guarantee that your dependencies meet the same compliance requirements. Otherwise, the moment they fall out of compliance, so does your system.
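
One way to keep this visible is to record which requirements each dependency claims to cover and flag the gaps. The sketch below is a simplified illustration in Python; the dependency names and their coverage are made up, and a real compliance review involves far more than a set comparison.

```python
def missing_compliance(system_requirements, dependency_coverage):
    """Map each dependency to the system requirements it does not cover."""
    gaps = {}
    for name, covered in dependency_coverage.items():
        missing = set(system_requirements) - set(covered)
        if missing:
            gaps[name] = missing
    return gaps


# Hypothetical data: what the system must meet, and what each dependency covers.
required = {"GDPR", "HIPAA"}
dependencies = {
    "managed_db": {"GDPR", "HIPAA"},
    "email_gateway": {"GDPR"},
}

print(missing_compliance(required, dependencies))  # {'email_gateway': {'HIPAA'}}
```

Any non-empty result means the system as a whole cannot claim the requirement, no matter how carefully its own code handles it.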

“UX”: Kafka Streams vs Flink

User experience matters more when you are designing a framework and giving other people the code to use rather than hosting a SaaS. Remember that when users invite your framework into their house, they have to welcome the dependencies, too.

Take Kafka Streams and Apache Flink, for example. I researched both of them when building a stream processing pipeline based on Kafka. Some quick context: Kafka is an open-source distributed event streaming platform, often used as a message queue, and Kafka Streams is a library built on Kafka that helps process the data stored in it. Flink can handle similar tasks.

Kafka Streams and Flink each have lots of pros and cons, but the deal-breaker for me was their dependencies. Kafka Streams requires only Kafka, which we already had. Flink, on the other hand, requires a ZooKeeper cluster to achieve HA (high availability). We didn’t have the resources to host such a cluster on our own; therefore, we chose Kafka Streams over Flink.

Maintenance

Adding a new dependency also has hidden maintenance costs: additional metrics, monitoring, and alerting; new failure scenarios; and more automation. Dependencies can get slow, or just fail, and your system needs logic to detect and handle both cases. When you deploy your system, your continuous delivery system should set up everything, including dependencies, so a new dependency usually means new logic in your automation. Be prepared for the extra work!
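
As a rough illustration of that detect-and-handle logic, the Python sketch below wraps a dependency call with a timeout and a fallback value, using only the standard library. The helper name call_with_timeout, the timeout values, and the two fake lookups are assumptions made for this example; production code would typically also emit metrics, retry, or use a circuit breaker.

```python
import concurrent.futures
import time

# Shared worker pool for calls to external dependencies.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_seconds, fallback):
    """Run func in a worker thread; return fallback if it is slow or raises."""
    future = _executor.submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # The dependency is too slow. The worker thread keeps running in the
        # background; a real system would also record a metric or alert here.
        return fallback
    except Exception:
        # The dependency failed outright.
        return fallback


def slow_lookup():
    # Stand-in for a dependency that has become slow.
    time.sleep(1.0)
    return "fresh value"


def broken_lookup():
    # Stand-in for a dependency that is down.
    raise ConnectionError("dependency unavailable")


print(call_with_timeout(slow_lookup, 0.2, "cached value"))
print(call_with_timeout(broken_lookup, 0.2, "cached value"))
print(call_with_timeout(lambda: "fresh value", 0.2, "cached value"))
```

The first two calls fall back to the cached value and the third returns the fresh one. Every dependency you add needs a wrapper like this, plus a decision about what a sensible fallback is, which is part of the maintenance cost.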