Four excellent articles on designing, developing, and maintaining complete software systems.

We suffer from an embarrassment of riches when it comes to programming idioms and tactics, from the level of a single line of code up to an individual object in isolation. But when we start talking about the interactions between objects in our systems, or the interactions between systems, things get a lot fuzzier.

To gain some clarity when thinking through high-level system design, development, and maintenance, read these four articles:

- A Laboratory for Teaching Object-Oriented Thinking
- Big Ball of Mud
- The Log: What every software engineer should know about real-time data’s unifying abstraction
- Coplien’s Segue

Each article covers a different theme in high-level software development, but the thread that unifies them is the idea of looking at software projects as being dynamic, interconnected systems-of-systems in which most of the interesting stuff happens at the integration layer.

I’ve shared a brief summary of each of these articles below, highlighting the parts I found most interesting. Hope you find them as useful as I did!

A Laboratory for Teaching Object-Oriented Thinking

This article introduces the concept of Class-responsibility-collaboration (CRC) cards, which are an extremely lightweight tool for reasoning about object-oriented software designs.

The basic idea is that you represent object roles in a system via simple index cards that include only the following information:

- The name of the class at the top of the card
- The responsibilities of the class on the left side of the card
- A list of collaborating objects on the right side of the card

To drive out a new design, designers start with a specific use case and then build just a handful of cards to cover the various responsibilities involved in solving the problem. As the information gets sketched out, the names of the objects and their stated responsibilities can be refined and reworked, and cards can be added and removed, until a coherent design emerges.
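To make the format concrete, here’s a rough sketch of what a couple of cards might look like if jotted down in code rather than on paper. This is just an illustration in Python; the class names and responsibilities are invented for a hypothetical checkout feature, not taken from the original article.

```python
from dataclasses import dataclass, field

@dataclass
class CRCCard:
    """A CRC card holds only three things: a class name,
    its responsibilities, and its collaborators."""
    name: str
    responsibilities: list[str] = field(default_factory=list)
    collaborators: list[str] = field(default_factory=list)

# Sketching a design for a hypothetical checkout use case.
# Cards are cheap to write, rework, and throw away entirely.
cart = CRCCard(
    name="ShoppingCart",
    responsibilities=["track line items", "compute the order total"],
    collaborators=["LineItem", "PricingRules"],
)

checkout = CRCCard(
    name="Checkout",
    responsibilities=["collect payment", "confirm the order"],
    collaborators=["ShoppingCart", "PaymentGateway"],
)
```

Of course, the whole point of the technique is that physical index cards are even cheaper than this, and invite no implementation details at all.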

I first encountered this technique in the example project from Growing Object-Oriented Software, Guided by Tests, but the technique itself has been around for a very long time. This article was published in 1989!

The emphasis on a very minimal structure that is completely independent of a programming language’s implementation details, combined with the focus on the interactions between objects rather than static analysis of objects in isolation, makes this design technique especially suitable for systems-level thinking.

Big Ball of Mud

Although we spend a lot of time talking about what beautiful software design looks like, most real projects don’t come close to meeting our ideals. This is an open secret among software developers, but one that is seldom given the attention it deserves.

In this article, Foote and Yoder analyze the common patterns we see in typical software development lifecycles and examine their costs and benefits. In doing so, the authors reveal both what causes a software system to devolve into a “big ball of mud” and some methods for keeping a system in a constant state of health.

The seven specific patterns covered in the article are listed below, along with the recommendations for how to deal with them.

Big ball of mud: You need to deliver quality software on time, and under budget. Therefore, focus first on features and functionality, then focus on architecture and performance.

Throwaway code: You need an immediate fix for a small problem, or a quick prototype. Therefore, produce, by any means available, simple, expedient, disposable code that adequately addresses just the problem at-hand.

Piecemeal growth: Master plans are often rigid, misguided and out of date. Users’ needs change with time. Therefore, incrementally address forces that encourage change and growth. Allow opportunities for growth to be exploited locally, as they occur. Refactor unrelentingly.

Keep it working: Maintenance needs have accumulated, but an overhaul is unwise, since you might break the system. Therefore, do what it takes to maintain the software and keep it going. Keep it working.

Shearing layers: Different artifacts change at different rates. Therefore, factor your system so that artifacts that change at similar rates are together.

Sweeping it under the rug: Overgrown, tangled, haphazard spaghetti code is hard to comprehend, repair, or extend, and tends to grow even worse if it is not somehow brought under control. Therefore, if you can’t easily make a mess go away, at least cordon it off. This restricts the disorder to a fixed area, keeps it out of sight, and can set the stage for additional refactoring.

Reconstruction: Your code has declined to the point where it is beyond repair, or even comprehension. Therefore, throw it away and start over.

Don’t just settle for this brief summary! Be sure to read the article to see how the authors shine a light on each of these patterns and expose the interconnections between them. That’s where the real insights are.

The Log: What every software engineer should know about real-time data’s unifying abstraction

On the surface, this article may sound as if it’s targeted at a relatively narrow audience that needs to do “web scale” data processing. But don’t let that fool you: it truly is relevant to “every software engineer”, even if only for the high-level ideas it contains.

The article starts by discussing the concept of a data log in the abstract. This is a very simple append-only storage mechanism containing a sequence of arbitrary records ordered by time. What distinguishes a data log from the logs we typically see for debugging purposes in applications is that a data log is meant to be consumed by computers rather than humans.
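As a toy illustration of the abstraction (and only that; a real log of the kind the author describes is a durable, distributed service), an append-only log can be sketched in a few lines of Python:

```python
import time

class Log:
    """A minimal append-only log: records can only be added at
    the end, and each record gets the next sequential offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record, stamped with the current time, and
        return the offset it was assigned."""
        self._records.append((time.time(), record))
        return len(self._records) - 1

    def read_from(self, offset):
        """Return every record at or after the given offset."""
        return self._records[offset:]

log = Log()
log.append({"event": "user_signed_up", "user_id": 42})
log.append({"event": "plan_upgraded", "user_id": 42})
```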

The author goes on to talk about how data logs are a key part of implementing database transactions, and discusses the many different ways that transactions can be logged, synchronized, and replicated. This high-level explanation was especially interesting to me as someone who doesn’t know much about database theory.

From that point forward, things get a lot more interesting. The author covers three main themes: data integration, real-time data processing, and distributed system design.

Of those three topics, I found the author’s ideas on data integration to be especially interesting, mainly because that’s the kind of problem I deal with most in my regular work. The basic idea is that if you can build a uniform means of publishing and subscribing to data streams across all the systems within an organization, you can get rid of a lot of crufty connection layers between those systems. This style of design can even vastly simplify the responsibilities of data warehousing systems, by splitting out the job of “getting all the data in one place” from the work of restructuring that information for complicated batch analysis.

The discussion of stream processing covers how the inherent characteristics of a log-based storage mechanism eliminate various data flow performance and reliability problems. The basic idea is that if each system consuming the log is only responsible for tracking a single number (which log entry it received last), it is easy to restart individual processes without losing information or causing system-wide service interruptions. And because the connection between worker processes and the uniform logging service is indirect, workers can catch up with the flow of new events on their own time without worrying about blocking or dropping data. The log system itself serves as a massive, universal buffer.
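Continuing the toy sketch from above (again, the names here are invented for illustration), a consumer of such a log needs nothing more than a saved offset:

```python
class Consumer:
    """A log consumer tracks a single number: the offset of the
    next record it should process. Restarting just means resuming
    from that saved offset, so no records are lost."""

    def __init__(self, log, name):
        self.log = log
        self.name = name
        self.offset = 0  # a real consumer would persist this

    def poll(self):
        """Process whatever has been appended since the last poll,
        at this consumer's own pace; the log acts as a buffer, so
        falling behind never blocks the producer or other readers."""
        for timestamp, record in self.log.read_from(self.offset):
            print(f"{self.name} handling {record}")
            self.offset += 1

# Independent consumers can read the same log at different rates,
# without coordinating with the producer or with each other.
search_indexer = Consumer(log, "search-indexer")
warehouse_loader = Consumer(log, "warehouse-loader")
search_indexer.poll()
warehouse_loader.poll()
```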

The author goes on to discuss, at a high level, the full range of architectural benefits that this style of design can have for distributed systems. He wraps things up by considering what it might take for this kind of architecture to gain widespread adoption, hinting at the possibility of unifying lots of open source tools and providing good abstractions over them to make this style of design more accessible to the average developer.

I’ll admit that plenty of the material in this article was over my head, but it’s written in such plain language that I gained a great deal from it nonetheless. It seems like if I come back to it in a year, I’ll be able to learn just as much from it as I did on my first read, just at a more refined level.

Coplien’s Segue

This essay is a sequel to James Coplien’s widely read article “Why Most Unit Testing is Waste”. Although you’ll probably enjoy it more if you read both articles, it’s capable of standing on its own.

Unlike the other articles in this reading list, this one is a bit more opinionated and scattershot in its scope. However, it offers many compelling thought experiments and anecdotes about the limitations of unit testing, the potential virtues of embracing runtime assertions at the individual function level (e.g. contract-style programming), and the power of system-level tests to provide useful insights to aid debugging and define intended program behavior.
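For those unfamiliar with the contract-style assertions he has in mind, here is a trivially small example of the flavor (mine, not Coplien’s): preconditions and postconditions checked at runtime, inside the function itself rather than in a separate test.

```python
def withdraw(balance, amount):
    """Deduct amount from balance, enforcing its own contract."""
    # Preconditions: callers must hand us sensible inputs.
    assert amount > 0, "withdrawal amount must be positive"
    assert amount <= balance, "cannot withdraw more than the balance"

    new_balance = balance - amount

    # Postcondition: the promised result actually holds.
    assert new_balance >= 0, "balance can never go negative"
    return new_balance
```

Unlike a unit test, these checks travel with the code and run against every real input the function ever sees.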

In addition to the testing-related discussions, this essay drives at some important questions about what it means to deliver reliable, high-quality software. It discusses the limitations of all forms of testing as a means of driving software quality, and hints at some important ideas around fault tolerance, self-healing software systems, and a number of other useful topics.

Unless you’re very familiar with Coplien’s perspective, you’ll probably find yourself disagreeing with at least some of his points. But he offers so much to think about and explore that I’m sure you’ll still find something of interest in this article.

These articles are fascinating because they’re relevant to all software developers, regardless of background or level of experience. If you make your way through this entire reading list, you’ll undoubtedly level up your systems thinking skills.

I’ve really enjoyed studying each of these articles, so don’t hesitate to email me if you would like someone to discuss them with.