The "monoliths versus microservices" debate often focuses on technological aspects, ignoring strategy and team dynamics. But instead of starting with technology, smart-thinking organizations are beginning with the team's cognitive load as the guiding principle for the effective delivery and operation of modern software systems.

Excessive cognitive load works against effective team ownership and supportability of software. Here's why, and how to approach the problem.

Overview: Monoliths and microservices

Many organizations are moving from traditional, monolithic software architectures to designs based on microservices and serverless, allowing them to take advantage of newer runtimes that help teams to take ownership of software services.

However, it can be difficult for software architects, team leads, and other technical leaders to assess the “right size” for these services. Should a microservice be limited to 100 lines of code? Should you start with a monolith and extract microservices, as Tammer Saleh recommends, or start with microservices from the beginning, as advised by Stefan Tilkov? How do you avoid what Simon Brown calls a "distributed microservices big ball of mud"?

During the research for our forthcoming book (Team Topologies: Organizing Business and Technology Teams for Fast Flow), and working with clients in different parts of the world, we realized that many organizations fail to consider an important dimension in the decisions around the size of software services: team cognitive load.

Most of the confusion around the sizing of services goes away when you reframe the problem in terms of the cognitive load that a single service-owning team can handle, as you'll see below.


How to define cognitive load

But first, here's what we mean by cognitive load and how this applies to teams. Psychologist John Sweller defined cognitive load as "the total amount of mental effort being used in the working memory," and went on to describe three different kinds of cognitive load:

Intrinsic cognitive load, which relates to aspects of the task fundamental to the problem space. Example: How is a class defined in Java?

Extraneous cognitive load, which relates to the environment in which the task is being done. Example: How do I deploy this component, again?

Germane cognitive load, which relates to aspects of the task that need special attention for learning or high performance. Example: How should this service interact with the ABC service?

Broadly speaking, you should attempt to minimize the intrinsic cognitive load (through training, good choice of technologies, hiring, pair programming, etc.) and eliminate extraneous cognitive load (boring or superfluous tasks or commands that add little value to retain in working memory). This will leave more space for germane cognitive load (where "value-added" thinking lies).

For a great overview of how cognitive load applies to software development, see the article "Managing Cognitive Load for Team Learning," by Jo Pearce.

Cognitive load applied to teams

When you apply the concept of cognitive load to a whole team, you need to limit the size of the software system on which the team is expected to work. That is, don't allow a software subsystem to grow beyond the cognitive load of the team responsible for it. This has strong and quite radical implications for the shape and architecture of software systems: Software architecture becomes much more "team-shaped" as you explicitly consider cognitive load as an indicator of supportability and operability.

The drive to minimize extraneous cognitive load also leads to the need to focus on developer experience and operator experience. By using explicitly defined platforms and components, your teams will be able to reduce their extraneous cognitive load.

Some organizations have even begun to use cognitive load as an explicit input into software architecture and system boundary decisions.

Why you should use team cognitive load to right-size microservices

In a world of "You build it, you run it," where the whole team is responsible for the successful operation of software services, it is imperative to remove unnecessary barriers to team ownership of software. Obscure commands or arcane configuration options increase the (extraneous) cognitive load on team members, effectively reducing their capacity to learn about or improve the business-oriented aspects of the work (germane cognitive load).

Another typical example is waiting for another team to act on tickets to provision infrastructure or update configurations. This interrupts the flow of the dependent team, again reducing the effective use of its cognitive capacity.

Reduced team cognitive capacity puts a strain on the team’s ability to fully own a software service. The team is spending so much time dealing with complicated configuration, error-prone procedures, and/or waiting for new environments or infrastructure changes that it cannot pay enough attention to important aspects of testability or runtime edge cases.

As software developer Julia Evans says, reducing cognitive load for your team means setting interface boundaries. Every techie at your organization doesn't need to be a Kubernetes expert.

Put another way, by ensuring that the cognitive load on a team is not too high, you have a better chance of enhancing the supportability and operability of the software on which the team is working. The team can better own its services because it understands them better.

Three ways to reduce team cognitive load and improve flow

There is no magic formula for reducing cognitive load for teams, but having worked with many large organizations around the world (including in China, Europe, and the US), we recommend three helpful approaches: well-defined team interaction patterns, independent stream-aligned teams, and a thinnest viable platform.

1. Create well-defined team interaction patterns

Too often in organizations, the relationships between teams are not well defined or understood. As Russell Ackoff said, problems that arise in organizations "are almost always the product of interactions of parts, never the action of a single part."

You've likely heard complaints such as "Why should we have to collaborate with that other team?" or "Why doesn’t that team provide us what we need?" These are signs that the team interactions within the organization are ambiguous. In our Team Topologies book we identify three core team interaction modes to help clarify and define how teams should interact:

Collaboration: Working together with another team for a defined period of time to discover new ways of working, new tools, or new solutions.

X-as-a-service: Consuming or providing something "as a service," with a clear API and clear expectations around service levels.

Facilitating: Helping (or being helped by) a team to gain new skills or new domain awareness, or to adopt a new technology.

With these well-defined team interaction patterns in place, you can begin to listen for signals at the organization level about which team interactions are working well and which are not, including problems with cognitive load.

For example, if a collaboration interaction goes on for too long, perhaps it's a signal that some aspect of the technology would be better provided as a service by a platform.

Similarly, if one team expects to consume a monitoring tool "as a service" but constantly needs to work with the providing team to diagnose problems, this could be a signal that there is too much cognitive load on the consuming team and you need to simplify the API.
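The two signals described above can be sketched as a simple check over recorded team interactions. This is only an illustrative sketch: the `Interaction` record, its field names, and both thresholds are assumptions made for the example, not values from our research, and any real thresholds would need to be agreed within your own organization.

```python
from dataclasses import dataclass
from typing import List

# Assumed thresholds, purely illustrative; tune them to your own context.
MAX_COLLABORATION_WEEKS = 12
MAX_SUPPORT_CONTACTS_PER_MONTH = 4


@dataclass
class Interaction:
    """Hypothetical record of how one team interacts with another."""
    consumer: str
    provider: str
    mode: str  # "collaboration", "x-as-a-service", or "facilitating"
    weeks_active: int = 0
    support_contacts_per_month: int = 0


def find_signals(interactions: List[Interaction]) -> List[str]:
    """Return a human-readable signal for each interaction that looks strained."""
    signals = []
    for i in interactions:
        pair = f"{i.consumer} -> {i.provider}"
        if i.mode == "collaboration" and i.weeks_active > MAX_COLLABORATION_WEEKS:
            # Collaboration that never ends may be better served by a platform.
            signals.append(f"{pair}: long-running collaboration; consider providing this as a service")
        elif (i.mode == "x-as-a-service"
              and i.support_contacts_per_month > MAX_SUPPORT_CONTACTS_PER_MONTH):
            # A service that needs constant hand-holding has too complex an API.
            signals.append(f"{pair}: frequent help needed; consider simplifying the API")
    return signals


if __name__ == "__main__":
    recorded = [
        Interaction("payments", "data-platform", "collaboration", weeks_active=30),
        Interaction("search", "monitoring", "x-as-a-service", support_contacts_per_month=9),
        Interaction("checkout", "monitoring", "x-as-a-service", support_contacts_per_month=1),
    ]
    for signal in find_signals(recorded):
        print(signal)
```

The value here is less in the code than in the habit it represents: naming the interaction mode explicitly makes it possible to notice when reality has drifted away from it.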

2. Use independent, stream-aligned teams

It is increasingly common in large and small organizations to see small, cross-functional teams (with a mix of skills) owning an entire "slice" of the problem domain, from idea to live services. Such teams are often called product or feature teams.

But with the coming of age of IoT and ubiquitous connected services, we prefer to call them "stream-aligned" teams, because "product" loses its meaning when you're talking about many-to-many interactions among physical devices, online services, and others. ("Product" is often a physical thing in these cases.)

Stream-aligned teams are aligned to the stream of change required by a segment of the organization, whether that's a line of business, a market segment, a specific geography, or a government service.

It is hugely important to ensure that stream-aligned teams can analyze, test, build, release, and monitor changes independently of other teams for the vast majority of their work. Dependencies introduce a substantial amount of cognitive load (e.g., waiting for other microservices or environments to be able to test, or not having microservices-focused monitoring).

Ensuring that stream-aligned teams are substantially independent in their day-to-day flow of work removes unhelpful extraneous cognitive load, allowing teams to focus on the intrinsic and germane (domain-relevant) aspects of the work. Part of this independence comes from being able to use an effective platform.

In larger organizations it's useful to align two or three teams in a close partnership when delivering large, complicated systems. That close relationship helps to avoid one team waiting on another.

Obviously, teams do depend on other services and associated teams for providing infrastructure, runtime APIs, tooling, and so on. But these dependencies don't block the flow of work of a stream-aligned team. Self-service test environments, deployment pipelines, and service monitoring are all examples of non-blocking dependencies. Stream-aligned teams can consume these independently as needed.

3. Build the thinnest viable platform

Stream-aligned teams should expect to consume services from a well-defined platform, but avoid the massive, unfriendly platforms of yesteryear. Instead, build the thinnest viable platform (TVP): the smallest set of APIs, documentation, and tools needed to accelerate the teams developing modern software services and systems.

Such a TVP could be as small as a single wiki page that defines which public cloud provider services other teams should use, and how. Larger organizations might decide to build additional services atop an underlying cloud or IoT platform, but those extra services should always be "just thick enough" to accelerate the flow of change in stream-aligned teams, and no thicker.

Avoid the frequent mistakes of the past, when internal platforms were bloated, slow, and buggy; had terrible user experience; and, to make matters worse, were mandatory to use.

A good platform acts as a force multiplier for stream-aligned teams, helping them to focus on core domain functionality through attention to the developer experience, ease of use, simplicity of tooling, and richness of documentation. In short, build and run the platform as a product or service itself, with stream-aligned teams as internal customers, using standard agile and DevOps practices within the platform itself.

The engineers at cloud communications company Twilio have taken this approach internally for their delivery squads. In a presentation at QCon in 2018, senior director of engineering Justin Kitagawa described how Twilio's internal platform has evolved to reduce the engineers' cognitive load by providing a unified self-service, declarative platform to build, deliver, and run thousands of global microservices.

Furthermore, the platform's developer experience is regularly assessed via feedback from internal customers using a Net Promoter Score.

The internal platform at Twilio explicitly follows these key principles:

API-first: Empower dev teams to innovate on platform features via automation.

Self-service over gatekeepers: Help dev teams determine their own workflow.

Declarative over imperative: Prefer "what" over "how."

Build with empathy: Understand the needs and frustrations of people using the platform.

This approach has enabled Twilio to scale to a customer base of over 40,000 organizations worldwide.
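To make the "declarative over imperative" principle concrete, here is a minimal sketch. It is not Twilio's platform or API: `ServiceSpec` and `reconcile` are hypothetical names invented for illustration. The dev team declares only what it wants (a service name and a replica count), and the platform works out how to get there.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical desired state: the team states WHAT it wants."""
    name: str
    replicas: int


def reconcile(spec: ServiceSpec, running: int) -> List[str]:
    """Platform-side logic: derive HOW to move from the current state to the spec."""
    if running < spec.replicas:
        return [f"start {spec.name}"] * (spec.replicas - running)
    if running > spec.replicas:
        return [f"stop {spec.name}"] * (running - spec.replicas)
    return []  # Already in the declared state; nothing to do.


if __name__ == "__main__":
    spec = ServiceSpec(name="checkout", replicas=3)
    # One instance is currently running, so two more must be started.
    print(reconcile(spec, running=1))
```

With an imperative interface, the dev team would have to know the current state and issue each start or stop command itself. Here that knowledge lives in the platform, which is one way extraneous cognitive load moves off the stream-aligned team.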

By reducing cognitive load, a good platform helps dev teams focus on the differentiating aspects of a problem, increasing personal and team-level flow and allowing the whole team to be more effective.

Lighten the load

Team cognitive load is an important dimension when considering the size and shape of your software system boundaries. By ensuring that team cognitive load isn't too high, you can increase the chances that team members will be able to build and operate services effectively because they will properly understand the systems they are building.

We recommend the use of three core team interaction modes to clarify the interactions between teams and ultimately help to reduce cognitive load. When used with independent stream-aligned teams and a thinnest viable platform, these team interaction modes will help your organization detect when cognitive load is too high in different parts of your systems.

Want to know more about cognitive load? Attend our talk, "Monoliths vs. Microservices is Missing the Point: Start with Team Cognitive Load," at DevOps Enterprise Summit: London, which runs June 25-27.
