II. The psychology of overengineering

I see the main culprit behind “technical-decision”-induced damages as overengineering, i.e. letting technology play a bigger or more critical part in the solution than the research merits.

We buy into the illusion that selecting the correct technology or architecture up front will magically solve a lot of problems down the line, so we do not need to consider those problems up front. We overlook the risks introduced by the technology or architecture and end up ignoring fundamental problems or risks because we see them as already mitigated by this early technical choice.

Of the 37 prominent damage causes, 2 are related to technology. That means that roughly 95% (35 of 37) of the damage causes are NOT directly inflicted by technology choices.

(Do note that the percentage does not represent the actual impact of technology choices, only how many of the causes, in absolute terms, are technology choices.)

And when technological choices do have a big influence on the success of a project, that influence is most often negative.

So why do we again and again fail to “Check technology, get second opinion, ask technology users” and “Ask experts, do early PoC”?

Or why do we not do these things thoroughly and critically enough?

Why do we overengineer and introduce risk and damage as a result?

In my view, there are three primary components of “the overengineering triad” that are the main culprits in introducing technically induced risk and damage:

learning fallacies,

a tendency to introduce value/solution proxies or substitutes, and

an inherent overengineering bias produced by equal parts cognitive biases and game-theoretic phenomena.

Learning fallacies

One of the best examples of how we often learn the wrong lessons, and act on them accordingly, is the story of the Allied bomber planes from World War II.

During World War II, the statistician Abraham Wald took survivorship bias into his calculations when considering how to minimize bomber losses to enemy fire. Researchers from the Center for Naval Analyses had conducted a study of the damage done to aircraft that had returned from missions, and had recommended that armor be added to the areas that showed the most damage.

Wald noted that the study only considered the aircraft that had survived their missions — the bombers that had been shot down were not present for the damage assessment.

The holes in the returning aircraft, then, represented areas where a bomber could take damage and still return home safely. Wald proposed that the Navy reinforce areas where the returning aircraft were unscathed, since those were the areas that, if hit, would cause the plane to be lost. As another example, when the Brodie helmet was introduced during WWI, there was a dramatic rise in field hospital admissions of severe head injury victims. This led army command to consider redrawing the design, until a statistician remarked that soldiers who might previously have been killed by certain shrapnel hits to the head (and therefore never showed up in a field hospital), were now surviving the same hits, and thus made it to a field hospital. — https://en.wikipedia.org/wiki/Survivorship_bias

In the software development profession, we seem to have some deeply rooted learning fallacies.

These are partly induced by cognitive biases and partly by the industry hype machine.

Lottery fallacy

The first learning fallacy I see is the lottery fallacy. We see the successful companies as having done something unique that made them win in the marketplace.

It is basically saying: “It is very unlikely for a person to win the lottery, so the person must have done something when picking the numbers, to increase his chance of winning”.

And then we try to investigate their number-picking process. As developers, architects, etc., we look at their technological choices and setup. And remember: even if this were a viable strategy (figuring out how to pick winning lottery numbers; hint: it is not), we would be investigating something that represents only a small percentage of the damage causes, i.e. the technology-choice-induced ones.

We should realize that it is highly likely that someone (or a very few) WILL win in the marketplace. And that can easily be DESPITE their technology stack, technology choices or architecture. To be clear: the many factors contributing to “winning in the marketplace” mean that you can make incredibly bad technological choices and still win.
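To make the base-rate point concrete, here is a minimal sketch (the odds and player count are invented, purely illustrative) of why “someone wins” is near-certain even though “this specific player wins” is wildly improbable:

```python
# Illustrative only: the odds and the number of players are made up.
p_single = 1 / 14_000_000   # chance that one specific ticket wins
n_players = 20_000_000      # number of tickets sold

# Probability that at least one ticket wins = 1 - P(nobody wins)
p_someone = 1 - (1 - p_single) ** n_players

print(f"One specific player wins: {p_single:.0e}")   # ~7e-08
print(f"Someone wins:             {p_someone:.2f}")  # ~0.76
```

Studying the winner’s number-picking process tells us nothing, because with enough players a winner is a statistical near-certainty, not evidence of skill.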

This is not to say that technology choices have no impact. It is just to highlight that the impact is not all-defining, and probably often a lot smaller than it is given credit for.

Survivorship bias

When we talk of the examples of survivorship bias from World War I and II, it is quite easy for us to see the logical fallacies. I.e. soldiers being injured while wearing helmets means that a lot more of them arrive at the hospital, because previously they simply died on the battlefield (and thus never got to the hospital).

How are we, in the software development profession, victims of this bias?

I think we are hit in two ways. One is that we look at the extremely successful companies, e.g. the FANGS (Facebook, Amazon, Netflix, Google, Spotify), and not at the thousands of mildly successful (or not-unsuccessful) software development organizations out there. For most of us, it would probably be better to learn from those organizations instead of the FANGS, as they represent our most likely non-failing trajectory.

A second logical mistake we make is to emulate how the FANGS handled their growth pains, not how their success was created.

This is a very important point, because you actually want to suffer from growth pains. There is a saying along the lines of “I envy you your problems”, and it is very applicable here.

We want to suffer from growth pains, and we want to be able to handle them. But it is not the handling of growth pains that makes you grow. There is quite a lot of cargo culting involved here. We emulate how to unload the arriving planes, but we do not understand what makes them come.

Another way to say it is that we often spend time reinforcing the areas of the plane with bullet holes, not the places that are actually critical for success.

We do this in a number of ways. We copy the technology stacks and architectures that enable extreme scalability. We spend time planning for, and mitigating, the growth we expect to come, getting ready for it. And that is turning the problem on its head.

The time we spend overengineering, learning the bleeding-edge technology, gold-plating the architecture and doing all the best practices could have been spent delivering fast and learning.

It is a case of opportunity cost. The time we spend planning and building for “the great pay day” is time we could have used trying to make it happen instead.

We don’t even emulate what we would have liked to happen. We emulate handling growth pains, not inducing them.

Hype and effect

The hype curve is a well-known phenomenon in tech.

We are usually subjected to an artillery barrage about the wonders of a technology, architecture or paradigm (TAP) at the Peak of Inflated Expectations, well before we understand what it is not useful for, and way before we understand what it is actually useful for.

E.g. who would have guessed recommendation algorithms were great at creating political extremists and propagating anti-scientific beliefs?

So when we are primed to introduce a new *TAP*, we are usually very unqualified to do so. We are basically pushed to introduce the new *TAP* when we are at the peak of unknown unknowns, i.e. at the peak of the riskiness of introducing it.

We will see a very similar graph a bit later.

So, one aspect is that we get the new *TAP* introduced when doing so is very risky, but by the time we reach the Plateau of Productivity, the new *TAP* is suddenly not new anymore. It’s just-the-TAP, and thus probably overshadowed by the next *ShinyTAP*.

So, to summarize: we are nudged most strongly to introduce technologies, architectures and paradigms in our projects when they are at their riskiest because of unknown unknowns. At the same time, we are nudged away from stable, “boring” tech and patterns, because we fear getting left behind or missing out on the “next big thing”. This is an industry-wide lemming effect, where people are brainwashed to feel that they are becoming obsolete if they do not jump on the new bandwagon.

Fortunately, it is possible to find counterpoints to this behavior. Many seem to be recovering overengineers (see http://boringtechnology.club/ or https://thenewstack.io/reddit-cto-sxsw-stick-boring-tech-building-start/).

Value substitutes

Most developers (and humans in general) want to do something meaningful, and a way of doing that is to create some sort of value.

Value is a very abstract term, but a basic definition is that the time we spent doing something was better spent on that than on doing something else, or nothing at all.

It is, however, often very difficult to identify exactly what value we are providing. This is the case when developers, teams, etc. are too far removed from the end user or customer.

If this happens, the need these people have for creating value does not disappear. It risks getting fulfilled by a value substitute: some concrete thing they can put in place as a proxy for the value they are actually providing.

Most often these value substitutes have nothing to do with actual end-user or customer value. We intuitively know this, but we counter it with claims like “it will pay off in the long term” or similar. But if you are already too far from the recipient of the actual value you create, how can you evaluate that it “will pay off in the long term”? And furthermore, that it is desirable to prioritize long-term benefits over short-term gains? Even more to the point, that the value of the long-term gain is greater than the short-term value creation (plus the interest and dividends that value would have created)?
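To put that last question into numbers, here is a minimal sketch; the figures, rate and horizon are invented purely for illustration:

```python
# Illustrative only: all numbers are made up for the example.
v_now = 100_000      # value we could deliver right now, e.g. an early feature
r = 0.10             # yearly "reinvestment" rate: revenue, learning, compounding
t = 3                # years until the long-term bet is supposed to pay off

# The deferred payoff must beat the short-term value plus what it compounds into.
break_even = v_now * (1 + r) ** t
print(f"The long-term payoff must exceed {break_even:,.0f} to be worth the wait")
# -> 133,100: the bar is higher than a naive side-by-side comparison suggests
```

The point is not the exact numbers, but that deferring value raises the bar: the long-term gain has to beat the short-term gain plus everything it would have compounded into.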

Silver bullets

Very early in software projects, we decide which technological silver bullets are going to help us bring value to the customer. Often we exaggerate the importance of bringing in a new TAP and follow the current “best practices” (or the loudest hype) for the system challenges we are facing.

At this point in the project, however, we are often not very knowledgeable about the actual end use of the system, and by introducing new “things to master” (not just to get acquainted with or learn) at the beginning of the project, the primary side effect is a greater risk of unknown unknowns in a phase where we already have plenty of uncertainty and a ton of unknown unknowns.

But hey, at least the customer will be happy they got the newest NoSQL storage technology, event-sourcing architecture or nano-services? Right? Riiiight…?

All-of-the-above

We are constantly being bombarded with ways of “futureproofing” our applications and systems. We are subjected to the wonders of architectural principles, frameworks and technologies, and even before we know what problem the customer or user needs solved, we begin introducing a lot of overhead targeted at growth pains that may never come.

The mindset is something like:

“I may not actually know what the customer wants, but if I build it with these two architectural principles, these frameworks and this technology, then I have set myself up with the ability to pivot the system towards the customer’s needs once I finally do understand them. In any case, they get an awesome system.” — Some overengineering developer (and at times me)

It may not solve the problem they need solved in a simple and cheap way, but it will be quite a sight…

Process perfections

If we are too far removed from the value we bring, we can also hold up “following the process” as proof that we deliver value. The mindset is something like: “We follow development process X to the letter, hence we must be providing value to the customer”.

This is not value. Actually, it can be quite the opposite: we cannot take in changing requirements or circumstances from the real world, because they conflict with our process or our plans. Better, then, to just continue following the process and the plan, and disregard reality.

There are other ways in which we substitute actual value with some proxy for it. We also sometimes proxy a solution by introducing something overly complicated and generic to compensate for not understanding what we are actually trying to solve.

Try to ensure the developers experience the actual value of what they spend their time on. It may not result in avoiding value substitutes entirely, but it will increase the likelihood that they are spotted and dealt with.

Overengineering bias

In software development, we have an inherent combination of biases that induce overengineering. There are also game-theoretic phenomena, and phenomena akin to evolutionarily stable strategies, present because of these biases and the circumstances of team population compositions.

What he lacks in knowledge, he makes up for in confidence

Dunning-Kruger effect

First of all, we, as an industry, are very susceptible to the Dunning-Kruger effect.

“In the field of psychology, the Dunning–Kruger effect is a cognitive bias in which people assess their cognitive ability as greater than it is. It is related to the cognitive bias of illusory superiority and comes from the inability of people to recognize their lack of ability.” — https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

The peak at the beginning of the graph is called Mount Stupid. When people are standing on Mount Stupid, they have a tendency to yell and evangelize a lot.

The Dunning-Kruger effect hits us at multiple levels. It hits us on an industry level, with the hype curve. The similarity between the hype curve and the Dunning-Kruger effect is no coincidence; it is the same underlying bias. As an industry, we have a tendency to be very vocal about TAPs while we are on top of Mount Stupid.

Organizations adopt these TAPs with very high confidence and very little knowledge. Typically, people not on board are deemed old-school or shamed in other ways. Here the scarcity bias also plays in a bit, i.e. the fear of missing out on future markets and opportunities if we pass on this silver TAP.

On the team level, the dynamic is a bit different.

Impostor syndrome

On the team level, there are multiple reasons for introducing new TAPs. It can be curiosity, wanting to improve professionally, polishing the CV or just trying something new and shiny.

Regardless of the motivation, it is typically a single person or a small group pushing the new silver TAP.

Software development is the profession with the highest share of people suffering from impostor syndrome: above 60%.

Impostor syndrome (also known as impostor phenomenon, impostorism, fraud syndrome or the impostor experience) is a psychological pattern in which one doubts one’s accomplishments and has a persistent internalized fear of being exposed as a “fraud”. — https://en.wikipedia.org/wiki/Impostor_syndrome

Impostor syndrome means that the majority of team members will assume that the reason the TAP seems too complicated, too risky or downright stupid to them is simply that they are not as competent as its proponents.

This is exacerbated by the fact that it is very difficult to infer, as an outsider, whether people are evangelizing from Mount Stupid or from a Plateau of actual Competence.

Technological Stockholm syndrome

The Dunning-Kruger effect and impostor syndrome work together to introduce shiny new and risky TAPs. The question is then: why do we not discard the TAP when it turns out that we made the wrong choice?

There are multiple explanations for this. It is a combination of the boiling-frog effect, the sunk-cost fallacy and some version of technological Stockholm syndrome.

The TAP does not usually explode in one major incident. Typically, there are minor incidents of increasing scope, seriousness and damage.

We typically see tensions beginning to rise when we break new ground, either feature- or adoption-wise. As the tensions rise, the chance of an actual full-blown incident starts to increase. The incident can be an actual production incident, or it can be “the f***ing thing does not handle X/support Y/enable Z!”.

In any case, something happens that creates a lot of stress, additional work and general “unwellbeing”.

We fight through the challenges, the pains and the frustration.

At some point we end up on the other side of the incident and begin to rationalize it and make excuses for why the TAP treated us, behaved or reacted the way it did. And very often we end up blaming ourselves for the shortcomings or damage induced by the TAP: “We should have known that it would act up like…”.

We get to the other side of this bad experience, dig in deeper, add more sunk cost, and on the next iteration of this vicious cycle we are even more determined to blame ourselves for the damage induced by the TAP. And even more determined to stand our ground defending it.