I was asked on Twitter to switch to long-form writing about testing, QA and risk. Here are the early raw results of my return to blogging about… all of those things. Please note that part 7, the [further reading page](http://infiniteundo.com/post/158412712773/further-reading-s-as-n-7n "The Further Reading page is constantly being updated with new resources!"), is not included here, as that is a living document.


Part One

> Any software system begins as a shared narrative about a problem and the people who come together around solving that problem.

If you don’t accept the above proposition completely, then nothing I have to say about software is going to work for you.

Chthulucene Devops: staying with the trouble as a service

This is, and always has been, the core proposition of my “way of Devops,” which I am now finally able to articulate and which I distinguish from other devops as Chthulucene Devops. The name acknowledges that it is not “mine” in any sense, except that, as far as I know, I am its sole practicing engineer. Designers, executives and other leaders may practice it, but developers mostly have a hard time with the tolerance for chaos required for what Donna Haraway has so insightfully labeled “staying with the trouble.”

Software is narrative

> The problem with intelligent communication is the illusion that it has taken place.
>
> – GB Shaw

Suspend your disbelief and just run with this for one minute, if you will: commercial software is a narrative about a problem and the community of people who come together around said problem. Note that I haven’t said anything about money or value streams yet. That’s the beauty of this approach: you start at the highest possible view of the project, the god’s-eye view. This is what the narrative approach can deliver to you, the confused but eager software hacker-er.

You see, a big problem in software – the main problem – is that you wake up one morning and find that you’ve spent 3 months building the wrong thing. It seemed like the right thing 3 months ago; then communications got dropped, mistakes were made, and now it’s wrong. This can happen so easily with software.

The problem of what it was even supposed to do in the first place

In order to not build the wrong thing, we must know with clarity what we are meant to be building. It sounds like a tautology, and if we were talking about any medium but the digital one, it would be. But as all software engineers quickly learn, there is a Lovecraftian, non-Newtonian gulf between “what we build” and “what we were meant to be building.”

“Know what is meant to be happening, not just what is happening” is anything but a tautology in software. It is a yawning conceptual gulf that can swallow projects whole.

How do you know what is meant to be happening?

As people, we have something called an inner narrative that we compare to external happenings in the world, and that’s how we do sense-making. The thing is that the “external happenings” of the world aren’t external at all. Events in the world around us impinge on and irrevocably merge with our “inner” narratives.

The world is made of stories

Stories are how we do sense-making. Stories are literally the tool that allowed us to come down from the trees. We couldn’t master fire until we could fashion a story about how to master fire.

But with fire there was a physical thing to point to: the thing that is on fire. Get that thing. Such were our stories. For almost our entire time on earth as a species, stories were basically: there is thing, do something with thing.

But now we can’t even use that narrative any more, because in the digital domain there is no “raw material” that we start with to create products. This causes a lot of misspent Web budgets: because it is counterintuitive, people tend to budget in spite of it rather than in alignment with the reality that Web products all begin as stories, and some stories are less fictional than others.

There is no thing

> A monk asked Joshu, a Chinese Zen master: ‘Has a dog Buddha-nature or not?’
>
> Joshu answered: ‘Mu.’

In software there is no “thing” that you can point at. In order to point at a “thing” in software you have to construct the thing, starting with the environment in which the thing is going to exist.

You always have to design both “the product” your customers want and “the environment” in which your product will run in production. Thus any software product begins with two obvious categories of “work to be done.” People ignore this because it seems counterintuitive.

Now you have two problems

> A programmer has a problem and says, “I know, I’ll use Perl.”
>
> Now they have 2 problems.

This is a class of Boundary Problem – you always have to design the environment your software product “lives” in, no matter how hard you try to isolate your project from the vagaries of its environment.

This observation generalizes and goes back at least to Wittgenstein, who said of Boundary Conditions in general:

> Can’t we imagine a rule determining the application of a rule, and a doubt which [it] removes — and so on?

Further reading

For further reading on this and related Chthulucene Devops topics, please visit the [further reading page](http://infiniteundo.com/post/158412712773/further-reading-s-as-n-7n).

Part Two

Now you have n+1 problems

The old joke goes like this:

> A programmer has a problem and says, “I know, I’ll use Perl.”
>
> Now they have 2 problems.

It’s “ha ha only serious” humor, or as I prefer to call it, you-have-to-laugh-because-you-can’t-cry humor. I may have learned this phrase from my dad, who was a journalist.

Solving a problem with a Web service (or a device or an appliance or a mobile app that depends on Web services; to me it is the same thing!) means not just finding the solution but keeping the solution running in production forever after. That you have to be responsible for the product in the long term is something a lot of people overlook. In the early days of software, “keeping it running” was minimized under the label “maintenance” or “system operations,” making it sound like a negligible background activity.

Don’t bet against the CAP theorem

It turns out that keeping Web services running is really hard. That’s why so many historical Web sites, even though they were super cool, are no longer around: it’s really expensive to pay people to run sites. And it turns out people don’t like running sites no one uses. This was new information as of about 2008 or so; before that, everyone assumed the opposite was true.

The crowd that realized they were wrong and copped to it coined the word “devops” to describe their insight. Devops just means that, in response to any problem, you establish common ground within the company and work together toward a commonly known, compromise goal of solving the problem. It’s drawn from the Theory of Constraints and other stuff that, if you are still reading this series of posts, I probably don’t have to explain to you!

You can’t sacrifice Consistency

It turns out people really care about their data. You can’t build a distributed system that is immune to network partitions. You can’t build a site that’s up all the time. But you can damn well ensure that if the user saved data with you and you told them you got it, you definitely, really still have it.
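That promise has a concrete shape in code: do not tell the user “saved” until the data is actually on stable storage. Here is a minimal single-machine sketch in Python, assuming a local file is the data store; the function name `save_then_ack` is hypothetical. A real distributed system would also need replication and a quorum before acknowledging, which this sketch does not show.

```python
import os
import tempfile


def save_then_ack(path: str, data: bytes) -> bool:
    """Persist data to path and return True only once it is on stable storage.

    Writes to a temporary file in the same directory, flushes and fsyncs it,
    then atomically renames it over the target so that, on typical local
    filesystems, a crash leaves either the old file or the new one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's buffer into the OS
            os.fsync(f.fileno())  # ask the OS to flush its cache to the disk
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # The write did not complete: clean up and never acknowledge it.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # On POSIX systems, fsync the directory so the rename itself is durable.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    # Only now is it honest to tell the user "we got it".
    return True
```

The design choice is ordering: the acknowledgement is the last line, so any failure before it surfaces as an exception rather than as a false “saved.”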

The value of data integrity to users is often overlooked. Doubters would do well to remember Ma.gnolia, the social bookmarking service that lost its users’ data in 2009.

So given that these “ops” exist…

Ops came to prominence in the mid-to-late 2000s as Web 2.0 apps like Gmail and Google Maps exploded in popularity. Not to mention a stupid service called twttr that was started around then too; that one caused all kinds of trouble.

The thing is that you can scale hardware and you can scale software, but people can’t be cloned, nor can new people be trained fast enough to do hard, specialized nerd labor like Ops.

The moral of the story so far

So as a person who is interested in making a thing that will work on the internet, you need to learn about Ops. Now. And you need to learn as much as you can, because real ops people cost a lot of money, and for the most part you can bet you will never even work with one. They are that rare because the Internet of the 2000s got HUGE that fast. It really did. I was there. Ask me.

So dev and ops must find common ground. There are not enough ops any more to run the web, and there never will be again, because the web got so big so fast and keeps expanding at a rate like, fuck, I have no idea, nothing I have ever seen. Like a rainbow kaboom, this web we built kind of mostly by accident, if history be known.

Oh.

There’s that troublesome idea again. History.

Narrative. Let’s get back to talking about how to leverage the insight that software is, at its heart, a narrative.

Part Three

For the sake of argument

For the sake of argument, let’s say that I am right and that all software is, at its heart, a narrative.

Wait I take that back. I can prove I’m right. I don’t need you to suspend your disbelief for this to make sense. Although in the future if you keep reading I will teach you how selectively suspending your disbelief (and that of others) can be advantageous.

For the sake of argument JUST LISTEN TO ME HERE

If software were in fact composed of stories that are shared between people, then we could easily decompose software into stories, and indeed the software industry fairly blossoms with options for doing so. Story-driven development and ideation run amok in Silicon Alley, to mostly good effect!

If software were narrative, then it would be possible to get just the idea of a piece of software funded and valued as if it were real, existing software (whatever that means; do you start to grok me yet?).

We all know things get funded that don’t get built

Things get funded because someone likes the story. Things don’t get built for all kinds of reasons. That a software thing was not built at all does not in any way imply that no work went into trying to build it. A common tragedy in the programming profession is that we build a word processor in place of what was actually needed. (I am not entirely making that up.)

That software can be almost built for years but never actually work supports my point. Suppose effort went into the software and the software never worked (to take an extreme case, for the sake of argument).

If that were the case, what do we have? The non-working stuff of the project? Well, a developer will literally tell you, “We have what’s in Git history.” Literally, we have a story.

Git stores histories, which are stories, which is what narrative means. Game, set, match.

Part Four

Software is narrative: so what? This is old news.

Great question!

Narratives have certain essential characteristics. All software methodologies to date have failed to reduce software to any definite set of essential characteristics.

To view software as narrative is an application of the Theory of Constraints: by applying the constraint that software must at some level be viewed, or be viewable, as narrative, we reduce all software to simple essential characteristics.

Further, we take advantage of the multitudinous perspectives from 3 million years of proto-human and human evolution, which gave us minds that can carry complex things like stories, even stories about how processors are supposed to interact with strange fictional beings we tell each other are called “the Integers.”

If you accept that the digital realm is narrative first

If you accept that the digital realm is narrative first then you realize quickly that you can still capitalize on the True Promise of the Web.

This is quite an exciting insight, since it means there are numerous ways of getting mad paid on the Web that have been overlooked because no one was looking to monetize narratives as such. But narratives certainly exist as such. And I have now rationalized their value.

In fact I have stated, and will continue to state, that all my success, my career, my talks, Etsy, the Barnes and Noble iOS automated builds, all my stuff, worked (and some of it continues to work in production to this day, amazingly enough!) because I always subscribed to this vision of software as narrative.

This insight is the thing that makes my solutions stand up where others’ don’t.

It has taken me many years and much talking in private to more people than I can thank (although obviously MY WIFE deserves a big round of applause) to arrive at a place in my life where I can explain how I did what I did at each of those famously successful job sites.

Because I always felt I was carrying out a repeatable process, and I was right. It just took time and self-reflection to arrive at a place where I could start to attach allegories and case studies to these, the central insights of my career.

Part Five

How do you use software to control the narrative?

You should be so lucky as to have a month of vacation saved up and a ticket you can buy to some isolated terrain where you can meditate upon this:

> Given that software products are composed of competing/interacting narratives, how do I use software to control the narrative?

Because code in a very fundamental way determines the laws of what is and is not possible within a network of people thinking about the same problem.

The insight that code is narrative enables new ways of thinking. If you can go away and meditate on how to monetize and/or benefit your fellow beings with this insight, you should do that. I sort of already have, I guess. It was cool. You’ll like it. Great excuse to take some “me time” and come back with a new startup idea.

Part Six

The Software Development Life Cycle reconsidered as a story cycle

A trilogy. Part the first:

First of all young'n, have ye never yeven heared of ye olde SDLC?

Why then, read ye of the S, the D, the L and the C, fore I whack ye with my specially reserved get-off-my-lawn-Hammer-of-doom+4!

For I have been a hax0r yes! A hax0r of yore, in the days when bases were belong to us and yet before. When the bases belonged to no one yet.

I am long in tooth and hot of air and I speak for the cats of the internet. And the lolcats say: go forth and do not break the Web. And the cats say: we serve the will of Sir Tim Berners-Lee: break not the Web, for it is already just a little bit broken. Let none seek to break it further, and that’s QA or something. Thanks, goodbye.

Part Eight

Fix All Errors And Warnings: A Narrative Perspective On The ROI Of De-Noising Logs

Before its use in computing, the word “log” referred to a journal kept by a human, such as a ship’s log: a record of events in the real world. Logs recount histories, and by doing so they participate in the multiple narratives of success and failure in intractably complex sociotechnical organizations.

Logs Contain Historical Evidence

On the Web, when something fails, the server and application logs are the source of truth about the chain of events. Logs are the primary evidence we use to reconstruct the chain of failure and then present a new narrative in which the system works again.

Logs Contain A Lot Of Signal

The actionability of error messages is of direct business value. The faster a Web product can recover from an incident (that is, the lower its mean time to recovery, or MTTR), the less impact that incident is likely to cause. In the best case, incidents are detected at the precursor stage and no production impact whatsoever takes place. Such is the power of high-signal logs.
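To make MTTR concrete: it is the total time spent recovering divided by the number of incidents. Here is a minimal Python sketch, assuming each incident is recorded as a hypothetical `(detected, recovered)` pair of timestamps. Some teams measure from when the failure began rather than when it was detected, which changes the numbers but not the arithmetic.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (recovered - detected) across all incidents."""
    if not incidents:
        raise ValueError("MTTR is undefined when there are no incidents")
    # sum() needs a timedelta start value; the default 0 cannot add to timedelta.
    total = sum((recovered - detected for detected, recovered in incidents), timedelta())
    return total / len(incidents)
```

For example, one incident resolved in 30 minutes and another in 90 minutes give an MTTR of 60 minutes. High-signal logs shrink the time between “detected” and “recovered,” which is exactly the term this average is built from.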

Now add noise

Now, to this high-ROI first-responder log capability, add noise. Why?

Right.

There is no value in allowing noise in logs. Noise can only erode the hard-won ROI of actionable service logs.
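One way to start de-noising is to find the repeat offenders: the same warning or error firing over and over until everyone learns to ignore it. Here is a minimal Python sketch, assuming a hypothetical plain-text log format of timestamp, level, then message. Real log formats vary, and the normalization regexes here are illustrative only.

```python
import re
from collections import Counter

# Hypothetical log format: "<timestamp> <LEVEL> <message>".
LINE_RE = re.compile(r"^\S+\s+(?P<level>ERROR|WARN)\s+(?P<msg>.*)$")


def message_template(msg: str) -> str:
    """Collapse variable parts (hex ids, then numbers) so repeats group together."""
    msg = re.sub(r"0x[0-9a-fA-F]+", "<hex>", msg)
    msg = re.sub(r"\d+", "<n>", msg)
    return msg


def noisiest(lines, top=5):
    """Return the most frequent (level, template) pairs among ERROR/WARN lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # INFO/DEBUG and unparseable lines are skipped
            counts[(m["level"], message_template(m["msg"]))] += 1
    return counts.most_common(top)
```

The entries at the top of that list are the candidates to fix at the source, rather than messages to train people to ignore.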

Don’t add noise to logs, it screws up the narrative

At best, a collection of service logs during an incident (and here I include RRD-based services, StatsD and Splunk; they are all ways of logging) is the trigger for that eureka moment in which an impending incident becomes a simple matter of fixing a misconfiguration.

Ignorable Errors Are A Cargo Cult