John Allspaw has long been at the forefront of learning from outside our own field. His training isn’t in IT at all; he’s got a Master’s in Mechanical Engineering and worked on vehicle crashworthiness studies.

You absolutely have to learn from other disciplines if you want to be good at what you do. A lot of disciplines are better at what they do than we are at what we do, and in some cases, we do the same kinds of things. Everyone in the DevOps / IT Admin profession has to do Usability Design, except most of us barely think about it. We run what are sometimes critical infrastructures that people’s lives and livelihoods depend on. When things fall over, a post-mortem should be conducted to discover the actual cause of failure. And the people who aren’t aware of these kinds of processes need to learn from what other fields have already worked out and figure out how to apply it to their world.

That is one of the reasons that I liked The Phoenix Project, and I’m sure many of you did, too. It’s not a guidebook so much as an inspirational story of how someone else did what you should be doing, too. Less “case study” than “parable”, but you still need parables.

Engineers have been safely designing and building structures and vessels for a long time, and they’ve been getting better at it in recent years, too. New ideas in design and engineering, like the ones talked about in Risk Society, are changing the way that everyone thinks (or at least, the way that they should be thinking), and to be honest, thinking has changed in IT, too. The shift from “there can never be failure” to “failure is inevitable; make sure that we can control the failure modes” is probably the biggest change that I know of.

As it stands right now, there is no IT equivalent of the FAA, which makes sure that infrastructures are “airworthy”, so to speak, or which sets requirements for infrastructure uptime and redundancy. Part of this is the degree of criticality of what they oversee versus what we run. Our infrastructures fall over and usually, no one is in danger of dying. But that is rapidly changing, particularly in the web-scale community.

If AWS East falls over, does anyone die? No one is able to say “no” with any degree of certainty. You can say, “anyone who puts systems for hospitals or traffic lights in AWS is an idiot” (and you’d be right), but there is prima facie evidence that idiots exist. So you can’t say that no one will die because AWS falls over. Does that imply fault?

Almost certainly not on the part of the AWS engineers, but almost definitely on the part of the “idiot” in question. You can read about the Hyatt Regency Walkway Collapse in 1981 that killed 114 people and injured 216. The engineers who had approved the final designs were found guilty by the state licensing board of gross negligence, misconduct, and unprofessional conduct in practicing engineering. The engineering firm employing them wasn’t itself convicted of a crime, but it did lose its license to be an engineering firm.

However, the AWS engineers need to be (and probably are) aware that outages of their infrastructure can cause more than financial damage to their clients, and that goes for all of us. I ran the infrastructure for a financial risk analytics service that had tens of billions of dollars under management. If I mistakenly took down a core router, what would the ramifications be? Financial, certainly, but what of the people who made decisions based on the data provided by my service? Of the people who had invested in the firms that had hired us?

While we can’t let that pressure squash us, we have to remain cognizant of the fact that a lot is riding on our critical infrastructures. I’ve seen several engineering colleges with a sign that says the equivalent of, “If you cheat in engineering classes, you will kill people later”. We don’t have that mindset yet in IT, but I think we should, because eventually we’ll be responsible for infrastructure that will kill people if we get it wrong.

I hate to end like that, but I’m out of time, so I’ll just say this:

Not everything is bleak, we aren’t destined to fail, and things can be great in the future. We need more science in our profession, we need more diversity in our profession, and we need to figure out how to reach out to the people who don’t know that they’re part of our profession.

Please comment below!