Quick Takeaway - This outage, like some others of its kind, was not purely a human error; it was a design failure as well. Better design (UX) can help avoid massive outages and mistakes like this one.

On 28 February 2017, Amazon’s S3 service suffered an outage of nearly four hours. Amazon’s web hosting services are among the most widely used out there, which means that when Amazon’s servers go down, a lot of things go down with them. Big products like Slack and Quora faced issues.

The root cause, as posted by Amazon -

At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.

What Mashable has to say - "The cause, according to the company, who posted a very wordy explanation on its website Thursday, was 'human error.' Which sounds bad enough until you find out exactly what the 'human error' was: a typo."

What Gizmodo has to say - "Apparently, some poor engineer at Amazon Web Services (AWS) did an oopsie and brought the internet to its knees. Oopsies are the worst!"

What VentureBeat has to say - "The event was triggered by human error."

But I would not say the AWS outage was purely a human error. It was a design (UX) failure as well.

How can better design help avoid such mistakes? Design for confirmation.

Confirmation is a technique used for critical actions, inputs, or commands. It provides a means for verifying that an action or input is intentional and correct before it is performed. Confirmations are primarily used to prevent a class of errors called slips, which are unintended actions. Confirmations slow task performance and should be reserved for critical or irreversible operations only. When the consequences of an action are not serious, or when actions are completely and easily reversible, confirmations are not needed.

The paragraph above says everything we need to consider when designing the right confirmation page. To make it more concrete:

Good Practice:-

Have a detailed confirmation if the action is mission-critical or its outcome cannot easily be reversed. Phrase the action as a question in the header of the confirmation dialog/modal. Clearly explain the outcome of the action in the body of the modal. Restate the action on the confirmation button. Make the button stand out in the modal, with a color (such as red) that signals the action being taken is critical. Follow up with an acknowledgment modal that shows the outcome of the action performed, and if the action can be reversed, include a link to reverse it. The acknowledgment removes any uncertainty about what the system has just done.
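The practices above can be sketched as a small data model for the two dialogs. This is a minimal Python sketch under stated assumptions: the names `needs_confirmation`, `ConfirmationDialog`, `Acknowledgment`, and the `"danger"` style value are hypothetical and not any real UI framework's API. The rule for when to confirm follows the "critical or irreversible" guideline quoted earlier.

```python
from dataclasses import dataclass
from typing import Optional


def needs_confirmation(critical: bool, reversible: bool) -> bool:
    # Confirmations slow users down, so ask only for critical
    # or hard-to-reverse actions (hypothetical policy helper).
    return critical or not reversible


@dataclass(frozen=True)
class ConfirmationDialog:
    header: str         # the action, phrased as a question
    body: str           # plain explanation of what will happen
    confirm_label: str  # restates the action instead of a generic "OK"
    style: str          # "danger" = render the button in a warning color such as red


@dataclass(frozen=True)
class Acknowledgment:
    message: str               # the outcome of the action just performed
    undo_label: Optional[str]  # offered only when the action can be reversed


def build_confirmation(verb: str, target: str, consequence: str) -> ConfirmationDialog:
    return ConfirmationDialog(
        header=f"{verb} {target}?",
        body=consequence,
        confirm_label=f"{verb} {target}",
        style="danger",
    )


def build_acknowledgment(done: str, target: str, reversible: bool) -> Acknowledgment:
    return Acknowledgment(
        message=f"{done} {target}.",
        undo_label="Undo" if reversible else None,
    )


# Hypothetical example: the operator meant to remove 3 servers but typed 30.
print(needs_confirmation(critical=True, reversible=False))  # True
dialog = build_confirmation(
    "Remove", "30 servers",
    "These servers will be taken out of service. This cannot be undone from here.",
)
print(dialog.header)         # Remove 30 servers?
print(dialog.confirm_label)  # Remove 30 servers
ack = build_acknowledgment("Removed", "30 servers", reversible=False)
print(ack.message)           # Removed 30 servers.
print(ack.undo_label)        # None
```

Restating the exact target in both the header and the button is the point of this design: a mistyped input, such as "30" instead of "3", would be shown back to the operator in the dialog itself rather than being buried in the command they just typed.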

Examples -

(InVision’s confirmation modal requires checking boxes that indicate what will happen when a user deletes a prototype.)

(GitHub’s confirmation modal asks the user to type the repository name again so that they do not accidentally delete the code repository.)

(Example of an acknowledgment with a button to check the outcome of the action.)
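The GitHub-style "retype the name" confirmation is easy to sketch. The Python below assumes a hypothetical `delete_repository` command and a `prompt` parameter (defaulting to the built-in `input`) so the check can be exercised without a keyboard. It mimics the idea behind GitHub's modal, not GitHub's actual implementation.

```python
from typing import Callable


def confirm_by_retyping(expected: str, prompt: Callable[[str], str] = input) -> bool:
    """Ask the user to retype `expected`; succeed only on an exact match."""
    typed = prompt(f"Type '{expected}' to confirm: ")
    # Ignore surrounding whitespace, but every other character (and its case)
    # must match, so a near miss keeps the destructive action blocked.
    return typed.strip() == expected


def delete_repository(name: str, prompt: Callable[[str], str] = input) -> str:
    # Hypothetical destructive command guarded by the retype check.
    if not confirm_by_retyping(name, prompt):
        return f"Cancelled: '{name}' was not deleted."
    # The real deletion would happen here; this sketch only reports it.
    return f"Deleted '{name}'."


# Simulate user input instead of reading from the keyboard.
print(delete_repository("billing-service", prompt=lambda _: "billing-service"))
# Deleted 'billing-service'.
print(delete_repository("billing-service", prompt=lambda _: "billing-servce"))
# Cancelled: 'billing-service' was not deleted.
```

The same guard could be applied to an operational command, for example by asking the operator to retype the number or names of the servers about to be removed before the command runs.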

Bad Practice:-

A simple confirmation dialog that gives few details about the action.

Example:-

Thanks for taking the time to read this post. If you wish, you can also read my other posts here on LinkedIn. I occasionally write about my own learnings from work and experiments. Feel free to share your own thoughts or learnings on the topic.

Hemant Kumar Singh

References -