Employee Error to Blame for Huge Amazon S3 Outage This Week

Amazon says an employee error was responsible for the Amazon Web Services outage earlier this week that took a large number of websites completely offline. On Tuesday, users, predominantly on the East Coast, were unable to access a large number of sites, including Netflix, Quora, Slack, Reddit, the Securities and Exchange Commission, Airbnb, Medium and Expedia (and even, ironically, downdetector.com). The problems plagued Amazon for much of Tuesday, though at the time the company had not explained why.

Now, in a website post, Amazon says the problems began when an employee entered a command incorrectly while working on a subsystem used by S3's billing process. "At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process," Amazon said. "Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended."

The servers that were inadvertently removed supported the subsystems that manage the metadata and location information of all S3 objects in the affected East Coast region (US-EAST-1), as well as the allocation of new storage. Because a significant portion of their capacity had been removed, these subsystems needed a full restart, and while they were restarting, S3 was unable to service requests, the company notes. Things were ultimately resolved by 5 PM EST on Tuesday.

Amazon says it is taking steps to prevent the problem from happening again. "We are making several changes as a result of this operational event," Amazon said Thursday. "Finally, we want to apologize for the impact this event caused for our customers. We will do everything we can to learn from this event and use it to improve our availability even further."

Most recommended from 53 comments

brianiscool (Member, Tampa, FL), 17 recommendations
Fired
Someone got fired.

Anonf9e4d (Anon, @comcastbusiness.net), 15 recommendations
But I thought...
I have a customer who thinks hosting in "the cloud" makes everything work flawlessly. Nope! It just means that when something gets screwed up, you have no control over getting it resolved!

tomatoe (Premium Member, Kansas City, MO), 15 recommendations
It happens
No amount of SDN or APIs, and no amount of personal experience, can keep networks safe and running at five 9s. Accidents happen, and you hope you have enough resilience built into your networks to handle them... it sounds like Amazon does not. Was that really this person's fault, or was it a failure on a bigger scale? They were debugging an issue during business hours. Was the person told to do this debugging by a manager? Probably so.

I hope the engineer/admin doesn't get canned. There is always a chance to teach and mentor.

FureverFurry (Premium Member), 11 recommendations
Résumé
No doubt that (ex-)employee has a job offer from a Fortune 500 company after a résumé showing: "ACCOMPLISHMENTS: Single-handedly brought down the entire internet for major firms on the East Coast."

I'll give Amazon credit for acknowledging what happened truthfully rather than offering the mumbo-jumbo non-explanations some companies give.

antidelldude (Member, Beverly Hills, CA), 8 recommendations
How Stressful
THIS is why I passed up working as a sysop in data centers. Too much damn stress.

wkm001 (Member), 6 recommendations
Catastrophic Failure
A catastrophic failure is normally just a series of small failures.

MalibuMaxx (Premium Member, Chesterton, IN), 4 recommendations
Employee of the Month
They should give this man an award... Having stuff in the cloud can be both a blessing and a curse. It gives the customer no control over anything, so it is a double-edged sword.