Tuesday, February 28th, was a bad day for AWS and for AWS users who relied on the US-East-1 Region in Northern Virginia to run their business or to serve their customers. I won’t rehash what happened, but readers can get details on the outage in this TechTarget article. And yes, I know Amazon did not officially classify it as an outage, but it was effectively one for many users. As expected, both apologists and detractors have taken to social media and the Internet to defend or to bury Amazon Web Services. I’ve read everything from “If a user’s application failed, it’s all their fault” to “AWS is so unreliable even Amazon doesn’t use it” and everything in between. One of the most balanced reflections was written by my Rackspace colleague, Kevin Jackson.
I want to take a few moments to share some thoughts on the S3 outage/slow-down and what it means for users. Then I’ll walk through some tips for architecting against Region-level failures.