
Everybody notices when something big fails—like AWS’s US-EAST-1 region. And fail it did. All sorts of services and sites became inaccessible, and we all knew it was Amazon’s fault. A week later, when I run into a site that’s down, I still say, “Must be some hangover from the AWS outage. Some cache that didn’t get refreshed.” Amazon gets blamed—maybe even rightly—even when it’s not their fault.
I’m not writing about fault, though, and I’m also not writing a technical analysis of what happened. There are good places for that online, including AWS’s own summary. What I am writing about is a reaction to the outage that I’ve seen all too often: “This proves we can’t trust AWS. We need to build our own infrastructure.”
Building your own infrastructure is fine. But I’m also reminded of the wisest comment I heard after the 2012 US-EAST outage. I asked JD Long about his reaction to the outage. He said, “I’m really glad it wasn’t my guys trying to fix the problem.”1 JD wasn’t disparaging his team; he was saying that Amazon has a lot of expertise in running, maintaining, and troubleshooting really big systems that can fail suddenly in unpredictable ways—when just the right conditions happen to tickle a bug that had been latent in the system for years. That expertise is hard to find and expensive when you find it. And no matter how expert “your guys” are, all complex systems fail. After last month’s AWS failure, Microsoft’s Azure obligingly failed about 10 days later.
I’m not really an Amazon fan or, more specifically, an AWS fan. But outages like this should force us to remember what they do right. AWS outages also warn us that we need to learn how to “craft ways of undoing this concentration and creating real choice,” as Signal CEO Meredith Whittaker points out. But Meredith understands how difficult it will be to build this infrastructure and that, for the present, there’s no viable alternative to AWS or one of the other hyperscalers.
Operating and troubleshooting large systems is difficult and requires very specialized skills. If you decide to build your own infrastructure, you will need those skills. And you may end up wishing that it isn’t your guys trying to fix the problem.
Footnote
- In 2012, I happened to be flying out of DC just as the storm that took US-EAST down was rolling in. My flight made it out, but it was dramatic.
 




