AWS has suffered an operational incident that has affected multiple services in its US East region, with knock-on disruption reported by global apps and UK websites.
The scale of Monday’s outage has proved once again how intertwined AWS has become with our everyday services, even though the fault was traced to an ‘operational issue’ in a single region, US East.
You’d think that would leave much of the world with access to popular services such as Snapchat, Slack, or Peloton. Sadly, as I discovered this morning in my UK-based home while prepping for my workout, the outage in the US East region is affecting services much further afield, not just locally.
That’s confirmed by the fact that the outage is also hitting UK-focused services, such as the Government’s HM Revenue and Customs website and the London Stock Exchange. Banking services are affected too, including Lloyds Bank and Halifax, while Apple’s services, which include TV, Music and the App Store, have also been down for several hours.
Given the scale of the outage, AWS said its teams moved quickly to mitigate the fault. “Engineers were immediately engaged and are actively working on both mitigating the issue, and fully understanding the root cause,” AWS added in its operational update.
It’s not the first time AWS has experienced a widespread outage. In fact, many of the same apps and websites were caught out during outages in 2023, 2021, and 2020. It does, however, highlight how fragile our online ecosystem can be, driving home the importance of planning for resilience in the data centre sector.
Update 10:33am – AWS has reported signs of life, with many services coming back online, albeit not 100% reliably. The firm notes that it continues ‘to work through a backlog of queued requests’, and ‘will continue to provide additional information.’
Update 2:32pm – With AWS services now largely back to normal, Jamil Ahmed, Distinguished Engineer at Solace, has been in touch to point out that this incident underscores a fundamental vulnerability in the cloud strategy many firms take: depending on a single cloud provider. He noted, “Even as cloud technology evolves, failures within the system will inevitably happen. ‘One-of-a-kind’, extremely rare outages or issues continue to plague every service provider from time to time, which is why the need to store valuable information on multiple provider services, known as an event mesh, has arisen.
“From a business perspective, there are no excuses to having a single cloud provider. It’s multi-cloud all the way, treating cloud as commoditised compute, not building apps and services that are tied to knowing what cloud they’re in. Unfortunately, when businesses first introduced the cloud into their strategy, about 10 years ago, they made multi-provider usage a problem to solve later on. It is now ‘later on,’ and the strategy of using one cloud service is demonstrably dangerous and negligent. Anyone adopting cloud without thought for multi-cloud on day 1, should opt into an event mesh system or be fearful for that next ‘extremely rare’ event.”