
AWS outage once again takes down major apps, websites

AWS has suffered an operational incident that has affected multiple services in its US East region, with knock-on disruption reported by global apps and UK websites.

The scale of Monday’s outage, which AWS attributed to an ‘operational issue’ in its US East region, has shown once again how intertwined AWS has become with the services we use every day.

You’d think an outage confined to one US region would leave much of the world with access to popular services such as Snapchat, Slack or Peloton. Sadly, as I discovered this morning at my home in the UK while prepping for my workout, the disruption has reached well beyond the US East Coast.

UK-focused services have been hit too, including the Government’s HM Revenue and Customs website and the London Stock Exchange. Banking has also been affected, with both Lloyds Bank and Halifax disrupted, while Apple’s services, including TV, Music and the App Store, have been down for several hours.

Given the scale of the outage, AWS said its teams moved quickly to mitigate the fault. “Engineers were immediately engaged and are actively working on both mitigating the issue, and fully understanding the root cause,” the company said in its operational update.

It’s not the first time AWS has experienced a widespread outage. In fact, many of the same apps and websites were caught out during outages in 2023, 2021 and 2020. It does, however, highlight how fragile our online ecosystem can be, driving home the importance of planning for resilience in the data centre sector.

Update 10:33am – AWS has reported signs of life, with many services coming back online, albeit not 100% reliably. The firm notes that it continues ‘to work through a backlog of queued requests’, and ‘will continue to provide additional information.’

Update 2:32pm – With AWS services now largely back to normal, Jamil Ahmed, Distinguished Engineer at Solace, has been in touch to point out how this incident underscores a fundamental vulnerability in the cloud strategy many firms adopt: dependence on a single cloud provider. He noted, “Even as cloud technology evolves, failures within the system will inevitably happen. ‘One-of-a-kind’, extremely rare outages or issues continue to plague every service provider from time to time, which is why the need to store valuable information on multiple provider services, known as an event mesh, has arisen.

“From a business perspective, there is no excuse for having a single cloud provider. It’s multi-cloud all the way, treating cloud as commoditised compute, not building apps and services that are tied to knowing what cloud they’re in. Unfortunately, when businesses first introduced the cloud into their strategy, about 10 years ago, they made multi-provider usage a problem to solve later on. It is now ‘later on’, and the strategy of using one cloud service is demonstrably dangerous and negligent. Anyone adopting cloud without thought for multi-cloud on day 1 should opt into an event mesh system or be fearful of that next ‘extremely rare’ event.”

Update 4:02pm – Jake Madders, Co-founder and Director at Hyve Managed Hosting, largely agrees with Jamil Ahmed. In a statement to Data Centre Review, he noted, “Today’s AWS incident is a stark reminder that even the largest and most reliable cloud providers can experience significant outages, but these risks can be mitigated. The key lies in building resilience into your infrastructure from the outset. Diversifying across multiple cloud providers and geographic regions is essential to ensure redundancy and enable seamless failover when disruption occurs. Just as important is decoupling critical services, such as identity management, DNS and core data layers, from any single provider, so that if one ecosystem is impacted, your operations can continue elsewhere.

“For organisations that prioritise data sovereignty, [sovereignty] should also be a key consideration, with local failover options and replication to trusted jurisdictions built into their continuity strategy. Effective mitigation also includes regular backup and recovery testing, automated failover processes, and a well-documented, frequently reviewed incident response plan.

“A final consideration is that while large enterprises may have the internal resources to implement and manage these safeguards, smaller businesses without in-house expertise may struggle – not just during an outage, but with the aftermath and recovery. By engaging with a trusted infrastructure partner, smaller organisations can gain the foresight, tools and support they need to maintain continuity, recover quickly, and minimise disruption when incidents occur.”

Update 5:02pm – Many apps that use AWS have reported a second outage. These include Duolingo, Peloton and even Amazon Prime Video. While many of these apps recovered earlier today, they started to experience new problems at about 4:00pm UK time. AWS has not yet commented on the new outage, but Peloton said on its own status page, “We are seeing another spike in errors across several Peloton services. The team is investigating with our partners.”

Update October 23 at 9:02am – According to consultant John Strand, the AWS episode “unveiled a difficult truth: the modern internet depends on a handful of cloud providers whose internal failures can cascade across banks, hospitals, government agencies, and news outlets.” This, he argues, is the very definition of a single point of failure: a risk that national telecom frameworks were designed to avoid through redundancy and oversight, to a degree that telcos themselves would argue is excessive.

Strand points out that AWS operates “a parallel internet, with its own fibre-optic backbones, routers, switches, and interconnection points.” Legally, this infrastructure already resembles telecommunications. Yet unlike traditional carriers, AWS is not bound by continuity obligations, universal service contributions, or public-interest oversight. Telecom providers must publish terms, guarantee interconnection, and maintain backup power levels; cloud operators face none of these statutory duties, he reckons.

During the outage, over 2,000 enterprises were affected – from Uber and Starbucks to PlayStation, Coinbase and the UK’s tax authority. Few had viable contingencies outside the same ecosystem. In a regulated telecom context, that would constitute a failure to serve upon reasonable request. But in the cloud world, resilience is treated as a premium feature, not a baseline duty.

“AWS does not guarantee full resilience as a baseline public obligation,” Strand notes. “Instead, it increasingly frames resilience as an optional, premium feature – something customers can purchase if they can afford it.”

Stewart Laing, CEO of Asanti Data Centres, added, “Many organisations have embraced public cloud as a silver bullet, but the AWS outage shows what happens when you build everything on one foundation.

“This is not just about uptime. It’s about resilience by design, and asking the hard question: where was your business continuity plan? It also once again calls into question the UK Government’s ‘cloud first’ policy.

“Furthermore, this outage doesn’t just hit organisations directly hosted on AWS – it ripples through entire supply chains. Even businesses that believe they’re insulated are likely to be affected when their third-party suppliers go down. With most organisations relying on multiple vendors, many of whom depend on AWS behind the scenes, the result is a cascading, system-wide impact that’s far bigger than a single point of failure.”
