Jürgen Hatheier, International CTO at Ciena, explains why the surge in AI training will eclipse cloud traffic and force operators to rethink capacity, automation and fibre strategy at 800 Gb/s and beyond.
Over the past 20 years, global broadband network traffic has increased at a steady, predictable rate. As cloud services, high-resolution video and other high-bandwidth applications have developed over that time, annual bandwidth increases of 20 to 30% have been sufficient to deal with the additional demands placed on the network.
But that is about to change, thanks to the growth of AI workloads and the ambitious plans set forth by governments and businesses to compete in the AI race.
In the UK, for example, the AI Opportunities Action Plan aims to increase sovereign compute capacity by at least twenty times by 2030, stating “such expansion is critical if the UK is to keep pace”. Across the Atlantic, the US Department of Energy recently identified 16 federal sites where tech companies can rapidly build data centres to accelerate commercial development of AI.
Meanwhile, the major hyperscalers are both optimising their existing data centres for AI workloads and building dedicated AI data centres. Meta, for example, is in discussions to build a new AI data centre campus with costs exceeding $200 billion.
The growth in compute capacity driven by AI data centres will of course have implications for data centre interconnect (DCI), the networks used to link multiple data centres. Ciena recently commissioned a global survey of over 1,300 data centre experts to understand their expectations about AI’s impact on DCI in the coming years. The findings confirmed that AI is driving a significant transformation in data centre network infrastructure.
AI’s impact on DCI
According to Ciena’s survey, more than half (53%) of data centre experts believe that, over the next two to three years, demands from AI workloads on DCI infrastructure will surpass those of cloud computing and big data analytics.
To meet these demands, significant investments in (and expansion of) data centre estates and infrastructure are underway. In fact, according to the survey, 43% of new data centre facilities are expected to be dedicated to AI workloads.
As the requirements for AI compute increase, Large Language Model (LLM) training will likely take place across geographically distributed facilities. Splitting the training across different locations allows the required tens of thousands of power-hungry GPUs to tap into different parts of the power grid. This approach requires data centres to synchronise results at each step of the training process by exchanging massive amounts of data. These transmissions must be as fast as possible to make the most of the costly compute infrastructure, creating demand for unprecedentedly high DCI bandwidths.
The global data centre experts interviewed in our survey expect AI to drive at least a six-fold increase in DCI network bandwidth over the next five years. This translates to between 40 and 60% compound annual growth, more than double the growth rate we’ve known since the early 2000s.
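The arithmetic behind that range can be sketched quickly: a six-fold increase over five years implies roughly 43% compound annual growth, while 60% annual growth sustained for five years compounds to about a 10.5x total. A minimal Python illustration (the function name is our own, not from the survey):

```python
# Back-of-the-envelope sketch of the compound annual growth rate (CAGR)
# implied by a total capacity multiple. The 6x-over-5-years figure comes
# from the survey cited in the article; the code is illustrative only.

def implied_cagr(multiple: float, years: int) -> float:
    """CAGR implied by growing to `multiple` times the start value in `years`."""
    return multiple ** (1 / years) - 1

# A six-fold increase over five years:
print(f"{implied_cagr(6, 5):.0%}")   # ~43% per year

# For comparison, 60% annual growth sustained for five years:
print(f"{1.60 ** 5:.1f}x total")     # ~10.5x
```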
CSPs and hyperscalers alike are taking steps to add more capacity and prepare their networks for the demands driven by AI. According to the survey findings, data centre experts believe a mix of hardware and software solutions will be needed to improve DCI performance, efficiency and scalability.
Bigger, smarter networks for AI workloads
As data traffic, driven by AI workloads alongside pre-existing cloud, video and analytics services, continues to surge, data centre operators are increasingly investing in scalable, high-capacity infrastructure to keep pace with demand. For instance, nearly nine in ten (87%) data centre experts believe they will need a minimum of 800 Gb/s per wavelength across both new and existing network routes.
Capacity, however, isn’t the only infrastructure issue that data centre operators will need to address. To manage the diverse traffic types and dynamic traffic patterns of AI workloads, networks are transforming to be more intelligent and adaptive.
Smart networks and intelligent automation platforms will be key to ensuring AI traffic is prioritised and routed efficiently. Real-time software automation capabilities can enable networks to dynamically adjust bandwidth, optimise power consumption, and prevent congestion.
Managed Optical Fibre Networks: A new approach for collaboration
While cloud providers and hyperscalers are expanding very rapidly to support AI initiatives, our survey confirms that they will be leveraging Managed Optical Fibre Networks (MOFN), in which a service provider builds and operates a dedicated optical network for a customer over its own fibre, to help scale out and support the increased demands on data centres.
In fact, two-thirds (67%) of the global survey respondents plan to use MOFN, while a third plan to acquire dark fibre themselves.
There is no one-size-fits-all approach
While the approaches laid out above could help cloud and communication service providers optimise their networks for AI workloads, the reality is that there is currently no one-size-fits-all network architecture to meet every use case. Over the coming months and years we’ll see different network architectures and expansion strategies that fit operators’ specific business models and best serve their customers. Regardless of the approach taken, the key will be high-performance DCI connectivity.