Niklas Lindqvist, Nordic General Manager at Onnec, explains how AI has prompted many operators to rethink their approach to data centre design.
With global investment in AI technologies predicted to double between 2023 and 2026, it seems obvious that data centres would be rapidly expanding to meet demand. However, recent data reveals that vacant capacity in London dropped by 6.3% between 2022 and 2023.
This surprising drop-off in new data centre capacity has come about because the rise of AI is forcing operators to reconsider much of their approach to data centre design. To understand why, we need to look at some underlying issues around AI and the infrastructure that powers it.
Shaking up the foundations
Data centres have historically relied on CPU-powered racks for conventional computing tasks, whereas AI relies on GPU-powered racks. Because of these different hardware fundamentals, AI needs more power, cooling, and space than traditional compute.
To satisfy the particular needs of AI compute, operators need to invest in bespoke infrastructure for it – such as more power connections or alternative cooling systems. These requirements present complications for operators: since the supporting infrastructure for AI has to be embedded in the walls and foundations of a site, it’s extortionately expensive to replace.
Because of the expense of replacing the infrastructure to support AI compute, many operators will find themselves unable to retreat from their commitment to AI. So, if an operator overcommits to AI, its data centres could end up saddled with underutilised and unprofitable capacity for years to come. The risk of overcommitment is high amid uncertainty around the long-term demand for AI, with Gartner placing the technology at the peak of inflated expectations in its hype cycle.
Due to this mix of market uncertainty and hard-to-reverse infrastructure decisions, many operators have been holding back new data centre projects at the design stage.
Holistic planning
Operators can’t delay new data centres indefinitely, and are aware that they risk losing market share and their competitive edge if they take too long to respond to the demand for AI.
Given this need to be first movers while offsetting the risk of overcommitment to AI, operators need to do what they can to ensure their data centres are as efficient and resilient as possible. However, with the rules of data centre infrastructure being rewritten in real time to accommodate AI, operators will need to adopt a new approach to designing and planning their sites – one that’s far more holistic than before.
Bring in more stakeholders
AI compute, regardless of how extensively it’s deployed in a data centre, is set to make sites significantly more complex than facilities running ‘just’ traditional compute. The heightened demands of AI compute open up many more potential bottlenecks and points of failure in a site.
To guarantee uptime and minimise the potential for expensive issues over the lifespan of a data centre, teams should be more meticulous than ever before during a site’s planning phase. At the outset of projects, operators should seek out a greater range of teams and expertise to inform initial designs. Along with expertise on power and cooling, operators should seek input from other teams such as cabling, security, and ops from early on to remove potential bottlenecks or fault sources.
Leverage AI in the data centre
With AI compute now running in their data centres, operators should use the technology to discover new efficiencies in their sites. AI offers tremendous promise for the data centre, as it can provide precise, high-quality insights for a variety of workflows. These include:
- Temperature and humidity monitoring
- Security system operations
- Power usage monitoring and allocation
- Hardware fault detection
- Predictive maintenance
By using AI on-site, operators could dramatically improve the efficiency of their data centres. With the next generation of sites set to have particularly novel and complex layouts, AI is perfectly suited to help teams address any new and unexpected challenges.
Invest for the long term
During peak times such as training runs, AI places a significant load on data centres. These peak periods will often see AI compute exceed the limits of traditional sites for power draw, cooling demand, and data throughput.
This peak load will place a greater strain on the basic components that underpin a data centre, while also driving an increase in the number of parts and connections within a site. If those basic materials and components aren’t of sufficient quality, then the demands of AI compute will make them more prone to failure. As a result, many cheaper, lower-quality materials that are fit for use in traditional sites can bring data centres running AI compute to a halt.
Because of this, operators should be very wary of trying to save money on sites featuring AI compute by choosing lower-quality materials. Typically, any savings from cheaper components will be swallowed up by the opportunity and replacement costs that arise when they fail.
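To make that trade-off concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the `ComponentOption` type, the `lifetime_cost` function, and every price, failure rate, and downtime figure below are hypothetical assumptions, not numbers from any real site or supplier. The model is deliberately simple, adding the purchase cost to the expected replacement and opportunity cost of failures over a fixed period.

```python
from dataclasses import dataclass


@dataclass
class ComponentOption:
    """One sourcing option for a component type. All figures are hypothetical."""
    name: str
    unit_cost: float            # purchase price per unit
    annual_failure_rate: float  # assumed probability a unit fails in a given year
    downtime_cost: float        # assumed opportunity cost per failure


def lifetime_cost(option: ComponentOption, units: int, years: int) -> float:
    """Purchase cost plus expected replacement and downtime cost over the period."""
    purchase = option.unit_cost * units
    expected_failures = option.annual_failure_rate * units * years
    # Each failure costs a replacement unit plus the downtime it causes.
    per_failure = option.unit_cost + option.downtime_cost
    return purchase + expected_failures * per_failure


# Illustrative inputs only; substitute real quotes and failure data.
budget = ComponentOption("budget", unit_cost=40.0,
                         annual_failure_rate=0.05, downtime_cost=500.0)
premium = ComponentOption("premium", unit_cost=70.0,
                          annual_failure_rate=0.01, downtime_cost=500.0)

for option in (budget, premium):
    total = lifetime_cost(option, units=1000, years=10)
    print(f"{option.name}: {total:,.0f}")
```

Under these assumed figures, the budget option saves 30,000 at purchase but ends up the more expensive choice over ten years, because the expected cost of failures outweighs the upfront saving. With different failure rates or downtime costs the result could reverse, which is why real data should drive the decision.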
Tackling the infrastructure problem
The infrastructure demands of AI compute are the main reason that operators are holding back on investment, but this isn’t going to be the case indefinitely. As the market becomes less uncertain over the long-term demand for AI, firms will eventually discover the ideal split in their sites between traditional and AI compute.
When this happens, operators will need to ensure they have every possible advantage at hand in their sites’ operations. To help ensure their sites can support them as they mature in a market dominated by AI, operators should design sites holistically from the outset, leverage AI to discover new efficiencies, and invest in high-quality materials that can handle the heightened demands of AI compute.