How do data centres scale for AI when power is the constraint?

Will Stewart
Head of Global Industry Segment Management, Smart Infrastructure at Harting

As AI pushes rack densities higher and exposes the limits of legacy infrastructure, Will Stewart, Head of Global Industry Segment Management, Smart Infrastructure at Harting, explains why power, cooling and maintainability now need to be treated as one connected challenge.

AI workloads are reshaping data centres at pace. Operators now grapple with rising power densities, tighter cooling limits, and the need to accommodate large AI hardware deployments – all within physical footprints that once sufficed for more traditional computing environments. Racks that drew 7–10 kW just a few years ago now demand 30–100 kW to power training and inference clusters, with mid-term roadmaps reaching as high as 1 MW. These shifts are placing greater pressure on electrical infrastructure, thermal management systems, and on-site teams already dealing with labour shortages and compressed deployment timelines.

Data centre professionals are confronting these realities as AI applications drive a sharp increase in compute intensity. Engineers are designing for variable load profiles created by large model training runs and real-time inference tasks, with global AI data centre power expected to hit 90 TWh by 2026 – a tenfold increase from 2022 – while average rack densities are anticipated to rise from 36 kW in 2023 to 50 kW by 2027. In response, operators are looking at measures such as advanced cooling techniques, relocating power equipment beyond IT racks, and deploying modular infrastructure to improve scalability, reliability, and maintainability.

AI workloads’ impact on power and heat

AI workloads are changing data centre design from the ground up. GPU-heavy clusters for training and inference generate extreme power draws and erratic thermal profiles that expose the limits of legacy air-cooling systems. Instead of a relatively uniform thermal profile across a rack, operators must manage concentrated hotspots at accelerators, rapidly changing loads, and tighter operational margins. The result is that a single high-density rack can require the kind of power and thermal planning that older facilities reserved for entire rows.

These demands can escalate quickly. Training large language models (LLMs) requires sustained high utilisation over long periods, while inference spikes create unpredictable peaks that stress grid connections and on-site transformers, pushing power equipment and cooling systems outside their comfort zone. Higher-voltage DC architectures, such as 400 V or 800 V, are emerging as one possible response, cutting losses by 1–5% compared with traditional AC by reducing conversion stages and enabling thinner cabling, which frees space for compute equipment. Without such adaptations, facilities risk cascading failures: overloaded circuits can trigger shutdowns, while persistent hotspots can shorten hardware lifespan. Engineers must now prioritise dynamic power capping and real-time monitoring to balance AI’s infrastructure demands against operational limits.
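To see why higher bus voltages help, consider the underlying arithmetic: for a fixed power delivery P, current scales as I = P/V, so resistive cable losses (I²R) fall with the square of the voltage. The short Python sketch below illustrates this with assumed figures – a 100 kW rack feed and a nominal cable resistance – rather than measured values from any particular deployment.

```python
# Illustrative comparison of resistive cable losses at different DC bus
# voltages. All figures are assumptions for the sake of the example.

def cable_loss(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """Return the I^2 * R conduction loss in watts for a given feed."""
    current_a = power_w / voltage_v  # I = P / V
    return current_a ** 2 * resistance_ohm

RACK_POWER_W = 100_000        # 100 kW rack feed (assumed)
CABLE_RESISTANCE_OHM = 0.005  # round-trip cable resistance (assumed)

for bus_voltage in (400, 800):
    loss_w = cable_loss(RACK_POWER_W, bus_voltage, CABLE_RESISTANCE_OHM)
    print(f"{bus_voltage} V bus: {RACK_POWER_W / bus_voltage:.0f} A, "
          f"cable loss {loss_w:.0f} W "
          f"({loss_w / RACK_POWER_W:.2%} of delivered power)")
```

Doubling the bus voltage halves the current and quarters the conduction loss for the same cable, which is why 800 V distribution can get away with thinner conductors; the 1–5% figure above also reflects the conversion stages that DC architectures remove.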

Rise of alternative cooling methods

Air cooling works well in conventional deployments, but it struggles to keep pace with AI-scale density. Operators hit practical limits in airflow delivery, fan power, and heat removal, especially when accelerators pack high thermal outputs into small areas. When air cannot remove heat quickly enough, operators either leave compute capacity underused or accept performance throttling and greater reliability risk.

Liquid cooling is gaining traction because it is better suited to these thermal demands. Direct-to-chip designs move heat away from GPUs and CPUs through cold plates and coolant loops, delivering more effective heat transfer where it matters most. Immersion cooling goes further by surrounding components with dielectric fluid, enabling very high densities in compact footprints. For many operators, the appeal is not only thermal performance, but also flexibility: liquid cooling systems can support higher densities without requiring a complete building redesign, while also providing a more predictable path for future accelerator generations.
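The case for liquid comes down to basic heat-transfer arithmetic: the mass flow needed to carry away heat Q at a temperature rise ΔT is ṁ = Q / (cp × ΔT), and water holds far more heat per unit volume than air. The Python sketch below compares the flow rates required to remove 100 kW from a rack; the heat load and temperature rise are assumed for illustration, not drawn from any specific deployment.

```python
# Rough comparison of the coolant flow needed to remove a fixed heat load
# with air versus water, using m_dot = Q / (cp * dT). Values are
# illustrative assumptions, not vendor specifications.

HEAT_LOAD_W = 100_000  # 100 kW rack (assumed)
DELTA_T_K = 10.0       # allowed coolant temperature rise (assumed)

# Approximate fluid properties at typical operating conditions.
FLUIDS = {
    "air":   {"cp_j_per_kg_k": 1005.0, "density_kg_per_m3": 1.2},
    "water": {"cp_j_per_kg_k": 4186.0, "density_kg_per_m3": 997.0},
}

for name, props in FLUIDS.items():
    mass_flow = HEAT_LOAD_W / (props["cp_j_per_kg_k"] * DELTA_T_K)  # kg/s
    vol_flow = mass_flow / props["density_kg_per_m3"]               # m^3/s
    print(f"{name:5s}: {mass_flow:6.2f} kg/s  = {vol_flow * 1000:8.2f} L/s")
```

Under these assumptions, removing 100 kW with air demands on the order of eight cubic metres of airflow per second through a single rack, which is impractical; the same load needs only a few litres of water per second through cold plates.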

These approaches can also improve the operational equation. Better thermal control reduces hotspots and stabilises component temperatures, supporting more consistent performance and longer hardware life. In many environments, liquid cooling reduces the burden on traditional room-level cooling systems and helps teams manage energy and capacity more effectively — particularly where power availability and cooling capacity are the main constraints.

Benefits of moving power equipment outside the IT rack

As racks densify, every unit of space inside the rack becomes more valuable. One practical way to reclaim capacity is to move power equipment, such as distribution, conversion, and protection components, outside the IT rack into sidecars or adjacent enclosures. This opens additional room for compute hardware, improves cable routing, and reduces the congestion that can complicate airflow and maintenance.

Externalising power equipment can also improve serviceability. Technicians can access power components without disturbing sensitive IT gear, reducing the risk of accidental disruption and simplifying planned maintenance. It also supports a more modular replacement model: instead of performing complex work in a confined rack, teams can isolate, swap, and validate power modules in a safer, more controlled manner. That matters in AI environments where uptime expectations remain high and maintenance windows continue to shrink.

There is also a staffing dimension. When designs reduce the number of custom terminations and shift complexity towards standardised assemblies, teams can complete more work with smaller on-site crews. In a market where skilled labour remains difficult to secure, architectures that simplify installation and maintenance may help operators keep projects on schedule and maintain more consistent quality across sites.

Scaling with modular, connected infrastructure

Operators that embrace modular, connected infrastructure are often better positioned to deploy AI capacity quickly across distributed sites. Factory-built power skids, connectorised busways, and standardised rack modules that snap together like building blocks can cut installation time by 50% or more compared with custom wiring. This approach supports scale-up within racks, scale-out across rows, and scale across multiple facilities sharing AI workloads in real time.

Reliability can also improve with plug-and-play designs. Pre-tested assemblies minimise human error in terminations, improving mean time between failures while enabling hot-swap upgrades without full shutdowns. Connected monitoring via IoT integrates power, cooling, and compute data, allowing predictive maintenance in always-on AI environments.

When power and cooling components are integrated into a unified monitoring and management layer, operators gain earlier visibility into anomalies and can move from reactive fixes to planned maintenance. Predictive insights can help reduce downtime and keep utilisation high — two outcomes that matter when AI workloads place a premium on compute availability.
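As a minimal illustration of what earlier anomaly visibility can look like, the Python sketch below flags telemetry readings that drift outside a rolling statistical baseline – the kind of simple rule a unified monitoring layer might run across power, coolant, and temperature feeds. The sensor values and thresholds are hypothetical.

```python
# Minimal rolling-baseline anomaly check over a telemetry stream, the kind
# of early-warning rule a unified monitoring layer might apply. Readings
# and threshold are hypothetical.
from collections import deque
from statistics import mean, stdev

def detect_anomalies(readings, window=20, z_threshold=3.0):
    """Yield (index, value) for readings far outside the rolling baseline."""
    history = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                yield i, value
        history.append(value)

# Example: coolant return temperature (deg C) with a developing hotspot.
telemetry = [32.0 + 0.1 * (i % 5) for i in range(40)] + [36.5, 37.2, 38.0]
for idx, temp in detect_anomalies(telemetry):
    print(f"sample {idx}: {temp:.1f} C outside rolling baseline")
```

A production monitoring layer would apply far richer models than a rolling z-score, but the principle is the same: catching drift early turns an unplanned outage into a scheduled maintenance task.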

Thriving amid power constraints

Data centres now sit at the intersection of AI-driven demand, power availability, and physical constraints that do not bend easily. Operators that respond effectively will need to treat power delivery, thermal management, and maintainability as a coordinated system rather than a set of separate upgrades. In practice, the challenge is no longer simply adding more capacity, but designing infrastructure that can adapt as AI requirements continue to change.
