AI-ready infrastructure is no longer a single blueprint

Shahar Belkin, Chief Evangelist at ZutaCore, explains why the split between AI factories and inference workloads is forcing data centre operators to rethink power, cooling and location strategy.

AI has turned data centre growth into an energy and location question. As demand for AI compute accelerates, access to reliable, affordable power now influences where capacity can be built, and how quickly it can be brought online.

The scale of this change is evident in energy forecasts. Recent International Energy Agency analysis projects that global electricity demand from data centres could more than double by 2030 to around 945 TWh, with demand from AI-optimised data centres alone expected to more than quadruple.

In that context, two distinct types of AI infrastructure are emerging. The first is a set of centralised training hubs – often described as ‘AI factories’ – that cluster compute in a small number of sites. These facilities prioritise power availability and speed of deployment over proximity to end users.

The second is a distributed landscape for inference, where AI models are deployed across existing cloud regions, enterprise environments and, increasingly, at the edge. Here, factors such as latency, data residency, sovereignty and user experience become just as important as raw compute throughput.

Two infrastructures, two operating realities

Over the past few years, AI has moved firmly into the mainstream, and with it data centre planning has started to change pace. Accelerator power has climbed quickly, and platform cycles are now moving faster than most buildings can adapt.

The shift is most visible in training environments. AI factories exist to create models, so success is measured in throughput. That concentrates large clusters into fewer sites and puts power delivery at the centre of the business case. In practice, location often follows access to power and speed of build, because the model can be deployed elsewhere once training is complete.

Inference environments, however, pull in the opposite direction. Once models are in production, value comes from embedding them inside services that people rely on every day. As a result, AI is increasingly deployed within existing cloud regions and enterprise estates that already run live workloads.

Some inference can sit centrally, but there are use cases where distance becomes a hard limit. When decisions must be made in real time, compute moves closer to users or machines. That is why edge deployments are appearing in more places, often in compact, modular form factors.

This divergence is also where the idea of being “AI-ready” stops being a single label. Training environments can often schedule disruption during upgrades or maintenance. By contrast, inference environments usually have less tolerance for interruption, so availability, design and maintenance discipline matter more.

Cooling and power are now strategic constraints

Cooling used to be treated as background infrastructure because air was simple and familiar. However, rising compute density is straining that model. Air cooling remains effective, but it demands ever more airflow and fan power as density rises, which can drive bulkier server designs and higher overheads. In power-limited sites, that overhead can set a ceiling on usable capacity.
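To make the ceiling concrete, here is a minimal sketch of how overhead eats into a fixed power budget. The function name, the 1,000 kW site cap and the overhead ratios are all hypothetical, chosen only to illustrate the arithmetic; they are not figures from any real site.

```python
def usable_compute_kw(site_power_kw: float, overhead_ratio: float) -> float:
    """Compute power available under a fixed site power cap.

    overhead_ratio is the cooling and fan power drawn per kW of compute.
    It is a hypothetical planning input, not a measured figure.
    """
    if site_power_kw < 0 or overhead_ratio < 0:
        raise ValueError("inputs must be non-negative")
    # site_power = compute + overhead_ratio * compute, solved for compute
    return site_power_kw / (1.0 + overhead_ratio)


if __name__ == "__main__":
    site_cap_kw = 1000.0  # hypothetical power-limited site
    for ratio in (0.10, 0.25, 0.40):  # illustrative overhead levels
        compute_kw = usable_compute_kw(site_cap_kw, ratio)
        print(f"overhead {ratio:.0%}: about {compute_kw:.0f} kW of compute")
```

Under these assumed numbers, every extra point of overhead comes straight out of the compute the site can host, which is why cooling efficiency matters most where the grid connection is the limit.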

Liquid cooling has therefore moved from a niche solution to a more mainstream consideration for high-density deployments. TrendForce has projected liquid cooling penetration rising from 11% in 2024 to 24% in 2025, as next-generation platforms push rack power higher. As rack power rises, cooling equipment shared across racks can become more practical than in-rack systems, which stop scaling efficiently once rack power becomes the dominant design variable.

At the same time, the term “liquid cooling” covers several different technical approaches. Single-phase systems remove heat by warming a circulating liquid, so higher chip power usually means higher flow and higher pressure. Two-phase systems remove heat through boiling and condensation, so phase change carries much of the heat.
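The single-phase relationship follows from the standard sensible-heat balance: heat removed equals mass flow times specific heat times temperature rise. The sketch below applies it under the assumption that the coolant behaves like plain water; real loops often use water-glycol mixes with somewhat different properties, and the function names are illustrative.

```python
# Approximate properties of plain water (an assumption; actual coolants vary).
WATER_DENSITY_KG_PER_L = 1.0
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0


def single_phase_temp_rise_k(heat_w: float, flow_l_per_min: float) -> float:
    """Coolant temperature rise (kelvin) for a given heat load and flow."""
    if flow_l_per_min <= 0:
        raise ValueError("flow must be positive")
    mass_flow_kg_per_s = flow_l_per_min / 60.0 * WATER_DENSITY_KG_PER_L
    # Q = m_dot * c_p * delta_T, solved for delta_T
    return heat_w / (mass_flow_kg_per_s * WATER_SPECIFIC_HEAT_J_PER_KG_K)


def flow_for_temp_rise_l_per_min(heat_w: float, temp_rise_k: float) -> float:
    """Flow (litres per minute) needed to hold a target temperature rise."""
    if temp_rise_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_per_s = heat_w / (WATER_SPECIFIC_HEAT_J_PER_KG_K * temp_rise_k)
    return mass_flow_kg_per_s / WATER_DENSITY_KG_PER_L * 60.0


if __name__ == "__main__":
    # Doubling chip power at a fixed temperature rise doubles the flow needed.
    for chip_w in (1000.0, 2000.0):
        flow = flow_for_temp_rise_l_per_min(chip_w, temp_rise_k=10.0)
        print(f"{chip_w:.0f} W at a 10 K rise: about {flow:.2f} L/min")
```

With the temperature rise held fixed, flow scales linearly with chip power, which is the "higher flow and higher pressure" effect described above. Two-phase systems sidestep this because latent heat, rather than temperature rise, absorbs most of the load.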

As a rough guide from today’s deployments, single-phase cold plates need around 1.5 litres per minute for every 1,000 watts of processor power. Two-phase approaches can need far less, around 0.3 litres per minute for the same 1,000 watts, because evaporation and condensation carry most of the heat. Exact figures depend on design, but the scaling behaviour is the point: on these numbers, each added kilowatt of chip power calls for roughly a fifth of the extra flow when phase change does the work.
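Those rule-of-thumb figures can be turned into a quick comparison. The sketch below simply multiplies processor power by the per-kilowatt flow rates quoted above; the rack loads are hypothetical and the result is an order-of-magnitude guide, not a design calculation.

```python
# Rough guide values from the text, in litres per minute per kW of processor power.
FLOW_L_PER_MIN_PER_KW = {
    "single_phase": 1.5,
    "two_phase": 0.3,
}


def coolant_flow_l_per_min(processor_power_kw: float, approach: str) -> float:
    """Estimate coolant flow for a processor load using the rule of thumb."""
    if processor_power_kw < 0:
        raise ValueError("processor power must be non-negative")
    if approach not in FLOW_L_PER_MIN_PER_KW:
        raise ValueError(f"unknown approach: {approach}")
    return processor_power_kw * FLOW_L_PER_MIN_PER_KW[approach]


if __name__ == "__main__":
    for rack_kw in (40.0, 80.0, 120.0):  # hypothetical processor load per rack
        single = coolant_flow_l_per_min(rack_kw, "single_phase")
        two = coolant_flow_l_per_min(rack_kw, "two_phase")
        print(f"{rack_kw:.0f} kW rack: single-phase ~{single:.0f} L/min, "
              f"two-phase ~{two:.0f} L/min")
```

Because both estimates are linear in power, the fivefold gap between them stays constant as racks get denser, but the absolute difference in flow, and in the pipework and pumping that go with it, keeps widening.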

Regulatory and sustainability pressures are also pushing the industry in this direction. The EU has established the first phase of a common Union rating scheme for data centres, alongside tighter reporting requirements for larger sites that cover energy- and water-related metrics. When disclosure becomes normal, cooling choices sit inside a wider conversation about resource use and local impact.

Over the next two to three years, planning needs to treat training and inference as distinct infrastructures, even when they share suppliers and silicon. Training strategy should start with power reality and density targets, whereas inference strategy should start with service expectations and operational constraints.

Cooling needs to be considered early in both cases, because it shapes time to deploy and sets the ceiling for density, while also influencing resilience choices where maintenance windows are scarce.

AI-ready now means matching architecture to workload while staying inside the limits set by power availability and local expectations. The split between AI factories and inference is not a temporary phase. It is becoming part of the structure of how AI is being deployed, and leaders who plan for both realities will find it easier to scale without revisiting first principles every year.