Kevin Roof, Director of Offer & Capture Management at LiquidStack, lays out five principles to future-proof cooling as AI campuses scale to gigawatts and racks push toward 1MW.
AI has redrawn the blueprint for the modern data centre. The next generation of sites will be far more than just rows of servers in cavernous halls. We’re now talking about ‘cities of compute’ consuming gigawatts of power, stretching across footprints comparable to entire city districts, and housing millions of GPUs.
The question is not whether we can build them, but how we run them efficiently, reliably and sustainably at this unprecedented scale.
With Meta becoming the latest big tech giant to announce plans for multi-gigawatt AI campuses, the message is clear: cooling will be one of the defining engineering challenges of the decade.
Here are five guiding principles for getting it right.
1. Plan for future silicon, not today’s racks
Cooling strategies must align with tomorrow’s silicon rather than today’s benchmarks. NVIDIA’s roadmap points towards racks climbing from around 100kW into the hundreds of kilowatts, with projections of 600kW and even 1MW per rack in the coming years.
These figures shatter the assumptions of traditional thermal design. Operators who plan cooling around current averages risk constant retrofits, unplanned downtime, and ballooning costs. Instead, they need to model against the exponential trajectory of GPU performance. In practical terms, that means designing infrastructure that doesn’t just handle current technology demands but is robust enough to accommodate multiple generations of silicon innovation without fundamental redesign.
That means assuming megawatt-class racks are just the baseline. This will not only prevent bottlenecks but will also build resilience into the facility. By looking five years ahead, operators buy themselves breathing room to scale capacity without panic retrofits that stall deployment and drive up costs.
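To make the arithmetic concrete, the sketch below estimates the coolant flow a single rack would need as densities climb, using the standard single-phase heat balance Q = ṁ·cp·ΔT. The 10K supply-to-return rise and water properties are illustrative assumptions, not a design specification.

```python
# Back-of-envelope coolant flow estimate for rising rack densities.
# Assumes single-phase, water-based cooling; figures are illustrative only.

WATER_SPECIFIC_HEAT = 4186.0   # J/(kg*K), approximate for water near 30 C
WATER_DENSITY = 1000.0         # kg/m^3, approximate

def required_flow_lpm(rack_power_kw: float, delta_t_k: float = 10.0) -> float:
    """Coolant flow (litres/minute) needed to absorb rack_power_kw at a given
    supply-to-return temperature rise, from Q = m_dot * c_p * delta_T."""
    mass_flow_kg_s = (rack_power_kw * 1000.0) / (WATER_SPECIFIC_HEAT * delta_t_k)
    volumetric_flow_m3_s = mass_flow_kg_s / WATER_DENSITY
    return volumetric_flow_m3_s * 60_000.0  # m^3/s -> litres/minute

for rack_kw in (100, 300, 600, 1000):  # today's racks through to 1MW-class
    print(f"{rack_kw:>5} kW rack -> ~{required_flow_lpm(rack_kw):,.0f} L/min at a 10 K rise")
```

Even this crude model shows the step change: a megawatt-class rack needs roughly ten times the coolant flow of today’s 100kW designs, which is exactly the kind of headroom facilities have to plan for up front.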
2. Think modular, think scalable
Hyperscale AI data centres are rarely deployed in a single, one-off build; they’re phased, layered, and scaled over years, and their cooling strategy must reflect this. Traditional monolithic builds often lead to over-investment early on, potentially leaving vast amounts of underutilised infrastructure sitting idle. A modular, demand-driven scaling approach turns that model on its head.
Skidded, modular coolant distribution platforms allow operators to start small and scale to tens of megawatts as required. Instead of oversizing from day one, capacity is added incrementally, matching the cadence of GPU deployments. This flexibility reduces stranded capital and accelerates time-to-service, enabling operators to light up new areas of the campus without waiting for the entire site to be built out. In short, modularity creates agility: the ability to deploy cooling capacity in step with demand rather than in advance of it.
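As a rough illustration of why phasing matters, the sketch below compares a hypothetical day-one build-out against skidded capacity added as demand ramps. The demand curve and 20MW module size are assumptions chosen purely for illustration.

```python
# Illustrative comparison of day-one oversizing versus modular, phased
# cooling capacity. The demand ramp and module size are hypothetical.

demand_mw = [5, 10, 20, 35, 50, 65, 80, 100]   # assumed quarterly IT load ramp
MODULE_MW = 20                                  # hypothetical skid increment

monolithic_capacity = [100] * len(demand_mw)    # full build-out on day one
modular_capacity = []
installed = 0
for load in demand_mw:
    while installed < load:                     # add skids only as demand requires
        installed += MODULE_MW
    modular_capacity.append(installed)

def stranded_mw_quarters(capacity, demand):
    """Idle cooling capacity summed across the ramp (MW-quarters)."""
    return sum(c - d for c, d in zip(capacity, demand))

print("Monolithic stranded capacity:", stranded_mw_quarters(monolithic_capacity, demand_mw))
print("Modular stranded capacity:   ", stranded_mw_quarters(modular_capacity, demand_mw))
```

Under these assumed numbers, the phased approach carries a fraction of the idle capacity of the day-one build, which is the stranded capital the article refers to.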
3. Design for maintainability and service
Take a lesson from IT architecture: build for redundancy, accessibility, and hot-swappability. In a mega campus with potentially millions of GPUs, downtime caused by a spluttering pump or failing sensor is simply unacceptable. Serviceability must be a first principle of design.
That means front-access units that can be positioned flexibly, components designed for easy replacement, and control systems decoupled from pumping hardware to allow targeted maintenance. Predictive monitoring adds another layer of reliability, using real-time data on flow rates, temperatures, and pressure to spot anomalies before they become failures. If a disk can be swapped without powering down a rack, the same philosophy should apply to cooling: service without disruption.
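As a simple illustration of the predictive-monitoring idea, the sketch below flags sensor readings that drift outside their recent statistical norm. The window size, threshold, and sample flow data are hypothetical; a production system would use richer models across many correlated signals.

```python
# A minimal sketch of predictive monitoring: flag sensor readings that drift
# outside their recent statistical norm before they become failures.
# Window, threshold, and sample data are hypothetical.

from collections import deque
from statistics import mean, stdev

def flag_anomalies(readings, window=20, threshold=3.0):
    """Yield (index, value) for readings more than `threshold` standard
    deviations away from the rolling statistics of the previous `window` samples."""
    history = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value
        history.append(value)

# Example: a pump's flow rate (L/min) that starts sagging mid-stream.
flow_lpm = [150 + (i % 3) * 0.5 for i in range(40)] + [138.0, 131.0, 124.0]
for idx, val in flag_anomalies(flow_lpm):
    print(f"sample {idx}: {val} L/min deviates from the recent baseline")
```

The point is not the specific statistics but the principle: catch the sagging pump or drifting sensor while it is still a maintenance ticket, not an outage.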
4. Keep supply chains as scalable as the technology
A cooling design that looks elegant on paper means nothing if it can’t be manufactured, delivered, and installed at the pace of hyperscale rollouts. Supply continuity is just as important as thermal performance. AI campus operators need partners capable of delivering cooling infrastructure globally, at speed, and at scale.
That requires more than factories – it demands an ecosystem spanning logistics, field engineers, and service technicians who can install, commission, and maintain cooling systems in tandem with GPU deployments. Operators must look for cooling partners who can both provide equipment and deliver ongoing service at global scale. When data centres scale in gigawatts, supply chains must be equally robust and agile.
5. Generate value, not just heat
Mega data centres will inevitably reject colossal amounts of heat, and simply venting it into the atmosphere is no longer acceptable. Communities and regulators will demand better. The opportunity lies in transforming this by-product into a resource.
District heating networks, industrial processes, and agricultural greenhouses all present opportunities for repurposing data centre heat. By integrating these solutions from the outset, operators can reduce environmental impact, improve community relations, and even create new revenue streams. Planning for heat reuse is not just about meeting sustainability targets – it’s about reframing cooling as an enabler of wider social and industrial ecosystems.
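To put rough numbers on the opportunity, the sketch below estimates how much heat a single campus phase might usefully export to a district network. The IT load, recovery efficiency, and per-home demand figures are illustrative assumptions only, not measurements from any specific site.

```python
# Rough arithmetic for the heat-reuse opportunity. All figures are
# illustrative assumptions, not data from any specific campus.

IT_LOAD_MW = 100           # assumed IT load of one campus phase
HEAT_FRACTION = 0.95       # assume ~95% of IT power ends up as low-grade heat
RECOVERY_EFFICIENCY = 0.6  # assumed fraction practically capturable for reuse
HOME_DEMAND_KW = 5         # assumed average heat demand per home on a district network

recoverable_mw = IT_LOAD_MW * HEAT_FRACTION * RECOVERY_EFFICIENCY
homes_served = recoverable_mw * 1000 / HOME_DEMAND_KW

print(f"Recoverable heat: ~{recoverable_mw:.0f} MW thermal")
print(f"Roughly {homes_served:,.0f} homes' worth of heat demand")
```

Even with conservative assumptions, a single 100MW phase could supply heat on the scale of a small town, which is why designing for reuse from day one matters.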
The bottom line
The next generation of AI campuses will present the largest cooling challenge the industry has ever faced – and the clearest opportunity yet to prove liquid cooling’s worth. Success will be defined not just by the raw ability to manage thermal loads, but by the foresight to design for density, modularity, serviceability, supply resilience, and heat reuse. This is not about building cooling for today’s racks, but future-proofing for the next decade of silicon innovation. Operators who embrace these principles will not only keep their AI factories cool, but also ensure they remain competitive, sustainable, and socially acceptable in an era of super-sized compute.

