Gregory Lebourg, Global Environment Director at OVHcloud, believes it’s time to move beyond traditional air cooling and embrace liquid cooling as the future of greener, more efficient data centres, as he explains.
If you’ve spent any time in a data centre or even just using an old laptop, you’ll know how much heat our technology usage generates. On an industrial scale, organisations have had to become smarter about how they efficiently handle heat from servers to avoid incurring a high carbon footprint – not to mention correspondingly high electricity bills.
With the rapid rise of AI over the last 18 months, this challenge has become increasingly acute. According to the World Economic Forum, an AI workload can consume approximately 33 times more power than a standard compute task. Furthermore, GPUs are usually more carbon-intensive to manufacture than CPUs, with studies showing a ‘cradle to grave’ carbon footprint between six and 30 times higher than their CPU counterparts.
Of course, AI systems (and by extension, GPUs) are able to perform tasks far beyond what standard computing environments can achieve by themselves. And even though AI may contribute more to global warming than other computing workloads, AI tools can also track iceberg melt, help us understand deforestation patterns, support recycling efforts and help clean up the oceans far more effectively than conventional tools. AI can help us tackle some of the world’s toughest sustainability challenges – but we must also learn to use it responsibly.
Why it’s time to end our love affair with air cooling
Being smart about what we ‘AI-enable’ and what we don’t is crucial to future sustainability efforts – but so is every single link in the chain. Cooling is one particularly important part of this process, but one of the most significant challenges is our industry’s long-standing and somewhat dysfunctional relationship with air cooling. Air cooling is intrinsically volatile, and we mean that literally.
Air flow is difficult to model. There’s an entire discipline (Computational Fluid Dynamics) dedicated to modelling how gases and liquids move. In confined environments – like enclosed server racks – there’s not a lot of room to manoeuvre, and there’s another challenge to consider: water’s volumetric heat capacity is just over three thousand times that of air, so a far greater volume of air must be moved to carry away the same amount of heat.
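To put that ratio in concrete terms, here is a rough back-of-the-envelope sketch in Python. It uses typical room-temperature property values and assumes a 10 kW rack with a 10-degree coolant temperature rise; all figures are illustrative rather than drawn from any specific facility.

```python
# Back-of-the-envelope: volumetric flow needed to remove a fixed heat load
# with air vs water. Property values are typical room-temperature figures.

HEAT_LOAD_W = 10_000   # assumed: a 10 kW rack
DELTA_T_K = 10         # assumed: allowed coolant temperature rise

# Volumetric heat capacity = density * specific heat capacity, in J/(m^3*K)
AIR_VOL_HEAT_CAP = 1.2 * 1_005        # ~1.2 kg/m^3 * ~1005 J/(kg*K)
WATER_VOL_HEAT_CAP = 1_000 * 4_186    # ~1000 kg/m^3 * ~4186 J/(kg*K)

def required_flow_m3_per_s(heat_load_w, vol_heat_cap, delta_t_k):
    """Volumetric flow (m^3/s) needed so the coolant absorbs heat_load_w
    while warming by delta_t_k, from Q = flow * vol_heat_cap * delta_T."""
    return heat_load_w / (vol_heat_cap * delta_t_k)

air_flow = required_flow_m3_per_s(HEAT_LOAD_W, AIR_VOL_HEAT_CAP, DELTA_T_K)
water_flow = required_flow_m3_per_s(HEAT_LOAD_W, WATER_VOL_HEAT_CAP, DELTA_T_K)

print(f"Air:   {air_flow:.3f} m^3/s")        # roughly 0.8 m^3/s of air
print(f"Water: {water_flow * 1000:.2f} L/s") # roughly 0.24 L/s of water
print(f"Ratio: ~{air_flow / water_flow:.0f}x more air volume than water")
```

The exact numbers depend on the assumptions above, but the three-orders-of-magnitude gap between the two coolants does not.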
In some cases, using air is practical, but there are better ways to cool a large proportion of the data centre estate.
The benefits of giving our chips a bath
People – largely gamers – and data centre operators have been using water cooling for decades. In the past, this has primarily been direct liquid-to-chip cooling (DLC), where a water-filled cooling block is placed on the CPU or GPU. The components heat the water, and the warmed water is then piped away to a dispersal mechanism such as a heat exchanger.
This technology can also be taken further where needed. Immersion cooling, as the name suggests, immerses the entire server in a non-conductive fluid and cools the whole unit at once. One challenge here is that CPUs and GPUs tend to run a lot hotter than the other components (RAM, SSDs, etc.). However, the two technologies can be combined, using DLC to cool the CPU and GPU inside an immersion-cooled server, giving finer-grained control over the cooling process.
One challenge with measuring the impact of water-cooling systems is that both DLC and hybrid immersion tend to reduce the power consumed by the servers themselves as well as the energy required to support them – effectively shrinking both the numerator and the denominator of the PUE calculation. Even so, studies have shown that water-cooling can reduce data centre power consumption by 18%. Furthermore, hybrid immersion systems can cut the power requirement of the cooling systems by approximately 20% compared to DLC, and reduce the energy consumption of the server itself by around 20% compared to air cooling and 7% compared to DLC.
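To illustrate why the headline ratio can be hard to read, here is a minimal sketch with entirely hypothetical numbers, chosen only to show the mechanism rather than taken from any study.

```python
# Hypothetical numbers (not from any study), chosen only to illustrate why PUE
# is a tricky yardstick here: liquid cooling lowers IT power (the denominator)
# as well as cooling power (part of the numerator), so total energy can fall
# substantially while the PUE ratio moves comparatively little.

def pue(it_kw, cooling_kw, other_overhead_kw):
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw

air = dict(it_kw=1000, cooling_kw=300, other_overhead_kw=100)    # air-cooled baseline
liquid = dict(it_kw=800, cooling_kw=180, other_overhead_kw=100)  # liquid-cooled case

total_air = sum(air.values())
total_liquid = sum(liquid.values())

print(f"Air-cooled PUE:    {pue(**air):.2f}")      # 1.40
print(f"Liquid-cooled PUE: {pue(**liquid):.2f}")   # 1.35
print(f"Total energy cut:  {(1 - total_liquid / total_air) * 100:.0f}%")  # ~23%
```

In this sketch the facility cuts total energy by almost a quarter while the PUE ratio improves only marginally, which is why absolute power savings are often the clearer measure of what liquid cooling achieves.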
What to do with our excess heat?
Clearly, water-cooling does need careful planning: maintenance is unambiguously more complicated, especially for immersion systems, although DLC has been deployed at scale for some years now.
And at the risk of being simplistic (and regardless of whether organisations use DLC, immersion cooling or even air), data centres generate a lot of heat. This heat has to be dispersed – or, better yet, used. Many of us will have seen the example of a council swimming pool heated by the waste heat from a local data centre, but this use case presents an interesting conundrum for a lot of organisations.
Even for water-cooled data centres, making use of waste heat can be tough. For example, the temperature difference between the cold water entering the premises and the warm water leaving it is often quite small. In the best cases it can be as high as twenty degrees, which means the warm water leaving the facility may only be as hot as water used for domestic purposes (40-50°C).
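As a rough illustration of what such a loop can still deliver, here is a small sketch using the standard Q = ṁ·c·ΔT relationship. The flow rate and temperatures are assumptions chosen purely for the example.

```python
# Illustrative estimate of the heat carried by a warm return loop.
# Flow rate and temperatures are assumptions for the sake of the example.

WATER_DENSITY = 1000        # kg/m^3
WATER_SPECIFIC_HEAT = 4186  # J/(kg*K)

def recoverable_heat_kw(flow_l_per_s, supply_temp_c, return_temp_c):
    """Thermal power in the loop: Q = mass_flow * specific_heat * delta_T."""
    mass_flow = flow_l_per_s / 1000 * WATER_DENSITY   # kg/s
    delta_t = return_temp_c - supply_temp_c           # K
    return mass_flow * WATER_SPECIFIC_HEAT * delta_t / 1000  # kW

# e.g. 5 litres per second entering at 25 degC and leaving at 45 degC
print(f"{recoverable_heat_kw(5, 25, 45):.0f} kW of low-grade heat")  # ~419 kW
```

Even at modest temperatures, the energy is there; the difficulty is that 40-50°C water is low-grade heat, useful for pools or district heating but not much else without further investment.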
Adapting the temperature of the ‘cold pipe’ and/or increasing the temperature difference can be envisaged, but raising the overall temperature of the system requires a lot of R&D effort. However, this difference can be changed, and with a little dedication it’s possible to harness what is essentially ‘free’ energy, heating offices or nearby buildings with the waste heat.
What does the future hold?
On one level, the argument for developing more water-cooling systems in data centres is simple and straightforward: better thermal conductivity means more efficient cooling, which means lower power consumption (and smaller electricity bills).
But as we’ve said, we also need to look at the bigger picture, including the ability of water-cooling to improve sustainability as a whole: reduced power and water consumption, longer lifespans for server components and, in turn, less need to mine rare earths and metals. Water-cooling – particularly advanced systems like immersion (and hybrid immersion) cooling – can also cool hotter servers more effectively, such as those running intensive AI applications or training models.
However, we should also question everything from end to end, including asking ourselves: ‘Bearing in mind the cost, do we need AI for this? Is AI the best way to do this?’ There’s no doubt that AI can help to supercharge our scientific progress, development and productivity, but it also comes with a cost.
We need responsible AI, and that means looking at the big picture and the small details simultaneously. Water-cooling – or, more accurately, liquid-cooling – is a key piece of this puzzle, representing a way of keeping energy consumption and costs down, enabling our newest industries and, ultimately, helping to make sure that we can serve today’s needs without compromising the world of tomorrow.