The four techniques you need to know to cool AI data centres

Alan Farrimond
Vice President of Data Center Solutions at Wesco

Alan Farrimond, Vice President of Data Center Solutions at Wesco, believes the rise of AI is driving high-density demands that traditional air cooling alone can’t handle. Here, he explains the four critical cooling methods teams should consider.

The growing need for high-density data centres to support AI and other demanding workloads is driving more data centre teams to adopt liquid cooling rather than more traditional air-cooling methods for the IT critical load. But outside of hyperscalers, many data centre teams are still learning the different liquid-cooling methods that exist and how they can make sure deployment goes smoothly.

There are four base cooling design options to consider: traditional hot/cold aisle containment, rear-door heat exchangers, direct-to-chip cooling, and immersion cooling. The latter three are liquid-cooling methods and far outperform traditional air cooling, which on its own may be insufficient for the power-hungry racks in high-density data centres.

Each of these options has its own unique performance ranges and deployment and maintenance demands that need to be considered.

Let’s explore each of the liquid cooling options and what data centre teams should be thinking about so they can choose the right option for their project.

1. Traditional hot and cold aisle cooling

Traditional hot and cold aisle cooling is a widely used method in data centres. This technique involves arranging server racks in alternating rows, with cold-air intakes facing a shared cold aisle and hot-air exhausts facing the opposite hot aisle.

The cold aisle is supplied with cool air from computer-room-air-conditioner (CRAC) and computer-room-air-handler (CRAH) units, while the hot aisle collects the heated air and returns it to the cooling units for re-cooling. This separation helps to prevent the mixing of hot and cold air, thereby improving cooling efficiency and reducing energy consumption. However, this technique is generally restricted to circa 15-20kW per cabinet.
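To get a rough sense of why air alone tops out around this density, the sensible-heat relationship (heat removed equals mass flow times specific heat times temperature rise) can be used to estimate the airflow a cabinet needs. The sketch below is illustrative only; the air properties and the 12°C aisle-to-aisle temperature rise are assumptions, not figures from any particular design.

    # Rough airflow estimate for an air-cooled cabinet (illustrative assumptions only).
    # Sensible heat: Q = m_dot * cp * dT, using typical air properties.

    AIR_DENSITY = 1.2     # kg/m^3 at data-centre conditions (assumed)
    AIR_CP = 1.005        # kJ/(kg.K) (assumed)
    M3S_TO_CFM = 2118.88  # m^3/s to cubic feet per minute

    def required_airflow_cfm(rack_kw: float, delta_t_c: float = 12.0) -> float:
        """Airflow needed to carry rack_kw of heat at the given cold-to-hot aisle temperature rise."""
        mass_flow = rack_kw / (AIR_CP * delta_t_c)  # kg/s
        volume_flow = mass_flow / AIR_DENSITY       # m^3/s
        return volume_flow * M3S_TO_CFM

    for kw in (15, 20, 50):
        print(f"{kw} kW rack -> ~{required_airflow_cfm(kw):,.0f} CFM")

On those assumptions, a 15-20kW rack already needs roughly 2,000-3,000 CFM, and a 50kW rack more than double that, which is the practical pressure pushing teams toward liquid.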

2. Rear-door heat exchangers

This technology is commonly used by data centres that are switching to liquid cooling because it provides an efficient and complete cooling solution.

Rear-door heat exchangers sit on the back of a cabinet and capture hot air from IT equipment before it enters the white space. The captured heat is transferred via a coil to a chilled water source, and the cooled air is then discharged out of the back of the cabinet.

A benefit of rear-door heat exchangers is that they can often remove 100% of the heat generated by the server without requiring other heat-dissipating technologies. They’re also room neutral: the air discharged from the cabinet is at the same temperature as the room’s ambient air.

Rear-door heat exchangers generally support power densities of up to around 85-90kW per rack. Some manufacturers state their technology can support up to 200kW, but these offerings are typically designed for specialised use cases or cabinets. When considering rear-door heat exchangers that claim to support more than 90kW, verify that they’re appropriate for the application and cabinet where they’ll be deployed.
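For a sense of scale on the water side, the same heat-balance idea applies to the chilled water feeding the rear door. The supply and return temperatures below are assumptions for illustration, not a vendor specification.

    # Illustrative chilled-water flow estimate for a rear-door heat exchanger.
    # Water carries far more heat per unit volume than air, which is what makes
    # high rack densities manageable.

    WATER_CP = 4.186   # kJ/(kg.K)
    LPS_TO_LPM = 60.0  # litres per second to litres per minute (water ~1 kg/L)

    def chilled_water_flow_lpm(rack_kw: float, supply_c: float = 18.0, return_c: float = 28.0) -> float:
        """Water flow needed to absorb rack_kw across the supply-to-return temperature rise."""
        mass_flow = rack_kw / (WATER_CP * (return_c - supply_c))  # kg/s, ~litres/s for water
        return mass_flow * LPS_TO_LPM

    for kw in (50, 90):
        print(f"{kw} kW rack -> ~{chilled_water_flow_lpm(kw):.0f} L/min of chilled water")

Even a 90kW rack comes out at around 130 L/min on these assumptions, a reminder that the chilled-water infrastructure, not just the door itself, has to be sized for the load.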

3. Direct-to-chip cooling

This cooling method, also known as direct liquid cooling, circulates coolant across a heat sink or cold plate mounted directly on the chips inside the equipment, removing heat at its source before it can be exhausted into the room as hot air.

There are two options for direct-to-chip cooling. The most common is single-phase cooling. Here, the coolant or water that has absorbed the IT equipment’s heat is moved to a coolant distribution unit (CDU), which transfers the heat to a larger loop. The cooler fluid is then pumped back to the hardware as part of a continuous cycle.

In the two-phase option, the liquid that absorbs the heat is boiled off. The resulting vapor then condenses back into a liquid and cycles back through the system.

Direct-to-chip cooling typically supports up to 100kW per rack, although in some cases it can go as high as 120kW. A trade-off is that it only cools the chip, not the rest of the cabinet, so it needs to be paired with another solution such as rear-door heat exchangers. Because it isn’t a full-cabinet solution on its own, it’s typically not the first choice.
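A rough comparison of the two direct-to-chip approaches shows why two-phase systems can move the same heat with less fluid: single-phase relies on sensible heat (flow times specific heat times temperature rise), while two-phase relies on the latent heat of vaporisation. The coolant properties below are assumed for illustration only.

    # Illustrative single-phase vs two-phase comparison for direct-to-chip cooling.

    COOLANT_CP = 3.8  # kJ/(kg.K), assumed water-glycol mix for single-phase
    DELTA_T = 10.0    # K rise across the cold plates, assumed
    H_FG = 100.0      # kJ/kg, assumed latent heat for an engineered two-phase fluid

    def single_phase_flow_kg_s(heat_kw: float) -> float:
        """Mass flow needed when the coolant only warms up (sensible heat)."""
        return heat_kw / (COOLANT_CP * DELTA_T)

    def two_phase_flow_kg_s(heat_kw: float) -> float:
        """Mass flow needed when the coolant boils off (latent heat)."""
        return heat_kw / H_FG

    heat = 100.0  # kW per rack, the upper end of typical direct-to-chip support
    print(f"Single-phase: ~{single_phase_flow_kg_s(heat):.1f} kg/s of coolant circulated")
    print(f"Two-phase:    ~{two_phase_flow_kg_s(heat):.1f} kg/s of fluid boiled off")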

4. Immersion cooling

Fully submerging servers or racks in liquid is the final closed-loop liquid-cooling option.

This process uses an inert and non-conductive dielectric fluid in an immersion enclosure to absorb heat generated by servers, GPUs and other associated hardware. The heated fluid is then circulated from the enclosure to a cooling system where the heat is extracted, such as with heat exchangers or direct liquid-to-liquid cooling. Next, the cooled fluid is transferred back to the enclosure.
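As with the other methods, the circulation rate through the enclosure can be estimated from a heat balance. The figures below are assumptions for a generic single-phase dielectric fluid, whose specific heat is typically lower than water’s, so more fluid has to move for the same load.

    # Illustrative flow estimate for a single-phase immersion enclosure.

    DIELECTRIC_CP = 1.5        # kJ/(kg.K), assumed for a single-phase dielectric fluid
    DIELECTRIC_DENSITY = 0.85  # kg/L, assumed
    DELTA_T = 10.0             # K rise across the enclosure, assumed

    def immersion_flow_lpm(enclosure_kw: float) -> float:
        """Litres per minute of dielectric fluid needed to carry enclosure_kw to the heat exchanger."""
        mass_flow = enclosure_kw / (DIELECTRIC_CP * DELTA_T)  # kg/s
        return mass_flow / DIELECTRIC_DENSITY * 60.0          # L/min

    print(f"100 kW enclosure -> ~{immersion_flow_lpm(100):.0f} L/min of dielectric fluid")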

A drawback of immersion is that it hinders maintenance. If equipment needs to be repaired, for instance, workers need to lift it from the immersion liquid, often using a gantry, and then wait for it to dry. This has largely limited the use of immersion cooling to applications like bitcoin mining where the hardware doesn’t experience regular change and maintaining equipment is not a high priority.

The maintenance pain point can be addressed with a two-phase option for immersion cooling. This approach uses a dielectric liquid that boils off as it captures the heat generated by the equipment. The vapor that is produced is then captured, condensed and cycled back through the system (or not, if maintenance needs to be done). However, two-phase immersion is still early in its adoption.

Other considerations before deploying

After a data centre team decides on a liquid cooling approach, keeping a few things top of mind during planning and implementation can help them avoid surprises.

First among them is to plan for the entire system, not just the technology.

Sometimes, data centre teams charge ahead with building and acquiring a bill of materials for a liquid-cooling project before they understand what will be needed to make those materials work together. This can create roadblocks when they discover the installation contractor they hired isn’t familiar with creating a secondary water or fluid loop, which is required to remove heat loads and is separate from the primary loop that cools the entire facility.
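One simple way to picture the two loops is as a heat exchange inside the CDU: the secondary (technology) loop can only ever be supplied a few degrees warmer than the primary (facility) loop feeding it, a gap often called the approach temperature. The temperatures in this sketch are assumptions for illustration.

    # Illustrative primary/secondary loop relationship across a CDU heat exchanger.

    PRIMARY_SUPPLY_C = 20.0   # facility water supply temperature to the CDU, assumed
    CDU_APPROACH_C = 3.0      # heat-exchanger approach, assumed
    SECONDARY_DELTA_T = 10.0  # rise across the cold plates or rear doors, assumed

    secondary_supply = PRIMARY_SUPPLY_C + CDU_APPROACH_C
    secondary_return = secondary_supply + SECONDARY_DELTA_T

    print(f"Secondary loop supply to the racks: ~{secondary_supply:.0f} C")
    print(f"Secondary loop return to the CDU:   ~{secondary_return:.0f} C")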

Addressing the day-two considerations for a liquid-cooling system is also important. Equipment like the CDUs used in direct-to-chip cooling systems is far less tolerant of contaminants than the CRAC and CRAH units that have been used in data centres for decades. Data centre teams now need to think about contamination at the microscopic level, for example by using additives like inhibitors that help prevent bacterial growth.

Teams should also be thinking about how they can standardize their liquid-cooling system if their scope extends beyond one data centre.

This is where a supply chain partner with a wide reach and liquid-cooling know-how can be especially helpful. They can help create a basis for design that includes not only stamp-ready drawings but also installation and commissioning packages, essentially creating a playbook for every deployment. This can save significant time, effort, and money compared to treating each liquid-cooling deployment as a bespoke project.

Keep cool and carry on

Growing power demands are making liquid cooling a necessity in today’s data centres. By understanding the nuances of the different liquid-cooling options, data centre teams can choose the best cooling strategy for their facility and manage the risks that come with bringing liquids into the white space.
