When it comes to cooling your facility, could you be overspending? According to Anuraag Saxena, optimisation manager at EkkoSense, you probably are.
When we recently surveyed 133 data centre halls, our granular thermal analysis found that the average data centre cooling utilisation level was just 40%. That's only slightly better than the 34% figure revealed by our data centre cooling research back in 2017.
With the renewed focus on initiatives to address the climate crisis, notably the USA’s recent decision to rejoin the Paris climate accord, there’s increased pressure on organisations and their high energy users to make serious carbon reductions. It’s essential that IT operations do everything they can to cut carbon immediately and help organisations meet their public net zero commitments.
Given this agenda, why are so many organisations still so inefficient in their data centre cooling? With cooling utilisation at just 40%, it’s clear that facilities are overspending on their cooling energy costs. We estimate that the industry could make cumulative savings of over £1.2 billion by optimising its data centre cooling performance.
That’s a potential worldwide emissions reduction of over 3.3m tonnes CO2-eq. per annum – equivalent to the energy needed to power around one million UK homes for a year!
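As a rough illustration of the arithmetic behind this kind of overspend at single-site level, the sketch below converts an under-utilised cooling duty into annual cost and carbon figures. Every number in it is an assumption chosen for the example, not a figure from our survey:

```python
# Back-of-envelope sketch: cost and carbon of running more cooling duty
# than the IT load requires. All numbers are illustrative assumptions.

IT_LOAD_KW = 140             # heat actually produced by the IT equipment
COOLING_DUTY_KW = 350        # cooling duty currently being delivered
COP = 3.0                    # assumed plant coefficient of performance
TARIFF_GBP_PER_KWH = 0.15    # assumed electricity price
GRID_KGCO2E_PER_KWH = 0.2    # assumed grid carbon intensity
HOURS_PER_YEAR = 8760

utilisation = IT_LOAD_KW / COOLING_DUTY_KW      # 40%, matching the survey average
excess_duty_kw = COOLING_DUTY_KW - IT_LOAD_KW   # cooling doing no useful work
excess_electrical_kw = excess_duty_kw / COP     # electricity behind that duty

annual_kwh = excess_electrical_kw * HOURS_PER_YEAR
print(f"Cooling utilisation: {utilisation:.0%}")
print(f"Annual overspend: £{annual_kwh * TARIFF_GBP_PER_KWH:,.0f}")
print(f"Annual emissions: {annual_kwh * GRID_KGCO2E_PER_KWH / 1000:,.1f} tCO2e")
```

Numbers of this shape, multiplied across thousands of halls, are what sit behind industry-level estimates like the ones above.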
We know operations teams do an amazing job of keeping their facilities running, particularly given huge increases in compute demand, so why has poor cooling utilisation become such a widespread issue? From an optimisation perspective, there are three factors at play:
- An over-reliance on often outdated design specifications
- A determination to deliver against data centre SLAs that aren’t necessarily applicable today
- An absence of true granular visibility into data centre performance
Too much adherence to historic design specs?
Often the problem goes back to the initial data centre design specification – which could be anything up to 10 or 15 years old. Perhaps the original specification called for a maximum capacity of 350 kilowatts, and that has remained the cooling capacity applied ever since.
However, these legacy decisions aren’t always directly communicated to the facilities team – who are busy dealing with the data centre every day. And of course, things change: compute loads, data centre management teams and facilities engineers all evolve, so the gap between the original design and today’s reality can quickly widen.
For example, we’ve seen sites cooling for their original design capacity while running at just a quarter of that load. And while this is an obvious case of over-cooling, nobody ever really saw it as a problem – because the data centre, operating at an average IT rack inlet temperature far lower than necessary, was never going to breach its critical thermal SLAs.
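A minimal sketch of how this design-spec gap can be surfaced, assuming you hold the original design capacity and a measured IT load for each hall. The hall names, figures and the 1.5x threshold are all hypothetical:

```python
# Hypothetical check for the design-spec gap described above: flag halls
# where the running cooling duty far exceeds today's measured IT load.
from dataclasses import dataclass

@dataclass
class Hall:
    name: str
    design_cooling_kw: float    # capacity from the original design spec
    measured_it_load_kw: float  # what the racks actually draw today

def overcooling_ratio(hall: Hall) -> float:
    """kW of cooling delivered per kW of IT heat load."""
    return hall.design_cooling_kw / hall.measured_it_load_kw

halls = [
    Hall("Hall A", design_cooling_kw=350, measured_it_load_kw=90),   # ~a quarter of design
    Hall("Hall B", design_cooling_kw=200, measured_it_load_kw=170),
]

for hall in halls:
    ratio = overcooling_ratio(hall)
    if ratio > 1.5:  # threshold chosen purely for illustration
        print(f"{hall.name}: cooling at {ratio:.1f}x the IT load - review against today's reality")
```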
Inflexibility of many data centre SLAs
Too many data centres are still locked into rigid SLAs around uptime, which means their focus and priorities remain heavily centred on risk avoidance. Moreover, a significant number of data centres govern these SLAs using only a few sensors, which may themselves be placed in locations that aren’t in line with ASHRAE guidance.
Until now this has removed any real incentive to optimise data centre cooling energy consumption. While uptime is obviously the prime driver for facilities teams, SLAs that were defined at the data centre design stage become less and less relevant as time moves on.
For example, today’s data centre infrastructure is much more efficient and can run at higher temperatures than equipment that was specified five or ten years ago. At the same time, many data centres run on much tighter margins now than those enjoyed by previous operations teams.
Can today’s facilities teams really afford to keep adding expensive cooling hardware, especially if they’ll have to keep paying for it over the next five-plus years?
Lack of granular insight into real-time data centre performance
However, perhaps the biggest barrier to effective data centre cooling utilisation is poor visibility – actually being able to see what’s going on across your site or estate in real time.
Unfortunately, the reality for many operators is that they don’t have access to the tools that can help them make smart data centre performance choices. For example, it may be that you could run your cooling system more efficiently with different control settings – but how would you know?
That’s why it’s important for data centre operations teams to be able to gather and visualise thermal, power/energy and space data at a much more granular level – down to individual IT racks and data hall CRAC/CRAH units as a minimum.
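To make that concrete, here’s a minimal sketch of the kind of rack-level compliance view such granular data enables. The rack names and readings are hypothetical example data; the 18–27°C band is ASHRAE’s recommended inlet-temperature envelope for air-cooled IT equipment:

```python
# Sketch: rack-level thermal compliance from granular inlet readings.
# Rack names and temperatures are hypothetical example data.

ASHRAE_RECOMMENDED_C = (18.0, 27.0)  # ASHRAE recommended inlet envelope

rack_inlet_temps_c = {
    "rack-A01": 19.4,
    "rack-A02": 16.8,   # below the band: a sign of over-cooling
    "rack-B07": 24.1,
}

def compliance_report(temps: dict[str, float]) -> None:
    low, high = ASHRAE_RECOMMENDED_C
    compliant = [r for r, t in temps.items() if low <= t <= high]
    print(f"Compliant racks: {len(compliant)}/{len(temps)} "
          f"({len(compliant) / len(temps):.0%})")
    for rack, temp in sorted(temps.items()):
        if temp < low:
            print(f"  {rack}: {temp:.1f}C below band (likely over-cooled)")
        elif temp > high:
            print(f"  {rack}: {temp:.1f}C above band (thermal risk)")

compliance_report(rack_inlet_temps_c)
```

Over-cooled racks show up just as clearly as hot ones – exactly the visibility that an averaged, few-sensor view misses.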
We’ve already seen how adopting this kind of software-driven optimisation approach can help organisations reduce their cooling energy usage by 30% – while also reducing risk by optimising sites to 100% rack-level compliance with ASHRAE’s recommended thermal ranges.