Sustainability continues to hit the headlines, as the pressures of social and regulatory compliance mount. With 4% of global carbon emissions attributed to ICT services (set to grow to 14% by 2040), there is little doubt that the data centre industry must act.
At Keysource, we are seeing a growing number of customers committing to sustainability/net zero targets and asking us to help them understand the journey to achieving their IT and data centre related goals.
Know what you’ve got
Our starting point is simple – organisations need visibility of the utilisation of their compute before they can optimise it. This means that deployment of DCIM (Data Centre Infrastructure Management) is the critical first step, as it enables organisations to see the compute, storage and networking they operate, where it is and, crucially, what the hardware is.
Recent enhancements in software technology now allow us to interrogate IT (through management interfaces and industry-standard monitoring protocols such as IPMI) to understand its actual utilisation, and this presents a number of opportunities for optimisation that can lead to a more sustainable solution. It could show, for example, that current servers are under-utilised and could be consolidated, or that a technology refresh is needed, replacing equipment with new, more efficient hardware.
One recent project identified several dormant or ‘ghost’ servers that were still drawing significant power. We see this as a quick win, and our experience in this area shows that most organisations could cut the number of servers considerably, or deliver much more compute with their existing investment. Our findings show that, on average, compute is only about 16% utilised, and once we have the data we are regularly able to raise this to as high as 60%.
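The kind of screening described above can be sketched in a few lines. The data field names and thresholds below are illustrative assumptions for the example – in practice the readings would come from IPMI or a DCIM platform, not a hard-coded list:

```python
# Hypothetical sketch: flagging "ghost" and under-utilised servers from
# average CPU utilisation and power draw. Thresholds are assumed policy
# values, not industry standards.

GHOST_CPU_PCT = 2       # assumed: near-zero CPU suggests a dormant server
UNDERUSED_CPU_PCT = 20  # assumed consolidation threshold

def classify_servers(samples):
    """samples: list of dicts with 'name', 'avg_cpu_pct', 'power_w'."""
    ghosts, underused = [], []
    for s in samples:
        if s["avg_cpu_pct"] < GHOST_CPU_PCT and s["power_w"] > 0:
            ghosts.append(s["name"])      # drawing power, doing no work
        elif s["avg_cpu_pct"] < UNDERUSED_CPU_PCT:
            underused.append(s["name"])   # candidate for consolidation
    return ghosts, underused

# Example fleet (invented figures):
fleet = [
    {"name": "app-01", "avg_cpu_pct": 1.0,  "power_w": 180},
    {"name": "app-02", "avg_cpu_pct": 16.0, "power_w": 210},
    {"name": "db-01",  "avg_cpu_pct": 55.0, "power_w": 320},
]
ghosts, underused = classify_servers(fleet)
print(ghosts, underused)  # → ['app-01'] ['app-02']
```

The same pass over the data gives both the quick win (retire the ghosts) and the consolidation candidates.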
It is also worth noting that every server contains 23 of the world’s 30 most critical raw materials, including tantalum, cobalt, gold, neodymium and antimony, which are in severely short supply and whose extraction carries a significant social impact.
There is something else here too. If we can start to get visibility of utilisation, we can start to drive accountability for it. With optimisation statistics on individual servers, we can start to make platform managers and business segments accountable for their compute. Couple this with information on the power draw and we can start to monetise the cost of the inefficiencies.
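Monetising the inefficiency is straightforward arithmetic once the power draw is known. The tariff, PUE and wattage below are assumed figures for illustration only:

```python
# Illustrative only: turning measured idle power into an annual cost that
# can be charged back to a platform owner. Tariff, PUE and wattage are
# assumptions, not measured values.

def annual_idle_cost(idle_power_w, tariff_per_kwh, pue=1.5):
    """Annual cost of power drawn by idle compute, grossed up by PUE."""
    hours_per_year = 8760
    kwh = idle_power_w / 1000 * hours_per_year * pue
    return kwh * tariff_per_kwh

# e.g. one 180 W ghost server at £0.25/kWh in a facility with a PUE of 1.5
print(round(annual_idle_cost(180, 0.25), 2))  # → 591.3
```

Put a figure like that against each under-utilised server and accountability conversations with platform managers become much easier.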
So, once you have created the visibility that enables optimised IT, the next stage is to leverage this information to keep driving efficiency. It is currently recognised as good practice to ensure that our data centres are scalable and have good levels of resilience, so they are often designed for ‘day 2 load’ – meaning we can accommodate an ever-expanding IT estate should we need to.
This, however, poses as many challenges as it solves. ‘Day 1 loads’ are invariably much lower than the capacity of the infrastructure, often resulting in ‘low load’ operation. Low load often means we run with a greater level of resilience than we need (N+3 instead of the design N+1), so we have more M&E powered than necessary. Another consequence of low load is that M&E systems are often not as efficient as designed – for example, they might be overcooling, with low return air temperatures and reduced free cooling.
So how is this relevant? Well, the technologies used to understand IT utilisation (including intelligent BMS) can also provide power draw information, giving us a clear picture of our IT load. We now know our IT load versus our available M&E capacity. If we leverage intelligent BMS or control technologies, we can write dynamic programs to match our operating M&E to the requirements of the IT load. In short, we can shut down M&E we don’t need to be running! This reduces cost and is likely to increase the efficiency of the remaining M&E, while the dynamic controls continue to monitor for IT load changes and automate bringing M&E systems in and out of service.
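The core decision in such a dynamic program can be sketched as follows. The unit capacity and load figures are assumptions for the example, not readings from a real BMS:

```python
import math

# Sketch of the dynamic-matching idea: given the live IT load, decide how
# many cooling/power units actually need to run to preserve N+1 redundancy,
# so the rest can be shut down. All figures are illustrative assumptions.

def units_to_run(it_load_kw, unit_capacity_kw, redundancy=1):
    """Minimum units needed to carry the load, plus the redundancy margin."""
    needed = math.ceil(it_load_kw / unit_capacity_kw) if it_load_kw > 0 else 0
    return max(needed, 1) + redundancy  # always keep at least one duty unit

# A hall with six 200 kW CRAC units but only 250 kW of day-1 IT load:
print(units_to_run(250, 200))  # → 3 (2 duty + 1 redundant; 3 can be off)
```

A real control program would add hysteresis and staged start/stop sequencing so units are not cycled on every small load change, but the capacity check above is the heart of it.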
Bridging the gap
This approach could be particularly interesting to customers whose IT resides in colocation facilities. Understandably, colocation providers are risk averse, SLA focused and happy to provide for ‘day 2 loads’; since the end user is ultimately paying for the power, there is little appetite to introduce the risk associated with ‘turning stuff off’.
Creating predictive algorithms to identify when systems are likely to fail
We can also use these technologies to react quickly to failures in M&E, including predictive algorithms (such as comparing system run hours against MTBF) to identify when systems are likely to fail, and to pre-empt operations – such as pre-running chillers when ambient temperatures begin to rise in spring.
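A run-hours-versus-MTBF check of the kind mentioned above is simple to sketch. The 80% flag point is an assumed policy and the equipment names and figures are invented for illustration:

```python
# Hedged sketch: flag a unit for inspection once its accumulated run hours
# approach the manufacturer's MTBF. The 80% flag fraction is an assumed
# policy choice, not a standard.

def maintenance_due(run_hours, mtbf_hours, flag_fraction=0.8):
    """True once the unit has consumed the chosen fraction of its MTBF."""
    return run_hours >= flag_fraction * mtbf_hours

pumps = {"pump-A": 41_000, "pump-B": 12_500}  # accumulated run hours (invented)
MTBF = 50_000                                 # assumed manufacturer figure

due = [name for name, hours in pumps.items() if maintenance_due(hours, MTBF)]
print(due)  # → ['pump-A']
```

Because dynamic M&E matching rotates which units actually run, run hours accumulate unevenly – which is exactly why servicing on recorded hours rather than the calendar pays off.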
These predictive technologies can in turn also contribute to sustainability goals. We can leverage technology to understand M&E equipment run hours and adapt our planned maintenance programmes to service equipment when it is needed, not simply on a fixed calendar schedule. Consider your Scope 3 emissions (which include your supply chain) and the savings that could be made by avoiding unnecessary travel and the replacement of consumables you don’t yet need. The opportunities are huge.
It is estimated that the data centre industry will consume a fifth of global energy by 2025, so energy efficiency and sustainability will continue to be one of the most important (if not the most important) topics in our industry.
DCIM can play a vital role in helping us to make informed choices by collecting and leveraging data, enabling the technology to drive value and reduce power usage and carbon. However, the software won’t do this alone; it needs to be part of a broader, consistent approach. We need to address these challenges through better design, engineering and operation, and through wider mutual collaboration across the whole lifecycle.
We are rightly in the sights of wider reporting and regulatory compliance, and for an industry that is already significantly, if indirectly, regulated, this is a challenge. The stakes are high and we need to act now.