Paul Morrison, HPC/AI Infrastructure Consultant, and Venessa Moffat, DCA Advisory Board, explore the potential of artificial intelligence (AI) in data centres.
As data centre operations leaders juggle complex colocation, on-premises, and multi-cloud models, AI presents both a challenge and an opportunity. Adoption requires a strategic mindset and discerning governance to address risks around integration with existing tools and infrastructure, cybersecurity vulnerabilities, and possibly even ethical implications.
Once implemented, however, AI promises to enhance human capabilities and radically transform operations. It also presents a salient opportunity to improve sustainability, from optimising power and cooling to forecasting workloads for resource efficiency. Both are key concerns as data centre energy demands escalate.
If implemented carefully, AI can transform data centres from rigid, process-led facilities into adaptive ecosystems, bringing a step change in insight, efficiency, and capability. But the humans behind the machines are responsible for shaping that future with care, because AI's trajectory will follow the principles built into its architecture. Leaders face a choice: employ AI merely as a tool for tactical optimisations, or embrace it as a collaborator that extends human potential more radically.
Revolutionising efficiency and resilience
To date, process-driven approaches to preventing data centre outages have not reduced the number of downtime incidents, or their severity, as much as expected. In fact, the statistics are heading in the wrong direction. The Uptime Institute recently reported that over 60% of outages now cost more than $100,000, up from 39% in 2019, while the share of outages costing over $1 million rose from 11% to 15%.
Rather than merely removing humans from the loop, AI presents an opportunity to augment our best capabilities, putting people back in control with enhanced insight and reduced complexity. With proper governance and strategy, AI could succeed where policy-led efforts have fallen short.
For example, machine learning algorithms could analyse historical telemetry, infrastructure topology, and documented failure scenarios to identify risk patterns that human operators find difficult to discern across siloed data sets. Operators drawing on these AI-generated insights could then take considered, data-driven action to address vulnerabilities before outages occur.
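A deliberately simple sketch of the kind of pattern-spotting described above is a rolling z-score test that flags telemetry readings lying far outside their recent range. It is a statistical stand-in for the machine learning the authors have in mind and works on a single series rather than joining siloed data sets; the function name, the 12-sample window, the threshold of 3, and the temperature data are all illustrative assumptions.

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=12, threshold=3.0):
    """Return the indices of readings that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` readings."""
    if window < 2:
        raise ValueError("window must hold at least two readings")
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]  # trailing window, excluding reading i
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if readings[i] != mu:
                flagged.append(i)
        elif abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical rack inlet temperatures (°C), one reading every five minutes.
inlet_temps = [22.0, 22.1, 21.9, 22.0, 22.2, 21.8, 22.0,
               22.1, 21.9, 22.0, 22.1, 22.0, 27.5, 22.0]
print(flag_anomalies(inlet_temps))  # → [12]
```

One weakness of this naive approach is that a spike stays in the trailing window for the next twelve samples and inflates sigma, which can mask a second anomaly; production systems would use more robust statistics or learned models.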
By applying analytical capabilities at scale, AI can optimise workloads, infrastructure, and staffing in ways that were not previously practical. Machine learning will enable predictive, rather than reactive, maintenance and management.
Specifically, AI could enable advances like:
- Predictive diagnostics for assets based on telemetry analysis, reducing downtime by making repairs before failures rather than after
- Workload balancing that adapts to live demand rather than following static models, preventing overprovisioning of power and compute
- Intelligent utility grid integration, with the data centre acting as a supply and demand partner for power and excess heat
- Automated regulatory compliance through rapid data processing and documentation, reducing audit preparation time and costs
- Local optimisation via distributed learning algorithms, improving resilience through greater autonomy at the edge
- Virtual assistants that support team collaboration, amplify technician productivity, and reduce burnout
- Autonomous infrastructure calibration that dynamically adjusts cooling, power, networking, storage, and chip-level compute in real time
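The first item on that list, predictive diagnostics, can be illustrated with a minimal sketch: fit a straight-line trend to recent readings from one asset and estimate how long remains before an alarm limit is crossed. Real predictive maintenance would use richer models; the linear least-squares trend, the function name, the vibration readings, and the 5.0 mm/s limit are all assumptions made for illustration.

```python
def hours_until_threshold(hours, values, limit):
    """Fit values = intercept + slope * hours by ordinary least squares and
    estimate how many hours after the last sample the trend reaches `limit`.
    Assumes samples sorted by time, a rising metric (e.g. vibration or
    temperature) and an upper limit; returns None if the trend is not rising."""
    n = len(hours)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_h = sum(hours) / n
    mean_v = sum(values) / n
    sxx = sum((h - mean_h) ** 2 for h in hours)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    sxy = sum((h - mean_h) * (v - mean_v) for h, v in zip(hours, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_h
    if slope <= 0:
        return None  # no upward trend towards the limit
    crossing_hour = (limit - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])  # 0.0 if already past the limit

# Hypothetical daily fan-bearing vibration readings (mm/s), assumed limit 5.0.
remaining = hours_until_threshold([0, 24, 48, 72], [2.0, 2.5, 3.0, 3.5], 5.0)
print(round(remaining, 1))  # → 72.0
```

An operator could schedule the repair inside the estimated window, turning a future failure into a planned maintenance task.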
Risks require diligent governance
Integrating AI also presents challenges that require diligent governance. It will be vital to address ethical risks around bias, transparency, and oversight through clear accountability and impact analysis, and to manage rapidly evolving cybersecurity vulnerabilities by continuously adapting detection and response.
Developing in-house AI capabilities responsibly will require sizeable investment in technology, tools, and training. AI will also need to be integrated carefully with legacy infrastructure, whose interdependencies are often opaque.
AI adoption should be paced to build operational maturity in phases, starting with tightly constrained use cases. Leaders must mandate rigorous testing and oversight regimes tailored to AI's complexity.
Advanced applications hold promise
More sophisticated AI techniques could offer further transformational advantages. One area of application is natural language processing (NLP), which can be used to extract compliance insights from dense regulations and contracts.
Another avenue is predictive telemetry analysis using statistical models tailored to specific asset configurations and failure distributions.
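As a hedged illustration of a model tailored to a failure distribution, the sketch below assumes a component's lifetime follows a two-parameter Weibull distribution, a common choice in reliability engineering, with survival function R(t) = exp(-(t / scale) ** shape). It computes the chance the component fails within a planning horizon given the age it has already reached. The function names and the shape and scale values are hypothetical; in practice they would be fitted to the asset's own failure history.

```python
import math

def cumulative_hazard(t, shape, scale):
    """Weibull cumulative hazard H(t) = (t / scale) ** shape, so R(t) = exp(-H(t))."""
    if shape <= 0 or scale <= 0 or t < 0:
        raise ValueError("shape and scale must be positive; t must be >= 0")
    return (t / scale) ** shape

def conditional_failure_probability(age, horizon, shape, scale):
    """Probability of failing within the next `horizon` hours, given survival
    to `age`: 1 - R(age + horizon) / R(age) = 1 - exp(-(H(age + horizon) - H(age)))."""
    delta = (cumulative_hazard(age + horizon, shape, scale)
             - cumulative_hazard(age, shape, scale))
    return -math.expm1(-delta)  # equals 1 - exp(-delta), stable for small delta

# Hypothetical parameters: shape > 1 models wear-out, scale is in hours.
for age in (5_000, 20_000):
    p = conditional_failure_probability(age, 720, shape=2.5, scale=40_000)
    print(f"age {age} h: {p:.2%} chance of failure in the next 30 days")
```

Working with the cumulative hazard rather than dividing two survival probabilities avoids a division by zero when R(age) underflows for very old assets. With shape above 1, the conditional risk grows with age, so assets could be ranked by this probability when planning maintenance windows.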
A third is cybersecurity augmentation, in which AI simulates evolving threats to harden defences continuously.
Nobody knows exactly how the future of AI in the data centre will play out, but even while the path ahead remains uncertain, leaders can chart a course with care and vision. AI may yet transform rigid data centres into adaptive, resilient ecosystems, provided organisations and people evolve alongside it responsibly. With patient governance and strategy, operations leaders can pioneer a new era in which AI elevates human work rather than replacing it.