The UK may be talking confidently about sovereign AI and compute capacity, but Matt Hawkins, CEO and founder of CUDO Compute, believes those ambitions will fall short unless the country addresses a growing shortage of specialist infrastructure talent.
I have been building and operating data centres since 2000. In that time, the industry has seen cloud, virtualisation, hyperscale, and edge, but none of those shifts has changed the skills profile as quickly as AI infrastructure has in the past three years.
Everyone talks about the AI skills gap. What they usually mean is the lack of developers, data scientists, prompt engineers, or regulators. These roles matter, of course, but they sit at the top of the stack. The real bottleneck is lower down, in the physical and operational layer that makes AI possible. Without land, power, and compute, the rest does not exist.
GPU-dense infrastructure at scale is not simply an extension of traditional enterprise IT. It is a different discipline. Running thousands of GPUs as a single system requires expertise in high-performance networking such as InfiniBand and newer architectures like Spectrum-X. It requires an understanding of how AI workloads behave when distributed across clusters. It also requires operating racks drawing 100kW or more, with liquid cooling systems that often bear little resemblance to the 8–12kW air-cooled environments most data centre engineers grew up with.
The issue is that the UK does not have a deep bench of engineers with that experience, and neither does Europe.
There is a misconception that this is purely a numbers problem; that we need tens of thousands of new infrastructure engineers. The reality is more nuanced. At the infrastructure layer, teams are small, and a highly skilled group can operate very large GPU environments. Across the UK and Europe, we are realistically talking about hundreds to a few thousand people capable of running clusters at this density and scale. That is precisely why the gap is so serious at a time when demand is rising exponentially.
It helps to think of AI capability in layers. At the bottom is the infrastructure layer: bare-metal GPU cluster build, high-density power, liquid cooling, specialist networking, and hardware operations. Above that sits orchestration and management, including SLURM, Kubernetes for AI workloads, cluster scheduling, monitoring, and optimisation. At the top is the application layer, where developers and researchers build models and services.
If you cannot staff the bottom layer, the rest does not function. You can train thousands of AI developers, but without stable, well-run infrastructure, they have nothing to build on.
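To make the orchestration layer concrete: a hypothetical SLURM batch script of the kind those teams write and tune daily might look like the sketch below. Job name, node counts, and the training script are illustrative assumptions, not taken from any particular deployment.

```shell
#!/bin/bash
# Hypothetical SLURM submission: four GPU servers operated as one system.
#SBATCH --job-name=train-llm
#SBATCH --nodes=4                 # four nodes scheduled together
#SBATCH --gres=gpu:8              # eight GPUs per node
#SBATCH --ntasks-per-node=8      # one process per GPU
#SBATCH --time=48:00:00

# Launch one training process per GPU across all nodes; the cluster's
# high-performance fabric carries the inter-node traffic between them.
srun python train.py --distributed
```

Writing the script is the easy part; the scarce skill is keeping the scheduler, the GPU fleet, and the network fabric healthy enough that jobs like this run to completion at scale.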
The UK’s situation is further complicated by a trend that has developed over the past decade. For years, high-performance computing (HPC) engineers maintained the infrastructure of universities and research institutions, running clusters, tuning networks, and optimising workloads. When parts of the London finance sector realised that similar skills could accelerate trading systems and generate significant revenue, they began recruiting from that pool. Salaries rose, talent moved, and the research base thinned.
HPC engineers are not automatically AI infrastructure engineers, but they are the closest adjacent skill set. With experience in parallel workloads and high-performance networking, they are among the easiest groups to retrain into large-scale GPU cluster operations. So, when that talent migrates elsewhere, the pipeline weakens further.
What we now face is not a generic digital skills shortage. It is a structural gap in a fast-emerging industry that is accelerating at extraordinary speed.
At the same time, policy language often overestimates what already exists. The UK AI Opportunities Action Plan speaks about sovereign compute capacity, and the ambition is welcome. But sovereign AI is not only about where data resides. It is also about who builds and operates the infrastructure. If we rely heavily on importing specialist teams from overseas, control over compute capability becomes more limited.
Other European nations are moving more deliberately. Norway, Finland, and Sweden are pairing renewable energy capacity with national AI infrastructure initiatives, while developing local operational capability alongside physical build-out. They recognise that AI factories require both power and people. Without local engineering capability, sovereign compute becomes harder to define in any meaningful sense.
Europe is well positioned from an energy perspective, especially in Eastern European locations such as Romania, where the renewable mix of wind, solar, and hydro is particularly strong. That matters because AI infrastructure at scale is power-intensive, and long-term investment decisions increasingly depend on sustainable supply. But renewable capacity alone does not create competitive advantage. Skilled operators matter just as much.
In practice, the talent market is already tight. Organisations building AI infrastructure are drawing from a very small pool of engineers, including those who have worked in research institutions and specialist environments managing tens of thousands of GPUs in production. This is not a broad labour market, but a concentrated competition for a limited number of individuals with deep operational experience.
In many cases, hardware availability or data centre space is not the primary constraint. The constraint is assembling teams that can deploy and operate high-density GPU clusters safely, efficiently, and at pace. There is no established training pathway for InfiniBand at scale or next-generation AI networking, and there are no widely recognised apprenticeships in GPU operations. The skills framework has not caught up with the technology or the pace of demand.
The solution does not necessarily require grand declarations or sweeping policy shifts. It requires targeted action. AI infrastructure operations should be formally recognised within the UK digital skills framework. DCMS and industry bodies could co-fund specialist apprenticeship schemes focused on GPU cluster build and operations. Universities with existing HPC expertise could partner with operators to create conversion pathways from research computing into commercial AI infrastructure.
This is not about creating a vast new workforce. It is about developing a small, highly capable sovereign capability that underpins everything above it. If the UK wants to lead in AI, it must look beyond models and software. Right now, the risk is not a lack of ambition, but a shortage of execution capability. AI runs on power and hardware, and those systems do not operate themselves.