Why AI efficiency needs a smarter benchmark

As AI adoption accelerates, the need for a better benchmark for measuring advances in computing is becoming more urgent. Jean-Marc Denis, CPO at Submer, argues that the industry needs to look beyond kilowatt-hours and focus instead on useful intelligence per watt.

The need for a new benchmark is even clearer in the context of climate pressures and rising energy demand. Recent reports show that AI-driven data centre electricity consumption has grown by around 12% per year since 2017, more than four times the rate of overall global electricity demand.

A better metric would help organisations avoid mistaking efficient hardware for genuinely efficient AI outcomes, while also reducing carbon impact. It would also give developers a clearer way to pursue cost-effective, scalable improvements. Rather than relying on energy use alone, measured in kilowatt-hours (kWh), AI efficiency should be measured against performance, so that power consumption is tied to meaningful output.

AI workloads generally fall into two categories: training and inference. Training is the process of teaching a model using vast datasets and intensive computation to build its capabilities. Inference is the operational phase in which the trained model responds to prompts, generates outputs and performs real-time tasks. While training is highly energy-intensive, inference is becoming the dominant long-term infrastructure challenge because it must operate continuously, at scale and often with low latency close to users. As a result, many emerging efficiency metrics, including intelligence per watt, are increasingly inference-centric.

How to measure AI output

One useful metric is tokens per watt: the number of tokens (units of text) an AI model can generate per second for each watt of power drawn, which is equivalent to tokens per joule of energy consumed. Another is TFLOPS per megawatt-hour (MWh), which assesses how much compute a processor or data centre can deliver for each MWh of electricity used.
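The arithmetic behind a throughput metric of this kind can be sketched in a few lines of Python. Because a watt is one joule per second, "tokens per watt" is read here as tokens per second per watt, which is the same as tokens per joule. The function names and the sample figures are illustrative assumptions, not measurements from any real system.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 1,000 W x 3,600 s


def tokens_per_joule(tokens_generated: int, avg_power_watts: float, duration_s: float) -> float:
    """Tokens generated per joule of energy used over a measurement window."""
    if avg_power_watts <= 0 or duration_s <= 0:
        raise ValueError("power and duration must be positive")
    energy_joules = avg_power_watts * duration_s  # energy = power x time
    return tokens_generated / energy_joules


def tokens_per_kwh(tokens_generated: int, avg_power_watts: float, duration_s: float) -> float:
    """The same ratio expressed per kilowatt-hour, the unit used on energy bills."""
    return tokens_per_joule(tokens_generated, avg_power_watts, duration_s) * JOULES_PER_KWH


if __name__ == "__main__":
    # Hypothetical run: 36,000 tokens in 60 s at an average draw of 600 W.
    print(tokens_per_joule(36_000, 600.0, 60.0))
    print(tokens_per_kwh(36_000, 600.0, 60.0))
```

Note that nothing in this calculation asks whether the generated tokens were correct or useful; it counts output only.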

However, these metrics still focus mainly on throughput. Intelligence per watt goes further by accounting for the quality and accuracy of the output, measuring how much usable capability is delivered for each watt consumed.

AI infrastructure efficiency, defined as intelligence per watt

The challenge for data centre efficiency is not only the speed of AI adoption, but also the growing expectation that intelligence should be available on demand and close to where data is created, whether on the factory floor or at a hospital bedside. As a result, AI inference is moving beyond centralised clouds and into distributed networks, because sending data to a distant cloud and back can introduce too much delay for real-time applications such as industrial automation, autonomous vehicles and interactive AI services.

Against this backdrop, a new way of assessing AI infrastructure efficiency is gaining attention: intelligence per watt (IPW), the amount of useful intelligence, or task accuracy, delivered for each unit of power consumed. The metric considers how much useful AI output, such as accurate responses, completed tasks or successful inferences, a system delivers for each watt. Unlike throughput-focused metrics such as tokens per second, IPW relates energy consumption to the practical value produced by AI systems.
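One way to make the distinction concrete is the minimal Python sketch below. It assumes IPW is computed as task accuracy divided by average power draw, which is one plausible reading of the definition above rather than a standardised formula. The `BenchmarkRun` class, its field names and the two sample systems are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One benchmark run on one system (all fields are illustrative)."""
    name: str
    tasks_attempted: int
    tasks_correct: int
    avg_power_watts: float
    duration_s: float

    @property
    def accuracy(self) -> float:
        return self.tasks_correct / self.tasks_attempted

    @property
    def energy_joules(self) -> float:
        return self.avg_power_watts * self.duration_s

    @property
    def tasks_per_joule(self) -> float:
        # Throughput view: counts every attempt, right or wrong.
        return self.tasks_attempted / self.energy_joules

    @property
    def intelligence_per_watt(self) -> float:
        # Assumed definition: task accuracy per watt of average power.
        return self.accuracy / self.avg_power_watts


# Hypothetical systems with the same power draw over the same window.
system_a = BenchmarkRun("A", tasks_attempted=1000, tasks_correct=600,
                        avg_power_watts=500.0, duration_s=100.0)
system_b = BenchmarkRun("B", tasks_attempted=800, tasks_correct=720,
                        avg_power_watts=500.0, duration_s=100.0)

for run in (system_a, system_b):
    print(f"{run.name}: tasks/J={run.tasks_per_joule:.4f} "
          f"accuracy={run.accuracy:.2f} IPW={run.intelligence_per_watt:.4f}")
```

In this hypothetical pair, system A completes more tasks per joule, but system B delivers more accuracy per watt. A throughput metric would rank A first; IPW, as assumed here, would rank B first.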

As AI systems move into a more agentic phase of inference, where models continuously interpret, decide and act in real time, infrastructure requirements are changing fundamentally. This shift is increasing compute demand and placing sustained pressure on energy use, latency and system efficiency across the full stack.

Efficiency across the full infrastructure stack

The need for resource-optimised AI infrastructure has never been greater, but what does it take to build efficient, future-ready systems? IPW helps answer that question by measuring infrastructure efficiency at system scale, recognising that overall intelligence output depends on how efficiently compute, memory, networking and power delivery work together.

Higher IPW allows AI models to achieve the same, or better, performance while using less electricity. This reframes AI from being seen purely as an unsustainable energy burden to being recognised as a system that can support efficiency gains at scale, provided the infrastructure is designed correctly. AI data centres with higher IPW may be better placed to support inference applications such as optimising resource management in smart grids, improving renewable energy generation and reducing waste in industrial processes.

Data centre trade-offs

Most data centres reflect the physics and economics of yesterday’s workloads rather than what global, real-time inference needs. They were built for web, storage and training-style AI workloads, prioritising throughput and utilisation over predictable real-time latency. Their networks are often oversubscribed and concentrated in a few large GPU regions, creating congestion and adding distance from users.

Many existing data centre and telecoms infrastructures were not originally designed for the thermal density, deployment flexibility or operational patterns that large-scale AI inference increasingly requires. While centralised GPU clusters remain essential for training and some inference workloads, the growing demand for real-time AI services is also driving interest in more distributed deployment models and inference-optimised hardware closer to users. These trade-offs are not only architectural and geographic; they are also thermal, which makes cooling strategy an important part of AI infrastructure efficiency.

Maximising IPW through infrastructure

As AI infrastructure shifts from centralised training clusters to high-density, distributed inference, cooling can no longer be treated purely as a facilities consideration; it is becoming a core design constraint that shapes system efficiency. Thermal architecture can directly influence intelligence per watt by determining how much compute can be deployed, where it can operate and how consistently it can scale under sustained inference loads.

The key question is not simply whether one cooling method is “better” than another, but which infrastructure choices maximise usable compute per unit of power and space. Cooling architecture sits alongside silicon, power delivery and workload orchestration as a system-level optimisation lever: better thermal efficiency can support higher rack densities, broader deployment models and more viable edge inference by reducing the performance and location penalties imposed by heat.

In that context, liquid cooling is increasingly being considered as part of the architecture for efficient AI systems – not because the discussion is about liquid versus air in isolation, but because thermal management now helps set the boundary conditions for inference scalability, deployment flexibility and overall intelligence per watt.

The role of edge in IPW

This shift towards efficiency is also influencing where and how AI workloads are deployed. Research shows that deploying smaller, more specialised AI models locally at the edge can reduce energy consumption by as much as 60–80% compared with large, general-purpose models running in central cloud data centres. This decentralisation, which allows AI applications to be smaller and faster and to achieve a higher IPW, strengthens the case for developing data centres that prioritise efficient AI model architectures and hardware over simply building larger, more resource-intensive systems.
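A saving of this kind is a relative reduction in energy per query, which the short Python sketch below computes. The per-query figures are placeholders chosen to fall inside the quoted range; they are not taken from the research referred to above, and the function name is hypothetical.

```python
def energy_saving_percent(baseline_wh: float, candidate_wh: float) -> float:
    """Relative energy reduction of a candidate deployment against a baseline."""
    if baseline_wh <= 0:
        raise ValueError("baseline energy must be positive")
    return (baseline_wh - candidate_wh) / baseline_wh * 100.0


# Placeholder figures: energy per query in watt-hours (assumed, not measured).
cloud_general_model_wh = 1.0
edge_specialised_model_wh = 0.3

saving = energy_saving_percent(cloud_general_model_wh, edge_specialised_model_wh)
print(f"Energy saving per query: {saving:.0f}%")
```

A saving in energy only translates into a higher IPW if the smaller model keeps its accuracy on the task in question, so a comparison like this is best read alongside an accuracy measurement.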

Efficiency in AI infrastructure is not simply a question of centralisation versus edge, nor is it limited to energy use alone. It also depends on how materials, capacity and lifecycle decisions are managed over time. A sustainable data centre is one that delivers the required performance while improving IPW and reducing total cost of ownership (TCO).

While the concept of intelligence per watt is still evolving, the industry will ultimately require standardised methodologies for measuring and comparing it across different AI systems and deployment models. A future paper will explore potential frameworks for calculating IPW more consistently, including how factors such as inference accuracy, latency, utilisation and energy consumption can be incorporated into a practical benchmarking model.

As AI becomes as essential as electricity or the internet, its infrastructure must meet the same standard: delivering more useful intelligence with less energy, less waste and lower cost. In that environment, intelligence per watt could become one of the clearest measures of whether AI infrastructure is truly fit for purpose.