Ed Ansett, Founder and Chairman at i3 Solutions, explores why AI is so resource hungry within data centres.
As of the end of 2023, any forecast of how much energy will be required by generative AI is inexact.
Headlines tend towards guesstimates of “5x, 10x, 30x power needed for AI” and “Enough power to run 100,000s of homes” etc. Meanwhile, reports in specialist publications such as the data centre press talk of power densities rising to 50kW or 100kW per rack.
Why is generative AI so resource hungry? What moves are being made to calculate its potential energy cost and carbon footprint? Or as one research paper puts it, what is the “huge computational cost of training these behemoths”? Today, much of this information is not readily available.
Analysts have forecast their own estimates for specific workload scenarios (see below), but with few disclosed numbers from the cloud hyperscalers at the forefront of model building, there is very little hard data to go on at this time.
Where analysis has been conducted, it has produced some sobering figures for the carbon cost of AI model building, from training to inference. According to a report in the Harvard Business Review, “researchers have argued that training a ‘single large language deep learning model’ such as OpenAI’s GPT-4 or Google’s PaLM is estimated to use around 300 tons of CO2…
“Other researchers calculated that training a medium-sized generative AI model using a technique called ‘neural architecture search’ used electricity and energy consumption equivalent to 626,000 tons of CO2 emissions.”
So, what’s going on to make AI so power hungry?
Is it the data set, i.e. the volume of data? The number of parameters used? The transformer model? The encoding, decoding and fine-tuning? The processing time? The answer is of course a combination of all of the above.
It is often said that GenAI Large Language Models (LLMs) and Natural Language Processing (NLP) require large amounts of training data. However, measured in terms of traditional data storage, this is not actually the case.
For example, ChatGPT used www.commoncrawl.com data. Commoncrawl says of itself that it is the primary training corpus in every LLM and that it supplied 82% of raw tokens used to train GPT-3: “We make wholesale extraction, transformation and analysis of open web data accessible to researchers… Over 250 billion pages spanning 16 years. 3–5 billion new pages added each month.”
It is thought that GPT-3 was trained on 45 TB of Commoncrawl plaintext, filtered down to 570 GB of text data. The corpus is hosted on AWS free of charge as a contribution to open-source AI data.
But storage volumes, the billions of web pages or data tokens that are scraped from the web, Wikipedia and elsewhere then encoded, decoded and fine-tuned to train ChatGPT and other models, should have no major impact on a data centre. Similarly, the terabytes or petabytes of data needed to train a text to speech, text to image or text to video model should put no extraordinary strain on the power and cooling systems in a data centre built for hosting IT equipment storing and processing hundreds or thousands of petabytes of data.
An example of a data source for text to image models is LAION (Large-scale Artificial Intelligence Open Network), a German non-profit that publishes data sets containing billions of images. One of its data sets, known as LAION-400M, is a 10 TB web data set. Another, LAION-5B, has 5.85 billion CLIP-filtered image-text pairs.
One reason that training data volumes remain a manageable size is that it has been the fashion among the majority of AI model builders to use Pre-Trained Models (PTMs) instead of models trained from scratch. Two examples of PTMs that are becoming familiar are Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT) series, as in ChatGPT.
Another measure of AI training that is of interest to data centre operators is the number of parameters.
Parameters are the values a generative AI model learns during training. In general, the greater the number of parameters, the greater the accuracy of the prediction of the desired outcome. GPT-3 was built on 175 billion parameters, and the number of parameters is already rising rapidly. The first version of Wu Dao, a Chinese LLM, used 1.75 trillion parameters. As well as being an LLM, Wu Dao also provides text to image and text to video. Expect the numbers to continue to grow.
With no hard data available, it is reasonable to surmise that the computational power required to run a model with 1.7 trillion parameters is going to be significant. As we move into more AI video generation, the data volumes and number of parameters used in models will surge.
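One way to see why parameter counts matter to infrastructure is to estimate the memory needed just to hold a model's weights. The sketch below is a rough illustration only. It assumes 2 bytes per parameter (16-bit precision, a common but not universal choice) and an accelerator with 80 GB of memory; neither figure comes from any model builder, and training typically needs several times more memory for gradients and optimiser state.

```python
import math

# Illustrative assumptions, not disclosed figures:
BYTES_PER_PARAM = 2      # 16-bit weights
ACCELERATOR_GB = 80      # memory available on one accelerator


def weight_memory_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory needed to store the weights alone, in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9


for name, params in [("175 billion", 175e9), ("1.75 trillion", 1.75e12)]:
    gb = weight_memory_gb(params)
    # Minimum number of accelerators needed just to hold the weights.
    devices = math.ceil(gb / ACCELERATOR_GB)
    print(f"{name} parameters: {gb:,.0f} GB of weights, at least {devices} accelerators")
```

Under these assumptions, a tenfold increase in parameters means roughly a tenfold increase in the hardware needed simply to load the model, before any computation takes place.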
Transformers are a type of neural network architecture developed to solve the problem of sequence transduction, such as neural machine translation. That means any task that transforms an input sequence into an output sequence. Transformer layers rely on self-attention rather than loops: as the input data moves through each layer, every element of the sequence is weighed against every other element before the result is passed on to the next layer. Such layers improve the prediction of what comes next, which helps improve speech recognition, text-to-speech transformation, etc.
How much is enough power?
A report by S&P Global titled “Power of AI: Wild predictions of power demand from AI put industry on edge” quotes several sources: “Regarding US power demand, it’s really hard to quantify how much demand is needed for things like ChatGPT,” David Groarke, managing director at consultant Indigo Advisory Group, said in a recent phone interview. “In terms of macro numbers, by 2030 AI could account for 3% to 4% of global power demand. Google said right now AI is representing 10% to 15% of their power use or 2.3 TWh annually.”
S&P Global continues: “Academic research conducted by Alex de Vries, a PhD candidate at the VU Amsterdam School of Business and Economics [cites] research by semiconductor analysis firm SemiAnalysis. In a commentary published Oct. 10 in the journal Joule, [cited by de Vries] it is estimated that using generative AI such as ChatGPT in each Google search would require more than 500,000 of Nvidia’s A100 HGX servers, totaling 4.1 million graphics processing units, or GPUs. At a power demand of 6.5 kW per server, that would result in daily electricity consumption of 80 GWh and annual consumption of 29.2 TWh.”
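The arithmetic behind the quoted figures can be reproduced in a few lines. The sketch below assumes eight GPUs per server (an assumption, though it is consistent with 4.1 million GPUs spread across roughly 500,000 servers) and that every server draws the quoted 6.5 kW around the clock. The function and variable names are illustrative only.

```python
# Back-of-envelope check of the figures quoted above.
# Assumptions (not from the source): 8 GPUs per server, and servers
# running at full power 24 hours a day, 365 days a year.

GPUS_TOTAL = 4_100_000     # quoted GPU count
GPUS_PER_SERVER = 8        # assumed server configuration
SERVER_POWER_KW = 6.5      # quoted power demand per server


def fleet_energy(gpus: int, gpus_per_server: int, server_kw: float):
    """Return (servers, daily GWh, annual TWh) for a GPU fleet."""
    servers = gpus / gpus_per_server
    fleet_power_kw = servers * server_kw
    daily_gwh = fleet_power_kw * 24 / 1e6     # kWh per day -> GWh
    annual_twh = daily_gwh * 365 / 1e3        # GWh per year -> TWh
    return servers, daily_gwh, annual_twh


servers, daily_gwh, annual_twh = fleet_energy(
    GPUS_TOTAL, GPUS_PER_SERVER, SERVER_POWER_KW
)
print(f"{servers:,.0f} servers, {daily_gwh:.0f} GWh/day, {annual_twh:.1f} TWh/year")
```

With these inputs the script prints 512,500 servers, 80 GWh/day and 29.2 TWh/year, matching the quoted estimate. The result scales linearly with the per-server power figure, so any cooling or distribution overhead would come on top.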
A calculation of the actual power used to train AI models was offered by RISE, the Research Institutes of Sweden. It says: “Training a super-large language model like GPT-4, with 1.7 trillion parameters and using 13 trillion tokens (word snippets), is a substantial undertaking. OpenAI has revealed that it cost them $100 million and took 100 days, utilising 25,000 NVIDIA A100 GPUs. Servers with these GPUs use about 6.5 kW each, resulting in an estimated 50 GWh of energy usage during training.”
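The 50 GWh figure can be roughly reconstructed from the numbers in the quote. The sketch below assumes eight GPUs per 6.5 kW server and full power draw for the entire run; both are assumptions rather than disclosed facts, and cooling overhead is ignored. The function name and defaults are hypothetical.

```python
# Rough reconstruction of the ~50 GWh training estimate quoted above.
# Assumptions (not from the source): 8 GPUs per 6.5 kW server, every
# server at full power for the whole run, no cooling overhead.

def training_energy_gwh(gpus: int, days: float,
                        server_kw: float = 6.5,
                        gpus_per_server: int = 8) -> float:
    """Energy used by a GPU cluster over a training run, in GWh."""
    servers = gpus / gpus_per_server
    hours = days * 24
    return servers * server_kw * hours / 1e6   # kWh -> GWh


energy = training_energy_gwh(gpus=25_000, days=100)
print(f"{energy:.2f} GWh")
```

Under these assumptions the result is 48.75 GWh, which rounds to the quoted 50 GWh and corresponds to a continuous load of about 20 MW for 100 days.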
This is important because the energy used by AI is rapidly becoming a topic of public discussion.
Data centres are already on the map and ecologically focused organisations are taking note. According to the site 8billiontrees, “There are no published estimates as of yet for the AI industry’s total footprint, and the field of AI is exploding so rapidly that an accurate number would be nearly impossible to obtain. Looking at the carbon emissions from individual AI models is the gold standard at this time… The majority of the energy is dedicated to powering and cooling the hyperscale data centres, where all the computation occurs.”
As we wait for the numbers to emerge for past and existing power use for ML and AI, what is clear is that once models get into production and use, we will be at the exabyte and exaflop scale of computation. For data centre power and cooling, it is then that things become really interesting and more challenging.