AI’s ballooning energy consumption puts spotlight on data center efficiency

  • AI’s rapid growth is putting a huge strain on data centers, leading to increased energy consumption and costs.
  • Data centers are becoming energy-hungry giants, using as much electricity as a small city, due to the growing demand for AI models, memory, and cooling systems.
  • The lack of coordination between different parts of the system in data centers can lead to wasted energy and underused resources, with some servers sitting idle while others struggle to keep up.
  • Addressing this challenge requires smarter design and management of the systems that support AI: recognizing differences among chips in performance, heat tolerance and energy use, and adapting to changing conditions in real time.
  • Scaling AI with intelligence rather than brute force is key to keeping its benefits sustainable, and it requires chip engineers, software developers and data center experts to collaborate on efficient, scalable and sustainable infrastructure.

These 'chillers' on the roof of a data center in Germany, seen from above, work to cool the equipment inside the building. AP Photo/Michael Probst

Artificial intelligence is growing fast, and so is the number of computers that power it. Behind the scenes, this rapid growth is putting a huge strain on the data centers that run AI models. These facilities are using more energy than ever.

AI models are getting larger and more complex. Today’s most advanced systems have billions of parameters, the numerical values derived from training data, and run across thousands of computer chips. To keep up, companies have responded by adding more hardware: more chips, more memory and more powerful networks. This brute-force approach has helped AI make big leaps, but it has also created a new challenge: Data centers are becoming energy-hungry giants.
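
To see why a single chip can’t hold these models, consider a rough back-of-envelope calculation. The numbers below are round figures I’ve chosen for illustration, not the specs of any particular model or chip:

```python
# Back-of-envelope: why a large model spills across many chips.
# All numbers are round figures chosen for illustration.

params = 70e9              # an assumed 70-billion-parameter model
bytes_per_param = 2        # 16-bit (2-byte) weights
chip_memory_gb = 80        # assumed memory on one accelerator chip

weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: {weights_gb:.0f} GB")                          # 140 GB
print(f"Chips just to hold them: {weights_gb / chip_memory_gb:.1f}")  # 1.8
# Activations, serving many users at once, and redundancy multiply this,
# which is how clusters grow to thousands of chips.
```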

Some tech companies are responding by looking to power data centers on their own with dedicated fossil fuel and nuclear power plants. AI energy demand has also spurred efforts to make more efficient computer chips.

I’m a computer engineer and a professor at Georgia Tech who specializes in high-performance computing. I see another path to curbing AI’s energy appetite: Make data centers more resource aware and efficient.

Energy and heat

Modern AI data centers can use as much electricity as a small city. And it’s not just the computing that eats up power. Memory and cooling systems are major contributors, too. As AI models grow, they need more storage and faster access to data, which generates more heat. Also, as the chips become more powerful, removing heat becomes a central challenge.


Data centers house thousands of interconnected computers.
Alberto Ortega/Europa Press via Getty Images

Cooling isn’t just a technical detail; it’s a major part of the energy bill. Traditional cooling is done with specialized air conditioning systems that remove heat from server racks. New methods like liquid cooling are helping, but they also require careful planning and water management. Without smarter solutions, the energy requirements and costs of AI could become unsustainable.
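
Engineers often quantify this overhead with a metric called power usage effectiveness, or PUE: a facility’s total power draw divided by the power that reaches the computing equipment itself. Here is a short sketch, using illustrative numbers of my own rather than figures from any real facility, of how that overhead compounds over a year:

```python
# Back-of-envelope: how cooling and other overhead (PUE) turns into
# energy and cost over a year. All numbers are illustrative assumptions,
# not measurements from any real facility.

HOURS_PER_YEAR = 8760

def annual_energy_mwh(it_load_mw: float, pue: float) -> float:
    """Total facility energy per year for a given IT load (MW) and PUE.

    PUE = total facility power / IT equipment power, so total power
    is the IT load scaled by PUE.
    """
    return it_load_mw * pue * HOURS_PER_YEAR

IT_LOAD_MW = 30.0        # assumed compute + memory + network load
PRICE_PER_MWH = 80.0     # assumed electricity price in dollars

for pue in (1.6, 1.2):   # a mediocre vs. an efficient facility, assumed values
    total = annual_energy_mwh(IT_LOAD_MW, pue)
    overhead = total - annual_energy_mwh(IT_LOAD_MW, 1.0)
    print(f"PUE {pue}: {total:,.0f} MWh/yr total, "
          f"{overhead:,.0f} MWh/yr of overhead "
          f"(~${overhead * PRICE_PER_MWH:,.0f})")
```

Under these assumed numbers, cutting PUE from 1.6 to 1.2 saves roughly 100,000 megawatt-hours a year, which is why cooling design draws so much attention.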

Even with all this advanced equipment, many data centers aren’t running efficiently. That’s because different parts of the system don’t always talk to each other. For example, scheduling software might not know that a chip is overheating or that a network connection is clogged. As a result, some servers sit idle while others struggle to keep up. This lack of coordination can lead to wasted energy and underused resources.
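
To illustrate what coordination buys, here is a minimal, hypothetical scheduler sketch. The telemetry fields and thresholds are my own inventions, not any real data center’s interface; the point is simply that a placement decision that consults live temperature and network load steers work away from servers that a blind rotation would keep feeding:

```python
# Minimal sketch: telemetry-aware job placement vs. blind round-robin.
# The Server fields and thresholds are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    temp_c: float        # current chip temperature
    queue_len: int       # jobs already waiting
    link_util: float     # network link utilization, 0.0-1.0

TEMP_LIMIT_C = 85.0      # assumed throttling threshold
LINK_LIMIT = 0.9         # assumed congestion threshold

def pick_server(servers: list[Server]) -> Server:
    """Place work on the least-loaded server that is neither
    overheating nor sitting behind a congested network link."""
    healthy = [s for s in servers
               if s.temp_c < TEMP_LIMIT_C and s.link_util < LINK_LIMIT]
    # Fall back to the full fleet if nothing is healthy, rather than stalling.
    candidates = healthy or servers
    return min(candidates, key=lambda s: s.queue_len)

fleet = [
    Server("a1", temp_c=88.0, queue_len=1, link_util=0.4),   # hot: skip it
    Server("a2", temp_c=70.0, queue_len=6, link_util=0.3),
    Server("a3", temp_c=72.0, queue_len=2, link_util=0.95),  # congested: skip
    Server("a4", temp_c=68.0, queue_len=3, link_util=0.2),   # best choice
]
print(pick_server(fleet).name)  # -> a4
```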

A smarter way forward

Addressing this challenge requires rethinking how to design and manage the systems that support AI. That means moving away from brute-force scaling and toward smarter, more specialized infrastructure.

Here are three key ideas:

Address variability in hardware. Not all chips are the same. Even within the same generation, chips vary in how fast they operate and how much heat they can tolerate, leading to heterogeneity in both performance and energy efficiency. Data center systems should recognize these differences and assign work accordingly.
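
To make this concrete, the sketch below divides a batch of work among chips in proportion to their measured speed instead of evenly. The throughput figures are invented for illustration; a real system would profile them on the fly:

```python
# Sketch: divide work in proportion to each chip's measured throughput,
# so slower or heat-limited chips are not handed the same share as fast ones.
# Throughput figures here are invented for illustration.

chip_throughput = {            # samples/second each chip sustains (assumed)
    "chip0": 1000.0,
    "chip1": 1000.0,
    "chip2": 700.0,            # same generation, but runs slower and hotter
    "chip3": 650.0,
}

def proportional_shares(total_items: int,
                        throughput: dict[str, float]) -> dict[str, int]:
    """Assign each chip a share of the batch proportional to its throughput."""
    capacity = sum(throughput.values())
    shares = {c: int(total_items * t / capacity) for c, t in throughput.items()}
    # Hand any rounding remainder to the fastest chip.
    shares[max(throughput, key=throughput.get)] += total_items - sum(shares.values())
    return shares

print(proportional_shares(10_000, chip_throughput))
# An even split (2,500 each) would leave chip0 and chip1 idle while chip3
# lags behind; the proportional split finishes closer to simultaneously.
```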

Adapt to changing conditions. AI workloads vary over time. For instance, thermal hotspots on chips can trigger the chips to slow down, fluctuating grid supply can cap the peak power that centers can draw, and bursts of data between chips can create congestion in the network that connects them. Systems should be designed to respond in real time to things like temperature, power availability and data traffic.
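
One simple form of such adaptation is a power-capping loop: when the facility’s draw nears what the grid can supply, per-chip power limits tighten, and they relax when headroom returns. The following is a toy version with made-up numbers, not a production controller:

```python
# Toy control loop: tighten per-chip power caps when facility draw nears the
# grid limit, relax them when headroom returns. All numbers are assumptions.

GRID_LIMIT_KW = 1000.0                 # assumed peak power the facility may draw
CAP_MIN_W, CAP_MAX_W = 300.0, 700.0    # assumed per-chip power-cap range
STEP_W = 50.0

def adjust_cap(current_cap_w: float, facility_draw_kw: float) -> float:
    """Lower the per-chip cap near the grid limit, raise it when there is slack."""
    if facility_draw_kw > 0.95 * GRID_LIMIT_KW:   # close to the limit: back off
        return max(CAP_MIN_W, current_cap_w - STEP_W)
    if facility_draw_kw < 0.80 * GRID_LIMIT_KW:   # plenty of headroom: relax
        return min(CAP_MAX_W, current_cap_w + STEP_W)
    return current_cap_w                          # in between: hold steady

cap = 700.0
for draw in (900.0, 960.0, 980.0, 850.0, 760.0):  # simulated facility draw, kW
    cap = adjust_cap(cap, draw)
    print(f"draw {draw:>6.0f} kW -> per-chip cap {cap:.0f} W")
```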

Video: How data center cooling works.

Break down silos. Engineers who design chips, software and data centers should work together. When these teams collaborate, they can find new ways to save energy and improve performance. To that end, my colleagues, students and I at Georgia Tech’s AI Makerspace, a high-performance AI data center, are exploring these challenges hands-on. We’re working across disciplines, from hardware to software to energy systems, to build and test AI systems that are efficient, scalable and sustainable.

Scaling with intelligence

AI has the potential to transform science, medicine, education and more, but it risks hitting limits on performance, energy and cost. The future of AI depends not only on better models, but also on better infrastructure.

To keep AI growing in a way that benefits society, I believe it’s important to shift from scaling by force to scaling with intelligence.

The Conversation

Divya Mahajan owns shares in Google, AMD, Microsoft, and Nvidia. She receives funding from Google and AMD.


Q. What is driving the rapid growth of AI’s energy consumption?
A. The growing number of computers powering AI models, leading to increased energy demand.

Q. How are companies responding to the energy-hungry nature of data centers?
A. Some tech companies are looking to power data centers on their own with dedicated fossil fuel and nuclear power plants, while others are working on more efficient computer chips.

Q. What is a major contributor to the high energy consumption of modern AI data centers?
A. Memory and cooling systems. As models grow, they need more storage and faster access to data, which generates heat, and removing that heat takes significant electricity.

Q. Why do many data centers struggle with efficiency despite having advanced equipment?
A. Because different parts of the system don’t always communicate effectively, leading to wasted energy and underused resources.

Q. What is a key idea for addressing the challenge of inefficient data center systems?
A. Breaking down silos between hardware, software, and energy systems engineers to find new ways to save energy and improve performance.

Q. How do AI workloads vary over time, affecting data center efficiency?
A. AI workloads can fluctuate due to thermal hotspots on chips, grid supply limitations, and bursts of data traffic, requiring real-time adjustments.

Q. What is the potential impact of inefficient data centers on the future of AI?
A. If left unchecked, inefficient data centers could limit the growth of AI, hindering its potential to transform various fields such as science, medicine, education, and more.

Q. How can data center cooling systems be improved to reduce energy consumption?
A. New methods like liquid cooling are being explored, but careful planning and water management are also crucial for efficient cooling.

Q. What is the role of coordination in optimizing data center efficiency?
A. Effective coordination between different parts of the system, such as hardware, software, and energy systems, is essential to avoid wasted energy and underused resources.

Q. Why is it important to shift from scaling by force to scaling with intelligence for AI growth?
A. To ensure that AI continues to grow in a way that benefits society, while also addressing performance, energy, and cost limitations.