When Mohammad Haft-Javaherian, a student at the Massachusetts Institute of Technology, attended MIT’s Green AI Hackathon in January, he was simply curious about the capabilities of a new supercomputer cluster being showcased at the event. But what he had planned as a one-hour exploration of a cool new server drew him into a three-day competition to create energy-efficient artificial-intelligence programs.

The experience resulted in a revelation for Haft-Javaherian, who researches the use of AI in healthcare: “The clusters I use every day to build models with the goal of improving healthcare have carbon footprints,” Haft-Javaherian says.

The processors used in the development of artificial intelligence algorithms consume a lot of electricity. And in the past few years, as AI usage has grown, its energy consumption and carbon emissions have become an environmental concern.

“I changed my plan and stayed for the whole hackathon to work on my project with a different objective: to improve my models in terms of energy consumption and efficiency,” says Haft-Javaherian, who walked away with a $1,000 prize from the hackathon. He now considers carbon emission an important factor when developing new AI systems.

But unlike Haft-Javaherian, many developers and researchers overlook or remain oblivious to the environmental costs of their AI projects. In the age of cloud-computing services, developers can rent online servers with dozens of CPUs and strong graphics processors (GPUs) in a matter of minutes and quickly develop powerful artificial intelligence models. And as their computational needs rise, they can add more processors and GPUs with a few clicks (as long as they can foot the bill), not knowing that with every added processor, they’re contributing to the pollution of our green planet.

Why Does AI Consume So Much Energy?

The recent surge in AI’s power consumption is largely caused by the rise in popularity of deep learning, a branch of artificial-intelligence algorithms that depends on processing vast amounts of data. “Modern machine-learning algorithms use deep neural networks, which are very large mathematical models with hundreds of millions—or even billions—of parameters,” says Kate Saenko, associate professor at the Department of Computer Science at Boston University and director of the Computer Vision and Learning Group.

These many parameters enable neural networks to solve complicated problems such as classifying images, recognizing faces and voices, and generating coherent and convincing text. But before they can perform these tasks with optimal accuracy, neural networks need to undergo “training,” which involves tuning their parameters by performing complicated calculations on huge numbers of examples.

“To make matters worse, the network does not learn immediately after seeing the training examples once; it must be shown examples many times before its parameters become good enough to achieve optimal accuracy,” Saenko says.

All this computation requires a lot of electricity. According to a study by researchers at the University of Massachusetts, Amherst, training a transformer, a type of deep-learning algorithm, can consume enough electricity to emit more than 626,000 pounds of carbon dioxide, nearly five times the lifetime emissions of an average American car. Another study found that AlphaZero, Google’s Go- and chess-playing AI system, generated 192,000 pounds of CO2 during training.

To be fair, not all AI systems are this costly. Transformers are used in a fraction of deep-learning models, mostly in advanced natural-language processing systems such as OpenAI’s GPT-2 and BERT, which Google recently integrated into its search engine. And few AI labs have the financial resources to develop and train expensive AI models such as AlphaZero.

Also, after a deep-learning model is trained, using it requires much less power. “For a trained network to make predictions, it needs to look at the input data only once, and it is only one example rather than a whole large database. So inference is much cheaper to do computationally,” Saenko says.
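The gap Saenko describes can be made concrete with a rough count of passes through the network. The snippet below is a minimal sketch, not a measurement: the dataset size, the number of epochs, and the rule of thumb that a backward pass costs about twice a forward pass are all illustrative assumptions, not figures from the studies cited above.

```python
def training_passes(num_examples: int, epochs: int, backward_cost: float = 2.0) -> float:
    """Approximate training cost, measured in units of one forward pass.

    Each example is shown once per epoch, and each showing needs a forward
    pass (cost 1.0) plus a backward pass (assumed cost: backward_cost).
    """
    return num_examples * epochs * (1.0 + backward_cost)


def inference_passes(num_inputs: int = 1) -> float:
    """Inference needs a single forward pass per input and no backward pass."""
    return float(num_inputs)


# Hypothetical workload: one million training examples, shown ten times each.
train_cost = training_passes(num_examples=1_000_000, epochs=10)
infer_cost = inference_passes(num_inputs=1)

print(f"Training:  {train_cost:,.0f} forward-pass units")
print(f"Inference: {infer_cost:,.0f} forward-pass unit")
print(f"Ratio:     {train_cost / infer_cost:,.0f}x")
```

Under these assumptions, a single prediction costs a tiny fraction of one training run, though that advantage shrinks as the number of predictions served grows.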

Many deep-learning models can be deployed on smaller devices after being trained on large servers. Many applications of edge AI now run on mobile devices, drones, laptops, and IoT (Internet of Things) devices. But even small deep-learning models consume a lot of energy compared with other software. And given the expansion of deep-learning applications, the cumulative costs of the compute resources being allocated to training neural networks are developing into a problem.

“We’re only starting to appreciate how energy-intensive current AI techniques are. If you consider how rapidly AI is growing, you can see that we're heading in an unsustainable direction,” says John Cohn, IBM Fellow and research scientist with the MIT-IBM Watson AI Lab, who co-led the Green AI hackathon at MIT.

According to one estimate, by 2030, more than 6 percent of the world’s energy may be consumed by data centers. “I don't think it will come to that, though I do think exercises like our hackathon show how creative developers can be when given feedback about the choices they’re making. Their solutions will be far more efficient,” Cohn says.

Creating Energy-Efficient AI Hardware

“CPUs, GPUs, and cloud servers were not designed for AI work. They have been repurposed for it and, as a result, are less efficient than processors that were designed specifically for AI work,” says Andrew Feldman, CEO and cofounder of Cerebras Systems. He compares the usage of heavy-duty generic processors for AI to using an 18-wheel truck to take the kids to soccer practice.

Cerebras is one of a handful of companies that are creating specialized hardware for AI algorithms. Last year, it came out of stealth with the release of the CS-1, a huge processor with 1.2 trillion transistors, 18 gigabytes of on-chip memory, and 400,000 processing cores. Effectively, this allows the CS-1, the largest computer chip ever made, to house an entire deep learning model without the need to communicate with other components.

“When building a chip, it is important to note that communication on-chip is fast and low-power, while communication across chips is slow and very power-hungry,” Feldman says. “By building a very large chip, Cerebras keeps the computation and the communication on a single chip, dramatically reducing overall power consumed. GPUs, on the other hand, cluster many chips together through complex switches. This requires frequent communication off-chip, through switches and back to other chips. This process is slow, inefficient, and very power-hungry.”

The CS-1 uses a tenth of the power and space of a rack of GPUs that would provide the equivalent computation power.

Satori, the new supercomputer that IBM built for MIT and showcased at the Green AI hackathon, has also been designed to perform energy-efficient AI training. Satori was recently rated as one of the world’s greenest supercomputers. “Satori is equipped to give energy/carbon feedback to users, which makes it an excellent ‘laboratory’ for improving the carbon footprint of both AI hardware and software,” says IBM’s Cohn.

Cohn also believes that the energy sources used to power AI hardware are just as important. Satori is now housed at the Massachusetts Green High Performance Computing Center (MGHPCC), which is powered almost exclusively by renewable energy.

“We recently calculated the cost of a high workload on Satori at MGHPCC compared to the average supercomputer at a data center using the average mix of energy sources. The results are astounding: One year of running the load on Satori would release as much carbon into the air as is stored in about five fully-grown maple trees. Running the same load on the 'average' machine would release the carbon equivalent of about 280 maple trees,” Cohn says.
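Taken at face value, the two tree counts Cohn quotes imply a large gap. The short check below works through the arithmetic using only those two figures; the variable names are illustrative.

```python
# Tree counts as quoted by Cohn for one year of the same workload.
satori_trees = 5      # Satori at MGHPCC, mostly renewable power
average_trees = 280   # "average" machine on an average energy mix

# How many times more carbon the average machine releases.
ratio = average_trees / satori_trees

# Share of the average machine's carbon that Satori avoids, in percent.
reduction_pct = (1 - satori_trees / average_trees) * 100

print(f"The average machine releases {ratio:.0f} times as much carbon")
print(f"Satori avoids about {reduction_pct:.1f}% of those emissions")
```

By this measure, the average machine releases 56 times as much carbon for the same work.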

Yannis Paschalidis, director of Boston University’s Center for Information and Systems Engineering, proposes a better integration of data centers and energy grids, which he describes as “demand-response” models. “The idea is to coordinate with the grid to reduce or increase consumption on demand, depending on electricity supply and demand. This helps utilities better manage the grid and integrate more renewables into the production mix,” Paschalidis says.

For instance, when renewable energy supplies such as solar and wind power are scarce, data centers can be instructed to reduce consumption by slowing down computation jobs and putting low-priority AI tasks on pause. And when there’s an abundance of renewable energy, the data centers can increase consumption by speeding up computations.

The smart integration of power grids and AI data centers, Paschalidis says, will help manage the intermittency of renewable energy sources while also reducing the need to have too much stand-by capacity in dormant electricity plants.
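The demand-response idea described above can be sketched as a simple scheduling rule. This is a toy illustration under stated assumptions, not a description of any real grid or data-center interface: the `Job` class, the `plan_speeds` function, the priority scale, the 30 and 70 percent thresholds, and the speed factors are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # hypothetical scale: 1 = low priority, higher = more important


def plan_speeds(jobs, renewable_fraction, scarce=0.3, abundant=0.7, low_priority=1):
    """Map each job name to a speed factor based on renewable supply.

    renewable_fraction is the share of grid power currently coming from
    renewables (0.0 to 1.0). A factor of 0.0 pauses the job, 1.0 runs it
    at normal speed, and anything above 1.0 speeds it up.
    """
    speeds = {}
    for job in jobs:
        if renewable_fraction < scarce:
            # Renewables are scarce: pause low-priority work, slow the rest.
            speeds[job.name] = 0.0 if job.priority <= low_priority else 0.5
        elif renewable_fraction > abundant:
            # Renewables are plentiful: use the surplus to run faster.
            speeds[job.name] = 1.5
        else:
            speeds[job.name] = 1.0
    return speeds


jobs = [Job("hyperparameter-sweep", priority=1), Job("production-retrain", priority=3)]
for fraction in (0.1, 0.5, 0.9):
    print(fraction, plan_speeds(jobs, fraction))
```

A real system would take its signal from the utility rather than a single number, but the shape of the decision is the same: defer flexible work when clean power is short and catch up when it is plentiful.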

The Future of Energy-Efficient AI

Scientists and researchers are looking for ways to create AI systems that don’t need huge amounts of data during training. After all, the human brain, which AI scientists try to replicate, uses a fraction of the data and power that current AI systems use.

During this year’s AAAI Conference, Yann LeCun, a deep-learning pioneer, discussed self-supervised learning, an approach in which deep-learning systems can learn with much less data. Others, including cognitive scientist Gary Marcus, believe that the way forward is hybrid artificial intelligence, a combination of neural networks and the more classic rule-based approach to AI. Hybrid AI systems have proven to be more data- and energy-efficient than pure neural-network-based systems.

“It's clear that the human brain doesn’t require large amounts of labeled data. We can generalize from relatively few examples and figure out the world using common sense. Thus, 'semi-supervised' or 'unsupervised' learning requires far less data and computation, which leads to both faster computation and less energy use,” Cohn says.