Written by James Orme Wed 12 Jun 2019

Training AI is one of the most energy-intensive workloads, and it is still largely executed on-premise

Researchers at the University of Massachusetts Amherst have claimed that training a single AI model can produce more than five times the lifetime carbon dioxide emissions of an average car.

In a paper, the researchers examined the financial and environmental costs of training popular models for natural language processing (NLP), the branch of AI concerned with making human language intelligible to machines and the basis for today’s language translation systems and chatbots.

The accuracy of NLP tasks has improved markedly over the past five or so years, thanks to advances in techniques and hardware for training deep neural networks, but the energy required to train these highly accurate models has ballooned accordingly.

Energy-hungry hardware accelerators such as Nvidia GPUs are used to train the models, a process that can take weeks or months with the processors running continuously at full utilisation.

The researchers identified one model that required 274,000 hours of training before producing accurate results, a process that generates more than 626,000 pounds of carbon dioxide, the equivalent of nearly 300 return flights from London to New York.
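The flight comparison above is simple arithmetic. A minimal sketch, noting that the per-passenger emissions figure below is an assumption chosen to be consistent with the article’s numbers, not a value taken from the paper:

```python
# Rough arithmetic behind the flight comparison.
TRAINING_CO2_LBS = 626_000       # reported emissions for the 274,000-hour training run
RETURN_FLIGHT_CO2_LBS = 2_122    # assumed per-passenger London-New York return flight

flights = TRAINING_CO2_LBS / RETURN_FLIGHT_CO2_LBS
print(round(flights))  # 295
```

Per-passenger flight emissions vary with aircraft, load factor, and methodology, so any such equivalence is indicative rather than exact.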

These volumes were calculated for workloads performed in on-premise data centres rather than the cloud, which the report notes is relatively more environmentally friendly, since cloud data centres source a significant share of their energy from renewables.

Due to the pricing model of cloud services, training models on-premise is usually more cost-effective, particularly for cutting-edge research ideas that require many hours of training. As a result, the majority of training is still performed in on-premise data centres, an affordable choice for researchers, but one that comes at a high price to the environment.

The researchers tested four of the most influential NLP models to emerge in recent years: the Transformer, ELMo, BERT, and GPT-2. Each was trained for a day while the researchers measured the energy consumed. They then multiplied this figure by the total number of training days the original researchers had reported, and converted the result to a carbon dioxide equivalent using the average carbon intensity of US electricity generation.
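The estimation method described above can be sketched in a few lines. The numbers used here are illustrative placeholders, not figures from the paper; the US grid-average conversion factor is an assumption of the same kind as the one the researchers applied:

```python
# Sketch of the estimation method: extrapolate one day of measured
# energy use over the full training run, then convert to CO2 using
# a grid-average emissions factor.
def estimate_emissions(measured_kwh_per_day, training_days, co2_lbs_per_kwh):
    total_kwh = measured_kwh_per_day * training_days
    return total_kwh * co2_lbs_per_kwh

# Illustrative: 500 kWh/day over a 100-day run, at an assumed
# ~0.954 lbs CO2 per kWh for average US electricity generation.
print(estimate_emissions(500, 100, 0.954))  # 47700.0
```

Because the method multiplies a single day’s measurement across the whole run, it assumes roughly constant utilisation throughout training, which matches the always-at-full-load workloads described earlier.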

The upshot is that while training a single model is relatively inexpensive, the cost of tuning models can make the process eye-wateringly expensive, the researchers said. The chief culprit was neural architecture search (NAS), an advanced tuning process that automatically searches for the best-performing neural network architecture.

“Our experiments suggest that it would be beneficial to directly compare different models to perform a cost-benefit (accuracy) analysis,” the researchers said. “To address this, when proposing a model that is meant to be re-trained for downstream use, such as retraining on a new domain or fine-tuning on a new task, authors should report training time and computational resources required, as well as model sensitivity to hyperparameters.”

The researchers said a cost-effective solution might be for governments to invest in a compute cloud purpose-built for academics, enabling the pooling of computing resources.

“A government-funded academic compute cloud would provide equitable access to all researchers,” the researchers said.