For the initiated, there is a magic and foolproof trick to improving the state of the art in machine learning. So simple my 5-year-old nephew could have invented it while stacking his Lego: add more blocks. The trick is to make your model larger by adding more "Transformer blocks".

But like any magic trick, this one is cursed. It makes your model suck up more energy. You need to feed it more data. Give it more power.

I have been wondering about the implications of this ongoing race toward larger and larger models. Will we just keep adding more computing resources and more GPUs, like a game of one-upmanship between tech titans over "who has the largest"?

Mine is bigger than yours.

The current reigning champion of this size-measuring contest is Nvidia, the largest manufacturer of machine learning GPUs in the world. Their model is based on the well-known Transformer architecture, so they didn't have to look very far for naming inspiration. I can only imagine their pubescent giggles when naming this ... beast ... this all-crusher ... the biggest of all the Transformers ... They aptly named it ... MEGATRON.

MegatronLM is a language model that can be used for many tasks in Natural Language Processing.

Given MegatronLM's humongous need for powerful servers and the huge amount of energy required for training, surely you need a small nuclear fusion reactor to train this 8.3-billion-parameter model?

As "one of those weird climate people", I wanted to find out: Is Artificial Intelligence destroying the climate by using all that electricity and cooling?

The numbers of this beast

According to the original paper, training MegatronLM took 309,657.6 GPU hours. At a peak power draw of 250 W per GPU, that is 309,657.6 hours × 250 W, or a massive 77,414,400 Wh. Producing this amount of energy emits about 54.7 metric tons of CO2e. *

That's equivalent to driving 218,578 km in an average passenger vehicle. To train one model. Once.
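For the back-of-envelope inclined, the whole calculation fits in a few lines of Python. The two emission factors below are assumptions derived back from the figures above rather than official constants, so swap in the numbers for your own grid and vehicle:

```python
# Back-of-envelope footprint of the MegatronLM training run.
# Both emission factors are assumptions implied by this article's
# figures, not official constants. Adjust them for your own grid.

GPU_HOURS = 309_657.6         # total GPU hours from the Megatron paper
PEAK_WATTS = 250              # peak power draw per GPU, in watts
KG_CO2E_PER_KWH = 0.7066      # assumed grid emission factor
KG_CO2E_PER_CAR_KM = 0.2503   # assumed average passenger-vehicle factor

energy_kwh = GPU_HOURS * PEAK_WATTS / 1000       # 77,414.4 kWh
co2e_tons = energy_kwh * KG_CO2E_PER_KWH / 1000  # ~54.7 metric tons
car_km = co2e_tons * 1000 / KG_CO2E_PER_CAR_KM   # ~218,500 km

print(f"{energy_kwh:,.1f} kWh, {co2e_tons:.1f} t CO2e, ~{car_km:,.0f} km")
```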

Will MegatronLM destroy the climate? Or is there hope...?

To the second question: four times yes.

1) This model is an interesting academic result, but it is hardly practical. You could even argue it is just Nvidia showing off their hardware. Every ML engineer who wants to apply NLP models in a real setting would run away from Megatron faster than from an infected, projectile-sneezing COVID-19 patient. To give you an idea why: renting the above-mentioned 309,657.6 GPU hours on Microsoft Azure at €3.2240 per GPU hour would cost you a total of EUR 998,334.17 in a Pay-as-you-go setting. The model is impractical and insanely expensive for everyone except the handful of enterprises that run their own clouds.
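A quick sanity check on that bill (the rate is the pay-as-you-go price quoted above; cloud prices change constantly, so treat this as a snapshot):

```python
# Rough rental cost of reproducing the MegatronLM training run on Azure.
# Rate is the pay-as-you-go price quoted in the text; prices change.

GPU_HOURS = 309_657.6
EUR_PER_GPU_HOUR = 3.2240

total_eur = GPU_HOURS * EUR_PER_GPU_HOUR
print(f"EUR {total_eur:,.2f} to train the model once")  # just shy of a million euros
```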

2) The cloud is clean. Microsoft has announced it will be carbon negative by 2030, and plans to have removed all of its historical emissions by 2050. Google has been matching 100% of its energy use with renewable purchases for years. Of the big three public cloud vendors, only Amazon AWS is seriously behind.

3) There is a whole new wave coming around training ML models in a sustainable and compute-efficient way. This includes a dedicated EMNLP challenge (SustaiNLP), benchmarks on computational budgets, tools for tracking the energy consumption of your model (one such tool is sketched below), and organizations researching model-training efficiency, like fast.ai.
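The article doesn't single out a tool, so purely as an illustration, here is what tracking looks like with the open-source codecarbon package, one of several such trackers; `train_my_model` is a hypothetical stand-in for your own training loop:

```python
# Estimating the emissions of a training run with codecarbon
# (pip install codecarbon), one of several energy-tracking tools.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()   # samples CPU/GPU/RAM power while running
tracker.start()

train_my_model()               # hypothetical: your actual training loop

emissions_kg = tracker.stop()  # returns the estimated kg of CO2e
print(f"This training run emitted about {emissions_kg:.3f} kg CO2e")
```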

MegatronLM is not even the largest model anymore, nor is it the best performing. It was dethroned by Microsoft's T-NLG. At 17 billion parameters, that's twice the size of MegatronLM. But the cost to train it, and the energy consumed, were actually smaller by orders of magnitude, thanks to a series of clever innovations. In fact, they trained it on "only one" DGX-2, a specialized deep learning server in which 16 GPUs are linked in a very efficient way.
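What do such innovations look like in practice? One widely used example (my illustration, not Microsoft's actual training code) is mixed-precision training, which runs much of the arithmetic in 16-bit floats to save time, memory, and energy:

```python
# Minimal mixed-precision training loop in PyTorch, one of the
# compute-efficiency techniques that make big models cheaper to train.
# Illustrative sketch only; a toy model stands in for a real one.
import torch

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()  # keeps fp16 gradients numerically stable

for step in range(100):
    x = torch.randn(32, 512, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # forward pass in float16 where safe
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()     # backward pass on the scaled loss
    scaler.step(optimizer)            # unscales gradients, then steps
    scaler.update()                   # adapts the scale factor for next step
```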

4) Machine Learning models are being used to help analyze, predict and mitigate the effects of climate change. The European Commission considers it a key part of the EU Green Deal, with EUR 25,000 in total prizes in the EU Datathon challenge. For an overview, see Tackling Climate Change with Machine Learning.

TL;DR

Some laboratories keep pushing for ever-larger models. But I believe there is hope.

I strongly believe that the net impact of Artificial Intelligence is overwhelmingly positive for the environment.

Comments and discussion welcome.

Appendix

* Actual consumption will be lower because GPUs are not 100% utilized all the time, but also higher if you factor in cooling and the consumption of other components.

Edit: A previous version of this article confused miles and kilometers.

I am looking to improve my writing skills, so any feedback is very much appreciated.