How Data Hoarding Is the New Threat to Privacy and Climate Change

Big Tech needs to get better at energy efficiency

Photo: Sergii Iaremenko/Science Photo Library/Getty Images

As machine learning and other data-intensive algorithms proliferate, more organizations are hoarding data in hopes of alchemizing it into something valuable. From spy agencies to network infrastructure providers, data collection is part and parcel of the digital economy. The best data can be combined with clever algorithms to do incredible things — but digital hoarding and computationally intensive workloads have externalities too.

The electrical costs — and therefore the environmental impacts — of computation are both extraordinary and growing. Modern machine learning (ML) models are a prime example. They require an enormous amount of energy to process mountains of data. The computational costs of training the largest ML models have been growing exponentially since 2012, with a doubling period of roughly 3.4 months, according to OpenAI. In recent months, similar studies have shown that the electrical costs of cryptocurrency and video streaming are also significant and growing.
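To get a feel for what that doubling period implies, here is a quick back-of-the-envelope calculation. It assumes the roughly 3.4-month doubling period from OpenAI's analysis and a 2012–2018 window; the exact multiplier is an extrapolation, not a figure from the article.

```python
# Rough arithmetic: how much does training compute grow if it doubles
# every 3.4 months for six years (2012-2018)?
months = (2018 - 2012) * 12           # 72 months in the window
doubling_period = 3.4                 # months per doubling (assumed, per OpenAI)
doublings = months / doubling_period  # about 21 doublings
growth = 2 ** doublings               # total growth factor

print(f"{doublings:.1f} doublings -> roughly {growth:,.0f}x more compute")
```

Even small changes to the doubling period swing the result by orders of magnitude, which is exactly why exponential trends in energy use deserve attention.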

Producing this electricity creates literal exhaust in most cases — there are precious few server farms running on 100% renewable energy — and with climate change looming large, it’s time we acknowledge the environmental impact of computation. Just as wrapping every little thing in a plastic bag is frivolous and wasteful, so is some of our CPU usage.

Computer science and engineering experts have been complaining about this for years. Some point out that we went to the moon with only 4 KB of RAM. Others detail how slow and bloated modern software is. Jonathan Blow went so far as to warn about the impending collapse of the entire software engineering discipline due to intergenerational knowledge loss.

Most of the time this argument is positioned in terms of engineering elitism. Its supporters nostalgically harken back to a time when it really meant something to be a software engineer. They scold beginners for not knowing better while flaunting their beautiful hair, tinged with the silvery gray of experience. Despite the condescension, they’re not completely wrong.

As computers got faster and faster, computer programs actually got slower. End users didn’t notice because the slower programs still ran fast on the faster computers. As a result, many developers rarely have to focus on using memory or CPU cycles efficiently. Our incredible CPUs can run even relatively inefficient code fast enough for most users. Tools and programming languages that prioritize the developer’s time over CPU and memory efficiency have become the norm. AWS and other cloud services epitomize this tradeoff — why spend weeks of development time optimizing the code when Amazon can just automatically turn on a few more servers when we need them?
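A toy illustration of this tradeoff (mine, not the article's): both functions below give the same answer, but the first is quicker to write in a naive style and burns quadratically more CPU as the input grows — the kind of inefficiency a fast machine quietly absorbs.

```python
def count_unique_slow(items):
    # O(n^2): re-scans (and copies) a growing prefix of the list for
    # every element -- easy to write, wasteful to run.
    return sum(1 for i, x in enumerate(items) if x not in items[:i])

def count_unique_fast(items):
    # O(n): a set handles the bookkeeping in amortized constant
    # time per element.
    return len(set(items))

data = list(range(300)) * 2  # 600 items, 300 of them unique
print(count_unique_slow(data), count_unique_fast(data))
```

At this size the difference is invisible; at millions of records, multiplied across thousands of servers, it becomes an electricity bill — and emissions.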

There is nothing wrong with professionals trying to hold an industry to high standards. But I do wish the pro-efficiency crowd would use a more persuasive tactic than tautological scolding. Maybe it’s just me, but “more efficient is better” just doesn’t motivate me the same way as “we should do our part to conserve electricity, since climate change is an existential threat to humanity.” It’s not just about the inefficient electrical use, either. The data we generate is itself a kind of digital pollutant — a new kind of trash for the information age.

Some data is a waste product in the same way that junk mail is a waste product. How many computational resources are dedicated to the zillions of spam emails sent every day? How much bandwidth is dedicated to ads sitting unclicked in your sidebar? Increasingly, records of nearly every digital transaction — no matter how trivial — are transmitted to a data center and stored. It may seem hyperbolic to harp on a few wasted bits, but this is a serious problem.

Consider this: Loading Twitter requires about 6 MB of data.