Today perhaps more than ever, data is ephemeral. Despite Stephen Hawking's late-in-life revelation that information can never truly be destroyed, it can absolutely disappear from public access without leaving a trace.

It’s not just analogue data, either. Just as books go out of print, websites can drop offline, taking with them the wealth of knowledge, opinions, and facts they contain. (You won't find the complete herb archives of old Deadspin on that site, for instance.) And in an era where updates to stories or songs or short-form videos happen with the ease of a click, edits happen and often leave no indication of what came before. There is an entire generation of adults who are unaware that a certain firefight in the Mos Eisley Cantina was a cold-blooded murder, for instance.

So on any given day, 19-year-old Peter Hanrahan now spends his evenings bingeing on chart-topping radio shows from the 1960s. A student from the North of England, he recently started collecting episodes of Top of the Pops—a British chart music show that ran between 1964 and 2006—after seeing the 2019 Tarantino flick, Once Upon a Time in Hollywood.

"I was searching for TOTP episodes as I found that there was a severe lack of them available on YouTube, the BBC iPlayer, or any other radio shows,” he tells Ars. “But I wanted to experience what it would have been like back then and searching because of how atmospheric the radio was in Once Upon a Time in Hollywood. It's been another way to discover music from that era."

If Hanrahan merely wanted to experience more ‘60s British chart-toppers, of course, he could have simply run to Spotify. But he wants the experience of TV as it was recorded back in the day—including live studio audiences, lip sync controversies, and alleged sex offenders.


Naturally, YouTube does have many old episodes, but the BBC has tried taking down ones featuring Jimmy Savile or Gary Glitter, for instance. Today it’s far from a complete TOTP library, with only a fraction of the episodes Hanrahan is looking for accessible on the platform. YouTube is also quick to respond to takedown notices, and episodes that are there one day can disappear the next.

His next stop is archive.org, the venerable non-profit library that boasts a tremendous 411 billion archived Web pages, 23 million books, 5.5 million movies, and a variety of other data. Often it will have what Hanrahan needs, but if not, he moves on to an obscure corner of Reddit, where it is just possible that someone, somewhere, will have a copy saved.

It has taken Hanrahan a long time to find and obtain the episodes, but his work, trawling the edges of the Internet and connecting with real people, is finally paying off. In his first year as a self-confessed hoarder, Hanrahan collected more than a terabyte of data.


This impermanence of information, of course, goes far beyond old British radio. And luckily for future generations, the itch to seek it out, collect it, and store it goes beyond Hanrahan, too. It’s a sentiment currently driving thousands of individuals to band together online in the communal pursuit of archiving old media of all sorts. This ain’t the grant-and-partnerships-funded well-coordinated operation of the Internet Archive; it’s the individual-obsession-driven r/Datahoarder.

There’s a subreddit for everything

In 2020, the r/Datahoarder community on Reddit is almost 200,000 members strong, with around 1,000 idling or posting in the subreddit at any time. The communal purpose here is exactly what it sounds like: these amateur archivists set out to collect and capture data and to preserve it for record, reference, and future reading. Often, the goal is to retain this information both online and off, through physical media or terabytes of personal hard drives and storage. In a way, you can think of r/Datahoarder like thousands of haphazard individual Internet Archives, though each member tends to have a few specific niche areas of focus.

On r/Datahoarder, you’ll find people storing data on everything from YouTube videos to game install discs. One person was even planning to copy all Australia-based websites as the country burned in the worst wildfires in its history. The post was deleted after it was pointed out that the physical servers for Australian websites are located outside the country. They’re safe for now. Phew.


Some users archive every website they visit or service they use, and the gamut of media includes virtually everything: movies, music, and porn are all popular.

And for future historians, every tweet, every livestream, every TV and news show of the recent and ongoing Hong Kong democracy movement has been squirreled away by a few dedicated users. Already it's proving useful to at least one academic who visited r/DataHoarder seeking research material for their Sociology master's thesis on the Hong Kong protests.

Any hardware is welcome. While many users boast huge storage racks of expensive equipment, even humble Raspberry Pis are routinely kitted out with oversized drives and employed as real-time reddit-scrapers. That embarrassing 3am post about how you really need to get back with your ex? You may have deleted it within seconds of posting, but it's almost guaranteed that there are multiple copies in private archives—available to your ex on request.


1990s-era mass storage devices such as the Iomega Zip Drive occasionally float to the surface of the sub as their owners rediscover them in a cupboard under the stairs, prompting discussion on drivers, recovery methods, file formats, and readability.

The desire to save information for posterity seems to be almost universal but manifests in different ways according to each hoarder's own interest. Scroll through the boards and you'll find archived websites offering customization for Windows 98 machines and novelty cursors. You'll find users on a mission to preserve the entire Internet of a single country at a given point in time. You'll find users whose particular obsession is satellite weather forecasts for Japan, or silent movies.

As you might guess of a collection of highly motivated and obsessed tech users, r/Datahoarder began as a single IRC chat channel on freenode. Eventually, the community transitioned to the still-in-occasional-use r/datahoarders, with r/datahoarder being brought into existence four years ago. There is also a separate exchange subreddit, r/DHExchange, where members attempt to fill gaps in their collections.


Discussion these days is typically highly technical, largely revolving around efficient means of storing or hoarding vast quantities of data gleaned from online and elsewhere. Users want to get advice on hard drive arrays running into the hundreds of terabytes, mass storage options in the cloud, and the astonishing costs associated with archiving otherwise forgotten older media like broadcasts, music, journals, and webpages.

Hanrahan didn’t get involved out of his love of the 1960s musical zeitgeist—old British music acts are only the latest archival effort he’s undertaking. In real life, Hanrahan has 12 drawers of color-coordinated Lego bricks he uses frequently and an extensive vinyl collection, which includes everything from the original The Good, the Bad, and the Ugly soundtrack to music from Red Dead Redemption II. Perhaps unsurprisingly, he also maintains a large digital games library.

"It started out as me compiling together stuff that I think is relatively hard to find, and just some cool stuff I find, like old commercials and TV intros like ABC's," he said.


As a small and whimsical fish in the data hoarding pool, Hanrahan’s storage isn't extensive but is still considerably more than what most users would have on their home systems. His storage capacity is 6TB, with 3TB given over to backups. He spends an additional £100 (roughly $130) on two 1TB drives each time he starts to run out of space. He even keeps additional drives containing his most valued data at another family member's house and updates his hoard yearly.

A brief history of archiving impulses

The urge to store rare or useful recordings and information has existed for as long as humans have had the means at their disposal. The first archives of written material started appearing around 3500 BC, not long after the invention of writing. The Great Library of Alexandria was founded with the aim of acquiring and hoarding the best and most authoritative copies of every piece of work ever produced, employing scribes to hand copy onto the finest parchment available: the ancient equivalent of 8K UltraHD Blu-ray rips.

It wasn't until the 1970s, with the phenomenal success of the compact cassette tape, that amateur archiving of popular live media became possible. Teenagers in their bedrooms would record live radio shows as they aired with the latest pop songs from pirate radio stations. By 1974, Billboard magazine reported that over 40 percent of all age groups recorded live shows from the radio, with a corresponding drop in the number of prerecorded tapes being purchased. Home taping is killing the music industry? This is where it started. Tapes were recorded and recorded again, before being condemned to disposal or a purgatory of eternal storage in a slowly yellowing plastic case, or at the back of a kitchen drawer.


The advent of Betamax and VHS soon gave hoarders a new tool. Live and pre-recorded TV shows and movies became available to watch on demand from the users' own personal libraries. As with cassette tapes, most recorded shows were later recorded over to make room for the next episode of The Bob Newhart Show or All in the Family. What most people had in mind was not a permanent archive—it was the convenience of being able to watch or listen to the latest installment of a favorite soap when it suited them.

But as VCRs gave way to DVD players, then to DVDRs, TiVo boxes, and eventually the streaming landscape we know and love today, VHS tapes suffered the same fate as cassettes. Broadcast TV, like radio, has largely been lost to the mists of time unless the creators and rights holders put in the effort to create and securely store backups.

For instance, Doctor Who is one of British television's most successful exports, and at its peak popularity in 1982, the show was being watched by a global audience of 98 million people. Today, the fandom is obsessive—poring over the tiniest plot details, stockpiling episodes, and arguing over which of the Doctor’s 13 incarnations was the greatest.


But between 1967 and 1978, the BBC routinely deleted its programming after it had been broadcast in the belief that there was no practical value to keeping copies. Nine years of beloved Doctor Who episodes are missing. Some clips survive and occasionally, a full episode will turn up, courtesy of a foreign network that found the original two-inch tape in a box down the side of the couch, but most of Doctor Who's earliest broadcasts are gone for good.



Do we really need everything?

In the specific example above, the Doctor Who rescue effort is underway, and the BBC archives are unlikely to disappear any time soon. But some r/Datahoarder users are worried about the impermanence of other types of network television, its archives, and the Internet as a whole.

Take Reddit user Cwtard. He’s worried that politics and censorship will prevent the people of the future from easily accessing the facts of today. If all that survives are news opinion shows in streaming service archives, for instance, future viewers will see only a distorted and one-sided vision of the past.


"I collect news because it is in the most danger. It is a record of what we were being led to believe as well as a record of what we were allowed to hear," he told Ars. "If there is anything that globalists, corporations, and politicians want scrubbed from the Internet—in my opinion, it is the news."

Cwtard started archiving the news in 2008 and only more recently discovered r/Datahoarder. It has become a virtual venue where he can keep an eye out for broadcasts to flesh out his incomplete collection. To him, the Internet is an impermanent place, which could vanish at any moment, and Cwtard needs the material on his servers, in his physical possession. He sees it as an obligation to ensure that a true record of the present and the past survives into the future.

Currently, Cwtard is on the lookout for old CBS Evening News broadcasts, the NBC Today Show, Hoda and Jenna, CBS Sunday Morning, Face the Nation, and 60 Minutes, as well as copies or scans of old newspapers.


"There's definitely a wider duty when you see what's coming down the pipe. At best the Internet will be subscription based with only the rich having access currently enjoyed by everyone. At worst it will be completely sanitized of anything deemed dangerous or ‘wrongthink,’" he says. “Given the geopolitical climate these days, there's a real possibility that an event could shut down the Internet completely—at least until TV 2.0 is ready to go online. In this event, you want to be able to save as much history as possible because when it comes back on—only authorized history will be allowed, in my opinion."

Cwtard isn’t wrong. Even the Internet Archive—a hugely respected institution on r/Datahoarder—is under threat. In 2019, a dispute over audiobooks threatened to take the site offline across the whole of Russia. Lawsuits can happen at any time in any part of the world, and the monolithic archive.org could be legally blocked by ISPs, its treasure trove buried forever.

Do I really need a server closet? Yes and no

Distrust of cloud computing is a common but not overriding theme among data hoarders. Some do trust their archives and backups to the likes of Google and Amazon; others certainly share Cwtard's view.

"Cloud storage has the same impermanence as the Internet, even less actually,” he says. “You are putting your trust in companies that have proven they can't be trusted. Why surrender your data to a company that holds different morals and values than you?... They are pushing the cloud idea. They want the public to surrender all data to them. I think they have a vision for the world where computers just access the cloud, so there's no reason to own a computer."


Again, the idea of entrusting vast amounts of data to Web giants such as Google is a popular one outside of r/Datahoarder. Sales of desktop rigs with limitless upgradeability and a largely empty case in which to stuff RAID arrays have been in freefall for years, while sales of Google's Chromebook range (which typically offers very limited memory and non-upgradeable onboard storage, along with a free cloud account) have been soaring since the range's launch in 2011. In January 2019, Google disclosed that Chromebooks were being used by 30 million students and educators. By 2023, estimates point to Google shipping 17 million Chromebooks per year.

So although all anyone needs to start out down the r/Datahoarder rabbit hole is a low-spec laptop and an account with a few terabytes in the cloud, there are reasons to focus on the physical and local. Google specifically has long had a nasty habit of snooping into its users' business, and its so-called anti-abuse mechanisms can lock access to files that the search giant suspects may be illicit copies of copyrighted material.

In the United States, anything created after January 1, 1978, whether it’s a drawing, a poem, reference material, or a blog post, has a copyright for the life of the author plus 70 years. So almost everything published to the Web since its inception, or broadcast in any form in the last 100 years, is still legally locked down. What many cloud-reliant data hoarders are doing is technically illegal, and their archives can be shut down or deleted by Google without warning. In light of that, on-premises storage is the only way to go for users who want to keep an archive that they own and that is safe from prying eyes or deletion.


Numerous hoarders have been hit by DMCA copyright claims, forcing them to take their publicly available archives offline. One of the more poignant incidents involved u/dunklesToast who had amassed over 200GB of Donald Duck comics in Finnish but was forced offline after receiving a legal notice. A couple of fans had even pledged to learn the language so they could appreciate the monumental effort, while others planned a mass translation effort. All for nought.

Accordingly, it’s not uncommon to see, on the r/DHExchange request subreddit, users asking for rare or banned movies. There are also posts seeking current releases, offering magazine scans to fill holes in decades-long print runs, and (like Peter) requesting British music shows from the distant past.

Surprisingly, the requests are often met by the people who recorded from live TV in decades long gone—often on obsolete equipment.


"I have a couple of recordings from 1999," reads one reply to yet another TOTP request. "Not the best quality unfortunately; knackered VCR heads when recording and a horrible ghost due to tall buildings nearby, but it’s watchable." The user, AU8830, has uploaded the episodes to a temporary host, each file a hefty 12GB in all of its interlaced and artifact-ridden glory. "Thanks so much man this is epic! You are a legend," reads the reply.

Even if the data, whatever form it takes, is available elsewhere online or in libraries, there is an ever-present awareness on r/Datahoarder that it could vanish at any moment. And what becomes of these personal information caches decades down the line is a clear risk, one inherent to this rogue recording approach but not to institutions such as libraries or the Internet Archive. Hanrahan plans to give his drives to a friend who will “sustain it and store it safely,” he says (and hopes). But Cwtard's relatives and friends lack an appreciation for what he does. "My life's work will be for nothing. Tossed in the garbage like an old Atari found in an attic," he said. "Doesn't mean I don't have an obligation to do it anyway."

More than anything else, that—a sense of obligation to act for some greater informational good—may be the ultimate takeaway for anyone wading periodically into the r/Datahoarder world. Some people do come only for the British music or the Japanese weather or the Microsoft install discs. But the wider duty expressed by Cwtard is certainly echoed periodically by other users.


"A shower thought just hit me today," posted u/mamborambo on the lesser-used r/datahoarders subreddit. "With all the active archiving projects being launched recently to save historical content from Yahoo Groups, Youtube, dying mailing lists, evidence of human rights abuses, etc. etc., the datahoarders' role has been elevated from a nerd with compulsive hoarding tendencies into a champion of free speech and preservationist of history. We now boldly go where the corporate interest fails. Our terabytes are finally put to use for the betterment of mankind. Hopefully none of our home rigs fail, we remember to do our 3-2-1 backup correctly, and most importantly to make our loot accessible, because data is useless if not shared."

And if data is what you currently seek, I have a feeling I know where I can find you an archived copy.

David Rutland is a freelance writer with a background in print journalism. He is a terrible guitar player, and he spends his free time touring the British Isles, off grid, with his caravan and dog. He also writes for cyberpunks.com and runs planetearth.press from a Raspberry Pi behind his couch.


Listing image by MARTIN BUREAU/AFP via Getty Images