Open source has won the present, but what about the future?

In the March 2018 issue of Linux Journal, I wrote an article taking a look back over the previous decade. An astonishing amount has changed in such a short time. But as I pointed out, perhaps that's not surprising, as ten years represents an appreciable portion of the entire history of Linux and (to a lesser extent) of the GNU project, which began in August 1991 and September 1983, respectively. Those dates make the launch of Linux Journal in April 1994 an extremely bold and far-sighted move, and something worth celebrating on its 25th anniversary.

For me, the year 1994 was also memorable for other reasons. It marked the start of a weekly column I wrote about the internet in business—one of the first to cover the subject. In total, I produced 413 "Getting Wired" columns, the last one appearing in April 2003. I first mentioned Linux in February 1995. Thereafter, free software and (later) open source became an increasingly important thread running through the columns—the word "Linux" appeared 663 times in total. Reflecting on the dotcom meltdown that had recently taken place, wiping out thousands of companies and billions of dollars, here's what I wrote in my last Getting Wired column:

The true internet did not die: it simply moved back into the labs and bedrooms where it had first arisen. For the real internet revolution was driven not by share options, but by sharing—specifically, the sharing of free software. ... The ideas behind free software—and hence those that powered the heady early days of the internet—are so ineluctable, that even as powerful a company as Microsoft is being forced to adopt them. Indeed, I predict that within the next five years Microsoft will follow in the footsteps of IBM to become a fervent supporter of open source, and hence the ultimate symbol of the triumph of the internet spirit.

You can read that final column online on the Computer Weekly site, where it originally appeared. It's one of several hundred Getting Wired columns still available there. But the archive is incomplete for some years, and in any case, it goes back only to 2000. That means five years' worth—around 250 columns—are no longer accessible to the general public (I naturally still have my own original files).

Even if all my Getting Wired columns were available on the Computer Weekly site, there's no guarantee they would always be available. In the future, the site might be redesigned and links to the files removed. The files themselves might be deleted as ancient history, no longer of interest. The title might be closed down, and its articles simply dumped. So whatever is available today is not certain to exist tomorrow.

The Internet Archive was set up in part to address the problem of older web pages being lost. It aims to take snapshots of the internet as it evolves, to record and store the fleeting moments of our digital culture. It already preserves billions of web pages that are no longer available elsewhere, acting as a bulwark against time and forgetting. It's an incredible, irreplaceable resource, which receives no official funding from governments, so I urge you to donate what you can—you never know when you will need it.

The Internet Archive's Wayback Machine, with its 347 billion web pages already saved, is a marvel. But it's not perfect. In particular, it does not seem to have any backup copies of my Getting Wired columns. No great loss, perhaps, but it is indicative of the partial nature of its holdings. More generally, it raises two important questions. First, who should be preserving our digital heritage? And second, what should be kept? Although some digital artefacts are being preserved in the US, UK and elsewhere, the efforts are piecemeal and reactive, generally lacking any proper strategy for long-term preservation. Contrast that with what is happening in Norway, as described in this ZDNet story last year:

In the far north of Norway, near the Arctic Circle, experts at the National Library of Norway's (NLN) secure storage facility are in the process of implementing an astonishing plan. They aim to digitize everything ever published in Norway: books, newspapers, manuscripts, posters, photos, movies, broadcasts, and maps, as well as all websites on the Norwegian .no domain. Their work has been going on for the past 12 years and will take 30 years to complete by current estimations.

The article reports that 540,000 books and more than two million newspapers have already been digitized. At the end of last year, the collection stood at around eight petabytes of data, and it is growing by between five and ten terabytes a day. The headline speaks of "a 1,000-year archive". Although that may sound like a long time, in historical terms, it's not. We have more than a million tablets bearing cuneiform inscriptions, some dating back two or even three millennia. Egyptian hieroglyphs have survived just as long, as have the oracle bone scripts in China. At this stage in our civilization, we should be thinking about how to preserve today's information for tens or even hundreds of thousands of years. One project is already tackling that challenge:

The Long Now Foundation was established in 01996 to develop the Clock and Library projects, as well as to become the seed of a very long-term cultural institution. The Long Now Foundation hopes to provide a counterpoint to today's accelerating culture and help make long-term thinking more common. We hope to foster responsibility in the framework of the next 10,000 years.

The Long Now's Library project is "of the deep future, for the deep future". There are already three tools: the Rosetta Disk, the Long Viewer and the Long Server. The Long Viewer is "an open source Timeline tool", while the Long Server is "the over-arching program for Long Now's digital continuity software projects". Sadly, there are no details yet about what form the Long Server will take. However, the Long Server website does mention that the team is "now working on a file format conversion project called The Format Exchange".

File format conversion is one of the central challenges of storing digital material for thousands of years. Our own short experience shows how quickly formats are replaced, resulting in old files that are hard to read. Now imagine the difficulty of reading a digital file whose bits are perfectly preserved but written using a file format from ten thousand years ago.
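To make the problem concrete, here is a small sketch of my own (it is purely illustrative, not part of the Long Now's tooling): even the first step of reading an old file—recognizing what format it is in—depends on documented knowledge, such as the "magic number" bytes that open formats publish in their specifications. The `identify` helper and the short table of signatures below are my own illustration.

```python
# Illustrative sketch: recognizing a file format from its leading "magic"
# bytes, much as the Unix file(1) utility does. Where a specification is
# open, a future reader can be rebuilt from the spec alone; where it is
# closed, even naming the bytes may require reverse engineering.

MAGIC_NUMBERS = {
    b"%PDF-": "PDF (openly specified as ISO 32000)",
    b"\x89PNG\r\n\x1a\n": "PNG (openly specified by the W3C)",
    b"PK\x03\x04": "ZIP container (used by ODF and OOXML documents)",
}

def identify(header: bytes) -> str:
    """Return a best-guess format name for the given leading bytes."""
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return "unknown format: a reader would have to be reverse-engineered"

print(identify(b"%PDF-1.7 ..."))
```

A real archival tool would, of course, go far beyond signature matching, but the point stands: every entry in a table like this exists only because someone wrote the format down in the open.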

Fortunately, it's obvious what form the solution to this central problem must take. The only hope of reading ancient file formats is if they are completely open. That way, readers and file conversion tools can be built with relative ease. Similarly, any programs that are created for projects looking at very long-term preservation of digital material must be open source, so that they too can be examined in detail, modified and built upon. Even after just 25 years, we know that free software has won. But it is also highly likely that its success will be very long term, assuming human culture survives with any continuity. Open source—and only open source—is eternal.