The other thing to make it work on a scale that the world would really be interested in is to automate and miniaturize. All the technologies exist — they’re all commercially available. But they’re not all in one place, and they’re not designed to work with each other as such.

If you wanted to do it properly you’d invest in the site, you’d have DNA synthesis at the site, you’d have the storage there, you’d have the reading back in one place, and you’d miniaturize it all. You’d have microfluidics to do what is currently lab science, even to the level of having robots to do the filing of the test tubes onto shelves. Robots are used in magnetic tape archive centers now, and you’d just want a smaller version of the same.

How similar is what you’ve done to what is involved in today’s gene-sequencing systems, which read the sequence of bases in a DNA molecule?

The sequencing, or reading it back, that we did is exactly the same. We designed it that way. We designed it so that it would work in the standard protocols that we and our laboratory collaborators are familiar with, day in, day out. It is really exactly the same process. We use an Illumina sequencing machine.

The writing of the information is a technology I’m a little bit less familiar with. But Agilent Technologies, whom we worked with, is one of the world leaders in developing this, and it is, I believe, very much like an inkjet printing system. But you’re not using colored dyes on paper — you’re using chemical solutions that include in them the nucleotides, the basis of DNA, fired very accurately onto a glass slide so that each little spot on the slide you build up is a separate sequence.

Is there a category of information you were most interested in archiving?

The inspiration for the project came through the issues we’re having to deal with at the European Bioinformatics Institute, where many of the authors work. We’re responsible for creating and archiving and maintaining and providing to the world over the Internet some of the major biological databases: genome sequence databases, protein structure databases and others.

And we have a constant management headache. On the one hand, it’s our duty to archive that information and serve it live over the Internet, but it’s increasing exponentially, and as you might imagine, our budgets are not increasing exponentially. And so for a number of years we have had headaches, such as “Can we afford that many hard drives?” and “Can we afford to run them?” and “What are we going to do if we can’t?”