Darren Waters

Technology editor, BBC News website, San Francisco




Google's open source team is working on ways to physically transfer huge data sets up to 120 terabytes in size.

"We have started collecting these data sets and shipping them out to other scientists who want them," said Google's Chris DiBona.

Google sends scientists a hard drive system, then copies the data they load onto it before passing that data on to other researchers.

Data delays

Google hopes that one day the data it helps to swap will be available to the public.

Mr DiBona, open source program manager at Google, said the team was inspired by work done by Microsoft researcher Jim Gray, who delivered copies of the TerraServer mapping data to people around the world.

The one terabyte of image data in the set was too large to send over a computer network.

"I wished people were doing that for biology, genetic research and antiquities research," said Mr DiBona.

The team was spurred into action after speaking to the research group reconstructing the Archimedes Palimpsest, a medieval parchment manuscript containing seven treatises by the Greek scientist.

Because the parchment had been over-written several times by different authors, scientists have been reconstructing the original work by uncovering the layers using advanced imaging techniques.

Mr DiBona said networks were not big enough to ship terabytes of data.

"The networks aren't basically big enough and you don't want to ship the data in this manner, you want to ship it fast.

"You want to ship it sometimes on a hard drive. What if you have these huge data sets - 120 terabytes - how do you get them from point A to point B for these scientists?"

Google has worked with the Archimedes group - and subsequent institutions - to give them a hard drive recording system.

"We have a number of machines about the size of brick blocks, filled with hard drives.

"We send them out to people who copy the data on them and ship them back to us. We dump them on to one of our data systems and ship it out to people."

Google keeps a copy, and the data is always in an open format, in the public domain or covered by a Creative Commons licence.

The program is currently informal and not open to the general public. Google either approaches bodies that it knows have large data sets or is contacted by scientists themselves.

One of the largest data sets copied and distributed came from the Hubble telescope: 120 terabytes. One terabyte is equivalent to 1,000 gigabytes.
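To give a sense of why a data set this size is shipped on drives, here is a rough back-of-the-envelope sketch. It uses the article's figures (120 terabytes, with one terabyte taken as 1,000 gigabytes); the link speeds are illustrative assumptions, not numbers from Google or Mr DiBona, and the function name is hypothetical.

```python
# Rough estimate only. The link speeds are assumptions chosen for illustration,
# not figures quoted in the article.
TERABYTE_BYTES = 1_000 * 10**9  # 1 TB = 1,000 GB, as stated in the article


def transfer_days(size_tb: float, link_mbps: float) -> float:
    """Days needed to send size_tb terabytes over a link that
    sustains link_mbps megabits per second with no overhead."""
    bits = size_tb * TERABYTE_BYTES * 8
    seconds = bits / (link_mbps * 10**6)
    return seconds / 86_400  # seconds in a day


if __name__ == "__main__":
    for mbps in (10, 100, 1_000):  # assumed sustained speeds
        days = transfer_days(120, mbps)
        print(f"120 TB at {mbps} Mbit/s: {days:,.1f} days")
```

Under these assumptions, a link that sustained 100 megabits per second around the clock would need roughly 111 days to move the Hubble data set, before allowing for any protocol overhead or interruptions.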

Mr DiBona said he hoped that Google could one day make the data available to the public.

"We have got those systems in development but we are not yet ready to launch to the public. In the meantime we are useful and we need to do this for more groups."

Mr DiBona's team has the job of helping the development of open source projects around the world as well as building internal projects.

"My mission is also to release a lot of code and we've released over a million lines of open source code between web tool kits and other things."

Code curriculum

Google also helps by offering funding to open source projects around the world and by offering engineering time. More than $1.5m was spent last year in donations to projects, said Mr DiBona.

It also runs a program to place student developers with open source teams, called the Summer of Code.


"The founders of Google are passionate about open source. They see Google as a net beneficiary of open source technology.

"They want to make sure that these people whose software we use every day remain vital."

Google's infrastructure relies on a lot of open source software, including a version of Linux used as the desktop operating system for developers.

"A lot of Googlers come out of this community - we are open source people."

Some have questioned why Google has not done even more - such as releasing an open source operating system or opening up more of its software to the open source community.

Mr DiBona, who is a long-standing Linux evangelist, said: "I am comfortable with where Google is operating. People are often upset and feel we should be releasing more.

"And I agree; I would love to release more. It's more a function of engineering time, than it is a function of desire."