Researchers from three leading British institutions are using BitTorrent to share over 150 GB of unique high-resolution brain scans of unborn babies with colleagues worldwide. Using the popular file-sharing protocol is a "no-brainer," according to a Research Associate, who says that dealing with people's misconceptions toward torrents was one of the biggest challenges.

One of the core pillars of academic research is sharing.

By letting other researchers know what you do, ideas are criticized, improved upon and extended. In today’s digital age, sharing is easier than ever before, especially with help from torrents.

One of the leading scientific projects that has adopted BitTorrent is the developing Human Connectome Project, or dHCP for short. The goal of the project is to map the brain wiring of developing babies in the wombs of their mothers.

To do so, a consortium of researchers with expertise ranging from computer science, to MRI physics and clinical medicine, has teamed up across three British institutions: Imperial College London, King’s College London and the University of Oxford.

The collected data is extremely valuable for the neuroscience community and the project has received mainstream press coverage and financial backing from the European Union Research Council. Not only to build the dataset, but also to share it with researchers around the globe. This is where BitTorrent comes in.

Sharing more than 150 GB of data with researchers all over the world can be quite a challenge. Regular HTTP downloads are not really up to the task, and many other transfer options have a high failure rate.

Baby brain scan (Credit: Developing Human Connectome Project)



This is why Jonathan Passerat-Palmbach, Research Associate Department of Computing Imperial College London, came up with the idea to embrace BitTorrent instead.

“For me, it was a no-brainer from day one that we couldn’t rely on plain old HTTP to make this dataset available. Our first pilot release is 150GB, and I expect the next ones to reach a couple of TB. Torrents seemed like the de facto solution to share this data with the world’s scientific community.” Passerat-Palmbach says.

The researchers opted to go for the Academic Torrents tracker, which specializes in sharing research data. A torrent with the first batch of images was made available there a few weeks ago.

“This initial release contains 3,629 files accounting for 167.20GB of data. While this figure might not appear extremely large at the moment, it will significantly grow as the project aims to make the data of 1,000 subjects available by the time it has completed.”

Torrent of the first dataset



The download numbers are nowhere in the region of an average Hollywood blockbuster, of course. Thus far the tracker has registered just 28 downloads. That said, as a superior and open file-transfer protocol, BitTorrent does aid in critical research that helps researchers to discover more about the development of conditions such as ADHD and autism.

Interestingly, the biggest challenges of implementing the torrent solution were not of a technical nature. Most time and effort went into assuring other team members that this was the right solution.

“I had to push for more than a year for the adoption of torrents within the consortium. While my colleagues could understand the potential of the approach and its technical inputs, they remained skeptical as to the feasibility to implement such a solution within an academic context and its reception by the world community.

“However, when the first dataset was put together, amounting to 150GB, it became obvious all the HTTP and FTP fallback plans would not fit our needs,” Passerat-Palmbach adds.

Baby brain scans (Credit: Developing Human Connectome Project)



When the consortium finally agreed that BitTorrent was an acceptable way to share the data, local IT staff at the university had to give their seal of approval. Imperial College London doesn’t allow torrent traffic to flow freely across the network, so an exception had to be made.

“Torrents are blocked across the wireless and VPN networks at Imperial. Getting an explicit firewall exception created for our seeding machine was not a walk in the park. It was the first time they were faced with such a situation and we were clearly told that it was not to become the rule.”

Then, finally, the data could be shared around the world.

While BitTorrent is probably the most efficient way to share large files, there were other proprietary solutions that could do the same. However, Passerat-Palmbach preferred not to force other researchers to install “proprietary black boxes” on their machines.

Torrents are free and open, which is more in line with the Open Access approach more academics take today.

Looking back, it certainly wasn’t a walk in the park to share the data via BitTorrent. Passerat-Palmbach was frequently confronted with the piracy stigma torrents have amoung many of his peers, even among younger generations.

“Considering how hard it was to convince my colleagues within the project to actually share this dataset using torrents (‘isn’t it illegal?’ and other kinds of misconceptions…), I think there’s still a lot of work to do to demystify the use of torrents with the public.

“I was even surprised to see that these misconceptions spread out not only to more senior scientists but also to junior researchers who I was expecting to be more tech-aware,” Passerat-Palmbach adds.

That said, the hard work is done now and in the months and years ahead the neuroscience community will have access to Petabytes of important data, with help from BitTorrent. That is definitely worth the effort.

Finally, we thought it was fitting to end with Passerat-Palmbach’s “pledge to seed,” which he shared with his peers. Keep on sharing!

—



On the importance of seeding

Dear fellow scientist,

Thank for you very much for the interest you are showing in the dHCP dataset!

Once you start downloading the dataset, you’ll notice that your torrent client mentions a sharing / seeding ratio. It means that as soon as you start downloading the dataset, you become part of our community of sharers and contribute to making the dataset available to other researchers all around the world!

There’s no reason to be scared! It’s perfectly legal as long as you’re allowed to have a copy of the dataset (that’s the bit you need to forward to your lab’s IT staff if they’re blocking your ports).

You’re actually providing a tremendous contribution to dHCP by spreading the data, so thank you again for that!

With your help, we can make sure this data remains available and can be downloaded relatively fast in the future. Over time, the dataset will grow and your contribution will be more and more important so that each and everyone of you can still obtain the data in the smoothest possible way.

We cannot do it without you. By seeding, you’re actually saying “cheers!” to your peers whom you downloaded your data from. So leave your client open and stay tuned!

All this is made possible thanks to the amazing folks at academictorrents and their infrastructure, so kudos academictorrents!

You can learn more about their project here and get some help to get started with torrent downloading here.

Jonathan Passerat-Palmbach

—