Whole human genome sequence data sets provided by Complete Genomics, containing 69 standard, non-diseased samples as well as two matched tumor and normal sample pairs.

Size: 50.4TB

Last Updated: 2013-06-04 15:30:00 UTC

Access Instructions

All public data sets are available over both commodity internet connections and high-speed StarLight/Internet2 connections. We recommend using rsync or UDR to download the data.

Downloading with UDR (UDT-enabled rsync)

UDR is a wrapper around rsync that enables rsync to use the high-performance UDT network protocol, which can greatly improve download speeds, especially over high-speed networks. Once UDR is installed, the only change is to place the udr command before the rsync command you would normally use to download the data. UDR is open source and under active development; the most recent version is available on GitHub. At the moment, UDR is not enabled on the transfer node. The functionality should return shortly; use rsync in the meantime.
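
Because UDR only changes the front of the command, a small shell helper can choose the prefix and fall back to plain rsync when udr is unavailable. The following is a minimal sketch, not part of UDR or rsync: the function name transfer_prefix and the USE_UDR variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: print the command prefix to use for a transfer.
# Prints "udr rsync" only when USE_UDR=1 and a udr binary is found on PATH;
# otherwise prints plain "rsync".
transfer_prefix() {
    if [ "${USE_UDR:-0}" = "1" ] && command -v udr >/dev/null 2>&1; then
        echo "udr rsync"
    else
        echo "rsync"
    fi
}
```

With this defined, $(transfer_prefix) can stand in for the leading rsync or udr rsync in any of the commands on this page. Leaving USE_UDR unset keeps the plain rsync behavior.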

List the contents of Complete Genomics Public Data:

Using rsync: rsync publicdata.opensciencedatacloud.org::ark:/31807/osdc-919d4bed/

Using udr: udr rsync publicdata.opensciencedatacloud.org::ark:/31807/osdc-919d4bed/

Download/synchronize Complete Genomics Public Data:

Using rsync: rsync -avzuP publicdata.opensciencedatacloud.org::ark:/31807/osdc-919d4bed/ /path/to/local_copy

Using udr: udr rsync -avzuP publicdata.opensciencedatacloud.org::ark:/31807/osdc-919d4bed/ /path/to/local_copy
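
A transfer of this size may be interrupted before it finishes. Since rsync only transfers files that differ from the local copy, the same synchronize command can be rerun after a failure. One way to automate that is a retry wrapper. The sketch below is an assumption-laden example, not an official recommendation: the retry function and the MAX_TRIES and RETRY_DELAY variables are hypothetical names, and the default values are arbitrary.

```shell
# Hypothetical helper: run a command up to MAX_TRIES times, pausing
# RETRY_DELAY seconds between failed attempts. Returns 0 on the first
# successful run, or 1 if every attempt fails.
retry() {
    max=${MAX_TRIES:-5}
    delay=${RETRY_DELAY:-60}
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # Run the command exactly as given; stop on success.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
        # Only wait if another attempt is coming.
        if [ "$attempt" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

Example use: retry rsync -avzuP publicdata.opensciencedatacloud.org::ark:/31807/osdc-919d4bed/ /path/to/local_copy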

Download an individual file from Complete Genomics Public Data: