Internetting for research: How do I find genomic data to download?

If you had asked me this before the internet era, my answer would have been simple: don’t download, it is too slow; get a disk instead. Back then, though, I would hardly have had any suggestion on where to find data.

Today, internet speeds are fast, and data is everywhere — you just have to find it.

This is where good signposting of data becomes extremely helpful. Depending on the category of data you are after, a general internet search may help to an extent, but you are much more likely to find what you need if you search a repository specific to that type of data.
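To give a feel for what searching a specific repository can look like in practice, here is a minimal sketch in Python that builds a query URL for NCBI's E-utilities search service (esearch), which sits in front of several NCBI databases. The category-to-database mapping, the function names and the example search term are my own illustrative assumptions, not something taken from the repositories' documentation, so treat this as a starting point rather than a recipe.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities search endpoint (esearch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Illustrative mapping from a data category to an Entrez database name.
# The grouping is an assumption made for this example, not an official one.
DATABASES = {
    "raw sequencing data": "sra",   # Sequence Read Archive
    "variant data": "clinvar",      # ClinVar
    "expression data": "gds",       # GEO DataSets
}


def build_search_url(category: str, term: str, max_results: int = 20) -> str:
    """Return an esearch URL that queries the database chosen for `category`."""
    db = DATABASES[category.lower()]
    params = {"db": db, "term": term, "retmax": max_results, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def extract_ids(response: dict) -> list[str]:
    """Pull the list of record IDs out of an already-parsed esearch JSON reply."""
    return response.get("esearchresult", {}).get("idlist", [])


if __name__ == "__main__":
    # Only prints the URL; fetching it is left to the reader.
    print(build_search_url("raw sequencing data", "breast cancer"))
```

The sketch deliberately stops at building the URL and reading IDs from a parsed reply, so it runs without a network connection. Controlled-access resources such as dbGaP or EGA need an approved data access request before anything can be downloaded, so a search like this only tells you that the data exists.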

Below you will find the contents of Table 1, a list of repositories where researchers can download or upload genomic data, from our publication in PLoS Biology: DNAdigest and Repositive: Connecting the World of Genomic Data, Kovalevskaya et al. Each entry gives the repository name, the type of data it holds, a short description, and its URL.

To make your search even easier, we are indexing all the data sources of raw sequencing data on the free Repositive platform: http://repositive.io

What other data sources do you find useful for your research in cancer or rare diseases? Let me know in the comments and together we can expand this list.

Cheers,

-Fiona-

dbGaP

Raw sequence data & phenotypic data

Database of Genotypes and Phenotypes, developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype.

http://www.ncbi.nlm.nih.gov/gap

dbVar

Variant data

Database of genomic structural variation. It contains insertions, deletions, duplications, inversions, multinucleotide substitutions, mobile element insertions, translocations, and complex chromosomal rearrangements.

http://www.ncbi.nlm.nih.gov/dbvar

dbSNP

Variant data

Database of single nucleotide polymorphisms (SNPs) and multiple small-scale variations that include insertions/deletions, microsatellites, and non-polymorphic variants.

http://www.ncbi.nlm.nih.gov/snp

GEO

Raw sequencing data

Public functional genomics data repository supporting MIAME-compliant data submissions. Tools are provided to help users query and download experiments and curated gene expression profiles.

http://www.ncbi.nlm.nih.gov/geo/

Sequence Read Archive (SRA)

Raw sequencing data

Stores raw sequencing data and alignment information from high-throughput sequencing platforms.

http://www.ncbi.nlm.nih.gov/sra

ClinVar

Variant data

Aggregates information about genomic variation and its relationship to human health.

http://www.ncbi.nlm.nih.gov/clinvar/

The European Genome-phenome Archive (EGA)

Raw sequence data & phenotypic data

Allows you to explore datasets from genomic studies, provided by a range of data providers.

https://www.ebi.ac.uk/ega/

The European Nucleotide Archive (ENA)

Raw sequencing data

A comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.

http://www.ebi.ac.uk/ena

The European Variation Archive (EVA)

Variant data

An open-access database of all types of genetic variation data from all species.

http://www.ebi.ac.uk/eva/

ArrayExpress

Raw sequencing data

An archive of functional genomics data that stores data from high-throughput functional genomics experiments and provides these data to the research community for reuse.

https://www.ebi.ac.uk/arrayexpress/

DNA Data Bank of Japan (DDBJ)

Raw sequencing data

Collects nucleotide sequence data as a member of INSDC and provides freely available nucleotide sequence data and a supercomputer system to support research activities in life science.

https://www.ddbj.nig.ac.jp

Japanese Genotype-phenotype Archive (JGA)

Raw sequencing data

A service for permanent archiving and sharing of all types of individual-level genetic and de-identified phenotypic data resulting from biomedical research projects. The JGA contains exclusive data collected from individuals whose consent agreements authorize data release only for specific research use or to bona fide researchers.

https://trace.ddbj.nig.ac.jp/jga/index_e.html

Catalogue of Somatic Mutations in Cancer (COSMIC)

Variant data

Stores and displays somatic mutation information and related details relating to human cancers. There are two types of data in COSMIC: expert manual curation data and systematic screen data.

http://cancer.sanger.ac.uk/cosmic

DECIPHER

Variant data & phenotypic data

Contains data from more than 17,800 patients who have given consent for broad data sharing. Used by the clinical community to share and compare phenotypic and genotypic data.

https://decipher.sanger.ac.uk

Figshare

Raw sequencing data

A repository where users can make all of their research outputs available in a citable, shareable and discoverable manner.

http://figshare.com

Dryad

Raw sequencing data

A curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of datatypes.

http://datadryad.org

LOVD

Variant data

A free, flexible, web-based, open-source database designed to collect and display variants in the DNA sequence.

http://www.lovd.nl/3.0/home

GigaDB

Raw sequencing data

Associated with the journal GigaScience, contains discoverable, trackable, and citable datasets that are available for public download and use.

http://gigadb.org

The Autism Genetic Resource Exchange (AGRE)

Variant data & phenotypic data

A repository of biomaterials and phenotypic and genotypic data to aid research on autism spectrum disorders.

http://agre.autismspeaks.org

Genomes Unzipped (GNZ)

Raw sequencing data

A collaborative project aiming to provide genetic testing customers with the knowledge and tools they need to make the most of their own genetic data. As part of the project, members are taking commercial genetic tests and making the raw data publicly available for others to download, analyse and reuse.

http://genomesunzipped.org

OpenSNP

Raw sequencing data

Allows individuals to publish their genetic test results, find others with similar genetic variations, learn more about their results, get the latest primary literature on their variations, and help scientists find new associations.

https://opensnp.org

—

In my day job I run a charity (DNAdigest) and a company (Repositive) where everything we do is about making efficient use of research data to have the most positive impact for research in health and disease.

This blog post was reblogged from my post on Medium (https://medium.com/@glyn_dk/internetting-for-research-how-do-i-find-genomic-data-to-download-bdf46ce15123#.gd4sw32p6).