The Mouse Genomes Project uses next-generation sequencing technologies to sequence the genomes of key laboratory mouse strains. The project consists of two arms:

Short-read sequencing of many laboratory mouse strains and identification of sequence variation (SNPs, short insertions and deletions, and larger structural variations) relative to the C57BL/6J mouse reference genome.

De novo genome assembly and strain-specific gene annotation of the most widely used strains.

Sequence variation

We and our collaborators have used short-read sequencing to identify SNPs, indels, and structural variations relative to the C57BL/6J mouse reference genome. The strains that have been sequenced and are in our variation catalog are:

129P2/OlaHsd 129S1/SvImJ 129S5SvEvBrd A/J AKR/J BALB/cJ BTBR BUB/BnJ C3H/HeH C3H/HeJ C57BL/10J C57BL/6NJ C57BR/cdJ C57L/J C58/J CAST/EiJ CBA/J DBA/1J DBA/2J FVB/NJ I/LnJ KK/HiJ LEWES/EiJ LP/J MOLF/EiJ NOD/ShiLtJ NZB/B1NJ NZO/HlLtJ NZW/LacJ PWK/PhJ RF/J SEA/GnJ SPRET/EiJ ST/bJ WSB/EiJ ZALENDE/EiJ

The sample accession codes are listed here. The sequence variation can be queried via our query tool. For bulk download, the sequencing reads are available in BAM format from our FTP site and the variants are available in VCF format on our FTP site. All of the variation data have been published and can be used without restriction. The primary citation for the resource is:

Mouse genomic variation and its effect on phenotypes and gene regulation. Keane TM, Goodstadt L, Danecek P, White MA, Wong K et al. Nature 2011;477;7364;289-94 PUBMED: 21921910; PMC: 3276836; DOI: 10.1038/nature10413
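As an illustration of working with the VCF downloads mentioned above, the following is a minimal Python sketch that reads a multi-sample VCF and reports, for each variant, which strains carry a non-reference allele relative to C57BL/6J. It uses only the standard library and relies on the general VCF layout (fixed columns, a FORMAT column, then one column per sample). The embedded records, the sample column names A_J and CAST_EiJ, and the function name non_reference_strains are hypothetical and chosen only for the example; they are not taken from the project's files, whose actual headers and FORMAT fields should be checked before use.

```python
import io

# Hypothetical two-strain VCF snippet. Positions, alleles, genotypes and
# sample column names are made up for illustration only.
EXAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA_J\tCAST_EiJ
1\t3000100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t1/1
1\t3000200\t.\tC\tT\t60\tPASS\t.\tGT\t1/1\t./.
"""


def non_reference_strains(handle):
    """Yield (chrom, pos, ref, alt, strains) for each VCF record.

    `strains` lists the samples whose genotype (GT) contains at least one
    non-reference allele. Missing calls (".") are not counted as carriers.
    """
    samples = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        carriers = []
        for name, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            if any(allele not in ("0", ".") for allele in alleles):
                carriers.append(name)
        yield chrom, int(pos), ref, alt, carriers


records = list(non_reference_strains(io.StringIO(EXAMPLE_VCF)))
for record in records:
    print(record)
```

For the real downloads, which are typically compressed, the same function could be given a handle opened with gzip.open(path, "rt"); for genome-scale files a dedicated VCF library or command-line tool is likely to be a better choice than this line-by-line sketch.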

Assembled genomes

We have produced draft de novo genome assemblies and strain-specific gene annotation for 16 laboratory and wild-derived strains. The genome assemblies and annotation are now available via the Ensembl genome browser and the UCSC genome browser.

NOTE: These assembled chromosomes are released as unpublished, preliminary and incomplete sequences, and as such they have not yet been submitted to or accessioned in the public genome sequence repositories (INSDC). The assembled sequences will be fully accessioned in public repositories at the time of publication. These data are released in accordance with the Fort Lauderdale and Toronto agreements. As producers of these data, we reserve the right to be the first to publish a genome-wide analysis of the data we have generated. The pre-publication data that we release are embargoed for publication, except for analyses of single chromosomes in single strains or of single gene loci across multiple strains. We strongly encourage researchers to contact us (mousegenomes@sanger.ac.uk) with any queries about referencing or publishing analyses based on pre-publication data. We expect to accession and publish the genome sequences and strain-specific gene annotation in mid-to-late 2016.