Segway semi-automated genomic annotation

Hoffman MM*, Ernst J*, Wilder SP, Kundaje A, Harris RS, Libbrecht M, Giardine B, Ellenbogen PM, Bilmes JA, Birney E, Hardison RC, Dunham I, Kellis M, Noble WS. 2012. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res 41:827-841.

The free Segway software package contains a novel method for analyzing multiple tracks of functional genomics data. Our method uses a dynamic Bayesian network (DBN) model, which enables it to analyze the entire genome at 1-bp resolution even in the face of heterogeneous patterns of missing data. This method is the first application of DBN techniques to genome-scale data and the first genomic segmentation method designed for use with the maximum-resolution data available from ChIP-seq experiments, without downsampling. Segway uses the Graphical Models Toolkit (GMTK) for efficient DBN inference. Our software has extensive documentation and was designed from the outset with external users in mind.
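To give a feel for how a probabilistic segmentation can tolerate missing data, here is a minimal sketch in Python. It is not Segway's model or its GMTK implementation: it assumes a plain hidden Markov model with independent Gaussian emissions per track, and every name and parameter value in it (`viterbi`, `emission_logprob`, the means, the transition probabilities) is hypothetical. The only point it illustrates is that a track with no value at a position can simply contribute nothing to the likelihood at that position.

```python
import math


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def emission_logprob(obs, params):
    """Sum log densities over tracks, skipping tracks whose value is None."""
    total = 0.0
    for value, (mean, sd) in zip(obs, params):
        if value is not None:  # a missing value contributes no evidence
            total += log_gaussian(value, mean, sd)
    return total


def viterbi(observations, emission_params, log_trans, log_start):
    """Return the most probable label sequence for the observations."""
    n_labels = len(emission_params)
    scores = [
        log_start[k] + emission_logprob(observations[0], emission_params[k])
        for k in range(n_labels)
    ]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for k in range(n_labels):
            best_prev = max(range(n_labels), key=lambda j: scores[j] + log_trans[j][k])
            new_scores.append(
                scores[best_prev]
                + log_trans[best_prev][k]
                + emission_logprob(obs, emission_params[k])
            )
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final label.
    label = max(range(n_labels), key=lambda k: scores[k])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


# Hypothetical toy setup: 2 labels, 2 tracks, (mean, sd) per track.
emission_params = [
    [(0.0, 1.0), (0.0, 1.0)],  # label 0: low signal on both tracks
    [(5.0, 1.0), (5.0, 1.0)],  # label 1: high signal on both tracks
]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_start = [math.log(0.5), math.log(0.5)]

# None marks a missing value; the third position has no data at all.
observations = [(0.1, None), (None, 0.2), (None, None), (4.8, None), (5.1, 5.3)]
path = viterbi(observations, emission_params, log_trans, log_start)
print(path)
```

In this toy case the position with no data is labeled purely from its neighbors through the transition probabilities, which is the behavior the sketch is meant to show.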

Segmentations

Human chromatin structure

There are two published segmentations of human chromatin structure available.

The regulatory segmentation from the Ensembl Regulatory Build, viewable in Ensembl

The segmentation from our Nature Methods paper, "Unsupervised pattern discovery in human chromatin structure through genomic segmentation," viewable in the UCSC Genome Browser

Ensembl

To display the segmentation, click the "Configure this page" option on the left navigation bar. The segmentations for each cell line can be selected under "Regulatory Features", under the heading "Enable/disable all Segmentation features". As an example, you can try viewing the segmentations for BRCA2 in hg38.

For more details and instructions see the description of Regulatory Segmentation.

UCSC Genome Browser

The Ensembl Regulatory Build for GRCh38 (hg38) can be viewed from the UCSC Genome Browser. The annotation can also be loaded through the Track Data Hub interface: connect to the "Ensembl Regulatory Build" hub listed in the Public Hubs directory. After loading the track hub, you can show the "Cell Type Segmentations" supertrack, which contains a Segway track for each of 18 cell types.

Annotations for older assemblies can be browsed on the UCSC Genome Browser below:

There is a brief description of the various classes of segment labels.

You can download the segmentation for further analysis: GRCh37 (hg19) or NCBI36 (hg18) (~165 MB, gzipped BED). Here are the mnemonic assignments (tab-delimited).
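As a rough illustration of how a mnemonic assignment file could be applied to a downloaded segmentation, the sketch below rewrites the name column (column 4) of each BED line. It assumes the mnemonic file has a header row with columns named `old` and `new`; check the actual file before relying on that. The function names and the example labels are hypothetical.

```python
import csv
import io


def load_mnemonics(handle):
    """Read a tab-delimited mnemonic file into an {old: new} mapping.

    Assumes a header row with columns named 'old' and 'new'.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["old"]: row["new"] for row in reader}


def relabel_bed(lines, mapping):
    """Yield BED lines with the name column replaced by its mnemonic."""
    for line in lines:
        # Pass blank lines and track/browser/comment lines through unchanged.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fields[3] = mapping.get(fields[3], fields[3])  # keep unknown labels
        yield "\t".join(fields) + "\n"


# Hypothetical in-memory stand-ins for the downloaded files.
mnemonic_text = "old\tnew\n0\tTSS\n1\tD\n"
bed_lines = [
    'track name="segway"\n',
    "chr1\t0\t1000\t1\n",
    "chr1\t1000\t1200\t0\n",
    "chr1\t1200\t5000\t7\n",
]

mapping = load_mnemonics(io.StringIO(mnemonic_text))
relabeled = list(relabel_bed(bed_lines, mapping))
print("".join(relabeled), end="")
```

For the gzipped download, the same functions could be fed a handle opened with `gzip.open(path, "rt")`.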

Integrative annotation of chromatin elements

View the segmentation from our Nucleic Acids Research paper, "Integrative annotation of chromatin elements from ENCODE data," in the UCSC Genome Browser: hg19 only. These segmentations are already relabeled so it is not necessary to use a mnemonic assignment file.

Segmentation downloads (hg19)

GM12878 (bed) (bigbed)

H1-hESC (bed) (bigbed)

HeLa-S3 (bed) (bigbed)

HepG2 (bed) (bigbed)

HUVEC (bed) (bigbed)

K562 (bed) (bigbed)
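Once a BED segmentation has been downloaded, a common first step is to see how many bases each label covers. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes a tab-delimited BED file with the segment label in column 4.

```python
import gzip
from collections import Counter


def label_coverage(lines):
    """Return the total number of bases covered by each segment label."""
    coverage = Counter()
    for line in lines:
        # Skip blank lines and track/browser/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, label = line.rstrip("\n").split("\t")[:4]
        coverage[label] += int(end) - int(start)  # BED intervals are half-open
    return coverage


def label_coverage_from_file(path):
    """Compute coverage from a BED file, gzipped or not."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return label_coverage(handle)


# Hypothetical example lines standing in for a downloaded file.
example = [
    "chr1\t0\t100\tTSS\t0\t.\n",
    "chr1\t100\t500\tD\t0\t.\n",
    "chr2\t0\t50\tTSS\t0\t.\n",
]
coverage = label_coverage(example)
for label, bases in sorted(coverage.items()):
    print(label, bases)
```

To run this on a real download, pass its local path to `label_coverage_from_file`.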

Annotation of 164 human cell types and the Segway encyclopedia

View and download annotations and encyclopedia from our submitted manuscript, "A unified encyclopedia of human functional elements through fully automated annotation of 164 human cell types" (preprint).

Documentation

Installation

To install Segway from Bioconda:

conda install -c bioconda segway

For additional installation methods, see the "Quick Start" section of the documentation. For more detailed installation instructions read the installation guide.

Currently, Segway can run locally or on various cluster systems, such as Sun Grid Engine/Oracle Grid Engine/Open Grid Scheduler and Platform LSF. If you would like to use Segway on another system, please open a ticket in the issue tracker. You can also run Segway on SGE via the Amazon EC2 compute cloud.

Segway is supported only on Linux. It is not supported on other operating systems, such as Mac OS X.

Support

For support with Segway, please write to the segway-users mailing list rather than writing to the authors directly. Using the mailing list will get your question answered more quickly. It also allows us to pool knowledge and avoid answering the same questions repeatedly. Questions sent to the mailing list will receive higher priority than those sent to us individually.

Alternatively, if you wish to ask on a public forum, you can post your question to Biostars with the segway tag, and an expert will be notified to help you.

To report a bug or request a feature, please use the Segway issue tracker. We are interested in all comments on the package, including the ease of installation and the documentation.

If you do not want to read discussions about other people's use of Segway, but would like to hear about new releases and other important information, please subscribe to the segway-announce mailing list. Announcements of this nature are sent to both segway-users and segway-announce.

Useful links

Running Segway in the Amazon Compute Cloud by Jay Hesselberth, University of Colorado Denver

Source code

Notes on the segmentation

The underlying signal data for the segmentation presented above are available in bedGraph and bigWig formats (NCBI36/hg18). Use this browser file to load all the bigWigs. We produced these signal files using Wiggler from original data available from the ENCODE DCC.

We produced the original segmentations for NCBI36. We used liftOver (minMatch=0.99) to convert segmentations to GRCh37, and then filtered out any overlapping regions.
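The overlap-filtering step after liftOver can be sketched as follows. This is one possible reading, not necessarily the exact procedure we used: it assumes that every interval overlapping another interval on the same chromosome is dropped, and that intervals are half-open and non-empty, as in BED. The function name `drop_overlapping` is hypothetical.

```python
def drop_overlapping(intervals):
    """Drop every interval that overlaps another on the same chromosome.

    intervals: iterable of (chrom, start, end, label) tuples with
    half-open, non-empty coordinates. Returns the survivors, sorted.
    """
    ordered = sorted(intervals, key=lambda iv: (iv[0], iv[1], iv[2]))
    keep = [True] * len(ordered)
    holder = None  # index of the interval reaching furthest right so far
    for i, (chrom, start, end, _label) in enumerate(ordered):
        same_chrom = holder is not None and ordered[holder][0] == chrom
        if same_chrom and start < ordered[holder][2]:
            # Overlap: discard both this interval and the one it collides with.
            keep[i] = False
            keep[holder] = False
        if not same_chrom or end > ordered[holder][2]:
            holder = i
    return [iv for iv, flag in zip(ordered, keep) if flag]


# Hypothetical lifted-over segments: the two middle ones overlap each other.
lifted = [
    ("chr1", 0, 100, "a"),
    ("chr1", 100, 200, "b"),
    ("chr1", 150, 250, "c"),
    ("chr2", 0, 50, "d"),
]
filtered = drop_overlapping(lifted)
print(filtered)
```

Adjacent intervals that merely touch (one ends where the next starts) are kept, since half-open intervals that share only a boundary do not overlap.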
