This website was developed to support our publication:



Neil R Clark, Ruth Dannenfelser, Christopher M Tan, Michael E Komosinski and Avi Ma'ayan

Sets2Networks: network inference from repeated observations of sets

BMC Systems Biology 6, 89 (2012) PMID: 22824380.



Please cite our paper if you use our algorithm and/or tool.

Background

The skeleton of complex systems can be represented as networks where vertices represent entities, and edges represent the relations between these entities. Often it is impossible, or expensive, to determine the network structure by experimental validation of the binary interactions between every vertex pair. It is usually more practical to infer the network from surrogate observations. Network inference is the process by which an underlying network of relations between entities is determined from indirect evidence. While many algorithms have been developed to infer networks from quantitative data, less attention has been paid to methods which infer networks from repeated observations of related sets. This type of data is ubiquitous in the field of systems biology and in other areas of complex systems research, hence such methods would be of great utility and value.



Results

Here we present a general method for network inference from repeated observations of sets of related entities. Given experimental observations of such sets, we infer the underlying network connecting these entities by generating an ensemble of networks consistent with the data. The frequency of occurrence of a given link throughout this ensemble is interpreted as the probability that the link is present in the underlying real network conditioned on the data. Exponential random graphs are used to generate and sample the ensemble of consistent networks, and we take an algorithmic approach to numerically executing the inference method. The effectiveness of the method is demonstrated on synthetic data before applying this inference approach to problems in systems biology and systems pharmacology, as well as to the construction of a co-authorship collaboration network. We predict direct protein-protein interactions from high-throughput mass-spectrometry proteomics; build networks that connect pluripotency regulators based on ChIP-seq and loss-of-function/gain-of-function followed by expression data; extract a network that connects 53 cancer drugs to each other and to 34 severe adverse events by mining the FDA's Adverse Event Reporting System (AERS); and construct a co-authorship network that connects Mount Sinai School of Medicine investigators. The predicted networks and online software to create networks from entity-set libraries are provided online at http://www.maayanlab.net/S2N.
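The ensemble idea can be illustrated with a minimal Python sketch. It is not the Sets2Networks implementation. Two assumptions are made purely for illustration: a network counts as "consistent" with an observed set when the members of that set induce a connected subgraph, and the ensemble is sampled by naive rejection sampling from independent random edges (the simplest exponential random graph, with a single edge term). The function names and the edge_prob parameter are hypothetical.

```python
import itertools
import random


def is_connected(members, edges):
    """Return True if `members` form one connected piece using only edges among them."""
    members = set(members)
    if not members:
        return True
    adjacency = {m: set() for m in members}
    for a, b in edges:
        if a in members and b in members:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == members


def link_probabilities(observed_sets, edge_prob=0.3, n_samples=5000, seed=0):
    """Estimate, for every vertex pair, how often the link appears in consistent networks."""
    rng = random.Random(seed)
    vertices = sorted(set().union(*observed_sets))
    candidate_edges = list(itertools.combinations(vertices, 2))
    counts = dict.fromkeys(candidate_edges, 0)
    accepted = 0
    for _ in range(n_samples):
        # Draw one random network: each candidate edge is present independently.
        edges = [e for e in candidate_edges if rng.random() < edge_prob]
        # ASSUMPTION: consistency means every observed set is connected on its own.
        if all(is_connected(s, edges) for s in observed_sets):
            accepted += 1
            for e in edges:
                counts[e] += 1
    if accepted == 0:
        raise ValueError("no sampled network was consistent with the data")
    # Link frequency across the accepted ensemble is read as a link probability.
    return {e: c / accepted for e, c in counts.items()}


observed = [{"A", "B"}, {"B", "C"}, {"A", "B", "C"}]
probabilities = link_probabilities(observed)
for edge, p in sorted(probabilities.items()):
    print(edge, round(p, 2))
```

Under this toy criterion, a set of two entities forces a link between them, while a pair that only ever co-occurs in larger sets receives an intermediate score. Rejection sampling becomes impractical beyond a handful of vertices, which is one reason a dedicated algorithmic approach is needed in practice.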



Conclusions

As empirical data about sets of related entities accrues, there are more constraints on possible network realizations that can fit the data; in the language of statistical mechanics, the size of the microstate ensemble shrinks, until the underlying network resolves. The network inference method presented here can be applied to resolve different types of networks in current systems biology and systems pharmacology as well as in other fields of research.
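The shrinking of the ensemble can be made concrete on a toy example. The Python sketch below exhaustively enumerates every network on four vertices and counts how many remain consistent as observed sets are added one at a time. As an illustrative assumption only, a network is treated as consistent with a set when the set's members induce a connected subgraph; the criterion used in the paper may differ, and all names here are hypothetical.

```python
import itertools


def induces_connected_subgraph(members, edges):
    """Return True if `members` are connected using only edges among themselves."""
    members = set(members)
    if not members:
        return True
    adjacency = {m: set() for m in members}
    for a, b in edges:
        if a in members and b in members:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == members


def count_consistent_networks(vertices, observed_sets):
    """Count networks on `vertices` that satisfy the (assumed) constraint for every set."""
    candidate_edges = list(itertools.combinations(sorted(vertices), 2))
    count = 0
    # Enumerate every subset of candidate edges, i.e. every possible network.
    for mask in itertools.product([False, True], repeat=len(candidate_edges)):
        edges = [e for e, keep in zip(candidate_edges, mask) if keep]
        if all(induces_connected_subgraph(s, edges) for s in observed_sets):
            count += 1
    return count


vertices = ["A", "B", "C", "D"]
observations = [{"A", "B", "C"}, {"A", "B"}, {"C", "D"}]
# Ensemble size after 0, 1, 2 and 3 observed sets.
ensemble_sizes = [
    count_consistent_networks(vertices, observations[:k])
    for k in range(len(observations) + 1)
]
print(ensemble_sizes)
```

Each additional set removes networks from the ensemble, so the counts never increase. Exhaustive enumeration is only feasible for very small vertex sets, since the number of possible networks doubles with every additional vertex pair.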

PowerPoint slides describing the project, presented by Dr. Neil R. Clark at the SBBQ International Conference in Iguassu, Brazil, on May 23, 2012

PowerPoint slides describing the project, presented by Professor Avi Ma'ayan at the National Systems Biology Centers' Annual Meeting in Chicago, USA, on July 20, 2012

Poster describing the project, presented by Professor Avi Ma'ayan at the National Systems Biology Centers' Annual Meeting in Chicago, USA, on July 20, 2012

Workflow of the Algorithm Applied to a Synthetic Network

Original Network

Random Walks (Gene Sets)

Inferred Network



The synthetic network is first converted into gene sets by following a series of random walks. Running Sets2Networks on the resulting gene set file then yields an inferred network that closely resembles the original synthetic network.
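The first step of this workflow, turning a known network into sets by random walks, can be sketched in Python as follows. The walk length, the number of walks, the uniform choice of starting vertex and the toy network are all assumptions chosen for illustration; they are not the settings used to produce the downloadable files.

```python
import random


def random_walk_sets(adjacency, n_walks=50, walk_length=4, seed=0):
    """Generate one set per random walk: the vertices visited along the walk."""
    rng = random.Random(seed)
    vertices = sorted(adjacency)
    sets = []
    for _ in range(n_walks):
        current = rng.choice(vertices)  # assumed: uniform random starting vertex
        visited = {current}
        for _ in range(walk_length):
            neighbors = sorted(adjacency[current])
            if not neighbors:
                break  # isolated vertex: the walk cannot continue
            current = rng.choice(neighbors)
            visited.add(current)
        sets.append(visited)
    return sets


# Hypothetical toy network given as an adjacency mapping (two separate components).
toy_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
    "X": {"Y"},
    "Y": {"X"},
}
gene_sets = random_walk_sets(toy_network, n_walks=10, walk_length=3)
for members in gene_sets:
    print(sorted(members))
```

Because a walk can only move along existing edges, every generated set lies inside one connected part of the network, which is the indirect evidence an inference method can exploit.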

Download the original synthetic network or the gene sets.

Protein-Protein Interactions





Click on the image to view an interactive version of the network.

Download the interactions

The highest-confidence inferred protein-protein interactions. White edges are confirmed interactions and dark edges are predictions. Edge weight corresponds to the probability of the prediction.

Drug and Side Effect Interactions





Click on the image to view an interactive version of the network.

Download the interactions

Inferred interactions between side effects and drugs using data from the FDA's Adverse Event Reporting System (AERS). Light brown square nodes are side effects and dark brown circle nodes are drugs. Edges between side effects are colored red, edges between drugs are colored white, and side effect-drug interactions are black.

Mount Sinai Co-Authorship





Click on the image to view an interactive version of the network.

Download the interactions

Inferred interactions between researchers at Mount Sinai School of Medicine using co-authorship data from PubMed. Only the most recent 5,000 PubMed articles affiliated with Mount Sinai School of Medicine were used as input. Predicted edges with scores higher than 0.67 were preserved in the network, giving a sparse snapshot of collaborations.
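The two bookkeeping steps described for the co-authorship network, treating each article's author list as one observed set and keeping only edges scored above 0.67, can be sketched in Python as below. The author names and scores are invented placeholders, and the function names are hypothetical; the scores themselves would come from the inference step, which is not reproduced here.

```python
def author_sets(articles):
    """Turn each article's author list into one set; drop single-author articles."""
    sets = []
    for authors in articles:
        members = set(authors)
        if len(members) >= 2:  # a lone author says nothing about collaborations
            sets.append(members)
    return sets


def keep_confident_edges(edge_scores, threshold=0.67):
    """Keep only edges whose score is strictly higher than the threshold."""
    return {edge: score for edge, score in edge_scores.items() if score > threshold}


# Placeholder input: one author list per article.
articles = [
    ["Investigator 1", "Investigator 2"],
    ["Investigator 2", "Investigator 3", "Investigator 1"],
    ["Investigator 4"],
]
observed_sets = author_sets(articles)

# Placeholder scores standing in for the output of the inference step.
edge_scores = {
    ("Investigator 1", "Investigator 2"): 0.91,
    ("Investigator 1", "Investigator 3"): 0.67,
    ("Investigator 2", "Investigator 3"): 0.42,
}
sparse_network = keep_confident_edges(edge_scores)
print(observed_sets)
print(sparse_network)
```

The comparison is strict, matching the wording "higher than 0.67", so an edge scored exactly 0.67 is dropped.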

Stem Cell Networks

ChIP-X

LoGoF

Consensus



Click on the images to view an interactive version of the network.

Download the ChIP-X, LoGoF, and consensus interactions.

The original stem cell data can be found in the ESCAPE database. The ChIP-X network is made of the highest-confidence interactions inferred from stem cell ChIP-chip and ChIP-seq experiments. These data cover 203,192 protein-DNA binding interactions in proximity to the coding regions of 48 ESC transcriptional regulators. Similarly, the LoGoF network is derived from 153,920 stem cell protein-mRNA interactions extracted from loss-of-function and gain-of-function studies followed by microarray profiling. The consensus network is inferred from a combination of these two networks.

We applied the S2N algorithm to predict new protein-protein interactions for 50 CORUM complexes. The higher the confidence of a prediction, the lighter its color in the left heatmap. The heatmap on the right shows the known protein-protein interactions in the background.