The world’s first protein database for Machine Learning and AI

Ladies and gentlemen, I am incredibly proud and excited to present the very first public product of Peptone, the Database of Structural Propensities of Proteins.

The Database of Structural Propensities of Proteins (dSPP) is the world’s first interactive repository of structural and dynamic features of proteins with seamless integration for the leading Machine Learning frameworks Keras and TensorFlow.

dSPP is based on peer-reviewed research from leading academic institutions around the world that use Nuclear Magnetic Resonance (NMR) spectroscopy to characterize protein structure and disorder. dSPP data are derived from solution and solid-state NMR spectroscopy experiments on more than 7,200 unrelated proteins studied under physiologically relevant conditions.

dSPP is a unique source of information on Intrinsically Disordered Proteins (IDPs), a class of proteins that is challenging to study. IDPs are implicated in numerous debilitating human pathologies, including Alzheimer’s, Parkinson’s, and prion diseases, the molecular basis of cancer, and viral infections such as HIV, HSV, HCV, and ZIKV, among many others.

dSPP data can be readily used by experimentalists to gain insight into the structural stability of secondary structure motifs, and by high-throughput computational techniques that aim to deliver realistic models of medically relevant proteins.

Unlike the binary (class-label) secondary structure assignments available in other protein datasets for experimentalists and the machine learning community, dSPP data report on protein structure and local dynamics at the residue level with atomic resolution, as gauged from a continuous structural propensity score in the range -1.0 to 1.0.
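
To make the difference concrete, here is a minimal sketch of what is lost when a continuous propensity is collapsed into class labels. Everything in it is illustrative: the function name, the 0.3 cutoff, and the example scores are made up, and the sign convention (positive values leaning helix-like, negative values strand-like, values near zero coil-like or disordered) is an assumption rather than something stated in this post.

```python
# Illustrative only: the cutoff, the labels, and the sign convention are assumptions.
def to_three_state(scores, cutoff=0.3):
    """Collapse continuous propensities in [-1.0, 1.0] into coarse class labels."""
    labels = []
    for s in scores:
        if not -1.0 <= s <= 1.0:
            raise ValueError(f"propensity {s} is outside [-1.0, 1.0]")
        if s >= cutoff:
            labels.append("H")  # assumed helix-like
        elif s <= -cutoff:
            labels.append("E")  # assumed strand-like
        else:
            labels.append("C")  # assumed coil-like / disordered
    return "".join(labels)


# Made-up per-residue scores for an eight-residue fragment (not real dSPP data).
example_scores = [0.92, 0.35, 0.31, 0.05, -0.10, -0.33, -0.80, 0.00]
example_labels = to_three_state(example_scores)
print(example_labels)
```

In this toy example the residues scoring 0.92 and 0.31 receive the same label, although one is strongly ordered and the other is barely above the cutoff. A continuous target keeps that distinction available to a regression model.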

dSPP user interface overview (https://peptone.io/dspp)

dSPP experimental data were collected under physiologically relevant conditions, which makes them uniquely suited to structure and disorder prediction methods that aim to tackle protein folding and stability in biologically and medically relevant contexts.

dSPP is equipped with an intuitive user interface that offers seamless access to relevant decision data, original literature citations, and uniform rendering of the Machine Learning data belonging to the protein of interest.

Seamless dSPP integration with the Keras and TensorFlow machine learning frameworks is achieved via the dspp-keras Python package, which can be downloaded and set up in under 60 seconds. Thus, virtually anyone with a basic understanding of machine learning can start experimenting with protein structure prediction methodology.
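
For readers new to this kind of data, the sketch below shows one common way to turn a sequence and its per-residue propensities into fixed-size inputs and regression targets for a neural network. It deliberately does not use the dspp-keras API; the function name, the padding scheme, and the example values are hypothetical and only illustrate the general idea.

```python
# Hypothetical preprocessing sketch; it does not call dspp-keras.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_example(sequence, propensities, max_len):
    """Return (features, targets, mask), each padded to max_len positions."""
    if len(sequence) != len(propensities):
        raise ValueError("need exactly one propensity score per residue")
    if len(sequence) > max_len:
        raise ValueError("sequence is longer than max_len")
    features, targets, mask = [], [], []
    for aa, score in zip(sequence, propensities):
        row = [0.0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1.0  # raises KeyError for non-standard residues
        features.append(row)
        targets.append(float(score))
        mask.append(1.0)  # 1.0 marks a real residue
    pad = max_len - len(sequence)
    features += [[0.0] * len(AMINO_ACIDS) for _ in range(pad)]
    targets += [0.0] * pad
    mask += [0.0] * pad  # 0.0 marks padding to be ignored by the loss
    return features, targets, mask


# Made-up three-residue example padded to five positions.
features, targets, mask = encode_example("MKV", [0.4, -0.2, 0.1], max_len=5)
print(len(features), len(features[0]), targets, mask)
```

The mask matters because a padded target of 0.0 is itself a valid propensity value, so the loss should skip padded positions rather than treat them as real residues.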

dSPP is the first publicly available product by Peptone, with an automated 14-day update cycle made specifically for continuous-learning AI applications.

Scientific reference:

Structural Propensity Database Of Proteins. Kamil Tamiola, Matthew Michael Heberling, Jan Domanski. bioRxiv 144840; doi: https://doi.org/10.1101/144840

Availability

Interactive search engine and data rendering: https://peptone.io/dspp

Standalone JSON and Python cPickle downloads: https://peptone.io/dspp/download

Keras and TensorFlow integration: https://github.com/PeptoneInc/dspp-keras

From within Terminal: pip install dspp-keras
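
As a rough idea of how a standalone JSON download could be consumed without any framework, here is a small sketch that parses one record and sanity-checks it. The record layout and the field names ("id", "sequence", "propensity") are assumptions made up for this example; the actual dSPP files may be organized differently, so check a downloaded file before relying on them.

```python
import json

# Hypothetical record layout; the real dSPP JSON field names may differ.
raw = '{"id": "example-1", "sequence": "MKVL", "propensity": [0.41, 0.12, -0.27, -0.63]}'


def parse_record(text):
    """Parse one JSON record and return a list of (residue, propensity) pairs."""
    record = json.loads(text)
    sequence = record["sequence"]
    scores = record["propensity"]
    if len(sequence) != len(scores):
        raise ValueError("sequence and propensity lengths differ")
    if any(not -1.0 <= s <= 1.0 for s in scores):
        raise ValueError("propensity value outside [-1.0, 1.0]")
    return list(zip(sequence, scores))


pairs = parse_record(raw)
for residue, score in pairs:
    print(residue, score)
```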

Acknowledgements

We want to acknowledge Dr. Wenwei Zheng (NIDDK, US), Dr. Ruud Scheek (University of Groningen, NL) and Dr. Xavier Periole (Aarhus University, DK) for insightful comments and editorial suggestions concerning our dSPP paper.

François Chollet of Keras / Google is greatly acknowledged for insightful feedback on the database interface and straightforward suggestions concerning Keras integration.

We extend sincere thanks to Alison Lowndes, Carlo Ruiz and Dr. Adam Grzywaczewski (NVIDIA Corporation) for facilitating collaboration and access to the DGX-1 supercomputer.

Jon Wedell (BMRB) is greatly acknowledged for technical support with NMR resonance assignment retrieval from BMRB.

We thank Dr. Frans A.A. Mulder (Aarhus University, DK) and Dr. Predrag Kukic (University of Cambridge, UK) for providing structural ensemble models of MOAG-4.

Lastly, we want to greatly acknowledge Mark Berger (NVIDIA Corporation) for overwhelming support throughout the execution of this project.

Press release

This press release, along with the media assets, can be downloaded from https://drive.google.com/open?id=0B0VsF9FO3J_OMXljcm1MS3NCRHc

About Peptone

Founded in 2016 in Amsterdam, The Netherlands, Peptone offers state-of-the-art solutions for protein biotechnology via Machine Learning and AI. We transform big data from public and private repositories into powerful predictive models and intuitive tools for protein production, stability, disorder, engineering, and directed evolution experiments, providing our clients with transparent and complementary software that saves time and yields precise research answers.
