RaptorX-Contact: a software package for protein contact and distance prediction using deep residual neural networks.
This package contains the source code of the deep learning method developed by the Xu group for protein contact/distance prediction and distance-based folding. The code and documentation will be improved gradually. Anaconda, Theano and possibly BioPython must be installed in order to use this package.
The package contains the core code used to produce the results in the following papers:
- Analysis of distance-based protein structure prediction by deep learning in CASP13. PROTEINS, 2019
- Distance-based protein folding powered by deep learning. PNAS, August 2019
- ComplexContact: a web server for inter-protein contact prediction using deep learning. NAR, May 2018
- Analysis of deep learning methods for blind protein contact prediction in CASP12. PROTEINS, March 2018
- Folding Membrane Proteins by Deep Transfer Learning. Cell Systems, September 2017
- Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS CB, Jan 2017
Some auxiliary code will be gradually uploaded.
The test sets used in the PLoS CB paper and the multiple sequence alignments for the CASP13 hard targets are available at http://raptorx.uchicago.edu/download/ . Two deep models are also available for download at the same site. After logging in to the site, please see 0README.data4contactPrediction.txt and 0README.models4ContactDistancePrediction.txt for instructions on downloading the data and models.
Here is the list of input features needed by our deep models (L denotes the sequence length):
- primary sequence represented as a string;
- position-specific scoring matrix represented as an L*20 matrix;
- predicted secondary structure confidence score represented as an L*3 matrix;
- predicted solvent accessibility score represented as an L*3 matrix;
- normalized CCMpred matrix;
- three other 2D matrices for pairwise relationships, generated by alnstats in MetaPSICOV.

For a single protein, the features are stored in a Python dict(). Please check our test data for the dict() keywords and the exact format. The protein name and sequence length are also required in the dict(), although they are not used as input features. The input features of all test proteins are saved as a Python list of dict() objects, which is then serialized to a cPickle file.
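
The layout described above (one dict per protein, a list of dicts, one pickled file) can be sketched roughly as follows. This is only an illustration under stated assumptions: every key name below is hypothetical, and the real keywords must be taken from the test data. The sketch also assumes that the CCMpred and alnstats matrices are L*L, uses zero-filled nested lists in place of real feature arrays, and uses the standard `pickle` module with protocol 2 so that the output stays readable by Python 2's cPickle.

```python
import io
import pickle

# Hypothetical key names: the real keywords are defined by the test data
# distributed with the package and may differ from these.
PER_RESIDUE = {"PSSM": 20, "SS3": 3, "ACC": 3}  # key -> number of columns
PAIRWISE = ["ccmpred", "pair_a", "pair_b", "pair_c"]  # assumed L*L matrices


def shape(matrix):
    """Return (rows, cols) for a nested list or any other 2D array-like."""
    return len(matrix), len(matrix[0])


def check_protein(p):
    """Raise ValueError if any feature disagrees with the stated length."""
    L = p["length"]
    if len(p["sequence"]) != L:
        raise ValueError("%s: sequence length mismatch" % p["name"])
    for key, ncol in PER_RESIDUE.items():
        if shape(p[key]) != (L, ncol):
            raise ValueError("%s: %s is not %d*%d" % (p["name"], key, L, ncol))
    for key in PAIRWISE:
        if shape(p[key]) != (L, L):
            raise ValueError("%s: %s is not %d*%d" % (p["name"], key, L, L))


def dummy_protein(name, sequence):
    """Build a zero-filled feature dict, for illustration only."""
    L = len(sequence)
    p = {"name": name, "sequence": sequence, "length": L}
    for key, ncol in PER_RESIDUE.items():
        p[key] = [[0.0] * ncol for _ in range(L)]
    for key in PAIRWISE:
        p[key] = [[0.0] * L for _ in range(L)]
    return p


# All test proteins go into one list, which is pickled as a whole.
proteins = [dummy_protein("toyA", "MKTAYIAK"), dummy_protein("toyB", "GSHMLE")]
for p in proteins:
    check_protein(p)

# Protocol 2 is the highest pickle protocol that Python 2 supports.
buffer = io.BytesIO()
pickle.dump(proteins, buffer, protocol=2)

# Reading the data back mirrors how a pickled feature file would be loaded.
buffer.seek(0)
restored = pickle.load(buffer)
```

In practice the in-memory buffer would be replaced by a file opened in binary mode. Running a check such as `check_protein` before pickling is a cheap way to catch features whose dimensions disagree with the sequence length.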
Contact: Jinbo Xu, jinboxu@gmail.com