This is a fork of the official repository.
This repository contains all the information about the datasets and the models used in the paper: Claudio Mirabello, Björn Wallner, "rawMSA: End-to-end Deep Learning Makes Protein Sequence Profiles and Feature Extraction obsolete", doi: https://doi.org/10.1101/394437
- The folder `datasets` contains the lists of proteins used in the 5-fold cross-validation, and the scripts needed to produce the alignments and the input files in the ".num" format.
- The folder `scripts` contains the Python and Bash scripts to run predictions and ensembling from the models.
- The folder `models` contains .h5 models for Keras/TensorFlow for both the CMAP and SS-RSA networks. These models are binary files that might not work on some Keras/TensorFlow versions. Send us an email if that is the case.
The full HDF5 dataset containing the SS and RSA classes, as well as the MSA inputs to the SS and RSA models, is too large to keep on Git (approximately 150 GB) and can be found here: http://duffman.it.liu.se/rawmsa/
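Both the .h5 models and the downloadable dataset are HDF5 files, so a truncated download or an HTML error page saved under a .h5 name will only fail later, inside Keras or h5py. The following is a minimal, stdlib-only sketch (not part of this repository; the function name `looks_like_hdf5` and the `max_offset` cap are hypothetical) that checks for the 8-byte HDF5 signature, which the HDF5 file format places at byte 0 or, when a user block is present, at 512, 1024, 2048, ... bytes. It says nothing about the file's internal layout or about Keras/TensorFlow version compatibility.

```python
import sys
from pathlib import Path

# 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path, max_offset=1 << 20):
    """Return True if an HDF5 signature is found at offset 0, 512, 1024, 2048, ...

    The superblock may be preceded by a user block, so the signature can sit at
    byte 0 or at any power-of-two multiple of 512. Only offsets up to
    max_offset (a hypothetical cap for this sketch) are probed.
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as fh:
        offset = 0
        while offset <= max_offset and offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Next candidate position: 0 -> 512 -> 1024 -> 2048 -> ...
            offset = 512 if offset == 0 else offset * 2
    return False


if __name__ == "__main__":
    # Usage: python check_h5.py models/*.h5
    for arg in sys.argv[1:]:
        print(arg, "OK" if looks_like_hdf5(arg) else "not an HDF5 file")
```

A file that fails this check is worth re-downloading before suspecting a Keras/TensorFlow version mismatch.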
Contact for the original authors: claudio (dot) mirabello [at] liu (dot) se