ProteinSolver is a deep neural network which learns to solve (ill-defined) constraint satisfaction problems (CSPs) from training data. It has shown promising results both on a toy problem of learning how to solve Sudoku puzzles and on a real-world problem of designing protein sequences that fold into a predetermined geometric shape.
The following notebooks can be used to explore the basic functionality of proteinsolver.
Other notebooks in the notebooks/ directory show how to perform more extensive validations of the networks and how to train new networks.
Docker images with all required dependencies are provided at: https://gitlab.com/ostrokach/proteinsolver/container_registry.
To evaluate a proteinsolver network from a Jupyter notebook, we can run the following:
docker run -it --rm -p 8000:8000 registry.gitlab.com/ostrokach/proteinsolver:v0.1.25 jupyter notebook --ip 0.0.0.0 --port 8000
We recommend installing proteinsolver into a clean conda environment using the following command:
conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver
First, use conda to install proteinsolver into a new conda environment. This will also install all dependencies.
conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver
Second, run pip install --editable . inside the root directory of this package. This will force Python to use the development version of our code.
cd path/to/proteinsolver
pip install --editable .
Pre-trained models can be downloaded using wget by running the following command in the root folder of the proteinsolver repository:
wget -r -nH --cut-dirs 1 --reject "index.html*" "http://models.proteinsolver.org/v0.1/"
For an example of how to use pre-trained ProteinSolver models in downstream applications (such as mutation ΔΔG prediction), see the elaspic/elaspic2 repository, in particular the src/elaspic2/plugins/proteinsolver module.
Data used to train and validate the "proteinsolver" network to solve Sudoku puzzles and reconstruct protein sequences can be downloaded from http://deep-protein-gen.data.proteinsolver.org/:
wget -r -nH --reject "index.html*" "http://deep-protein-gen.data.proteinsolver.org/"
The generation of the training and validation datasets was carried out in our predecessor project: ostrokach/protein-adjacency-net.
- DATAPKG_DATA_DIR - Location of training and validation data.
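As a minimal sketch, DATAPKG_DATA_DIR could be set in the shell before launching the notebooks. This assumes the variable is read from the process environment, and the default path below ($HOME/proteinsolver-data) is a hypothetical placeholder rather than a location the project prescribes.

```shell
# Keep any value that is already set; otherwise fall back to a hypothetical
# default directory (an assumption for illustration, not a project default).
export DATAPKG_DATA_DIR="${DATAPKG_DATA_DIR:-$HOME/proteinsolver-data}"

# Create the directory if it does not exist yet.
mkdir -p "$DATAPKG_DATA_DIR"

# Show the location that will be used.
echo "Data directory: $DATAPKG_DATA_DIR"
```

Because the variable is exported, processes started from the same shell session inherit it. Point it at the directory that holds the downloaded training and validation data.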
- Strokach A, Becerra D, Corbi-Verge C, Perez-Riba A, Kim PM. Fast and flexible protein design using deep graph neural networks. Cell Systems (2020); 11: 1–10. doi: 10.1016/j.cels.2020.08.016