mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations

nhattruongpham/mACPpred2

mACPpred2

Standalone program for the paper "mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations"

Introduction

This repository provides the standalone program accompanying the mACPpred 2.0 web server at https://balalab-skku.org/mACPpred2/. The baseline and final models are available on Zenodo (DOI: 10.5281/zenodo.11350064).

Installation

Software requirements

  • Ubuntu 20.04.6 LTS (the source code has been tested on Ubuntu)
  • CUDA 11.7 (with GPU support)
  • cuDNN 8.6.0.163 (with GPU support)
  • Python 3.9

Creating conda environment

conda create -n mACPpred2 python=3.9.12
conda activate mACPpred2

Installing TensorFlow with CUDA support

conda install -c conda-forge cudatoolkit=11.7.0
python -m pip install nvidia-cudnn-cu11==8.6.0.163 --no-cache-dir
python -m pip install tensorflow==2.11.* --no-cache-dir
python -m pip install chardet --no-cache-dir
conda install anaconda::numpy-base
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
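
The activation hook above appends $CUDNN_PATH/lib to LD_LIBRARY_PATH so that TensorFlow can load cuDNN. A minimal sketch for checking that this worked, assuming the mACPpred2 environment is active; the helper names are hypothetical, while tf.config.list_physical_devices is the standard TensorFlow call for listing visible GPUs:

```python
import os
from pathlib import Path


def cudnn_lib_dir():
    """Directory the hook adds to LD_LIBRARY_PATH ($CUDNN_PATH/lib), or None."""
    try:
        import nvidia.cudnn  # provided by the nvidia-cudnn-cu11 wheel
    except ImportError:
        return None
    module_file = getattr(nvidia.cudnn, "__file__", None)
    if module_file is None:  # namespace package without __init__.py
        return None
    return Path(module_file).parent / "lib"


def on_ld_library_path(directory, ld_library_path=None):
    """True if `directory` is one of the colon-separated LD_LIBRARY_PATH entries."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    # Empty entries (e.g. from a leading ":") are skipped.
    entries = {Path(p).resolve() for p in ld_library_path.split(os.pathsep) if p}
    return Path(directory).resolve() in entries


def tensorflow_gpus():
    """Names of the GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [d.name for d in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    lib = cudnn_lib_dir()
    print("cuDNN lib dir:", lib)
    print("On LD_LIBRARY_PATH:", lib is not None and on_ld_library_path(lib))
    print("TensorFlow GPUs:", tensorflow_gpus())
```

An empty GPU list usually means the hook was not sourced in the current shell; re-running conda activate mACPpred2 re-applies it.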

Installing bio-embeddings[1] and re-installing PyTorch with CUDA support

python -m pip install --upgrade pip setuptools wheel --no-cache-dir
python -m pip install gensim==3.8 --use-pep517 --no-cache-dir
python -m pip install bio-embeddings[seqvec] --no-cache-dir
python -m pip install scipy==1.10.1 --no-cache-dir
python -m pip install protobuf==3.20.* --no-cache-dir
python -m pip install bio-embeddings[all] --no-cache-dir
python -m pip uninstall numpy
python -m pip install numpy==1.26.0 --no-cache-dir
conda install anaconda::numpy-base
python -m pip uninstall torch
python -m pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117 --no-cache-dir
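
Because bio-embeddings can pull in a different PyTorch build, it may help to confirm that the CUDA 11.7 wheel is the one now installed. A minimal sketch using standard PyTorch attributes (the helper name torch_cuda_summary is hypothetical):

```python
def torch_cuda_summary():
    """Report the installed PyTorch build and whether it can reach a CUDA GPU.

    Returns None if PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,             # "1.13.1+cu117" after the reinstall
        "built_with_cuda": torch.version.cuda,  # e.g. "11.7"; None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    print(torch_cuda_summary())
```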

Installing other required packages (peptidy[2], protlearn[3], and the machine learning libraries)

python -m pip install peptidy==0.0.1 --no-cache-dir
python -m pip install protlearn==0.0.3 --no-cache-dir
python -m pip install catboost==1.2 lightgbm==3.3.5 scikit-learn==0.24.2 xgboost==0.82 --no-cache-dir
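
Since several packages are uninstalled and reinstalled along the way, a later step can silently change an earlier pin. A minimal sketch that compares installed versions against the pins used in the commands above, using only the standard library's importlib.metadata; the matching rule is a simplified version of pip's "==" specifier (series wildcards and local labels such as +cu117), and the helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands in this README. A trailing ".*" means
# "any release in that series", as in pip's "2.11.*" specifier.
EXPECTED = {
    "tensorflow": "2.11.*",
    "nvidia-cudnn-cu11": "8.6.0.163",
    "torch": "1.13.1+cu117",
    "torchvision": "0.14.1+cu117",
    "torchaudio": "0.13.1",
    "numpy": "1.26.0",
    "scipy": "1.10.1",
    "protobuf": "3.20.*",
    "peptidy": "0.0.1",
    "protlearn": "0.0.3",
    "catboost": "1.2",
    "lightgbm": "3.3.5",
    "scikit-learn": "0.24.2",
    "xgboost": "0.82",
}


def matches(installed, expected):
    """Simplified '==' check supporting "X.Y.*" series and local labels."""
    # Ignore the installed local label unless the pin has one,
    # so "0.13.1+cu117" satisfies a plain "0.13.1" pin.
    base = installed if "+" in expected else installed.split("+", 1)[0]
    if expected.endswith(".*"):
        series = expected[:-2]  # "2.11.*" -> "2.11"
        return base == series or base.startswith(series + ".")
    return base == expected


def installed_version(name):
    """Installed version of a distribution, or None if it is missing."""
    # Try dashed and underscored spellings; older importlib.metadata
    # releases do not normalise distribution names.
    for candidate in (name, name.replace("-", "_")):
        try:
            return version(candidate)
        except PackageNotFoundError:
            continue
    return None


def check(expected=EXPECTED):
    """Return {name: (expected, installed)} for every pin that is not satisfied."""
    problems = {}
    for name, want in expected.items():
        have = installed_version(name)
        if have is None or not matches(have, want):
            problems[name] = (want, have)
    return problems


if __name__ == "__main__":
    mismatches = check()
    for name, (want, have) in sorted(mismatches.items()):
        print(f"{name}: expected {want}, found {have}")
    print("All pins satisfied." if not mismatches else f"{len(mismatches)} mismatch(es).")
```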

Getting started

Cloning this repository

git clone https://github.com/nhattruongpham/mACPpred2.git
cd mACPpred2

Downloading baseline and final models

  • Please download the baseline and final models from Zenodo (DOI: 10.5281/zenodo.11350064).
  • For the baseline models, please extract them and put all *.pkl files into the models/baseline_models folder.
  • For the final models, please extract them and put all *.h5 files into the models/final_models folder.
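
Before running predictions, a quick check that the extracted files landed in the expected folders can save a confusing error later. A minimal sketch, run from the repository root; the folder names and extensions come from the steps above, while the helper name find_models is hypothetical. The number of files per archive is not stated here, so the sketch only reports counts:

```python
from pathlib import Path

# Folder layout and file extensions taken from the download instructions.
EXPECTED_MODELS = {
    "models/baseline_models": "*.pkl",
    "models/final_models": "*.h5",
}


def find_models(repo_root="."):
    """Map each model folder to the sorted names of matching files inside it.

    A missing folder simply yields an empty list.
    """
    root = Path(repo_root)
    return {
        folder: sorted(p.name for p in (root / folder).glob(pattern))
        for folder, pattern in EXPECTED_MODELS.items()
    }


if __name__ == "__main__":
    for folder, files in find_models().items():
        status = f"{len(files)} file(s)" if files else "EMPTY, extract the Zenodo files here"
        print(f"{folder}: {status}")
```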

Running prediction

Usage

CUDA_VISIBLE_DEVICES=<GPU_NUMBER> python predictor.py --input_file <PATH_TO_INPUT_FILE> --output_file <PATH_TO_OUTPUT_FILE>
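
When predictions are part of a larger Python workflow, the same command can be launched with the standard subprocess module. A minimal sketch that only uses the --input_file and --output_file flags shown above; the helper names are hypothetical, and sys.executable is used so that the Python interpreter of the active conda environment runs predictor.py:

```python
import os
import subprocess
import sys


def build_command(input_file, output_file, python=sys.executable):
    """Argument list for the documented predictor.py invocation."""
    return [python, "predictor.py",
            "--input_file", str(input_file),
            "--output_file", str(output_file)]


def run_predictor(input_file, output_file, gpu=0, repo_root="."):
    """Run predictor.py with CUDA_VISIBLE_DEVICES set, like the shell usage line.

    Relative paths are resolved against `repo_root` (the cloned mACPpred2 folder).
    Raises subprocess.CalledProcessError if predictor.py exits with an error.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return subprocess.run(build_command(input_file, output_file),
                          cwd=repo_root, env=env, check=True)
```

For example, run_predictor("examples/test.fasta", "result.csv", gpu=0) from the repository root corresponds to the example command.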

Example

CUDA_VISIBLE_DEVICES=0 python predictor.py --input_file examples/test.fasta --output_file result.csv
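
To run the predictor on your own peptides, they need to be in a FASTA file like examples/test.fasta. A minimal sketch for writing one, assuming predictor.py reads standard single-line FASTA records; restricting sequences to the 20 standard amino acids is a precaution of this sketch, not a documented requirement, and the helper name write_fasta is hypothetical:

```python
from pathlib import Path

# Assumption: only the 20 standard amino acids are accepted. This is a
# precaution for this sketch, not a documented rule of predictor.py.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def write_fasta(peptides, path):
    """Write {identifier: sequence} pairs as single-line FASTA records.

    Sequences are stripped and upper-cased; a ValueError is raised for an
    empty sequence or one containing letters outside STANDARD_AA.
    """
    records = []
    for name, seq in peptides.items():
        seq = seq.strip().upper()
        unknown = sorted(set(seq) - STANDARD_AA)
        if not seq or unknown:
            raise ValueError(f"{name!r}: empty or non-standard residues {unknown}")
        records.append(f">{name}\n{seq}\n")
    Path(path).write_text("".join(records))
```

The resulting file can then be passed as --input_file, e.g. write_fasta({"my_peptide": "ACDEFGHIKL"}, "my_peptides.fasta") with a placeholder sequence.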

Citation

If you use this code or any part of it, please cite the following works:

Main

@article{sangaraju2024macppred,
  title={mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations},
  author={Sangaraju, Vinoth Kumar and Pham, Nhat Truong and Wei, Leyi and Yu, Xue and Manavalan, Balachandran},
  journal={Journal of Molecular Biology},
  volume={436},
  number={17},
  pages={168687},
  year={2024},
  publisher={Elsevier}
}

Zenodo

@software{sangaraju_2024_11350064,
  author       = {Sangaraju, Vinoth Kumar and
                  Pham, Nhat Truong and
                  Manavalan, Balachandran},
  title        = {{mACPpred 2.0: Stacked deep learning for anticancer 
                   peptide prediction with integrated spatial and
                   probabilistic feature representations}},
  month        = may,
  year         = 2024,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.11350064},
  url          = {https://doi.org/10.5281/zenodo.11350064}
}

References

[1] Dallago, C., Schütze, K., Heinzinger, M., Olenyi, T., Littmann, M., Lu, A. X., Yang, K. K., Min, S., Yoon, S., Morton, J. T., & Rost, B. (2021). Learned embeddings from deep learning to visualize and predict protein sets. Current Protocols, 1, e113. DOI
[2] Özçelik, R., van Weesep, L., de Ruiter, S., & Grisoni, F. (2024). peptidy: A light-weight Python library for peptide representation in machine learning. DOI
[3] Dorfer, T. (2021). protlearn: A Python package for extracting protein sequence features (v0.0.3, released Mar 24, 2021). https://github.com/tadorfer/protlearn
