# ESMDisPred: A Structure-Aware CNN–Transformer Framework for Intrinsically Disordered Protein (IDP) Prediction
ESMDisPred is a deep learning framework that integrates convolutional and Transformer-based architectures to predict intrinsically disordered regions in proteins. By combining sequence embeddings, evolutionary features, and structural information, ESMDisPred achieves high predictive accuracy and generalization across diverse protein families.
## Model Variants

- **ESMDisPred-1**: Uses sequence-based features from DisPredict3.0 together with ESM-1 embeddings.
- **ESMDisPred-2**: Extends ESMDisPred-1 by incorporating ESM-2 embeddings, providing richer contextual protein representations.
- **ESMDisPred-2PDB**: Builds upon ESMDisPred-2 by integrating structure-related features derived from PDB data, enhancing structural context awareness.
- **ESMDisPred-DNN**: A comprehensive CNN–Transformer hybrid model trained on all feature types from DisPredict3.0, ESM-1, ESM-2, and PDB-derived structural descriptors. This variant captures both local residue patterns (via CNNs) and long-range dependencies (via Transformers), resulting in superior predictive performance.
## Requirements

- OS: Ubuntu 20.04 (tested)
- Python: via `pyenv` (recommended); Python 3.9–3.10 typically works best with PyTorch
- Hardware: CPU works; a CUDA GPU (CUDA 11+) is strongly recommended for speed
- Disk: ≥ 20 GB free (models + feature cache)
- Optional tools: Docker 24+ or Singularity 3.9+
Tip: If you plan to run ESM models on a GPU, ensure that `nvidia-driver`, `nvidia-container-toolkit` (for Docker), and a CUDA-enabled PyTorch build are available.
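A quick, best-effort sanity check for the GPU prerequisites above can be run before installing anything. This helper is not part of the repository; it only detects that `nvidia-smi` (implying a driver) and a `torch` module are present, not that the versions are compatible or that the PyTorch build is CUDA-enabled:

```python
import importlib.util
import shutil


def cuda_ready() -> dict:
    """Best-effort presence check for GPU inference prerequisites.

    `nvidia-smi` on PATH implies an NVIDIA driver is installed; a findable
    `torch` module implies PyTorch is installed (it may still be a
    CPU-only build, so this is a necessary, not sufficient, check).
    """
    return {
        "nvidia_driver": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for component, ok in cuda_ready().items():
        print(f"{component}: {'OK' if ok else 'missing'}")
```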
## Quick Start

```bash
# 1) Get the code
git clone https://github.com/wasicse/ESMDisPred.git
cd ESMDisPred

# 2) Download pre-trained model bundles (large)
./run_downloadLargeModels.sh
```

Dataset examples are under `dataset/`. A demo FASTA is provided at `example/sample.fasta`.
## Run Locally

Warning: Running locally without Docker may fail due to missing system libraries (e.g., `libidn.so.11`) or Python package version conflicts (e.g., `scikit-learn`, `lightgbm`). We strongly recommend using Docker or Singularity for consistent and reproducible results; see the Run with Docker and Run with Singularity sections below.
```bash
# from repo root
./install_dependencies.sh
```

The script sets up Python, required packages, and the expected folders (including `largeModels/`).
```bash
# from repo root
./run_ESMDisPred.sh "$(pwd)/example/sample.fasta" outputs
```

- Input: path to a FASTA file (may contain one or more sequences)
- Output: results are written to the `outputs/` directory
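If you want to try the pipeline on your own sequence rather than the bundled demo, a small helper like the one below (not part of the repository; the record ID and sequence are arbitrary placeholders) can generate a valid single-record FASTA input:

```python
from pathlib import Path


def write_fasta(path: str, records: dict) -> None:
    """Write {id: sequence} records as a FASTA file, wrapping sequence
    lines at 60 columns (the conventional FASTA line width)."""
    lines = []
    for seq_id, seq in records.items():
        lines.append(f">{seq_id}")
        # Split the sequence into 60-character chunks.
        lines.extend(seq[i:i + 60] for i in range(0, len(seq), 60))
    Path(path).write_text("\n".join(lines) + "\n")


# Toy single-record input; the sequence here is for illustration only.
write_fasta("my_input.fasta", {"demo_protein": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"})
```

You would then pass `"$(pwd)/my_input.fasta"` to the run script in place of the demo file.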
## Run with Docker

You can build the image yourself or pull it from the registry.

```bash
# Build directly from the GitHub repository
docker build -t wasicse/esmdispred https://github.com/wasicse/ESMDisPred.git#master

# Or build a specific variant from a local checkout
docker build -t wasicse/esmdispred:version2 -f ./Dockerfile2 .

# Or pull the pre-built image
docker pull wasicse/esmdispred:version2
```

The helper script mounts your input FASTA, `largeModels/`, and `outputs/` into the container:
```bash
# Interactive mode - will prompt for model selection
./run_ESMDisPred_Docker.sh "$(pwd)/example/sample.fasta" outputs

# Non-interactive mode - specify the model as the 3rd parameter
./run_ESMDisPred_Docker.sh "$(pwd)/example/sample.fasta" outputs 3
./run_ESMDisPred_Docker.sh "$(pwd)/example/sample.fasta" outputs ESMDisPred-2PDB
./run_ESMDisPred_Docker.sh "$(pwd)/example/sample.fasta" outputs all
```

Model options (3rd parameter):

- `1` or `ESMDisPred-1`: DisPredict3.0 + ESM1
- `2` or `ESMDisPred-2`: DisPredict3.0 + ESM1 + ESM2
- `3` or `ESMDisPred-2PDB`: DisPredict3.0 + ESM1 + ESM2 + PDB features
- `4` or `ESMDisPred-DNN`: CNN–Transformer hybrid
- `5` or `all`: run ALL models
If the 3rd parameter is omitted, the script will prompt you interactively to select a model.
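When wrapping the run script in a larger pipeline, it can be handy to normalize the model parameter up front. The helper below is hypothetical (not one of the repo's scripts) and simply mirrors the option table above:

```python
# Numeric-code to canonical-name mapping, mirroring the model option table.
_MODEL_OPTIONS = {
    "1": "ESMDisPred-1",
    "2": "ESMDisPred-2",
    "3": "ESMDisPred-2PDB",
    "4": "ESMDisPred-DNN",
    "5": "all",
}


def normalize_model_choice(choice: str) -> str:
    """Accept either a numeric code or a model name; return the canonical name.

    Raises ValueError for anything not in the option table.
    """
    choice = choice.strip()
    if choice in _MODEL_OPTIONS:
        return _MODEL_OPTIONS[choice]
    if choice in _MODEL_OPTIONS.values():
        return choice
    raise ValueError(f"Unknown model option: {choice!r}")
```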
## Run with Singularity

Build the image from the definition file and run it:

```bash
sudo singularity build ESMDispS.sif ESMDispS.def

sudo singularity run --writable-tmpfs \
  -B "$(pwd)/example/sample.fasta:/opt/ESMDisPred/example/sample.fasta" \
  -B "$(pwd)/largeModels:/opt/ESMDisPred/largeModels" \
  -B "$(pwd)/outputs:/opt/ESMDisPred/outputs:rw" \
  ESMDispS.sif

# inside the container:
cd /opt/ESMDisPred && ./run_ESMDisPred.sh "$(pwd)/example/sample.fasta" outputs
```

Alternatively, pull the image from Docker Hub:

```bash
singularity pull esmdispred.sif docker://wasicse/esmdispred:latest
```
```bash
sudo singularity run --writable-tmpfs \
  -B "$(pwd)/example/sample.fasta:/opt/ESMDisPred/example/sample.fasta" \
  -B "$(pwd)/largeModels:/opt/ESMDisPred/largeModels" \
  -B "$(pwd)/outputs:/opt/ESMDisPred/outputs:rw" \
  esmdispred.sif

# inside the container:
cd /opt/ESMDisPred && ./run_ESMDisPred.sh "$(pwd)/example/sample.fasta" outputs
```

## Output

Inside `outputs/` you'll find:

- `PROTEINID.caid`: per-residue disorder probabilities (one file per sequence), tab- or space-delimited.
- `timings.csv`: wall-clock timings per stage/model.
- Subfolders by model variant (e.g., `ESMDisPred-1/`, `ESMDisPred-2/`, …) when applicable.
### File format (`*.caid`)

```text
# Example (columns may include: residue_index, residue, probability)
1 M 0.034
2 A 0.071
...
```
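For downstream analysis, the whitespace-delimited output above is easy to parse. The sketch below (not part of the repository; it assumes the three-column layout shown in the example, with `#` comment lines) reads a `.caid` file and collapses high-probability residues into disordered regions:

```python
def read_caid(path: str):
    """Parse a whitespace-delimited *.caid file into a list of
    (residue_index, residue, probability) tuples, skipping blank and
    comment lines."""
    rows = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith(("#", ">")):
                continue
            idx, residue, prob = line.split()[:3]
            rows.append((int(idx), residue, float(prob)))
    return rows


def disordered_regions(rows, threshold=0.5):
    """Collapse consecutive rows with probability >= threshold into
    (start, end) residue-index ranges (inclusive). Assumes rows are
    sorted by residue index, as in the output files."""
    regions, start, prev = [], None, None
    for idx, _, prob in rows:
        if prob >= threshold:
            if start is None:
                start = idx
            prev = idx
        elif start is not None:
            regions.append((start, prev))
            start = None
    if start is not None:
        regions.append((start, prev))
    return regions
```

For example, `disordered_regions(read_caid("outputs/PROTEINID.caid"), threshold=0.5)` yields the predicted disordered segments as index ranges.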
## Citation

If you use ESMDisPred, please cite:

- Md Wasi Ul Kabir, Ayon Dey, Farzeen Nafees, and Md Tamjidul Hoque. "ESMDisPred: A Structure-Aware CNN-Transformer Architecture for Intrinsically Disordered Protein Prediction." bioRxiv (2026). https://doi.org/10.64898/2026.01.22.701204
- Md Wasi Ul Kabir and Md Tamjidul Hoque. "DisPredict3.0: Prediction of Intrinsically Disordered Regions/Proteins Using Protein Language Model." Applied Mathematics and Computation 472 (July 2024): 128630. https://doi.org/10.1016/j.amc.2024.128630
### BibTeX

```bibtex
@article{Kabir2026ESMDisPred,
  author    = {Kabir, Md Wasi Ul and Dey, Ayon and Nafees, Farzeen and Hoque, Md Tamjidul},
  title     = {ESMDisPred: A Structure-Aware CNN-Transformer Architecture for Intrinsically Disordered Protein Prediction},
  year      = {2026},
  doi       = {10.64898/2026.01.22.701204},
  publisher = {Cold Spring Harbor Laboratory},
  journal   = {bioRxiv}
}

@article{Kabir2024DisPredict3,
  title   = {DisPredict3.0: Prediction of Intrinsically Disordered Regions/Proteins Using Protein Language Model},
  author  = {Kabir, Md Wasi Ul and Hoque, Md Tamjidul},
  journal = {Applied Mathematics and Computation},
  volume  = {472},
  pages   = {128630},
  year    = {2024},
  doi     = {10.1016/j.amc.2024.128630}
}
```

## Contact

Authors: Md Wasi Ul Kabir, Md Tamjidul Hoque

Questions/Issues: Md Tamjidul Hoque (thoque@uno.edu)