An official implementation of our research paper "SynProtX: A Large-Scale Proteomics-Based Deep Learning Model for Predicting Synergistic Anticancer Drug Combinations".
SynProtX is a deep learning model that integrates large-scale proteomics data, molecular graphs, and chemical fingerprints to predict synergistic effects of anticancer drug combinations. It provides robust performance across tissue-specific and study-specific datasets, enhancing reproducibility and biological relevance in drug synergy prediction.
We use Miniconda to manage Python dependencies in this project. To reproduce our environment, please run the following script in the terminal:
conda env create -f env.yml
conda activate SynProtX

Datasets, hyperparameters, and model checkpoints can be downloaded through .
SynProtX allows the prediction of synergistic effects between drug combinations through inference using the SynProtX model. It leverages various tissue-specific and study datasets to make these predictions.
To perform inference, you can run the following command:
python synprotx_inference.py --smi1 "CCOc1ccc2c(c1)N=C(N)N(c3ccc(Cl)cc3)S2" --smi2 "CN1CCC(CC1)Nc2nccc3c2ncn3C" \
--dataset ALMANAC-Breast --cell_line MCF7 --task classification --thr 0.5

In this example:
- `--smi1` and `--smi2` represent the SMILES strings of the two drug compounds being tested.
- `--dataset` specifies the dataset to use (e.g., ALMANAC-Breast).
- `--cell_line` indicates the cell line to consider (e.g., MCF7).
- `--task` defines the type of task: classification for synergy/antagonism prediction or regression for raw score prediction.
- `--thr` sets the threshold for classification tasks, used to differentiate between synergistic and antagonistic interactions.
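
To make the role of `--thr` concrete, here is a minimal sketch of thresholding a classification score. It assumes the classification output is a single score per drug pair and that scores at or above the threshold are labelled synergistic; both the helper name `label_from_score` and that direction are illustrative assumptions, not taken from the SynProtX source.

```python
def label_from_score(score: float, thr: float = 0.5) -> str:
    """Map a classification score to a label.

    Assumed convention (not confirmed by the repository):
    score >= thr -> "synergistic", otherwise "antagonistic".
    """
    return "synergistic" if score >= thr else "antagonistic"


# Illustrative scores only; these are not model outputs.
for example_score in (0.12, 0.50, 0.87):
    print(example_score, label_from_score(example_score, thr=0.5))
```

Raising the threshold makes the synergistic label stricter, which trades recall for precision.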
| Option | Description |
|---|---|
| `--smi1` | SMILES string of the first compound (required) |
| `--smi2` | SMILES string of the second compound (required) |
| `--cell_line` | Cell-line identifier (e.g. MCF7) (required) |
| `--dataset` | Dataset to use (default: ALMANAC-Breast). Tissue datasets: ALMANAC-Breast, ALMANAC-Lung, ALMANAC-Ovary, ALMANAC-Skin. Study datasets: FRIEDMAN, ONEIL |
| `--task` | Task type (default: regression). Options: classification, regression |
| `--device` | Device for computation (default: cpu). Options: cpu, cuda:0 (or another CUDA device string) |
| `--thr` | Threshold for classifying synergy vs antagonism (only for classification task). Default: 0.5 |
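
When inference is driven from another Python script, the documented options can be assembled into an argument list. The sketch below uses only the flags and defaults listed above; the helper name `build_inference_command` is hypothetical, and it assumes the command is launched from the repository root where `synprotx_inference.py` lives.

```python
import shlex


def build_inference_command(smi1, smi2, cell_line, dataset="ALMANAC-Breast",
                            task="regression", device="cpu", thr=None):
    """Assemble the argv for synprotx_inference.py from the documented options."""
    if task not in ("classification", "regression"):
        raise ValueError(f"task must be 'classification' or 'regression', got {task!r}")
    cmd = [
        "python", "synprotx_inference.py",
        "--smi1", smi1,
        "--smi2", smi2,
        "--cell_line", cell_line,
        "--dataset", dataset,
        "--task", task,
        "--device", device,
    ]
    # --thr is documented as relevant only for the classification task.
    if task == "classification" and thr is not None:
        cmd += ["--thr", str(thr)]
    return cmd


cmd = build_inference_command(
    "CCOc1ccc2c(c1)N=C(N)N(c3ccc(Cl)cc3)S2",
    "CN1CCC(CC1)Nc2nccc3c2ncn3C",
    cell_line="MCF7",
    task="classification",
    thr=0.5,
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to execute it; passing a list rather than a single string avoids shell-quoting problems with SMILES characters such as parentheses.
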
| Dataset | Cell Lines |
|---|---|
| ALMANAC-Breast | BT-549, MCF7, MDA-MB-231, MDA-MB-468 |
| ALMANAC-Lung | A549, EKVX, HOP-62, HOP-92, NCI-H226, NCI-H460, NCI-H522 |
| ALMANAC-Ovary | OVCAR-4, OVCAR-5, OVCAR-8, SK-OV-3 |
| ALMANAC-Skin | SK-MEL-2, SK-MEL-5, SK-MEL-28, UACC-257 |
| FRIEDMAN (Skin) | A2058, G-361, IPC-298, RVH-421, SK-MEL-2, SK-MEL-5, SK-MEL-28, UACC-257 |
| ONEIL (Several Tissues) | A2058 (skin), NCI-H460 (lung), SK-OV-3 (ovary), A2780 (ovary), A427 (lung), RKO (large intestine), SW837 (large intestine) |
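
Because each dataset supports only the cell lines in the table above, it can help to validate the pair before launching inference. The following sketch copies the table into a dictionary keyed by the `--dataset` values; the names `SUPPORTED_CELL_LINES` and `check_cell_line` are illustrative and are not part of the SynProtX code.

```python
# Cell lines per dataset, transcribed from the table above.
SUPPORTED_CELL_LINES = {
    "ALMANAC-Breast": {"BT-549", "MCF7", "MDA-MB-231", "MDA-MB-468"},
    "ALMANAC-Lung": {"A549", "EKVX", "HOP-62", "HOP-92",
                     "NCI-H226", "NCI-H460", "NCI-H522"},
    "ALMANAC-Ovary": {"OVCAR-4", "OVCAR-5", "OVCAR-8", "SK-OV-3"},
    "ALMANAC-Skin": {"SK-MEL-2", "SK-MEL-5", "SK-MEL-28", "UACC-257"},
    "FRIEDMAN": {"A2058", "G-361", "IPC-298", "RVH-421",
                 "SK-MEL-2", "SK-MEL-5", "SK-MEL-28", "UACC-257"},
    "ONEIL": {"A2058", "NCI-H460", "SK-OV-3", "A2780", "A427", "RKO", "SW837"},
}


def check_cell_line(dataset: str, cell_line: str) -> None:
    """Raise ValueError if the dataset or the cell line is not in the table."""
    if dataset not in SUPPORTED_CELL_LINES:
        raise ValueError(
            f"Unknown dataset {dataset!r}; expected one of {sorted(SUPPORTED_CELL_LINES)}"
        )
    allowed = SUPPORTED_CELL_LINES[dataset]
    if cell_line not in allowed:
        raise ValueError(
            f"{cell_line!r} is not listed for {dataset}; expected one of {sorted(allowed)}"
        )


check_cell_line("ALMANAC-Breast", "MCF7")  # passes silently
try:
    check_cell_line("ALMANAC-Lung", "MCF7")
except ValueError as err:
    print(err)
```
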
After downloading, you will obtain a tarball. Extract it and move all nested folders to the root of this project directory. You might need to move all files in data/export up to the data folder. Otherwise, you will need to run the Jupyter Notebook files to generate the required data. These notebooks are in the ipynb folder; run the following files in order if you want to replicate our exported data:
- `01_drugcomb_clean.ipynb` → `cleandata_cancer.csv`
- `02_CCLE_gene_expression` → `CCLE_expression_cleaned.csv`
- `03_omics_preprocess` → `protein_omics_data_cleaned.csv`
- `04_drugcomb_gene_prot_clean` → `data_preprocessing_gene.pkl`, `data_drugcomb.pkl`, `data_preprocessing_protein.pkl`
- `05_graph_generate.ipynb` → `nps_intersected` folder
- `06_smiles_feat_generate.ipynb` → `smiles_graph_data.pkl`
- `07_to_ecfp6_deepsyn.ipynb` → `deepsyn_drug_row.npy`, `deepsyn_drug_col.npy`
If the console shows an error indicating that SMILES are not found, you MUST run the file `06_smiles_feat_generate.ipynb` again to regenerate data.
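
Before training, it may be useful to confirm which of the notebook outputs listed above are present. The sketch below assumes the outputs sit directly under a `data` directory (consistent with moving files from data/export up to data); the helper names `promote_exported_files` and `missing_outputs` are hypothetical, and the demo runs against a temporary directory rather than the real project.

```python
import shutil
import tempfile
from pathlib import Path

# Notebook -> expected outputs, copied from the list above.
EXPECTED_OUTPUTS = {
    "01_drugcomb_clean.ipynb": ["cleandata_cancer.csv"],
    "02_CCLE_gene_expression": ["CCLE_expression_cleaned.csv"],
    "03_omics_preprocess": ["protein_omics_data_cleaned.csv"],
    "04_drugcomb_gene_prot_clean": [
        "data_preprocessing_gene.pkl",
        "data_drugcomb.pkl",
        "data_preprocessing_protein.pkl",
    ],
    "05_graph_generate.ipynb": ["nps_intersected"],
    "06_smiles_feat_generate.ipynb": ["smiles_graph_data.pkl"],
    "07_to_ecfp6_deepsyn.ipynb": ["deepsyn_drug_row.npy", "deepsyn_drug_col.npy"],
}


def promote_exported_files(data_dir):
    """Move entries from <data_dir>/export up into <data_dir>; skip name clashes."""
    data_dir = Path(data_dir)
    export_dir = data_dir / "export"
    moved = []
    if export_dir.is_dir():
        # Materialise the listing first so moving does not disturb iteration.
        for item in sorted(export_dir.iterdir()):
            target = data_dir / item.name
            if not target.exists():
                shutil.move(str(item), str(target))
                moved.append(item.name)
    return moved


def missing_outputs(data_dir):
    """Return {notebook: [missing output names]} for outputs absent from data_dir."""
    data_dir = Path(data_dir)
    report = {}
    for notebook, outputs in EXPECTED_OUTPUTS.items():
        missing = [name for name in outputs if not (data_dir / name).exists()]
        if missing:
            report[notebook] = missing
    return report


if __name__ == "__main__":
    # Demo on a throwaway directory so nothing in the real project is touched.
    with tempfile.TemporaryDirectory() as tmp:
        demo_data = Path(tmp) / "data"
        (demo_data / "export").mkdir(parents=True)
        (demo_data / "export" / "smiles_graph_data.pkl").write_text("placeholder")
        print("moved:", promote_exported_files(demo_data))
        for notebook, names in missing_outputs(demo_data).items():
            print(f"rerun {notebook}: missing {', '.join(names)}")
```

Any notebook reported as missing outputs is a candidate to rerun, in the order given above.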
To execute a training and testing task for our model, run the following script:
python synprotx/<model>.py -d <database> -m <mode>

Possible options are listed below.
- `model` represents the name of the model to run. Must be one of `gat`, `gcn`, `attentivefp`, and `gatfp`.
- `--database`/`-d` specifies the data source to train the model on. Must be one of `almanac-breast`, `almanac-lung`, `almanac-ovary`, `almanac-skin`, `friedman`, `oneil`.
- `--mode`/`-m` must be either `clas`, for the classification task, or `regr`, for the regression task. Defaults to `clas`.
- Flags `--no-feamol`, `--no-feagene`, `--no-feaprot` disable the molecule branch, gene expression branch, and protein expression branch, respectively, when propagating through the model.
Note: There are more options to configure. Execute python synprotx/<model>.py -h for a more detailed description.
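
For running several configurations (for example, the ablation variants), the documented training options can be assembled programmatically. This is a minimal sketch restricted to the options named above; `build_training_command` is a hypothetical helper, and any further options shown by `-h` are deliberately left out.

```python
import shlex

MODELS = ("gat", "gcn", "attentivefp", "gatfp")
DATABASES = ("almanac-breast", "almanac-lung", "almanac-ovary",
             "almanac-skin", "friedman", "oneil")
MODES = ("clas", "regr")


def build_training_command(model, database, mode="clas",
                           no_feamol=False, no_feagene=False, no_feaprot=False):
    """Assemble the argv for synprotx/<model>.py from the documented options."""
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}, got {model!r}")
    if database not in DATABASES:
        raise ValueError(f"database must be one of {DATABASES}, got {database!r}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    cmd = ["python", f"synprotx/{model}.py", "-d", database, "-m", mode]
    # Each flag disables one input branch of the model.
    if no_feamol:
        cmd.append("--no-feamol")
    if no_feagene:
        cmd.append("--no-feagene")
    if no_feaprot:
        cmd.append("--no-feaprot")
    return cmd


# Example: regression on ONEIL with the gene expression branch disabled.
ablation_cmd = build_training_command("gatfp", "oneil", mode="regr", no_feagene=True)
print(shlex.join(ablation_cmd))
```
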
The performance evaluation for each repeated fold is available in the "results" folder, which contains all result files obtained from the training process.
The models compared are XGBoost, DeepDDS, DeepSyn, SynProtX variations, and AttenSyn. The split types are random, cold-start (leave-one-out) for drugs, drug combinations, and cell lines, and ablation (gene and protein), each on both classification and regression tasks.
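
To illustrate how the cold-start splits differ from a random split, here is a toy sketch. It is not the repository's splitting code: the sample tuples, the drug names (`drugA` and so on), and the function names are placeholders chosen for illustration. The idea is that a cold-start split holds out every sample involving a given drug, drug pair, or cell line, so the test set contains entities never seen during training.

```python
import random


def random_split(samples, test_fraction=0.2, seed=0):
    """Shuffle samples and cut off a test fraction; entities may appear on both sides."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def leave_drug_out(samples, drug):
    """Cold-start for drugs: every sample containing `drug` goes to the test set."""
    test = [s for s in samples if drug in (s[0], s[1])]
    train = [s for s in samples if drug not in (s[0], s[1])]
    return train, test


def leave_combination_out(samples, pair):
    """Cold-start for drug combinations: hold out the unordered pair."""
    held = frozenset(pair)
    test = [s for s in samples if frozenset(s[:2]) == held]
    train = [s for s in samples if frozenset(s[:2]) != held]
    return train, test


def leave_cell_line_out(samples, cell_line):
    """Cold-start for cell lines: every sample measured on `cell_line` is held out."""
    test = [s for s in samples if s[2] == cell_line]
    train = [s for s in samples if s[2] != cell_line]
    return train, test


# Toy samples: (drug_1, drug_2, cell_line). Drug names are placeholders.
samples = [
    ("drugA", "drugB", "MCF7"),
    ("drugB", "drugA", "A549"),
    ("drugA", "drugC", "MCF7"),
    ("drugB", "drugC", "A549"),
    ("drugC", "drugD", "MCF7"),
]

train, test = leave_drug_out(samples, "drugA")
print(len(train), "train /", len(test), "test with drugA held out")
```

Under the random split, the same drug or cell line can appear in both training and test sets, which generally makes the task easier than the cold-start settings.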
Disclaimer: The CSV files in the "results" folder are not covered by the same MIT license as the source code. These data files are dedicated to the public domain under CC0.
Research Article
@article{boonyarit2025synprotx_gigascience,
author = {Boonyarit, Bundit and
Kositchutima, Matin and
Phattalung, Tisorn Na and
Yamprasert, Nattawin and
Thuwajit, Chanitra and
Rungrotmongkol, Thanyada and
Nutanong, Sarana},
title = {SynProtX: a large-scale proteomics-based deep learning model for predicting synergistic anticancer drug combinations},
journal = {GigaScience},
volume = {14},
pages = {giaf080},
year = {2025},
month = {08},
issn = {2047-217X},
doi = {10.1093/gigascience/giaf080},
url = {https://doi.org/10.1093/gigascience/giaf080},
eprint = {https://academic.oup.com/gigascience/article-pdf/doi/10.1093/gigascience/giaf080/64028448/giaf080.pdf},
}

Zenodo
@online{boonyarit2025synprotx_zenodo,
author = {Boonyarit, Bundit and
Kositchutima, Matin and
Phattalung, Tisorn Na and
Yamprasert, Nattawin and
Thuwajit, Chanitra and
Rungrotmongkol, Thanyada and
Nutanong, Sarana},
title = {SynProtX: A Large-Scale Proteomics-Based Deep Learning Model for Predicting Synergistic Anticancer Drug Combinations},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.15603481},
url = {https://doi.org/10.5281/zenodo.15603481},
note = {[Dataset]}
}

WorkflowHub
@online{boonyarit2025synprotx_workflowhub,
author = {Boonyarit, Bundit and
Kositchutima, Matin and
Phattalung, Tisorn Na and
Yamprasert, Nattawin and
Thuwajit, Chanitra and
Rungrotmongkol, Thanyada and
Nutanong, Sarana},
title = {SynProtX},
year = {2025},
url = {https://workflowhub.eu/workflows/1726?version=3},
DOI = {10.48546/WORKFLOWHUB.WORKFLOW.1726.3},
publisher = {WorkflowHub}
}

Software Heritage
@online{boonyarit2025synprotx_software,
author = {Boonyarit, Bundit and
Kositchutima, Matin and
Phattalung, Tisorn Na and
Yamprasert, Nattawin and
Thuwajit, Chanitra and
Rungrotmongkol, Thanyada and
Nutanong, Sarana},
title = {SynProtX: A Large-Scale Proteomics-Based Deep Learning Model for Predicting Synergistic Anticancer Drug Combinations (Version 1)},
year = {2025},
note = {[Computer software]},
url = {https://archive.softwareheritage.org/swh:1:snp:750d09d4ed20b1628cef1f20cf0d2b2e518c4a3b;origin=https://github.com/manbaritone/SynProtX}
}