A refined version of https://github.com/lifanchen-simm/transformerCPI for large-scale datasets and multi-GPU training.
Create the conda environment: `conda env create -f py36_tCPI.yml`
First, run `sh script/generate_map.sh` to generate `protein_map.pkl` and `smiles_map.pkl`, which map each SMILES string to its SMILES features and each protein sequence to its protein-sequence features. Then run `sh script/main.sh` to start training.
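The generated map files can be inspected like any other pickle. This is a minimal sketch, assuming each `.pkl` file is a plain pickled dict from input string to feature (the actual layout is produced by `script/generate_map.sh`; the file name and contents below are illustrative only):

```python
import pickle

# Hypothetical stand-in for one entry of smiles_map.pkl:
# a SMILES string mapped to a precomputed feature vector.
example_map = {"CCO": [0.1, 0.2, 0.3]}

# Round-trip through pickle, the same way the real map files are read.
with open("smiles_map_demo.pkl", "wb") as f:  # demo file, not the real map
    pickle.dump(example_map, f)

with open("smiles_map_demo.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded["CCO"])  # the feature vector stored for this SMILES string
```

Loading the precomputed maps once at startup avoids recomputing features for every epoch.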
To stop the training process, run `sh script/stop.sh`.
Comparison to the original repo: https://github.com/lifanchen-simm/transformerCPI

Advantages:
- You can use `torch.nn.DataParallel` (along with `torch.cuda.amp`) to accelerate training across multiple GPUs.
- For large-scale datasets, using `DTADataset` in `DataUtil.py` together with `torch.utils.data.DataLoader` speeds up data loading and tremendously reduces memory usage.
- The code now solves a regression problem instead of the classification problem addressed in the original paper.
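The pieces above fit together roughly as follows. This is a sketch, not the repo's actual training loop: the toy dataset and linear model stand in for `DTADataset` and the transformerCPI model, while the `DataParallel` wrapping and `torch.cuda.amp` usage follow the standard PyTorch pattern:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Stand-in for DTADataset: yields (feature, affinity) regression pairs."""
    def __init__(self, n=64):
        self.x = torch.randn(n, 8)
        self.y = self.x.sum(dim=1, keepdim=True)  # toy regression target

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

model = torch.nn.Linear(8, 1)          # stand-in for the real model
if torch.cuda.device_count() > 1:      # replicate across GPUs when available
    model = torch.nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

for x, y in loader:
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    # autocast runs the forward pass in mixed precision on GPU;
    # on CPU it is disabled and runs in full precision.
    with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
        pred = model(x)
        loss = torch.nn.functional.mse_loss(pred, y)
    scaler.scale(loss).backward()  # loss scaling keeps fp16 gradients stable
    scaler.step(optimizer)
    scaler.update()
```

On a single-GPU or CPU machine the `DataParallel` wrap and loss scaling become no-ops, so the same loop runs everywhere.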
Thanks to the authors of the original paper for their brilliant work: https://doi.org/10.1093/bioinformatics/btaa524