DLTKcat v1.0: Deep learning based prediction of temperature dependent enzyme turnover rates
The dataset curation process is in /code/GetData.ipynb.
- Required inputs: substrate name, UniProt ID of the enzyme protein, temperature.
- Get SMILES strings and enzyme protein sequences using convert_input(path, enz_col, sub_col) in /code/feature_functions.py.
- The input must be a CSV file with the columns 'smiles', 'seq', 'Temp_K_norm', and 'Inv_Temp_norm'. 'Temp_K_norm' and 'Inv_Temp_norm' are the normalized temperature and normalized inverse temperature values.
- Run prediction:
python predict.py --model_path [default = /data/performances/model_latentdim=40_outlayer=4_rmsetest=0.8854_rmsedev=0.908.pth]
--param_dict_pkl [default = /data/hyparams/param_2.pkl]
--input [input.csv] --output [output file name]
--has_label [default = False]
- Get attention weights of protein residues:
python get_attention.py --input [input.csv] --output [output file name]
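As a rough illustration of the input CSV described above, the sketch below writes a file with the four required columns using only the Python standard library. The normalization shown is plain min-max scaling over hypothetical bounds (`T_MIN_K`, `T_MAX_K`); these bounds and the helper names `min_max`, `make_row`, and `write_input_csv` are assumptions for illustration and are not taken from this repository. For real predictions, the scaling must match whatever was used when the model was trained.

```python
import csv
import io

# Hypothetical temperature bounds in kelvin. These are NOT the values used
# by DLTKcat; replace them with the constants used during training.
T_MIN_K = 273.15
T_MAX_K = 373.15

COLUMNS = ["smiles", "seq", "Temp_K_norm", "Inv_Temp_norm"]


def min_max(value, lo, hi):
    """Scale value linearly so that lo maps to 0 and hi maps to 1."""
    return (value - lo) / (hi - lo)


def make_row(smiles, seq, temp_k):
    """Build one input row with the four columns named in this README."""
    inv_t = 1.0 / temp_k
    # 1/T shrinks as T grows, so the smallest 1/T comes from T_MAX_K.
    inv_lo, inv_hi = 1.0 / T_MAX_K, 1.0 / T_MIN_K
    return {
        "smiles": smiles,
        "seq": seq,
        "Temp_K_norm": min_max(temp_k, T_MIN_K, T_MAX_K),
        "Inv_Temp_norm": min_max(inv_t, inv_lo, inv_hi),
    }


def write_input_csv(rows, handle):
    """Write rows to an open text handle with the expected header."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Toy example: ethanol with a made-up peptide sequence at 310.15 K.
rows = [make_row("CCO", "MKTAYIAK", 310.15)]
buffer = io.StringIO()
write_input_csv(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("input.csv", "w", newline="")` as `handle`.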
- Mutants of Pyrococcus furiosus Ornithine Carbamoyltransferase via directed evolution (/data/PFOCT/, /code/CaseStudy_PFOCT.ipynb).
Ref: https://doi.org/10.1128/jb.183.3.1101-1105.2001
- Growth and metabolism of Lactococcus lactis and Streptococcus thermophilus at different temperatures (/data/GEMs, /code/GEMs.ipynb).
Ref: https://doi.org/10.1038/srep14199, https://doi.org/10.1111/j.1365-2672.2004.02418.x
- PyTorch (1.8.1+cu101): https://pytorch.org/
- Scikit-learn: https://scikit-learn.org/
- RDKit: https://www.rdkit.org/
- BRENDApyrser: https://github.com/Robaina/BRENDApyrser
- COBRApy: https://github.com/opencobra/cobrapy
- Seaborn statistical data visualization: https://seaborn.pydata.org/index.html
- Escher: https://github.com/zakandrewking/escher
DLTKcat: deep learning based prediction of temperature dependent enzyme turnover rates. Sizhe Qiu, Simiao Zhao, Aidong Yang. bioRxiv 2023.08.10.552798; doi: https://doi.org/10.1101/2023.08.10.552798
Users might encounter an "Index out of range" error at amino_vector = self.embedding_layer_amino(amino).
A potential solution is to add 1 to n_atom and n_amino in the model parameters and train a new model.
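The reasoning behind this workaround can be sketched in plain Python, assuming n_atom and n_amino are the row counts of the embedding tables. An embedding table with n rows accepts indices 0 to n-1 only, so an encoded index equal to n triggers the error. The helpers `required_embedding_size` and `check_embedding_size` below are hypothetical names for illustration and do not exist in this repository.

```python
def required_embedding_size(index_sequences):
    """Return the smallest table size that covers every index (max index + 1)."""
    largest = max((max(seq) for seq in index_sequences if seq), default=-1)
    return largest + 1


def check_embedding_size(n_rows, index_sequences, name):
    """Raise ValueError if a table with n_rows rows cannot hold all indices."""
    needed = required_embedding_size(index_sequences)
    if needed > n_rows:
        raise ValueError(
            f"{name}={n_rows} is too small: largest index is {needed - 1}, "
            f"so at least {needed} rows are needed"
        )
    return needed


# Toy encoded protein fragments; the largest index is 6.
encoded_proteins = [[0, 3, 5], [2, 5, 6]]

n_amino = 6  # a table with 6 rows covers indices 0..5 only
try:
    check_embedding_size(n_amino, encoded_proteins, "n_amino")
    too_small = False
except ValueError:
    too_small = True

# Adding 1 gives 7 rows (indices 0..6), which covers every index.
needed_rows = check_embedding_size(n_amino + 1, encoded_proteins, "n_amino")
```

Running a check like this on the encoded inputs before training shows whether the configured sizes cover every index that will be looked up.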