RF-Score-VS is a novel Random Forest-based scoring function for Virtual Screening which predicts binding affinity. Its descriptors are based on RF-Score, developed by Pedro Ballester et al. The presented binary implements RF-Score-VS v2, meaning that it counts atoms of certain types within a 12 Å radius, divided into 2 Å bins. For further information about reported performance in various scenarios and validation across datasets, see the publication.
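The binned counting described above can be illustrated with a minimal sketch. This is not the actual RF-Score-VS code: the use of plain element symbols as atom types, the function name `binned_contact_counts`, and the toy coordinates are all assumptions made for illustration. Only the 12 Å cutoff and the 2 Å bin width come from the description.

```python
import math
from collections import Counter

CUTOFF = 12.0     # Å, radius stated in the description
BIN_WIDTH = 2.0   # Å, bin width stated in the description


def binned_contact_counts(ligand_atoms, protein_atoms,
                          cutoff=CUTOFF, bin_width=BIN_WIDTH):
    """Count ligand-protein atom pairs per (ligand element, protein element, bin).

    Atoms are (element, (x, y, z)) tuples; typing atoms by element is an
    assumption of this sketch, not a statement about the real descriptors.
    """
    counts = Counter()
    for lig_elem, lig_xyz in ligand_atoms:
        for prot_elem, prot_xyz in protein_atoms:
            d = math.dist(lig_xyz, prot_xyz)
            if d < cutoff:
                # Bin 0 covers [0, 2) Å, bin 1 covers [2, 4) Å, and so on.
                counts[(lig_elem, prot_elem, int(d // bin_width))] += 1
    return counts


# Hypothetical toy complex, coordinates in Å.
ligand = [("C", (0.0, 0.0, 0.0)), ("O", (1.2, 0.0, 0.0))]
protein = [("N", (3.0, 0.0, 0.0)), ("C", (0.0, 11.0, 0.0)), ("C", (0.0, 20.0, 0.0))]
features = binned_contact_counts(ligand, protein)
```

With a 12 Å cutoff and 2 Å bins there are six distance bins per atom-type pair, and pairs farther apart than the cutoff contribute nothing.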
- SDF/MDL (.sdf, .mol)
- Mol2 (.mol2)
- PDBQT (.pdbqt)
- PDB (.pdb)
Download the package appropriate for your platform, which contains the binary and sample data to test RF-Score-VS. To use the scoring function, uncompress the archive and open a terminal in the same directory as the binary.
- untagged parameters are treated as docked ligands; the user can supply multiple molecular files [required]
- -i: input file format; if not present then based on extension [optional]
- --receptor: a protein file; format based on extension [required]
- -O: output file; if -o is not present file format is based on extension [optional]
- -o: output file format; if -O is not present then molecules are printed to standard output [optional]
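For scripted use, the parameters listed above can be assembled programmatically. The following is a minimal sketch, assuming the binary sits in the current directory; the helper names `build_command` and `run` are hypothetical, and only flags documented above are used.

```python
import subprocess


def build_command(receptor, ligands, output_file=None, output_format=None,
                  fields=(), binary="./rf-score-vs"):
    """Assemble an rf-score-vs argument list from the documented parameters."""
    cmd = [binary, "--receptor", receptor, *ligands]  # untagged args are ligands
    if output_file:
        cmd += ["-O", output_file]      # output file
    if output_format:
        cmd += ["-o", output_format]    # output file format
    for field in fields:
        cmd += ["--field", field]       # limit output columns
    return cmd


def run(cmd):
    """Run the command and return what it printed to standard output."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


cmd = build_command("protein.pdb", ["ligands.sdf"], output_format="csv",
                    fields=["name", "RFScoreVS_v2"])
# run(cmd) would execute it; this needs the binary and the input files in place.
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.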
RF-Score-VS predictions are in -pK units, which means the higher the score, the better. To select the best binders, sort the scores in descending order.
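Ranking by score can be sketched with the Python standard library. This example assumes CSV output with a header row containing "name" and "RFScoreVS_v2" columns; the ligand names and score values are made up for illustration, and `rank_by_score` is a hypothetical helper.

```python
import csv
import io


def rank_by_score(csv_text, score_column="RFScoreVS_v2"):
    """Return CSV rows sorted from highest to lowest score."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Higher score means a better predicted binder, so sort descending.
    return sorted(rows, key=lambda row: float(row[score_column]), reverse=True)


# Made-up sample in the assumed CSV layout.
sample = "name,RFScoreVS_v2\nlig_a,5.9\nlig_b,7.3\nlig_c,6.1\n"
ranked = rank_by_score(sample)
best = ranked[0]["name"]
```

The scores are converted to `float` before sorting because CSV values are read as strings, and string comparison would order them incorrectly.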
To rescore docked conformations simply run (on Windows omit the leading ./):
./rf-score-vs --receptor protein.pdb ligands.sdf -O ligands_rescored.sdf
When producing CSV files, the RF-Score-VS score is appended as the "RFScoreVS_v2" column (use additional --field parameters to limit the output columns):
./rf-score-vs --receptor protein.pdb ligands.sdf -o csv --field "name" --field "RFScoreVS_v2"
To run the test data included with the bundle:
./rf-score-vs --receptor test/receptor_rdkit.pdb test/actives_docked.sdf -ocsv
To get the list of all available parameters:
./rf-score-vs --help
NOTE: Some overhead is expected at the start of execution of the RF-Score-VS binary, because a temporary Python environment has to be set up.
- The single binary is obtained by freezing the Python script with PyInstaller
- Open Drug Discovery Toolkit with OpenBabel
- Random forest trees are compiled using sklearn-compiledtrees for extra performance
- Python environment and all dependencies are prepared using Miniconda with Intel MKL optimizations (see the Miniconda LICENSE)
- All components provided here are covered by the permissive 3-clause BSD License.
- Wójcikowski M, Ballester PJ, Siedlecki P. Performance of machine-learning scoring functions in structure-based virtual screening. Sci Rep. Nature Publishing Group; 2017;7: 46710. doi:10.1038/srep46710
- Wójcikowski M, Zielenkiewicz P, Siedlecki P. Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. J Cheminform. 2015;7: 5317. doi:10.1186/s13321-015-0078-2
- Ballester PJ, Mitchell JBO. A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics. 2010;26: 1169–1175. doi:10.1093/bioinformatics/btq112
- Ballester PJ, Schreyer A, Blundell TL. Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity? J Chem Inf Model. 2014;54: 944–955. doi:10.1021/ci500091r
- Li H, Leung K-S, Wong M-H, Ballester PJ. Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets. Mol Inform. WILEY-VCH Verlag; 2015;34: 115–126. doi:10.1002/minf.201400132