compbench aims to facilitate head-to-head benchmarking of different drug discovery platforms.
compbench has been tested on Windows 10 version 10.0.19045. Most modern computers should be able to run compbench. Python 3 is required. To use the CANDO wrapper, CANDO and its prerequisites must also be installed (see the CANDO repository).
Download compbench.py. If you plan to compare with CANDO, download cando_wrapper.py as well. Download any drug-to-indication mappings you wish to test on (CTD and TTD mappings are currently available). Further information and a random control function can be found within compbench.py. Note that you may need to write a Python wrapper to interface with the drug discovery/repurposing platform you want to benchmark.
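As a rough illustration of what a drug-to-indication mapping looks like once loaded, the sketch below builds a dictionary from indication ID to drug IDs out of tab-separated text. The column names (drug_id, indication_id), the function name load_mapping, and the IDs are assumptions made for this example only; check the CTD/TTD files and compbench.py for the real layout.

```python
import csv
import io
from collections import defaultdict


def load_mapping(handle, indication_col="indication_id", drug_col="drug_id"):
    """Build {indication ID: set of drug IDs} from tab-separated rows.

    The column names are hypothetical defaults, not compbench's actual format.
    """
    mapping = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        mapping[row[indication_col]].add(row[drug_col])
    return dict(mapping)


# Tiny in-memory stand-in for a mapping file (made-up IDs).
example = io.StringIO("drug_id\tindication_id\nD1\tI1\nD2\tI1\nD3\tI2\n")
indication_to_drugs = load_mapping(example)
```

To read a real file, an open file handle (for example, open(path, newline="")) can be passed in place of the StringIO object.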
Before using compbench to benchmark new modules, run compbench.py directly. This will benchmark two functions as a test: (1) a random control function that randomly ranks the compounds for each indication and (2) a function that ranks the compounds for each indication based on alphabetical order (i.e. a useless but deterministic ranking function). These will both be benchmarked on the CTD data set provided in this repo, assuming it has been downloaded into a data folder in the same directory as compbench.py. This should take less than a minute on a modern computer.
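The two test functions described above can be pictured roughly as follows. This is a minimal sketch, not the code in compbench.py: the function names, the (indication, compounds) signature, and the compound names are assumptions chosen for illustration.

```python
import random


def rank_randomly(indication, compounds, rng=None):
    """Random control: return the compounds in a shuffled order."""
    if rng is None:
        rng = random.Random()
    ranked = list(compounds)  # copy so the caller's list is not modified
    rng.shuffle(ranked)
    return ranked


def rank_alphabetically(indication, compounds):
    """Deterministic but uninformative: sort the compounds by name."""
    return sorted(compounds)


# Hypothetical compound names and indication ID, for illustration only.
compounds = ["warfarin", "aspirin", "metformin"]
alphabetical = rank_alphabetically("I1", compounds)
shuffled = rank_randomly("I1", compounds, rng=random.Random(0))
```

Neither function uses the indication argument, which is why neither ranking is expected to carry any predictive signal. A wrapper for a real platform would presumably use it to produce an indication-specific ranking.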
Running compbench.py will result in the creation of four output files:
- "randomized_results.tsv" and "alphabetical_results.tsv" - contain the raw benchmarking results for each associated drug for each indication
- "random_TPR_FPR_by_rank.tsv" and "alphabetical_TPR_FPR_by_rank.tsv" - contain the true and false positive rates at each rank threshold; these can be used to plot the receiver operating characteristic (ROC) curve
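To show how the TPR/FPR pairs relate to the AUROC values, the sketch below integrates a ROC curve with the trapezoidal rule, optionally stopping at a maximum FPR. It is an illustration under stated assumptions, not compbench's own implementation: the function name partial_auroc is hypothetical, the points are assumed to be sorted by increasing FPR, and the partial area is left unnormalized.

```python
def partial_auroc(fpr, tpr, max_fpr=1.0):
    """Trapezoidal area under (fpr, tpr) points from FPR 0 up to max_fpr.

    Assumes the points are sorted by increasing FPR.
    """
    area = 0.0
    for i in range(1, len(fpr)):
        x0, y0, x1, y1 = fpr[i - 1], tpr[i - 1], fpr[i], tpr[i]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the cutoff and stop there.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# The diagonal TPR = FPR is the ROC curve expected from a random ranking.
diagonal = [i / 100 for i in range(101)]
full_area = partial_auroc(diagonal, diagonal)
area_at_5_percent = partial_auroc(diagonal, diagonal, max_fpr=0.05)
```

For the diagonal, the full area is 0.5 and the area up to FPR 0.05 is 0.05 × 0.05 / 2 = 0.00125, which is consistent with the approximate random-control values quoted in the expected output.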
In addition, the following should be printed to the terminal or IDE window:
Creating dict translating indication ID to drug IDs
Benchmarking...
AUROC <random value, approximately 0.00125> at max FPR 0.05
AUROC <random value, approximately 0.50>
NDCG <random value, generally 0.003-0.006> at cutoff 10
NDCG <random value, generally 0.153-0.156>
Benchmarking...
AUROC 0.0012285786553041158 at max FPR 0.05
AUROC 0.5147506856710282
NDCG 0.0052199569618313945 at cutoff 10
NDCG 0.15451257905022314
The first set of values shows the AUROC and NDCG metrics calculated from the random control. Since this function is randomized, these values will vary from run to run; approximate values have thus been provided. The second set of values shows the metrics calculated from the alphabetical sorting function. This function is deterministic, so your results should be identical to those shown here.
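For readers unfamiliar with NDCG, the sketch below computes it for a single indication using the common binary-relevance formulation, with an optional rank cutoff. This is an assumption about the general shape of the metric: compbench's exact definition, and how it aggregates across indications, may differ, and the function name and drug IDs here are hypothetical.

```python
import math


def ndcg(ranked, relevant, cutoff=None):
    """Binary-relevance NDCG of a ranked list, optionally truncated at cutoff."""
    k = len(ranked) if cutoff is None else min(cutoff, len(ranked))
    # Each relevant item at 0-based position pos contributes 1 / log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ranking: all relevant items placed at the top positions.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Made-up drug IDs: D2 and D4 are the drugs associated with the indication.
associated = {"D2", "D4"}
perfect_score = ndcg(["D2", "D4", "D1", "D3"], associated)
mixed_score = ndcg(["D1", "D2", "D3", "D4"], associated)
top1_score = ndcg(["D1", "D2", "D3", "D4"], associated, cutoff=1)
```

A ranking that places every associated drug first scores 1, and a cutoff that excludes every associated drug scores 0. This is why an uninformative ranking is expected to give low NDCG values, especially at a small cutoff.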