This is a Maven project that contains source code in Java and ground truth data for a biomedical ontology benchmark.
More details are given in the research paper:
D. Oliveira, A. S. Butt, A. Haller, D. Rebholz-Schuhmann, and R. Sahay, “Where to search top-K biomedical ontologies?,” Brief Bioinform, vol. 20, no. 4, pp. 1477–1491, Jul. 2019, doi: 10.1093/bib/bby015.
To run this project you will need the following:
- A Linux machine.
-
- Clone the virt-jena repository inside the benchmark directory.
- Inside the new virt-jena directory, run `mvn clean install`.
-
- Create a directory in the root of the bioont repository to store the Virtuoso database, e.g. `virt_database`.
- Change the `virtuoso.ini` parameters according to your machine requirements and put the file in your Virtuoso database directory.
- Start the Virtuoso server in your database directory.
- Edit the scripts/bulk_load.sh script and change the first four parameters to correspond to your Virtuoso server port, user, password, and the directory of the Virtuoso database (e.g. `VIRT_DB=$PWD/virt_database`).
- In the root directory of the repository, bulk load the ontologies into Virtuoso with `scripts/bulk_load.sh`.
- To reset the Virtuoso store, stop Virtuoso and delete everything inside `virt_database` except for the `virtuoso.ini` file. Then restart Virtuoso and run scripts/bulk_load.sh again.
- Solr: the use of the OLS-SOLR Spring Boot application is advised for optimal compatibility (https://github.com/EBISPOT/OLS/tree/master/ols-apps/ols-solr-app). Follow these steps:
- Clone/download the OLS git repository into the bioont repository.
- Delete the contents of the resources directory.
- Copy all contents of the userinput/ontology_property_files directory into the resources directory.
- Build OLS by running `mvn clean package` in the root of the OLS repository.
- Download and extract Solr (only version 5.2.1 was tested) to the root of the bioont repository.
- Create a directory to store the Solr indexes in the root of the bioont repository, e.g. `solr_index`.
- Start Solr with:
$ solr-5.2.1/bin/solr -Dsolr.solr.home=$PWD/OLS/ols-solr/src/main/solr-5-config -Dsolr.data.dir=$PWD/solr_index
- Build the Solr indexes from the root of the bioont repository with:
$ scripts/index.sh
- To reset the Solr indexes, stop Solr, delete everything inside the `solr_index` directory, and run step (vii) again.
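Both reset steps above come down to emptying a data directory, with the Virtuoso case keeping one file. A minimal sketch of that, assuming GNU `find` on the Linux machine; `clear_dir_except` is a hypothetical helper and is not part of the repository:

```shell
#!/usr/bin/env bash
# Sketch: empty a data directory while optionally keeping one file.
# clear_dir_except is a hypothetical helper, not part of the repository.
clear_dir_except() {
  local dir="${1:-}"    # directory to empty, e.g. virt_database or solr_index
  local keep="${2:-}"   # optional file name to preserve, e.g. virtuoso.ini

  # Refuse to run when the target is missing, so a wrong path deletes nothing.
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi

  if [ -n "$keep" ]; then
    # Remove every top-level entry except the file to keep.
    find "$dir" -mindepth 1 -maxdepth 1 ! -name "$keep" -exec rm -rf -- {} +
  else
    # Remove every top-level entry.
    find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf -- {} +
  fi
}
```

With the servers stopped, this could be called as `clear_dir_except "$PWD/virt_database" virtuoso.ini` for Virtuoso and `clear_dir_except "$PWD/solr_index"` for Solr, followed by the bulk load or indexing steps described above.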
Keep Virtuoso and Solr running. Open the file userinput/config.properties and change the necessary parameters. Note that you will need to register with BioPortal to obtain an API key.
To run the benchmark do the following:
- In the benchmark directory, build the project with `mvn clean package`.
- Run the benchmark with `java -jar benchmark/target/bioont-1.0-SNAPSHOT-shaded.jar`
- View the results in the userinput/ranking_results and userinput/evaluation folders.
- To restart the benchmark, delete userinput/ranking_models and run the benchmark again (step 2).
- If you don't want to repeat the ontology metadata loading and the algorithm preprocessing and just want to re-run the benchmark, open the Test class and set the variables `loadData` and `preprocessing` to false. Then rebuild and run the benchmark again (steps 1 and 2).
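The build, run, and restart steps above can be wrapped in a few shell functions. This is only a sketch: the function names and the `DRY_RUN` variable are hypothetical, and it assumes the `benchmark` and `userinput` directories both sit in the repository root.

```shell
#!/usr/bin/env bash
# Sketch of the build / run / restart steps. All function names and the
# DRY_RUN variable are hypothetical helpers, not part of the repository.

# Print the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: build the project inside the benchmark directory.
build_benchmark() {
  local root="${1:-$PWD}"
  ( cd "$root/benchmark" && run_cmd mvn clean package )
}

# Step 2: run the shaded jar from the repository root (assumed location).
run_benchmark() {
  local root="${1:-$PWD}"
  ( cd "$root" && run_cmd java -jar benchmark/target/bioont-1.0-SNAPSHOT-shaded.jar )
}

# Restart: delete the stored ranking models, then run step 2 again.
restart_benchmark() {
  local root="${1:-$PWD}"
  run_cmd rm -rf "$root/userinput/ranking_models" && run_benchmark "$root"
}
```

Calling `DRY_RUN=1 restart_benchmark "$PWD"` first prints the commands without executing them, which is a cheap way to check the paths before anything is deleted.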
If you wish to use the benchmark with a different set of ontologies, you will need to create new ontology configuration files with the exact same structure and repeat the Solr steps starting from (iii). You will also need to add the acronyms of the new ontologies to userinput/acronyms.txt and their download URLs to userinput/uris.txt.
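Registering an extra ontology in the two text files could look like the sketch below. It assumes one entry per line in each file, which the text above does not state explicitly; `add_ontology` and `append_line` are hypothetical helpers, and the acronym and URL in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: register one more ontology. add_ontology and append_line are
# hypothetical helpers; the one-entry-per-line layout is an assumption.

# Append a line, first adding a newline if the file does not end with one.
append_line() {
  local file="$1" line="$2"
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

add_ontology() {
  local userinput="${1:-}" acronym="${2:-}" url="${3:-}"
  if [ -z "$acronym" ] || [ -z "$url" ]; then
    echo "usage: add_ontology USERINPUT_DIR ACRONYM DOWNLOAD_URL" >&2
    return 1
  fi
  # Skip acronyms that are already listed (exact whole-line match).
  if grep -qxF -- "$acronym" "$userinput/acronyms.txt" 2>/dev/null; then
    echo "already listed: $acronym" >&2
    return 0
  fi
  append_line "$userinput/acronyms.txt" "$acronym"
  append_line "$userinput/uris.txt" "$url"
}
```

For example, `add_ontology userinput EXAMPLE https://example.org/example.owl` would append the placeholder acronym and URL to the two files.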
To change the query terms used in the benchmark, edit the file userinput/test_terms.txt and enter one query term per line.
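A hand-edited terms file can be tidied before a run. The sketch below trims surrounding whitespace, drops blank lines, and removes repeated terms; whether the benchmark itself tolerates blank lines or duplicates is not stated here, so this is a precaution rather than a requirement. `normalize_terms` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print a cleaned copy of a one-term-per-line file.
# normalize_terms is a hypothetical helper, not part of the repository.
normalize_terms() {
  # sed trims leading and trailing whitespace (including a stray carriage return);
  # awk keeps non-empty lines and only the first occurrence of each term.
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" | awk 'NF && !seen[$0]++'
}
```

One possible use is `normalize_terms userinput/test_terms.txt > terms.tmp && mv terms.tmp userinput/test_terms.txt`. Multi-word terms stay on a single line.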