KnowlyBERT - Hybrid Query Processing over Language Models and Knowledge Graphs

This repository contains the code needed to reproduce the results in our paper.

System Requirements

Linux
128GB RAM recommended
a CUDA-enabled GPU with at least 11GB memory (the software also runs on CPU, but is extremely slow)

Dependencies

python3.6
python3-pip
unixodbc-dev
PyPI Packages
- matplotlib==3.1.2
- cython==0.29.2
- numpy==1.15.1
- torch==1.0.1
- pytorch-pretrained-bert==0.6.1
- allennlp==0.8.5
- spacy==2.1.8
- tqdm==4.26.0
- termcolor==1.1.0
- pandas==0.23.4
- fairseq==0.8.0
- colorama==0.4.1
- simplejson==3.17.2
- pyodbc==4.0.30
- dill==0.2.9
- tensorflow==1.14.0 (select GPU support in requirements.txt manually!)
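
Because the versions above are pinned exactly, it can help to compare them with what is actually installed before starting a long run. The following is a minimal sketch, not part of the repository: `parse_pins` and `find_mismatches` are hypothetical helper names, and the mapping of installed versions is assumed to be collected by the caller.

```python
import re

# Matches lines such as "- numpy==1.15.1" or "tensorflow==1.14.0 (some note)".
PIN_PATTERN = re.compile(r"^\s*-?\s*([A-Za-z0-9_.\-]+)==([A-Za-z0-9_.\-]+)")


def parse_pins(lines):
    """Return {package_name: pinned_version} for every 'name==version' line."""
    pins = {}
    for line in lines:
        match = PIN_PATTERN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def find_mismatches(pins, installed):
    """Return {name: (pinned, installed_or_None)} for packages that differ.

    `installed` is a {package_name: version} mapping supplied by the caller
    (an assumption of this sketch; it is not read from the environment here).
    """
    mismatches = {}
    for name, pinned in pins.items():
        found = installed.get(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches
```

An empty result from `find_mismatches` means every pinned package was found in the expected version. Lines without `==`, such as python3-pip, are ignored by the parser.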

RUN IN DOCKER

We provide Dockerfiles for building a Docker image that runs our code with only a few commands.

Create Docker Images

There are two Dockerfiles in this repository:

`Dockerfile`

Creates an image that reproduces ALL results, including those of our HolE embedding. We highly recommend installing the NVIDIA Container Toolkit for Docker to enable GPU acceleration, because running this image without it is extremely time-consuming. If you don't want to set up GPU acceleration, you can instead build an image that skips the computation of the HolE results (see Dockerfile_no-hole).

$ docker build --file Dockerfile --tag knowlybert:all .
$ docker run -it --volume /path/on/host:/opt/KnowlyBERT/evaluation knowlybert:all

Set /path/on/host to any non-existent location on your host system where the container should store the evaluation results.

`Dockerfile_no-hole`

Creates an image which reproduces all results, EXCEPT the results of our HolE Embedding. You can run this image without GPU acceleration and it should finish in a few hours.

$ docker build --file Dockerfile_no-hole --tag knowlybert:no-hole .
$ docker run -it --volume /path/on/host:/opt/KnowlyBERT/evaluation knowlybert:no-hole

Set /path/on/host to any non-existent location on your host system where the container should store the evaluation results.
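
If you prefer to script the two Docker variants instead of typing the commands, the build and run invocations above can be assembled as argument lists. This is a rough sketch under stated assumptions: `docker_commands` is a hypothetical helper, and it only reproduces the flags, tags and mount target shown in the commands above.

```python
# Mount target inside the container, taken from the docker run commands above.
CONTAINER_EVAL_DIR = "/opt/KnowlyBERT/evaluation"


def docker_commands(host_dir, with_hole=True):
    """Return the (build, run) argument lists for one of the two images.

    with_hole=True selects Dockerfile / knowlybert:all,
    with_hole=False selects Dockerfile_no-hole / knowlybert:no-hole.
    """
    dockerfile = "Dockerfile" if with_hole else "Dockerfile_no-hole"
    tag = "knowlybert:all" if with_hole else "knowlybert:no-hole"
    build = ["docker", "build", "--file", dockerfile, "--tag", tag, "."]
    run = [
        "docker", "run", "-it",
        "--volume", "{}:{}".format(host_dir, CONTAINER_EVAL_DIR),
        tag,
    ]
    return build, run
```

Each list can be passed to `subprocess.run(..., check=True)` from the repository root, so that a failed build stops the script before the run step.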

FIRST STEPS

If you don't want to use Docker to reproduce our results, you have to set up the required environment manually.

Install Python requirements

$ python3 -m pip install -r requirements.txt

Clone RelAlign Repository

$ cd kb_embeddings/
$ git clone https://github.com/JanKalo/RelAlign.git
$ cd ..

Install LAMA

Do not clone the LAMA repository again. Only install it as an editable package.

$ cd LAMA/
$ pip install --editable .
$ cd ..
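
After the three setup steps above, a quick check that the expected files and directories are in place can save a failed run later. The sketch below is only an illustration: `missing_setup_paths` is a hypothetical helper, and the list of paths is inferred from the commands in this section rather than taken from the code.

```python
import os

# Paths the manual setup steps above are expected to leave in the repository
# root (inferred from the commands in this section, not an official list).
REQUIRED_PATHS = [
    "requirements.txt",
    "LAMA",
    os.path.join("kb_embeddings", "RelAlign"),
]


def missing_setup_paths(repo_root):
    """Return the required paths that do not exist below repo_root."""
    return [
        path for path in REQUIRED_PATHS
        if not os.path.exists(os.path.join(repo_root, path))
    ]
```

Calling `missing_setup_paths(".")` from the repository root returns an empty list when everything is present. It does not verify that LAMA was installed as an editable package.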

Repository Structure

/LAMA/

This is largely the repository of Petroni et al. (https://github.com/facebookresearch/LAMA), with some scripts added and edited to enable this hybrid system: 1) multi-token results of the language model 2) automatically extracted templates

/baseline/

This directory contains the script that evaluates the results of the language model for a specific query file. It can also evaluate the two baselines for comparison with the language model: 1) relation extraction model 2) knowledge base embedding. For more information, see the README.md file in baseline/.

/kb_embeddings/

This directory contains the script that integrates the knowledge base embedding HolE to obtain the loss of a given triple.

/threshold_method/

This directory includes the script for calculating the threshold of the language model probabilities.

Python Files

This section covers only the files needed to reproduce the results.

1) get_results.py

This script saves the results of the language model for the given queries and parameters of the hybrid system. The parameters can be changed in get_results.py starting from line 343. For each evaluation with the given parameters, a result directory (e.g. <chosen_result_directory> = 21.05._03:18:34_tmc_tprank2_ts5_trmmax_ps1_kbe-1_cpTrue_mmd0.6) is saved to evaluation/.

$ python3 get_results.py
$ cd evaluation/<chosen_result_directory>/
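
Since the result directory name is generated from a timestamp and the parameters, it can be convenient to look up the newest one instead of copying its name by hand. The following is a small sketch, assuming that the most recently modified subdirectory of evaluation/ is the run you just produced; `latest_result_directory` is a hypothetical helper and not part of the repository.

```python
import os


def latest_result_directory(evaluation_dir="evaluation"):
    """Return the most recently modified subdirectory of evaluation_dir.

    Returns None if evaluation_dir contains no subdirectories.
    """
    candidates = [
        os.path.join(evaluation_dir, name)
        for name in os.listdir(evaluation_dir)
        if os.path.isdir(os.path.join(evaluation_dir, name))
    ]
    if not candidates:
        return None
    # Pick the directory with the largest modification time.
    return max(candidates, key=os.path.getmtime)
```

The returned path can stand in for evaluation/<chosen_result_directory>/ in the following steps. If several runs write to evaluation/ at the same time, this heuristic may pick the wrong one.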

2) baseline/evaluate.py

This script evaluates the results of the language model by reading the result files in evaluation/<chosen_result_directory>/. It returns the following twelve files:

evaluation_all.json → all given queries
evaluation_object.json → only queries based on the triple (s, p, ?x)
evaluation_subject.json → only queries based on the triple (?x, p, o)
evaluation_single.json → only queries with only one-token results
evaluation_multi.json → only queries with one-token AND multi-token results
evaluation_1-1.json → only queries with 1-1 properties
evaluation_1-n.json → only queries with 1-n properties
evaluation_n-m.json → only queries with n-m properties
evaluation_cardinality-1.json → only queries with exactly one result
evaluation_cardinality-1-10.json → only queries with two to ten results
evaluation_cardinality-10-100.json → only queries with eleven to 100 results
evaluation_cardinality-100-inf.json → only queries with more than 100 results

$ python3 ../../baseline/evaluate.py --missing-data ../../baseline/missing_data.json --query-groups *query_groups.json ../../baseline/query_propmap.json ../../baseline/gold_dataset.json ../../baseline/ContextWeighted2017.json ../../baseline/hole_baseline.json data/
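
The cardinality file names are easy to misread (for example, cardinality-1-10 covers two to ten results), so the grouping described in the list above can be sketched as follows. This is an illustration only: `cardinality_file` and `missing_evaluation_files` are hypothetical helpers, and how baseline/evaluate.py assigns queries internally is not shown here.

```python
import os

# The twelve output files named in the list above.
EVALUATION_FILES = [
    "evaluation_all.json",
    "evaluation_object.json",
    "evaluation_subject.json",
    "evaluation_single.json",
    "evaluation_multi.json",
    "evaluation_1-1.json",
    "evaluation_1-n.json",
    "evaluation_n-m.json",
    "evaluation_cardinality-1.json",
    "evaluation_cardinality-1-10.json",
    "evaluation_cardinality-10-100.json",
    "evaluation_cardinality-100-inf.json",
]


def cardinality_file(num_results):
    """Map the number of results of a query to its cardinality file."""
    if num_results < 1:
        raise ValueError("a query needs at least one result")
    if num_results == 1:
        return "evaluation_cardinality-1.json"
    if num_results <= 10:       # two to ten results
        return "evaluation_cardinality-1-10.json"
    if num_results <= 100:      # eleven to 100 results
        return "evaluation_cardinality-10-100.json"
    return "evaluation_cardinality-100-inf.json"  # more than 100 results


def missing_evaluation_files(directory):
    """Return the expected output files that are absent from directory."""
    return [
        name for name in EVALUATION_FILES
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

Running `missing_evaluation_files` on the directory where the evaluation was started gives a quick check that all twelve files were produced before moving on to the next step.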

3) baseline/get_precision_recall.py

This script saves files with precision and recall values by reading the output files of baseline/evaluate.py. For each evaluation file, it writes one file with the precision and recall averaged over all queries and one file with the precision and recall averaged over the queries of each property.

$ python3 ../../baseline/get_precision_recall.py evaluation_all.json evaluation_object.json evaluation_subject.json evaluation_single.json evaluation_multi.json evaluation_1-1.json evaluation_1-n.json evaluation_n-m.json evaluation_cardinality-1.json evaluation_cardinality-1-10.json evaluation_cardinality-10-100.json evaluation_cardinality-100-inf.json
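
To make the two kinds of averaging concrete, here is a minimal sketch of per-query precision and recall, averaged once over all queries and once per property. It is not the code of baseline/get_precision_recall.py: the function names, the tuple layout of a query and the choice to score an empty prediction as 0.0 are assumptions of this example, and the property identifiers in the usage note are placeholders.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of one query's predicted vs. gold answers."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    # Assumption of this sketch: empty sets score 0.0 instead of being skipped.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def average_over_queries(queries):
    """Average precision and recall over (property, predicted, gold) tuples."""
    scores = [precision_recall(predicted, gold) for _, predicted, gold in queries]
    count = len(scores)
    return (
        sum(score[0] for score in scores) / count,
        sum(score[1] for score in scores) / count,
    )


def average_per_property(queries):
    """Group queries by property and average each group separately."""
    grouped = {}
    for prop, predicted, gold in queries:
        grouped.setdefault(prop, []).append((prop, predicted, gold))
    return {prop: average_over_queries(items) for prop, items in grouped.items()}
```

For example, with the queries `("P19", ["Berlin"], ["Berlin"])`, `("P19", ["Paris", "Rome"], ["Paris"])` and `("P106", [], ["actor"])`, the average precision over all queries is 0.5, while the per-property averages are 0.75 for P19 and 0.0 for P106. The two views can differ noticeably when properties have very different numbers of queries.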
