Submission of my KGQA system to SCHOLARLY QALD @ISWC 2023
KGQA (Knowledge Graph Question Answering) on DBLP-QUAD is a project developed for question answering over the DBLP Knowledge Graph, using the DBLP-QUAD dataset. The task is part of the SCHOLARLY QALD challenge at ISWC 2023, hosted on the Codalab competition page. The objective is to combine machine learning, natural language processing, and knowledge graph exploration to retrieve accurate and concise answers to natural language questions.
Task 1: DBLP-QUAD — Knowledge Graph Question Answering over DBLP
- Dataset: DBLP-QUAD
- Volume: 10,000 question-SPARQL pairs.
- Aim: The task requires participants to develop systems to effectively answer natural language questions using the DBLP Knowledge Graph.
- One-Hop Relations: The system begins by extracting one-hop relations for the entities mentioned in the question.
- Labeled Candidate Pairs: For every entity encountered, the system retrieves labeled candidate pairs from the DBLP Knowledge Graph.
- BERT [CLS] Embeddings: The system uses BERT [CLS] embeddings to select the most plausible relation among the candidate pairs.
- Winning Candidate Selection: Based on the identified relations, the system chooses the winning candidate most likely to yield the correct answer. (Minimal sketches of the retrieval and scoring steps are given after the results summary below.)
- Dataset: 200 questions extracted from the original 2,000-question DBLP-QUAD test set, focusing on questions that involve one-hop relations.
- Result: Achieved 100% accuracy.
- Location: Results can be accessed in TEST_DBLP_DATASET/results.
- Dataset: The 500 test questions provided in the competition.
- Location: Results are available in the 500_questiondataset folder.
Detailed results, along with relevant graphs, charts, and discussion, can be found in the respective results folders. The high accuracy on the preliminary test set illustrates the system's ability to handle one-hop relations within the KG.
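As referenced above, the following is a minimal sketch of the candidate-retrieval step. It assumes the public DBLP SPARQL endpoint (https://sparql.dblp.org/sparql) and the SPARQLWrapper library; the endpoint choice, the function name one_hop_candidates, and the query shape are illustrative assumptions rather than the exact code in this repository.

```python
# Hypothetical sketch of one-hop candidate retrieval from the DBLP Knowledge Graph.
# The endpoint and query shape are assumptions; the actual system may query a
# local copy of the graph instead.
from SPARQLWrapper import SPARQLWrapper, JSON

DBLP_ENDPOINT = "https://sparql.dblp.org/sparql"  # assumed public endpoint

def one_hop_candidates(entity_uri):
    """Return (relation label, value) pairs one hop away from the given entity."""
    sparql = SPARQLWrapper(DBLP_ENDPOINT)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        SELECT ?p ?o WHERE {{
            <{entity_uri}> ?p ?o .
        }}
        LIMIT 200
    """)
    results = sparql.query().convert()
    pairs = []
    for row in results["results"]["bindings"]:
        predicate = row["p"]["value"]
        value = row["o"]["value"]
        # Use the last segment of the predicate URI as a rough human-readable label.
        label = predicate.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
        pairs.append((label, value))
    return pairs

if __name__ == "__main__":
    # Hypothetical DBLP entity URI; replace with the URI produced by entity linking.
    for label, value in one_hop_candidates("https://dblp.org/pid/00/0000"):
        print(label, "->", value)
```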
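A similar sketch of the scoring step: every candidate relation label is embedded with BERT, and the candidate whose [CLS] embedding is closest to the question's embedding is chosen as the winner. The model name (bert-base-uncased), the use of cosine similarity, and the helper names are assumptions for illustration and may differ from the actual bertdataset.py / bertimplement.py implementation.

```python
# Hypothetical sketch of candidate scoring with BERT [CLS] embeddings.
# Model choice, similarity measure, and helper names are assumptions.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumed model
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def cls_embedding(text):
    """Return the [CLS] embedding of a piece of text as a 1-D tensor."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        outputs = model(**inputs)
    # The [CLS] token sits at the first position of the last hidden layer.
    return outputs.last_hidden_state[:, 0, :].squeeze(0)

def winning_candidate(question, candidates):
    """Pick the (relation label, value) pair whose label best matches the question."""
    question_emb = cls_embedding(question)
    best, best_score = None, float("-inf")
    for label, value in candidates:
        score = torch.nn.functional.cosine_similarity(
            question_emb, cls_embedding(label), dim=0
        ).item()
        if score > best_score:
            best, best_score = (label, value), score
    return best, best_score

if __name__ == "__main__":
    question = "Who wrote the paper 'An Example Title'?"  # illustrative question
    candidates = [("authoredBy", "Jane Doe"), ("yearOfPublication", "2020")]
    print(winning_candidate(question, candidates))
```

Comparing [CLS] embeddings keeps the selection step model-agnostic: any encoder that produces a sentence-level vector could be swapped in without changing the winning-candidate logic.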
KGQA-DBLP-QUAD/
│
├── Test_DBLP_DATASET/
│ ├── dataset/ - contains the 200 questions selected from the DBLP-QUAD test set
│ ├── results/ - contains labeled pairs and final accuracy results with winning candidates
│ ├── bertdataset.py
│ └── st3dataset.py
│
├── 500_questiondataset/
│ ├── entity linked dataset/ - entity linking and listing performed on the 500-question test set
│ ├── results/ - contains labeled pairs and final accuracy results with winning candidates
│ ├── bert500ds.py
│ └── st3500dat.py
│
├── Single_Question/
│ ├── accuracy_results.json - sample accuracy results with winning candidate for one question
│ ├── labeled_pairs.json - contains labeled pairs for the sample question
│ ├── bertimplement.py
│ └── st3.py
│
├── requirements.txt
└── README.md
Ensure that you have Python 3.x installed. Install the necessary libraries and dependencies using the following command:
pip install -r requirements.txt
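The actual dependency list in requirements.txt is authoritative; for orientation only, a pipeline of this kind would typically rely on the BERT and SPARQL tooling shown below (an illustrative assumption, not the contents of the file):

```
transformers
torch
SPARQLWrapper
numpy
```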
The scripts inside each folder produce the labeled pairs and accuracy results listed in the project structure above.
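The exact invocation is not documented here, but a likely way to reproduce the single-question sample (an assumption based on the project structure, not verified commands) is to run, from the repository root:

python Single_Question/st3.py
python Single_Question/bertimplement.py

The corresponding outputs, labeled_pairs.json and accuracy_results.json, are listed in the project structure above.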
We invite researchers and developers to contribute to KGQA on DBLP-QUAD. Feel free to raise issues, suggest improvements, or open pull requests.
This project builds on the DBLP-QUAD dataset, the DBLP Knowledge Graph, and pretrained BERT models. Thanks to the SCHOLARLY QALD @ISWC 2023 organizers and to all contributors and collaborators.