Code for the paper "SPARQA: Skeleton-based Semantic Parsing for Complex Questions over Knowledge Bases" (AAAI 2020). If you have any questions, please email the author (ywsun at smail.nju.edu.cn).
Note that SPARQA is updated to SkeletonKBQA. If you are interested in SkeletonKBQA, please see here.
| File | Description |
|---|---|
| code | source code |
| skeleton | skeleton bank |
| slides | slides and poster |
- Dataset root: defaults to D:/dataset; you can change it in common/globals_args.py.
Note that the following files are hosted on Baidu Wangpan; the extraction code for all files is kbqa.
- Eight resources: GloVe (glove.6B.300d), the Stanford CoreNLP server, the SUTime Java library, BERT pre-trained models, and four preprocessing files (stopwords.txt, ordinal_fengli.tsv, unimportantphrase, and unimportantwords). Unzip and save them in the root.
- Two versions of Freebase: the latest version and the 2013 version. Next, download a Virtuoso server and load the KBs. You can also download the KBs from the Freebase site. This file is helpful if you run into problems.
- CWQ 1.1 dataset: skeleton parsing models, the word-level scorer model, and the sentence-level scorer model. Unzip and save them in the root.
- Lexicons for CWQ 1.1: entity-related lexicons and KB schema-related lexicons. Unzip and save them in the root.
- GraphQuestions dataset: skeleton parsing models and the word-level scorer model. Unzip and save them in the root.
- Lexicons for GraphQuestions: entity-related lexicons and KB schema-related lexicons. Unzip and save them in the root.
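Once a Virtuoso server is loaded with one of the Freebase versions, it exposes a standard SPARQL HTTP endpoint. As a hedged illustration (the endpoint URL, helper function, and query below are placeholders for this sketch, not the repo's code), a request against it can be assembled like this:

```python
# Build a GET request URL for a SPARQL query against a local Virtuoso
# endpoint holding Freebase. The endpoint address is a placeholder;
# replace it with your own server address.
from urllib.parse import urlencode

def build_sparql_url(endpoint, query):
    """Return the full GET URL asking Virtuoso for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)

url = build_sparql_url(
    "http://localhost:8890/sparql",
    "SELECT ?name WHERE { ?e <http://rdf.freebase.com/ns/type.object.name> ?name } LIMIT 1",
)
```

Fetching `url` (for example with `urllib.request.urlopen`) returns the bindings as JSON once your Virtuoso server is up.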
The pipeline answers questions in two steps:
- (1) KB-independent graph-structured ungrounded query generation.
- (2) KB-dependent graph-structured grounded query generation and ranking.
See running/freebase/pipeline_cwq.py to run CWQ 1.1, and running/freebase/pipeline_grapqh.py to run GraphQuestions. Below is an example on GraphQuestions.
Note that the steps are somewhat involved. To make them easier to follow, we provide samples of each step in the output_graphq folder.
- Set the dataset in common/globals_args.py: q_mode=graphq (use q_mode=cwq for CWQ 1.1).
- Set skeleton parsing in common/globals_args.py: parser_mode=head, which means skeleton parsing (parser_mode=dep means dependency parsing).
- Replace freebase_pyodbc_info and freebase_sparql_html_info in common/globals_args.py with your local server addresses (the 2013 version is for GraphQuestions; the latest version is for CWQ 1.1).
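Putting these settings together, here is a hedged sketch of what the relevant entries in common/globals_args.py might look like. The variable names (q_mode, parser_mode, freebase_pyodbc_info, freebase_sparql_html_info) come from this README; the values, especially the connection strings, are placeholders you must replace:

```python
# Illustrative sketch of the settings discussed above; the values below
# are placeholders, not the repo's defaults.
root = "D:/dataset"        # dataset root (see the dataset section above)
q_mode = "graphq"          # "graphq" for GraphQuestions, "cwq" for CWQ 1.1
parser_mode = "head"       # "head" = skeleton parsing, "dep" = dependency parsing

# Local Virtuoso access points; replace with your own server addresses.
# Load the 2013 Freebase for GraphQuestions, the latest one for CWQ 1.1.
freebase_pyodbc_info = "DRIVER=virtuoso;HOST=localhost:1111;UID=dba;PWD=dba"
freebase_sparql_html_info = "http://localhost:8890/sparql"
```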
- Run KB-independent query generation: set module=1.0. Input: the dataset. Output: structures with 1.0 ungrounded graphs. A sample is provided in the output_graphq folder.
- Generate grounded graph variants: set module=2.1. Input: structures with 1.0 ungrounded graphs. Output: structures with 2.1 grounded graphs. A sample is provided in the output_graphq folder.
- Ground candidate queries: set module=2.2. Input: structures with 2.1 grounded graphs. Output: structures with 2.2 grounded graphs. Samples of questions are provided in the output_graphq folder (one sample).
- Rank with the word-level scorer: set module=2.3_word_match. Input: 2.2 grounded graphs.
- Combine the sentence-level and word-level scorers: set module=2.3_add_question_match. Input: 2.2 grounded graphs.
- Run evaluation: set module=3_evaluation. Input: 2.2 grounded graphs. Output: results.
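The module values above can be read as a small dispatch table. The sketch below mirrors the step order but is purely illustrative; the step descriptions and the `describe` helper are hypothetical, not the repo's actual code:

```python
# Hedged sketch of the module dispatch used by the pipeline scripts.
# The module keys match the steps listed above; the descriptions and
# the dispatch function are illustrative only.
PIPELINE_STEPS = {
    "1.0": "KB-independent ungrounded query generation",
    "2.1": "grounded graph variant generation",
    "2.2": "candidate query grounding",
    "2.3_word_match": "ranking with the word-level scorer",
    "2.3_add_question_match": "ranking with word- and sentence-level scorers",
    "3_evaluation": "evaluation",
}

def describe(module):
    """Return a description of the pipeline step selected by `module`."""
    if module not in PIPELINE_STEPS:
        raise ValueError(f"unknown module: {module}")
    return PIPELINE_STEPS[module]
```

For example, `describe("2.2")` returns `"candidate query grounding"`; each step consumes the structures written by the previous one.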
- SPARQA also provides a parsing tool. The input is a question; the output is the skeleton of the question. (It currently supports only English; Chinese support is planned.)
- You can use SPARQA's skeleton parsing to train on your own language by replacing the pre-trained models and annotated data with those of your language.
- SPARQA provides a trained word-level scorer model and a trained sentence-level scorer model in the dataset folder.
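As an intuition for what a skeleton is, a question can be decomposed into a main text span with subordinate spans attached to the parts they modify, which forms a small tree. The class and the example decomposition below are a conceptual sketch, not the tool's actual output format:

```python
# Conceptual sketch of a skeleton: a tree of text spans in which each
# subordinate span attaches to the span it modifies. Illustrative only;
# SPARQA's real output format may differ.
from dataclasses import dataclass, field

@dataclass
class SpanNode:
    text: str
    children: list["SpanNode"] = field(default_factory=list)

    def spans(self):
        """Yield all text spans in the skeleton, root first."""
        yield self.text
        for child in self.children:
            yield from child.spans()

# "what films did the actor who was born in 1970 star in"
root = SpanNode("what films did the actor star in")
root.children.append(SpanNode("who was born in 1970"))
```

Separating the main span from its attachments is what lets the pipeline build the ungrounded graph from simpler pieces.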
- We provide the code for the offline approach, along with oracle graphs for CWQ 1.1 and for GraphQuestions. This approach first retrieves oracle graphs (to reduce storage space) and then generates candidate queries from them. For details on oracle graphs, please see this paper.
- We can also provide the code for the online approach, which generates candidate queries online; its drawback is efficiency.
- GraphQuestions: PARA4QA, SCANNER, UDEPLAMBDA.
- CWQ 1.1: PullNet, SPLITQA, and MHQA-GRN. Note that PullNet uses the annotated topic entities of questions in its KB-only setting, while SPARQA, as an end-to-end method, does not use annotated topic entities; thus the two are not directly comparable.
@inproceedings{SunZ0Q20,
author = {Yawei Sun and Lingling Zhang and Gong Cheng and Yuzhong Qu},
title = {{SPARQA:} Skeleton-Based Semantic Parsing for Complex Questions over Knowledge Bases},
booktitle = {The Thirty-Fourth {AAAI} Conference on Artificial Intelligence, {AAAI} 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, {IAAI} 2020, The Tenth {AAAI} Symposium on Educational Advances in Artificial Intelligence, {EAAI} 2020, New York, NY, USA, February 7-12, 2020},
pages = {8952--8959},
publisher = {{AAAI} Press},
year = {2020},
url = {https://aaai.org/ojs/index.php/AAAI/article/view/6426},
}
If you have any difficulty or questions about running the code, reproducing the experimental results, or skeleton parsing, please email the author (ywsun at smail.nju.edu.cn).