GitHub - pooruss/FAQ-System-COUGH: Graduate project in Tianjin University. An FAQ system based on BM25 and RocketQA. Corpus from COUGH, a multi-lingual FAQ retrieval dataset for COVID-19.

FAQ System based on COUGH

Background

Dataset source: "COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval".

	FAQIR	StackFAQ	LocalGov	Sun and Sedoc	Poliak et al.	COUGH (ours)
Domain	Yahoo!	StackExchange	Government	COVID-19	COVID-19	COVID-19
# of FAQs	4313	719	1786	690	2115	15919
# of Queries (Q)	1233	1249	784	6495*	24240*	1201
# of annotations per Q	8.22	Not Applicable	<10	5	5	32.17
Query Length	7.30	13.84	**	**	**	12.97
FAQ-query Length	12.30	10.39	**	**	**	13.00
FAQ-answer Length	33.00	76.54	**	**	**	113.58
Language	English	English	Japanese	English	Multi-lingual	Multi-lingual
# of sources	1	1	1	12	34	55

Requirements:

python 3.7

nltk==3.7
numpy==1.19.3
paddlepaddle-gpu==2.2.2.post111
rocketqa==1.0.0

Recall and Rerank

Recall module supports: BM25, BM25L, BM25+
Rerank module supports: Rocketqa-v1-marco-de, Rocketqa-v1-marco-ce ...
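
To make the recall stage concrete, here is a minimal sketch of Okapi BM25 scoring over a toy FAQ bank. It is not the code in recall_module.py: the class name, the tokenizer, the parameter defaults (k1=1.5, b=0.75), and the sample questions are all assumptions chosen for illustration, and only plain BM25 is shown (BM25L and BM25+ change the term-frequency normalization).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a real system might use nltk instead.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Okapi:
    """Plain Okapi BM25 over a small in-memory corpus (illustrative only)."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.docs = [tokenize(doc) for doc in corpus]
        self.doc_freqs = [Counter(doc) for doc in self.docs]
        self.avgdl = sum(len(doc) for doc in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for doc in self.docs for term in set(doc))
        # IDF variant with +1 inside the log, so every weight is non-negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, index):
        freqs = self.doc_freqs[index]
        dl = len(self.docs[index])
        total = 0.0
        for term in tokenize(query):
            tf = freqs.get(term, 0)
            if tf == 0:
                continue
            # Length-normalized term-frequency saturation.
            norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=3):
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(i, s) for s, i in scored[:k]]


# Hypothetical FAQ questions standing in for the real bank.
faq_questions = [
    "What are the symptoms of COVID-19?",
    "How does COVID-19 spread?",
    "Should I wear a mask in public?",
]
bm25 = BM25Okapi(faq_questions)
ranked = bm25.top_k("how does the virus spread", k=2)
for index, score in ranked:
    print(round(score, 3), faq_questions[index])
```

In a recall-then-rerank setup, the top-k candidates returned here would be passed to the rerank model for rescoring.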

Data preprocess

Extract the evaluation set according to the chosen scheme (A or C); 1200+ queries in total.

cd preprocess
python extract_evaluate_set.py A
# The result is saved as an npy file at ../data/evaluate_set/evaluate_set_scheme_A_test.npy

Extract the FAQ bank for a given language; the English portion contains 9000+ (question, answer) items.

cd preprocess
python extract_from_bank.py en en_bank
# The result is saved as a txt file at ../data/faq_bank/en_bank.txt

Demo

python main.py \
--config ./config/en_q_config.ini \
--task demo \
--rerank False
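
The command above passes `--rerank False` as a string, which is worth handling carefully: with argparse, `type=bool` would turn the non-empty string "False" into True. The sketch below shows one way such flags could be parsed. It is an assumption about how a CLI like this might be written, not the actual contents of main.py; `str2bool` and `build_parser` are hypothetical helper names.

```python
import argparse


def str2bool(value):
    """Convert textual booleans to bool; bool("False") would be True."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # Hypothetical parser mirroring the three flags used in the README commands.
    parser = argparse.ArgumentParser(description="FAQ system entry point (sketch)")
    parser.add_argument("--config", required=True, help="path to an .ini config file")
    parser.add_argument("--task", choices=["demo", "evaluation"], required=True)
    parser.add_argument("--rerank", type=str2bool, default=False)
    return parser


# Parse the same arguments as the demo command.
args = build_parser().parse_args(
    ["--config", "./config/en_q_config.ini", "--task", "demo", "--rerank", "False"]
)
print(args.task, args.rerank)
```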

Evaluation

python main.py \
--config ./config/en_q_config.ini \
--task evaluation \
--rerank False
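
The README does not say which metrics evaluation.py reports. As a hedged illustration of how a ranked FAQ list is commonly scored against annotated relevant FAQs, the sketch below computes precision@k and mean reciprocal rank; the function names, the choice of metrics, and the toy IDs are assumptions, not taken from the repository.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the top-k retrieved items that are relevant.
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def reciprocal_rank(ranked_ids, relevant_ids):
    # 1 / rank of the first relevant item, or 0.0 if none was retrieved.
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k=5):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    n = len(runs)
    mean_p = sum(precision_at_k(r, rel, k) for r, rel in runs) / n
    mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / n
    return {"P@%d" % k: mean_p, "MRR": mrr}


# Toy data: FAQ ids in retrieved order, paired with the ids judged relevant.
runs = [
    ([3, 1, 7, 2, 9], {1, 2}),  # first relevant hit at rank 2
    ([4, 5, 6, 8, 0], {4}),     # first relevant hit at rank 1
]
metrics = evaluate(runs, k=5)
print(metrics)
```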

Todo

Better combine the strengths of the recall and rerank modules
Complete multilingual support
Front-end page
