Source code for the paper "You Truly Understand What I Need: Intellectual and Friendly Dialogue Agents grounding Knowledge and Persona", accepted to Findings of EMNLP 2022.
The code runs with Python 3.6. All dependencies are listed in requirements.txt and can be installed with:
pip install -r requirements.txt
You can download the FoCus dataset (Persona-Knowledge Chat) here.
Since we use RAG (retrieval-augmented generation) for dialogue generation, you need to create a knowledge index file before running generation.
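To make the role of the index concrete, here is a minimal brute-force sketch of the retrieval it supports, assuming DPR-style dense embeddings scored by inner product. The helper names dot and retrieve_top_k and the toy embedding values are hypothetical and only for illustration; a real knowledge index (for example a FAISS index) answers the same query approximately and far faster than scoring every passage.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(query: Sequence[float],
                   passages: List[Tuple[str, Sequence[float]]],
                   k: int = 2) -> List[Tuple[str, float]]:
    """Score every passage by inner product with the query and return the k best.

    This is exhaustive search; an index such as FAISS approximates it at scale.
    """
    scored = [(text, dot(query, emb)) for text, emb in passages]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings (hypothetical values, not real DPR outputs).
passages = [
    ("The Eiffel Tower is in Paris.", [0.9, 0.1, 0.0]),
    ("Mount Fuji is in Japan.", [0.1, 0.8, 0.2]),
    ("The Louvre is a museum in Paris.", [0.7, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]

for text, score in retrieve_top_k(query, passages, k=2):
    print(f"{score:.2f}  {text}")
# prints:
# 0.90  The Eiffel Tower is in Paris.
# 0.70  The Louvre is a museum in Paris.
```

The generator then conditions on the retrieved passages, which is why the index must exist before generation.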
Before creating the knowledge index, move the FoCus dataset into the data/ folder as follows:
|-- data
    |-- FoCus
        |-- train_focus.json
        `-- valid_focus.json
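A small sanity check for this layout, assuming it is run from the repository root. The helper missing_focus_files is hypothetical; it only reports which of the two expected JSON files are not yet under data/FoCus.

```python
from pathlib import Path
from typing import List

# File names taken from the expected data/FoCus layout in this README.
EXPECTED_FILES = ["train_focus.json", "valid_focus.json"]


def missing_focus_files(repo_root: str = ".") -> List[str]:
    """Return the expected FoCus files that are absent from <repo_root>/data/FoCus."""
    focus_dir = Path(repo_root) / "data" / "FoCus"
    return [str(focus_dir / name) for name in EXPECTED_FILES
            if not (focus_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_focus_files()
    if missing:
        print("Missing files:", *missing, sep="\n  ")
    else:
        print("FoCus dataset is in place.")
```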
1) The preprocessing code for creating the raw knowledge is in the knowledge_index folder:
create_knowledge_index_for_github.ipynb
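The notebook itself is not reproduced here. As a rough sketch of the kind of file it feeds into the indexing step: the unmodified use_own_knowledge_dataset.py in transformers reads a tab-separated file with two columns, title and text, and no header row. Assuming the modified version keeps that input format, the hypothetical helper write_knowledge_tsv writes such a file from a dict mapping titles to passages. How titles and passages are extracted from the FoCus JSON is not shown.

```python
import csv
from typing import Dict, List


def clean(field: str) -> str:
    """Collapse all whitespace (tabs, newlines) to single spaces so rows stay intact."""
    return " ".join(field.split())


def write_knowledge_tsv(knowledge: Dict[str, List[str]], out_path: str) -> int:
    """Write one "title<TAB>text" row per passage, with no header row.

    `knowledge` maps a title (e.g. a landmark name) to its knowledge passages.
    Returns the number of rows written.
    """
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for title, passages in knowledge.items():
            for passage in passages:
                writer.writerow([clean(title), clean(passage)])
                rows += 1
    return rows
```

Stripping embedded tabs and newlines matters because either one would otherwise split a passage across columns or rows when the file is read back.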
2) To create the knowledge index file, run:
python use_own_knowledge_dataset.py --csv_path=<your_file> --output_dir=<your_dir>
or simply run the shell script:
sh create_knowldege_index.sh
We use the same script as the one in the Hugging Face transformers GitHub repository, slightly modified to preprocess the raw knowledge.
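For context, the unmodified transformers script splits each document into passages of about 100 words before embedding them with a DPR context encoder and building a FAISS index. Below is an independent, minimal re-implementation of just that splitting step; it splits on any whitespace rather than only on spaces, and whether the authors' modified version keeps this step unchanged is an assumption.

```python
from typing import Dict, List


def split_text(text: str, n: int = 100) -> List[str]:
    """Split `text` into consecutive chunks of at most `n` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


def split_documents(titles: List[str], texts: List[str], n: int = 100) -> Dict[str, List[str]]:
    """Expand (title, text) documents into (title, passage) pairs, repeating the title per passage."""
    out_titles: List[str] = []
    out_texts: List[str] = []
    for title, text in zip(titles, texts):
        for passage in split_text(text, n):
            out_titles.append(title)
            out_texts.append(passage)
    return {"title": out_titles, "text": out_texts}


# Tiny example with n=2 words per passage.
print(split_documents(["Doc"], ["a b c d e"], n=2))
# prints: {'title': ['Doc', 'Doc', 'Doc'], 'text': ['a b', 'c d', 'e']}
```

Keeping passages short bounds the amount of text the generator receives per retrieved item.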
3) After creating the knowledge index for the FoCus dataset, set the following paths in config/rag-tok-base-ct.json:
"data_dir":
"save_dirpath":
"knowledge_dataset_path":
"knowledge_index_path":
Before you train the model, please modify the config file.
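The path entries from step 3 can simply be edited by hand. If you would rather script that change, here is a minimal sketch, assuming the four keys sit at the top level of config/rag-tok-base-ct.json. The helper update_config_paths is hypothetical; it rewrites the file with 2-space indentation, so the original formatting is not preserved.

```python
import json
from typing import Dict

# The four path keys named in this README.
PATH_KEYS = ("data_dir", "save_dirpath", "knowledge_dataset_path", "knowledge_index_path")


def update_config_paths(config_path: str, paths: Dict[str, str]) -> dict:
    """Set path entries in a JSON config file and return the updated config.

    Raises KeyError for keys outside PATH_KEYS so that typos are caught early.
    """
    unknown = set(paths) - set(PATH_KEYS)
    if unknown:
        raise KeyError(f"unexpected config keys: {sorted(unknown)}")
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    config.update(paths)  # other settings in the file are left untouched
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config


# Example (the values are placeholders for your own paths):
# update_config_paths("config/rag-tok-base-ct.json",
#                     {"knowledge_index_path": "your_dir/your_index_file"})
```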
To train and evaluate the model, run:
sh train.sh
sh evaluate.sh