Source code for the paper "You Truly Understand What I Need: Intellectual and Friendly Dialogue Agents grounding Knowledge and Persona", accepted at EMNLP 2022 Findings.
The code runs with Python 3.6. All dependencies are listed in requirements.txt:
pip install -r requirements.txt
You can download the FoCus dataset (Persona-Knowledge Chat) here.
Since we use RAG for dialogue generation, you need to create a knowledge index file before running generation.
Before creating the knowledge index, move the FoCus dataset into data/ so that the layout looks like this:
|-- data
    |-- FoCus
        |-- train_focus.json
        `-- valid_focus.json
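Before building the index, it can help to confirm that the files are where the layout above expects them. The following is a minimal sketch, not part of the repository; the helper name `missing_focus_files` is hypothetical, and it assumes the script is run from the repository root so that `data/` resolves correctly.

```python
from pathlib import Path

# File names taken from the directory tree above.
EXPECTED_FILES = ("train_focus.json", "valid_focus.json")


def missing_focus_files(data_root="data"):
    """Return the expected FoCus files that are absent from <data_root>/FoCus."""
    focus_dir = Path(data_root) / "FoCus"
    return [name for name in EXPECTED_FILES if not (focus_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_focus_files()
    if missing:
        print("Missing files under data/FoCus:", ", ".join(missing))
    else:
        print("FoCus dataset layout looks correct.")
```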
1) The preprocessing code for creating the raw knowledge is in the knowledge_index folder.
2) Create the knowledge index file with the following command:
python use_own_knowledge_dataset.py --csv_path=<your_file> --output_dir=<your_dir>
Alternatively, you can simply run the provided sh file.
We used the same file as in the transformers GitHub repository, slightly modified to preprocess the raw knowledge.
3) After creating the knowledge index for the FoCus dataset, update the corresponding path in config/rag-tok-base-ct.json.
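For step 2, the upstream transformers script reads a tab-separated file with a title column and a text column, one document per row and no header. The sketch below writes raw knowledge in that shape. It assumes the modified script in this repository keeps the same input format, which should be checked against the code in knowledge_index; the function name `write_knowledge_csv` and the sample document are illustrative only.

```python
import csv


def write_knowledge_csv(documents, csv_path):
    """Write (title, text) pairs as tab-separated rows, one document per row.

    Returns the number of rows written. Empty texts are skipped.
    """
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for title, text in documents:
            # Collapse tabs and newlines so each document stays on one row.
            clean_title = " ".join(title.split())
            clean_text = " ".join(text.split())
            if clean_text:
                writer.writerow([clean_title, clean_text])
                rows += 1
    return rows


if __name__ == "__main__":
    # Hypothetical example document, not taken from the FoCus dataset.
    docs = [("Example Landmark", "A short description\nof the landmark.")]
    print(write_knowledge_csv(docs, "knowledge.csv"), "row(s) written")
```

The resulting file is what would be passed as --csv_path in the command from step 2.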
Before you train the model, please modify the config file.
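The config edit can also be scripted. The sketch below is a hedged illustration rather than the repository's own tooling: the key names `index_name`, `passages_path`, and `index_path` follow Hugging Face `RagConfig` naming and are an assumption about what config/rag-tok-base-ct.json contains, so check the actual keys in that file first. The helper name `update_index_paths` is hypothetical.

```python
import json


def update_index_paths(config_path, passages_path, index_path):
    """Point a JSON config at a custom knowledge dataset and its index file."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    # Assumed keys (RagConfig naming); verify against the real config file.
    config["index_name"] = "custom"
    config["passages_path"] = passages_path
    config["index_path"] = index_path

    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config
```

A call might look like `update_index_paths("config/rag-tok-base-ct.json", "<your_dir>/my_knowledge_dataset", "<your_dir>/my_knowledge_dataset_hnsw_index.faiss")`, where the two output names are those produced by the upstream transformers script and may differ in the modified version.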