CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles (Accepted at EMNLP 2023)
Arxiv Paper Link: https://arxiv.org/abs/2305.13272
Please find the CLASS slides here.
We train an education tutoring chatbot, Spock, on Llama-13B + Vicuna-13B weights (https://github.com/lm-sys/FastChat/). To train the chatbot, we create a synthetic dataset of mock conversations between a student and a tutor based on learning science principles like scaffolding. We use a specialized prompt to generate these mock conversations with OpenAI's GPT-4 APIs.
To use the model, first install the fastchat library, and then follow the steps here:
- Update the conversation.py from our repository in the FastChat folder.
- Update the inference.py from our repository in the FastChat folder.
- Use the apply_delta.py on Spock-Bio-Llama-Diff to get the actual Spock weights.
- Example:
python3 -m fastchat.model.apply_delta --base decapoda-research/llama-13b-hf --target tutorbot_spock_vicuna_prompt_v3 --delta luffycodes/tutorbot-spock-bio-llama-diff
- Also, please put vicuna in the target model name, since conversation.py and inference.py check whether vicuna is a substring of the model name and change the conversation starter and inference prompts respectively. Note that we modify the vicuna prompts, so you will not be able to use the original vicuna models unless you revert the changes to conversation.py and inference.py.
- Example:
- Build a biology index with the OpenStax Biology 2e textbook. Put the generated os_bio_2e_index.faiss and the openstax_biology_2e.csv in the same folder as inference.py, i.e. the FastChat/fastchat folder.
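The weight-recovery and file-placement steps above can be sketched as a small pre-flight script. This is only an illustration: the helper names `build_apply_delta_cmd` and `missing_index_files` are hypothetical and not part of this repository or of FastChat, and the case-sensitive check for `vicuna` is an assumption about how the modified conversation.py and inference.py match the model name. The command flags and file names are taken from the steps above.

```python
import shlex
from pathlib import Path

# Values copied from the example command in the steps above.
BASE_MODEL = "decapoda-research/llama-13b-hf"
DELTA_REPO = "luffycodes/tutorbot-spock-bio-llama-diff"
INDEX_FILES = ("os_bio_2e_index.faiss", "openstax_biology_2e.csv")


def build_apply_delta_cmd(target, base=BASE_MODEL, delta=DELTA_REPO):
    """Return the apply_delta command as an argument list (hypothetical helper)."""
    # Assumption: the modified conversation.py and inference.py look for the
    # lowercase substring "vicuna", so reject a target name that lacks it.
    if "vicuna" not in target:
        raise ValueError(f"target name {target!r} must contain 'vicuna'")
    return [
        "python3", "-m", "fastchat.model.apply_delta",
        "--base", base,
        "--target", target,
        "--delta", delta,
    ]


def missing_index_files(fastchat_pkg_dir):
    """List index files not yet placed next to inference.py (hypothetical helper)."""
    folder = Path(fastchat_pkg_dir)
    return [name for name in INDEX_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    cmd = build_apply_delta_cmd("tutorbot_spock_vicuna_prompt_v3")
    # Print the command for pasting into a shell; cmd can also go to subprocess.run.
    print(shlex.join(cmd))
    print("missing index files:", missing_index_files("FastChat/fastchat"))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a local path with spaces is used for the target.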
Creating synthetic conversation and scaffolding datasets to train Spock for subjects other than Biology
- Run the mock_con_GPTx_prompt_v3.py
- It uses conversation prompt v3
- Remember to put openai.organization and openai.api_key in the file
- To create a scaffolding dataset, use prompts in folder
- Run the create_dataset_spock.py to create the training dataset with mock conversations in FastChat Vicuna format.
- Use the training instructions from the fastchat library.
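As a rough illustration of the conversion step above, the sketch below turns one mock conversation into a FastChat-style training record. It is not the code in create_dataset_spock.py: the function name `to_fastchat_record`, the speaker labels `Student` and `Tutorbot`, and the sample dialogue are all hypothetical, and the output layout (an `id` plus a `conversations` list of `from`/`value` pairs with `human` and `gpt` roles) is an assumption based on the sample conversation data distributed with FastChat.

```python
import json

# Assumed speaker labels in the raw mock conversations (hypothetical).
ROLE_MAP = {"Student": "human", "Tutorbot": "gpt"}


def to_fastchat_record(conv_id, turns):
    """Convert [(speaker, text), ...] into one FastChat-style training record."""
    conversations = []
    for speaker, text in turns:
        if speaker not in ROLE_MAP:
            raise ValueError(f"unknown speaker: {speaker!r}")
        # Map the speaker to the assumed FastChat role and trim whitespace.
        conversations.append({"from": ROLE_MAP[speaker], "value": text.strip()})
    return {"id": conv_id, "conversations": conversations}


if __name__ == "__main__":
    # Made-up dialogue, for illustration only.
    mock = [
        ("Student", "What does a mitochondrion do?"),
        ("Tutorbot", "Good question. What do cells need energy for?"),
    ]
    print(json.dumps([to_fastchat_record("mock_0", mock)], indent=2))
```

Check the format expected by the FastChat version you install before training, since the field names above are an assumption.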
If you use this work, please cite: CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles https://arxiv.org/abs/2305.13272
@misc{sonkar2023class,
title={CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles},
author={Shashank Sonkar and Lucy Liu and Debshila Basu Mallick and Richard G. Baraniuk},
year={2023},
eprint={2305.13272},
archivePrefix={arXiv},
primaryClass={cs.CL}
}