luffycodes/Tutorbot-Spock

CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles (Accepted at EMNLP 2023)

Arxiv Paper Link: https://arxiv.org/abs/2305.13272

Please find the CLASS slides here.

We train an education tutoring chatbot, Spock, on Llama-13B + Vicuna-13B weights (https://github.com/lm-sys/FastChat/). To train the chatbot, we create a synthetic dataset of mock conversations between a student and a tutor based on learning science principles such as scaffolding. We use a specialized prompt to generate these mock conversations with OpenAI's GPT-4 API.

Inference

To use the model, first install the fastchat library, then follow these steps:

  1. Replace conversation.py in the FastChat folder with the version from our repository.
  2. Replace inference.py in the FastChat folder with the version from our repository.
  3. Run apply_delta.py on Spock-Bio-Llama-Diff to obtain the actual Spock weights.
    • Example: python3 -m fastchat.model.apply_delta --base decapoda-research/llama-13b-hf --target tutorbot_spock_vicuna_prompt_v3 --delta luffycodes/tutorbot-spock-bio-llama-diff
    • Also, include vicuna in the target model name: conversation.py and inference.py check whether vicuna is a substring of the model name and change the conversation starter and inference prompts accordingly. Note that we modify the vicuna prompts, so you will not be able to use the original vicuna models unless you revert the changes to conversation.py and inference.py.
  4. Build a biology index with the OpenStax Biology 2e textbook. Put the generated os_bio_2e_index.faiss and openstax_biology_2e.csv in the same folder as inference.py, i.e., the FastChat/fastchat folder.
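
Steps 3 and 4 can be checked before launching inference. The following is a minimal sketch, not part of the repository: the helper names `build_apply_delta_command` and `missing_index_files` are hypothetical, and the vicuna check is assumed to be a plain lowercase substring match. The command-line flags and file names are the ones given in the steps above.

```python
from pathlib import Path

# File names taken from step 4 above.
INDEX_FILES = ("os_bio_2e_index.faiss", "openstax_biology_2e.csv")


def build_apply_delta_command(base: str, target: str, delta: str) -> list[str]:
    """Return the apply_delta command from step 3 as an argument list."""
    # Assumption: a plain lowercase substring match, mirroring the check
    # that conversation.py and inference.py are described as performing.
    if "vicuna" not in target:
        raise ValueError(f"target name {target!r} must contain 'vicuna'")
    return [
        "python3", "-m", "fastchat.model.apply_delta",
        "--base", base,
        "--target", target,
        "--delta", delta,
    ]


def missing_index_files(fastchat_dir: str) -> list[str]:
    """List the index files from step 4 that are not yet in the given folder."""
    folder = Path(fastchat_dir)
    return [name for name in INDEX_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    cmd = build_apply_delta_command(
        base="decapoda-research/llama-13b-hf",
        target="tutorbot_spock_vicuna_prompt_v3",
        delta="luffycodes/tutorbot-spock-bio-llama-diff",
    )
    print(" ".join(cmd))
    print("Missing index files:", missing_index_files("FastChat/fastchat"))
```

The argument list can be passed to `subprocess.run` once the target name passes the check. An empty result from `missing_index_files` suggests both index files are in place.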

Creating synthetic conversation and scaffolding datasets to train Spock for subjects other than Biology

Example of generating a conversational dataset using GPT

  1. Run mock_con_GPTx_prompt_v3.py.
  2. Set openai.organization and openai.api_key in the file before running it.
  3. To create a scaffolding dataset, use prompts in folder
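
For step 2, one option is to read the credentials from the environment instead of writing them into the file. This is only a sketch: the helper `load_openai_credentials` is hypothetical, and the environment variable names are assumed conventions rather than something the repository prescribes.

```python
import os


def load_openai_credentials(env=os.environ) -> dict:
    """Read the OpenAI organization and API key from environment variables."""
    # Assumption: these variable names are a convention chosen for this sketch.
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY before running the script")
    return {"organization": env.get("OPENAI_ORGANIZATION"), "api_key": api_key}


# Inside mock_con_GPTx_prompt_v3.py the values could then be assigned as:
#   creds = load_openai_credentials()
#   openai.organization = creds["organization"]
#   openai.api_key = creds["api_key"]
```

Keeping the key out of the file avoids committing it by accident.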

Training

  1. Run create_dataset_spock.py to create the training dataset of mock conversations in FastChat Vicuna format.
  2. Follow the training instructions from the fastchat library.
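
To illustrate the target format of step 1, here is a minimal sketch of one training record in the conversation layout used by FastChat's Vicuna training data (an `id` plus a list of `from`/`value` turns). The function `to_vicuna_record`, the speaker labels in `ROLE_MAP`, and the sample dialogue are assumptions made for this example. create_dataset_spock.py may structure its input differently.

```python
import json

# Assumption: speaker labels used in the mock conversations; adjust to the real data.
ROLE_MAP = {"Student": "human", "Tutorbot": "gpt"}


def to_vicuna_record(conv_id: str, turns: list[tuple[str, str]]) -> dict:
    """Convert (speaker, text) turns into one FastChat-style training record."""
    conversations = [
        {"from": ROLE_MAP[speaker], "value": text.strip()}
        for speaker, text in turns
    ]
    return {"id": conv_id, "conversations": conversations}


if __name__ == "__main__":
    # Made-up dialogue, for illustration only.
    mock = [
        ("Student", "What does the mitochondrion do?"),
        ("Tutorbot", "Good question. What do you already know about how cells get energy?"),
    ]
    print(json.dumps([to_vicuna_record("mock_0", mock)], indent=2))
```

An unknown speaker label raises a `KeyError`, which surfaces malformed conversations early instead of silently writing them to the training file.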

If you use this work, please cite: CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles https://arxiv.org/abs/2305.13272

@misc{sonkar2023class,
      title={CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles}, 
      author={Shashank Sonkar and Lucy Liu and Debshila Basu Mallick and Richard G. Baraniuk},
      year={2023},
      eprint={2305.13272},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
