GitHub - luffycodes/Tutorbot-Spock at CLASS

Name	Name	Last commit message	Last commit date
Latest commit History 31 Commits
book_index_retrieval	book_index_retrieval
datasets	datasets
fastchat	fastchat
gptx_datagen	gptx_datagen
prompts	prompts
readme.md	readme.md

Name

Last commit message

Last commit date

CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles (Accepted at EMNLP 2023)

Arxiv Paper Link: https://arxiv.org/abs/2305.13272

Please find the CLASS slides here.

We train an education tutoring chatbot, Spock, on Llama-13B + Vicuna-13B weights (https://github.com/lm-sys/FastChat/) weights. To train the chatbot, we create a synthetic dataset of mock conversations between a student and a tutor based on learning science principles like scaffolding. We employed a specialized prompt to generate these mock conversations using OpenAI's GPT-4 APIs.

Inference

To use the model, first install the fastchat library, and then follow the steps here:

Update the conversation.py from our repository in the FastChat folder.
Update the inference.py from our repository in the FastChat folder.
Use the apply_delta.py on Spock-Bio-Llama-Diff to get actual Spock weights.
- Example: python3 -m fastchat.model.apply_delta --base decapoda-research/llama-13b-hf --target tutorbot_spock_vicuna_prompt_v3 --delta luffycodes/tutorbot-spock-bio-llama-diff
- Also, please put vicuna in the target model name since conversation.py and inference.py check if vicuna is a substring in a model name and change conversation starter and inference prompts respectively. Note we modify vicuna prompts so you would not able to able to use original vicuna models unless you revert back changes to conversation.py and inference.py.
Build a biology index with OpenStax Biology 2e textbook. Put the generated os_bio_2e_index.faiss and the openstax_biology_2e.csv in same folder as inference.py i.e. FastChat/fastchat folder.

Creating synthetic conversation and scaffolding datasets to train Spock for subjects other than Biology

Example of generating conversational dataset using GPT

Run the mock_con_GPTx_prompt_v3.py
- It uses conversation prompt v3
Remember to put openai.organization and openai.api_key in the file
To create a scaffolding dataset, use prompts in folder

Training

Run the create_dataset_spock.py to create the training dataset with mock conversations in FastChat Vicuna format.
Use the training instructions from fastchat library.

If you use this work, please cite: CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles https://arxiv.org/abs/2305.13272

@misc{sonkar2023class,
      title={CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles}, 
      author={Shashank Sonkar and Lucy Liu and Debshila Basu Mallick and Richard G. Baraniuk},
      year={2023},
      eprint={2305.13272},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles (Accepted at EMNLP 2023)

Inference

Creating synthetic conversation and scaffolding datasets to train Spock for subjects other than Biology

Example of generating conversational dataset using GPT

Training

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles (Accepted at EMNLP 2023)

Inference

Creating synthetic conversation and scaffolding datasets to train Spock for subjects other than Biology

Example of generating conversational dataset using GPT

Training

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages