A Project done at the Technical University of Denmark (DTU) on creating a Digital Teachers Assistant enabled by LLMs.

osquera/ChatGPT_Tutor_Project

Learning with ChatGPT project

Aim of the project

This repository accompanies a project exploring the use of LLMs such as ChatGPT to create an AI teaching assistant.
It includes all the elements needed for a pipeline for that task:

  • Data generation
  • Semantic search
  • Full pipeline

Requirements

The requirements for running this project are listed in requirements.txt. Running most files also requires an OpenAI API key. A .env file that defines the API key can be created for running the scripts.
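As an illustration of the .env setup, here is a minimal sketch that reads KEY=VALUE lines from a .env file using only the standard library. The variable name OPENAI_API_KEY and the helper names load_env_file and get_api_key are assumptions made for this example; the scripts in this repository may read the key differently (for instance through the python-dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Do not overwrite variables already set in the real environment.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def get_api_key(name="OPENAI_API_KEY"):
    """Return the API key, or raise a clear error if it is missing.

    The variable name is an assumption, not taken from the repository.
    """
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

With a .env file containing a line such as OPENAI_API_KEY=..., calling load_env_file() followed by get_api_key() returns the key. Keep the .env file out of version control.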

Structure of repository

This repository contains multiple scripts, Jupyter Notebooks and other files.

We encourage you to view the script final pipeline, which contains the entire final pipeline from input query to output answer.
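To give a rough idea of what a query-to-answer pipeline of this kind involves, the sketch below ranks paragraphs by similarity to a query embedding and assembles a prompt from the best matches. It is not the code from final pipeline: the function names, the toy three-dimensional embeddings, the prompt wording and the use of plain cosine similarity are all assumptions made for illustration, and the call to the language model is left out.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, paragraphs, embeddings, top_k=2):
    """Return the top_k paragraphs most similar to the query embedding."""
    scored = [
        (cosine_similarity(query_embedding, emb), text)
        for text, emb in zip(paragraphs, embeddings)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


def build_prompt(question, context_paragraphs):
    """Combine retrieved paragraphs and the question into one prompt string."""
    context = "\n\n".join(context_paragraphs)
    return (
        "Answer the question using the course material below.\n\n"
        f"Course material:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy data standing in for real paragraphs and their embeddings.
paragraphs = [
    "PCA projects data onto directions of maximal variance.",
    "K-means assigns each point to the nearest centroid.",
    "Cross-validation estimates generalization error.",
]
embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_embedding = [0.9, 0.1, 0.0]

context = retrieve(query_embedding, paragraphs, embeddings, top_k=2)
prompt = build_prompt("What does PCA do?", context)
print(prompt)
```

In a real run the embeddings would come from an embedding model and the prompt would be sent to the LLM; both steps are omitted here so the sketch runs without an API key.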

The Data_generation folder contains scripts for splitting PDF documents into paragraphs. It also contains the folder df_pickle, which holds raw datasets that can be loaded into other scripts for further processing. The final raw dataset is final_02450_emb.pkl, which contains both question paragraphs and their associated embeddings.
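The paragraph-splitting step can be sketched as follows, assuming the text has already been extracted from the PDF. The function name split_into_paragraphs, the blank-line splitting rule and the min_chars threshold are assumptions for this example, not the logic used by the scripts in Data_generation.

```python
import re


def split_into_paragraphs(text, min_chars=20):
    """Split extracted text into paragraphs on blank lines (hypothetical helper)."""
    # Normalise line endings so the blank-line pattern works on any platform.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph break is one or more blank (or whitespace-only) lines.
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = []
    for block in blocks:
        # Rejoin hard-wrapped lines into a single line of text.
        paragraph = " ".join(
            line.strip() for line in block.splitlines() if line.strip()
        )
        # Drop very short blocks such as page numbers or stray headers.
        if len(paragraph) >= min_chars:
            paragraphs.append(paragraph)
    return paragraphs
```

The min_chars filter is a crude way to discard residue such as page numbers; a suitable threshold depends on the documents being processed.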

The notebook Similarity network contains code for training semantic search models. It also includes training plots and ROC curves for the models. The three datasets used for training and testing are created at the beginning of the notebook. The datasets are very large (around 20 GB) and are therefore not included in the repository; they are generated when the notebook is run.
The following models are trained and evaluated:

  • The three ANNs
  • The Weighted Cosine Similarity
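For readers unfamiliar with the ROC curves mentioned above, the following sketch shows how such a curve and its area (AUC) can be computed from binary labels and similarity scores. It is a standard-library illustration with made-up function names, not the evaluation code from the notebook, which may rely on a library such as scikit-learn.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step predicts more items as positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule; points ordered by rising FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A label of 1 here means that a paragraph is relevant to a query and 0 that it is not; that labelling is an assumption made for the example.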

The weights and structure of the Oversampled ANN and the Weighted CS are saved in the folder ANN.
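As a point of reference, one common way to define a weighted cosine similarity is to scale each embedding dimension by a non-negative weight before taking the cosine. The sketch below implements that definition; whether the Weighted CS model in this repository uses exactly this form is an assumption, and the function name is chosen for this example only.

```python
import math


def weighted_cosine_similarity(a, b, weights):
    """Cosine similarity where dimension i contributes with weight weights[i].

    Assumes non-negative weights; this is one common definition and not
    necessarily the one used by the repository's Weighted CS model.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With all weights equal to 1 this reduces to the ordinary cosine similarity; setting a weight to 0 removes that dimension from the comparison.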

Furthermore, the files AB test, AB test relevant context and AB test similarity score were used to conduct the A/B tests on the pipeline. The AB test file contains the final A/B test between the pipeline and native ChatGPT.

A possible implementation of the pipeline

Running the script Interface starts an interactive interface for using the pipeline. Note that this script requires a .env file with an OpenAI API key. It also requires the folder ANN to be present. Finally, it requires a model from HuggingFace, which is downloaded automatically when the script is run.
