Learning with ChatGPT project

Aim of the project

This repository accompanies a project that explores using LLMs such as ChatGPT to create an AI Teaching Assistant.
It includes all elements needed for a pipeline for that task:

Data generation
Semantic search
Full pipeline

Requirements

The requirements for running this project are listed in requirements.txt. Running most files also requires an OpenAI API key. Create a .env file that defines the API key before running the scripts.
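As a minimal sketch of the .env setup, the snippet below reads KEY=VALUE pairs from a .env file using only the standard library. The helper name load_env and the variable name OPENAI_API_KEY are assumptions made for illustration; the scripts in this repository may load the key differently (for example with the python-dotenv package).

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Parse KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper for illustration; not taken from the repository.
    Returns the dict of pairs found in the file.
    """
    env_file = Path(path)
    if not env_file.exists():
        return {}
    loaded = {}
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        loaded[key] = value
        # Variables already set in the environment take precedence.
        os.environ.setdefault(key, value)
    return loaded
```

With a .env file containing a line such as OPENAI_API_KEY=... (assumed variable name), calling load_env() at the top of a script makes the key available through os.environ.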

Structure of repository

This repository contains multiple scripts, Jupyter Notebooks and other files.

We encourage viewing final_pipeline.py, which contains the entire final pipeline from input query to output answer.
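The flow from input query to output answer can be illustrated with a small, self-contained sketch: rank stored paragraphs by similarity to the query embedding, then assemble a prompt from the best matches. This is not the code in final_pipeline.py; the function names, the prompt wording, and the toy 3-dimensional embeddings are all assumptions, and the last step (sending the prompt to the chat model) is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, paragraphs, top_k=2):
    """Return the top_k paragraph texts most similar to the query.

    `paragraphs` is a list of (text, embedding) pairs.
    """
    ranked = sorted(
        paragraphs,
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, contexts):
    """Assemble a prompt asking the model to answer from the retrieved context."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using the course material below.\n\n"
        f"Material:\n{context_block}\n\n"
        f"Question: {question}"
    )


# Toy embeddings stand in for real model embeddings (illustrative data only).
paragraphs = [
    ("PCA projects data onto directions of maximal variance.", [1.0, 0.0, 0.0]),
    ("Decision trees split on feature thresholds.", [0.0, 1.0, 0.0]),
    ("The first principal component has the largest variance.", [0.9, 0.1, 0.0]),
]
query_vec = [1.0, 0.05, 0.0]  # assumed embedding of the question
contexts = retrieve(query_vec, paragraphs, top_k=2)
prompt = build_prompt("What does PCA do?", contexts)
```

Keeping retrieval and prompt construction as separate functions makes it easy to swap the similarity measure without touching the rest of the flow.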

The Data_Generation folder contains scripts for splitting PDF documents into paragraphs. It also contains a folder df_pickle with raw datasets that can be loaded into other scripts for further processing. The final raw dataset is final_02450_emb.pkl, which contains both the questions and paragraphs and their associated embeddings.

The notebook Similarity network.ipynb contains code for training semantic search models, along with training plots and ROC curves for the models. The three datasets used for training and testing are created at the beginning of the notebook. The datasets are very large (around 20 GB) and are therefore not included in the repository; they are generated when the notebook is run.
The following models are trained and evaluated:

The three ANNs
The Weighted Cosine Similarity

The weights and structure of the Oversampled ANN and the Weighted CS are saved in the folder ANN.
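To make the idea of a weighted cosine similarity concrete, the sketch below uses one common definition, in which each embedding dimension gets a non-negative weight in both the dot product and the norms. This definition and the function name are assumptions; the notebook may define or parameterize the measure differently, and there the weights are learned rather than fixed by hand.

```python
import math


def weighted_cosine_similarity(a, b, weights):
    """Cosine similarity where dimension i is scaled by weights[i] >= 0.

    Assumed definition: sum(w*a*b) / (sqrt(sum(w*a*a)) * sqrt(sum(w*b*b))).
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must have the same length")
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [1.0, 1.0]
uniform = weighted_cosine_similarity(a, b, [1.0, 1.0])  # plain cosine similarity
first_only = weighted_cosine_similarity(a, b, [1.0, 0.0])  # second dimension ignored
```

With uniform weights the measure reduces to the ordinary cosine similarity; setting a weight to zero removes that dimension from the comparison entirely.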

Furthermore, the files AB test.py, AB test relavant context.py and AB test similarity score.py were used to conduct the A/B tests on the pipeline. AB test.py is the final A/B test between the pipeline and native ChatGPT.

A possible implementation of the pipeline

Running the script Interface.py launches an interactive interface for using the pipeline. Note that this script requires a .env file with an OpenAI API key and requires the folder ANN to be present. It also requires a model from Hugging Face, which is downloaded automatically when the script is run.

osquera/ChatGPT_Tutor_Project
