augcog/tai

TAI: Teaching Assistant Intelligence

Python Code style: black

🚀 Quick Start

TAI uses project-specific virtual environments with automatic activation. Each project has its own dependencies and setup.

Prerequisites

1. Install Poetry

curl -sSL https://install.python-poetry.org | python3 -

2. Install direnv for automatic environment activation

# On macOS with Homebrew
brew install direnv

# On Linux with apt
sudo apt install direnv

3. Add direnv hook to your shell

# For zsh
echo 'eval "$(direnv hook zsh)"' >> ~/.zshrc
source ~/.zshrc

# For bash
echo 'eval "$(direnv hook bash)"' >> ~/.bashrc
source ~/.bashrc

Setup Projects

Navigate to each project directory to set up and use. Each project has its own README with detailed instructions:

AI Chatbot Backend

cd ai_chatbot_backend
direnv allow  # Allow automatic environment activation
# Follow instructions in ai_chatbot_backend/README.md

RAG Pipeline

cd rag
direnv allow  # Allow automatic environment activation
# Follow instructions in rag/README.md

Recommended setup for Cursor/VS Code:

We highly recommend opening the project as a workspace (the editor should prompt you automatically) and installing all recommended extensions and plugins.

πŸ—οΈ Architecture

TAI is organized as a modular monorepo with independent project environments:

tai/
├── 🚀 ai_chatbot_backend/               # FastAPI backend service
│   ├── .venv/                           # Project-specific virtual environment
│   ├── pyproject.toml                   # Poetry dependencies
│   ├── Makefile                         # Project commands
│   └── .envrc                           # Auto-activation with direnv
├── 🧠 rag/                              # RAG pipeline and file processing
│   ├── .venv/                           # Project-specific virtual environment
│   ├── pyproject.toml                   # Poetry dependencies
│   ├── Makefile                         # Project commands
│   └── .envrc                           # Auto-activation with direnv
├── 📁 rag/file_organizer/               # File organization utilities
└── 📊 evaluation/dataset_generate/      # Evaluation and dataset tools

🔄 How It Works

  1. Navigate to a project: cd ai_chatbot_backend
  2. Environment activates automatically via direnv
  3. Use project commands: make help to see available commands
  4. VSCode detects the correct Python interpreter automatically
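To confirm that automatic activation worked, you can check that the running interpreter lives in the project's .venv directory. The following is a minimal sketch, not part of the repository: the helper name in_project_venv is hypothetical, and it assumes the environment is located at <project>/.venv as shown in the architecture tree.

```python
import sys
from pathlib import Path


def in_project_venv(project_dir, prefix=None):
    """Return True if the interpreter prefix is <project_dir>/.venv.

    `prefix` defaults to sys.prefix, which points at the virtual
    environment directory when one is active.
    """
    prefix = Path(prefix if prefix is not None else sys.prefix).resolve()
    return prefix == (Path(project_dir) / ".venv").resolve()


if __name__ == "__main__":
    project = Path.cwd()
    status = "active" if in_project_venv(project) else "not active"
    print(f"{project.name}: project virtual environment is {status}")
```

Run it from inside a project directory such as ai_chatbot_backend after direnv has loaded the environment. If it reports "not active", re-run direnv allow in that directory.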

Link to Website

https://tai.berkeley.edu

What is TAI?

TAI is an open-source project developed by researchers and students at UC Berkeley (see Credits below), with the goal of offering the power of edge GPT models and services for educational purposes. The GPT models selected in TAI are carefully curated to allow students to easily spin up their own local GPT services and websites. The project also develops robust embedding and RAG toolkits that let users convert their knowledge bases and multimedia documents into searchable vector databases.

Once installed locally, TAI lets individuals start a conversation and use GPT techniques to search through local documents in plain natural language.

Core Algorithms

TAI uses Llama3 as the base model, BGE-M3 as the embedding model, and Sqlite-vss as the vector database, combined with a RAG agent.

AI Course Bot

AI Course Bot is our open-source RAG framework, designed to facilitate the creation and deployment of a TAI website. The platform uses a Retrieval-Augmented Generation (RAG) system to answer questions from course materials and online resources. With a straightforward deployment process and customization options, TAI serves as a valuable resource for supporting students.

RAG

To prepare the vector database for the RAG system, a web scraper extracts online documentation. The scraped data, which comes in various formats, is then split into segments. These segments are embedded and stored in the vector database for efficient retrieval by the TA Agent.
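The segment, embed, store, and retrieve flow described above can be illustrated with a small self-contained sketch. This is not the project's implementation: the bag-of-words embed function is a toy stand-in for an embedding model such as BGE-M3, the in-memory VectorStore class is a stand-in for a vector database such as Sqlite-vss, and all names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Split text into segments of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.rows = []  # (segment, vector) pairs

    def add(self, segment):
        self.rows.append((segment, embed(segment)))

    def search(self, query, k=1):
        q = embed(query)
        scored = [(cosine(q, vec), seg) for seg, vec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [seg for _, seg in scored[:k]]


# Hypothetical course snippets standing in for scraped documentation.
docs = [
    "Office hours are held on Tuesday afternoons in the main lab.",
    "Homework is submitted through the course website before midnight.",
]

store = VectorStore()
for doc in docs:
    for segment in chunk(doc):
        store.add(segment)

top = store.search("When are office hours?", k=1)
print(top[0])
```

In a real deployment the retrieved segments would be passed to the language model as context for answering the question. Here the sketch stops at retrieval.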

Evaluation

The TAI project is equipped with a comprehensive test suite that ensures the reliability and accuracy of the system. The tests are designed to evaluate the functionality of the core algorithms, including the Llama3 model, BGE-M3 embedding model, and Sqlite-vss vector database. By running these evaluations, users can verify the performance of the TAI system and identify any potential issues that may arise during operation.
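As one illustration of how retrieval quality could be checked, the sketch below computes a hit rate at k: the fraction of test questions whose expected source appears among the top k retrieved results. The metric, the function names, and the sample cases are assumptions for illustration and are not taken from the project's test suite.

```python
def hit_rate_at_k(retrieve, cases, k=3):
    """Fraction of cases whose expected source is in the top-k results.

    `retrieve(question, k)` must return a list of source identifiers;
    `cases` is a list of (question, expected_source) pairs.
    """
    if not cases:
        return 0.0
    hits = sum(1 for question, expected in cases if expected in retrieve(question, k))
    return hits / len(cases)


def toy_retrieve(question, k):
    """Hypothetical retriever backed by a fixed lookup table."""
    index = {
        "When are office hours?": ["office-hours.md", "syllabus.md"],
        "How do I submit homework?": ["syllabus.md", "faq.md"],
    }
    return index.get(question, [])[:k]


cases = [
    ("When are office hours?", "office-hours.md"),  # expected source is retrieved
    ("How do I submit homework?", "homework.md"),   # expected source is missing
]

score = hit_rate_at_k(toy_retrieve, cases, k=2)
print(f"hit rate@2 = {score}")
```

Swapping toy_retrieve for a function that queries the real vector database would turn this into a simple regression check on retrieval.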

Tutorial

The following is the video tutorial for each part:

Credits

The TAI project is a collaborative effort by researchers and students at UC Berkeley. The project is led by Director Dr. Allen Y. Yang and includes contributions from the following individuals:

  • Franco Leonardo Huang
  • Wei Quan Lai
  • Ines L Bouissou
  • Jingchao Zhong
  • Terrianne Zhang
  • Michael Wu
  • Steve Gao
  • Tianlun Zhang
  • Divya Jindal
  • Yikang Yu
  • Charles Xu
  • Ashton Lee
  • Arnav Jain

Acknowledgements

We are deeply grateful for the support and contributions from the following organizations:

  • Qualcomm: For their generous AI Hub sponsorship, which has been instrumental in our progress.
  • Hitch Interactive: For their unwavering general support, which has been crucial to our success.
  • Nimbus-Nova: For their exceptional work in system design and architecture.
