This project is a hands-on guide to building a local Retrieval-Augmented Generation (RAG) system from scratch. The goal is to create a chatbot capable of answering questions about projects from the Academy Software Foundation (ASWF), using a knowledge base built from their official documentation and source code repositories.
The initial focus is on three key ASWF projects:
- OpenColorIO (OCIO)
- OpenImageIO (OIIO)
- OpenEXR
The architecture is designed to be modular and extensible, allowing for the easy addition of other knowledge sources in the future (e.g., Pixar's Universal Scene Description, USD).
This project serves as a practical learning exercise in Python, Machine Learning, and modern AI application development.
The core technologies chosen for this project are:
- Language: Python 3.11
- Core Framework: LangChain for orchestrating the RAG pipeline.
- LLM: Meta-Llama-3-8B-Instruct, run locally in quantized GGUF format
- Vector Database: ChromaDB for local, persistent storage and retrieval of text embeddings.
- Embedding Model: Sentence-Transformers for generating high-quality text embeddings locally.
- Frontend: Streamlit for the chatbot interface.
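
To make the stack concrete, here is a minimal sketch of how these pieces connect. All paths, model names, and parameters are illustrative assumptions; the actual wiring lives in `src/rag_chain.py` and `src/vector_store.py`.

```python
# Minimal end-to-end sketch (illustrative, not the project's exact code).
from langchain.chains import RetrievalQA
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import LlamaCpp
from langchain_community.vectorstores import Chroma

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")  # a Sentence-Transformers model
db = Chroma(persist_directory="data/chroma", embedding_function=embeddings)  # assumed path
llm = LlamaCpp(model_path="models/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf", n_ctx=8192)
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())

print(qa.invoke({"query": "What is OpenColorIO?"})["result"])
```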
The project follows a modular structure to keep the code organized and easy to test:
```
/
├── .venv/              # The Python virtual environment
├── data/               # For storing raw or processed data
├── input/
│   └── sources.txt     # List of URLs to scrape for the knowledge base (example below)
├── models/             # For storing the local LLM model
├── notebooks/          # Jupyter notebooks for experimentation
├── src/                # Main source code
│   ├── app.py          # The Streamlit chatbot application
│   ├── data_loader.py  # Scripts for loading and processing data
│   ├── rag_chain.py    # The core RAG chain logic
│   ├── vector_store.py # Scripts for managing the ChromaDB instance
│   └── main.py         # Main application script for data ingestion
├── .gitignore          # Git ignore file
├── requirements.in     # Pip-tools input file for dependencies
├── requirements.txt    # Project dependencies
└── README.md           # This file
```
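
`input/sources.txt` drives the ingestion step and holds one URL per line. The entries below are illustrative examples of ASWF documentation roots you might point it at:

```
https://opencolorio.readthedocs.io/en/latest/
https://openimageio.readthedocs.io/en/latest/
https://openexr.com/en/latest/
```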
Follow these steps to set up your local development environment. You will need Python 3.11 and Git installed.
- Clone the repository:

```bash
git clone <repository-url>
cd RagLangChain
```
- Create and activate a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
```
- Install the dependencies:

```bash
pip install -r requirements.txt
```
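
For reference, `requirements.txt` is compiled from `requirements.in` with pip-tools. A plausible `requirements.in` for the stack above might look like this (the exact package set is an assumption; regenerate the lock file with `pip-compile`):

```
langchain
langchain-community
chromadb
sentence-transformers
llama-cpp-python
streamlit
```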
- Download the LLM: place the `Meta-Llama-3-8B-Instruct.Q5_K_M.gguf` model file in the `models/` directory.
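
Before wiring the model into the chain, you can sanity-check the download by loading it directly with llama-cpp-python (a quick illustrative test; the prompt and parameters are arbitrary):

```python
from llama_cpp import Llama

# Smoke test: confirm the downloaded GGUF file loads and can generate text.
llm = Llama(model_path="models/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf", n_ctx=2048)
out = llm("Q: What does RAG stand for? A:", max_tokens=32)
print(out["choices"][0]["text"])
```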
The project has two main parts: data ingestion and the chatbot application.
To build the knowledge base, you first need to ingest the data from the sources defined in `input/sources.txt`:

```bash
python src/main.py
```

This script will scrape the data, create embeddings, and store them in the ChromaDB vector store.
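
In outline, the ingestion step does something like the following. This is an illustrative sketch, not the project's exact code: the chunk sizes, embedding model, and persist path are assumptions, and the real logic is split across `src/main.py`, `src/data_loader.py`, and `src/vector_store.py`.

```python
# Illustrative outline of the ingestion pipeline: scrape -> split -> embed -> store.
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

with open("input/sources.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

docs = WebBaseLoader(urls).load()  # fetch and parse each source page
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=150  # sizes are assumptions
).split_documents(docs)

Chroma.from_documents(
    chunks,
    HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"),
    persist_directory="data/chroma",  # assumed on-disk location
)
```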
Once the data has been ingested, you can start the chatbot application:

```bash
streamlit run src/app.py
```

This will open a new tab in your browser with the chatbot interface.
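
The front end only needs Streamlit's chat primitives on top of the RAG chain. A minimal shape is sketched below; it is illustrative, and `build_chain` is a hypothetical helper standing in for whatever `src/rag_chain.py` actually exposes.

```python
# Minimal shape of the Streamlit front end (illustrative; the real app is src/app.py).
import streamlit as st
from rag_chain import build_chain  # hypothetical helper in src/rag_chain.py

@st.cache_resource  # build the chain once, not on every Streamlit rerun
def get_chain():
    return build_chain()

st.title("ASWF Docs Chatbot")

if question := st.chat_input("Ask about OCIO, OIIO, or OpenEXR"):
    with st.chat_message("user"):
        st.write(question)
    with st.chat_message("assistant"):
        st.write(get_chain().invoke({"query": question})["result"])
```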
The project is in a functional state. The data ingestion pipeline and the RAG-based chatbot are implemented. Future work could include adding more data sources, experimenting with different LLMs and embedding models, and improving the chatbot's user interface.