
🤖 handbook_assistant

A Retrieval-Augmented Generation (RAG) system designed for efficiently querying a structured handbook.

What to expect next:

  • Expanding the knowledge base to cover 42-related information and the general facts and rules every 42/1337 student needs to be aware of.
  • A dockerized environment for easier deployment.
  • A web app UI with intra account login for ease of use.
  • The option to choose your own embedding and LLM models.

📋 Table of Contents

  • About
  • Features
  • System Architecture
  • Project Structure
  • Setup
  • Usage

📽️ About

handbook_assistant is a local Retrieval-Augmented Generation (RAG) system built to answer questions about the 1337 coding school using its official handbook and internal documentation. The system combines:

  • Structured document chunking
  • Dense vector embeddings
  • Semantic search with FAISS
  • Local LLM inference via Ollama
  • Conversational memory across turns

It is designed to answer only questions related to 1337, rejecting unrelated queries or instructions.
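As a rough illustration of how these pieces fit together, here is a minimal retrieve-then-answer sketch. It is not the project's actual code (that lives in src/chat_retrieve.py and src/scripts/): the metadata layout, the model roles, and the prompt wording are assumptions, though the file paths and model names follow the Project Structure and Setup sections.

```python
# Minimal retrieve-then-answer sketch (illustrative only; the real
# pipeline lives in src/chat_retrieve.py and src/scripts/).
import json
import faiss              # pip install faiss-cpu
import numpy as np
import ollama             # pip install ollama

INDEX_PATH = "data/index/handbook.faiss"
META_PATH = "data/index/metadata.json"
EMBED_MODEL = "nomic-embed-text-v2-moe"   # pulled in Setup
GEN_MODEL = "sanruss/qwen3-2b-rag"        # assumed to be the answering model

def answer(question: str, k: int = 4) -> str:
    # Embed the query with the same model used to index the handbook.
    vec = ollama.embeddings(model=EMBED_MODEL, prompt=question)["embedding"]
    query = np.array([vec], dtype="float32")

    # Nearest-neighbour search over the chunk index.
    index = faiss.read_index(INDEX_PATH)
    _, ids = index.search(query, k)

    # metadata.json is assumed to map chunk ids to chunk text;
    # the real layout may differ.
    with open(META_PATH) as f:
        meta = json.load(f)
    context = "\n\n".join(meta[str(i)]["text"] for i in ids[0] if i != -1)

    # Ask the LLM to answer strictly from the retrieved context.
    prompt = (
        "Answer using only the 1337 handbook excerpts below. "
        "If the question is unrelated to 1337, refuse.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    reply = ollama.chat(model=GEN_MODEL,
                        messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]
```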


🎛️ Features

  • ✅ Hierarchical markdown chunking (H1 / H2 aware)
  • ✅ Token-aware chunk splitting for embeddings (both sketched after this list)
  • βœ… Local embeddings using nomic-embed-text
  • βœ… Vector search with FAISS
  • βœ… Query routing (retrieval vs chat)
  • βœ… Ollama-based local LLM inference
  • βœ… Strict scope enforcement (1337-only answers)
  • βœ… Modular and extensible design
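The first two features can be sketched roughly as below. This is an illustration, not the logic in src/scripts/chunker.py or token_length.py: the 256-token budget and the whitespace-token approximation are assumptions.

```python
# Illustrative H1/H2-aware chunker with a token budget; the project's own
# chunker (src/scripts/chunker.py) may differ in detail.
import re

MAX_TOKENS = 256  # assumed budget; whitespace tokens stand in for real tokens

def chunk_markdown(md: str):
    chunks = []
    h1 = h2 = ""
    buf = []

    def flush():
        text = "\n".join(buf).strip()
        if not text:
            return
        words = text.split()
        # Token-aware split: break long sections into budget-sized pieces,
        # keeping the heading trail attached to every piece.
        for i in range(0, len(words), MAX_TOKENS):
            chunks.append({"h1": h1, "h2": h2,
                           "text": " ".join(words[i:i + MAX_TOKENS])})
        buf.clear()

    for line in md.splitlines():
        if m := re.match(r"^#\s+(.*)", line):
            flush(); h1, h2 = m.group(1), ""   # new H1 resets the H2 context
        elif m := re.match(r"^##\s+(.*)", line):
            flush(); h2 = m.group(1)
        else:
            buf.append(line)
    flush()
    return chunks
```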

👾 System Architecture

User Query
    β”‚
    β–Ό
Query Router (Embedding Similarity / LLM fallback)
    β”œβ”€β”€ Chat β†’ Generic LLM (scope-limited)
    └── Retrieval
          β”œβ”€β”€ Embed query
          β”œβ”€β”€ FAISS similarity search
          β”œβ”€β”€ Retrieve top-k chunks
          └── LLM answers using retrieved context
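A hedged sketch of what the routing step might look like: embedding similarity against a few exemplar handbook queries, with an LLM fallback when the score is inconclusive. The thresholds and exemplar phrases here are invented for illustration; the actual decision logic lives in src/scripts/decision.py and may differ.

```python
# Router sketch: cosine similarity against exemplar retrieval queries,
# falling back to a small LLM classifier in the ambiguous middle band.
import numpy as np
import ollama

EMBED_MODEL = "nomic-embed-text-v2-moe"
RETRIEVAL_EXEMPLARS = [          # illustrative anchors, not the real ones
    "What are the rules of the exam?",
    "How does the blackhole work at 1337?",
]

def _embed(text: str) -> np.ndarray:
    v = np.array(ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"])
    return v / np.linalg.norm(v)

def route(query: str) -> str:
    q = _embed(query)
    score = max(float(q @ _embed(e)) for e in RETRIEVAL_EXEMPLARS)
    if score > 0.75:     # confidently handbook-related -> retrieval
        return "retrieval"
    if score < 0.40:     # confidently small talk -> plain chat
        return "chat"
    # Ambiguous zone: ask the instruct model to classify.
    reply = ollama.chat(
        model="qwen3:4b-instruct-2507-q4_K_M",
        messages=[{"role": "user",
                   "content": f"Answer 'retrieval' or 'chat' only. Query: {query}"}],
    )
    return "retrieval" if "retrieval" in reply["message"]["content"].lower() else "chat"
```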


🗃️ Project Structure

.
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ cleaned/
β”‚   β”‚   └── handbook_clean.md
β”‚   β”œβ”€β”€ chunked/
β”‚   β”‚   └── handbook_chunked.jsonl
β”‚   β”œβ”€β”€ index/
β”‚   β”‚   β”œβ”€β”€ handbook.faiss
β”‚   β”‚   └── metadata.json
β”‚   β”œβ”€β”€ processed/
β”‚   β”‚   β”œβ”€β”€ images/
β”‚   β”‚   └── handbook.md
β”‚   └── raw/
β”‚       └── handbook.pdf
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ chat_retrieve.py
β”‚   β”œβ”€β”€ cli_app.py
β”‚   β”œβ”€β”€ prep_data.py
β”‚   └── scripts/
β”‚       β”œβ”€β”€ chunker.py
β”‚       β”œβ”€β”€ answer.py
β”‚       β”œβ”€β”€ clean_md.py
β”‚       β”œβ”€β”€ memory.py
β”‚       β”œβ”€β”€ pdf_to_md.py
β”‚       β”œβ”€β”€ prompt.py
β”‚       β”œβ”€β”€ trim_history.py
β”‚       β”œβ”€β”€ embedding.py
β”‚       β”œβ”€β”€ retrieval.py
β”‚       β”œβ”€β”€ decision.py
β”‚       └── token_length.py
β”œβ”€β”€ README.md
└── requirements.txt
  

⚙️ Setup

  1. Clone the repository
git clone https://github.com/Sfeso13/handbook_assistant.git
cd handbook_assistant
  2. Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt

Make sure Ollama is installed and running locally.

  4. Pull the required models
ollama pull qwen3:4b-instruct-2507-q4_K_M
ollama pull sanruss/qwen3-2b-rag
ollama pull nomic-embed-text-v2-moe
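Before starting the assistant, you can sanity-check that the pulled models respond. The snippet below assumes the ollama Python client is available (pip install ollama if it is not already a dependency):

```python
# Quick check that the embedding and instruct models answer via Ollama.
import ollama

emb = ollama.embeddings(model="nomic-embed-text-v2-moe", prompt="ping")
print("embedding dim:", len(emb["embedding"]))

chat = ollama.chat(model="qwen3:4b-instruct-2507-q4_K_M",
                   messages=[{"role": "user", "content": "Say ok."}])
print(chat["message"]["content"])
```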

🧮 Usage

  1. Start the assistant
python3 src/cli_app.py            # for general use
python3 src/cli_app.py --debug    # to print debug messages
  2. Query to your heart's content
