This repository contains a fully runnable RAG chatbot demo that runs entirely locally:
- Embeddings: `sentence-transformers` (`all-MiniLM-L6-v2`)
- Vector DB: `chromadb` (local, persistent)
- Orchestration: `langchain`
- UI: `streamlit`
- LLM: preferably Ollama (a local runner) via its CLI, if installed. If Ollama is NOT available, the code automatically falls back to a local `transformers` text-generation model.
A privacy-first contextual chatbot that ingests documents, creates embeddings, stores them in Chroma, and answers user queries by retrieving relevant chunks and generating grounded answers.
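To make that flow concrete, here is a minimal sketch of the retrieve-then-generate step. The `chroma_db` path, the `docs` collection name, `k=4`, and the `answer_query` helper are illustrative assumptions, not the exact code in `rag_pipeline.py`:

```python
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")  # path is an assumption
collection = client.get_or_create_collection("docs")  # collection name is an assumption

def answer_query(question: str, k: int = 4) -> str:
    # Embed the question and fetch the k most similar chunks from Chroma.
    query_embedding = embedder.encode(question).tolist()
    results = collection.query(query_embeddings=[query_embedding], n_results=k)
    context = "\n\n".join(results["documents"][0])
    # Build a grounded prompt; the real pipeline sends this to the LLM.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```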
- Python 3.10+
- git
- (Optional but recommended) GPU + CUDA for local transformer model acceleration
Ollama is a local LLM runner. It is optional: if present, the system uses it via the `ollama` CLI for low-latency local generation.
Follow the instructions at https://ollama.com (install the Ollama app/CLI and pull a model, for example `ollama pull mistral`).
If you don't install Ollama, the code will automatically use a local `transformers` model as a fallback.
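A rough sketch of how that availability check can work (the `ollama_available` helper is illustrative, not the repo's exact code):

```python
import shutil

def ollama_available() -> bool:
    # Ollama counts as available only if its CLI binary is on PATH.
    return shutil.which("ollama") is not None
```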
```bash
git clone <your-repo-url>
cd contextual-chatbot-real
python -m venv .venv
source .venv/bin/activate   # Windows: .\.venv\Scripts\activate
pip install -r requirements.txt
python ingest.py
streamlit run streamlit_app.py
```

Then open http://localhost:8501 in your browser.
- `streamlit_app.py` — Streamlit UI for chat
- `rag_pipeline.py` — full pipeline: embedding, Chroma, retriever, LLM wrapper with Ollama CLI + fallback to Transformers
- `ingest.py` — ingests sample docs into the Chroma DB with recursive chunking (see the sketch below)
- `requirements.txt` — exact Python packages
- `prompt_templates/default_prompt.md`
- `data/sample_docs/sample.md` — small demo document
- `docs/images/architecture.png` & `docs/images/techstack.png` — simple generated PNGs
- `.gitignore`, `LICENSE`, `notebooks/demo.ipynb`
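For illustration, a condensed version of what an ingest step like `ingest.py` can look like. The chunk sizes, the `chroma_db` path, and the `docs` collection name are assumptions; consult `ingest.py` for the actual values:

```python
from pathlib import Path

import chromadb
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

# Recursively split the document into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("docs")

text = Path("data/sample_docs/sample.md").read_text(encoding="utf-8")
chunks = splitter.split_text(text)

# Store each chunk with its embedding in the persistent Chroma collection.
collection.add(
    ids=[f"sample-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)
```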
This repo uses a simple, robust strategy:
- It first tries to call the `ollama` CLI (e.g., `ollama run <model> "<prompt>"`; the prompt is a positional argument) using `subprocess`, and uses its stdout as the response.
- If the `ollama` CLI is not available, it falls back to a local `transformers` text-generation pipeline (Hugging Face) with `gpt2`-style models or any other local model you have downloaded.

This ensures the repo is immediately runnable without any external paid APIs.
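A minimal sketch of that strategy, assuming the prompt is passed to `ollama run` on stdin (the `generate` function and the model names are illustrative; `rag_pipeline.py` holds the real implementation):

```python
import shutil
import subprocess

def generate(prompt: str, model: str = "mistral") -> str:
    if shutil.which("ollama") is not None:
        # Pass the prompt on stdin; `ollama run` prints the completion to stdout.
        result = subprocess.run(
            ["ollama", "run", model],
            input=prompt,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    # Fallback: local Hugging Face text-generation pipeline.
    from transformers import pipeline
    generator = pipeline("text-generation", model="gpt2")
    return generator(prompt, max_new_tokens=200)[0]["generated_text"]
```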