Date: 2025-09-03
This project is a local Insights Assistant. You ingest PDF/CSV files into a vector database (Chroma) via a FastAPI server. A client (CLI or Streamlit) searches those embeddings and then calls an LLM summarizer (Ollama or OpenAI) to produce a concise, cited answer grounded in retrieved snippets.
- Server (mcp_server.py): exposes /tools/ingest_pdf, /tools/ingest_csv, /tools/search_docs. Uses LangChain loaders → splitter → embeddings → Chroma (persistent in ./db).
- Client library (mcp_host.py): tiny HTTP wrapper with retries/backoff and friendly exceptions (see the sketch after this list).
- Clients:
  - CLI (client_app.py): ingest-* and ask.
  - Streamlit (streamlit_app.py): tabs for Ingest and Ask + a preview panel.
- Summarizer (summarizer.py): calls Ollama (local) or OpenAI to turn the top-k snippets into a clean, cited answer. (Required in this setup.)
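For orientation, here is a minimal sketch of what a host wrapper like mcp_host.py could look like. The function name, error class, and retry policy are illustrative assumptions, not the actual module contents:

```python
# Illustrative sketch of an HTTP wrapper in the spirit of mcp_host.py (names are assumptions).
import time
import requests

SERVER_URL = "http://127.0.0.1:8799"

class MCPHostError(RuntimeError):
    """Friendly error raised once all retries are exhausted."""

def call_tool(tool: str, payload: dict, retries: int = 3, backoff: float = 1.0) -> dict:
    """POST to /tools/<tool> with simple exponential backoff between attempts."""
    last_exc = None
    for attempt in range(retries):
        try:
            resp = requests.post(f"{SERVER_URL}/tools/{tool}", json=payload, timeout=60)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as exc:
            last_exc = exc
            time.sleep(backoff * (2 ** attempt))  # wait 1s, 2s, 4s, ...
    raise MCPHostError(f"Cannot reach server for tool '{tool}': {last_exc}")
```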
ASCII = r"""
+------+          +--------------------+      +-------------------------+
| User |--uses--->|    Streamlit UI    |      |        CLI Client       |
+------+          | (streamlit_app.py) |      |     (client_app.py)     |
                  +----------+---------+      +-----------+-------------+
                              \                          /
                               \   via MCP Host (HTTP client)
                                v                        v
                  +--------+-------------------------------+
                  |         MCP Host (mcp_host.py)         |
                  |  Retries • Backoff • Friendly errors   |
                  +-------------------+--------------------+
                                      |
                                      |  POST /tools/*
                                      v
                         +------------+-------------+
                         |      FastAPI Server      |
                         |      (mcp_server.py)     |
                         | ingest_pdf / ingest_csv  |
                         |        search_docs       |
                         +------------+-------------+
                                      |
      +-------------------------------+------------------------------+
      |             LangChain RAG pipeline (server-side)             |
      |      Loaders -> Splitter -> Embeddings -> Chroma (./db)      |
      +-----------+---------------+---------------+------------------+
                  |               |               |
                  | docs          | chunks        | vectors
                  v               v               v
              [files]         [chunks]     [persisted index]

REQUIRED (client-side):
+---------------------------------------------------------------+
|            summarizer.py  →  Ollama/OpenAI (REST)              |
|   Produces the ONLY answer shown to users (cited, grounded).   |
+---------------------------------------------------------------+
""".strip("\n")
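The server-side path (Loaders -> Splitter -> Embeddings -> Chroma) can be sketched with standard LangChain community APIs as below. This is not the actual mcp_server.py; the chunk sizes and request payload shape are assumptions:

```python
# Sketch of the ingest path only; names, chunk sizes, and payload shape are assumptions.
import os
from fastapi import FastAPI, HTTPException
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

app = FastAPI()
DB_DIR = os.getenv("DB_DIR", "./db")
EMBED_MODEL = os.getenv("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)

@app.post("/tools/ingest_pdf")
def ingest_pdf(payload: dict) -> dict:
    path = payload.get("path", "")
    if not os.path.exists(path):
        raise HTTPException(status_code=400, detail=f"File not found: {path}")
    docs = PyPDFLoader(path).load()                          # Loader: one Document per page
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    chunks = splitter.split_documents(docs)                  # Splitter
    store = Chroma(persist_directory=DB_DIR, embedding_function=embeddings)
    store.add_documents(chunks)                              # Embeddings -> Chroma (./db)
    # Older chromadb/langchain combinations may also need store.persist() here.
    return {"chunks": len(chunks)}
```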
- Python: 3.11+ (3.12 works; on 3.13, remove any from __future__ lines or place them at the very top of the file)
- Pip: fastapi, uvicorn[standard], langchain, langchain-community, chromadb, pypdf, sentence-transformers, python-dotenv, requests, streamlit, pandas
- External: Ollama (ollama serve; ollama pull llama3.1:8b)
- Ports: Server 8799 • Streamlit 8501 • Ollama 11434
DB_DIR=./db
EMBED_MODEL=sentence-transformers/all-MiniLM-L6-v2
SUMMARIZER_PROVIDER=ollama
OLLAMA_URL=http://127.0.0.1:11434
OLLAMA_MODEL=llama3.1:8b
If using OpenAI instead:
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini
Load in Python near the top:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
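With these variables loaded, the Ollama branch of a summarizer can be sketched as follows. This illustrates the Ollama /api/generate REST call only; it is not the actual summarizer.py, and the prompt wording is an assumption:

```python
# Hedged sketch of an Ollama-backed summarizer (not the actual summarizer.py).
import os
import requests

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://127.0.0.1:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.1:8b")

def summarize(question: str, snippets: list[str]) -> str:
    """Turn top-k retrieved snippets into a concise, cited answer."""
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Answer the question using ONLY the numbered snippets below and cite them like [1], [2].\n\n"
        f"Snippets:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": OLLAMA_MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]  # Ollama returns the full completion under "response"
```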
## Run Order
1) Start Ollama
ollama serve
ollama pull llama3.1:8b
2) Start the FastAPI server (in your .venv)
.\.venv\Scripts\Activate.ps1
python -m uvicorn mcp_server:app --host 127.0.0.1 --port 8799 --reload
3) Start a client
- Streamlit UI:
.\.venv\Scripts\Activate.ps1
python -m streamlit run streamlit_app.py # http://localhost:8501
- CLI:
.\.venv\Scripts\Activate.ps1
python client_app.py ingest-pdf .\data\doc.pdf
python client_app.py ask "Paste a phrase from your PDF"
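Under the hood, ask roughly chains a search call against the server with the summarizer. A sketch of that flow, reusing the helper names from the sketches above (the search_docs payload and response shapes are also assumptions):

```python
# Rough shape of the "ask" flow (illustrative; not the actual client_app.py).
from mcp_host import call_tool      # assumed helper name, see the host sketch above
from summarizer import summarize    # assumed signature: summarize(question, snippets)

def ask(question: str, k: int = 4) -> str:
    hits = call_tool("search_docs", {"query": question, "k": k})   # payload shape assumed
    snippets = [h["text"] for h in hits.get("results", [])]        # response shape assumed
    return summarize(question, snippets)

if __name__ == "__main__":
    print(ask("Paste a phrase from your PDF"))
```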
## Sanity Checks
- where python; python -V
- curl http://127.0.0.1:8799/
- curl http://127.0.0.1:11434/api/tags
- python client_app.py ingest-pdf .\data\doc.pdf
- python client_app.py ask "your phrase"
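If curl is unavailable, the same endpoint checks can be scripted in Python:

```python
# Mirrors the curl checks above: server root and Ollama tag list.
import requests

def check(name: str, url: str) -> None:
    try:
        r = requests.get(url, timeout=5)
        print(f"[OK]   {name}: HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"[FAIL] {name}: {exc}")

check("FastAPI server", "http://127.0.0.1:8799/")
check("Ollama", "http://127.0.0.1:11434/api/tags")
```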
## Common Issues (and fixes)
Cannot reach server → start uvicorn; test /; port 8799; firewall/port conflict (netstat -ano | findstr :8799)
Wrong Python (global vs .venv) → activate .venv or run .\.venv\Scripts\python -m ...
from __future__ SyntaxError → remove or move to file top (3.11+ doesn’t need it)
Few/zero chunks → the PDF is short or scanned; pre-OCR the file or use an OCR-capable loader, and/or reduce the chunk size to 500/100 (see the splitter sketch below)
LLM summary missing (required) → ollama serve, ollama pull llama3.1:8b, set OLLAMA_* envs
File not found (400) → pass a correct absolute/relative path
Proxy interferes with localhost → NO_PROXY=127.0.0.1,localhost
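For the few/zero-chunks fix, the 500/100 values refer to the text splitter's chunk size and overlap. Assuming the server uses LangChain's RecursiveCharacterTextSplitter, the change looks like this:

```python
# Smaller chunks help short or sparse PDFs (assumes RecursiveCharacterTextSplitter server-side).
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(docs)  # `docs` comes from the PDF/CSV loader
```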
---