See also: docs/REQUIREMENTS.md for a friendly overview, architecture diagram, and quick commands.
Architecture details: docs/ARCHITECTURE.md
A minimal, production-ready skeleton for a company policy QA system using:
- Chroma as the vector store
- Snowflake Arctic embeddings via Ollama (`snowflake-arctic-embed:335m`)
- A local LLM via Ollama for orchestration
- PII masking before any LLM calls (minimal sketch below)
- Guard + Orchestrator agents
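Masking happens before text reaches either the embedding model or the chat model. The actual masking lives in the PII package under `src/rag`; the snippet below is only a minimal sketch of the idea, with hypothetical regex patterns and a hypothetical `mask_pii` helper name.

```python
import re

# Hypothetical, minimal PII masker: the real package may use NER or more patterns.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def mask_pii(text: str) -> str:
    """Replace recognisable PII with typed placeholders before any LLM call."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

if __name__ == "__main__":
    print(mask_pii("Contact Jane at jane.doe@acme.com or +44 20 7946 0958."))
    # -> "Contact Jane at [EMAIL] or [PHONE]."
```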
Repository layout:
- `src/rag/`: packages for agents, chunking, embeddings, vector store, LLM client, PII, prompts (vector-store sketch below)
- `data/`: input PDFs (already present)
- `app.py`: CLI entry point
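For orientation, the vector-store package wraps Chroma's persistent client. The sketch below shows the general shape under assumed names (the `policies` collection and the `source` metadata field are illustrative, not the skeleton's actual schema); embeddings are passed in explicitly because the skeleton uses Ollama's Arctic model rather than Chroma's built-in embedder.

```python
import chromadb

# Minimal sketch of the vector-store wrapper; names are assumptions.
client = chromadb.PersistentClient(path=".chroma")  # matches the CHROMA_DIR default
collection = client.get_or_create_collection(name="policies")

def upsert_chunks(ids, texts, embeddings, sources):
    """Store chunk texts with embeddings produced by the Ollama embedding model."""
    collection.upsert(
        ids=ids,
        documents=texts,
        embeddings=embeddings,
        metadatas=[{"source": s} for s in sources],
    )

def search(query_embedding, k=4):
    """Return the top-k chunks with their source filenames and distances."""
    return collection.query(query_embeddings=[query_embedding], n_results=k)
```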
Environment (PowerShell):
python -m venv .venv
. .venv/Scripts/Activate.ps1
pip install -r requirements.txt
cp .env.example .env
Install and run Ollama, then pull models:
winget install Ollama.Ollama
ollama pull snowflake-arctic-embed:335m
ollama pull llama3.1:8b
Optional configuration:
$Env:OLLAMA_HOST="http://127.0.0.1:11434"
$Env:OLLAMA_EMBED_MODEL="snowflake-arctic-embed:335m"
$Env:OLLAMA_CHAT_MODEL="llama3.1:8b"
$Env:CHROMA_DIR=".chroma"
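A sketch of how these settings are typically consumed, assuming plain `os.getenv` lookups with the defaults shown above (the skeleton's own config code may differ):

```python
import os

# Defaults mirror the values shown above; the real config module may differ.
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://127.0.0.1:11434")
if not OLLAMA_HOST.startswith(("http://", "https://")):
    OLLAMA_HOST = "http://" + OLLAMA_HOST  # the scheme is optional in the env var

EMBED_MODEL = os.getenv("OLLAMA_EMBED_MODEL", "snowflake-arctic-embed:335m")
CHAT_MODEL = os.getenv("OLLAMA_CHAT_MODEL", "llama3.1:8b")
CHROMA_DIR = os.getenv("CHROMA_DIR", ".chroma")
```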
Run Ollama on a custom port (if 11434 is slow/blocked):
# Example: run on 11435
$Env:OLLAMA_HOST = "127.0.0.1:11435" # scheme optional; defaults to http://
ollama serve   # binds to the address set in OLLAMA_HOST
# In another terminal (set OLLAMA_HOST again if it is a new window)
ollama pull snowflake-arctic-embed:335m
ollama pull llama3.1:8b
The app reads OLLAMA_HOST for both embeddings and chat.
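Both kinds of calls go through Ollama's HTTP API at that address. A minimal sketch, assuming the `requests` package and the models pulled earlier:

```python
import os
import requests

host = os.getenv("OLLAMA_HOST", "http://127.0.0.1:11434")
if not host.startswith("http"):
    host = "http://" + host

# Embed a chunk or query via Ollama's /api/embeddings endpoint.
emb = requests.post(
    f"{host}/api/embeddings",
    json={"model": "snowflake-arctic-embed:335m", "prompt": "sickness absence policy"},
    timeout=60,
).json()["embedding"]

# One non-streaming chat turn via Ollama's /api/chat endpoint.
chat = requests.post(
    f"{host}/api/chat",
    json={
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Summarise the retrieved policy text."}],
        "stream": False,
    },
    timeout=120,
).json()["message"]["content"]

print(len(emb), chat[:80])
```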
python app.py index
python app.py query "What is our sickness absence policy?"
Outputs an answer and a Sources list with filenames and scores.
- PII masking is applied to user inputs and LLM payloads.
- Chunking: default HYBRID_TOK (heading-aware splitting plus token-aware sentence packing). Configure via env (see the chunking sketch after this list):
  $Env:CHUNK_MODE="HYBRID_TOK"   (or HYBRID, HEADING)
  $Env:CHUNK_MAX_TOKENS="500"
  $Env:CHUNK_OVERLAP_TOKENS="60"
  $Env:CHUNK_MAX_CHARS="1200"
  $Env:CHUNK_OVERLAP="150"
  These tunables are read by `IndexPipeline`.
- Prompts live under `src/rag/prompts/`.
- Embeddings: `snowflake-arctic-embed:335m`
- Chat: any local model; the default is `llama3.1:8b`
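The chunking sketch referenced above illustrates the token-aware packing behind HYBRID_TOK using a crude whitespace token count; the real chunker in `src/rag` also handles headings and may use a proper tokenizer, so treat this only as a sketch of the packing and overlap logic.

```python
import re

def sentences(text: str) -> list[str]:
    # Naive sentence split; the real chunker may be smarter.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def n_tokens(text: str) -> int:
    # Crude estimate; a real implementation would use the model's tokenizer.
    return len(text.split())

def pack(text: str, max_tokens: int = 500, overlap_tokens: int = 60) -> list[str]:
    """Pack sentences into chunks up to max_tokens, carrying a token overlap forward."""
    chunks, current, size = [], [], 0
    for sent in sentences(text):
        if current and size + n_tokens(sent) > max_tokens:
            chunks.append(" ".join(current))
            # Start the next chunk with the tail of the previous one as overlap.
            tail, tail_size = [], 0
            for prev in reversed(current):
                if tail_size + n_tokens(prev) > overlap_tokens:
                    break
                tail.insert(0, prev)
                tail_size += n_tokens(prev)
            current, size = tail, tail_size
        current.append(sent)
        size += n_tokens(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With the defaults above this yields chunks of roughly 500 whitespace tokens, each sharing about 60 tokens with its predecessor.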
uvicorn src.rag.api:app --host 0.0.0.0 --port 8000 --reload
- POST /index → { chunks_indexed }
- POST /query { query, k?, temperature? } → { answer, citations, guardrail_flags }
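Example client calls, assuming the uvicorn server above is running locally and `requests` is installed:

```python
import requests

BASE = "http://127.0.0.1:8000"

# Build (or rebuild) the index from the PDFs under data/.
print(requests.post(f"{BASE}/index", timeout=600).json())  # e.g. {"chunks_indexed": ...}

# Ask a question; k and temperature are optional per the contract above.
resp = requests.post(
    f"{BASE}/query",
    json={"query": "What is our sickness absence policy?", "k": 4, "temperature": 0.1},
    timeout=120,
).json()
print(resp["answer"])
print(resp["citations"])
print(resp["guardrail_flags"])
```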