Policy Agentic RAG (Base)

See also: docs/REQUIREMENTS.md for a friendly overview, architecture diagram, and quick commands.

Architecture details: docs/ARCHITECTURE.md

A minimal, production-ready skeleton for a company policy QA system using:

  • Chroma as vector store
  • Snowflake Arctic embeddings via Ollama (snowflake-arctic-embed:335m)
  • Local LLM via Ollama for orchestration
  • PII masking before any LLM calls
  • Guard + Orchestrator agents

Structure

  • src/rag/ packages for agents, chunking, embeddings, vector store, LLM client, PII, prompts
  • data/ input PDFs (already present)
  • app.py CLI entry point

Setup

Environment (PowerShell):

python -m venv .venv
. .venv/Scripts/Activate.ps1
pip install -r requirements.txt
cp .env.example .env

Install and run Ollama, then pull models:

winget install Ollama.Ollama
ollama pull snowflake-arctic-embed:335m
ollama pull llama3.1:8b

Optional configuration:

$Env:OLLAMA_HOST="http://127.0.0.1:11434"
$Env:OLLAMA_EMBED_MODEL="snowflake-arctic-embed:335m"
$Env:OLLAMA_CHAT_MODEL="llama3.1:8b"
$Env:CHROMA_DIR=".chroma"

Run Ollama on a custom port (if 11434 is slow/blocked):

# Example: run on 11435
$Env:OLLAMA_HOST = "127.0.0.1:11435"   # scheme optional; defaults to http://
ollama serve                            # ollama serve takes no --host/--port flags; it binds to the address in OLLAMA_HOST

# In another terminal (same session or set OLLAMA_HOST again if new window)
ollama pull snowflake-arctic-embed:335m
ollama pull llama3.1:8b

The app reads OLLAMA_HOST for both embeddings and chat.
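
In sketch form, that resolution might look like this (illustrative only; the real client code lives under src/rag/ and may differ):

import os

def ollama_base_url() -> str:
    # Accept "host:port" or a full URL; default the scheme to http://
    host = os.environ.get("OLLAMA_HOST", "http://127.0.0.1:11434")
    return host if host.startswith("http") else f"http://{host}"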

Index PDFs

python app.py index
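
Under the hood this is driven by IndexPipeline. The sketch below shows the general shape (extract text, chunk, embed, upsert into Chroma); the collection name, naive paragraph split, and pypdf-based extraction are assumptions, not the repo's actual code:

# Illustrative indexing sketch - the real pipeline is IndexPipeline in src/rag/
import os
from pathlib import Path

import chromadb
import requests
from pypdf import PdfReader  # assumption: any PDF text extractor works here

BASE = os.environ.get("OLLAMA_HOST", "http://127.0.0.1:11434")
BASE = BASE if BASE.startswith("http") else f"http://{BASE}"
EMBED_MODEL = os.environ.get("OLLAMA_EMBED_MODEL", "snowflake-arctic-embed:335m")

def embed(text: str) -> list[float]:
    resp = requests.post(f"{BASE}/api/embeddings",
                         json={"model": EMBED_MODEL, "prompt": text}, timeout=120)
    resp.raise_for_status()
    return resp.json()["embedding"]

client = chromadb.PersistentClient(path=os.environ.get("CHROMA_DIR", ".chroma"))
collection = client.get_or_create_collection("policies")  # collection name is an assumption

for pdf in Path("data").glob("*.pdf"):
    text = "\n".join(page.extract_text() or "" for page in PdfReader(str(pdf)).pages)
    # Naive paragraph split; see the token-aware packing sketch under Notes
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    collection.add(
        ids=[f"{pdf.name}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=[embed(c) for c in chunks],
        metadatas=[{"source": pdf.name} for _ in chunks],
    )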

Ask Questions

python app.py query "What is our sickness absence policy?"

Outputs an answer and a Sources list with filenames and scores.
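
A sketch of the retrieval-and-answer step, continuing from the indexing sketch above (same BASE, embed(), and collection). The real flow also applies PII masking and the Guard agent before any LLM call, and the prompt wording here is an assumption:

# Illustrative retrieval + answer flow
import os
import requests

def answer(question: str, k: int = 4) -> str:
    hits = collection.query(query_embeddings=[embed(question)], n_results=k,
                            include=["documents", "metadatas", "distances"])
    context = "\n\n".join(hits["documents"][0])
    resp = requests.post(f"{BASE}/api/chat", json={
        "model": os.environ.get("OLLAMA_CHAT_MODEL", "llama3.1:8b"),
        "messages": [
            {"role": "system", "content": "Answer only from the provided policy excerpts."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
        "stream": False,
    }, timeout=300)
    resp.raise_for_status()
    return resp.json()["message"]["content"]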

Notes

  • PII masking is applied to user inputs and LLM payloads.
  • Chunking: the default mode is HYBRID_TOK (heading-aware splitting plus token-aware sentence packing). Tunables are read by IndexPipeline and configured via env; a minimal sketch of the packing step follows this list:
    • $Env:CHUNK_MODE="HYBRID_TOK" (or HYBRID, HEADING)
    • $Env:CHUNK_MAX_TOKENS="500"
    • $Env:CHUNK_OVERLAP_TOKENS="60"
    • $Env:CHUNK_MAX_CHARS="1200"
    • $Env:CHUNK_OVERLAP="150"
  • Prompts live under src/rag/prompts/.
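
A minimal sketch of the token-aware sentence packing behind HYBRID_TOK, using a whitespace token count as a stand-in for the pipeline's real tokenizer (the actual chunker in src/rag/ may differ):

# Illustrative token-aware sentence packing
import os
import re

MAX_TOKENS = int(os.environ.get("CHUNK_MAX_TOKENS", "500"))
OVERLAP_TOKENS = int(os.environ.get("CHUNK_OVERLAP_TOKENS", "60"))

def n_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def pack_sentences(section: str) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", section)
    chunks, current = [], []
    for s in sentences:
        if current and n_tokens(" ".join(current + [s])) > MAX_TOKENS:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward until roughly OVERLAP_TOKENS of overlap
            tail, count = [], 0
            for prev in reversed(current):
                count += n_tokens(prev)
                tail.insert(0, prev)
                if count >= OVERLAP_TOKENS:
                    break
            current = tail
        current.append(s)
    if current:
        chunks.append(" ".join(current))
    return chunks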

Models in Ollama

  • Embeddings: snowflake-arctic-embed:335m
  • Chat: choose any local model, default llama3.1:8b

Run API server

uvicorn src.rag.api:app --host 0.0.0.0 --port 8000 --reload

Endpoints:

  • POST /index → { chunks_indexed }
  • POST /query { query, k?, temperature? } → { answer, citations, guardrail_flags }
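
Example call to the query endpoint once the server is running (the payload fields match the schema above; host and port assume the uvicorn command shown):

import requests

r = requests.post("http://localhost:8000/query",
                  json={"query": "What is our sickness absence policy?", "k": 4})
r.raise_for_status()
body = r.json()
print(body["answer"])
print(body["citations"])
print(body["guardrail_flags"])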
