Claude Agent RAG

Author: Gregory Zemskov (head@extractum.io)

A RAG (Retrieval-Augmented Generation) agent built on the Claude Agent SDK. This project demonstrates how to build an AI agent that searches a document knowledge base and uses the retrieved passages to answer questions.

Features

Claude Agent SDK Integration: Uses the official Claude Agent SDK for agentic interactions
RAG Search Skill: Custom skill for searching document knowledge bases with boolean query support
Multi-Format Support: Search across plain text, Markdown, Office documents (DOCX, ODT, RTF), EPUB, PDF, and Excel files
Session Management: Full logging of agent sessions with JSONL conversation history
Configurable Models: Choose the Claude model for each run with --model

Project Structure

in_file_rag/
├── agent_rag.py              # Main agent script
├── task.md                   # Task description for the agent
├── requirements.txt          # Python dependencies
├── data/
│   └── knowledge_base/       # Your document collection (DOCX, PDF, TXT, etc.)
├── skills/
│   └── rag-search-chunks-in-files/
│       ├── SKILL.md          # Skill documentation
│       ├── ugp-rag           # RAG search tool (Python script)
│       ├── boolean_parser.py # Boolean query parser
│       └── prerequisites.md  # Required system tools
├── sessions/                 # Generated session data
└── logs/                     # Agent execution logs

Requirements

Python

Python 3.10+
Dependencies listed in requirements.txt

System Tools (for RAG search)

The RAG search skill requires additional system tools depending on your document types:

| Tool | Package | Required For |
|------|---------|--------------|
| `ugrep` | homebrew/apt | Core search (required) |
| `pandoc` | homebrew/apt | DOCX, ODT, RTF, EPUB |
| `pdftotext` | poppler | PDF files |
| `xlsx2csv` | pip | Excel files |

See skills/rag-search-chunks-in-files/prerequisites.md for detailed installation instructions.
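To see which of these tools are already installed, you can look each one up on PATH. The sketch below is not part of the repository: it uses only the standard library (`shutil.which`), and the tool names and descriptions are copied from the table above. `check_tools` is a hypothetical helper; the `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil

# Tool names from the prerequisites table; only ugrep is needed for every search.
TOOLS = {
    "ugrep": "core search (required)",
    "pandoc": "DOCX, ODT, RTF, EPUB",
    "pdftotext": "PDF files",
    "xlsx2csv": "Excel files",
}


def check_tools(tools=TOOLS, which=shutil.which):
    """Return {tool_name: True/False} depending on whether the tool is on PATH."""
    return {name: which(name) is not None for name in tools}


if __name__ == "__main__":
    for name, found in check_tools().items():
        status = "ok" if found else "MISSING"
        print(f"{name:10} {status:8} {TOOLS[name]}")
```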

Installation

Clone the repository:

```
git clone https://github.com/yourusername/in_file_rag.git
cd in_file_rag
```

Create virtual environment:

```
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install Python dependencies:
```
pip install -r requirements.txt
```
Install system tools (macOS):
```
brew install ugrep pandoc poppler
```

Configure API key:

```
cp .env.example .env
# Edit .env and add your Anthropic API key
```

Add documents to knowledge base:

```
mkdir -p data/knowledge_base
# Copy your documents (DOCX, PDF, TXT, MD, etc.) to data/knowledge_base/
```

Usage

Basic Usage

Edit task.md with your task, then run:

```
python agent_rag.py
```

Command Line Options

```
# Run with default task file (task.md)
python agent_rag.py

# Run with a specific task file
python agent_rag.py --task-file my_task.md

# Run with inline task text
python agent_rag.py --task "Search for fire safety regulations in the knowledge base"

# Use a specific model
python agent_rag.py --model claude-sonnet-4-5

# Specify session ID
python agent_rag.py --session-id my-session-001
```
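If you want to extend the CLI, here is a sketch of how the options listed above could be declared with `argparse`. It is not the code in agent_rag.py. Two behaviours are assumptions: `--task` taking precedence over `--task-file`, and a random session ID being generated when `--session-id` is omitted. `build_parser`, `resolve_task`, and `resolve_session_id` are hypothetical names.

```python
import argparse
import uuid
from pathlib import Path

DEFAULT_MODEL = "claude-sonnet-4-5"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Claude Agent RAG (illustrative CLI sketch)")
    parser.add_argument("--task-file", default="task.md", help="file containing the task (default: task.md)")
    parser.add_argument("--task", help="inline task text (assumed to take precedence over --task-file)")
    parser.add_argument("--model", default=DEFAULT_MODEL, help="Claude model to use")
    parser.add_argument("--session-id", help="session ID (assumed: random if omitted)")
    return parser


def resolve_task(args: argparse.Namespace) -> str:
    """Return the task text: inline --task wins, otherwise read --task-file."""
    if args.task:
        return args.task
    return Path(args.task_file).read_text(encoding="utf-8")


def resolve_session_id(args: argparse.Namespace) -> str:
    """Use the given session ID, or generate a random 32-character hex ID."""
    return args.session_id or uuid.uuid4().hex
```

The inline-text path avoids touching the filesystem, so `--task` works even when task.md does not exist.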

Example Tasks

```
# Example task.md

Search the knowledge base for information about accessibility requirements
for parking spaces. Summarize the key requirements and cite the source documents.
```

RAG Search Skill

The agent has access to a RAG search skill (rag-search-chunks-in-files) that can:

Search across multiple document formats
Use boolean queries (AND, OR, NOT)
Support wildcard patterns (e.g., accessib*)
Return structured chunks with context
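The repository's boolean_parser.py is not reproduced here. To illustrate the query semantics listed above, the following is a minimal standalone sketch that decides whether a single text chunk satisfies a boolean query. Its grammar is an assumption and may differ from the real tool: upper-case `AND`/`OR`/`NOT` with `NOT` binding tighter than `AND` and `AND` tighter than `OR`, parentheses for grouping, double quotes for phrases, case-insensitive whole-word matching, and `*` matching any run of word characters. `matches` is a hypothetical function name.

```python
import re

# Tokens: parentheses, "quoted phrases", or bare words (which may contain *).
_TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')
_OPERATORS = {"AND", "OR", "NOT"}


def _term_regex(term):
    """Case-insensitive regex for a word or phrase; * matches any run of word characters."""
    words = [re.escape(w).replace(r"\*", r"\w*") for w in term.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


class _Parser:
    """Recursive descent: OR binds loosest, then AND, then NOT."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _take(self):
        tok = self._peek()
        self.pos += 1
        return tok

    def parse(self):
        node = self._or()
        if self._peek() is not None:
            raise ValueError(f"unexpected token {self._peek()!r}")
        return node

    def _or(self):
        node = self._and()
        while self._peek() == "OR":
            self._take()
            node = ("OR", node, self._and())
        return node

    def _and(self):
        node = self._not()
        while self._peek() == "AND":
            self._take()
            node = ("AND", node, self._not())
        return node

    def _not(self):
        if self._peek() == "NOT":
            self._take()
            return ("NOT", self._not())
        return self._atom()

    def _atom(self):
        tok = self._take()
        if tok is None or tok in _OPERATORS or tok == ")":
            raise ValueError(f"expected a term, got {tok!r}")
        if tok == "(":
            node = self._or()
            if self._take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        text = tok[1:-1] if tok.startswith('"') else tok
        return ("TERM", _term_regex(text))


def _evaluate(node, text):
    kind = node[0]
    if kind == "TERM":
        return node[1].search(text) is not None
    if kind == "NOT":
        return not _evaluate(node[1], text)
    if kind == "AND":
        return _evaluate(node[1], text) and _evaluate(node[2], text)
    return _evaluate(node[1], text) or _evaluate(node[2], text)


def matches(query, text):
    """True if the chunk `text` satisfies the boolean `query`."""
    return _evaluate(_Parser(_TOKEN.findall(query)).parse(), text)
```

Because terms are anchored at word boundaries, `accessib*` matches "Accessible" but not "inaccessible".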

Direct Tool Usage

```
# Simple search
./skills/rag-search-chunks-in-files/ugp-rag "fire safety" data/knowledge_base/

# Multiple patterns (OR)
./skills/rag-search-chunks-in-files/ugp-rag -e "accessib*" -e "disab*" data/knowledge_base/

# Boolean query
./skills/rag-search-chunks-in-files/ugp-rag -q '"fire safety" AND "egress"' data/knowledge_base/
```
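To call the tool from Python instead of a shell, pass the arguments as a list so that the boolean query reaches ugp-rag verbatim, with no shell quoting to get wrong. This sketch relies only on the three invocation forms shown above: a single pattern, repeated `-e`, and `-q`. `build_command` and `search` are hypothetical helpers, and the returned text is whatever ugp-rag prints.

```python
import subprocess
from pathlib import Path

UGP_RAG = Path("skills/rag-search-chunks-in-files/ugp-rag")


def build_command(path, patterns=(), query=None, tool=UGP_RAG):
    """Assemble argv using only the documented forms: one pattern, repeated -e, or -q."""
    cmd = [str(tool)]
    if query is not None:
        cmd += ["-q", query]
    elif len(patterns) == 1:
        cmd.append(patterns[0])
    elif patterns:
        for pattern in patterns:
            cmd += ["-e", pattern]
    else:
        raise ValueError("need at least one pattern or a query")
    cmd.append(str(path))
    return cmd


def search(path, patterns=(), query=None):
    """Run ugp-rag and return its stdout.

    check=True is deliberately not used: grep-style tools commonly exit
    non-zero when nothing matches, which is not an error here.
    """
    result = subprocess.run(build_command(path, patterns, query), capture_output=True, text=True)
    return result.stdout
```

For example, `search("data/knowledge_base/", query='"fire safety" AND "egress"')` mirrors the third shell command.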

Configuration

Agent Options

Edit constants in agent_rag.py:

```
DEFAULT_MAX_TURNS = 50          # Maximum conversation turns
DEFAULT_MAX_BUDGET_USD = 10.0   # Budget limit in USD
DEFAULT_MODEL = "claude-sonnet-4-5"
DEFAULT_PERMISSION_MODE = "acceptEdits"  # Auto-accept file edits
```

Permission Modes

| Mode | Description |
|------|-------------|
| `default` | CLI prompts for dangerous tools |
| `acceptEdits` | Auto-accept file edits |
| `plan` | Planning mode, no execution |
| `bypassPermissions` | Allow all tools (use with caution) |
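If you move these settings out of module-level constants, it helps to validate them once at startup so that a typo in the permission mode fails immediately instead of partway through a session. The sketch below is hypothetical: `AgentConfig` is not in agent_rag.py, its defaults copy the constants listed under Agent Options, and the allowed modes copy the table above. How the values are then handed to the SDK is not shown.

```python
from dataclasses import dataclass

# The four modes documented in the Permission Modes table.
PERMISSION_MODES = {"default", "acceptEdits", "plan", "bypassPermissions"}


@dataclass(frozen=True)
class AgentConfig:
    max_turns: int = 50
    max_budget_usd: float = 10.0
    model: str = "claude-sonnet-4-5"
    permission_mode: str = "acceptEdits"

    def __post_init__(self):
        # Reject invalid values as soon as the config is constructed.
        if self.permission_mode not in PERMISSION_MODES:
            raise ValueError(
                f"unknown permission mode {self.permission_mode!r}; "
                f"expected one of {sorted(PERMISSION_MODES)}"
            )
        if self.max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        if self.max_budget_usd <= 0:
            raise ValueError("max_budget_usd must be positive")
```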

Output

Session Logs

Each session creates:

sessions/{session_id}/conversation.jsonl - Structured conversation log
logs/{session_id}.log - Human-readable text log

JSONL Format

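Each line of conversation.jsonl holds one JSON object. A typical record, pretty-printed here for readability:

```
{
  "timestamp": "2024-01-15T10:30:00.000000",
  "type": "AssistantMessage",
  "content": "I found 3 relevant documents...",
  "model": "claude-sonnet-4-5",
  "content_blocks": [...]
}
```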
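Because the log is JSON Lines, appending a record means writing one `json.dumps(...)` per line, and reading it back means parsing line by line. This sketch is not the project's logger. `append_record` and `read_records` are hypothetical helpers, and the `sessions/{session_id}/conversation.jsonl` path follows the Session Logs layout. The timestamp uses local time in ISO 8601 format; `isoformat()` omits the microseconds when they are zero.

```python
import json
from datetime import datetime
from pathlib import Path


def append_record(session_dir, record):
    """Append one record as a single JSON line; a timestamp is added unless the record has one."""
    path = Path(session_dir) / "conversation.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"timestamp": datetime.now().isoformat(), **record}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return path


def read_records(session_dir, msg_type=None):
    """Parse conversation.jsonl, optionally keeping only records whose "type" equals msg_type."""
    path = Path(session_dir) / "conversation.jsonl"
    records = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            rec = json.loads(line)
            if msg_type is None or rec.get("type") == msg_type:
                records.append(rec)
    return records
```

Appending one line per event keeps the log usable even if a session is interrupted: every completed line is still valid JSON.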

Development

Project Rules

The .cursor/rules/ directory contains development guidelines:

Python standards
Type annotations
Logging configuration
Error handling patterns

Adding New Skills

Create a new directory under skills/
Add a SKILL.md with skill documentation
Implement the skill tool
Add prerequisite documentation if needed
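Claude Agent Skills are described by a SKILL.md that begins with YAML frontmatter containing `name` and `description`. The sketch below scaffolds the first two steps under that assumption. Check skills/rag-search-chunks-in-files/SKILL.md for this repository's actual layout before relying on it. `scaffold_skill` is a hypothetical helper, the body text is a placeholder, and the lowercase-hyphenated name rule follows the style of the existing skill directory.

```python
import re
from pathlib import Path

SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

# {title}

Describe when the agent should use this skill and how to call its tool.
"""


def scaffold_skill(skills_root, name, description):
    """Create skills_root/<name>/SKILL.md with frontmatter; refuses to overwrite an existing skill."""
    if not re.fullmatch(r"[a-z0-9]+(?:-[a-z0-9]+)*", name):
        raise ValueError("skill name should be lowercase words joined by hyphens")
    skill_dir = Path(skills_root) / name
    skill_dir.mkdir(parents=True, exist_ok=False)  # FileExistsError if the skill already exists
    title = name.replace("-", " ").title()
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(SKILL_TEMPLATE.format(name=name, description=description, title=title), encoding="utf-8")
    return skill_md
```

For example, `scaffold_skill("skills", "summarize-tables", "Summarize tables found in documents.")` creates skills/summarize-tables/SKILL.md. The tool and prerequisites.md are still up to you.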

Troubleshooting

"ANTHROPIC_API_KEY not found"

Ensure your .env file contains a valid API key:

```
ANTHROPIC_API_KEY=sk-ant-api03-...
```
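To check where the key is (or is not) coming from, you can run a small standalone lookup. It checks the process environment first and then falls back to reading the .env file. This is a hypothetical diagnostic, not the project's loader. Its .env parsing is deliberately simplified: it handles plain `KEY=VALUE` lines and `#` comments and strips surrounding quotes, but it does not handle `export` prefixes or multi-line values.

```python
import os
from pathlib import Path


def load_api_key(env_file=".env", var="ANTHROPIC_API_KEY"):
    """Return the key from the environment, falling back to a simple KEY=VALUE .env file."""
    key = os.environ.get(var)
    if key:
        return key
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == var:
                # Strip surrounding quotes; an empty value counts as missing.
                return value.strip().strip('"').strip("'") or None
    return None
```

If `load_api_key()` returns `None`, neither the environment nor .env in the current directory provides the key. A value that does not start with `sk-ant-` is probably not an Anthropic API key.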

"ugrep not found"

Install ugrep for RAG search functionality:

```
# macOS
brew install ugrep

# Ubuntu/Debian
sudo apt-get install ugrep
```

Documents not being searched

Check that required conversion tools are installed. See skills/rag-search-chunks-in-files/prerequisites.md.
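A quick way to find the affected files is to group the knowledge-base files by the converter they need and report those whose converter is missing from PATH. The sketch below is hypothetical and not part of the skill. Its extension-to-tool mapping is an assumption derived from the prerequisites table, with "Excel files" taken to mean `.xlsx`, which is the format xlsx2csv reads. Plain text and Markdown files need no converter, so they never appear in the result.

```python
import shutil
from collections import defaultdict
from pathlib import Path

# Extension -> external converter, per the prerequisites table (ugrep itself is needed for everything).
CONVERTERS = {
    ".docx": "pandoc", ".odt": "pandoc", ".rtf": "pandoc", ".epub": "pandoc",
    ".pdf": "pdftotext",
    ".xlsx": "xlsx2csv",
}


def unsearchable_files(kb_dir, which=shutil.which):
    """Map each missing converter to the file names in kb_dir (recursively) that need it."""
    missing = defaultdict(list)
    for path in sorted(Path(kb_dir).rglob("*")):
        tool = CONVERTERS.get(path.suffix.lower())
        if path.is_file() and tool and which(tool) is None:
            missing[tool].append(path.name)
    return dict(missing)
```

For example, `unsearchable_files("data/knowledge_base")` returns `{}` when every needed converter is installed.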

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please read the development guidelines in .cursor/rules/ before submitting PRs.

Repository: extractumio/file-based-rag-agent
