AGENTS.md

This document is the primary engineering guide for autonomous coding agents working in the UltraRAG repository.

Use this file as the source of truth for architecture, conventions, workflows, and safe change patterns. If CLAUDE.md exists, it should only point to this file.

1) Project Identity

UltraRAG is a lightweight RAG framework built around the Model Context Protocol (MCP). The key design choice is strict modularization: retrieval, prompting, generation, routing, memory, and evaluation are implemented as independent MCP servers orchestrated by YAML pipelines.

Current core metadata:

Package: ultrarag
Version: 0.3.0
Python: >=3.11, <3.13
CLI entrypoint: ultrarag = ultrarag.client:main
Package manager: uv ([tool.uv] package = true)

2) Repository Map (What Matters Most)

UltraRAG/
├── src/ultrarag/                    # Installable core package
│   ├── client.py                    # CLI + pipeline engine + run/build orchestration
│   ├── server.py                    # UltraRAG_MCP_Server (FastMCP extension)
│   ├── api.py                       # Python API wrappers (ToolCall, PipelineCall)
│   ├── cli.py                       # Rich banner and CLI visuals
│   ├── mcp_logging.py               # Central logging setup
│   ├── mcp_exceptions.py            # Node.js checks for remote MCP
│   └── utils.py                     # Subprocess lifecycle helpers
│
├── servers/                         # MCP microservices (each server is independent)
│   ├── retriever/
│   ├── generation/
│   ├── prompt/
│   ├── reranker/
│   ├── benchmark/
│   ├── evaluation/
│   ├── corpus/
│   ├── memory/
│   ├── router/
│   ├── custom/
│   ├── pageindex/
│   └── sayhello/
│
├── examples/
│   ├── demos/                       # UI-ready demo pipelines
│   └── experiments/                 # Experiment/research pipelines
│
├── ui/
│   ├── backend/                     # Flask backend + pipeline manager
│   └── frontend/                    # Vite + React + TypeScript
│
├── docs/                            # Docs and assets
├── script/                          # Utility scripts (deploy, case study, etc.)
├── pyproject.toml                   # Dependencies + package metadata
├── uv.lock                          # Locked dependency graph
├── Dockerfile*                      # Container variants
└── .gitignore

Important generated/derived files:

servers/*/server.yaml (generated by each server build tool)
examples/**/parameter/*_parameter.yaml (pipeline-merged parameters)
examples/**/server/*_server.yaml (pipeline-merged server config)
output/memory_*.json (per-run memory snapshots)

3) Mental Model of the System

Think of UltraRAG as a three-layer system:

Interface layer: CLI (ultrarag ...), UI (ultrarag show ui), and Python API (ToolCall, PipelineCall)
Orchestration layer: src/ultrarag/client.py (build, load_pipeline_context, execute_pipeline)
Execution layer: MCP servers in servers/*, each exposing tools/prompts over stdio (or remote MCP proxy)

The runtime contract is:

A pipeline YAML declares which servers to use and which steps to execute.
The client resolves I/O dependencies between steps.
Each leaf step calls exactly one MCP tool or prompt; loop and branch blocks only group leaf steps.
Outputs are saved to a shared variable pool and can feed downstream steps.

4) Two-Phase Execution Lifecycle

Phase A: Build

Command:

ultrarag build <pipeline.yaml>

What happens:

Reads servers: from the pipeline YAML.
For each referenced server, calls the server's build tool.
Produces:
- <pipeline_dir>/parameter/<pipeline_name>_parameter.yaml
- <pipeline_dir>/server/<pipeline_name>_server.yaml

Why this matters:

build materializes exact tool/prompt I/O metadata before runtime.
UI and runner rely on these generated artifacts for deterministic execution.
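
The two output locations follow from the pipeline path alone. A minimal sketch of that mapping, assuming <pipeline_name> is the YAML file name without its extension; the helper name generated_artifact_paths is hypothetical and is not part of the ultrarag package:

```python
from pathlib import Path


def generated_artifact_paths(pipeline_file: str) -> dict[str, Path]:
    """Return where `ultrarag build` is expected to write its two artifacts.

    Assumption: <pipeline_name> is the file name without its extension.
    """
    pipeline = Path(pipeline_file)
    name = pipeline.stem
    return {
        "parameter": pipeline.parent / "parameter" / f"{name}_parameter.yaml",
        "server": pipeline.parent / "server" / f"{name}_server.yaml",
    }


paths = generated_artifact_paths("examples/demos/LLM.yaml")
print(paths["parameter"].as_posix())
print(paths["server"].as_posix())
```

A helper like this is handy for checking whether generated configs are stale or missing before calling run.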

Phase B: Run

Command:

ultrarag run <pipeline.yaml> [--param path] [--is_demo]

What happens:

Loads generated server config + parameter config.
Creates fastmcp.Client transport config for each server.
Executes pipeline steps in order, including loop and branch.
Saves intermediate memory snapshots and writes output/memory_*.json.
Invokes cleanup tools (e.g., tools ending with vllm_shutdown) if present.

5) Pipeline DSL Reference

UltraRAG accepts mixed step forms:

5.1 Plain step

- retriever.retriever_search

5.2 Step with input/output remapping

- generation.generate:
    input:
      prompt_ls: custom_prompt_ls
    output:
      ans_ls: final_answer_ls

5.3 Loop block

- loop:
    times: 3
    steps:
    - retriever.retriever_search
    - generation.generate

5.4 Branch block

- branch:
    router:
    - router.route_query
    branches:
      need_retrieval:
      - retriever.retriever_search
      direct_answer:
      - generation.generate

5.5 Prompt vs Tool step semantics

Steps under the prompt server call client.get_prompt(...).
Non-prompt steps call client.call_tool(...).
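
The four step forms and the prompt/tool split can be told apart from the parsed YAML alone. The sketch below is illustrative, not the engine's actual code: classify_step and call_kind are hypothetical helpers, the tool name prompt.qa_rag is made up, and it assumes the prompt server is registered under the name prompt.

```python
from typing import Any


def classify_step(step: Any) -> str:
    """Name the DSL form of one parsed pipeline step."""
    if isinstance(step, str):
        return "plain"
    if isinstance(step, dict) and len(step) == 1:
        key = next(iter(step))
        if key in ("loop", "branch"):
            return key
        return "remapped"  # e.g. {"generation.generate": {"input": ..., "output": ...}}
    raise ValueError(f"unrecognized step form: {step!r}")


def call_kind(step_name: str) -> str:
    """Steps on the prompt server use get_prompt; everything else uses call_tool."""
    server, _, _tool = step_name.partition(".")
    return "get_prompt" if server == "prompt" else "call_tool"


# Parsed equivalents of the YAML snippets in sections 5.1 to 5.4.
steps = [
    "retriever.retriever_search",
    {"generation.generate": {"input": {"prompt_ls": "custom_prompt_ls"}}},
    {"loop": {"times": 3, "steps": ["retriever.retriever_search"]}},
    {"branch": {"router": ["router.route_query"], "branches": {}}},
]
forms = [classify_step(s) for s in steps]
print(forms)
print(call_kind("prompt.qa_rag"), call_kind("generation.generate"))
```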

6) Variable Resolution and Data Flow Rules

In UltraData (the pipeline state class in src/ultrarag/client.py), each tool input value is interpreted by convention:

"$foo" -> load from server-local params (parameter.yaml)
"bar" -> read from global variable pool (global_vars["bar"])
"memory_xxx" -> read/write memory lists

Output handling:

Tool returns JSON payload -> keys mapped to declared outputs
Output remapping (output: in pipeline step) is applied at save time
Prompt outputs usually produce prompt_ls
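
The input conventions and save-time remapping can be sketched as two small functions. This is a simplified model under stated assumptions, not the UltraData implementation: resolve_input and save_outputs are hypothetical names, memory reads are assumed to return the whole list, and the variable names q_ls and top_k are examples only.

```python
from typing import Any


def resolve_input(
    value: str,
    server_params: dict[str, Any],
    global_vars: dict[str, Any],
    memory: dict[str, list[Any]],
) -> Any:
    """Look up one declared input according to its naming convention."""
    if value.startswith("$"):
        return server_params[value[1:]]  # "$foo" -> server-local parameter
    if value.startswith("memory_"):
        return memory[value]             # "memory_xxx" -> memory list
    return global_vars[value]            # "bar" -> global variable pool


def save_outputs(
    payload: dict[str, Any],
    declared: list[str],
    remap: dict[str, str],
    global_vars: dict[str, Any],
) -> None:
    """Store declared output keys, applying step-level remapping at save time."""
    for key in declared:
        global_vars[remap.get(key, key)] = payload[key]


server_params = {"top_k": 5}
global_vars: dict[str, Any] = {"q_ls": ["what is MCP?"]}
memory = {"memory_q_ls": [["what is MCP?"]]}

save_outputs({"ans_ls": ["42"]}, ["ans_ls"], {"ans_ls": "final_answer_ls"}, global_vars)
print(resolve_input("$top_k", server_params, global_vars, memory))
print(resolve_input("final_answer_ls", server_params, global_vars, memory))
```

A KeyError from either function corresponds to the "missing variable" failure described in section 19.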

Branch handling:

Branch-aware values use wrapped list records with branch-state keys.
Internal sentinel (UNSET) is used to distinguish "not filled yet" from None.

Memory handling:

The engine tracks memory_* histories automatically.
Final snapshots are serialized to output/memory_<...>.json.
If a memory server is detected, turn-level memory auto-save can be triggered.

7) Core Python Modules (Authoritative Guide)

`src/ultrarag/client.py`

This is the orchestration heart of the project.

Key responsibilities:

CLI entrypoint (main)
UI launch (launch_ui) and case-study launch (launch_case_study)
Config loading (Configuration)
Pipeline data graph and state (UltraData)
Build pipeline configs (build)
Load runtime context (load_pipeline_context)
Execute step engine (execute_pipeline)
Run full pipeline (run)

Important runtime behaviors:

Supports both local Python MCP servers (path.endswith(".py")) and remote MCP endpoints (http(s)).
For remote MCP, requires Node.js >= 20 and uses npx -y mcp-remote <url>.
Keeps loop-termination state in ContextVar for coroutine safety.
Emits structured stream events in demo/UI flows (step_start, step_end, token, sources).
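
The local/remote split described above reduces to a check on the server path. A minimal sketch, assuming the checks are a plain suffix and scheme test; server_launch_kind and remote_proxy_command are hypothetical helpers, the example URL is a placeholder, and the Node.js version check is left out:

```python
def server_launch_kind(path: str) -> str:
    """Classify a server path from the pipeline config as 'local' or 'remote'."""
    if path.startswith(("http://", "https://")):
        return "remote"
    if path.endswith(".py"):
        return "local"
    raise ValueError(f"unsupported server path: {path!r}")


def remote_proxy_command(url: str) -> list[str]:
    """Argument list for the stdio proxy used with remote MCP endpoints."""
    return ["npx", "-y", "mcp-remote", url]


print(server_launch_kind("servers/retriever/src/retriever.py"))
print(remote_proxy_command("https://example.com/mcp"))
```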

`src/ultrarag/server.py`

Defines UltraRAG_MCP_Server, a compatibility wrapper over FastMCP.

Key responsibilities:

Enhanced tool() and prompt() registration with output metadata support
Metadata capture for automatic config generation
build(parameter_file) to generate per-server server.yaml
Compatibility filtering for FastMCP signature differences

`src/ultrarag/api.py`

Provides ergonomic Python-side wrappers:

initialize(servers, server_root, log_level)
ToolCall.server_name.tool_name(...)
PipelineCall(pipeline_file, parameter_file, log_level)

`src/ultrarag/mcp_logging.py`

Initializes root logger UltraRAG
Rich console logging + file logging (logs/<timestamp>.log)
Log level is controlled by the log_level argument and the log_level environment variable

`src/ultrarag/mcp_exceptions.py`

Validates local Node.js availability/version
Raises NodeNotInstalledError / NodeVersionTooLowError

`src/ultrarag/utils.py`

Subprocess lifecycle helpers
POSIX parent-death signal support
Windows job object support for child-process cleanup

8) MCP Server Authoring Contract

Each server follows this shape:

servers/<name>/
├── parameter.yaml
├── server.yaml            # generated
└── src/<name>.py

8.1 Registration styles

Use either:

Decorator style
Class-bound method registration style

Both are valid in this codebase.

8.2 `output=` grammar

Canonical form:

input1,input2,$param_a -> output1,output2

Rules:

Left side maps function args to pipeline inputs.
Right side defines expected output keys from returned dict/JSON.
-> None means no output variables.
$param means value comes from server parameter file.
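
To make the grammar concrete, here is a small parser for the canonical form plus a check that a returned payload carries every declared key. It is a sketch written for this guide, not the parser in server.py; OutputSpec, parse_output_spec, and missing_keys are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class OutputSpec:
    sources: tuple[str, ...]  # left side, in function-argument order ("$x" = parameter)
    outputs: tuple[str, ...]  # right side, keys expected in the returned payload


def parse_output_spec(spec: str) -> OutputSpec:
    """Split 'a,b,$p -> x,y' into ordered sources and output keys."""
    left, arrow, right = spec.partition("->")
    if not arrow:
        raise ValueError(f"missing '->' in output spec: {spec!r}")
    sources = tuple(n.strip() for n in left.split(",") if n.strip())
    right = right.strip()
    if right == "None":  # '-> None' declares no output variables
        return OutputSpec(sources, ())
    outputs = tuple(n.strip() for n in right.split(",") if n.strip())
    return OutputSpec(sources, outputs)


def missing_keys(spec: OutputSpec, payload: dict[str, Any]) -> list[str]:
    """Declared outputs that the tool's return payload failed to provide."""
    return [key for key in spec.outputs if key not in payload]


spec = parse_output_spec("input1,input2,$param_a -> output1,output2")
print(spec.sources, spec.outputs)
print(missing_keys(spec, {"output1": []}))
```

Running a check like missing_keys in a unit test catches payload/contract drift before it surfaces as a missing-variable error at pipeline runtime.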

8.3 Return payload expectations

For tools: return JSON-serializable dict payloads matching declared output keys.
For prompts: return prompt messages (typically list-like prompt payloads consumed by get_prompt).

8.4 Entrypoint requirement

Server modules should end with:

if __name__ == "__main__":
    app.run(transport="stdio")

9) Retriever/Generation/Prompt Specific Notes

Retriever (`servers/retriever`)

Supports multiple retrieval modes:
- Dense retrieval
- BM25
- Web search
- Project-memory retrieval
Index backends are pluggable via factory:
- faiss
- milvus
Web search backends are pluggable via factory:
- exa
- tavily
- zhipuai

Generation (`servers/generation`)

Supports generation backends including openai, vllm, and hf workflows.
Provides explicit cleanup tool vllm_shutdown.
Demo mode can use local streaming generation service.

Prompt (`servers/prompt`)

Uses SandboxedEnvironment from Jinja2 for safer rendering.
Validates template paths to reduce traversal risk.
Escapes string inputs before template rendering.
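
Template path validation usually means resolving the requested path and confirming it stays under an allowed root. The sketch below shows that general technique with the standard library; it is not the prompt server's code, and resolve_template, the prompt_templates directory, and the file names are assumptions.

```python
from pathlib import Path


def resolve_template(template_root: Path, requested: str) -> Path:
    """Resolve a template path and reject anything outside template_root."""
    root = template_root.resolve()
    candidate = (root / requested).resolve()  # collapses ".." segments
    if not candidate.is_relative_to(root):
        raise ValueError(f"template path escapes {root}: {requested!r}")
    return candidate


root = Path("prompt_templates")
ok = resolve_template(root, "qa/basic.jinja")
print(ok.relative_to(root.resolve()).as_posix())

try:
    resolve_template(root, "../secrets.txt")
except ValueError:
    print("traversal rejected")
```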

10) UI Backend Architecture (`ui/backend`)

`app.py`

Flask app factory (create_app)
Serves frontend static assets
Exposes chat/pipeline/KB/auth endpoints
Reads optional frontend override via ULTRARAG_FRONTEND_DIR

`pipeline_manager.py`

This module is large and central to UI behavior.

Major responsibilities:

Session lifecycle and streaming chat management
Background chat task management
Pipeline CRUD (list/load/save/rename/delete)
Parameter load/save/build wrappers
Knowledge base file ingest and pipeline triggering
Memory synchronization to per-user KB collections
Optional server introspection via AST stub generation if server.yaml is missing

Notable behavior:

Applies defensive patches to suppress noisy closed-event-loop teardown logs.
Uses a queue bridge for async-to-sync SSE event streaming.
Supports automatic memory-to-KB sync for pipelines that include memory components.

11) Storage Model and Paths

Default UI storage root:

ui/storage

Can be overridden by:

ULTRARAG_UI_STORAGE_ROOT

Key subpaths:

db/users.sqlite3
chat_sessions/
knowledge_base/raw|corpus|chunks|index
memory/
knowledge_base/_memory_sync

Runtime outputs:

output/memory_*.json
logs/*.log
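
The default root and its override can be resolved in one place. A minimal sketch, assuming an empty variable falls back to the default and that the default is relative to the working directory; ui_storage_root is a hypothetical helper and /tmp/urag is a placeholder path:

```python
import os
from pathlib import Path
from typing import Mapping


def ui_storage_root(env: Mapping[str, str] = os.environ) -> Path:
    """Return the UI storage root, honoring ULTRARAG_UI_STORAGE_ROOT if set."""
    return Path(env.get("ULTRARAG_UI_STORAGE_ROOT") or "ui/storage")


default_root = ui_storage_root({})
custom_root = ui_storage_root({"ULTRARAG_UI_STORAGE_ROOT": "/tmp/urag"})
print((default_root / "db" / "users.sqlite3").as_posix())
print((custom_root / "chat_sessions").as_posix())
```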

12) Environment Variables You Should Know

ULTRARAG_UI_STORAGE_ROOT: override UI storage root
ULTRARAG_FRONTEND_DIR: override frontend static directory
ULTRARAG_SESSION_TIMEOUT: foreground chat session timeout
ULTRARAG_BG_SESSION_TIMEOUT: background session timeout
ULTRARAG_LOG_TS: custom timestamp seed for log file naming
log_level: consumed by core logger initialization

13) Dependency Model

Install tiers from pyproject.toml:

Core install: no extras
retriever extra
generation extra
evaluation extra
corpus extra
all extra (union)

Typical commands:

uv sync
uv sync --extra retriever
uv sync --extra generation
uv sync --all-extras

Development dependencies include:

ruff
ipython
jupyter
pytest

14) CLI Commands (Canonical)

ultrarag build <pipeline.yaml>
ultrarag run <pipeline.yaml> [--param <parameter.yaml>] [--log_level info|debug|warn|error] [--is_demo]
ultrarag show ui [--host 127.0.0.1] [--port 5050]
ultrarag show case [--config_path <memory.json>] [--host 127.0.0.1] [--port 8080]

Minimal smoke check:

ultrarag run examples/experiments/sayhello.yaml

15) Docker Variants

Dockerfile: full image (builds frontend, installs all extras)
Dockerfile.base-cpu: CPU base image
Dockerfile.base-gpu: GPU base image

All variants start UI with:

ultrarag show ui --port 5050 --host 0.0.0.0

16) Development Playbooks

16.1 Add a new MCP server

Create servers/<name>/parameter.yaml
Implement servers/<name>/src/<name>.py
Register tools/prompts via app.tool / app.prompt (or class-bound registration)
Ensure app.run(transport="stdio") exists
Add server to servers: in a pipeline YAML
Run ultrarag build <pipeline.yaml>

16.2 Add a new tool to an existing server

Implement function/method
Register with explicit output=... contract
Ensure return payload keys match outputs
Update relevant parameter keys in parameter.yaml if using $...
Add the step in pipeline YAML and rebuild

16.3 Add a retriever index backend

Implement backend in servers/retriever/src/index_backends/
Follow BaseIndexBackend contract
Register in _INDEX_BACKENDS map in index_backends/__init__.py

16.4 Add a web-search backend

Implement backend in servers/retriever/src/websearch_backends/
Follow BaseWebSearchBackend contract
Register in _WEBSEARCH_BACKENDS map
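
Both backend playbooks (16.3 and 16.4) follow the same registry-plus-factory pattern. The sketch below shows the shape only. The base class here is a stand-in with an assumed search signature, EchoBackend and create_backend are hypothetical, and the real contract in servers/retriever should be read before implementing.

```python
from abc import ABC, abstractmethod


class BaseWebSearchBackend(ABC):
    """Stand-in for the real base class; this `search` signature is assumed."""

    @abstractmethod
    def search(self, query: str, top_k: int) -> list[str]:
        ...


class EchoBackend(BaseWebSearchBackend):
    """Hypothetical backend that exists only to demonstrate registration."""

    def search(self, query: str, top_k: int) -> list[str]:
        return [f"{query} #{i}" for i in range(top_k)]


# Name -> class map; adding a backend means adding one entry here.
_WEBSEARCH_BACKENDS: dict[str, type[BaseWebSearchBackend]] = {"echo": EchoBackend}


def create_backend(name: str) -> BaseWebSearchBackend:
    """Instantiate a registered backend, failing loudly on unknown names."""
    try:
        backend_cls = _WEBSEARCH_BACKENDS[name]
    except KeyError:
        known = ", ".join(sorted(_WEBSEARCH_BACKENDS))
        raise ValueError(f"unknown web-search backend {name!r}; known: {known}") from None
    return backend_cls()


print(create_backend("echo").search("mcp", 2))
```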

16.5 Modify UI pipeline behavior

Primary files:

ui/backend/app.py
ui/backend/pipeline_manager.py

If touching build/runtime semantics, cross-check against:

src/ultrarag/client.py

17) Coding Standards (Repository-Conformant)

Use type hints for function signatures.
Prefer pathlib.Path for filesystem paths.
Use yaml.safe_load / yaml.safe_dump.
Use project logger (get_logger or app.logger) instead of print.
Keep tool outputs deterministic and JSON-serializable.
Keep imports grouped: stdlib -> third-party -> local.
Keep async boundaries explicit (async/await).

18) What Not To Edit Blindly

Treat these as generated or runtime artifacts unless intentionally regenerating:

servers/*/server.yaml
examples/**/parameter/*_parameter.yaml
examples/**/server/*_server.yaml
output/*
logs/*
ui/storage/* runtime data

Also avoid committing secrets:

.env
any credential-bearing local config

19) Common Failure Modes and Fixes

Error: server file not found

Check servers.<name> path in pipeline YAML.
Ensure server entry script exists at servers/<name>/src/<name>.py (or update path in config).

Error: missing variable in pipeline execution

Verify output key names from upstream tool match downstream input names.
Verify remapping under step-level output: is correct.
Verify $param keys exist in that server's parameter config.
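
These three checks can be automated with a linear pass over the steps. The sketch below assumes each step is described by its declared inputs and (already remapped) outputs; find_unresolved_inputs and the dict layout are hypothetical, and q_ls, top_k, and ret_psg are example names.

```python
from typing import Iterable


def find_unresolved_inputs(
    steps: list[dict], initial_vars: Iterable[str] = ()
) -> list[tuple[str, str]]:
    """Return (step, input) pairs whose input no earlier step has produced."""
    available = set(initial_vars)
    problems: list[tuple[str, str]] = []
    for step in steps:
        for name in step["inputs"]:
            if name.startswith("$"):
                continue  # parameter lookups are checked against parameter.yaml
            if name not in available:
                problems.append((step["name"], name))
        available.update(step["outputs"])
    return problems


steps = [
    {"name": "retriever.retriever_search", "inputs": ["q_ls", "$top_k"], "outputs": ["ret_psg"]},
    {"name": "generation.generate", "inputs": ["prompt_ls"], "outputs": ["ans_ls"]},
]
problems = find_unresolved_inputs(steps, initial_vars={"q_ls"})
print(problems)
```

Here generation.generate is flagged because nothing upstream produces prompt_ls, which is the typical result of a forgotten prompt step or a mismatched output remapping.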

Remote MCP server fails to start

Ensure Node.js >= 20.
Confirm npx is available.
Confirm remote URL in server path is reachable.

Build succeeds but UI cannot list tools

Ensure server.yaml exists or can be inferred by AST parsing.
Check for unusual dynamic registration patterns that static analysis cannot infer.

No final answer in chat

Inspect output/memory_*.json.
Check whether generation step ran and produced ans_ls.
Check stream events in UI path (step_start, step_end, sources, token, final).

20) Validation Checklist for Agents

Before finalizing any non-trivial change:

Build the affected pipeline:
- ultrarag build <pipeline.yaml>
Run a relevant smoke case:
- ultrarag run <pipeline.yaml>
If UI behavior changed:
- run ultrarag show ui and verify route behavior
If retriever/generation backends changed:
- validate parameter schema keys and output names
Keep generated artifacts intentional:
- do not accidentally commit transient runtime outputs

21) Minimal Quickstart for New Agents

# 1) Install dependencies
uv sync --all-extras

# 2) Smoke test
ultrarag run examples/experiments/sayhello.yaml

# 3) Build and run a demo pipeline
ultrarag build examples/demos/LLM.yaml
ultrarag run examples/demos/LLM.yaml

# 4) Launch UI
ultrarag show ui --host 127.0.0.1 --port 5050

22) Final Notes

This repository is orchestration-first: correctness depends heavily on I/O naming consistency across tools and pipeline steps.
Most regressions come from mismatched variable names, stale generated configs, or incomplete parameter updates.
When in doubt, inspect src/ultrarag/client.py first: it is the execution truth.
