AGENTS.md

This document is the primary engineering guide for autonomous coding agents working in the UltraRAG repository.

Use this file as the source of truth for architecture, conventions, workflows, and safe change patterns. If CLAUDE.md exists, it should only point to this file.


1) Project Identity

UltraRAG is a lightweight RAG framework built around the Model Context Protocol (MCP). The key design choice is strict modularization: retrieval, prompting, generation, routing, memory, and evaluation are implemented as independent MCP servers orchestrated by YAML pipelines.

Current core metadata:

  • Package: ultrarag
  • Version: 0.3.0
  • Python: >=3.11, <3.13
  • CLI entrypoint: ultrarag = ultrarag.client:main
  • Package manager: uv ([tool.uv] package = true)

2) Repository Map (What Matters Most)

UltraRAG/
├── src/ultrarag/                    # Installable core package
│   ├── client.py                    # CLI + pipeline engine + run/build orchestration
│   ├── server.py                    # UltraRAG_MCP_Server (FastMCP extension)
│   ├── api.py                       # Python API wrappers (ToolCall, PipelineCall)
│   ├── cli.py                       # Rich banner and CLI visuals
│   ├── mcp_logging.py               # Central logging setup
│   ├── mcp_exceptions.py            # Node.js checks for remote MCP
│   └── utils.py                     # Subprocess lifecycle helpers
│
├── servers/                         # MCP microservices (each server is independent)
│   ├── retriever/
│   ├── generation/
│   ├── prompt/
│   ├── reranker/
│   ├── benchmark/
│   ├── evaluation/
│   ├── corpus/
│   ├── memory/
│   ├── router/
│   ├── custom/
│   ├── pageindex/
│   └── sayhello/
│
├── examples/
│   ├── demos/                       # UI-ready demo pipelines
│   └── experiments/                 # Experiment/research pipelines
│
├── ui/
│   ├── backend/                     # Flask backend + pipeline manager
│   └── frontend/                    # Vite + React + TypeScript
│
├── docs/                            # Docs and assets
├── script/                          # Utility scripts (deploy, case study, etc.)
├── pyproject.toml                   # Dependencies + package metadata
├── uv.lock                          # Locked dependency graph
├── Dockerfile*                      # Container variants
└── .gitignore

Important generated/derived files:

  • servers/*/server.yaml (generated by each server build tool)
  • examples/**/parameter/*_parameter.yaml (pipeline-merged parameters)
  • examples/**/server/*_server.yaml (pipeline-merged server config)
  • output/memory_*.json (per-run memory snapshots)

3) Mental Model of the System

Think of UltraRAG as a three-layer system:

  1. Interface layer: CLI (ultrarag ...), UI (ultrarag show ui), and Python API (ToolCall, PipelineCall)
  2. Orchestration layer: src/ultrarag/client.py (build, load_pipeline_context, execute_pipeline)
  3. Execution layer: MCP servers in servers/*, each exposing tools/prompts over stdio (or remote MCP proxy)

The runtime contract is:

  • A pipeline YAML declares which servers to use and which steps to execute.
  • The client resolves I/O dependencies between steps.
  • Each step calls exactly one MCP tool or prompt.
  • Outputs are saved to a shared variable pool and can feed downstream steps.
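
The contract above can be pictured with a small, self-contained sketch. It does not use the real engine: `fake_search`, `fake_generate`, `run_steps`, and the variable names `q_ls` and `ret_psg` are hypothetical stand-ins, and real steps call MCP tools rather than local functions.

```python
from typing import Any, Callable

# Hypothetical stand-ins for MCP tools: named inputs in, dict of outputs out.
def fake_search(q_ls: list[str]) -> dict[str, Any]:
    return {"ret_psg": [[f"passage about {q}"] for q in q_ls]}

def fake_generate(q_ls: list[str], ret_psg: list[list[str]]) -> dict[str, Any]:
    return {"ans_ls": [f"{q} -> {p[0]}" for q, p in zip(q_ls, ret_psg)]}

# Each step is (tool, input variable names, output variable names).
Step = tuple[Callable[..., dict[str, Any]], list[str], list[str]]
STEPS: list[Step] = [
    (fake_search, ["q_ls"], ["ret_psg"]),
    (fake_generate, ["q_ls", "ret_psg"], ["ans_ls"]),
]

def run_steps(steps: list[Step], global_vars: dict[str, Any]) -> dict[str, Any]:
    """Run steps in order, reading inputs from and writing outputs to one pool."""
    for tool, inputs, outputs in steps:
        kwargs = {name: global_vars[name] for name in inputs}
        result = tool(**kwargs)
        for name in outputs:
            global_vars[name] = result[name]
    return global_vars

pool = run_steps(STEPS, {"q_ls": ["what is MCP"]})
```

The point of the sketch is the naming dependency: the second step only works because the first step wrote `ret_psg` under exactly the name the second step reads.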

4) Two-Phase Execution Lifecycle

Phase A: Build

Command:

ultrarag build <pipeline.yaml>

What happens:

  • Reads servers: from the pipeline YAML.
  • For each referenced server, calls the server's build tool.
  • Produces:
    • <pipeline_dir>/parameter/<pipeline_name>_parameter.yaml
    • <pipeline_dir>/server/<pipeline_name>_server.yaml
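
As a rough illustration of the two output locations listed above, the following sketch derives them from a pipeline path. It assumes that `<pipeline_name>` is the pipeline file name without its extension; `build_artifact_paths` is a hypothetical helper, not a function in `client.py`.

```python
from pathlib import Path

def build_artifact_paths(pipeline_file: str) -> tuple[Path, Path]:
    """Return (parameter_yaml, server_yaml) for a pipeline file.

    Assumption: <pipeline_name> is the file stem of the pipeline YAML.
    """
    pipeline = Path(pipeline_file)
    name = pipeline.stem
    base = pipeline.parent
    parameter_yaml = base / "parameter" / f"{name}_parameter.yaml"
    server_yaml = base / "server" / f"{name}_server.yaml"
    return parameter_yaml, server_yaml

param_path, server_path = build_artifact_paths("examples/demos/LLM.yaml")
```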

Why this matters:

  • build materializes exact tool/prompt I/O metadata before runtime.
  • UI and runner rely on these generated artifacts for deterministic execution.

Phase B: Run

Command:

ultrarag run <pipeline.yaml> [--param <parameter.yaml>] [--log_level info|debug|warn|error] [--is_demo]

What happens:

  • Loads generated server config + parameter config.
  • Creates fastmcp.Client transport config for each server.
  • Executes pipeline steps in order, including loop and branch.
  • Saves intermediate memory snapshots and writes output/memory_*.json.
  • Invokes cleanup tools (e.g., tools whose names end with vllm_shutdown) if present.

5) Pipeline DSL Reference

UltraRAG accepts mixed step forms:

5.1 Plain step

- retriever.retriever_search

5.2 Step with input/output remapping

- generation.generate:
    input:
      prompt_ls: custom_prompt_ls
    output:
      ans_ls: final_answer_ls

5.3 Loop block

- loop:
    times: 3
    steps:
    - retriever.retriever_search
    - generation.generate

5.4 Branch block

- branch:
    router:
    - router.route_query
    branches:
      need_retrieval:
      - retriever.retriever_search
      direct_answer:
      - generation.generate

5.5 Prompt vs Tool step semantics

  • Steps under the prompt server call client.get_prompt(...).
  • Non-prompt steps call client.call_tool(...).
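
To make the four step forms concrete, here is a minimal sketch that walks an already-parsed pipeline (the Python structure `yaml.safe_load` would produce for the snippets above) and lists the `server.tool` names in call order. `iter_tool_calls` and `choose_branch` are hypothetical; in particular, this sketch picks one branch for the whole run, which is a simplification of whatever routing the real engine performs.

```python
from typing import Any, Callable, Iterator

def iter_tool_calls(steps: list[Any], choose_branch: Callable[[], str]) -> Iterator[str]:
    """Yield 'server.tool' names in execution order for the mixed step forms."""
    for step in steps:
        if isinstance(step, str):
            # 5.1 Plain step
            yield step
        elif "loop" in step:
            # 5.3 Loop block: repeat the nested steps `times` times
            body = step["loop"]
            for _ in range(body["times"]):
                yield from iter_tool_calls(body["steps"], choose_branch)
        elif "branch" in step:
            # 5.4 Branch block: run the router, then one selected branch
            body = step["branch"]
            yield from iter_tool_calls(body["router"], choose_branch)
            yield from iter_tool_calls(body["branches"][choose_branch()], choose_branch)
        else:
            # 5.2 Remapped step: single-key dict {name: {"input": ..., "output": ...}}
            name, _remap = next(iter(step.items()))
            yield name

pipeline = [
    "retriever.retriever_search",
    {"generation.generate": {"input": {"prompt_ls": "custom_prompt_ls"},
                             "output": {"ans_ls": "final_answer_ls"}}},
    {"loop": {"times": 2, "steps": ["retriever.retriever_search"]}},
    {"branch": {"router": ["router.route_query"],
                "branches": {"need_retrieval": ["retriever.retriever_search"],
                             "direct_answer": ["generation.generate"]}}},
]
calls = list(iter_tool_calls(pipeline, lambda: "direct_answer"))
```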

6) Variable Resolution and Data Flow Rules

In UltraData, each tool input value is interpreted by convention:

  • "$foo" -> load from server-local params (parameter.yaml)
  • "bar" -> read from global variable pool (global_vars["bar"])
  • "memory_xxx" -> read/write memory lists
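
The three conventions can be summarized in a few lines. This is only a sketch of the lookup order under stated assumptions: `resolve_input` is a hypothetical helper, and the real UltraData logic in `client.py` may handle memory lists and branch state differently.

```python
from typing import Any

def resolve_input(value: str, params: dict[str, Any],
                  global_vars: dict[str, Any],
                  memory: dict[str, list]) -> Any:
    """Resolve one declared input name according to its prefix."""
    if value.startswith("$"):
        # "$foo": server-local parameter from parameter.yaml
        return params[value[1:]]
    if value.startswith("memory_"):
        # "memory_xxx": memory list (created empty on first access here)
        return memory.setdefault(value, [])
    # plain name: shared global variable pool
    return global_vars[value]

params = {"top_k": 5}
global_vars = {"q_ls": ["what is MCP"]}
memory: dict[str, list] = {}
```

With these inputs, `resolve_input("$top_k", ...)` reads from `params`, `resolve_input("q_ls", ...)` reads from `global_vars`, and `resolve_input("memory_q_ls", ...)` returns a memory list.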

Output handling:

  • Tool returns JSON payload -> keys mapped to declared outputs
  • Output remapping (output: in pipeline step) is applied at save time
  • Prompt outputs usually produce prompt_ls
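
A minimal sketch of save-time remapping, assuming the tool payload is already a parsed dict. `save_outputs` is a hypothetical name; the example reuses the `ans_ls` to `final_answer_ls` remapping from section 5.2.

```python
from typing import Any

def save_outputs(payload: dict[str, Any], declared: list[str],
                 remap: dict[str, str], global_vars: dict[str, Any]) -> None:
    """Store each declared output key, renamed by the step's output: mapping."""
    for key in declared:
        target = remap.get(key, key)  # unmapped keys keep their declared name
        global_vars[target] = payload[key]

pool: dict[str, Any] = {}
save_outputs({"ans_ls": ["42"]}, ["ans_ls"], {"ans_ls": "final_answer_ls"}, pool)
```

After the call, the value lives only under `final_answer_ls`, so a downstream step that still reads `ans_ls` would fail with a missing-variable error.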

Branch handling:

  • Branch-aware values use wrapped list records with branch-state keys.
  • Internal sentinel (UNSET) is used to distinguish "not filled yet" from None.
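
The reason for a dedicated sentinel is that `None` can be a legitimate tool output. The sketch below shows the general pattern only; `_UnsetType` and `filled_values` are hypothetical, and the engine's actual `UNSET` object and branch records may be shaped differently.

```python
class _UnsetType:
    """Marker type for 'no value written yet'; deliberately distinct from None."""

    def __repr__(self) -> str:
        return "UNSET"

UNSET = _UnsetType()

def filled_values(slots: list) -> list:
    """Keep every slot that has been written, including explicit None values."""
    return [value for value in slots if value is not UNSET]

# One slot never filled, one filled with None, one filled with a string.
slots = [UNSET, None, "answer"]
result = filled_values(slots)
```

Comparing with `is not UNSET` rather than a truthiness check is what keeps the explicit `None` in the result.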

Memory handling:

  • The engine tracks memory_* histories automatically.
  • Final snapshots are serialized to output/memory_<...>.json.
  • If a memory server is detected, turn-level memory auto-save can be triggered.

7) Core Python Modules (Authoritative Guide)

src/ultrarag/client.py

This is the orchestration heart of the project.

Key responsibilities:

  • CLI entrypoint (main)
  • UI launch (launch_ui) and case-study launch (launch_case_study)
  • Config loading (Configuration)
  • Pipeline data graph and state (UltraData)
  • Build pipeline configs (build)
  • Load runtime context (load_pipeline_context)
  • Execute step engine (execute_pipeline)
  • Run full pipeline (run)

Important runtime behaviors:

  • Supports both local python MCP servers (path.endswith(".py")) and remote MCP endpoints (http(s)).
  • For remote MCP, requires Node.js >= 20 and uses npx -y mcp-remote <url>.
  • Keeps loop-termination state in ContextVar for coroutine safety.
  • Emits structured stream events in demo/UI flows (step_start, step_end, token, sources).
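
The local-versus-remote distinction described above can be sketched as a small dispatch function. The returned dict shape and the name `transport_command` are illustrative assumptions, not the exact transport config that `client.py` passes to `fastmcp.Client`; only the `.py` check and the `npx -y mcp-remote <url>` command come from the text above.

```python
import sys

def transport_command(path: str) -> dict[str, object]:
    """Pick a launch command for a server path (illustrative shape only)."""
    if path.startswith(("http://", "https://")):
        # Remote MCP endpoint: proxied through mcp-remote (needs Node.js >= 20)
        return {"command": "npx", "args": ["-y", "mcp-remote", path]}
    if path.endswith(".py"):
        # Local Python MCP server, spoken to over stdio
        return {"command": sys.executable, "args": [path]}
    raise ValueError(f"Unsupported server path: {path!r}")

remote = transport_command("https://example.com/mcp")
local = transport_command("servers/retriever/src/retriever.py")
```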

src/ultrarag/server.py

Defines UltraRAG_MCP_Server, a compatibility wrapper over FastMCP.

Key responsibilities:

  • Enhanced tool() and prompt() registration with output metadata support
  • Metadata capture for automatic config generation
  • build(parameter_file) to generate per-server server.yaml
  • Compatibility filtering for FastMCP signature differences

src/ultrarag/api.py

Provides ergonomic Python-side wrappers:

  • initialize(servers, server_root, log_level)
  • ToolCall.server_name.tool_name(...)
  • PipelineCall(pipeline_file, parameter_file, log_level)

src/ultrarag/mcp_logging.py

  • Initializes root logger UltraRAG
  • Rich console logging + file logging (logs/<timestamp>.log)
  • Log level controlled by the log_level argument and the log_level environment variable

src/ultrarag/mcp_exceptions.py

  • Validates local Node.js availability/version
  • Raises NodeNotInstalledError / NodeVersionTooLowError

src/ultrarag/utils.py

  • Subprocess lifecycle helpers
  • POSIX parent-death signal support
  • Windows job object support for child-process cleanup

8) MCP Server Authoring Contract

Each server follows this shape:

servers/<name>/
├── parameter.yaml
├── server.yaml            # generated
└── src/<name>.py

8.1 Registration styles

Use either:

  1. Decorator style
  2. Class-bound method registration style

Both are valid in this codebase.

8.2 output= grammar

Canonical form:

input1,input2,$param_a -> output1,output2

Rules:

  • Left side maps function args to pipeline inputs.
  • Right side defines expected output keys from returned dict/JSON.
  • -> None means no output variables.
  • $param means value comes from server parameter file.
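
The grammar is small enough to parse in a few lines. The following is a sketch of one way to read it, not the parser used by UltraRAG_MCP_Server; `parse_io_spec` is a hypothetical name.

```python
def parse_io_spec(spec: str) -> tuple[list[str], list[str]]:
    """Split 'a,b,$p -> x,y' into (inputs, outputs); '-> None' yields no outputs."""
    left, arrow, right = spec.partition("->")
    if not arrow:
        raise ValueError(f"Missing '->' in output spec: {spec!r}")
    inputs = [token.strip() for token in left.split(",") if token.strip()]
    right = right.strip()
    if right == "None":
        outputs: list[str] = []
    else:
        outputs = [token.strip() for token in right.split(",") if token.strip()]
    return inputs, outputs

inputs, outputs = parse_io_spec("input1,input2,$param_a -> output1,output2")
param_inputs = [name for name in inputs if name.startswith("$")]
```

Here `param_inputs` holds the names that must exist in the server's parameter file, which is a useful thing to verify when a build fails on a missing `$` key.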

8.3 Return payload expectations

  • For tools: return JSON-serializable dict payloads matching declared output keys.
  • For prompts: return prompt messages (typically list-like prompt payloads consumed by get_prompt).

8.4 Entrypoint requirement

Server modules should end with:

if __name__ == "__main__":
    app.run(transport="stdio")

9) Retriever/Generation/Prompt Specific Notes

Retriever (servers/retriever)

  • Supports multiple retrieval modes:
    • Dense retrieval
    • BM25
    • Web search
    • Project-memory retrieval
  • Index backends are pluggable via factory:
    • faiss
    • milvus
  • Web search backends are pluggable via factory:
    • exa
    • tavily
    • zhipuai

Generation (servers/generation)

  • Supports generation backends including openai, vllm, and hf workflows.
  • Provides explicit cleanup tool vllm_shutdown.
  • Demo mode can use local streaming generation service.

Prompt (servers/prompt)

  • Uses SandboxedEnvironment from Jinja2 for safer rendering.
  • Validates template paths to reduce traversal risk.
  • Escapes string inputs before template rendering.
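
For the path-validation point, a common standard-library pattern is to resolve the requested path and confirm it stays under an allowed root. This is a generic sketch, not the prompt server's actual check; `resolve_template` and the `prompts` directory name are hypothetical.

```python
from pathlib import Path

def resolve_template(root: Path, requested: str) -> Path:
    """Resolve a template path and reject anything outside the template root."""
    root = root.resolve()
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"Template path escapes root: {requested!r}")
    return candidate

template_root = Path("prompts")
ok = resolve_template(template_root, "qa/basic.jinja")
```

Both `../secrets.txt` and an absolute path such as `/etc/passwd` resolve outside the root and are rejected with `ValueError`.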

10) UI Backend Architecture (ui/backend)

app.py

  • Flask app factory (create_app)
  • Serves frontend static assets
  • Exposes chat/pipeline/KB/auth endpoints
  • Reads optional frontend override via ULTRARAG_FRONTEND_DIR

pipeline_manager.py

This module is large and central to UI behavior.

Major responsibilities:

  • Session lifecycle and streaming chat management
  • Background chat task management
  • Pipeline CRUD (list/load/save/rename/delete)
  • Parameter load/save/build wrappers
  • Knowledge base file ingest and pipeline triggering
  • Memory synchronization to per-user KB collections
  • Optional server introspection via AST stub generation if server.yaml is missing

Notable behavior:

  • Applies defensive patches to suppress noisy closed-event-loop teardown logs.
  • Uses a queue bridge for async-to-sync SSE event streaming.
  • Supports automatic memory-to-KB sync for pipelines that include memory components.
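
The queue bridge idea can be shown with the standard library alone: an async producer runs in a worker thread and pushes events into a thread-safe queue, while a plain synchronous generator drains it (the shape a Flask SSE response needs). This is a sketch of the pattern, not the code in `pipeline_manager.py`; `produce_events`, `stream_events`, and the event payloads are hypothetical.

```python
import asyncio
import queue
import threading
from typing import Iterator

_DONE = object()  # end-of-stream marker placed on the queue by the worker

async def produce_events(out: queue.Queue) -> None:
    """Hypothetical async pipeline run that emits stream events."""
    for event in ({"type": "step_start"},
                  {"type": "token", "content": "hi"},
                  {"type": "step_end"}):
        await asyncio.sleep(0)  # yield control, as real async work would
        out.put(event)

def stream_events() -> Iterator[dict]:
    """Synchronous generator that yields events produced by async code."""
    bridge: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            asyncio.run(produce_events(bridge))
        finally:
            bridge.put(_DONE)  # always unblock the consumer, even on error

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while True:
        item = bridge.get()
        if item is _DONE:
            break
        yield item
    thread.join()

event_types = [event["type"] for event in stream_events()]
```

Putting the end marker in a `finally` block matters: without it, an exception in the async side would leave the consumer blocked on `bridge.get()` forever.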

11) Storage Model and Paths

Default UI storage root:

  • ui/storage

Can be overridden by:

  • ULTRARAG_UI_STORAGE_ROOT

Key subpaths:

  • db/users.sqlite3
  • chat_sessions/
  • knowledge_base/raw|corpus|chunks|index
  • memory/
  • knowledge_base/_memory_sync

Runtime outputs:

  • output/memory_*.json
  • logs/*.log

12) Environment Variables You Should Know

  • ULTRARAG_UI_STORAGE_ROOT: override UI storage root
  • ULTRARAG_FRONTEND_DIR: override frontend static directory
  • ULTRARAG_SESSION_TIMEOUT: foreground chat session timeout
  • ULTRARAG_BG_SESSION_TIMEOUT: background session timeout
  • ULTRARAG_LOG_TS: custom timestamp seed for log file naming
  • log_level: consumed by core logger initialization

13) Dependency Model

Install tiers from pyproject.toml:

  • Core install: no extras
  • retriever extra
  • generation extra
  • evaluation extra
  • corpus extra
  • all extra (union)

Typical commands:

uv sync
uv sync --extra retriever
uv sync --extra generation
uv sync --all-extras

Development dependencies include:

  • ruff
  • ipython
  • jupyter
  • pytest

14) CLI Commands (Canonical)

ultrarag build <pipeline.yaml>
ultrarag run <pipeline.yaml> [--param <parameter.yaml>] [--log_level info|debug|warn|error] [--is_demo]
ultrarag show ui [--host 127.0.0.1] [--port 5050]
ultrarag show case [--config_path <memory.json>] [--host 127.0.0.1] [--port 8080]

Minimal smoke check:

ultrarag run examples/experiments/sayhello.yaml

15) Docker Variants

  • Dockerfile: full image (builds frontend, installs all extras)
  • Dockerfile.base-cpu: CPU base image
  • Dockerfile.base-gpu: GPU base image

All variants start UI with:

ultrarag show ui --port 5050 --host 0.0.0.0

16) Development Playbooks

16.1 Add a new MCP server

  1. Create servers/<name>/parameter.yaml
  2. Implement servers/<name>/src/<name>.py
  3. Register tools/prompts via app.tool / app.prompt (or class-bound registration)
  4. Ensure app.run(transport="stdio") exists
  5. Add server to servers: in a pipeline YAML
  6. Run ultrarag build <pipeline.yaml>

16.2 Add a new tool to an existing server

  1. Implement function/method
  2. Register with explicit output=... contract
  3. Ensure return payload keys match outputs
  4. Update relevant parameter keys in parameter.yaml if using $...
  5. Add the step in pipeline YAML and rebuild

16.3 Add a retriever index backend

  1. Implement backend in servers/retriever/src/index_backends/
  2. Follow BaseIndexBackend contract
  3. Register in _INDEX_BACKENDS map in index_backends/__init__.py
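
The three steps above follow a registry-plus-factory pattern, sketched below in isolation. Only the names `BaseIndexBackend` and `_INDEX_BACKENDS` come from this guide; the `search` method, the `InMemoryBackend` class, and `create_index_backend` are hypothetical, so check the real base class for the required methods before implementing a backend.

```python
from abc import ABC, abstractmethod

class BaseIndexBackend(ABC):
    """Hypothetical minimal contract; the real base class may require more."""

    @abstractmethod
    def search(self, query: str, top_k: int) -> list[str]:
        ...

class InMemoryBackend(BaseIndexBackend):
    """Toy backend: substring match over an in-memory list of documents."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def search(self, query: str, top_k: int) -> list[str]:
        return [doc for doc in self.docs if query in doc][:top_k]

# Step 3: register the new backend under the name used in parameter.yaml.
_INDEX_BACKENDS: dict[str, type[BaseIndexBackend]] = {
    "in_memory": InMemoryBackend,
}

def create_index_backend(name: str, **kwargs) -> BaseIndexBackend:
    """Factory lookup with a readable error for unknown backend names."""
    try:
        backend_cls = _INDEX_BACKENDS[name]
    except KeyError:
        known = ", ".join(sorted(_INDEX_BACKENDS))
        raise ValueError(f"Unknown index backend {name!r}; known: {known}") from None
    return backend_cls(**kwargs)

backend = create_index_backend("in_memory", docs=["a cat", "a dog"])
hits = backend.search("cat", top_k=1)
```

The same pattern applies to web-search backends with `BaseWebSearchBackend` and `_WEBSEARCH_BACKENDS`.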

16.4 Add a web-search backend

  1. Implement backend in servers/retriever/src/websearch_backends/
  2. Follow BaseWebSearchBackend contract
  3. Register in _WEBSEARCH_BACKENDS map

16.5 Modify UI pipeline behavior

Primary files:

  • ui/backend/app.py
  • ui/backend/pipeline_manager.py

If touching build/runtime semantics, cross-check against:

  • src/ultrarag/client.py

17) Coding Standards (Repository-Conformant)

  • Use type hints for function signatures.
  • Prefer pathlib.Path for filesystem paths.
  • Use yaml.safe_load / yaml.safe_dump.
  • Use project logger (get_logger or app.logger) instead of print.
  • Keep tool outputs deterministic and JSON-serializable.
  • Keep imports grouped: stdlib -> third-party -> local.
  • Keep async boundaries explicit (async/await).

18) What Not To Edit Blindly

Treat these as generated or runtime artifacts unless intentionally regenerating:

  • servers/*/server.yaml
  • examples/**/parameter/*_parameter.yaml
  • examples/**/server/*_server.yaml
  • output/*
  • logs/*
  • ui/storage/* runtime data

Also avoid committing secrets:

  • .env
  • any credential-bearing local config

19) Common Failure Modes and Fixes

Error: server file not found

  • Check servers.<name> path in pipeline YAML.
  • Ensure server entry script exists at servers/<name>/src/<name>.py (or update path in config).

Error: missing variable in pipeline execution

  • Verify output key names from upstream tool match downstream input names.
  • Verify remapping under step-level output: is correct.
  • Verify $param keys exist in that server's parameter config.

Remote MCP server fails to start

  • Ensure Node.js >= 20.
  • Confirm npx is available.
  • Confirm remote URL in server path is reachable.

Build succeeds but UI cannot list tools

  • Ensure server.yaml exists or can be inferred by AST parsing.
  • Check for unusual dynamic registration patterns that static analysis cannot infer.

No final answer in chat

  • Inspect output/memory_*.json.
  • Check whether generation step ran and produced ans_ls.
  • Check stream events in UI path (step_start, step_end, sources, token, final).

20) Validation Checklist for Agents

Before finalizing any non-trivial change:

  1. Build the affected pipeline:
    • ultrarag build <pipeline.yaml>
  2. Run a relevant smoke case:
    • ultrarag run <pipeline.yaml>
  3. If UI behavior changed:
    • run ultrarag show ui and verify route behavior
  4. If retriever/generation backends changed:
    • validate parameter schema keys and output names
  5. Keep generated artifacts intentional:
    • do not accidentally commit transient runtime outputs

21) Minimal Quickstart for New Agents

# 1) Install dependencies
uv sync --all-extras

# 2) Smoke test
ultrarag run examples/experiments/sayhello.yaml

# 3) Build and run a demo pipeline
ultrarag build examples/demos/LLM.yaml
ultrarag run examples/demos/LLM.yaml

# 4) Launch UI
ultrarag show ui --host 127.0.0.1 --port 5050

22) Final Notes

  • This repository is orchestration-first: correctness depends heavily on I/O naming consistency across tools and pipeline steps.
  • Most regressions come from mismatched variable names, stale generated configs, or incomplete parameter updates.
  • When in doubt, inspect src/ultrarag/client.py first: it is the execution truth.