This document is the primary engineering guide for autonomous coding agents working in the UltraRAG repository.
Use this file as the source of truth for architecture, conventions, workflows, and safe change patterns.
If CLAUDE.md exists, it should only point to this file.
UltraRAG is a lightweight RAG framework built around the Model Context Protocol (MCP).
The key design choice is strict modularization: retrieval, prompting, generation, routing, memory, and evaluation are implemented as independent MCP servers orchestrated by YAML pipelines.
Current core metadata:
- Package: `ultrarag`
- Version: `0.3.0`
- Python: `>=3.11, <3.13`
- CLI entrypoint: `ultrarag = ultrarag.client:main`
- Package manager: `uv` (`[tool.uv] package = true`)
UltraRAG/
├── src/ultrarag/ # Installable core package
│ ├── client.py # CLI + pipeline engine + run/build orchestration
│ ├── server.py # UltraRAG_MCP_Server (FastMCP extension)
│ ├── api.py # Python API wrappers (ToolCall, PipelineCall)
│ ├── cli.py # Rich banner and CLI visuals
│ ├── mcp_logging.py # Central logging setup
│ ├── mcp_exceptions.py # Node.js checks for remote MCP
│ └── utils.py # Subprocess lifecycle helpers
│
├── servers/ # MCP microservices (each server is independent)
│ ├── retriever/
│ ├── generation/
│ ├── prompt/
│ ├── reranker/
│ ├── benchmark/
│ ├── evaluation/
│ ├── corpus/
│ ├── memory/
│ ├── router/
│ ├── custom/
│ ├── pageindex/
│ └── sayhello/
│
├── examples/
│ ├── demos/ # UI-ready demo pipelines
│ └── experiments/ # Experiment/research pipelines
│
├── ui/
│ ├── backend/ # Flask backend + pipeline manager
│ └── frontend/ # Vite + React + TypeScript
│
├── docs/ # Docs and assets
├── script/ # Utility scripts (deploy, case study, etc.)
├── pyproject.toml # Dependencies + package metadata
├── uv.lock # Locked dependency graph
├── Dockerfile* # Container variants
└── .gitignore
Important generated/derived files:
- `servers/*/server.yaml` (generated by each server's `build` tool)
- `examples/**/parameter/*_parameter.yaml` (pipeline-merged parameters)
- `examples/**/server/*_server.yaml` (pipeline-merged server config)
- `output/memory_*.json` (per-run memory snapshots)
Think of UltraRAG as a three-layer system:
- Interface layer: CLI (`ultrarag ...`), UI (`ultrarag show ui`), and Python API (`ToolCall`, `PipelineCall`)
- Orchestration layer: `src/ultrarag/client.py` (`build`, `load_pipeline_context`, `execute_pipeline`)
- Execution layer: MCP servers in `servers/*`, each exposing tools/prompts over stdio (or remote MCP proxy)
The runtime contract is:
- A pipeline YAML declares which servers to use and which steps to execute.
- The client resolves I/O dependencies between steps.
- Each step calls exactly one MCP tool or prompt.
- Outputs are saved to a shared variable pool and can feed downstream steps.
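The contract above can be pictured with a toy in-process runner. This is only a sketch: the two tool functions and the variable names `q_ls` and `ret_psg` are hypothetical stand-ins, and the real engine calls MCP servers rather than local callables.

```python
from typing import Any, Callable

# Hypothetical stand-ins for MCP tools; real steps are dispatched over MCP.
def retriever_search(q_ls: list[str]) -> dict[str, Any]:
    return {"ret_psg": [[f"passage about {q}"] for q in q_ls]}

def generate(q_ls: list[str], ret_psg: list[list[str]]) -> dict[str, Any]:
    return {"ans_ls": [f"{q} -> {p[0]}" for q, p in zip(q_ls, ret_psg)]}

# A step is (tool, names of the pool variables it consumes).
Step = tuple[Callable[..., dict[str, Any]], list[str]]

def run_steps(steps: list[Step], pool: dict[str, Any]) -> dict[str, Any]:
    """Run steps in order; each step reads from and writes to the shared pool."""
    for tool, input_names in steps:
        kwargs = {name: pool[name] for name in input_names}
        pool.update(tool(**kwargs))  # outputs become available downstream
    return pool

pool = run_steps(
    [(retriever_search, ["q_ls"]), (generate, ["q_ls", "ret_psg"])],
    {"q_ls": ["what is MCP"]},
)
print(pool["ans_ls"])
```

The point of the sketch is the data flow: the second step only works because the first step wrote `ret_psg` under the exact name the second step asks for.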
Command:

    ultrarag build <pipeline.yaml>

What happens:
- Reads `servers:` from the pipeline YAML.
- For each referenced server, calls the server's `build` tool.
- Produces:
  - `<pipeline_dir>/parameter/<pipeline_name>_parameter.yaml`
  - `<pipeline_dir>/server/<pipeline_name>_server.yaml`

Why this matters:
- `build` materializes exact tool/prompt I/O metadata before runtime.
- UI and runner rely on these generated artifacts for deterministic execution.
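To locate the generated artifacts for a given pipeline, the path pattern listed above can be computed directly. The helper name `build_artifact_paths` is hypothetical and not part of the UltraRAG API; it only mirrors the documented layout.

```python
from pathlib import Path

def build_artifact_paths(pipeline_file: str) -> tuple[Path, Path]:
    """Return (parameter file, server file) following the documented pattern."""
    pipeline = Path(pipeline_file)
    name = pipeline.stem  # pipeline name without the .yaml suffix
    parameter = pipeline.parent / "parameter" / f"{name}_parameter.yaml"
    server = pipeline.parent / "server" / f"{name}_server.yaml"
    return parameter, server

param_path, server_path = build_artifact_paths("examples/demos/LLM.yaml")
print(param_path.as_posix())
print(server_path.as_posix())
```

A check like this is handy before `ultrarag run`: if either file is missing or older than the pipeline YAML, the pipeline probably needs to be rebuilt.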
Command:

    ultrarag run <pipeline.yaml> [--param path] [--is_demo]

What happens:
- Loads generated server config + parameter config.
- Creates `fastmcp.Client` transport config for each server.
- Executes pipeline steps in order, including `loop` and `branch`.
- Saves intermediate memory snapshots and writes `output/memory_*.json`.
- Invokes cleanup tools (e.g., tools ending with `vllm_shutdown`) if present.
UltraRAG accepts mixed step forms:

    - retriever.retriever_search
    - generation.generate:
        input:
          prompt_ls: custom_prompt_ls
        output:
          ans_ls: final_answer_ls
    - loop:
        times: 3
        steps:
          - retriever.retriever_search
          - generation.generate
    - branch:
        router:
          - router.route_query
        branches:
          need_retrieval:
            - retriever.retriever_search
          direct_answer:
            - generation.generate

- Steps under the `prompt` server call `client.get_prompt(...)`.
- Non-prompt steps call `client.call_tool(...)`.
In UltraData, each tool input value is interpreted by convention:
- `"$foo"` -> load from server-local params (`parameter.yaml`)
- `"bar"` -> read from global variable pool (`global_vars["bar"]`)
- `"memory_xxx"` -> read/write memory lists
Output handling:
- Tool returns JSON payload -> keys mapped to declared outputs
- Output remapping (`output:` in pipeline step) is applied at save time
- Prompt outputs usually produce `prompt_ls`
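The input convention and the output handling can be sketched together as two small functions. This is an illustration under assumptions, not the `UltraData` implementation: the function names are hypothetical, and the precedence between the `$` and `memory_` prefixes is a guess.

```python
import json
from typing import Any

def resolve_input(value: str, server_params: dict[str, Any],
                  global_vars: dict[str, Any],
                  memory: dict[str, list[Any]]) -> Any:
    """Resolve one declared input name following the documented convention."""
    if value.startswith("$"):
        return server_params[value[1:]]  # server-local parameter
    if value.startswith("memory_"):
        return memory.setdefault(value, [])  # memory list, created on demand
    return global_vars[value]  # shared variable pool

def save_outputs(payload_json: str, declared: list[str],
                 remap: dict[str, str], global_vars: dict[str, Any]) -> None:
    """Map payload keys to declared outputs, applying remapping at save time."""
    payload = json.loads(payload_json)
    for key in declared:
        if key not in payload:
            raise KeyError(f"tool did not return declared output {key!r}")
        global_vars[remap.get(key, key)] = payload[key]

global_vars: dict[str, Any] = {"custom_prompt_ls": ["Q: what is MCP?"]}
top_k = resolve_input("$top_k", {"top_k": 5}, global_vars, {})
save_outputs('{"ans_ls": ["a protocol"]}', ["ans_ls"],
             {"ans_ls": "final_answer_ls"}, global_vars)
print(top_k, global_vars["final_answer_ls"])
```

Failing loudly on a missing declared key mirrors the most common failure mode in practice: an upstream tool returns a key under a different name than the pipeline expects.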
Branch handling:
- Branch-aware values use wrapped list records with branch-state keys.
- Internal sentinel (`UNSET`) is used to distinguish "not filled yet" from `None`.
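The sentinel pattern itself is small. The sketch below shows the general idea only; the class name `_UnsetType` and the helper `is_filled` are hypothetical, and the actual `UNSET` object in the codebase may be defined differently.

```python
class _UnsetType:
    """Marker type whose single instance means 'no value written yet'."""

    def __repr__(self) -> str:
        return "UNSET"

UNSET = _UnsetType()

def is_filled(value: object) -> bool:
    # Identity check: None is a legitimate value, only UNSET means "empty".
    return value is not UNSET

# One slot per branch record: untouched, explicitly None, and a real answer.
slots = [UNSET, None, "answer"]
print([is_filled(v) for v in slots])
```

Using identity (`is not UNSET`) rather than truthiness matters here, because `None`, empty strings, and empty lists are all valid tool outputs.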
Memory handling:
- The engine tracks `memory_*` histories automatically.
- Final snapshots are serialized to `output/memory_<...>.json`.
- If a memory server is detected, turn-level memory auto-save can be triggered.
`src/ultrarag/client.py` is the orchestration heart of the project.
Key responsibilities:
- CLI entrypoint (`main`)
- UI launch (`launch_ui`) and case-study launch (`launch_case_study`)
- Config loading (`Configuration`)
- Pipeline data graph and state (`UltraData`)
- Build pipeline configs (`build`)
- Load runtime context (`load_pipeline_context`)
- Execute step engine (`execute_pipeline`)
- Run full pipeline (`run`)
Important runtime behaviors:
- Supports both local python MCP servers (`path.endswith(".py")`) and remote MCP endpoints (`http(s)`).
- For remote MCP, requires Node.js >= 20 and uses `npx -y mcp-remote <url>`.
- Keeps loop-termination state in `ContextVar` for coroutine safety.
- Emits structured stream events in demo/UI flows (`step_start`, `step_end`, `token`, `sources`).
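The local-versus-remote distinction can be sketched as a function that picks a launch command from the server path. This is an assumption-laden illustration: `launch_command` is a hypothetical name, running local servers with `sys.executable` is a guess, and the real client builds `fastmcp.Client` transport config rather than raw argument lists.

```python
import sys

def launch_command(path: str) -> list[str]:
    """Pick a launch command for a server path (local script or remote URL)."""
    if path.startswith(("http://", "https://")):
        # Remote endpoint: proxied through the mcp-remote Node.js package.
        return ["npx", "-y", "mcp-remote", path]
    if path.endswith(".py"):
        # Local server: run the script with the current interpreter (assumed).
        return [sys.executable, path]
    raise ValueError(f"unsupported server path: {path!r}")

print(launch_command("https://example.com/mcp"))
print(launch_command("servers/retriever/src/retriever.py")[1])
```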
`src/ultrarag/server.py` defines `UltraRAG_MCP_Server`, a compatibility wrapper over FastMCP.
Key responsibilities:
- Enhanced `tool()` and `prompt()` registration with `output` metadata support
- Metadata capture for automatic config generation
- `build(parameter_file)` to generate per-server `server.yaml`
- Compatibility filtering for FastMCP signature differences
`src/ultrarag/api.py` provides ergonomic Python-side wrappers:
- `initialize(servers, server_root, log_level)`
- `ToolCall.server_name.tool_name(...)`
- `PipelineCall(pipeline_file, parameter_file, log_level)`
`src/ultrarag/mcp_logging.py`:
- Initializes root logger `UltraRAG`
- Rich console logging + file logging (`logs/<timestamp>.log`)
- Log level controlled by `log_level` argument and environment
`src/ultrarag/mcp_exceptions.py`:
- Validates local Node.js availability/version
- Raises `NodeNotInstalledError` / `NodeVersionTooLowError`
`src/ultrarag/utils.py`:
- Subprocess lifecycle helpers
- POSIX parent-death signal support
- Windows job object support for child-process cleanup
Each server follows this shape:
servers/<name>/
├── parameter.yaml
├── server.yaml # generated
└── src/<name>.py
Register tools and prompts using either:
- Decorator style
- Class-bound method registration style
Both are valid in this codebase.
Canonical form of the `output` contract string:

    input1,input2,$param_a -> output1,output2

Rules:
- Left side maps function args to pipeline inputs.
- Right side defines expected output keys from returned dict/JSON.
- `-> None` means no output variables.
- `$param` means value comes from server parameter file.
- For tools: return JSON-serializable dict payloads matching declared output keys.
- For prompts: return prompt messages (typically list-like prompt payloads consumed by `get_prompt`).
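The contract string format described above is easy to parse by hand, which helps when checking that a tool's returned keys match what it declares. The parser below is a sketch with a hypothetical name, not the one used by `UltraRAG_MCP_Server`.

```python
def parse_contract(contract: str) -> tuple[list[str], list[str]]:
    """Split 'in1,in2,$param -> out1,out2' into (inputs, outputs)."""
    left, arrow, right = contract.partition("->")
    if not arrow:
        raise ValueError(f"missing '->' in contract: {contract!r}")
    inputs = [part.strip() for part in left.split(",") if part.strip()]
    right = right.strip()
    # '-> None' declares that the tool writes no output variables.
    outputs = [] if right == "None" else [
        part.strip() for part in right.split(",") if part.strip()
    ]
    return inputs, outputs

inputs, outputs = parse_contract("input1,input2,$param_a -> output1,output2")
print(inputs, outputs)
```

Comparing `outputs` against the keys of a tool's returned dict is a cheap way to catch contract drift before running a full pipeline.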
Server modules should end with:

    if __name__ == "__main__":
        app.run(transport="stdio")

Retriever server (`servers/retriever/`):
- Supports multiple retrieval modes:
  - Dense retrieval
  - BM25
  - Web search
  - Project-memory retrieval
- Index backends are pluggable via factory:
  - `faiss`
  - `milvus`
- Web search backends are pluggable via factory:
  - `exa`
  - `tavily`
  - `zhipuai`
Generation server (`servers/generation/`):
- Supports generation backends including `openai`, `vllm`, and `hf` workflows.
- Provides explicit cleanup tool `vllm_shutdown`.
- Demo mode can use local streaming generation service.
Prompt server (`servers/prompt/`):
- Uses `SandboxedEnvironment` from Jinja2 for safer rendering.
- Validates template paths to reduce traversal risk.
- Escapes string inputs before template rendering.
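Template path validation of the kind mentioned above usually means resolving the requested path and refusing anything that escapes the template root. The sketch below uses only the standard library; `resolve_template` and the `prompt_templates` directory name are hypothetical, and the prompt server's own check may differ.

```python
from pathlib import Path

def resolve_template(template_root: Path, requested: str) -> Path:
    """Resolve a template path and reject anything outside the root."""
    root = template_root.resolve()
    candidate = (root / requested).resolve()  # collapses '..' segments
    if not candidate.is_relative_to(root):
        raise ValueError(f"template path escapes root: {requested!r}")
    return candidate

root = Path("prompt_templates")
print(resolve_template(root, "qa/basic.jinja").name)
```

Absolute inputs such as `/etc/passwd` are rejected as well, because joining an absolute path onto the root discards the root and the containment check then fails.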
`ui/backend/app.py`:
- Flask app factory (`create_app`)
- Serves frontend static assets
- Exposes chat/pipeline/KB/auth endpoints
- Reads optional frontend override via `ULTRARAG_FRONTEND_DIR`
`ui/backend/pipeline_manager.py` is large and central to UI behavior.
Major responsibilities:
- Session lifecycle and streaming chat management
- Background chat task management
- Pipeline CRUD (list/load/save/rename/delete)
- Parameter load/save/build wrappers
- Knowledge base file ingest and pipeline triggering
- Memory synchronization to per-user KB collections
- Optional server introspection via AST stub generation if `server.yaml` is missing
Notable behavior:
- Applies defensive patches to suppress noisy closed-event-loop teardown logs.
- Uses a queue bridge for async-to-sync SSE event streaming.
- Supports automatic memory-to-KB sync for pipelines that include memory components.
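A queue bridge between async producers and a synchronous consumer can be sketched like this. It is a generic pattern, not the code in `pipeline_manager.py`: `produce_events` and `stream_sync` are hypothetical names, and the event payloads are reduced to a `type` field.

```python
import asyncio
import queue
import threading
from typing import AsyncIterator, Callable, Iterator

_DONE = object()  # marks the end of the stream

async def produce_events() -> AsyncIterator[dict]:
    """Stand-in for the async pipeline emitting stream events."""
    for name in ("step_start", "token", "step_end"):
        await asyncio.sleep(0)
        yield {"type": name}

def stream_sync(source: Callable[[], AsyncIterator[dict]]) -> Iterator[dict]:
    """Run the async source in a worker thread and yield events synchronously."""
    events: queue.Queue = queue.Queue()

    async def pump() -> None:
        try:
            async for event in source():
                events.put(event)
        finally:
            events.put(_DONE)  # always unblock the consumer

    worker = threading.Thread(target=lambda: asyncio.run(pump()), daemon=True)
    worker.start()
    while (item := events.get()) is not _DONE:
        yield item
    worker.join()

received = list(stream_sync(produce_events))
print(received)
```

A synchronous generator like `stream_sync` is what a Flask streaming response can iterate over, with each event formatted as an SSE `data:` line by the caller.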
Default UI storage root:
ui/storage
Can be overridden by:
ULTRARAG_UI_STORAGE_ROOT
Key subpaths:
- `db/users.sqlite3`
- `chat_sessions/`
- `knowledge_base/raw|corpus|chunks|index`
- `memory/`
- `knowledge_base/_memory_sync`
Runtime outputs:
- `output/memory_*.json`
- `logs/*.log`
Environment variables:
- `ULTRARAG_UI_STORAGE_ROOT`: override UI storage root
- `ULTRARAG_FRONTEND_DIR`: override frontend static directory
- `ULTRARAG_SESSION_TIMEOUT`: foreground chat session timeout
- `ULTRARAG_BG_SESSION_TIMEOUT`: background session timeout
- `ULTRARAG_LOG_TS`: custom timestamp seed for log file naming
- `log_level`: consumed by core logger initialization
Install tiers from pyproject.toml:
- Core install: no extras
- `retriever` extra
- `generation` extra
- `evaluation` extra
- `corpus` extra
- `all` extra (union)
Typical commands:

    uv sync
    uv sync --extra retriever
    uv sync --extra generation
    uv sync --all-extras

Development dependencies include:
- `ruff`
- `ipython`
- `jupyter`
- `pytest`
CLI commands:

    ultrarag build <pipeline.yaml>
    ultrarag run <pipeline.yaml> [--param <parameter.yaml>] [--log_level info|debug|warn|error] [--is_demo]
    ultrarag show ui [--host 127.0.0.1] [--port 5050]
    ultrarag show case [--config_path <memory.json>] [--host 127.0.0.1] [--port 8080]

Minimal smoke check:

    ultrarag run examples/experiments/sayhello.yaml

Dockerfile variants:
- `Dockerfile`: full image (builds frontend, installs all extras)
- `Dockerfile.base-cpu`: CPU base image
- `Dockerfile.base-gpu`: GPU base image

All variants start UI with:

    ultrarag show ui --port 5050 --host 0.0.0.0

To add a new server:
- Create `servers/<name>/parameter.yaml`
- Implement `servers/<name>/src/<name>.py`
- Register tools/prompts via `app.tool` / `app.prompt` (or class-bound registration)
- Ensure `app.run(transport="stdio")` exists
- Add server to `servers:` in a pipeline YAML
- Run `ultrarag build <pipeline.yaml>`
To add a new tool or prompt to an existing server:
- Implement function/method
- Register with explicit `output=...` contract
- Ensure return payload keys match outputs
- Update relevant parameter keys in `parameter.yaml` if using `$...`
- Add the step in pipeline YAML and rebuild
To add a new index backend:
- Implement backend in `servers/retriever/src/index_backends/`
- Follow `BaseIndexBackend` contract
- Register in `_INDEX_BACKENDS` map in `index_backends/__init__.py`
To add a new web search backend:
- Implement backend in `servers/retriever/src/websearch_backends/`
- Follow `BaseWebSearchBackend` contract
- Register in `_WEBSEARCH_BACKENDS` map
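The "implement, follow the base contract, register in a map" workflow is a standard registry-plus-factory pattern. The sketch below is generic: `BaseBackend`, `InMemoryBackend`, `_BACKENDS`, and `create_backend` are hypothetical stand-ins, and the actual `BaseIndexBackend` and `BaseWebSearchBackend` contracts are likely to expose different methods.

```python
from abc import ABC, abstractmethod

class BaseBackend(ABC):
    """Hypothetical base contract; the real base classes may differ."""

    @abstractmethod
    def search(self, query: str, top_k: int) -> list[str]:
        ...

class InMemoryBackend(BaseBackend):
    """Toy backend doing case-insensitive substring matching."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def search(self, query: str, top_k: int) -> list[str]:
        hits = [d for d in self.docs if query.lower() in d.lower()]
        return hits[:top_k]

# Name -> class map, in the spirit of _INDEX_BACKENDS / _WEBSEARCH_BACKENDS.
_BACKENDS: dict[str, type[BaseBackend]] = {"in_memory": InMemoryBackend}

def create_backend(name: str, **kwargs) -> BaseBackend:
    """Factory: look up the backend class by name and instantiate it."""
    try:
        backend_cls = _BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; available: {sorted(_BACKENDS)}"
        ) from None
    return backend_cls(**kwargs)

backend = create_backend("in_memory", docs=["MCP servers", "RAG pipelines"])
print(backend.search("mcp", top_k=5))
```

With this shape, adding a backend touches exactly two places: the new class and one entry in the map, which is why forgetting the registration step is the usual cause of "unknown backend" errors.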
Primary files for UI backend changes:
- `ui/backend/app.py`
- `ui/backend/pipeline_manager.py`

If touching build/runtime semantics, cross-check against:
- `src/ultrarag/client.py`
- Use type hints for function signatures.
- Prefer `pathlib.Path` for filesystem paths.
- Use `yaml.safe_load` / `yaml.safe_dump`.
- Use project logger (`get_logger` or `app.logger`) instead of `print`.
- Keep tool outputs deterministic and JSON-serializable.
- Keep imports grouped: stdlib -> third-party -> local.
- Keep async boundaries explicit (`async`/`await`).
Treat these as generated or runtime artifacts unless intentionally regenerating:
- `servers/*/server.yaml`
- `examples/**/parameter/*_parameter.yaml`
- `examples/**/server/*_server.yaml`
- `output/*`
- `logs/*`
- `ui/storage/*` runtime data

Also avoid committing secrets:
- `.env`
- any credential-bearing local config
Server not found or fails to start:
- Check `servers.<name>` path in pipeline YAML.
- Ensure server entry script exists at `servers/<name>/src/<name>.py` (or update path in config).
Variable or parameter resolution errors:
- Verify output key names from upstream tool match downstream input names.
- Verify remapping under step-level `output:` is correct.
- Verify `$param` keys exist in that server's parameter config.
Remote MCP connection problems:
- Ensure Node.js >= 20.
- Confirm `npx` is available.
- Confirm remote URL in server path is reachable.
Tools or prompts missing in the UI:
- Ensure `server.yaml` exists or can be inferred by AST parsing.
- Check for unusual dynamic registration patterns that static analysis cannot infer.
Empty or missing answers:
- Inspect `output/memory_*.json`.
- Check whether generation step ran and produced `ans_ls`.
- Check stream events in UI path (`step_start`, `step_end`, `sources`, `token`, `final`).
Before finalizing any non-trivial change:
- Build the affected pipeline:
  - `ultrarag build <pipeline.yaml>`
- Run a relevant smoke case:
  - `ultrarag run <pipeline.yaml>`
- If UI behavior changed:
  - run `ultrarag show ui` and verify route behavior
- If retriever/generation backends changed:
  - validate parameter schema keys and output names
- Keep generated artifacts intentional:
  - do not accidentally commit transient runtime outputs

    # 1) Install dependencies
    uv sync --all-extras
    # 2) Smoke test
    ultrarag run examples/experiments/sayhello.yaml
    # 3) Build and run a demo pipeline
    ultrarag build examples/demos/LLM.yaml
    ultrarag run examples/demos/LLM.yaml
    # 4) Launch UI
    ultrarag show ui --host 127.0.0.1 --port 5050

- This repository is orchestration-first: correctness depends heavily on I/O naming consistency across tools and pipeline steps.
- Most regressions come from mismatched variable names, stale generated configs, or incomplete parameter updates.
- When in doubt, inspect `src/ultrarag/client.py` first: it is the execution truth.