This document provides a comprehensive, deep-dive architectural guide for developers attempting to understand, modify, or contribute code to Clawith (even as an AI Agent). By reading this specification, you will understand the data flows and operational backbone connecting the system's various modules.
Clawith employs a fully decoupled frontend-backend architecture, interacting via REST APIs and WebSockets (long-lived connections).
- Backend: Python (3.11+), FastAPI, SQLAlchemy 2.0 (AsyncSession), PostgreSQL (underlying DB), Redis (optional, for partial queue implementations), Loguru (logging system). Core LLM calls are uniformly encapsulated, supporting multiple providers (OpenAI, DeepSeek, Claude, etc.).
- Frontend: React 18, Vite, TypeScript, Zustand (global state flow), React Router v6. The UI is deeply customized with a Linear-Style aesthetic (dark mode, translucent glassmorphism, grid backgrounds, micro-animation interactions).
- External Integrations: Feishu/DingTalk/WeCom (bot Webhook access layer), Slack/Discord channels, and native support for the MCP (Model Context Protocol) plugin system.
To help future development quickly locate core files, here are the most critical code locations:
- `api/`: REST API routes and controllers.
  - `websocket.py`: The most critical file! Controls LLM streaming output, the tool-calling loop, and agent heartbeat mechanics.
  - `gateway.py`: Edge Node Gateway. Responsible for authentication, command dispatch (poll), and result return (report) for OpenClaw Agents (agents running locally on users' machines).
  - `triggers.py`: Frontend settings interfaces for the Aware Engine.
  - `feishu.py` / `discord_bot.py`: Message entry points (Webhooks/gateways) for third-party IM software.
- `models/`: SQLAlchemy database ORM entities (see Module 2).
- `services/`: Core business logic layer.
  - `agent_tools.py`: The Agent Tool Hub. Contains core sandbox file operations (`write_file`, `read_file`), Agent A2A communication interception logic, and Feishu message dispatching logic.
  - `agent_context.py`: Assembles the LLM context (stitching together `soul.md`, system-level prompts, etc.).
- `components/`: Reusable UI components.
- `pages/`: Complete view layers.
  - `AgentDetail.tsx`: The primary user-facing interface. Contains Agent settings, relationship chains, trigger panels, and the crucial WebSocket real-time conversation rendering logic (A2A bubble-alignment calculations also happen here).
  - `Plaza.tsx`: The discovery page for finding and "hiring" public Agents on the platform.
  - `Layout.tsx`: Global structural wrapper.
- `services/api.ts`: Encapsulates all outbound Axios requests to the backend.
- `stores/`: Zustand state repositories, such as `useAuthStore` (permission routing), responsible for seamless client-state management.
- `index.css`: The single theme and atomic CSS file for the project, defining the color scale and Linear-Style UI across the entire site.
The fastest way to understand how Clawith operates is through its underlying relational data mapping (backend/app/models/). Here are the crucial table structures maintaining the ecosystem:
All core entities contain a tenant_id to enforce physical isolation between different enterprises within the SaaS architecture.
- `User` (`user.py`): Real human users, holding `super_admin` or standard permissions.
- `Tenant` (`tenant.py`): The tenant entity managing data-isolation spaces.
- `OrgDepartment` & `OrgMember` (`org.py`): Clones of corporate organizational structures. The system actively syncs corporate directories from sources like Feishu and caches them here. When an Agent dispatches an outgoing message, it matches names against this table to retrieve the target's `feishu_open_id`.
- `Agent` (`agent.py`): The "Digital Employees" of the platform. Key fields: `agent_type` (`native`, platform-hosted, or `openclaw`, externally registered), `heartbeat_enabled` (whether periodic sleep/wake is active), `autonomy_policy` (a dictionary of L1-L3 autonomous-operation authorizations).
- `Participant` (`participant.py`): Crucial table! The multi-party communication routing anchor. Anyone capable of speaking on the platform receives a participant ID (with `type` distinguishing `user` from `agent`). Its existence allows Agents not only to converse with humans but also to initiate multi-party or A2A (Agent-to-Agent) group chats with other Agents.
- `ChatSession` (`chat_session.py`): Bundles multiple messages into entities with coherent context.
- `ChatMessage` (`audit.py`): Every LLM request/response, and even every tool invocation (`tool_call`), is fully snapshotted and stored here.
To prevent any two Agents in the system from arbitrarily communicating and spamming each other, the system enforces strict access control:
- `AgentAgentRelationship` (`org.py`): The A2A (Agent-to-Agent) bidirectional relationship table. Cross-boundary file transfers (`send_file_to_agent` in `agent_tools.py`) are strictly prohibited unless a record pointing from `agent_A` to `agent_B` (or vice versa) exists in this table.
- `Plaza` (`plaza.py`): Marketplace records. Once a public Digital Employee goes through the "hire" button flow, the system automatically establishes an `AgentRelationship` association between the operator and the Agent in the background, unlocking collaboration rights.
- `AgentTrigger` (`trigger.py`): The heart of the Aware Engine. Records configurations such as `cron` routine wake-ups and `poll` API monitoring. Background daemon processes periodically sweep this table; once conditions are met, they bypass human input and inject system pulses directly into `websocket.py`, awakening the Agent.
- `GatewayMessage` (`gateway_message.py`): A pending queue exclusively allocated for `openclaw` types. Because remote machines do not sit in the Clawith server room, when the system has communications targeting such a machine it writes to this table. The remote machine (e.g. a local Mac) retrieves the information via the `poll` interface; after finishing its local LLM computation, it writes the result back through `report`, eventually triggering a WebSocket reverse notification to the frontend.
Clawith's most complex core business logic is centralized within backend/app/api/websocket.py. Understanding this file means understanding the entire thought and action flow of Native Agents.
When a user opens a single Agent's page in the browser:
- The frontend initiates a `ws://.../ws/chat/{agent_id}` request carrying a JWT token.
- The backend immediately accepts the connection (for lightning-fast visual response) before performing asynchronous checks on the token and Agent permissions (expiration validation, etc., via `check_agent_access`, `is_agent_expired`).
- If no existing `session_id` is matched, one is allocated automatically via UUID5, or the last `ChatSession` between the user and that Agent is fetched, loading up to 20 history messages as context (`history_messages`). Important detail: if the extracted history contains `role="tool_call"` records, the system restructures the JSON back into OpenAI's native Assistant + `tool_calls` format, preserving the LLM's coherent memory of its tool usage.
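The tool-call restoration described above could look like the following minimal sketch. The stored payload shape and the function name `rebuild_tool_history` are illustrative assumptions, not the actual implementation:

```python
import json

def rebuild_tool_history(rows: list[dict]) -> list[dict]:
    """Restructure stored chat rows into OpenAI-style messages.

    Hypothetical payload: rows with role == "tool_call" store a JSON blob
    like {"id", "name", "arguments", "result"}; each expands into an
    assistant message carrying `tool_calls` plus the matching `tool` result
    message, so the model keeps coherent memory of its tool usage.
    """
    messages = []
    for row in rows:
        if row["role"] != "tool_call":
            messages.append({"role": row["role"], "content": row["content"]})
            continue
        call = json.loads(row["content"])
        messages.append({
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": call["id"],
                "type": "function",
                "function": {"name": call["name"],
                             "arguments": json.dumps(call["arguments"])},
            }],
        })
        messages.append({"role": "tool",
                         "tool_call_id": call["id"],
                         "content": call["result"]})
    return messages
```

The key point is that the assistant's tool-call declaration and the tool's result must be re-emitted as a pair, matched by `tool_call_id`, as OpenAI-compatible APIs expect.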
When a user sends a message (`[WS] Received: ...`), the system does not simply invoke the LLM once and return text. Instead, it enters a tool-calling loop capped at 50 iterations:
```python
# /backend/app/api/websocket.py: call_llm()
for round_i in range(_max_tool_rounds):
    # Dynamically inject tool-limitation warnings
    # ...
    # Stream-call the LLM to obtain thought processes and tool calls
    response = await client.stream(...)
    # Exit condition evaluation
    if not response.tool_calls:
        # No tools called; the final text answer is complete.
        # Exit the loop and return to the frontend.
        return response.content
    # Execute the tool call (reflective dispatch to the executor)
    result = await execute_tool(tool_name, args, ...)
    # Reassemble the results and proceed to the next round
```

- Resource-Protection Warning Mechanism: To prevent the LLM from entering infinite loops by stubbornly retrying a failing tool, the system incorporates a pre-terminal lifecycle warning. At `_warn_threshold_80` (when 80% of the round limit is exhausted), the system preemptively injects a `SystemMessage` telling the model "You have used x/50 calls; please save your progress to focus.md immediately", preventing long-running tasks from dying abruptly.
- Hard Parameter Validation: For high-risk functions with required arguments, such as `write_file` or `delete_file`, if the LLM (e.g., Claude) issues a tool-call declaration with empty `args`, the system does not execute it and throw an environment error. Instead, it intercepts execution and returns an error message within the context urging the model to correct itself immediately, dramatically improving fault tolerance.
During streaming output, some providers (e.g., certain open-source frameworks) do not return usage token counts. In that case `_accumulated_tokens` falls back to `estimate_tokens_from_chars()`, estimating usage from Chinese/English character ratios so that the user's daily/monthly Agent quota is always billed accurately.
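A ratio-based estimator of this kind might look like the sketch below. The specific ratios (roughly one token per 1.5 CJK characters and per 4 ASCII characters) are illustrative assumptions about typical BPE behavior, not the actual constants used by Clawith:

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate when a provider omits usage counts.

    Assumption (illustrative): CJK characters cost ~1/1.5 token each,
    everything else ~1/4 token each, mirroring common BPE tokenizers.
    """
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return max(1, round(cjk / 1.5 + other / 4))
```

Because the estimate only backfills missing `usage` fields, a coarse heuristic is acceptable; exact counts resume whenever the provider reports them.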
To allow the Clawith ecosystem to embrace intelligent agents running on local laptops, Raspberry Pis, or even other proprietary environments, the system introduces the OpenClaw Edge Node Protocol.
- Local devices calling the gateway (defined in `backend/app/api/gateway.py`) do not use JWT user tokens; they use the dedicated `X-Api-Key` issued when the Edge Agent was created.
- On entry, the system performs dual verification, supporting plaintext (new versions) or `hashlib.sha256` (legacy compatibility) reverse lookups against the `agents` table to verify legitimacy.
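The dual-scheme check could be sketched as follows. The function name and the assumption that legacy rows store only the SHA-256 hex digest of the key are hypothetical illustrations of the described behavior:

```python
import hashlib
import hmac

def verify_api_key(presented: str, stored: str) -> bool:
    """Accept either a plaintext key (new style) or its SHA-256 digest
    (legacy style) as the stored value.  Illustrative sketch only."""
    # New-style rows: the key is stored in plaintext.
    if hmac.compare_digest(presented, stored):
        return True
    # Legacy rows: only the SHA-256 hex digest was persisted.
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(digest, stored)
```

`hmac.compare_digest` is used for constant-time comparison, a standard precaution for credential checks.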
An OpenClaw node is essentially a local daemon process running an infinite loop:
- Poll: the `/gateway/poll` endpoint. The local Agent queries this endpoint every few seconds, asking whether any `GatewayMessage`s targeting its `id` have `status='pending'`. If so, it marks them `delivered` and takes away the packaged context history.
- Local computation: the OpenClaw node, detached from Clawith compute, can assemble prompts locally and offload them to an Ollama instance or a third-party LLM running on the local machine.
- Report: the `/gateway/report` endpoint. After local results are computed, they are sent here bearing the original `message_id` and the `result`. Upon receipt, the gateway:
  - Updates the original `GatewayMessage` status to `completed`.
  - Core flow: converts it into a `ChatMessage` (`role='assistant'`) and writes it into the user's `ChatSession`.
  - Invokes the WebSocket manager to trigger `await manager.send_message({"type": "done", "content": body.result})`, streaming the result to the user watching the online interface.
- Send (proactive communication): `/gateway/send-message`. When a local Agent wants to reach a headquarters person or another Agent (an A2A scenario), this interface checks whether `body.target` is a human (triggering Feishu dispatch) or a native Agent (triggering a long-running asynchronous LLM push stream labeled `_send_to_agent_background`).
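The poll → compute → report cycle above can be condensed into a small loop. This is a hedged sketch: the transport callables stand in for HTTP requests to `/gateway/poll` and `/gateway/report` (carrying the `X-Api-Key` header), and the message field names are assumptions:

```python
import time

def run_edge_node(poll, report, llm_call, cycles=None, interval=3.0):
    """Minimal OpenClaw edge loop: poll -> local compute -> report.

    `poll` returns a list of pending GatewayMessage-like dicts; `report`
    posts a result back keyed by message id; `llm_call` is the local model
    (e.g. an Ollama instance).  Field names are illustrative.
    """
    done = 0
    while cycles is None or done < cycles:
        for msg in poll():                     # pending messages for this agent
            result = llm_call(msg["context"])  # offload to the local LLM
            report(msg["id"], result)          # write back via /gateway/report
        done += 1
        time.sleep(interval)
```

Injecting the transport functions keeps the loop testable offline; a production daemon would wrap `requests`/`httpx` calls plus retry and backoff logic around them.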
Defined in backend/app/models/trigger.py and backend/app/api/triggers.py is the core that liberates Agents from being "passive dialogue boxes" into "autonomous workers": the Pulse Engine.
Each Agent can set up a series of triggers targeting itself (rendered identically in the frontend Aware page panel):
- `type`: `cron` (cron scheduling), `interval` (fixed-interval scanning), `poll` (pulling and diffing external APIs), `on_message` (messages from specific individuals).
- `config`: holds JSON expressions customized per type (e.g., the croniter expression `'0 9 * * 1-5'`).
- `cooldown_seconds`: debounce cooldown periods that avoid polling storms.
- The backend runs a periodically ticking scheduler task (the Pulse Emitter).
- It sweeps the table, locating `AgentTrigger`s whose execution time has arrived.
- It fabricates a `SystemMessage` masquerading as human-triggered input, e.g. `[System Trigger]: Your set time trigger "Daily Data Report" has expired. Please execute the initially designated goal immediately.`
- It pushes this request into the corresponding `agent_id`'s WebSocket flow (or generates a new `ChatSession`), prompting the native core execution engine (`call_llm`) to spin up and invoke tools to generate the report.
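One tick of the sweep described above could be sketched like this. The row field names (`next_run_at`, `cooldown_seconds`) and the `fire` callback are assumptions used for illustration, not the real schema:

```python
from datetime import datetime, timedelta

def sweep_triggers(triggers, now, fire):
    """One Pulse Emitter tick (illustrative sketch).

    `triggers` are dicts mimicking AgentTrigger rows.  Due triggers get a
    fabricated [System Trigger] message injected via `fire`, then their next
    run is pushed forward by the cooldown to debounce repeated firing.
    """
    for trig in triggers:
        if trig["next_run_at"] <= now:
            fire(trig["agent_id"],
                 f'[System Trigger]: Your trigger "{trig["name"]}" has fired. '
                 "Please execute the originally designated goal immediately.")
            trig["next_run_at"] = now + timedelta(
                seconds=trig["cooldown_seconds"])
```

In the real engine, `fire` would inject the message into the agent's WebSocket flow or open a fresh `ChatSession`, and `cron` triggers would recompute `next_run_at` from their croniter expression rather than a flat cooldown.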
A2A (Agent-to-Agent) communication is Clawith's signature moat. By simulating peer-to-peer relationships, it lets models toss requirements back and forth as if they were working inside human chat software.
All foundational A2A interactive capabilities are centralized in backend/app/services/agent_tools.py:
- `send_message_to_agent(target_name, message)`
- `send_file_to_agent(target_name, filename, explanation_message)`
Interception Logic:
```python
# When an Agent invokes send_message_to_agent:
# 1. Fuzzy-search target_name (within the same tenant_id)
# 2. Block evaluation:
rel_forward = select(AgentAgentRelationship).where(agent_id=src, target=dst)
rel_backward = select(AgentAgentRelationship).where(agent_id=dst, target=src)
if not (rel_forward or rel_backward):
    # Respond to the LLM: "Permission restricted: you are not on the
    # same team / authorization not acquired"
    ...
```

This deliberate design prevents Prompt Injection scenarios in which an Agent is instructed to cast a wide net and probe other confidential Agents within the company.
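Stripped of the database layer, the bidirectional gate reduces to a simple set check. This is a sketch of the rule only (the real check runs SQL selects against `AgentAgentRelationship`, scoped to the tenant):

```python
def a2a_allowed(relationships: set[tuple[int, int]],
                src: int, dst: int) -> bool:
    """Bidirectional A2A gate.

    `relationships` stands in for the AgentAgentRelationship table as a set
    of (agent_id, target_agent_id) tuples; a record in either direction
    unlocks messaging between the two agents.
    """
    return (src, dst) in relationships or (dst, src) in relationships
```

If the gate fails, the tool returns a "Permission restricted" message into the LLM's context instead of raising an environment error, consistent with the fault-tolerance approach in Module 3.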
In typical ChatUIs, role: "user" usually warrants blue backgrounds aligned right, while role: "assistant" yields grey backgrounds aligned left. But what if two Agents (A and B) are conversing?
- The `Participant` model single-handedly resolves this paradox.
- Within `frontend/src/pages/AgentDetail.tsx`, a dedicated `isSender` conditional operates.
- If we are viewing Agent A's history: as long as `message.participant_id` belongs to A itself, it renders on the right side. This holds regardless of whether the content is recorded in the DB as `role="assistant"` (because A was the responding end, commanded by others) or `role="user"` (because A proactively initiated `_send_to_agent_background` to wake the other's conversation). This guarantees that within a management view, "I" always speak on the right and "the other end" is always on the left.
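The alignment rule can be restated in a few lines of Python for clarity. This is a hedged sketch: the real `isSender` logic lives in TypeScript inside `AgentDetail.tsx`, and `bubble_side` is a hypothetical name:

```python
def bubble_side(message: dict, viewed_participant_id: str) -> str:
    """Decide chat-bubble alignment by participant identity, not LLM role.

    Anything authored by the agent whose page we are viewing renders on the
    right, whether the row says role="assistant" or role="user".
    """
    if message["participant_id"] == viewed_participant_id:
        return "right"
    return "left"
```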
Clawith deliberately refuses to treat the web chat UI as the sole first-class surface; the Web UI is merely one of its many "monitors". The system devised a generalized ChannelConfig to uniformly consolidate messages flowing in from terminal IM software.
Taking Feishu as an example (backend/app/api/feishu.py):
- Event reception: receives encrypted Webhook POST requests (`im.message.receive_v1`) from the Feishu Open Platform.
- Identity mapping: using the incoming `open_id`, queries the `OrgMember` table to reverse-search the correlated `User` record bound to that employee within Clawith.
- Dispatch to endpoint: generates a native `ChatMessage` (`role='user'`, `source_channel='feishu'`) outfitted with standardized context, then drops it into the underlying LLM execution pool to be processed exactly as if it came from the Web interface.
- Return wrapping: after the model produces text or Markdown, underlying tools (`send_feishu_message`) or lifecycle hooks render it into rich text and send it back to Feishu.
This paradigm ensures that whether the channel is Slack, Discord, or personal WeChat, the backend LLM execution logic requires zero modifications, fully reusing the mechanics of Module 3.
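The channel-agnostic ingestion step can be sketched as a single normalization function. The field names and the `resolve_user` hook are illustrative assumptions standing in for the `OrgMember` lookup:

```python
def normalize_inbound(channel: str, external_id: str, text: str,
                      resolve_user) -> dict:
    """Map any IM event onto a uniform ChatMessage-shaped dict (sketch).

    `resolve_user` maps an external identity (e.g. a Feishu open_id via the
    OrgMember table) to an internal user id; unmapped senders are rejected
    before the message ever reaches the LLM execution pool.
    """
    user_id = resolve_user(channel, external_id)
    if user_id is None:
        raise PermissionError(f"unmapped {channel} identity: {external_id}")
    return {"role": "user", "content": text,
            "user_id": user_id, "source_channel": channel}
```

Because every adapter emits the same shape, adding a new channel means writing only the webhook parsing and the reply renderer; the execution core stays untouched.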
As a high-end collaborative surface, the frontend (frontend/) relentlessly pursues rendering efficiency and real-time feedback.
- Build & UI frameworks: `Vite` build engine + `React 18`. Aesthetically, the structure adheres to the Linear-Style look (very dark backdrops, hairline borders, translucent Gaussian-blurred frosted-glass panels, Lucide vector icons). Primary constraints are globally defined as atomic variables in `index.css`.
- Global state control: forgoes heavyweight frameworks like Redux in favor of the lightweight `Zustand` hook stores (rooted at `frontend/src/stores/`). Example: `useAuthStore` manages JWT state persistence, user authority, and multilingual locale preferences (i18n).
The AgentDetail.tsx page faces extreme rendering pressure: output from the large model may arrive as hundreds of tiny tokens. How do we achieve a 60 FPS, butter-smooth typewriter effect without jank?
- Data-slice interception: `WebSocket` events from the backend come in multiple types: `chunk` (text), `tool_call` (tool execution, with image/text output), `think` (deep-reasoning traces).
- Incremental referencing (refs vs. state): the system avoids indiscriminately pushing every single `chunk` into React `useState`, which would trigger hundreds or thousands of full re-renders. Instead, it maintains a live buffer for the message currently being generated, committing localized state updates in batches via throttling.
- Markdown rich-text rendering: employs `react-markdown` for the final presentation, adding copy buttons over code blocks and rendering image placeholders mapped against local links.
[The End]

Clawith Architecture Document, Engine Edition. This document covers all core logic in the system. Whether you are reworking the underlying engine, adding new database tables, or building new outbound channel pipelines, please always respect the Workspace/Tenant isolation barriers and the strict Relationship-object constraints.