Skip to content

Latest commit

 

History

History
213 lines (161 loc) · 19.3 KB

File metadata and controls

213 lines (161 loc) · 19.3 KB

Clawith Architecture Specification

This document provides a comprehensive, deep-dive architectural guide for developers attempting to understand, modify, or contribute code to Clawith (even as an AI Agent). By reading this specification, you will understand the data flows and operational backbone connecting the system's various modules.


Module 1: System Macro-Architecture & Directory Structure

Clawith employs a fully decoupled frontend-backend architecture, interacting via REST APIs and WebSockets (long-lived connections).

1.1 Tech Stack

  • Backend: Python (3.11+), FastAPI, SQLAlchemy 2.0 (AsyncSession), PostgreSQL (underlying DB), Redis (optional, for partial queue implementations), Loguru (logging system). Core LLM calls are uniformly encapsulated, supporting multiple providers (OpenAI, DeepSeek, Claude, etc.).
  • Frontend: React 18, Vite, TypeScript, Zustand (global state flow), React Router v6. The UI is deeply customized with a Linear-Style aesthetic (dark mode, translucent glassmorphism, grid backgrounds, micro-animation interactions).
  • External Integrations: Feishu/DingTalk/WeCom (bot Webhook access layer), Slack/Discord channels, and native support for the MCP (Model Context Protocol) plugin system.

1.2 Directory Map

To help future development quickly locate core files, here are the most critical code locations:

Backend (/backend/app/)

  • api/: REST API routes and controllers.
    • websocket.py: The most critical file! Controls LLM streaming output, the Tool-calling Loop, and agent heartbeat mechanics.
    • gateway.py: Edge Node Gateway. Responsible for authentication, command dispatch (poll), and result return (report) for OpenClaw Agents (agents running locally on users' machines).
    • triggers.py: Frontend settings interfaces for the Aware Engine.
    • feishu.py / discord_bot.py: Message entry points (Webhooks/Gateways) for third-party IM software.
  • models/: SQLAlchemy database ORM entities (see Module 2).
  • services/: Core business logic layer.
    • agent_tools.py: The Agent Tool Hub. Contains core sandbox file operations (write_file, read_file), Agent A2A communication interception logic, and Feishu message dispatching logic.
    • agent_context.py: Assembles the LLM context (stitching together soul.md, system-level Prompts, etc.).

Frontend (/frontend/src/)

  • components/: Reusable UI components.
  • pages/: Complete view layers.
    • AgentDetail.tsx: The primary user-facing interface. Contains Agent settings, relationship chains, trigger panels, and the crucial WebSocket real-time conversation rendering logic (A2A bubble alignment calculations also happen here).
    • Plaza.tsx: The discovery page for finding and "hiring" public Agents on the platform.
    • Layout.tsx: Global structural wrapper.
  • services/api.ts: Encapsulates all outbound Axios requests to the backend.
  • stores/: Zustand state repositories, such as useAuthStore (permission routing), responsible for seamless Client State management.
  • index.css: The singular theme and atomic CSS file for the project, defining the color scale and Linear-Style UI across the entire site.

Module 2: Core Data Models (Database Schema)

The fastest way to understand how Clawith operates is through its underlying relational data mapping (backend/app/models/). Here are the crucial table structures maintaining the ecosystem:

2.1 Tenant & Organization Security (Tenant Isolation)

All core entities contain a tenant_id to enforce physical isolation between different enterprises within the SaaS architecture.

  • User (user.py): Real human users, possessing super_admin or standard permissions.
  • Tenant (tenant.py): The tenant entity managing data isolation spaces.
  • OrgDepartment & OrgMember (org.py): Clones of corporate organizational structures. The system actively syncs corporate directories from sources like Feishu and caches them here. When an Agent dispatches an outgoing message, it matches names against this table to retrieve the target's feishu_open_id.

2.2 Core Operational Entities (Agent, Session, Messages)

  • Agent (agent.py): The "Digital Employees" of the platform.
    • Key fields: agent_type (native platform-hosted or openclaw externally registered), heartbeat_enabled (whether periodic sleep/wake is active), autonomy_policy (a dictionary of L1-L3 level autonomous operation authorizations).
  • Participant (participant.py): Crucial Table! The multi-party communication routing anchor. Anyone capable of speaking on this platform receives a participant ID (with type distinguishing between user and agent). Its existence allows Agents not only to converse with humans but to initiate multi-party or A2A (Agent-to-Agent) group chats with other Agents.
  • ChatSession (chat_session.py): Bundles multiple messages into entities with coherent context.
  • ChatMessage (audit.py): Every LLM request/response, and even every tool invocation (tool_call), is fully snapshot and stored here.

2.3 M2M Collaboration & Discovery (Relationships & Plaza)

To prevent any two Agents in the system from arbitrarily communicating and spamming each other, the system enforces strict access control:

  • AgentAgentRelationship (org.py): The A2A (Agent-to-Agent) bidirectional relationship table. Underlying cross-boundary file transfers (send_file_to_agent in agent_tools.py) are strictly prohibited unless a correlative record pointing from agent_A to agent_B (or vice versa) exists in this table.
  • Plaza (plaza.py): Marketplace records. Once a public Digital Employee goes through the "hire" button flow, the system automatically establishes an AgentRelationship association between the operator and the Agent in the background, unlocking collaboration rights.

2.4 Pulse Engine & Edge Computing (Pulse & Gateway API)

  • AgentTrigger (trigger.py): This table constitutes the heart of the Aware Engine. It records configurations such as cron routine wake-ups and poll API monitoring. Background daemon processes periodically sweep this table; once conditions are met, they bypass human input to inject system pulses directly into websocket.py, awakening the Agent.
  • GatewayMessage (gateway_message.py): A pending queue exclusively allocated for openclaw types. Because remote machines are not in the Clawith server room, when the system has communications targeting that machine, it writes to this table. The remote Mac computer retrieves the information via the poll interface; after finishing its local LLM computations, it writes the result back through report, eventually triggering a WebSocket reverse notification to the frontend.

Module 3: Native Core Engine

Clawith's most complex core business logic is centralized within backend/app/api/websocket.py. Understanding this file means understanding the entire thought and action flow of Native Agents.

3.1 Lightning-Fast Connection & On-Demand Auth

When a user opens a single Agent's page in the browser:

  1. The frontend initiates a ws://.../ws/chat/{agent_id} request carrying a JWT Token.
  2. The backend immediately Accepts the connection (for lightning-fast visual response) before performing asynchronous interception for Token and Agent permissions (expiration validation, etc., via check_agent_access, is_agent_expired).
  3. If no existing session_id is matched, it automatically allocates one via UUID5 or fetches the last ChatSession between the user and that Agent, loading up to 20 history messages as context (history_messages). Important Detail: If the extracted history contains role="tool_call" records, the system restructures the JSON back into OpenAI's native concurrent Assistant+Tool_Calls format, preserving the LLM's coherent memory of its tool usage.

3.2 The Tool-Calling Loop

When a user sends a message ([WS] Received:...), the system does not simply invoke the LLM once and return text. Instead, it enters a deep polling circuit allowed up to 50 iterations:

# /backend/app/api/websocket.py: call_llm()
for round_i in range(_max_tool_rounds):
    # Dynamically inject tool limitation warnings
    # ...
    # Stream-call the LLM to obtain thought processes and Tool Calls
    response = await client.stream(...)
    
    # Exit condition evaluation
    if not response.tool_calls:
        # No tools called; final text answer generation complete. Exit Loop and return to frontend.
        return response.content

    # Execute Tool Call (Reflection call to executor)
    result = await execute_tool(tool_name, args, ...)
    # Reassemble results and proceed to the next round
  • Resource Protection Warning Mechanism: To prevent the LLM from entering infinite loops by stubbornly retrying a failing tool, the system incorporates a pre-terminal life-cycle warning. At _warn_threshold_80 (when 80% of round limits are exhausted), the system preemptively injects a SystemMessage telling the model "You have used x/50 calls; please save your progress to focus.md immediately", preventing long-running tasks from dying abruptly.
  • Hard Parameter Validation: For high-risk required-argument functions like write_file or delete_file, if the LLM (like Claude) issues a tool call declaration with empty args, the system does not execute and throw an environment error. Instead, it intercepts execution and returns an error message within the context urging the model to correct it immediately, dramatically improving fault tolerance.

3.3 Token Deduction & Long Text Estimation

During streaming output, providers (like certain open-source frameworks) might not return usage token counts. _accumulated_tokens will trigger estimate_tokens_from_chars(), replenishing deductions via Chinese/English string estimation ratios to ensure the user's daily/monthly Agent Quota is accurately billed at all times.


Module 4: Edge Computing & Ecosystem Extension (OpenClaw Gateway)

To allow the Clawith ecosystem to embrace intelligent agents running on local laptops, Raspberry Pis, or even other proprietary environments, the system introduces the OpenClaw Edge Node Protocol.

4.1 X-Api-Key Gateway Auth

  • Local devices calling the gateway (defined in backend/app/api/gateway.py) do not use JWT User Tokens; rather, they use the exclusive X-Api-Key issued when the Edge Agent was created.
  • Upon entry, the system undergoes dual verification: supporting plaintext (new version) or hashlib.sha256 (legacy compatibility) reverse lookups against the agents table to verify legitimacy.

4.2 Poll-Report-Send Messenger Mechanism

An OpenClaw Node is essentially a local daemon process executing an infinite loop script forever:

  1. Poll: /gateway/poll endpoint. The local Agent interrogates this endpoint every few seconds asking if there are any GatewayMessages targeting its id with status='pending'. If so, it marks them as delivered and takes away the packaged context history.
  2. Local Computation: The OpenClaw Node, detached from Clawith computing power, can assemble Prompts locally and offload them to an Ollama instance or third-party LLM running on the local machine.
  3. Report: /gateway/report endpoint. After local results are computed, they are sent to this endpoint bearing the initial message_id and the result. Upon receipt, the Gateway:
    • Updates the initial GatewayMessage status to completed.
    • Core Flow: Morphs it into a ChatMessage(role='assistant') and forcibly shoves it into the user's ChatSession database.
    • Invokes the WebSocket Manager to trigger an await manager.send_message({"type": "done", "content": body.result}) forcefully streaming to the end user gazing at the online interface!
  4. Send (Proactive Communication): /gateway/send-message. When a local Agent suddenly wants to reach a headquarters person or another Agent (i.e. an A2A scenario). This interface detects whether body.target is Human (triggering Feishu dispatch) or Native Agent (triggering a massively long asynchronous LLM pushstream labeled _send_to_agent_background).

Module 5: Aware & Pulse Engine

Defined within backend/app/models/trigger.py and backend/app/api/triggers.py lies the core that liberates Agents from "passive dialogue boxes" into "autonomous workers": the Pulse Engine.

5.1 Trigger Core Structure (AgentTrigger)

Each Agent can set up a series of triggers targeting itself (rendered identically in the frontend Aware page panel):

  • type: cron (Cron scheduling), interval (fixed interval scanning), poll (pulling and comparing against external APIs), on_message (letters from specific individuals).
  • config: Houses JSON expressions customized by Type (e.g., croniter's '0 9 * * 1-5').
  • cooldown_seconds: Anti-bounce cooldown periods avoiding polling storms.

5.2 How Does the Trigger Chain Flow?

  1. The backend runs a periodically ticking Scheduler Task (Pulse Emitter).
  2. The system sweeps tables locating AgentTriggers hitting execution time.
  3. The system fabricates a SystemMessage masquerading as human-triggered input. Example: [System Trigger]: Your set time trigger "Daily Data Report" has expired. Please execute the initially designated goal immediately.
  4. It forcefully pushes this request into the corresponding agent_id WebSocket flow / or generates a new ChatSession, prompting the native core execution engine (call_llm) to spin up and invoke tools to generate the report.

Module 6: Multi-Agent A2A Collaboration

A2A (Agent-to-Agent) communication is Clawith's trump card logic moat. By simulating peer-to-peer relationships, it allows models to toss requirements back and forth as if they were working inside human chatting software.

6.1 Strict Border Control for A2A (Relationship Check)

All foundational A2A interactive capabilities are centralized in backend/app/services/agent_tools.py:

  • send_message_to_agent(target_name, message)
  • send_file_to_agent(target_name, filename, explanation_message)

Interception Logic:

# When an Agent invokes send_message_to_agent:
# 1. Fuzzy search target_name (within identical tenant_id)
# 2. Block evaluation:
rel_forward = select(AgentAgentRelationship).where(agent_id=src, target=dst)
rel_backward = select(AgentAgentRelationship).where(agent_id=dst, target=src)
if not (rel_forward or rel_backward):
    # Respond to LLM reporting: "Permission restricted: you are not on the same team/authorization not acquired"

This meticulous design prevents scenarios utilizing Prompt Injection to instruct an Agent to randomly cast nets and scan other confidential exclusive Agents within the company.

6.2 Frontend A2A Bubble Rendering Revolution

In typical ChatUIs, role: "user" usually warrants blue backgrounds aligned right, while role: "assistant" yields grey backgrounds aligned left. But what if two Agents (A and B) are conversing?

  • The Participant model single-handedly redeems this paradox:
  • Within frontend/src/pages/AgentDetail.tsx, an exclusive isSender conditional function operates.
  • If we are viewing Agent A's history: as long as message.participant_id belongs to A itself, render it on the right side! This holds regardless of whether the content is recorded in the DB as role="assistant" (because it is the responding end commanded by others) or role="user" (because it proactively initiated _send_to_agent_background awakening the other's conversation). This guarantees that within management scopes, "I" always speak on the right side; "The opposite end" is perpetually on the left.

Module 7: Omni-Channel Integration

Clawith architecturally refuses to solidify frontend chatting as the foremost priority. The Web UI is merely one of its multitudinous "monitors". The system devised a generalized ChannelConfig to uniformly consolidate messages flowing from terminal IM software.

7.1 Protocol Transformation: Webhook to ChatMessage

Taking Feishu as an example (backend/app/api/feishu.py):

  1. Event Recept: Receives encrypted Webhook POST requests (im.message.receive_v1) from the Feishu Open Platform.
  2. Identity Mapping: Utilizing the incoming open_id, queries the OrgMember table to reverse search the correlated User record bound to that employee within Clawith.
  3. Dispatch to Endpoint: Generates a native ChatMessage(role='user', source_channel='feishu') outfitted with standardized context. It is then dumped into the underlying LLM execution pool to be computed exactly as if it came from the Web interface.
  4. Packet Wrapping for Return: After model computations conclude realizing text or Markdown, underlying tools (send_feishu_message) or lifecycle hooks render it into rich text in reverse and send it back to Feishu.

This paradigm ensures that regardless of Slack, Discord, or Personal WeChat, backend large model execution logic demands absolutely zero modifications, altogether reusing the tenets of Module 3.


Module 8: Frontend Architecture & Real-Time Flow

As a high-tier collaborative whiteboard, the Frontend (frontend/) relentlessly pursues rendering efficiency and real-time feedback in engineering management.

8.1 Core Technology Nuances

  • Build & UI Frameworks: Vite buildup engine + React 18. In terms of aesthetic design, structurally adheres to Linear-Style beauty (aberrantly dark backdrops, hairline contours, translucent gaussian blurred frosted glass panels, Lucide vector icons). Primary constraints are globally defined inside index.css atomic variables.
  • Global State Control: Abandons heavy-duty frameworks like Redux in favor of the lightweight Zustand to script Hook Stores (rooted at frontend/src/stores/).
    • Example: useAuthStore manages JWT state persistence, User Authority, Multilingual Preference Locales (i18n).

8.2 The Typewriter Rendering Challenge Over WebSockets (AgentDetail.tsx)

At the AgentDetail.tsx page, the system battles extreme rendering pressure: outputs from the large model might be split into hundreds of minuscule Tokens. How do we achieve 60-FPS butter-smooth typewriter effects without throttling?

  1. Data Slice Interception: WebSocket events flooding from backend encompass multiple typologies: chunk (Textography), tool_call (Tools executing image-text), think (Deep thinking trajectories).
  2. Incremental Referencing (Refs vs State): The system forbids indiscriminately stuffing every single chunk cleanly into React's useState, avoiding triggering hundreds and thousands of comprehensive global repaints. Instead, it maintains a live queue pertaining to the active message currently generating, administering localized state assemblages or downloading renders via Throttle mechanisms.
  3. Markdown Rich-Text Cleansing: Employs react-markdown executing ultimate filtration presentations. Imposing Copy buttons over code snippet blocks; demonstrating imagified placeholder renders mapped against localized hyperlinks.

[The End] Architecture Document Completion.

Clawith Architecture Document Engine Edition. This document currently engulfs all core logic within the system. Whether pivoting underlying engine circuits, appending new Database tables, or drafting novel outbound calling channel pipelines, please persistently harbor reverence toward the Workspace/Tenant isolation barriers alongside Relationship object strictly bound constraints.