Commit d97f86e (parent 9c1d02b), committed by abrichr and claude:

docs: Add evolved architecture diagram to README and architecture-evolution.md

- Three-phase pipeline: DEMONSTRATE, LEARN, EXECUTE with modernized terminology
- Demo-conditioned prompting as core innovation (show, don't tell)
- Policy/Grounding separation in EXECUTE phase
- Safety Gate as runtime layer with validation and risk assessment
- Multi-source data ingestion (human demos, synthetic data, benchmarks)
- Evaluation-driven feedback loops (success traces become training data)
- Abstraction Ladder visualization (Literal -> Symbolic -> Template -> Semantic -> Goal)
- Retrieval used in BOTH training AND evaluation for demo conditioning
- Solid lines = implemented, dashed = future

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

File tree: 2 files changed, +1475 -64 lines changed

README.md

Lines changed: 200 additions & 64 deletions
## How It Works

See the full [Architecture Evolution](docs/architecture-evolution.md) for detailed documentation.

### Three-Phase Pipeline

```mermaid
flowchart TB
    %% Data sources (multi-source ingestion)
    subgraph DataSources["Data Sources"]
        direction LR
        HUMAN["Human Demos"]
        SYNTH["Synthetic Data"]:::future
        BENCH_DATA["Benchmark Tasks"]
    end

    %% Phase 1: DEMONSTRATE (observation collection)
    subgraph Demonstrate["1. DEMONSTRATE (Observation Collection)"]
        direction TB
        CAP["Capture<br/>openadapt-capture"]
        PRIV["Privacy<br/>openadapt-privacy"]
        STORE[("Demo Library")]

        CAP --> PRIV
        PRIV --> STORE
    end

    %% Phase 2: LEARN (policy acquisition)
    subgraph Learn["2. LEARN (Policy Acquisition)"]
        direction TB
        subgraph RetrievalPath["Retrieval Path"]
            EMB["Embed"]
            IDX["Index"]
            SEARCH["Search"]
            EMB --> IDX --> SEARCH
        end

        subgraph TrainingPath["Training Path"]
            LOADER["Load"]
            TRAIN["Train"]
            CKPT[("Checkpoint")]
            LOADER --> TRAIN --> CKPT
        end

        subgraph ProcessMining["Process Mining"]
            ABSTRACT["Abstract"]:::future
            PATTERNS["Patterns"]:::future
            ABSTRACT --> PATTERNS
        end
    end

    %% Phase 3: EXECUTE (agent deployment)
    subgraph Execute["3. EXECUTE (Agent Deployment)"]
        direction TB
        subgraph AgentCore["Agent Core"]
            OBS["Observe"]
            POLICY["Policy<br/>(Demo-Conditioned)"]
            GROUND["Grounding<br/>openadapt-grounding"]
            ACT["Act"]

            OBS --> POLICY
            POLICY --> GROUND
            GROUND --> ACT
        end

        subgraph SafetyGate["Safety Gate"]
            VALIDATE["Validate"]
            CONFIRM["Confirm"]:::future
            VALIDATE --> CONFIRM
        end

        subgraph Evaluation["Evaluation"]
            EVALS["Evals<br/>openadapt-evals"]
            METRICS["Metrics"]
            EVALS --> METRICS
        end

        ACT --> VALIDATE
        VALIDATE --> EVALS
    end

    %% The Abstraction Ladder (side panel)
    subgraph AbstractionLadder["Abstraction Ladder"]
        direction TB
        L0["Literal<br/>(Raw Events)"]
        L1["Symbolic<br/>(Semantic Actions)"]
        L2["Template<br/>(Parameterized)"]
        L3["Semantic<br/>(Intent)"]:::future
        L4["Goal<br/>(Task Spec)"]:::future

        L0 --> L1
        L1 --> L2
        L2 -.-> L3
        L3 -.-> L4
    end

    %% Model layer
    subgraph Models["Model Layer (VLMs)"]
        direction LR
        CLAUDE["Claude"]
        GPT["GPT-4o"]
        GEMINI["Gemini"]
        QWEN["Qwen-VL"]
    end

    %% Data sources feed into phases
    HUMAN --> CAP
    SYNTH -.-> LOADER
    BENCH_DATA --> EVALS

    %% Demo library feeds learning
    STORE --> EMB
    STORE --> LOADER
    STORE -.-> ABSTRACT

    %% Learning outputs feed execution
    SEARCH -->|"demo context"| POLICY
    CKPT -->|"trained policy"| POLICY
    PATTERNS -.->|"templates"| POLICY

    %% Model connections
    POLICY --> Models
    GROUND --> Models

    %% Feedback loops (evaluation-driven)
    METRICS -->|"success traces"| STORE
    METRICS -.->|"training signal"| TRAIN

    %% Retrieval in BOTH training AND evaluation
    SEARCH -->|"eval conditioning"| EVALS

    %% Phase colors
    classDef phase1 fill:#3498DB,stroke:#1A5276,color:#fff
    classDef phase2 fill:#27AE60,stroke:#1E8449,color:#fff
    classDef phase3 fill:#9B59B6,stroke:#6C3483,color:#fff

    %% Component states
    classDef implemented fill:#2ECC71,stroke:#1E8449,color:#fff
    classDef future fill:#95A5A6,stroke:#707B7C,color:#fff,stroke-dasharray: 5 5
    classDef futureBlock fill:#f5f5f5,stroke:#95A5A6,stroke-dasharray: 5 5
    classDef safetyBlock fill:#E74C3C,stroke:#A93226,color:#fff

    %% Model layer
    classDef models fill:#F39C12,stroke:#B7950B,color:#fff

    %% Apply styles
    class CAP,PRIV,STORE phase1
    class EMB,IDX,SEARCH,LOADER,TRAIN,CKPT phase2
    class OBS,POLICY,GROUND,ACT,VALIDATE,EVALS,METRICS phase3
    class CLAUDE,GPT,GEMINI,QWEN models
    class L0,L1,L2 implemented
    class ProcessMining futureBlock
    class SafetyGate safetyBlock
```
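The evaluation-driven feedback loop in the diagram (METRICS feeding success traces back into the Demo Library) can be sketched in a few lines. This is a toy illustration under assumed names; `evaluate` and `demo_library` are hypothetical stand-ins, not the `openadapt-evals` API.

```python
# Toy sketch of the feedback loop: traces that pass evaluation
# are appended to the demo library as new training data.
demo_library = ["demo-A", "demo-B"]

def evaluate(trace: str) -> dict:
    """Pretend evaluation: traces containing 'ok' count as successes."""
    return {"trace": trace, "success": "ok" in trace}

results = [evaluate(t) for t in ["run-1 ok", "run-2 fail", "run-3 ok"]]

# Feedback loop: only success traces become demonstrations.
demo_library += [r["trace"] for r in results if r["success"]]
```

Over time the library grows only with verified behavior, which is why the diagram routes METRICS back to STORE rather than feeding every trace into training.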

### Core Innovation: Demo-Conditioned Prompting

OpenAdapt's key differentiator is **demonstration-conditioned automation** - "show, don't tell":

| Traditional Agent | OpenAdapt Agent |
|-------------------|-----------------|
| User writes prompts | User records a demonstration |
| Ambiguous instructions | Grounded in the actual UI |
| Requires prompt engineering | No technical expertise needed |
| Context-free | Context from similar demos |

**Retrieval powers BOTH training AND evaluation**: similar demonstrations are retrieved as context for the VLM, improving accuracy from 33% to 100% on first-action benchmarks.
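The retrieval step can be sketched as follows. This is a minimal illustration, not the `openadapt-retrieval` API: the embeddings, demo library, and function names are all hypothetical stand-ins for the real multimodal pipeline.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_demos(task_emb: list[float], library: list[dict], k: int = 2) -> list[dict]:
    """Return the k demos most similar to the current task."""
    ranked = sorted(library, key=lambda d: cosine(task_emb, d["emb"]), reverse=True)
    return ranked[:k]

def build_prompt(task: str, demos: list[dict]) -> str:
    """Condition the VLM on retrieved demonstrations ("show, don't tell")."""
    shots = "\n".join(f"Demo: {d['trace']}" for d in demos)
    return f"{shots}\nTask: {task}\nNext action:"

library = [
    {"trace": "click(Login); type(user)", "emb": [1.0, 0.1]},
    {"trace": "scroll(down); click(Save)", "emb": [0.0, 1.0]},
]
demos = retrieve_demos([0.9, 0.2], library, k=1)
prompt = build_prompt("log in as admin", demos)
```

The same retrieved context is used both when training the policy and when conditioning evaluation runs, which is what the diagram's two SEARCH edges represent.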

### Key Concepts

- **Policy/Grounding Separation**: The Policy decides *what* to do; Grounding determines *where* to do it
- **Safety Gate**: Runtime validation layer before action execution (confirm mode for high-risk actions)
- **Abstraction Ladder**: Progressive generalization from literal replay to goal-level automation
- **Evaluation-Driven Feedback**: Success traces become new training data

**Legend:** Solid = Implemented | Dashed = Future
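The Policy/Grounding split and the Safety Gate can be sketched together. All class and method names here are illustrative assumptions, not the openadapt API; a real grounder would use visual localization (SoM, OmniParser) rather than a lookup table.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    verb: str    # WHAT to do, e.g. "click"
    target: str  # semantic target, e.g. "Save button"

@dataclass
class GroundedAction:
    verb: str
    x: int
    y: int

class Policy:
    """Decides WHAT to do from an observation (screenshot elided here)."""
    def decide(self, observation: str) -> Intent:
        return Intent("click", "Save button")

class Grounder:
    """Decides WHERE: maps a semantic target to screen coordinates."""
    def locate(self, intent: Intent) -> GroundedAction:
        coords = {"Save button": (412, 630)}  # toy lookup table
        x, y = coords[intent.target]
        return GroundedAction(intent.verb, x, y)

HIGH_RISK = {"delete", "submit"}

def safety_gate(action: GroundedAction) -> str:
    """Validate before execution; high-risk verbs require confirmation."""
    return "needs_confirmation" if action.verb in HIGH_RISK else "approved"

intent = Policy().decide("screenshot-of-editor")
action = Grounder().locate(intent)
status = safety_gate(action)
```

Keeping the two concerns separate means the same policy can be reused across UIs whose layouts differ, with only the grounder needing per-environment knowledge.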

---

## Terminology (Aligned with GUI Agent Literature)

| Term | Description |
|------|-------------|
| **Observation** | What the agent perceives (screenshot, accessibility tree) |
| **Action** | What the agent does (click, type, scroll, etc.) |
| **Trajectory** | Sequence of observation-action pairs |
| **Demonstration** | Human-provided example trajectory |
| **Policy** | Decision-making component that maps observations to actions |
| **Grounding** | Mapping intent to specific UI elements (coordinates) |
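The terms above compose naturally as data structures. The field names below are assumptions for illustration, not the openadapt schema:

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    """What the agent perceives at one step."""
    screenshot_path: str
    a11y_tree: dict = field(default_factory=dict)  # accessibility tree

@dataclass
class Action:
    """What the agent does: click, type, scroll, ..."""
    kind: str
    args: dict = field(default_factory=dict)

@dataclass
class Trajectory:
    """A sequence of (Observation, Action) pairs."""
    steps: list

# A Demonstration is simply a human-provided Trajectory.
demo = Trajectory(steps=[
    (Observation("step0.png"), Action("click", {"x": 100, "y": 200})),
    (Observation("step1.png"), Action("type", {"text": "hello"})),
])
```

A Policy is then any function from `Observation` to `Action`, and Grounding resolves an `Action`'s semantic arguments into concrete coordinates.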

## Meta-Package Structure

OpenAdapt v1.0+ uses a **modular architecture** where the main `openadapt` package acts as a meta-package that coordinates focused sub-packages:

- **Core Packages**: Essential for the three-phase pipeline
  - `openadapt-capture` - DEMONSTRATE phase: collects observations and actions
  - `openadapt-ml` - LEARN phase: trains policies from demonstrations
  - `openadapt-evals` - EXECUTE phase: evaluates agents on benchmarks
187323

188324
- **Optional Packages**: Enhance specific workflow phases
189-
- `openadapt-privacy` - Integrates at **Record** phase for PII/PHI scrubbing
190-
- `openadapt-retrieval` - Integrates at **Train** phase for multimodal demo retrieval
191-
- `openadapt-grounding` - Integrates at **Deploy** phase for UI element localization
325+
- `openadapt-privacy` - DEMONSTRATE: PII/PHI scrubbing before storage
326+
- `openadapt-retrieval` - LEARN + EXECUTE: Demo conditioning for both training and evaluation
327+
- `openadapt-grounding` - EXECUTE: UI element localization (SoM, OmniParser)

- **Cross-Cutting**:
  - `openadapt-viewer` - Trajectory visualization at any phase

### Two Paths to Automation

1. **Custom Training Path**: Demonstrate -> Train policy -> Deploy agent
   - Best for: Repetitive tasks specific to your workflow
   - Requires: `openadapt[core]`

2. **API Agent Path**: Use pre-trained VLM APIs (Claude, GPT-4V, etc.) with demo conditioning
   - Best for: General-purpose automation, rapid prototyping
   - Requires: `openadapt[evals]`