class OBS,POLICY,GROUND,ACT,VALIDATE,EVALS,METRICS phase3
class CLAUDE,GPT,GEMINI,QWEN models
class L0,L1,L2 implemented
```
### Core Innovation: Demo-Conditioned Prompting
OpenAdapt's key differentiator is **demonstration-conditioned automation** - "show, don't tell":
| Traditional Agent | OpenAdapt Agent |
|-------------------|-----------------|
| User writes prompts | User records demonstration |
| Ambiguous instructions | Grounded in actual UI |
| Requires prompt engineering | No technical expertise needed |
| Context-free | Context from similar demos |
**Retrieval powers BOTH training AND evaluation**: Similar demonstrations are retrieved as context for the VLM, improving accuracy from 33% to 100% on first-action benchmarks.
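The retrieval step can be sketched minimally. Everything below is illustrative: `Demo`, `build_prompt`, and the word-overlap similarity are stand-ins for `openadapt-retrieval`'s multimodal embedding search, not OpenAdapt's actual API.

```python
# Hypothetical sketch of demo-conditioned prompting: retrieve the most
# similar recorded demonstration and prepend it to the VLM prompt.
from dataclasses import dataclass


@dataclass
class Demo:
    task: str
    steps: list[str]


def similarity(a: str, b: str) -> float:
    """Toy word-overlap similarity; a stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)


def build_prompt(task: str, demos: list[Demo]) -> str:
    """Condition the prompt on the most similar stored demonstration."""
    best = max(demos, key=lambda d: similarity(task, d.task))
    example = "\n".join(f"  {i + 1}. {s}" for i, s in enumerate(best.steps))
    return (
        f"Similar demonstration ({best.task}):\n{example}\n\n"
        f"Now perform: {task}"
    )


demos = [
    Demo("export invoice to PDF", ["click File", "click Export", "choose PDF"]),
    Demo("rename a layer", ["double-click layer", "type name", "press Enter"]),
]
prompt = build_prompt("export report to PDF", demos)
print(prompt.splitlines()[0])  # Similar demonstration (export invoice to PDF):
```

The point of the pattern: the VLM sees a concrete, grounded trajectory for a similar task instead of a bare instruction, which is what drives the first-action accuracy gain described above.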
### Key Concepts
- **Policy/Grounding Separation**: The Policy decides *what* to do; Grounding determines *where* to do it
- **Safety Gate**: Runtime validation layer before action execution (confirm mode for high-risk actions)
- **Abstraction Ladder**: Progressive generalization from literal replay to goal-level automation
- **Evaluation-Driven Feedback**: Success traces become new training data
**Legend:** Solid = Implemented | Dashed = Future
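A minimal sketch of the Policy/Grounding separation, using hypothetical names (`policy`, `ground`) rather than OpenAdapt's real interfaces: the policy emits an abstract intent, and grounding resolves it to coordinates.

```python
# Illustrative only: the policy decides WHAT (an abstract intent);
# grounding decides WHERE (concrete screen coordinates).
from dataclasses import dataclass


@dataclass
class Intent:
    verb: str    # e.g. "click", "type"
    target: str  # UI element described in natural language


def policy(observation: str) -> Intent:
    """Stub policy: map the current observation to an intent."""
    if "unsaved changes" in observation:
        return Intent(verb="click", target="Save button")
    return Intent(verb="click", target="Close button")


def ground(intent: Intent, ui_elements: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Stub grounding: in practice this is SoM / OmniParser-style
    element detection, not a dictionary lookup."""
    return ui_elements[intent.target]


ui = {"Save button": (120, 48), "Close button": (640, 12)}
intent = policy("document has unsaved changes")
xy = ground(intent, ui)
print(intent.verb, xy)  # click (120, 48)
```

Keeping the two stages separate means the policy can be swapped (fine-tuned model vs. VLM API) without retraining the grounding component, and vice versa.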
---
## Terminology (Aligned with GUI Agent Literature)
| Term | Description |
|------|-------------|
| **Observation** | What the agent perceives (screenshot, accessibility tree) |
| **Action** | What the agent does (click, type, scroll, etc.) |
| **Trajectory** | Sequence of observation-action pairs |
| **Demonstration** | Human-provided example trajectory |
| **Policy** | Decision-making component that maps observations to actions |
| **Grounding** | Mapping intent to specific UI elements (coordinates) |
## Meta-Package Structure
OpenAdapt v1.0+ uses a **modular architecture** where the main `openadapt` package acts as a meta-package that coordinates focused sub-packages:
- **Core Packages**: Essential for the three-phase pipeline
  - `openadapt-capture` - DEMONSTRATE phase: Collects observations and actions
  - `openadapt-ml` - LEARN phase: Trains policies from demonstrations
  - `openadapt-evals` - EXECUTE phase: Evaluates agents on benchmarks

- **Optional Packages**: Enhance specific workflow phases
  - `openadapt-privacy` - DEMONSTRATE: PII/PHI scrubbing before storage
  - `openadapt-retrieval` - LEARN + EXECUTE: Demo conditioning for both training and evaluation
  - `openadapt-grounding` - EXECUTE: UI element localization (SoM, OmniParser)

- **Cross-Cutting**:
  - `openadapt-viewer` - Trajectory visualization at any phase
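One common way a meta-package coordinates optional sub-packages is lazy importing, so an extra only fails at the point of use. A hedged sketch; the `load_phase` helper is hypothetical, not OpenAdapt code:

```python
# Illustrative meta-package pattern: import sub-packages on demand and
# fail with an actionable message when an optional extra is missing.
import importlib
from types import ModuleType


def load_phase(module_name: str) -> ModuleType:
    """Import a sub-package lazily, e.g. 'openadapt_capture'."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        pip_name = module_name.replace("_", "-")
        raise ImportError(
            f"Missing optional package {pip_name!r}; "
            f"install it with: pip install {pip_name}"
        ) from exc


# Using a stdlib module as a stand-in for a sub-package:
mod = load_phase("json")
print(mod.dumps({"loaded": True}))  # {"loaded": true}
```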
### Two Paths to Automation
1. **Custom Training Path**: Demonstrate -> Train policy -> Deploy agent
   - Best for: Repetitive tasks specific to your workflow
   - Requires: `openadapt[core]`
2. **API Agent Path**: Use pre-trained VLM APIs (Claude, GPT-4V, etc.) with demo conditioning
   - Best for: General-purpose automation, rapid prototyping