OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.
Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.
Join us on Discord | Documentation | OpenAdapt.ai
OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:
| Package | Description | Repository |
|---|---|---|
| `openadapt` | Meta-package with unified CLI | This repo |
| `openadapt-capture` | Event recording and storage | openadapt-capture |
| `openadapt-ml` | ML engine, training, inference | openadapt-ml |
| `openadapt-evals` | Benchmark evaluation | openadapt-evals |
| `openadapt-viewer` | HTML visualization | openadapt-viewer |
| `openadapt-grounding` | UI element localization | openadapt-grounding |
| `openadapt-retrieval` | Multimodal demo retrieval | openadapt-retrieval |
| `openadapt-privacy` | PII/PHI scrubbing | openadapt-privacy |
| `openadapt-wright` | Dev automation | openadapt-wright |
| `openadapt-herald` | Social media from git history | openadapt-herald |
| `openadapt-crier` | Telegram approval bot | openadapt-crier |
| `openadapt-consilium` | Multi-model consensus | openadapt-consilium |
| `openadapt-tray` | System tray app | openadapt-tray |
| `openadapt-agent` | Production execution engine | openadapt-agent |
| `openadapt-telemetry` | Error tracking | openadapt-telemetry |
Install what you need:

```bash
pip install openadapt            # Minimal CLI only
pip install openadapt[capture]   # GUI capture/recording
pip install openadapt[ml]        # ML training and inference
pip install openadapt[evals]     # Benchmark evaluation
pip install openadapt[privacy]   # PII/PHI scrubbing
pip install openadapt[all]       # Everything
```

Requirements: Python 3.10+
```bash
# 1. Record a demonstration
openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop

# 2. Train a model on the capture
openadapt train start --capture my-task --model qwen3vl-2b

# 3. Evaluate the trained model
openadapt eval run --checkpoint training_output/model.pt --benchmark waa

# 4. View the capture
openadapt capture view my-task
```
| Command | Description |
|---|---|
| `openadapt capture start --name <name>` | Start recording |
| `openadapt capture stop` | Stop recording |
| `openadapt capture list` | List captures |
| `openadapt capture view <name>` | Open capture viewer |
| `openadapt train start --capture <name>` | Train model on capture |
| `openadapt train status` | Check training progress |
| `openadapt train stop` | Stop training |
| `openadapt eval run --checkpoint <path>` | Evaluate trained model |
| `openadapt eval run --agent api-claude` | Evaluate API agent |
| `openadapt eval mock --tasks 10` | Run mock evaluation |
| `openadapt serve --port 8080` | Start dashboard server |
| `openadapt version` | Show installed versions |
| `openadapt doctor` | Check system requirements |
See the full Architecture Evolution for detailed documentation.
OpenAdapt follows a streamlined Demonstrate → Learn → Execute pipeline:
1. DEMONSTRATE (Observation Collection)
   - Capture: Record user actions and screenshots with `openadapt-capture`
   - Privacy: Scrub PII/PHI from recordings with `openadapt-privacy`
   - Store: Build a searchable demonstration library
2. LEARN (Policy Acquisition)
   - Retrieval Path: Embed demonstrations, index them, and enable semantic search
   - Training Path: Load demonstrations and fine-tune Vision-Language Models (VLMs)
   - Abstraction: Progress from literal replay to template-based automation
3. EXECUTE (Agent Deployment)
   - Observe: Take screenshots and gather accessibility information
   - Policy: Use demonstration context to decide actions via VLMs (Claude, GPT-4o, Qwen3-VL)
   - Ground: Map intentions to specific UI coordinates with `openadapt-grounding`
   - Act: Execute validated actions with safety gates
   - Evaluate: Measure success with `openadapt-evals` and feed results back for improvement
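The EXECUTE stage can be sketched in a few lines of Python. This is a minimal illustration of the observe → policy → ground → act flow, not the openadapt API; all function names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", ...
    target: str        # intent, e.g. "Submit button"
    x: int = 0         # grounded screen coordinates
    y: int = 0

def policy(observation: str, demos: list[str]) -> Action:
    # Decide WHAT to do from the observation plus retrieved demos.
    # (A real policy would query a VLM; here we return a fixed intent.)
    return Action(kind="click", target="Submit button")

def ground(action: Action, ui_elements: dict[str, tuple[int, int]]) -> Action:
    # Decide WHERE: map the intent to concrete UI coordinates.
    action.x, action.y = ui_elements[action.target]
    return action

def safety_gate(action: Action, high_risk: set[str]) -> bool:
    # Runtime validation before execution; high-risk kinds are blocked
    # (or, in confirm mode, would require user approval).
    return action.kind not in high_risk

ui = {"Submit button": (640, 480)}
act = ground(policy("screenshot.png", demos=[]), ui)
if safety_gate(act, high_risk={"delete"}):
    print(f"execute {act.kind} at ({act.x}, {act.y})")  # prints "execute click at (640, 480)"
```

The key design point this illustrates is the policy/grounding separation: the policy never emits raw coordinates, so it can generalize across layouts while grounding stays layout-specific.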
OpenAdapt explores demonstration-conditioned automation - "show, don't tell":
| Traditional Agent | OpenAdapt Agent |
|---|---|
| User writes prompts | User records demonstration |
| Ambiguous instructions | Grounded in actual UI |
| Requires prompt engineering | Reduced prompt engineering |
| Context-free | Context from similar demos |
Retrieval powers BOTH training AND evaluation: Similar demonstrations are retrieved as context for the VLM. In early experiments on a controlled macOS benchmark, this improved first-action accuracy from 46.7% to 100% - though all 45 tasks in that benchmark share the same navigation entry point. See the publication roadmap for methodology and limitations.
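A toy sketch of what demonstration retrieval does: embed the current task, rank stored demonstrations by similarity, and pass the top matches to the VLM as context. The embeddings and library below are made up for illustration; real embeddings would come from a multimodal encoder:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy demo library: name -> embedding (illustrative values only).
library = {
    "open-settings": [0.9, 0.1, 0.0],
    "export-report": [0.1, 0.9, 0.1],
    "rename-file":   [0.0, 0.2, 0.95],
}

def retrieve(query_embedding: list[float], k: int = 2) -> list[str]:
    # Return the k most similar demonstrations to use as VLM context.
    ranked = sorted(library, key=lambda name: cosine(library[name], query_embedding),
                    reverse=True)
    return ranked[:k]

print(retrieve([1.0, 0.0, 0.0]))  # → ['open-settings', 'export-report']
```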
- Policy/Grounding Separation: The Policy decides what to do; Grounding determines where to do it
- Safety Gate: Runtime validation layer before action execution (confirm mode for high-risk actions)
- Abstraction Ladder: Progressive generalization from literal replay to goal-level automation
- Evaluation-Driven Feedback: Success traces become new training data
Legend: Solid = Implemented | Dashed = Future
| Term | Description |
|---|---|
| Observation | What the agent perceives (screenshot, accessibility tree) |
| Action | What the agent does (click, type, scroll, etc.) |
| Trajectory | Sequence of observation-action pairs |
| Demonstration | Human-provided example trajectory |
| Policy | Decision-making component that maps observations to actions |
| Grounding | Mapping intent to specific UI elements (coordinates) |
- https://twitter.com/abrichr/status/1784307190062342237
- https://www.loom.com/share/9d77eb7028f34f7f87c6661fb758d1c0
macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See permissions guide.
Windows: Run as Administrator if needed for input capture.
The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.
To use the legacy version:

```bash
pip install openadapt==0.46.0
```

See docs/LEGACY_FREEZE.md for the migration guide and details.
- Join Discord
- Pick an issue from the relevant sub-package repository
- Submit a PR
For sub-package development:
```bash
git clone https://github.com/OpenAdaptAI/openadapt-ml  # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"
```

- OpenAdaptAI/SoM - Set-of-Mark prompting
- OpenAdaptAI/pynput - Input monitoring fork
- OpenAdaptAI/atomacos - macOS accessibility
- Discord: https://discord.gg/yF527cQbDG
- Issues: Use the relevant sub-package repository
- Architecture docs: GitHub Wiki
MIT License - see LICENSE for details.