Clawd Cursor

AI Desktop Agent — Universal Smart Pipeline
Works with any AI provider · Runs free with local models · Self-healing doctor

Website · Discord · Quick Start · How It Works · API · Changelog

What's New in v0.6.3

Universal Pipeline, Multi-App Workflows, Provider-Agnostic.

🧠 LLM-based task pre-processor — one cheap text LLM call decomposes any command into structured intent. No more brittle regex parsing.
📋 Multi-app workflows — copy from Wikipedia, paste in Notepad? Works. 6-checkpoint tracking ensures every step completes (select → copy → switch app → click → paste → verify).
⌨️ Site-specific shortcuts — Reddit (j/k/a/c), Twitter/X, YouTube, Gmail, GitHub, Slack + generic hints. Vision LLM uses keyboard instead of slow mouse clicks.
🌐 OS-level browser detection — reads Windows registry or macOS LaunchServices for actual default browser. No hardcoded Edge/Safari.
🔄 3 smart verification retries — on failure, builds step log digest + checkpoint status so the vision LLM fixes the exact missed step.
🔌 Mixed-provider pipelines — kimi for text + anthropic for Computer Use, with per-layer API key resolution from OpenClaw auth-profiles.
🔧 Global install fix — config discovery now checks package dir first, then cwd.
🏗️ Provider-agnostic internals — no hardcoded model names, no hardcoded app lists, universal checkpoint detection.

What's New in v0.6.1

Keyboard Shortcuts, Pipeline Fixes, Better URL Handling.

⌨️ Keyboard shortcuts registry — common actions (scroll, copy, reddit upvote) execute as direct keystrokes. Zero LLM calls, instant.
🔧 Pipeline gate fix — Action Router now always runs, even for browser-context tasks. Shortcuts work everywhere.
🌐 Smarter URL extraction — "open gmail and send email to foo@bar.com" correctly navigates to Gmail instead of bar.com.
🔄 CDP→UIDriver fallback — Smart Interaction falls back to accessibility tree when browser CDP fails.
🛑 Reliable force-stop — clawdcursor stop kills lingering processes.
📊 Provider label inference — startup logs show text/vision providers clearly.

What's New in v0.6.0

Universal Provider Support, OpenClaw Integration, Security Hardening.

🔗 OpenClaw integration — auto-discovers all configured providers from OpenClaw's config. No separate API key needed when running as a skill.
🌐 Universal provider support — Anthropic, OpenAI, Groq, Together AI, DeepSeek, Kimi, Ollama, or any OpenAI-compatible endpoint. Provider auto-detected from API key format.
🧠 Mixed provider pipelines — use Ollama for text (free) + cloud for vision (best quality). Doctor picks the optimal split automatically.
🔒 Security hardened — sensitive app policy (agents must ask before email/banking/messaging), safety tiers enforced, no credentials stored in skill files.
🔧 Auto-detection as default — no hardcoded models or providers. Doctor dynamically picks the best available setup.

v0.5.6 — Fluid Decomposition, Interactive Doctor, Smart Vision Fallback

🧠 Fluid task decomposition — LLM reasons about what ANY app needs instead of matching hardcoded patterns.
🩺 Interactive doctor — scans all providers, detects GPU/VRAM, lets you pick TEXT and VISION LLMs.
🖥️ Smart vision fallback — remaining subtasks bundled and handed to vision when cheap layers fail midway.

v0.5.2 — Web Dashboard + Browser Foreground Focus

🖥️ Web Dashboard — real-time logs, approve/reject safety confirmations, kill switch. Dark theme, zero dependencies.
🪟 Browser foreground focus — Playwright activates Chrome at OS level. No more invisible background tabs.
Multi-provider — 7+ providers supported out of the box
95% cheaper — simple tasks run for $0 with local models
Self-healing — if a model fails, the pipeline adapts automatically

Performance

Task	v0.4 (single provider)	v0.5+ (local, $0)	v0.5+ (cloud)
Calculator (255*38=)	43s	2.6s	20.1s
Notepad (type hello)	73s	2.0s	54.2s
File Explorer	53s	1.9s	22.1s
Gmail compose	162s (18 LLM calls)	—	21.7s (1 LLM call)

OpenClaw Integration

Clawd Cursor ships as an OpenClaw skill. Install it and any OpenClaw agent — yours or community-built — can control your desktop through natural language.

The SKILL.md teaches agents when and how to use Clawd Cursor: REST API for full desktop control, CDP direct for fast browser reads. Agents learn to be independent — no more asking you to screenshot or copy-paste things they can do themselves.

For orchestration best practices (how to avoid overlap and keep OpenClaw + Clawd Cursor efficient), see docs/OPENCLAW-INTEGRATION-RECOMMENDATIONS.md.

# Install as OpenClaw skill
openclaw skills install clawd-cursor

Quick Start

Windows

git clone https://github.com/AmrDab/clawd-cursor.git
cd clawd-cursor
npm install
npm run setup      # builds + registers 'clawdcursor' command globally

# Just install and start — auto-configures from OpenClaw or env vars
clawdcursor start

# Or specify any provider
clawdcursor start --base-url https://api.example.com/v1 --api-key KEY

# Fine-tune setup interactively (optional)
clawdcursor doctor

macOS

git clone https://github.com/AmrDab/clawd-cursor.git
cd clawd-cursor && npm install && npm run setup

# Grant Accessibility permissions to your terminal first!
# System Settings → Privacy & Security → Accessibility → Add Terminal/iTerm

# Make macOS scripts executable
chmod +x scripts/mac/*.sh scripts/mac/*.jxa

# Just start — auto-detects available providers
clawdcursor start

# Or specify any provider
clawdcursor start --base-url https://api.example.com/v1 --api-key KEY

Linux

git clone https://github.com/AmrDab/clawd-cursor.git
cd clawd-cursor && npm install && npm run setup

# Linux: browser control via CDP only (no native desktop automation)
# Just start — auto-detects available providers
clawdcursor start

# Or specify any provider
clawdcursor start --base-url https://api.example.com/v1 --api-key KEY

📖 See docs/MACOS-SETUP.md for the full macOS onboarding guide.

First run auto-configuration will:

Scan for AI providers from OpenClaw config, environment variables, and CLI flags
Quick-test discovered providers (5s timeout per provider)
Build the optimal pipeline automatically
Save config and start immediately

The optional doctor command provides interactive configuration:

Tests your screen capture and accessibility bridge
Scans all AI providers (Anthropic, OpenAI, Groq, Together, DeepSeek, Kimi, Ollama) and detects GPU/VRAM
Tests each model and shows you what works with latency
Lets you pick your TEXT LLM and VISION LLM (or accept the recommended defaults)
Shows setup instructions for any unconfigured cloud providers
Builds your optimal pipeline and saves it

Send a task:

clawdcursor task "Open Notepad and type hello world"

# Or via API:
curl http://localhost:3847/task -H "Content-Type: application/json" \
  -d '{"task": "Open Notepad and type hello world"}'

Note: npm run setup runs npm run build && npm link, which registers clawdcursor as a global command. If you prefer not to link globally, run npm run build instead and use npx clawdcursor or node dist/index.js to run commands.

Provider Quick Setup

Free (no API key needed):

# Just need Ollama running with any model
ollama pull <model>   # e.g. qwen2.5:7b, llama3.2, gemma2
clawdcursor doctor
clawdcursor start

Any cloud provider:

echo "AI_API_KEY=your-key-here" > .env
clawdcursor doctor
clawdcursor start

Doctor auto-detects your provider from the key format. Supported out of the box:

Provider	Key prefix	Vision	Computer Use
Anthropic	`sk-ant-`	✅	✅
OpenAI	`sk-`	✅	❌
Groq	`gsk_`	✅	❌
Together AI	—	✅	❌
DeepSeek	—	✅	❌
Kimi/Moonshot	`sk-` (long)	❌	❌
Any OpenAI-compatible	—	varies	❌

For providers without key prefix detection, specify explicitly:

clawdcursor doctor --provider together --api-key YOUR_KEY

OpenClaw users: No setup needed — Clawd Cursor auto-discovers all your configured providers.

Compatibility (v0.6.0 Audit)

OS	Status	Notes
Windows 10/11	✅ Full support	Native desktop automation via PowerShell + UI Automation scripts.
macOS 13+	✅ Full support	Native desktop automation via JXA/System Events scripts.
Linux	⚠️ Partial support	Browser/CDP flows work. Native desktop automation requires X11 native libs (for `@nut-tree-fork/nut-js`) and may still vary by distro/desktop environment.

Linux prerequisites for native automation (Debian/Ubuntu example):

sudo apt-get update
sudo apt-get install -y libxtst6 libx11-xcb1 libxcomposite1 libxdamage1 libxfixes3 libxi6 libxrandr2 libxtst-dev

If these libraries are missing, clawdcursor doctor can fail on startup with errors like libXtst.so.6: cannot open shared object file.

How It Works

The Smart Pipeline

Every task is pre-processed by a cheap text LLM, then flows through up to 5 layers. Each layer is cheaper and faster than the next. Most tasks never reach Layer 3.

┌─────────────────────────────────────────────────────┐
│  Pre-processor: LLM Task Decomposition (1 text call) │
│  Parses any natural language → {app, navigate, task,  │
│  contextHints}. Opens app + navigates URL before      │
│  pipeline starts. Detects multi-app workflows.        │
├─────────────────────────────────────────────────────┤
│  Layer 0: Browser (Playwright — free, instant)       │
│  Direct browser control via CDP. page.goto(),        │
│  brings Chrome to foreground. Zero vision tokens.     │
├─────────────────────────────────────────────────────┤
│  Layer 1: Action Router + Shortcuts (instant, free)  │
│  Regex + UI Automation. "Open X", "type Y", "click Z"│
│  Includes keyboard shortcuts registry — common       │
│  actions like scroll, copy, undo, reddit upvote      │
│  execute as direct keystrokes. Zero LLM calls.       │
├─────────────────────────────────────────────────────┤
│  Layer 1.5: Smart Interaction (1 LLM call)           │
│  CDPDriver (browser) or UIDriver (desktop apps).     │
│  LLM plans steps → executes via selectors/a11y.      │
├─────────────────────────────────────────────────────┤
│  Layer 2: Accessibility Reasoner (fast, cheap/free)   │
│  Reads the accessibility tree, sends to cheap LLM     │
│  (Haiku, Qwen, GPT-4o-mini). No screenshots needed   │
├─────────────────────────────────────────────────────┤
│  Layer 3: Computer Use / Vision (powerful, expensive) │
│  Full screenshot → vision LLM with site-specific      │
│  shortcuts + scroll guidance + multi-app workflows.   │
│  3 smart verification retries with step log analysis. │
└─────────────────────────────────────────────────────┘

The doctor decides which layers are available based on your setup. No API key? Layers 0-2 with Ollama. Anthropic key? All layers with Computer Use.

Keyboard Shortcuts (Layer 1)

Clawd Cursor ships with a keyboard shortcuts registry. Common actions execute as direct keystrokes — no LLM calls, no screenshots, instant.

Category	Examples
Navigation	scroll up/down, page up/down, go back/forward
Editing	copy, paste, undo, redo, select all
Browser	new tab, close tab, refresh, find
Social	reddit upvote/downvote, next/prev post
System	minimize, maximize, switch window

Custom shortcuts can be added to src/shortcuts.ts. The action router uses fuzzy matching — "scroll the page down" maps to the scroll-down shortcut automatically.

Provider-Specific Behavior

Provider	Layer 1	Layer 2 (text)	Layer 3 (vision)	Computer Use
Anthropic	✅	Haiku	Sonnet	✅ Native
OpenAI	✅	GPT-4o-mini	GPT-4o	❌
Groq	✅	Llama 3.3 70B	Llama 3.2 90B Vision	❌
Together AI	✅	Llama 3.1 70B	Llama 3.2 90B Vision	❌
DeepSeek	✅	DeepSeek Chat	DeepSeek Chat	❌
Kimi	✅	Moonshot-8k	Moonshot-8k	❌
Ollama	✅	Auto-detected	Auto-detected	❌
No key	✅	❌	❌	❌

Mixed providers: Doctor can configure Ollama for text (free) + a cloud provider for vision (best quality). The pipeline picks the cheapest option for each layer automatically.

Self-Healing

The pipeline adapts at runtime:

Model fails? → Circuit breaker trips, falls to next layer
API rate limited? → Exponential backoff + automatic retry
Doctor detects issues? → Falls back to available alternatives (e.g., cloud model unavailable → local Ollama)

Doctor

npm run doctor

🩺 Clawd Cursor Doctor - diagnosing your setup...

📸 Screen capture...
   ✅ 2560x1440, 110ms
♿ Accessibility bridge...
   ✅ 20 windows detected, 822ms

🔍 Scanning providers...
   Anthropic:           ✅ key found (sk-ant-a...)
   OpenAI:              ❌ no key
   Groq:                ❌ no key
   Together AI:         ❌ no key
   DeepSeek:            ❌ no key
   Kimi (Moonshot):     ❌ no key
   Ollama (Local):      ✅ running (qwen2.5:7b, llama3.2)

   💡 Cloud providers not configured (add API keys to unlock):
      OpenAI: set OPENAI_API_KEY — https://platform.openai.com
      Groq: set GROQ_API_KEY — https://console.groq.com
      Together AI: set TOGETHER_API_KEY — https://api.together.xyz

   Testing models...
   Text:   claude-haiku-4-5 (Anthropic) ✅ 498ms
   Vision: claude-sonnet-4 (Anthropic) ✅ 1217ms
   Text:   qwen2.5:7b (Ollama) ✅ 4117ms

🎮 GPU detected: NVIDIA GeForce RTX 3080 (10240 MB VRAM)

🧩 Choose your pipeline models (press Enter for recommended).
   TEXT LLM (Layer 2):
   1. claude-haiku-4-5 (Anthropic, 498ms)
   2. qwen2.5:7b (Ollama, 4117ms) ★ recommended
   Pick 1-2 (Enter=2):

   VISION LLM (Layer 3):
   1. claude-sonnet-4 (Anthropic, 1217ms) ★ recommended
   Pick 1 (Enter=1):

🧠 Selected pipeline:
   Layer 1: Action Router (offline) ✅
   Layer 2: qwen2.5:7b via Ollama ✅
   Layer 3: claude-sonnet-4 via Anthropic ✅
   🖥️  Computer Use API: enabled

💾 Config saved to .clawd-config.json

Options:

--provider <name>   Force a provider (anthropic|openai|ollama|kimi)
--api-key <key>     Override API key
--no-save           Don't save config to disk

API Endpoints

http://localhost:3847

Endpoint	Method	Description
`/`	GET	Web dashboard UI
`/task`	POST	Execute a task: `{"task": "Open Chrome"}`
`/status`	GET	Agent state and current task
`/logs`	GET	Last 200 log entries (JSON array)
`/confirm`	POST	Approve/reject pending action
`/abort`	POST	Stop the current task
`/stop`	POST	Graceful server shutdown
`/health`	GET	Server health + version

Architecture

┌───────────────────────────────────────────────────┐
│           Your Desktop (Native Control)            │
│     @nut-tree-fork/nut-js · Playwright · OS-level  │
└──────────────────────┬────────────────────────────┘
                       │
┌──────────────────────┴────────────────────────────┐
│              Clawd Cursor Agent                    │
│                                                    │
│  ┌────────┐ ┌────────┐ ┌───────┐ ┌─────┐ ┌─────┐│
│  │Layer 0 │ │Layer 1 │ │L 1.5  │ │ L2  │ │ L3  ││
│  │Browser │→│Action  │→│Smart  │→│A11y │→│Vision││
│  │Playwrt │ │Router+ │ │Interac│ │Tree │ │+CU   ││
│  │(free)  │ │Shortct │ │(1 LLM)│ │(cheap│ │(full)││
│  └────────┘ └────────┘ └───────┘ └─────┘ └─────┘│
│       ↑                                            │
│  ┌──────────┐  ┌────────────────┐                 │
│  │ Doctor   │  │ Web Dashboard  │                 │
│  │ Auto-cfg │  │ localhost:3847 │                 │
│  └──────────┘  └────────────────┘                 │
│                                                    │
│  Safety Layer · REST API · Circuit Breaker         │
└────────────────────────────────────────────────────┘

Safety Tiers

Tier	Actions	Behavior
🟢 Auto	Navigation, reading, opening apps	Runs immediately
🟡 Preview	Typing, form filling	Logs before executing
🔴 Confirm	Sending messages, deleting, purchases	Pauses for approval

CLI Options

clawdcursor start        Start the agent
clawdcursor doctor       Diagnose and auto-configure
clawdcursor task <t>     Send a task to running agent
clawdcursor dashboard    Open the web dashboard in your browser
clawdcursor kill         Stop the running server
clawdcursor stop         Stop the running server

Options:
  --port <port>          API port (default: 3847)
  --provider <provider>  Auto-detected, or: anthropic|openai|ollama|groq|together|deepseek|kimi|...
  --model <model>        Override vision model
  --api-key <key>        AI provider API key
  --debug                Save screenshots to debug/ folder

Platform Support

Platform	UI Automation	Browser (CDP)	Status
Windows	PowerShell + .NET UI Automation	✅ Chrome/Edge	✅ Full support
macOS	JXA + System Events (Accessibility API)	✅ Chrome/Edge	✅ Full support
Linux	—	✅ Chrome/Edge (CDP only)	🔶 Browser only

Platform Notes

Windows: Uses powershell.exe + .NET UIAutomationClient for native app interaction. Shell chaining: cd dir; npm start
macOS: Uses osascript + JXA (JavaScript for Automation) + System Events. Requires Accessibility permissions. Shell chaining: cd dir && npm start. See docs/MACOS-SETUP.md.
Both: CDPDriver (browser automation) works identically — connects via WebSocket to localhost:9222.

Browser CDP Setup

# Windows (PowerShell)
Start-Process chrome --ArgumentList "--remote-debugging-port=9222"

# macOS (Bash)
open -a "Google Chrome" --args --remote-debugging-port=9222

# Edge on macOS
open -a "Microsoft Edge" --args --remote-debugging-port=9222

Prerequisites

Node.js 18+ (20+ recommended)
Windows: PowerShell (included with Windows)
macOS 13+: osascript (included), Accessibility permissions granted
AI API Key - optional. Works offline with Ollama or Action Router only.

Tech Stack

TypeScript · Node.js · @nut-tree-fork/nut-js · sharp · Express · Any OpenAI-compatible API · Anthropic Computer Use · Windows UI Automation · macOS Accessibility (JXA) · Ollama

License

MIT

clawdcursor.com

Name		Name	Last commit message	Last commit date
Latest commit History 196 Commits
docs		docs
perf		perf
scripts		scripts
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
AI-SHORTCUTS.md		AI-SHORTCUTS.md
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md
eslint.config.js		eslint.config.js
package-lock.json		package-lock.json
package.json		package.json
shortcut-demo.md		shortcut-demo.md
test-shortcuts.js		test-shortcuts.js
tsconfig.json		tsconfig.json

License

AmrDab/clawd-cursor

Folders and files

Latest commit

History

Repository files navigation

Clawd Cursor

What's New in v0.6.3

What's New in v0.6.1

What's New in v0.6.0

v0.5.6 — Fluid Decomposition, Interactive Doctor, Smart Vision Fallback

v0.5.2 — Web Dashboard + Browser Foreground Focus

Performance

OpenClaw Integration

Quick Start

Windows

macOS

Linux

Provider Quick Setup

Compatibility (v0.6.0 Audit)

How It Works

The Smart Pipeline

Keyboard Shortcuts (Layer 1)

Provider-Specific Behavior

Self-Healing

Doctor

API Endpoints

Architecture

Safety Tiers

CLI Options

Platform Support

Platform Notes

Browser CDP Setup

Prerequisites

Tech Stack

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages