Merged
79 changes: 58 additions & 21 deletions skills/opencli-operate/SKILL.md
@@ -4,7 +4,7 @@ description: Make websites accessible for AI agents. Navigate, click, type, extr
allowed-tools: Bash(opencli:*), Read, Edit, Write
---

# OpenCLI — Make Websites Accessible for AI Agents
# OpenCLI Operate — Browser Automation for AI Agents

Control Chrome step-by-step via CLI. Reuses existing login sessions — no passwords needed.

@@ -16,25 +16,49 @@ opencli doctor # Verify extension + daemon connectivity

Requires: Chrome running + OpenCLI Browser Bridge extension installed.

## Quickstart for AI Agents (1 step)
## Critical Rules

Point your AI agent to this file. It contains everything needed to operate browsers.
1. **ALWAYS use `state` to inspect the page, NEVER use `screenshot`** — `state` returns structured DOM with `[N]` element indices, is instant and costs zero tokens. `screenshot` requires vision processing and is slow. Only use `screenshot` when the user explicitly asks to save a visual.
2. **ALWAYS use `click`/`type`/`select` for interaction, NEVER use `eval` to click or type** — `eval "el.click()"` bypasses scrollIntoView and CDP click pipeline, causing failures on off-screen elements. Use `state` to find the `[N]` index, then `click <N>`.
3. **Verify inputs with `get value`, not screenshots** — after `type`, run `get value <index>` to confirm.
4. **Run `state` after every page change** — after `open`, `click` (on links), `scroll`, always run `state` to see the new elements and their indices. Never guess indices.
5. **Chain safe commands with `&&`** — `type 3 "a" && type 4 "b" && click 7` is one call instead of three. But always run `state` first to get correct indices before chaining.
6. **`eval` is read-only** — use `eval` ONLY for data extraction (`JSON.stringify(...)`), never for clicking, typing, or navigating. Always wrap in IIFE to avoid variable conflicts: `eval "(function(){ const x = ...; return JSON.stringify(x); })()"`.
7. **Prefer `network` to discover APIs** — most sites have JSON APIs. API-based adapters are more reliable than DOM scraping.

## Quickstart for Humans (3 steps)
## Command Cost Guide

| Cost | Commands | When to use |
|------|----------|-------------|
| **Free & instant** | `state`, `get *`, `eval`, `network`, `scroll`, `keys` | Default — use these |
| **Free but changes page** | `open`, `click`, `type`, `select`, `back` | Interaction — run `state` after |
| **Expensive (vision tokens)** | `screenshot` | ONLY when user needs a saved image |

## Action Chaining Rules

Commands can be chained with `&&`. The browser persists via daemon, so chaining is safe.

**Safe to chain** — these don't change the page structure:
```bash
npm install -g @jackwener/opencli # 1. Install
# Install extension from chrome://extensions # 2. Load extension
opencli operate open https://example.com # 3. Go!
# Fill multiple fields then submit
opencli operate type 3 "hello" && opencli operate type 4 "world" && opencli operate click 7

# Open and inspect
opencli operate open https://example.com && opencli operate state
```

**Page-changing — always put last** in a chain (subsequent commands see stale indices):
- `open <url>`, `back`, `click <link/button that navigates>`

**Rule**: Chain when you already know the indices. Run `state` separately when you need to discover indices first.
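
The chaining rule above can be sketched as a small command builder. This is an illustration only: `Step`, `shellQuote`, and `buildChain` are hypothetical names and not part of OpenCLI. The sketch assumes that only `open` and `back` always change the page (a `click` navigates only sometimes), and it rejects index-based commands that follow them, since those would act on stale indices.

```typescript
// Hypothetical helper, not part of OpenCLI: builds an `&&` chain and rejects
// index-based commands that come after a page-changing one.
type Step = { cmd: string; args?: string[] };

// Assumption: these two always replace the page; `click` is left out because
// it only navigates for some elements.
const PAGE_CHANGING = new Set(["open", "back"]);
const INDEX_BASED = new Set(["click", "type", "select"]);

// POSIX single-quote escaping so arbitrary text survives the shell.
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

function buildChain(steps: Step[]): string {
  let pageChanged = false;
  for (const step of steps) {
    if (pageChanged && INDEX_BASED.has(step.cmd)) {
      throw new Error(`"${step.cmd}" would use stale indices; run state separately first`);
    }
    if (PAGE_CHANGING.has(step.cmd)) pageChanged = true;
  }
  return steps
    .map((s) => ["opencli", "operate", s.cmd, ...(s.args ?? []).map(shellQuote)].join(" "))
    .join(" && ");
}
```

With this reading, `open` followed by `state` is accepted because `state` takes no index, while `open` followed by `click 7` is rejected.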

## Core Workflow

1. **Navigate**: `opencli operate open <url>`
2. **Inspect**: `opencli operate state` → see elements with `[N]` indices
2. **Inspect**: `opencli operate state` → elements with `[N]` indices
3. **Interact**: use indices — `click`, `type`, `select`, `keys`
4. **Wait**: `opencli operate wait selector ".loaded"` or `wait text "Success"`
5. **Verify**: `opencli operate get title` or `opencli operate screenshot`
4. **Wait** (if needed): `opencli operate wait selector ".loaded"` or `wait text "Success"`
5. **Verify**: `opencli operate state` or `opencli operate get value <N>`
6. **Repeat**: browser stays open between commands
7. **Save**: write a TS adapter to `~/.opencli/clis/<site>/<command>.ts`
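
The interact-then-verify steps of the workflow above could be wrapped as follows. This is a minimal sketch under assumptions: `Run` and `typeAndVerify` are hypothetical names, and the exact stdout format of `get value` is not shown in this diff, so the comparison assumes the command prints the raw field value.

```typescript
// Hypothetical runner: takes CLI arguments (after `opencli`) and returns stdout.
type Run = (args: string[]) => string;

// Types into element [index], then reads the value back instead of taking a
// screenshot. Assumes `get value` prints the raw value (not confirmed here).
function typeAndVerify(run: Run, index: number, text: string): boolean {
  run(["operate", "type", String(index), text]);
  const actual = run(["operate", "get", "value", String(index)]).trim();
  return actual === text;
}
```

In a real script the runner could be backed by Node's `execFileSync("opencli", args, { encoding: "utf8" })`; injecting it as a parameter keeps the sketch testable without a browser.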

@@ -43,26 +67,26 @@ opencli operate open https://example.com # 3. Go!
### Navigation

```bash
opencli operate open <url> # Open URL
opencli operate back # Go back
opencli operate open <url> # Open URL (page-changing)
opencli operate back # Go back (page-changing)
opencli operate scroll down # Scroll (up/down, --amount N)
opencli operate scroll up --amount 1000
```

### Inspect
### Inspect (free & instant)

```bash
opencli operate state # Elements with [N] indices
opencli operate screenshot [path.png] # Screenshot
opencli operate state # Structured DOM with [N] indices — PRIMARY tool
opencli operate screenshot [path.png] # Save visual to file — ONLY for user deliverables
```

### Get (structured data)
### Get (free & instant)

```bash
opencli operate get title # Page title
opencli operate get url # Current URL
opencli operate get text <index> # Element text content
opencli operate get value <index> # Input/textarea value
opencli operate get value <index> # Input/textarea value (use to verify after type)
opencli operate get html # Full page HTML
opencli operate get html --selector "h1" # Scoped HTML
opencli operate get attributes <index> # Element attributes
@@ -86,11 +110,16 @@ opencli operate wait text "Success" # Wait for text
opencli operate wait time 3 # Wait N seconds
```

### Extract
### Extract (free & instant, read-only)

Use `eval` ONLY for reading data. Never use it to click, type, or navigate.

```bash
opencli operate eval "document.title"
opencli operate eval "JSON.stringify([...document.querySelectorAll('h2')].map(e => e.textContent))"

# IMPORTANT: wrap complex logic in IIFE to avoid "already declared" errors
opencli operate eval "(function(){ const items = [...document.querySelectorAll('.item')]; return JSON.stringify(items.map(e => e.textContent)); })()"
```
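
The IIFE advice can be illustrated with a small wrapper. `wrapInIife` is a hypothetical helper, not part of OpenCLI; it only shows why the wrapping helps: each evaluation gets its own function scope, so a `const` declared in one call does not collide with the same name in the next call.

```typescript
// Hypothetical helper: wraps a snippet so its declarations stay function-scoped.
// The body must `return` the value to extract.
function wrapInIife(body: string): string {
  return `(function(){ ${body} })()`;
}

// Example: the same snippet can be evaluated repeatedly in one page context.
const snippet = wrapInIife("const x = 2; return JSON.stringify(x);");
```

The resulting string is what would be passed to `opencli operate eval`, quoted for the shell.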

### Network (API Discovery)
@@ -128,8 +157,7 @@ opencli operate close
```bash
opencli operate open https://httpbin.org/forms/post
opencli operate state # See [3] input "Customer Name", [4] input "Telephone"
opencli operate type 3 "OpenCLI"
opencli operate type 4 "555-0100"
opencli operate type 3 "OpenCLI" && opencli operate type 4 "555-0100"
opencli operate get value 3 # Verify: "OpenCLI"
opencli operate close
```
@@ -204,10 +232,19 @@ Save to `~/.opencli/clis/<site>/<command>.ts` → immediately available as `open

**Always prefer API over UI** — if you discovered an API during browsing, use `fetch()` directly.

## Tips

1. **Always `state` first** — never guess element indices, always inspect first
2. **Sessions persist** — browser stays open between commands, no need to re-open
3. **Use `eval` for data extraction** — `eval "JSON.stringify(...)"` is faster than multiple `get` calls
4. **Use `network` to find APIs** — JSON APIs are more reliable than DOM scraping
5. **Alias**: `opencli op` is shorthand for `opencli operate`

## Troubleshooting

| Error | Fix |
|-------|-----|
| "Browser not connected" | Run `opencli doctor` |
| "attach failed: chrome-extension://" | Disable 1Password temporarily |
| Element not found | `opencli operate scroll down` then `opencli operate state` |
| Element not found | `opencli operate scroll down && opencli operate state` |
| Stale indices after page change | Run `opencli operate state` again to get fresh indices |
24 changes: 23 additions & 1 deletion src/browser/base-page.ts
@@ -42,7 +42,29 @@ export abstract class BasePage implements IPage {
// ── Shared DOM helper implementations ──

async click(ref: string): Promise<void> {
await this.evaluate(clickJs(ref));
const result = await this.evaluate(clickJs(ref)) as
| string
| { status: string; x?: number; y?: number; w?: number; h?: number; error?: string }
| null;

// Backwards compat: old format returned 'clicked' string
if (typeof result === 'string' || result == null) return;

// JS click succeeded
if (result.status === 'clicked') return;

// JS click failed — try CDP native click if coordinates available
if (result.x != null && result.y != null) {
const success = await this.tryNativeClick(result.x, result.y);
if (success) return;
}

throw new Error(`Click failed: ${result.error ?? 'JS click and CDP fallback both failed'}`);
}

/** Override in subclasses with CDP native click support */
protected async tryNativeClick(_x: number, _y: number): Promise<boolean> {
return false;
}

async typeText(ref: string, text: string): Promise<void> {
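
The hunk above leaves `tryNativeClick` as a no-op for subclasses to override, and the diff does not show any override. One way such an override might dispatch a click is through the Chrome DevTools Protocol method `Input.dispatchMouseEvent`. The sketch below is an assumption, not the project's implementation: `CdpSend` and `nativeClick` are hypothetical names, and the transport that actually sends CDP commands is injected.

```typescript
// Hypothetical transport: sends one CDP command and resolves with its result.
type CdpSend = (method: string, params: Record<string, unknown>) => Promise<unknown>;

// Presses and releases the left button at viewport coordinates (x, y).
// Returns false instead of throwing so the caller can report a combined error.
async function nativeClick(send: CdpSend, x: number, y: number): Promise<boolean> {
  const base = { x, y, button: "left", clickCount: 1 };
  try {
    await send("Input.dispatchMouseEvent", { type: "mousePressed", ...base });
    await send("Input.dispatchMouseEvent", { type: "mouseReleased", ...base });
    return true;
  } catch {
    return false;
  }
}
```

Returning a boolean matches the contract in `click()`, which throws only after both the JS click and the fallback have failed.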
74 changes: 43 additions & 31 deletions src/browser/dom-helpers.ts
@@ -5,64 +5,76 @@
* to eliminate code duplication for click, type, press, wait, scroll, etc.
*/

/** Generate JS to click an element by ref */
export function clickJs(ref: string): string {
const safeRef = JSON.stringify(ref);
/** Shared element lookup JS fragment (4-strategy resolution) */
function resolveElementJs(safeRef: string, selectorSet: string): string {
return `
(() => {
const ref = ${safeRef};
// 1. data-opencli-ref (set by snapshot engine)
let el = document.querySelector('[data-opencli-ref="' + ref + '"]');
// 2. data-ref (legacy)
if (!el) el = document.querySelector('[data-ref="' + ref + '"]');
// 3. CSS selector
if (!el && ref.match(/^[a-zA-Z#.\\[]/)) {
try { el = document.querySelector(ref); } catch {}
}
// 4. Numeric index into interactive elements
if (!el) {
const idx = parseInt(ref, 10);
if (!isNaN(idx)) {
el = document.querySelectorAll('a, button, input, select, textarea, [role="button"], [tabindex]:not([tabindex="-1"])')[idx];
el = document.querySelectorAll('${selectorSet}')[idx];
}
}
}`;
}
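
The four-strategy lookup can be hard to reason about for unusual refs. The sketch below restates, outside the browser, which strategies a given ref string is eligible for, using the same regular expression and `parseInt` check as the generated JS. `refStrategies` is a hypothetical name used only for illustration; in the real fragment the strategies are tried in order and the first match wins.

```typescript
// Hypothetical helper mirroring the eligibility checks in the lookup fragment.
function refStrategies(ref: string): string[] {
  // Strategies 1 and 2 are attribute lookups and are always attempted.
  const strategies = ["data-opencli-ref", "data-ref"];
  // Strategy 3: only refs that look like a CSS selector.
  if (/^[a-zA-Z#.\[]/.test(ref)) strategies.push("css-selector");
  // Strategy 4: parseInt is lenient, so "7up" still parses as 7.
  if (!Number.isNaN(parseInt(ref, 10))) strategies.push("numeric-index");
  return strategies;
}
```

One consequence worth noting: because `parseInt` ignores trailing characters, a ref such as `7up` is still treated as index 7 when the earlier strategies find nothing.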

/** Generate JS to click an element by ref.
* Returns { status, x, y, w, h } for CDP fallback when JS click fails. */
export function clickJs(ref: string): string {
const safeRef = JSON.stringify(ref);
return `
(() => {
${resolveElementJs(safeRef, 'a, button, input, select, textarea, [role="button"], [tabindex]:not([tabindex="-1"])')}
if (!el) throw new Error('Element not found: ' + ref);
el.scrollIntoView({ behavior: 'instant', block: 'center' });
el.click();
return 'clicked';
const rect = el.getBoundingClientRect();
const x = Math.round(rect.left + rect.width / 2);
const y = Math.round(rect.top + rect.height / 2);
try {
el.click();
return { status: 'clicked', x, y, w: Math.round(rect.width), h: Math.round(rect.height) };
} catch (e) {
return { status: 'js_failed', x, y, w: Math.round(rect.width), h: Math.round(rect.height), error: e.message };
}
})()
`;
}

/** Generate JS to type text into an element by ref */
/** Generate JS to type text into an element by ref.
* Uses native setter for React compat + execCommand for contenteditable. */
export function typeTextJs(ref: string, text: string): string {
const safeRef = JSON.stringify(ref);
const safeText = JSON.stringify(text);
return `
(() => {
const ref = ${safeRef};
// 1. data-opencli-ref (set by snapshot engine)
let el = document.querySelector('[data-opencli-ref="' + ref + '"]');
// 2. data-ref (legacy)
if (!el) el = document.querySelector('[data-ref="' + ref + '"]');
// 3. CSS selector
if (!el && ref.match(/^[a-zA-Z#.\\[]/)) {
try { el = document.querySelector(ref); } catch {}
}
// 4. Numeric index into typeable elements
if (!el) {
const idx = parseInt(ref, 10);
if (!isNaN(idx)) {
el = document.querySelectorAll('input, textarea, [contenteditable="true"]')[idx];
}
}
${resolveElementJs(safeRef, 'input, textarea, [contenteditable="true"]')}
if (!el) throw new Error('Element not found: ' + ref);
el.focus();
if (el.isContentEditable) {
el.textContent = ${safeText};
// Select all content + delete, then insert (supports undo, works with rich text editors)
const sel = window.getSelection();
const range = document.createRange();
range.selectNodeContents(el);
sel.removeAllRanges();
sel.addRange(range);
document.execCommand('delete', false);
document.execCommand('insertText', false, ${safeText});
el.dispatchEvent(new Event('input', { bubbles: true }));
} else {
el.value = ${safeText};
// Use native setter for React/framework compatibility (match element type)
const proto = el instanceof HTMLTextAreaElement
? HTMLTextAreaElement.prototype
: HTMLInputElement.prototype;
const nativeSetter = Object.getOwnPropertyDescriptor(proto, 'value')?.set;
if (nativeSetter) {
nativeSetter.call(el, ${safeText});
} else {
el.value = ${safeText};
}
el.dispatchEvent(new Event('input', { bubbles: true }));
el.dispatchEvent(new Event('change', { bubbles: true }));
}
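
The switch from `el.value = ...` to the prototype's native setter matters when a framework shadows `value` on the element instance to track writes, which is how React's input tracking is generally understood to work. The sketch below reproduces that situation without a DOM. `FakeInput`, `shadowValue`, and `setViaNativeSetter` are hypothetical stand-ins, not code from this PR.

```typescript
// Stand-in for HTMLInputElement: `value` is an accessor on the prototype.
class FakeInput {
  private stored = "";
  get value(): string { return this.stored; }
  set value(v: string) { this.stored = v; }
}

// Mimics a framework that defines an own `value` property to observe writes.
function shadowValue(el: FakeInput, log: string[]): void {
  const proto = Object.getOwnPropertyDescriptor(FakeInput.prototype, "value")!;
  Object.defineProperty(el, "value", {
    configurable: true,
    get() { return proto.get!.call(el); },
    set(v: string) { log.push(v); proto.set!.call(el, v); },
  });
}

// Same pattern as the hunk above: call the prototype setter directly so the
// instance-level tracker is bypassed, falling back to plain assignment.
function setViaNativeSetter(el: FakeInput, text: string): void {
  const setter = Object.getOwnPropertyDescriptor(FakeInput.prototype, "value")?.set;
  if (setter) setter.call(el, text);
  else el.value = text;
}
```

Because the tracker never sees the write, the subsequent `input` event looks like a genuine user change to the framework, which is the reason the hunk dispatches `input` and `change` afterwards.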
21 changes: 21 additions & 0 deletions src/browser/dom-snapshot.ts
@@ -377,13 +377,34 @@ export function generateSnapshotJs(opts: DomSnapshotOptions = {}): string {
if (role && INTERACTIVE_ROLES.has(role)) return true;
if (el.hasAttribute('onclick') || el.hasAttribute('onmousedown') || el.hasAttribute('ontouchstart')) return true;
if (el.hasAttribute('tabindex') && el.getAttribute('tabindex') !== '-1') return true;
// Framework event listener detection (React/Vue/Angular onClick)
if (hasFrameworkListener(el)) return true;
try { if (window.getComputedStyle(el).cursor === 'pointer') return true; } catch {}
if (el.isContentEditable && el.getAttribute('contenteditable') !== 'false') return true;
// Search element heuristic detection
if (isSearchElement(el)) return true;
return false;
}

function hasFrameworkListener(el) {
try {
// React: __reactProps$xxx / __reactEvents$xxx with onClick/onMouseDown
for (const key of Object.keys(el)) {
if (key.startsWith('__reactProps$') || key.startsWith('__reactEvents$')) {
const props = el[key];
if (props && (props.onClick || props.onMouseDown || props.onPointerDown)) return true;
}
}
// Vue 3: _vei (Vue Event Invoker) with onClick
if (el._vei && (el._vei.onClick || el._vei.click || el._vei.onMousedown)) return true;
// Vue 2: __vue__ instance with $listeners
if (el.__vue__?.$listeners?.click) return true;
// Angular: ng-reflect-click binding
if (el.hasAttribute('ng-reflect-click')) return true;
} catch { /* ignore errors from cross-origin or frozen objects */ }
return false;
}

function isSearchElement(el) {
// Check class names for search indicators
const className = el.className?.toLowerCase() || '';
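
The React branch of `hasFrameworkListener` can be exercised outside a browser, since it only inspects own property names and handler props. The sketch below restates that branch against plain objects. `hasReactHandler` and `AnyRecord` are hypothetical names, and the `__reactProps$` suffixes in the usage are made up for the example.

```typescript
type AnyRecord = Record<string, unknown>;

// Mirrors the React check in the hunk above: look for React's per-node
// property bags and report whether they carry a click-like handler.
function hasReactHandler(el: AnyRecord): boolean {
  for (const key of Object.keys(el)) {
    if (key.startsWith("__reactProps$") || key.startsWith("__reactEvents$")) {
      const props = el[key] as AnyRecord | null | undefined;
      if (props && (props.onClick || props.onMouseDown || props.onPointerDown)) return true;
    }
  }
  return false;
}

// Usage with made-up keys: a node with an onClick prop counts as interactive.
const clickable = hasReactHandler({ "__reactProps$abc123": { onClick: () => {} } });
const inert = hasReactHandler({ "__reactProps$abc123": { className: "card" } });
```

This kind of check depends on framework internals that are not a public API, so it is best treated as a heuristic that may need updating across framework versions.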