Skip to content

feat: Browser Use best practices — click/type/state improvements#707

Merged
jackwener merged 4 commits intomainfrom
open-operator
Apr 2, 2026
Merged

feat: Browser Use best practices — click/type/state improvements#707
jackwener merged 4 commits intomainfrom
open-operator

Conversation

@jackwener
Copy link
Copy Markdown
Owner

Summary

Inspired by deep analysis of Browser Use, improve OpenCLI Operate's browser interaction layer:

  • Framework listener detection — detect React __reactProps$ onClick, Vue _vei, Angular ng-reflect-click in state output, so <div onClick> elements get [N] indices
  • Click CDP fallback — when JS el.click() fails, automatically fall back to CDP Input.dispatchMouseEvent with coordinates; add clickWithQuads() using DOM.getContentQuads/getBoxModel for precise inline element clicking
  • Type improvements — React-compatible native setter (HTMLInputElement.prototype.value), contenteditable support via execCommand('insertText'), autocomplete/combobox field detection with 400ms wait
  • SKILL.md overhaul — Critical Rules (state over screenshot, eval read-only, IIFE), Command Cost Guide, Action Chaining Rules

Files changed (6 files, +234/-54)

File Change
src/browser/dom-snapshot.ts hasFrameworkListener() for React/Vue/Angular
src/browser/dom-helpers.ts clickJs() returns coords; typeTextJs() native setter + contenteditable
src/browser/base-page.ts click() CDP fallback + error on both-fail
src/browser/page.ts tryNativeClick(), clickWithQuads()
src/cli.ts Autocomplete detection in type command
skills/opencli-operate/SKILL.md Critical Rules, Cost Guide, Chaining Rules

Test plan

  • npm run build passes
  • Twitter (React): state shows [N] on <div> buttons
  • Click fallback: test on Shadow DOM components
  • Type: React form value setting, Notion contenteditable
  • Autocomplete: Twitter search box shows "use state to see suggestions"

- Add Critical Rules section (state over screenshot, verify with get value)
- Add Command Cost Guide (free/instant vs expensive vision tokens)
- Add Action Chaining Rules (safe to chain vs page-changing)
- Add Tips section
- Fix Core Workflow to use state/get value for verification, not screenshot
- Mark screenshot as "ONLY for user deliverables"

Inspired by Browser Use's design: DOM-first state representation,
action cost awareness, and multi-action chaining patterns.
- Add rule: NEVER use eval to click/type — use click/type/select commands
  (eval bypasses scrollIntoView + CDP pipeline, fails on off-screen elements)
- Add rule: eval is read-only, always wrap in IIFE to avoid variable conflicts
- Reorder Critical Rules for priority
- Add IIFE example in Extract section

Root cause: Claude Code was using eval("el.click()") instead of
click <index>, and hitting "already declared" errors from repeated
eval calls in the same page context.
Inspired by deep analysis of Browser Use's design patterns:

1. Framework listener detection (React/Vue/Angular)
   - Detect __reactProps$ onClick, Vue _vei, Angular ng-reflect-click
   - Catches <div onClick> elements that pure ARIA/tag heuristics miss

2. Click CDP fallback
   - clickJs() now returns coordinates on failure
   - BasePage.click() falls back to CDP Input.dispatchMouseEvent
   - Page.clickWithQuads() uses DOM.getContentQuads for inline elements

3. Type improvements
   - React-compatible: use native HTMLInputElement.prototype.value setter
   - Contenteditable: selectAll + execCommand('insertText') for rich editors
   - Autocomplete: detect role=combobox, wait 400ms for dropdown suggestions

4. getContentQuads precise click
   - Page.clickWithQuads() for multi-line inline elements (e.g. wrapped <a>)
   - Falls back through getContentQuads → getBoxModel → JS click
1. clickWithQuads: escape ref with JSON.stringify before inserting into
   JS strings and CSS selectors (injection risk)
2. base-page click: throw error when both JS click and CDP fallback fail
   instead of silently succeeding
3. typeTextJs: use matching prototype for native setter
   (HTMLTextAreaElement for textarea, HTMLInputElement for input)
@jackwener jackwener merged commit de81773 into main Apr 2, 2026
11 checks passed
@jackwener jackwener deleted the open-operator branch April 2, 2026 18:38
just-buer pushed a commit to just-buer/opencli that referenced this pull request Apr 8, 2026
…kwener#707)

* docs: improve operate skill with Browser Use best practices

- Add Critical Rules section (state over screenshot, verify with get value)
- Add Command Cost Guide (free/instant vs expensive vision tokens)
- Add Action Chaining Rules (safe to chain vs page-changing)
- Add Tips section
- Fix Core Workflow to use state/get value for verification, not screenshot
- Mark screenshot as "ONLY for user deliverables"

Inspired by Browser Use's design: DOM-first state representation,
action cost awareness, and multi-action chaining patterns.

* docs: fix operate skill — eval read-only, IIFE, interaction rules

- Add rule: NEVER use eval to click/type — use click/type/select commands
  (eval bypasses scrollIntoView + CDP pipeline, fails on off-screen elements)
- Add rule: eval is read-only, always wrap in IIFE to avoid variable conflicts
- Reorder Critical Rules for priority
- Add IIFE example in Extract section

Root cause: Claude Code was using eval("el.click()") instead of
click <index>, and hitting "already declared" errors from repeated
eval calls in the same page context.

* feat: Browser Use best practices — click/type/state improvements

Inspired by deep analysis of Browser Use's design patterns:

1. Framework listener detection (React/Vue/Angular)
   - Detect __reactProps$ onClick, Vue _vei, Angular ng-reflect-click
   - Catches <div onClick> elements that pure ARIA/tag heuristics miss

2. Click CDP fallback
   - clickJs() now returns coordinates on failure
   - BasePage.click() falls back to CDP Input.dispatchMouseEvent
   - Page.clickWithQuads() uses DOM.getContentQuads for inline elements

3. Type improvements
   - React-compatible: use native HTMLInputElement.prototype.value setter
   - Contenteditable: selectAll + execCommand('insertText') for rich editors
   - Autocomplete: detect role=combobox, wait 400ms for dropdown suggestions

4. getContentQuads precise click
   - Page.clickWithQuads() for multi-line inline elements (e.g. wrapped <a>)
   - Falls back through getContentQuads → getBoxModel → JS click

* fix: address code review — injection, silent failure, setter prototype

1. clickWithQuads: escape ref with JSON.stringify before inserting into
   JS strings and CSS selectors (injection risk)
2. base-page click: throw error when both JS click and CDP fallback fail
   instead of silently succeeding
3. typeTextJs: use matching prototype for native setter
   (HTMLTextAreaElement for textarea, HTMLInputElement for input)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant