
fix: SVG className crash + viewport expansion + test suites #733

Merged
jackwener merged 10 commits into main from feat/v2ex-autoresearch
Apr 3, 2026

@jackwener
Owner

Summary

Critical bug fix + AutoResearch test infrastructure.

Critical Fix: SVG className crash in dom-snapshot

isSearchElement() called el.className.toLowerCase() which crashes on SVG elements where className is SVGAnimatedString (not a string). This caused the entire DOM snapshot to fail and fall back to the basic accessibility tree on any page with SVG icons (Zhihu, Twitter, most React sites).

Fix: typeof el.className === 'string' ? el.className : el.className?.baseVal || ''
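
A minimal sketch of the normalization above, written against structural stand-in types so it runs without a DOM. The helper names `classNameOf` and `looksLikeSearch` are hypothetical, and the substring check is only an assumed shape for what `isSearchElement()` does with the class name:

```typescript
// Structural stand-ins: an HTML element exposes className as a string,
// an SVG element exposes it as an object with a baseVal string.
type ClassNameValue = string | { baseVal?: string } | null | undefined;
interface HasClassName {
  className?: ClassNameValue;
}

// Hypothetical helper: always returns a plain string, never throws.
function classNameOf(el: HasClassName): string {
  const cn = el.className;
  return typeof cn === 'string' ? cn : cn?.baseVal || '';
}

// Assumed usage: the real isSearchElement() may inspect more than the class.
function looksLikeSearch(el: HasClassName): boolean {
  return classNameOf(el).toLowerCase().includes('search');
}
```

With this shape, an SVG icon inside a search box is classified like any other element instead of aborting the whole snapshot.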

Impact on Zhihu hot page:

  • Before: 50 interactive elements (accessibility tree fallback), 1/30 hot links indexed
  • After: 597 interactive elements (proper DOM snapshot), 19/30 hot links indexed
  • E2E task: 25 turns → 14 turns (-44%), $0.75 → $0.37 (-51%)

Other Changes

  • viewportExpand: 800 → 2000 (covers ~3 screens instead of ~1)
  • DEBUG_SNAPSHOT env var for snapshot failure debugging
  • extractVerdict: rewrite with brace-counting JSON.parse (handles escaped quotes)
  • runCommand: include stderr in error output
  • V2EX test suite: 70 tasks (7 layers + edge cases + agent-style)
  • Zhihu test suite: 65 tasks (8 layers)
  • Combined eval runner + presets
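
To illustrate the brace-counting approach mentioned for extractVerdict, here is a sketch under stated assumptions: the function name `extractFirstJsonObject` is hypothetical, and the actual implementation in eval-skill.ts may differ in how it picks the object and validates its fields. The scanner tracks string state and backslash escapes so that quotes or braces inside a string value do not end the object early:

```typescript
// Returns the first parseable top-level JSON object found in `text`, or null.
function extractFirstJsonObject(text: string): unknown {
  for (let start = text.indexOf('{'); start !== -1; start = text.indexOf('{', start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        // Inside a string only escapes and the closing quote matter.
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === '{') depth++;
      else if (ch === '}' && --depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1));
        } catch {
          break; // balanced but not valid JSON; try the next '{'
        }
      }
    }
  }
  return null;
}
```

A regex that stops at the first `"` or `}` would truncate an explanation such as `"saw \"ok\""`; counting braces outside of strings avoids that.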

Test plan

  • npm run build passes
  • Zhihu E2E: 14 turns (was 25)
  • V2EX E2E: 9-10 turns
  • 10/10 complex E2E tasks pass

jackwener added 10 commits April 3, 2026 17:09
AutoResearch framework (Karpathy-style autonomous iteration):
- engine.ts: 8-phase loop (review → modify → commit → verify → guard → decide → log)
- config.ts: typed config + CLI parser + metric extraction
- logger.ts: TSV append-only results log
- commands/run.ts: main loop spawning Claude Code per iteration
- commands/plan.ts: interactive config wizard
- commands/fix.ts: auto-detect broken state, iteratively fix
- commands/debug.ts: hypothesis-driven debugging for failing tasks
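
As a rough illustration of the TSV append-only results log, the sketch below formats one row per iteration and hands it to an injected writer. Everything here is an assumption for illustration (the names `toTsvRow` and `makeTsvLogger`, and the column layout); the real logger.ts is not shown in this PR. In a Node setting the writer would typically wrap `fs.appendFileSync`:

```typescript
type Cell = string | number | boolean;

// Tabs and line breaks inside a cell would corrupt the row, so flatten them.
function toTsvRow(cells: Cell[]): string {
  return cells.map((c) => String(c).replace(/[\t\r\n]+/g, ' ')).join('\t') + '\n';
}

// The writer is injected so the logger only ever appends and never rewrites.
function makeTsvLogger(write: (line: string) => void): (cells: Cell[]) => void {
  return (cells) => write(toTsvRow(cells));
}

// Usage with an in-memory sink standing in for an append-only file.
const lines: string[] = [];
const log = makeTsvLogger((line) => lines.push(line));
log([1, 'keep', 0.95, 'fix\tselector\nagain']);
```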

V2EX test suite (5 layers, 40 tasks):
- L1 Atomic (10): open, state, click, scroll, eval, back, wait
- L2 Single Page (10): hot topics, node list, topic meta, pagination
- L3 Multi-Step (10): click-read, navigate-node, tab-then-topic, pagination
- L4 Write Ops (5): reply typing, favorite detection, form detection
- L5 Complex Chain (5): cross-page collect, multi-node compare, full workflow

Presets: operate-reliability, skill-quality, v2ex-reliability
- Fix v2ex-collect-hot-authors selector (pathname-based member link detection)
- Fix v2ex-wait-text judge (accept "appeared")
- Fix trailing commas in eval step strings
- Add 20 harder tasks: state+click interaction + long chain workflows
- Baseline: 60/60 across all layers
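
A sketch of what pathname-based member link detection could look like. It assumes V2EX profile links have the form `/member/<username>`; the function name `memberFromHref` and the default base URL are hypothetical and not taken from the test suite:

```typescript
// Returns the username for a V2EX member link, or null for any other link.
function memberFromHref(href: string, base = 'https://www.v2ex.com'): string | null {
  let url: URL;
  try {
    url = new URL(href, base); // resolves relative hrefs against the base
  } catch {
    return null; // not a parseable URL
  }
  // Match on the pathname only, so query strings and fragments are ignored.
  const m = /^\/member\/([^/]+)\/?$/.exec(url.pathname);
  return m ? m[1] : null;
}
```

Matching on `pathname` rather than on class names or the raw `href` string keeps the selector stable when the markup changes or when links carry query parameters.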
Knowledge-intensive Chinese Q&A site (React SPA, lazy loading, complex DOM):

- L1 Atomic (10): open, state, title, url, scroll, tab, back, wait, keys, screenshot
- L2 Feed (8): feed titles, hot list, metrics, tabs, authors, content types, avatar, search
- L3 Question (8): title, meta, answer, votes, buttons, descriptions, answer count
- L4 Navigation (8): hot→question, feed→question, author profile, search, topic, user, back
- L5 Write (6): upvote/follow/comment/bookmark/write-answer/share button detection
- L6 Chain (8): read-answer-author, author-profile, multi-hot, search-then-read, scroll-answers
- L7 Search (6): basic, people, topic, click-result, filter, back
- L8 Complex (6): full workflow, deep author chain, cross-question, search-read, 3-page, scroll-deep

Key fixes during development:
- Zhihu search page needs 5s+ wait (SPA lazy loading)
- Back navigation goes to about:blank (daemon init page), fixed with direct navigate
- User profile answers page needs 4s wait for content
- Broader selectors needed (h2 a instead of specific class names)
…mple

Round 1: Fix 2 remaining browse-tasks failures:
- extract-npm-description: use generic <p> selector instead of class-based
- nav-click-link-example: include URL in output (title is 'Example Domains', not 'IANA')
…r year/rating

Round 2: IMDB page selectors were too specific (data-testid changed).
Use generic h1 for title, link text match for year, broader class match for rating.
Round 3: Add 10 edge case tasks (5 V2EX + 5 Zhihu):
- rapid-navigate: 3 consecutive opens
- eval-after-click: verify URL changes after SPA click
- scroll-and-extract: extract after deep scroll
- structured extraction: multi-field JSON from dynamic content
- lazy-load answers: scroll triggers more content

Key finding: Zhihu SPA click() doesn't update location.pathname
immediately. Use window.location.href = a.href for reliable navigation.

V2EX: 65/65, Zhihu: 65/65, Browse: 59/59 = 189/189
… eval for interaction)

Round 4-5: Add 5 tasks that test the actual agent workflow:
- agent-click-first-topic: find topic index via data-opencli-ref
- agent-type-search: type into search using state index
- agent-click-navigate-back: click by ref, verify navigation
- agent-state-has-interactive: verify state output format
- agent-state-after-scroll: verify scroll position in state

V2EX: 70/70 tasks
- eval-skill.ts: remove dead TASKS_FILE variable (skill-tasks.yaml never existed)
- eval-skill.ts: rewrite extractVerdict to use brace-counting JSON.parse
  instead of regex (handles escaped quotes in explanation)
- eval-browse.ts: include stderr in runCommand error output for debuggability
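
For the stderr change, a minimal sketch of building an error message that keeps both streams. The `CommandResult` shape and the name `describeFailure` are assumptions for illustration; the actual runCommand in eval-browse.ts may format its output differently:

```typescript
interface CommandResult {
  code: number | null; // null when the process was killed by a signal
  stdout: string;
  stderr: string;
}

// Builds a failure message that surfaces stderr first, then any stdout.
function describeFailure(cmd: string, r: CommandResult): string {
  const parts = [`Command failed (exit ${r.code ?? 'unknown'}): ${cmd}`];
  if (r.stderr.trim()) parts.push(`stderr: ${r.stderr.trim()}`);
  if (r.stdout.trim()) parts.push(`stdout: ${r.stdout.trim()}`);
  return parts.join('\n');
}
```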
Critical bug: isSearchElement() called el.className.toLowerCase() which
crashes on SVG elements where className is SVGAnimatedString (not a string).
This caused the entire DOM snapshot to fail and fall back to the basic
accessibility tree, losing ALL interactive element indices.

Fix: use typeof check + baseVal fallback for SVG className.

Also:
- Increase viewportExpand from 800 to 2000 (covers ~3 screens)
- Add DEBUG_SNAPSHOT env var for snapshot failure debugging
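
To make the effect of the larger viewportExpand concrete, the sketch below assumes the setting is a pixel margin added above and below the visible viewport when deciding which elements to index. That interpretation, the `Rect` shape, and the name `inExpandedViewport` are assumptions; the snapshot code itself is not shown in this PR:

```typescript
interface Rect {
  top: number; // px relative to the top of the viewport
  bottom: number;
}

// True when the element overlaps the viewport grown by `expand` px on each side.
function inExpandedViewport(rect: Rect, viewportHeight: number, expand: number): boolean {
  return rect.bottom >= -expand && rect.top <= viewportHeight + expand;
}

// A link 2500px down the page, with a 900px-tall viewport.
const link: Rect = { top: 2500, bottom: 2550 };
const seenAt800 = inExpandedViewport(link, 900, 800); // false: 2500 > 1700
const seenAt2000 = inExpandedViewport(link, 900, 2000); // true: 2500 <= 2900
```

Under this assumption, a 900px viewport with a 2000px margin reaches 2900px down the page, which is consistent with the "~3 screens" figure.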

Impact on Zhihu hot page:
- Before: 50 interactive elements (accessibility tree fallback), 1/30 hot links indexed
- After: 597 interactive elements (proper DOM snapshot), 19/30 hot links indexed
jackwener merged commit ff84d19 into main on Apr 3, 2026
13 checks passed
jackwener deleted the feat/v2ex-autoresearch branch on April 3, 2026 11:01
just-buer pushed a commit to just-buer/opencli that referenced this pull request Apr 8, 2026
…r#733)

* feat: AutoResearch framework + V2EX test suite (40 tasks)

AutoResearch framework (Karpathy-style autonomous iteration):
- engine.ts: 8-phase loop (review → modify → commit → verify → guard → decide → log)
- config.ts: typed config + CLI parser + metric extraction
- logger.ts: TSV append-only results log
- commands/run.ts: main loop spawning Claude Code per iteration
- commands/plan.ts: interactive config wizard
- commands/fix.ts: auto-detect broken state, iteratively fix
- commands/debug.ts: hypothesis-driven debugging for failing tasks

V2EX test suite (5 layers, 40 tasks):
- L1 Atomic (10): open, state, click, scroll, eval, back, wait
- L2 Single Page (10): hot topics, node list, topic meta, pagination
- L3 Multi-Step (10): click-read, navigate-node, tab-then-topic, pagination
- L4 Write Ops (5): reply typing, favorite detection, form detection
- L5 Complex Chain (5): cross-page collect, multi-node compare, full workflow

Presets: operate-reliability, skill-quality, v2ex-reliability

* test: V2EX test suite 60/60 — fix selectors, add harder tasks

- Fix v2ex-collect-hot-authors selector (pathname-based member link detection)
- Fix v2ex-wait-text judge (accept "appeared")
- Fix trailing commas in eval step strings
- Add 20 harder tasks: state+click interaction + long chain workflows
- Baseline: 60/60 across all layers

* feat: Zhihu test suite — 60 tasks across 8 layers, 60/60 passing

Knowledge-intensive Chinese Q&A site (React SPA, lazy loading, complex DOM):

- L1 Atomic (10): open, state, title, url, scroll, tab, back, wait, keys, screenshot
- L2 Feed (8): feed titles, hot list, metrics, tabs, authors, content types, avatar, search
- L3 Question (8): title, meta, answer, votes, buttons, descriptions, answer count
- L4 Navigation (8): hot→question, feed→question, author profile, search, topic, user, back
- L5 Write (6): upvote/follow/comment/bookmark/write-answer/share button detection
- L6 Chain (8): read-answer-author, author-profile, multi-hot, search-then-read, scroll-answers
- L7 Search (6): basic, people, topic, click-result, filter, back
- L8 Complex (6): full workflow, deep author chain, cross-question, search-read, 3-page, scroll-deep

Key fixes during development:
- Zhihu search page needs 5s+ wait (SPA lazy loading)
- Back navigation goes to about:blank (daemon init page), fixed with direct navigate
- User profile answers page needs 4s wait for content
- Broader selectors needed (h2 a instead of specific class names)

* feat: combined eval-all runner + combined-reliability preset

* experiment(operate): fix extract-npm-description + nav-click-link-example

Round 1: Fix 2 remaining browse-tasks failures:
- extract-npm-description: use generic <p> selector instead of class-based
- nav-click-link-example: include URL in output (title is 'Example Domains', not 'IANA')

* experiment(operate): fix bench-imdb-matrix — use broader selectors for year/rating

Round 2: IMDB page selectors were too specific (data-testid changed).
Use generic h1 for title, link text match for year, broader class match for rating.

* experiment(operate): add edge cases + fix SPA navigation timing

Round 3: Add 10 edge case tasks (5 V2EX + 5 Zhihu):
- rapid-navigate: 3 consecutive opens
- eval-after-click: verify URL changes after SPA click
- scroll-and-extract: extract after deep scroll
- structured extraction: multi-field JSON from dynamic content
- lazy-load answers: scroll triggers more content

Key finding: Zhihu SPA click() doesn't update location.pathname
immediately. Use window.location.href = a.href for reliable navigation.

V2EX: 65/65, Zhihu: 65/65, Browse: 59/59 = 189/189

* experiment(operate): add agent-style tasks using state+click+type (no eval for interaction)

Round 4-5: Add 5 tasks that test the actual agent workflow:
- agent-click-first-topic: find topic index via data-opencli-ref
- agent-type-search: type into search using state index
- agent-click-navigate-back: click by ref, verify navigation
- agent-state-has-interactive: verify state output format
- agent-state-after-scroll: verify scroll position in state

V2EX: 70/70 tasks

* fix: review fixes — extractVerdict, stderr, dead code

- eval-skill.ts: remove dead TASKS_FILE variable (skill-tasks.yaml never existed)
- eval-skill.ts: rewrite extractVerdict to use brace-counting JSON.parse
  instead of regex (handles escaped quotes in explanation)
- eval-browse.ts: include stderr in runCommand error output for debuggability

* fix: SVG className crash in dom-snapshot + viewport expansion

Critical bug: isSearchElement() called el.className.toLowerCase() which
crashes on SVG elements where className is SVGAnimatedString (not a string).
This caused the entire DOM snapshot to fail and fall back to the basic
accessibility tree, losing ALL interactive element indices.

Fix: use typeof check + baseVal fallback for SVG className.

Also:
- Increase viewportExpand from 800 to 2000 (covers ~3 screens)
- Add DEBUG_SNAPSHOT env var for snapshot failure debugging

Impact on Zhihu hot page:
- Before: 50 interactive elements (accessibility tree fallback), 1/30 hot links indexed
- After: 597 interactive elements (proper DOM snapshot), 19/30 hot links indexed