dev-browser/cli/llm-guide.txt at main · poolsideai/dev-browser · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
LLM USAGE GUIDE:
  Write small, focused scripts. Each script should do ONE thing: navigate, click, fill, or check.
  End each script by logging the state you need for the next decision.
  Use descriptive page names like "login", "checkout", or "results" instead of "page1".
  Named pages from browser.getPage("name") persist between script runs, so you usually do not need to re-navigate.
  Inside page.evaluate(...), write plain JavaScript only - no TypeScript syntax in the browser context.
  On Windows/PowerShell, use here-strings to pipe multiline scripts:
    @"
    const page = await browser.getPage("main");
    console.log(await page.title());
    "@ | dev-browser --connect

  Quick inspection:
    dev-browser --connect <<'EOF'
    const tabs = await browser.listPages();
    console.log(JSON.stringify(tabs, null, 2));
    EOF

    dev-browser --connect <<'EOF'
    const page = await browser.getPage("TARGET_ID_HERE");
    console.log(JSON.stringify({
      url: page.url(),
      title: await page.title(),
    }, null, 2));
    EOF

  AI snapshots for element discovery:
    dev-browser <<'EOF'
    const page = await browser.getPage("main");
    const result = await page.snapshotForAI();
    console.log(result.full);
    // Returns { full: string, incremental?: string }.
    // Optional args: { track?: string, depth?: number, timeout?: number }.
    // Read result.full to identify the right element.
    // Then interact with it using Playwright:
    // await page.getByRole("button", { name: "Continue" }).click();
    // Re-run page.snapshotForAI({ track: "main" }) after the page changes.
    EOF

  Choosing your approach:
    Unknown pages: use page.snapshotForAI() first to discover the page, then interact based on what you find.
    Known pages/selectors: skip the snapshot and use direct Playwright selectors like page.click(), page.fill(), or page.locator() for faster, more reliable automation.

  Screenshots for visual state:
    dev-browser <<'EOF'
    const page = await browser.getPage("main");
    const buf = await page.screenshot();
    const path = await saveScreenshot(buf, "debug.png");
    console.log(path);
    EOF

  Waiting patterns:
    dev-browser <<'EOF'
    const page = await browser.getPage("search-results");
    await page.waitForSelector(".results");
    await page.waitForURL("**/success");
    console.log(JSON.stringify({
      url: page.url(),
      title: await page.title(),
    }, null, 2));
    EOF

  Dev server navigation:
    For local dev servers (Next.js, Vite, etc.), prefer:
      await page.goto(url, { waitUntil: "domcontentloaded" });
    The default "load" wait can hang on HMR, streaming, or other long-lived dev-server connections.
    Use "load" only when you specifically need every subresource to finish loading.

  Error recovery:
    If a script fails, the page usually stays where it stopped.
    Reconnect to the same page name, take a screenshot, and log the URL/title:
    dev-browser <<'EOF'
    const page = await browser.getPage("checkout");
    const path = await saveScreenshot(await page.screenshot(), "debug.png");
    console.log(JSON.stringify({
      screenshot: path,
      url: page.url(),
      title: await page.title(),
    }, null, 2));
    EOF

  Common Playwright Page methods:
    page.goto(url, { waitUntil: "domcontentloaded" })
                                           Navigate to a URL; prefer this on dev servers
    page.title()                           Get the current page title
    page.url()                             Get the current URL
    page.snapshotForAI(options)            Get an AI-optimized snapshot; returns { full, incremental? }
                                           Options: { track?: string, depth?: number, timeout?: number }
    page.getByRole(role, { name })         Target elements discovered from the snapshot
    page.textContent(selector)             Get the text content of an element
    page.innerHTML(selector)               Get the inner HTML of an element
    page.fill(selector, value)             Fill an input field
    page.click(selector)                   Click an element
    page.type(selector, text)              Type text character by character
    page.press(selector, key)              Press a key such as Enter or Tab
    page.waitForSelector(selector)         Wait for an element to appear
    page.waitForURL(url)                   Wait for navigation to a URL
    page.screenshot()                      Capture a screenshot buffer; save it with saveScreenshot(...)
    page.$$eval(selector, fn)              Run a function on all matching elements
    page.$eval(selector, fn)               Run a function on the first matching element
    page.evaluate(fn)                      Run JavaScript in the page context (plain JS only)
    page.locator(selector)                 Create a locator for chained actions

  Connecting to a running Chrome instance:
    Auto-discover Chrome with debugging enabled:
      dev-browser --connect <<'EOF'
        const page = await browser.getPage("main");
        console.log(await page.title());
      EOF

    Connect to a specific CDP endpoint:
      dev-browser --connect http://localhost:9222 <<'EOF'
        const page = await browser.getPage("main");
        console.log(await page.title());
      EOF

    To launch Chrome with debugging enabled:
      chrome.exe --remote-debugging-port=9222
      google-chrome --remote-debugging-port=9222

    Or visit chrome://inspect/#remote-debugging to configure.

  Tips:
    - Use console.log(JSON.stringify(...)) for structured output.
    - Prefer page.snapshotForAI() for structure; use screenshots when visual layout or styling matters.
    - Keep page names stable across scripts so you can resume work after failures.
    - Each --browser name maps to a separate daemon-managed browser instance.
    - Use --connect to attach to an existing browser; omit the URL to auto-discover Chrome with debugging enabled.
    - Use short timeouts (--timeout 10) so scripts fail fast instead of hanging on missing elements.
    - Add --headless for unattended automation; omit it when you want to watch the browser window.