
Commit 88d7609

Merge pull request #22 from browseros-ai/heartbeat-claude-processing
heartbeat during Claude execution
2 parents 6174bf8 + 10eea22 commit 88d7609


4 files changed (+241 / -38 lines)

Lines changed: 103 additions & 27 deletions
@@ -1,42 +1,118 @@
 /**
  * Claude SDK specific system prompt for browser automation
  */
-export const CLAUDE_SDK_SYSTEM_PROMPT = `You are a browser automation assistant with BrowserTools access.
+export const CLAUDE_SDK_SYSTEM_PROMPT = `You are a browser automation assistant with access to specialized browser control tools.
 
-# Core Workflow
+# Core Principles
 
-All browser interactions require a tab ID. Before interacting with a page:
-1. Use browser_list_tabs or browser_get_active_tab to identify the target tab
-2. Use browser_switch_tab if needed to activate the correct tab
-3. Perform actions using the tab's ID
+1. **Tab Context Required**: All browser interactions require a valid tab ID. Always identify the target tab before performing actions.
+2. **Use the Right Tool**: Choose the most efficient tool for each task. Avoid over-engineering simple operations.
+3. **Extract, Don't Execute**: Prefer built-in extraction tools over JavaScript execution when gathering information.
 
-# Essential Tools
+# Standard Workflow
 
-**Tab Management:**
-- browser_list_tabs - List all open tabs with IDs
-- browser_get_active_tab - Get current active tab
-- browser_switch_tab(tabId) - Switch to a specific tab
-- browser_open_tab(url) - Open new tab
-- browser_close_tab(tabId) - Close tab
+Before interacting with any page:
+1. Identify the target tab using browser_list_tabs or browser_get_active_tab
+2. Switch to the correct tab if needed using browser_switch_tab
+3. Perform your intended action using the tab's ID
 
-**Navigation & Content:**
-- browser_navigate(url, tabId) - Navigate to URL (tabId optional, uses active tab)
-- browser_get_interactive_elements(tabId) - Get all clickable/typeable elements with nodeIds
-- browser_get_page_content(tabId, type) - Extract text or text-with-links
-- browser_get_screenshot(tabId) - Capture screenshot with bounding boxes showing nodeIds
+# Tool Selection Guidelines
 
-**Interaction:**
+## Content Extraction (Choose in this order)
+
+**For text content and data extraction:**
+- PREFER: browser_get_page_content(tabId, type) - Fast, efficient text extraction
+- Use type: "text" for plain text content
+- Use type: "text-with-links" when URLs are needed
+- Supports context: "visible" or "full" page
+- Can target specific sections (main, article, navigation, etc.)
+
+**For visual context:**
+- USE: browser_get_screenshot(tabId) - Only when visual layout or non-text elements matter
+- Shows bounding boxes with nodeIds for interactive elements
+- Useful for visual verification or understanding page structure
+- Not efficient for extracting text data
+
+**For complex operations:**
+- LAST RESORT: browser_execute_javascript(tabId, code) - Only when built-in tools cannot accomplish the task
+- Use when you need to manipulate DOM or access browser APIs directly
+- Avoid for simple text extraction or standard interactions
+
+## Tab Management
+
+- browser_list_tabs - Get all open tabs with IDs and URLs
+- browser_get_active_tab - Get currently active tab
+- browser_switch_tab(tabId) - Switch focus to specific tab
+- browser_open_tab(url, active?) - Open new tab, optionally make it active
+- browser_close_tab(tabId) - Close specific tab
+
+## Navigation
+
+- browser_navigate(url, tabId?) - Navigate to URL (defaults to active tab if tabId omitted)
+- browser_get_load_status(tabId) - Check if page has finished loading
+
+## Page Interaction
+
+**Discovery:**
+- browser_get_interactive_elements(tabId, simplified?) - Get all clickable/typeable elements with nodeIds
+- Use simplified: true (default) for concise output
+- Always call this before clicking or typing to get valid nodeIds
+
+**Actions:**
 - browser_click_element(tabId, nodeId) - Click element by nodeId
-- browser_type_text(tabId, nodeId, text) - Type into input
+- browser_type_text(tabId, nodeId, text) - Type into input field
 - browser_clear_input(tabId, nodeId) - Clear input field
+- browser_send_keys(tabId, key) - Send keyboard input (Enter, Tab, Escape, Arrow keys, etc.)
+
+**Alternative Coordinate-Based Actions:**
+- browser_click_coordinates(tabId, x, y) - Click at specific position
+- browser_type_at_coordinates(tabId, x, y, text) - Click and type at position
+
+## Scrolling
+
+- browser_scroll_down(tabId) - Scroll down one viewport height
+- browser_scroll_up(tabId) - Scroll up one viewport height
 - browser_scroll_to_element(tabId, nodeId) - Scroll element into view
 
-**Scrolling:**
-- browser_scroll_down(tabId) - Scroll down one viewport
-- browser_scroll_up(tabId) - Scroll up one viewport
+## Advanced Features
+
+- browser_get_bookmarks(folderId?) - Get browser bookmarks
+- browser_create_bookmark(title, url, parentId?) - Create new bookmark
+- browser_remove_bookmark(bookmarkId) - Delete bookmark
+- browser_search_history(query, maxResults?) - Search browsing history
+- browser_get_recent_history(count?) - Get recent history items
+
+# Best Practices
+
+- **Minimize Screenshots**: Only use screenshots when visual context is essential. For data extraction, always prefer browser_get_page_content.
+- **Avoid Unnecessary JavaScript**: Built-in tools are faster and more reliable. Only execute custom JavaScript when standard tools cannot accomplish the task.
+- **Get Elements First**: Always call browser_get_interactive_elements before clicking or typing to ensure you have valid nodeIds.
+- **Wait for Loading**: After navigation, verify the page has loaded before extracting content or interacting.
+- **Use Context Options**: When extracting content, specify whether you need "visible" (viewport) or "full" (entire page) context.
+- **Target Specific Sections**: Use includeSections parameter in browser_get_page_content to extract only relevant parts (main, article, navigation, etc.).
+
+# Common Patterns
+
+**Extract article text:**
+\`\`\`
+browser_get_page_content(tabId, "text", { context: "full", includeSections: ["main", "article"] })
+\`\`\`
+
+**Get all links on page:**
+\`\`\`
+browser_get_page_content(tabId, "text-with-links", { context: "visible" })
+\`\`\`
+
+**Fill and submit a form:**
+\`\`\`
+1. browser_get_interactive_elements(tabId)
+2. browser_type_text(tabId, inputNodeId, "text")
+3. browser_click_element(tabId, submitButtonNodeId)
+\`\`\`
 
-**Advanced:**
-- browser_execute_javascript(tabId, code) - Execute JS in page
-- browser_send_keys(tabId, key) - Send keyboard keys (Enter, Tab, etc.)
+**Verify visual layout:**
+\`\`\`
+browser_get_screenshot(tabId, { size: "medium" })
+\`\`\`
 
-Always get interactive elements before clicking/typing to obtain valid nodeIds.`
+Focus on efficiency and use the most appropriate tool for each task. When in doubt, prefer simpler tools over complex ones.`

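The rewritten prompt ranks content-extraction tools in a fixed order: browser_get_page_content first, browser_get_screenshot only for visual context, and browser_execute_javascript as a last resort. A minimal sketch of that decision rule as a plain function follows. Only the tool names and the "text" / "text-with-links" types come from the prompt; the `ExtractionNeed` shape and the `pickExtractionTool` helper are hypothetical and are not part of this commit.

```typescript
// Hypothetical description of what a task needs from the page (not part of the commit).
interface ExtractionNeed {
  needsText: boolean         // plain text content
  needsUrls: boolean         // link targets are required
  needsVisualLayout: boolean // layout or non-text elements matter
  needsDomAccess: boolean    // direct DOM manipulation or browser APIs
}

type ToolChoice =
  | { tool: 'browser_get_page_content'; type: 'text' | 'text-with-links' }
  | { tool: 'browser_get_screenshot' }
  | { tool: 'browser_execute_javascript' }

// Hypothetical helper mirroring the prompt's ordering.
function pickExtractionTool(need: ExtractionNeed): ToolChoice {
  // DOM manipulation is the case the prompt says built-in tools cannot cover.
  if (need.needsDomAccess) {
    return { tool: 'browser_execute_javascript' }
  }
  // Text or links: always prefer page-content extraction over screenshots.
  if (need.needsText || need.needsUrls) {
    return {
      tool: 'browser_get_page_content',
      type: need.needsUrls ? 'text-with-links' : 'text',
    }
  }
  // Screenshots only when the visual layout itself matters.
  if (need.needsVisualLayout) {
    return { tool: 'browser_get_screenshot' }
  }
  // Nothing specific requested: fall back to the cheapest extraction tool.
  return { tool: 'browser_get_page_content', type: 'text' }
}
```

Note that a task needing both text and visual layout still resolves to page-content extraction. This follows the prompt's "Minimize Screenshots" rule, under which data extraction always prefers browser_get_page_content.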
packages/agent/src/agent/ClaudeSDKAgent.ts

Lines changed: 106 additions & 3 deletions
@@ -101,6 +101,91 @@ export class ClaudeSDKAgent extends BaseAgent {
     )
   }
 
+  /**
+   * Wrapper around iterator.next() that yields heartbeat events while waiting
+   * @param iterator - The async iterator
+   * @yields Heartbeat events (FormattedEvent) while waiting, then the final iterator result (IteratorResult)
+   */
+  private async *nextWithHeartbeat(iterator: AsyncIterator<any>): AsyncGenerator<any> {
+    const heartbeatInterval = 20000 // 20 seconds
+    let heartbeatTimer: NodeJS.Timeout | null = null
+    let abortHandler: (() => void) | null = null
+
+    // Call iterator.next() once - this generator wraps a single next() call
+    const iteratorPromise = iterator.next()
+
+    // Create abort promise
+    const abortPromise = new Promise<never>((_, reject) => {
+      if (this.abortController) {
+        abortHandler = () => {
+          reject(new Error('Agent execution aborted by client'))
+        }
+        this.abortController.signal.addEventListener('abort', abortHandler, { once: true })
+      }
+    })
+
+    try {
+      // Loop until the iterator promise resolves, yielding heartbeats while waiting
+      while (true) {
+        // Check if execution was aborted
+        if (this.abortController?.signal.aborted) {
+          logger.info('⚠️ Agent execution aborted during heartbeat wait')
+          return
+        }
+
+        // Create timeout promise for this iteration
+        const timeoutPromise = new Promise(resolve => {
+          heartbeatTimer = setTimeout(() => resolve({ type: 'heartbeat' }), heartbeatInterval)
+        })
+
+        type RaceResult = { type: 'result'; result: any } | { type: 'heartbeat' }
+        let race: RaceResult
+
+        try {
+          race = await Promise.race([
+            iteratorPromise.then(result => ({ type: 'result' as const, result })),
+            timeoutPromise.then(() => ({ type: 'heartbeat' as const })),
+            abortPromise
+          ])
+        } catch (abortError) {
+          // Abort was triggered during wait
+          logger.info('⚠️ Agent execution aborted (caught during iterator wait)')
+          // Cleanup iterator
+          if (iterator.return) {
+            await iterator.return(undefined).catch(() => {})
+          }
+          return
+        }
+
+        // Clear the timeout if it was set
+        if (heartbeatTimer) {
+          clearTimeout(heartbeatTimer)
+          heartbeatTimer = null
+        }
+
+        if (race.type === 'heartbeat') {
+          // Heartbeat timeout occurred - yield processing event and continue waiting
+          yield EventFormatter.createProcessingEvent()
+          // Loop continues - will race the same iteratorPromise (still pending) vs new timeout
+        } else {
+          // Iterator result arrived - yield it and exit this generator
+          yield race.result
+          return
+        }
+      }
+    } finally {
+      // Clean up heartbeat timer
+      if (heartbeatTimer) {
+        clearTimeout(heartbeatTimer)
+      }
+
+      // Clean up abort listener if it wasn't triggered
+      if (abortHandler && this.abortController && !this.abortController.signal.aborted) {
+        this.abortController.signal.removeEventListener('abort', abortHandler)
+      }
+    }
+  }
+
   /**
    * Execute a task using Claude SDK and stream formatted events
    *
@@ -137,10 +222,28 @@ export class ClaudeSDKAgent extends BaseAgent {
     // Call Claude SDK
     const iterator = query({ prompt: message, options })[Symbol.asyncIterator]()
 
-    // Stream events
+    // Stream events with heartbeat
     while (true) {
-      const result = await iterator.next()
-      if (result.done) break
+      // Check if execution was aborted
+      if (this.abortController?.signal.aborted) {
+        logger.info('⚠️ Agent execution aborted by client')
+        break
+      }
+
+      let result: IteratorResult<any> | null = null
+
+      // Iterate through heartbeat generator to get the actual result
+      for await (const item of this.nextWithHeartbeat(iterator)) {
+        if (item && item.done !== undefined) {
+          // This is the final result
+          result = item
+        } else {
+          // This is a heartbeat/processing event
+          yield item
+        }
+      }
+
+      if (!result || result.done) break
 
       const event = result.value
 

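nextWithHeartbeat races a single pending `iterator.next()` against a timer that is re-armed on every loop pass, so a long silence from the SDK still produces a `processing` event every 20 seconds without ever calling `next()` twice. The sketch below isolates that idea outside the agent class, under these assumptions: a generic `AsyncIterator<T>`, a hypothetical `{ kind: 'heartbeat' }` marker in place of `EventFormatter.createProcessingEvent()`, and no abort handling. The `withHeartbeat` name is illustrative only.

```typescript
// Hypothetical marker standing in for EventFormatter.createProcessingEvent().
type Heartbeat = { kind: 'heartbeat' }

// Wraps an async iterator: while a next() call stays pending for longer than
// intervalMs, a heartbeat is yielded and the same pending promise is raced
// again against a fresh timer. Each underlying next() is called exactly once.
async function* withHeartbeat<T>(
  iterator: AsyncIterator<T>,
  intervalMs: number,
): AsyncGenerator<T | Heartbeat> {
  while (true) {
    const pending = iterator.next()
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined
      const tick = new Promise<'tick'>(resolve => {
        timer = setTimeout(() => resolve('tick'), intervalMs)
      })
      // Whichever settles first wins; the timer is always cleared afterwards.
      const outcome = await Promise.race([pending, tick]).finally(() => clearTimeout(timer))
      if (outcome === 'tick') {
        yield { kind: 'heartbeat' }
        continue // keep waiting on the same pending next() call
      }
      if (outcome.done) return
      yield outcome.value
      break // value delivered: request the next one
    }
  }
}
```

Unlike nextWithHeartbeat, which yields the whole `IteratorResult` and leaves the caller to tell it apart from heartbeats via `item.done !== undefined`, this variant yields the unwrapped value and ends when the source ends. That keeps the consumer to a single `for await` loop.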
packages/agent/src/utils/EventFormatter.ts

Lines changed: 8 additions & 1 deletion
@@ -7,7 +7,7 @@
  * Formatted event structure for WebSocket clients
  */
 export class FormattedEvent {
-  type: 'init' | 'thinking' | 'tool_use' | 'tool_result' | 'response' | 'completion' | 'error'
+  type: 'init' | 'thinking' | 'tool_use' | 'tool_result' | 'response' | 'completion' | 'error' | 'processing'
   content: string
   metadata?: {
     turnCount?: number
@@ -36,6 +36,13 @@ export class FormattedEvent {
  */
 export class EventFormatter {
 
+  /**
+   * Create a processing/heartbeat event to indicate Claude is still working
+   */
+  static createProcessingEvent(): FormattedEvent {
+    return new FormattedEvent('processing', '⏳ Processing...')
+  }
+
   /**
    * Format any Claude SDK event into a FormattedEvent
    */

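With the new `'processing'` type, the agent emits a heartbeat roughly every 20 seconds while Claude is silent, so a client can treat any event as proof that the server is alive. A minimal client-side sketch follows. It assumes that the JSON sent by `FormattedEvent.toJSON()` carries at least `type` and `content`. The `StallDetector` class and its 60-second default (three missed heartbeats) are hypothetical and are not part of this commit.

```typescript
// Mirrors the type union in FormattedEvent, including the new 'processing'.
type EventType =
  | 'init' | 'thinking' | 'tool_use' | 'tool_result'
  | 'response' | 'completion' | 'error' | 'processing'

// Assumed minimal shape of a received event.
interface ClientEvent {
  type: EventType
  content: string
}

// Hypothetical client-side helper: records when the last event arrived and
// reports a stall once nothing (heartbeat or otherwise) has arrived for
// timeoutMs. Timestamps are passed in explicitly so the logic is testable.
class StallDetector {
  private lastEventAt: number
  private readonly timeoutMs: number

  constructor(startMs: number, timeoutMs = 60_000) {
    this.lastEventAt = startMs
    this.timeoutMs = timeoutMs
  }

  // Refreshes the timer; returns true if the event should be shown to the
  // user, false for heartbeats, which only prove liveness.
  onEvent(event: ClientEvent, nowMs: number): boolean {
    this.lastEventAt = nowMs
    return event.type !== 'processing'
  }

  isStalled(nowMs: number): boolean {
    return nowMs - this.lastEventAt > this.timeoutMs
  }
}
```

Passing `Date.now()` as `nowMs` on each message and polling `isStalled(Date.now())` on an interval would let a UI distinguish "Claude is still working" from "the connection has gone quiet".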
packages/agent/src/websocket/server.ts

Lines changed: 24 additions & 7 deletions
@@ -393,14 +393,31 @@ async function processMessage(
       eventCount++
       lastEventType = formattedEvent.type
 
-      // Send to client (SAME AS BEFORE)
-      ws.send(JSON.stringify(formattedEvent.toJSON()))
+      // Send to client - catch errors if client disconnected
+      try {
+        ws.send(JSON.stringify(formattedEvent.toJSON()))
 
-      logger.debug('📤 Event sent', {
-        sessionId,
-        type: formattedEvent.type,
-        eventCount
-      })
+        logger.debug('📤 Event sent', {
+          sessionId,
+          type: formattedEvent.type,
+          eventCount
+        })
+      } catch (sendError) {
+        // Client disconnected during streaming
+        logger.info('⚠️ Client disconnected during event streaming, stopping iterator', {
+          sessionId,
+          eventCount
+        })
+
+        // Cleanup iterator
+        if (iterator.return) {
+          await iterator.return(undefined).catch(() => {})
+        }
+
+        // Exit cleanly - don't throw, just return
+        // (throwing would trigger outer error handler which tries to sendError again)
+        return
+      }
     }
 
     logger.info('✅ Message processed successfully', {

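The new try/catch relies on `ws.send()` throwing once the client has gone away. Depending on the WebSocket implementation, sending on a closed socket may instead be reported through a callback or a return value rather than an exception. The defensive sketch below therefore also checks `readyState` before sending and then stops the source iterator, as the commit does. The `SocketLike` interface and the `sendOrStop` helper are hypothetical; `1` is the standard WebSocket `OPEN` ready-state value.

```typescript
// Hypothetical minimal shape of a server-side socket, for illustration only.
interface SocketLike {
  readyState: number
  send(data: string): void
}

const OPEN = 1 // standard WebSocket readyState for an open connection

// Sends one event. If the socket is not open or send() throws, asks the
// producing iterator to clean up and returns false so the caller can
// exit its loop quietly instead of throwing to an outer error handler.
async function sendOrStop(
  ws: SocketLike,
  payload: unknown,
  iterator: AsyncIterator<unknown>,
): Promise<boolean> {
  try {
    if (ws.readyState !== OPEN) throw new Error('socket not open')
    ws.send(JSON.stringify(payload))
    return true
  } catch {
    if (iterator.return) {
      // Errors raised while shutting the iterator down are ignored.
      await iterator.return(undefined).catch(() => {})
    }
    return false
  }
}
```

A caller would replace the inline try/catch with `if (!(await sendOrStop(ws, formattedEvent.toJSON(), iterator))) return`, keeping the "return, don't throw" behavior the commit's comment describes.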