-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: browser checks for run execution #734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds slot-status handling and a getBrowserStatus API in BrowserPool; extends timeouts and adds throttled, status-aware waits and richer logging in worker and controller; adjusts dummy-socket messaging, post-upgrade delay, and error propagation. Changes
Sequence Diagram(s)sequenceDiagram
participant W as Worker (pgboss-worker)
participant BP as BrowserPool
participant C as Controller
participant S as Socket (real/dummy)
W->>BP: request slot / getBrowserStatus(id)
BP-->>W: status ("reserved"|"initializing"|"ready"|"failed"|null)
alt status is null
W-->>W: throw slot-missing error
else wait/monitor with throttled logs
W->>C: initialize browser for id
C->>BP: upgradeBrowserSlot(id)
BP-->>C: upgrade result
C-->>C: wait 500ms
alt client connects within 15s
C->>S: use real socket
else
C->>S: create dummy socket
end
C-->>W: init complete (success log)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~35 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (4)
server/src/browser-management/classes/BrowserPool.ts (1)
650-655
: API is fine; consider aligning with init lifecycle by actually using 'initializing'getBrowserStatus is useful, but nothing sets "initializing". Hooking controller to set it will make the worker’s status logs clearer than always showing "reserved" until "ready".
If adding a setter feels heavy, a minimal alternative in controller is:
- // before initialize + const info = (browserPool as any).pool?.[id]; + if (info) info.status = "initializing";But a dedicated setter is cleaner and safer.
server/src/pgboss-worker.ts (1)
290-299
: Consider abort-aware page wait to return faster when user abortsYou already define isRunAborted; checking it inside this loop avoids waiting the full page timeout after an abort.
- while (!currentPage && (Date.now() - pageWaitStart) < BROWSER_PAGE_TIMEOUT) { + while (!currentPage && (Date.now() - pageWaitStart) < BROWSER_PAGE_TIMEOUT) { + if (await isRunAborted()) { + throw new Error(`Run ${data.runId} aborted while waiting for page`); + } const currentTime = Date.now();server/src/browser-management/controller.ts (2)
245-245
: Timeout increase to 15s is sensible; guard clearTimeout against pre-assignmentRealistic when pairing with worker’s 60s init timeout. One nit: clearTimeout(connectionTimeout) can run before assignment when a client connects immediately.
- let connectionTimeout: NodeJS.Timeout; + let connectionTimeout: NodeJS.Timeout | null = null; ... - clearTimeout(connectionTimeout); + if (connectionTimeout) clearTimeout(connectionTimeout); ... - clearTimeout(connectionTimeout); + if (connectionTimeout) clearTimeout(connectionTimeout);
284-285
: Arbitrary 500ms delay after upgrade; consider event-based readinessThe sleep may paper over races. If possible, prefer signaling readiness explicitly from RemoteBrowser (e.g., once first page is available or after 'ready' event).
If you keep the sleep, a brief comment explaining why 500ms is needed would help future maintainers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
server/src/browser-management/classes/BrowserPool.ts
(3 hunks)server/src/browser-management/controller.ts
(3 hunks)server/src/pgboss-worker.ts
(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
server/src/pgboss-worker.ts (1)
server/src/server.ts (1)
browserPool
(85-85)
server/src/browser-management/controller.ts (1)
server/src/server.ts (1)
browserPool
(85-85)
🔇 Additional comments (4)
server/src/pgboss-worker.ts (3)
218-220
: Timeouts aligned with longer init pathsInit=60s, page=45s looks reasonable given the extended controller path. No issues.
268-270
: Good: include final status in timeout errorThis improves debuggability. If you adopt early-exit on 'failed', timeouts will be rarer and clearer.
305-305
: Clear timeout messageThe error is precise and includes the configured timeout. LGTM.
server/src/browser-management/controller.ts (1)
291-294
: Message tweaks improve clarity for dummy-socket flowsGood. No functional issues.
// Return undefined for failed slots | ||
if (poolInfo.status === "failed") { | ||
logger.log('debug', `Browser ${id} has failed status`); | ||
return undefined; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
'failed' status check is currently unreachable given failBrowserSlot deletes the entry
getRemoteBrowser returns undefined for status "failed", but failBrowserSlot immediately deletes slots instead of persisting a failed status. As a result, worker logs will see finalStatus=null, never "failed".
To expose meaningful final statuses and enable earlier bailouts:
- Option A (preferred): Persist "failed" for a short TTL (e.g., 30s) before deletion.
- Option B: At least set status="failed" before deletion so other components can read it momentarily.
Proposed minimal change to persist "failed" briefly:
- public failBrowserSlot = (id: string): void => {
- if (this.pool[id]) {
- logger.log('info', `Marking browser slot ${id} as failed`);
- this.deleteRemoteBrowser(id);
- }
- };
+ public failBrowserSlot = (id: string): void => {
+ const info = this.pool[id];
+ if (info) {
+ logger.log('info', `Marking browser slot ${id} as failed`);
+ info.status = "failed";
+ info.browser = null;
+ // Retain failed status briefly for observability, then cleanup
+ setTimeout(() => {
+ this.deleteRemoteBrowser(id);
+ }, 30000);
+ }
+ };
If you go with this, also have the worker break early when it sees status === "failed". I’ve added a suggested worker diff in its review.
Committable suggestion skipped: line range outside the PR's diff.
if (!this.pool[id]) { | ||
logger.log('warn', `Cannot upgrade browser ${id}: slot does not exist in pool`); | ||
return false; | ||
} | ||
|
||
if (this.pool[id].status !== "reserved") { | ||
logger.log('warn', `Cannot upgrade browser ${id}: slot not in reserved state (current: ${this.pool[id].status})`); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Good guard: only upgrade reserved slots; add an 'initializing' step for richer introspection
The stricter validation is appropriate. For better observability, consider setting status="initializing" when initialization starts (in controller) so getBrowserStatus reflects progress beyond "reserved".
Add a setter and use it from the controller:
- Add to BrowserPool:
+ public setBrowserStatus = (id: string, status: "reserved" | "initializing" | "ready" | "failed"): boolean => {
+ const info = this.pool[id];
+ if (!info) return false;
+ info.status = status;
+ return true;
+ };
- In controller before
await browserSession.initialize(...)
:
+ browserPool.setBrowserStatus(id, "initializing");
🤖 Prompt for AI Agents
In server/src/browser-management/classes/BrowserPool.ts around lines 616 to 623,
the guard correctly prevents upgrading non-reserved slots but we should allow
richer status introspection by adding an "initializing" state; add a public
setter method on BrowserPool to set a slot's status (e.g., setStatus(id: string,
status: BrowserStatus)) that validates the slot exists and updates the status
atomically, then update the controller to call pool.setStatus(id,
"initializing") immediately before awaiting browserSession.initialize(...) so
getBrowserStatus will report "initializing" while setup runs.
logger.log('debug', `Starting browser initialization for ${id}`); | ||
await browserSession.initialize(userId); | ||
logger.log('debug', `Browser initialization completed for ${id}`); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Log additions are good; set status='initializing' here for better introspection
Emit initializing state before await initialize so worker logs report real progress.
If you add BrowserPool.setBrowserStatus as suggested:
+ browserPool.setBrowserStatus(id, "initializing");
await browserSession.initialize(userId);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
logger.log('debug', `Starting browser initialization for ${id}`); | |
await browserSession.initialize(userId); | |
logger.log('debug', `Browser initialization completed for ${id}`); | |
logger.log('debug', `Starting browser initialization for ${id}`); | |
browserPool.setBrowserStatus(id, "initializing"); | |
await browserSession.initialize(userId); | |
logger.log('debug', `Browser initialization completed for ${id}`); |
🤖 Prompt for AI Agents
In server/src/browser-management/controller.ts around lines 275 to 278, the code
logs start of initialization but doesn't set the browser status; before awaiting
browserSession.initialize(userId) call BrowserPool.setBrowserStatus(id,
'initializing') (or the equivalent setter on the pool) so the status is
persisted and worker logs reflect real progress, then proceed to await
initialize and afterwards set the final status (e.g., 'ready' or error) as
appropriate.
throw error; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rethrowing here creates an unhandled promise rejection at the call site
initializeBrowserAsync is invoked without await/catch in createRemoteBrowserForRun, so rethrowing will surface as an unhandled rejection and can crash the process depending on Node settings.
Fix by catching at the call site:
- initializeBrowserAsync(id, userId);
+ void initializeBrowserAsync(id, userId).catch((err) => {
+ logger.log('error', `Failed to initialize browser ${id}: ${err?.message || err}`);
+ try { browserPool.failBrowserSlot(id); } catch (e) {
+ logger.log('warn', `Failed to mark slot ${id} as failed: ${e}`);
+ }
+ });
Optionally, if you prefer not to rethrow inside initializeBrowserAsync, remove the throws and rely on logging + failBrowserSlot there; but catching at the call site is safer and explicit.
Also applies to: 312-313
🤖 Prompt for AI Agents
In server/src/browser-management/controller.ts around lines 306-307 (and
similarly 312-313), initializeBrowserAsync currently rethrows errors which leads
to unhandled promise rejections because callers (createRemoteBrowserForRun)
invoke it without await/catch; update the call site(s) in
createRemoteBrowserForRun to await initializeBrowserAsync inside a try/catch and
handle failures by logging and calling failBrowserSlot (or other cleanup)
instead of allowing the rejection to bubble, or alternatively remove the throw
inside initializeBrowserAsync and rely on its internal logging + failBrowserSlot
so the function resolves normally on error.
const currentTime = Date.now(); | ||
|
||
const browserStatus = browserPool.getBrowserStatus(browserId); | ||
if (browserStatus === null) { | ||
throw new Error(`Browser slot ${browserId} does not exist in pool`); | ||
} | ||
|
||
if (currentTime - lastLogTime > 10000) { | ||
logger.log('info', `Browser ${browserId} not ready yet (status: ${browserStatus}), waiting... (${Math.round((currentTime - browserWaitStart) / 1000)}s elapsed)`); | ||
lastLogTime = currentTime; | ||
} | ||
|
||
await new Promise(resolve => setTimeout(resolve, 2000)); | ||
browser = browserPool.getRemoteBrowser(browserId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Early-exit when status is 'failed' to avoid spinning until timeout
If BrowserPool begins to persist 'failed' (recommended), this loop should stop immediately rather than wait until timeout.
Apply:
- const browserStatus = browserPool.getBrowserStatus(browserId);
+ const browserStatus = browserPool.getBrowserStatus(browserId);
if (browserStatus === null) {
throw new Error(`Browser slot ${browserId} does not exist in pool`);
}
+ if (browserStatus === "failed") {
+ throw new Error(`Browser ${browserId} initialization failed`);
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const currentTime = Date.now(); | |
const browserStatus = browserPool.getBrowserStatus(browserId); | |
if (browserStatus === null) { | |
throw new Error(`Browser slot ${browserId} does not exist in pool`); | |
} | |
if (currentTime - lastLogTime > 10000) { | |
logger.log('info', `Browser ${browserId} not ready yet (status: ${browserStatus}), waiting... (${Math.round((currentTime - browserWaitStart) / 1000)}s elapsed)`); | |
lastLogTime = currentTime; | |
} | |
await new Promise(resolve => setTimeout(resolve, 2000)); | |
browser = browserPool.getRemoteBrowser(browserId); | |
const currentTime = Date.now(); | |
const browserStatus = browserPool.getBrowserStatus(browserId); | |
if (browserStatus === null) { | |
throw new Error(`Browser slot ${browserId} does not exist in pool`); | |
} | |
if (browserStatus === "failed") { | |
throw new Error(`Browser ${browserId} initialization failed`); | |
} | |
if (currentTime - lastLogTime > 10000) { | |
logger.log('info', `Browser ${browserId} not ready yet (status: ${browserStatus}), waiting... (${Math.round((currentTime - browserWaitStart) / 1000)}s elapsed)`); | |
lastLogTime = currentTime; | |
} | |
await new Promise(resolve => setTimeout(resolve, 2000)); | |
browser = browserPool.getRemoteBrowser(browserId); |
🤖 Prompt for AI Agents
In server/src/pgboss-worker.ts around lines 251 to 264, the loop waiting for a
browser to become ready currently spins until timeout even if the browser status
becomes 'failed'; detect when browserStatus === 'failed' and stop immediately by
throwing a clear Error (or otherwise exiting the wait loop) with context
including browserId and any last error info, and optionally log an error before
throwing so we don't waste time waiting for timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/helpers/clientSelectorGenerator.ts (2)
2496-2499
: Method name might be misleading with depth limitThe method name
getAllDescendantsIncludingShadow
suggests it retrieves all descendants, but themaxDepth
parameter limits traversal depth. Consider renaming togetDescendantsIncludingShadow
orgetDescendantsWithinDepth
to better reflect the bounded traversal behavior.
3911-3911
: Consider documenting different depth limitsThe
getElementDepth
method uses a hard limit of 50 while other traversal methods useMAX_PATH_DEPTH
of 20. Consider adding a comment explaining why different limits are appropriate for different use cases, or consider using a consistent constant if the difference is not intentional.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/helpers/clientSelectorGenerator.ts
(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/helpers/clientSelectorGenerator.ts (1)
maxun-core/src/browserSide/scraper.js (3)
parentElement
(914-914)pathParts
(569-573)current
(1010-1010)
🔇 Additional comments (3)
src/helpers/clientSelectorGenerator.ts (3)
461-461
: Good optimization: Filtering non-meaningful elementsThe addition of
hasAnyMeaningfulChildren
check (line 461) andisMeaningfulElement
filter (lines 548-549) are excellent optimizations that prevent processing of irrelevant DOM elements. This should improve performance and selector quality.Also applies to: 548-549
2739-2792
: Well-implemented depth limiting and shadow DOM traversalThe additions to
getOptimizedStructuralPath
properly handle:
- Element containment validation
- Depth-limited traversal to prevent infinite loops
- Shadow DOM parent chain traversal
- Proper null return when path is incomplete
The implementation is robust and handles edge cases well.
2861-2880
: Excellent shadow DOM containment logicThe
elementContains
method correctly handles element containment across shadow DOM boundaries by following the shadow host chain. This is essential for proper element relationship detection in complex DOM structures with shadow roots.
What this PR does?
Summary by CodeRabbit