Skip to content

fix: CodeRabbit + CI cleanup#25

Merged
eenlars merged 14 commits intomainfrom
codex/coderabbit-ci-fixes
Feb 6, 2026
Merged

fix: CodeRabbit + CI cleanup#25
eenlars merged 14 commits intomainfrom
codex/coderabbit-ci-fixes

Conversation

@eenlars
Copy link
Owner

@eenlars eenlars commented Feb 6, 2026

Fixes failing CI error-pattern check, removes unsafe casts, tightens validation (workspace directories), improves internal resume route typing/logging, and resolves CodeRabbit review threads. Also updates AGENTS.md to remove the 'NEVER COMMIT' rule.

Summary by CodeRabbit

Release Notes

  • New Features

    • Schedule conversation resumptions with customizable delays, messages, and reasons
    • Robust job queue system for automation execution with improved reliability and crash recovery
  • Improvements

    • Reorganized automation configuration UI for improved skills management and accessibility
    • Enhanced error handling and comprehensive logging for scheduled automation tasks

eenlars and others added 14 commits February 5, 2026 22:15
Migrate automation scheduling from in-memory CronService to pg-boss,
a PostgreSQL-backed persistent job queue. This gives us crash recovery,
retries with exponential backoff, dead letter queues, and cron scheduling
persisted in Postgres.

- Create @webalive/job-queue package (pg-boss wrapper singleton)
- Add run-automation queue + worker (replaces CronService executor)
- Add resume-conversation queue + worker (new capability, tool disabled)
- Wire automation CRUD routes to sync pg-boss schedules on create/update/delete
- Add startup sync from Supabase automation_jobs to pg-boss schedules
- Add internal API endpoints for worker-to-server communication
- Pass tab context (tabId/tabGroupId) through worker pool for future use

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Automations were using resolveWorkspace() from workspace-secure.ts which
resolves to /user/src, but the normal chat flow uses /user. This caused
two problems:
1. Claude was jailed to /user/src and couldn't access server.ts,
   package.json, articles.db, scripts, etc.
2. Sites without a src/ directory failed with ENOENT

Now uses getWorkspacePath() from @webalive/shared (same as chat flow).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Production has DATABASE_URL without password and DATABASE_PASSWORD
as a separate env var. pg-boss needs the full connection string.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These packages were missing from STANDALONE_PACKAGES, causing
"module not found" errors in production. Also added job-queue
to SUBPROCESS_PACKAGES since it has pg-boss as external dep.

Added DATABASE_PASSWORD to turbo.json build env.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use tabKey() to compute canonical session key instead of trusting raw
  sessionKey from request body
- Document AUTH EXCEPTION for internal routes (X-Internal-Auth is
  intentional — these are localhost-only worker-to-server calls)
- Remove unused sessionKey from request destructuring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|| converts 0 to 300 (falsy), ?? only converts undefined/null.
Found via edge case testing: timeoutSeconds=0 should be rejected
by the executor validation, not silently converted to 300.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion routes

43 tests covering auth, validation, session lookup, happy paths, error handling,
and the timeoutSeconds ?? vs || regression.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ype checks, schedule-resumption hardening

- validation.ts: resolve workspace path and check isPathWithinWorkspace before statSync
- job-queue/index.ts: defer started=true until queues/workers succeed; stop+reset on failure
- job-queue/index.ts: use state IN ('created','retry') instead of lexicographic state < 'active'
- worker-entry.mjs: validate tabId/tabGroupId are strings before writing to process.env
- schedule-resumption/route.ts: return 500 when scheduleResumption returns null
- schedule-resumption/route.ts: catch malformed JSON with 400 instead of falling through to 500
- resume-conversation/route.ts: fix type error from split try/catch body declaration
- 18 new tests for schedule-resumption covering auth, validation, authz, session, enqueue, happy path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…handle malformed JSON in resume-conversation

- executor.ts: resolve workspace path and check isPathWithinWorkspace(resolvedCwd, SITES_ROOT) before statSync
- sync-schedules.ts: only increment synced when scheduleAutomation returns truthy (not null)
- route.test.ts: expect 400 for malformed JSON (matches SyntaxError catch in route handler)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skills selector was hidden inside collapsed "Advanced settings" and the
dropdown was clipped by the scroll container. Moved to top-level field
and changed dropdown to open upward.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

Walkthrough

This PR introduces a pg-boss-based job queue system to replace the in-process CronService for scheduling automations and resuming conversations. It adds a new @webalive/job-queue package, new internal API routes for automation execution and conversation resumption, updates existing automation endpoints to synchronize schedules with the queue, and modifies workspace path validation to use filesystem checks. Configuration and instrumentation are updated to initialize and use the job queue on startup.

Changes

Cohort / File(s) Summary
Job Queue Package
packages/job-queue/*
New package introducing pg-boss wrapper with queue management, workers for automation and conversation resumption, type definitions, queue configuration with retry policies, and initialization logic.
Core API Routes
apps/web/app/api/automations/route.ts, apps/web/app/api/automations/[id]/route.ts, apps/web/app/api/automations/[id]/trigger/route.ts
Automation endpoints modified to synchronize cron schedules with pg-boss on create/update/delete; trigger endpoint changed to enqueue jobs instead of direct execution.
New Internal API Routes
apps/web/app/api/internal/run-automation/route.ts, apps/web/app/api/internal/resume-conversation/route.ts
New internal endpoints for executing automation jobs and resuming conversations, with authentication via shared secret and request validation.
Automation & Workspace Utilities
apps/web/lib/automation/executor.ts, apps/web/lib/automation/validation.ts, apps/web/lib/automation/sync-schedules.ts
Executor and validator refactored to use filesystem path-based workspace resolution with traversal guards; new sync-schedules module fetches active automations and registers them with pg-boss on startup.
Schedule Resumption Feature
apps/web/app/api/schedule-resumption/route.ts, packages/tools/src/tools/workspace/schedule-resumption.ts
New API route and Claude tool to schedule delayed conversation resumption via pg-boss with workspace authentication and session validation.
Test Suites
apps/web/app/api/internal/.../route.test.ts, apps/web/app/api/schedule-resumption/__tests__/route.test.ts, apps/web/lib/__tests__/*
Comprehensive tests for new internal routes, schedule-resumption endpoint, and updates to test gating logic based on server config availability.
Worker Pool & Stream Integration
packages/worker-pool/src/types.ts, packages/worker-pool/src/worker-entry.mjs, apps/web/app/api/claude/stream/route.ts
Added tabId and tabGroupId fields to worker request/response for tab-specific context tracking in streams and tool execution.
MCP Tools Registry
packages/tools/src/mcp-server.ts, packages/tools/src/tools/meta/tool-registry.ts, packages/shared/src/stream-tools.ts
Schedule-resumption tool commented out (disabled) in module imports and registry; marked for future enablement.
Infrastructure & Configuration
apps/web/instrumentation.ts, apps/web/lib/error-codes.ts, apps/web/package.json, apps/web/tsconfig.json, packages/job-queue/tsconfig.json, scripts/deployment/lib/standalone-packages.sh
Instrumentation replaced with pg-boss job queue initialization; new ENQUEUE_FAILED error code added; job-queue package dependency added; TypeScript configs updated; deployment scripts updated to include automation and job-queue packages.
Documentation & Guidelines
AGENTS.md, docs/architecture/job-system-pgboss.md
Removed "NEVER COMMIT" guideline from AGENTS.md; added comprehensive job-system architecture documentation covering pg-boss design, triggers, workers, migration plan, and implementation details.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API Route
    participant Queue as pg-boss Queue
    participant Worker as Job Worker
    participant Internal as Internal API
    participant Executor as Job Executor

    rect rgba(135, 206, 250, 0.5)
    note over Client,Executor: Automation Scheduling Flow
    Client->>API: POST /api/automations (create/update)
    API->>API: Validate automation
    API->>Queue: scheduleAutomation(cron, payload)
    Queue->>Queue: Register cron schedule
    API-->>Client: 200 OK (scheduled)
    end

    rect rgba(144, 238, 144, 0.5)
    note over Client,Executor: Cron Trigger & Execution Flow
    Queue->>Queue: Cron interval reached
    Queue->>Worker: Dequeue RUN_AUTOMATION job
    Worker->>Internal: POST /api/internal/run-automation
    Internal->>Executor: runAutomationJob(payload)
    Executor-->>Internal: execution result
    Internal-->>Worker: 200 OK
    Worker->>Queue: Mark job completed
    Worker->>Queue: scheduleAutomation (next cron)
    end

    rect rgba(255, 192, 203, 0.5)
    note over Client,Executor: Conversation Resumption Flow
    Client->>API: POST /api/schedule-resumption
    API->>API: Validate session & workspace
    API->>Queue: scheduleResumption(payload, delay)
    Queue->>Queue: Schedule after delay
    API-->>Client: 200 OK (jobId)
    
    Queue->>Worker: Dequeue RESUME_CONVERSATION (after delay)
    Worker->>Internal: POST /api/internal/resume-conversation
    Internal->>Internal: Validate & lookup session
    Internal->>API: POST /api/claude/stream (resume message)
    API-->>Internal: stream response
    Internal-->>Worker: 200 OK
    Worker->>Queue: Mark job completed
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Poem

🐰 Hop, hop, the queues now flow with grace,
Each job finds its rightful place!
No more timing served in-process land—
pg-boss schedules at our command. 🎯
Conversations sleep, then awake anew,
This rabbit's work is finally through!

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The PR title 'fix: CodeRabbit + CI cleanup' is overly vague and does not clearly describe the main change. While the PR description mentions removing CronService and adding pg-boss, the title uses generic terms that don't convey the substantial architectural changes. Consider a more descriptive title such as 'feat: Replace CronService with pg-boss job queue' or 'refactor: Migrate to PostgreSQL-backed job scheduling' to better communicate the primary changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/coderabbit-ci-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@eenlars eenlars merged commit ee5fff0 into main Feb 6, 2026
5 of 6 checks passed
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/web/tsconfig.json (1)

35-46: ⚠️ Potential issue | 🟡 Minor

Duplicate include entries: lines 45–46 duplicate lines 37–38.

./.next/types/**/*.ts and ./.next/dev/types/**/*.ts resolve to the same paths as the already-present .next/types/**/*.ts (line 37) and .next/dev/types/**/*.ts (line 38). These are redundant and should be removed to avoid confusion.

Also, .next-test/dev/dev/types/**/*.ts on line 44 has a double dev/dev nesting — worth confirming this is intentional rather than a typo.

Proposed fix
   "include": [
     "next-env.d.ts",
     ".next/types/**/*.ts",
     ".next/dev/types/**/*.ts",
     ".next-test/types/**/*.ts",
     ".next-test/dev/types/**/*.ts",
     "**/*.mts",
     "**/*.ts",
-    "**/*.tsx",
-    ".next-test/dev/dev/types/**/*.ts",
-    "./.next/types/**/*.ts",
-    "./.next/dev/types/**/*.ts"
+    "**/*.tsx"
   ],
🤖 Fix all issues with AI agents
In `@AGENTS.md`:
- Around line 10-18: The numbered rules in AGENTS.md currently skip 11 (rules
list ends at 10 but the "Architecture Smell Detector" heading is prefixed with
"12."); update the heading to be consistent by changing the leading "12." to
"11." or remove the numeric prefix entirely so the "Architecture Smell Detector"
section header stands alone; locate the heading text "Architecture Smell
Detector" in the file and adjust its numbering to match the renumbered rules
2–10.

In `@apps/web/app/api/automations/`[id]/trigger/route.ts:
- Around line 85-104: The response always reports queued: true even though
enqueueAutomation (from `@webalive/job-queue`) can return null; update the handler
in route.ts to check the returned pgBossJobId from enqueueAutomation and handle
the null case: if pgBossJobId is non-null return { ok: true, queued: true,
pgBossJobId, message... }, otherwise return an error response (e.g., { ok:
false, queued: false, message: "Failed to enqueue automation" } with an
appropriate non-2xx status) and log the failure; reference the enqueueAutomation
call and the pgBossJobId variable when implementing this check.

In `@apps/web/app/api/automations/route.ts`:
- Around line 268-294: The fallback for action timeout uses the logical OR which
treats 0 as falsy; update the call to scheduleAutomation so the timeout uses
nullish coalescing (use data.action_timeout_seconds ?? 300) instead of
data.action_timeout_seconds || 300 to preserve a legitimate 0 value (locate this
in the scheduleAutomation argument object inside the pg-boss sync block in
route.ts).

In `@apps/web/app/api/internal/run-automation/__tests__/route.test.ts`:
- Around line 321-333: The POST handler for run-automation currently lets
invalid JSON bubble to the outer catch and returns a 500; add a nested try/catch
around await req.json() inside the POST function so JSON parse failures return
createErrorResponse(ErrorCodes.INVALID_JSON, 400) (follow the same pattern used
in resume-conversation), and then update the test in route.test.ts to expect
status 400 instead of 500 for the "should return 500 when body is invalid JSON"
case; ensure you reference the POST handler, req.json(), createErrorResponse,
and ErrorCodes.INVALID_JSON when making the change.

In `@apps/web/app/api/internal/run-automation/route.ts`:
- Around line 26-35: The route handler in route.ts currently lets req.json()
errors bubble up and return INTERNAL_ERROR; wrap the req.json() call in its own
try-catch (same pattern as the sibling resume-conversation route) and on JSON
parse failure return createErrorResponse(ErrorCodes.INVALID_JSON, 400, { field:
"body" }) instead of letting the outer catch produce a 500; update the block
that destructures body (jobId, userId, orgId, workspace, prompt, ...) to run
only after the nested try succeeds.

In `@apps/web/app/api/schedule-resumption/route.ts`:
- Around line 85-87: The fallback for resumeMessage is using || which treats
empty string as falsy; update the assignment of message to use the
nullish-coalescing operator (??) instead of || so only null/undefined fall back
to the default. Locate the variables in this block (now, resumeAt,
resumeMessage, message) and change the expression that sets message to use
resumeMessage ?? `Resuming conversation: ${reason}`.

In `@apps/web/components/automations/AutomationSidePanel.tsx`:
- Around line 258-344: The skills dropdown in AutomationSidePanel (states:
skillsDropdownOpen, setSkillsDropdownOpen; renders availableSkills and skill
buttons) needs a click-outside or onBlur dismissal like the site search; add a
focus-trapping/blur handler on the enclosing relative <div> (or implement a
document click listener using a ref to the wrapper) that calls
setSkillsDropdownOpen(false) when focus or clicks move outside the wrapper, and
ensure each inner dropdown <button> rendered for availableSkills has an explicit
tabIndex (e.g., tabIndex={0}) so relatedTarget is set correctly for onBlur-based
logic; update cleanup to remove any document listeners on unmount.

In `@apps/web/lib/automation/executor.ts`:
- Around line 156-170: The security check uses resolvedCwd but subsequent
operations still use the original cwd; update executor.ts to use resolvedCwd for
all filesystem ops and downstream calls after validation (replace uses of cwd
passed to statSync, error messages, getSystemPrompt, getAllowedTools,
runAgentChild, etc.), keeping the isPathWithinWorkspace(resolvedCwd,
resolvedSitesRoot) check in place; also apply the same fix in
apps/web/lib/automation/validation.ts by replacing cwd with resolvedPath (the
normalized path) where statSync and error messages are used so the validated
normalized path is the one actually used.

In `@apps/web/lib/request-url.test.ts`:
- Around line 10-12: The guard variable hasStreamDomains is missing
DOMAINS.STREAM_PROD_HOST so tests that pass STREAM_PROD_HOST as x-forwarded-host
may run with an undefined host; update the hasStreamDomains expression to
include Boolean(DOMAINS.STREAM_PROD_HOST) alongside DOMAINS.STREAM_DEV,
DOMAINS.STREAM_DEV_HOST and DOMAINS.STREAM_PROD (i.e., ensure hasStreamDomains
checks STREAM_DEV, STREAM_DEV_HOST, STREAM_PROD, and STREAM_PROD_HOST) so the
tests that rely on STREAM_PROD_HOST are properly gated.

In `@docs/architecture/job-system-pgboss.md`:
- Around line 288-292: The example logs the full sessionKey in the ResumeWorker
check; update the console.warn in the block where sdkSessionId is checked to
avoid printing sessionKey directly (e.g., mask or truncate the key or replace it
with a fixed-length hash/identifier) so that only a non-sensitive, redacted
representation of sessionKey is logged; reference the variables sdkSessionId and
sessionKey and the ResumeWorker message to locate and change the log call
accordingly.
- Around line 148-165: The webhook example consumes the Request body twice
causing the second read to fail; fix by reading the body once and reusing it for
both signature verification and parsing — e.g., read the raw text once (used by
verifySignature) and then parse that same text into JSON for webhookPayload
instead of calling req.text() and req.json() separately; update the usage around
verifySignature, the payload assignment, and webhookPayload to use the
single-read value (references: verifySignature, req.text(), req.json(),
webhookPayload, trigger.secret).

In `@packages/job-queue/src/index.ts`:
- Around line 137-145: The spread currently drops startAfter when it is 0
because the truthy check (options?.startAfter && { startAfter: ... }) treats 0
as false; update enqueueAutomation so it only omits startAfter when it is
null/undefined (not when it is 0). Locate enqueueAutomation (and references to
QUEUES.RUN_AUTOMATION, getJobQueue, and instance.send) and replace the
conditional spread with a nullish check or explicit build of an options object
(e.g., add startAfter only if options?.startAfter != null) before calling
instance.send so a value of 0 is preserved.

In `@packages/job-queue/src/workers/resume-conversation.ts`:
- Around line 84-88: The log prints the full sensitive sessionKey; modify the
handler in resume-conversation (where response.status === 404 and sessionKey is
referenced) to avoid logging the composite key — split sessionKey by '::' into
parts (sessionKey variable) and log only the non-sensitive component(s) such as
workspace (e.g., parts[1]) or other allowed identifiers, ensuring
userId/tabGroupId/tabId are not included in the message and update the
console.warn to include the sanitized value instead.
🧹 Nitpick comments (24)
apps/web/lib/__tests__/critical-exports.test.ts (1)

9-14: Consider inlining the config check — the module is already imported in the test.

hasServerConfig() does a redundant dynamic import of @webalive/shared, but its only call site (line 27) is inside a test that already holds the imported module as shared (line 18). You can drop the helper and inline the check:

♻️ Suggested simplification
-// server-config.json isn't guaranteed to exist in local dev or CI; gate these tests
-// on the actual presence of configured paths instead of env heuristics.
-const hasServerConfig = async () => {
-  const shared = await import("@webalive/shared")
-  return Boolean(shared.PATHS?.SITES_ROOT) && shared.PATHS.SITES_ROOT.length > 0
-}

Then at line 27:

-    if (await hasServerConfig()) {
+    if (shared.PATHS?.SITES_ROOT && shared.PATHS.SITES_ROOT.length > 0) {

Also applies to: 27-27

apps/web/lib/__tests__/domain-workspace.test.ts (1)

52-68: Consider it.runIf(hasServerConfig) instead of if guards inside test bodies.

When hasServerConfig is false, these tests silently pass with zero assertions, which masks that they were effectively skipped. Vitest's it.runIf() (or describe.runIf() for the whole block) marks them as "skipped" in the test report, giving better visibility into what actually ran.

♻️ Suggested refactor
   it("generates correct site home paths", () => {
-    // Verify structure: SITES_ROOT + "/" + domain
-    if (hasServerConfig) {
-      expect(getSiteHome("example.com")).toBe(`${PATHS.SITES_ROOT}/example.com`)
-      expect(getSiteHome("example.com")).toMatch(/\/example\.com$/)
-    }
+    // Verify structure: SITES_ROOT + "/" + domain
+    expect(getSiteHome("example.com")).toBe(`${PATHS.SITES_ROOT}/example.com`)
+    expect(getSiteHome("example.com")).toMatch(/\/example\.com$/)
   })

with the wrapping describe changed to:

-describe("Shared config helpers", () => {
+describe.runIf(hasServerConfig)("Shared config helpers (requires server config)", () => {

Alternatively, keep the current describe and use per-test gating:

-  it("generates correct site home paths", () => {
+  it.runIf(hasServerConfig)("generates correct site home paths", () => {

Either approach surfaces skipped tests in CI output.

packages/worker-pool/src/worker-entry.mjs (1)

541-543: Add input validation for tabId and tabGroupId for consistency.

The assignment code is safe (the typeof guard defaults to ""), but the validation block (lines 448–525) validates every other optional string field (sessionCookie, model, systemPrompt, resume, apiKey) with an explicit type check and push to validationErrors. tabId and tabGroupId are silently ignored if they arrive as a non-string type, which is inconsistent and skips the early-reject path that flags malformed payloads.

Proposed fix — add to the optional-string validation section (after line 501)
   if (payload.apiKey !== undefined && typeof payload.apiKey !== "string") {
     validationErrors.push("apiKey must be a string")
   }
+  if (payload.tabId !== undefined && typeof payload.tabId !== "string") {
+    validationErrors.push("tabId must be a string")
+  }
+  if (payload.tabGroupId !== undefined && typeof payload.tabGroupId !== "string") {
+    validationErrors.push("tabGroupId must be a string")
+  }
apps/web/lib/automation/validation.ts (3)

252-252: Prefer async stat over statSync to avoid blocking the event loop.

validateWorkspace is already an async function and is called from async API route handlers. Using statSync blocks the Node.js event loop unnecessarily. Switch to the async version from node:fs/promises.

♻️ Suggested change

Update the import on line 10:

-import { statSync } from "node:fs"
+import { stat } from "node:fs/promises"

Then update the usage:

-  const stat = statSync(cwd, { throwIfNoEntry: false })
-
-  if (!stat || !stat.isDirectory()) {
+  let dirStat: Awaited<ReturnType<typeof stat>> | null = null
+  try {
+    dirStat = await stat(resolvedPath)
+  } catch {
+    // directory does not exist
+  }
+
+  if (!dirStat || !dirStat.isDirectory()) {

Note: this also switches from cwd to resolvedPath for consistency with the traversal check above.


256-259: Internal filesystem path leaked in error message.

The error string embeds the full server-side path (cwd) which exposes infrastructure details (e.g., /srv/webalive/sites/<domain>/user) to callers. If this message can surface to end users via API responses, consider omitting the raw path.

♻️ Suggested change
-        `Site "${hostname}" is not properly deployed. The workspace directory is missing or invalid: ${cwd}. ` +
+        `Site "${hostname}" is not properly deployed. The workspace directory is missing or invalid. ` +

149-149: Falsy check may reject valid field values like 0, false, or empty string "".

!body[field] will treat 0, false, and "" as missing. If any of the required fields can legitimately hold these values (e.g., a numeric field), this will produce a false-negative. Consider using a nullish check instead.

♻️ Suggested change
-  const missing = required.filter(field => !body[field])
+  const missing = required.filter(field => body[field] === undefined || body[field] === null)
apps/web/components/automations/AutomationSidePanel.tsx (2)

15-29: FormData type shadows the global FormData Web API.

Naming this local type FormData shadows the built-in browser FormData class. If this file ever needs the native FormData (e.g., for multipart uploads), it would be inaccessible without aliasing. Consider renaming to something like AutomationFormData.


302-342: Dropdown items lack ARIA roles for screen-reader accessibility.

The skills dropdown behaves as a multi-select listbox but uses generic <button> elements with no role, aria-selected, or aria-expanded attributes. This reduces accessibility for keyboard and screen-reader users.

Consider adding role="listbox" on the dropdown container, role="option" and aria-selected={isSelected} on each item, and aria-expanded={skillsDropdownOpen} on the toggle button (line 288).

As per coding guidelines, "Check accessibility with ARIA labels in components".

apps/web/lib/automation/executor.ts (1)

160-162: Consider realpathSync for symlink-aware validation.

resolve() normalizes .. and . segments but does not resolve symlinks. If a directory under SITES_ROOT is a symlink to an external location, isPathWithinWorkspace would pass on the lexical path while the real target sits outside the boundary.

A defense-in-depth pattern would be to validate the real path after confirming the directory exists:

Suggested addition after the statSync check
     const stat = statSync(resolvedCwd, { throwIfNoEntry: false })
     if (!stat || !stat.isDirectory()) {
       throw new Error(...)
     }
+
+    // Defense-in-depth: also validate the symlink-resolved real path
+    const { realpathSync } = await import("node:fs")
+    const realCwd = realpathSync(resolvedCwd)
+    if (!isPathWithinWorkspace(realCwd, resolve(PATHS.SITES_ROOT))) {
+      throw new Error("Path traversal attack detected (symlink escape)")
+    }

This is an extra hardening layer — whether it's needed depends on how much you trust the filesystem under SITES_ROOT.

apps/web/app/api/internal/run-automation/__tests__/route.test.ts (1)

26-47: Consider using real ErrorCodes and createErrorResponse instead of mocking.

The resume-conversation test file uses the real implementations, which provides better integration coverage. Mocking these internals means changes to createErrorResponse format won't be caught by these tests. As per coding guidelines, test patterns should avoid "mocking internals."

♻️ Suggested approach — follow the resume-conversation test pattern

Remove the ErrorCodes and createErrorResponse mocks and import the real modules, similar to how route.test.ts in the resume-conversation sibling does it:

-// Mock createErrorResponse to return proper NextResponse-like objects
-vi.mock("@/features/auth/lib/auth", () => ({
-  createErrorResponse: (error: string, status: number, fields?: Record<string, unknown>) => {
-    return new Response(
-      JSON.stringify({
-        ok: false,
-        error,
-        message: `Error: ${error}`,
-        ...fields,
-      }),
-      { status, headers: { "Content-Type": "application/json" } },
-    )
-  },
-}))
-
-// Mock ErrorCodes
-vi.mock("@/lib/error-codes", () => ({
-  ErrorCodes: {
-    UNAUTHORIZED: "UNAUTHORIZED",
-    INVALID_REQUEST: "INVALID_REQUEST",
-    INTERNAL_ERROR: "INTERNAL_ERROR",
-  },
-}))

If there are import side-effects preventing this, consider narrowing the mock to only what's truly external.

docs/architecture/job-system-pgboss.md (1)

17-17: Add language specifiers to fenced code blocks.

Several fenced code blocks (lines 17, 64, 241, 472, 530, 567) lack a language specifier, which markdownlint flags (MD040). Adding language hints (e.g., text, plaintext) improves rendering and silences the linter.

apps/web/app/api/automations/route.ts (1)

279-279: Non-null assertion on siteCheck.hostname!.

The validateSiteId call on line 197 ensures this is non-null when siteCheck.valid is true, but the type system doesn't know that. Consider narrowing with a guard or using a typed result to avoid the !.

apps/web/instrumentation.ts (1)

1-9: Stale doc comment references CronService.

Line 5 still says "Used to initialize background services like the CronService" but the implementation now uses the pg-boss JobQueue. Update the comment to reflect the new architecture.

📝 Suggested fix
-* Used to initialize background services like the CronService.
+* Used to initialize background services like the pg-boss JobQueue.
apps/web/app/api/internal/run-automation/route.ts (1)

43-43: timeoutSeconds ?? 300 uses a silent fallback for a configuration value.

As per coding guidelines, "Never write value ?? default for configuration - require values explicitly and throw when missing to fail fast and loud." Consider having the caller (pg-boss worker) always supply timeoutSeconds explicitly, or validate and reject if missing.

That said, since this is an internal API with a well-understood default and the caller does provide the value in normal operation, this is a minor deviation.

apps/web/lib/automation/sync-schedules.ts (1)

91-104: Silent fallbacks for timeoutSeconds and cronTimezone.

Lines 97, 102, and 104 use || 300 and || "UTC" as fallback values for nullable DB columns. Per coding guidelines, configuration values should be required explicitly rather than silently defaulted. If a cron automation lacks a timeout or timezone in the database, that may indicate a data integrity issue that should be surfaced rather than masked.

Consider logging a warning when falling back, or treating missing values as a skip condition (similar to the hostname check on line 81). As per coding guidelines, "Never write value || fallback or value ?? default for configuration - require values explicitly and throw when missing to fail fast and loud."

packages/job-queue/src/workers/resume-conversation.ts (2)

22-24: as type assertion on pg-boss job data.

Line 23 uses as JobData<ResumeConversationPayload> | undefined to cast the untyped pg-boss job. Per coding guidelines, as assertions should be avoided. Consider adding runtime validation (e.g., a zod check on job.data) to confirm the payload shape before processing.


56-57: PORT fallback uses || "9000" instead of requiring the value explicitly.

As per coding guidelines, "Never write value || fallback or value ?? default for configuration - require values explicitly and throw when missing to fail fast and loud."

♻️ Proposed fix
-    const port = process.env.PORT || "9000"
+    const port = process.env.PORT
+    if (!port) throw new Error("PORT environment variable is not configured")
     const url = `http://localhost:${port}/api/internal/resume-conversation`
packages/job-queue/package.json (1)

1-29: Consider adding "private": true to prevent accidental npm publish.

This is an internal workspace package. Adding "private": true guards against accidental publication via npm publish.

Suggested addition
 {
   "name": "@webalive/job-queue",
   "version": "1.0.0",
+  "private": true,
   "type": "module",
apps/web/app/api/schedule-resumption/route.ts (1)

103-110: Avoid leaking workspace in the error response body.

The ENQUEUE_FAILED response includes workspace (the internal site domain). While the user originally supplied workspaceRoot, echoing the derived internal hostname isn't needed for the client and adds unnecessary surface.

Suggested fix
      return createErrorResponse(ErrorCodes.ENQUEUE_FAILED, 500, {
        requestId,
-       workspace,
-       reason: "enqueue_failed",
      })
packages/job-queue/src/workers/run-automation.ts (2)

96-102: Avoid as type assertion on the response JSON.

Line 98 uses as { ok: boolean; error?: string } which the coding guidelines prohibit. Validate the shape at runtime instead.

Suggested fix
-    const result = (await response.json()) as { ok: boolean; error?: string }
-
-    if (!result.ok) {
-      throw new Error(result.error || "Automation execution failed")
+    const result: unknown = await response.json()
+    if (!result || typeof result !== "object" || !("ok" in result) || !(result as Record<string, unknown>).ok) {
+      const errMsg = result && typeof result === "object" && "error" in result
+        ? String((result as Record<string, unknown>).error)
+        : "Automation execution failed"
+      throw new Error(errMsg)
     }

Alternatively, define a Zod schema for the internal API response and parse it. As per coding guidelines: "Never use as type assertions or any in TypeScript - fix types properly at the source instead of silencing the compiler."


67-69: PORT is configuration — consider failing explicitly when missing.

process.env.PORT || "9000" silently falls back. The coding guideline states configuration should be required explicitly and throw when missing. Since this worker runs in the same process as the web server, PORT should always be available.

Suggested fix
-    const port = process.env.PORT || "9000"
+    const port = process.env.PORT
+    if (!port) {
+      throw new Error("PORT environment variable is required for internal API calls")
+    }

As per coding guidelines: "Never write value || fallback or value ?? default for configuration - require values explicitly and throw when missing to fail fast and loud."

packages/tools/src/tools/workspace/schedule-resumption.ts (1)

30-34: Prefer interface over type for object shapes.

Suggested fix
-export type ScheduleResumptionParams = {
-  delay_minutes: number
-  reason: string
-  resume_message?: string
-}
+export interface ScheduleResumptionParams {
+  delay_minutes: number
+  reason: string
+  resume_message?: string
+}

As per coding guidelines: "Prefer interface for defining object shapes in TypeScript."

packages/job-queue/src/index.ts (2)

68-72: Minor startup race: concurrent startJobQueue() calls could double-start.

started is checked at line 69 but set at line 94 after async work. Two concurrent callers entering before started = true would both proceed past the guard. Since this is called once during instrumentation startup, the practical risk is low, but a simple starting flag would close it.

Suggested fix
 let started = false
+let starting = false

 export async function startJobQueue(onEvent?: JobQueueEventHandler): Promise<void> {
-  if (started) {
-    console.log("[JobQueue] Already started")
+  if (started || starting) {
+    console.log("[JobQueue] Already started or starting")
     return
   }
+  starting = true

   const instance = getJobQueue()
   // ... existing code ...
   started = true
+  starting = false
   console.log("[JobQueue] All workers registered")
 } catch (error) {
   console.error("[JobQueue] Failed during queue/worker setup, stopping pg-boss:", error)
+  starting = false
   // ... existing cleanup ...

194-208: unscheduleAutomation requires internal pg-boss schema access because the public API doesn't support cancel-by-singletonKey.

pg-boss 10.1.5's public API only provides cancel(queueName, jobId). There is no built-in method to cancel by singletonKey. To cancel a scheduled job by its name, you must either:

  1. Store the returned jobId and pass it to a modified unscheduleAutomation(name, jobId) + use public boss.cancel(), OR
  2. Query pgboss.job directly (current approach)

Since the function signature only accepts name, option 2 is the current choice. This is acceptable, but document the pg-boss internal dependency (table name pgboss.job, column singleton_key). If pg-boss upgrades rename internal columns, this will silently fail—add a comment and ensure pg-boss version upgrades include a manual check of the pgboss.job schema.

Also, line 170: use options?.tz ?? "UTC" instead of || to avoid treating empty string "" as falsy.

Comment on lines +10 to +18
2. **ALWAYS USE BUN** - Runtime and package manager
3. **GIT HOOKS** - Pre-push hooks run automatically; if they fail, fix the issues (don't force-push)
4. **NO RANDOM ENV VARS** - Don't add environment variables unless absolutely necessary. Use existing config, constants, or code-level defaults instead. Adding .env variables creates deployment complexity and hidden dependencies.
5. **NO EXPLORE AGENT** - Never use `Task(subagent_type=Explore)`. Use Glob and Grep directly instead - they're faster and more precise for this codebase.
6. **USE THE BRAIN** - Query `use_this_to_remember.db` for past decisions, insights, and context before starting work. Store important learnings when you're done.
7. **CADDYFILE IS LARGE** - The generated sites file at `/root/webalive/claude-bridge/ops/caddy/generated/Caddyfile.sites` (synced from `/var/lib/claude-bridge/generated/Caddyfile.sites`) is too large to read in one go. Use `Read` with `offset` and `limit` parameters, or use `Grep` to find specific domain configurations.
8. **OWN YOUR CHANGES** - When deploying or committing, NEVER say "these unrelated changes are not mine" or refuse to include changes in the working directory. If changes exist, they are part of the current work. Take responsibility and include them.
9. **SEEMINGLY UNRELATED ISSUES ARE OFTEN RELATED** - When you see multiple errors or issues, assume they share a common cause until proven otherwise. Type errors in test files often stem from the same interface change. Build failures across packages usually have one root cause. Don't treat each error as isolated - find the pattern first.
10. **INVESTIGATE BEFORE FIXING** - When something is "broken", first understand what it IS. Not all `*.goalive.nl` domains are Vite websites. Check nginx config, caddy-shell config, and existing services before creating anything new.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Minor numbering gap: rules skip from 10 to 12.

After removing the "NEVER COMMIT" rule and renumbering 2–10, the "Architecture Smell Detector" on Line 33 still says 12. — there is no rule 11. Either renumber it to 11. or drop the number entirely since it has its own ## section heading.

🧰 Tools
🪛 LanguageTool

[style] ~11-~11: Consider using a different verb for a more formal wording.
Context: ... hooks run automatically; if they fail, fix the issues (don't force-push) 4. **NO R...

(FIX_RESOLVE)

🤖 Prompt for AI Agents
In `@AGENTS.md` around lines 10 - 18, The numbered rules in AGENTS.md currently
skip 11 (rules list ends at 10 but the "Architecture Smell Detector" heading is
prefixed with "12."); update the heading to be consistent by changing the
leading "12." to "11." or remove the numeric prefix entirely so the
"Architecture Smell Detector" section header stands alone; locate the heading
text "Architecture Smell Detector" in the file and adjust its numbering to match
the renumbered rules 2–10.

Comment on lines +85 to 104
// Enqueue via pg-boss for immediate execution
const { enqueueAutomation } = await import("@webalive/job-queue")
const pgBossJobId = await enqueueAutomation({
jobId: job.id,
userId: job.user_id,
orgId: job.org_id,
workspace: hostname,
prompt: job.action_prompt,
timeoutSeconds: job.action_timeout_seconds || 300,
model: job.action_model || undefined,
thinkingPrompt: job.action_thinking || undefined,
skills: job.skills || undefined,
})

// Return complete response with timing info
return NextResponse.json({
ok: result.success,
startedAt: startedAt.toISOString(),
completedAt: new Date().toISOString(),
durationMs: result.durationMs,
timeoutSeconds: job.action_timeout_seconds || 300,
error: result.error,
response: result.response,
ok: true,
queued: true,
pgBossJobId,
message: `Automation "${job.name}" has been queued for immediate execution.`,
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

enqueueAutomation can return null on failure — response incorrectly reports queued: true.

Per the enqueueAutomation signature in packages/job-queue/src/index.ts (line 136-144), the function returns Promise<string | null>. If null is returned (e.g., pg-boss fails to insert), the response still says { ok: true, queued: true }, misleading the caller.

📝 Suggested fix
     const { enqueueAutomation } = await import("@webalive/job-queue")
     const pgBossJobId = await enqueueAutomation({
       jobId: job.id,
       userId: job.user_id,
       orgId: job.org_id,
       workspace: hostname,
       prompt: job.action_prompt,
-      timeoutSeconds: job.action_timeout_seconds || 300,
+      timeoutSeconds: job.action_timeout_seconds ?? 300,
       model: job.action_model || undefined,
       thinkingPrompt: job.action_thinking || undefined,
       skills: job.skills || undefined,
     })

+    if (!pgBossJobId) {
+      return structuredErrorResponse(ErrorCodes.ENQUEUE_FAILED, { status: 500 })
+    }
+
     return NextResponse.json({
       ok: true,
       queued: true,
       pgBossJobId,
       message: `Automation "${job.name}" has been queued for immediate execution.`,
     })
🤖 Prompt for AI Agents
In `@apps/web/app/api/automations/`[id]/trigger/route.ts around lines 85 - 104,
The response always reports queued: true even though enqueueAutomation (from
`@webalive/job-queue`) can return null; update the handler in route.ts to check
the returned pgBossJobId from enqueueAutomation and handle the null case: if
pgBossJobId is non-null return { ok: true, queued: true, pgBossJobId, message...
}, otherwise return an error response (e.g., { ok: false, queued: false,
message: "Failed to enqueue automation" } with an appropriate non-2xx status)
and log the failure; reference the enqueueAutomation call and the pgBossJobId
variable when implementing this check.

Comment on lines +268 to +294
// Sync to pg-boss if this is an active cron automation
if (data.is_active && data.trigger_type === "cron" && data.cron_schedule && data.action_prompt) {
try {
const { scheduleAutomation } = await import("@webalive/job-queue")
await scheduleAutomation(
`automation-${data.id}`,
data.cron_schedule,
{
jobId: data.id,
userId: user.id,
orgId: domain.org_id,
workspace: siteCheck.hostname!,
prompt: data.action_prompt,
timeoutSeconds: data.action_timeout_seconds || 300,
model: data.action_model || undefined,
thinkingPrompt: data.action_thinking || undefined,
skills: data.skills || undefined,
cronSchedule: data.cron_schedule,
cronTimezone: data.cron_timezone || "UTC",
},
{ tz: data.cron_timezone || "UTC" },
)
} catch (err) {
console.error("[Automations API] Failed to sync pg-boss schedule:", err)
// Don't fail the request — the job is in Supabase, sync will catch it on next restart
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

|| 300 masks a potential zero timeout — use ?? instead.

On line 281, data.action_timeout_seconds || 300 will fall back to 300 if the value is 0, which is a valid (if unusual) input. The nullish coalescing operator preserves 0 while still defaulting null/undefined.

📝 Suggested fix
-            timeoutSeconds: data.action_timeout_seconds || 300,
+            timeoutSeconds: data.action_timeout_seconds ?? 300,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Sync to pg-boss if this is an active cron automation
if (data.is_active && data.trigger_type === "cron" && data.cron_schedule && data.action_prompt) {
try {
const { scheduleAutomation } = await import("@webalive/job-queue")
await scheduleAutomation(
`automation-${data.id}`,
data.cron_schedule,
{
jobId: data.id,
userId: user.id,
orgId: domain.org_id,
workspace: siteCheck.hostname!,
prompt: data.action_prompt,
timeoutSeconds: data.action_timeout_seconds || 300,
model: data.action_model || undefined,
thinkingPrompt: data.action_thinking || undefined,
skills: data.skills || undefined,
cronSchedule: data.cron_schedule,
cronTimezone: data.cron_timezone || "UTC",
},
{ tz: data.cron_timezone || "UTC" },
)
} catch (err) {
console.error("[Automations API] Failed to sync pg-boss schedule:", err)
// Don't fail the request — the job is in Supabase, sync will catch it on next restart
}
}
// Sync to pg-boss if this is an active cron automation
if (data.is_active && data.trigger_type === "cron" && data.cron_schedule && data.action_prompt) {
try {
const { scheduleAutomation } = await import("@webalive/job-queue")
await scheduleAutomation(
`automation-${data.id}`,
data.cron_schedule,
{
jobId: data.id,
userId: user.id,
orgId: domain.org_id,
workspace: siteCheck.hostname!,
prompt: data.action_prompt,
timeoutSeconds: data.action_timeout_seconds ?? 300,
model: data.action_model || undefined,
thinkingPrompt: data.action_thinking || undefined,
skills: data.skills || undefined,
cronSchedule: data.cron_schedule,
cronTimezone: data.cron_timezone || "UTC",
},
{ tz: data.cron_timezone || "UTC" },
)
} catch (err) {
console.error("[Automations API] Failed to sync pg-boss schedule:", err)
// Don't fail the request — the job is in Supabase, sync will catch it on next restart
}
}
🤖 Prompt for AI Agents
In `@apps/web/app/api/automations/route.ts` around lines 268 - 294, The fallback
for action timeout uses the logical OR which treats 0 as falsy; update the call
to scheduleAutomation so the timeout uses nullish coalescing (use
data.action_timeout_seconds ?? 300) instead of data.action_timeout_seconds ||
300 to preserve a legitimate 0 value (locate this in the scheduleAutomation
argument object inside the pg-boss sync block in route.ts).

Comment on lines +321 to +333
it("should return 500 when body is invalid JSON", async () => {
const req = new Request("http://localhost:9000/api/internal/run-automation", {
method: "POST",
headers: {
"Content-Type": "application/json",
"X-Internal-Auth": TEST_SECRET,
},
body: "not json at all",
})

const res = await POST(req)
expect(res.status).toBe(500)
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and display the run-automation route handler
echo "=== run-automation route (POST handler) ==="
fd 'route.ts' --type f | grep -E 'internal/run-automation.*route\.ts$' | head -1 | xargs cat

echo ""
echo "=== resume-conversation route (POST handler) ==="
fd 'route.ts' --type f | grep -E 'internal/resume-conversation.*route\.ts$' | head -1 | xargs cat

Repository: eenlars/alive

Length of output: 6864


Add explicit JSON parse error handling to the route and update test to expect 400.

The run-automation route lacks the nested try-catch around req.json() that resume-conversation implements. Invalid JSON currently falls through to the outer catch block, returning 500 INTERNAL_ERROR instead of 400 INVALID_JSON. Add a dedicated catch for JSON parsing errors (matching the resume-conversation pattern) and update this test to expect 400:

try {
  let body: unknown
  try {
    body = await req.json()
  } catch {
    return createErrorResponse(ErrorCodes.INVALID_JSON, 400)
  }
  // ... rest of logic
} catch (error) {
  console.error("[Internal/RunAutomation] Error:", error)
  return createErrorResponse(ErrorCodes.INTERNAL_ERROR, 500)
}
🤖 Prompt for AI Agents
In `@apps/web/app/api/internal/run-automation/__tests__/route.test.ts` around
lines 321 - 333, The POST handler for run-automation currently lets invalid JSON
bubble to the outer catch and returns a 500; add a nested try/catch around await
req.json() inside the POST function so JSON parse failures return
createErrorResponse(ErrorCodes.INVALID_JSON, 400) (follow the same pattern used
in resume-conversation), and then update the test in route.test.ts to expect
status 400 instead of 500 for the "should return 500 when body is invalid JSON"
case; ensure you reference the POST handler, req.json(), createErrorResponse,
and ErrorCodes.INVALID_JSON when making the change.

Comment on lines +26 to +35
try {
const body = await req.json()

const { jobId, userId, orgId, workspace, prompt, timeoutSeconds, model, thinkingPrompt, skills } = body

if (!jobId || !userId || !orgId || !workspace || !prompt) {
return createErrorResponse(ErrorCodes.INVALID_REQUEST, 400, {
field: "jobId, userId, orgId, workspace, prompt",
})
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Malformed JSON body returns 500 instead of 400.

If req.json() throws (e.g. malformed JSON), the outer catch on line 55 returns INTERNAL_ERROR (500). The sibling resume-conversation route handles this correctly with a nested try-catch returning INVALID_JSON (400). Apply the same pattern here.

🐛 Proposed fix
   try {
-    const body = await req.json()
+    let body: unknown
+    try {
+      body = await req.json()
+    } catch {
+      return createErrorResponse(ErrorCodes.INVALID_JSON, 400)
+    }
 
     const { jobId, userId, orgId, workspace, prompt, timeoutSeconds, model, thinkingPrompt, skills } = body
🤖 Prompt for AI Agents
In `@apps/web/app/api/internal/run-automation/route.ts` around lines 26 - 35, The
route handler in route.ts currently lets req.json() errors bubble up and return
INTERNAL_ERROR; wrap the req.json() call in its own try-catch (same pattern as
the sibling resume-conversation route) and on JSON parse failure return
createErrorResponse(ErrorCodes.INVALID_JSON, 400, { field: "body" }) instead of
letting the outer catch produce a 500; update the block that destructures body
(jobId, userId, orgId, workspace, prompt, ...) to run only after the nested try
succeeds.

Comment on lines +10 to +12
// server-config.json isn't guaranteed to exist in local dev or CI; gate these tests
// on the actual presence of configured domains instead of env heuristics.
const hasStreamDomains = Boolean(DOMAINS.STREAM_DEV) && Boolean(DOMAINS.STREAM_DEV_HOST) && Boolean(DOMAINS.STREAM_PROD)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

hasStreamDomains guard is missing DOMAINS.STREAM_PROD_HOST.

The tests on lines 64 and 74 pass DOMAINS.STREAM_PROD_HOST as the x-forwarded-host header, but the guard only checks DOMAINS.STREAM_PROD. If STREAM_PROD is set while STREAM_PROD_HOST is not, those tests will run with an undefined host and fail.

Proposed fix
-const hasStreamDomains = Boolean(DOMAINS.STREAM_DEV) && Boolean(DOMAINS.STREAM_DEV_HOST) && Boolean(DOMAINS.STREAM_PROD)
+const hasStreamDomains = Boolean(DOMAINS.STREAM_DEV) && Boolean(DOMAINS.STREAM_DEV_HOST) && Boolean(DOMAINS.STREAM_PROD) && Boolean(DOMAINS.STREAM_PROD_HOST)
🤖 Prompt for AI Agents
In `@apps/web/lib/request-url.test.ts` around lines 10 - 12, The guard variable
hasStreamDomains is missing DOMAINS.STREAM_PROD_HOST so tests that pass
STREAM_PROD_HOST as x-forwarded-host may run with an undefined host; update the
hasStreamDomains expression to include Boolean(DOMAINS.STREAM_PROD_HOST)
alongside DOMAINS.STREAM_DEV, DOMAINS.STREAM_DEV_HOST and DOMAINS.STREAM_PROD
(i.e., ensure hasStreamDomains checks STREAM_DEV, STREAM_DEV_HOST, STREAM_PROD,
and STREAM_PROD_HOST) so the tests that rely on STREAM_PROD_HOST are properly
gated.

Comment on lines +148 to +165

// Validate webhook secret (optional, per trigger)
if (trigger.secret) {
const sig = req.headers.get('x-webhook-signature')
if (!verifySignature(sig, await req.text(), trigger.secret)) {
return Response.json({ error: 'Invalid signature' }, { status: 401 })
}
}

// Enqueue the job
const payload = await req.json()
await boss.send(trigger.queue, {
...trigger.jobData,
webhookPayload: payload, // Pass webhook data to the job
})

return Response.json({ ok: true, queued: true })
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Webhook example consumes the request body twice.

In the webhook trigger example, await req.text() is called on line 152 for signature verification, and then await req.json() is called on line 158. A Request body can only be consumed once — the second call will throw. The example should clone the body or read it once and parse.

📝 Suggested fix for the example
-  if (trigger.secret) {
-    const sig = req.headers.get('x-webhook-signature')
-    if (!verifySignature(sig, await req.text(), trigger.secret)) {
-      return Response.json({ error: 'Invalid signature' }, { status: 401 })
-    }
-  }
-
-  // Enqueue the job
-  const payload = await req.json()
+  const rawBody = await req.text()
+
+  if (trigger.secret) {
+    const sig = req.headers.get('x-webhook-signature')
+    if (!verifySignature(sig, rawBody, trigger.secret)) {
+      return Response.json({ error: 'Invalid signature' }, { status: 401 })
+    }
+  }
+
+  // Enqueue the job
+  const payload = JSON.parse(rawBody)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Validate webhook secret (optional, per trigger)
if (trigger.secret) {
const sig = req.headers.get('x-webhook-signature')
if (!verifySignature(sig, await req.text(), trigger.secret)) {
return Response.json({ error: 'Invalid signature' }, { status: 401 })
}
}
// Enqueue the job
const payload = await req.json()
await boss.send(trigger.queue, {
...trigger.jobData,
webhookPayload: payload, // Pass webhook data to the job
})
return Response.json({ ok: true, queued: true })
}
const rawBody = await req.text()
if (trigger.secret) {
const sig = req.headers.get('x-webhook-signature')
if (!verifySignature(sig, rawBody, trigger.secret)) {
return Response.json({ error: 'Invalid signature' }, { status: 401 })
}
}
// Enqueue the job
const payload = JSON.parse(rawBody)
await boss.send(trigger.queue, {
...trigger.jobData,
webhookPayload: payload, // Pass webhook data to the job
})
return Response.json({ ok: true, queued: true })
}
🤖 Prompt for AI Agents
In `@docs/architecture/job-system-pgboss.md` around lines 148 - 165, The webhook
example consumes the Request body twice causing the second read to fail; fix by
reading the body once and reusing it for both signature verification and parsing
— e.g., read the raw text once (used by verifySignature) and then parse that
same text into JSON for webhookPayload instead of calling req.text() and
req.json() separately; update the usage around verifySignature, the payload
assignment, and webhookPayload to use the single-read value (references:
verifySignature, req.text(), req.json(), webhookPayload, trigger.secret).

Comment on lines +288 to +292

if (!sdkSessionId) {
console.warn(`[ResumeWorker] No session found for ${sessionKey}, skipping`)
return // Job completes (won't retry — session is gone)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Example code logs the full session key — avoid propagating this pattern.

Line 290 logs sessionKey directly. Per project guidelines, session keys must never be exposed in logs. Since architecture docs often serve as copy-paste references, consider masking or truncating the key in the example.

📝 Suggested fix
-    console.warn(`[ResumeWorker] No session found for ${sessionKey}, skipping`)
+    console.warn(`[ResumeWorker] No session found, skipping (key hash: ${sessionKey.slice(-8)})`)

As per coding guidelines: "Session keys should never be exposed in logs or API responses. Always sanitize logs that might contain session data."

🤖 Prompt for AI Agents
In `@docs/architecture/job-system-pgboss.md` around lines 288 - 292, The example
logs the full sessionKey in the ResumeWorker check; update the console.warn in
the block where sdkSessionId is checked to avoid printing sessionKey directly
(e.g., mask or truncate the key or replace it with a fixed-length
hash/identifier) so that only a non-sensitive, redacted representation of
sessionKey is logged; reference the variables sdkSessionId and sessionKey and
the ResumeWorker message to locate and change the log call accordingly.

Comment on lines +137 to +145
export async function enqueueAutomation(
payload: RunAutomationPayload,
options?: { startAfter?: number },
): Promise<string | null> {
const instance = getJobQueue()
return instance.send(QUEUES.RUN_AUTOMATION, payload, {
...(options?.startAfter && { startAfter: options.startAfter }),
})
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

startAfter: 0 is silently dropped due to falsy check.

options?.startAfter && { startAfter: ... } evaluates to false when startAfter is 0. While 0 delay may not be a typical use case, using a nullish check is safer:

Suggested fix
   return instance.send(QUEUES.RUN_AUTOMATION, payload, {
-    ...(options?.startAfter && { startAfter: options.startAfter }),
+    ...(options?.startAfter != null && { startAfter: options.startAfter }),
   })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export async function enqueueAutomation(
payload: RunAutomationPayload,
options?: { startAfter?: number },
): Promise<string | null> {
const instance = getJobQueue()
return instance.send(QUEUES.RUN_AUTOMATION, payload, {
...(options?.startAfter && { startAfter: options.startAfter }),
})
}
export async function enqueueAutomation(
payload: RunAutomationPayload,
options?: { startAfter?: number },
): Promise<string | null> {
const instance = getJobQueue()
return instance.send(QUEUES.RUN_AUTOMATION, payload, {
...(options?.startAfter != null && { startAfter: options.startAfter }),
})
}
🤖 Prompt for AI Agents
In `@packages/job-queue/src/index.ts` around lines 137 - 145, The spread currently
drops startAfter when it is 0 because the truthy check (options?.startAfter && {
startAfter: ... }) treats 0 as false; update enqueueAutomation so it only omits
startAfter when it is null/undefined (not when it is 0). Locate
enqueueAutomation (and references to QUEUES.RUN_AUTOMATION, getJobQueue, and
instance.send) and replace the conditional spread with a nullish check or
explicit build of an options object (e.g., add startAfter only if
options?.startAfter != null) before calling instance.send so a value of 0 is
preserved.

Comment on lines +84 to +88
if (response.status === 404) {
console.warn(`[ResumeWorker] Session not found for ${sessionKey}, skipping (no retry)`)
// Return normally — pg-boss marks as complete (no retry)
return
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Session key logged in plaintext.

Line 85 logs the full sessionKey which contains userId::workspace::tabGroupId::tabId. Per coding guidelines, "Session keys should never be exposed in logs or API responses. Always sanitize logs that might contain session data."

Log the individual non-sensitive components (e.g., workspace) instead of the composite key.

🔒 Proposed fix
       if (response.status === 404) {
-        console.warn(`[ResumeWorker] Session not found for ${sessionKey}, skipping (no retry)`)
+        console.warn(`[ResumeWorker] Session not found for ${workspace}, skipping (no retry)`)
         // Return normally — pg-boss marks as complete (no retry)
         return
       }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (response.status === 404) {
console.warn(`[ResumeWorker] Session not found for ${sessionKey}, skipping (no retry)`)
// Return normally — pg-boss marks as complete (no retry)
return
}
if (response.status === 404) {
console.warn(`[ResumeWorker] Session not found for ${workspace}, skipping (no retry)`)
// Return normally — pg-boss marks as complete (no retry)
return
}
🤖 Prompt for AI Agents
In `@packages/job-queue/src/workers/resume-conversation.ts` around lines 84 - 88,
The log prints the full sensitive sessionKey; modify the handler in
resume-conversation (where response.status === 404 and sessionKey is referenced)
to avoid logging the composite key — split sessionKey by '::' into parts
(sessionKey variable) and log only the non-sensitive component(s) such as
workspace (e.g., parts[1]) or other allowed identifiers, ensuring
userId/tabGroupId/tabId are not included in the message and update the
console.warn to include the sanitized value instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant