Conversation
Test & Lint
|
noa-lucent
left a comment
There was a problem hiding this comment.
Runner backend/client integration still breaks core invariants:
- The Fastify endpoints always respond with generic 500/404 codes and drop dockerode's original , so platform-server can no longer treat benign 304/404/409 cases as no-ops (cleanup + teardown flows will start failing when the runner is enabled).
- The Http client rethrows failures as plain s without any status metadata, compounding the issue because callers have no way to distinguish an expected race from a real failure.
- SSE helpers (, ) never check the HTTP status, so auth/config issues now result in silent empty streams instead of actionable errors.
Please address these to preserve existing semantics before we rely on the runner backend.}
packages/platform-server/src/infra/container/httpDockerRunner.client.ts
Outdated
Show resolved
Hide resolved
packages/platform-server/src/infra/container/httpDockerRunner.client.ts
Outdated
Show resolved
Hide resolved
Test & Lint
|
noa-lucent
left a comment
There was a problem hiding this comment.
Thanks for addressing the earlier blockers. The runner now relays dockerade status codes via sendDockerError, the HTTP client surfaces them through DockerRunnerRequestError, and the SSE helpers fail fast whenever the runner rejects a request. Everything looks good from my side.
packages/platform-server/src/infra/container/httpDockerRunner.client.ts
Outdated
Show resolved
Hide resolved
packages/platform-server/src/infra/container/httpDockerRunner.client.ts
Outdated
Show resolved
Hide resolved
|
Summary:\n- align ConfigService with the updated schema so runner settings satisfy the typed contract\n- tighten HttpDockerRunnerClient typing for ensureImage/platform + interactive exec payloads\n- refresh the docker-runner Fastify service for Zod v4, runner label lookups, SSE, and websocket typing\n\nTesting:\n- pnpm --filter @agyn/docker-runner build\n- pnpm lint |
Test & Lint Summary
|
noa-lucent
left a comment
There was a problem hiding this comment.
Runner-only mode still has a critical gap: the endpoint parses Docker’s event stream per chunk and drops any event that isn’t perfectly aligned with chunk boundaries. Docker emits newline-delimited JSON, so partial/multi-event chunks are common—those events are silently discarded. Now that platform-server always depends on runner to receive container lifecycle events, losing them breaks cleanup and status tracking. Please buffer/split on newlines (like the platform watcher does) so every event is forwarded exactly once.
Test & Lint Summary
|
noa-lucent
left a comment
There was a problem hiding this comment.
The endpoint now buffers newline-delimited JSON via the new , flushes trailing data on close, and is covered by targeted tests. Docker events will no longer be dropped when chunk boundaries split objects, so the runner backend preserves the full lifecycle stream. LGTM.
Test, Lint & Build Summary
|
Test & Lint Summary
|
noa-lucent
left a comment
There was a problem hiding this comment.
Verified the access-key removal: config schema, env templates, docs, Docker runner config, and Http client now rely solely on DOCKER_RUNNER_SHARED_SECRET, and the Fastify auth guard no longer expects X-DR-Access-Key. No lingering references or mismatched tests found.
Test & Lint Summary
|
noa-lucent
left a comment
There was a problem hiding this comment.
Change scope is limited to loading dotenv for the docker-runner dev entrypoint. Verified README guidance, package dependency, module, main entry import, and lockfile updates—all consistent and guarded so production () still bypasses files. Looks good.
|
Tests & Lint Summary:
|
|
Tests & Lint Summary:
|
noa-lucent
left a comment
There was a problem hiding this comment.
Re-verified the latest changes: the runner’s interactive exec route now serves 426 responses for non-upgrade HTTP calls while returning 101 on successful upgrades, and the shared websocket util ensures sockets close safely. Added runner WS tests cover both behaviors. On the platform side, the terminal gateway now traps handleConnection rejections and closes the socket with 1011, with regression tests proving the failure path. Everything looks solid.
|
Resolved the Build UI failure: Scope: 6 of 7 workspace projects |
|
Root cause for missing nix installs was two-fold: (1) the runner API rejected |
noa-lucent
left a comment
There was a problem hiding this comment.
Re-checked the timeout fixes: runner now accepts 0ms exec/idle timeouts, and the platform client extends HTTP timeouts to cover longer exec requests (with coverage tests). Added exec shell-semantic regression test ensures string commands still run via /bin/sh. Looks solid.
|
Root cause: when we switched workspace provisioning over to the docker-runner HTTP client we stopped instantiating the local ContainerService, so nothing ever called |
d1fe4ce to
a329e1d
Compare
Test & Lint Summary
|
noa-lucent
left a comment
There was a problem hiding this comment.
Rebase looks clean. The only delta pulls the shared runnerConfigDefaults into the fs persistence graph test, ensuring the config schema still validates after the merge. No regressions spotted—still good to go.
Summary
Testing
Closes #1272