
feat: switch to Chainguard base images for slim Docker builds #5

Merged
franchb merged 1 commit into main from chore/chainguard-docker-images on Feb 24, 2026

Conversation


@franchb franchb commented Feb 24, 2026

Summary

  • Switch Docker base images from python:3.14-slim to Chainguard images (distroless runtime, wolfi-base for standalone) — reduces api-only from 1.93GB to 816MB uncompressed (58% reduction)
  • Remove all INCLUDE_LOCAL_MODELS/PRELOAD_ML_MODELS conditionals (dead code in this fork) and add Python entrypoint for distroless api-only target (no shell available)
  • Add CI smoke tests (import hindsight_api; import tiktoken) and image size reporting; expand .dockerignore to exclude unnecessary directories from build context

Test plan

  • podman build --target api-only succeeds locally
  • podman build --target standalone succeeds locally
  • Smoke test passes: podman run --rm --entrypoint python <image> -c "import hindsight_api; import tiktoken; print('OK')"
  • Distroless image has no shell: podman run --rm --entrypoint sh <image> fails as expected
  • Standalone image has bash, curl, node, python3, hindsight-api CLI all working
  • CI workflow builds and smoke tests pass on GitHub Actions
  • Release workflow builds without obsolete build-args

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Restructured Docker build configuration with multi-stage Chainguard-based layout and explicit image targets for improved optimization.
    • Implemented smoke tests to validate Docker image builds during the CI pipeline.
    • Added startup dependency checks for database and LLM service health verification in container initialization.

…ro CVEs

Replace python:3.14-slim and wolfi-base with Chainguard images:
- api-only: cgr.dev/chainguard/python:latest (distroless, no shell, 816MB vs 1.93GB)
- standalone: cgr.dev/chainguard/wolfi-base with python-3.14 + nodejs-20
- builder: cgr.dev/chainguard/python:latest-dev (has uv, apk, bash)

Key changes:
- Remove all INCLUDE_LOCAL_MODELS/PRELOAD_ML_MODELS conditionals (always false in this fork)
- Add Python entrypoint for distroless api-only (no shell available)
- Move tiktoken preload to builder stage, COPY cache to runtime
- Clean venv of tests/caches/build artifacts in builder
- Add BuildKit cache mounts for uv
- Expand .dockerignore to exclude unnecessary directories
- Add CI smoke tests (import hindsight_api + tiktoken) and image size reporting
- Remove obsolete build-args from release workflow

Verified locally with podman: both targets build, pass smoke tests,
and distroless image correctly has no shell.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Docker build infrastructure modernization: expanded .dockerignore exclusions, refactored CI/release workflows to use post-build smoke tests instead of build arguments, completely rewrote the Dockerfile with Chainguard-based multi-stage layout and explicit build targets, and introduced a new Python entrypoint script for container startup with optional dependency waiting.

Changes

Cohort / File(s) / Summary

  • Docker Build Context (.dockerignore): Expanded excluded paths to include .github/, multiple hindsight-related directories (hindsight-docs/, hindsight-cli/, hindsight-dev/, etc.), helm, cookbook, monitoring, scripts, and environment files while retaining exceptions for critical directories and package manifests.
  • CI/Release Workflows (.github/workflows/ci.yml, .github/workflows/release.yml): Removed per-target INCLUDE_LOCAL_MODELS and PRELOAD_ML_MODELS build arguments from both workflows; CI now uses post-build smoke tests with Python import validation instead of build-args for runtime verification; release workflow simplified by removing the build-args block.
  • Dockerfile Multi-Stage Rewrite (docker/standalone/Dockerfile): Migrated from legacy base images to Chainguard wolfi-base and Chainguard Python; introduced explicit build targets (api-only, standalone, cp-only); removed conditional build-arg gating; optimized by pre-downloading the tiktoken cache during the API build; consolidated multi-stage composition with direct COPY references; replaced the shell entrypoint with a direct Python invocation.
  • Container Entrypoint Logic (docker/standalone/entrypoint.py): New distroless-compatible Python entrypoint with optional dependency waiting (_wait_for_deps), TCP/HTTP health checks for database and LLM services, graceful signal handling (SIGTERM/SIGINT), and dynamic import of the main API module.
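The dependency-waiting pattern described above (poll a TCP port until the database is reachable) can be sketched as follows. This is a minimal illustration, not the merged entrypoint code; the helper names `check_tcp` and `wait_for_tcp` and their defaults are assumptions.

```python
import socket
import time

def check_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for_tcp(host: str, port: int, retries: int = 30, interval: float = 1.0) -> bool:
    """Poll host:port until it accepts connections or retries are exhausted."""
    for attempt in range(1, retries + 1):
        if check_tcp(host, port):
            return True
        print(f"waiting for {host}:{port} ({attempt}/{retries})", flush=True)
        time.sleep(interval)
    return False
```

A plain connect-then-close probe like this is enough to detect "port open"; it does not verify that PostgreSQL has finished its own initialization, which is why a retry loop with an interval is still useful.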

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • ci: replace upstream workflows with minimal fork CI #1: Directly related—modifies the same CI/release workflows (.github/workflows/ci.yml and release.yml) and Docker build infrastructure (docker/standalone Dockerfile), introducing consistent changes to build validation and Dockerfile structure.

Poem

🐰 Building layers, Chainguard-clean,
Multi-stage magic, precise and lean,
Tiktoken cached before the run,
Smoke tests whisper—now we're done!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change: switching Docker base images to Chainguard for improved slim builds, which is the primary objective reflected across all modified files.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docker/standalone/Dockerfile (1)

149-178: ⚠️ Potential issue | 🟡 Minor

cp-only stage runs as root — no USER directive.

Every other final stage either uses USER nonroot (standalone) or inherits the Chainguard default nonroot UID (api-only). The cp-only stage runs as root, which contradicts the PR's stated goal of non-root execution and standard container hardening.

🛡️ Proposed fix
 CMD ["/app/start-all.sh"]

Add before CMD:

+USER node
 CMD ["/app/start-all.sh"]

The node:20-alpine image ships a node user (UID 1000) that is appropriate for running Node.js workloads.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/standalone/Dockerfile` around lines 149 - 178, The cp-only build stage
currently runs as root (stage name "cp-only" using base image node:20-alpine)
which breaks the non-root goal; fix it by switching the stage to run as the
existing non-root node user provided by node:20-alpine (e.g., add a USER
declaration such as "USER node" before the final CMD) so files copied into /app
and the startup script /app/start-all.sh execute as that non-root user; ensure
WORKDIR /app and copied assets are accessible to that user (adjust ownership or
permissions if needed) and verify the CMD ("/app/start-all.sh") still runs under
the node user.
🧹 Nitpick comments (6)
.github/workflows/ci.yml (2)

125-136: No Docker layer cache configured — each CI run rebuilds from scratch.

The tiktoken preload, apk add build-base, and full Python dependency installation will re-execute on every run. Adding cache-from/cache-to via GitHub's cache backend would significantly reduce build times (typically 5–10× for repeated runs).

💡 Suggested addition
     - name: Build ${{ matrix.name }} image
       uses: docker/build-push-action@v6
       with:
         context: .
         file: docker/standalone/Dockerfile
         target: ${{ matrix.target }}
         push: false
         load: true
         tags: hindsight-${{ matrix.name }}:test
+        cache-from: type=gha
+        cache-to: type=gha,mode=max
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 125 - 136, The workflow currently
builds images with docker/setup-buildx-action@v3 and docker/build-push-action@v6
(the "Build ${{ matrix.name }} image" step) without any layer caching, causing
repeated full rebuilds; update that build step to pass cache-from and cache-to
entries to docker/build-push-action using GitHub Actions cache backend
(type=gha) with a stable key (e.g., including matrix.name or matrix.target) and
cache-to mode=max to persist layers between runs, and ensure cache-from
references the same key so the build reuses previous layers for steps like
Python deps and apk add build-base.

117-120: Both targets share an identical smoke_cmd; the standalone image's Node.js/start-all.sh path is not exercised.

The smoke test validates Python importability only. A broken start-all.sh, missing Node.js binary, or broken Control Plane build in the standalone image would pass the test undetected. Consider adding a second smoke step for standalone that verifies Node.js:

- name: Smoke test standalone Node.js
  if: matrix.target == 'standalone'
  run: docker run --rm --entrypoint node hindsight-standalone-slim:test --version
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 117 - 120, The current smoke step uses
the same smoke_cmd for both targets and only checks Python imports; add a
separate smoke step for the standalone image that actually exercises
Node/entrypoint to catch broken start-all.sh or missing Node.js. Specifically,
for the matrix.target == 'standalone' case (referencing the "standalone" target
and the "standalone-slim" image name in the workflow), add a conditional
job/step that runs the container with docker run --rm --entrypoint node <image>
--version (or equivalent) to verify the Node binary is present and executable;
keep the existing Python smoke_cmd for the slim image. Ensure the step name
clearly indicates it is the standalone Node.js smoke test.
docker/standalone/Dockerfile (2)

39-39: Editable install (-e .) is non-standard for a production image.

Editable installs embed a direct_url.json pointer to the source directory. While this works here because the source stays at /app/api/hindsight_api, it's atypical for immutable container images and can confuse tooling. A regular install is cleaner.

♻️ Proposed refactor
-RUN --mount=type=cache,target=/root/.cache/uv \
-    uv pip install --no-deps -e .
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv pip install --no-deps .
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/standalone/Dockerfile` at line 39, The Dockerfile currently uses an
editable pip install ("-e .") which leaves a direct_url.json pointer; change the
install to a standard non-editable install by removing the "-e" flag in the
install command (replace the "pip install --no-deps -e ." invocation with "pip
install --no-deps ."), then rebuild the image to ensure no editable-install
metadata (direct_url.json) is embedded; locate the line containing "pip install
--no-deps -e ." to make this change.

19-19: Mutable latest / latest-dev Chainguard tags on all three base images make builds non-reproducible.

The comment on lines 13–14 acknowledges this trade-off for free-tier Chainguard images, but the risk is worth restating: an upstream Python 3.14→3.15 bump would silently break the venv ABI compatibility between the api-builder stage and the standalone runtime stage. Pinning to a digest or a specific version tag (e.g., :3.14) when Chainguard makes one available would make this more predictable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/standalone/Dockerfile` at line 19, The Dockerfile uses mutable
Chainguard tags (e.g., the FROM line "FROM cgr.dev/chainguard/python:latest-dev
AS api-builder") which makes builds non-reproducible; update the Dockerfile to
pin each Chainguard base image used (the "api-builder" stage and the
"standalone" runtime stage and the third base image) to an immutable identifier
— either a specific version tag like ":3.14" or preferably a digest — so the
venv ABI and runtime layers remain stable across builds; replace the mutable
tags with the chosen tags/digests and document the pinned versions in the
Dockerfile comments.
docker/standalone/entrypoint.py (2)

36-50: _parse_db_host_port silently returns None for valid URLs that lack a path segment, skipping the DB health check.

A URL like postgresql://user:pass@host:5432 (no trailing /dbname) causes rest.index("/") to raise ValueError, which is caught and returns None. The consequence in _wait_for_deps is db_ok = True unconditionally, so the DB is never polled even though the URL is a well-formed connection string.

Replace the manual parser with urllib.parse.urlparse, which handles all these edge cases correctly.

♻️ Proposed refactor
+from urllib.parse import urlparse
+
 def _parse_db_host_port(db_url: str) -> tuple[str, int] | None:
     """Extract host and port from a PostgreSQL URL."""
-    # postgresql://user:pass@host:port/dbname
     try:
-        at_idx = db_url.index("@")
-        rest = db_url[at_idx + 1 :]
-        # Strip /dbname and ?params
-        slash_idx = rest.index("/")
-        host_port = rest[:slash_idx]
-        if ":" in host_port:
-            host, port_str = host_port.rsplit(":", 1)
-            return host, int(port_str)
-        return host_port, 5432
-    except (ValueError, IndexError):
+        parsed = urlparse(db_url)
+        if not parsed.hostname:
+            return None
+        return parsed.hostname, parsed.port or 5432
+    except Exception:
         return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/standalone/entrypoint.py` around lines 36 - 50, The current
_parse_db_host_port silently returns None for valid URLs missing a path because
it manually searches for "/" and catches ValueError; replace the manual parsing
with urllib.parse.urlparse in _parse_db_host_port to robustly extract
netloc/hostname and port (use parsed.hostname and parsed.port, default port 5432
when parsed.port is None), return (hostname, port) for valid postgres URLs and
None only for truly invalid inputs or missing hostname; keep the function
signature tuple[str,int] | None and preserve existing callers (e.g.,
_wait_for_deps) so DB health checks run for connection strings without a
trailing path.

48-48: Ruff TRY300: bare return on the non-: path should be in an else block.

Minor style flag. Moving to an else block makes the control flow explicit.

♻️ Proposed fix
         if ":" in host_port:
             host, port_str = host_port.rsplit(":", 1)
             return host, int(port_str)
-        return host_port, 5432
+        else:
+            return host_port, 5432
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/standalone/entrypoint.py` at line 48, The lone return "return
host_port, 5432" should be placed in an explicit else branch of the earlier
colon check instead of being a bare fall-through; edit the function that parses
the host/port (the branch that checks for a ":" in the host string, e.g. the `if
':' in host` / `if host_str.count(':')` block) so that the code returning the
default port is inside an `else:` corresponding to that `if`, making the control
flow explicit and removing the bare return path.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66ab0b0 and 5964428.

📒 Files selected for processing (5)
  • .dockerignore
  • .github/workflows/ci.yml
  • .github/workflows/release.yml
  • docker/standalone/Dockerfile
  • docker/standalone/entrypoint.py
💤 Files with no reviewable changes (1)
  • .github/workflows/release.yml

Comment on lines +28 to +32
COPY hindsight-api/pyproject.toml hindsight-api/README.md ./

# Copy dependency files and README (required by pyproject.toml)
COPY hindsight-api/pyproject.toml ./api/
COPY hindsight-api/README.md ./api/
# Install dependencies only (project itself installed after source copy)
RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --no-dev --no-install-project


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify uv.lock exists in hindsight-api
fd -e lock -g 'uv.lock' hindsight-api/

Repository: franchb/hindsight

Length of output: 43


🏁 Script executed:

#!/bin/bash
# Check what files exist in hindsight-api/
echo "=== Contents of hindsight-api/ ===" 
ls -la hindsight-api/ | head -30

echo -e "\n=== Search for uv.lock anywhere in repo ==="
find . -name "*uv.lock*" -o -name "*uv*.lock" 2>/dev/null

echo -e "\n=== Check Dockerfile lines 28-32 ==="
sed -n '28,32p' docker/standalone/Dockerfile

Repository: franchb/hindsight

Length of output: 891


Non-reproducible dependency resolution: lockfile is not used during Docker build.

COPY brings in pyproject.toml and README.md, but no lockfile. Without a uv.lock in the Docker build context, uv sync performs fresh dependency resolution on every build, potentially installing different package versions than those locked in CI. A new transitive release between two Docker builds can silently alter the image's behavior.

There is no uv.lock in the hindsight-api/ directory (the proposed fix references a non-existent file). To fix this, either:

  • Run uv lock in hindsight-api/ to generate a project-specific lockfile, then update the COPY and add --frozen flag.
  • Or, if hindsight-api is part of a monorepo workspace managed by the root uv.lock, copy that lockfile and configure the build accordingly.

Without a lockfile and --frozen flag, the Docker build will not achieve reproducible, deterministic dependency resolution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/standalone/Dockerfile` around lines 28 - 32, The Dockerfile currently
copies only pyproject.toml and runs "uv sync --no-dev --no-install-project",
which performs fresh dependency resolution because no uv.lock is present; update
the build to use a lockfile and frozen installs by copying the appropriate
uv.lock into the image (either a project-specific hindsight-api/uv.lock
generated by running "uv lock" in that directory or the monorepo root uv.lock if
hindsight-api is part of a workspace) and add the "--frozen" flag to the "uv
sync" invocation so dependency resolution is deterministic and reproducible.

Comment on lines +127 to +128
# Copy tiktoken cache (root-owned but world-readable; tiktoken only reads)
COPY --from=api-builder /tiktoken_cache /home/nonroot/.tiktoken


⚠️ Potential issue | 🟡 Minor

Missing --chown on tiktoken cache COPY: if tiktoken ever refreshes the cache, the nonroot user will get a permission error.

The files land as root:root. Tiktoken currently only reads the cache, so this works. However, if tiktoken ever attempts to write a stale-cache refresh (e.g., a new encoding added upstream), the nonroot user will hit a silent failure. Being explicit with ownership is more defensive.

🛡️ Proposed fix
-COPY --from=api-builder /tiktoken_cache /home/nonroot/.tiktoken
+COPY --chown=nonroot:nonroot --from=api-builder /tiktoken_cache /home/nonroot/.tiktoken

Apply the same fix to the standalone stage (line 194).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 # Copy tiktoken cache (root-owned but world-readable; tiktoken only reads)
-COPY --from=api-builder /tiktoken_cache /home/nonroot/.tiktoken
+COPY --chown=nonroot:nonroot --from=api-builder /tiktoken_cache /home/nonroot/.tiktoken
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/standalone/Dockerfile` around lines 127 - 128, The tiktoken cache COPY
in docker/standalone/Dockerfile currently copies files as root:root which will
break if tiktoken later tries to refresh the cache; update the COPY
--from=api-builder /tiktoken_cache /home/nonroot/.tiktoken invocation to include
--chown=nonroot:nonroot so the files are owned by the nonroot user, and apply
the same --chown change to the equivalent COPY in the standalone stage to ensure
consistent ownership for runtime writes.

Comment on lines +26 to +33
def _check_http(url: str, timeout: float = 5.0) -> bool:
"""Check if an HTTP endpoint returns a successful response."""
try:
req = urllib.request.Request(url, method="GET")
with urllib.request.urlopen(req, timeout=timeout):
return True
except Exception:
return False


⚠️ Potential issue | 🟡 Minor

Two Ruff findings in _check_http: unvalidated URL scheme (S310) and blind except Exception (BLE001).

  1. urllib.request.urlopen follows any scheme, including file://. If HINDSIGHT_API_LLM_BASE_URL is set to a file:// URI (misconfiguration or env-var injection), the function would silently read local files and return True.
  2. except Exception suppresses everything including KeyboardInterrupt subclasses and programming errors. Prefer specific urllib exceptions.
🛡️ Proposed fix
+import urllib.error
+
 def _check_http(url: str, timeout: float = 5.0) -> bool:
     """Check if an HTTP endpoint returns a successful response."""
+    if not url.startswith(("http://", "https://")):
+        print(f"Refusing non-HTTP URL: {url}", file=sys.stderr, flush=True)
+        return False
     try:
         req = urllib.request.Request(url, method="GET")
         with urllib.request.urlopen(req, timeout=timeout):
             return True
-    except Exception:
+    except (urllib.error.URLError, OSError):
         return False
🧰 Tools
🪛 Ruff (0.15.2)

[error] 29-29: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


[error] 30-30: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


[warning] 32-32: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/standalone/entrypoint.py` around lines 26 - 33, The _check_http
function currently allows any URL scheme and uses a bare except; update it to
parse and validate the URL scheme (use urllib.parse.urlparse and only accept
"http" or "https") before calling urllib.request.urlopen (this prevents file://
or other schemes from being used, e.g. for HINDSIGHT_API_LLM_BASE_URL), and
replace the broad except Exception with targeted exception handling (catch
urllib.error.URLError and urllib.error.HTTPError and optionally ValueError from
parsing) so only network/URL errors are swallowed and all other exceptions
propagate; ensure _check_http returns False for invalid schemes or caught
network errors and True only on a successful HTTP(S) response.
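A self-contained version of the hardened check described above, following the reviewer's proposal (the name `check_http` and the exact exception set are illustrative, not the merged code):

```python
import sys
import urllib.error
import urllib.request
from urllib.parse import urlparse

def check_http(url: str, timeout: float = 5.0) -> bool:
    """Return True only for a successful HTTP(S) response; reject other schemes."""
    if urlparse(url).scheme not in ("http", "https"):
        print(f"Refusing non-HTTP URL: {url}", file=sys.stderr, flush=True)
        return False
    try:
        req = urllib.request.Request(url, method="GET")
        # Scheme validated above, so file:// and custom schemes never reach urlopen.
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

`urllib.error.HTTPError` is a subclass of `URLError`, so non-2xx responses are covered by the same handler, while unrelated programming errors still propagate.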

Comment on lines +58 to +59
max_retries = int(os.environ.get("HINDSIGHT_RETRY_MAX", "0")) # 0 = infinite
retry_interval = int(os.environ.get("HINDSIGHT_RETRY_INTERVAL", "10"))


⚠️ Potential issue | 🟡 Minor

Unhandled ValueError crashes the container on startup if env vars are non-numeric.

int(os.environ.get("HINDSIGHT_RETRY_MAX", "0")) and int(os.environ.get("HINDSIGHT_RETRY_INTERVAL", "10")) raise unhandled ValueError if the operator accidentally sets, e.g., HINDSIGHT_RETRY_MAX=infinite.

🛡️ Proposed fix
-    max_retries = int(os.environ.get("HINDSIGHT_RETRY_MAX", "0"))  # 0 = infinite
-    retry_interval = int(os.environ.get("HINDSIGHT_RETRY_INTERVAL", "10"))
+    try:
+        max_retries = int(os.environ.get("HINDSIGHT_RETRY_MAX", "0"))
+    except ValueError:
+        print("Invalid HINDSIGHT_RETRY_MAX; defaulting to 0 (infinite)", file=sys.stderr, flush=True)
+        max_retries = 0
+    try:
+        retry_interval = int(os.environ.get("HINDSIGHT_RETRY_INTERVAL", "10"))
+    except ValueError:
+        print("Invalid HINDSIGHT_RETRY_INTERVAL; defaulting to 10s", file=sys.stderr, flush=True)
+        retry_interval = 10
📝 Committable suggestion


Suggested change

-max_retries = int(os.environ.get("HINDSIGHT_RETRY_MAX", "0"))  # 0 = infinite
-retry_interval = int(os.environ.get("HINDSIGHT_RETRY_INTERVAL", "10"))
+try:
+    max_retries = int(os.environ.get("HINDSIGHT_RETRY_MAX", "0"))  # 0 = infinite
+except ValueError:
+    print("Invalid HINDSIGHT_RETRY_MAX; defaulting to 0 (infinite)", file=sys.stderr, flush=True)
+    max_retries = 0
+try:
+    retry_interval = int(os.environ.get("HINDSIGHT_RETRY_INTERVAL", "10"))
+except ValueError:
+    print("Invalid HINDSIGHT_RETRY_INTERVAL; defaulting to 10s", file=sys.stderr, flush=True)
+    retry_interval = 10
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/standalone/entrypoint.py` around lines 58 - 59, The environment
parsing for max_retries and retry_interval can raise ValueError and crash the
container; update the code that sets max_retries and retry_interval in
entrypoint.py to robustly parse the env vars by wrapping the int(...)
conversions in try/except blocks (or a small helper like parse_int_env) that
catches ValueError, logs a warning, and falls back to the intended defaults ("0"
for HINDSIGHT_RETRY_MAX and "10" for HINDSIGHT_RETRY_INTERVAL); ensure the
handler preserves the semantic that 0 means infinite and that any non-numeric
value uses the fallback rather than raising.
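The `parse_int_env` helper the prompt alludes to could look like the following. This is a sketch of the suggested approach; the helper name comes from the prompt and the merged entrypoint may differ:

```python
import os
import sys

def parse_int_env(name: str, default: int) -> int:
    """Read an integer env var, falling back to default on missing or invalid values."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        print(f"Invalid {name}={raw!r}; using default {default}", file=sys.stderr, flush=True)
        return default

max_retries = parse_int_env("HINDSIGHT_RETRY_MAX", 0)        # 0 = infinite
retry_interval = parse_int_env("HINDSIGHT_RETRY_INTERVAL", 10)
```

This keeps the "0 means infinite" semantic intact: a non-numeric value like `HINDSIGHT_RETRY_MAX=infinite` logs a warning and falls back to 0 instead of crashing the container at startup.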

@franchb franchb merged commit ce79731 into main Feb 24, 2026
6 checks passed
@franchb franchb deleted the chore/chainguard-docker-images branch February 24, 2026 05:35