fix(registry): verify registry artifact content hashes#2763
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
✅ No security or compliance issues detected. Reviewed everything up to bc4c4c0. Security Overview
Detected Code Changes
|
There was a problem hiding this comment.
💡 Codex Review
tracecat/tracecat/executor/registry_artifacts.py
Lines 707 to 711 in 0f8c51f
When a lock carries an expected artifact hash, integrity failures on the primary .squashfs candidate are currently bypassable because the fallback TarballArtifact is created without any expected hash. In materialize(), candidate failures are caught and the next candidate is tried, so a hash mismatch (or other download integrity error) on the squashfs can still lead to executing an unverified tarball sibling if it exists. This weakens the new content-hash verification guarantee for artifact resolution.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
2 issues found across 29 files
Confidence score: 3/5
- There is concrete integrity risk in
tracecat/executor/registry_artifacts.py: in the TAR_GZ flow,expected_hashis not applied toTarballArtifact, andTarballArtifact.downloaddoes not enforce hash verification, so tampered artifacts could be accepted. tracecat/registry/versions/schemas.pycurrently accepts anyartifact_hashstring despite SHA-256 documentation, which can let invalid data in and weaken downstream validation guarantees.- Given the medium-high severity and confidence on artifact verification behavior, this looks mergeable only with caution rather than low-risk safe-to-merge.
- Pay close attention to
tracecat/executor/registry_artifacts.py,tracecat/registry/versions/schemas.py- missing hash enforcement and weak input validation are the main regression/integrity concerns.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="tracecat/registry/versions/schemas.py">
<violation number="1" location="tracecat/registry/versions/schemas.py:164">
P2: Validate `artifact_hash` format in the create schema; currently any string is accepted even though it is documented as SHA-256.</violation>
</file>
<file name="tracecat/executor/registry_artifacts.py">
<violation number="1" location="tracecat/executor/registry_artifacts.py:703">
P1: Tarball artifacts don't get hash verification. The `expected_hash` is propagated to `SquashfsArtifact` but not to the primary `TarballArtifact` in the TAR_GZ branch. Additionally, `TarballArtifact.download` never passes `self.expected_hash` to `_download_s3_artifact`, so even if it were set, the hash wouldn't be checked. This leaves the tarball download path (the fallback format) without integrity verification.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| SquashfsArtifact( | ||
| uri=artifact_uri, | ||
| cache_key=ctx.cache_key, | ||
| expected_hash=expected_hash, |
There was a problem hiding this comment.
P1: Tarball artifacts don't get hash verification. The expected_hash is propagated to SquashfsArtifact but not to the primary TarballArtifact in the TAR_GZ branch. Additionally, TarballArtifact.download never passes self.expected_hash to _download_s3_artifact, so even if it were set, the hash wouldn't be checked. This leaves the tarball download path (the fallback format) without integrity verification.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tracecat/executor/registry_artifacts.py, line 703:
<comment>Tarball artifacts don't get hash verification. The `expected_hash` is propagated to `SquashfsArtifact` but not to the primary `TarballArtifact` in the TAR_GZ branch. Additionally, `TarballArtifact.download` never passes `self.expected_hash` to `_download_s3_artifact`, so even if it were set, the hash wouldn't be checked. This leaves the tarball download path (the fallback format) without integrity verification.</comment>
<file context>
@@ -658,6 +700,7 @@ async def _artifact_candidates(
SquashfsArtifact(
uri=artifact_uri,
cache_key=ctx.cache_key,
+ expected_hash=expected_hash,
)
]
</file context>
| artifact_hash: str | None = Field( | ||
| default=None, | ||
| description="SHA-256 content hash of the execution artifact", | ||
| ) |
There was a problem hiding this comment.
P2: Validate artifact_hash format in the create schema; currently any string is accepted even though it is documented as SHA-256.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tracecat/registry/versions/schemas.py, line 164:
<comment>Validate `artifact_hash` format in the create schema; currently any string is accepted even though it is documented as SHA-256.</comment>
<file context>
@@ -161,6 +161,10 @@ class RegistryVersionCreate(BaseModel):
...,
description="S3 URI to the execution artifact for this registry version",
)
+ artifact_hash: str | None = Field(
+ default=None,
+ description="SHA-256 content hash of the execution artifact",
</file context>
| artifact_hash: str | None = Field( | |
| default=None, | |
| description="SHA-256 content hash of the execution artifact", | |
| ) | |
| artifact_hash: str | None = Field( | |
| default=None, | |
| min_length=64, | |
| max_length=64, | |
| pattern=r"^[a-fA-F0-9]{64}$", | |
| description="SHA-256 content hash of the execution artifact", | |
| ) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bc4c4c0d4b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| expected_hash=expected_hash, | ||
| ) | ||
| ] | ||
| if tarball_uri := _tarball_uri_for_squashfs(artifact_uri): |
There was a problem hiding this comment.
Block unverified tarball fallback after hash-locked squashfs
When a lock carries an artifact_hash, only the SquashfsArtifact candidate gets that expected hash, but a sibling .tar.gz candidate is still queued as fallback. If the squashfs download fails integrity validation (or any other squashfs error) and a tarball object exists, execution can proceed from the tarball without any hash check, which defeats the new content-hash verification guarantee for that origin. This affects environments where legacy tarball siblings are present alongside squashfs artifacts.
Useful? React with 👍 / 👎.

Summary
#sha256=...artifact references so executors verify downloaded SquashFS artifactsgzip -6Stack
Stacked on #2753, which keeps the prebuilt manifest startup-sync performance changes and manifest-fingerprint fallback.
Testing
uv run pytest tests/unit/executor/test_registry_helpers.py tests/unit/test_registry_artifacts.py tests/unit/test_registry_sync_base_service.py tests/unit/test_registry_lock_service.py tests/unit/test_registry_platform_startup.py tests/unit/test_registry_sync_runner.py tests/unit/test_registry_sync_schemas.pyuv run pytest tests/unit/test_registry_sync_artifact.pyuv run ruff check .uv run ruff format --check .uv run pyright tracecat/config.py tracecat/registry/sync/artifact.py tests/unit/test_registry_sync_artifact.pyuv run alembic headspnpm -C frontend exec biome check src/client/schemas.gen.ts src/client/types.gen.ts