
fix(registry): verify registry artifact content hashes#2763

Open
daryllimyt wants to merge 4 commits into build/imrpove-sync from build/registry-artifact-content-hash


@daryllimyt daryllimyt commented May 23, 2026

Summary

  • add registry artifact content hashes to version records and prebuilt artifact metadata
  • include #sha256=... artifact references so executors verify downloaded SquashFS artifacts
  • make SquashFS output deterministic by sorting staged files, fixing filesystem timestamps, normalizing metadata, and using portable gzip -6
  • default SquashFS build processors to detected CPU count while preserving the env override
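The `#sha256=...` reference format from the summary can be illustrated with a minimal sketch. The helper names `build_artifact_ref` and `split_artifact_ref` are hypothetical; the PR's own `registry_artifact_ref` and `split_registry_artifact_ref` functions are not shown on this page and may behave differently. The sketch assumes the hash is carried as a lowercase hex digest in the URI fragment.

```python
from __future__ import annotations

# Assumed fragment marker, taken from the "#sha256=..." form in the PR summary.
FRAGMENT_MARKER = "#sha256="


def build_artifact_ref(uri: str, sha256_hex: str | None) -> str:
    """Append a #sha256=<hex> fragment when a content hash is known."""
    if not sha256_hex:
        return uri
    return f"{uri}{FRAGMENT_MARKER}{sha256_hex.lower()}"


def split_artifact_ref(ref: str) -> tuple[str, str | None]:
    """Split a reference into (plain URI, expected hash or None)."""
    uri, marker, digest = ref.rpartition(FRAGMENT_MARKER)
    if not marker:
        # No hash fragment: the artifact is treated as unverified.
        return ref, None
    return uri, digest
```

A consumer would split the reference first, download from the plain URI, and compare the downloaded bytes against the returned digest when one is present.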

Stack

Stacked on #2753, which keeps the prebuilt manifest startup-sync performance changes and manifest-fingerprint fallback.

Testing

  • uv run pytest tests/unit/executor/test_registry_helpers.py tests/unit/test_registry_artifacts.py tests/unit/test_registry_sync_base_service.py tests/unit/test_registry_lock_service.py tests/unit/test_registry_platform_startup.py tests/unit/test_registry_sync_runner.py tests/unit/test_registry_sync_schemas.py
  • uv run pytest tests/unit/test_registry_sync_artifact.py
  • uv run ruff check .
  • uv run ruff format --check .
  • uv run pyright tracecat/config.py tracecat/registry/sync/artifact.py tests/unit/test_registry_sync_artifact.py
  • uv run alembic heads
  • pnpm -C frontend exec biome check src/client/schemas.gen.ts src/client/types.gen.ts


Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@daryllimyt added the labels engine (Improvements or additions to the workflow engine), build (Build system and package dependency changes), and fix (Bug fix) on May 23, 2026

zeropath-ai Bot commented May 23, 2026

No security or compliance issues detected. Reviewed everything up to bc4c4c0.

Security Overview
Detected Code Changes
Change Type Relevant files
Enhancement ► alembic/versions/b4f8c1d2e3a4_add_registry_version_artifact_hash.py
    Add registry version artifact hash
► frontend/src/client/schemas.gen.ts
    Add artifact_hash field to schema definitions
► frontend/src/client/types.gen.ts
    Add artifact_hash to RegistryLock and RegistryVersionRead types
► tests/unit/executor/test_registry_helpers.py
    Add tests for artifact hash matching in registry artifact URI retrieval
► tests/unit/test_registry_artifacts.py
    Add tests for registry artifact cache key computation with hash fragments
    Add test for download verification using expected hash
► tests/unit/test_registry_lock_service.py
    Add test for preferring artifact hash in lock resolution
► tests/unit/test_registry_platform_startup.py
    Preserve existing artifact hash when reusing objects during promotion
► tests/unit/test_registry_sync_artifact.py
    Add tests for deterministic staging of SquashFS entries
► tests/unit/test_registry_sync_base_service.py
    Return artifact hash in build results
    Include artifact hash in sync results for platform built-ins
► tests/unit/test_registry_sync_runner.py
    Include artifact hash in runner results
► tracecat/admin/registry/schemas.py
    Add artifact_hash to RegistryVersionRead schema
► tracecat/admin/registry/service.py
    Include artifact_hash when listing registry versions
► tracecat/db/models.py
    Add artifact_hash column to BaseRegistryVersion model
► tracecat/executor/backends/registry_helpers.py
    Include origin_fingerprints in get_registry_artifacts_for_lock call
    Update bundled builtin matching logic to consider artifact hashes
► tracecat/executor/registry_artifacts.py
    Add expected_hash to RegistryArtifact dataclass
    Pass expected_sha256 to S3 download function
    Implement registry_artifact_ref and split_registry_artifact_ref functions
    Pass expected_hash to materialize method
► tracecat/executor/service.py
    Add artifact_hash to RegistryArtifactsContext
    Include fingerprint in artifact cache key
    Fetch artifact_hash from database and include in RegistryArtifactsContext
► tracecat/registry/lock/service.py
    Include artifact_hash in platform and custom registry lookups
    Prefer artifact_hash over manifest fingerprint for origin fingerprints in lock resolution
► tracecat/registry/lock/types.py
    Update description for origin_fingerprints in RegistryLock
► tracecat/registry/repositories/schemas.py
    Add artifact_hash to RegistryVersionRead schema
► tracecat/registry/sync/artifact.py
    Add reproducible SquashFS timestamp constant
    Stage entries in deterministic order for reproducible SquashFS
    Add SquashFS timestamp and root mode to mksquashfs arguments
► tracecat/registry/sync/base_service.py
    Add artifact_hash to VersionProtocol
    Return artifact_hash in ArtifactsBuildResult
    Include artifact_hash in BaseSyncResult
    Load prebuilt artifact metadata for builtin registries
► tracecat/registry/sync/runner.py
    Add write_prebuilt_registry_artifact_metadata function
    Include artifact_hash in runner results
► tracecat/config.py
    Set default for TRACECAT__REGISTRY_SYNC_SQUASHFS_PROCESSORS based on CPU count
► tracecat/executor/backends/registry_helpers.py
    Use artifact_hash when loading prebuilt builtin registry metadata
Configuration changes ► tracecat/config.py
    Update default for TRACECAT__REGISTRY_SYNC_SQUASHFS_PROCESSORS
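
The processor-count default described in the configuration change might look roughly like the following. This is a sketch under assumptions: only the environment variable name comes from the PR; the function name `squashfs_processors`, the parsing rules, and the clamp to a minimum of 1 are illustrative and not confirmed by the source.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Variable name as listed in the PR; everything else here is illustrative.
ENV_VAR = "TRACECAT__REGISTRY_SYNC_SQUASHFS_PROCESSORS"


def squashfs_processors(environ: Mapping[str, str] | None = None) -> int:
    """Return the env override if set, else the detected CPU count."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR)
    if raw is not None and raw.strip():
        # Explicit override wins; clamp to at least one processor (assumption).
        return max(1, int(raw))
    # os.cpu_count() can return None, so fall back to a single processor.
    return os.cpu_count() or 1
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.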


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

candidates.append(
    TarballArtifact(
        uri=tarball_uri,
        cache_key=ctx.cache_key,
    )

P1: Enforce hash verification on SquashFS fallback tarball

When a lock carries an expected artifact hash, integrity failures on the primary .squashfs candidate are currently bypassable because the fallback TarballArtifact is created without any expected hash. In materialize(), candidate failures are caught and the next candidate is tried, so a hash mismatch (or other download integrity error) on the squashfs can still lead to executing an unverified tarball sibling if it exists. This weakens the new content-hash verification guarantee for artifact resolution.
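
One way to close the bypass described in this comment is to treat an integrity failure as fatal instead of as a reason to try the next candidate. The sketch below is a simplified, synchronous model, not the PR's `materialize()`: `Candidate`, `ArtifactIntegrityError`, and the `fetch` callable are hypothetical stand-ins for the real artifact classes and download logic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


class ArtifactIntegrityError(Exception):
    """Raised when downloaded content does not match its expected hash."""


@dataclass
class Candidate:
    name: str
    fetch: Callable[[], bytes]  # stand-in for download and unpack


def materialize(candidates: list[Candidate]) -> tuple[str, bytes]:
    """Try candidates in order, but fail closed on integrity errors."""
    last_error: Exception | None = None
    for candidate in candidates:
        try:
            return candidate.name, candidate.fetch()
        except ArtifactIntegrityError:
            # A hash mismatch means the origin is untrusted: do not fall back.
            raise
        except Exception as exc:
            # Other failures (for example a missing object) may still fall back.
            last_error = exc
    raise RuntimeError("no artifact candidate could be materialized") from last_error
```

This covers only the hash-mismatch path. A fallback after a non-integrity failure would still need its own verification or would need to be disabled when a hash is expected.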

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


@cubic-dev-ai cubic-dev-ai Bot left a comment


2 issues found across 29 files

Confidence score: 3/5

  • There is concrete integrity risk in tracecat/executor/registry_artifacts.py: in the TAR_GZ flow, expected_hash is not applied to TarballArtifact, and TarballArtifact.download does not enforce hash verification, so tampered artifacts could be accepted.
  • tracecat/registry/versions/schemas.py currently accepts any artifact_hash string despite SHA-256 documentation, which can let invalid data in and weaken downstream validation guarantees.
  • Given the medium-to-high severity and confidence of the artifact verification findings, this PR should be merged only with caution; it is not a low-risk merge.
  • Pay close attention to tracecat/executor/registry_artifacts.py and tracecat/registry/versions/schemas.py: missing hash enforcement and weak input validation are the main regression and integrity concerns.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tracecat/registry/versions/schemas.py">

<violation number="1" location="tracecat/registry/versions/schemas.py:164">
P2: Validate `artifact_hash` format in the create schema; currently any string is accepted even though it is documented as SHA-256.</violation>
</file>

<file name="tracecat/executor/registry_artifacts.py">

<violation number="1" location="tracecat/executor/registry_artifacts.py:703">
P1: Tarball artifacts don't get hash verification. The `expected_hash` is propagated to `SquashfsArtifact` but not to the primary `TarballArtifact` in the TAR_GZ branch. Additionally, `TarballArtifact.download` never passes `self.expected_hash` to `_download_s3_artifact`, so even if it were set, the hash wouldn't be checked. This leaves the tarball download path (the fallback format) without integrity verification.</violation>
</file>


SquashfsArtifact(
    uri=artifact_uri,
    cache_key=ctx.cache_key,
    expected_hash=expected_hash,

P1: Tarball artifacts don't get hash verification. The expected_hash is propagated to SquashfsArtifact but not to the primary TarballArtifact in the TAR_GZ branch. Additionally, TarballArtifact.download never passes self.expected_hash to _download_s3_artifact, so even if it were set, the hash wouldn't be checked. This leaves the tarball download path (the fallback format) without integrity verification.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tracecat/executor/registry_artifacts.py, line 703:

<comment>Tarball artifacts don't get hash verification. The `expected_hash` is propagated to `SquashfsArtifact` but not to the primary `TarballArtifact` in the TAR_GZ branch. Additionally, `TarballArtifact.download` never passes `self.expected_hash` to `_download_s3_artifact`, so even if it were set, the hash wouldn't be checked. This leaves the tarball download path (the fallback format) without integrity verification.</comment>

<file context>
@@ -658,6 +700,7 @@ async def _artifact_candidates(
                 SquashfsArtifact(
                     uri=artifact_uri,
                     cache_key=ctx.cache_key,
+                    expected_hash=expected_hash,
                 )
             ]
</file context>
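
The check this comment asks for on the download path can be sketched as a streaming SHA-256 comparison. This is an illustration only: `copy_and_verify` and `HashMismatchError` are hypothetical names, and the PR's `_download_s3_artifact` is not shown here, so its actual signature and behavior may differ.

```python
from __future__ import annotations

import hashlib
import hmac
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # read in 1 MiB chunks to bound memory use


class HashMismatchError(Exception):
    """Raised when the streamed content does not match the expected digest."""


def copy_and_verify(src: BinaryIO, dst: BinaryIO, expected_sha256: str | None) -> str:
    """Copy src to dst while hashing; raise if the digest differs."""
    digest = hashlib.sha256()
    while chunk := src.read(CHUNK_SIZE):
        digest.update(chunk)
        dst.write(chunk)
    actual = digest.hexdigest()
    if expected_sha256 is not None and not hmac.compare_digest(
        actual, expected_sha256.lower()
    ):
        raise HashMismatchError(f"expected {expected_sha256}, got {actual}")
    return actual
```

Because bytes are written before the digest is known, a caller would need to discard the destination file when `HashMismatchError` is raised. Applying the same function to every artifact format would keep the tarball path from skipping the check.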

Comment on lines +164 to +167
    artifact_hash: str | None = Field(
        default=None,
        description="SHA-256 content hash of the execution artifact",
    )

P2: Validate artifact_hash format in the create schema; currently any string is accepted even though it is documented as SHA-256.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tracecat/registry/versions/schemas.py, line 164:

<comment>Validate `artifact_hash` format in the create schema; currently any string is accepted even though it is documented as SHA-256.</comment>

<file context>
@@ -161,6 +161,10 @@ class RegistryVersionCreate(BaseModel):
         ...,
         description="S3 URI to the execution artifact for this registry version",
     )
+    artifact_hash: str | None = Field(
+        default=None,
+        description="SHA-256 content hash of the execution artifact",
</file context>
Suggested change
-    artifact_hash: str | None = Field(
-        default=None,
-        description="SHA-256 content hash of the execution artifact",
-    )
+    artifact_hash: str | None = Field(
+        default=None,
+        min_length=64,
+        max_length=64,
+        pattern=r"^[a-fA-F0-9]{64}$",
+        description="SHA-256 content hash of the execution artifact",
+    )
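
The same format rule can be shown with only the standard library, independent of the schema layer. The helper `normalize_sha256` is a hypothetical name, and lowercasing the result is an assumption, not part of the suggestion. Note that the `{64}` quantifier already fixes the length, so the separate `min_length` and `max_length` constraints in the suggestion are redundant with the pattern.

```python
from __future__ import annotations

import re

# Exactly 64 hexadecimal characters, matching a SHA-256 hex digest.
_SHA256_HEX = re.compile(r"[a-fA-F0-9]{64}")


def normalize_sha256(value: str | None) -> str | None:
    """Validate a SHA-256 hex digest and return it lowercased."""
    if value is None:
        return None
    # fullmatch is used because, in Python's re module, "$" also matches
    # just before a trailing newline.
    if _SHA256_HEX.fullmatch(value) is None:
        raise ValueError("artifact_hash must be 64 hexadecimal characters")
    return value.lower()
```

Normalizing case at the boundary means later comparisons between stored and computed digests can be exact string matches.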


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc4c4c0d4b


        expected_hash=expected_hash,
    )
]
if tarball_uri := _tarball_uri_for_squashfs(artifact_uri):

P1: Block unverified tarball fallback after hash-locked squashfs

When a lock carries an artifact_hash, only the SquashfsArtifact candidate gets that expected hash, but a sibling .tar.gz candidate is still queued as fallback. If the squashfs download fails integrity validation (or any other squashfs error) and a tarball object exists, execution can proceed from the tarball without any hash check, which defeats the new content-hash verification guarantee for that origin. This affects environments where legacy tarball siblings are present alongside squashfs artifacts.
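
A possible shape for the fix is to queue the tarball sibling only when no hash is expected, since a hash computed over the SquashFS bytes cannot validate a differently packaged tarball. The sketch below rests on assumptions: `Candidate`, `tarball_sibling`, and the `.squashfs` to `.tar.gz` naming rule are hypothetical stand-ins, and the PR's `_tarball_uri_for_squashfs` may derive the sibling URI differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    uri: str
    expected_hash: str | None = None


def tarball_sibling(squashfs_uri: str) -> str | None:
    """Hypothetical naming rule: swap the .squashfs suffix for .tar.gz."""
    suffix = ".squashfs"
    if squashfs_uri.endswith(suffix):
        return squashfs_uri[: -len(suffix)] + ".tar.gz"
    return None


def artifact_candidates(squashfs_uri: str, expected_hash: str | None) -> list[Candidate]:
    """Build the candidate list, dropping the fallback for hash-locked refs."""
    candidates = [Candidate(squashfs_uri, expected_hash)]
    # Only allow the unverified legacy fallback when nothing is pinned.
    if expected_hash is None and (sibling := tarball_sibling(squashfs_uri)):
        candidates.append(Candidate(sibling))
    return candidates
```

An alternative design would record a separate hash for the tarball and verify it too; that would preserve the fallback at the cost of extra metadata.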
