
Add new rego function for blob files #3125

Open

simonbaird wants to merge 7 commits into conforma:main from simonbaird:blob-files-rego-func

Conversation

@simonbaird (Member) commented Feb 25, 2026

The goal is to be able to access the task definitions inside a Tekton task bundle, so we can then apply the task-related policy checks to them.

Ref: https://issues.redhat.com/browse/EC-1681

@simonbaird (Member, Author)

The Konflux failure seems unrelated; I can see this:

Error retrieving pipeline for pipelinerun rhtap-contract-tenant/ec-main-enterprise-contract-mglg2: resolver failed to get Pipeline : error requesting remote resource: timed out waiting for the condition

codecov bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 80.58824% with 33 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
internal/rego/oci/oci.go | 80.58% | 33 Missing ⚠️

Flag | Coverage Δ
acceptance | 55.03% <21.76%> (-0.55%) ⬇️
generative | 18.24% <0.00%> (-0.26%) ⬇️
integration | 27.13% <0.00%> (-0.37%) ⬇️
unit | 68.58% <80.58%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
internal/rego/oci/oci.go | 89.63% <80.58%> (-2.37%) ⬇️

... and 1 file with indirect coverage changes


@simonbaird simonbaird force-pushed the blob-files-rego-func branch 2 times, most recently from 048bb45 to a231832 Compare February 26, 2026 14:18
@simonbaird simonbaird marked this pull request as ready for review February 26, 2026 15:39
@qodo-code-review (Contributor)

Review Summary by Qodo

Add ec.oci.blob_files rego function for blob tar archive extraction

✨ Enhancement


Walkthroughs

Description
• Add new ec.oci.blob_files rego function to extract structured files from OCI blob tar archives
• Support YAML and JSON file parsing within blob archives for policy checks
• Refactor blob digest verification to be optional for compressed layer handling
• Add comprehensive unit tests and documentation for new function
Diagram
flowchart LR
  A["OCI Blob Reference"] -->|ec.oci.blob_files| B["Tar Archive Reader"]
  B -->|Extract Files| C["YAML/JSON Parser"]
  C -->|Convert to JSON| D["Structured File Object"]
  E["File Paths Array"] -->|Filter| B
  D -->|Cache Result| F["Return to Rego Policy"]


File Changes

1. internal/rego/oci/oci.go ✨ Enhancement +242/-11

Implement blob tar archive file extraction rego function

• Add registerOCIBlobFiles() function to register new rego builtin with proper type declarations
• Implement ociBlobFiles() function to extract and parse YAML/JSON files from tar archives within blobs
• Refactor ociBlob() into ociBlobInternal() with optional digest verification parameter
• Add tar archive, path, and strings imports for file extraction functionality
• Add caching and singleflight mechanisms to prevent redundant blob processing

internal/rego/oci/oci.go


2. internal/rego/oci/oci_test.go 🧪 Tests +215/-0

Add unit tests for blob files extraction function

• Add comprehensive TestOCIBlobFiles() test suite with 8 test cases covering success, error, and edge cases
• Test YAML/JSON file extraction, multiple files, missing files, and invalid input handling
• Test malformed JSON/YAML file handling and cache behavior
• Add helper function to create tar archives with computed SHA256 digests for testing

internal/rego/oci/oci_test.go


3. docs/modules/ROOT/pages/ec_oci_blob_files.adoc 📝 Documentation +21/-0

Add documentation for blob files rego function

• Create new documentation page for ec.oci.blob_files function
• Document function signature, parameters, and return type
• Explain usage for extracting structured files from blob tar archives

docs/modules/ROOT/pages/ec_oci_blob_files.adoc


4. docs/modules/ROOT/pages/rego_builtins.adoc 📝 Documentation +2/-0

Update rego builtins reference documentation

• Add reference to new ec.oci.blob_files function in builtin functions table
• Include brief description of function purpose

docs/modules/ROOT/pages/rego_builtins.adoc


5. docs/modules/ROOT/partials/rego_nav.adoc 📝 Documentation +1/-0

Update documentation navigation sidebar

• Add navigation entry for ec.oci.blob_files documentation page

docs/modules/ROOT/partials/rego_nav.adoc


6. go.mod Dependencies +0/-1

Remove unused dependency

• Remove unused github.com/rivo/uniseg dependency

go.mod


7. go.sum Dependencies +6/-40

Update Go module dependencies to latest versions

• Update multiple Google Cloud and related dependencies to newer versions
• Update cloud.google.com/go/auth, longrunning, logging, trace, and monitoring packages
• Update googleapis/gax-go, enterprise-certificate-proxy, and genproto packages
• Update google.golang.org/api and grpc packages
• Update sigstore/rekor and related security packages
• Update various utility packages like go-chi, go-openapi, mattn/go-runewidth, and olekukonko packages

go.sum


8. acceptance/go.sum Additional files +10/-24

...

acceptance/go.sum



qodo-code-review bot commented Feb 26, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (3) 📎 Requirement gaps (0)



Action required

1. Digest verification disabled 📘 Rule violation ⛨ Security
Description
ociBlobFiles() fetches blob content while explicitly disabling digest verification, reducing
integrity validation of external OCI content. This can allow tampered or unexpected blob data to be
processed.
Code

internal/rego/oci/oci.go[R1042-1044]

+		// Get the blob content first (skip digest verification due to compressed/uncompressed mismatch)
+		blobTerm, err := ociBlobInternal(bctx, refTerm, false)
+		if err != nil || blobTerm == nil {
Evidence
The input blob is fetched from an external OCI registry, but the new code path calls
ociBlobInternal(..., false) and documents skipping digest verification, which is a missing
validation step for external input.

Rule 6: Generic: Security-First Input Validation and Data Handling
internal/rego/oci/oci.go[1042-1044]
internal/rego/oci/oci.go[499-507]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ociBlobFiles()` disables digest verification by calling `ociBlobInternal(..., false)`, which reduces validation of external OCI inputs.

## Issue Context
The blob content comes from an external OCI registry. Digest verification is an important integrity check to ensure the fetched content matches the expected reference digest.

## Fix Focus Areas
- internal/rego/oci/oci.go[499-517]
- internal/rego/oci/oci.go[1042-1044]



2. Unbounded tar entry read 📘 Rule violation ⛨ Security
Description
The new tar extraction path reads matched entries using io.ReadAll(archive) without validating
header.Size or enforcing a maximum size. A malicious/large blob can cause excessive memory usage
or denial of service.
Code

internal/rego/oci/oci.go[R1137-1146]

+			// Read the file content
+			data, err := io.ReadAll(archive)
+			if err != nil {
+				logger.WithFields(log.Fields{
+					"action": "read file content",
+					"file":   header.Name,
+					"error":  err,
+				}).Error("failed to read file content")
+				return nil, nil //nolint:nilerr
+			}
Evidence
The code processes untrusted tar content from an OCI blob and performs an unbounded read of each
selected file; the compliance checklist requires validation/handling of boundary values for external
inputs to avoid security and reliability issues.

Rule 6: Generic: Security-First Input Validation and Data Handling
Rule 3: Generic: Robust Error Handling and Edge Case Management
internal/rego/oci/oci.go[1137-1146]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ociBlobFiles()` reads tar entry contents with `io.ReadAll` without any size checks/limits, allowing untrusted OCI blobs to cause large memory allocations.

## Issue Context
OCI blobs are external input. Tar headers provide a `Size`, and the implementation should handle boundary values and enforce safe limits.

## Fix Focus Areas
- internal/rego/oci/oci.go[1102-1160]



3. Cross-builtin cache collision 🐞 Bug ✓ Correctness
Description
ociBlobFiles reuses ComponentCache.imageFilesCache/imageFilesFlight with the same cacheKey
scheme as ociImageFiles (ref + hash(paths)). If both builtins are called with the same ref+paths,
one can return the other's cached result even though their extraction semantics differ.
Code

internal/rego/oci/oci.go[R1019-1035]

+	// Build cache key from ref + paths (hash the paths for a stable key)
+	pathsHash := fmt.Sprintf("%x", sha256.Sum256([]byte(pathsTerm.String())))[:12]
+	cacheKey := refStr + ":" + pathsHash
+
+	// Use component-scoped cache if available, otherwise fall back to global.
+	// Blob files data can be substantial and is unique per component.
+	cc := componentCacheFromContext(bctx.Context)
+
+	// Check cache first (fast path)
+	if cached, found := cc.imageFilesCache.Load(cacheKey); found {
+		logger.Debug("Blob files served from cache")
+		return cached.(*ast.Term), nil
+	}
+
+	// Use singleflight to prevent thundering herd
+	result, err, _ := cc.imageFilesFlight.Do(cacheKey, func() (any, error) {
+		// Double-check cache inside singleflight
Evidence
Both builtins compute cacheKey := refStr + ":" + pathsHash and read/write cc.imageFilesCache
under that key. However, ociImageFiles delegates to files.ImageFiles (directory-based matching
and supported extensions), while ociBlobFiles does exact-path matching on a tar stream and
attempts to parse even unsupported extensions. If keys collide, the cached value can be semantically
wrong for the caller.

internal/rego/oci/oci.go[899-935]
internal/rego/oci/oci.go[1019-1035]
internal/rego/oci/oci.go[956-980]
internal/rego/oci/oci.go[1092-1119]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ociImageFiles` and `ociBlobFiles` share the same cache map (`cc.imageFilesCache`) and compute cache keys the same way (`refStr + ':' + hash(paths)`), but they have different matching/extraction semantics. A cache hit can return the wrong builtin's result.

### Issue Context
Both builtins store `*ast.Term` values, so this may not panic immediately; it can silently return incorrect file objects.

### Fix Focus Areas
- internal/rego/oci/oci.go[899-1001]
- internal/rego/oci/oci.go[1003-1181]

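One common remedy for this kind of collision is to namespace the cache key by builtin, so identical ref+paths inputs can never hit the other function's entry. This is a hypothetical variant of the PR's key scheme, not its actual code:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// cacheKeyFor sketches a namespaced cache key: the builtin name is prefixed
// so ec.oci.blob_files and ec.oci.image_files results can never be confused,
// even for identical ref and paths inputs.
func cacheKeyFor(builtin, ref, pathsRepr string) string {
	pathsHash := fmt.Sprintf("%x", sha256.Sum256([]byte(pathsRepr)))[:12]
	return builtin + ":" + ref + ":" + pathsHash
}
```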



Remediation recommended

4. imageFilesCache used for blobs 📘 Rule violation ✓ Correctness
Description
ociBlobFiles() stores blob-file extraction results in cc.imageFilesCache/cc.imageFilesFlight,
which is misleading and reduces self-documentation. This increases maintenance risk and can cause
incorrect future reuse of cache structures.
Code

internal/rego/oci/oci.go[R1027-1035]

+	// Check cache first (fast path)
+	if cached, found := cc.imageFilesCache.Load(cacheKey); found {
+		logger.Debug("Blob files served from cache")
+		return cached.(*ast.Term), nil
+	}
+
+	// Use singleflight to prevent thundering herd
+	result, err, _ := cc.imageFilesFlight.Do(cacheKey, func() (any, error) {
+		// Double-check cache inside singleflight
Evidence
The compliance checklist requires meaningful, self-documenting identifiers; reusing cache/flight
names that explicitly refer to image files for blob-file extraction is misleading in the newly added
code.

Rule 2: Generic: Meaningful Naming and Self-Documenting Code
internal/rego/oci/oci.go[1027-1035]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ociBlobFiles()` uses `imageFilesCache` and `imageFilesFlight` for caching blob-file extraction results, which is misleading naming.

## Issue Context
The code now supports multiple OCI rego builtins (image files vs blob files). Cache names should clearly reflect what they store to reduce confusion and future bugs.

## Fix Focus Areas
- internal/rego/oci/oci.go[1019-1035]
- internal/rego/oci/oci.go[1085-1090]
- internal/rego/oci/oci.go[1171-1174]



5. Tar path exact match 🐞 Bug ✓ Correctness
Description
ociBlobFiles matches requested paths against tar header.Name verbatim. If the archive stores
names like ./task.yaml (common) or otherwise differs in normalization, requested files won’t be
found and the function will return an empty object.
Code

internal/rego/oci/oci.go[R1116-1118]

+			// Check if this file matches any of our target paths
+			if !targetPathSet[header.Name] {
+				continue
Evidence
The implementation performs an exact string lookup on header.Name and does not normalize either
requested paths or archive paths. In contrast, existing image extraction code normalizes paths using
path.Clean(...) before matching.

internal/rego/oci/oci.go[1067-1075]
internal/rego/oci/oci.go[1116-1119]
internal/fetchers/oci/files/files.go[123-128]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ociBlobFiles` performs exact matching between requested paths and `tar.Header.Name` without normalization. This can cause unexpected misses when archives contain `./` prefixes or other equivalent path forms.

### Issue Context
The existing image file extractor uses `path.Clean(...)` for matching, which is more tolerant.

### Fix Focus Areas
- internal/rego/oci/oci.go[1067-1120]
- internal/fetchers/oci/files/files.go[118-139]

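The normalization the reviewer points at, mirroring what the existing image extractor does, is a one-liner with the standard library. A sketch (applying it to both the requested paths and each tar header name before comparison):

```go
package main

import "path"

// normalizeTarPath sketches the tolerant matching: path.Clean collapses
// "./task.yaml", "dir//task.yaml", and "dir/./task.yaml" to canonical forms,
// so archive entry names and requested paths compare equal despite cosmetic
// differences in how the archive was written.
func normalizeTarPath(p string) string {
	return path.Clean(p)
}
```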


6. No early tar exit 🐞 Bug ➹ Performance
Description
ociBlobFiles scans the entire tar stream even after all requested files are already extracted. For
large bundles, this does unnecessary work and delays evaluation.
Code

internal/rego/oci/oci.go[R1102-1120]

+		extractedFiles := map[string]json.RawMessage{}
+		for {
+			header, err := archive.Next()
+			if err != nil {
+				if err == io.EOF {
+					break
+				}
+				logger.WithFields(log.Fields{
+					"action": "read tar header",
+					"error":  err,
+				}).Error("failed to read tar archive")
+				return nil, nil //nolint:nilerr
+			}
+
+			// Check if this file matches any of our target paths
+			if !targetPathSet[header.Name] {
+				continue
+			}
+
Evidence
The tar loop runs until EOF and never removes found targets or checks whether all requested files
have been satisfied. This is avoidable because paths is known up front and a set is already
maintained.

internal/rego/oci/oci.go[1096-1101]
internal/rego/oci/oci.go[1102-1108]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The tar scan in `ociBlobFiles` continues until EOF even when all requested files have already been found.

### Issue Context
A target set already exists; it can be updated as files are found to support early exit.

### Fix Focus Areas
- internal/rego/oci/oci.go[1096-1161]




@simonbaird (Member, Author)

I'm burning some tokens to get above the CodeCov threshold. New revision coming soon.

@simonbaird simonbaird marked this pull request as draft February 26, 2026 20:31
@simonbaird simonbaird marked this pull request as ready for review February 26, 2026 22:00
@qodo-code-review (Contributor)

Review Summary by Qodo

Add ec.oci.blob_files rego function with defensive limits and comprehensive tests

✨ Enhancement 🧪 Tests


Walkthroughs

Description
• Add new ec.oci.blob_files rego function to extract structured files from blob tar archives
• Implement defensive size limits (500MB) to prevent memory exhaustion attacks
• Refactor blob digest verification logic to support both compressed and uncompressed layers
• Distinguish cache keys between blob and image file operations to avoid theoretical clashes
• Add comprehensive unit tests covering success cases, error handling, and edge cases
• Add make target for automated Go code formatting with gofmt -w
Diagram
flowchart LR
  A["OCI Blob Reference"] -->|ociBlobFiles| B["Tar Archive Reader"]
  B -->|Extract Files| C["Path Matching"]
  C -->|Size Validation| D["YAML/JSON Parsing"]
  D -->|Cache| E["Structured Files Object"]
  F["Memory Limit 500MB"] -.->|Protection| D


File Changes

1. internal/rego/oci/oci.go ✨ Enhancement +270/-12

Add blob_files function with size limits and refactored digest verification

internal/rego/oci/oci.go


2. internal/rego/oci/oci_test.go 🧪 Tests +279/-0

Add comprehensive tests for ociBlobFiles function with edge cases

internal/rego/oci/oci_test.go


3. Makefile ✨ Enhancement +3/-0

Add fmt-all target for automated Go code formatting

Makefile


4. docs/modules/ROOT/pages/ec_oci_blob_files.adoc 📝 Documentation +21/-0

Add documentation for new ec.oci.blob_files rego function

docs/modules/ROOT/pages/ec_oci_blob_files.adoc


5. docs/modules/ROOT/pages/rego_builtins.adoc 📝 Documentation +2/-0

Update rego builtins reference with blob_files function entry

docs/modules/ROOT/pages/rego_builtins.adoc


6. docs/modules/ROOT/partials/rego_nav.adoc 📝 Documentation +1/-0

Add navigation link for blob_files documentation page

docs/modules/ROOT/partials/rego_nav.adoc


7. go.mod Dependencies +0/-1

Remove unused rivo/uniseg dependency from module

go.mod


8. go.sum Dependencies +6/-40

Update dependency checksums and versions for multiple packages

go.sum


9. acceptance/go.sum Dependencies +10/-24

Update acceptance test dependency checksums and versions

acceptance/go.sum



qodo-code-review bot commented Feb 26, 2026

Code Review by Qodo

🐞 Bugs (5) 📘 Rule violations (5) 📎 Requirement gaps (0)



Action required

1. Unverified blob poisons cache 🐞 Bug ⛨ Security ⭐ New
Description
ociBlobFiles fetches blob content via ociBlobInternal(..., verifyDigest=false), but
ociBlobInternal still stores the blob in cc.blobCache keyed only by refStr. Subsequent calls
to ec.oci.blob for the same ref will be served from cache and skip digest verification, making
results depend on evaluation order.
Code

internal/rego/oci/oci.go[R1048-1050]

+		// Get the blob content first (skip digest verification due to compressed/uncompressed mismatch)
+		blobTerm, err := ociBlobInternal(bctx, refTerm, false)
+		if err != nil || blobTerm == nil {
Evidence
ociBlobFiles explicitly disables digest verification when calling ociBlobInternal.
ociBlobInternal checks cc.blobCache before verifying the digest and stores the blob into that
same cache regardless of verifyDigest. This means an unverified fetch can satisfy later verified
fetches from cache with no verification step.

internal/rego/oci/oci.go[1048-1050]
internal/rego/oci/oci.go[440-444]
internal/rego/oci/oci.go[512-523]
internal/rego/oci/oci.go[530-532]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ociBlobFiles` calls `ociBlobInternal(..., verifyDigest=false)` which still stores the blob into the shared `cc.blobCache` keyed only by `refStr`. If `ec.oci.blob` is later called for the same `refStr`, it can return the cached value without performing digest verification, making the result depend on the order of builtin evaluation.

### Issue Context
`cc.blobCache` is checked before any digest verification occurs, and `cc.blobCache.Store(refStr, term)` happens regardless of the `verifyDigest` flag.

### Fix Focus Areas
- internal/rego/oci/oci.go[440-444]
- internal/rego/oci/oci.go[512-532]
- internal/rego/oci/oci.go[1048-1050]

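One way to break the evaluation-order dependence described above is to make the cache key (or the decision to cache at all) reflect whether verification ran. A hypothetical sketch; the PR's actual cache is keyed by refStr alone, and the prefixes here are illustrative:

```go
package main

// blobCacheKey sketches the remediation: include the verification mode in the
// cache key so an unverified fetch can never satisfy a later verified lookup.
// An equivalent alternative is to only Store blobs whose digest was verified.
func blobCacheKey(refStr string, verified bool) string {
	if verified {
		return "verified:" + refStr
	}
	return "unverified:" + refStr
}
```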


2. Buffers entire layer in memory 🐞 Bug ⛯ Reliability ⭐ New
Description
ociBlobFiles fetches the whole uncompressed blob into memory (via ociBlobInternal), then
constructs a tar.Reader over the in-memory string. This defeats the per-entry size guard
(maxTarEntrySize) because the process can OOM while buffering/decompressing the full layer before
any tar-entry checks run.
Code

internal/rego/oci/oci.go[R1048-1101]

+		// Get the blob content first (skip digest verification due to compressed/uncompressed mismatch)
+		blobTerm, err := ociBlobInternal(bctx, refTerm, false)
+		if err != nil || blobTerm == nil {
+			logger.WithFields(log.Fields{
+				"action": "fetch blob",
+				"error":  err,
+			}).Error("failed to fetch blob content")
+			return nil, nil //nolint:nilerr
+		}
+
+		blobContent, ok := blobTerm.Value.(ast.String)
+		if !ok {
+			logger.Error("blob content is not a string")
+			return nil, nil
+		}
+
+		pathsArray, err := builtins.ArrayOperand(pathsTerm.Value, 1)
+		if err != nil {
+			logger.WithFields(log.Fields{
+				"action": "convert paths",
+				"error":  err,
+			}).Error("failed to convert paths to array operand")
+			return nil, nil //nolint:nilerr
+		}
+
+		// Collect target paths for exact file matching
+		var targetPaths []string
+		err = pathsArray.Iter(func(pathTerm *ast.Term) error {
+			pathString, ok := pathTerm.Value.(ast.String)
+			if !ok {
+				return fmt.Errorf("path is not a string: %#v", pathTerm)
+			}
+			targetPaths = append(targetPaths, string(pathString))
+			return nil
+		})
+		if err != nil {
+			logger.WithFields(log.Fields{
+				"action": "iterate paths",
+				"error":  err,
+			}).Error("failed iterating paths")
+			return nil, nil //nolint:nilerr
+		}
+
+		if len(targetPaths) == 0 {
+			logger.Debug("No paths specified, returning empty result")
+			term := ast.NewTerm(ast.NewObject())
+			cc.imageFilesCache.Store(cacheKey, term)
+			return term, nil
+		}
+
+		// Create a tar reader from the blob content
+		blobReader := strings.NewReader(string(blobContent))
+		archive := tar.NewReader(blobReader)
+
Evidence
ociBlobInternal uncompresses the layer and copies it fully into a bytes.Buffer, then returns it
as an ast.String. ociBlobFiles then wraps that full string with strings.NewReader to parse the
tar. The entry-size check only happens later while iterating tar headers, so it cannot protect
against large overall blobs.

internal/rego/oci/oci.go[473-491]
internal/rego/oci/oci.go[1098-1101]
internal/rego/oci/oci.go[1143-1150]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ociBlobFiles` currently loads the entire uncompressed layer into memory (via `ociBlobInternal`) before parsing it as a tar archive. This can OOM on large layers even if individual file entries are limited.

### Issue Context
The per-entry `maxTarEntrySize` check happens after the entire blob has already been copied into a `bytes.Buffer`.

### Fix Focus Areas
- internal/rego/oci/oci.go[473-532]
- internal/rego/oci/oci.go[1048-1101]
- internal/rego/oci/oci.go[1143-1163]



3. Digest verification disabled 📘 Rule violation ⛨ Security
Description
ociBlobFiles() fetches blob content while explicitly disabling digest verification, reducing
integrity validation of external OCI content. This can allow tampered or unexpected blob data to be
processed.
Code

internal/rego/oci/oci.go[R1042-1044]

+		// Get the blob content first (skip digest verification due to compressed/uncompressed mismatch)
+		blobTerm, err := ociBlobInternal(bctx, refTerm, false)
+		if err != nil || blobTerm == nil {
Evidence
The input blob is fetched from an external OCI registry, but the new code path calls
ociBlobInternal(..., false) and documents skipping digest verification, which is a missing
validation step for external input.

Rule 6: Generic: Security-First Input Validation and Data Handling
internal/rego/oci/oci.go[1042-1044]
internal/rego/oci/oci.go[499-507]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ociBlobFiles()` disables digest verification by calling `ociBlobInternal(..., false)`, which reduces validation of external OCI inputs.
## Issue Context
The blob content comes from an external OCI registry. Digest verification is an important integrity check to ensure the fetched content matches the expected reference digest.
## Fix Focus Areas
- internal/rego/oci/oci.go[499-517]
- internal/rego/oci/oci.go[1042-1044]



4. Unbounded tar entry read 📘 Rule violation ⛨ Security
Description
The new tar extraction path reads matched entries using io.ReadAll(archive) without validating
header.Size or enforcing a maximum size. A malicious/large blob can cause excessive memory usage
or denial of service.
Code

internal/rego/oci/oci.go[R1137-1146]

+			// Read the file content
+			data, err := io.ReadAll(archive)
+			if err != nil {
+				logger.WithFields(log.Fields{
+					"action": "read file content",
+					"file":   header.Name,
+					"error":  err,
+				}).Error("failed to read file content")
+				return nil, nil //nolint:nilerr
+			}
Evidence
The code processes untrusted tar content from an OCI blob and performs an unbounded read of each
selected file; the compliance checklist requires validation/handling of boundary values for external
inputs to avoid security and reliability issues.

Rule 6: Generic: Security-First Input Validation and Data Handling
Rule 3: Generic: Robust Error Handling and Edge Case Management
internal/rego/oci/oci.go[1137-1146]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ociBlobFiles()` reads tar entry contents with `io.ReadAll` without any size checks/limits, allowing untrusted OCI blobs to cause large memory allocations.
## Issue Context
OCI blobs are external input. Tar headers provide a `Size`, and the implementation should handle boundary values and enforce safe limits.
## Fix Focus Areas
- internal/rego/oci/oci.go[1102-1160]



5. Cross-builtin cache collision 🐞 Bug ✓ Correctness
Description
ociBlobFiles reuses ComponentCache.imageFilesCache/imageFilesFlight with the same cacheKey
scheme as ociImageFiles (ref + hash(paths)). If both builtins are called with the same ref+paths,
one can return the other's cached result even though their extraction semantics differ.
Code

internal/rego/oci/oci.go[R1019-1035]

+	// Build cache key from ref + paths (hash the paths for a stable key)
+	pathsHash := fmt.Sprintf("%x", sha256.Sum256([]byte(pathsTerm.String())))[:12]
+	cacheKey := refStr + ":" + pathsHash
+
+	// Use component-scoped cache if available, otherwise fall back to global.
+	// Blob files data can be substantial and is unique per component.
+	cc := componentCacheFromContext(bctx.Context)
+
+	// Check cache first (fast path)
+	if cached, found := cc.imageFilesCache.Load(cacheKey); found {
+		logger.Debug("Blob files served from cache")
+		return cached.(*ast.Term), nil
+	}
+
+	// Use singleflight to prevent thundering herd
+	result, err, _ := cc.imageFilesFlight.Do(cacheKey, func() (any, error) {
+		// Double-check cache inside singleflight
Evidence
Both builtins compute cacheKey := refStr + ":" + pathsHash and read/write cc.imageFilesCache
under that key. However, ociImageFiles delegates to files.ImageFiles (directory-based matching
and supported extensions), while ociBlobFiles does exact-path matching on a tar stream and
attempts to parse even unsupported extensions. If keys collide, the cached value can be semantically
wrong for the caller.

internal/rego/oci/oci.go[899-935]
internal/rego/oci/oci.go[1019-1035]
internal/rego/oci/oci.go[956-980]
internal/rego/oci/oci.go[1092-1119]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ociImageFiles` and `ociBlobFiles` share the same cache map (`cc.imageFilesCache`) and compute cache keys the same way (`refStr + ':' + hash(paths)`), but they have different matching/extraction semantics. A cache hit can return the wrong builtin's result.
### Issue Context
Both builtins store `*ast.Term` values, so this may not panic immediately; it can silently return incorrect file objects.
### Fix Focus Areas
- internal/rego/oci/oci.go[899-1001]
- internal/rego/oci/oci.go[1003-1181]




Remediation recommended

6. ref logged unsanitized 📘 Rule violation ⛨ Security ⭐ New
Description
The new ociBlobFiles() code logs the raw ref value, which may include sensitive registry
information (e.g., embedded credentials) and would be written to logs. This violates the requirement
to avoid sensitive data in logs at any level.
Code

internal/rego/oci/oci.go[R1017-1019]

+	refStr := string(uri)
+	logger = logger.WithField("ref", refStr)
+
Evidence
PR Compliance ID 5 forbids sensitive data in logs; the added code logs the full OCI reference string
via WithField("ref", refStr).

Rule 5: Generic: Secure Logging Practices
internal/rego/oci/oci.go[1017-1019]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ociBlobFiles()` logs the raw OCI reference (`refStr`) which could contain sensitive information (e.g., embedded credentials). This violates secure logging requirements.

## Issue Context
Even debug logs must not contain sensitive data.

## Fix Focus Areas
- internal/rego/oci/oci.go[1017-1019]

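OCI references don't normally embed credentials, but if a defensive sanitizer is wanted before logging, a heuristic sketch (entirely hypothetical, not from the PR) could mask userinfo-style secrets while leaving digest suffixes alone:

```go
package main

import "strings"

// redactRef sketches a conservative log sanitizer: if the reference appears
// to contain userinfo of the form user:secret@host, mask the secret; a bare
// "@sha256:..." digest suffix is left untouched. This is a heuristic, not a
// full reference parser.
func redactRef(ref string) string {
	at := strings.Index(ref, "@")
	if at < 0 || strings.HasPrefix(ref[at:], "@sha256:") {
		return ref
	}
	colon := strings.Index(ref[:at], ":")
	if colon < 0 {
		return ref
	}
	return ref[:colon+1] + "***" + ref[at:]
}
```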


7. Snake_case variables in Go 📘 Rule violation ✓ Correctness ⭐ New
Description
New variables expected_digest and computed_digest use snake_case, reducing readability and
deviating from Go naming conventions. This violates the requirement for meaningful, self-documenting
identifiers.
Code

internal/rego/oci/oci.go[R512-520]

+		expected_digest := ref.DigestStr()
+		if verifyDigest {
+			computed_digest := fmt.Sprintf("sha256:%x", hasher.Sum(nil))
+			if computed_digest != expected_digest {
+				logger.WithFields(log.Fields{
+					"action":          "verify digest",
+					"computed_digest": computed_digest,
+					"expected_digest": expected_digest,
+				}).Error("computed digest does not match expected digest")
Evidence
PR Compliance ID 2 requires clear, readable naming; the added identifiers use snake_case
(expected_digest, computed_digest) rather than conventional Go camelCase, harming consistency
and self-documentation.

Rule 2: Generic: Meaningful Naming and Self-Documenting Code
internal/rego/oci/oci.go[512-520]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Snake_case variable names (`expected_digest`, `computed_digest`) were introduced in Go code, reducing readability and consistency.

## Issue Context
Go style and self-documenting naming conventions prefer camelCase for local variables.

## Fix Focus Areas
- internal/rego/oci/oci.go[512-522]
- internal/rego/oci/oci.go[525-528]



8. imageFilesCache used for blobs 📘 Rule violation ✓ Correctness
Description
ociBlobFiles() stores blob-file extraction results in cc.imageFilesCache/cc.imageFilesFlight,
which is misleading and reduces self-documentation. This increases maintenance risk and can cause
incorrect future reuse of cache structures.
Code

internal/rego/oci/oci.go[R1027-1035]

+	// Check cache first (fast path)
+	if cached, found := cc.imageFilesCache.Load(cacheKey); found {
+		logger.Debug("Blob files served from cache")
+		return cached.(*ast.Term), nil
+	}
+
+	// Use singleflight to prevent thundering herd
+	result, err, _ := cc.imageFilesFlight.Do(cacheKey, func() (any, error) {
+		// Double-check cache inside singleflight
Evidence
The compliance checklist requires meaningful, self-documenting identifiers; reusing cache/flight
names that explicitly refer to image files for blob-file extraction is misleading in the newly added
code.

Rule 2: Generic: Meaningful Naming and Self-Documenting Code
internal/rego/oci/oci.go[1027-1035]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ociBlobFiles()` uses `imageFilesCache` and `imageFilesFlight` for caching blob-file extraction results, which is misleading naming.
## Issue Context
The code now supports multiple OCI rego builtins (image files vs blob files). Cache names should clearly reflect what they store to reduce confusion and future bugs.
## Fix Focus Areas
- internal/rego/oci/oci.go[1019-1035]
- internal/rego/oci/oci.go[1085-1090]
- internal/rego/oci/oci.go[1171-1174]

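For context, the cache-plus-singleflight pattern this finding refers to can be reduced to a minimal sketch. This is a toy re-creation of the idea behind `golang.org/x/sync/singleflight` combined with a result cache, not the project's implementation; the `group` type and key string are invented here:

```go
package main

import (
	"fmt"
	"sync"
)

type call struct {
	wg  sync.WaitGroup
	val any
	err error
}

// group merges the two layers described above (result cache +
// singleflight): concurrent or repeated Do calls with the same key
// run fn at most once and share its result.
type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

func (g *group) Do(key string, fn func() (any, error)) (any, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = map[string]*call{}
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // share the in-flight (or cached) result
		return c.val, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()
	return c.val, c.err
}

func main() {
	var g group
	calls := 0
	fetch := func() (any, error) { calls++; return "blob-bytes", nil }
	v1, _ := g.Do("registry.io/repo@sha256:abc", fetch)
	v2, _ := g.Do("registry.io/repo@sha256:abc", fetch)
	fmt.Println(v1, v2, calls) // blob-bytes blob-bytes 1
}
```

Whatever the cache is named, the fetch function runs once per key; the naming complaint is only that the shared structure should not imply it holds image files exclusively.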


9. Tar path exact match 🐞 Bug ✓ Correctness
Description
ociBlobFiles matches requested paths against tar header.Name verbatim. If the archive stores
names like ./task.yaml (common) or otherwise differs in normalization, requested files won’t be
found and the function will return an empty object.
Code

internal/rego/oci/oci.go[R1116-1118]

+			// Check if this file matches any of our target paths
+			if !targetPathSet[header.Name] {
+				continue
Evidence
The implementation performs an exact string lookup on header.Name and does not normalize either
requested paths or archive paths. In contrast, existing image extraction code normalizes paths using
path.Clean(...) before matching.

internal/rego/oci/oci.go[1067-1075]
internal/rego/oci/oci.go[1116-1119]
internal/fetchers/oci/files/files.go[123-128]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ociBlobFiles` performs exact matching between requested paths and `tar.Header.Name` without normalization. This can cause unexpected misses when archives contain `./` prefixes or other equivalent path forms.
### Issue Context
The existing image file extractor uses `path.Clean(...)` for matching, which is more tolerant.
### Fix Focus Areas
- internal/rego/oci/oci.go[1067-1120]
- internal/fetchers/oci/files/files.go[118-139]

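One way to make the match tolerant, mirroring the `path.Clean(...)` approach the existing extractor uses, is to normalize both the requested paths and the archive entry names before comparing. A sketch (`normalizeTarPath` is a name invented here):

```go
package main

import (
	"fmt"
	"path"
)

// normalizeTarPath anchors a tar entry name at "/" and cleans it, so
// "task.yaml", "./task.yaml" and "/task.yaml" all normalize to the
// same lookup key.
func normalizeTarPath(name string) string {
	return path.Clean("/" + name)
}

func main() {
	targets := map[string]bool{normalizeTarPath("task.yaml"): true}
	for _, name := range []string{"./task.yaml", "task.yaml", "/task.yaml"} {
		fmt.Println(name, targets[normalizeTarPath(name)]) // all true
	}
}
```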


10. No early tar exit 🐞 Bug ➹ Performance
Description
ociBlobFiles scans the entire tar stream even after all requested files are already extracted. For
large bundles, this does unnecessary work and delays evaluation.
Code

internal/rego/oci/oci.go[R1102-1120]

+		extractedFiles := map[string]json.RawMessage{}
+		for {
+			header, err := archive.Next()
+			if err != nil {
+				if err == io.EOF {
+					break
+				}
+				logger.WithFields(log.Fields{
+					"action": "read tar header",
+					"error":  err,
+				}).Error("failed to read tar archive")
+				return nil, nil //nolint:nilerr
+			}
+
+			// Check if this file matches any of our target paths
+			if !targetPathSet[header.Name] {
+				continue
+			}
+
Evidence
The tar loop runs until EOF and never removes found targets or checks whether all requested files
have been satisfied. This is avoidable because paths is known up front and a set is already
maintained.

internal/rego/oci/oci.go[1096-1101]
internal/rego/oci/oci.go[1102-1108]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The tar scan in `ociBlobFiles` continues until EOF even when all requested files have already been found.
### Issue Context
A target set already exists; it can be updated as files are found to support early exit.
### Fix Focus Areas
- internal/rego/oci/oci.go[1096-1161]




Comment on lines +1048 to +1050
// Get the blob content first (skip digest verification due to compressed/uncompressed mismatch)
blobTerm, err := ociBlobInternal(bctx, refTerm, false)
if err != nil || blobTerm == nil {
qodo-code-review (Contributor):

Action required

1. Unverified blob poisons cache 🐞 Bug ⛨ Security

ociBlobFiles fetches blob content via ociBlobInternal(..., verifyDigest=false), but
ociBlobInternal still stores the blob in cc.blobCache keyed only by refStr. Subsequent calls
to ec.oci.blob for the same ref will be served from cache and skip digest verification, making
results depend on evaluation order.
Agent Prompt
### Issue description
`ociBlobFiles` calls `ociBlobInternal(..., verifyDigest=false)` which still stores the blob into the shared `cc.blobCache` keyed only by `refStr`. If `ec.oci.blob` is later called for the same `refStr`, it can return the cached value without performing digest verification, making the result depend on the order of builtin evaluation.

### Issue Context
`cc.blobCache` is checked before any digest verification occurs, and `cc.blobCache.Store(refStr, term)` happens regardless of the `verifyDigest` flag.

### Fix Focus Areas
- internal/rego/oci/oci.go[440-444]
- internal/rego/oci/oci.go[512-532]
- internal/rego/oci/oci.go[1048-1050]

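One simple mitigation, sketched below under the assumption that the cache is keyed by a string (the key format and `blobCacheKey` name are invented here, not the PR's code), is to fold the verification state into the cache key so an unverified fetch can never satisfy a later verified lookup:

```go
package main

import "fmt"

// blobCacheKey includes the verification state, so a blob stored by
// an unverified fetch is never returned for a lookup that expects
// digest verification.
func blobCacheKey(ref string, verified bool) string {
	return fmt.Sprintf("%s|verified=%t", ref, verified)
}

func main() {
	a := blobCacheKey("registry.io/repo@sha256:abc", true)
	b := blobCacheKey("registry.io/repo@sha256:abc", false)
	fmt.Println(a == b) // false: the two entries cannot collide
}
```

An alternative is to only ever store verified blobs in the shared cache, or to re-verify on every cache hit; the keyed approach avoids repeated hashing at the cost of a possible duplicate fetch.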

simonbaird (Member Author) replied:
Hmm, I guess we could check the digest after every cache fetch. Not sure how much I care about it.

qodo-code-review (Contributor) replied:
Question:

Hmm, I guess we could check the digest after every cache fetch. Not sure how much I care about it.

Answer:

If you want Qodo Merge to be less “pushy” on items like this (where you may accept the risk / tradeoff), add pragmatic guidance to the improve suggestions prompt.

Put this in your .pr_agent.toml (or wiki config) to make security/cache-order findings come with a brief risk assessment + options, and to avoid insisting on fixing low-severity issues:

[pr_code_suggestions]
extra_instructions = """\
- When raising Security/Bug findings, include: (a) realistic exploitability, (b) likelihood, (c) impact, and (d) whether results depend on evaluation order.
- If a fix has performance/caching tradeoffs (e.g., verifying digest after every cache hit), propose at least 2 alternatives (e.g., cache keys include verifyDigest/digest; store verified-only; verify on cache fetch; separate caches) and clearly label pros/cons.
- Do not insist on changing behavior when the issue is low likelihood/low impact; explicitly mark it as "optional hardening" unless you can justify a concrete exploit path.
- Prefer concise recommendations and avoid repeating the same concern across multiple suggestions.
"""

If you meant this should never be flagged, you can further refine the instruction to treat “optional hardening” items as low-priority (or increase your filtering using pr_code_suggestions.suggestions_score_threshold), but you’ll need to decide your team’s acceptable risk level to tune that accurately.


Comment on lines 1048 to 1101
// Get the blob content first (skip digest verification due to compressed/uncompressed mismatch)
blobTerm, err := ociBlobInternal(bctx, refTerm, false)
if err != nil || blobTerm == nil {
logger.WithFields(log.Fields{
"action": "fetch blob",
"error": err,
}).Error("failed to fetch blob content")
return nil, nil //nolint:nilerr
}

blobContent, ok := blobTerm.Value.(ast.String)
if !ok {
logger.Error("blob content is not a string")
return nil, nil
}

pathsArray, err := builtins.ArrayOperand(pathsTerm.Value, 1)
if err != nil {
logger.WithFields(log.Fields{
"action": "convert paths",
"error": err,
}).Error("failed to convert paths to array operand")
return nil, nil //nolint:nilerr
}

// Collect target paths for exact file matching
var targetPaths []string
err = pathsArray.Iter(func(pathTerm *ast.Term) error {
pathString, ok := pathTerm.Value.(ast.String)
if !ok {
return fmt.Errorf("path is not a string: %#v", pathTerm)
}
targetPaths = append(targetPaths, string(pathString))
return nil
})
if err != nil {
logger.WithFields(log.Fields{
"action": "iterate paths",
"error": err,
}).Error("failed iterating paths")
return nil, nil //nolint:nilerr
}

if len(targetPaths) == 0 {
logger.Debug("No paths specified, returning empty result")
term := ast.NewTerm(ast.NewObject())
cc.imageFilesCache.Store(cacheKey, term)
return term, nil
}

// Create a tar reader from the blob content
blobReader := strings.NewReader(string(blobContent))
archive := tar.NewReader(blobReader)

qodo-code-review (Contributor):

Action required

2. Buffers entire layer in memory 🐞 Bug ⛯ Reliability

ociBlobFiles fetches the whole uncompressed blob into memory (via ociBlobInternal), then
constructs a tar.Reader over the in-memory string. This defeats the per-entry size guard
(maxTarEntrySize) because the process can OOM while buffering/decompressing the full layer before
any tar-entry checks run.
Agent Prompt
### Issue description
`ociBlobFiles` currently loads the entire uncompressed layer into memory (via `ociBlobInternal`) before parsing it as a tar archive. This can OOM on large layers even if individual file entries are limited.

### Issue Context
The per-entry `maxTarEntrySize` check happens after the entire blob has already been copied into a `bytes.Buffer`.

### Fix Focus Areas
- internal/rego/oci/oci.go[473-532]
- internal/rego/oci/oci.go[1048-1101]
- internal/rego/oci/oci.go[1143-1163]


simonbaird (Member Author) replied:
Yeah, this is true. I don't want to fix it, but I'll add a comment.

st3penta previously approved these changes Feb 27, 2026

@st3penta left a comment:

LGTM, I just left a comment about a possible duplicate.


simonbaird commented Feb 27, 2026

A couple of Qodo suggestions make sense. I'm pushing a couple more tweaks.

simonbaird and others added 5 commits February 27, 2026 12:04
I guess our Renovate PRs don't do this.

Unrelated cleanup while working on the PR for....

Ref: https://issues.redhat.com/browse/EC-1681
The goal is to be able to access the task definitions inside a
Tekton task bundle, so we can then apply task definition related
policy checks to them, but I think this is a generally useful
addition to our custom oci functions.

Ref: https://issues.redhat.com/browse/EC-1681
Co-authored-by: Claude Code <noreply@anthropic.com>
Introduce a limit to avoid potential memory exhaustion attacks.

Honestly I'm not sure how important this is, but I guess it's fine.
It was suggested in code review by Qodo.

We actually did have a limit in the past, but it was too small and
caused problems with large SBOMs. I hesitated to bring back a limit;
however, the context is a little different (ociBlob vs ociBlobFile),
and I'm making the limit 500MB instead of 10MB.

Ref: https://issues.redhat.com/browse/EC-1681
Co-authored-by: Claude Code <noreply@anthropic.com>
Multiple LLMs are worried about using the same cache key for two
potentially different things. Let's make it clear that it's not a
problem.

In reality I don't think it could be a problem, since the cache key
includes the image ref, which in the case of a blob will always have
a digest. But, I figure there's no harm in making the code intent
clear, and avoid having to explain that subtle detail to the bots.

Ref: https://issues.redhat.com/browse/EC-1681
Co-authored-by: Claude Code <noreply@anthropic.com>
Sometimes Claude doesn't format everything so I want an easy way to
fix the formatting. There might be a better way to do it (and maybe
make lintfix will do it?), but this works for me.

Unrelated tweak while working on...

Ref: https://issues.redhat.com/browse/EC-1681
@simonbaird

Remove the snake case vars, and rename the cache vars to indicate they're used for image files and blob files now. Also added a comment about the limiting. The other Qodo suggestions I think I'm comfortable ignoring for now.

simonbaird and others added 2 commits February 27, 2026 12:13
Well the bot wanted the limit checks, then the bot pointed out that
they aren't very effective. I don't want to add limit checks on all
the oci fetches, but I also don't want to re-remove what we have
here, so let's at least point out that its value is questionable.

Ref: https://issues.redhat.com/browse/EC-1681
Since we're using the cache for both image files and blob files,
let's stop calling it imageFilesCache. Same thing for single flight.

Suggested by Qodo in code review.

Co-authored-by: Claude Code <noreply@anthropic.com>
Ref: https://issues.redhat.com/browse/EC-1681