fix: redact vault secrets from persisted log files by stack72 · Pull Request #437 · systeminit/swamp

stack72 · 2026-02-23T23:39:28Z

Summary

Vault secrets resolved via vault.get() were leaking into log files in plaintext (.swamp/outputs/ and .swamp/workflow-runs/), defeating the purpose of vault encryption
Adds a SecretRedactor that collects resolved vault secret values and replaces them with *** in log output before writing to disk
Redaction is applied only to log files, not data files — data file integrity is preserved for inter-step workflow data flow

Why log-only redaction?

The original plan called for redacting both logs and data files. During implementation we identified that data file redaction would:

Break inter-step workflows — downstream steps reading resource attributes via CEL would get *** instead of actual values
Silently corrupt data — models write data expecting it to be persisted as-is
Cause false positives — common strings matching a secret value would be replaced in unrelated fields

Log files are audit/debug output where secrets should never appear. Data files are the model's responsibility. This narrower scope is the correct fix.

Closes #429

Test plan

New unit tests for SecretRedactor (8 tests covering basic redaction, short secret bypass, multiple secrets, longest-first ordering, JSON-escaped variants, empty state)
All 1868 existing tests pass
UAT test cases filed at systeminit/swamp-uat#26 for end-to-end verification

🤖 Generated with Claude Code

Vault secrets resolved via `vault.get()` CEL expressions were leaking into log files in plaintext, defeating the purpose of vault encryption. This adds automatic redaction at the logging infrastructure layer so secrets are replaced with `***` before being written to disk. ## Approach: log-only redaction The original plan called for redacting secrets from both log files and data files (`.swamp/data/`). During implementation, we identified that data file redaction would silently corrupt model output data and break inter-step workflow data flow — downstream steps reading resource attributes via CEL expressions would receive `***` instead of actual values. It would also cause false positives where common strings matching a secret value get silently replaced in unrelated fields. Log files are audit/debug output where secrets should never appear. Data files are the model's responsibility and their integrity must be preserved for workflow execution. This narrower scope is the correct fix. ## Implementation - New `SecretRedactor` class (`src/domain/secrets/`) collects vault secret values and replaces them with `***` in text. Ignores secrets < 3 chars to prevent false positives, sorts longest-first for substring overlap, and auto-adds JSON-escaped variants. - `ModelResolver.resolveVaultExpressions()` accepts an optional redactor and calls `addSecret()` after each vault secret is resolved. - `ExpressionEvaluationService` threads the redactor through runtime expression resolution methods. - `RunFileSink.register()` accepts an optional redactor per log file registration and applies redaction to log lines before writing. - CLI entry point (`model_method_run.ts`) creates a `SecretRedactor`, passes it to both `runFileSink.register()` and vault resolution. The shared reference ensures the sink redacts using secrets populated during resolution. - Workflow entry point (`execution_service.ts`) creates a `SecretRedactor` per workflow run, threads it through `StepExecutionContext` so each step's vault resolution populates the same redactor used by the workflow log sink. Closes #429 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

stack72 · 2026-02-23T23:41:39Z

Follow-up: data file redaction

This PR intentionally scopes redaction to log files only. The data file problem (secrets persisting in .swamp/data/) is still real but requires a different approach.

Blanket string-replace redaction on data files causes silent data corruption, breaks inter-step workflow data flow, and produces false positives. The right solution is field-level redaction using the sensitive: true Zod metadata already described in design/vaults.md. This lets models explicitly declare which fields contain secrets, so:

Only marked fields are redacted/omitted on disk
In-memory values remain intact for inter-step data flow
No false positives from common strings matching secret values
Display layer (swamp data get) can mask sensitive fields

This is a larger piece of work touching the schema system, data writer, and display layer — it should be tracked as a separate issue.

github-actions

Code Review: Redact Vault Secrets from Persisted Log Files

Verdict: ✅ Approved

This is a well-implemented security fix that prevents vault secrets from leaking into log files. The approach is sound and the code follows project conventions.

Summary of Changes

New SecretRedactor class in src/domain/secrets/ collects resolved vault secret values
Redactor is passed through expression evaluation to capture secrets during vault.get() resolution
RunFileSink applies redaction before writing log lines to disk
Log-only redaction scope is the correct design choice (data files need unredacted values for inter-step workflows)

Compliance with CLAUDE.md ✓

TypeScript strict mode, no any types
Named exports used throughout
AGPLv3 license headers present
Unit tests co-located with source (secret_redactor_test.ts)
Uses @std/assert for assertions

Domain-Driven Design Review ✓

The SecretRedactor is appropriately placed in src/domain/secrets/ as a domain concept. While it's stateful (collecting secrets during execution), this is a pragmatic design for an execution context object. The alternative functional approach would add complexity without clear benefit.

Test Coverage ✓

8 comprehensive unit tests covering:

Basic redaction
Short secret bypass (< 3 chars to prevent false positives)
Multiple secrets
Longest-first ordering (handles substring overlap)
JSON-escaped variants
Empty state handling

Security Review ✓

Correctly addresses vault secret leakage (issue #429)
3-character minimum prevents false positives on common short strings
Longest-first sorting handles abc vs abcdef overlap correctly
JSON-escaped variants handled for secrets appearing in JSON log output
split().join() approach is safe against regex injection

Suggestions (Non-blocking)

Minor comment clarification: Line 34 of secret_redactor.ts says "for JSON data files" but could say "for JSON in log output" to align with the PR description's log-only redaction scope.

Well done on this security fix!

🤖 Reviewed by Claude

stack72 mentioned this pull request Feb 23, 2026

Support marking model output fields as secrets for vault storage #433

Closed

github-actions bot approved these changes Feb 23, 2026

View reviewed changes

stack72 merged commit 848d01a into main Feb 23, 2026
4 checks passed

stack72 deleted the redact-vault-secrets-from-logs branch February 23, 2026 23:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

fix: redact vault secrets from persisted log files#437

fix: redact vault secrets from persisted log files#437
stack72 merged 1 commit intomainfrom
redact-vault-secrets-from-logs

stack72 commented Feb 23, 2026

Uh oh!

stack72 commented Feb 23, 2026

Uh oh!

github-actions bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

stack72 commented Feb 23, 2026

Summary

Why log-only redaction?

Test plan

Uh oh!

stack72 commented Feb 23, 2026

Follow-up: data file redaction

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Code Review: Redact Vault Secrets from Persisted Log Files

Summary of Changes

Compliance with CLAUDE.md ✓

Domain-Driven Design Review ✓

Test Coverage ✓

Security Review ✓

Suggestions (Non-blocking)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant