Skip to content

Comments

fix: redact vault secrets from persisted log files#437

Merged
stack72 merged 1 commit intomainfrom
redact-vault-secrets-from-logs
Feb 23, 2026
Merged

fix: redact vault secrets from persisted log files#437
stack72 merged 1 commit intomainfrom
redact-vault-secrets-from-logs

Conversation

@stack72
Copy link
Contributor

@stack72 stack72 commented Feb 23, 2026

Summary

  • Vault secrets resolved via vault.get() were leaking into log files in plaintext (.swamp/outputs/ and .swamp/workflow-runs/), defeating the purpose of vault encryption
  • Adds a SecretRedactor that collects resolved vault secret values and replaces them with *** in log output before writing to disk
  • Redaction is applied only to log files, not data files — data file integrity is preserved for inter-step workflow data flow

Why log-only redaction?

The original plan called for redacting both logs and data files. During implementation we identified that data file redaction would:

  • Break inter-step workflows — downstream steps reading resource attributes via CEL would get *** instead of actual values
  • Silently corrupt data — models write data expecting it to be persisted as-is
  • Cause false positives — common strings matching a secret value would be replaced in unrelated fields

Log files are audit/debug output where secrets should never appear. Data files are the model's responsibility. This narrower scope is the correct fix.

Closes #429

Test plan

  • New unit tests for SecretRedactor (8 tests covering basic redaction, short secret bypass, multiple secrets, longest-first ordering, JSON-escaped variants, empty state)
  • All 1868 existing tests pass
  • UAT test cases filed at systeminit/swamp-uat#26 for end-to-end verification

🤖 Generated with Claude Code

Vault secrets resolved via `vault.get()` CEL expressions were leaking into
log files in plaintext, defeating the purpose of vault encryption. This adds
automatic redaction at the logging infrastructure layer so secrets are
replaced with `***` before being written to disk.

## Approach: log-only redaction

The original plan called for redacting secrets from both log files and data
files (`.swamp/data/`). During implementation, we identified that data file
redaction would silently corrupt model output data and break inter-step
workflow data flow — downstream steps reading resource attributes via CEL
expressions would receive `***` instead of actual values. It would also
cause false positives where common strings matching a secret value get
silently replaced in unrelated fields.

Log files are audit/debug output where secrets should never appear. Data
files are the model's responsibility and their integrity must be preserved
for workflow execution. This narrower scope is the correct fix.

## Implementation

- New `SecretRedactor` class (`src/domain/secrets/`) collects vault secret
  values and replaces them with `***` in text. Ignores secrets < 3 chars to
  prevent false positives, sorts longest-first for substring overlap, and
  auto-adds JSON-escaped variants.
- `ModelResolver.resolveVaultExpressions()` accepts an optional redactor and
  calls `addSecret()` after each vault secret is resolved.
- `ExpressionEvaluationService` threads the redactor through runtime
  expression resolution methods.
- `RunFileSink.register()` accepts an optional redactor per log file
  registration and applies redaction to log lines before writing.
- CLI entry point (`model_method_run.ts`) creates a `SecretRedactor`, passes
  it to both `runFileSink.register()` and vault resolution. The shared
  reference ensures the sink redacts using secrets populated during
  resolution.
- Workflow entry point (`execution_service.ts`) creates a `SecretRedactor`
  per workflow run, threads it through `StepExecutionContext` so each step's
  vault resolution populates the same redactor used by the workflow log sink.

Closes #429

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@stack72
Copy link
Contributor Author

stack72 commented Feb 23, 2026

Follow-up: data file redaction

This PR intentionally scopes redaction to log files only. The data file problem (secrets persisting in .swamp/data/) is still real but requires a different approach.

Blanket string-replace redaction on data files causes silent data corruption, breaks inter-step workflow data flow, and produces false positives. The right solution is field-level redaction using the sensitive: true Zod metadata already described in design/vaults.md. This lets models explicitly declare which fields contain secrets, so:

  • Only marked fields are redacted/omitted on disk
  • In-memory values remain intact for inter-step data flow
  • No false positives from common strings matching secret values
  • Display layer (swamp data get) can mask sensitive fields

This is a larger piece of work touching the schema system, data writer, and display layer — it should be tracked as a separate issue.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Redact Vault Secrets from Persisted Log Files

Verdict: ✅ Approved

This is a well-implemented security fix that prevents vault secrets from leaking into log files. The approach is sound and the code follows project conventions.

Summary of Changes

  • New SecretRedactor class in src/domain/secrets/ collects resolved vault secret values
  • Redactor is passed through expression evaluation to capture secrets during vault.get() resolution
  • RunFileSink applies redaction before writing log lines to disk
  • Log-only redaction scope is the correct design choice (data files need unredacted values for inter-step workflows)

Compliance with CLAUDE.md ✓

  • TypeScript strict mode, no any types
  • Named exports used throughout
  • AGPLv3 license headers present
  • Unit tests co-located with source (secret_redactor_test.ts)
  • Uses @std/assert for assertions

Domain-Driven Design Review ✓

The SecretRedactor is appropriately placed in src/domain/secrets/ as a domain concept. While it's stateful (collecting secrets during execution), this is a pragmatic design for an execution context object. The alternative functional approach would add complexity without clear benefit.

Test Coverage ✓

8 comprehensive unit tests covering:

  • Basic redaction
  • Short secret bypass (< 3 chars to prevent false positives)
  • Multiple secrets
  • Longest-first ordering (handles substring overlap)
  • JSON-escaped variants
  • Empty state handling

Security Review ✓

  • Correctly addresses vault secret leakage (issue #429)
  • 3-character minimum prevents false positives on common short strings
  • Longest-first sorting handles abc vs abcdef overlap correctly
  • JSON-escaped variants handled for secrets appearing in JSON log output
  • split().join() approach is safe against regex injection

Suggestions (Non-blocking)

  1. Minor comment clarification: Line 34 of secret_redactor.ts says "for JSON data files" but could say "for JSON in log output" to align with the PR description's log-only redaction scope.

Well done on this security fix!

🤖 Reviewed by Claude

@stack72 stack72 merged commit 848d01a into main Feb 23, 2026
4 checks passed
@stack72 stack72 deleted the redact-vault-secrets-from-logs branch February 23, 2026 23:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Vault secrets are persisted in plaintext in data files and workflow run logs

1 participant