fix: redact vault secrets from persisted log files#437
Conversation
Vault secrets resolved via `vault.get()` CEL expressions were leaking into log files in plaintext, defeating the purpose of vault encryption. This adds automatic redaction at the logging infrastructure layer so secrets are replaced with `***` before being written to disk. ## Approach: log-only redaction The original plan called for redacting secrets from both log files and data files (`.swamp/data/`). During implementation, we identified that data file redaction would silently corrupt model output data and break inter-step workflow data flow — downstream steps reading resource attributes via CEL expressions would receive `***` instead of actual values. It would also cause false positives where common strings matching a secret value get silently replaced in unrelated fields. Log files are audit/debug output where secrets should never appear. Data files are the model's responsibility and their integrity must be preserved for workflow execution. This narrower scope is the correct fix. ## Implementation - New `SecretRedactor` class (`src/domain/secrets/`) collects vault secret values and replaces them with `***` in text. Ignores secrets < 3 chars to prevent false positives, sorts longest-first for substring overlap, and auto-adds JSON-escaped variants. - `ModelResolver.resolveVaultExpressions()` accepts an optional redactor and calls `addSecret()` after each vault secret is resolved. - `ExpressionEvaluationService` threads the redactor through runtime expression resolution methods. - `RunFileSink.register()` accepts an optional redactor per log file registration and applies redaction to log lines before writing. - CLI entry point (`model_method_run.ts`) creates a `SecretRedactor`, passes it to both `runFileSink.register()` and vault resolution. The shared reference ensures the sink redacts using secrets populated during resolution. - Workflow entry point (`execution_service.ts`) creates a `SecretRedactor` per workflow run, threads it through `StepExecutionContext` so each step's vault resolution populates the same redactor used by the workflow log sink. Closes #429 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Follow-up: data file redactionThis PR intentionally scopes redaction to log files only. The data file problem (secrets persisting in Blanket string-replace redaction on data files causes silent data corruption, breaks inter-step workflow data flow, and produces false positives. The right solution is field-level redaction using the
This is a larger piece of work touching the schema system, data writer, and display layer — it should be tracked as a separate issue. |
There was a problem hiding this comment.
Code Review: Redact Vault Secrets from Persisted Log Files
Verdict: ✅ Approved
This is a well-implemented security fix that prevents vault secrets from leaking into log files. The approach is sound and the code follows project conventions.
Summary of Changes
- New
SecretRedactorclass insrc/domain/secrets/collects resolved vault secret values - Redactor is passed through expression evaluation to capture secrets during
vault.get()resolution RunFileSinkapplies redaction before writing log lines to disk- Log-only redaction scope is the correct design choice (data files need unredacted values for inter-step workflows)
Compliance with CLAUDE.md ✓
- TypeScript strict mode, no
anytypes - Named exports used throughout
- AGPLv3 license headers present
- Unit tests co-located with source (
secret_redactor_test.ts) - Uses
@std/assertfor assertions
Domain-Driven Design Review ✓
The SecretRedactor is appropriately placed in src/domain/secrets/ as a domain concept. While it's stateful (collecting secrets during execution), this is a pragmatic design for an execution context object. The alternative functional approach would add complexity without clear benefit.
Test Coverage ✓
8 comprehensive unit tests covering:
- Basic redaction
- Short secret bypass (< 3 chars to prevent false positives)
- Multiple secrets
- Longest-first ordering (handles substring overlap)
- JSON-escaped variants
- Empty state handling
Security Review ✓
- Correctly addresses vault secret leakage (issue #429)
- 3-character minimum prevents false positives on common short strings
- Longest-first sorting handles
abcvsabcdefoverlap correctly - JSON-escaped variants handled for secrets appearing in JSON log output
split().join()approach is safe against regex injection
Suggestions (Non-blocking)
- Minor comment clarification: Line 34 of
secret_redactor.tssays "for JSON data files" but could say "for JSON in log output" to align with the PR description's log-only redaction scope.
Well done on this security fix!
🤖 Reviewed by Claude
Summary
vault.get()were leaking into log files in plaintext (.swamp/outputs/and.swamp/workflow-runs/), defeating the purpose of vault encryptionSecretRedactorthat collects resolved vault secret values and replaces them with***in log output before writing to diskWhy log-only redaction?
The original plan called for redacting both logs and data files. During implementation we identified that data file redaction would:
***instead of actual valuesLog files are audit/debug output where secrets should never appear. Data files are the model's responsibility. This narrower scope is the correct fix.
Closes #429
Test plan
SecretRedactor(8 tests covering basic redaction, short secret bypass, multiple secrets, longest-first ordering, JSON-escaped variants, empty state)🤖 Generated with Claude Code