ENT-231: Secrets redaction #166
Secrets (high-entropy strings like API keys) were not being redacted before being persisted to the entire/sessions metadata branch, meaning they could end up in permanent git history. This applies the Shannon entropy-based redaction from the redact package to every content write path in WriteCommitted.

Redaction points in committed.go:
- Transcripts (JSONL-aware, before chunking so content hash is correct)
- User prompts (plain string redaction)
- Context files (plain byte redaction)
- Subagent transcripts (JSONL-aware)
- Incremental checkpoint data (may contain tool input payloads)
- copyMetadataDir files (JSONL-aware for .jsonl, plain for others)

Also fixes and cleans up the redact package:
- Fix compiler error: redactString (unexported) → String (exported)
- Rename exports to follow Go convention (avoid stutter): RedactString → String, RedactBytes → Bytes, etc.
- Add Bytes/JSONLBytes convenience wrappers for []byte call sites
- Remove unused findSecrets and scanJSONValue functions

Not redacted (by design): metadata.json files (structured operational data only), shadow branch writes, and working directory source files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 2373a92c6c40
Pull request overview
This PR introduces secrets redaction to prevent API keys and sensitive data from being committed to the entire/checkpoints/v1 branch. It implements entropy-based detection (Shannon entropy > 4.5) on 10+ character alphanumeric strings to identify potential secrets, which are then replaced with [REDACTED] before storage.
Changes:
- New `redact` package with entropy-based secret detection for plain text and JSONL content
- Integration of redaction at all checkpoint write paths (transcripts, prompts, context, metadata files)
- Protection for specific fields (IDs, signatures) and image objects from redaction
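To make the JSONL-aware handling and the field/object protection concrete, here is a rough sketch, not the code from this PR. The names `skipKey`, `redactValue`, `redactJSONL`, and `demoRedact` are hypothetical, as are the exact skip rules (keys ending in `id`, a `signature` key, and objects with `"type": "image"`). The secret detector is passed in as a function so the sketch stays independent of the entropy logic.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// skipKey reports whether a field's value should be left untouched.
// Assumption: the real rules live in the redact package and may differ.
func skipKey(k string) bool {
	k = strings.ToLower(k)
	return k == "signature" || strings.HasSuffix(k, "id")
}

// redactValue walks a decoded JSON value and redacts string leaves,
// leaving protected fields and image objects alone.
func redactValue(v any, redact func(string) string) any {
	switch t := v.(type) {
	case string:
		return redact(t)
	case []any:
		for i := range t {
			t[i] = redactValue(t[i], redact)
		}
		return t
	case map[string]any:
		if t["type"] == "image" {
			return t // assumed marker for image objects
		}
		for k, child := range t {
			if !skipKey(k) {
				t[k] = redactValue(child, redact)
			}
		}
		return t
	}
	return v // numbers, booleans, null
}

// redactJSONL redacts each line as JSON; lines that do not parse
// fall back to plain string redaction.
func redactJSONL(data []byte, redact func(string) string) []byte {
	lines := bytes.Split(data, []byte("\n"))
	for i, line := range lines {
		if len(bytes.TrimSpace(line)) == 0 {
			continue
		}
		dec := json.NewDecoder(bytes.NewReader(line))
		dec.UseNumber() // keep numbers as written instead of float64
		var v any
		if err := dec.Decode(&v); err != nil || dec.More() {
			lines[i] = []byte(redact(string(line)))
			continue
		}
		out, err := json.Marshal(redactValue(v, redact))
		if err != nil {
			lines[i] = []byte(redact(string(line)))
			continue
		}
		lines[i] = out
	}
	return bytes.Join(lines, []byte("\n"))
}

// demoRedact is a stand-in detector used only for this example.
func demoRedact(s string) string {
	return strings.ReplaceAll(s, "SECRET", "REDACTED")
}

func main() {
	in := []byte(`{"id":"SECRET","text":"my SECRET"}`)
	fmt.Println(string(redactJSONL(in, demoRedact)))
}
```

One caveat with this approach: re-marshalling through `map[string]any` sorts object keys and may change escaping, so a real implementation that needs byte-stable output would probably replace values in the raw line instead.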
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| redact/redact.go | Core redaction logic with Shannon entropy calculation and JSONL-aware replacement |
| redact/redact_test.go | Unit tests for basic redaction functionality |
| cmd/entire/cli/checkpoint/committed.go | Integration of redaction into all checkpoint write operations and new createRedactedBlobFromFile function |
| cmd/entire/cli/checkpoint/checkpoint_test.go | Integration tests verifying redaction across different checkpoint types |
Entire-Checkpoint: 0dbd56975036
Entire-Checkpoint: 2373a92c6c40
I suspect this case shouldn't be triggered, but oh well. Entire-Checkpoint: 2373a92c6c40
Entire-Checkpoint: 530a376fb9e2
Entire-Checkpoint: 6e06350ae701
Entire-Checkpoint: 2bb8c088bdb4
Entire-Checkpoint: 4de153fa6df6
We might replace functions or filenames and a string without square brackets is safer.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
I've tested this with rewind manually, and the session seems to resume okay. Not sure what Claude's memory looks like afterwards, but I guess let's see what general usage tells us.
Description
Claude happily stores scary secrets in session data, which we persist in the `entire/checkpoints/v1` branch. To avoid folks shooting themselves in the foot and pushing API keys to public repos, we're introducing a filtering mechanism for all files written to the checkpoints branch.

We find secrets by looking for 10+ character strings whose Shannon entropy is above a threshold of 4.5.
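For reviewers who want a feel for the heuristic, here is a minimal sketch of entropy-based detection. It is not the `redact` package's code: the candidate character class, the function names `shannonEntropy` and `redactString`, and the per-byte entropy calculation are assumptions. Only the 10+ character minimum, the 4.5 threshold, and the `REDACTED` placeholder come from this PR.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// entropyThreshold is the cutoff named in the PR description.
const entropyThreshold = 4.5

// candidate matches runs of 10+ token-like characters.
// Assumption: the real character class may differ.
var candidate = regexp.MustCompile(`[A-Za-z0-9_/+-]{10,}`)

// shannonEntropy returns the entropy of s in bits per byte.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	counts := make(map[byte]int)
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	var h float64
	for _, c := range counts {
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// redactString replaces every high-entropy candidate with REDACTED.
func redactString(s string) string {
	return candidate.ReplaceAllStringFunc(s, func(m string) string {
		if shannonEntropy(m) > entropyThreshold {
			return "REDACTED"
		}
		return m
	})
}

func main() {
	fmt.Println(redactString("token: aB3dE5gH7jK9mN1pQ2rS4tU6vW8xY0zC"))
	fmt.Println(redactString("plain configuration text"))
}
```

Note that a string needs at least 23 distinct characters to exceed 4.5 bits, since entropy is bounded by log2 of the number of distinct symbols. That keeps ordinary words and short identifiers safe, but it also means short or repetitive secrets can slip through.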
An open question is whether we should skip binary files here:
cli/cmd/entire/cli/checkpoint/committed.go
Lines 1085 to 1094 in c72f6b7
..because we don't want to be munging files that aren't plain text. However, my sense is "no don't bother", because this stuff is in `copyMetadataDir`, which should just contain stuff that we control, right? Keen to hear from a CLI expert here.

The results of this redaction can be seen here:
cli/39/7737088f0a/0/full.jsonl
Line 4 in a0820d4
Note
Medium Risk
Changes what gets persisted into git history for checkpoints (including hashing/chunking inputs) using heuristic entropy-based detection, which could cause unexpected redactions or missed secrets and affect downstream consumers relying on exact content.
Overview
Adds automatic secrets redaction for data written to the `entire/sessions` (checkpoints) branch, replacing high-entropy token-like substrings with `REDACTED`.

Redaction is applied to committed checkpoint transcripts (before chunking/content-hash), prompts, context, incremental task checkpoint payloads, subagent transcripts, and files copied via `copyMetadataDir` (with JSONL-aware handling and a binary-file skip to avoid corruption). New unit tests cover redaction behavior end-to-end for `WriteCommitted` and `copyMetadataDir`, plus focused tests for the new `redact` package's entropy-based detection and JSONL field/object skip rules.

Written by Cursor Bugbot for commit d0161a4.