@toothbrush toothbrush commented Feb 6, 2026

Description

Claude happily stores scary secrets in session data, which we persist in the entire/checkpoints/v1 branch. To avoid folks shooting themselves in the foot and pushing API keys to public repos, we're introducing a filtering mechanism for all files written to the checkpoints branch.

We detect secrets by looking for strings of 10 or more characters whose Shannon entropy exceeds 4.5.
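
To make the heuristic concrete, here is a minimal Go sketch of entropy-based redaction. It is not the code in the redact package: the names `shannonEntropy` and `redactHighEntropy`, the candidate character class, and the use of base-2 logarithms are all assumptions for illustration. Only the 10-character minimum, the 4.5 threshold, and the `[REDACTED]` placeholder come from this description.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// entropyThreshold is the cutoff quoted in the description.
const entropyThreshold = 4.5

// candidate matches runs of 10+ token-like characters.
// The exact character class is an assumption, not the real package's rule.
var candidate = regexp.MustCompile(`[A-Za-z0-9+/_=-]{10,}`)

// shannonEntropy returns the entropy of s in bits per byte (base-2 logs assumed).
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	counts := make(map[byte]int)
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	var h float64
	for _, c := range counts {
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// redactHighEntropy (hypothetical name) replaces every candidate token whose
// entropy exceeds the threshold and leaves everything else untouched.
func redactHighEntropy(s string) string {
	return candidate.ReplaceAllStringFunc(s, func(tok string) string {
		if shannonEntropy(tok) > entropyThreshold {
			return "[REDACTED]"
		}
		return tok
	})
}

func main() {
	// 32 distinct characters give an entropy of exactly 5 bits, above the cutoff.
	fmt.Println(redactHighEntropy(`api_key = "abcdefghijklmnopqrstuvwxyz012345"`))
	// prints: api_key = "[REDACTED]"
}
```

Under the base-2 assumption, a token needs at least 23 distinct characters to exceed 4.5 (2^4.5 is roughly 22.6), so ordinary words and short identifiers pass through unchanged, and so would any secret shorter than 23 characters.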

An open question is whether we should skip binary files here:

    if strings.HasSuffix(treePath, ".jsonl") {
        content = redact.JSONLBytes(content)
    } else {
        content = redact.Bytes(content)
    }
    hash, err := CreateBlobFromContent(repo, content)
    if err != nil {
        return plumbing.ZeroHash, 0, fmt.Errorf("failed to create blob: %w", err)
    }

...because we don't want to be munging files that aren't plain text. However, my sense is "no, don't bother", because this stuff is in copyMetadataDir, which should only contain files that we control, right? Keen to hear from a CLI expert here.
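
If we did want a binary skip, one possible shape is sketched below. This is an assumption rather than what the PR implements: `looksBinary`, `redactUnlessBinary`, and the 8000-byte sniff window are hypothetical, and the NUL-byte check is only a common heuristic (similar in spirit to the one Git uses), not a reliable content-type detector.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffLen is how many leading bytes we inspect (assumed value).
const sniffLen = 8000

// looksBinary reports whether content contains a NUL byte near its start,
// which plain-text files almost never do.
func looksBinary(content []byte) bool {
	if len(content) > sniffLen {
		content = content[:sniffLen]
	}
	return bytes.IndexByte(content, 0) != -1
}

// redactUnlessBinary (hypothetical helper) applies redact only to content
// that does not look binary, returning binary content unchanged.
func redactUnlessBinary(content []byte, redact func([]byte) []byte) []byte {
	if looksBinary(content) {
		return content
	}
	return redact(content)
}

func main() {
	fmt.Println(looksBinary([]byte("api_key = abc")))           // plain text
	fmt.Println(looksBinary([]byte{0x89, 'P', 'N', 'G', 0x00})) // contains NUL
}
```

Passing the redaction function in as a parameter would let the same guard wrap both the plain and the JSONL-aware code paths.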

The results of this redaction can be seen here:

{"parentUuid":"4fc1afa2-ca11-4ce9-bbb8-cd59269a836e","isSidechain":false,"userType":"external","cwd":"/Users/paul/src/entireio/cli","sessionId":"9b76dc0f-dc1b-4822-ac56-53cfa3fbc032","version":"2.1.34","gitBranch":"20260206-farting-around-with-secrets","type":"user","message":{"role":"user","content":"I'm testing the new feature in Entire CLI that redacts secrets. Please create a file SECRETS.md with the following contents:\n\nHere is some config:\napi_key = \"[REDACTED]\"\nsecret_key = \"[REDACTED]\"\nEnd of config."},"uuid":"983faa2c-6a94-46db-9de3-360d430fa8ab","timestamp":"2026-02-06T08:56:15.541Z","thinkingMetadata":{"maxThinkingTokens":31999},"todos":[],"permissionMode":"default"}


Note

Medium Risk
Changes what gets persisted into git history for checkpoints (including hashing/chunking inputs) using heuristic entropy-based detection, which could cause unexpected redactions or missed secrets and affect downstream consumers relying on exact content.

Overview
Adds automatic secrets redaction for data written to the entire/sessions (checkpoints) branch, replacing high-entropy token-like substrings with REDACTED.

Redaction is applied to committed checkpoint transcripts (before chunking/content-hash), prompts, context, incremental task checkpoint payloads, subagent transcripts, and files copied via copyMetadataDir (with JSONL-aware handling and a binary-file skip to avoid corruption). New unit tests cover redaction behavior end-to-end for WriteCommitted and copyMetadataDir, plus focused tests for the new redact package’s entropy-based detection and JSONL field/object skip rules.

Written by Cursor Bugbot for commit d0161a4. This will update automatically on new commits.

toothbrush and others added 3 commits February 6, 2026 14:23
Secrets (high-entropy strings like API keys) were not being redacted
before being persisted to the entire/sessions metadata branch, meaning
they could end up in permanent git history. This applies the Shannon
entropy-based redaction from the redact package to every content write
path in WriteCommitted.

Redaction points in committed.go:
- Transcripts (JSONL-aware, before chunking so content hash is correct)
- User prompts (plain string redaction)
- Context files (plain byte redaction)
- Subagent transcripts (JSONL-aware)
- Incremental checkpoint data (may contain tool input payloads)
- copyMetadataDir files (JSONL-aware for .jsonl, plain for others)

Also fixes and cleans up the redact package:
- Fix compiler error: redactString (unexported) → String (exported)
- Rename exports to follow Go convention (avoid stutter):
  RedactString → String, RedactBytes → Bytes, etc.
- Add Bytes/JSONLBytes convenience wrappers for []byte call sites
- Remove unused findSecrets and scanJSONValue functions

Not redacted (by design): metadata.json files (structured operational
data only), shadow branch writes, and working directory source files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 2373a92c6c40
@toothbrush toothbrush requested a review from a team as a code owner February 6, 2026 09:08
Copilot AI review requested due to automatic review settings February 6, 2026 09:08
@toothbrush toothbrush changed the title Secrets redaction ENT-231: Secrets redaction Feb 6, 2026
Copilot AI left a comment

Pull request overview

This PR introduces secrets redaction to prevent API keys and sensitive data from being committed to the entire/checkpoints/v1 branch. It implements entropy-based detection (Shannon entropy > 4.5) on 10+ character alphanumeric strings to identify potential secrets, which are then replaced with [REDACTED] before storage.

Changes:

  • New redact package with entropy-based secret detection for plain text and JSONL content
  • Integration of redaction at all checkpoint write paths (transcripts, prompts, context, metadata files)
  • Protection for specific fields (IDs, signatures) and image objects from redaction
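
As a rough illustration of the JSONL-aware handling with protected fields, the sketch below walks each decoded line and redacts only string values outside skipped keys and image objects. The skip rules (`skipKey`, the `"type":"image"` check) and the function names are assumptions, not the rules in redact/redact.go, and re-encoding with `encoding/json` reorders object keys and normalizes numbers, which the real implementation may avoid.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// skipKey reports whether a field is left untouched (hypothetical rule:
// identifier-like keys and signatures).
func skipKey(key string) bool {
	k := strings.ToLower(key)
	return strings.HasSuffix(k, "id") || k == "signature"
}

// walk applies redact to every string value, skipping protected keys and
// objects that declare themselves as images.
func walk(v any, redact func(string) string) any {
	switch t := v.(type) {
	case string:
		return redact(t)
	case []any:
		for i := range t {
			t[i] = walk(t[i], redact)
		}
		return t
	case map[string]any:
		if typ, ok := t["type"].(string); ok && typ == "image" {
			return t
		}
		for k, val := range t {
			if !skipKey(k) {
				t[k] = walk(val, redact)
			}
		}
		return t
	default:
		return v
	}
}

// redactJSONL processes one JSON document per line. Lines that fail to
// parse fall back to plain-text redaction.
func redactJSONL(data []byte, redact func(string) string) []byte {
	lines := bytes.Split(data, []byte("\n"))
	for i, line := range lines {
		if len(bytes.TrimSpace(line)) == 0 {
			continue
		}
		var v any
		if err := json.Unmarshal(line, &v); err != nil {
			lines[i] = []byte(redact(string(line)))
			continue
		}
		out, err := json.Marshal(walk(v, redact))
		if err != nil {
			lines[i] = []byte(redact(string(line)))
			continue
		}
		lines[i] = out
	}
	return bytes.Join(lines, []byte("\n"))
}

func main() {
	// Stand-in redactor for the demo; a real one would use entropy detection.
	stub := func(s string) string { return strings.ReplaceAll(s, "hunter2", "[REDACTED]") }
	in := []byte(`{"uuid":"hunter2","content":"password is hunter2"}`)
	fmt.Println(string(redactJSONL(in, stub)))
}
```

Skipping by key keeps UUIDs and signatures, which tend to be high-entropy by nature, from being mangled in ways that could break session resume.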

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| redact/redact.go | Core redaction logic with Shannon entropy calculation and JSONL-aware replacement |
| redact/redact_test.go | Unit tests for basic redaction functionality |
| cmd/entire/cli/checkpoint/committed.go | Integration of redaction into all checkpoint write operations and new createRedactedBlobFromFile function |
| cmd/entire/cli/checkpoint/checkpoint_test.go | Integration tests verifying redaction across different checkpoint types |

Entire-Checkpoint: 0dbd56975036
Entire-Checkpoint: 2373a92c6c40
I suspect this case shouldn't be triggered, but oh well.

Entire-Checkpoint: 2373a92c6c40
Entire-Checkpoint: 530a376fb9e2
Entire-Checkpoint: 6e06350ae701
Entire-Checkpoint: 2bb8c088bdb4
Entire-Checkpoint: 4de153fa6df6
We might replace functions or filenames, and a string without square
brackets is safer.
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

khaong commented Feb 8, 2026

I've tested this manually with rewind, and the session seems to resume okay. Not sure what Claude's memory looks like afterwards, but I guess let's see what general usage tells us.

@khaong khaong merged commit 94cda18 into main Feb 8, 2026
4 checks passed
@khaong khaong deleted the 20260206-secrets-redaction branch February 8, 2026 02:48