
fix(codex): scope token dedup key to fork parent so subagent replays collapse#681

Open
RedesignedRobot wants to merge 2 commits into
junhoyeo:mainfrom
RedesignedRobot:fix/codex-fork-replay-dedup

@RedesignedRobot RedesignedRobot commented Jun 6, 2026

Codex subagent/fork fan-out replays the parent's token_count history into every child file. Each replayed row has the same cumulative total but a distinct per-file session id, and the dedup key is scoped by session id, so the copies never collapse and every sibling gets counted. One real day reported 12.9B tokens and tripped the daily submit cap; deduped it is ~1B (#679).

Scope the codex token_count dedup key to forked_from_id instead of the child's own session id. Sibling replays share one key and collapse. Non-forked sessions keep their own id, so unrelated work never merges.
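
The scoping rule above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `SessionMeta` and `CumulativeTotals` types, their field names, and the key format are assumptions; only `forked_from_id` and the fallback to the session id come from the description.

```rust
/// Hypothetical, simplified view of the session metadata relevant to dedup.
pub struct SessionMeta<'a> {
    /// The per-file session id.
    pub session_id: &'a str,
    /// Parent session id when this file is a fork/subagent child.
    pub forked_from_id: Option<&'a str>,
}

/// Cumulative totals carried by a token_count row (field names assumed).
pub struct CumulativeTotals {
    pub input: u64,
    pub cached_input: u64,
    pub output: u64,
}

/// Scope of the dedup key: the fork parent if there is one,
/// otherwise the session's own id.
pub fn dedup_scope<'a>(meta: &SessionMeta<'a>) -> &'a str {
    meta.forked_from_id.unwrap_or(meta.session_id)
}

/// Builds a dedup key from the scope and the cumulative totals.
/// The "codex:..." string layout is an assumption for illustration.
pub fn dedup_key(meta: &SessionMeta<'_>, totals: &CumulativeTotals) -> String {
    format!(
        "codex:{}:{}:{}:{}",
        dedup_scope(meta),
        totals.input,
        totals.cached_input,
        totals.output
    )
}
```

In this sketch, two children with `forked_from_id: Some("parent")` and the same cumulative totals produce the same key, while a session without a fork parent is scoped to its own id and cannot collide with them.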

From 175 real session files on the affected day:

  • 87.9% of token_count rows are cross-file duplicates (for example, 4,450 rows are each replayed across 16 files).
  • counting each distinct cumulative row once drops the day from 13.7B to 1.65B.
  • 0 cumulative rows are shared across unrelated fork families, so parent-scoping only collapses real replays.
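
The "count each distinct row once" measurement can be sketched as below. This is an assumed simplification for illustration: the `Row` type and `sum_raw_and_deduped` are hypothetical names, and each row is reduced to a dedup key plus the token amount it contributes.

```rust
use std::collections::HashSet;

/// Hypothetical flattened token_count row: a dedup key plus the
/// number of tokens the row contributes to the daily total.
pub struct Row {
    pub dedup_key: String,
    pub tokens: u64,
}

/// Returns (raw_total, deduped_total). The raw total counts every row;
/// the deduped total counts only the first row seen for each key.
pub fn sum_raw_and_deduped(rows: &[Row]) -> (u64, u64) {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut raw = 0u64;
    let mut deduped = 0u64;
    for row in rows {
        raw += row.tokens;
        // `insert` returns true only the first time a key is seen.
        if seen.insert(row.dedup_key.as_str()) {
            deduped += row.tokens;
        }
    }
    (raw, deduped)
}
```

With per-child keys every replayed copy has a different key, so the two totals are equal; with parent-scoped keys the copies share a key and only the first one counts.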

This changes test_..._deduplicates_parent_replay_across_forks. In that test, two sibling forks each take one turn of their own, and both turns land on a byte-identical cumulative total. The test asserted that both survive; with parent-scoped keys they collapse, so the assertion is updated. In real data, identical cumulative vectors across siblings are the replay signature rather than independent work, because the cumulative total encodes each fork's divergent context size. If you would rather preserve that case, the alternative is a cross-file pass that collapses only contiguous shared runs instead of single rows; happy to switch.

Closes #679.


Summary by cubic

Fix token overcounting in Codex by scoping token_count dedup keys to the fork parent (forked_from_id) so sibling subagent replays collapse. Also bump the message cache schema to reparse stale dedup_keys and apply the fix to previously scanned sessions; closes #679.

  • Bug Fixes
    • Scope the dedup key to forked_from_id to collapse identical cumulative token_count rows across sibling forks; fall back to the session id for non-forked sessions, so unrelated sessions never merge.
    • Bump message cache schema to 18 to force a reparse of cached messages with old per-child dedup_keys.
    • Update test to reflect collapsed sibling turns (3 messages; totals 140/14), preventing inflated daily counts.

Written for commit 0545dee. Summary will update on new commits.


…collapse

Fork/subagent children replay the parent's token_count history into every
child file with identical cumulative totals but a distinct per-file session
id. The dedup key was scoped by session id, so sibling replays never
collapsed and each copy was counted. Scope the key to forked_from_id so
sibling replays share one key; unrelated sessions keep their own id.

vercel Bot commented Jun 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
tokscale Ignored Ignored Preview Jun 6, 2026 9:05pm


@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 2 files


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d2ca695ac

Comment thread crates/tokscale-core/src/sessions/codex.rs
The fork-parent dedup key only changes parser output, but cached
UnifiedMessages store their dedup_key. Files already in
source-message-cache.bin were returned with old per-child keys, so
the fix was latent for already-scanned sessions (the junhoyeo#679 case).
Bump the cache schema version to force a one-time reparse.
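
The effect of the version bump can be sketched as follows. This is an illustration under assumptions, not the cache code in the repository: `CachedMessages`, `usable_cache`, and the constant name are hypothetical; only the new version number 18 comes from the summary above.

```rust
/// Schema version after this change (18 per the PR summary).
pub const CACHE_SCHEMA_VERSION: u32 = 18;

/// Hypothetical cached entry: messages parsed earlier, stored together
/// with the schema version that produced them.
pub struct CachedMessages {
    pub schema_version: u32,
    pub dedup_keys: Vec<String>,
}

/// Returns the cached entry only if it was written with the current
/// schema version. A `None` result means the caller must reparse the
/// source file, which regenerates the dedup keys with the new scoping.
pub fn usable_cache(cache: Option<CachedMessages>) -> Option<CachedMessages> {
    cache.filter(|c| c.schema_version == CACHE_SCHEMA_VERSION)
}
```

Treating any version mismatch as a cache miss keeps the migration to a single one-time reparse instead of rewriting stored keys in place.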

Development

Successfully merging this pull request may close these issues.

Codex: fork/subagent replayed token_count history is double-counted, inflating per-day token totals

1 participant