
feat(memory): accurate token counting with tiktoken-rs #816

Merged
bug-ops merged 3 commits into main from m27-token-counting
Feb 24, 2026


bug-ops (Owner) commented on Feb 24, 2026

Summary

  • Replace chars/4 heuristic (up to 30% error) with tiktoken-rs cl100k_base tokenizer
  • Add TokenCounter with DashMap cache (10k cap, random eviction) for amortized O(1) lookups
  • Implement OpenAI tool schema token formula (FUNC_INIT, PROP_KEY, ENUM_ITEM, etc.)
  • Graceful fallback to chars/4 when tokenizer initialization fails
  • Input size guard (64KB) to prevent cache pollution from oversized input
  • Migrate all ~30 estimate_tokens() call sites across zeph-memory, zeph-core, zeph-index
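The caching, size-guard, and fallback behaviour listed above can be sketched with the standard library alone. This is a minimal illustration, not the code merged in this PR: a `Mutex<HashMap>` stands in for DashMap, a caller-supplied closure stands in for the tiktoken-rs cl100k_base tokenizer, and evicting the first key in iteration order approximates random eviction. The constructor names, `chars_div_4`, and `cached_entries` are hypothetical, as are the assumptions that the heuristic counts Unicode scalar values and that oversized input falls back to it.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Inputs above this size skip both the tokenizer and the cache (64 KiB).
const MAX_INPUT_BYTES: usize = 64 * 1024;

/// Stand-in for a real tokenizer (the PR uses tiktoken-rs cl100k_base).
type Tokenizer = Box<dyn Fn(&str) -> usize + Send + Sync>;

/// chars/4 heuristic; assumed here to count Unicode scalar values, not bytes.
pub fn chars_div_4(text: &str) -> usize {
    text.chars().count() / 4
}

pub struct TokenCounter {
    /// `None` models a tokenizer that failed to initialize.
    tokenizer: Option<Tokenizer>,
    cache: Mutex<HashMap<String, usize>>,
    cap: usize,
}

impl TokenCounter {
    /// Hypothetical constructor: counter backed by a working tokenizer.
    pub fn with_tokenizer<F>(tokenizer: F, cap: usize) -> Self
    where
        F: Fn(&str) -> usize + Send + Sync + 'static,
    {
        Self {
            tokenizer: Some(Box::new(tokenizer)),
            cache: Mutex::new(HashMap::new()),
            cap,
        }
    }

    /// Hypothetical constructor: tokenizer unavailable, heuristic only.
    pub fn heuristic_only(cap: usize) -> Self {
        Self { tokenizer: None, cache: Mutex::new(HashMap::new()), cap }
    }

    pub fn count(&self, text: &str) -> usize {
        // Graceful fallback when no tokenizer is available.
        let Some(tokenize) = self.tokenizer.as_ref() else {
            return chars_div_4(text);
        };
        // Size guard: oversized input is neither tokenized nor cached.
        if text.len() > MAX_INPUT_BYTES {
            return chars_div_4(text);
        }
        let mut cache = self.cache.lock().unwrap();
        if let Some(&hit) = cache.get(text) {
            return hit;
        }
        let tokens = tokenize(text);
        if cache.len() >= self.cap {
            // Evict an arbitrary entry; a std-only approximation of random eviction.
            let victim = cache.keys().next().cloned();
            if let Some(key) = victim {
                cache.remove(&key);
            }
        }
        cache.insert(text.to_owned(), tokens);
        tokens
    }

    /// Number of entries currently cached (useful in eviction tests).
    pub fn cached_entries(&self) -> usize {
        self.cache.lock().unwrap().len()
    }
}
```

With a capacity of 2, counting three distinct strings leaves two entries cached, and a string larger than 64 KiB returns the chars/4 estimate without touching the cache.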

Closes #789, closes #794, closes #795, closes #796, closes #797, closes #798

Test plan

  • cargo +nightly fmt --check passes
  • cargo clippy --workspace -- -D warnings — 0 warnings
  • cargo nextest run --workspace --lib --bins — 2553/2553 passed
  • Pinned-value test for tool schema token counting
  • Unicode test (Cyrillic, CJK, emoji)
  • Cache eviction test at capacity
  • Oversized input fallback test
  • Cache-miss benchmark with unique inputs per iteration
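For the pinned-value test of tool schema counting, the general shape of an overhead-based formula can be sketched as follows. The constant names `FUNC_INIT`, `PROP_KEY`, and `ENUM_ITEM` come from the summary, but their values and the exact terms of the formula are not stated in this PR description. The overheads are therefore passed in by the caller, and the struct layout, function name, and example numbers below are purely illustrative assumptions.

```rust
/// Per-item token overheads; values are supplied by the caller because
/// this PR description does not state them.
pub struct SchemaOverheads {
    pub func_init: usize,
    pub prop_key: usize,
    pub enum_item: usize,
}

/// Hypothetical, simplified view of one tool parameter.
pub struct Property<'a> {
    pub name: &'a str,
    pub description: &'a str,
    pub enum_values: &'a [&'a str],
}

/// Hypothetical, simplified view of a tool (function) schema.
pub struct ToolSchema<'a> {
    pub name: &'a str,
    pub description: &'a str,
    pub properties: &'a [Property<'a>],
}

/// Sums fixed overheads plus the token counts of every string in the schema.
/// `count` is any text-to-token-count function, such as a tokenizer wrapper.
pub fn schema_tokens(
    schema: &ToolSchema<'_>,
    overheads: &SchemaOverheads,
    count: impl Fn(&str) -> usize,
) -> usize {
    let mut total = overheads.func_init + count(schema.name) + count(schema.description);
    for prop in schema.properties {
        total += overheads.prop_key + count(prop.name) + count(prop.description);
        for value in prop.enum_values {
            total += overheads.enum_item + count(*value);
        }
    }
    total
}
```

A pinned-value test then fixes one schema, one set of overheads, and one counting function, and asserts the exact total, so any change to the formula shows up as a test failure.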

github-actions bot added the memory, rust, core, dependencies, enhancement, size/XL, and documentation labels on Feb 24, 2026
bug-ops enabled auto-merge (squash) on February 24, 2026 15:10
Introduce TokenCounter backed by tiktoken-rs cl100k_base with DashMap
cache (10k cap). Provides accurate token counting for context budget
allocation, tool schema token estimation via OpenAI formula, and
graceful fallback to chars/4 when tokenizer is unavailable. Includes
input size guard (64KB) to prevent cache pollution from oversized input.
bug-ops merged commit 6c8152d into main on Feb 24, 2026
23 checks passed
bug-ops deleted the m27-token-counting branch on February 24, 2026 15:39

Labels

core, dependencies, documentation, enhancement, memory, rust, size/XL

