Problem
The current token counting mechanism is a mix of a simple estimation (length / 4) and a call to the Google AI countTokens API. This is not robust enough for production, for several reasons:
- Inaccurate Fallback: The length / 4 estimation is highly inaccurate for different types of content (code, JSON, etc.) and different languages.
- Vendor Lock-in: The countTokens API is specific to Google AI models. The system should be able to handle models from other vendors (e.g., OpenAI, Anthropic) that use different tokenizers.
- No Local Tokenization: The system relies on a network call for accurate token counting, which introduces latency and a point of failure. For models whose tokenizer is available locally (like tiktoken for OpenAI models), the local tokenizer should be used.
- No Model-Specific Tokenization: The current implementation does not account for different tokenization rules for different models from the same vendor (e.g., gpt-3.5-turbo vs. gpt-4).
Desired State
A production-ready token counting system should have the following characteristics:
- Pluggable Tokenizers: The system should support multiple tokenization strategies and allow for new ones to be added easily.
- Model-Specific Configuration: The system should be able to determine which tokenizer to use based on the model being used.
- Local First, Remote Fallback: For models with available local tokenizers (e.g., tiktoken), the system should use them first to avoid network latency. If a local tokenizer is not available, it should fall back to a remote API call where possible.
- Improved Estimation: The fallback estimation logic should be more sophisticated than a simple character ratio, taking into account content type and language.
- Caching: All token counting operations (local and remote) should be cached to avoid redundant computations.
Implementation Plan
Phase 1: Create a Pluggable Tokenizer Framework in TypeScript
- Define ITokenizer Interface (src/core/tokenizers/ITokenizer.ts):
  - Create a new interface that defines the contract for all tokenizers.
  - It should have a single method: countTokens(text: string): Promise<number>.
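  A minimal sketch of what this interface could look like:

  ```typescript
  // src/core/tokenizers/ITokenizer.ts
  export interface ITokenizer {
    /** Returns the number of tokens in the given text. */
    countTokens(text: string): Promise<number>;
  }
  ```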
- Create a TokenizerFactory (src/core/tokenizers/TokenizerFactory.ts):
  - This factory will be responsible for creating the correct tokenizer based on the model name.
  - It will have a create(modelName: string): ITokenizer method.
  - It will maintain a mapping of model names (or prefixes) to tokenizer implementations.
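  One possible shape for the factory; the model-name prefixes and the GOOGLE_AI_API_KEY environment variable are illustrative assumptions:

  ```typescript
  // src/core/tokenizers/TokenizerFactory.ts (sketch)
  import { ITokenizer } from './ITokenizer';
  import { TiktokenTokenizer } from './TiktokenTokenizer';
  import { GoogleAITokenizer } from './GoogleAITokenizer';
  import { EstimationTokenizer } from './EstimationTokenizer';

  export class TokenizerFactory {
    // Maps model-name prefixes to tokenizer constructors (illustrative entries).
    private static readonly prefixMap: Array<[string, (model: string) => ITokenizer]> = [
      ['gpt-', (m) => new TiktokenTokenizer(m)],
      ['gemini-', (m) => new GoogleAITokenizer(process.env.GOOGLE_AI_API_KEY ?? '', m)],
    ];

    create(modelName: string): ITokenizer {
      for (const [prefix, build] of TokenizerFactory.prefixMap) {
        if (modelName.startsWith(prefix)) return build(modelName);
      }
      // No model-specific tokenizer available: fall back to estimation.
      return new EstimationTokenizer();
    }
  }
  ```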
- Implement TiktokenTokenizer (src/core/tokenizers/TiktokenTokenizer.ts):
  - Create a class that implements the ITokenizer interface.
  - It will use the tiktoken library to count tokens.
  - The constructor will take the model name and use tiktoken's encoding_for_model() to get the correct encoding.
  - The countTokens method will encode the text and return the length of the resulting array.
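  A sketch assuming the tiktoken npm package (a WASM port whose encodings must be freed after use):

  ```typescript
  // src/core/tokenizers/TiktokenTokenizer.ts (sketch)
  import { encoding_for_model, TiktokenModel } from 'tiktoken';
  import { ITokenizer } from './ITokenizer';

  export class TiktokenTokenizer implements ITokenizer {
    constructor(private readonly modelName: string) {}

    async countTokens(text: string): Promise<number> {
      // Resolve the model-specific encoding, count, then release the WASM handle.
      const encoding = encoding_for_model(this.modelName as TiktokenModel);
      try {
        return encoding.encode(text).length;
      } finally {
        encoding.free();
      }
    }
  }
  ```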
- Implement GoogleAITokenizer (src/core/tokenizers/GoogleAITokenizer.ts):
  - Create a class that implements the ITokenizer interface.
  - It will use the Google AI countTokens API.
  - The constructor will take the API key and model name.
  - The countTokens method will make the API call and return the totalTokens value from the response.
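  A sketch assuming the @google/generative-ai SDK, whose countTokens response carries a totalTokens field:

  ```typescript
  // src/core/tokenizers/GoogleAITokenizer.ts (sketch)
  import { GoogleGenerativeAI } from '@google/generative-ai';
  import { ITokenizer } from './ITokenizer';

  export class GoogleAITokenizer implements ITokenizer {
    private readonly client: GoogleGenerativeAI;

    constructor(apiKey: string, private readonly modelName: string) {
      this.client = new GoogleGenerativeAI(apiKey);
    }

    async countTokens(text: string): Promise<number> {
      const model = this.client.getGenerativeModel({ model: this.modelName });
      const { totalTokens } = await model.countTokens(text);
      return totalTokens;
    }
  }
  ```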
- Implement EstimationTokenizer (src/core/tokenizers/EstimationTokenizer.ts):
  - Create a class that implements the ITokenizer interface.
  - This will be the fallback tokenizer.
  - It will contain the improved estimation logic based on content type and language.
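  The exact heuristics are still to be designed; the detection rules and ratios below are placeholder guesses, not measured values:

  ```typescript
  // src/core/tokenizers/EstimationTokenizer.ts (sketch with placeholder heuristics)
  import { ITokenizer } from './ITokenizer';

  export class EstimationTokenizer implements ITokenizer {
    async countTokens(text: string): Promise<number> {
      // Code and JSON tend to tokenize more densely than English prose,
      // while CJK text is closer to one token per character.
      const looksLikeCode = /[{};=<>]|\bfunction\b|\bconst\b/.test(text);
      const cjkChars = (text.match(/[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]/g) ?? []).length;
      const otherChars = text.length - cjkChars;
      const charsPerToken = looksLikeCode ? 3 : 4; // placeholder ratios
      return Math.ceil(otherChars / charsPerToken) + cjkChars;
    }
  }
  ```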
Phase 2: Refactor the TokenCounter to Use the New Framework
- Update src/core/token-counter.ts:
  - The TokenCounter class will no longer contain any tokenization logic itself.
  - It will use the TokenizerFactory to get the correct tokenizer for the given model.
  - The count method will delegate token counting to the tokenizer instance.
  - The TokenCounter will remain responsible for caching the results.
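  The refactored class could reduce to delegation plus caching; the in-memory Map keyed on model and text is an assumption (hashing long texts would be preferable):

  ```typescript
  // src/core/token-counter.ts (sketch of the refactored class)
  import { TokenizerFactory } from './tokenizers/TokenizerFactory';

  export class TokenCounter {
    private readonly factory = new TokenizerFactory();
    private readonly cache = new Map<string, number>();

    async count(text: string, modelName: string): Promise<number> {
      const key = `${modelName}:${text}`; // hash the text in practice
      const cached = this.cache.get(key);
      if (cached !== undefined) return cached;

      // All tokenization is delegated to the model-appropriate tokenizer.
      const tokens = await this.factory.create(modelName).countTokens(text);
      this.cache.set(key, tokens);
      return tokens;
    }
  }
  ```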
- Update the count_tokens MCP Tool:
  - The count_tokens tool in src/server/index.ts will be updated to take the modelName as an argument.
  - It will use the TokenCounter to count tokens for the given text and model.
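  A sketch of the updated tool registration, assuming the @modelcontextprotocol/sdk McpServer API with zod schemas; the server name is hypothetical and transport wiring is omitted:

  ```typescript
  // src/server/index.ts (sketch; transport setup omitted)
  import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
  import { z } from 'zod';
  import { TokenCounter } from '../core/token-counter';

  const server = new McpServer({ name: 'token-optimizer', version: '1.0.0' });
  const counter = new TokenCounter();

  server.tool(
    'count_tokens',
    { text: z.string(), modelName: z.string() },
    async ({ text, modelName }) => {
      const tokens = await counter.count(text, modelName);
      return { content: [{ type: 'text', text: String(tokens) }] };
    }
  );
  ```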
Phase 3: Update the PowerShell Orchestrator
- Modify hooks/handlers/token-optimizer-orchestrator.ps1:
  - The PowerShell TokenCounter class will be removed.
  - All calls to it will be replaced with calls to the count_tokens MCP tool.
  - The modelName will be retrieved from the environment or a configuration file and passed to the count_tokens tool.
  - The fallback estimation logic in PowerShell will be removed.