feat: implement sophisticated token counting (beyond character/4) #124

@ooples

Description

Problem

The current token counting mechanism is a mix of a simple estimation (length / 4) and a call to the Google AI countTokens API. This is not robust enough for production, for several reasons:

  • Inaccurate Fallback: The length / 4 estimation is highly inaccurate for different types of content (code, JSON, etc.) and different languages.
  • Vendor Lock-in: The countTokens API is specific to Google AI models. The system should be able to handle models from other vendors (e.g., OpenAI, Anthropic) that use different tokenizers.
  • No Local Tokenization: The system relies on a network call for accurate token counting, which introduces latency and a point of failure. For models where the tokenizer is available locally (like tiktoken for OpenAI models), it should be used.
  • No Model-Specific Tokenization: The current implementation does not account for different tokenization rules for different models from the same vendor (e.g., gpt-3.5-turbo vs. gpt-4).

Desired State

A production-ready token counting system should have the following characteristics:

  • Pluggable Tokenizers: The system should support multiple tokenization strategies and allow for new ones to be added easily.
  • Model-Specific Configuration: The system should be able to determine which tokenizer to use based on the model being used.
  • Local First, Remote Fallback: For models with a locally available tokenizer (e.g., tiktoken), the system should use it first to avoid network latency. If no local tokenizer is available, it should fall back to a remote API call where possible (see the sketch after this list).
  • Improved Estimation: The fallback estimation logic should be more sophisticated than a simple character ratio, taking into account content type and language.
  • Caching: All token counting operations (local and remote) should be cached to avoid redundant computations.
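
To make the "local first, remote fallback" order concrete, here is a minimal sketch of the resolution chain. The localTokenizers/remoteTokenizers registries are hypothetical stand-ins; the implementation plan below puts this lookup in a factory:

```typescript
interface ITokenizer {
  countTokens(text: string): Promise<number>;
}

// Hypothetical registries keyed by model name; the real lookup belongs in the factory.
const localTokenizers = new Map<string, ITokenizer>();
const remoteTokenizers = new Map<string, ITokenizer>();
const estimationTokenizer: ITokenizer = {
  // Placeholder ratio; Phase 1, step 5 replaces this with content-aware logic.
  countTokens: async (text) => Math.ceil(text.length / 4),
};

async function countTokens(text: string, modelName: string): Promise<number> {
  const local = localTokenizers.get(modelName);
  if (local) return local.countTokens(text); // no network hop

  const remote = remoteTokenizers.get(modelName);
  if (remote) {
    try {
      return await remote.countTokens(text);
    } catch {
      // Remote call failed; fall through to estimation.
    }
  }
  return estimationTokenizer.countTokens(text);
}
```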

Implementation Plan

Phase 1: Create a Pluggable Tokenizer Framework in TypeScript

  1. Define ITokenizer Interface (src/core/tokenizers/ITokenizer.ts):

    • Create a new interface that defines the contract for all tokenizers.
    • It should have a single method: countTokens(text: string): Promise<number>.
  2. Create a TokenizerFactory (src/core/tokenizers/TokenizerFactory.ts):

    • This factory will be responsible for creating the correct tokenizer based on the model name.
    • It will have a create(modelName: string): ITokenizer method.
    • It will maintain a mapping of model names (or prefixes) to tokenizer implementations. A sketch of the interface and factory follows this list.
  3. Implement TiktokenTokenizer (src/core/tokenizers/TiktokenTokenizer.ts):

    • Create a class that implements the ITokenizer interface.
    • It will use the tiktoken library to count tokens.
    • The constructor will take the model name and use tiktoken's encoding_for_model() to obtain the correct encoding.
    • The countTokens method will encode the text and return the length of the resulting array (see the tiktoken sketch after this list).
  4. Implement GoogleAITokenizer (src/core/tokenizers/GoogleAITokenizer.ts):

    • Create a class that implements the ITokenizer interface.
    • It will use the Google AI countTokens API.
    • The constructor will take the API key and model name.
    • The countTokens method will make the API call and return the totalTokens.
  5. Implement EstimationTokenizer (src/core/tokenizers/EstimationTokenizer.ts):

    • Create a class that implements the ITokenizer interface.
    • This will be the fallback tokenizer.
    • It will contain the improved estimation logic based on content type and language (sketched after this list, alongside the Google AI tokenizer).
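
As a starting point for steps 1 and 2, here is a minimal sketch of the interface and factory, assuming the file layout above. The prefix-based registry and the register() helper are assumptions of this plan, not existing code:

```typescript
// src/core/tokenizers/ITokenizer.ts
export interface ITokenizer {
  countTokens(text: string): Promise<number>;
}

// src/core/tokenizers/TokenizerFactory.ts
import { ITokenizer } from "./ITokenizer";
import { EstimationTokenizer } from "./EstimationTokenizer";

type TokenizerCtor = (modelName: string) => ITokenizer;

export class TokenizerFactory {
  // Model-name prefixes mapped to tokenizer constructors.
  private static registry: Array<[prefix: string, ctor: TokenizerCtor]> = [];

  static register(prefix: string, ctor: TokenizerCtor): void {
    this.registry.push([prefix, ctor]);
    // Keep the most specific prefix first (e.g. "gpt-4" before "gpt-").
    this.registry.sort((a, b) => b[0].length - a[0].length);
  }

  static create(modelName: string): ITokenizer {
    const match = this.registry.find(([prefix]) => modelName.startsWith(prefix));
    // No registered tokenizer: fall back to estimation (step 5).
    return match ? match[1](modelName) : new EstimationTokenizer();
  }
}
```

Registration at startup might then look like TokenizerFactory.register("gpt-", (m) => new TiktokenTokenizer(m)).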
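
For step 3, a sketch against the tiktoken npm package; encoding_for_model() and encode() are real tiktoken exports, while the cast and the disposal policy are assumptions:

```typescript
// src/core/tokenizers/TiktokenTokenizer.ts
import { encoding_for_model, type Tiktoken, type TiktokenModel } from "tiktoken";
import { ITokenizer } from "./ITokenizer";

export class TiktokenTokenizer implements ITokenizer {
  private readonly encoding: Tiktoken;

  constructor(modelName: string) {
    // Throws for models tiktoken does not recognize; the factory should
    // catch that and fall back to another tokenizer.
    this.encoding = encoding_for_model(modelName as TiktokenModel);
  }

  async countTokens(text: string): Promise<number> {
    // encode() yields one entry per token, so its length is the count.
    return this.encoding.encode(text).length;
  }

  // The WASM-backed encoding holds native memory; free it when finished.
  dispose(): void {
    this.encoding.free();
  }
}
```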
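
Steps 4 and 5 might look like the following. The Google sketch assumes the @google/generative-ai SDK, whose countTokens() response carries a totalTokens field; the estimation ratios are illustrative placeholders, not calibrated values:

```typescript
// src/core/tokenizers/GoogleAITokenizer.ts
import { GoogleGenerativeAI } from "@google/generative-ai";
import { ITokenizer } from "./ITokenizer";

export class GoogleAITokenizer implements ITokenizer {
  private readonly client: GoogleGenerativeAI;

  constructor(apiKey: string, private readonly modelName: string) {
    this.client = new GoogleGenerativeAI(apiKey);
  }

  async countTokens(text: string): Promise<number> {
    const model = this.client.getGenerativeModel({ model: this.modelName });
    const { totalTokens } = await model.countTokens(text);
    return totalTokens;
  }
}

// src/core/tokenizers/EstimationTokenizer.ts
export class EstimationTokenizer implements ITokenizer {
  async countTokens(text: string): Promise<number> {
    // Placeholder heuristic: structured content (code, JSON) tends to
    // tokenize denser than prose, and non-ASCII scripts often approach
    // one token per character.
    const nonAscii = (text.match(/[^\x00-\x7F]/g) ?? []).length;
    const charsPerToken = /[{};=<>]/.test(text) ? 3 : 4;
    return Math.ceil((text.length - nonAscii) / charsPerToken) + nonAscii;
  }
}
```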

Phase 2: Refactor the TokenCounter to Use the New Framework

  1. Update src/core/token-counter.ts:

    • The TokenCounter class will no longer have any tokenization logic itself.
    • It will use the TokenizerFactory to get the correct tokenizer for the given model.
    • The count method will delegate the token counting to the tokenizer instance.
    • The TokenCounter will still be responsible for caching the results (see the sketch after this list).
  2. Update the count_tokens MCP Tool:

    • The count_tokens tool in src/server/index.ts will be updated to take the modelName as an argument.
    • It will use the TokenCounter to count the tokens for the given text and model.
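
A sketch of the refactored TokenCounter; the hash-based cache key is an assumption (any stable key over model plus content would do). The count_tokens tool then reduces to a thin wrapper that forwards text and modelName into count():

```typescript
// src/core/token-counter.ts
import { createHash } from "node:crypto";
import { TokenizerFactory } from "./tokenizers/TokenizerFactory";

export class TokenCounter {
  private readonly cache = new Map<string, number>();

  async count(text: string, modelName: string): Promise<number> {
    // Identical text is counted once per model.
    const key = `${modelName}:${createHash("sha256").update(text).digest("hex")}`;
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;

    // All tokenization is delegated; this class only selects and caches.
    const count = await TokenizerFactory.create(modelName).countTokens(text);
    this.cache.set(key, count);
    return count;
  }
}
```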

Phase 3: Update the PowerShell Orchestrator

  1. Modify hooks/handlers/token-optimizer-orchestrator.ps1:
    • The PowerShell TokenCounter class will be removed.
    • All calls to the TokenCounter will be replaced with calls to the count_tokens MCP tool.
    • The modelName will be retrieved from the environment or a configuration file and passed to the count_tokens tool.
    • The fallback estimation logic in PowerShell will be removed.
