Skip to content

Add Google Gemini Tokenizer Integration #9

@ivelin-web

Description

@ivelin-web
  • Install and import @lenml/tokenizer-gemini.
  • If possible, use fromPreTrained() to load the default Gemini encoder.
  • Map Gemini models (1.0 Pro, 1.5 Pro, Flash) to this tokenizer. Confirm max context windows (e.g., 32 k for 1.0, up to 1 M for 1.5 Pro).
  • Provide a fallback: if the WASM bundle is too large or fails, default to calling Google’s REST countTokens endpoint (requires API key) or use text.length / 4.
  • Update tokenizers/index.ts to return a function (text) => tokenizer.encode(text).length when tenant === 'gemini'.
  • Write a simple script to feed known‐length text to the Gemini tokenizer and compare with API results for validation.

Metadata

Metadata

Assignees

Labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions