feat: Adding multiple tokenizers specification for open ai frontend #8027

Open
wants to merge 5 commits into base: main

Conversation

@oandreeva-nv (Contributor) commented Feb 21, 2025

What does the PR do?

This PR adds support for using multiple tokenizers in the OpenAI-compatible frontend, allowing different models to use their own specific tokenizers. This is crucial for correctly handling various model architectures and their chat templates.

Implementation

  • Extended the --tokenizer flag to accept per-model tokenizer mappings while maintaining backward compatibility with the single-tokenizer setup

Example Usage

python3 python/openai/openai_frontend/main.py --model-repository tiny_models/ --tokenizer "tiny_llama:TinyLlama/TinyLlama-1.1B-Chat-v1.0" "phi-4:microsoft/Phi-4-mini-instruct"
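A minimal sketch of how such a `model:tokenizer` mapping flag could be parsed with argparse. This is illustrative only and not the PR's actual code; the function name `parse_tokenizer_args` and the bare-value-as-default convention are assumptions.

```python
import argparse


def parse_tokenizer_args(values):
    """Split --tokenizer values into (default_tokenizer, tokenizer_map).

    Illustrative sketch: a bare value like "org/model" is treated as the
    single default tokenizer (backward-compatible path); "model:tokenizer"
    pairs build a per-model mapping.
    """
    default_tokenizer = None
    tokenizer_map = {}
    for value in values:
        if ":" in value:
            # Split on the first colon only; HF tokenizer IDs contain "/".
            model, tokenizer = value.split(":", 1)
            tokenizer_map[model] = tokenizer
        else:
            default_tokenizer = value
    return default_tokenizer, tokenizer_map


parser = argparse.ArgumentParser()
parser.add_argument("--tokenizer", nargs="+", default=[])
args = parser.parse_args(
    [
        "--tokenizer",
        "tiny_llama:TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "phi-4:microsoft/Phi-4-mini-instruct",
    ]
)
default, mapping = parse_tokenizer_args(args.tokenizer)
```

Splitting on the first colon keeps Hugging Face repo IDs (which contain `/` but not `:`) intact on the tokenizer side.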

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated the GitHub labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box below and add the matching label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

Added TestMultipleTokenizers class to test the feature

  • CI Pipeline ID:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@oandreeva-nv force-pushed the oandreeva_openai_multiple_tokenizers branch from e0bc399 to aaf4f6d on April 15, 2025 19:52
@oandreeva-nv oandreeva-nv changed the title Adding multiple tokenizers specification for open ai frontend feat: Adding multiple tokenizers specification for open ai frontend Apr 15, 2025
@oandreeva-nv oandreeva-nv marked this pull request as ready for review April 15, 2025 20:13
lora_names=lora_names,
tokenizer=self.tokenizer_map.get(name, default_tokenizer),
Contributor
Since the tokenizer can technically be None now, should we add a check in the chat method so that, when the tokenizer is None, we skip apply_chat_template and raise an exception?

https://github.com/triton-inference-server/server/blob/main/python/openai/openai_frontend/engine/triton_engine.py#L146
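One way the suggested guard could look, sketched as a standalone class. The class name `TritonEngineSketch`, the `resolve_tokenizer` method, and the attribute names are assumptions for illustration, not the PR's actual `triton_engine.py` code.

```python
class TritonEngineSketch:
    """Illustrative sketch of a None-tokenizer guard; not the PR's code."""

    def __init__(self, tokenizer_map, default_tokenizer=None):
        self.tokenizer_map = tokenizer_map
        self.default_tokenizer = default_tokenizer

    def resolve_tokenizer(self, model_name):
        tokenizer = self.tokenizer_map.get(model_name, self.default_tokenizer)
        if tokenizer is None:
            # apply_chat_template needs a tokenizer with a chat template;
            # fail loudly rather than sending an unformatted prompt.
            raise ValueError(
                f"No tokenizer configured for model '{model_name}'; "
                "chat completions require a tokenizer."
            )
        return tokenizer
```

Raising before `apply_chat_template` is called surfaces a clear configuration error to the client instead of a confusing downstream failure.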
