feat: Adding multiple tokenizers specification for open ai frontend #8027

Open
wants to merge 5 commits into base: main

Conversation

@oandreeva-nv (Contributor) commented Feb 21, 2025

What does the PR do?

This PR adds support for using multiple tokenizers in the OpenAI-compatible frontend, allowing different models to use their own specific tokenizers. This is crucial for correctly handling various model architectures and their chat templates.

Implementation

  • Extended the --tokenizer flag to accept per-model tokenizer mappings while maintaining backward compatibility with the single-tokenizer setup

Example Usage

python3 python/openai/openai_frontend/main.py --model-repository tiny_models/ --tokenizer "tiny_llama:TinyLlama/TinyLlama-1.1B-Chat-v1.0" "phi-4:microsoft/Phi-4-mini-instruct"
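A minimal sketch of how such a `model:tokenizer` mapping flag could be parsed with argparse. This is illustrative only and not the PR's actual code; the function name `parse_tokenizer_args` and the bare-value-as-default convention are assumptions.

```python
import argparse


def parse_tokenizer_args(values):
    """Split --tokenizer values into (default_tokenizer, tokenizer_map).

    Illustrative sketch: a bare value like "org/model" is treated as the
    single default tokenizer (backward-compatible path); "model:tokenizer"
    pairs build a per-model mapping.
    """
    default_tokenizer = None
    tokenizer_map = {}
    for value in values:
        if ":" in value:
            # Split on the first colon only; HF tokenizer IDs contain "/".
            model, tokenizer = value.split(":", 1)
            tokenizer_map[model] = tokenizer
        else:
            default_tokenizer = value
    return default_tokenizer, tokenizer_map


parser = argparse.ArgumentParser()
parser.add_argument("--tokenizer", nargs="+", default=[])
args = parser.parse_args(
    [
        "--tokenizer",
        "tiny_llama:TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "phi-4:microsoft/Phi-4-mini-instruct",
    ]
)
default, mapping = parse_tokenizer_args(args.tokenizer)
```

Splitting on the first colon keeps Hugging Face repo IDs (which contain `/` but not `:`) intact on the tokenizer side.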

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated the GitHub labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box below and add the matching label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

Added TestMultipleTokenizers class to test the feature

  • CI Pipeline ID:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@oandreeva-nv force-pushed the oandreeva_openai_multiple_tokenizers branch from e0bc399 to aaf4f6d on April 15, 2025 19:52
@oandreeva-nv oandreeva-nv changed the title Adding multiple tokenizers specification for open ai frontend feat: Adding multiple tokenizers specification for open ai frontend Apr 15, 2025
@oandreeva-nv oandreeva-nv marked this pull request as ready for review April 15, 2025 20:13
lora_names=lora_names,
tokenizer=self.tokenizer_map.get(name, default_tokenizer),
Contributor
Since the tokenizer can technically be None now, should we add a check in the chat method so that, when the tokenizer is None, we skip apply_chat_template and raise an exception?

https://github.com/triton-inference-server/server/blob/main/python/openai/openai_frontend/engine/triton_engine.py#L146
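One way the suggested guard could look, sketched as a standalone class. The class name `TritonEngineSketch`, the `resolve_tokenizer` method, and the attribute names are assumptions for illustration, not the PR's actual `triton_engine.py` code.

```python
class TritonEngineSketch:
    """Illustrative sketch of a None-tokenizer guard; not the PR's code."""

    def __init__(self, tokenizer_map, default_tokenizer=None):
        self.tokenizer_map = tokenizer_map
        self.default_tokenizer = default_tokenizer

    def resolve_tokenizer(self, model_name):
        tokenizer = self.tokenizer_map.get(model_name, self.default_tokenizer)
        if tokenizer is None:
            # apply_chat_template needs a tokenizer with a chat template;
            # fail loudly rather than sending an unformatted prompt.
            raise ValueError(
                f"No tokenizer configured for model '{model_name}'; "
                "chat completions require a tokenizer."
            )
        return tokenizer
```

Raising before `apply_chat_template` is called surfaces a clear configuration error to the client instead of a confusing downstream failure.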
