
[v4] Tokenizers.js migration #1501

Merged
xenova merged 17 commits into v4 from v4-tokenizers-migration
Jan 21, 2026


xenova (Collaborator) commented Jan 21, 2026

Following up on huggingface/tokenizers.js#18, we now move all tokenizers logic to the dedicated @huggingface/tokenizers library.

xenova requested a review from nico-martin on January 21, 2026, 00:02

nico-martin (Collaborator) left a comment

tsc --build has 5 errors:

> tsc --build

src/models/vits/tokenization_vits.js:1:10 - error TS2305: Module '"@huggingface/tokenizers"' has no exported member 'Decoder'.

1 import { Decoder } from '@huggingface/tokenizers';
           ~~~~~~~

src/models/vits/tokenization_vits.js:6:16 - error TS8030: The type of a function declaration must match the function's signature.

6     /** @type {Decoder['decode_chain']} */
                 ~~~~~~~~~~~~~~~~~~~~~~~

src/models/vits/tokenization_vits.js:20:51 - error TS2554: Expected 0 arguments, but got 1.

20         this._tokenizer.decoder = new VitsDecoder({});
                                                     ~~

src/pipelines/fill-mask.js:104:21 - error TS2578: Unused '@ts-expect-error' directive.

104                     // @ts-expect-error TS2367
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~

src/pipelines/text-generation.js:8:21 - error TS2307: Cannot find module '../tokenizers.js' or its corresponding type declarations.

8  * @typedef {import('../tokenizers.js').Message[]} Chat
                      ~~~~~~~~~~~~~~~~~~

But other than that I have nothing to add. Great work!
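
For readers unfamiliar with the pattern behind the three `tokenization_vits.js` errors, here is a minimal, self-contained sketch of a custom decoder subclass. It is not the code from this PR: the `Decoder` base class below is a local stand-in (the real export name and constructor signature in @huggingface/tokenizers may differ), and the "keep every second token" logic is only an illustrative assumption. The last two errors probably follow from the first: once `Decoder` cannot be resolved, `Decoder['decode_chain']` is not a usable function type, and the constructor argument is no longer accepted.

```javascript
// Hypothetical stand-in for a decoder base class. This is NOT the real
// @huggingface/tokenizers API; it only mirrors the shape implied by the errors.
class Decoder {
    /** @param {Record<string, unknown>} config */
    constructor(config) {
        this.config = config;
    }

    /**
     * Convert a list of tokens into a list of decoded string pieces.
     * @param {string[]} tokens
     * @returns {string[]}
     */
    decode_chain(tokens) {
        throw new Error('decode_chain should be implemented in a subclass.');
    }

    /**
     * Join the decoded pieces into a single string.
     * @param {string[]} tokens
     * @returns {string}
     */
    decode(tokens) {
        return this.decode_chain(tokens).join('');
    }
}

// Illustrative subclass: keeps the tokens at odd indices, assuming (for this
// example only) that a blank token is interleaved before each symbol.
class VitsDecoder extends Decoder {
    /** @type {Decoder['decode_chain']} */
    decode_chain(tokens) {
        let decoded = '';
        for (let i = 1; i < tokens.length; i += 2) {
            decoded += tokens[i];
        }
        return [decoded];
    }
}

// The base class declares one constructor parameter, so passing `{}` type-checks.
const decoder = new VitsDecoder({});
const text = decoder.decode(['_', 'h', '_', 'i', '_']);
console.log(text);
```

Because the subclass annotates `decode_chain` through the base class type, both the import and the annotation depend on the base class being resolvable, which is why a stale install of the library can surface several errors at once.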

xenova (Collaborator, Author) commented Jan 21, 2026

Thanks! Fixed those issues. The tokenizers-related one should be fixable on your side by upgrading to v0.1 (a clean install may be needed, especially if you were building via local dev). Either way, good to go for now; I'll then open another PR for some pipeline type improvements.

xenova merged commit 66cf69c into v4 on Jan 21, 2026
1 check failed
xenova deleted the v4-tokenizers-migration branch on January 21, 2026, 17:10