I want to add some special tokens such as `<CON_START>`, but neither T5Tokenizer nor MT5Tokenizer tokenizes them correctly after passing the `additional_special_tokens` parameter: the special tokens are still split into subwords. A minimal sketch of what I'm doing is below.
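This is roughly how I set it up (assuming a `t5-small` checkpoint; `<CON_START>` is just my example token):

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained(
    "t5-small",
    additional_special_tokens=["<CON_START>"],
)

# Expected: "<CON_START>" is kept as a single token.
# Observed: it gets split into SentencePiece subwords.
print(tokenizer.tokenize("<CON_START> some input text"))
```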
It works when using OpenAIGPTTokenizer with the same `additional_special_tokens` parameter: after declaring the special tokens, OpenAIGPTTokenizer keeps each one as a single token instead of splitting it (see the sketch below).
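For comparison, the equivalent sketch with OpenAIGPTTokenizer (again using my example token), where the special token comes back intact:

```python
from transformers import OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained(
    "openai-gpt",
    additional_special_tokens=["<CON_START>"],
)

# Here "<CON_START>" is returned as one token, as expected.
print(tokenizer.tokenize("<CON_START> some input text"))
```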
The version of transformers is 4.2.2.
I'm also not sure whether this problem is related to issue #624 for T5, which discusses the SentencePiece extra vocab.
Thank you for your feedback.