Description
Is your feature request related to a problem? Please describe.
Support for the Whisper tokenizer is needed.
Describe the solution you'd like
It would be nice to have built-in support for the Whisper tokenizer.
Describe alternatives you've considered
I'm new to tokenizers, so I'm not sure whether what I'm doing right now is correct. I'm trying to use a BpeTokenizer, passing it the vocab and merges files plus the special tokens. This is not straightforward: for example, I read the special tokens from https://huggingface.co/onnx-community/whisper-large-v3-turbo/blob/main/special_tokens_map.json, and I also need to read the vocab file to get the max id, so that I know which id to start from when mapping special tokens to id numbers.
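A minimal sketch of the id assignment described above, written in Python purely for illustration. The function name `assign_special_token_ids` and the sample data are hypothetical, and the sketch assumes special tokens are numbered consecutively after the highest vocab id, which may not match the ids the model actually expects:

```python
def assign_special_token_ids(vocab, special_tokens):
    """Give each special token the next free id after the largest vocab id.

    Assumption: ids continue consecutively after the vocab; this is a guess
    about the numbering, not a confirmed property of the Whisper tokenizer.
    """
    next_id = max(vocab.values()) + 1
    mapping = {}
    for token in special_tokens:
        # Skip tokens that already have an id or appear twice in the list.
        if token in vocab or token in mapping:
            continue
        mapping[token] = next_id
        next_id += 1
    return mapping


# Toy data standing in for the real vocab file and special_tokens_map.json.
sample_vocab = {"a": 0, "b": 1, "ab": 2}
sample_special = ["<|endoftext|>", "<|startoftranscript|>"]
print(assign_special_token_ids(sample_vocab, sample_special))
```

If the order of the special tokens in the list differs from the order used when the model was trained, the resulting ids would be wrong, which is why this approach feels fragile.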
The linked repository even has a tokenizer.json that I suppose already contains everything, without the need to pass vocab and merges, but I don't see a way to use it out of the box (I haven't found a constructor that accepts a tokenizer.json file).
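To illustrate why tokenizer.json looks sufficient on its own, here is a small Python sketch that pulls the vocab, merges, and special-token ids out of a document shaped like a Hugging Face tokenizer.json. The helper name `load_tokenizer_json` and the tiny inline sample are hypothetical; the sketch assumes the file has a `model` object with `vocab` and `merges`, and an `added_tokens` list whose entries carry `content` and `id`:

```python
import json


def load_tokenizer_json(text):
    """Extract vocab, merges, and added-token ids from tokenizer.json text.

    Assumes the layout described in the lead-in; fields that are absent
    fall back to empty containers.
    """
    data = json.loads(text)
    model = data.get("model", {})
    vocab = model.get("vocab", {})
    merges = model.get("merges", [])
    # Each added token records its own id, so no max-id bookkeeping is needed.
    special = {t["content"]: t["id"] for t in data.get("added_tokens", [])}
    return vocab, merges, special


# Tiny stand-in for the real file; the real one is far larger.
sample = json.dumps({
    "added_tokens": [{"id": 3, "content": "<|endoftext|>", "special": True}],
    "model": {"type": "BPE", "vocab": {"a": 0, "b": 1, "ab": 2}, "merges": ["a b"]},
})
vocab, merges, special = load_tokenizer_json(sample)
print(len(vocab), merges, special)
```

A constructor or factory method that accepted this file directly would remove the need to pass vocab, merges, and special tokens separately.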