@akashmjn
Contributor

Patches the checkpoint convert script to detect which type of tokenizer files is present and convert accordingly.
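As a rough illustration of the detection step described above, a check along these lines could be used. This is a minimal sketch, not the code from this PR: the function name `detect_tokenizer_format`, the `.tiktoken` suffix, the `vocab.json` filename, and the directory layout are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical names, chosen for illustration only; the real script may
# look for different files in a different layout.
TIKTOKEN_SUFFIX = ".tiktoken"
HF_JSON_NAME = "vocab.json"


def detect_tokenizer_format(tokenizer_dir: Path, name: str) -> str:
    """Return "tiktoken" or "hf_json" depending on which files exist.

    Assumed layout (not taken from the PR):
      <tokenizer_dir>/<name>.tiktoken    -> tiktoken-style tokenizer file
      <tokenizer_dir>/<name>/vocab.json  -> older hf_transformers-style files
    """
    if (tokenizer_dir / f"{name}{TIKTOKEN_SUFFIX}").is_file():
        return "tiktoken"
    if (tokenizer_dir / name / HF_JSON_NAME).is_file():
        return "hf_json"
    raise FileNotFoundError(
        f"no tokenizer files found for {name!r} in {tokenizer_dir}"
    )
```

The caller would then branch on the returned string and load the vocabulary with the matching reader, so that both paths feed the same data into the rest of the conversion.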

This mostly follows #725. For the reason given in that PR, converted multilingual checkpoints still exactly match the ggml files downloaded by whisper.cpp, while converted .en checkpoints differ from them by 17 bytes.

The script now also produces exactly the same files regardless of which type of tokenizer files was used for conversion.

-rw-r--r--  1 Akash  staff  487614184 Jun 10 21:30 ggml-small.en.hf.bin
-rw-r--r--  1 Akash  staff  487614184 Jun 10 21:28 ggml-small.en.tiktoken.bin
-rw-r--r--  1 Akash  staff  487601967 Jun 10 21:32 ggml-small.hf.bin
-rw-r--r--  1 Akash  staff  487601967 Jun 10 21:35 ggml-small.tiktoken.bin
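The listing above shows matching file sizes only. For anyone who wants to confirm that two converted files are identical byte for byte, a small check could look like the following. This is a sketch for verification, not part of the PR; the helper names `sha256_of` and `same_contents` are made up for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(a, b):
    """Return True if both files have the same size and the same SHA-256 digest."""
    if Path(a).stat().st_size != Path(b).stat().st_size:
        return False  # cheap early exit before hashing
    return sha256_of(a) == sha256_of(b)
```

With the files from the listing, this would be called as `same_contents("ggml-small.hf.bin", "ggml-small.tiktoken.bin")`, and likewise for the `.en` pair.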

@ggerganov merged commit 3ec7bff into ggml-org:master Jun 25, 2023
@ggerganov
Member

Thank you!

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023
….json tokenizer files (ggml-org#1001)

* patch checkpoint convert script to keep compatibility with older hf_transformers whisper tokenizer

* typo fix
landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request Dec 16, 2023
….json tokenizer files (ggml-org#1001)

* patch checkpoint convert script to keep compatibility with older hf_transformers whisper tokenizer

* typo fix
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024
….json tokenizer files (ggml-org#1001)

* patch checkpoint convert script to keep compatibility with older hf_transformers whisper tokenizer

* typo fix