Closed
Description
What happened?
I tried to convert a BLOOM-based model (https://huggingface.co/TurkuNLP/gpt3-finnish-large) to GGUF. First, I had to change the architecture to BloomForCausalLM, and with that change I got the following error from the conversion script:
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:** There are 2 possible reasons for this:
WARNING:hf-to-gguf:** - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:** - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref: https://github.com/ggerganov/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh: bc01ce58980e1db43859146dc51b1758b3b88729b217a74792e9f8d43e479d21
WARNING:hf-to-gguf:**************************************************************************************
I also tried to convert one of the original BLOOM models (560m) and got the same error, but with a different hash. It seems that BLOOM's pre-tokenizer was not added when pre-tokenizers were handled in #6920. Since BLOOM is listed as a supported model in the README, converting it should work.
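For context, the warning above is produced by the hash check in the conversion script: it tokenizes a fixed test string and compares a SHA-256 of the resulting token IDs against a table of known pre-tokenizers. The sketch below illustrates the mechanism only; the test string, token IDs, and dictionary name here are made-up stand-ins, not the script's actual values.

```python
# Illustrative sketch of the chkhsh mechanism in convert_hf_to_gguf.py.
# The real script encodes a long fixed test string with the model's
# tokenizer; the token IDs below are hypothetical placeholders.
from hashlib import sha256

def compute_chkhsh(token_ids: list[int]) -> str:
    # Hash the string form of the token-ID list; an unrecognized
    # hash is what triggers the "BPE pre-tokenizer was not recognized"
    # warning shown in this report.
    return sha256(str(token_ids).encode()).hexdigest()

# Hypothetical stand-in for tokenizer.encode(chktxt)
example_ids = [101, 2023, 2003, 1037, 3231, 102]
chkhsh = compute_chkhsh(example_ids)

# A fix would add the model's hash to the known-pre-tokenizer mapping,
# e.g. associating the hash from this report with a "bloom" entry
# (dictionary name is illustrative, not the script's actual structure):
KNOWN_PRE_TOKENIZERS = {
    "bc01ce58980e1db43859146dc51b1758b3b88729b217a74792e9f8d43e479d21": "bloom",
}
print(chkhsh)
```

Because the hash depends on the tokenizer's output for the test string, two models with different pre-tokenization configs (here, TurkuNLP's model vs. the original bloom-560m) can produce different hashes even within the same model family.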
Name and Version
$ ./bin/llama-cli --version
version: 3481 (5e2727f)
built with cc (GCC) 14.1.1 20240720 for x86_64-pc-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
No response