Bug: BLOOM pre-tokenizer is missing #8741

### What happened?

I tried to convert a BLOOM-based model (https://huggingface.co/TurkuNLP/gpt3-finnish-large) to GGUF. First, I had to change the architecture to `BloomForCausalLM`, and with that change I got the following error from the conversion script:

```
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:**          There are 2 possible reasons for this:
WARNING:hf-to-gguf:**          - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:**          - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:**          Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref:     https://github.com/ggerganov/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh:  bc01ce58980e1db43859146dc51b1758b3b88729b217a74792e9f8d43e479d21
WARNING:hf-to-gguf:**************************************************************************************
```

I also tried converting one of the original BLOOM models (560m) and got the same error, but with a different hash. It seems that BLOOM's pre-tokenizer was not added when pre-tokenizers were handled in #6920. BLOOM is listed as a supported model in the README, so conversion should work.
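For context, here is a minimal sketch of why two tokenizers end up with different `chkhsh` values. It assumes the fingerprint is a SHA-256 digest of the token IDs the tokenizer produces for a fixed test string, looked up in a table of known hashes. The function names, token IDs, and registry below are hypothetical and are not taken from the conversion script.

```python
import hashlib


def pre_tokenizer_fingerprint(token_ids):
    """Return a SHA-256 hex digest of the string form of a token ID list."""
    return hashlib.sha256(str(token_ids).encode()).hexdigest()


def identify_pre_tokenizer(token_ids, known):
    """Look up a fingerprint in a table mapping hash -> pre-tokenizer name.

    Returns (name, fingerprint); name is None when the hash is not registered.
    """
    chkhsh = pre_tokenizer_fingerprint(token_ids)
    return known.get(chkhsh), chkhsh


# Hypothetical token IDs, standing in for two tokenizers encoding the same test string.
ids_model_a = [101, 7592, 2088, 102]
ids_model_b = [3, 17, 9001, 42, 8]

# Hypothetical registry: only model A's fingerprint has been added.
known = {pre_tokenizer_fingerprint(ids_model_a): "model-a"}

name_a, hash_a = identify_pre_tokenizer(ids_model_a, known)
name_b, hash_b = identify_pre_tokenizer(ids_model_b, known)

print(name_a)  # registered name for model A
print(name_b)  # None, because model B's hash is not in the registry
```

Under this assumption, any difference in vocabulary or pre-tokenization yields a different hash, so each unregistered model reports its own `chkhsh` even though the underlying cause is the same.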

### Name and Version

$ ./bin/llama-cli --version
version: 3481 (5e2727fe)
built with cc (GCC) 14.1.1 20240720 for x86_64-pc-linux-gnu

### What operating system are you seeing the problem on?

Linux

### Relevant log output

_No response_
