Add Phi-4-mini-instruct support #12099

Closed
wants to merge 3 commits into from
Conversation

ns3284
@ns3284 ns3284 commented Feb 27, 2025

Added new vocab type: gpt-4o
Added Phi3 support for partial_rotary_factor
Added Phi3 support for tie_word_embeddings

@github-actions github-actions bot added the python python script changes label Feb 27, 2025
@ngxson
Collaborator

ngxson commented Feb 27, 2025

A few things to note (I'll push a commit tomorrow when I'm back at work):

  • In the update.py script, it's better to point to the repo Xenova/gpt-4o
  • We need to add a dedicated KV metadata key for partial_rotary_factor to make it more explicit, just to be a bit more future-proof

@ns3284
Author

ns3284 commented Feb 27, 2025

Still unsure about the KV metadata part, but I pushed updates for Xenova/gpt-4o.

Something like this?

        rotary_factor = self.find_hparam(["partial_rotary_factor", "rope_pct"], optional=True)
        rotary_factor = rotary_factor if rotary_factor is not None else 1.0
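For context on the snippet above, find_hparam returns the first matching hyperparameter key. A simplified stand-in re-implementation (the real helper lives on the model class in convert_hf_to_gguf.py) showing the fallback semantics:

```python
def find_hparam(hparams: dict, keys: list[str], optional: bool = False):
    # Return the value of the first key present in hparams;
    # with optional=True, missing keys yield None instead of an error.
    for key in keys:
        if key in hparams:
            return hparams[key]
    if optional:
        return None
    raise KeyError(f"none of {keys} found")

# Fall back to 1.0 (full rotation) when neither key is present
hparams = {}  # hypothetical model config with no partial rotary setting
rotary_factor = find_hparam(hparams, ["partial_rotary_factor", "rope_pct"], optional=True)
rotary_factor = rotary_factor if rotary_factor is not None else 1.0
assert rotary_factor == 1.0
```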

@Mungert69

Thanks! I have tested a few different GGUF models created with your branch and they seem to be working OK. I'm posting them to Hugging Face: https://huggingface.co/Mungert/Phi-4-mini-instruct.gguf. Many thanks for getting Phi-4-mini-instruct working.

@ngxson
Collaborator

ngxson commented Feb 28, 2025

@Mungert69 please don't post GGUFs on HF before the PR is merged, as there may be more changes and your GGUFs may break once this is finished.

Collaborator

@ngxson ngxson left a comment


This PR is also missing tokenizer .inp/.out files.

Since I cannot push to this PR (because you created it from your master branch), I will make another PR to replace it. I will keep your commits there so you remain a co-author.

@@ -2223,8 +2228,15 @@ bool llama_model::load_tensors(llama_model_loader & ml) {
layer.ffn_down = create_tensor(tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd }, 0);
layer.ffn_up = create_tensor(tn(LLM_TENSOR_FFN_UP, "weight", i), { n_embd, 2 * n_ff }, 0);

layer.rope_long = create_tensor(tn(LLM_TENSOR_ROPE_FACTORS_LONG, "weight", i), { n_embd_head/2 }, TENSOR_NOT_REQUIRED | (i != 0 ? TENSOR_DUPLICATED : 0));
layer.rope_short = create_tensor(tn(LLM_TENSOR_ROPE_FACTORS_SHORT, "weight", i), { n_embd_head/2 }, TENSOR_NOT_REQUIRED | (i != 0 ? TENSOR_DUPLICATED : 0));
if (hparams.rope_scaling_type_train == LLAMA_ROPE_SCALING_TYPE_LONGROPE) {

I think this check works but it's not actually correct, since scaling_type is used to calculate attn_factor.


Also, the else branch is redundant: we know from the conversion script that if rot_pct is not set, then n_rot = n_embd / n_head.

Other archs like LLM_ARCH_LLAMA do the same thing.

@@ -109,6 +109,7 @@ class TOKENIZER_TYPE(IntEnum):
{"name": "megrez", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/Infinigence/Megrez-3B-Instruct"},
{"name": "deepseek-v3", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/deepseek-ai/DeepSeek-V3"},
{"name": "deepseek-r1-qwen", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"},
{"name": "gpt-4o", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/Xenova/gpt-4o", },
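For context, convert_hf_to_gguf_update.py walks this list, downloads each repo's tokenizer, and records a hash of the token ids produced for a fixed test string to identify the pre-tokenizer. A minimal sketch of that hashing step (simplified; names are illustrative, not the script's exact code):

```python
from hashlib import sha256

def chkhsh_for(token_ids: list[int]) -> str:
    # Identify a pre-tokenizer by hashing the token ids it produces
    # for a fixed test string (mirrors the update script's approach).
    return sha256(str(token_ids).encode()).hexdigest()

# Tokenizers that split the test string identically get the same hash
assert chkhsh_for([100, 200, 300]) == chkhsh_for([100, 200, 300])
assert len(chkhsh_for([1])) == 64  # hex digest of SHA-256
```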

This does not actually work since Xenova/gpt-4o is missing config.json; I had to make an exception for it.


def generate_extra_tensors(self) -> Iterable[tuple[str, Tensor]]:
if self.hparams.get("partial_rotary_factor") is not None:

This can be written more concisely: self.hparams.get("partial_rotary_factor", 1.0)
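The suggested shorthand relies on dict.get returning the default when the key is absent; a quick illustration (equivalent as long as the key is never explicitly set to None):

```python
hparams = {"hidden_size": 3072}  # hypothetical config without the key

# Verbose form: explicit None check
factor = hparams.get("partial_rotary_factor")
if factor is None:
    factor = 1.0

# Shorter equivalent suggested above
assert hparams.get("partial_rotary_factor", 1.0) == factor == 1.0
```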

@ngxson
Collaborator

ngxson commented Feb 28, 2025

  • We need to add a dedicated KV metadata key for partial_rotary_factor to make it more explicit, just to be a bit more future-proof

Small correction: this is not needed. Other archs like Phi2Model just scale n_rot accordingly.
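As a sketch of the Phi2-style alternative: instead of storing the factor under a dedicated KV key, the converter folds it into the rope dimension count it already writes (hyperparameter names follow common HF configs and are assumptions here):

```python
def scaled_n_rot(hparams: dict) -> int:
    # Phi2-style: fold partial_rotary_factor into n_rot at conversion
    # time, so no dedicated GGUF metadata key is needed.
    head_dim = hparams["hidden_size"] // hparams["num_attention_heads"]
    return int(hparams.get("partial_rotary_factor", 1.0) * head_dim)

# Hypothetical values: factor applied vs. unset (full rotation)
assert scaled_n_rot({"hidden_size": 3072, "num_attention_heads": 24,
                     "partial_rotary_factor": 0.75}) == 96
assert scaled_n_rot({"hidden_size": 3072, "num_attention_heads": 24}) == 128
```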

@ngxson
Collaborator

ngxson commented Feb 28, 2025

Closed and superseded by #12108.

@ngxson ngxson closed this Feb 28, 2025
ngxson added a commit that referenced this pull request Feb 28, 2025
* Added Phi-4-mini-instruct support

* Update regex per ngxson

* Change the vocab base to Xenova/gpt-4o

* fix conversion update script

* no need to check longrope

* minor style fix

* fix python style

---------

Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
mostlyuseful pushed a commit to mostlyuseful/llama.cpp that referenced this pull request May 12, 2025
Labels: python (python script changes)
4 participants