Add Phi-4-mini-instruct support #12099
Conversation
A few things to note (I'll push a commit tomorrow when I get back to work):
Still unsure about the KV metadata part, but I pushed updates for the Xenova/gpt-4o vocab. Something like this?
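For context, this is roughly how a new vocab base such as Xenova/gpt-4o tends to get wired into the hash-based pre-tokenizer detection in convert_hf_to_gguf.py. This is only a sketch: the hash value is a placeholder and the function shape is simplified, not the actual change in this PR.

```python
import hashlib

def detect_vocab_pre(tokenizer, chktxt: str) -> str:
    # Hash the tokenization of a fixed test string; each known pre-tokenizer
    # (including the one behind Xenova/gpt-4o) is identified by its hash.
    chktok = tokenizer.encode(chktxt)
    chkhsh = hashlib.sha256(str(chktok).encode()).hexdigest()

    if chkhsh == "<placeholder-hash-for-Xenova/gpt-4o>":
        return "gpt-4o"
    raise NotImplementedError("BPE pre-tokenizer was not recognized")
```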
Thanks, I have tested a few different GGUF models created with your branch and they seem to be working OK. I'm posting them to Hugging Face: https://huggingface.co/Mungert/Phi-4-mini-instruct.gguf . Many thanks for getting Phi-4-mini-instruct working.
@Mungert69 please don't post GGUF files on HF before the PR is merged, as there may be more work to do and your GGUF files may break once this is finished.
This PR is also missing the tokenizer .inp/.out files.
Since I cannot push to this PR (because you created it from your master branch), I will make another PR to replace it. I will keep your commits there so you're still credited as a co-author.
@@ -2223,8 +2228,15 @@ bool llama_model::load_tensors(llama_model_loader & ml) {
layer.ffn_down = create_tensor(tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd }, 0);
layer.ffn_up = create_tensor(tn(LLM_TENSOR_FFN_UP, "weight", i), { n_embd, 2 * n_ff }, 0);

layer.rope_long = create_tensor(tn(LLM_TENSOR_ROPE_FACTORS_LONG, "weight", i), { n_embd_head/2 }, TENSOR_NOT_REQUIRED | (i != 0 ? TENSOR_DUPLICATED : 0));
layer.rope_short = create_tensor(tn(LLM_TENSOR_ROPE_FACTORS_SHORT, "weight", i), { n_embd_head/2 }, TENSOR_NOT_REQUIRED | (i != 0 ? TENSOR_DUPLICATED : 0));
if (hparams.rope_scaling_type_train == LLAMA_ROPE_SCALING_TYPE_LONGROPE) {
I think this check works, but it's not actually correct, since scaling_type is meant to calculate attn_factor.
Also, the else branch is redundant, because we know from the conversion script that if rot_pct is not set, then n_rot = n_embd / n_head. Other archs like LLM_ARCH_LLAMA do the same thing.
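To make the reasoning concrete, here is a minimal sketch of that derivation, assuming the usual config.json field names (this is not the actual conversion code, and the example values are illustrative):

```python
def compute_n_rot(hparams: dict) -> int:
    # partial_rotary_factor (rot_pct) defaults to 1.0, i.e. full rotation.
    rot_pct = hparams.get("partial_rotary_factor", 1.0)
    n_embd = hparams["hidden_size"]
    n_head = hparams["num_attention_heads"]
    # With rot_pct == 1.0 this reduces to n_embd / n_head, which is why the
    # else branch in the loader adds nothing.
    return int(rot_pct * n_embd) // n_head

# Illustrative values only (not necessarily the real Phi-4-mini config):
print(compute_n_rot({"hidden_size": 3072, "num_attention_heads": 24}))                                 # 128
print(compute_n_rot({"hidden_size": 3072, "num_attention_heads": 24, "partial_rotary_factor": 0.75}))  # 96
```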
@@ -109,6 +109,7 @@ class TOKENIZER_TYPE(IntEnum):
{"name": "megrez", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/Infinigence/Megrez-3B-Instruct"},
{"name": "deepseek-v3", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/deepseek-ai/DeepSeek-V3"},
{"name": "deepseek-r1-qwen", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"},
{"name": "gpt-4o", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/Xenova/gpt-4o"},
This does not actually work, since Xenova/gpt-4o is missing config.json; I have to make an exception for it.
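Purely as an illustration of the kind of exception this might need (the function below is hypothetical, not the actual update-script code):

```python
import requests

def fetch_config(repo_url: str) -> dict:
    # Some repos, e.g. https://huggingface.co/Xenova/gpt-4o, ship a tokenizer
    # but no config.json; treat a 404 as "no config" instead of aborting.
    resp = requests.get(f"{repo_url}/raw/main/config.json")
    if resp.status_code == 404:
        return {}
    resp.raise_for_status()
    return resp.json()
```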
def generate_extra_tensors(self) -> Iterable[tuple[str, Tensor]]:
    if self.hparams.get("partial_rotary_factor") is not None:
This can be written more concisely: self.hparams.get("partial_rotary_factor", 1.0)
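For illustration, the explicit None check and the suggested default-value form side by side (a standalone sketch; variable names are assumptions):

```python
hparams = {"partial_rotary_factor": 0.75}  # example hyperparameters

# Explicit check, as in the snippet above:
if hparams.get("partial_rotary_factor") is not None:
    rot_pct = hparams["partial_rotary_factor"]
else:
    rot_pct = 1.0

# Shorter form: the default already expresses "full rotation".
rot_pct = hparams.get("partial_rotary_factor", 1.0)
```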
Small correction: this is not needed. Other archs like …
Closed and superseded by #12108.
* Added Phi-4-mini-instruct support
* Update regex per ngxson
* Change the vocab base to Xenova/gpt-4o
* fix conversion update script
* no need to check longrope
* minor style fix
* fix python style

Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>
Added new vocab type: gpt-4o
Added Phi3 support for partial_rotary_factor
Added Phi3 support for tie_word_embeddings
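As a rough sketch of what tie_word_embeddings support typically means on the conversion side (names here are illustrative, not the actual Phi3Model changes):

```python
def should_skip_tensor(hparams: dict, tensor_name: str) -> bool:
    # When input and output embeddings are tied, lm_head.weight duplicates the
    # token-embedding matrix, so the converter can skip it and let the runtime
    # reuse token_embd.weight as the output head.
    tied = bool(hparams.get("tie_word_embeddings", False))
    return tied and tensor_name == "lm_head.weight"

print(should_skip_tensor({"tie_word_embeddings": True}, "lm_head.weight"))  # True
```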