Support for Phi-3 models #6849
Comments
Model directly works 👍 GGUF link - https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/blob/main/Phi-3-mini-4k-instruct-q4.gguf |
Have you tested compatibility with the |
I believe llama.cpp does not support LongRoPE, which is used by the 128k variant. |
Yeah, I tried to convert the 128K version. |
Also |
@MoonRide303 Same error with |
Only partially. MS is using some new rope technique they're calling "longrope". As-is, LCPP will work ok for the first few gens but will then abruptly go insane. This new longrope thing is likely the culprit. |
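For anyone curious what "longrope" actually changes: the 128k config ships per-dimension rope_scaling factors (short_factor / long_factor) that rescale the RoPE frequencies, plus an attention rescaling term. A rough Python sketch of the idea, based on the fields in the public Phi-3 config and modeling code — not llama.cpp's implementation:

```python
import math
import numpy as np

def su_scaled_inv_freq(head_dim, rope_base, seq_len,
                       original_max_pos, max_pos,
                       short_factor, long_factor):
    """Sketch of the 'su'/LongRoPE idea: pick per-dimension scaling factors
    depending on sequence length and divide the usual RoPE inverse
    frequencies by them. Parameter names mirror the HF config fields."""
    factors = np.asarray(long_factor if seq_len > original_max_pos else short_factor)
    base_inv_freq = 1.0 / (rope_base ** (np.arange(0, head_dim, 2) / head_dim))
    inv_freq = base_inv_freq / factors  # len(factors) == head_dim // 2

    # Long-context attention rescaling as in the published modeling code
    # (treat the exact formula as an assumption here).
    scale = max_pos / original_max_pos
    attn_factor = 1.0 if scale <= 1.0 else math.sqrt(
        1 + math.log(scale) / math.log(original_max_pos))
    return inv_freq, attn_factor
```

The 4k model has no rope_scaling block at all, which would explain why it works out of the box while the 128k one goes off the rails once it passes the original window.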
This model is insane for its size. |
Template for llama.cpp:
main.exe --model models/new3/Phi-3-mini-4k-instruct-fp16.gguf --color --threads 30 --keep -1 --n-predict -1 --repeat-penalty 1.1 --ctx-size 0 --interactive -ins -ngl 99 --simple-io --in-prefix "<|user|>\n" --in-suffix "<|end|>\n<|assistant|>" -p "<|system|>You are a helpful assistant.<|end|>\n " |
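The --in-prefix / --in-suffix flags above encode the Phi-3 instruct turn format. Written out as a plain string builder (illustrative only; exact newline placement is an approximation of the flags in that command):

```python
# Illustrative only: Phi-3 instruct turn layout matching the flags above
# (<|system|> / <|user|> / <|assistant|>, each turn terminated by <|end|>).
def phi3_prompt(system: str, user: str) -> str:
    return (
        f"<|system|>\n{system}<|end|>\n"
        f"<|user|>\n{user}<|end|>\n"
        "<|assistant|>\n"
    )

print(phi3_prompt("You are a helpful assistant.", "Summarize RoPE in one sentence."))
```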
Tested with llama.cpp. Do you also have a problem with it generating tokens until I manually stop it? I had to add |
Not too bad ... not llama 3 8b level, but still phi-3
Llama 3 is on a totally different level compared to phi-3 ... |
Doing my part by adding the chat template :) #6857 |
Closing this since PR: #6857 was merged into master with support for Phi-3 4K context length. |
What about 128k context length variant? |
Support for 128K context length seems pretty important to me for "Phi-3" support to be considered "done", right? @criminact |
Status: Phi-3 4K models are supported in master after the #6857 merge. Phi-3 128K models aren't supported yet (as of 24th Apr 2024). |
Are templates different for 4K vs. 128K? |
Hi guys, what should I do with this error? I fine-tuned my own phi-3 and converted it to gguf with this command:
I get the error when I run
I would be very thankful for any help or push in the right direction. |
With a reduced context size of 60000 I can load a 128K model. The prompting is still messed up though.
./main --model /opt/data/pjh64/Phi-3-mini-128K-Instruct.gguf/phi-3-mini-128K-Instruct_q8_0.gguf --color --threads 30 --keep -1 --n-predict -1 --repeat-penalty 1.1 --ctx-size 60000 --interactive -ins -ngl 33 --simple-io --in-prefix "<|user|>\n" --in-suffix "<|end|>\n<|assistant|>" -p "<|system|>You are a helpful assistant.<|end|>\n "
main: interactive mode on.
== Running in interactive mode. ==
In this mystical place lived Elara, a beautiful young maiden blessed with iridescent hair and eyes that mirrored the depth of the cosmos. Elara had one unique trait - she could converse with nature itself. She conversed with trees whispering secrets in rustling leaves, birds humming songs only they could understand. One fateful day, a dark cloud loomed over Veridia. A malicious sorcerer named Malachar desired to steal the magical essence of Veridia for his own nefarious purposes. Upon hearing this news, Elara decided she wouldn't let her homeland fall into despair. With bravery coursing through her veins and courage in her heart, she embarked on a peril With each passing day, Elara encountered numerous trials that tested her courage, wisdom, and resilience. She journeyed across treacherous terrains, braved wild beasts and outsmarted magical illusions crafted by Malachar himself. As Elara ventured deeper into the darkness of Maleficent's lair, she came face-to-face with the sorcerer. A battle of magic unfolded - a clash between good and evil, light against dark. Despite feeling overwhelmed by Malachar_s mightier spells, Elara held on to her heart's purity, believing in herself and her mission for Veridia's peace. In the end, it was Elara who prevailed. With a final surge of magic she wielded from within, she vanquished Malachar, breaking his dark curse over Veridia. Afterwards, with peace restored to Veridia and its inhabitants living in harmony once more, Elara became the beloved guardian of Luminae Woods, continuing her duty as the voice Thus ends a tale about courage, goodness, and the power that resides within us all. It's a timeless story of how one person can make an immense difference in preserving peace and harmony. And so, dear listener, let this legend inspire you to face your own battles with bravery and integrity _ for it is these virtues which truly define the worthiness of any individual or character.<|end|>
|
@mirek190 Sadly |
@MoonRide303 do you know how much VRAM is required for handling 128K tokens? |
Might be different for main and server with some options, but it seems to be around 50 GB for passkey. I was getting this error when I tried to launch the CUDA version of passkey for Phi-3-mini-128k-instruct-Q6_K.gguf:
I am not sure which option should be used to decrease memory requirements for KV cache - I tried adding
With the default f16 type for KV cache I am able to launch server with up to -c 61440 (on a 16 GB VRAM GPU). |
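Those numbers line up with a back-of-the-envelope KV-cache estimate. For Phi-3-mini the public config gives 32 layers and a KV width of 3072 (32 heads × head dim 96, no GQA); a full 128k f16 cache is then roughly 48 GiB. A quick sketch, with those dims treated as assumptions rather than read from the GGUF:

```python
# Back-of-the-envelope KV cache size; model dims assumed from the public
# Phi-3-mini config (32 layers, KV width 3072), not read from the GGUF.
def kv_cache_bytes(n_ctx, n_layers=32, kv_width=3072, bytes_per_elem=2):
    # K and V tensors, per layer, per position, f16 by default
    return 2 * n_layers * n_ctx * kv_width * bytes_per_elem

for ctx in (4096, 61440, 131072):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>6} tokens -> ~{gib:.1f} GiB f16 KV cache")
# 131072 tokens comes out around 48 GiB, matching the ~50 GB seen above;
# an 8-bit cache type would roughly halve it.
```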
I know 128k support is already on the TODO list, but I thought I'd add how nice it would be, since there are almost no other models with a context length that size. Llama 3 is only 8k, so it'll be a very big deal when this is released. |
Absolutely. Wllama already has verified support for the 4K version, and added an additional fix for it. I believe it will be the most important model for browser-based AI for a while. I know Transformers.js has already added support for it, with a great demo too, and WebLLM support seems on the way too. But those both require WebGPU support. Wllama works with CPU only, so there will be a (slower) fallback option for Safari and Firefox users, finally making Phi 3 128K universally available in the browser. And, importantly, with some headroom to really use that large context, even with a 4GB total memory limit. |
New models are out. Not sure which ones are supported and which ones need changing. But probably all the 128K versions have the same issue.
|
I was able to convert the Medium models without issues, but have not tested them yet. The small-128k models apparently use a new RoPE scaling method called 'su'. |
I think "su" is the same for mini, so hopefully any current effort will carry these through as well. |
When trying to convert Phi-3-small-8k-instruct:
Also different tokenizer - based on cl100k_base tiktoken, adapted to support ChatML: |
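For reference, extending cl100k_base with ChatML-style special tokens follows the pattern from tiktoken's own documentation; the specific token names and IDs below are placeholders, not necessarily what Phi-3-small actually ships:

```python
# Pattern from tiktoken's docs for extending an existing encoding with
# ChatML special tokens; the tokens/IDs here are illustrative assumptions.
import tiktoken

cl100k = tiktoken.get_encoding("cl100k_base")
enc = tiktoken.Encoding(
    name="cl100k_chatml",
    pat_str=cl100k._pat_str,
    mergeable_ranks=cl100k._mergeable_ranks,
    special_tokens={
        **cl100k._special_tokens,
        "<|im_start|>": 100264,
        "<|im_end|>": 100265,
    },
)
print(enc.encode("<|im_start|>user", allowed_special="all"))
```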
It looks like #7225 was merged. Is there any other outstanding work on the Phi-3 models? Does the new Phi-3 vision model work? |
Have you found a solution for that? I'm facing the same issue. |
@RLXIWC Update |
I am already using version 0.2.76, but I still get the error. The model is Phi-3-mini-4k-instruct-q4.gguf from Huggingface. |
That error comes from That means there's some issue with your |
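If it helps narrow things down, a minimal load check with llama-cpp-python (model path assumed, and assuming a build recent enough to bundle Phi-3 support in llama.cpp) looks roughly like this:

```python
# Minimal sanity check with llama-cpp-python; the model path is an assumption.
from llama_cpp import Llama

llm = Llama(model_path="Phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])
# If Llama(...) itself raises, the problem is in the GGUF / llama.cpp layer,
# not in the chat completion call.
```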
I am having a similar issue, not the NotImplementedError: Architecture 'Phi3ForSequenceClassification' not supported! (edit: removed link) |
Maybe you should use the convert-hf-to-gguf.py script. |
I have used
I was trying to convert a Phi-3 mini (3.8B)-based LLM to f16 GGUF with llama.cpp that uses the
Linking #7439 in any case too |
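One quick way to see why convert-hf-to-gguf.py refuses a checkpoint is to look at the architectures field it dispatches on. A small check, with the local directory name as an assumption:

```python
# Check which architecture string the converter will see; it only knows how
# to map registered architectures (e.g. Phi3ForCausalLM) to GGUF tensors.
# The directory name below is a placeholder.
import json, pathlib

cfg = json.loads(pathlib.Path("my-finetuned-phi3/config.json").read_text())
print(cfg.get("architectures"))
# ["Phi3ForCausalLM"] converts; something like
# ["Phi3ForSequenceClassification"] has no mapping and fails as above.
```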
Looks like the new Phi-3 from today uses Microsoft's new LongRoPE, which is still unsupported. |
Sent a request to the MS Team on HF to support the longrope implementation if they can: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/discussions/88 |
The newly released model https://huggingface.co/openbmb/MiniCPM3-4B
|
Microsoft recently released Phi-3 models in 3 variants (mini, small & medium). Can we add support for this new family of models?