ShareGPT4V-7B -- a multimodal model that surpasses LLaVA #4196
Are there plans to support this new SOTA open-source vision model? Despite its compact size, the model is able to extract text from images with incredible accuracy.

Comments
ShareGPT4V: #4172
I've been using it for a while, it's great.
That's great news, indeed! So I downloaded the required files and threw them into a folder named
@itsPreto Don't smash it all together; each folder has its own config.json. Everything else is as described here:
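Roughly, you want two sibling folders, one per HF repo, something like this (the file names are only illustrative of what the two repos typically contain):

```
models/
├── ShareGPT4V-7B/                               # LLaVA-style language model repo
│   ├── config.json
│   ├── tokenizer.model
│   └── pytorch_model-0000X-of-0000Y.bin ...
└── ShareGPT4V-7B_Pretrained_vit-large336-l12/   # vision tower repo
    ├── config.json
    └── pytorch_model.bin
```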
@cmp-nct Ty! I was able to convert/split/quantize the model, but I'm not able to actually run it due to an old gguf format? Not sure why that would be, since I have the latest. Any ideas? Here's the directory for the model:
It's not related to the image converter. Also, I'd use q4_k, not q4_0; if you are on a recent release you can use K-quants on llava (about 40 tensors will fall back to compatibility quants).
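For example, something along these lines (the paths are just placeholders for wherever your converted files ended up; only the language-model gguf gets quantized here, the mmproj file stays f16):

```sh
# quantize the language model gguf; leave mmproj-model-f16.gguf untouched
./quantize ./models/ShareGPT4V-7B/ggml-model-f16.gguf \
           ./models/ShareGPT4V-7B/ggml-model-q4_k.gguf \
           Q4_K_M
```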
Just to make sure I'm not crazy, I cloned the project fresh, ran through the instructions again, and quantized. I've been quantizing my LLMs (text models) just fine up until now and have yet to run into this issue.
That is how it should look. It seems something is wrong with the projector; make sure your files are all complete.
I've no idea where your problem originates. So far the best I can recommend is to ensure your Python files are complete, and then separate them into two directories (just as they come from HF git), so you don't deviate from the usual process and possibly hit something untested.
Okay, I'm strictly following the steps in the example:
Let's try running it now:
Wow! I have no clue why it worked this time around since I did virtually nothing different... mildly infuriating, but it finally works and works GREAT! Thanks! @cmp-nct
I changed the verbosity of this line to 3 in llava-cli.cpp, rebuilt llava-cli, and got the detailed info.
The only thing I can suggest is to make sure you're following the instructions exactly as they are written -- there seem to be a lot of untested edge cases in the llava implementation still. Also, make sure you're using the modified
@winer632 You failed to correctly convert the model.
It works! Thanks a lot!
Yes, I followed the instructions exactly as they were written. These are my steps:

1. Download the data here into the /home/llm/llama.cpp/models/ShareGPT4V-7B/ directory.
2. Download the data here into the /home/llm/llama.cpp/models/ShareGPT4V-7B_Pretrained_vit-large336-l12/ directory.
3. Modify /home/llm/llama.cpp/examples/llava/convert-image-encoder-to-gguf.py according to this PR: https://github.com/ggerganov/llama.cpp/pull/4172/files
4. Install the following dependencies.
5. In the /home/llm/llama.cpp directory, run `python ./examples/llava/llava-surgery.py -m models/ShareGPT4V-7B`. The projector was generated.
6. In the same directory, run `python ./examples/llava/convert-image-encoder-to-gguf.py -m models/ShareGPT4V-7B_Pretrained_vit-large336-l12/ --llava-projector models/ShareGPT4V-7B/llava.projector --output-dir models/ShareGPT4V-7B --clip_model_is_vision`. mmproj-model-f16.gguf was generated.
7. Run `python convert.py models/ShareGPT4V-7B`. ggml-model-f16.gguf was generated.
8. Run `make llava-cli`.
9. Run `./llava-cli -m ./models/ShareGPT4V-7B/ggml-model-f16.gguf --mmproj ./models/ShareGPT4V-7B/mmproj-model-f16.gguf --image ../test_photo/zhuanma.jpeg`.

Got this error:
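For reference, the same sequence condensed into a single script (same paths as in the steps above; the image path is just my test file):

```sh
cd /home/llm/llama.cpp

# split the projector out of the LLaVA-style checkpoint
python ./examples/llava/llava-surgery.py -m models/ShareGPT4V-7B

# convert the vision tower + projector to gguf
python ./examples/llava/convert-image-encoder-to-gguf.py \
    -m models/ShareGPT4V-7B_Pretrained_vit-large336-l12/ \
    --llava-projector models/ShareGPT4V-7B/llava.projector \
    --output-dir models/ShareGPT4V-7B \
    --clip_model_is_vision

# convert the language model to gguf
python convert.py models/ShareGPT4V-7B

# build and run llava-cli
make llava-cli
./llava-cli -m ./models/ShareGPT4V-7B/ggml-model-f16.gguf \
    --mmproj ./models/ShareGPT4V-7B/mmproj-model-f16.gguf \
    --image ../test_photo/zhuanma.jpeg
```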
@cmp-nct @itsPreto @Galunid `./llava-cli -m ./models/ShareGPT4V-gguf/ShareGPT4V-f16.gguf --mmproj ./models/ShareGPT4V-gguf/mmproj-model-f16.gguf --image ../test_photo/zhuanma.jpeg`
You can add
Thank you!
Closed in #4172.