
Question about llama.cpp and llava-cli when used with llava 1.6 for vision: #5852

Closed
@RandomGitUser321

I've been using the llava-v1.6-mistral-7b model to generate captions lately. I know it's relatively new and works differently under the hood from older vision models.

From the little testing I've done, it seems that the server falls back to llava 1.5-style image encoding rather than using the 1.6 mode: when I check the total token counts, they are always really low. This seems to affect any app that uses llama.cpp, like LM Studio and Jan. If I use llava-cli with the same settings, the image alone encodes to 2880 tokens, which indicates that it's encoding the tiles correctly. Is there any way to make the server encode images the way llava-cli does? Is there any way to make llava-cli behave like a server? Am I doing something wrong?
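For anyone checking the numbers, here is a rough sketch of where 2880 could come from. It assumes a CLIP ViT-L/14 vision encoder at 336 px (24 x 24 = 576 patches per view) and a 2x2 tile grid plus one downscaled overview image. Those values are assumptions for illustration and are not read from the model files; `image_token_count` is a hypothetical helper, not part of llama.cpp.

```python
def image_token_count(image_size=336, patch_size=14, tiles=4, include_overview=True):
    """Estimate image embedding tokens for a tiled vision encoder.

    All defaults are assumptions: a 336 px encoder input, 14 px patches,
    4 tiles, and one extra downscaled overview of the whole image.
    """
    patches_per_side = image_size // patch_size   # 336 // 14 = 24
    tokens_per_view = patches_per_side ** 2       # 24 * 24 = 576
    views = tiles + (1 if include_overview else 0)
    return tokens_per_view * views


print(image_token_count())         # 2880: four tiles plus the overview
print(image_token_count(tiles=0))  # 576: a single view, no tiling
```

Under these assumptions, a count near 576 would be consistent with the single-view path, while 2880 would be consistent with tiles being encoded.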

I wrote a Python program to batch-caption folders of images, but I'm having to do it in a really hacky way: it runs a command prompt behind the scenes, captures the window's output as a log, parses the log to trim out the non-response text, then formats and saves the result. It's really annoying because the model has to be fully reloaded for each image.

For reference, this is how I'm running llava-cli:

llava-cli -m "C:\pathtomodel\llava-v1.6-mistral-7b.Q4_K_M.gguf" --mmproj "C:\pathtovision\mmproj-model-f16.gguf" --image "c:\pathtoimage\image.png" --temp 0.2 --n-gpu-layers 100 -n 2048 -c 4096 --mlock -p "<image>\nUSER:\nProvide a full description. Be as accurate and detailed as possible. \nASSISTANT:\n" >> log.txt

(The >> log.txt is only needed if you're running it manually from a cmd prompt, not from a Python script that can capture the output for you.)
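A minimal sketch of the batch workflow described above, for illustration only. It reuses the exact flags from the command and captures stdout directly instead of redirecting to log.txt. The helper names (`build_llava_cmd`, `caption_folder`), the `*.png` pattern, and the choice to write one .txt file next to each image are all assumptions. No log trimming is attempted, since the output format isn't shown here.

```python
import subprocess
from pathlib import Path

# Raw string so the literal backslash-n sequences reach llava-cli unchanged,
# the same as when the command is typed at a cmd prompt.
PROMPT = (r"<image>\nUSER:\nProvide a full description. "
          r"Be as accurate and detailed as possible. \nASSISTANT:\n")


def build_llava_cmd(model, mmproj, image, exe="llava-cli"):
    """Return the argument list for one llava-cli run (flags taken from the post)."""
    return [
        exe,
        "-m", str(model),
        "--mmproj", str(mmproj),
        "--image", str(image),
        "--temp", "0.2",
        "--n-gpu-layers", "100",
        "-n", "2048",
        "-c", "4096",
        "--mlock",
        "-p", PROMPT,
    ]


def caption_folder(folder, model, mmproj, pattern="*.png"):
    """Run llava-cli once per image and save the raw stdout beside the image."""
    for image in sorted(Path(folder).glob(pattern)):
        result = subprocess.run(
            build_llava_cmd(model, mmproj, image),
            capture_output=True, text=True, check=True,
        )
        # Raw, untrimmed output; any log parsing would go here.
        image.with_suffix(".txt").write_text(result.stdout, encoding="utf-8")
```

This still starts a new process, and so reloads the model, for every image. It only removes the log-file round trip, not the reload cost.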
