[Doc/Feature]: Llava 1.5 in OpenAI compatible server #3873
Comments
I believe the image input protocol has indeed not been implemented! This is more than a documentation issue.
The PR #3042, which introduced the LLaVA feature, does not appear to include support for the OpenAI-compatible server. Based on the documentation, it should be feasible to extend the existing OpenAI-compatible server (see the Image Input tab in the linked page) to support this feature, without developing a dedicated server specifically for image inputs. However, it's important to note the distinctions between GPT-4V and LLaVA, in particular that LLaVA currently does not support multiple image inputs or the 'detail' parameter. From the OpenAI documentation:
Example of uploading base64 encoded images
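A minimal sketch of what such a request could look like against a vLLM OpenAI-compatible server, mirroring the base64 example from the OpenAI vision docs; the base URL, port, model name `llava-hf/llava-1.5-7b-hf`, and image file name are assumptions, not taken from this thread:

```python
# Sketch only: assumes an OpenAI-compatible server is already running locally
# with a LLaVA 1.5 checkpoint; model name, port, and file name are placeholders.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Read a local image and base64-encode it, as in the OpenAI vision examples.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    # Base64 images are passed as data URLs in the GPT-4V-style API.
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```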
Please let me know if anyone is already working on implementing this feature.
Based on […], note that this change adds […]. However, there is more work to be done:
UPDATE: I have created a new branch on my fork.
Thankfully I only need LLaVA 😄!
I'll create a PR once more testing has been done. It would be great if we could compile a list of models that work/don't work with my implementation of this API. Currently, I assume that at most one image is provided since it appears that this is also the case for vLLM internals. How difficult would it be to support multiple images (possibly of different sizes)?
Do there exist models that support multiple image inputs?
GPT-4's API supports multiple images, so I guess their model can already handle such input. Looking at open source, I found that MMICL explicitly supports multiple images per text prompt. They use […]
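To make the multi-image case concrete, in the GPT-4V-style chat format additional images are simply extra `image_url` parts in a single user message. A payload-only sketch with placeholder URLs; whether a given backend accepts more than one image per prompt depends on the model and server implementation:

```python
# Sketch of a multi-image user message in the GPT-4V-style chat format.
# The URLs are placeholders; not all models/servers accept multiple images.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two images."},
            {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/b.jpg"}},
        ],
    }
]
```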
I have opened a PR to support single-image input, with a POC using […]. We can deal with multi-image input further down the line. NOTE: If you have previously checked out […]
FYI - this is WIP and we plan to have it in the next major release. See our plan here: #4194 (comment)
Closing this as we merged #5237.
📚 The doc issue
Hey vLLM team, it looks like support for LLaVA 1.5 has been added, but there are no docs or examples on how to use it via the API server. Are there any reference examples for using LLaVA via the OpenAI SDK?
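For reference, a rough sketch of querying the OpenAI-compatible server directly over HTTP with an image URL; the launch command, flags, model name, port, and image URL are assumptions and may differ across vLLM versions:

```python
# Sketch only: assumes the OpenAI-compatible server has been started with a
# LLaVA 1.5 checkpoint, e.g. roughly:
#   python -m vllm.entrypoints.openai.api_server --model llava-hf/llava-1.5-7b-hf
# (exact flags depend on the vLLM version; names below are placeholders).
import requests

payload = {
    "model": "llava-hf/llava-1.5-7b-hf",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.jpg"},
                },
            ],
        }
    ],
    "max_tokens": 128,
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```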
Suggest a potential alternative/fix
No response