-
-
Couldn't load subscription status.
- Fork 10.9k
Closed
Labels
feature requestNew feature or requestNew feature or request
Description
🚀 The feature, motivation and pitch
I am currently exploring the capabilities of the VLLM library and am interested in understanding its support for multi-modal inputs, particularly for models like MiniCPM-V2.6. I would like to know if VLLM is designed to handle multi-image and video inputs for such models.
Alternatives
- Model of Interest: MiniCPM-V2.6
- Types of Input: Multi-image and video
- Current Understanding:
- I have reviewed the documentation and initial examples provided with VLLM.
- It seems that both
multiple 'image_url' inputandlist value in image_urlis currently not supported. - However, I am not sure if it supports the processing of multiple images or videos as input to a model like MiniCPM-V2.6.
Questions
- Does VLLM support the integration of MiniCPM-V2.6 for processing multi-image and video inputs?
- If yes, could you provide an example or a guide on how to set up and use this feature?
- If not, are there any plans to extend VLLM's capabilities to support such inputs in the future?
Additional context
Metadata
Metadata
Assignees
Labels
feature requestNew feature or requestNew feature or request
