Skip to content

[Feature]: Inquiry about Multi-modal Support in VLLM for MiniCPM-V2.6 #7546

@Dong148

Description

@Dong148

🚀 The feature, motivation and pitch

I am currently exploring the capabilities of the VLLM library and am interested in understanding its support for multi-modal inputs, particularly for models like MiniCPM-V2.6. I would like to know if VLLM is designed to handle multi-image and video inputs for such models.

Alternatives

  1. Model of Interest: MiniCPM-V2.6
  2. Types of Input: Multi-image and video
  3. Current Understanding:
    • I have reviewed the documentation and initial examples provided with VLLM.
  • It seems that both multiple 'image_url' input and list value in image_url is currently not supported.
  • However, I am not sure if it supports the processing of multiple images or videos as input to a model like MiniCPM-V2.6.

Questions

  1. Does VLLM support the integration of MiniCPM-V2.6 for processing multi-image and video inputs?
  2. If yes, could you provide an example or a guide on how to set up and use this feature?
  3. If not, are there any plans to extend VLLM's capabilities to support such inputs in the future?

Additional context

image

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions